llms.txt and Sitemaps Explained: 2026 Guide for SaaS Sites | The SaaS Library
Doodle illustration showing a sitemap.xml file icon with a magnifying glass and web crawler spider on the left, and an llms.txt file icon with a robot face and brain on the right, illustrating the llms.txt and sitemaps two audiences concept
Thought Leadership

llms.txt and Sitemaps Explained: What They Are, How They Work, and When You Need Them in 2026

Daniel Voss · June 3, 2026 · 13 min read
✓ Independent Analysis ✓ 9 Verified Sources ✓ Updated June 2026

In May 2026, Google said two completely different things about llms.txt within eight days. One team called it unnecessary. Another team shipped a Lighthouse audit that checks for it. Both teams were right.

Sitemaps and llms.txt files solve different problems for different machine audiences. A sitemap is a structured XML file that helps search engines like Google index your site so human readers can find your pages. An llms.txt file is a curated Markdown document that helps AI agents understand your site when they need to fetch context, answer a question, or complete a task on a user’s behalf. In 2026, most B2B SaaS sites need a sitemap. Some will also benefit from an llms.txt file, and figuring out which combination matters is part of the broader work of optimising for AI search in 2026.

The confusion in 2026 is not that one of these files is wrong. It is that both files exist, both serve real audiences, and the audiences are visible in different layers of your traffic data: search-engine crawlers in one layer, AI agents in another. This article walks through what each file is, how we got from one to the other, and how to decide which combination your site actually needs. The framework that resolves the confusion is simple, and we call it the Two Audiences Framework.

Defined Term
The Two Audiences Framework
The Two Audiences Framework is a model for understanding modern site machine-readability: sitemaps are written for search-indexing bots that catalog a site for human readers, while llms.txt files are written for AI agents that fetch context to act on a user’s behalf. The two audiences need different files because they do different jobs.
Doodle infographic titled Who's actually fetching llms.txt showing 5 AI Search Bots on the left with the stat 408 fetched out of 500 million plus bot visits, and 5 IDE Coding Agents on the right with fetched constantly labelled Cursor, Claude Code, Copilot, Windsurf, Cline, Aider. Source: Limy CDN log analysis 2026
Across 500M+ LLM bot visits in a 90-day window, only 408 directly fetched /llms.txt. Source: Limy 2026.
Express Reader · The Position
The Short Version
Sitemaps were built in 2005 for search engines. llms.txt was proposed in 2024 for AI agents. Both still matter in 2026, but for different reasons and for different sites. Most B2B SaaS sites need a sitemap. Some will benefit from an llms.txt file. The decision depends on who is actually fetching your content today.
2005 Year sitemaps became a Google standard Source: sitemaps.org
Sept 2024 Year llms.txt was proposed by Jeremy Howard Source: Answer.AI
0 Major AI labs publicly committing to llms.txt in production search Source: Limy 2026
May 7, 2026 Google ships Lighthouse Agentic Browsing audit Source: Lighthouse 13.3

What is a sitemap, and what is it actually for?

A sitemap is a structured XML file that tells search engines which pages exist on your site, when each page was last updated, and how often it changes. It is the standard machine-readable index for the open web, and almost every site on the internet has one.

The protocol was introduced by Google in June 2005 and reached broader adoption in November 2006, when Google, Yahoo, and Microsoft jointly committed to supporting Sitemap 0.9 across their search engines. That joint announcement is what turned sitemaps from a Google convention into a cross-engine standard. The reason it stuck is simple: web crawlers had no other reliable way to know what existed on a site, especially as sites grew into thousands of pages, deep navigation hierarchies, and dynamically generated URLs.

A sitemap solves three problems for a search engine. It lists pages the crawler might otherwise miss. It signals when a page was last modified, so the crawler does not waste budget re-fetching unchanged content. And it offers a relative priority signal across pages on the same site, though Google has publicly downweighted that signal in recent years.

What a sitemap is not is content. It does not describe what any page is about. It does not summarise the site. It does not help a human reader navigate. It is an index built for machines that already plan to read every page individually.

That distinction matters, because in 2024 a new kind of machine started reading sites in a fundamentally different way.

2005
Google introduced the Sitemaps protocol in June 2005, and the joint Google, Yahoo, and Microsoft support announcement in November 2006 turned it into a cross-engine standard.
Doodle teaching diagram titled Anatomy of a sitemap.xml file showing Cookie's Coffee Roastery example with annotated XML elements urlset, url, loc, lastmod, changefreq, priority, and a side panel titled What search bots get listing what pages exist, when they changed, how often they change, which are most important
A sitemap.xml file lists every page on a site with structural metadata for search-engine crawlers.

What is llms.txt, and why does it exist?

An llms.txt file is a curated Markdown document that lives at the root of a website and tells AI agents what the site is, what it offers, and which pages matter most. It is designed for systems that read content the way a colleague reads a memo, not the way a crawler indexes a database.

The proposal came from Jeremy Howard of Answer.AI on September 3, 2024. AI agents face the same problem search engines faced in the late 1990s: they need a stable way to discover a site’s high-level structure without crawling every page. A sitemap could not solve this because sitemaps describe URLs, not meaning. An llms.txt file fills that gap.

The format is intentionally simple. The file begins with an H1 for the site or product name. A blockquote summarises what the site is in one or two sentences. After that, the site owner lists pages under H2 sections, with each entry written as a bullet containing the link and a one-line description. There is an optional ## Optional section for pages that agents can deprioritise if they are running short on context.

What makes this work is that the file is curated by the site owner, not generated automatically. A sitemap aims to be exhaustive. An llms.txt file is a selection. It is what you would tell an AI agent if you had thirty seconds to explain your site.

The proposal was lightweight, the format was readable, and the file took minutes to write. That combination is why it spread.

Sept 3, 2024
Jeremy Howard of Answer.AI published the llms.txt specification, proposing a Markdown-based discovery file for AI agents.
Doodle teaching diagram titled Anatomy of an llms.txt file showing Cookie's Coffee Roastery example with annotated Markdown elements H1 heading, blockquote summary, section headings, bullet links with title URL and description format, optional section, and a side panel titled What AI agents get listing what your site is in one sentence, which pages matter most, what each page is for, which pages to skip first
An llms.txt file is a curated Markdown index for AI agents, structured around section headings and short link descriptions.

How did we get from sitemaps to llms.txt?

The path from sitemaps in 2005 to llms.txt in 2026 is a 21-year story told through five turning points, and each one shows the web responding to a new kind of reader.

In 2005, Google introduced the Sitemaps protocol so crawlers could index pages efficiently as the web grew faster than they could keep up. In November 2006, Google, Yahoo, and Microsoft jointly announced support for Sitemap 0.9, turning the format from a Google convention into a cross-engine standard. For the next sixteen years, sitemaps did exactly what they were designed to do.

The next turning point was not technical. In November 2022, ChatGPT launched, and within months a new kind of machine was reading the web differently. AI assistants did not crawl every page. They fetched a small number on demand and synthesised answers in real time. Sitemaps list URLs without context, and generative systems need context more than they need a list.

The proposal came on September 3, 2024. Jeremy Howard of Answer.AI published the llms.txt specification, designed for AI agents that needed to discover a site’s high-level structure quickly. By late 2024, Mintlify had added native support, and thousands of developer-tool sites shipped llms.txt files overnight.

The final two milestones, both in May 2026, came from Google itself and appeared to contradict each other. The Lighthouse team shipped an audit on May 7 that explicitly checks for an llms.txt file. Eight days later, on May 15, Google’s Search team published a generative-AI optimisation guide that told site owners they did not need one. Both statements were correct. They were just answering different questions, which is what the rest of this article is about.

Doodle horizontal timeline titled From sitemaps to llms.txt how we got here showing 7 milestones: 2005 Google introduces the Sitemaps protocol, Nov 2006 Google Yahoo and MSN announce joint support Sitemap 0.9, Nov 2022 ChatGPT launches the AI search era begins, Sept 3 2024 Jeremy Howard proposes llms.txt, Late 2024 Mintlify adopts thousands of docs sites add llms.txt overnight, May 7 2026 Google ships Lighthouse Agentic Browsing audit, May 15 2026 Google Search Central llms.txt not needed for AI search
A 21-year arc from sitemaps to llms.txt, told through seven turning points.

How is llms.txt different from a sitemap?

The two files differ in four ways: who reads them, what they list, how they are written, and what each one is trying to accomplish.

A sitemap is read by search-engine crawlers like Googlebot and Bingbot. An llms.txt file is read by AI agents and IDE coding tools like Claude, ChatGPT, Cursor, and GitHub Copilot. These readers arrive with different intents, on different schedules, and they do different things with what they find.

A sitemap aims to be exhaustive: every page that should be discoverable goes in the file, often thousands of URLs deep. An llms.txt file aims to be selective: the site owner curates the most important pages and writes a one-line description of each, knowing the agent will only fetch a handful in a single session.

The formats reflect those goals. A sitemap is structured XML built for machines to parse at scale. An llms.txt file is Markdown that a person could read in a coffee shop. XML makes sense when the reader indexes every page anyway. Markdown makes sense when the reader needs to skim and decide.

Finally, the purpose. A sitemap answers “what pages exist on this site?” An llms.txt file answers “what is this site, and which pages should I read first?” The first is about discoverability. The second is about context. Both questions still matter in 2026, which is why neither file is replacing the other.

Doodle side-by-side comparison titled Same site Two file formats Two different audiences showing Cookie's Coffee Roastery represented as sitemap.xml for search engines on the left and llms.txt for AI agents and LLMs on the right, with teaching panels showing what search bots get and what AI agents get
The same site, represented as two completely different file formats for two completely different machine audiences.

What do AI agents actually do with llms.txt?

AI agents use llms.txt as a fast index. When an agent is pointed at a website, it fetches the llms.txt file first, reads the summary and curated link list, decides which pages are relevant, and fetches only those. The file is the agent’s shortcut to figuring out a site without crawling it.

The agents that actually do this in 2026 are concentrated in one category: developer tooling. IDE coding agents like Cursor, Claude Code, GitHub Copilot, Windsurf, Cline, and Aider fetch llms.txt routinely when working with documentation sites. Anthropic’s own engineering guidance recommends documentation publishers maintain an llms.txt file precisely because Claude Code and other agents look for it. Our analysis of AI agent use cases in SaaS walks through eight more places this pattern is showing up.

The picture changes outside that category. Limy’s 2026 analysis of more than 500 million LLM bot traffic events found that only 408 directly fetched a /llms.txt file across a 90-day window. The user agents driving the citation layer, including GPTBot, ClaudeBot, PerplexityBot, and Google-Extended, almost never touched it.

The answer is that llms.txt is not built for the AI search citation layer. It is built for the agentic layer, the systems that act on a user’s behalf rather than answer a user’s question. Those systems fetch llms.txt every day. They just do not show up in the same logs the SEO community has been watching.

408
Out of more than 500 million LLM bot traffic events analysed by Limy across a 90-day window, only 408 directly fetched a /llms.txt file. The fetching that does happen is concentrated in IDE coding agents like Cursor and Claude Code.
Doodle four-stage flow diagram titled How an AI agent uses llms.txt showing stage 1 user asks the AI agent How do I integrate with your API, stage 2 agent fetches the site's llms.txt, stage 3 agent scans the curated index and picks the right pages with the agent wearing glasses, stage 4 agent answers accurately with less guesswork
The four-stage workflow that defines how AI agents actually use llms.txt in practice.

Does your site need an llms.txt file?

The answer depends on who is fetching your content today and how much time you have to maintain another file. There are three honest answers, and the right one depends on the kind of site you run.

If you run a documentation-heavy SaaS product, an API, or a developer tool, ship a full version. AI coding agents are already fetching your documentation, and a curated llms.txt reduces the pages they have to crawl. Anthropic publishes its own at platform.claude.com for exactly this reason.

If you run a marketing site, a B2B SaaS company, or a publication, ship a minimal version. List your top fifteen to twenty pages with one-line descriptions and stop. Thirty minutes of work for a clean signal to any agent that fetches it, with no measurable downside.

If you run a local business or a small e-commerce store, skip it for now. The audiences you care about are search engines and humans, and llms.txt does not help with either. Focus on the basics: a working sitemap, fast pages, clean content, and a complete Google Business Profile.

The counter-evidence is worth naming honestly. SE Ranking analysed 300,000 domains in 2026 and found around 10% had published an llms.txt file, with no statistically significant correlation between having one and being cited by AI search systems. That finding is real, but it measures the wrong layer. It measures AI search citation, not the agentic layer the file is built for.

Doodle decision tree titled Does your site need an llms.txt file with three branches from the question What kind of site do you run: Documentation-heavy SaaS or API ship it full version AI coding agents already fetch your docs, Marketing site B2B SaaS or Publisher ship it minimal version low effort light upside 30 minutes of work, Local business or Small e-commerce skip it for now focus on the basics sitemap content Google Business Profile
A three-branch decision frame for whether your site should publish an llms.txt file in 2026.
Analyst Insight
“Prioritize needs before dreams.”
John Mueller · Search Advocate, Google · Search Engine Journal SEO Pulse, May 2026

How do you create an llms.txt file?

You write an llms.txt file the same way you would write a one-page brief for a colleague joining your team. Open a plain text editor, name the file llms.txt, and save it at the root of your domain at yoursite.com/llms.txt. The official specification at llmstxt.org documents the format in full.

The file begins with an H1 containing your brand or product name. Below it, a blockquote summarises what your site is in one or two sentences. This summary is the most important piece of content in the file because it is the first thing an AI agent reads.

After the summary, group your links under Markdown H2 headings. Each link is a bullet in the format - [Page Title](URL): one-line description. A bullet that says “About” gives an agent nothing; “About: founded in 2019, family-run roastery in Portland, Oregon” gives an agent a usable signal. Keep descriptions to one line.

An optional final ## Optional section lists pages that agents can deprioritise if they are running short on context. Press kits, career listings, partner programs.

For hosting, the file lives at the root of your domain. On WordPress, the easiest path is a plugin that writes the file and keeps it updated as you publish, though many publishers prefer to maintain a hand-curated version for editorial discipline. AI agents discover the file by fetching /llms.txt directly, with no submission step required. The broader question of how llms.txt fits into a 2026 strategy across SEO, AEO, and GEO is covered in our piece on hybrid engine optimisation.


What does Google actually say about all this?

Google’s official position is that you do not need llms.txt for AI search. On May 15, 2026, Google Search Central published a generative-AI optimisation guide grouping llms.txt with content chunking and AI-specific rewriting as tactics that do not help with AI Overviews or AI Mode. The exact wording: “llms.txt files: Google’s crawler may discover these files, but they’re treated like any other text file.” John Mueller, Google’s Search Advocate, has compared llms.txt to the deprecated keywords meta tag, noting that server logs show AI search bots do not request the file.

Eight days earlier, on May 7, 2026, a different Google team shipped a Lighthouse audit that does the opposite. The Lighthouse 13.3 release added an Agentic Browsing audit category, and one check verifies whether your site provides an llms.txt file. The Lighthouse documentation describes the file as “a machine-readable summary of a website’s content, specifically designed for LLMs and AI agents.”

Both positions are correct. The Search team is answering “does llms.txt help my site appear in AI Overviews?” and the answer is no. The Lighthouse team is answering “is your site ready for AI agents that act on a user’s behalf?” and the answer is yes. The two teams are not contradicting each other; they are answering different questions, and no single Google document explains how they relate.

For most B2B SaaS sites, both answers matter. If you only care about AI search visibility, the Search team’s guidance applies, and our breakdown of Google’s AI search guide covers that position in full. If you care about agentic readiness, the Lighthouse team’s guidance applies, and an llms.txt file is the cheapest signal you can ship. A strong AEO foundation sits underneath both answers.

Doodle illustration titled Google's split message on llms.txt showing one Google label connecting two columns: Google Search team with magnifying glass icon saying llms.txt files are treated like any other text file You don't need them for AI search Generative AI Search Guide May 15 2026, and Chrome Lighthouse team with browser checkmark icon saying We check for llms.txt to see if your site is ready for AI agents Lighthouse 13.3 Agentic Browsing May 7 2026, with bottom resolution line Different teams Different questions Both correct
Both statements come from Google, but they answer different questions about your site’s machine-readability.

Key Insight

The reason llms.txt feels confusing in 2026 is that the SEO community has been watching the wrong logs. AI search citation bots like GPTBot and ClaudeBot rarely fetch the file. IDE coding agents like Cursor, Claude Code, and GitHub Copilot fetch it constantly. The file is doing real work in one layer of the web while being invisible in another, and your decision about whether to publish one depends entirely on which layer your audience lives in.

The Two Audiences Framework

Sitemaps are written for search-indexing bots that catalog a site for human readers. llms.txt files are written for AI agents that fetch context to act on a user’s behalf. The two audiences need different files because they do different jobs. For most B2B SaaS sites in 2026, a sitemap is non-negotiable and an llms.txt file is a low-cost addition that helps the agentic layer find your content faster. Skip llms.txt only if no AI agents touch your traffic today and you have no documentation worth indexing.

Decision Frame
Which version of llms.txt should you ship?
Three honest answers, one per site type.
→ Ship it (full)
Documentation-heavy SaaS, API, or developer tool
Build a hand-curated file with descriptions for every link. Revisit monthly as you publish new docs. AI coding agents are already fetching your content. Spend the few hours: this is the highest-leverage AI-readiness step you can make this quarter.
→ Ship it (minimal)
Marketing site, B2B SaaS, or publication
Build a clean, short file under 50 lines that lists your top 15 to 20 pages with one-line descriptions. Skip llms-full.txt entirely. Thirty minutes of work, then move on. Revisit only if your traffic patterns change.
→ Skip it for now
Local business or small e-commerce
Your site won’t benefit much from llms.txt in 2026. Focus on the basics that move the needle for your audience: fast pages, clean content, a working sitemap, and a complete Google Business Profile. Revisit in 6 months as the agentic ecosystem matures.
The Answer Frame
So should you ship llms.txt?
If you run documentation-heavy SaaS, an API, or a developer tool, ship a full llms.txt file. AI coding agents are already fetching your content.
If you run a marketing site, a B2B SaaS company, or a publication, ship a minimal version. Thirty minutes of work for a clean signal to any agent that fetches it.
If you run a local business or small e-commerce store, skip it for now and focus on the basics that actually move your traffic.
The Two Audiences Framework is a 2026 answer to a 2026 question. The agentic web is moving fast, and what’s optional today may be standard within a year. The best operators ship what makes sense now and stay ready to revisit. For a deeper critique of why optimisation without understanding fails, our piece on the hidden cost of AI-assisted SEO is worth reading next.

Frequently Asked Questions

llms.txt is a curated Markdown file placed at the root of a website (yoursite.com/llms.txt) that helps AI agents understand a site’s content, structure, and most important pages. The file was proposed by Jeremy Howard of Answer.AI in September 2024. It is read by AI agents and IDE coding tools like Cursor, Claude Code, and GitHub Copilot, not by search engines.

Most B2B SaaS sites should have both. A sitemap helps search engines index your pages for human readers, and an llms.txt file helps AI agents fetch context to act on a user’s behalf. The two files serve different machine audiences. Sitemaps are non-negotiable. llms.txt is optional but low-cost, and recommended for documentation-heavy sites and SaaS companies.

Google’s Search team published official guidance on May 15, 2026, stating that llms.txt is not needed for Google’s AI Overviews or AI Mode. The Lighthouse team shipped an Agentic Browsing audit on May 7, 2026, that checks whether a site provides the file. Both positions are correct. They answer different questions about your site’s machine-readability.

Open a plain text editor, name the file llms.txt, and save it at the root of your domain. Start with an H1 containing your brand name, add a one-or-two-sentence blockquote summary, then list your important pages under H2 sections with bullet links in the format – [Page Title](URL): description. Upload to your site root and the file is live.

The Two Audiences Framework is a model for understanding modern site machine-readability: sitemaps are written for search-indexing bots that catalog a site for human readers, while llms.txt files are written for AI agents that fetch context to act on a user’s behalf. The two audiences need different files because they do different jobs.

Daniel Voss
SaaS Market Analyst and Content Strategist
Daniel Voss covers B2B SaaS market dynamics, AI adoption patterns, and the evolving information economy that surrounds them. He writes for operators and decision-makers who need defensible positions on contested questions, not summaries of what the industry already agrees on. His work focuses on what the data actually says, where consensus is breaking down, and which signals matter when the next decision is on the table.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top