Internal Linking for AI Search: What Changes, What Doesn't, and What to Audit First

Direct observations from inside ChatGPT Deep Research, large-scale citation data, and what they mean for how you structure internal links in 2026.

Updated May 28, 2026: Added findings from Ahrefs' Q4 2025–Q1 2026 AI Search Benchmark, published this week, which corroborates the ranking-citation decoupling discussed below.

Internal linking now serves two functions for LLM-based search: discovery and semantic context. Retrieval-augmented systems like ChatGPT Deep Research follow internal links to navigate between pages, and they use anchor text, URL slugs, and surrounding sentence context to decide which links are worth following. The principles of good linking (descriptive anchors, entity-first language, semantic clarity) have not changed. What has changed is the role they play in whether your content gets seen at all.

Do LLM-based search agents actually follow internal links?

Yes. The mechanism is documented at the protocol level for ChatGPT Deep Research, and inferred from documentation for other retrieval-augmented systems.

When OAI-SearchBot opens a page during ChatGPT Deep Research, every <a> tag on that page becomes a structured reference passed to the model. Each subsequent browser.open call contains a clicked_from_url field. That field is empty when the page was reached via Bing search, and populated when the page was reached by following a link from another page.

This was documented by David Konitzny, who analyzed Deep Research's WebSocket traffic and published his observations on LinkedIn. In one analyzed session, he observed 31 page opens. Nine of those came via internal links rather than search queries. That is 29 percent, a little under a third of the session's navigation, happening without a new search.

The data is from one session, one researcher, one system. The mechanism is documented; the magnitude varies by query and site. Search Engine Land's AI crawler guide independently confirms that LLM crawlers follow links and navigate the sites they encounter, and distinguishes OAI-SearchBot (which handles citation indexing) from GPTBot (which handles training crawling).
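To make the clicked_from_url distinction concrete, here is a minimal Python sketch that tallies a session log. The OpenEvent record is a hypothetical stand-in: only the clicked_from_url field and the 31/9 counts come from Konitzny's published observations. The URLs are placeholders, and the real payload format is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenEvent:
    """Hypothetical record for one browser.open call in a session log."""
    url: str
    clicked_from_url: Optional[str]  # empty/None = reached via search; set = followed a link


def link_following_share(events: list[OpenEvent]) -> float:
    """Fraction of page opens that were reached by following a link."""
    if not events:
        return 0.0
    via_link = sum(1 for e in events if e.clicked_from_url)  # None and "" both count as empty
    return via_link / len(events)


# Synthetic session mirroring the reported counts: 31 opens, 9 via links.
session = [OpenEvent(f"https://example.com/result-{i}", None) for i in range(22)]
session += [
    OpenEvent(f"https://example.com/linked-{i}", "https://example.com/result-0")
    for i in range(9)
]
print(f"{link_following_share(session):.1%} of opens followed an internal link")
# 29.0% of opens followed an internal link
```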

The rest of this post draws on Konitzny's ChatGPT Deep Research observations, because they offer the most detailed public protocol-level analysis we found while researching this piece. Deep Research is ChatGPT's high-effort research workflow, the mode that does extensive reading and synthesis. Standard ChatGPT search uses related but possibly different retrieval mechanics, and the volumes Konitzny observed apply specifically to Deep Research rather than to every ChatGPT interaction. The underlying principles (link-following, anchor text as a selection signal, content structure preferences) are common to retrieval-augmented systems broadly. They apply directionally to Claude, Perplexity, Google AI Overviews, and standard ChatGPT, though the specific behaviors will vary by system and by interaction mode. Where evidence is specific to one system or mode, this post names it.

Internal links are no longer purely an SEO signal. They are an active discovery channel for retrieval-augmented search systems.

How do LLM agents decide which internal links to follow?

They evaluate page context, including anchor text, URL slug, and the sentence around the link, before choosing the next navigation step.

Konitzny's protocol traces show that link selection happens inside the model's working process. When the model receives the full link graph of a page, it does not follow every link. It selects based on semantic relevance to the active query, and that selection is visible in its reasoning before the next browser.open call executes.

Two surfaces carry the signal at the moment of selection:

The first is the anchor text, the visible clickable phrase. This is the most editable surface, and the one most often left weak in real-world content. A descriptive anchor that names the destination concept gives the model usable signal. A generic phrase like "click here" gives the model little to evaluate at the anchor itself, leaving the surrounding sentence and URL slug to carry more of the weight. The link is still selectable, but the most direct signal has been wasted.

The second is the URL slug, the path encoded in the link's href value. A URL like /internal-linking-analysis-tool is itself a description of the destination. Ahrefs analyzed 1.4 million ChatGPT prompts and found that both page title and URL are signals ChatGPT's retrieval pipeline evaluates for semantic relevance.
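A minimal sketch of what "every <a> tag becomes a structured reference" could look like, using Python's standard html.parser. The (anchor text, href) pairs and the slug_terms helper are assumptions for illustration; the actual reference format Deep Research passes to the model is not public beyond what is described above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkExtractor(HTMLParser):
    """Collects (anchor_text, href) pairs from every <a href> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (anchor_text, href) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None for <a> without href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((anchor, self._href))
            self._href = None


def slug_terms(href: str) -> list[str]:
    """Split the last path segment of a URL into lowercase words (hypothetical helper)."""
    last = urlparse(href).path.rstrip("/").rsplit("/", 1)[-1]
    return [w for w in last.lower().replace("_", "-").split("-") if w]


html = '<p>See our <a href="/internal-linking-analysis-tool">internal linking analysis tool</a>.</p>'
parser = LinkExtractor()
parser.feed(html)
for anchor, href in parser.links:
    print(anchor, "->", slug_terms(href))
# internal linking analysis tool -> ['internal', 'linking', 'analysis', 'tool']
```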

Consider a concrete example. Maria sells project management software for design agencies. She has a blog post titled "5 Project Management Mistakes Design Agencies Make," and inside that post she has a sentence about missed deadlines that links to her product page.

Her original version reads: "Missed deadlines are solvable with the right tools. Learn more here." The anchor text is "Learn more here." The link points to /project-management-software.

When someone asks ChatGPT "what's the best project management software for design agencies?", the model might surface Maria's blog post during research. It sees a link labeled "Learn more here" pointing to /project-management-software. The URL slug carries semantic signal. The anchor text carries almost none. The link is selectable but ambiguous.

Now consider the revised version: "Missed deadlines are one of the main reasons we built project management software for design agencies with a real-time dashboard at its core," with "project management software for design agencies" as the anchor. Same page, same link, same URL. The anchor text now matches the user query semantically. The anchor and the URL slug both point clearly at the destination concept. The model has aligned signals at the moment of selection, and the link is much more likely to be followed during the active research session. The product page becomes a candidate for citation in the final answer.

Traditional SEO treated anchor text primarily as a ranking signal for the destination page, evaluated after the link was followed. Retrieval-augmented search treats anchor text and URL as selection signals at retrieval time. The model is actively deciding, in real time, whether the linked destination is worth fetching. The anchor text and URL slug are not just describing the destination. They are the inputs the model uses to decide whether the destination gets read at all.
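The Maria example can be approximated with a crude lexical score: the share of the query's content words that appear in each link surface. This is a hypothetical stand-in. Retrieval systems judge semantic relevance with learned models, not word overlap, and the stopword list below is an arbitrary choice. The sketch only shows why the revised anchor gives the selector more to work with than "Learn more here."

```python
import re

# Arbitrary, minimal stopword list (an assumption, not a tuned resource).
STOPWORDS = {"a", "an", "and", "are", "at", "for", "its", "of", "s", "the", "to", "what", "with"}


def terms(text: str) -> set[str]:
    """Lowercase content words; hyphens, slashes, and punctuation act as separators."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def coverage(query: str, surface: str) -> float:
    """Share of the query's content words that also appear in one link surface."""
    q = terms(query)
    return len(q & terms(surface)) / len(q) if q else 0.0


query = "what's the best project management software for design agencies?"
surfaces = {
    "original anchor": "Learn more here",
    "revised anchor": "project management software for design agencies",
    "URL slug": "/project-management-software",
}
for label, text in surfaces.items():
    print(f"{label}: {coverage(query, text):.2f}")
# original anchor: 0.00
# revised anchor: 0.83
# URL slug: 0.50
```

The slug scores the same in both versions; only the anchor moves, because the anchor is the only surface the rewrite changed.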

What changes about anchor text in the age of LLM retrieval?

Anchor text becomes more important, not less, and the principles of good anchors are largely the same for Google and for retrieval-augmented search.

For two decades, SEO thinking on anchor text has been partly defensive. Vary anchors, avoid keyword stuffing, look natural. That framing was always incomplete. Anchor text is not primarily a penalty-avoidance problem. It is a semantic surface area problem.

The principle: anchor text is a retrieval signal. Every anchor is a description of the destination. The more clearly and specifically it describes what the reader (or LLM) will find at the destination, the more useful it is, both for ranking and for retrieval-time link selection.

Google's own anchor text documentation includes this exact principle, presented as a tip: "Try reading only the anchor text (out of context) and check if it's specific enough to make sense by itself. If you don't know what the page could be about, you need more descriptive anchor text." We call this the Concept Specificity Test, and use it as the working framework for evaluating every anchor:

If you removed the link and a reader saw only the anchor text, could they correctly predict what the destination page is about?

Retrieval-augmented systems are doing exactly this evaluation. They are looking at anchor text without yet having fetched the destination. If the anchor passes the Concept Specificity Test, the model has a useful signal to decide whether to follow it. If the anchor fails, the link gives the model little to work with at the moment of selection.

Beyond the test itself, we use a primary entity framework. Each target page has a primary entity: the shortest reusable phrase that names what the page is about and passes the Concept Specificity Test. (We use "primary entity" as a working methodological term here, not in the strict knowledge-graph sense.) Good anchor strategy reuses that primary entity term across multiple links to the same target. We work with a target minimum of approximately 45 percent primary entity reuse, with semantic variants filling the rest. This figure is methodology, not a tested optimum. The reasoning is straightforward: primary entity reuse signals consistency across the topic graph, while variants expand the semantic surface area so the destination is reachable from multiple semantic angles.
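Checking a target page against the 45 percent working minimum is simple arithmetic over the anchors pointing at it. A sketch, assuming you already have those anchors exported as a list. The anchors below are invented, and substring matching is a simplification that ignores plurals and reordered phrasing.

```python
TARGET_MIN = 0.45  # working methodology figure from this post, not a tested optimum


def primary_entity_share(anchors: list[str], primary_entity: str) -> float:
    """Share of anchors that contain the primary entity phrase (case-insensitive)."""
    if not anchors:
        return 0.0
    needle = primary_entity.lower()
    return sum(needle in a.lower() for a in anchors) / len(anchors)


# Invented anchors pointing at a hypothetical /cybersecurity-platform page.
anchors_to_target = [
    "cybersecurity platform",
    "cybersecurity platform for IT teams",
    "our cybersecurity platform",
    "security tooling for small IT teams",
    "threat detection software",
    "our platform",
]
share = primary_entity_share(anchors_to_target, "cybersecurity platform")
print(f"{share:.0%} primary entity reuse, meets minimum: {share >= TARGET_MIN}")
# 50% primary entity reuse, meets minimum: True
```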

One thing that needs addressing directly: there is widespread belief in the SEO industry that exact-match internal anchor text triggers Google penalties. The strongest direct Google statement on this is from Gary Illyes in a 2019 Reddit AMA, where he explicitly confirmed there is no internal-link over-optimization penalty. Google's documented penalty mechanics target external backlink profiles, not internal anchor patterns. The 2024 Google algorithm leak adds nuance: it revealed that internal anchor text is evaluated as part of broader quality scoring, suggesting Google does pay attention to internal anchor patterns even if there is no direct penalty. The honest read is that keyword-stuffed internal anchors are not penalized, but they may dampen the effectiveness of the link signal, and they waste your semantic surface area on a single phrasing when varied descriptive anchors would expand the destination's retrievability across query variants.

On how variant anchors actually help: the benefit operates at different layers, and it's worth being precise about each.

At link-selection time on each individual page, the benefit is local. The anchor on that specific page either passes the Concept Specificity Test or it does not. Variants do not help in this moment.

Across a single retrieval session, if the model reads multiple pages that link to the same destination using different anchor variants, it sees that destination from several semantic angles. This increases the likelihood the destination gets fetched at some point during the session.

In aggregate across the web, varied anchor text contributes to the destination's overall topical coverage in ways traditional SEO has long valued, and likely contributes to how retrieval indexes embed the destination across multiple semantic dimensions. The exact mechanism by which retrieval-augmented systems aggregate anchor signals across crawls is not publicly documented at protocol level.

Entity-first anchor strategy is not a new technique invented for AI search. It is good SEO practice that becomes load-bearing in the era of LLM-based retrieval.

How much of your page does ChatGPT actually read?

Less than most SEO professionals assume, and where your real content begins on the page determines whether it gets seen.

Konitzny's follow-up WebSocket analysis of ChatGPT Deep Research identified three distinct reading modes, each with vastly different character volumes:

Reading mode	What it does	Approximate volume
Snippet only	URL appears in Bing results; page never opened	~700 characters (Bing snippet)
Single-pass read	Page fetched and read top-down, once	~5,000 characters
Deep re-read	Page opened, keyword-searched, sections re-read	~12,000 characters total

These volumes are specific to Deep Research. Other retrieval-augmented search systems likely use different read budgets. Dan Petrovic (Dejan AI) independently documented similar windowed-reading mechanics for the OpenAI Assistants API Web Search tool: GPT systems consume pages as plain-text slices around a target line number, with each open() call returning a capped window. Different surface, different methodology, same underlying mechanism. The implication is that windowed plain-text retrieval is not specific to Deep Research; it appears to be how GPT-family systems consume web content broadly.

What matters more than the exact volumes is what they imply about navigation. Deep Research reads pages as plain text, including navigation menus, headers, and footer content. The first read chunk starts at line 0, regardless of what is there. On a PDF, line 0 is content. On a standard site, content typically begins around line 50, after navigation. On a heavy-nav site, content can begin more than 200 lines in, meaning the read budget is consumed by menu and link blocks before the real content begins.
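The read-budget effect can be approximated by walking a page top-down from line 0 and counting how many characters of a ~5,000-character single-pass read land on real content. The 0, 50, and 200 start lines come from the text above; the synthetic line lengths (24-character menu lines, 79-character prose lines) are assumptions, so the numbers illustrate the shape of the effect, not a measurement.

```python
def content_chars_in_first_read(lines: list[str], content_start: int, budget: int = 5_000) -> int:
    """Characters of real content reached by a top-down read that starts at line 0.

    Lines before content_start count as navigation. The read stops once `budget`
    characters (including one newline per line) have been consumed.
    """
    used = content = 0
    for i, line in enumerate(lines):
        take = min(len(line) + 1, budget - used)  # +1 for the line break
        if take <= 0:
            break
        if i >= content_start:
            content += take
        used += take
    return content


def synthetic_page(nav_lines: int, content_lines: int = 100) -> list[str]:
    """Hypothetical page: short menu-style lines first, then longer prose lines."""
    return ["N" * 24] * nav_lines + ["C" * 79] * content_lines


for nav in (0, 50, 200):  # PDF-like, standard site, heavy-nav site
    print(nav, content_chars_in_first_read(synthetic_page(nav), content_start=nav))
# 0 5000
# 50 3750
# 200 0
```

With these assumed line lengths, the heavy-nav page spends the entire single-pass budget before its content starts. Real pages will differ, but the ordering holds whenever navigation sits above content.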

The implication for internal linking is direct. Navigation menus are read first, in the model's high-attention zone, but their anchor text is typically generic ("Pricing," "About," "Resources"). They consume read budget without contributing semantic retrieval value. Contextual in-body links with descriptive anchors are the structural element that does the actual retrieval work.

There is an architectural reason this matters. LLMs exhibit a documented U-shaped attention bias, paying disproportionate attention to content at the beginning and end of any passage they read (Liu et al., 2023, Lost in the Middle; Wu et al., MIT, 2025). On a nav-heavy page, that means menu items sit in the high-attention zone at the top, real content slides into the low-attention valley in the middle, and footer content gets the high-attention zone at the end. The bias is well-documented as a general LLM property; its specific application to how retrieval-augmented systems weight web page content versus navigation has not been directly studied in published research.

When a page is judged valuable, the model returns to it and uses keyword search to jump directly to relevant sections. It searches for a term (something like browser.find(query="deductible")), gets back an exact line number, and re-reads only the matched section. If your content uses the precise terminology the model is searching for, that section gets re-read. If your content uses vague phrasing or branded shorthand in place of the real concept, the matched section is harder for the model to locate. This is mechanism-based reasoning rather than a directly tested claim. Konitzny showed the model can use browser.find to locate keywords. He did not specifically test that vague terminology causes retrieval to fail. The connection is plausible from the observed mechanism but not directly demonstrated.
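A toy version of the find-then-re-read loop, assuming a page already flattened to plain-text lines. The find and open_window helpers are hypothetical and only loosely mimic browser.find and windowed open() calls; the window radius and the sample insurance page are invented. It illustrates the mechanism-based point only: a search for the precise term lands on the lines that use it, while phrasing that never names the concept returns nothing.

```python
def find(lines: list[str], term: str) -> list[int]:
    """Line numbers whose text contains term (case-insensitive), like a keyword jump."""
    needle = term.lower()
    return [i for i, line in enumerate(lines) if needle in line.lower()]


def open_window(lines: list[str], line_no: int, radius: int = 2) -> list[str]:
    """Capped slice of lines centred on line_no, like a windowed re-read."""
    start = max(0, line_no - radius)
    return lines[start:line_no + radius + 1]


# Invented plain-text page; line 0 is navigation, as in a real top-down read.
page = [
    "Home | Plans | Claims | About",
    "Compare health plans",
    "Our Flex plans keep costs predictable.",
    "Each plan lists its annual deductible.",
    "The deductible is what you pay before coverage starts.",
    "Contact us",
]
hits = find(page, "deductible")
print(hits)                                  # [3, 4]
print(open_window(page, hits[0], radius=1))  # re-read only lines 2 to 4
print(find(page, "premium"))                 # []: the concept is only implied on line 2
```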

Read budget is finite. Navigation menus consume it. Generic anchor text wastes it. Precise, entity-first language is what survives the cut.

What does the data say about what AI search actually cites?

Semantic relevance dominates. The overlap between Google's top 10 and AI Overview citations has shrunk sharply. Brand presence across the web outperforms backlink count.

Semantic relevance is the strongest predictor of citation in ChatGPT. Ahrefs analyzed 1.4 million ChatGPT prompts and found that the semantic relevance of a page's title and URL to ChatGPT's internal sub-questions (the fan-out queries the model decomposes the user's request into) strongly predicts citation likelihood. As Ahrefs put it, relevance does the heavy lifting. The implication is consistent with what the mechanism evidence already showed: clear, specific, descriptive surfaces are what the ChatGPT retrieval pipeline reaches for.

The link between Google ranking and AI Overview citation has weakened sharply. Pre-Gemini 3 analyses showed approximately 76% of AI Overview citations came from the top 10 organic results. After Gemini 3 became the default AIO model in January 2026, Ahrefs' analysis of 4 million AI Overview citations showed that figure dropped to roughly 38% (with positions 11–100 accounting for another 31.2%), and BrightEdge's separate one-year analysis placed top-10 overlap as low as 17%. Ahrefs attributes the shift partly to fan-out queries, where Gemini 3 splits the original query into multiple sub-queries, broadening the pool of pages it pulls from. Top-10 ranking and AIO citation now overlap far less than they did. Pages with strong passage structure and semantic clarity can be selected over higher-ranked pages with buried answers. This finding is specific to Google AI Overviews; the ranking-citation relationship in other retrieval-augmented systems (ChatGPT, Claude, Perplexity) is governed by different retrieval mechanics.

A meaningful share of AI-cited pages don't rank in traditional search at all. Ahrefs' Q4 2025 to Q1 2026 AI Search Benchmark analyzed the top 1,000 pages cited by ChatGPT and found that 28.3% had no organic Google keywords, meaning no traditional search visibility whatsoever. Of the pages that did rank, the median was 279 keywords each, on domains with a median Domain Rating of 90. The takeaway reinforces the mechanism: ranking and citation are decoupling. A page can be surfaced and cited through retrieval mechanics even when it has no conventional search footprint, which is exactly why the discovery surfaces this post focuses on (anchor text, URL clarity, internal link paths) matter independently of where a page ranks.

Brand mentions across the web correlate more strongly than backlinks with AI Overview brand visibility. Ahrefs (Patrick Stox, Si Quan Ong, May 2025) studied 75,000 brands and measured correlations between various factors and Google AI Overview brand visibility. Brand web mentions correlated at 0.664. Backlinks correlated at 0.218. That is roughly 3x stronger. Both signals are off-site: text references to your brand on third-party sites, and external links to your domain. Stox himself notes correlation does not equal causation, and all factors in the study show moderate to weak Spearman correlations. Traditional link-building remains valuable for Google rankings, but for AI Overview visibility specifically, PR, earned media, and presence on category-relevant publications matter more than they used to.

The citation data reflects what the mechanism evidence already showed. Structure, semantic clarity, and brand presence determine retrieval more than rank or link count alone.

What traditional SEO rewards vs. what LLM retrieval rewards

Practice	Traditional SEO	LLM-Based Retrieval
Descriptive anchor text	Established ranking signal	Operates as a selection signal during retrieval, not just a destination signal
Descriptive URL slugs	Helpful for ranking and click-through	Parallel semantic signal at the moment of link selection
Keyword-stuffed exact-match internal anchors	No direct penalty (Google has confirmed); wastes semantic surface area	Narrow semantic coverage limits the destination's retrievability across query variants
Entity-first language	Helpful for topical clarity and disambiguation	Critical for keyword-precision retrieval and link selection
Anchor diversity and semantic variants	Recommended for natural link profile	Helpful to expand the semantic surface area through which a destination can be retrieved
Generic anchors ("click here," "learn more")	Lost ranking opportunity	Anchor itself gives little to evaluate; weight shifts to surrounding sentence and URL
Where real content begins on the page	Matters for above-the-fold UX and engagement	Critical for finite read budget; heavy navigation pushes content out of the first read window
Internal links from contextually relevant pages	Topical authority signal	Discovery signal and topical coverage signal
Content structure (lists, tables, FAQ blocks)	Strong signal (featured snippets, rich results)	Stronger signal (listicles ~50% of top AI citations; tables ~2.5x cited per Onely)
Freshness and regular updates	Significant for time-sensitive queries; modest for evergreen	Strong and uniform across query types; content under 3 months old is ~3x more likely to be cited (AirOps)
Ranking in Google top 10	Defines visibility	Overlap with AIO citation has weakened sharply (76% → 17–38% post-Gemini 3)

Most of the table is overlap. These are not two different disciplines. They are the same discipline with different weighting. Descriptive anchors, descriptive URLs, entity clarity, internal link architecture, content structure: all of these were already best practice for traditional SEO. Retrieval-augmented search just makes some of them matter more.

Three places in the table mark genuine shifts. First, where your real content begins on the page now affects whether it gets read at all, which traditional SEO never quantified. Second, content structure matters more for AI citation than for traditional SEO, even though traditional SEO already valued it. Third, the overlap between top-10 ranking and AI Overview citation has collapsed, which changes how teams should think about ranking as a proxy for visibility.

What should SEO teams audit first?

Three audits, in priority order.

1. Audit your anchor text against the Concept Specificity Test. Go through your highest-priority pages and look at the anchor text of every internal link pointing into them. For each anchor, ask: if a reader saw only this phrase, could they predict where it leads? If not, rewrite it. Pay particular attention to anchors that use vague phrasing or branded shorthand in place of the precise concept. Those anchors give retrieval-augmented systems less signal at the surface most directly under your control.

A concrete example. An anchor reading "our platform" pointing to /cybersecurity-platform fails the test at the anchor surface: the anchor itself gives the model nothing to evaluate. The model can still lean on the URL and the surrounding sentence, but the anchor, the most editable signal, is wasted. The rewrite, "cybersecurity platform for IT teams" or similar, names the destination concept directly. Same link, same URL, fundamentally different signal at the surface most under your control.

2. Map your primary entity for each target page. For each target page on your site, identify the shortest phrase that names what the page is about. That is your primary entity. Reuse it on a meaningful share of anchors pointing to that page, with semantic variants filling the rest. This expands the semantic surface area through which the destination can be retrieved.

3. Surface your missing internal linking opportunities. Auditing your existing links is only half the job. The other half is finding the places where a link should exist but does not. Pages that discuss your target topic without linking to your target page are the biggest single source of untapped value, and the biggest blind spot in most traditional audits. A descriptive URL on the target page contributes no internal linking signal until a link actually exists pointing to it.
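Audits 1 and 3 can be roughed out in a few dozen lines. A minimal sketch, assuming you have each page's raw HTML keyed by URL: flag_anchor uses a hand-written generic-phrase list and slug overlap as a proxy for the Concept Specificity Test, and missing_link_opportunities looks for pages that mention a target's primary entity without linking to it. Both helpers, the phrase list, and the sample pages are hypothetical, and neither replaces a person reading the anchor out of context.

```python
import re
from typing import Optional

# Hypothetical list of anchors that carry no destination-specific meaning.
GENERIC_ANCHORS = {"click here", "here", "learn more", "learn more here",
                   "read more", "this page", "our platform"}


def words(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; hyphens and punctuation split words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def flag_anchor(anchor: str, href: str) -> Optional[str]:
    """Audit 1: return a reason if the anchor likely fails the Concept Specificity Test."""
    if " ".join(anchor.lower().split()) in GENERIC_ANCHORS:
        return "generic phrase"
    slug = href.rstrip("/").rsplit("/", 1)[-1]
    if not words(anchor) & words(slug):
        return "no overlap with destination slug"  # proxy only; may flag good anchors
    return None


def missing_link_opportunities(pages: dict[str, str], target_url: str, entity: str) -> list[str]:
    """Audit 3: pages that mention the entity in visible text but never link to target_url.

    Tag stripping and the href match are naive regexes and will miss
    relative/absolute variants of the same URL.
    """
    mention = re.compile(re.escape(entity), re.IGNORECASE)
    link = re.compile(r'<a\b[^>]*href=["\']' + re.escape(target_url) + r'["\']', re.IGNORECASE)
    found = []
    for url, html in pages.items():
        text = re.sub(r"<[^>]+>", " ", html)  # crude tag removal before the mention check
        if url != target_url and mention.search(text) and not link.search(html):
            found.append(url)
    return found


links = [
    ("our platform", "/cybersecurity-platform"),
    ("cybersecurity platform for IT teams", "/cybersecurity-platform"),
    ("our award-winning solution", "/cybersecurity-platform"),
]
for anchor, href in links:
    reason = flag_anchor(anchor, href)
    if reason:
        print(f"rewrite {anchor!r}: {reason}")

pages = {
    "/blog/soc2-checklist": "<p>A cybersecurity platform can automate evidence collection.</p>",
    "/blog/phishing-101": '<p>Our <a href="/cybersecurity-platform">cybersecurity platform</a> flags lures.</p>',
    "/blog/hiring": "<p>How we hire engineers.</p>",
    "/cybersecurity-platform": "<h1>Cybersecurity platform</h1>",
}
print(missing_link_opportunities(pages, "/cybersecurity-platform", "cybersecurity platform"))
# ['/blog/soc2-checklist']
```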

Most internal linking audits stop at step one. The other two are where the leverage is.

If you're thinking about this work seriously, whether you're in-house, at an agency, or running it as an independent, Axome is the tool I built to do it at scale. It scores anchor text against the Concept Specificity Test, surfaces missing source-to-target opportunities across a site, and produces implementation-ready recommendations. Private beta now. Request access here.

What we still don't know

The published research on how retrieval-augmented systems use internal links is genuinely thin, and several open questions would meaningfully sharpen the playbook.

Konitzny's protocol-level analysis is the strongest direct evidence we have, and it covers one mode of one system. Three questions remain genuinely open:

1. Relative weighting of link signals at retrieval time. When a retrieval system decides which internal link to follow, how heavily is anchor text weighted against URL slug, surrounding sentence, and heading context? We know all four are signals. We don't know the relative weights, or whether they shift by query type, system, or interaction mode.

2. Protocol-level behavior across systems. Konitzny documented ChatGPT Deep Research. The mechanics in Claude, Perplexity, and Google AI Overviews are inferred from documentation and general RAG principles, but not directly observed at the protocol level. Direct observation would either confirm the methodology generalizes or surface meaningful system-specific differences.

3. Whether the read-budget findings extend beyond Deep Research mode. Standard ChatGPT search, voice interactions, and conversational retrieval may use different read mechanics. The Konitzny volumes (700 / 5,000 / 12,000 characters for the three reading modes) apply specifically to Deep Research. Whether the navigation-overhead finding applies uniformly across other ChatGPT modes is not publicly documented.

If you have data on any of these questions, or have published research that addresses them, I want to hear about it. Get in touch through the Axome contact form or send me a LinkedIn message. Internal linking matters more in AI search than the public evidence base reflects, and the only way that gets better is more researchers publishing what they're finding.

Frequently asked questions

Does ChatGPT actually follow internal links when researching a topic?

Yes, in Deep Research mode. WebSocket analysis of ChatGPT Deep Research shows that when OAI-SearchBot opens a page, every link on that page becomes a structured reference passed to the model. Subsequent navigation can happen via those internal links instead of new searches. In one analyzed session, roughly a third of the model's page visits came from following internal links rather than fresh search queries. Whether other ChatGPT modes (standard search, voice, conversational) follow internal links the same way has not been directly documented at the protocol level.

Does any of this apply to Claude, Perplexity, or other AI search systems?

Directionally, yes. The strongest direct evidence in this post comes from ChatGPT Deep Research, because that is the system Konitzny analyzed at the protocol level. Claude, Perplexity, and Google AI Overviews use related retrieval-augmented architectures and the underlying principles (link-following, anchor text as a selection signal, content structure preferences) are common to that class of systems. The specific behaviors and read budgets vary by system. Where evidence in this post is system-specific, it is named as such.

How much of a page does ChatGPT actually read?

It depends on the page's value to the active query. Konitzny identified three reading modes in ChatGPT Deep Research: a ~700-character Bing snippet (page never opened), a ~5,000-character single-pass read, and a ~12,000-character deep re-read for pages judged valuable. The model can also use keyword search (browser.find) to jump to specific sections. Pages with heavy navigation menus consume read budget before reaching real content. These volumes are specific to Deep Research.

Should I change my anchor text strategy for AI search?

Not fundamentally. The principles of good anchor text were already aligned with what retrieval-augmented search rewards: descriptive, entity-first, specific anchor phrases. What changes is the role anchor text plays. Retrieval-augmented systems use anchor text as a selection signal in real time, before fetching the destination. Generic anchors that were merely wasteful for traditional SEO now give the model less signal at the surface most under your control when it decides which links to follow.

How is GEO different from SEO for internal linking?

Generative engine optimization (GEO) and SEO use largely the same methodology for internal linking: descriptive anchors and URLs, entity-first language, topical clusters, internal links from contextually relevant pages. The difference is in how the signals are used. Traditional SEO uses anchor text and URL as inputs to ranking algorithms. Retrieval-augmented systems use them as inputs to real-time link selection during a research session. The work to do is largely the same; the mechanism rewarding it is different.

Does ranking in Google's top 10 still matter for AI Overview citation?

Less than it used to. Pre-Gemini 3 analyses showed approximately 76% of AI Overview citations came from Google's top 10. After Gemini 3 became the default AIO model in January 2026, that figure dropped to between 17% and 38% depending on methodology. Top-10 ranking and AIO citation now overlap far less, and content structure and semantic clarity carry more of the weight in determining what gets cited.

The fundamentals of internal linking did not change. The role they play did. Anchor text and URL slugs are now read both by human readers and by the retrieval-augmented systems deciding what to cite. The qualities that have always made for good internal linking (descriptive, specific, entity-first) are the same ones that determine whether your content gets discovered in AI-generated answers. The audit has not changed. The cost of doing it badly has.

References