AI Overviews Sources: Where Google Actually Pulls Data

Google AI Overviews pull from Wikipedia, YouTube, Reddit, and deep internal pages. See the data on which sources get cited and how to earn a spot.

LoudScale
Growth Team
15 min read

AI Overviews Sources: Where Google Actually Pulls Its Data (And Why Most of What You’ve Heard Is Incomplete)

TL;DR

Google Isn’t Just Citing the “Best” Sources. It’s Building a Walled Garden.

I spent three weeks tracking which domains show up in AI Overviews across 40+ queries in the marketing and SaaS space. What I expected: Wikipedia and a smattering of authoritative blogs. What I found: Google linking back to itself at a rate that should make every independent publisher uncomfortable.
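If you want to run the same kind of tally in your own vertical, the counting step is simple once you've captured the citations. Here's a minimal sketch in Python, assuming a hypothetical CSV export with a "query" column and a "cited_url" column; the file name and columns are placeholders, not from any of the tools cited below.

```python
# Sketch: count which domains are cited across a set of AI Overview captures.
# Assumes a hypothetical CSV with columns "query" and "cited_url".
import csv
from collections import Counter
from urllib.parse import urlparse

def domain_of(url: str) -> str:
    """Normalize a URL down to its host, with a leading www. stripped."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def tally_citations(csv_path: str) -> Counter:
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[domain_of(row["cited_url"])] += 1
    return counts

if __name__ == "__main__":
    counts = tally_citations("ai_overview_citations.csv")  # hypothetical export
    total = sum(counts.values())
    for domain, n in counts.most_common(10):
        print(f"{domain:30s} {n:5d}  ({n / total:.1%} of citations)")
```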

An SE Ranking study of 141,507 AI Overviews found that 43% of AI Overview responses contain links redirecting users to Google’s own search results or properties. Each AI answer typically includes 4 to 6 of these self-referencing links. YouTube (owned by Google), Google Support pages, Google Blog, Play Store listings: they all count. When you combine every Google-controlled domain, the company accounts for roughly 23% of all AI Overview citations.

Think about that for a second. The entity deciding which sources to cite in its AI-generated answers is also the single largest beneficiary of those citations. It’s like a restaurant critic who reviews their own restaurants most favorably. The critic might still be right about the food. But you’d want to know about the conflict.

This doesn’t mean your content can’t get cited. It just means you’re competing for the remaining 77% of citation slots, and you need to understand exactly how Google picks those winners.

The Actual Citation Hierarchy (With Numbers, Not Vibes)

Everyone says “Wikipedia, YouTube, Reddit” and moves on. That’s surface-level. Let’s look at what the data actually shows across multiple studies, because different methodologies produce different rankings, and the differences tell you something useful.

| Source | Surfer SEO AI Tracker (Mar–Aug 2025) | Ahrefs AI Mode Study (2025) | SE Ranking US Study (May 2025) | Pew Research (Mar 2025) |
| --- | --- | --- | --- | --- |
| YouTube | ~23.3% of citations | 9.51% (961K mentions) | ~13% | Top 3 source |
| Wikipedia | ~18.4% | 11.22% (1.13M mentions) | ~13% | Top 3 source |
| Google properties | ~16.4% | 5.95% (blog.google) + 5.62% (google.com) | 44% (incl. self-links) | N/A |
| Reddit | Varies by vertical | 5.82% (588K mentions) | ~13% | Top 3 source |
| Amazon | N/A | 4.26% (431K mentions) | N/A | N/A |
| Quora | Varies by vertical | 3.56% (360K mentions) | ~13% | N/A |

Sources: Surfer SEO AI Citation Report, Ahrefs AI Mode study, SE Ranking via Search Engine Journal, Pew Research Center

Why do the percentages differ so much between studies? Methodology. Surfer’s AI Tracker analyzed 36 million AI Overviews and 46 million citations. Ahrefs looked specifically at AI Mode (a different product from standard AI Overviews). SE Ranking’s study covered 100,013 keywords across five US states and included Google’s self-referencing links in the count.

Here’s what matters more than any single pie chart: the citation mix shifts dramatically by industry. In gaming, YouTube shows up in 93% of AI Overview responses and Reddit in 78%. In health, NIH accounts for roughly 39% of non-Google citations. In finance, Investopedia punches way above its overall domain authority. Your vertical determines your competition, not some universal leaderboard.

Why “Just Rank Higher” Is Bad Advice for AI Overviews

Here’s the part most articles get wrong, or at least oversimplify. They say “rank on page one and you’ll get cited in AI Overviews.” That’s half-true, and the half that’s missing could waste months of your time.

Ahrefs studied 1.9 million AI Overview citations and found that 76.1% of cited pages rank in the top 10 organically. Sounds like a slam dunk for traditional SEO, right? But here’s the catch: Si Quan Ong at Ahrefs ran a follow-up correlation study and found the relationship between ranking position and citation probability is “positive yet moderate.”

“It’s true that if you rank #1 in the SERPs, you’re more likely to be cited in an AI Overview than if you were ranked lower. But that chance is a coin flip at best.”

— Si Quan Ong, Ahrefs (Source)

A coin flip. You could do everything right in traditional SEO, earn position one, and still have roughly a 50/50 shot at showing up in the AI Overview. Meanwhile, BrightEdge’s 16-month longitudinal study tracked AI Overview citation overlap with organic rankings and found it grew from 32.3% to 54.5% between May 2024 and September 2025. That convergence is real. But 54.5% overlap still means 45.5% of citations come from pages that don’t rank particularly well organically.

So what else is Google looking at?

The “Citation Gravity” Framework: Four Forces That Pull Content Into AI Overviews

After digging through every major study published on AI Overview citations in the past year, I started seeing a pattern that none of the individual reports spell out. There are four forces that determine whether your page gets cited. I call them Citation Gravity, because they work a lot like gravitational pull: the more forces working in your favor, the harder it is for Google’s AI to ignore your content.

Force 1: Organic Authority (the floor, not the ceiling). You probably need to rank somewhere in the top 100 for related queries. Originality.ai found that 48% of AI Overview citations come from pages in the top 100 organic results, while 52% come from outside that range. BrightEdge’s data is more bullish on organic overlap at 54.5%. Either way, organic visibility is the entry ticket, not the winning ticket.

Force 2: Depth and Specificity (the deep-page advantage). This is the one most marketers miss. BrightEdge found that 82.5% of AI Overview citations link to deep content pages that are two or more clicks away from the homepage. Only 0.5% of citations pointed to homepages. Google’s AI doesn’t want your “About Us” page. It wants your detailed breakdown of a specific subtopic buried three folders deep in your site architecture.
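Click depth is easy to audit on your own site: crawl your internal links, then run a breadth-first search from the homepage. Here's a minimal sketch, assuming you already have an internal-link graph from a crawler; the example graph and URLs below are illustrative, not drawn from any of the studies.

```python
# Sketch: compute click depth (minimum clicks from the homepage) for each page,
# given an internal-link graph. The example graph below is illustrative only.
from collections import deque

def click_depths(link_graph: dict[str, list[str]], homepage: str) -> dict[str, int]:
    """Breadth-first search from the homepage; depth = fewest clicks to reach a page."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Illustrative site structure (not real URLs):
graph = {
    "/": ["/blog/", "/about/"],
    "/blog/": ["/blog/crm-guide/"],
    "/blog/crm-guide/": ["/blog/crm-guide/data-migration-steps/"],
}
print(click_depths(graph, "/"))
# {'/': 0, '/blog/': 1, '/about/': 1, '/blog/crm-guide/': 2, '/blog/crm-guide/data-migration-steps/': 3}
```

Anything at depth 2 or more is in the zone where BrightEdge found 82.5% of citations landing; a homepage-heavy internal-link structure is the pattern to avoid.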

Force 3: Content Type Match (format follows query). AI Overviews don’t just pick the “best” page. They pick the best page for the type of question being asked. For health queries, that means peer-reviewed or clinician-vetted content (NIH, Mayo Clinic, Cleveland Clinic dominate). For how-to queries, YouTube crushes everything because video demonstrations answer “how” better than text. For opinion and experience queries, Reddit threads and Quora answers win because they offer first-person perspectives. When the Guardian reported on SE Ranking’s healthcare citation study in January 2026, the big finding wasn’t just that YouTube was the most-cited domain for health queries (4.43% of 465,823 citations). It was that Google’s AI prioritized format accessibility over institutional authority in many cases.

Force 4: Fan-Out Query Alignment (the hidden layer). This is where it gets interesting. AI Overviews don’t just run your original search query against its index. The system generates what are called fan-out queries, which are longer, more specific sub-queries about facets of your original question. Think of it like a research assistant who, when you ask “best running shoes,” separately searches “Nike Pegasus 42 durability review,” “best cushioned shoes for heavy runners 2025,” and “trail vs road running shoe differences.”
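Here's a toy sketch of that facet-expansion idea, purely to make the concept concrete. Google hasn't published how its fan-out system generates sub-queries, so the templates below are invented for illustration.

```python
# Toy illustration of fan-out query expansion. The facet templates are made up
# for demonstration; Google's actual fan-out system is not public.
FACET_TEMPLATES = [
    "best {q} for heavy runners 2025",
    "{q} durability review",
    "trail vs road {q} differences",
    "most cushioned {q} under $150",
]

def fan_out(head_query: str) -> list[str]:
    """Expand one head query into more specific facet sub-queries."""
    return [template.format(q=head_query) for template in FACET_TEMPLATES]

print(fan_out("running shoes"))
# ['best running shoes for heavy runners 2025', 'running shoes durability review', ...]
```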

Ahrefs tested whether pages cited from outside the top 10 were getting pulled in through these fan-out queries. The results were surprising: those lower-ranking cited pages actually appeared for fewer keywords and shorter queries, not longer ones. So the fan-out theory doesn’t fully explain why some deep-ranking content gets cited. There are clearly other signals at play, possibly freshness, entity matching, or structural clarity, that nobody has fully mapped yet.

Pro Tip: Don’t just optimize for your primary keyword. Create dedicated deep pages that answer the specific subtopics a fan-out query system might generate. If your main article covers “best CRM software,” build separate, detailed pages for “CRM for solo consultants,” “CRM data migration steps,” and “CRM pricing comparison 2026.” Each of those could earn its own AI Overview citation independently.

The Convergence Timeline: How Citation Patterns Have Shifted Over 16 Months

One thing I haven’t seen anyone else visualize clearly is how fast this system is changing. BrightEdge’s longitudinal data gives us the clearest picture of AI Overview citations converging with organic rankings over time, and the industry-level differences are wild.

| Industry | Overlap (May 2024) | Overlap (Sep 2025) | Change |
| --- | --- | --- | --- |
| Healthcare | 63.3% | 75.3% | +12.0 pp |
| Education | 19.4% | 72.6% | +53.2 pp |
| B2B Tech | 38.6% | 71.0% | +32.4 pp |
| Insurance | 20.9% | 68.6% | +47.7 pp |
| Finance | 21.5% | 32.2% | +10.7 pp |
| E-commerce | 22.3% | 22.9% | +0.6 pp |
| Restaurants | 0.0% | 19.2% | +19.2 pp |

Source: BrightEdge 16-Month AI Search Report

Look at that E-commerce line. Basically flat over 16 months, with only 0.6 percentage points of change. Google appears to intentionally keep transactional queries separate from AI Overviews. If you sell products, AI Overviews are a different game than if you publish educational content (where convergence has been explosive).

And Healthcare at 75.3% overlap? That means three out of four health-related AI Overview citations come from pages that also rank well organically. Google’s YMYL (Your Money or Your Life) instincts are strong here. When trust matters most, Google leans hardest on content that’s already proven itself in traditional search. That’s a signal for any brand in a high-trust vertical: your organic SEO work has double value.

The AI-Citing-AI Problem Nobody Wants to Talk About

Here’s the part that should make everyone a little nervous. Originality.ai analyzed 29,000 YMYL queries and found that 10.4% of AI Overview citations link to AI-generated content. One in ten. And the problem is worse outside the top 100 organic results, where 12.8% of cited pages were AI-generated compared to 7.7% within the top 100.

Why does this matter? Because it creates a feedback loop. Google’s AI cites a page. That citation boosts the page’s visibility and credibility. The page (which was written by AI in the first place) then gets crawled into future training data. The AI learns from its own output. Over time, the information ecosystem gets a little less diverse, a little less human, a little more recursive. Originality.ai calls this the risk of “model collapse.”

Is this an emergency? Not yet. 74.4% of cited documents in the study were confirmed human-written. But for YMYL topics (health, finance, law, politics), even a 10% contamination rate deserves attention. If you’re a publisher producing genuinely expert, human-authored content, this might actually be your competitive moat. Google will eventually need to solve this problem, and when it does, authentic expertise will likely get more citation weight, not less.

What Actually Gets Cited: Content Characteristics That Win

Forget domain authority for a minute. What do the individual pages that earn citations look like? Across the studies I reviewed, a few consistent patterns emerged.

Structured, direct answers. The Pew Research Center found that the typical AI Overview is 67 words long and cites three or more sources 88% of the time. That means Google’s AI is extracting tight, quotable chunks from your content. If your page buries the answer in paragraph six after a long-winded intro, it’s harder for the system to extract and cite you. Put the answer first, then elaborate.
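If you want a rough, automatable editorial check for this, one option is to test whether the first paragraph of a section already forms a self-contained answer of roughly AI Overview length. The 70-word threshold in this sketch is an assumption loosely anchored to Pew's 67-word figure, not a documented Google cutoff.

```python
# Rough editorial heuristic: does the page lead with a tight, extractable answer?
# The 70-word threshold is an assumption based on Pew's "typical AI Overview is
# 67 words" finding, not a documented cutoff used by Google.
def leads_with_answer(section_text: str, max_words: int = 70) -> bool:
    """Return True if the first paragraph is a concise, self-contained chunk."""
    first_paragraph = section_text.strip().split("\n\n")[0]
    return 0 < len(first_paragraph.split()) <= max_words

good = "AI Overviews cite deep pages 82.5% of the time.\n\nHere's why that matters..."
bad = " ".join(["filler"] * 120) + "\n\nThe actual answer, finally."
print(leads_with_answer(good))  # True
print(leads_with_answer(bad))   # False
```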

Freshness matters (more than you think). BrightEdge’s data showed the fastest citation-overlap growth happened between September 2024 and January 2025, at roughly 3 percentage points per month. During this period, Google appeared to be recalibrating how it weighted newer content. Pages that were regularly updated had a better chance of matching the shifting algorithm.

Specificity over breadth. In BrightEdge’s analysis, 86% of AI Overview citations appeared for only one keyword. Your mega-post covering “everything you need to know about X” is less likely to get cited than a focused page answering one narrow question exceptionally well.

Watch Out: If your content strategy relies on long, comprehensive guides that target 20 keywords each, you might be optimizing for old Google. AI Overviews prefer pulling from pages that go deep on a single topic. Think sniper rifle, not shotgun.

The Platforms That Changed the Game: Reddit’s Unlikely Rise

If someone told me in 2023 that Reddit would become the single most cited non-Google source in AI Overviews, I’d have laughed. Reddit? The site where someone once convinced thousands of people that a safe contained treasure (it didn’t)?

But here we are. Between March and June 2025, Reddit’s AI Overview citation rate surged from 1.30% to 7.15%, a 450% increase in three months. In standard AI Overviews (not AI Mode), Reddit commands roughly 21% of citations. Perplexity AI relies on Reddit for 46.5% of its citations.

Why? Two reasons. First, Reddit’s content is inherently experiential. Real people sharing real opinions about real products and problems. Google’s AI values E-E-A-T (the acronym for Experience, Expertise, Authoritativeness, and Trustworthiness), and Reddit threads deliver the “Experience” part in a way that polished marketing content often can’t.

Second, there’s the business relationship. Reddit and Google announced a data licensing partnership in February 2024, giving Google enhanced access to Reddit content for AI training. That partnership likely greased the rails for Reddit’s citation dominance.

For marketers, this creates an uncomfortable reality. Your carefully crafted 3,000-word blog post might lose the citation to a three-sentence Reddit comment from someone with a throwaway account. The upside? If your brand or team is actively participating in genuine Reddit discussions (not spamming, actually helping), those contributions can become citation-worthy.

How Different AI Platforms Pick Different Sources

Here’s something that gets lost in the “optimize for AI” conversation: each platform has its own bias. What gets cited by Google’s AI Overviews, ChatGPT, and Perplexity are three genuinely different lists.

Semrush tracked over 100 million citations across three platforms over 13 weeks (July through October 2025) and found stark differences. Before September 2025, ChatGPT cited Reddit in close to 60% of responses and Wikipedia in roughly 55%. Then both dropped dramatically in mid-September, with Reddit falling to around 10% and Wikipedia dropping below 20% on ChatGPT. The cause is debated (some point to Google removing its num=100 search parameter), but the effect was clear: AI citation patterns can change overnight.

“I believe the main reason for the drop is an attempt to avoid over-citing on certain websites, to be less biased toward them, while generating answers.”

— Sergei Rogulin, Head of Organic and AI Visibility at Semrush (Source)

Google AI Mode, by contrast, showed far more stability. LinkedIn stayed near 15% of citations. YouTube and Reddit both held steady. The platform that Google controls showed the least volatility. Make of that what you will.

The practical takeaway? If you’re only optimizing for one AI platform’s citation patterns, you’re playing a fragile game. Source preferences shift. Diversify.

Frequently Asked Questions About AI Overview Sources

Where does Google AI Overview get its information?

Google AI Overviews pull from a mix of sources including organic search results, Google’s own properties (YouTube, Google Support, Google Blog), user-generated content platforms (Reddit, Quora), and authoritative domain-specific sites (NIH for health, Investopedia for finance). An SE Ranking study found that 43% of AI Overview responses include links back to Google’s own properties. The remaining citations draw from a long tail of niche-relevant websites, with 82.5% of citations pointing to deep internal pages rather than homepages.

Do you need to rank on page one to get cited in AI Overviews?

Ranking on page one helps significantly, but it’s not a guarantee. Ahrefs found that 76.1% of AI Overview citations come from top-10 ranking pages, but the correlation between ranking position and citation probability is moderate. Meanwhile, Originality.ai found that 52% of AI Overview citations come from pages outside the top 100 organic results. Organic ranking is the floor for citation eligibility, not the ceiling.

What types of content get cited most in AI Overviews?

Content that provides direct, structured answers to specific questions earns citations most consistently. Pew Research found that 88% of AI Overviews cite three or more sources and the typical summary is 67 words long. Google’s AI extracts concise passages, so pages that front-load clear answers before expanding into detail perform better than pages that bury answers beneath lengthy introductions. Video content (especially YouTube) dominates in how-to verticals, while peer-reviewed sources dominate health and finance queries.

How often do AI Overviews appear in Google search results?

Frequency varies by study and time period. Pew Research found AI summaries appeared in about 18% of Google searches in March 2025. Longer search queries (10+ words) triggered AI summaries 53% of the time, and question-based searches starting with who, what, when, or why triggered AI summaries 60% of the time. SE Ranking’s data puts the overall trigger rate closer to 30% for their keyword set.

Can AI-generated content get cited in AI Overviews?

Yes, and this is a growing concern. Originality.ai found that 10.4% of AI Overview citations link to AI-generated content. The rate is higher for citations from outside the top 100 organic results (12.8%) compared to within the top 100 (7.7%). This raises questions about a recursive feedback loop where AI trains on AI-generated material, potentially degrading information quality over time, especially for sensitive YMYL topics like health and finance.

What This Means for Your Strategy (Without the Hype)

The data tells a clear story, even if it’s not a simple one. Google’s AI Overviews pull from a broader set of sources than most marketers realize, but those sources aren’t random. They follow patterns: organic authority, deep-page specificity, content-format matching, and fan-out query alignment.

The brands that will win citations over the next 12 months aren’t the ones with the biggest domain authority scores. They’re the ones creating focused, expert-driven pages that answer specific questions clearly, maintaining active presence on platforms AI already trusts (YouTube, Reddit, LinkedIn), and updating content frequently enough to stay fresh in a system that recalibrates monthly.

If you don’t have the bandwidth to track citation patterns, run AI visibility audits, and restructure your content for this new reality, that’s exactly the kind of work LoudScale does for growth-stage brands navigating AI search.

The playing field has shifted. But shifted doesn’t mean unfair. It means the rules changed, and now you know what they are.

Written by

LoudScale Team

Expert contributor sharing insights on SEO.


Ready to Accelerate Your Growth?

Book a free strategy call and learn how we can help.

Book a Free Call