Clarivy · Bilingual AI Search Visibility

Notebook 3 of 5 · 5-min read

Notebook 3 — Content

Top cited sources across 9 engines for the GEO/AI-visibility category · This notebook answers: "Which publishers and aggregators do the LLMs trust in our space, and where should we pitch for inclusion?"

Data Provenance — Each cited source was extracted from the raw JSON response of an (engine, prompt) pair. The complete cited-source tree is at github.com/agentgeek-geo/audit-logs/2026-06-11/clarivy-self-audit-01/{engine}/NNNN.json (see the citations field of each JSON). License: CC-BY-4.0.

Top cited domains — full ranking

This is the full list of every domain the 9 LLMs cited as a structured URL in their answers to our 30 prompts. On Day 1 the list is short, but it tells us which publishers to pitch first, because they are the ones the LLMs are already reading.

| Rank | Cited domain | Type | Times cited | Engines that cited it | Best prompt category to pitch |
| --- | --- | --- | --- | --- | --- |
| 1 | tryprofound.com | Western GEO vendor (SaaS) | 1 | Perplexity (Competitive B) | "alternatives to Profound for跨境品牌" |
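
A ranking like the one above can be reproduced by counting citation domains across the audit-log records. The following is a minimal sketch, not the audit's actual tooling. It assumes each record is a dict carrying a `response_citation_url` key (the published JSON may instead nest URLs under a `citations` field, in which case the accessor would need adjusting), and `cited_domain` and `tally_domains` are hypothetical helper names.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlparse


def cited_domain(record: dict) -> Optional[str]:
    """Return the bare domain of a record's structured citation, if any."""
    url = record.get("response_citation_url")  # assumed field name
    if not url:
        return None
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host or None


def tally_domains(records: list) -> Counter:
    """Count structured citations per domain, skipping uncited records."""
    domains = (cited_domain(r) for r in records)
    return Counter(d for d in domains if d)


# One record with the tryprofound.com citation, plus a hypothetical
# record for an (engine, prompt) pair that returned no citation.
records = [
    {"response_citation_url": "https://www.tryprofound.com/"},
    {"response_citation_url": None},
]
print(tally_domains(records).most_common())
```

Normalising to the bare host keeps `www.tryprofound.com` and `tryprofound.com` from being counted as separate publishers.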

The 1-citation finding and what it means

On Day 1, only one (engine, prompt) pair surfaced a structured citation: Perplexity Sonar cited tryprofound.com on the prompt "alternatives to Profound for跨境品牌" (跨境品牌 means "cross-border brands"). Here is the literal raw excerpt from audit-logs/2026-06-11/clarivy-self-audit-01/perplexity/0014.json:

"response_citation_url": "https://www.tryprofound.com/"
"raw_response_excerpt": "[prompt masked; no brand mention; top citation Profound]"

A single citation is still an informative result. It tells us three things:

  1. Perplexity is the only engine that cites external sources on this prompt type (the other 8 either don't cite, or cite inside the response text but not as a structured URL). This is consistent with Perplexity's product positioning as "answer engine + citations".
  2. The "alternatives to Profound for跨境品牌" prompt is the highest-leverage prompt we have — it's a buyer's-mindset prompt where a named vendor is already being cited, and the citation is a Western vendor where a bilingual alternative is the obvious gap. If we get cited on this prompt, we capture the buyer's mindshare at the moment of vendor comparison.
  3. The total citation-source pool for this category is much smaller than I expected: across 270 datapoints (9 engines × 30 prompts), we saw exactly 1 structured citation. The "publishers to pitch" list is therefore much shorter than the v3 plan implied. In practice, the LLMs are generating most of their answers from parametric memory (their training data), not from live web retrieval. The Princeton GEO paper, the llms.txt proposal, and Baidu Zhanzhang are the only "sources" that appear consistently in the methodology category, and those are mentioned in the answer text, not as structured citations.
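
The arithmetic behind the third point can be checked in a few lines. This is only a sketch using the counts stated in this notebook (9 engines, 30 prompts, 1 structured citation); the constant names are illustrative.

```python
ENGINES = 9               # engines in the audit
PROMPTS = 30              # prompts sent to each engine
STRUCTURED_CITATIONS = 1  # (engine, prompt) pairs that returned a citation URL

datapoints = ENGINES * PROMPTS
citation_rate = STRUCTURED_CITATIONS / datapoints

print(f"{datapoints} datapoints, {citation_rate:.2%} with a structured citation")
# prints "270 datapoints, 0.37% with a structured citation"
```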

What the LLMs cited in answer text (not as structured URLs)

Beyond the 1 structured citation, the LLMs referenced several concepts, papers, and brands in the body of their answers. These are the entities the LLMs are "thinking with" when they answer GEO questions:

| Entity | Type | Engines that mentioned it | Category |
| --- | --- | --- | --- |
| Princeton GEO paper (Aggarwal et al.) | Academic paper | ChatGPT, Claude, Perplexity, DeepSeek, Kimi, ERNIE | Methodology (C) |
| llms.txt proposal | Technical proposal | ChatGPT, Claude, Perplexity, Kimi, ERNIE | Methodology (C) |
| Baidu Zhanzhang (百度站长平台) | Search-engine platform | ERNIE, Kimi, Doubao | Methodology (C) |
| ByteDance Juliang (巨量引擎) | Ad / SEO platform | Doubao, Kimi | Methodology (C) |
| Profound | Western GEO vendor | ChatGPT, Perplexity, Gemini | Competitive (B), Purchase (E) |
| Otterly.AI | Western GEO vendor | ChatGPT, Perplexity, Gemini | Competitive (B), Purchase (E) |
| Peec.AI | Western GEO vendor | ChatGPT, Perplexity | Competitive (B) |
| LLMrefs | Western GEO vendor | ChatGPT | Competitive (B) |
| 蝉妈妈 AI | CN GEO vendor (TikTok analytics) | Doubao, Kimi, ERNIE | Competitive (B), Purchase (E) |
| 悠伞 | CN GEO vendor | Doubao, Kimi | Buying intent (A), Competitive (B) |
| 新榜 (NewRank) | CN content analytics | Doubao, Kimi | Competitive (B) |
| 36kr | CN tech media | MetaSo (search results) | Methodology (C) |
| 知乎 (Zhihu) | CN Q&A platform | MetaSo (search results), Doubao | Methodology (C) |
| Search Engine Land | Western SEO/GEO media | Perplexity, ChatGPT | Methodology (C) |
| ahrefs blog | Western SEO tool blog | Perplexity | Methodology (C) |
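
One way to read the table is to rank entities by how many engines mention them, and to count how many of these entities each engine mentions. The sketch below transcribes the "Engines that mentioned it" column by hand. The `mentions` dictionary and variable names are illustrative and not part of the audit tooling.

```python
from collections import Counter

# Engines that mentioned each entity, transcribed from the table above.
mentions = {
    "Princeton GEO paper": ["ChatGPT", "Claude", "Perplexity", "DeepSeek", "Kimi", "ERNIE"],
    "llms.txt proposal": ["ChatGPT", "Claude", "Perplexity", "Kimi", "ERNIE"],
    "Baidu Zhanzhang": ["ERNIE", "Kimi", "Doubao"],
    "ByteDance Juliang": ["Doubao", "Kimi"],
    "Profound": ["ChatGPT", "Perplexity", "Gemini"],
    "Otterly.AI": ["ChatGPT", "Perplexity", "Gemini"],
    "Peec.AI": ["ChatGPT", "Perplexity"],
    "LLMrefs": ["ChatGPT"],
    "蝉妈妈 AI": ["Doubao", "Kimi", "ERNIE"],
    "悠伞": ["Doubao", "Kimi"],
    "新榜 (NewRank)": ["Doubao", "Kimi"],
    "36kr": ["MetaSo"],
    "知乎 (Zhihu)": ["MetaSo", "Doubao"],
    "Search Engine Land": ["Perplexity", "ChatGPT"],
    "ahrefs blog": ["Perplexity"],
}

# Rank entities by distinct engines mentioning them; ties keep table order.
ranking = sorted(
    ((entity, len(set(engines))) for entity, engines in mentions.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

# Count how many of the listed entities each engine mentions.
per_engine = Counter(e for engines in mentions.values() for e in engines)

for entity, n_engines in ranking:
    print(f"{n_engines} engines: {entity}")
print(per_engine.most_common())
```

Engine coverage is only a rough proxy for leverage, since it ignores how prominently each answer features the entity.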

Pitching priority (which publishers to get cited on first)

Based on the above, here is the Day-1 pitching list. Each item names a publisher or source, the engines that rely on it, and the prompt category where a citation would have the highest leverage.

  1. Perplexity's citation list (live web): Perplexity is the only engine that produced a structured external citation in this audit. We should pitch Profound/Otterly/Peec "alternative" articles to the publications that Perplexity already retrieves from. We don't yet know which publications those are; the next self-audit will use a follow-up prompt set designed to surface the underlying source list.
  2. 36kr + 知乎 (Zhihu), for MetaSo and the Chinese engines: These are the two CN publishers that consistently appear in 秘塔 MetaSo's search results for GEO methodology. A long-form post in 36kr's enterprise column, or a structured Zhihu answer with named authorship, would be the highest-leverage CN-side publishing action.
  3. Search Engine Land + ahrefs, for Western engines: These are the two Western publishers that Perplexity mentions in answer text for GEO methodology (Search Engine Land also appears in ChatGPT's answers). A guest post or a contributed data point (e.g. "the 0/270 baseline") would be a high-leverage Western publishing action.
  4. Princeton GEO paper citation graph: six of the nine engines mention the Aggarwal et al. paper in their methodology answers. We can't get into the paper itself, but publishing 30+ unique datapoints in the authors' domain may earn a citation in their next publication.

What this notebook does NOT measure
