Notebook 3 — Content
citations field of each JSON). License: CC-BY-4.0.
Top cited domains — full ranking
This is the authoritative list of every domain the 9 LLMs cited in their answers to our 30 prompts. On Day 1 the list is short, but it tells us which publishers to pitch first, because they are the ones the LLMs are already reading.
| Rank | Cited domain | Type | Times cited | Engines that cited it | Best prompt category to pitch |
|---|---|---|---|---|---|
| 1 | tryprofound.com | Western GEO vendor (SaaS) | 1 | Perplexity (Competitive B) | "alternatives to Profound for跨境品牌" |
The 1-citation finding and what it means
On Day 1, only one (engine, prompt) pair surfaced a structured citation: Perplexity Sonar cited tryprofound.com on the prompt "alternatives to Profound for跨境品牌" (跨境品牌 means "cross-border brands"). This is the only structured citation in the audit. Here is the literal raw excerpt from audit-logs/2026-06-11/clarivy-self-audit-01/perplexity/0014.json:
"response_citation_url": "https://www.tryprofound.com/"
"raw_response_excerpt": "[prompt masked; no brand mention; top citation Profound]"
A single citation is still a useful finding. It tells us three things:
- Perplexity is the only engine that cites external sources on this prompt type (the other 8 either don't cite, or cite inside the response text but not as a structured URL). This is consistent with Perplexity's product positioning as "answer engine + citations".
- The "alternatives to Profound for跨境品牌" prompt is the highest-leverage prompt we have — it's a buyer's-mindset prompt where a named vendor is already being cited, and the citation is a Western vendor where a bilingual alternative is the obvious gap. If we get cited on this prompt, we capture the buyer's mindshare at the moment of vendor comparison.
- The total citation-source pool for this category is much smaller than we expected: across 270 datapoints (9 engines × 30 prompts), we saw exactly 1 structured citation. The "publishers to pitch" list is therefore much shorter than the v3 plan implied. The LLMs appear to generate most of their answers from parametric memory (their training data), not from live web retrieval. The Princeton GEO paper, the llms.txt proposal, and Baidu Zhanzhang are the only "sources" that appear consistently in the methodology category, and those are mentioned in the answer text, not as structured citations.
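The 1-in-270 tally above can be reproduced with a short script. This is a minimal sketch, not the audit's actual tooling: it assumes each audit JSON has already been loaded into a dict, that the structured citation lives in the `response_citation_url` field shown in the excerpt, and that a missing citation is represented as `None` (that last point is an assumption; the real logs may omit the field or use an empty string, both of which the code also tolerates). The 269 empty records are placeholders standing in for the rest of the audit.

```python
from collections import Counter
from urllib.parse import urlparse


def citation_domains(records):
    """Count structured citations per domain across audit records."""
    counts = Counter()
    for rec in records:
        url = rec.get("response_citation_url")
        if not url:
            continue  # no structured citation for this (engine, prompt) pair
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # fold www.tryprofound.com into tryprofound.com
        counts[host] += 1
    return counts


# Placeholder input: 269 records without a citation (assumed shape),
# plus the one citation quoted from perplexity/0014.json.
records = [{"response_citation_url": None} for _ in range(269)]
records.append({"response_citation_url": "https://www.tryprofound.com/"})

counts = citation_domains(records)
total = sum(counts.values())
print(counts.most_common())
print(f"{total}/{len(records)} datapoints carry a structured citation")
```

Normalising the host before counting keeps `www.` and bare-domain variants of the same publisher in one row of the ranking table.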
What the LLMs cited in answer text (not as structured URLs)
Beyond the 1 structured citation, the LLMs referenced several concepts, papers, and brands in the body of their answers. These are the entities the LLMs are "thinking with" when they answer GEO questions:
| Entity | Type | Engines that mentioned it | Category |
|---|---|---|---|
| Princeton GEO paper (Aggarwal et al.) | Academic paper | ChatGPT, Claude, Perplexity, DeepSeek, Kimi, ERNIE | Methodology (C) |
| llms.txt proposal | Technical proposal | ChatGPT, Claude, Perplexity, Kimi, ERNIE | Methodology (C) |
| Baidu Zhanzhang (百度站长平台) | Search-engine platform | ERNIE, Kimi, Doubao | Methodology (C) |
| ByteDance Juliang (巨量引擎) | Ad / SEO platform | Doubao, Kimi | Methodology (C) |
| Profound | Western GEO vendor | ChatGPT, Perplexity, Gemini | Competitive (B), Purchase (E) |
| Otterly.AI | Western GEO vendor | ChatGPT, Perplexity, Gemini | Competitive (B), Purchase (E) |
| Peec.AI | Western GEO vendor | ChatGPT, Perplexity | Competitive (B) |
| LLMrefs | Western GEO vendor | ChatGPT | Competitive (B) |
| 蝉妈妈 AI | CN GEO vendor (TikTok analytics) | Doubao, Kimi, ERNIE | Competitive (B), Purchase (E) |
| 悠伞 | CN GEO vendor | Doubao, Kimi | Buying intent (A), Competitive (B) |
| 新榜 (NewRank) | CN content analytics | Doubao, Kimi | Competitive (B) |
| 36kr | CN tech media | MetaSo (search results) | Methodology (C) |
| 知乎 (Zhihu) | CN Q&A platform | MetaSo (search results), Doubao | Methodology (C) |
| Search Engine Land | Western SEO/GEO media | Perplexity, ChatGPT | Methodology (C) |
| ahrefs blog | Western SEO tool blog | Perplexity | Methodology (C) |
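One way to read the table above is to rank entities by how many of the 9 engines mentioned them. The sketch below is illustrative only: it hand-copies a subset of rows from the table into a dict (the variable names are hypothetical), and ranking by engine breadth is an assumed heuristic, not a scoring rule defined elsewhere in this notebook.

```python
# Subset of the answer-text table, copied by hand: entity -> engines that mentioned it.
mentions = {
    "Princeton GEO paper": ["ChatGPT", "Claude", "Perplexity", "DeepSeek", "Kimi", "ERNIE"],
    "llms.txt proposal": ["ChatGPT", "Claude", "Perplexity", "Kimi", "ERNIE"],
    "Baidu Zhanzhang": ["ERNIE", "Kimi", "Doubao"],
    "Profound": ["ChatGPT", "Perplexity", "Gemini"],
    "Search Engine Land": ["Perplexity", "ChatGPT"],
    "ahrefs blog": ["Perplexity"],
}

TOTAL_ENGINES = 9  # engines in the audit

# Sort by number of distinct engines (descending), then by name for stable ties.
ranked = sorted(mentions.items(), key=lambda kv: (-len(set(kv[1])), kv[0]))

for entity, engines in ranked:
    print(f"{entity}: {len(set(engines))} of {TOTAL_ENGINES} engines")
```

Breadth across engines is only a rough proxy for leverage, since it ignores which prompt category the mention appeared in.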
Pitching priority (which publishers to get cited on first)
Based on the above, here is the Day-1 pitching list. Each item names a publisher or source, the engines that rely on it, and the prompt category where a citation would have the highest leverage.
- Perplexity's citation list (live web): Perplexity is the only engine that produced a structured external citation in this audit. We should pitch Profound/Otterly/Peec "alternative" articles to the publications that Perplexity is already scraping. We don't yet know which publications those are; the next self-audit will use a follow-up prompt set designed to surface the underlying source list.
- 36kr + 知乎 (for MetaSo + Chinese engines): These are the two CN publishers that consistently appear in 秘塔 MetaSo's search results for GEO methodology. A long-form post in 36kr's enterprise column, or a structured Zhihu answer with named authorship, would be the highest-leverage CN-side publishing action.
- Search Engine Land + ahrefs (for Western engines): These are the two Western publishers that appear in answer text for GEO methodology (Search Engine Land in Perplexity and ChatGPT, the ahrefs blog in Perplexity), though not as structured citations. A guest post or a contributed data point (e.g. "the 0/270 baseline") would be a high-leverage Western publishing action.
- Princeton GEO paper citation graph: six of the nine engines (ChatGPT, Claude, Perplexity, DeepSeek, Kimi, ERNIE) mention the Aggarwal et al. paper in methodology answers. We can't get into the paper itself, but we may be able to earn a citation in the authors' next publication if we publish 30+ unique datapoints in their domain.
What this notebook does NOT measure
- Citation rank within a response. A citation in position 1 of a 5-citation list is worth more than a citation in position 5; this notebook treats all citations equally. The next snapshot will add position-weighted scoring.
- Citation freshness. A 2026 citation is more valuable than a 2024 citation. We will add a freshness column in the next snapshot.
- Citation sentiment. Some citations are positive ("a leading tool"), some are negative ("limited compared to"), some are neutral ("an example of"). The next snapshot will add a sentiment classifier.
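To make the position-weighting idea from the first item concrete, here is a minimal sketch. The reciprocal-rank weight (1/position) is an assumption chosen for illustration; this notebook has not yet fixed a weighting scheme, and the function name is hypothetical.

```python
def position_weighted_score(positions):
    """Sum 1/position over every citation slot a domain occupies.

    `positions` holds 1-indexed ranks within each response's citation list.
    The reciprocal-rank weight is an illustrative assumption.
    """
    if any(p < 1 for p in positions):
        raise ValueError("positions are 1-indexed")
    return sum(1.0 / p for p in positions)


# A citation in slot 1 of a 5-citation list vs. one in slot 5.
first = position_weighted_score([1])
fifth = position_weighted_score([5])
print(first, fifth)
```

Under this scheme the current flat count is the special case where every citation sits in position 1, so the single Day-1 citation would score the same either way.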