GEO-bench (Generative Engine Optimization Benchmark)
The first large-scale benchmark for evaluating Generative Engine Optimization, introduced by Aggarwal et al. at KDD 2024. Combines diverse user queries with relevant web sources to measure how content-optimization strategies improve citation visibility inside AI-generated answers.
What is GEO-bench?
GEO-bench is the first large-scale benchmark for the GEO (Generative Engine Optimization) field, introduced by Pranjal Aggarwal et al. at KDD 2024. It bundles user queries across multiple domains together with the web sources used to answer them, so researchers and practitioners can measure how much a given content-optimization strategy lifts citation visibility inside AI-generated answers.
Unlike traditional SEO evaluation (rankings, CTR), GEO-bench treats citation and mention visibility inside the AI answer body itself as the unit of measurement. The same paper defines metrics such as Position-Adjusted Word Count (PAWC), which weights a source's mentions by both their word volume and their position within the generated answer.
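The intuition behind a position-adjusted metric can be sketched in a few lines. This is an illustrative formulation, not the paper's exact formula: it discounts the word count of sentences attributed to a source by an exponential decay on sentence position, so earlier mentions count more. The `decay` parameter and the sentence-level attribution are assumptions of this sketch.

```python
def pawc(answer_sentences, source_id, decay=0.9):
    """Toy position-adjusted word count: sum the word counts of answer
    sentences attributed to `source_id`, discounted exponentially by
    sentence position (earlier sentences weigh more). Illustrative only;
    the paper's exact weighting scheme may differ."""
    score = 0.0
    for pos, (text, cited_source) in enumerate(answer_sentences):
        if cited_source == source_id:
            score += len(text.split()) * decay ** pos
    return score

answer = [
    ("Brand X leads the market", "s1"),   # position 0, weight 1.0
    ("Others follow closely", "s2"),      # position 1, different source
    ("Brand X grew 40 percent", "s1"),    # position 2, weight 0.81
]
print(pawc(answer, "s1"))  # 5 * 1.0 + 5 * 0.81, about 9.05
```

A late mention with many words can thus score the same as a short mention in the opening sentence, which is why the table below rewards strategies that push verifiable facts toward the top of the answer.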
What GEO-bench measures — nine content-optimization strategies
The original paper used GEO-bench to compare nine content-optimization strategies. Selected headline results:
| Strategy | PAWC gain | Category |
|---|---|---|
| Quotation Addition | +40.7% | Trust |
| Statistics Addition | +31.7% | Trust |
| Cite Sources | +29.6% | Authority |
| Fluency Optimization | +28.1% | Style |
| Authoritative | +12.9% | Authority |
| Keyword Stuffing | no gain | (legacy SEO) |
Two findings stand out. (a) Quotations, statistics, and citations beat purely stylistic edits: all three top strategies rest on externally verifiable facts. (b) Keyword Stuffing, a classic SEO tactic, was shown empirically to have no effect. Generative engines build answers through semantic fact extraction, not keyword matching.
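The evaluation loop behind comparisons like the table above can be sketched as a small harness. This is not the official GEO-bench code; the engine and scorer below are deliberate stand-ins (a real run would call a generative engine and score citation visibility with a PAWC-style metric), and the strategy transforms are toy examples.

```python
def evaluate_strategies(source_text, strategies, run_engine, score):
    """Toy GEO-bench-style harness: apply each content-optimization
    strategy to the source, regenerate the answer, and report the
    relative visibility gain over the unmodified source, in percent."""
    baseline = score(run_engine(source_text))
    return {
        name: 100.0 * (score(run_engine(transform(source_text))) - baseline) / baseline
        for name, transform in strategies.items()
    }

# Stand-in components for the demo only.
run_engine = lambda src: src                # stub: answer mirrors the source
score = lambda answer: len(answer.split())  # stub: visible word count
strategies = {
    "Statistics Addition": lambda s: s + " 78% of surveyed users agree.",
    "Keyword Stuffing":    lambda s: s,     # adds nothing the engine extracts
}
gains = evaluate_strategies("Our product is fast.", strategies, run_engine, score)
print(gains)  # Statistics Addition gains; Keyword Stuffing stays at 0.0
```

The point of the structure is that strategies are compared on the same queries and sources with a fixed scorer, which is what makes the percentage gains in the table comparable across rows.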
Why GEO-bench matters
GEO-bench is not just an evaluation tool; it became the academic reference point for the GEO field. Subsequent research reports results against GEO-bench or close variants: Chen et al.'s 2025 paper How to Dominate AI Search and the 2026 Citation Absorption framework both extend its measurement perspective.
Limitations
As a benchmark, GEO-bench has two structural caveats. (1) Controlled query/answer set: it cannot cover every variation users actually type into ChatGPT, Claude, Gemini, or Perplexity. (2) Time-locked: results were measured in the 2024 environment, before subsequent rapid changes in LLM models and search features. The nine-strategy conclusion remains a strong academic baseline, but it should be cross-checked against 2026 industry reports (Similarweb, Ahrefs) and live measurement on your own brand.
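A minimal starting point for that live measurement is a share-of-voice count over answers you collect yourself. The brand name and sample answers below are hypothetical, and how you gather the answers (engine APIs, manual exports) is outside this sketch; it only does the counting.

```python
import re

def brand_visibility(answers, brand):
    """Toy live-measurement check: fraction of collected AI answers that
    mention `brand` (case-insensitive), plus mean mentions per answer."""
    hits = [len(re.findall(re.escape(brand), a, flags=re.IGNORECASE))
            for a in answers]
    mention_rate = sum(1 for h in hits if h > 0) / len(answers)
    avg_mentions = sum(hits) / len(answers)
    return mention_rate, avg_mentions

answers = [
    "Acme Analytics is frequently cited for market data.",
    "Several vendors compete here, including acme analytics.",
    "No clear leader emerges in this niche.",
]
rate, avg = brand_visibility(answers, "Acme Analytics")
print(rate, avg)  # 2 of 3 answers mention the brand
```

Tracked over time and across engines, even a crude rate like this reveals whether a content change moved your visibility, independent of what the 2024 benchmark environment showed.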