RanketAI Guide #06: Schema.org 13 Types and GEO Impact
Maps RanketAI site check's 13 recommended schema.org types (Organization, Article, FAQPage, BreadcrumbList, etc.) to their GEO impact — using KDD 2024 + Chen 2025 + Google Rich Results + Ahrefs 2026-02. JSON-LD rationale and 4-group classification included.
AI-assisted draft · Editorially reviewed. This blog content may use AI tools for drafting and structuring, and is published after editorial review by the RanketAI Editorial Team.
Summary (as of 2026-05-09): The second gate of GEO measurement is schema.org. If the first gate (#05 — robots.txt policy) is bot access, schema.org is the layer at which bots accurately understand and identify content as an entity. This guide maps the GEO impact of the 13 schema types that RanketAI site check recommends, using schema.org + Google Rich Results + KDD 2024 academic backing + Ahrefs Feb 2026 analysis.
Why Schema.org is the Second GEO Gate
In the previous post #05 — The Four AI Crawler Policies, we established that robots.txt is the first gate of GEO. If bots can't access the page, the nine GEO strategies (Aggarwal et al. KDD 2024) all collapse to zero impact. But there is another decisive branch after a bot fetches the page — what did the bot understand this page to be?
LLMs can infer page meaning from natural-language HTML alone. But that inference is probabilistic, and accuracy diverges by page, language, and domain. Schema.org makes that inference deterministic. When a page explicitly declares "this company is RanketAI, this article is Guide series #06, the author is RanketAI Editorial," LLM entity identification, citation decisions, and answer composition all stabilize.
The "AI search bias toward earned media (third-party sources)" that Chen et al. 2025 quantitatively proves also operates on top of entity-identification accuracy. If AI cannot identify the page as an entity, third-party mentions can't be attributed back to the brand. Schema.org is the base layer of entity attribution.
Schema.org's Standard Status
Schema.org was launched in 2011 by Google, Microsoft, and Yahoo, with Yandex joining as a sponsor later that year; these four search engines co-sponsor it today. It currently defines 800+ types and 1400+ properties and can be expressed in three syntaxes: JSON-LD, Microdata, and RDFa.
The differences between the three syntaxes are summarized in this single table:
| Syntax | Location | Extraction Robustness | Google Recommendation |
|---|---|---|---|
| JSON-LD | `<script type="application/ld+json">` (separated from head/body) | ✅ (independent of HTML body) | ✅ Preferred |
| Microdata | HTML attributes (itemscope, itemtype) | △ (bound to HTML body) | Compatible |
| RDFa | HTML attributes (vocab, typeof) | △ (bound to HTML body) | Compatible |
Google Search Central's official guide clearly declares JSON-LD as the preferred format. Two reasons:
- Separation from HTML body: JSON-LD is written as standalone JSON inside `<script>` blocks, so HTML markup changes don't break schema data.
- Extraction robustness: Microdata and RDFa depend on the HTML parse tree, so some bots may fail to extract them correctly. JSON-LD is parsed as plain JSON, independently of the surrounding markup.
JSON-LD 1.1 is also a W3C Recommendation (2020-07), the final and highest stage of the W3C standards track. This is the basis for RanketAI site check's recommendation to convert Microdata to JSON-LD when detected.
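To make the "separation from HTML body" point concrete, here is a minimal Python sketch of how a crawler might pull JSON-LD out of a page using only the standard library. It is an illustration under stated assumptions, not a description of how any particular bot works: the `JsonLdExtractor` class, the `extract_jsonld` helper, and the sample page are all hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD block found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.items


# Hypothetical page: the visible markup can change freely without touching the schema block.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "RanketAI"}
</script>
</head><body><div class="hero"><h1>Welcome</h1></div></body></html>
"""

blocks = extract_jsonld(SAMPLE_HTML)
print(blocks[0]["@type"])
```

The extractor never inspects the `<div>` or `<h1>` structure, which is why a redesign of the visible page leaves the schema data untouched. Microdata and RDFa would require walking those elements and their attributes instead.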
13 Types in 4 Groups — Why These 13?
The 13 types that site check recommends are not chosen by quantity, but by stage-by-stage entity-identification responsibility, organized in 4 groups.
Group A — Identity (3 types)
- Organization — Company / institution identity. The primary signal for "who is this company?" in AI answers. The most important type for cross-LLM entity disambiguation. (See #02 LLM Citation Algorithm.)
- Corporation — Subtype of Organization, used for for-profit legal entities. Reinforces business registration / legal entity information.
- Person — Representative / author / expert. The Authoritativeness signal under E-E-A-T.
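As a rough sketch of the Group A types, the Python below builds an Organization and a Person as plain dictionaries and serializes them into a JSON-LD script tag. The domain, names, and the `to_script_tag` helper are hypothetical placeholders chosen for illustration; only the schema.org property names (`name`, `url`, `logo`, `jobTitle`, `knowsAbout`, `worksFor`) come from the vocabulary itself.

```python
import json

SITE = "https://example.com"  # hypothetical domain

organization = {
    "@context": "https://schema.org",
    "@type": "Organization",  # swap in "Corporation" for a for-profit legal entity
    "@id": f"{SITE}/#organization",
    "name": "Example Co.",
    "url": SITE,
    "logo": f"{SITE}/logo.png",
}

person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": f"{SITE}/#editor",
    "name": "Jane Doe",
    "jobTitle": "Editor",
    "knowsAbout": ["structured data", "search optimization"],
    # Point back at the organization by @id instead of repeating its fields.
    "worksFor": {"@id": organization["@id"]},
}


def to_script_tag(data: dict) -> str:
    """Serialize a schema dict into a JSON-LD script tag."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a string value can never close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(to_script_tag(organization))
print(to_script_tag(person))
```

Linking the Person to the Organization through `@id` keeps a single definition of the company, so later edits to its name or logo happen in one place.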
Group B — Page Meta (4 types)
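- WebSite: Site-level metadata, carrying signals such as the site name and alternateName. (Google retired the sitelinks search box feature in November 2024, so that markup no longer produces a visible result.)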
- WebPage — Per-page metadata. The most basic wrapper for any page.
- BreadcrumbList — Page hierarchy. The signal AI answers use to recognize "which category does this article belong to?". Also directly used in Google Rich Results' SERP path display.
- CollectionPage — Listing / archive pages. Suitable for category, tag, and blog-index pages.
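For the page hierarchy in Group B, a BreadcrumbList is an ordered list of `ListItem` entries with 1-based positions. The sketch below shows one way to generate it in Python; the `build_breadcrumb` helper and the example URLs are hypothetical.

```python
import json


def build_breadcrumb(trail):
    """Build a BreadcrumbList from (name, url) pairs ordered from site root to current page."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


breadcrumb = build_breadcrumb([
    ("Home", "https://example.com/"),
    ("Guides", "https://example.com/guides/"),
    ("Guide #06", "https://example.com/guides/06-schema-org/"),
])
print(json.dumps(breadcrumb, indent=2))
```

Generating the positions from the list order avoids hand-numbered entries drifting out of sequence when a category level is added or removed.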
Group C — Content (2 types)
- Article — News / blog / guide. The #1 type for AI answer citation. The four fields headline, author, datePublished, and publisher form the core of entity attribution.
- FAQPage: Q&A structure. The core of AEO (Answer Engine Optimization) and, alongside Article, one of the types cited most frequently in AI answers.
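The two Group C types can be sketched as follows. The Article builder fills in the four fields named above (headline, author, datePublished, publisher), and the FAQPage builder wraps each question in a `Question` with an `acceptedAnswer`. The helper names, the author, and the publication date are hypothetical values used only for illustration.

```python
import json
from datetime import date


def build_article(headline, author_name, published, publisher_name):
    """Article with the four core attribution fields."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": published.isoformat(),  # ISO 8601 date string
        "publisher": {"@type": "Organization", "name": publisher_name},
    }


def build_faq_page(qa_pairs):
    """FAQPage built from (question, answer) text pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


article = build_article(
    "Schema.org 13 Types and GEO Impact", "Jane Doe", date(2026, 1, 15), "Example Co."
)
faq = build_faq_page([
    ("Which structured data format should I use?", "JSON-LD is the format Google recommends."),
])
print(json.dumps(article, indent=2))
print(json.dumps(faq, indent=2))
```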
Group D — Business Actions (4 types)
- Product — Products. The primary schema for commerce pages. price, availability, and review trigger Rich Results' star ratings and price displays.
- Service — Services. SaaS, consulting, and professional services. Fields: provider, areaServed, serviceType.
- LocalBusiness — Local business. For brick-and-mortar shops or branches. address, geo, and openingHours feed local search and AI answers' location citations.
- Event — Events. Conferences, seminars, webinars, launches. startDate, location, organizer.
What these 13 types share is a 1:1 mapping with AI-answer information units (entity / claim / location / time): each type corresponds to a unit of information that AI needs when composing an answer.
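The field lists named for Group D can double as a simple completeness check. The sketch below is a hypothetical helper, not part of RanketAI site check: `KEY_FIELDS` copies the Service, LocalBusiness, and Event fields listed above, and `missing_fields` reports which of them a schema node lacks. Product is left out on purpose, since in schema.org its price and availability sit on a nested `Offer` rather than on the Product itself.

```python
# Fields named in the Group D list above, keyed by schema.org type.
KEY_FIELDS = {
    "Service": ["provider", "areaServed", "serviceType"],
    "LocalBusiness": ["address", "geo", "openingHours"],
    "Event": ["startDate", "location", "organizer"],
}


def missing_fields(node: dict) -> list:
    """Return the key fields that are absent or empty for the node's @type."""
    expected = KEY_FIELDS.get(node.get("@type"), [])
    return [name for name in expected if not node.get(name)]


# Hypothetical example data.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Webinar",
    "startDate": "2026-06-01T10:00:00+09:00",
    "location": {"@type": "VirtualLocation", "url": "https://example.com/webinar"},
    "organizer": {"@type": "Organization", "name": "Example Co."},
}

service = {
    "@context": "https://schema.org",
    "@type": "Service",
    "serviceType": "Consulting",
    "provider": {"@type": "Organization", "name": "Example Co."},
    # areaServed is intentionally left out to show the check firing.
}

print(missing_fields(event))    # []
print(missing_fields(service))  # ['areaServed']
```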
GEO Impact — Academic + Industry Evidence (as of 2026-05-09)
The impact of these 13 types is verified by both academic research (KDD 2024 / Chen 2025) and 2026 industry analysis (Ahrefs / Bing / Google).
Ahrefs Feb 2026 Quantification — AI Citations Are Expanding Beyond SERPs
According to an Ahrefs study covered by Search Engine Land in 2026, which analyzed 863,000 keyword SERPs and 4M AI Overview URLs, the share of AI Overview cited pages that also rank in the top 10 SERP positions fell from 76% (mid-2025) to 38% (February 2026), a drop of 38 percentage points that halves the earlier share. AI answers are increasingly citing pages outside SERP visibility, so as of 2026 schema.org's role in entity attribution has become more decisive: beyond SERP ranking signals, it is a primary basis for LLMs to identify and trust pages. Bing (March 2025) and Google (April 2025) have publicly acknowledged schema markup's LLM contribution (Copilot content understanding + Google search advantage).
Cite-source Strategy (KDD 2024)
Aggarwal et al. (KDD 2024) quantitatively evaluated nine GEO strategies and report visibility gains of up to about 40% in generative engine responses, with the Cite-source strategy (explicitly citing sources) among the strongest performers. Schema.org's Article.author, Organization.url, and Article.citation fields can be read as the structured-data form of this strategy: declaring sources via schema is more robust to LLM extraction than writing "Source: ..." in body prose.
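A minimal sketch of declaring sources through `Article.citation`, assuming a hypothetical article that cites the GEO paper. The `cited_urls` helper is illustrative only; it shows that a machine reader can list the sources without parsing any body prose.

```python
import json

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Schema.org 13 Types and GEO Impact",
    "author": {"@type": "Organization", "name": "Example Editorial Team"},  # hypothetical
    # Each cited work is declared as its own node instead of a "Source: ..." sentence.
    "citation": [
        {
            "@type": "ScholarlyArticle",
            "name": "GEO: Generative Engine Optimization",
            "url": "https://arxiv.org/abs/2311.09735",
        },
    ],
}


def cited_urls(node: dict) -> list:
    """List the URLs of every work declared in the citation property."""
    citations = node.get("citation", [])
    if isinstance(citations, dict):  # a single citation may be given without a list
        citations = [citations]
    return [c["url"] for c in citations if isinstance(c, dict) and "url" in c]


print(cited_urls(article))
print(json.dumps(article, indent=2))
```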
Earned Media Bias (Chen et al. 2025)
The "AI search bias toward earned media" that Chen et al. 2025 proves quantitatively boils down to how a third-party mention is attributed back to the brand. Schema.org's Organization.sameAs field (cross-reference to authoritative sources like Wikipedia, Crunchbase, LinkedIn) is the base layer of that attribution.
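The attribution role of `sameAs` can be sketched as a lookup: a third-party mention is tied back to the brand when its URL matches one of the profiles the Organization declares. The company, its profile URLs, and the `matches_entity` helper below are hypothetical, and real systems presumably use far richer matching than exact URL comparison.

```python
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co.",
    "url": "https://example.com",
    # Hypothetical profile URLs on authoritative third-party sites.
    "sameAs": [
        "https://en.wikipedia.org/wiki/Example_Co",
        "https://www.crunchbase.com/organization/example-co",
        "https://www.linkedin.com/company/example-co",
    ],
}


def normalize(url: str) -> str:
    """Lowercase and drop a trailing slash so trivially different URLs compare equal."""
    return url.rstrip("/").lower()


def matches_entity(org: dict, mention_url: str) -> bool:
    """True when the mentioned URL is the organization's site or a declared sameAs profile."""
    known = {normalize(org["url"])}
    known.update(normalize(u) for u in org.get("sameAs", []))
    return normalize(mention_url) in known


print(matches_entity(organization, "https://www.linkedin.com/company/example-co/"))  # True
print(matches_entity(organization, "https://www.linkedin.com/company/other-co"))     # False
```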
E-E-A-T (Google's Official Frame)
Google's E-E-A-T frame consists of four evaluation axes: Experience, Expertise, Authoritativeness, and Trustworthiness. Schema.org's Person.knowsAbout, Article.author, Organization.foundingDate, and Review.author serve as primary signals for each axis. The same frame applies to AI Overviews' answer-quality evaluation.
RanketAI Site Check Recommendation Matrix
The rationale for each of the 13 type recommendations, in one table:
| Type | Group | Google Rich Results | AI Answer Citation Frequency | Rationale |
|---|---|---|---|---|
| Organization | A | ✅ Logo · sameAs | High | Primary entity signal (every page) |
| Corporation | A | (Organization-compatible) | Medium | Reinforces legal entity info |
| Person | A | (Indirect) | High | E-E-A-T Authoritativeness |
| WebSite | B | Site name (Sitelinks Searchbox retired by Google in November 2024) | Medium | Site-level wrapper |
| WebPage | B | (Baseline) | Medium | Baseline for every page |
| BreadcrumbList | B | ✅ SERP path display | High | Page hierarchy recognition |
| CollectionPage | B | (Indirect) | Medium | Listing-page identification |
| Article | C | ✅ headline · author · image | Top | #1 for AI answer citation |
| FAQPage | C | ✅ FAQ display (limited by Google since 2023 to well-known government and health sites) | Top | Core of AEO |
| Product | D | ✅ Star rating · price · stock | High | Commerce |
| Service | D | (Indirect) | Medium | SaaS · B2B |
| LocalBusiness | D | ✅ Local Pack · Map | Medium | Brick-and-mortar businesses |
| Event | D | ✅ Event card | Medium | Schedule information |
Key patterns:
- Article + FAQPage — Highest AI answer citation frequency. Apply to all content pages first.
- Organization + Person: Primary entity signals. Publishing them once for the whole site (or on a representative page) is enough.
- BreadcrumbList — Direct trigger for Rich Results' SERP path display. Add to every page as a baseline.
- Product / LocalBusiness / Event — Apply selectively by business model.
Connection to RanketAI Site Check
RanketAI site check evaluates the following items during page analysis:
| Site check evaluation item | Mapping to 13 types |
|---|---|
| Number of detected schema.org types | How many of the 13 types appear on the page |
| JSON-LD format adoption | Recommend converting Microdata / RDFa to JSON-LD |
| Missing-recommended-type analysis | Which of the 13 are missing + page-type priorities |
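The first and third evaluation items can be approximated with a short type audit. This is a sketch under assumptions, not RanketAI site check's actual implementation: it takes JSON-LD blocks that have already been parsed into Python objects, walks `@graph` containers and nested nodes, and counts nested types (such as a Person inside `Article.author`) as detected. `collect_types` and `audit` are hypothetical names.

```python
# The 13 recommended types from this guide.
RECOMMENDED_TYPES = {
    "Organization", "Corporation", "Person",
    "WebSite", "WebPage", "BreadcrumbList", "CollectionPage",
    "Article", "FAQPage",
    "Product", "Service", "LocalBusiness", "Event",
}


def collect_types(node, found=None):
    """Recursively gather every @type value from dicts and lists."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            found.add(declared)
        elif isinstance(declared, list):  # @type may list several types at once
            found.update(t for t in declared if isinstance(t, str))
        for value in node.values():
            collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_types(item, found)
    return found


def audit(blocks):
    """Report which recommended types a page declares and which are missing."""
    detected = collect_types(blocks) & RECOMMENDED_TYPES
    return {
        "count": len(detected),
        "detected": sorted(detected),
        "missing": sorted(RECOMMENDED_TYPES - detected),
    }


# Hypothetical page with a single @graph block.
page_blocks = [{
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "name": "Example Co."},
        {"@type": ["Article", "WebPage"], "author": {"@type": "Person", "name": "Jane Doe"}},
    ],
}]

report = audit(page_blocks)
print(report["count"], report["detected"])
```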
When the diagnosis is weak, prioritize fixes by the following causal mapping:
| Weak scenario | Recommended type to add first |
|---|---|
| Company site without Organization | Organization (Group A) — every page |
| Article / blog without Article | Article (Group C) — content pages |
| FAQ section without FAQPage | FAQPage (Group C) — Q&A pages |
| Category path without BreadcrumbList | BreadcrumbList (Group B) — every page |
| Commerce site without Product | Product (Group D) — product pages |
| Microdata only (no JSON-LD) | Re-author the same schema in JSON-LD |
In short, RanketAI site check's schema diagnosis is not an abstract guideline but a frame that diagnoses AI-answer citation potential at the page level. When a weakness is found, prioritize additions by the mapping above.
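The weak-scenario table can be expressed as a small rule list. The sketch below is one possible encoding: the `PageFacts` fields and the `recommend` function are hypothetical, and the order of recommendations simply follows the table rows, which is an assumption because the guide does not rank the rows against each other.

```python
from dataclasses import dataclass, field


@dataclass
class PageFacts:
    """Hypothetical summary of what an audit found on one page."""
    detected_types: set = field(default_factory=set)
    is_company_site: bool = False
    is_article: bool = False
    has_faq_section: bool = False
    has_category_path: bool = False
    is_commerce: bool = False
    uses_microdata_only: bool = False


def recommend(page: PageFacts) -> list:
    """Return fixes in table order for every scenario that applies to the page."""
    rules = [
        (page.is_company_site, "Organization"),
        (page.is_article, "Article"),
        (page.has_faq_section, "FAQPage"),
        (page.has_category_path, "BreadcrumbList"),
        (page.is_commerce, "Product"),
    ]
    fixes = [
        f"Add {schema_type}"
        for applies, schema_type in rules
        if applies and schema_type not in page.detected_types
    ]
    if page.uses_microdata_only:
        fixes.append("Re-author the same schema in JSON-LD")
    return fixes


blog_post = PageFacts(
    detected_types={"Organization"},
    is_company_site=True,
    is_article=True,
    has_category_path=True,
)
print(recommend(blog_post))  # ['Add Article', 'Add BreadcrumbList']
```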
Conclusion — Standard + Academic + Industry + Measurement (4-Axis Consensus, as of 2026-05-09)
This is a 4-axis consensus frame, adding 2026 industry analysis to the 3-axis frame from #05.
- Standard — schema.org · Google Rich Results · JSON-LD W3C Recommendation — Vocabulary and format consensus for the 13 types
- Academic — KDD 2024 · Chen et al. 2025 — Quantitative validation of Cite-source strategy and earned-media bias
- Industry (2026) — Search Engine Land 2026 (Ahrefs coverage) — AI Overview citations rapidly expanding beyond SERPs (top-10 SERP overlap dropped from 76% to 38%) + Bing/Google 2025 official acknowledgment of schema's contribution
- Measurement — RanketAI site check — Per-page detection of the 13 types + weakness diagnosis
The 4-axis consensus reaches a clear conclusion:
- JSON-LD first — Microdata / RDFa have lower extraction robustness. Adopt JSON-LD as the single format.
- Group C (Article · FAQPage) is top for citation frequency — Apply first to content pages.
- Group A (Organization · Person) is the primary entity signal: publishing it once site-wide is sufficient.
- BreadcrumbList triggers Rich Results — Add to every page as a baseline.
- Group D (Product · Service · LocalBusiness · Event) — Apply selectively by business model.
Schema.org review is the second starting point of GEO work. When weaknesses are found in your measurement results (the schema-type detection count of RanketAI site check), reinforce by priority following the 13-type matrix above.
⚠ Schema specifications are updated frequently. The 13-type recommendations and priorities in this guide are based on the 2026-05-09 snapshot. Before applying in production, please verify against the 10 official references listed under External References below (especially Google Rich Results guides, individual schema.org type pages, and the latest Ahrefs / Search Engine Land analysis).
Further reading: #01 — Why SEO Alone Isn't Enough in the AI Search Era · #02 — LLM Citation Algorithm Anatomy · #03 — Korea's AI Visibility Gap · #04 — GEO Academia vs Industry vs Measurement Mapping · #05 — The Four AI Crawler Policies
Data Basis
- Schema.org official standard (schema.org) — A structured data vocabulary maintained by the W3C Schema.org Community Group, defining 800+ types and 1400+ properties. Specifies three syntaxes — JSON-LD, Microdata, and RDFa. Co-sponsored by Google, Microsoft, Yahoo, and Yandex, making it the de-facto search engine standard.
- Google Search Central — Structured Data guide (developers.google.com/search/docs/appearance/structured-data). The official list of schema types eligible for Rich Results. Explicitly recommends JSON-LD over alternatives, with dedicated spec pages for Article, FAQPage, BreadcrumbList, Product, and Organization.
- Aggarwal et al. "GEO: Generative Engine Optimization" (Princeton · IIT Delhi · Georgia Tech, KDD 2024, arXiv:2311.09735). The academic origin of quantitative GEO strategy validation. Quantifies how content-level signals such as citations, quotations, and statistics influence visibility in generative engine answers, reporting gains of up to 40% for the strongest strategies, Cite-source among them.
- Chen · Wang · Chen · Koudas. "How to Dominate AI Search" (2025-09, arXiv:2509.08919) — Quantitatively shows AI search is systematically biased toward earned media (third-party sources) and structured authority signals. Provides academic backing for the citation effect of Schema.org Organization, Person, and Article.
- Google — Helpful, reliable, people-first content (updated 2024) — The E-E-A-T frame (Experience · Expertise · Authoritativeness · Trust) treats schema.org's author, publisher, and reviewedBy fields as primary entity-identification signals. The same frame applies to AI Overviews answer-quality evaluation.
- JSON-LD 1.1 W3C Recommendation (2020-07) — The official W3C Recommendation for JSON for Linked Data. The `<script type="application/ld+json">` pattern, which separates from HTML body, is superior to Microdata and RDFa in extraction robustness and maintainability. JSON-LD-first extraction is observed in Google, OpenAI, Anthropic, and Perplexity LLM citations.
- Microsoft Bing Webmaster — Structured Markup Validator and JSON-LD-first recommendation. The Bing search index (shared backend with Copilot and Bing Chat) uses Schema.org vocabulary as the primary signal for entity disambiguation (homonyms — same-name people or companies).
- Search Engine Land 2026 (covering Ahrefs' February 2026 study). Analysis of 863,000 keyword SERPs and 4M AI Overview URLs. The share of AI Overview citations that also rank in the top 10 SERP positions dropped from 76% (mid-2025) to 38% (February 2026). AI answers are rapidly expanding their citation pool beyond SERP-visible pages — including pages identified through schema.org. Bing (March 2025) and Google (April 2025) have publicly acknowledged schema's LLM contribution (Copilot content understanding + Google search advantage).
Key Claims and Sources
This section maps key claims to their supporting sources one by one for fast verification. Review each claim together with its original reference link below.
Claim: Schema.org is the de-facto official vocabulary, co-sponsored by Google, Microsoft, Yahoo, and Yandex
Source: Schema.org official
Claim: Google explicitly recommends JSON-LD over Microdata and RDFa
Source: Google Structured Data guide
Claim: Article, FAQPage, and BreadcrumbList are the core schema types for Google Rich Results eligibility
Source: Google Search Central
Claim: KDD 2024 validates 9 GEO strategies; Cite-source and other top strategies deliver visibility gains of up to 40%
Source: Aggarwal et al. KDD 2024 (arXiv:2311.09735)
Claim: AI search is systematically biased toward structured authority signals (Organization, Person, Article schema)
Source: Chen et al. 2025 (arXiv:2509.08919)
Claim: JSON-LD 1.1 is the official W3C Recommendation, superior to Microdata in extraction robustness due to HTML body separation
Source: W3C JSON-LD 1.1 Recommendation
Claim: Under E-E-A-T, schema.org author and publisher fields serve as primary entity-identification signals
Source: Google Helpful Content guide
Claim: Ahrefs Feb 2026: top-10 SERP overlap of AI Overview citations dropped from 76% (mid-2025) to 38%; AI answers expanded citation pool beyond SERP-visible pages, raising the role of schema.org entity attribution
Source: Search Engine Land 2026 (Ahrefs coverage)
External References
The links below are original sources directly used for the claims and numbers in this post. Checking source context reduces interpretation gaps and speeds up re-validation.
- Schema.org official
- Google Search Central — Structured Data guide
- Google — Article structured data
- Google — FAQPage structured data
- Google — BreadcrumbList structured data
- Google — Helpful, reliable, people-first content (E-E-A-T)
- JSON-LD 1.1 W3C Recommendation
- Aggarwal et al. — GEO: Generative Engine Optimization (KDD 2024)
- Chen et al. — How to Dominate AI Search (2025)
- Search Engine Land — Schema markup AI search (2026)