# README

# auto-geo

**A publishing engine for GEO resource pages: the pages large language models cite.**

`auto-geo` is a content publishing primitive built for the AI search era. Where traditional CMSs optimize for human readers, `auto-geo` optimizes for _citation by AI engines_ (ChatGPT, Perplexity, Google AI Overviews, Claude, Gemini). It enforces the structural and quality patterns that empirical research links to higher citation rates (TL;DR capsules, question-format H2 headings, answer-first paragraphs, dense entity references, FAQ schema, and a rigid page architecture) at the publish boundary.

Hand the URL of this repo to your coding agent. It will set up a publishing endpoint in your existing app that any other agent can call as a tool. You can then schedule, automate, or wire it into your own workflow.

> Built by **[Shadow](https://www.shadow.inc)**, a media research lab building the next generation of AI-powered media intelligence and communications technology, in partnership with the teams that put OpenAI, TikTok, Meta, Amazon, and Lovable on the map. Shadow uses `auto-geo` to publish to [shadow.inc/resources](https://www.shadow.inc/resources).

---

## Contents

- [What is a GEO resource page?](#what-is-a-geo-resource-page)
- [Why auto-geo](#why-auto-geo)
- [Install](#install)
- [60-second quickstart](#60-second-quickstart)
- [What's in this repo](#whats-in-this-repo)
- [Quickstart (production setup)](#quickstart-production-setup)
- [Examples](#examples)
- [The publishing flow](#the-publishing-flow)
- [Hard rejects vs. soft warnings](#hard-rejects-vs-soft-warnings)
- [How it compares](#how-it-compares)
- [API](#api)
- [LLM-friendly](#llm-friendly)
- [Roadmap](#roadmap)
- [Contributing](#contributing)
- [License](#license)

---

## What is a GEO resource page?

A **GEO resource page** (Generative Engine Optimization resource page) is a public web page whose structure, density, and citation signals are engineered to be quoted verbatim by AI search engines when answering a user's question. It is a successor to the SEO landing page.

A GEO resource page differs from a blog post in five ways:

1. **Architecture, not prose.** The page is composed of named, validated blocks (TL;DR, intro, sections, related guides, key takeaways, FAQ, disclosure), not a freeform document. AI engines extract structured chunks; rigid structure improves extraction.
2. **Answer-first.** Every H2 section opens with a 40–60 word "answer capsule" that fully answers the section's question before any supporting paragraph. AI engines preferentially quote self-contained answers.
3. **Question-format headings.** H2 headings are written as the questions a user would ask an AI engine. The page is indexed against query intent, not topic.
4. **Entity-dense.** Named entities (companies, people, products, frameworks) appear at high density. Empirical studies link entity density to ~4.8x higher citation probability.
5. **Schema-derived.** Article, BreadcrumbList, FAQPage, Person, and ImageObject JSON-LD are auto-emitted from a typed payload. The agent never writes JSON-LD by hand.

`auto-geo` enforces all five at the API boundary. Publishing is a contract; malformed pages are rejected with a structured error.

See [`docs/concept.md`](./docs/concept.md) for a deeper walkthrough.

---

## Why auto-geo

### Why not just write Markdown blog posts?

Markdown is freeform: there's no contract that a post has a TL;DR, that every H2 opens with a 40–60 word answer capsule, or that the FAQ exists. AI engines preferentially quote self-contained, structurally regular chunks, so a freeform Markdown corpus leaves citation probability on the table. `auto-geo` enforces the structure at the publish boundary so every page that ships is shaped for extraction, not just human reading.

### Why not use a CMS like Sanity or Contentful?

Traditional and headless CMSs optimize for editorial workflow: drafts, scheduling, multi-user review, freeform field shapes. `auto-geo` is a typed publishing primitive that lives inside your own app and rejects malformed payloads with a structured error your agent can iterate on. It's downstream of editorial, not a replacement for it: pair it with a CMS if you need one, or generate payloads from an agent. Either way, the contract enforces GEO structure before the page goes live.

### Why isn't this just SEO?

SEO optimizes for ranking in a link-based search results page; GEO optimizes for being _quoted_ inside an AI-generated answer. The signals diverge: citation favors answer-first paragraphs, question-format H2s, entity density, and self-contained chunks; ranking historically rewarded backlinks and on-page keyword optimization. `auto-geo` enforces the citation signals empirically linked to higher inclusion rates in ChatGPT, Perplexity, Google AI Overviews, Claude, and Gemini answers, patterns most SEO tooling doesn't measure or enforce.

---

## Install

```bash
npm install auto-geo zod
```

```bash
pnpm add auto-geo zod
```

```bash
yarn add auto-geo zod
```

> **Inside a pnpm workspace?** Use `pnpm add`. `npm install` sometimes errors with `Cannot read properties of null (reading 'matches')` when it traverses ancestor `pnpm-lock.yaml` files.

Node `>=18.17` required. `zod` is a peer dependency. Framework / storage peers (`next`, `hono`, `@vercel/kv`, `@supabase/supabase-js`, `react`) are optional; install only what your adapter uses.

---

## 60-second quickstart

Paste this into a fresh `quickstart.ts`, run with `tsx quickstart.ts` (or compile and run), and you'll see a published URL.

```ts
// quickstart.ts
import { runPublish } from "auto-geo";
import { createMemoryStore } from "auto-geo/storage/memory";

const store = createMemoryStore();

// A minimal payload satisfying every schema constraint.
const payload = {
  slug: "hello-auto-geo",
  title: "What is auto-geo and how do I publish my first resource page?",
  metaDescription:
    "A minimal first publish showing how auto-geo validates and stores a GEO resource page end to end.",
  category: "Tutorials",
  excerpt:
    "A minimal first publish showing how auto-geo validates and stores a GEO resource page end to end.",
  author: {
    name: "Jane Doe",
    jobTitle: "Head of Content",
    bio: "Jane writes about generative engine optimization and the architecture of pages that AI search engines cite.",
  },
  publishedAt: "2026-06-01",
  geoMetadata: {
    targetQueries: ["how does auto-geo publishing work"],
    pageType: "resource" as const,
    primaryFunction: "Show a developer how to publish their first resource page.",
    optimizationFramework: ["GEO" as const],
    targetPlatforms: ["chatgpt" as const],
    informationGainStatement:
      "First-party demonstration of the auto-geo publish pipeline using the in-memory store, end to end.",
    refreshCadence: "quarterly" as const,
  },
  tldr: {
    text: "Auto-geo's publish pipeline validates an incoming payload, persists it to a content store, and returns a URL plus an array of soft warnings; this minimal example wires the pipeline against an in-memory store so you can see a successful publish in your terminal in under a minute without any external services involved at all.",
  },
  intro: {
    blocks: [
      {
        type: "paragraph" as const,
        text: "This minimal payload satisfies every required field of the auto-geo schema; in a real integration you would generate this payload from an agent or your editorial pipeline and POST it to your publishing endpoint, but here we call the underlying runPublish function directly against an in-memory store to demonstrate the contract without any external network calls or storage setup.",
      },
    ],
  },
  sections: [
    {
      heading: "How does the publish call work?",
      answerCapsule:
        "runPublish takes an unknown body plus options containing your store and site config; it parses the body against the Zod schema, calls store.publish on success, runs soft validation, and returns a discriminated-union result with a URL or an issues array so callers can act programmatically without parsing prose error messages.",
      blocks: [],
    },
  ],
  relatedGuides: {
    items: [
      {
        title: "The GEO SOP",
        url: "https://github.com/shadowresearch/auto-geo/blob/main/docs/sop.md",
      },
      {
        title: "Page architecture",
        url: "https://github.com/shadowresearch/auto-geo/blob/main/docs/architecture.md",
      },
      {
        title: "Validation reference",
        url: "https://github.com/shadowresearch/auto-geo/blob/main/docs/validation.md",
      },
      {
        title: "Storage adapters",
        url: "https://github.com/shadowresearch/auto-geo/blob/main/docs/storage.md",
      },
    ],
  },
  keyTakeaways: {
    items: [
      "Auto-geo enforces a strict seven-block architecture at the publish boundary so every published page is structurally citation-ready by design.",
      "The runPublish function returns a discriminated-union result so callers branch on result.kind without parsing prose error strings ever.",
      "Storage adapters are pluggable; the in-memory store ships for tests and demos, KV and Supabase ship for production deployments today.",
      "Soft warnings come back on a successful publish so an agent can iterate on quality heuristics without being blocked by every non-critical recommendation.",
    ],
  },
  faq: {
    items: [
      {
        question: "Do I need a database to try auto-geo?",
        answer:
          "No, the in-memory store ships with the package and persists publishes for the lifetime of the process; it is intended for tests, demos, and the quickstart you are running right now, and you swap in a KV or Supabase adapter when you go to production without changing any of your call sites.",
      },
      {
        question: "What does the publish result look like?",
        answer:
          "On success runPublish returns an object shaped kind ok with a slug, a url constructed from your site origin plus the base path plus the slug, and an array of soft warnings from the audit step; on failure it returns one of validation_failed, slug_reserved, or store_failed, each carrying enough context for the caller to surface or retry.",
      },
      {
        question: "How do I wire this into a real HTTP endpoint?",
        answer:
          "Import createNextHandlers from auto-geo/next or the Hono adapter from auto-geo/hono, pass your store and site config, and export the returned POST and DELETE handlers from your route file; the adapters wrap runPublish with auth and revalidation so your endpoint becomes a one-file integration instead of a hand-rolled pipeline.",
      },
    ],
  },
  disclosure: {
    text: "This quickstart is a runnable demonstration of the auto-geo publish pipeline using the in-memory store.",
  },
};

const result = await runPublish(payload, {
  store,
  site: {
    origin: "https://example.com",
    publisher: {
      name: "Example",
      url: "https://example.com",
      logo: "https://example.com/logo.png",
    },
  },
});

if (result.kind === "ok") {
  console.log("Published:", result.url);
  console.log("Warnings:", result.warnings.length);
} else {
  console.error("Publish failed:", result);
}
```

That's it: no database, no auth setup, no framework. Once you've seen `Published: https://example.com/resources/hello-auto-geo` in your terminal, follow the production setup below to wire it into your real app.

---

## What's in this repo

| Path | What it is |
| ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| [`AGENT.md`](./AGENT.md) | The canonical setup spec. Hand this to your coding agent. |
| [`core/`](./core) | Framework-agnostic schema, publish logic, validation heuristics, JSON-LD derivation. Zero framework deps. |
| [`adapters/storage/`](./adapters/storage) | Storage adapters: KV (Vercel KV / Upstash Redis), Supabase, in-memory. Implement the `ContentStore` interface. |
| [`adapters/http/`](./adapters/http) | HTTP adapters: Next.js App Router, Hono. Wrap `core/publish` as a request handler. |
| [`components/react/`](./components/react) | Reference React renderer. Restyleable Tailwind defaults; pluggable `LinkComponent`. |
| [`mcp/`](./mcp) | MCP server that wraps the publish endpoint as a tool. Any MCP client (Claude, Cursor, your own agent) can publish. |
| [`examples/`](./examples) | Working example apps for Next.js, Hono on Bun, Express, SvelteKit, and Fastify. See [Examples](#examples). |
| [`docs/`](./docs) | The substantive product. The GEO SOP, page architecture spec, validation reference, storage adapter guide. |
| [`tests/`](./tests) | Vitest suite covering schema, validation, JSON-LD, publish pipeline, memory store, and inline parser. |

---

## Quickstart (production setup)

Once you've run the 60-second quickstart above, this is how you wire `auto-geo` into a real app with a persistent store.

The recommended path is to give this repo URL to your coding agent and let it read `AGENT.md`. If you want the short version:

```bash
# In an existing Next.js 15+ App Router project
pnpm add zod @vercel/kv

# Copy these files into your repo:
#   - core/                   → src/lib/auto-geo/core/
#   - adapters/storage/kv.ts  → src/lib/auto-geo/storage.ts
#   - adapters/http/next.ts   → src/app/api/resources/publish/route.ts
#   - components/react/       → src/components/auto-geo/

# Set GEO_PUBLISH_TOKEN in .env.local (openssl rand -hex 32)
# Wire a /resources/[slug]/page.tsx that reads from your store and renders ResourceArticle.
```

See [`examples/next-minimal/`](./examples/next-minimal) for a working reference and [`AGENT.md`](./AGENT.md) for the step-by-step setup.
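
The HTTP adapters handle auth for you. Purely to illustrate what a bearer-token gate on `GEO_PUBLISH_TOKEN` involves, here is a minimal standalone sketch. `extractBearerToken`, `tokensMatch`, and `isAuthorized` are hypothetical names, not exports of `auto-geo`, and the comparison loop is an assumption for illustration; a production check would more likely rely on a vetted primitive such as Node's `crypto.timingSafeEqual`.

```typescript
// Hypothetical sketch of a bearer-token gate; not the adapter's actual code.

// Pull the token out of an "Authorization: Bearer <token>" header value.
function extractBearerToken(header: string | null | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

// Compare every character instead of returning at the first mismatch.
// Assumption: illustrative only; prefer a vetted constant-time comparison.
function tokensMatch(provided: string, expected: string): boolean {
  let diff = provided.length ^ expected.length;
  for (let i = 0; i < expected.length; i++) {
    diff |= (provided.charCodeAt(i) || 0) ^ expected.charCodeAt(i);
  }
  return diff === 0;
}

function isAuthorized(
  header: string | null | undefined,
  expectedToken: string,
): boolean {
  const provided = extractBearerToken(header);
  // An unset (empty) expected token must never authorize a request.
  if (provided === null || expectedToken === "") return false;
  return tokensMatch(provided, expectedToken);
}
```

The empty-token guard matters in practice: if `GEO_PUBLISH_TOKEN` is missing from the environment, the gate should fail closed rather than accept an empty credential.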

### Or as a package

```bash
pnpm add auto-geo zod
```

```ts
// app/api/resources/publish/route.ts
import { createNextHandlers } from "auto-geo/next";
import { createKvStore } from "auto-geo/storage/kv";

const handlers = createNextHandlers({
  store: createKvStore(),
  site: {
    origin: "https://example.com",
    publisher: {
      name: "Acme",
      url: "https://example.com",
      logo: "https://example.com/logo.png",
    },
  },
});

export const POST = handlers.POST;
export const DELETE = handlers.DELETE;
```

```ts
// app/resources/[slug]/page.tsx
import { ResourceArticle } from "auto-geo/react";
import { deriveAllJsonLd, safeJsonLd } from "auto-geo/jsonld";
// ... resolve payload from your store, then render
```

---

## Examples

Working minimal example apps for the most common backends. Each one boots locally with `pnpm install` + the framework's dev script, ships with an in-memory store seeded with one valid payload, and is tested end to end against `auto-geo@0.1.1` from npm.

| Framework | Path | Storage | Notes |
| -------------------- | ------------------------------------------------ | ------- | -------------------------------------------- |
| Next.js (App Router) | [examples/next-minimal](./examples/next-minimal) | memory | Full render + publish |
| Hono (Bun) | [examples/hono-bun](./examples/hono-bun) | memory | Endpoint only; uses `auto-geo/hono` |
| Express | [examples/express](./examples/express) | memory | Endpoint only; `runPublish` inline |
| SvelteKit | [examples/sveltekit](./examples/sveltekit) | memory | Endpoint only; `runPublish` in `+server.ts` |
| Fastify | [examples/fastify](./examples/fastify) | memory | Endpoint only; `runPublish` inline |

`next-minimal` is the integration template: it has both the publish endpoint and the React render path. The other four are endpoint-only: they prove the publish contract on each backend with a sample `curl`, and rely on `next-minimal` (or your own renderer) for HTML output.

Each example's README has a copy-pasteable `curl` for the canonical payload and a verification step.

---

## The publishing flow

```
┌─────────────────┐   POST /api/resources/publish    ┌──────────────────┐
│  Your agent     │ ───────────────────────────────▶ │  Your Next/Hono  │
│  (Claude, your  │   Bearer GEO_PUBLISH_TOKEN       │  app             │
│  custom tool,   │   JSON body: ResourcePayload     │                  │
│  Shadow, etc.)  │                                  │  validate (Zod)  │
└─────────────────┘                                  │  audit (soft)    │
                                                     │  store.publish() │
                                                     │  revalidate      │
                                                     └────────┬─────────┘
                                                              │
                                             ┌────────────────▼─────────────┐
                                             │  /resources/[slug] page      │
                                             │  - ResourceArticle render    │
                                             │  - Article + FAQPage JSON-LD │
                                             │  - canonical, OG, Twitter    │
                                             └──────────────────────────────┘
```

The publishing endpoint is the contract. Anything that can issue an authenticated POST can publish: your own scheduled job, an MCP-aware AI client, a CLI, a webhook. `auto-geo` does not prescribe how content is generated or when it's published. That's your moat.

---

## Hard rejects vs. soft warnings

`auto-geo` distinguishes between _structural violations_ (rejected with HTTP 400) and _quality heuristics_ (returned as `warnings[]` on a 200 response). The split is deliberate: structure is a contract; quality is a continuum.

**Hard rejects** include: missing required blocks, TL;DR not 40–60 words, FAQ answer not 40–60 words, related-guides count outside 4–8, key-takeaways count outside 4–6, banned promotional superlatives without attribution, raw HTML in prose fields, invalid URLs.

**Soft warnings** include: section length outside 134–167 words, paragraph length outside 60–100 words, entity density below 15, statistics density below page-type target, image cadence below 1 per 500 words, H2 heading not in question format, self-link in related guides.

Your agent can iterate on soft warnings by re-posting an updated payload (republishing overwrites by slug). Or surface the warnings to the user and ship as-is.

See [`docs/validation.md`](./docs/validation.md) for the full reference.
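
As a rough illustration of the split, the sketch below sorts a few of the rules listed above into hard errors and soft warnings, then maps the result to an HTTP status. It is a standalone toy, not the library's validator: `Draft`, `Issue`, `checkDraft`, `statusFor`, and the issue codes are hypothetical names, and counting words by whitespace is an assumption.

```typescript
// Hypothetical sketch of the hard-reject / soft-warning split.
// None of these names come from auto-geo; the real rules live in
// core/schema.ts and core/validation.ts.

interface Draft {
  tldr: string;
  sectionHeadings: string[];
  relatedGuideCount: number;
}

interface Issue {
  severity: "hard" | "soft";
  code: string;
}

// Assumption: words are whitespace-separated tokens.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function checkDraft(draft: Draft): Issue[] {
  const issues: Issue[] = [];

  // Hard: TL;DR must be 40-60 words.
  const tldrWords = countWords(draft.tldr);
  if (tldrWords < 40 || tldrWords > 60) {
    issues.push({ severity: "hard", code: "tldr_word_count" });
  }

  // Hard: related-guides count must fall within 4-8.
  if (draft.relatedGuideCount < 4 || draft.relatedGuideCount > 8) {
    issues.push({ severity: "hard", code: "related_guides_count" });
  }

  // Soft: H2 headings should be phrased as questions.
  for (const heading of draft.sectionHeadings) {
    if (!heading.includes("?")) {
      issues.push({ severity: "soft", code: "heading_not_question" });
    }
  }

  return issues;
}

// Any hard issue maps to HTTP 400; otherwise the publish succeeds with 200
// and the soft issues travel back as warnings.
function statusFor(issues: Issue[]): 200 | 400 {
  return issues.some((issue) => issue.severity === "hard") ? 400 : 200;
}
```

An agent loop built on this shape would stop and repair on a 400, but could choose to ship or iterate on a 200 that carries warnings.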

---

## How it compares

| | Traditional CMS | Headless CMS | `auto-geo` |
| ----------------- | ---------------- | --------------------------- | ------------------------------- |
| Optimized for | Human reading | Multi-channel delivery | AI engine citation |
| Content shape | Freeform prose | Freeform with custom fields | Validated 7-block architecture |
| Validation | Editorial review | Schema-light | Strict Zod at publish boundary |
| Schema.org | Manual | Manual or plugin | Auto-derived from payload |
| Agent integration | Custom | Custom | First-class (MCP, REST) |
| Storage | Bundled | Bundled or hosted | Pluggable adapter (your choice) |
| Lock-in | High | Medium | None: copy the files |

`auto-geo` is _not_ a CMS. It is a typed publishing primitive that lives inside your app. If you need editorial workflows, drafts, scheduled publish, multi-user collaboration, or a media library, pair it with a CMS; `auto-geo` is downstream of the editorial process, not a replacement for it.

---

## API

The publish endpoint contract is described as an OpenAPI 3.1 document at [`openapi.yaml`](./openapi.yaml). It covers the two operations the reference adapters expose:

- `POST /api/resources/publish`: validate, persist, and revalidate a `ResourcePublishPayload`.
- `DELETE /api/resources/publish?slug={slug}`: remove a previously published resource.

The point isn't a hosted endpoint; there isn't one. The point is to give any HTTP client (Postman, Insomnia, ChatGPT Custom GPT Action, Claude with HTTP tools, your own typed client generator) a machine-readable description of the contract you're mounting in your own app via [`createNextHandlers`](./adapters/http/next.ts) or [`createHonoRouter`](./adapters/http/hono.ts).

See [`docs/architecture.md`](./docs/architecture.md) for the architectural spec the schema enforces and [`docs/validation.md`](./docs/validation.md) for the hard-reject / soft-warning split.
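
For a client that does not generate code from `openapi.yaml`, the two operations can be described by hand. The sketch below builds request descriptions for them; the paths, the `slug` query parameter, and the Bearer scheme come from this README, while `RequestSpec`, `buildPublishRequest`, `buildDeleteRequest`, the JSON `Content-Type` header, and the assumption that DELETE uses the same token are illustrative guesses rather than documented behavior.

```typescript
// Hypothetical helpers describing the two operations above.
// The helper names and the returned shape are illustrative only.

interface RequestSpec {
  method: "POST" | "DELETE";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Drop trailing slashes so "https://example.com/" and "https://example.com"
// produce the same URL.
function endpoint(origin: string): string {
  return `${origin.replace(/\/+$/, "")}/api/resources/publish`;
}

function buildPublishRequest(
  origin: string,
  token: string,
  payload: unknown,
): RequestSpec {
  return {
    method: "POST",
    url: endpoint(origin),
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json", // assumption: JSON request body
    },
    body: JSON.stringify(payload),
  };
}

function buildDeleteRequest(
  origin: string,
  token: string,
  slug: string,
): RequestSpec {
  return {
    method: "DELETE",
    // encodeURIComponent keeps unusual slugs from breaking the query string.
    url: `${endpoint(origin)}?slug=${encodeURIComponent(slug)}`,
    headers: { Authorization: `Bearer ${token}` },
  };
}
```

Either result can be handed to `fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body })` from any runtime with a global `fetch`.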

---

## LLM-friendly

`auto-geo` is, in essence, a tool whose output is content meant to be cited by LLMs. So this repo eats its own dogfood: it's discoverable by AI agents and crawlers in the same ways it teaches you to make your own pages discoverable.

- [`llms.txt`](./llms.txt): a curated index of the most important content in the repo, following the [llmstxt.org](https://llmstxt.org) convention. Hand the URL to any LLM and it gets a fast, structured map of the project.
- [`llms-full.txt`](./llms-full.txt): the expanded variant. Inlines the README plus every substantive doc (concept, SOP, architecture, validation, storage adapters) into a single file so an LLM can ingest the whole project in one fetch.
- [`openapi.yaml`](./openapi.yaml): machine-readable contract for the publish endpoint, usable by any HTTP-tool-aware AI client.
- **GitHub Pages site** at [shadowresearch.github.io/auto-geo](https://shadowresearch.github.io/auto-geo/): every page advertises `llms.txt`, `llms-full.txt`, and `openapi.yaml` via `<link>` tags in the `<head>`, emits JSON-LD `Article` markup, and carries a "Built with auto-geo" footer.

---

## Roadmap

Tracked in [GitHub issues](https://github.com/shadowresearch/auto-geo/issues). Headline items for v0.2+:

- Additional HTTP adapters: Express, Fastify, Elysia.
- Additional storage adapters: Postgres (direct), DynamoDB, Cloudflare D1.
- Canonical Vue and Svelte renderers at parity with React.
- A standalone CLI for publishing from scripts and CI pipelines.
- Audit / "page health" command that re-runs `auditResource` against all stored pages on a schedule.
- An optional `geo-meta` field for indexing recipes (semantic anchors, query-cluster routing).

Have a proposal? Open a [feature request](https://github.com/shadowresearch/auto-geo/issues/new/choose).

---

## Contributing

See [CONTRIBUTING.md](./CONTRIBUTING.md). Bug reports, schema improvements, new adapters, and documentation refinements are all welcome.

Quick links:

- [Code of Conduct](./CODE_OF_CONDUCT.md)
- [Security Policy](./SECURITY.md)
- [Changelog](./CHANGELOG.md)

---

## License

[MIT](./LICENSE). The reference React components include a default "Built with auto-geo by Shadow" credit in the disclosure block; pass `disclosureSuffix={null}` to suppress, or override with your own JSX.

---

## Related

- **The GEO SOP**: [`docs/sop.md`](./docs/sop.md). The full standard operating procedure for GEO resource pages. This is the substantive product.
- **Page architecture**: [`docs/architecture.md`](./docs/architecture.md). Spec for the seven mandatory blocks.
- **Shadow**: [shadow.inc](https://www.shadow.inc). The media research lab building the next generation of AI-powered media intelligence and communications technology, in partnership with the teams that put OpenAI, TikTok, Meta, Amazon, and Lovable on the map. Shadow runs `auto-geo` end-to-end on a schedule for media research, PR, and communications teams.

---

## About Shadow

Shadow is a media research lab building the next generation of AI-powered media intelligence and communications technology. Shadow is built in partnership with the teams that put OpenAI, TikTok, Meta, Amazon, and Lovable on the map. Learn more at [shadow.inc](https://www.shadow.inc).

---

# Concept: What is a GEO resource page?

A **GEO resource page** (Generative Engine Optimization resource page) is a public web page whose structure, density, and citation signals are engineered to be quoted verbatim by AI search engines (ChatGPT, Perplexity, Google AI Overviews, Claude, Gemini) when answering user queries.

It is a successor to the SEO landing page. Where SEO optimized for keyword matching against an inverted index, GEO optimizes for _retrieval and quotation_ by large language models that answer questions on the user's behalf.

## How a GEO resource page differs from a blog post

| | Blog post | GEO resource page |
| ----------------------- | ----------------------------- | --------------------------------------------------------- |
| **Composition** | Freeform prose | Named, validated blocks |
| **Opening** | Hook or lede | 40-60 word TL;DR answer capsule |
| **H2 headings** | Topic labels ("Our approach") | Questions a user would ask an AI ("How does X work?") |
| **Section opening** | Setup paragraph | 40-60 word self-contained answer |
| **FAQ block** | Optional | Mandatory, with strict word-count |
| **Entity density** | Incidental | Engineered, ~15+ named entities |
| **Schema.org** | Optional, often missing | Article + BreadcrumbList + FAQPage + Person, auto-derived |
| **Voice** | Personality, opinion | Sourced claims, neutral register |
| **Optimization target** | Click-through from search | Citation by an AI engine answering a query |

## Why the structure is rigid

AI engines do not read pages; they _extract chunks_.
The chunk most likely to be quoted is one that fully answers a question on its own: no anaphora, no "as we discussed above," no setup. The TL;DR + question-format H2 + answer-capsule pattern produces exactly that shape: every block is independently extractable.

Rigid validation at the publish boundary makes the contract enforceable. An agent that generates an under-length TL;DR or skips the FAQ block is told no, in machine-readable form, before the page is ever indexed. The schema is the SOP.

## What auto-geo is

`auto-geo` is the publishing primitive that enforces this contract. It provides:

1. **A Zod schema** (`core/schema.ts`): the contract. Validates payload shape, length constraints, banned superlatives, structural blocks.
2. **An authenticated POST endpoint** (`adapters/http/*`): the integration surface. Any agent or process that can issue an authenticated HTTP request can publish.
3. **A React renderer** (`components/react/`): turns a validated payload into a page.
4. **JSON-LD derivation** (`core/jsonld.ts`): Schema.org Article, BreadcrumbList, FAQPage, Person, ImageObject emitted automatically from the typed payload.
5. **An MCP server** (`mcp/`): exposes the publish endpoint as a tool for MCP-aware AI clients.

`auto-geo` does _not_ generate content. The content-generation pipeline (research, outline, drafting, citation gathering, structure assembly) is your problem. `auto-geo`'s job is to make sure that whatever a generation pipeline produces conforms to the GEO architecture before it goes public.

## When to use auto-geo

You have content you want AI engines to cite, and you want a publishing endpoint that enforces the GEO architecture so your generation pipeline can iterate against a typed contract rather than freeform prose review.

Examples:

- An agency runs a content pipeline that produces topic-level resource pages on behalf of clients.
  `auto-geo` is the publish target; the pipeline ships pages without a human reviewer needing to enforce structure.
- A SaaS company wants definitive pages for every query its customers ask AI assistants. `auto-geo` lets the marketing team plus an AI agent maintain a programmatic catalog at scale.
- A research org publishes findings and wants them surfaced by AI search. The schema's `citations[]` array drives `Article.citation` in Schema.org, which is a credibility signal AI engines weight heavily.

## When not to use auto-geo

- You want a personality-driven blog. Use a CMS.
- You want a documentation site. Use a docs framework.
- Your content is consumed by an authenticated app, not the public web. There is no AI-engine citation surface to optimize for.

---

# GEO Standard Operating Procedure

The standard operating procedure for generative engine optimization resource pages.

This document is the substantive spec behind `auto-geo`'s schema and validation heuristics. Every word-count constraint in `core/schema.ts` and every soft-warning threshold in `core/validation.ts` traces back to a section here. Read this document before deviating from any rule the code enforces.

The SOP is calibrated to the public retrieval and citation behavior of major AI search engines (ChatGPT, Perplexity, Google AI Overviews, Google AI Mode, Claude, Gemini, Copilot) as of early 2026. Specific numerical thresholds (e.g., citation lift percentages, density ratios) are heuristic; they reflect the working consensus from internal experimentation and external GEO research. Treat them as starting points, not laws.

---

## §1. Definition

A **GEO resource page** is a public web page whose architecture, density, and metadata are engineered to be retrieved and quoted by large language models acting as search interfaces.

GEO is the successor to SEO. Where SEO optimized for keyword matching against an inverted index, GEO optimizes for citation by AI engines that answer questions on the user's behalf.
The win condition is not a click; it is being quoted in the AI's answer, with the user's downstream click being a bonus.

---

## §2. The three optimization frameworks

A GEO resource page typically serves one or more of:

- **AEO** (Answer Engine Optimization): optimizes for direct quotation in answer-format responses (ChatGPT, Perplexity).
- **GEO** (Generative Engine Optimization): optimizes for inclusion in synthesized responses (Google AI Overviews, AI Mode).
- **LLMO** (Large Language Model Optimization): optimizes for inclusion in the underlying training and retrieval corpora of LLMs themselves.

A page can serve all three. `core/schema.ts`'s `geoMetadata.optimizationFramework` field records which.

---

## §3. Question-format H2 headings

H2 headings are written as the questions a user would type into an AI engine, not as topic labels. Ideal length: 6-10 words.

**Wrong**: "Our approach to onboarding"

**Right**: "How does an effective onboarding process work?"

Rationale: AI engines retrieve against query intent. A page whose H2s mirror prompt phrasing is matched directly. A page whose H2s are internal labels requires the engine to synthesize a connection. Synthesis costs the engine; direct matches are cheap.

`core/validation.ts` flags H2s that don't contain `?` or are outside 4-12 words.

---

## §4. Page type and word-count targets

Five page types, each with calibrated word-count and density targets:

| Page type | Min words | Max words | Min tables | Min lists | Stats/1k words |
| ------------ | --------- | --------- | ---------- | --------- | -------------- |
| `definitive` | 3000 | none | 2 | 5 | 10 |
| `resource` | 800 | 1500 | 0 | 3 | 3 |
| `comparison` | 1000 | 1500 | 1 | 3 | 3 |
| `category` | 2000 | 4000 | 1 | 5 | 5 |
| `listicle` | 1500 | 3000 | 1 | 5 | 3 |

- **Definitive**: The canonical answer to a query cluster. Long-form, dense, multi-entity. Target: be the page an AI engine quotes when asked "what is X."
- **Resource**: A focused, single-topic guide.
  Shorter, more focused. Target: be cited as one source among several in an answer.
- **Comparison**: "X vs Y" or "X alternatives." Tables are load-bearing; AI engines extract comparison tables directly. Title must match `vs.?|alternatives?|comparison` patterns (hard-validated).
- **Category**: An overview-style page that organizes a topic and links out to deeper pages. Drives entity-graph coverage.
- **Listicle**: "Top N X." High extractability; AI engines often quote individual list items verbatim.

`geoMetadata.pageType` selects the expectation set. `core/validation.ts` surfaces soft warnings when page totals are outside the range.

---

## §5. Structural and quality rules

### §5a. Title formulas

Five title patterns map to the five page types. Use the formula that matches the page's intent:

- **How-to**: "How to [verb phrase]"
- **Definitive**: "What is [X]? The [N]-year guide to [domain]"
- **Category**: "[X]: Definition, types, and use cases"
- **Comparison**: "[A] vs [B]: [Differentiator]"
- **Listicle**: "[N] [things] for [audience] in [year]"

### §5b. Section and paragraph length

- Each H2 section (heading + answer capsule + blocks): 134-167 words ideal.
- Each paragraph: 60-100 words ideal (40-120 soft warning range).

Rationale: chunks at this length match the optimal extraction window for current retrieval models. Shorter chunks lack context; longer chunks dilute the answer signal.

### §5c. Outbound links

Each H2 section should contain 1-2 outbound links to authoritative non-self domains. Zero links flags as under-cited; more than 4 flags as aggregator content.

Aggregator-style heavy linking carries a measured citation discount: AI engines prefer to cite primary sources, not link-dense intermediaries.

### §5d. Related Guides

4-8 entries. Self-links are flagged with a soft warning rather than rejected.

Related Guides are an entity-graph signal: they tell AI engines this page sits inside a coherent cluster of related content.
The cluster signal lifts citation probability for every page in the cluster.

### §5e. Key Takeaways

4-6 declarative bullets. Each 10-35 words. These are the page's claims, surfaced for extraction. AI engines often pull Key Takeaway bullets verbatim into answer responses.

### §5f. FAQ

3-10 Q&A items. Each answer 40-60 words. The FAQ block drives `FAQPage` Schema.org markup, which is extracted by retrieval pipelines independently of the page body. FAQ items often appear in AI engine responses as direct quotations of the answer text.

### §5g. Entity density

15+ named entities per page (companies, people, products, frameworks, places). The empirical link: entity-dense pages show ~4.8x higher citation probability than entity-sparse pages of similar length. Entity density is a credibility signal: pages that name specific actors are treated as more authoritative than pages that gesture at abstractions.

`core/validation.ts` runs a simple heuristic (joined capitalized words after stripping inline markers and sentence-starters) and flags pages below 15.

### §5h. Banned promotional superlatives

The following phrases carry a measured citation penalty (~26%) when used without attribution:

- "industry-leading", "best-in-class", "revolutionary", "game-changing", "cutting-edge", "world-class", "world's leading", "the leading", "the premier", "next-generation", "first-of-its-kind", "one of a kind"

`core/schema.ts` rejects these as **hard errors** unless:

- The phrase appears inside straight double-quotes (quoted passage).
- The phrase is followed within ~80 characters by an attribution marker: `(per ...)`, `(Source: ...)`, `[Named Source]`, or `by [Named Source]`.

Rationale: AI engines empirically downweight pages whose voice reads as marketing copy. The ban is a hard constraint, not a guideline.

### §5j. Mandatory page architecture

In order:

1. H1 + last-updated metadata
2. TL;DR (40-60 words)
3. Intro blocks
4. Sections (each: H2 + answer capsule + blocks)
5.
   Related Guides (4-8)
6. Key Takeaways (4-6)
7. FAQ (3-10)
8. About the Author (auto-injected)
9. Disclosure

The order is load-bearing. AI engines extract chunks; chunk position matters. TL;DR at the top, FAQ near the bottom: this is the pattern retrieval pipelines are trained against.

---

## §6. The answer capsule

Every H2 section opens with a 40-60 word **answer capsule**: a fully self-contained answer to the section's heading, written before any supporting paragraph or block.

Rules:

- Must answer the section heading without requiring context from elsewhere on the page.
- No anaphora ("the above," "this," "we discussed").
- One main claim per capsule.
- 40-60 words, hard-enforced by `core/schema.ts`.

The answer capsule is the highest-value chunk on the page. It is the most likely thing to be quoted verbatim by an AI engine. Every other block on the page exists to support the capsule.

---

## §7. Multimodal density

### §7a. Lists, tables, callouts

Lists and tables are extractable as structured data. The minimum-block-count targets in §4 force a baseline of structure per page.

### §7b. Images

Image cadence: ~1 image per 500 words. Each image's `alt` text must:

- Include the entity name being depicted.
- Include context (what the image shows in relation to the page topic).
- Be at least 20 characters.

Generic alt text ("chart", "image") is rejected.

Multimodal pages (text + images) show empirical citation lift of 156-317% over text-only pages of similar length. AI engines that ground their answers in retrieved pages preferentially cite pages with strong visual content.

---

## §8. Information gain

`geoMetadata.informationGainStatement` is a required field stating what the page contains that is **not in current AI engine responses** for its target queries.

Rationale: AI engines have an implicit novelty filter. Pages that restate what the engine can already synthesize from existing training data contribute nothing: they are not cited because they are not informative.
Pages with proprietary data, original analysis, or first-party research provide information gain and are cited disproportionately.

Before publishing, articulate what makes this page non-redundant. If the answer is "nothing," the page should not exist.

---

## §13. Schema.org markup

JSON-LD is auto-emitted by `core/jsonld.ts` from the typed payload. The agent never writes JSON-LD by hand. Five schema types are derived:

- **Article**: page metadata, author, publisher, dates, citations
- **BreadcrumbList**: Home → Resources → page title
- **FAQPage**: drives FAQ extraction
- **Person**: author identity, drives author-graph signals
- **ImageObject**: per image block

JSON-LD is rendered inside `