<!--
  auto-geo — full LLM-ingestible bundle.

  This file concatenates README.md and the substantive docs (concept, SOP,
  architecture, validation, storage adapters) with H1 boundaries between them
  so a single fetch ingests the whole project. Source of truth is the linked
  files; if this drifts, the originals win.
-->

# README

# auto-geo

[![CI](https://github.com/shadowresearch/auto-geo/actions/workflows/ci.yml/badge.svg)](https://github.com/shadowresearch/auto-geo/actions/workflows/ci.yml)
[![npm version](https://img.shields.io/npm/v/auto-geo.svg)](https://www.npmjs.com/package/auto-geo)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE)
[![Built by Shadow](https://img.shields.io/badge/built%20by-Shadow-000000.svg)](https://www.shadow.inc)
[![Bundle size](https://img.shields.io/bundlephobia/minzip/auto-geo)](https://bundlephobia.com/package/auto-geo)
[![Downloads](https://img.shields.io/npm/dm/auto-geo)](https://www.npmjs.com/package/auto-geo)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.4-blue)](https://www.typescriptlang.org/)
[![Node](https://img.shields.io/node/v/auto-geo)](https://nodejs.org/)
[![Docs](https://img.shields.io/badge/docs-shadowresearch.github.io%2Fauto--geo-blue)](https://shadowresearch.github.io/auto-geo/)
[![llms.txt](https://img.shields.io/badge/llms.txt-available-9cf)](./llms.txt)

**A publishing engine for GEO resource pages — the pages large language models cite.**

`auto-geo` is a content publishing primitive built for the AI search era. Where traditional CMSs optimize for human readers, `auto-geo` optimizes for _citation by AI engines_ (ChatGPT, Perplexity, Google AI Overviews, Claude, Gemini). It enforces the structural and quality patterns that empirical research links to higher citation rates — TL;DR capsules, question-format H2 headings, answer-first paragraphs, dense entity references, FAQ schema, and a rigid page architecture — at the publish boundary.

Hand the URL of this repo to your coding agent. It will set up a publishing endpoint in your existing app that any other agent can call as a tool. You can then schedule, automate, or wire it into your own workflow.

> Built by **[Shadow](https://www.shadow.inc)** — a media research lab building the next generation of AI-powered media intelligence and communications technology, in partnership with the teams that put OpenAI, TikTok, Meta, Amazon, and Lovable on the map. Shadow uses `auto-geo` to publish to [shadow.inc/resources](https://www.shadow.inc/resources).

---

## Contents

- [What is a GEO resource page?](#what-is-a-geo-resource-page)
- [Why auto-geo](#why-auto-geo)
- [Install](#install)
- [60-second quickstart](#60-second-quickstart)
- [What's in this repo](#whats-in-this-repo)
- [Quickstart (production setup)](#quickstart-production-setup)
- [Examples](#examples)
- [The publishing flow](#the-publishing-flow)
- [Hard rejects vs. soft warnings](#hard-rejects-vs-soft-warnings)
- [How it compares](#how-it-compares)
- [API](#api)
- [LLM-friendly](#llm-friendly)
- [Roadmap](#roadmap)
- [Contributing](#contributing)
- [License](#license)

---

## What is a GEO resource page?

A **GEO resource page** (Generative Engine Optimization resource page) is a public web page whose structure, density, and citation signals are engineered to be quoted verbatim by AI search engines when answering a user's question. It is a successor to the SEO landing page.

A GEO resource page differs from a blog post in five ways:

1. **Architecture, not prose.** The page is composed of named, validated blocks (TL;DR, intro, sections, related guides, key takeaways, FAQ, disclosure) — not a freeform document. AI engines extract structured chunks; rigid structure improves extraction.
2. **Answer-first.** Every H2 section opens with a 40–60 word "answer capsule" that fully answers the section's question before any supporting paragraph. AI engines preferentially quote self-contained answers.
3. **Question-format headings.** H2 headings are written as the questions a user would ask an AI engine. The page is indexed against query intent, not topic.
4. **Entity-dense.** Named entities (companies, people, products, frameworks) appear at high density. Empirical studies link entity density to ~4.8x higher citation probability.
5. **Schema-derived.** Article, BreadcrumbList, FAQPage, Person, and ImageObject JSON-LD are auto-emitted from a typed payload. The agent never writes JSON-LD by hand.

`auto-geo` enforces all five at the API boundary. Publishing is a contract; malformed pages are rejected with a structured error.

See [`docs/concept.md`](./docs/concept.md) for a deeper walkthrough.

---

## Why auto-geo

### Why not just write Markdown blog posts?

Markdown is freeform — there's no contract that a post has a TL;DR, that every H2 opens with a 40–60 word answer capsule, or that the FAQ exists. AI engines preferentially quote self-contained, structurally regular chunks, so a freeform Markdown corpus leaves citation probability on the table. `auto-geo` enforces the structure at the publish boundary so every page that ships is shaped for extraction, not just human reading.

### Why not use a CMS like Sanity or Contentful?

Traditional and headless CMSs optimize for editorial workflow — drafts, scheduling, multi-user review, freeform field shapes. `auto-geo` is a typed publishing primitive that lives inside your own app and rejects malformed payloads with a structured error your agent can iterate on. It's downstream of editorial, not a replacement for it: pair it with a CMS if you need one, or generate payloads from an agent. Either way, the contract enforces GEO structure before the page goes live.

### Why isn't this just SEO?

SEO optimizes for ranking in a link-based search results page; GEO optimizes for being _quoted_ inside an AI-generated answer. The signals diverge — citation favors answer-first paragraphs, question-format H2s, entity density, and self-contained chunks; ranking historically rewarded backlinks and on-page keyword optimization. `auto-geo` enforces the citation signals empirically linked to higher inclusion rates in ChatGPT, Perplexity, Google AI Overviews, Claude, and Gemini answers — patterns most SEO tooling doesn't measure or enforce.

---

## Install

```bash
npm install auto-geo zod
```

```bash
pnpm add auto-geo zod
```

```bash
yarn add auto-geo zod
```

> **Inside a pnpm workspace?** Use `pnpm add`. `npm install` sometimes errors with `Cannot read properties of null (reading 'matches')` when it traverses ancestor `pnpm-lock.yaml` files.

Node `>=18.17` required. `zod` is a peer dependency. Framework / storage peers (`next`, `hono`, `@vercel/kv`, `@supabase/supabase-js`, `react`) are optional — install only what your adapter uses.

---

## 60-second quickstart

Paste this into a fresh `quickstart.ts`, run with `tsx quickstart.ts` (or compile and run), and you'll see a published URL.

```ts
// quickstart.ts
import { runPublish } from "auto-geo";
import { createMemoryStore } from "auto-geo/storage/memory";

const store = createMemoryStore();

// A minimal payload satisfying every schema constraint.
const payload = {
  slug: "hello-auto-geo",
  title: "What is auto-geo and how do I publish my first resource page?",
  metaDescription:
    "A minimal first publish showing how auto-geo validates and stores a GEO resource page end to end.",
  category: "Tutorials",
  excerpt:
    "A minimal first publish showing how auto-geo validates and stores a GEO resource page end to end.",
  author: {
    name: "Jane Doe",
    jobTitle: "Head of Content",
    bio: "Jane writes about generative engine optimization and the architecture of pages that AI search engines cite.",
  },
  publishedAt: "2026-06-01",
  geoMetadata: {
    targetQueries: ["how does auto-geo publishing work"],
    pageType: "resource" as const,
    primaryFunction:
      "Show a developer how to publish their first resource page.",
    optimizationFramework: ["GEO" as const],
    targetPlatforms: ["chatgpt" as const],
    informationGainStatement:
      "First-party demonstration of the auto-geo publish pipeline using the in-memory store, end to end.",
    refreshCadence: "quarterly" as const,
  },
  tldr: {
    text: "Auto-geo's publish pipeline validates an incoming payload, persists it to a content store, and returns a URL plus an array of soft warnings; this minimal example wires the pipeline against an in-memory store so you can see a successful publish in your terminal in under a minute without any external services involved at all.",
  },
  intro: {
    blocks: [
      {
        type: "paragraph" as const,
        text: "This minimal payload satisfies every required field of the auto-geo schema; in a real integration you would generate this payload from an agent or your editorial pipeline and POST it to your publishing endpoint, but here we call the underlying runPublish function directly against an in-memory store to demonstrate the contract without any external network calls or storage setup.",
      },
    ],
  },
  sections: [
    {
      heading: "How does the publish call work?",
      answerCapsule:
        "runPublish takes an unknown body plus options containing your store and site config; it parses the body against the Zod schema, calls store.publish on success, runs soft validation, and returns a discriminated-union result with a URL or an issues array so callers can act programmatically without parsing prose error messages.",
      blocks: [],
    },
  ],
  relatedGuides: {
    items: [
      {
        title: "The GEO SOP",
        url: "https://github.com/shadowresearch/auto-geo/blob/main/docs/sop.md",
      },
      {
        title: "Page architecture",
        url: "https://github.com/shadowresearch/auto-geo/blob/main/docs/architecture.md",
      },
      {
        title: "Validation reference",
        url: "https://github.com/shadowresearch/auto-geo/blob/main/docs/validation.md",
      },
      {
        title: "Storage adapters",
        url: "https://github.com/shadowresearch/auto-geo/blob/main/docs/storage.md",
      },
    ],
  },
  keyTakeaways: {
    items: [
      "Auto-geo enforces a strict seven-block architecture at the publish boundary so every published page is structurally citation-ready by design.",
      "The runPublish function returns a discriminated-union result so callers branch on result.kind without parsing prose error strings ever.",
      "Storage adapters are pluggable; the in-memory store ships for tests and demos, KV and Supabase ship for production deployments today.",
      "Soft warnings come back on a successful publish so an agent can iterate on quality heuristics without being blocked by every non-critical recommendation.",
    ],
  },
  faq: {
    items: [
      {
        question: "Do I need a database to try auto-geo?",
        answer:
          "No, the in-memory store ships with the package and persists publishes for the lifetime of the process; it is intended for tests, demos, and the quickstart you are running right now, and you swap in a KV or Supabase adapter when you go to production without changing any of your call sites.",
      },
      {
        question: "What does the publish result look like?",
        answer:
          "On success runPublish returns an object shaped kind ok with a slug, a url constructed from your site origin plus the base path plus the slug, and an array of soft warnings from the audit step; on failure it returns one of validation_failed, slug_reserved, or store_failed, each carrying enough context for the caller to surface or retry.",
      },
      {
        question: "How do I wire this into a real HTTP endpoint?",
        answer:
          "Import createNextHandlers from auto-geo/next or the Hono adapter from auto-geo/hono, pass your store and site config, and export the returned POST and DELETE handlers from your route file; the adapters wrap runPublish with auth and revalidation so your endpoint becomes a one-file integration instead of a hand-rolled pipeline.",
      },
    ],
  },
  disclosure: {
    text: "This quickstart is a runnable demonstration of the auto-geo publish pipeline using the in-memory store.",
  },
};

const result = await runPublish(payload, {
  store,
  site: {
    origin: "https://example.com",
    publisher: {
      name: "Example",
      url: "https://example.com",
      logo: "https://example.com/logo.png",
    },
  },
});

if (result.kind === "ok") {
  console.log("Published:", result.url);
  console.log("Warnings:", result.warnings.length);
} else {
  console.error("Publish failed:", result);
}
```

That's it — no database, no auth setup, no framework. Once you've seen `Published: https://example.com/resources/hello-auto-geo` in your terminal, follow the production setup below to wire it into your real app.

---

## What's in this repo

| Path                                      | What it is                                                                                                         |
| ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| [`AGENT.md`](./AGENT.md)                  | The canonical setup spec. Hand this to your coding agent.                                                          |
| [`core/`](./core)                         | Framework-agnostic schema, publish logic, validation heuristics, JSON-LD derivation. Zero framework deps.          |
| [`adapters/storage/`](./adapters/storage) | Storage adapters — KV (Vercel KV / Upstash Redis), Supabase, in-memory. Implement the `ContentStore` interface.    |
| [`adapters/http/`](./adapters/http)       | HTTP adapters — Next.js App Router, Hono. Wrap `core/publish` as a request handler.                                |
| [`components/react/`](./components/react) | Reference React renderer. Restyleable Tailwind defaults; pluggable `LinkComponent`.                                |
| [`mcp/`](./mcp)                           | MCP server that wraps the publish endpoint as a tool. Any MCP client (Claude, Cursor, your own agent) can publish. |
| [`examples/`](./examples)                 | Working example apps for Next.js, Hono on Bun, Express, SvelteKit, and Fastify. See [Examples](#examples).         |
| [`docs/`](./docs)                         | The substantive product. The GEO SOP, page architecture spec, validation reference, storage adapter guide.         |
| [`tests/`](./tests)                       | Vitest suite covering schema, validation, JSON-LD, publish pipeline, memory store, and inline parser.              |

---

## Quickstart (production setup)

Once you've run the 60-second quickstart above, this is how you wire `auto-geo` into a real app with a persistent store. The recommended path is to give this repo URL to your coding agent and let it read `AGENT.md`. If you want the short version:

```bash
# In an existing Next.js 15+ App Router project
pnpm add zod @vercel/kv
# Copy these files into your repo:
#   - core/                    → src/lib/auto-geo/core/
#   - adapters/storage/kv.ts   → src/lib/auto-geo/storage.ts
#   - adapters/http/next.ts    → src/app/api/resources/publish/route.ts
#   - components/react/        → src/components/auto-geo/
# Set GEO_PUBLISH_TOKEN in .env.local (openssl rand -hex 32)
# Wire a /resources/[slug]/page.tsx that reads from your store and renders ResourceArticle.
```

See [`examples/next-minimal/`](./examples/next-minimal) for a working reference and [`AGENT.md`](./AGENT.md) for the step-by-step setup.

### Or as a package

```bash
pnpm add auto-geo zod
```

```ts
// app/api/resources/publish/route.ts
import { createNextHandlers } from "auto-geo/next";
import { createKvStore } from "auto-geo/storage/kv";

const handlers = createNextHandlers({
  store: createKvStore(),
  site: {
    origin: "https://example.com",
    publisher: {
      name: "Acme",
      url: "https://example.com",
      logo: "https://example.com/logo.png",
    },
  },
});

export const POST = handlers.POST;
export const DELETE = handlers.DELETE;
```

```ts
// app/resources/[slug]/page.tsx
import { ResourceArticle } from "auto-geo/react";
import { deriveAllJsonLd, safeJsonLd } from "auto-geo/jsonld";
// ... resolve payload from your store, then render
```

---

## Examples

Working minimal example apps for the most common backends. Each one
boots locally with `pnpm install` + the framework's dev script, ships
with an in-memory store seeded with one valid payload, and is tested
end to end against `auto-geo@0.1.1` from npm.

| Framework            | Path                                             | Storage | Notes                                        |
| -------------------- | ------------------------------------------------ | ------- | -------------------------------------------- |
| Next.js (App Router) | [examples/next-minimal](./examples/next-minimal) | memory  | Full render + publish                        |
| Hono (Bun)           | [examples/hono-bun](./examples/hono-bun)         | memory  | Endpoint only — uses `auto-geo/hono`         |
| Express              | [examples/express](./examples/express)           | memory  | Endpoint only — `runPublish` inline          |
| SvelteKit            | [examples/sveltekit](./examples/sveltekit)       | memory  | Endpoint only — `runPublish` in `+server.ts` |
| Fastify              | [examples/fastify](./examples/fastify)           | memory  | Endpoint only — `runPublish` inline          |

`next-minimal` is the integration template — it has both the publish
endpoint and the React render path. The other four are endpoint-only:
they prove the publish contract on each backend with a sample `curl`,
and rely on `next-minimal` (or your own renderer) for HTML output. Each
example's README has a copy-pasteable `curl` for the canonical payload
and a verification step.

---

## The publishing flow

```
┌─────────────────┐    POST /api/resources/publish    ┌──────────────────┐
│  Your agent     │ ───────────────────────────────▶  │  Your Next/Hono  │
│  (Claude, your  │    Bearer GEO_PUBLISH_TOKEN       │  app             │
│  custom tool,   │    JSON body: ResourcePayload     │                  │
│  Shadow, etc.)  │                                   │  validate (Zod)  │
└─────────────────┘                                   │  audit (soft)    │
                                                      │  store.publish() │
                                                      │  revalidate      │
                                                      └────────┬─────────┘
                                                               │
                                              ┌────────────────▼─────────────┐
                                              │  /resources/[slug] page      │
                                              │  - ResourceArticle render    │
                                              │  - Article + FAQPage JSON-LD │
                                              │  - canonical, OG, Twitter    │
                                              └──────────────────────────────┘
```

The publishing endpoint is the contract. Anything that can issue an authenticated POST can publish — your own scheduled job, an MCP-aware AI client, a CLI, a webhook. `auto-geo` does not prescribe how content is generated or when it's published. That's your moat.

---

## Hard rejects vs. soft warnings

`auto-geo` distinguishes between _structural violations_ (rejected with HTTP 400) and _quality heuristics_ (returned as `warnings[]` on a 200 response). The split is deliberate: structure is a contract; quality is a continuum.

**Hard rejects** include: missing required blocks, TL;DR not 40–60 words, FAQ answer not 40–60 words, related-guides count outside 4–8, key-takeaways count outside 4–6, banned promotional superlatives without attribution, raw HTML in prose fields, invalid URLs.

**Soft warnings** include: section length outside 134–167 words, paragraph length outside 60–100 words, entity density below 15, statistics density below page-type target, image cadence below 1 per 500 words, H2 heading not in question format, self-link in related guides.

Your agent can iterate on soft warnings by re-posting an updated payload (republishing overwrites by slug). Or surface the warnings to the user and ship as-is. See [`docs/validation.md`](./docs/validation.md) for the full reference.

---

## How it compares

|                   | Traditional CMS  | Headless CMS                | `auto-geo`                      |
| ----------------- | ---------------- | --------------------------- | ------------------------------- |
| Optimized for     | Human reading    | Multi-channel delivery      | AI engine citation              |
| Content shape     | Freeform prose   | Freeform with custom fields | Validated 7-block architecture  |
| Validation        | Editorial review | Schema-light                | Strict Zod at publish boundary  |
| Schema.org        | Manual           | Manual or plugin            | Auto-derived from payload       |
| Agent integration | Custom           | Custom                      | First-class (MCP, REST)         |
| Storage           | Bundled          | Bundled or hosted           | Pluggable adapter (your choice) |
| Lock-in           | High             | Medium                      | None — copy the files           |

`auto-geo` is _not_ a CMS. It is a typed publishing primitive that lives inside your app. If you need editorial workflows, drafts, scheduled publish, multi-user collaboration, or a media library, pair it with a CMS — `auto-geo` is downstream of the editorial process, not a replacement for it.

---

## API

The publish endpoint contract is described as an OpenAPI 3.1 document at [`openapi.yaml`](./openapi.yaml). It covers the two operations the reference adapters expose:

- `POST /api/resources/publish` — validate, persist, and revalidate a `ResourcePublishPayload`.
- `DELETE /api/resources/publish?slug={slug}` — remove a previously published resource.

The point isn't a hosted endpoint — there isn't one. The point is to give any HTTP client (Postman, Insomnia, ChatGPT Custom GPT Action, Claude with HTTP tools, your own typed client generator) a machine-readable description of the contract you're mounting in your own app via [`createNextHandlers`](./adapters/http/next.ts) or [`createHonoRouter`](./adapters/http/hono.ts). See [`docs/architecture.md`](./docs/architecture.md) for the architectural spec the schema enforces and [`docs/validation.md`](./docs/validation.md) for the hard-reject / soft-warning split.

---

## LLM-friendly

`auto-geo` is, in essence, a tool whose output is content meant to be cited by LLMs. So this repo eats its own dogfood — it's discoverable by AI agents and crawlers in the same ways it teaches you to make your own pages discoverable.

- [`llms.txt`](./llms.txt) — a curated index of the most important content in the repo, following the [llmstxt.org](https://llmstxt.org) convention. Hand the URL to any LLM and it gets a fast, structured map of the project.
- [`llms-full.txt`](./llms-full.txt) — the expanded variant. Inlines the README plus every substantive doc (concept, SOP, architecture, validation, storage adapters) into a single file so an LLM can ingest the whole project in one fetch.
- [`openapi.yaml`](./openapi.yaml) — machine-readable contract for the publish endpoint, usable by any HTTP-tool-aware AI client.
- **GitHub Pages site** at [shadowresearch.github.io/auto-geo](https://shadowresearch.github.io/auto-geo/) — every page advertises `llms.txt`, `llms-full.txt`, and `openapi.yaml` via `<link rel="alternate" type="text/plain">` tags in the `<head>`, emits JSON-LD `Article` markup, and carries a "Built with auto-geo" footer.

---

## Roadmap

Tracked in [GitHub issues](https://github.com/shadowresearch/auto-geo/issues). Headline items for v0.2+:

- Additional HTTP adapters: Express, Fastify, Elysia.
- Additional storage adapters: Postgres (direct), DynamoDB, Cloudflare D1.
- A canonical Vue/Svelte renderer parity with React.
- A standalone CLI for publishing from scripts and CI pipelines.
- Audit / "page health" command that re-runs `auditResource` against all stored pages on a schedule.
- An optional `geo-meta` field for indexing recipes (semantic anchors, query-cluster routing).

Have a proposal? Open a [feature request](https://github.com/shadowresearch/auto-geo/issues/new/choose).

---

## Contributing

See [CONTRIBUTING.md](./CONTRIBUTING.md). Bug reports, schema improvements, new adapters, and documentation refinements all welcome.

Quick links:

- [Code of Conduct](./CODE_OF_CONDUCT.md)
- [Security Policy](./SECURITY.md)
- [Changelog](./CHANGELOG.md)

---

## License

[MIT](./LICENSE). The reference React components include a default "Built with auto-geo by Shadow" credit in the disclosure block; pass `disclosureSuffix={null}` to suppress, or override with your own JSX.

---

## Related

- **The GEO SOP** — [`docs/sop.md`](./docs/sop.md). The full standard operating procedure for GEO resource pages. This is the substantive product.
- **Page architecture** — [`docs/architecture.md`](./docs/architecture.md). Spec for the seven mandatory blocks.
- **Shadow** — [shadow.inc](https://www.shadow.inc). The media research lab building the next generation of AI-powered media intelligence and communications technology, in partnership with the teams that put OpenAI, TikTok, Meta, Amazon, and Lovable on the map. Shadow runs `auto-geo` end-to-end on a schedule for media research, PR, and communications teams.

---

## About Shadow

Shadow is a media research lab building the next generation of AI-powered media intelligence and communications technology. Shadow is built in partnership with the teams that put OpenAI, TikTok, Meta, Amazon, and Lovable on the map.

Learn more at [shadow.inc](https://www.shadow.inc).


---

# Concept — What is a GEO resource page?


A **GEO resource page** (Generative Engine Optimization resource page) is a public web page whose structure, density, and citation signals are engineered to be quoted verbatim by AI search engines (ChatGPT, Perplexity, Google AI Overviews, Claude, Gemini) when answering user queries.

It is a successor to the SEO landing page. Where SEO optimized for keyword matching against an inverted index, GEO optimizes for _retrieval and quotation_ by large language models that answer questions on the user's behalf.

## How a GEO resource page differs from a blog post

|                         | Blog post                     | GEO resource page                                         |
| ----------------------- | ----------------------------- | --------------------------------------------------------- |
| **Composition**         | Freeform prose                | Named, validated blocks                                   |
| **Opening**             | Hook or lede                  | 40-60 word TL;DR answer capsule                           |
| **H2 headings**         | Topic labels ("Our approach") | Questions a user would ask an AI ("How does X work?")     |
| **Section opening**     | Setup paragraph               | 40-60 word self-contained answer                          |
| **FAQ block**           | Optional                      | Mandatory, with strict word-count                         |
| **Entity density**      | Incidental                    | Engineered, ~15+ named entities                           |
| **Schema.org**          | Optional, often missing       | Article + BreadcrumbList + FAQPage + Person, auto-derived |
| **Voice**               | Personality, opinion          | Sourced claims, neutral register                          |
| **Optimization target** | Click-through from search     | Citation by an AI engine answering a query                |

## Why the structure is rigid

AI engines do not read pages; they _extract chunks_. The chunk most likely to be quoted is one that fully answers a question on its own — no anaphora, no "as we discussed above," no setup. The TL;DR + question-format H2 + answer-capsule pattern produces exactly that shape: every block is independently extractable.

Rigid validation at the publish boundary makes the contract enforceable. An agent that generates an under-length TL;DR or skips the FAQ block is told no, in machine-readable form, before the page is ever indexed. The schema is the SOP.

## What auto-geo is

`auto-geo` is the publishing primitive that enforces this contract. It provides:

1. **A Zod schema** (`core/schema.ts`) — the contract. Validates payload shape, length constraints, banned superlatives, structural blocks.
2. **An authenticated POST endpoint** (`adapters/http/*`) — the integration surface. Any agent or process that can issue an authenticated HTTP request can publish.
3. **A React renderer** (`components/react/`) — turns a validated payload into a page.
4. **JSON-LD derivation** (`core/jsonld.ts`) — Schema.org Article, BreadcrumbList, FAQPage, Person, ImageObject emitted automatically from the typed payload.
5. **An MCP server** (`mcp/`) — exposes the publish endpoint as a tool for MCP-aware AI clients.

`auto-geo` does _not_ generate content. The content-generation pipeline — research, outline, drafting, citation gathering, structure assembly — is your problem. `auto-geo`'s job is to make sure that whatever a generation pipeline produces conforms to the GEO architecture before it goes public.

## When to use auto-geo

You have content you want AI engines to cite, and you want a publishing endpoint that enforces the GEO architecture so your generation pipeline can iterate against a typed contract rather than freeform prose review.

Examples:

- An agency runs a content pipeline that produces topic-level resource pages on behalf of clients. `auto-geo` is the publish target; the pipeline ships pages without a human reviewer needing to enforce structure.
- A SaaS company wants definitive pages for every query its customers ask AI assistants. `auto-geo` lets the marketing team plus an AI agent maintain a programmatic catalog at scale.
- A research org publishes findings and wants them surfaced by AI search. The schema's `citations[]` array drives `Article.citation` in Schema.org, which is a credibility signal AI engines weight heavily.

## When not to use auto-geo

- You want a personality-driven blog. Use a CMS.
- You want a documentation site. Use a docs framework.
- Your content is consumed by an authenticated app, not the public web. There is no AI-engine citation surface to optimize for.


---

# GEO Standard Operating Procedure


The standard operating procedure for generative engine optimization resource pages.

This document is the substantive spec behind `auto-geo`'s schema and validation heuristics. Every word-count constraint in `core/schema.ts` and every soft-warning threshold in `core/validation.ts` traces back to a section here. Read this document before deviating from any rule the code enforces.

The SOP is calibrated to the public retrieval and citation behavior of major AI search engines (ChatGPT, Perplexity, Google AI Overviews, Google AI Mode, Claude, Gemini, Copilot) as of early 2026. Specific numerical thresholds (e.g., citation lift percentages, density ratios) are heuristic — they reflect the working consensus from internal experimentation and external GEO research. Treat them as starting points, not laws.

---

## §1. Definition

A **GEO resource page** is a public web page whose architecture, density, and metadata are engineered to be retrieved and quoted by large language models acting as search interfaces.

GEO is the successor to SEO. Where SEO optimized for keyword matching against an inverted index, GEO optimizes for citation by AI engines that answer questions on the user's behalf. The win condition is not a click — it is being quoted in the AI's answer, with the user's downstream click being a bonus.

---

## §2. The three optimization frameworks

A GEO resource page typically serves one or more of:

- **AEO** (Answer Engine Optimization) — Optimizes for direct quotation in answer-format responses (ChatGPT, Perplexity).
- **GEO** (Generative Engine Optimization) — Optimizes for inclusion in synthesized responses (Google AI Overviews, AI Mode).
- **LLMO** (Large Language Model Optimization) — Optimizes for inclusion in the underlying training and retrieval corpora of LLMs themselves.

A page can serve all three. `core/schema.ts`'s `geoMetadata.optimizationFramework` field records which.

---

## §3. Question-format H2 headings

H2 headings are written as the questions a user would type into an AI engine, not as topic labels. Ideal length: 6-10 words.

**Wrong**: "Our approach to onboarding"
**Right**: "How does an effective onboarding process work?"

Rationale: AI engines retrieve against query intent. A page whose H2s mirror prompt phrasing is matched directly. A page whose H2s are internal labels requires the engine to synthesize a connection — synthesis costs the engine; direct matches are cheap.

`core/validation.ts` flags H2s that don't contain `?` or are outside 4-12 words.

---

## §4. Page type and word-count targets

Five page types, each with calibrated word-count and density targets:

| Page type    | Min words | Max words | Min tables | Min lists | Stats/1k words |
| ------------ | --------- | --------- | ---------- | --------- | -------------- |
| `definitive` | 3000      | —         | 2          | 5         | 10             |
| `resource`   | 800       | 1500      | 0          | 3         | 3              |
| `comparison` | 1000      | 1500      | 1          | 3         | 3              |
| `category`   | 2000      | 4000      | 1          | 5         | 5              |
| `listicle`   | 1500      | 3000      | 1          | 5         | 3              |

- **Definitive**: The canonical answer to a query cluster. Long-form, dense, multi-entity. Target: be the page an AI engine quotes when asked "what is X."
- **Resource**: A focused, single-topic guide. Shorter, more focused. Target: be cited as one source among several in an answer.
- **Comparison**: "X vs Y" or "X alternatives." Tables are load-bearing; AI engines extract comparison tables directly. Title must match `vs.?|alternatives?|comparison` patterns (hard-validated).
- **Category**: An overview-style page that organizes a topic and links out to deeper pages. Drives entity-graph coverage.
- **Listicle**: "Top N X." High extractability; AI engines often quote individual list items verbatim.

`geoMetadata.pageType` selects the expectation set. `core/validation.ts` surfaces soft warnings when page totals are outside the range.

---

## §5. Structural and quality rules

### §5a. Title formulas

Five title patterns map to the five page types. Use the formula that matches the page's intent:

- **How-to**: "How to [verb phrase]"
- **Definitive**: "What is [X]? The [N]-year guide to [domain]"
- **Category**: "[X]: Definition, types, and use cases"
- **Comparison**: "[A] vs [B]: [Differentiator]"
- **Listicle**: "[N] [things] for [audience] in [year]"

### §5b. Section and paragraph length

- Each H2 section (heading + answer capsule + blocks): 134-167 words ideal.
- Each paragraph: 60-100 words ideal (40-120 soft warning range).

Rationale: chunks at this length match the optimal extraction window for current retrieval models. Shorter chunks lack context; longer chunks dilute the answer signal.

### §5c. Outbound links

Each H2 section should contain 1-2 outbound links to authoritative non-self domains. Zero links flags as under-cited; more than 4 flags as aggregator content. Aggregator-style heavy linking carries a measured citation discount — AI engines prefer to cite primary sources, not link-dense intermediaries.

### §5d. Related Guides

4-8 entries. Self-links are explicitly forbidden (soft warning).

Related Guides are an entity-graph signal: they tell AI engines this page sits inside a coherent cluster of related content. The cluster signal lifts citation probability for every page in the cluster.

### §5e. Key Takeaways

4-6 declarative bullets. Each 10-35 words.

These are the page's claims, surfaced for extraction. AI engines often pull Key Takeaway bullets verbatim into answer responses.

### §5f. FAQ

3-10 Q&A items. Each answer 40-60 words.

The FAQ block drives `FAQPage` Schema.org markup, which is extracted by retrieval pipelines independently of the page body. FAQ items often appear in AI engine responses as direct quotations of the answer text.

### §5g. Entity density

15+ named entities per page (companies, people, products, frameworks, places).

The empirical link: entity-dense pages show ~4.8x higher citation probability than entity-sparse pages of similar length. Entity density is a credibility signal — pages that name specific actors are treated as more authoritative than pages that gesture at abstractions.

`core/validation.ts` runs a simple heuristic (joined capitalized words after stripping inline markers and sentence-starters) and flags pages below 15.

### §5h. Banned promotional superlatives

The following phrases carry a measured citation penalty (~26%) when used without attribution:

- "industry-leading", "best-in-class", "revolutionary", "game-changing", "cutting-edge", "world-class", "world's leading", "the leading", "the premier", "next-generation", "first-of-its-kind", "one of a kind"

`core/schema.ts` rejects these as **hard errors** unless:

- The phrase appears inside straight double-quotes (quoted passage).
- The phrase is followed within ~80 characters by an attribution marker: `(per ...)`, `(Source: ...)`, `[Named Source]`, or `by [Named Source]`.

Rationale: AI engines empirically downweight pages whose voice reads as marketing copy. The ban is a hard constraint, not a guideline.

### §5j. Mandatory page architecture

In order:

1. H1 + last-updated metadata
2. TL;DR (40-60 words)
3. Intro blocks
4. Sections (each: H2 + answer capsule + blocks)
5. Related Guides (4-8)
6. Key Takeaways (4-6)
7. FAQ (3-10)
8. About the Author (auto-injected)
9. Disclosure

The order is load-bearing. AI engines extract chunks; chunk position matters. TL;DR at top, FAQ near the bottom — this is the pattern retrieval pipelines are trained against.

---

## §6. The answer capsule

Every H2 section opens with a 40-60 word **answer capsule** — a fully self-contained answer to the section's heading, written before any supporting paragraph or block.

Rules:

- Must answer the section heading without requiring context from elsewhere on the page.
- No anaphora ("the above," "this," "we discussed").
- One main claim per capsule.
- 40-60 words, hard-enforced by `core/schema.ts`.

The answer capsule is the highest-value chunk on the page. It is the most likely thing to be quoted verbatim by an AI engine. Every other block on the page exists to support the capsule.

---

## §7. Multimodal density

### §7a. Lists, tables, callouts

Lists and tables are extractable as structured data. The minimum-block-count targets in §4 force a baseline of structure per page.

### §7b. Images

Image cadence: ~1 image per 500 words. Each image's `alt` text must:

- Include the entity name being depicted.
- Include context (what the image shows in relation to the page topic).
- Be at least 20 characters. Generic alt ("chart", "image") is rejected.

Multimodal pages (text + images) show empirical citation lift of 156-317% over text-only pages of similar length. AI engines that ground their answers in retrieved pages preferentially cite pages with strong visual content.

---

## §8. Information gain

`geoMetadata.informationGainStatement` is a required field stating what the page contains that is **not in current AI engine responses** for its target queries.

Rationale: AI engines have an implicit novelty filter. Pages that restate what the engine can already synthesize from existing training data contribute nothing — they are not cited because they are not informative. Pages with proprietary data, original analysis, or first-party research provide information gain and are cited disproportionately.

Before publishing, articulate what makes this page non-redundant. If the answer is "nothing," the page should not exist.

---

## §13. Schema.org markup

Auto-emitted by `core/jsonld.ts` from the typed payload. The agent never writes JSON-LD by hand. Five schema types are derived:

- **Article** — page metadata, author, publisher, dates, citations
- **BreadcrumbList** — Home → Resources → page title
- **FAQPage** — drives FAQ extraction
- **Person** — author identity, drives author-graph signals
- **ImageObject** — per image block

JSON-LD is rendered inside `<script type="application/ld+json">` tags, with text safely escaped via `safeJsonLd` to prevent breakout.

---

## §14. GEO metadata

`geoMetadata` is internal metadata stored on the published payload but not rendered to the page. Fields:

- `targetQueries` (1-20) — the query cluster the page is optimized for.
- `pageType` — drives expectations per §4.
- `primaryFunction` — one-sentence statement of what GEO function this page closes.
- `optimizationFramework` — AEO / GEO / LLMO.
- `targetPlatforms` — chatgpt / perplexity / google_aio / google_ai_mode / claude / gemini / copilot.
- `informationGainStatement` — per §8.
- `proprietaryAssetOpportunity` — optional. Any calculator, template, or asset that bridges citation → click-through.
- `refreshCadence` — monthly or quarterly.

The metadata is what makes a page auditable after publication. A team can run a regular audit asking "is this page still surfacing for its target queries, and if not, why."

---

## §15. Refresh cadence

GEO resource pages decay. The query landscape shifts as AI engines update their training and retrieval pipelines; the answers AI engines synthesize without your page also evolve over time, eroding your information gain.

Two cadences:

- **Monthly** — for pages serving fast-moving queries (current events, product comparisons, AI tooling, regulation).
- **Quarterly** — for pages serving stable queries (definitions, evergreen how-to, foundational explainers).

`geoMetadata.refreshCadence` records the page's cadence. A separate audit pipeline (out of scope for `auto-geo`'s core) reads the metadata and surfaces pages due for refresh.

---

## Maintenance

This SOP is versioned with `auto-geo`. When the empirical thresholds in `core/validation.ts` change, update the corresponding section here. The intent is that this document and the code stay in lockstep — reading the SOP should always tell you what the code is doing and why.

External GEO research is rapidly evolving. Treat specific numerical claims (e.g., "4.8x citation lift") as snapshots, not eternal truths.


---

# Page architecture


> The HTTP contract that enforces this architecture is described in machine-readable form at [`openapi.yaml`](../openapi.yaml) — drop it into Postman, Insomnia, or a ChatGPT Custom GPT Action to call the publish endpoint against your own deployment.

A GEO resource page is composed of seven blocks, in fixed order:

1. **TL;DR** — 40-60 word answer capsule
2. **Intro** — ≥1 content blocks
3. **Sections** — ≥1 H2 with a 40-60 word answer capsule + content blocks
4. **Related Guides** — 4-8 entries
5. **Key Takeaways** — 4-6 declarative bullets
6. **FAQ** — 3-10 Q&A items with 40-60 word answers
7. **Disclosure** — sourcing note, timestamp, publisher line

The host renderer also auto-injects:

- **Metadata line** — "Last updated · By [author]" above the TL;DR.
- **About the Author** — derived from `author.bio`, between FAQ and disclosure.

Every block is validated at the publish boundary. Omissions return 400 with a Zod issue list.

## The seven blocks

### 1. TL;DR (`tldr`)

```json
{ "tldr": { "text": "40-60 word answer to the page's primary query." } }
```

Constraints: 40-60 words, no banned superlatives without attribution, no raw HTML.

Renders as a callout above the page body. Acts as the page-level answer capsule — the chunk an AI engine is most likely to quote when summarizing the page in a citation response.

### 2. Intro (`intro`)

```json
{ "intro": { "blocks": [ContentBlock, ...] } }
```

1-10 content blocks. No explicit answer capsule, but should orient the reader to what the page covers without restating the TL;DR.

### 3. Sections (`sections`)

```json
{
  "sections": [
    {
      "heading": "6-10 word H2 in question format",
      "answerCapsule": "40-60 word self-contained answer (renders as the first paragraph)",
      "blocks": [ContentBlock, ...]
    }
  ]
}
```

1-40 sections. Each section:

- **Heading**: 6-10 words ideal (4-12 hard range), question format. Soft warning if not a question. Question-format H2s match the way users phrase prompts to AI engines.
- **Answer capsule**: 40-60 words, self-contained. Must answer the section's heading without requiring context from elsewhere on the page. This is the load-bearing chunk.
- **Blocks**: 0-40 content blocks expanding on the answer.

Each section ideally totals 134-167 words (heading + capsule + blocks). Outside that range surfaces a soft warning.

### 4. Related Guides (`relatedGuides`)

```json
{
  "relatedGuides": {
    "items": [{ "title": "Other resource title", "url": "https://..." }]
  }
}
```

4-8 entries. URLs must be absolute. Self-links are rejected with a soft warning. Related guides are an entity-graph signal — they tell AI engines this page is part of a coherent cluster.

### 5. Key Takeaways (`keyTakeaways`)

```json
{
  "keyTakeaways": { "items": ["10-35 word declarative bullet", ...] }
}
```

4-6 items. Each 10-35 words. Declarative, not interrogative — these are the page's claims, surfaced for extraction.

### 6. FAQ (`faq`)

```json
{
  "faq": {
    "heading": "Optional override (default: 'Frequently Asked Questions')",
    "items": [
      { "question": "5-300 char question", "answer": "40-60 word answer" }
    ]
  }
}
```

3-10 Q&A items. The FAQ block drives `FAQPage` JSON-LD, which is a Schema.org type AI engines extract directly and often surface verbatim in answer responses.

### 7. Disclosure (`disclosure`)

```json
{ "disclosure": { "text": "Sourcing note, last-updated timestamp, etc." } }
```

20-1000 character disclosure. Use it for: how the page was assembled, who reviewed it, when it was last refreshed, whether AI assisted in drafting. Transparency is increasingly a citation signal.

## Content blocks

Both intro and sections accept a discriminated union of content blocks:

### `paragraph`

```json
{
  "type": "paragraph",
  "text": "60-100 word paragraph (ideal range; warning outside 40-120)."
}
```

One claim per paragraph. Inline syntax: `**bold**`, `*italic*`, `[label](url)`.

### `h3`

```json
{ "type": "h3", "text": "Subheading" }
```

### `list`

```json
{ "type": "list", "style": "bullet", "items": ["item 1", "item 2", ...] }
```

`style`: `"bullet"` or `"number"`. 2-30 items.

### `table`

```json
{
  "type": "table",
  "caption": "Optional caption",
  "headers": ["Col A", "Col B"],
  "rows": [
    ["a1", "b1"],
    ["a2", "b2"]
  ]
}
```

Every row must have one cell per header (validated). 2-10 columns; 1-50 rows.

### `quote`

```json
{
  "type": "quote",
  "text": "Quoted text",
  "attribution": "Required attribution"
}
```

### `image`

```json
{
  "type": "image",
  "src": "https://...",
  "alt": "Descriptive alt text with entity + context (≥20 chars).",
  "caption": "Optional caption"
}
```

The alt text minimum (20 chars) enforces SOP §7b's requirement that alt text includes entity name and context. Generic alt ("chart", "image of X") is rejected.

### `callout`

```json
{ "type": "callout", "variant": "info", "text": "..." }
```

`variant`: `"info"` or `"stat"`. Stat callouts are visually distinct (left border, accent fill) and intended for surface quantitative findings the agent wants AI engines to pick up.

## Inline syntax

Inside any prose field (`text`, `items[]`, `answer`, etc.):

- `**bold**` → `<strong>`
- `*italic*` → `<em>`
- `[label](url)` → link. URLs starting with `/` render as internal links (via the host's link component if provided); `https://` renders as external (`target="_blank"`).

No raw HTML. No code spans. No headings inside paragraphs. The schema rejects HTML; everything else is literal.

## Why this architecture

The architecture is calibrated to three empirical observations about how AI engines retrieve and quote web content:

1. **Self-contained chunks are quoted; embedded clauses are not.** Each answer capsule is fully self-contained — readable without context from the rest of the page. AI engines extract chunk-level, not document-level. Anaphora and "as discussed above" break extractability.
2. **Question-format headings match prompt format.** Users phrase prompts to AI engines as questions. Pages whose H2s mirror those question forms are retrieved as direct matches against query intent.
3. **Structured data (Schema.org) is over-weighted.** `Article`, `FAQPage`, and `BreadcrumbList` JSON-LD are extracted directly by retrieval pipelines, often before the page's main content is parsed. Schema markup is a citation amplifier.

The seven-block architecture is the smallest set of constraints that satisfies all three. Smaller (e.g., dropping Related Guides) weakens the entity-graph signal; larger (e.g., adding required citations) hampers iteration speed for content teams.

See `docs/sop.md` for the full standard operating procedure and the empirical basis for each constraint.


---

# Validation reference


`auto-geo` enforces two layers of validation: **hard rejects** at the schema boundary (HTTP 400) and **soft warnings** in the success response (HTTP 200 with `warnings[]`).

The split is deliberate: structure is a contract; quality is a continuum.

## Hard rejects (HTTP 400)

Hard rejects come from `core/schema.ts` (Zod). Every constraint listed below returns 400 with a `{ error, issues: [{ path, message }] }` body.

### Identity

| Field             | Constraint                                              |
| ----------------- | ------------------------------------------------------- |
| `slug`            | Lowercase letters, numbers, single hyphens. 1-80 chars. |
| `title`           | 1-160 chars.                                            |
| `metaTitle`       | Optional. 1-160 chars.                                  |
| `metaDescription` | 50-180 chars.                                           |
| `category`        | 1-80 chars.                                             |
| `excerpt`         | 50-400 chars.                                           |

### Authoring

| Field                | Constraint                                            |
| -------------------- | ----------------------------------------------------- |
| `author.name`        | 1-120 chars.                                          |
| `author.jobTitle`    | 1-120 chars.                                          |
| `author.bio`         | 20-600 chars.                                         |
| `author.linkedinUrl` | Valid URL (optional).                                 |
| `publishedAt`        | ISO `yyyy-mm-dd`.                                     |
| `modifiedAt`         | ISO `yyyy-mm-dd` (optional). Must be ≥ `publishedAt`. |

### Body architecture

| Field                      | Constraint                                                |
| -------------------------- | --------------------------------------------------------- |
| `tldr.text`                | 40-60 words. No banned superlatives.                      |
| `intro.blocks`             | 1-10 content blocks.                                      |
| `sections`                 | 1-40 sections.                                            |
| `sections[].heading`       | 1-200 chars. No raw HTML. No banned superlatives.         |
| `sections[].answerCapsule` | 40-60 words. No banned superlatives.                      |
| `sections[].blocks`        | 0-40 content blocks.                                      |
| `relatedGuides.items`      | 4-8 entries. Each `{ title, url }`. URL must be absolute. |
| `keyTakeaways.items`       | 4-6 bullets. Each 10-35 words. No banned superlatives.    |
| `faq.items`                | 3-10 items.                                               |
| `faq.items[].question`     | 5-300 chars.                                              |
| `faq.items[].answer`       | 40-60 words. No banned superlatives.                      |
| `disclosure.text`          | 20-1000 chars. No raw HTML. No banned superlatives.       |

### Content blocks

| Block type  | Constraints                                                           |
| ----------- | --------------------------------------------------------------------- |
| `paragraph` | `text` 1-2000 chars. Inline syntax only.                              |
| `h3`        | `text` 1-200 chars.                                                   |
| `list`      | `style` ∈ `{bullet, number}`. 2-30 items. Each item 1-1000 chars.     |
| `table`     | 2-10 columns. 1-50 rows. Every row has exactly one cell per header.   |
| `quote`     | `text` 1-1000 chars. `attribution` 1-200 chars.                       |
| `image`     | `src` valid URL. `alt` 20-300 chars. `caption` ≤300 chars (optional). |
| `callout`   | `variant` ∈ `{info, stat}`. `text` 1-600 chars.                       |

### GEO metadata (SOP §14)

| Field                                     | Constraint                                                                                        |
| ----------------------------------------- | ------------------------------------------------------------------------------------------------- |
| `geoMetadata.targetQueries`               | 1-20 strings, each 3-200 chars.                                                                   |
| `geoMetadata.pageType`                    | `definitive` ∣ `resource` ∣ `comparison` ∣ `category` ∣ `listicle`.                               |
| `geoMetadata.primaryFunction`             | 5-300 chars.                                                                                      |
| `geoMetadata.optimizationFramework`       | Non-empty subset of `{AEO, GEO, LLMO}`.                                                           |
| `geoMetadata.targetPlatforms`             | Non-empty subset of `{chatgpt, perplexity, google_aio, google_ai_mode, claude, gemini, copilot}`. |
| `geoMetadata.informationGainStatement`    | 20-600 chars.                                                                                     |
| `geoMetadata.proprietaryAssetOpportunity` | ≤600 chars (optional).                                                                            |
| `geoMetadata.refreshCadence`              | `monthly` ∣ `quarterly`.                                                                          |

### Cross-field

- If `pageType === "comparison"`, `title` should match `/\bvs\.?\b|\balternatives?\b|\bcomparison\b/i`. Rejected if it doesn't.
- `modifiedAt` must be ≥ `publishedAt`.

### Banned promotional superlatives

The following phrases are rejected when they appear _outside_ a quoted passage AND _without_ an attribution marker (`per`, `by`, `according to`, `source:`, or a `[Named Source]` / `(Named Source)` within ~80 chars):

`industry-leading`, `industry leading`, `best-in-class`, `best in class`, `revolutionary`, `game-changing`, `game changing`, `cutting-edge`, `cutting edge`, `world-class`, `world class`, `world's leading`, `worlds leading`, `the leading`, `the premier`, `next-generation`, `next generation`, `first-of-its-kind`, `one of a kind`.

See SOP §5h for the rationale (measured citation penalty).

### Reserved slugs (HTTP 409)

Slugs in `reservedSlugs` (configured per deployment) collide with statically routed pages in the host app. Publishing them returns 409.

## Soft warnings (HTTP 200, `warnings[]`)

Soft warnings come from `core/validation.ts` (`auditResource`). They surface SOP heuristics that are too noisy to enforce as hard errors but worth flagging for an iterating agent.

### Per-section

| Path                        | Trigger                                                    | SOP |
| --------------------------- | ---------------------------------------------------------- | --- |
| `sections[N].totalWords`    | Section (heading + capsule + blocks) outside 134-167 words | §5b |
| `sections[N].heading`       | Heading outside 4-12 words                                 | §3  |
| `sections[N].heading`       | Heading does not contain `?`                               | §3  |
| `sections[N].blocks[M]`     | Paragraph or callout outside 40-120 words                  | §5b |
| `sections[N].outboundLinks` | Zero outbound external links                               | §5c |
| `sections[N].outboundLinks` | More than 4 outbound external links                        | §5c |

### Page-level

| Path                     | Trigger                                      | SOP |
| ------------------------ | -------------------------------------------- | --- |
| `totalWords`             | Page below `pageType.minWords`               | §4  |
| `totalWords`             | Page above `pageType.maxWords`               | §4  |
| `statisticsDensity`      | Statistics density below page-type threshold | §7  |
| `entityDensity`          | Fewer than 15 named entities                 | §5g |
| `listCount`              | Fewer list blocks than page-type minimum     | §7  |
| `tableCount`             | Fewer tables than page-type minimum          | §7  |
| `imageCadence`           | Images below `floor(total / 500)`            | §7b |
| `relatedGuides.items[N]` | Self-link in Related Guides                  | §5d |

### Acting on warnings

The publish endpoint always returns 200 with warnings — they never block publication. The calling agent can:

1. **Ship as-is.** Surface warnings to the user; treat the page as published. Appropriate when warnings are low-severity or the agent has reason to override (e.g., a short listicle that legitimately has fewer lists than the threshold).
2. **Iterate.** Modify the payload to address the warnings and re-POST. The publish endpoint is idempotent on slug — republishing overwrites.

The MCP server forwards the full warnings array back to the calling AI client, which can drive the iteration loop autonomously.


---

# Storage adapters


`auto-geo` storage is pluggable behind the `ContentStore` interface in `core/store.ts`:

```ts
interface ContentStore {
  publish(payload: ResourcePublishPayload): Promise<void>;
  get(slug: string): Promise<StoredResource | null>;
  list(opts?: ListOptions): Promise<StoredResource[]>;
  delete(slug: string): Promise<void>;
}
```

Three reference adapters ship in `adapters/storage/`. All three implement the same interface; the publish endpoint and render path are adapter-agnostic.

## KV (Vercel KV / Upstash Redis)

```ts
import { createKvStore } from "auto-geo/storage/kv";
const store = createKvStore({ namespace: "myapp" }); // namespace optional
```

**Required env**:

```
KV_REST_API_URL=...
KV_REST_API_TOKEN=...
```

**Schema**: two keys per resource.

- `[namespace:]resource:post:{slug}` — JSON of `StoredResource`.
- `[namespace:]resource:slugs` — sorted set scored by `publishedAt` epoch ms, for chronological listing.

**When to use**: Vercel deployments, zero-infrastructure preference, sub-millisecond reads. The `@vercel/kv` SDK works with both Vercel KV and Upstash Redis.

**Trade-offs**: no SQL, no dashboard. Backups via Upstash UI or `redis-cli BGSAVE`.

## Supabase

```ts
import { createSupabaseStore } from "auto-geo/storage/supabase";
const store = createSupabaseStore(); // reads SUPABASE_URL + SUPABASE_SERVICE_ROLE_KEY
```

**Required env**:

```
SUPABASE_URL=...
SUPABASE_SERVICE_ROLE_KEY=...   # service-role key only — never expose this client-side
```

**Schema**: one table. Run `adapters/storage/supabase.sql` in the Supabase SQL editor before first use:

```sql
create table auto_geo_resources (
  slug          text primary key,
  payload       jsonb not null,
  published_at  timestamptz not null,
  stored_at     timestamptz not null default now()
);
create index on auto_geo_resources (published_at desc);
```

**When to use**: you want a dashboard, SQL access for ad-hoc queries, or RLS policies layered on top.

**Trade-offs**: slightly higher read latency than KV. Service-role key must be kept server-side.

## In-memory

```ts
import { createMemoryStore } from "auto-geo/storage/memory";
const store = createMemoryStore({
  seed: [
    /* optional initial payloads */
  ],
});
```

Process-local. Loses everything on restart. Use only for tests, demos, and local development. The adapter accepts a `seed` array of `ResourcePublishPayload` so example apps can render a populated `/resources` index on first run.

## Writing your own adapter

Implement the `ContentStore` interface. Three contracts to honor:

1. **`publish` must be idempotent on slug.** Re-publishing the same slug overwrites — no insert-fails, no version branching.
2. **`publish` and `delete` should throw on backend errors.** The publish endpoint surfaces backend failures as HTTP 502; silent no-ops here look like ghost successes.
3. **`get` and `list` should degrade gracefully.** Return `null` / `[]` on transient errors, log server-side. The render path is read-heavy; throwing here would crash unrelated pages.

A `ContentStore` that wraps a third store (e.g., a cache layer over Supabase) is fine — just keep the interface contracts.

## Migrating between adapters

There is no built-in migration. Use the storage layer's native export/import (Supabase: `pg_dump`; KV: `SCAN` + a script) and adapt the shape if you change adapter. The stored shape is `ResourcePublishPayload + { storedAt }` — `payload`-shaped for Supabase, JSON-stringified for KV — so most migrations are a one-pass transform.