LLM helpers

Generate llms.txt and extract page content as Markdown for AI consumption.

The inventra/llms sub-export builds llms.txt content for your site — a plain-text index that LLMs read to understand what your site is about. Each page also gets a deep-content URL serving its full Markdown.

There are two building blocks:

  • createLlmsService — composes the site-wide llms.txt index plus per-page content resolvers. Covers your blog posts automatically; picks up static pages via autoPages.
  • extractPageAsMarkdown — standalone helper that fetches any URL, extracts the main content with Mozilla Readability, and returns it as Markdown.

Quick setup

Two route handlers — one for the index, one for per-page deep content:

// app/llms.txt/route.ts
import { createLlmsService } from 'inventra/llms';
import { withISR } from 'inventra/next';
import { inventra } from '@/packages/plugins/config';

const llms = createLlmsService(
  {
    client: inventra,
    siteName: 'Acme',
    siteDescription: 'We build great things.',
    baseUrl: 'https://example.com',
    pages: {},
    autoPages: [
      { path: '/' },
      { path: '/about' },
      { path: '/pricing' },
      { path: '/contact' }
    ]
  },
  withISR(3600)
);

export const revalidate = 3600;

export async function GET() {
  const body = await llms.generateIndex();
  return new Response(body, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' }
  });
}

// app/llms.txt/[...slug]/route.ts
import { createLlmsService } from 'inventra/llms';
import { withISR } from 'inventra/next';
import { inventra } from '@/packages/plugins/config';

const llms = createLlmsService(
  {
    /* same config as above */
  },
  withISR(3600)
);

export const revalidate = 3600;

export async function GET(
  _req: Request,
  { params }: { params: Promise<{ slug: string[] }> }
) {
  const { slug } = await params;
  const body = await llms.generatePageContent(slug);
  if (body === null) return new Response('Not found', { status: 404 });
  return new Response(body, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' }
  });
}

That's it: /llms.txt returns the index, and /llms.txt/<slug> returns the full Markdown for any configured page or blog post.
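Both handlers above build the service from the same config. One way to avoid duplicating it is to keep the site-specific fields in a shared module and spread them into each createLlmsService call. This is a minimal sketch; the file path lib/llms-config.ts and the name llmsSiteConfig are hypothetical, not part of the SDK.

```typescript
// lib/llms-config.ts (hypothetical path, not part of the SDK)
// Site-specific fields shared by both route handlers. `client` is left out
// on purpose so each handler keeps importing it next to createLlmsService.
export const llmsSiteConfig = {
  siteName: 'Acme',
  siteDescription: 'We build great things.',
  baseUrl: 'https://example.com',
  pages: {},
  autoPages: [
    { path: '/' },
    { path: '/about' },
    { path: '/pricing' },
    { path: '/contact' }
  ]
};

// In each route handler (shown as a comment because it needs the SDK):
//   const llms = createLlmsService(
//     { client: inventra, ...llmsSiteConfig },
//     withISR(3600)
//   );
```

With this layout, adding a page to autoPages updates the index and the deep-content route together.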

createLlmsService(config, fetchConfig?)

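Returns { generateIndex, generatePageContent }. generateIndex() resolves to the llms.txt index text. generatePageContent(slug) takes the catch-all slug segments and resolves to that page's Markdown, or null when nothing matches. The optional fetchConfig argument controls caching of the underlying fetches; the examples above pass withISR(3600).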

Config

| Field | Type | Description |
| --- | --- | --- |
| client | Inventra | The SDK client (used to list blog posts). |
| siteName | string | Shown as the # Title of the llms.txt index. |
| siteDescription | string | One-sentence blurb under the title. |
| baseUrl | string | Absolute base URL, no trailing slash. |
| pages | Record<string, LlmsPageProvider> | Hand-authored page content. Pass {} if you only want autoPages. |
| autoPages | LlmsAutoPageConfig[] | Pages whose content is auto-extracted from rendered HTML. |
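Since baseUrl must be absolute and must not end with a slash, it can help to normalize the value before passing it in, for example when it comes from an environment variable. The helper below is a hypothetical sketch, not an SDK function; whether the SDK tolerates a trailing slash on its own is not stated here.

```typescript
// Hypothetical helper (not part of the SDK): makes a raw value satisfy the
// "absolute base URL, no trailing slash" requirement, or throws.
function normalizeBaseUrl(raw: string): string {
  // Drop surrounding whitespace and any run of trailing slashes.
  const trimmed = raw.trim().replace(/\/+$/, '');
  // Require an http(s) scheme followed by a non-empty host.
  if (!/^https?:\/\/[^/]+/.test(trimmed)) {
    throw new Error(`baseUrl must be an absolute http(s) URL, got "${raw}"`);
  }
  return trimmed;
}
```

Passing the result as baseUrl keeps generated links from containing a double slash.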

autoPages — zero-config page coverage

Each entry points at a path on your site. The SDK fetches that URL, strips nav/header/footer/aside chrome, runs Mozilla Readability to isolate the main content, and converts to Markdown with GFM support.

autoPages: [
  { path: '/' },
  { path: '/about' },
  { path: '/pricing', exclude: ['.cookie-banner'] }, // extra selectors to strip
  { path: '/careers', label: 'Join us', description: 'Open roles and culture.' }
]

Entry fields

| Field | Type | Description |
| --- | --- | --- |
| path | string | Path on the site, starting with /. Use / for the home page. |
| label | string | Optional. Defaults to the extracted <title>. |
| description | string | Optional. Defaults to the meta description. |
| exclude | string[] | Optional CSS selectors to remove before extraction, on top of the default chrome stripping. |
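The entry fields above can be checked before the service is created, so a bad path fails early rather than at fetch time. This is a sketch under assumptions: AutoPageEntry is an illustrative shape inferred from the table (the SDK's exported LlmsAutoPageConfig type is the source of truth), and checkAutoPages is a hypothetical helper.

```typescript
// Illustrative shape inferred from the entry-fields table; not the SDK's type.
interface AutoPageEntry {
  path: string;
  label?: string;
  description?: string;
  exclude?: string[];
}

// Hypothetical helper: returns readable problems instead of throwing,
// so every issue can be reported in one pass.
function checkAutoPages(entries: AutoPageEntry[]): string[] {
  const problems: string[] = [];
  for (const entry of entries) {
    // Paths are documented as starting with "/".
    if (!entry.path.startsWith('/')) {
      problems.push(`path "${entry.path}" must start with "/"`);
    }
    // An empty string is not a valid CSS selector.
    for (const selector of entry.exclude ?? []) {
      if (selector.trim() === '') {
        problems.push(`path "${entry.path}" has an empty exclude selector`);
      }
    }
  }
  return problems;
}
```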

pages — hand-authored content

Use this when you want full control of the content, or when the page is gated or JS-heavy and HTML extraction won't capture what matters.

pages: {
  about: {
    label: 'About',
    description: 'Who we are and what we do.',
    getContent: () => ({
      title: 'About Acme',
      description: 'Our story.',
      sections: [
        { heading: 'Mission', content: '...' },
        { heading: 'Team', content: '...' }
      ]
    })
  }
}
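This page does not show how a getContent result is serialized. As a rough mental model only, the sketch below assumes the title becomes a level-1 heading, the description a paragraph, and each section a level-2 heading followed by its content. The type and function names are hypothetical, and the SDK's real output may differ.

```typescript
// Hypothetical types mirroring the object returned by getContent above.
interface PageSection {
  heading: string;
  content: string;
}

interface PageContent {
  title: string;
  description: string;
  sections: PageSection[];
}

// Assumed rendering: "# title", description, then "## heading" per section,
// with a blank line between blocks.
function renderPageContent(page: PageContent): string {
  const blocks: string[] = [`# ${page.title}`, page.description];
  for (const section of page.sections) {
    blocks.push(`## ${section.heading}`, section.content);
  }
  return blocks.join('\n\n') + '\n';
}
```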

extractPageAsMarkdown(url, options?)

Standalone helper. Useful outside createLlmsService — e.g. embeddings, RAG indexing, custom llms.txt layouts.

import { extractPageAsMarkdown } from 'inventra/llms';

const { title, description, markdown, url } = await extractPageAsMarkdown(
  'https://example.com/about',
  {
    exclude: ['.newsletter-banner'],
    fetchOptions: { next: { revalidate: 3600 } }
  }
);

Options

| Field | Type | Description |
| --- | --- | --- |
| exclude | string[] | CSS selectors to remove before extraction. |
| userAgent | string | Override the fetch user agent. |
| fetchOptions | RequestInit | Merged into the fetch init. Pass Next.js cache directives here. |
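For the embeddings and RAG use cases mentioned above, the returned markdown usually needs splitting into smaller pieces. The sketch below splits on Markdown headings. chunkByHeading and MarkdownChunk are hypothetical names, and the approach is deliberately naive: a line starting with # inside a code block would also be treated as a heading.

```typescript
// Hypothetical chunk shape: one entry per heading and the text under it.
interface MarkdownChunk {
  heading: string; // empty for text that appears before the first heading
  body: string;
}

function chunkByHeading(markdown: string): MarkdownChunk[] {
  const chunks: MarkdownChunk[] = [];
  let heading = '';
  let buffer: string[] = [];

  // Emit the current chunk unless it is completely empty.
  const flush = (): void => {
    const body = buffer.join('\n').trim();
    if (heading !== '' || body !== '') chunks.push({ heading, body });
    buffer = [];
  };

  for (const line of markdown.split('\n')) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the previous chunk before starting a new one
      heading = (match[1] ?? '').trim();
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}

// Usage with the helper documented above (comment only, needs the SDK):
//   const { markdown } = await extractPageAsMarkdown('https://example.com/about');
//   const chunks = chunkByHeading(markdown);
```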

Generated output

The index follows the standard llms.txt layout:

# Acme

> We build great things.

## Pages

- [Home](https://example.com/): ... — [Full content](https://example.com/llms.txt/home)
- [About](https://example.com/about): ... — [Full content](https://example.com/llms.txt/about)

- [Blog](https://example.com/blog): Blog posts and articles — [Full content](https://example.com/llms.txt/blog)

## Blog Posts

- [First post](https://example.com/blog/first): ... — [Full content](https://example.com/llms.txt/blog/first)

Each "Full content" URL returns the deep Markdown for that entity.
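The sample output suggests a simple mapping from a page path to its deep-content URL: the path is appended under /llms.txt, and the home page uses the slug home. The sketch below reproduces that mapping as inferred from the example links. deepContentUrl is a hypothetical helper, and the home rule is an inference from the sample, not a documented guarantee.

```typescript
// Hypothetical helper: derives the "Full content" URL shown in the sample
// index from a site path. The "home" slug for "/" is inferred from the sample.
function deepContentUrl(baseUrl: string, path: string): string {
  // Strip leading and trailing slashes; an empty result means the home page.
  const slug = path.replace(/^\/+|\/+$/g, '') || 'home';
  return `${baseUrl}/llms.txt/${slug}`;
}
```

In the catch-all route, the same slug arrives split into segments, so /llms.txt/blog/first reaches the handler as ['blog', 'first'].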

Performance notes

  • Self-calls: autoPages makes the SDK fetch your own pages over HTTP. Pass withISR(seconds) as the second argument to cache the extractions between requests. The first uncached hit is slow in proportion to page count (~300ms–1s per page).
  • JS-rendered content: Readability works on the SSR'd HTML only. If a page is mostly client-rendered, the extracted Markdown will be thin. Use a hand-authored pages entry for those.
  • External pages: extractPageAsMarkdown works against any public URL, not just your own site.
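To get a feel for the cold-cache cost, the per-page figure above can be multiplied out. The sketch assumes pages are extracted one after another; if the SDK fetches them concurrently, the real delay would be closer to that of the slowest single page. estimateColdExtractionMs is a hypothetical helper.

```typescript
// Per-page bounds taken from the note above (~300ms to 1s per page).
const PER_PAGE_MIN_MS = 300;
const PER_PAGE_MAX_MS = 1000;

// Hypothetical helper: rough range for the first uncached request,
// assuming sequential extraction of every autoPages entry.
function estimateColdExtractionMs(pageCount: number): { minMs: number; maxMs: number } {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError('pageCount must be a non-negative integer');
  }
  return {
    minMs: pageCount * PER_PAGE_MIN_MS,
    maxMs: pageCount * PER_PAGE_MAX_MS
  };
}
```

For the four autoPages in the quick setup, that works out to roughly 1.2 to 4 seconds on the first uncached request.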