LLM helpers

The inventra/llms sub-export builds llms.txt content for your site — a plain-text index that LLMs read to understand what your site is about. Each page also gets a deep-content URL serving its full Markdown.

There are two building blocks:

createLlmsService — composes the site-wide llms.txt index plus per-page content resolvers. Covers your blog posts automatically; picks up static pages via autoPages.
extractPageAsMarkdown — standalone helper that fetches any URL, extracts the main content with Mozilla Readability, and returns it as Markdown.

Quick setup

Two route handlers — one for the index, one for per-page deep content:

// app/llms.txt/route.ts
import { createLlmsService } from 'inventra/llms';
import { withISR } from 'inventra/next';
import { inventra } from '@/packages/plugins/config';

const llms = createLlmsService(
  {
    client: inventra,
    siteName: 'Acme',
    siteDescription: 'We build great things.',
    baseUrl: 'https://example.com',
    pages: {},
    autoPages: [
      { path: '/' },
      { path: '/about' },
      { path: '/pricing' },
      { path: '/contact' }
    ]
  },
  withISR(3600)
);

export const revalidate = 3600;

export async function GET() {
  const body = await llms.generateIndex();
  return new Response(body, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' }
  });
}

// app/llms.txt/[...slug]/route.ts
import { createLlmsService } from 'inventra/llms';
import { withISR } from 'inventra/next';
import { inventra } from '@/packages/plugins/config';

const llms = createLlmsService(
  {
    // Same config object as in app/llms.txt/route.ts. Moving it into a
    // shared module keeps the two routes in sync.
  },
  withISR(3600)
);

export const revalidate = 3600;

export async function GET(
  _req: Request,
  { params }: { params: Promise<{ slug: string[] }> }
) {
  const { slug } = await params;
  const body = await llms.generatePageContent(slug);
  if (body === null) return new Response('Not found', { status: 404 });
  return new Response(body, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' }
  });
}

That's it — /llms.txt returns the index, /llms.txt/<slug> returns the full Markdown for any configured page or blog post.

`createLlmsService(config, fetchConfig?)`

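Returns { generateIndex, generatePageContent }. generateIndex() resolves to the llms.txt index as a string. generatePageContent(slug) resolves to the Markdown for a configured page or blog post, or to null when nothing matches the slug. The optional fetchConfig argument sets the caching options for the SDK's fetches; the examples on this page pass withISR(3600).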

Config

Field	Type	Description
`client`	`Inventra`	The SDK client (used to list blog posts).
`siteName`	`string`	Shown as the `# Title` of the llms.txt index.
`siteDescription`	`string`	One-sentence blurb under the title.
`baseUrl`	`string`	Absolute base URL, no trailing slash.
`pages`	`Record<string, LlmsPageProvider>`	Hand-authored page content. Pass `{}` if you only want `autoPages`.
`autoPages`	`LlmsAutoPageConfig[]`	Pages whose content is auto-extracted from rendered HTML.

`autoPages` — zero-config page coverage

Each entry points at a path on your site. The SDK fetches that URL, strips nav/header/footer/aside chrome, runs Mozilla Readability to isolate the main content, and converts to Markdown with GFM support.

autoPages: [
  { path: '/' },
  { path: '/about' },
  { path: '/pricing', exclude: ['.cookie-banner'] }, // extra selectors to strip
  { path: '/careers', label: 'Join us', description: 'Open roles and culture.' }
]

Entry fields

Field	Type	Description
`path`	`string`	Path on the site, starting with `/`. Use `/` for the home page.
`label`	`string`	Optional. Defaults to the extracted `<title>`.
`description`	`string`	Optional. Defaults to the meta description.
`exclude`	`string[]`	Optional CSS selectors to remove before extraction, on top of the default chrome stripping.
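
The entry fields above can be summarized as a type. The sketch below is reconstructed from the table and is not the SDK's actual declaration; AutoPageEntry and validateAutoPages are hypothetical names. It shows a small pre-flight check that catches paths without a leading slash and duplicate paths before the list is handed to createLlmsService.

```typescript
// Hypothetical mirror of the entry fields listed above; the real
// LlmsAutoPageConfig type exported by the SDK may differ.
interface AutoPageEntry {
  path: string;
  label?: string;
  description?: string;
  exclude?: string[];
}

// Returns human-readable problems; an empty array means the entries look sane.
function validateAutoPages(entries: AutoPageEntry[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const entry of entries) {
    if (!entry.path.startsWith('/')) {
      problems.push(`"${entry.path}" must start with "/"`);
    }
    if (seen.has(entry.path)) {
      problems.push(`"${entry.path}" is listed more than once`);
    }
    seen.add(entry.path);
  }
  return problems;
}
```

Running a check like this at module load surfaces configuration mistakes at build time rather than as a missing line in the generated index.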

`pages` — hand-authored content

Use this when you want full control of the content, or when the page is gated / JS-heavy and HTML extraction won't capture what matters.

pages: {
  about: {
    label: 'About',
    description: 'Who we are and what we do.',
    getContent: () => ({
      title: 'About Acme',
      description: 'Our story.',
      sections: [
        { heading: 'Mission', content: '...' },
        { heading: 'Team', content: '...' }
      ]
    })
  }
}
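
How the SDK turns the object returned by getContent into Markdown is not described here. One plausible rendering, assuming the title becomes a level-1 heading, the description a paragraph, and each section a level-2 heading followed by its content, is sketched below. PageContent and renderPageContent are hypothetical names used only for illustration.

```typescript
// Hypothetical shapes inferred from the example above.
interface PageSection {
  heading: string;
  content: string;
}

interface PageContent {
  title: string;
  description?: string;
  sections: PageSection[];
}

// Assumed layout: "# title", optional description, then "## heading" + content
// per section, with blank lines between blocks.
function renderPageContent(page: PageContent): string {
  const blocks: string[] = [`# ${page.title}`];
  if (page.description) {
    blocks.push(page.description);
  }
  for (const section of page.sections) {
    blocks.push(`## ${section.heading}`, section.content);
  }
  return blocks.join('\n\n') + '\n';
}
```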

`extractPageAsMarkdown(url, options?)`

Standalone helper. Useful outside createLlmsService — e.g. embeddings, RAG indexing, custom llms.txt layouts.

import { extractPageAsMarkdown } from 'inventra/llms';

const { title, description, markdown, url } = await extractPageAsMarkdown(
  'https://example.com/about',
  {
    exclude: ['.newsletter-banner'],
    fetchOptions: { next: { revalidate: 3600 } }
  }
);

Options

Field	Type	Description
`exclude`	`string[]`	CSS selectors to remove before extraction.
`userAgent`	`string`	Override the fetch user agent.
`fetchOptions`	`RequestInit`	Merged into the fetch init — pass Next.js cache directives here.
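
For the embeddings and RAG use cases mentioned above, the returned markdown string usually needs to be split before indexing. The following is a minimal sketch and not part of the SDK: chunkByHeading is a hypothetical helper that splits on level-1 to level-3 ATX headings and keeps each heading together with its body.

```typescript
interface MarkdownChunk {
  heading: string; // empty string for text that precedes the first heading
  body: string;
}

// Splits Markdown into heading-delimited chunks.
// Limitation: a "#" line inside a fenced code block is treated as a heading.
function chunkByHeading(markdown: string): MarkdownChunk[] {
  const chunks: MarkdownChunk[] = [];
  let heading = '';
  let bodyLines: string[] = [];

  const flush = (): void => {
    const body = bodyLines.join('\n').trim();
    if (heading !== '' || body !== '') {
      chunks.push({ heading, body });
    }
    bodyLines = [];
  };

  for (const line of markdown.split('\n')) {
    const match = /^(#{1,3})\s+(.*)$/.exec(line);
    if (match) {
      flush();
      heading = (match[2] ?? '').trim();
    } else {
      bodyLines.push(line);
    }
  }
  flush();
  return chunks;
}
```

Each chunk can then be embedded on its own, with the url and title returned by extractPageAsMarkdown stored alongside it as metadata.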

Generated output

generateIndex() produces a standard llms.txt document:

# Acme

> We build great things.

## Pages

- [Home](https://example.com/): ... — [Full content](https://example.com/llms.txt/home)
- [About](https://example.com/about): ... — [Full content](https://example.com/llms.txt/about)

- [Blog](https://example.com/blog): Blog posts and articles — [Full content](https://example.com/llms.txt/blog)

## Blog Posts

- [First post](https://example.com/blog/first): ... — [Full content](https://example.com/llms.txt/blog/first)

Each "Full content" URL returns the deep Markdown for that entity.
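
The layout above can be reproduced with plain string building. The sketch below illustrates the format only and is not the SDK's implementation; IndexEntry, deepContentUrl, renderEntry, and renderIndex are hypothetical names, and the mapping of / to the home slug is inferred from the sample output. It covers the Pages and Blog Posts sections and leaves out the standalone Blog entry.

```typescript
interface IndexEntry {
  label: string;
  path: string; // site path such as "/" or "/blog/first"
  description: string;
}

// The sample output separates the description from the link with U+2014.
const SEPARATOR = ' \u2014 ';

// Maps a site path to its deep-content URL; "/" becomes ".../llms.txt/home".
function deepContentUrl(baseUrl: string, path: string): string {
  const slug = path === '/' ? 'home' : path.replace(/^\/+|\/+$/g, '');
  return `${baseUrl}/llms.txt/${slug}`;
}

function renderEntry(baseUrl: string, entry: IndexEntry): string {
  const pageUrl = `${baseUrl}${entry.path}`;
  const fullUrl = deepContentUrl(baseUrl, entry.path);
  return `- [${entry.label}](${pageUrl}): ${entry.description}${SEPARATOR}[Full content](${fullUrl})`;
}

function renderIndex(
  siteName: string,
  siteDescription: string,
  baseUrl: string, // absolute, no trailing slash
  pages: IndexEntry[],
  posts: IndexEntry[]
): string {
  const lines: string[] = [
    `# ${siteName}`,
    '',
    `> ${siteDescription}`,
    '',
    '## Pages',
    '',
    ...pages.map((page) => renderEntry(baseUrl, page))
  ];
  if (posts.length > 0) {
    lines.push('', '## Blog Posts', '', ...posts.map((post) => renderEntry(baseUrl, post)));
  }
  return lines.join('\n') + '\n';
}
```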

Performance notes

Self-calls: autoPages makes the SDK fetch your own pages over HTTP. Pass withISR(seconds) as the second argument to createLlmsService to cache the extractions between requests. The first uncached request is slow in proportion to the number of pages (roughly 300 ms to 1 s per page).
JS-rendered content: Readability only sees the server-rendered HTML. If a page is mostly client-rendered, the extracted Markdown will be thin. Use a hand-authored pages entry for those pages.
External pages: extractPageAsMarkdown works against any public URL, not just your own site.
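
When extractPageAsMarkdown is used somewhere withISR does not apply, such as an indexing script, a small in-memory cache gives a similar effect. This is a minimal sketch under that assumption; withTtlCache is a hypothetical helper, not an SDK export, and the cache lives only as long as the process.

```typescript
type Loader<T> = (key: string) => Promise<T>;

// Wraps an async loader so repeated calls with the same key reuse the
// in-flight or resolved promise until ttlMs has elapsed.
function withTtlCache<T>(
  load: Loader<T>,
  ttlMs: number,
  now: () => number = Date.now // injectable clock, handy for tests
): Loader<T> {
  const cache = new Map<string, { value: Promise<T>; expires: number }>();
  return (key: string): Promise<T> => {
    const hit = cache.get(key);
    if (hit && hit.expires > now()) {
      return hit.value;
    }
    const value = load(key);
    cache.set(key, { value, expires: now() + ttlMs });
    // Drop failed loads so the next call retries instead of replaying the error.
    value.catch(() => {
      if (cache.get(key)?.value === value) {
        cache.delete(key);
      }
    });
    return value;
  };
}
```

Wrapping the helper as withTtlCache((url) => extractPageAsMarkdown(url), 3_600_000) would then limit each URL to one fetch per hour within a single process.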
