LLM helpers
Generate llms.txt and extract page content as Markdown for AI consumption.
The inventra/llms sub-export builds llms.txt content for your site — a plain-text index that LLMs read to understand what your site is about. Each page also gets a deep-content URL serving its full Markdown.
There are two building blocks:
- `createLlmsService`: composes the site-wide `llms.txt` index plus per-page content resolvers. It covers your blog posts automatically and picks up static pages via `autoPages`.
- `extractPageAsMarkdown`: a standalone helper that fetches any URL, extracts the main content with Mozilla Readability, and returns it as Markdown.
Quick setup
Two route handlers — one for the index, one for per-page deep content:
```ts
// app/llms.txt/route.ts
import { createLlmsService } from 'inventra/llms';
import { withISR } from 'inventra/next';
import { inventra } from '@/packages/plugins/config';

const llms = createLlmsService(
  {
    client: inventra,
    siteName: 'Acme',
    siteDescription: 'We build great things.',
    baseUrl: 'https://example.com',
    pages: {},
    autoPages: [
      { path: '/' },
      { path: '/about' },
      { path: '/pricing' },
      { path: '/contact' }
    ]
  },
  withISR(3600)
);

export const revalidate = 3600;

export async function GET() {
  const body = await llms.generateIndex();
  return new Response(body, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' }
  });
}
```

```ts
// app/llms.txt/[...slug]/route.ts
import { createLlmsService } from 'inventra/llms';
import { withISR } from 'inventra/next';
import { inventra } from '@/packages/plugins/config';

const llms = createLlmsService(
  {
    /* same config as above */
  },
  withISR(3600)
);

export const revalidate = 3600;

export async function GET(
  _req: Request,
  { params }: { params: Promise<{ slug: string[] }> }
) {
  const { slug } = await params;
  const body = await llms.generatePageContent(slug);
  if (body === null) return new Response('Not found', { status: 404 });
  return new Response(body, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' }
  });
}
```

That's it: `/llms.txt` returns the index, and `/llms.txt/<slug>` returns the full Markdown for any configured page or blog post.
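Because the deep-content route is a catch-all, nested slugs arrive as an array of path segments: a request for `/llms.txt/blog/first` reaches `generatePageContent` as `['blog', 'first']`. The sketch below shows that mapping in both directions. `slugFromDeepPath` and `deepContentUrl` are hypothetical helpers for illustration, not exports of `inventra/llms`.

```typescript
// Hypothetical helpers (not part of inventra/llms) showing how deep-content
// URLs relate to the `slug` segments the catch-all route receives.
const DEEP_PREFIX = '/llms.txt/';

// '/llms.txt/blog/first' -> ['blog', 'first']; null for paths outside the prefix.
function slugFromDeepPath(pathname: string): string[] | null {
  if (!pathname.startsWith(DEEP_PREFIX)) return null;
  const segments = pathname
    .slice(DEEP_PREFIX.length)
    .split('/')
    .filter((segment) => segment.length > 0); // ignore empty or trailing segments
  return segments.length > 0 ? segments : null;
}

// ('https://example.com', ['blog', 'first']) -> 'https://example.com/llms.txt/blog/first'
// Assumes baseUrl has no trailing slash, as the config requires.
function deepContentUrl(baseUrl: string, slug: string[]): string {
  return `${baseUrl}${DEEP_PREFIX}${slug.map(encodeURIComponent).join('/')}`;
}
```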
createLlmsService(config, fetchConfig?)
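Returns `{ generateIndex, generatePageContent }`. The optional second argument controls how fetches are cached; the examples on this page pass `withISR(seconds)` from `inventra/next`.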
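The Quick setup handlers await `generateIndex()` for a string and treat a `null` from `generatePageContent(slug)` as a 404. The interface below writes that contract down as an inference from those handlers, not an official SDK type, together with a hypothetical in-memory stub (`createStubLlmsService`) that could stand in for the real service when unit-testing route logic without network calls.

```typescript
// Assumed shape of the object returned by createLlmsService, inferred from
// how the route handlers use it. Not an official SDK type.
interface LlmsServiceLike {
  generateIndex(): Promise<string>;
  generatePageContent(slug: string[]): Promise<string | null>;
}

// Hypothetical stub for tests. Keys are slug paths such as 'blog/first'.
function createStubLlmsService(
  index: string,
  pages: Record<string, string>
): LlmsServiceLike {
  return {
    async generateIndex() {
      return index;
    },
    async generatePageContent(slug) {
      const key = slug.join('/');
      // Mirror the handler contract: null means "no such page" (-> 404).
      return Object.prototype.hasOwnProperty.call(pages, key) ? pages[key] : null;
    }
  };
}
```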
Config
| Field | Type | Description |
|---|---|---|
| `client` | `Inventra` | The SDK client (used to list blog posts). |
| `siteName` | `string` | Shown as the `# Title` heading of the llms.txt index. |
| `siteDescription` | `string` | One-sentence blurb shown under the title. |
| `baseUrl` | `string` | Absolute base URL, with no trailing slash. |
| `pages` | `Record<string, LlmsPageProvider>` | Hand-authored page content. Pass `{}` if you only want `autoPages`. |
| `autoPages` | `LlmsAutoPageConfig[]` | Pages whose content is auto-extracted from rendered HTML. |
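Two constraints are easy to get wrong: `baseUrl` must be absolute with no trailing slash, and each `autoPages` path must start with `/` (see the entry fields below). A pre-flight check along these lines could catch both before deployment. `validateLlmsConfig` and `LlmsConfigSubset` are hypothetical names, not part of the SDK, and the type covers only the fields the check inspects.

```typescript
// Hypothetical pre-flight check, not part of inventra/llms.
// Only the fields it inspects are typed here.
interface LlmsConfigSubset {
  baseUrl: string;
  autoPages?: { path: string }[];
}

function validateLlmsConfig(config: LlmsConfigSubset): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\/[^\/]/.test(config.baseUrl)) {
    // Relative or scheme-less values such as 'example.com' are rejected.
    problems.push(`baseUrl must be an absolute http(s) URL: "${config.baseUrl}"`);
  } else if (config.baseUrl.endsWith('/')) {
    problems.push(`baseUrl must not end with a slash: "${config.baseUrl}"`);
  }
  for (const page of config.autoPages ?? []) {
    if (!page.path.startsWith('/')) {
      problems.push(`autoPages path must start with "/": "${page.path}"`);
    }
  }
  return problems; // empty array means no problems found
}
```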
autoPages — zero-config page coverage
Each entry points at a path on your site. The SDK fetches that URL, strips the nav, header, footer, and aside chrome, runs Mozilla Readability to isolate the main content, and converts the result to Markdown with GFM (GitHub-Flavored Markdown) support.
```ts
autoPages: [
  { path: '/' },
  { path: '/about' },
  { path: '/pricing', exclude: ['.cookie-banner'] }, // extra selectors to strip
  { path: '/careers', label: 'Join us', description: 'Open roles and culture.' }
]
```

Entry fields
| Field | Type | Description |
|---|---|---|
| `path` | `string` | Path on the site, starting with `/`. Use `/` for the home page. |
| `label` | `string` | Optional. Defaults to the extracted `<title>`. |
| `description` | `string` | Optional. Defaults to the page's meta description. |
| `exclude` | `string[]` | Optional CSS selectors to remove before extraction, in addition to the default chrome stripping. |
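The defaults in this table read as a simple fallback chain: an explicit `label` or `description` wins, otherwise the extracted `<title>` or meta description is used, and `exclude` adds to the default chrome selectors rather than replacing them. The sketch below expresses that reading. The function name, the `ExtractedMeta` type, and the exact default selector list (taken from the nav/header/footer/aside note above) are assumptions, not the SDK's internals.

```typescript
// Hypothetical sketch of how autoPages entry defaults could resolve.
// Names and types are illustrative, not the SDK's internals.
interface AutoPageEntry {
  path: string;
  label?: string;
  description?: string;
  exclude?: string[];
}

interface ExtractedMeta {
  title: string; // text of the page's <title>
  metaDescription: string; // content of <meta name="description">
}

// Assumed default chrome, per the nav/header/footer/aside note above.
const DEFAULT_CHROME = ['nav', 'header', 'footer', 'aside'];

function resolveAutoPage(entry: AutoPageEntry, meta: ExtractedMeta) {
  return {
    label: entry.label ?? meta.title,
    description: entry.description ?? meta.metaDescription,
    // `exclude` extends the default stripping; duplicate selectors are dropped.
    stripSelectors: DEFAULT_CHROME.concat(entry.exclude ?? []).filter(
      (selector, i, all) => all.indexOf(selector) === i
    )
  };
}
```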
pages — hand-authored content
Use this when you want full control of the content, or when the page is gated or mostly client-rendered, so HTML extraction won't capture what matters.
```ts
pages: {
  about: {
    label: 'About',
    description: 'Who we are and what we do.',
    getContent: () => ({
      title: 'About Acme',
      description: 'Our story.',
      sections: [
        { heading: 'Mission', content: '...' },
        { heading: 'Team', content: '...' }
      ]
    })
  }
}
```

extractPageAsMarkdown(url, options?)
Standalone helper. Useful outside createLlmsService — e.g. embeddings, RAG indexing, custom llms.txt layouts.
```ts
import { extractPageAsMarkdown } from 'inventra/llms';

const { title, description, markdown, url } = await extractPageAsMarkdown(
  'https://example.com/about',
  {
    exclude: ['.newsletter-banner'],
    fetchOptions: { next: { revalidate: 3600 } }
  }
);
```

Options
| Field | Type | Description |
|---|---|---|
| `exclude` | `string[]` | CSS selectors to remove before extraction. |
| `userAgent` | `string` | Overrides the user agent sent with the fetch. |
| `fetchOptions` | `RequestInit` | Merged into the fetch init. Pass Next.js cache directives such as `next: { revalidate: 3600 }` here. |
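The `markdown` field returned by `extractPageAsMarkdown` is plain Markdown, so preparing it for embeddings or RAG indexing is ordinary string work. As one possible approach (an assumption, not something the SDK provides), the sketch below splits a page into one chunk per `##` section and tags each chunk with the page URL and heading. `chunkByHeading`, `ExtractedPage`, and `Chunk` are hypothetical names.

```typescript
// Hypothetical RAG post-processing, not part of inventra/llms: one chunk per
// level-2 section. Naive: it does not skip '##' lines inside code fences.
interface ExtractedPage {
  title: string;
  url: string;
  markdown: string;
}

interface Chunk {
  url: string;
  heading: string; // section heading, or the page title for leading text
  text: string;
}

function chunkByHeading(page: ExtractedPage): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = page.title;
  let buffer: string[] = [];

  // Emit the buffered lines as a chunk if they contain any text.
  const flush = () => {
    const text = buffer.join('\n').trim();
    if (text) chunks.push({ url: page.url, heading, text });
    buffer = [];
  };

  for (const line of page.markdown.split('\n')) {
    const match = /^##\s+(.*)$/.exec(line); // '### ...' does not match
    if (match) {
      flush(); // close the previous section before starting a new one
      heading = match[1].trim();
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```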
Generated output
The index follows the standard llms.txt format:
```markdown
# Acme
> We build great things.
## Pages
- [Home](https://example.com/): ... — [Full content](https://example.com/llms.txt/home)
- [About](https://example.com/about): ... — [Full content](https://example.com/llms.txt/about)
- [Blog](https://example.com/blog): Blog posts and articles — [Full content](https://example.com/llms.txt/blog)
## Blog Posts
- [First post](https://example.com/blog/first): ... — [Full content](https://example.com/llms.txt/blog/first)
```

Each "Full content" URL returns the deep Markdown for that entity.
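Every list line in the index follows one pattern: a link to the page, a colon and its description, then a link to the page's deep-content URL under `/llms.txt/`. If you assemble a custom llms.txt layout yourself, a formatter for that line could look like the sketch below. `formatIndexEntry` and `IndexEntry` are hypothetical, and the separator mirrors the sample output rather than any documented contract.

```typescript
// Hypothetical formatter reproducing one list line of the sample index.
interface IndexEntry {
  label: string;
  path: string; // site path, e.g. '/about'
  description: string;
  slug: string[]; // deep-content slug, e.g. ['about'] or ['blog', 'first']
}

// Same dash separator as the sample output (U+2014).
const SEPARATOR = ' \u2014 ';

function formatIndexEntry(baseUrl: string, entry: IndexEntry): string {
  const pageUrl = `${baseUrl}${entry.path}`;
  const fullUrl = `${baseUrl}/llms.txt/${entry.slug.join('/')}`;
  return `- [${entry.label}](${pageUrl}): ${entry.description}${SEPARATOR}[Full content](${fullUrl})`;
}
```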
Performance notes
- Self-calls: `autoPages` makes the SDK fetch your own pages over HTTP. Use `withISR(seconds)` to cache the extractions between requests. The first uncached hit will be slow in proportion to page count (roughly 300 ms to 1 s per page).
- JS-rendered content: Readability works on the server-rendered HTML only. If a page is mostly client-rendered, the extracted Markdown will be thin. Use a hand-authored `pages` entry for those.
- External pages: `extractPageAsMarkdown` works against any public URL, not just your own site.
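At roughly 300 ms to 1 s per page, a sequential cold pass over 20 pages would take about 6 to 20 seconds. When you call `extractPageAsMarkdown` yourself across many URLs (for example, for RAG indexing), one common mitigation is to run extractions with a small concurrency limit. The `mapWithConcurrency` helper below is a generic sketch, not an SDK function; the SDK's own `autoPages` fetching may or may not already run in parallel, so this applies only to your own calls.

```typescript
// Generic bounded-concurrency map; a sketch, not part of inventra/llms.
// Runs at most `limit` calls of `fn` at once and preserves input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With the SDK, usage might look like `await mapWithConcurrency(urls, 4, (url) => extractPageAsMarkdown(url))`, where `urls` is your own list of page URLs.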