Introduction
What is llms.txt?
llms.txt is an emerging standard for providing AI and Large Language Model (LLM)
crawlers with structured information about a website. It serves as a machine-readable
index that helps AI systems understand your website's content and structure.
The llmstxt.org specification defines how websites can provide this information in a standardized format.
Concept
This extension provides a two-tier approach for LLM content access:
-
llms.txt Index File - A single file containing:
- Website metadata (title, description, domain)
- Page structure with SEO descriptions and keywords
- Instructions for accessing full page content
-
Markdown Content Access - Access any page content via:
.mdsuffix - Returns clean Markdown with YAML frontmatter- Example:
/about.mdreturns the "About" page as Markdown
Multi-Language Support
Instead of generating separate llms.txt files per language, this extension
uses a simpler approach:
- Single llms.txt - Contains the site structure in the default language
-
Language-specific content - Access any page in any language using the
.mdsuffix with language URL prefix:- Default:
https://example.com/about.md - English:
https://example.com/en/about.md - German:
https://example.com/de/ueber-uns.md
- Default:
This approach follows how multi-language sites actually work in TYPO3.
Features
Core Features
- Automatic llms.txt generation with smart caching
- Markdown output for all pages via
.mdURL suffix - Multi-language support via URL prefixes
- API key protection for restricted access
- YAML frontmatter in Markdown output with page metadata
Page-Level Control
- LLM tab in page properties for fine-grained control
- Custom descriptions and summaries per page
- Keywords for better LLM understanding
- Priority setting (0-100) for page ordering
- Exclude option to hide specific pages from llms.txt
Technical Features
- 24-hour caching for optimal performance
- HTML-to-Markdown conversion using League/html-to-markdown
- Clean output - removes scripts, styles, navigation elements
- UTF-8 BOM for proper encoding detection
- Backend notification if robots.txt lacks llms.txt reference
- Header link injection (
<link rel="alternate">) in HTML pages
Requirements
- TYPO3 13.0 - 14.x
- PHP 8.2 or higher
Supported Content Elements
The extension converts the following TYPO3 content elements to Markdown:
- Header (
header) - Text (
text) - Text with Image (
textpic,textmedia) - Image (
image) - Bullet List (
bullets) - Table (
table) - HTML (
html) - Menu elements (
menu_*) - All other elements via HTML-to-Markdown fallback