LLMs.txt Generator 

Extension key: rt_llms_txt
Package name: rtfirst/llms-txt
Version: 1.0
Language: en
Author: Roland Tfirst
License: This document is published under the GNU General Public License v2.0 or later.
Rendered: Sat, 25 Apr 2026 10:15:16 +0000


This TYPO3 extension generates llms.txt files for AI/LLM crawlers, providing a compact index of your website with SEO metadata and instructions for accessing page content in Markdown format. Optionally protect access with an API key.


Table of Contents

Introduction 

Learn about the concept and features of this extension.

Installation 

Install the extension via Composer or classic mode.

Configuration 

Configure site settings and page properties for LLM optimization.

Usage 

How to access content in different formats and languages.

API Protection 

Protect your LLM endpoints with API key authentication.

Developer Information 

Technical details for developers and integrators.

FAQ 

Frequently asked questions.

Get Help 

Where to get help and report issues.

Introduction 

What is llms.txt? 

llms.txt is an emerging standard for providing AI and Large Language Model (LLM) crawlers with structured information about a website. It serves as a machine-readable index that helps AI systems understand your website's content and structure.

The llmstxt.org specification defines how websites can provide this information in a standardized format.

Concept 

This extension provides a two-tier approach for LLM content access:

  1. llms.txt Index File - A single file containing:

    • Website metadata (title, description, domain)
    • Page structure with SEO descriptions and keywords
    • Instructions for accessing full page content
  2. Markdown Content Access - Access any page content via:

    • .md suffix - Returns clean Markdown with YAML frontmatter
    • Example: /about.md returns the "About" page as Markdown

Multi-Language Support 

Instead of generating separate llms.txt files per language, this extension uses a simpler approach:

  • Single llms.txt - Contains the site structure in the default language
  • Language-specific content - Access any page in any language using the .md suffix with language URL prefix:

    • Default: https://example.com/about.md
    • English: https://example.com/en/about.md
    • German: https://example.com/de/ueber-uns.md

This approach mirrors how TYPO3 itself handles multi-language sites: each site language has its own URL prefix and its own translated page slugs.

Features 

Core Features 

  • Automatic llms.txt generation with smart caching
  • Markdown output for all pages via .md URL suffix
  • Multi-language support via URL prefixes
  • API key protection for restricted access
  • YAML frontmatter in Markdown output with page metadata

Page-Level Control 

  • LLM tab in page properties for fine-grained control
  • Custom descriptions and summaries per page
  • Keywords for better LLM understanding
  • Priority setting (0-100) for page ordering
  • Exclude option to hide specific pages from llms.txt

Technical Features 

  • 24-hour caching for optimal performance
  • HTML-to-Markdown conversion using league/html-to-markdown
  • Clean output - removes scripts, styles, navigation elements
  • UTF-8 BOM for proper encoding detection
  • Backend notification if robots.txt lacks llms.txt reference
  • Header link injection (<link rel="alternate">) in HTML pages

Requirements 

  • TYPO3 13.0 - 14.x
  • PHP 8.2 or higher

Supported Content Elements 

The extension converts the following TYPO3 content elements to Markdown:

  • Header (header)
  • Text (text)
  • Text with Image (textpic, textmedia)
  • Image (image)
  • Bullet List (bullets)
  • Table (table)
  • HTML (html)
  • Menu elements (menu_*)
  • All other elements via HTML-to-Markdown fallback

Installation 

Installation in Classic Mode 

  1. Download the extension from the TYPO3 Extension Repository (TER) or from GitHub.
  2. Install the extension via the Extension Manager in the TYPO3 Backend.
  3. Clear all caches.

Activate the Site Set 

After installation, you need to add the Site Set to your site configuration:

  1. Go to Site Management > Sites in the TYPO3 Backend.
  2. Edit your site configuration.
  3. Go to the Sets tab.
  4. Add the set LLMs.txt Generator (rtfirst/llms-txt).
  5. Save and clear all caches.

Figure: Adding the LLMs.txt Generator site set to your site configuration.

Verify Installation 

After installation, verify that everything works:

  1. Access https://your-site.com/llms.txt - You should see the generated llms.txt content.
  2. Access any page with .md suffix, e.g., https://your-site.com/about.md - You should see Markdown content with YAML frontmatter.
  3. Check the page properties of any page - You should see a new LLM tab.

Configuration 

The extension can be configured via Site Settings and page properties.

Site Settings 

After adding the Site Set to your site configuration (see Activate the Site Set), you can configure the extension in Site Management > Settings.

llmsTxt.baseUrl

Type: string
Default: (empty)

Full URL of the website (e.g., https://example.com). This is used as the base URL in the generated llms.txt file. If empty, the site's base URL from the site configuration is used.

llmsTxt.intro

Type: text
Default: (empty)

Website description shown in the intro section of the llms.txt file. This text appears as a blockquote below the site title and helps AI crawlers understand the purpose of your website.

Example:

Your expert for tires, wheels, and automotive services since 1985.

llmsTxt.excludePages

Type: string
Default: (empty)

Comma-separated list of page UIDs to exclude from the llms.txt index. Use this for pages that should not appear in the LLM index, such as imprint, privacy policy, or internal pages.

Example:

42,56,123
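
As a rough illustration of the value format, the following Python sketch parses such a comma-separated setting into a set of integer UIDs. The function name `parse_exclude_pages` is hypothetical, and ignoring whitespace and blank entries is an assumption; the extension itself is written in PHP and its actual parsing is not shown in this manual.

```python
def parse_exclude_pages(setting: str) -> set[int]:
    """Parse a comma-separated UID list such as "42,56,123" (hypothetical helper)."""
    uids = set()
    for part in setting.split(","):
        part = part.strip()
        # Assumption: blank entries (e.g. from a trailing comma) are ignored.
        if part:
            uids.add(int(part))
    return uids


print(sorted(parse_exclude_pages("42,56,123")))
```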

llmsTxt.includeHidden

Type: boolean
Default: false

If enabled, hidden pages are also included in the llms.txt generation. This can be useful for staging environments or preview purposes.

llmsTxt.apiKey

Type: string
Default: (empty)

API key for protected access to /llms.txt and .md endpoints. If set, requests without a valid API key will receive a 401 Unauthorized response. Leave empty for public access.

See API Protection for details on how to use API key protection.

Page Properties 

Each page has an LLM tab in the page properties with the following fields:

Exclude from llms.txt

Type: checkbox
Default: false

If enabled, this page will not appear in the llms.txt index. The page is also excluded from the Markdown output.

LLM Priority

Type: number (slider)
Range: 0-100

Higher values (0-100) cause the page to appear earlier in the llms.txt page list. Use this to highlight important pages for AI crawlers.

Recommendations:

  • 80-100: Main landing pages, key services
  • 50-70: Important content pages
  • 20-40: Secondary pages
  • 0-10: Low-priority pages
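
The effect of the priority value can be pictured with a short Python sketch that sorts a flat list of hypothetical page records in descending priority. The record layout and the function name `order_by_priority` are illustrative assumptions, and the manual does not say how ties are broken; here equal priorities simply keep their input order.

```python
def order_by_priority(pages: list[dict]) -> list[dict]:
    """Return pages sorted so that higher priorities come first."""
    # sorted() is stable, so pages with equal priority keep their input order
    # (an assumption; tie-breaking is not described in the manual).
    return sorted(pages, key=lambda page: page["priority"], reverse=True)


pages = [
    {"title": "Imprint", "priority": 5},
    {"title": "Home", "priority": 90},
    {"title": "Blog", "priority": 60},
    {"title": "News", "priority": 60},
]
print([page["title"] for page in order_by_priority(pages)])
```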

LLM Description

Type: textarea
Default: (empty)
Max length: 500 characters

Custom description for this page in the llms.txt index. If empty, the page's SEO meta description (from the SEO tab) is used as fallback.

This description helps AI crawlers understand what the page is about.
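
The fallback rule can be sketched in a few lines of Python. The function name `effective_description` is hypothetical, and treating a whitespace-only field as empty is an assumption rather than documented behavior.

```python
def effective_description(llm_description: str, seo_description: str) -> str:
    """Pick the description for the index entry of one page."""
    # Prefer the LLM-specific text; fall back to the SEO meta description.
    text = llm_description.strip()
    return text if text else seo_description.strip()


print(effective_description("", "Learn about our company history and values."))
```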

LLM Summary

Type: textarea
Default: (empty)
Max length: 2000 characters

Additional summary text shown as a blockquote in the llms.txt index. Use this for longer explanations that don't fit in the description.

LLM Keywords

Type: text input
Default: (empty)
Max length: 255 characters

Comma-separated keywords/topics for this page. These appear in the llms.txt index and help AI crawlers categorize the page content.

Example:

tires, wheels, alignment, services

robots.txt Configuration 

To allow AI crawlers to discover and access your llms.txt file, add these lines to your public/robots.txt:

# Allow AI crawlers to access llms.txt
User-agent: GPTBot
Allow: /llms.txt

User-agent: Claude-Web
Allow: /llms.txt

User-agent: Anthropic-AI
Allow: /llms.txt

User-agent: Google-Extended
Allow: /llms.txt
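
To check how a standards-following crawler would read such rules, one option is Python's built-in `urllib.robotparser`. The sketch below is only a local sanity check under assumed rules; the helper name `llms_txt_allowed` and the sample robots.txt text are illustrative, and real crawlers may interpret robots.txt differently.

```python
from urllib.robotparser import RobotFileParser


def llms_txt_allowed(robots_txt: str, user_agent: str) -> bool:
    """Return True if the given rules let user_agent fetch /llms.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, "/llms.txt")


# Sample rules (an assumption for this sketch, not generated by the extension).
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /llms.txt

User-agent: *
Disallow: /private/
"""

print(llms_txt_allowed(ROBOTS_TXT, "GPTBot"))
```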

Usage 

This chapter explains how to access the LLM-optimized content.

Accessing llms.txt 

The llms.txt index file is available at the root of your website:

https://example.com/llms.txt

This file contains:

  • Website metadata (title, description, domain)
  • Page structure with descriptions and keywords
  • Instructions for accessing page content in Markdown format

Example llms.txt Output 

# My Website

> Your expert for quality products and services.

**Specification:** <https://llmstxt.org/>
**Domain:** https://example.com
**Language:** de
**Generated:** 2026-01-31 12:00:00

## LLM-Optimized Content Access

This site provides LLM-friendly Markdown output for all pages:

### Markdown Format
Append `.md` to any page URL to get plain Markdown with YAML frontmatter.
- **Example:** `https://example.com/page-slug.md`

### Multi-Language Access
Use language-specific URL prefixes with the `.md` suffix:
- **Default language:** `https://example.com/page.md`
- **English:** `https://example.com/en/page.md`

## Page Structure

- **[Home](/)**
  Welcome to our website with all important information.
  [Markdown](/index.html.md)

  - **[About](/about/)**
    Learn about our company history and values.
    [Markdown](/about.md)

  - **[Services](/services/)**
    Professional services for your needs.
    *Keywords: services, consulting, support*
    [Markdown](/services.md)

- **[Contact](/contact/)**
  Get in touch with us via phone or email.
  [Markdown](/contact.md)

Accessing Markdown Content 

Append .md to any page URL to get the content as clean Markdown with YAML frontmatter.

https://example.com/about.md

Example Markdown Output 

---
title: "About Us"
description: "Learn about our company history and values."
language: en
date: 2024-06-15
lastmod: 2026-01-31
canonical: "/about"
format: markdown
generator: "TYPO3 LLMs.txt Extension"
---

# About Us

> Learn about our company history and values.

## Our History

Our company was founded in 1985...

## Our Values

- Quality and reliability
- Fair and transparent prices
- Personal consultation

The YAML frontmatter contains:

  • title: Page title
  • description: Page description (from LLM or SEO settings)
  • language: ISO language code
  • date: Page creation date (from TYPO3 crdate)
  • lastmod: Last modification date (most recent change across page and content elements); omitted if same as date
  • canonical: Canonical URL path
  • format: Output format (always "markdown")
  • generator: Extension identifier
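
A consumer of the .md output will usually want to separate this frontmatter from the body. The following Python sketch does so with the standard library only. It is a minimal parser for flat `key: value` lines like the ones listed above, not a full YAML parser; the function name `split_frontmatter` is hypothetical, and stripping a leading BOM is an assumption based on the UTF-8 BOM mentioned under Technical Features.

```python
def split_frontmatter(markdown: str) -> tuple[dict[str, str], str]:
    """Split Markdown into (frontmatter dict, body). Flat key/value pairs only."""
    # Assumption: the response may start with a UTF-8 BOM, which is dropped.
    markdown = markdown.lstrip("\ufeff")
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    try:
        end = lines.index("---", 1)  # closing delimiter of the frontmatter
    except ValueError:
        return {}, markdown
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            # Remove surrounding whitespace and optional double quotes.
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


sample = '---\ntitle: "About Us"\nlanguage: en\ncanonical: "/about"\n---\n\n# About Us\n'
meta, body = split_frontmatter(sample)
print(meta["title"], meta["language"], meta["canonical"])
```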

Accessing the Root Page 

For the root/home page, use:

https://example.com/index.html.md

Or simply:

https://example.com/.md

Multi-Language Access 

Access page content in different languages using the language URL prefix with the .md suffix:

# German (default language)
https://example.com/ueber-uns.md

# English
https://example.com/en/about.md

# French
https://example.com/fr/a-propos.md

The extension automatically:

  • Detects the language from the URL prefix
  • Loads the translated page content
  • Sets the correct language in the YAML frontmatter
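
On the client side, the URL scheme described above can be assembled with a small helper. This Python sketch is illustrative only: the function name `markdown_url` and its parameters are hypothetical, and it uses the `/index.html.md` form for a page without a slug, as described under Accessing the Root Page.

```python
def markdown_url(base_url: str, slug: str = "", language_prefix: str = "") -> str:
    """Build the .md URL for a page slug, optionally below a language prefix."""
    parts = [base_url.rstrip("/")]
    if language_prefix:
        parts.append(language_prefix.strip("/"))
    slug = slug.strip("/")
    # A page without a slug is the root page, addressed as /index.html.md.
    parts.append(f"{slug}.md" if slug else "index.html.md")
    return "/".join(parts)


print(markdown_url("https://example.com", "ueber-uns"))
print(markdown_url("https://example.com", "about", "en"))
```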

Caching 

The extension uses smart caching for optimal performance:

  • llms.txt: Cached and regenerated when TYPO3 cache is cleared
  • Markdown output: Cached for 24 hours per page/language combination

To force regeneration:

vendor/bin/typo3 cache:flush

Or in DDEV:

ddev typo3 cache:flush

Content Filtering 

The Markdown output is automatically cleaned for better LLM consumption:

Removed elements:

  • Scripts and styles
  • Navigation and footer elements
  • Sidebar content
  • Bootstrap accessibility spans (visually-hidden)
  • Empty anchor tags (<a id="c1"></a>)

Preserved elements:

  • Main content text
  • Headings and structure
  • Lists and tables
  • Images (converted to Markdown syntax)
  • Links (converted to absolute URLs)
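
The idea of dropping whole subtrees before conversion can be sketched with Python's standard `html.parser`. This is not the extension's implementation (which is PHP and uses league/html-to-markdown); the class and function names are hypothetical, and the set of skipped tags, including `aside` for sidebars, is an assumption.

```python
from html.parser import HTMLParser

# Assumption: these tags stand in for scripts, styles, navigation, footer, sidebar.
SKIPPED_TAGS = {"script", "style", "nav", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collect text chunks that are not nested inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside every skipped subtree.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> list[str]:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


page = (
    '<nav><a href="/">Home</a></nav>'
    "<main><h1>About</h1><p>Our history</p></main>"
    "<script>var x = 1;</script><footer>Imprint</footer>"
)
print(visible_text(page))
```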

API Protection 

You can protect both /llms.txt and the .md suffix endpoints with an API key. This is useful when you want to:

  • Restrict access to your own chatbots or RAG systems
  • Prevent external scraping of structured content
  • Control who can access your LLM-optimized content

Setting Up API Protection 

  1. Go to Site Management > Settings in the TYPO3 Backend.
  2. Find the LLMs-Text category.
  3. Enter your API key in the API Key for Format Access field.
  4. Save and clear all caches.

Authenticating Requests 

Pass the API key via HTTP header (recommended):

# Access llms.txt
curl -H "X-LLM-API-Key: your-secret-key" https://example.com/llms.txt

# Access page as Markdown
curl -H "X-LLM-API-Key: your-secret-key" https://example.com/about.md

Or via query parameter:

https://example.com/llms.txt?api_key=your-secret-key
https://example.com/about.md?api_key=your-secret-key

Error Response 

A request with an invalid or missing API key receives an HTTP 401 Unauthorized response with a JSON body:

{
  "error": "Unauthorized",
  "message": "Valid API key required. Provide via X-LLM-API-Key header or api_key query parameter."
}
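
A client can turn this response into a readable message. The Python sketch below works on a status code and body string, so it is independent of any HTTP library; the function name `explain_unauthorized` is hypothetical, and falling back to a plain "Unauthorized" when the body is not the expected JSON object is an assumption.

```python
import json
from typing import Optional


def explain_unauthorized(status_code: int, body: str) -> Optional[str]:
    """Return a readable message for a 401 response, or None for other codes."""
    if status_code != 401:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "Unauthorized"
    if not isinstance(payload, dict):
        return "Unauthorized"
    error = payload.get("error", "Unauthorized")
    message = payload.get("message", "")
    return f"{error}: {message}" if message else error


body = '{"error": "Unauthorized", "message": "Valid API key required."}'
print(explain_unauthorized(401, body))
```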

Integration Examples 

n8n Integration 

In the n8n HTTP Request node, add the header:

Name          | Value
--------------|----------------
X-LLM-API-Key | your-secret-key

Python Integration 

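import requests

# Send the API key with every request via the X-LLM-API-Key header.
headers = {
    "X-LLM-API-Key": "your-secret-key"
}

# Get llms.txt; raise_for_status() turns a 401 response into an exception.
response = requests.get("https://example.com/llms.txt", headers=headers, timeout=10)
response.raise_for_status()
print(response.text)

# Get page as Markdown
response = requests.get("https://example.com/about.md", headers=headers, timeout=10)
response.raise_for_status()
print(response.text)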

JavaScript/Node.js Integration 

const response = await fetch("https://example.com/llms.txt", {
  headers: {
    "X-LLM-API-Key": "your-secret-key"
  }
});

const content = await response.text();
console.log(content);

cURL Integration 

# Store API key in environment variable
export LLM_API_KEY="your-secret-key"

# Access llms.txt
curl -H "X-LLM-API-Key: $LLM_API_KEY" https://example.com/llms.txt

# Access multiple pages
for page in about services contact; do
  curl -H "X-LLM-API-Key: $LLM_API_KEY" "https://example.com/${page}.md" > "${page}.md"
done

Behavior When Enabled 

When API key protection is enabled:

  1. llms.txt requires authentication
  2. All .md endpoints require authentication
  3. The HTML header link (<link rel="alternate">) is automatically hidden
  4. The llms.txt file includes authentication instructions

Disabling API Protection 

To make endpoints publicly accessible again:

  1. Go to Site Management > Settings
  2. Clear the API Key for Format Access field
  3. Save and clear all caches

The header link will automatically reappear and endpoints will be publicly accessible.

Developer Information 

This chapter provides technical details for developers and integrators.

Architecture 

The extension uses three PSR-15 middlewares:

  1. UrlSuffixMiddleware - Detects .md suffix and rewrites URLs
  2. LlmsTxtMiddleware - Serves the /llms.txt endpoint
  3. ContentFormatMiddleware - Transforms HTML to Markdown

Middleware Chain 

Request: /about.md
   │
   ▼
UrlSuffixMiddleware (before site resolver)
   │  Strips .md suffix
   │  Sets request attribute 'llms_txt_format' = 'md'
   │  Rewrites URI to /about
   ▼
TYPO3 Site Resolver
   │
   ▼
LlmsTxtMiddleware (after site, before page resolver)
   │  Handles /llms.txt requests
   ▼
TYPO3 Page Resolver & Frontend
   │
   ▼
ContentFormatMiddleware (after content-length-headers)
   │  Checks for 'llms_txt_format' attribute
   │  Converts HTML response to Markdown
   ▼
Response: Markdown with YAML frontmatter
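
The first step of the chain, stripping the suffix and remembering the requested format, can be pictured with a language-neutral sketch in Python. The real middleware is a PHP PSR-15 class; the function name `split_format_suffix` is hypothetical, and mapping both root forms (`/index.html.md` and `/.md`) to `/` is an assumption.

```python
from typing import Optional, Tuple


def split_format_suffix(path: str) -> Tuple[str, Optional[str]]:
    """Return (rewritten path, format), where format is "md" or None."""
    if not path.endswith(".md"):
        return path, None
    stripped = path[: -len(".md")]
    # Assumption: both documented root-page forms resolve to the site root.
    if stripped in ("/", "/index.html"):
        stripped = "/"
    # The second value plays the role of the 'llms_txt_format' request attribute.
    return stripped, "md"


print(split_format_suffix("/about.md"))
```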

Services 

PageTreeService 

RTfirst\LlmsTxt\Service\PageTreeService

Traverses the TYPO3 page tree and collects page data for llms.txt generation.

  • Supports multi-language sites
  • Respects page exclusion settings
  • Handles translated pages with fallback

MarkdownConverterService 

RTfirst\LlmsTxt\Service\MarkdownConverterService

Orchestrates content element to Markdown conversion using registered converters.

LlmsTxtGeneratorService 

RTfirst\LlmsTxt\Service\LlmsTxtGeneratorService

Generates the llms.txt content for a site.

Content Converters 

The extension uses a converter pattern for content element to Markdown conversion. Each converter implements ContentConverterInterface:

interface ContentConverterInterface
{
    public function supports(string $cType): bool;
    public function convert(array $record, string $baseUrl): string;
}

Built-in Converters 

Converter        | Supported CTypes         | Description
-----------------|--------------------------|-----------------------------
HeaderConverter  | header                   | Converts header elements
TextConverter    | text, textpic, textmedia | Converts text and text+media
ImageConverter   | image                    | Converts image galleries
BulletsConverter | bullets                  | Converts bullet lists
TableConverter   | table                    | Converts tables
MenuConverter    | menu_*                   | Converts menu elements
HtmlConverter    | html                     | Converts raw HTML
DefaultConverter | (fallback)               | HTML-to-Markdown fallback

Creating Custom Converters 

  1. Create a class implementing ContentConverterInterface:
<?php
declare(strict_types=1);

namespace Vendor\MyExtension\Converter;

use RTfirst\LlmsTxt\Converter\ContentConverterInterface;

class MyCustomConverter implements ContentConverterInterface
{
    public function supports(string $cType): bool
    {
        return $cType === 'my_custom_element';
    }

    public function convert(array $record, string $baseUrl): string
    {
        $header = $record['header'] ?? '';
        $bodytext = $record['bodytext'] ?? '';

        return "## {$header}\n\n{$bodytext}";
    }
}
  2. Register the converter in Services.yaml:
services:
  Vendor\MyExtension\Converter\MyCustomConverter:
    tags:
      - name: 'llms_txt.content_converter'
        priority: 100

Higher priority converters are checked first.
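
The selection rule (try converters in descending priority and use the first one whose `supports()` returns true) can be modeled outside PHP as well. The Python sketch below is a simplified analogue, not the extension's code: the class names, the `(priority, converter)` registry layout, and the function `convert_record` are all illustrative assumptions.

```python
from typing import Protocol


class ContentConverter(Protocol):
    def supports(self, c_type: str) -> bool: ...
    def convert(self, record: dict, base_url: str) -> str: ...


class HeaderConverter:
    def supports(self, c_type: str) -> bool:
        return c_type == "header"

    def convert(self, record: dict, base_url: str) -> str:
        return f"## {record.get('header', '')}"


class FallbackConverter:
    def supports(self, c_type: str) -> bool:
        return True  # accepts every CType, so it must be checked last

    def convert(self, record: dict, base_url: str) -> str:
        return record.get("bodytext", "")


def convert_record(record: dict, base_url: str, registry: list) -> str:
    """registry holds (priority, converter) pairs; highest priority is tried first."""
    for _, converter in sorted(registry, key=lambda item: item[0], reverse=True):
        if converter.supports(record["CType"]):
            return converter.convert(record, base_url)
    raise LookupError(f"No converter for CType {record['CType']!r}")


registry = [(0, FallbackConverter()), (100, HeaderConverter())]
print(convert_record({"CType": "header", "header": "Our History"}, "https://example.com", registry))
```

Because the fallback accepts every CType, giving it the lowest priority is what lets more specific converters win.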

Event Listeners 

CacheFlushEventListener 

Invalidates llms.txt cache when TYPO3 caches are flushed.

HeaderLinkEventListener 

Injects the <link rel="alternate"> tag into HTML responses.

BackendNotificationEventListener 

Shows a notification in the Backend if robots.txt lacks llms.txt reference.

Caching 

The extension uses two cache layers:

  1. llms.txt Index Cache (cache_pages)

    • Stores generated llms.txt content per site
    • Invalidated on cache flush
  2. Format Output Cache (llms_txt_format)

    • Stores Markdown output per page/language
    • 24-hour default lifetime
    • Part of the pages cache group
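
The behavior of the second layer (one entry per page/language pair, expiring after 24 hours, cleared on flush) can be modeled with a small in-memory class. This Python sketch is only a conceptual model; the extension relies on the TYPO3 caching framework, and the class name `FormatCache`, its methods, and the injectable clock are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FormatCache:
    """In-memory model of a per page/language cache with a fixed lifetime."""

    def __init__(self, lifetime_seconds: int = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self._lifetime = lifetime_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._entries: Dict[Tuple[int, int], Tuple[float, str]] = {}

    def set(self, page_uid: int, language_id: int, markdown: str) -> None:
        expires_at = self._clock() + self._lifetime
        self._entries[(page_uid, language_id)] = (expires_at, markdown)

    def get(self, page_uid: int, language_id: int) -> Optional[str]:
        entry = self._entries.get((page_uid, language_id))
        if entry is None:
            return None
        expires_at, markdown = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped and must be regenerated.
            del self._entries[(page_uid, language_id)]
            return None
        return markdown

    def flush(self) -> None:
        self._entries.clear()


cache = FormatCache()
cache.set(12, 0, "# About Us")
print(cache.get(12, 0))
```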

Database Schema 

The extension adds fields to the pages table:

Field                  | Type         | Description
-----------------------|--------------|------------------------------
tx_llmstxt_description | text         | LLM-specific page description
tx_llmstxt_summary     | text         | Extended page summary
tx_llmstxt_keywords    | varchar(255) | Comma-separated keywords
tx_llmstxt_exclude     | tinyint(1)   | Exclude from llms.txt
tx_llmstxt_priority    | int(11)      | Priority (0-100) for sorting

Code Quality 

The extension maintains high code quality standards:

  • PHPStan Level 8 compliant
  • PSR-12 code style (php-cs-fixer)
  • Unit tests for converters and services

Run quality checks:

# Static analysis
vendor/bin/phpstan analyse packages/llms_txt --level=8

# Code style check
vendor/bin/php-cs-fixer fix packages/llms_txt --dry-run

# Fix code style
vendor/bin/php-cs-fixer fix packages/llms_txt

# Run tests
vendor/bin/phpunit -c packages/llms_txt/phpunit.xml

Dependencies 

  • league/html-to-markdown (^5.1) - HTML to Markdown conversion
  • typo3/cms-core (^13.0 || ^14.0)
  • typo3/cms-frontend (^13.0 || ^14.0)

Frequently Asked Questions (FAQ) 

General Questions 

What is llms.txt? 

llms.txt is an emerging standard for providing AI and Large Language Model (LLM) crawlers with structured information about a website. It helps AI systems understand your website's content and structure. See the llmstxt.org specification for details.

Why should I use this extension? 

  • Better AI understanding of your website content
  • Structured access for chatbots and RAG systems
  • Clean Markdown output without navigation, scripts, or styling
  • Multi-language support out of the box
  • Optional protection with API keys

Installation Questions 

I get a 404 error when accessing /llms.txt 

Make sure you have:

  1. Added the Site Set to your site configuration
  2. Cleared all caches: vendor/bin/typo3 cache:flush

See Activate the Site Set for details.

The LLM tab doesn't appear in page properties 

  1. Verify the extension is installed: vendor/bin/typo3 extension:list
  2. Clear the system cache
  3. Log out and log back into the TYPO3 Backend

Configuration Questions 

How do I exclude specific pages? 

You have two options:

  1. Site Settings: Add page UIDs to llmsTxt.excludePages (comma-separated)
  2. Per Page: Check "Exclude from llms.txt" in the page's LLM tab

How does the priority setting work? 

Pages with higher priority values (0-100) appear earlier in the llms.txt page list. Use this to highlight important pages for AI crawlers:

  • 80-100: Main landing pages
  • 50-70: Important content
  • 20-40: Secondary pages
  • 0-10: Low priority

Can I include hidden pages? 

Yes, enable llmsTxt.includeHidden in the Site Settings. This is useful for staging environments.

Usage Questions 

How do I access the root page as Markdown? 

Use /index.html.md or /.md:

https://example.com/index.html.md

How do I access translated pages? 

Use the language prefix with the .md suffix:

# English
https://example.com/en/about.md

# German
https://example.com/de/ueber-uns.md

How do I refresh the cached content? 

Clear the TYPO3 cache:

vendor/bin/typo3 cache:flush

Or in DDEV:

ddev typo3 cache:flush

API Protection Questions 

Which HTTP header should I use for the API key? 

Use X-LLM-API-Key:

curl -H "X-LLM-API-Key: your-key" https://example.com/llms.txt

Can I use a query parameter instead? 

Yes, use api_key:

https://example.com/llms.txt?api_key=your-key

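However, the HTTP header is recommended for security reasons: a key passed in the query string is part of the URL and can end up in server logs and browser history.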

Troubleshooting 

The Markdown output is empty 

  • Check if the page has content elements
  • Verify the page is not excluded from llms.txt
  • Clear all caches

Content shows in wrong language 

  • Verify your site language configuration
  • Check if translations exist for the page
  • Use the correct language prefix in the URL

Some content elements are missing 

The extension filters out:

  • Navigation elements
  • Footer content
  • Scripts and styles
  • Empty elements

If a specific content type is missing, you may need to create a custom converter. See Creating Custom Converters.

Where to Get Help 

If you need help with this extension, here are your options:

Community Support 

TYPO3 Slack 

Join the TYPO3 Slack workspace and ask questions in the appropriate channels.

TYPO3 Forum 

Ask questions in the official TYPO3 forum.

Contact the Author 

For direct support, you can contact the extension author.

Documentation 

Report Issues 

Found a bug or have a feature request? Please report it on GitHub.

When reporting issues, please include:

  1. TYPO3 version (e.g., 13.4.1)
  2. PHP version (e.g., 8.2.15)
  3. Extension version (e.g., 1.0.5)
  4. Steps to reproduce the issue
  5. Expected behavior vs. actual behavior
  6. Error messages or logs (if any)

Contribute 

Contributions are welcome! The source code is available on GitHub.

To contribute:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run code quality checks (see Code Quality)
  5. Submit a pull request

Changelog 

See CHANGELOG.md for a detailed list of changes in each version.