LLMS TXT Generator for TYPO3 

Extension key

ai_llms_txt

Package name

web-vision/ai-llms-txt

Version

main

Language

en

Copyright

2025

Author

web-vision GmbH

Rendered

Thu, 22 Jan 2026 09:59:44 +0000

License

This document is published under the Open Content License available from http://www.opencontent.org/opl.shtml

The content of this document is related to TYPO3, a GNU/GPL CMS/Framework available from www.typo3.org.

Introduction 

What does this extension do? 

The LLMS TXT Generator extension provides TYPO3 with the capability to generate machine-readable files according to the llmstxt.org specification.

The extension creates two types of content:

  1. llms.txt files - Machine-readable policy files that inform Large Language Models (LLMs) and AI crawlers about your site's crawling preferences and content structure
  2. Markdown content - Human and machine-readable representations of your TYPO3 pages in Markdown format

Key Features 

  • Automatic llms.txt generation - Creates policy links at /.well-known/llms.txt according to the official specification
  • Site navigation structure - Automatically includes your site's navigation hierarchy in the llms.txt file
  • Configurable metadata - Add topics, contact information, and custom descriptions
  • Markdown export - Convert any TYPO3 page to Markdown format via .md suffix
  • TYPO3 v12, v13 & v14 compatibility - Supports all current LTS versions using modern PHP practices
  • Flexible configuration - Control depth, content, and behavior through Site Configuration
  • Frontend rendering integration - Leverages TYPO3's native content rendering pipeline

What is llms.txt? 

llms.txt is an emerging standard for websites to communicate with Large Language Models and AI systems. Similar to robots.txt for web crawlers, llms.txt files provide:

  • Crawling policies - Guidelines for AI systems on how to interact with your content
  • Site structure - Navigation and content organization information
  • Metadata - Topics, contact information, and site descriptions
  • Content access - Direct links to machine-readable content formats

The specification follows a simple, human-readable format that both humans and AI systems can easily parse.

Use Cases 

Educational Institutions
Provide structured access to course catalogs, faculty information, and academic content for AI-powered educational tools.
Content Publishers
Offer clear guidelines for AI systems accessing articles, documentation, and media content.
Business Websites
Structure company information, services, and contact details for AI-powered business discovery tools.
Documentation Sites
Enable AI systems to better understand and reference technical documentation and knowledge bases.

Requirements 

  • TYPO3 CMS 13.0 or higher
  • PHP 8.2 or higher
  • league/html-to-markdown package (automatically installed)

The extension integrates seamlessly with existing TYPO3 installations and requires minimal configuration to get started.

Administrator 

Installation 

The extension can be installed using Composer (recommended). Legacy mode is not tested.

Composer Installation 

composer require web-vision/ai-llms-txt

Configuration 

After installation, the extension works out of the box with sensible defaults. However, you can customize its behavior through TypoScript configuration.

Basic TypoScript Setup 

The extension automatically includes its TypoScript configuration. No manual setup is required for basic functionality.

Site Configuration 

The extension automatically uses your site's configuration from config/sites/[site]/config.yaml. If you want to customize the routing, you can add custom route enhancers:

imports:
  -
    resource: 'EXT:ai_llms_txt/Configuration/Routes/RouterEnhancer.yaml'

This import adds the following route enhancers:

  • llms.txt for the llms.txt specification (typeNum 1699)
  • .md suffix for Markdown content (typeNum 1701)

Accessing Generated Content 

After installation, the following URLs become available:

llms.txt Files 

  • https://yoursite.com/?type=1699 - Default access via typeNum
  • https://yoursite.com/llms.txt - Alternative direct access (if route enhancer is configured)

Markdown Content 

  • https://yoursite.com/any-page.md - Markdown version of any TYPO3 page

Testing the Installation 

  1. Test llms.txt generation: Visit https://yoursite.com/llms.txt to verify the llms.txt is accessible.

    or

    Visit https://yoursite.com/.well-known/llms.txt

  2. Test Markdown conversion:

    Visit any page on your site with .md appended (e.g., https://yoursite.com/about.md) to see the Markdown version.

  3. Check content structure:

    The llms.txt file should include your site's title, description, and navigation structure.
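The URL checks above can also be scripted. The following is a minimal sketch, assuming curl is installed and the route enhancers are imported. The function names llms_check_urls and llms_check are hypothetical helpers, not part of the extension, and "about" is only the placeholder page used in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the extension): print the URLs to check
# for a base URL ($1) and an example page slug ($2, defaults to "about").
llms_check_urls() {
    base="${1%/}"          # drop one trailing slash, if present
    page="${2:-about}"
    printf '%s\n' \
        "$base/?type=1699" \
        "$base/llms.txt" \
        "$base/.well-known/llms.txt" \
        "$base/$page.md"
}

# Hypothetical helper: request each URL and print "<HTTP status> <URL>".
llms_check() {
    llms_check_urls "$@" | while IFS= read -r url; do
        # -s: no progress output, -o /dev/null: discard the body,
        # -w '%{http_code}': print only the response status code
        status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
        printf '%s %s\n' "$status" "$url"
    done
}
```

Calling llms_check https://yoursite.com prints one line per URL. A 200 status suggests the endpoint is reachable; any other status points to the Troubleshooting section.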

Troubleshooting 

llms.txt file not accessible
Ensure your web server is configured to serve files from the .well-known directory. Some servers block access to hidden directories by default.
Markdown conversion fails
Check that the league/html-to-markdown package is properly installed via Composer.
Navigation structure missing
Verify that your pages are not hidden and have proper navigation settings in the page properties.
Content not rendering
Ensure content elements are not hidden and are in standard column positions (colPos).

Performance Considerations 

  • The extension uses TYPO3's caching mechanisms where possible
  • llms.txt generation processes the entire site navigation, so performance depends on site size
  • Markdown conversion processes all content elements on a page
  • For large sites, consider implementing additional caching strategies if needed

Editor 

Working with LLMS TXT Content 

As an editor, you don't need to directly interact with the LLMS TXT Generator extension in most cases. The extension automatically generates content based on your existing TYPO3 pages and content.

Understanding Generated Content 

The extension creates two types of content from your TYPO3 pages:

llms.txt Files 

The extension automatically generates an llms.txt file that includes:

  • Site title and description - Taken from your site's root page
  • Navigation structure - Your page hierarchy and menu structure
  • Topics and keywords - Configured by administrators
  • Contact information - Configured by administrators

This file helps AI systems understand your site's structure and content.

Markdown Pages 

Any page on your website can be viewed in Markdown format by adding .md to the URL. For example:

  • /about becomes /about.md
  • /services/consulting becomes /services/consulting.md

The Markdown version includes:

  • Page title - From the page properties
  • Page description - From the page properties
  • All content elements - Headers, text, images, etc. converted to Markdown format

Best Practices for Content 

To ensure your content works well with AI systems and the LLMS TXT Generator:

Page Properties 

Use descriptive titles
Clear, descriptive page titles help AI systems understand your content's purpose.
Add page descriptions
The description field in page properties provides context for AI systems and appears in both llms.txt and Markdown output.
Structure your navigation logically
Your site's navigation structure is included in the llms.txt file, so logical organization helps AI systems understand your content hierarchy.

Content Elements 

Use semantic headers
Use header elements (H1, H2, H3, etc.) to structure your content logically.
Write clear, descriptive text
Well-written content is more useful for AI systems and human readers alike.
Add alt text to images
Image alternative text is included in Markdown conversion and helps AI systems understand visual content.
Organize content logically
Content elements are rendered in their page order, so logical organization improves the Markdown output.

Viewing Generated Content 

As an editor, you can preview the generated content to understand how AI systems will see your pages:

To view the llms.txt file:
Visit /.well-known/llms.txt on your website's frontend.
To view a page in Markdown:
Add .md to any page URL to see the Markdown version.
To check navigation structure:
The llms.txt file includes your site's navigation, which reflects your page tree structure.

Content that Works Well 

The following types of content work particularly well with the LLMS TXT Generator:

  • Articles and blog posts - Convert cleanly to Markdown with proper heading structure
  • Documentation pages - Structured content with headers and lists
  • Service descriptions - Clear, descriptive content about what you offer
  • About pages - Company or organization information
  • Contact information - Structured contact details

Content Limitations 

Some content types may not convert perfectly to Markdown:

  • Complex layouts - Multi-column layouts may not preserve exact visual structure
  • Interactive elements - Forms, JavaScript widgets, etc. may not convert meaningfully
  • Custom styling - Visual formatting is simplified in Markdown
  • Media galleries - Complex image arrangements may be simplified

This is normal and expected - the Markdown format is designed to be a simplified, semantic representation of your content that focuses on meaning rather than visual presentation.

Configuration 

Page Type Configuration 

The extension defines two page types for serving content:

llms.txt Page Type 

llmstxt = PAGE
llmstxt {
    typeNum = 1699

    config {
        disableAllHeaderCode = 1
        additionalHeaders.10.header = Content-Type: text/plain; charset=utf-8
        additionalHeaders.10.replace = 1
        xhtml_cleaning = 0
        admPanel = 0
        debug = 0
        no_cache = 1
    }

    10 = USER
    10 {
        userFunc = WebVision\AiLlmsTxt\Controller\LlmsTxtController->generateAction
    }
}

Markdown Page Type 

markdown_page = PAGE
markdown_page {
    typeNum = 1701

    config {
        disableAllHeaderCode = 1
        additionalHeaders.10.header = Content-Type: text/plain; charset=utf-8
        additionalHeaders.10.replace = 1
        xhtml_cleaning = 0
        admPanel = 0
        debug = 0
        no_cache = 1
        forceAbsoluteUrls = 1
    }

    10 = USER
    10 {
        userFunc = WebVision\AiLlmsTxt\Controller\LlmsTxtController->renderPageAsMarkdown
    }
}

Route Configuration 

The extension includes route enhancers to create user-friendly URLs:

# EXT:ai_llms_txt/Configuration/Routes/RouterEnhancer.yaml
routeEnhancers:
  PageTypeSuffix:
    type: PageType
    map:
      .md: 1701
      llms.txt: 1699

To use these routes, include them in your site configuration:

# config/sites/main/config.yaml
imports:
  -
    resource: 'EXT:ai_llms_txt/Configuration/Routes/RouterEnhancer.yaml'

Advanced Configuration 

Custom Navigation Filtering 

If you need to exclude certain pages from the llms.txt navigation structure, you can extend the NavigationBuilder service or use standard TYPO3 page properties:

  • Set "Hide in navigation" to exclude pages from llms.txt
  • Use "Access" settings to control visibility
  • Set pages to hidden to exclude them entirely
  • Set the noindex meta tag on pages to prevent inclusion in llms.txt

Custom Content Processing 

The extension uses TYPO3's standard content rendering. To customize how content appears in Markdown:

  • Use standard TYPO3 content element configuration
  • Customize TypoScript rendering for specific content types
  • The extension respects all standard TYPO3 content visibility settings

Developer 

Architecture Overview 

The LLMS TXT Generator extension follows modern TYPO3 development practices and supports TYPO3 v12, v13, and v14 LTS.

Core Components 

Services 

ConfigurationService
Handles all site configuration reading and provides typed access to configuration values. Uses the TYPO3 core request injection pattern - request is set once via setRequest() and all methods access it internally.
LlmsTxtGeneratorService
Main application service that orchestrates the generation of llms.txt content by coordinating other services.
MarkdownConverterService
Converts HTML content to Markdown format using the league/html-to-markdown library.
NavigationBuilder
Builds the site navigation structure for inclusion in llms.txt files.

Repository 

PageRepository
Provides database access for page records with proper respect for TYPO3's language and workspace handling.

Controller 

LlmsTxtController
Thin controller layer that handles HTTP requests and delegates business logic to services. Provides entry points for TypoScript USER objects.

API Reference 

Controller Methods 

WebVision\AiLlmsTxt\Controller\LlmsTxtController 

generateAction ( string $content = '', array $conf = []) : string

Generates llms.txt content for TypoScript USER object.

param string $content

Content passed from TypoScript (usually empty)

param array $conf

Configuration array from TypoScript

throws

Exception on generation failures

Returns

Generated llms.txt content as string

renderPageAsMarkdown ( string $content = '', array $conf = []) : string

Renders current page as Markdown by leveraging TYPO3's frontend rendering.

param string $content

Content passed from TypoScript (usually empty)

param array $conf

Configuration array from TypoScript

throws

Exception on rendering or conversion failures

Returns

Page content converted to Markdown format

Service Classes 

WebVision\AiLlmsTxt\Service\LlmsTxtGeneratorService 

generateLlmsTxt ( int $currentPageId) : string

Generates complete llms.txt content for a given page context.

param int $currentPageId

Current page ID for context

Returns

Complete llms.txt formatted content

WebVision\AiLlmsTxt\Service\ConfigurationService 

setRequest ( ServerRequestInterface $request) : void

Sets the request for the service. Must be called before using other methods. Follows the TYPO3 core pattern used by ContentObjectRenderer.

param ServerRequestInterface $request

The current PSR-7 request

isEnabled ( ) : bool

Checks if llms.txt generation is enabled.

Returns

True if enabled, false otherwise

getCurrentPageId ( ) : int

Gets the current page ID from the request. Handles TYPO3 v12/v13 via TSFE and v14+ via frontend.page.information request attribute.

throws

RuntimeException if page ID cannot be determined

Returns

Current page ID

getMaxDepth ( ) : int

Gets maximum navigation depth setting.

Returns

Maximum depth as integer

getTitleOverride ( ) : string

Gets custom title override if configured.

Returns

Custom title or empty string

getDescriptionOverride ( ) : string

Gets custom description override if configured.

Returns

Custom description or empty string

getKeywords ( ) : array

Gets configured keywords/topics.

Returns

Array of keyword strings

getContactEmail ( ) : string

Gets configured contact email.

Returns

Contact email or empty string

getAdditionalInfo ( ) : string

Gets additional information text.

Returns

Additional info or empty string

getSiteUrl ( ) : string

Gets the current site's base URL.

Returns

Site URL as string

Extending the Extension 

tbd.

Hooks and Events 

The extension doesn't currently provide PSR-14 events, but you can extend functionality through:

  1. Service replacement - Override services through dependency injection
  2. TypoScript configuration - Extend configuration options
  3. Custom page types - Create additional page types using the same controller methods

Testing 

Unit Testing 

The extension includes comprehensive unit tests, particularly for the ConfigurationService. Tests can be run against all supported TYPO3 versions:

# Run unit tests with TYPO3 v12
Build/Scripts/runTests.sh -s unit -t 12

# Run unit tests with TYPO3 v13
Build/Scripts/runTests.sh -s unit -t 13

# Run unit tests with TYPO3 v14
Build/Scripts/runTests.sh -s unit -t 14

# Run with specific PHP version
Build/Scripts/runTests.sh -s unit -t 14 -p 8.3
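To run the unit suite against every supported TYPO3 version in one go, the per-version commands above can be wrapped in a loop. This is a minimal sketch: the function name run_unit_matrix and the RUNNER override are hypothetical conveniences, while the -s and -t flags are the ones shown above.

```shell
#!/bin/sh
# Hypothetical wrapper: run the unit suite for TYPO3 v12, v13 and v14 in turn.
# RUNNER can be overridden (for example for a dry run); it defaults to the
# test runner script shipped with the project.
run_unit_matrix() {
    runner="${RUNNER:-Build/Scripts/runTests.sh}"
    for version in 12 13 14; do
        echo "== TYPO3 v$version =="
        # Stop at the first version whose run fails
        "$runner" -s unit -t "$version" || return 1
    done
}
```

Call run_unit_matrix from the extension's root directory so that the relative path to the runner script resolves.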

The ConfigurationService tests cover:

  • Request injection and retrieval
  • Fallback to $GLOBALS['TYPO3_REQUEST']
  • All configuration getters (isEnabled, getMaxDepth, getTitleOverride, etc.)
  • Default values and type casting

Functional Testing 

Functional tests can be run using:

Build/Scripts/runTests.sh -s functional

Performance Considerations 

Navigation Building

Navigation structure generation scales with site size. For large sites (1000+ pages), consider:

  • Reducing maxDepth setting
  • Implementing custom navigation filtering
  • Adding caching layers
Content Rendering

Markdown conversion processes all content elements. For content-heavy pages:

  • Consider selective content rendering
  • Implement content type filtering
  • Use caching for expensive conversions
Memory Usage

The HTML-to-Markdown conversion can be memory-intensive for large pages. Monitor memory usage and consider:

  • Chunked processing for very large pages
  • Custom memory-efficient conversion strategies
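Before changing any of these settings, it helps to measure how long the endpoints actually take. The sketch below assumes curl and awk are available. The helper names response_time and slower_than are hypothetical, and any threshold you pass in is an arbitrary choice rather than a recommendation from the extension.

```shell
#!/bin/sh
# Hypothetical helper: print the total request time in seconds for a URL.
response_time() {
    # -s: no progress output, -o /dev/null: discard the body,
    # -w '%{time_total}': print the total transfer time
    curl -s -o /dev/null -w '%{time_total}\n' "$1"
}

# Hypothetical helper: succeed if the measured seconds ($1) exceed the limit ($2).
slower_than() {
    awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t + 0 > limit + 0) }'
}
```

For example, slower_than "$(response_time https://yoursite.com/llms.txt)" 1.0 succeeds when the request took longer than one second, which can serve as a simple signal to try a lower maxDepth.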

Contributing 

When contributing to the extension:

  1. Follow TYPO3 coding standards - Use php-cs-fixer with TYPO3 ruleset
  2. Write tests - Include unit and functional tests for new features
  3. Document changes - Update documentation for new configuration options
  4. Use dependency injection - Prefer constructor injection over service location
  5. Type everything - Use strict types and comprehensive type hints

Known Problems 

Current Limitations 

Markdown Conversion 

Complex HTML Structures

Some complex HTML structures may not convert perfectly to Markdown. This particularly affects:

  • Multi-column layouts
  • Nested tables with complex formatting
  • Custom HTML elements with specific styling
  • Interactive elements (JavaScript widgets, forms)
Content Element Limitations

Certain TYPO3 content elements may not convert optimally:

  • File collections with custom rendering
  • Media galleries with specific layouts
  • Custom content elements without semantic HTML
Large Page Content

Pages with very large amounts of content may experience:

  • Memory limitations during HTML-to-Markdown conversion
  • Slower response times for .md requests
  • Potential timeouts on resource-constrained servers

Web Server Configuration 

.well-known Directory Access

Some web server configurations may block access to .well-known directories:

  • Apache servers may require specific .htaccess rules
  • Nginx servers may need location block configuration
  • Some shared hosting providers block hidden directory access
MIME Type Handling
Text/plain MIME type for .md and llms.txt files may not be properly configured on all servers.
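The page types shown in the Configuration section send Content-Type: text/plain; charset=utf-8. To see what a server actually delivers, the response headers can be inspected. This is a minimal sketch assuming curl and awk are available; extract_media_type and served_as_text_plain are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: read raw response headers on stdin and print the media
# type of the last Content-Type header, lowercased and without parameters.
extract_media_type() {
    tr -d '\r' | awk -F': *' '
        tolower($1) == "content-type" { value = $2 }
        END {
            sub(/;.*/, "", value)        # strip parameters such as charset
            gsub(/[ \t]+$/, "", value)   # strip trailing whitespace
            print tolower(value)
        }'
}

# Hypothetical helper: succeed if the URL is served as text/plain.
served_as_text_plain() {
    # -D -: write response headers to stdout, -o /dev/null: discard the body
    media_type=$(curl -s -D - -o /dev/null "$1" | extract_media_type)
    [ "$media_type" = "text/plain" ]
}
```

For example, served_as_text_plain https://yoursite.com/llms.txt || echo "unexpected MIME type" flags a server or TypoScript setup that overrides the header.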

Known Issues 

HTML-to-Markdown Edge Cases 

Issue: Nested blockquotes may not render correctly

Workaround: Avoid deeply nested blockquote structures in content elements

Issue: Table formatting may be simplified or lost

Workaround: Use simple table structures for content that will be converted to Markdown

Issue: Custom CSS classes and styling are not preserved

Expected Behavior: Markdown is a semantic format - visual styling is intentionally simplified

Performance Issues 

Issue: Large sites (1000+ pages) may experience slow llms.txt generation

Workaround: Reduce the maxDepth setting in TypoScript configuration

Solution: Consider implementing page-level caching for navigation structures

Issue: Memory usage scales with page content size

Workaround: Monitor memory limits and consider splitting very large pages

Solution: Implement chunked processing for extremely large content

Compatibility Issues 

Issue: Some third-party extensions may interfere with content rendering

Symptoms: Missing content in Markdown output or errors during generation

Workaround: Test with third-party extensions disabled to isolate conflicts

Issue: Custom TypoScript configurations may affect page type rendering

Symptoms: Incorrect MIME types or additional headers in output

Solution: Ensure the extension's TypoScript is loaded after custom configurations

Planned Improvements 

Performance Enhancements 

  • Caching Layer: Implementation of dedicated caching for navigation structures and frequently accessed content
  • Chunked Processing: Support for processing very large pages in chunks to reduce memory usage
  • Selective Rendering: Options to exclude specific content types from Markdown conversion

Feature Additions 

  • Custom Content Filters: Configuration options to exclude specific content element types
  • Enhanced Metadata: Support for additional llms.txt specification fields as the standard evolves
  • Multi-language Support: Better handling of multi-language sites and language-specific llms.txt files

Developer Experience 

  • PSR-14 Events: Addition of events for custom processing hooks
  • Better Error Handling: More detailed error messages and logging
  • Development Tools: CLI commands for testing and debugging llms.txt generation

Workarounds 

Large Site Performance 

For sites with performance issues:

llmstxt.settings {
    # Reduce navigation depth
    maxDepth = 2

    # Consider adding custom page exclusion logic
    # (requires custom extension development)
}

Custom Content Filtering 

To exclude specific content types from Markdown conversion, extend the controller:

<?php
// Custom implementation to filter content elements
// See Developer documentation for detailed examples

Reporting Issues 

When reporting issues, please include:

  • TYPO3 version
  • PHP version
  • Extension version
  • Site size (approximate number of pages)
  • Specific error messages or unexpected behavior
  • Steps to reproduce the issue
  • Server configuration details (if relevant)

Report issues on the project's issue tracker or contact the development team directly.

ChangeLog 

Version 0.2.0 - 0.2.1 

Release Date: 2026-01-22

Changes 

Added

  • Unit tests for ConfigurationService with full coverage (24 tests)
  • TYPO3 v14 LTS support in test runner scripts
  • Comprehensive test suite runnable across TYPO3 12, 13, and 14

Changed

  • Refactored ConfigurationService to use TYPO3 core request injection pattern
  • Request is now injected once via setRequest() instead of passing to each method
  • Added fallback to $GLOBALS['TYPO3_REQUEST'] for backward compatibility
  • Improved code architecture following TYPO3 ContentObjectRenderer pattern

Removed

  • Removed DownloadMarkdownCommand from Services.yaml (command not yet implemented)

Version 0.1.8 - 0.1.9 

Release Date: 2025-12-30

Latest Changes 

Fixed

  • Corrected the exclusion of specific doktypes in PageRepository to ensure proper fetching of child pages.

Version 0.1.7 

Release Date: 2025-12-18

Latest Changes 

Changed

  • Update PageRepository to exclude specific doktypes and fetch their children

Release Date: 2025-10-29

Initial Release 

This is the initial release of the LLMS TXT Generator extension for TYPO3 v13.

New Features 

Core Functionality

  • Complete llms.txt generation according to the llmstxt.org specification
  • Automatic site navigation structure inclusion with configurable depth
  • Page-to-Markdown conversion for any TYPO3 page via .md suffix

Configuration Options

  • TypoScript-based configuration for all settings
  • Configurable navigation depth (maxDepth setting)
  • Custom title and description overrides
  • Keywords/topics configuration for site metadata
  • Contact email specification for AI systems
  • Additional information text support

Technical Implementation

  • Built specifically for TYPO3 v13 using modern PHP 8.2+ practices
  • Service-oriented architecture with dependency injection
  • Proper separation of concerns with dedicated services for each responsibility
  • HTML-to-Markdown conversion using league/html-to-markdown library
  • Route enhancers for user-friendly URLs (.md suffix and llms.txt endpoints)

Documentation

  • Complete documentation following TYPO3 standards
  • Administrator installation and configuration guide
  • Editor usage guidelines
  • Developer API reference and extension guide
  • Configuration examples and best practices

System Requirements 

  • TYPO3 CMS 13.0 or higher
  • PHP 8.2 or higher
  • league/html-to-markdown ^5.1 (automatically installed via Composer)

Breaking Changes 

This is the initial release, so no breaking changes apply.

Known Issues 

  • Large sites (1000+ pages) may experience performance impacts during navigation generation
  • Complex HTML structures may not convert perfectly to Markdown format
  • Some web servers may require configuration for .well-known directory access

Migration Notes 

This is a new extension, so no migration is required.

Deprecations 

No deprecations in this initial release.

Credits 

  • Development: web-vision GmbH
  • Based on the llmstxt.org specification for AI-readable content guidelines
  • Uses league/html-to-markdown for HTML-to-Markdown conversion

Installation 

Install via Composer:

composer require web-vision/ai-llms-txt

After installation, the extension is ready to use with default settings. To access the generated content, visit ?type=1699 (llms.txt) or ?type=1701 (Markdown), or /.well-known/llms.txt if you have configured the route enhancers.

For detailed configuration options, see the Configuration section of this documentation.