Developer manual

This chapter describes some internals of this extension to let you extend it easily.

Assets such as PDF, images, documents, … are uploaded to TYPO3. Metadata extraction services are then called one after another, in the order given by their advertised priority or quality. These services are the various extraction classes you find under Classes/Service/Extraction/.

The service classes invoke the actual wrappers to the extraction tools (Apache Tika, ExifTool, PHP, …) to be found under Classes/Service/Wrapper/.

In order to map the data format used by the various extraction tools to the FAL metadata structure used by TYPO3, a JSON-based configuration file is used. Those mapping configuration files can be found under Configuration/Services/Wrapper/.

Figure: Overview of the workflow of metadata extraction in TYPO3 when using this extension.

JSON mapping configuration file

A mapping configuration file is of the form:

[
  {
    "FAL": "caption",
    "DATA": "CaptionAbstract"
  },
  {
    "FAL": "color_space",
    "DATA": [
      …
    ]
  }
]

"FAL": the name (column) of the metadata in FAL.

"DATA": either a single key or an array of ordered keys to be checked for content in the extracted metadata. In addition, an arbitrary post-processor may be specified using the -> notation.
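
To illustrate how such an entry might be resolved, here is a minimal sketch. It is not the extension's actual implementation: the function resolveMappingEntry() and the sample data are hypothetical, and it assumes that the first key with non-empty content wins and that the part after -> names a callable applied to the value.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: resolves one mapping entry against extracted metadata.
 * Assumes the first key with non-empty content wins and that "key->callable"
 * applies the callable to the value found under "key".
 */
function resolveMappingEntry(array $entry, array $extracted): ?array
{
    // "DATA" is either a single key or an ordered list of keys
    foreach ((array)$entry['DATA'] as $dataKey) {
        $processor = null;
        if (strpos($dataKey, '->') !== false) {
            [$dataKey, $processor] = explode('->', $dataKey, 2);
        }
        if (!isset($extracted[$dataKey]) || $extracted[$dataKey] === '') {
            continue; // no content under this key, try the next one
        }
        $value = $extracted[$dataKey];
        if ($processor !== null && is_callable($processor)) {
            $value = call_user_func($processor, $value);
        }
        return [$entry['FAL'] => $value];
    }
    return null;
}

// Made-up sample data; strtoupper stands in for a real post-processor
$extracted = ['CaptionAbstract' => 'Sunset over the lake', 'ColorSpace' => 'srgb'];
$mapping = [
    ['FAL' => 'caption', 'DATA' => 'CaptionAbstract'],
    ['FAL' => 'color_space', 'DATA' => ['Missing', 'ColorSpace->strtoupper']],
];

$falMetadata = [];
foreach ($mapping as $entry) {
    $falMetadata += resolveMappingEntry($entry, $extracted) ?? [];
}
print_r($falMetadata);
```

In this sketch the key "Missing" is skipped because the extracted metadata holds nothing under it, so "ColorSpace" is used and its value is passed through the post-processor.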

Configuration Helper Tool

A configuration helper tool is available in Extension Manager.


The method \Causal\Extractor\Service\Extraction\AbstractExtractionService::getDataMapping() is the central method invoked to map extracted metadata to FAL properties. Developers may dynamically alter the mapping by hooking into the process using $GLOBALS['TYPO3_CONF_VARS']['EXTCONF']['extractor']['dataMappingHook'].

Signal after extraction

Once the metadata has been extracted, a signal is emitted, which allows other extensions to process the file further. The Signal can be connected to a Slot as follows (e.g., in the file ext_localconf.php of your extension):

Registration in TYPO3 v8 and v9

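// Initiate SignalSlotDispatcher
$signalSlotDispatcher = \TYPO3\CMS\Core\Utility\GeneralUtility::makeInstance(
    \TYPO3\CMS\Extbase\SignalSlot\Dispatcher::class
);

// Connect the Signal "postMetaDataExtraction" to a Slot
$signalSlotDispatcher->connect(
    // Emitting class: an assumption, verify it against the extension source
    \Causal\Extractor\Service\Extraction\AbstractExtractionService::class,
    'postMetaDataExtraction',
    \VENDOR\MyExtension\Service\Extractor::class,
    'enhanceMetadata'
);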

This requires a PHP class \VENDOR\MyExtension\Service\Extractor and a method enhanceMetadata() in this class:

namespace VENDOR\MyExtension\Service;

use TYPO3\CMS\Core\Resource\FileInterface;

class Extractor
{
    public function enhanceMetadata(FileInterface $file, array &$metadata): void
    {
        // your code
    }
}

Registration since TYPO3 v10

The signal slot dispatcher is deprecated since TYPO3 v10 and you should instead register a PSR-14 event listener by creating the file Configuration/Services.yaml within your extension:

services:
  _defaults:
    autowire: true
    autoconfigure: true
    public: false

  VENDOR\MyExtension\EventListener\ExtractorEventListener:
    tags:
      - name: event.listener
        identifier: 'causal/extractor'
        method: 'postMetaDataExtraction'
        event: Causal\Extractor\Resource\Event\AfterMetadataExtractedEvent


Be sure to go to the module Admin Tools > Maintenance and flush the TYPO3 and PHP caches whenever you register event listeners.

This requires a PHP class \VENDOR\MyExtension\EventListener\ExtractorEventListener and a method postMetaDataExtraction() in this class:

namespace VENDOR\MyExtension\EventListener;

use Causal\Extractor\Resource\Event\AfterMetadataExtractedEvent;

class ExtractorEventListener
{
    public function postMetaDataExtraction(AfterMetadataExtractedEvent $event): void
    {
        // your code
    }
}

Associated TYPO3 categories

By default, TYPO3 categories are automatically assigned using keywords found in the metadata, because the mapping associates those keywords with the special FAL field __categories__. This virtual field expects a comma-separated list of TYPO3 category titles.

Since version 2.1.0, there is another special FAL field, __category_uids__, which works similarly but expects a comma-separated list of category uids instead. One would use the signal/event to expand the extracted metadata with custom business logic.
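
As a rough illustration of such business logic, the following sketch fills __category_uids__ from a keyword lookup. The function addCategoryUids(), the "keywords" metadata key and the uids in the lookup table are assumptions made up for this example. In a real slot or listener you would run similar code against the metadata array you receive.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical business logic: derives category uids from extracted keywords.
 * The lookup table and its uids are made up for this example.
 */
function addCategoryUids(array &$metadata, array $uidsByKeyword): void
{
    $keywords = array_map('trim', explode(',', (string)($metadata['keywords'] ?? '')));
    $uids = [];
    foreach ($keywords as $keyword) {
        $key = strtolower($keyword);
        if (isset($uidsByKeyword[$key])) {
            $uids[] = $uidsByKeyword[$key];
        }
    }
    if ($uids !== []) {
        // Comma-separated list of category uids, as expected by the virtual field
        $metadata['__category_uids__'] = implode(',', array_unique($uids));
    }
}

$metadata = ['title' => 'Harbour', 'keywords' => 'Boats, Sea, boats'];
addCategoryUids($metadata, ['boats' => 12, 'sea' => 7]);
echo $metadata['__category_uids__'], PHP_EOL;
```

Keywords without a matching entry are ignored, and the field is left unset when nothing matches, so the default title-based assignment is not disturbed.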

A real-life example is to take the geographical coordinates (latitude/longitude), send them to the Google reverse geocoding service to translate them into a human-readable address, populate the fields “location”, “region” and “country” with the result, and possibly assign geography-related TYPO3 categories based on the API output.
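
The idea can be sketched as follows, with the geocoding service replaced by a stub so that the example runs without network access. Everything here is an assumption for illustration: the function addLocation(), the shape of the array returned by the callable, and the metadata keys "latitude", "longitude", "location_city", "location_region" and "location_country", which you should adapt to your own mapping.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical enrichment step. $reverseGeocode stands in for a call to a
 * reverse geocoding service and must return an array with the keys
 * "city", "region" and "country", or null if nothing was found.
 */
function addLocation(array &$metadata, callable $reverseGeocode): void
{
    if (!isset($metadata['latitude'], $metadata['longitude'])) {
        return; // no coordinates extracted, nothing to do
    }
    $address = $reverseGeocode((float)$metadata['latitude'], (float)$metadata['longitude']);
    if ($address === null) {
        return;
    }
    // Assumed metadata keys, adapt them to your mapping
    $metadata['location_city'] = $address['city'];
    $metadata['location_region'] = $address['region'];
    $metadata['location_country'] = $address['country'];
}

// Stub returning fixed data instead of performing a real HTTP request
$stub = static function (float $latitude, float $longitude): ?array {
    return ['city' => 'Fribourg', 'region' => 'FR', 'country' => 'Switzerland'];
};

$metadata = ['latitude' => '46.8065', 'longitude' => '7.1620'];
addLocation($metadata, $stub);
echo $metadata['location_country'], PHP_EOL;
```

Passing the geocoder in as a callable keeps the enrichment logic testable and lets you swap the service without touching the slot or listener.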