Developer manual
This chapter describes some internals of this extension to let you extend it easily.
Assets such as PDFs, images, and documents are uploaded to TYPO3. Metadata
extraction services are then called, one after another, based on their advertised
priority or quality. These services are the various extraction classes you find
under Classes/.
The service classes invoke the actual wrappers to the extraction tools (Apache
Tika, ExifTool, PHP, ...) found under Classes/.
To map the data format used by the various extraction tools to the FAL
metadata structure used by TYPO3, JSON-based configuration files are used. These
mapping configuration files can be found under Configuration/.

Overview of the workflow of metadata extraction in TYPO3 when using this extension.
JSON mapping configuration file
A mapping configuration file is of the form:
[
    {
        "FAL": "caption",
        "DATA": "CaptionAbstract"
    },
    {
        "FAL": "color_space",
        "DATA": [
            "ColorSpaceData",
            "ColorSpace->Causal\\Extractor\\Utility\\ColorSpace::normalize"
        ]
    }
]
- FAL
- This is the name (column) of the metadata in FAL.
- DATA
- This is either a single key or an array of ordered keys to be checked, in
order, for content in the extracted metadata. In addition, an arbitrary
post-processor may be specified by appending it to a key with the -> notation,
as in the second entry above.
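To illustrate how such a mapping could be evaluated, here is a minimal sketch in plain PHP. It is not the extension's actual implementation: the function resolveMapping() and the class ColorSpaceDemo are hypothetical names introduced for this example, and the sketch assumes that the first key with non-empty content wins.

```php
<?php
// Hypothetical post-processor standing in for a class such as
// Causal\Extractor\Utility\ColorSpace (its real behavior is not shown here).
final class ColorSpaceDemo
{
    public static function normalize(string $value): string
    {
        return strtoupper(trim($value));
    }
}

// Illustrative resolver (an assumption, not the extension's code):
// for each rule, take the first DATA key that has content and
// optionally run the post-processor given after "->".
function resolveMapping(array $mapping, array $extracted): array
{
    $fal = [];
    foreach ($mapping as $rule) {
        foreach ((array)$rule['DATA'] as $key) {
            $processor = null;
            if (strpos($key, '->') !== false) {
                [$key, $processor] = explode('->', $key, 2);
            }
            if (!isset($extracted[$key]) || $extracted[$key] === '') {
                continue; // no content under this key, try the next one
            }
            $value = $extracted[$key];
            if ($processor !== null && is_callable($processor)) {
                $value = call_user_func($processor, $value);
            }
            $fal[$rule['FAL']] = $value;
            break; // first key with content wins
        }
    }
    return $fal;
}

$mapping = json_decode(
    '[{"FAL": "caption", "DATA": "CaptionAbstract"},'
    . ' {"FAL": "color_space", "DATA": ["ColorSpaceData", "ColorSpace->ColorSpaceDemo::normalize"]}]',
    true
);
$result = resolveMapping($mapping, ['CaptionAbstract' => 'Sunset', 'ColorSpace' => ' srgb ']);
// $result is ['caption' => 'Sunset', 'color_space' => 'SRGB']
```

Here "ColorSpaceData" is absent from the extracted data, so the resolver falls through to "ColorSpace" and applies the post-processor to its value.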

For TYPO3 versions prior to v11, a configuration helper tool is available in the Extension Manager.
Hook
The method \Causal\Extractor\Service\Extraction\AbstractExtractionService::getDataMapping()
is the central method invoked to map extracted metadata to FAL properties.
Developers may dynamically alter the mapping by hooking into the process using
$GLOBALS['TYPO3_CONF_VARS']['EXTCONF']['extractor']['dataMappingHook'].
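The following sketch shows the general shape such a hook registration could take. It relies on assumptions that are not confirmed by this manual: that the hook entry is a list of class names, and that the hook object exposes a method receiving the mapping by reference. The class DataMappingHookDemo and the method postProcessDataMapping() are hypothetical; check getDataMapping() for the method name and arguments it actually uses.

```php
<?php
// Hypothetical hook class: class name, method name and signature are
// assumptions made for this sketch, not the extension's documented contract.
final class DataMappingHookDemo
{
    public function postProcessDataMapping(array &$dataMapping): void
    {
        // Append a rule mapping a hypothetical "Copyright" key to a FAL field.
        $dataMapping[] = ['FAL' => 'copyright', 'DATA' => 'Copyright'];
    }
}

// Registration (assumed to be a list of class names).
$GLOBALS['TYPO3_CONF_VARS']['EXTCONF']['extractor']['dataMappingHook'][] = DataMappingHookDemo::class;

// Stand-in for the caller: run every registered hook over the mapping.
$dataMapping = [['FAL' => 'caption', 'DATA' => 'CaptionAbstract']];
foreach ($GLOBALS['TYPO3_CONF_VARS']['EXTCONF']['extractor']['dataMappingHook'] as $className) {
    (new $className())->postProcessDataMapping($dataMapping);
}
```

The loop at the end only simulates a dispatch so that the sketch runs on its own; in a real installation the extension performs that step.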
Signal after extraction
Once the metadata has been extracted, a signal is emitted, which allows other
extensions to process the file further. The signal can be connected to a slot as
follows (e.g., in the file ext_ of your extension).
Registration in TYPO3 v8 and v9
// Instantiate the SignalSlot dispatcher
$signalSlotDispatcher = \TYPO3\CMS\Core\Utility\GeneralUtility::makeInstance(
    \TYPO3\CMS\Extbase\SignalSlot\Dispatcher::class
);
// Connect the signal "postMetaDataExtraction" to a slot
$signalSlotDispatcher->connect(
    \Causal\Extractor\Service\AbstractService::class,
    'postMetaDataExtraction',
    \VENDOR\MyExtension\Service\Extractor::class,
    'enhanceMetadata'
);
This requires a PHP class \VENDOR\MyExtension\Service\Extractor with a
method enhanceMetadata():
<?php
namespace VENDOR\MyExtension\Service;

use TYPO3\CMS\Core\Resource\FileInterface;

class Extractor
{
    public function enhanceMetadata(FileInterface $file, array &$metadata): void
    {
        // your code
    }
}
Registration since TYPO3 v10
The signal slot dispatcher is deprecated since TYPO3 v10; you should instead
register a PSR-14 event listener by creating the file Configuration/
within your extension:
services:
  _defaults:
    autowire: true
    autoconfigure: true
    public: false

  VENDOR\MyExtension\EventListener\ExtractorEventListener:
    tags:
      - name: event.listener
        identifier: 'causal/extractor'
        method: 'postMetaDataExtraction'
        event: Causal\Extractor\Resource\Event\AfterMetadataExtractedEvent
Caution
Whenever you register an event listener, be sure to open the module Admin Tools > Maintenance and flush the TYPO3 and PHP caches.
This requires a PHP class
\VENDOR\MyExtension\EventListener\ExtractorEventListener
with a method postMetaDataExtraction():
<?php
namespace VENDOR\MyExtension\EventListener;

use Causal\Extractor\Resource\Event\AfterMetadataExtractedEvent;

// The class name must match the service registered in the YAML configuration.
class ExtractorEventListener
{
    public function postMetaDataExtraction(AfterMetadataExtractedEvent $event): void
    {
        // your code
    }
}
Associated TYPO3 categories
By default, TYPO3 categories are automatically assigned using keywords found in
the metadata, because the mapping associates them with the special FAL field
__categories__. This virtual field expects a comma-separated list of TYPO3
category titles.
Since version 2.1.0, another special FAL field, __category_uids__, is
available. It works similarly but expects a comma-separated list of category
uids instead. One would use the signal/event to extend the extracted metadata
with custom business logic.
A real-life example is to take the geographical coordinates (latitude/longitude) and send them to the Google reverse geocoding service, which translates them into a human-readable address. The result can populate the fields "location", "region" and "country", and possibly assign geography-related TYPO3 categories based on the API output.
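As a minimal sketch of the last step, the plain PHP function below adds a category uid to __category_uids__ based on the "country" field. The function name assignCategoryUids(), the lookup table and its uids are hypothetical and chosen for illustration only; the call to the geocoding service is left out, and the sketch assumes "country" has already been populated.

```php
<?php
// Hypothetical helper: the country-to-uid lookup table is an assumption;
// a real project would resolve uids from its own category records.
function assignCategoryUids(array &$metadata, array $countryToCategoryUid): void
{
    $country = $metadata['country'] ?? '';
    if ($country === '' || !isset($countryToCategoryUid[$country])) {
        return; // nothing to assign
    }
    // Keep uids that are already present, dropping empty entries.
    $uids = array_filter(
        explode(',', (string)($metadata['__category_uids__'] ?? '')),
        'strlen'
    );
    $uids[] = (string)$countryToCategoryUid[$country];
    // The virtual field expects a comma-separated list of category uids.
    $metadata['__category_uids__'] = implode(',', array_unique($uids));
}

$metadata = ['country' => 'Switzerland', '__category_uids__' => '3'];
assignCategoryUids($metadata, ['Switzerland' => 12, 'France' => 14]);
// $metadata['__category_uids__'] is now "3,12"
```

Such a helper could be called from the slot or event listener described above, once the metadata array is available there.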