Developer manual¶
This chapter describes some internals of this extension to let you extend it easily.
Assets such as PDF, images, documents, … are uploaded to TYPO3. Metadata
extraction services are called, one after another, based on their advertised
priority or quality. These services are the various extraction classes you find
under Classes/Service/Extraction/
).
The service classes invoke the actual wrappers to the extraction tools (Apache
Tika, ExifTool, PHP, …) to be found under Classes/Service/Wrapper/
.
In order to map the data format used by the various extraction tools to the FAL
metadata structure used by TYPO3, a JSON-based configuration file is used. Those
mapping configuration files can be found under
Configuration/Services/Wrapper/
.
JSON mapping configuration file¶
A mapping configuration file is of the form:
[
{
"FAL": "caption",
"DATA": "CaptionAbstract"
},
{
"FAL": "color_space",
"DATA": [
"ColorMode",
"ColorSpaceData",
"ColorSpace->Causal\\Extractor\\Utility\\ColorSpace::normalize"
]
}
]
- FAL
- This is the name (column) of the metadata in FAL.
- DATA
- This is either a unique key or an array of ordered keys to be checked for
content in the extracted metadata. In addition, an arbitrary post-processor
may be specified using the
->
array notation.
Hook¶
The method \Causal\Extractor\Service\Extraction\AbstractExtractionService::getDataMapping()
is the central method invoked to map extracted metadata to FAL properties.
Developers may dynamically alter the mapping by hooking into the process using
$GLOBALS['TYPO3_CONF_VARS']['EXTCONF']['extractor']['dataMappingHook']
.
Signal after extraction¶
Once the meta data has been extracted, a signal is emitted, which allows other extensions to process the file further. The Signal can be connected to a Slot as follows (e.g., in file file:ext_localconf.php of your extension).
Registration in TYPO3 v8 and v9
// Initiate SignalSlotDispatcher
$signalSlotDispatcher = \TYPO3\CMS\Core\Utility\GeneralUtility::makeInstance(
\TYPO3\CMS\Extbase\SignalSlot\Dispatcher::class
);
// Connect the Signal "postMetaDataExtraction" to a Slot
$signalSlotDispatcher->connect(
\Causal\Extractor\Service\AbstractService::class,
'postMetaDataExtraction',
\VENDOR\MyExtension\Service\Extractor::class,
'enhanceMetadata'
);
This requires a PHP class \VENDOR\MyExtension\Service\Extractor
and a
method enhanceMetadata()
in this class:
<?php
namespace VENDOR\MyExtension\Service;
use TYPO3\CMS\Core\Resource\FileInterface;
class Extractor
{
public function enhanceMetadata(FileInterface $file, array &$metadata): void
{
// your code
}
}
Registration since TYPO3 v10
The signal slot dispatcher is deprecated since TYPO3 v10 and you should instead
register a middleware by creating file Configuration/Services.yaml
within your extension:
services:
_defaults:
autowire: true
autoconfigure: true
public: false
VENDOR\MyExtension\EventListener\ExtractorEventListener:
tags:
- name: event.listener
identifier: 'causal/extractor'
method: 'postMetaDataExtraction'
event: Causal\Extractor\Resource\Event\AfterMetadataExtractedEvent
Caution
Be sure to module Admin Tools > Maintenance and to flush the TYPO3 and PHP Cache when you register middlewares.
This requires a PHP class
\VENDOR\MyExtension\EventListener\ExtractorEventListener
and a method
enhanceMetadata()
in this class:
<?php
namespace VENDOR\MyExtension\EventListener;
use Causal\Extractor\Resource\Event\AfterMetadataExtractedEvent;
class Extractor
{
public function postMetaDataExtraction(AfterMetadataExtractedEvent $event): void
{
// your code
}
}
Associated TYPO3 categories¶
By default TYPO3 categories are automatically assigned using keywords found in
the metadata due to the mapping associating them to the special FAL field
__categories__
. This virtual field expects a comma-separated list of TYPO3
category titles.
Since version 2.1.0, we added another special FAL field __category_uids__
which works similarly but expecting a comma-separated list of category uids
instead. One would use the signal/event and expand extracted metadata with a
custom business logic.
An real-life example is using the geographical coordinates latitude/longitude, send them to the Google reverse geocoding service to translate them into a human-readable address and thus populating the fields “location”, “region” and “country” and possibly assign geographical-related TYPO3 categories based on the API output.