Developer manual

This chapter describes some internals of this extension to let you extend it easily.

Assets such as PDFs, images, and other documents are uploaded to TYPO3. Metadata extraction services are then called, one after another, based on their advertised priority or quality. These services are the various extraction classes you find under Classes/Service/Extraction/.

The service classes invoke the wrappers around the actual extraction tools (Apache Tika, ExifTool, PHP, ...), which are found under Classes/Service/Wrapper/.

To map the data format used by each extraction tool to the FAL metadata structure used by TYPO3, JSON-based mapping configuration files are used. They can be found under Configuration/Services/Wrapper/.

Overview of the workflow of metadata extraction in TYPO3 when using this extension.

JSON mapping configuration file

A mapping configuration file is of the form:

[
  {
    "FAL": "caption",
    "DATA": "CaptionAbstract"
  },
  {
    "FAL": "color_space",
    "DATA": [
      "ColorMode",
      "ColorSpaceData",
      "ColorSpace->Causal\\Extractor\\Utility\\ColorSpace::normalize"
    ]
  }
]

FAL

This is the name (column) of the metadata in FAL.

DATA

This is either a single key or an ordered array of keys, checked in order for content in the extracted metadata. In addition, an arbitrary post-processor may be appended to a key using the -> notation, as in ColorSpace->Causal\Extractor\Utility\ColorSpace::normalize in the example above.
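
To make the lookup rules concrete, here is a minimal sketch of how such a mapping could be resolved against a flat array of extracted metadata. It is not the extension's own implementation: the function resolveMapping() and the class DemoColorSpace are hypothetical names introduced for illustration, and the sketch assumes that the first key with non-empty content wins and that the post-processor is a static method taking the raw value.

```php
<?php
declare(strict_types=1);

// Hypothetical post-processor, standing in for a class such as
// Causal\Extractor\Utility\ColorSpace.
final class DemoColorSpace
{
    public static function normalize(string $value): string
    {
        return strtoupper(trim($value));
    }
}

/**
 * Illustrative only: resolves a decoded mapping configuration against
 * extracted metadata and returns FAL field => value pairs.
 */
function resolveMapping(array $mapping, array $extracted): array
{
    $fal = [];
    foreach ($mapping as $rule) {
        // "DATA" is either a single key or an ordered list of candidate keys.
        foreach ((array)$rule['DATA'] as $candidate) {
            // Split an optional post-processor off the key: "Key->Class::method".
            $parts = explode('->', $candidate, 2);
            $key = $parts[0];
            $processor = $parts[1] ?? null;

            if (!isset($extracted[$key]) || $extracted[$key] === '') {
                continue; // no content under this key, try the next candidate
            }

            $value = $extracted[$key];
            if ($processor !== null && is_callable($processor)) {
                $value = call_user_func($processor, $value);
            }
            $fal[$rule['FAL']] = $value;
            break; // assumption: the first key with content wins
        }
    }
    return $fal;
}

$json = '[
  {"FAL": "caption", "DATA": "CaptionAbstract"},
  {"FAL": "color_space", "DATA": ["ColorMode", "ColorSpaceData", "ColorSpace->DemoColorSpace::normalize"]}
]';
$mapping = json_decode($json, true);
$extracted = ['CaptionAbstract' => 'Sunset', 'ColorSpace' => ' srgb '];

$result = resolveMapping($mapping, $extracted);
// $result is ['caption' => 'Sunset', 'color_space' => 'SRGB']
```

Here neither ColorMode nor ColorSpaceData is present, so the third candidate is used and its value is passed through the post-processor.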

Configuration Helper Tool

In TYPO3 versions prior to v11, a configuration helper tool is available in the Extension Manager.

Hook

The method \Causal\Extractor\Service\Extraction\AbstractExtractionService::getDataMapping() is the central method invoked to map extracted metadata to FAL properties. Developers may dynamically alter the mapping by hooking into the process using $GLOBALS['TYPO3_CONF_VARS']['EXTCONF']['extractor']['dataMappingHook'].
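
A registration could look like the following sketch, placed for instance in ext_localconf.php of your extension. It assumes that the hook follows the common TYPO3 convention of an array of class names. The class \VENDOR\MyExtension\Hook\DataMappingHook is hypothetical, and the method it must implement is not described here, so check getDataMapping() for the expected signature.

```php
<?php
// Assumption: the hook is an array of class names, as is conventional
// for TYPO3 hooks. The class below is a hypothetical class of your own
// extension; its required method is defined by getDataMapping().
$GLOBALS['TYPO3_CONF_VARS']['EXTCONF']['extractor']['dataMappingHook'][]
    = \VENDOR\MyExtension\Hook\DataMappingHook::class;
```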

Signal after extraction

Once the metadata has been extracted, a signal is emitted, which allows other extensions to process the file further. The signal can be connected to a slot as follows (e.g., in file ext_localconf.php of your extension):

Registration in TYPO3 v8 and v9

// Instantiate the SignalSlotDispatcher
$signalSlotDispatcher = \TYPO3\CMS\Core\Utility\GeneralUtility::makeInstance(
    \TYPO3\CMS\Extbase\SignalSlot\Dispatcher::class
);

// Connect the Signal "postMetaDataExtraction" to a Slot
$signalSlotDispatcher->connect(
    \Causal\Extractor\Service\AbstractService::class,
    'postMetaDataExtraction',
    \VENDOR\MyExtension\Service\Extractor::class,
    'enhanceMetadata'
);

This requires a PHP class \VENDOR\MyExtension\Service\Extractor and a method enhanceMetadata() in this class:

<?php
namespace VENDOR\MyExtension\Service;

use TYPO3\CMS\Core\Resource\FileInterface;

class Extractor
{
    public function enhanceMetadata(FileInterface $file, array &$metadata): void
    {
        // your code
    }
}

Registration since TYPO3 v10

The signal slot dispatcher is deprecated since TYPO3 v10. You should instead register a PSR-14 event listener by creating file Configuration/Services.yaml within your extension:

services:
  _defaults:
    autowire: true
    autoconfigure: true
    public: false

  VENDOR\MyExtension\EventListener\ExtractorEventListener:
    tags:
      - name: event.listener
        identifier: 'causal/extractor'
        method: 'postMetaDataExtraction'
        event: Causal\Extractor\Resource\Event\AfterMetadataExtractedEvent

Caution

Be sure to open module Admin Tools > Maintenance and flush the TYPO3 and PHP caches when you register event listeners.

This requires a PHP class \VENDOR\MyExtension\EventListener\ExtractorEventListener and a method postMetaDataExtraction() in this class:

<?php
namespace VENDOR\MyExtension\EventListener;

use Causal\Extractor\Resource\Event\AfterMetadataExtractedEvent;

class ExtractorEventListener
{
    public function postMetaDataExtraction(AfterMetadataExtractedEvent $event): void
    {
        // your code
    }
}

Associated TYPO3 categories

By default, TYPO3 categories are automatically assigned using keywords found in the metadata, because the mapping associates them with the special FAL field __categories__. This virtual field expects a comma-separated list of TYPO3 category titles.

Since version 2.1.0, there is another special FAL field, __category_uids__, which works similarly but expects a comma-separated list of category uids instead. You would typically use the signal/event described above to extend the extracted metadata with custom business logic.

A real-life example is to take the geographical coordinates (latitude/longitude) and send them to the Google reverse geocoding service, which translates them into a human-readable address. The result can then populate the fields "location", "region" and "country", and possibly be used to assign geography-related TYPO3 categories based on the API output.
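
The last step of such custom logic, handing category uids back through the virtual field, might be sketched as follows. The helper withCategoryUids() is a hypothetical name, and merging with an already present value is an assumption of this sketch rather than documented behavior of the extension. It only builds the comma-separated list and would be called from your slot or event listener.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: adds category uids to the virtual
 * __category_uids__ field as a comma-separated list.
 */
function withCategoryUids(array $metadata, array $uids): array
{
    $existing = [];
    if (!empty($metadata['__category_uids__'])) {
        $existing = explode(',', (string)$metadata['__category_uids__']);
    }

    // Normalize to integers, drop duplicates and invalid (non-positive) uids.
    $merged = array_unique(array_map('intval', array_merge($existing, $uids)));
    $merged = array_filter($merged, static function (int $uid): bool {
        return $uid > 0;
    });

    $metadata['__category_uids__'] = implode(',', $merged);
    return $metadata;
}

$metadata = withCategoryUids(
    ['location' => 'Bern', '__category_uids__' => '12'],
    [34, 12]
);
// $metadata['__category_uids__'] is now "12,34"
```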