Configuration 

All settings for the extension are made in the TYPO3 Extension Configuration module.

Extension configuration for EXT:tika

Extractor 

Select the service you would like to use:

  • Tika App (not recommended)
  • Tika Server (recommended)
  • Solr Server

Depending on your choice, configure the necessary settings for that service on the corresponding settings tab.

About Tika variants 

Each variant has its advantages and its drawbacks.

Solr Cell - variant 

The Apache Solr Content Extraction Library (Solr Cell) variant does not support all the features of the App and Server variants, but it does not require running and maintaining an additional service or stack if EXT:solr is already configured. Any connection or core used by EXT:solr can be reused. Possible implications are described on the Apache Solr docs page.

Enable Logging 

Enables logging of extraction actions.

Show Tika Backend Module 

Enables a Tika module within the Solr backend module (experimental; only works with Tika Server and will be removed).

Exclude mime types 

Expects a list of MIME types to exclude from metadata extraction.
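
To illustrate how such an exclusion list might be applied, here is a minimal Python sketch. It is not the extension's own code (EXT:tika is written in PHP). The comma-separated format, the function names, and the sample values are assumptions made for this example only.

```python
def parse_excluded_mime_types(setting: str) -> set:
    """Split a comma-separated setting into a normalized set of MIME types.

    Assumption: the list is comma-separated; surrounding whitespace and
    letter case are ignored.
    """
    return {part.strip().lower() for part in setting.split(",") if part.strip()}


def is_excluded(mime_type: str, excluded: set) -> bool:
    """Return True if metadata extraction should be skipped for this MIME type."""
    return mime_type.strip().lower() in excluded


# Hypothetical setting value, for illustration only.
excluded = parse_excluded_mime_types("image/svg+xml, application/zip")
print(is_excluded("application/zip", excluded))
print(is_excluded("application/pdf", excluded))
```

With the hypothetical value above, a ZIP archive would be skipped, while a PDF would still be passed on for metadata extraction.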

File size limit... 

Expects the file size limit in MB; files above the limit are not processed. (Default: 500)
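
The effect of a limit given in MB can be sketched as follows. This is a Python illustration under stated assumptions, not the extension's implementation. It assumes that 1 MB is counted as 1024 * 1024 bytes and that a file exactly at the limit is still processed. The function name is hypothetical.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def within_size_limit(size_bytes: int, limit_mb: int = 500) -> bool:
    """Return True if a file of `size_bytes` is no larger than `limit_mb` MB.

    Assumption: the comparison is inclusive, so a file exactly at the
    limit is still processed.
    """
    return size_bytes <= limit_mb * BYTES_PER_MB


# A 2 MB file passes the default limit of 500 MB but not a limit of 1 MB.
two_mb = 2 * BYTES_PER_MB
print(within_size_limit(two_mb))
print(within_size_limit(two_mb, limit_mb=1))
```

In practice the size would come from the file system, for example via `os.path.getsize(path)`.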

Enable meta data extraction 

Enables the MetaDataExtractor, including the LanguageDetector, if available. (Default: true) Disabling it can be useful with frequent file movements, with mass file processing, or if metadata must not be overridden.