Apache Tika for TYPO3 

Extension key

tika

Package name

apache-solr-for-typo3/tika

Version

12.0

Language

en

Author

Ingo Renner, Markus Friedrich, Rafael Kähm, Timo Hund & Contributors

License

This document is published under the Open Publication License.

Rendered

Tue, 25 Nov 2025 17:30:43 +0000


Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

Tika can detect about 1200 file formats and can read about half of them. These include the most common ones: HTML, XML (including RSS and Atom feeds), Microsoft Office (binary formats and OOXML), OpenDocument (OpenOffice.org), Apple iWork, PDF, EPUB, RTF, compressed formats like ZIP, audio formats including MP3, Flash Video (FLV), image formats including JPEG and TIFF, the mbox mailbox format, and many more.

Apache Tika for TYPO3 provides three services to retrieve information from files:


  • Text extraction
  • Language detection of file contents
  • Metadata extraction

All three services can be used with FAL.

It is recommended to use Apache Tika version 1.28 or higher.

Introduction 

What does it do? 

This extension is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. By default it automatically enriches the metadata of files managed by TYPO3 FAL.

In combination with EXT:solrfal, this extension makes it possible to index and search the contents of files managed by TYPO3 FAL.

EXT:tika uses Apache Tika as its backing service for extraction: it sends the FAL-managed files to the Tika application and uses the response to enrich the data in TYPO3.

Besides that, EXT:tika provides a public API and a toolset for developers to communicate with Apache Tika.

Configuration 

All the settings for the extension can be made through the TYPO3 Extension Configuration module.

Extension configuration for EXT:tika

Extractor 

Select the service you would like to use:

  • Tika App (not recommended)
  • Tika Server (recommended)
  • Solr Server

Depending on your choice, configure the necessary settings for that service on the corresponding settings tab.

About Tika variants 

Each variant has its advantages and its drawbacks.

Solr Cell - variant 

The Apache Solr Content Extraction Library (Solr Cell) variant does not support all features of the App and Server variants, but it does not require running and maintaining an additional service if EXT:solr is already configured. Any connection/core used by EXT:solr can be reused. Possible implications are described on the Apache Solr docs page.

Enable Logging 

Enables the logging for extraction actions.

Show Tika Backend Module 

Enables a Tika module within the Solr backend module (experimental, only works with Tika Server, and will be removed).

Exclude mime types 

Expects a list of MIME types to be excluded from metadata extraction.

File size limit... 

Expects the maximum file size in MB up to which a file is processed; larger files are skipped (default: 500).

Enable meta data extraction 

Enables the MetaDataExtractor, including the LanguageDetector if available (default: true). Disabling it is useful for frequent file movements, for mass file processing, or when metadata must not be overridden.
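
The same general settings can presumably also be set in code instead of the backend module. The following is a minimal sketch, assuming the usual TYPO3 mechanism of storing extension configuration under $GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS'] in an additional configuration file. The key fileSizeLimit is named in the 3.0.0 release notes; the keys extractor and excludeMimeTypes and the value server are assumptions and should be checked against the extension's configuration template before use.

```php
<?php
// Sketch only. 'fileSizeLimit' is mentioned in the release notes;
// 'extractor', 'excludeMimeTypes' and the value 'server' are assumed names.
$GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['tika'] = array_merge(
    // Keep whatever has already been configured for EXT:tika.
    $GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['tika'] ?? [],
    [
        // Use the recommended Tika Server variant (assumed value).
        'extractor' => 'server',
        // MIME types to skip during metadata extraction (assumed key).
        'excludeMimeTypes' => 'application/zip',
        // Maximum file size in MB; 500 is the documented default.
        'fileSizeLimit' => '500',
    ]
);
```

Merging into the existing array rather than overwriting it keeps settings that were saved through the backend module.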

Configuration of Tika Server 

Requirements 

  • Setting EXT:tika to use the Apache Tika server connection.

Set up EXT:tika for Tika Server 

Open the General tab of the extension settings for EXT:tika and choose "Tika Server" as Extractor.

Extension configuration for EXT:tika - Choosing Server extractor in General tab

After that, open the Server tab and enter the connection data in the corresponding fields.

Extension configuration for EXT:tika - Provide the connection infos/datas for Tika Server
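
For deployments where the connection data is kept in code, the Server tab fields could be expressed roughly as follows. This is only a sketch: the three tikaServer* key names are assumptions and are not confirmed by this document, and the host is a placeholder. Port 9998 is the port a Tika server listens on by default.

```php
<?php
// Sketch only: tikaServerScheme, tikaServerHost and tikaServerPort are
// assumed key names; verify them against the extension before relying on this.
$GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['tika']['tikaServerScheme'] = 'http';
// Placeholder host; use the host your Tika server runs on.
$GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['tika']['tikaServerHost'] = 'localhost';
// Default Tika server port.
$GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['tika']['tikaServerPort'] = '9998';
```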

See Check if it works for test instructions.

Configuration of Solr Cell 

Requirements 

  • Running and configured Apache Solr service.
  • Setting EXT:tika to use the Apache Solr server connection.

Set up EXT:tika for Solr Server 

Open the General tab of the extension settings for EXT:tika and choose "Solr Server" as Extractor.

Extension configuration for EXT:tika - Choosing Solr Server extractor in General tab

After that, open the Solr tab and enter the connection data in the corresponding fields.

Extension configuration for EXT:tika - Provide the connection infos/datas for Solr Server

See Check if it works for test instructions.

Check if it works 

TYPO3 Reports 

First, check the TYPO3 Reports module for any errors reported by the extension. They are listed under "Apache Tika".

The extension checks whether Java is installed when you use the Tika App or Tika Server.

It also checks your configuration: whether the configured paths for Tika App and Tika Server are available and, depending on what you are using, whether the Tika Server and Solr server can be reached.

If everything is configured as expected, you will see the following in TYPO3 Reports:

EXT:tika Check configs - OK

Real test via Tika Preview 

If everything is fine, you can run a real extraction via Tika Preview.

Configuring Tika Services 

General information about how to configure the Tika services can be found in the official Tika documentation.

In case you want to exclude certain mime types from being processed by Tika, you can do the following:

Create the file /etc/tika/tika-config.xml with this content:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>application/zip</mime-exclude>
    </parser>
    <parser class="org.apache.tika.parser.EmptyParser">
      <mime>application/zip</mime>
    </parser>
  </parsers>
</properties>

This tells Tika to exclude ZIP files from the DefaultParser and to use the EmptyParser for them instead, which does essentially nothing.

Apply tika-config.xml 

For Editors 

Tika Preview 

Editors can preview the extractable contents and metadata of a file via its context menu in the Filelist backend module:

File context menu - Tika Preview

Tika Preview button on file context menu.

Clicking the "Tika Preview" button sends the file to Tika for processing and lists the extracted data in a pop-up window. This window contains the extracted file contents and metadata:

Extracted data from Tika Preview

Extracted data from Tika Preview.

Releases 12.0 

Release 12.0.4 

This is a maintenance release for TYPO3 12 that allows PHP 8.4.

What's Changed 

  • [TASK] Simplify EXT:solr tests stack by Rafael Kähm in 0e09fe1
  • [TASK] Remove Implicitly nullable parameter declarations deprecated by Thomas Hohn in fc31751
  • [DOCS] Migrate Documentation to the new rendering by Rafael Kähm in 763fa1d
  • [BUGFIX] Fix CS issues by Rafael Kähm in b173e41
  • [DOCS] Apache Solr 9.10.0 CELL does not extract metadata from binaries properly by Rafael Kähm in 9144c12

Release 12.0.3 

This is a maintenance release for TYPO3 12.

  • [CI] Fix Github-Actions deprecations dad7aab
  • [TASK] Set branch alias release-12.0.x as 12.0.x-dev for composer 3a8f7ea
  • [TASK] Use new template module API 37ffddf
  • [TASK] Remove and disallow all superflous @author/copyright annotations 0df1d84
  • [TASK] Prepare README for TYPO3 12 c49d51d

Release 12.0.2 

This is a maintenance release for TYPO3 12.

  • [DOCS] Fix repository URL in documentation 6cbc6f2 (thanks to @eliashaeussler)
  • [TASK] Upgrade to PHPUnit:10.1 and typo3/testing-framework 8.0+ 04716c0 (thanks to @dkd-kaehm)
  • [FEATURE] add config option for MetaDataExtractor 8b10fd7 (thanks to @hvomlehn-sds)
  • [TASK] Add tests for documentation c9e18ca (thanks to @dkd-kaehm)
  • [TASK] Set min. TYPO3 version to 12.4.3 4a9e209 (thanks to @dkd-kaehm)
  • [DOCS] Fix repository URL in releases documentation 3f3250e (thanks to @dkd-kaehm)

Release 12.0.1 

This release is relevant only for Apache Solr Cell/server users, that is, for setups that use Apache Solr server as extractor.

Important for Solr Cell users 

  • !!![BUGFIX] SolrCell broken due of EXT:solr BC change on connection conf cdd7134 on @2023-10-19 (thanks to Rafael Kähm)

EXT:solr 12.0.0 requires separate configuration of path and core, and of username and password. All of these settings must now be given separately. The path setting is handled the same way as in EXT:solr:

Must not contain "/solr/"! Unless you have an additional "solr" segment in your path like "http://localhost:8983/solr/solr/core_en".
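
As an illustration of the split described above, a core reachable at http://localhost:8983/solr/core_en might be configured as in the following sketch. All key names here (solrScheme, solrHost, solrPort, solrPath, solrCore, solrUsername, solrPassword) are assumptions made for illustration and should be verified against the Solr tab of the extension configuration; the host, port and core values are placeholders.

```php
<?php
// Sketch only: every key name below is an assumption, not taken from this document.
$GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['tika'] = array_merge(
    // Keep whatever has already been configured for EXT:tika.
    $GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['tika'] ?? [],
    [
        'solrScheme' => 'http',
        'solrHost' => 'localhost',
        'solrPort' => '8983',
        // Path without the "/solr/" segment, as required by the rule above.
        'solrPath' => '/',
        // The core is configured separately from the path.
        'solrCore' => 'core_en',
        // Credentials are separate settings as well; empty means no authentication.
        'solrUsername' => '',
        'solrPassword' => '',
    ]
);
```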

Extension configuration for EXT:tika - Solr Cell configuration

All other changes 

Release 12.0.0 

We are happy to announce version 12.0.0 of EXT:tika.

New in this Release 

  • [BUGFIX] Don't use minimum-stability dev on TYPO3 stable in build/CI 3e5c6c2
  • [TASK] Automated cleanup via rector 0e5d0d7
  • [TASK] Remove unneeded checks 187f261
  • [TASK] Allow install of v12 38d3a3d
  • [TASK] Make status work bdc3843
  • [TASK] Migrate icon registration 0fe8b6d
  • [TASK] Simplify code in viewhelper 594ad42
  • [TASK] Enable autoconfigure by default 5677a79
  • [TASK] Make the preview work 488084f
  • [TASK] Make BE module work 4095fe5
  • [TASK] Setup Github-Actions for TYPO3 12 LTS 59bc08c
  • [TASK] Sync Setup within composer.json with TYPO3 12 stack ca4d7df
  • [TASK] Apply TYPO3 coding standards from EXT:solr 12.0.x-dev 63f51d8
  • [TASK] setup dg/bypass-finals PHPUnit extension to be able to mock finals b3474a6
  • [TASK] Fix Integration tests for TYPO3 12 3830729
  • [TASK] Remove unused item provider registration 8a26824
  • [TASK] Remove unused hook 9f8c1a0
  • [TASK] Improve code by using PHP8 features f16e4bd
  • [TASK] Replace Scrutinizer analysis by PHPStan :: Level 3 640d234
  • [TASK] PHPStan fix up to :: Level 5 2d9fabf
  • [TASK] synchronize with EXT:solr* tests stack 5c04545
  • [FIX] PHP-linter: Cannot redeclare exec() in ExecMockFunctions.php 3a4aae6
  • [BUGFIX] Fix old linter issues with php-cs-fixer v3.23.0+ 4eca0d7

Contributors 

  • @internezzo-prod
  • Benni Mack
  • Elias Häußler
  • Georg Ringer
  • Hendrik vom Lehn
  • Lars Tode
  • Markus Friedrich
  • Peter Kraume
  • Rafael Kähm
  • Stefan Frömken
  • Thomas Hohn

Thanks to everyone who helped in creating this release!

Also a big thanks to our partners that have joined the Apache Solr EB für TYPO3 12 LTS (Feature) program:

  • .hausformat
  • 711media websolutions GmbH
  • ACO Ahlmann SE & Co. KG
  • AVM Computersysteme Vertriebs GmbH
  • Ampack AG
  • Amt der Oö Landesregierung
  • Autorité des Marchés Financiers (Québec)
  • b13 GmbH
  • Beech IT
  • CARL von CHIARI GmbH
  • clickstorm GmbH
  • Connecta AG
  • cosmoblonde GmbH
  • cron IT GmbH
  • CS2 AG
  • cyperfection GmbH
  • digit.ly
  • DMK E-BUSINESS GmbH
  • DP-Medsystems AG
  • DSCHOY GmbH
  • Deutsches Literaturarchiv Marbach
  • F7 Media GmbH
  • FTI Touristik GmbH
  • gedacht
  • GPM Deutsche Gesellschaft für Projektmanagement e. V.
  • HEAD acoustics GmbH
  • in2code GmbH
  • Internezzo
  • jweiland.net
  • keeen GmbH
  • KONVERTO AG
  • Kassenärztliche Vereinigung Rheinland-Pfalz
  • Kreis Euskirchen
  • L.N. Schaffrath DigitalMedien GmbH
  • LOUIS INTERNET GmbH
  • Leuchtfeuer Digital Marketing GmbH
  • Lingner Consulting New Media GmbH
  • Macaw Germany Cologne GmbH
  • Marketing Factory Consulting GmbH
  • mehrwert intermediale kommunikation GmbH
  • morbihan.fr
  • Hochschule Furtwangen
  • pietzpluswild GmbH
  • plan2net GmbH
  • ProPotsdam GmbH
  • Québec.ca gouv.qc.ca
  • Red Dot GmbH & Co. KG
  • Schoene neue kinder GmbH
  • Snowflake Productions GmbH
  • Stadtverwaltung Villingen-Schwenningen
  • Stämpfli AG
  • studio ahoi - Weitenauer Schwardt GbR
  • THE BRETTINGHAMS GmbH
  • Typoheads GmbH
  • UEBERBIT GmbH
  • Universität Regensburg
  • VisionConnect.de
  • WACON Internet GmbH
  • webconsulting business services gmbh
  • werkraum Digitalmanufaktur GmbH
  • WIND Internet BV
  • XIMA MEDIA GmbH
  • wow! solution

How to Get Involved 

There are many ways to get involved with Apache Solr for TYPO3:

Support us by becoming an EB partner:

http://www.typo3-solr.com/en/contact/

or call:

+49 (0)69 - 2475218 0

Releases 11.0 

Release 11.0.1 

This release is relevant for Apache Solr Cell/server users only. To use Apache Solr server as extractor, EXT:solr v11.5.1+ is required as well.

Bugfixes: 

  • [BUGFIX] Use always string instead of null in all trim() calls ef2028b
  • [BUGFIX] Cast port to int for Solr connection e4f062e
  • [TASK] Fix TYPO3 coding standards issues after upgrade to v0.5.5 cd010f3
  • [TASK] Remove strict_type from ext_emconf to be able to publish in TER 94803dc

Release 11.0.0 

We are happy to announce version 11.0.0 of EXT:tika.

Important: This version is compatible with TYPO3 11 LTS only.

New in this Release 

  • [TASK] Prepare releases for TYPO3 11 LTS 910681d
  • [TASK] Fix issues recognized by scrutinizer 24aa731
  • [FEATURE] Allow definition of additional Java command options 2420888
  • [BUGFIX] Handle custom java command options for server module as well 1ec312e
  • [TASK] Let php-cs-fixer fix some CGL 38ca19b
  • [TASK] Move ext icon 4cffbd7
  • [BUGFIX] Force variable as string f763ebb
  • [TASK] Allow installation of 11.5 897b12c
  • [BUGFIX] Use correct controller code 5c8976c
  • [TASK] update ci pipeline eaad00e
  • [TASK] TYPO3 11 LTS and PHP 8.1 compatibility ed160cd
  • [TASK] Fix scrutinizer issues on release-11.0.x d0d9439
  • [TASK] Update Apache TIKA to v1.27 on release-11.0.x 24f2929

Contributors 

  • Elias Häußler
  • Georg Ringer
  • Rafael Kähm
  • Roman Schilter

Thanks to everyone who helped in creating this release!

Also a big thanks to our partners that have joined the EB2021 program:

  • +Pluswerk AG
  • 711media websolutions GmbH
  • Abt Sportsline GmbH
  • ACO Severin Ahlmann GmbH & Co. KG
  • AVM Computersysteme Vertriebs GmbH
  • cosmoblonde GmbH
  • creativ clicks GmbH
  • cron IT GmbH
  • CS2 AG
  • CW Media & Systems
  • Earlybird GmbH & Co KG
  • FLOWSITE GmbH
  • form4 GmbH & Co. KG
  • Getdesigned GmbH
  • Granpasso Digital Strategy GmbH
  • Ikanos GmbH
  • internezzo ag
  • Intersim AG
  • Ion2s GmbH
  • Leitgab Gernot
  • mellowmessage GmbH
  • Moselwal Digitalagentur UG (haftungsbeschränkt)
  • network.publishing Möller-Westbunk GmbH
  • OST Ostschweizer Fachhochschule
  • Plan.Net Suisse AG
  • Provitex GmbH
  • punkt.de GmbH
  • queo GmbH
  • Rechnungshof
  • Schoene neue kinder GmbH
  • SIT GmbH
  • SIZ GmbH
  • Stämpfli AG
  • Triplesense Reply Frankfurt
  • TWT reality bytes GmbH
  • visol digitale Dienstleistungen GmbH
  • Web Commerce GmbH
  • webconsulting business services gmbh
  • webschuppen GmbH
  • Webstobe GmbH
  • Webtech AG
  • wow! solution
  • XIMA MEDIA GmbH
  • Bundesanstalt Statistik Österreich
  • ECOS TECHNOLOGY GMBH
  • Fachhochschule Erfurt
  • Hochschule Furtwangen - IMZ Online-Services
  • Hochschule Niederrhein University of Applied Sciences
  • l'Autorité des marchés financiers
  • La Financière agricole du Québec
  • LfdA - Labor für digitale Angelegenheiten GmbH

How to Get Involved 

There are many ways to get involved with Apache Solr for TYPO3:

Support us by becoming an EB partner:

http://www.typo3-solr.com/en/contact/

or call:

+49 (0)69 - 2475218 0

Releases 10.0 

Release 10.0.2 

Important: This version contains CVE-2021-44228 fixes for users who start the Tika Server daemon from within the TYPO3 backend or who use Tika App mode. Users of a dedicated Tika server or of Apache Solr Cell connections do not benefit from this update and should harden their Solr and/or Tika servers manually with the official CVE-2021-44228 patches.

Manual action required for Tika App or enabled Tika Server module

Please note that the release does not automatically include security measures against CVE-2021-44228. Rather, it is now possible to specify additional parameters that can be passed when the java binary is executed. The parameters can be set using the extension configuration javaCommandOptions. Example:

# LocalConfiguration.php
return [
    'EXTENSIONS' => [
        'tika' => [
            'javaCommandOptions' => '-Dlog4j2.formatMsgNoLookups=true',
        ],
    ],
];

Release 10.0.1 

Important: This version contains CVE-2021-44228 fixes for users who start the Tika Server daemon from within the TYPO3 backend or who use Tika App mode. Users of a dedicated Tika server or of Apache Solr Cell connections do not benefit from this update and should harden their Solr and/or Tika servers manually with the official CVE-2021-44228 patches.

Release 10.0.0 

We are happy to announce version 10.0.0 of EXT:tika.

Important: This version is compatible with TYPO3 10 LTS only.

New in this Release 

[TASK] Introduce TYPO3 PSR-18 client (#156) 

https://github.com/TYPO3-Solr/ext-tika/pull/156 https://github.com/TYPO3-Solr/ext-tika/issues/154

The implementation that fetched content using a stream context and 'file_get_contents()' has been removed. Instead, the HTTP client built into TYPO3 is used to access the Tika server. This client supports PSR-18 and can use the proxy settings configured in TYPO3.

Internally, the string representation of server addresses is replaced by the URI interface. This makes server URIs easier and safer to handle.

The general exception is replaced with BadResponseException. The exception is logged in cases where it should not be thrown.

  • Reduce log warnings while building supported mime types.
  • Refactor unit and integration tests according to internal changes.
  • Switch log severity from integer to LogLevel constants.
  • Several code changes to method declaration.

[TASK] Refactor logging (#161) 

https://github.com/TYPO3-Solr/ext-tika/pull/161 https://github.com/TYPO3-Solr/ext-tika/issues/137 https://github.com/TYPO3-Solr/ext-tika/issues/160

Use LoggerAwareInterface and LoggerAwareTrait instead of setting up logging via the log manager.

Replace log severity numbers with LogLevel constants. Set default level to debug.

Unit tests:

  • Inject an instance of NullLogger due to changes in the logging behaviour.
  • Access environment variables for unit and integration tests in order to allow different testing environments.

Contributors 

  • Lars Tode
  • Markus Friedrich
  • Rafael Kähm

Thanks to everyone who helped in creating this release!

Also a big thanks to our partners that have joined the EB2021 program:

  • +Pluswerk AG
  • 711media websolutions GmbH
  • Abt Sportsline GmbH
  • ACO Severin Ahlmann GmbH & Co. KG
  • AVM Computersysteme Vertriebs GmbH
  • cosmoblonde GmbH
  • creativ clicks GmbH
  • cron IT GmbH
  • CS2 AG
  • CW Media & Systems
  • Earlybird GmbH & Co KG
  • FLOWSITE GmbH
  • form4 GmbH & Co. KG
  • Getdesigned GmbH
  • Granpasso Digital Strategy GmbH
  • Ikanos GmbH
  • internezzo ag
  • Intersim AG
  • Ion2s GmbH
  • Leitgab Gernot
  • mellowmessage GmbH
  • Moselwal Digitalagentur UG (haftungsbeschränkt)
  • network.publishing Möller-Westbunk GmbH
  • OST Ostschweizer Fachhochschule
  • Plan.Net Suisse AG
  • Provitex GmbH
  • punkt.de GmbH
  • queo GmbH
  • Rechnungshof
  • Schoene neue kinder GmbH
  • SIT GmbH
  • SIZ GmbH
  • Stämpfli AG
  • Triplesense Reply Frankfurt
  • TWT reality bytes GmbH
  • visol digitale Dienstleistungen GmbH
  • Web Commerce GmbH
  • webconsulting business services gmbh
  • webschuppen GmbH
  • Webstobe GmbH
  • Webtech AG
  • wow! solution
  • XIMA MEDIA GmbH
  • Bundesanstalt Statistik Österreich
  • ECOS TECHNOLOGY GMBH
  • Fachhochschule Erfurt
  • Hochschule Furtwangen - IMZ Online-Services
  • Hochschule Niederrhein University of Applied Sciences
  • l'Autorité des marchés financiers
  • La Financière agricole du Québec
  • LfdA - Labor für digitale Angelegenheiten GmbH

How to Get Involved 

There are many ways to get involved with Apache Solr for TYPO3:

Support us by becoming an EB partner:

http://www.typo3-solr.com/en/contact/

or call:

+49 (0)69 - 2475218 0

Release 6.0.2 

Important: This version contains CVE-2021-44228 fixes for users who start the Tika Server daemon from within the TYPO3 backend or who use Tika App mode. Users of a dedicated Tika server or of Apache Solr Cell connections do not benefit from this update and should harden their Solr and/or Tika servers manually with the official CVE-2021-44228 patches.

Manual action required for Tika App or enabled Tika Server module

Please note that the release does not automatically include security measures against CVE-2021-44228. Rather, it is now possible to specify additional parameters that can be passed when the java binary is executed. The parameters can be set using the extension configuration javaCommandOptions. Example:

# LocalConfiguration.php
return [
    'EXTENSIONS' => [
        'tika' => [
            'javaCommandOptions' => '-Dlog4j2.formatMsgNoLookups=true',
        ],
    ],
];

Release 6.0.1 

Important: This version contains CVE-2021-44228 fixes for users who start the Tika Server daemon from within the TYPO3 backend or who use Tika App mode. Users of a dedicated Tika server or of Apache Solr Cell connections do not benefit from this update and should harden their Solr and/or Tika servers manually with the official CVE-2021-44228 patches.

Release 6.0.0 

We are happy to announce version 6.0.0 of EXT:tika.

New in this Release 

[FEATURE] Allow driver configuration for extractor services
https://github.com/TYPO3-Solr/ext-tika/pull/142

To be able to extract data for files from file storages with other drivers, a configuration option is added to the extension.

Usage:

$GLOBALS['TYPO3_CONF_VARS']['EXTCONF']['tika']['extractor']['driverRestrictions'][] = 'MaxServ.FalS3';

[BUGFIX] wrong width and height properties taken from EXIF
https://github.com/TYPO3-Solr/ext-tika/pull/124

[FEATURE] Tika version 1.24 supported
EXT:tika is now tested against version 1.24 of Apache Tika.

Thanks 

  • Nicole Cordes
  • Rostyslav Matviyiv

Thanks to everyone who helped in creating this release!

How to Get Involved 

There are many ways to get involved with Apache Solr for TYPO3:

Support us in 2020 by becoming an EB partner:

http://www.typo3-solr.com/en/contact/

or call:

+49 (0)69 - 2475218 0

Release 5.0.1 

We are happy to announce version 5.0.1 of EXT:tika.

New in this Release 

[BUGFIX] wrong width and height properties taken from EXIF
https://github.com/TYPO3-Solr/ext-tika/pull/124
[FEATURE] Tika version 1.24 supported
EXT:tika is now tested against version 1.24 of Apache Tika.

Thanks 

  • Rostyslav Matviyiv

Release 5.0.0 

We are happy to announce version 5.0.0 of EXT:tika.

New in this Release 

This release is a compatibility release for EXT:solr 10.0.0.

EXT:solr 10.0.0 support 

EXT:solr 10 supports configuring Solr together with your TYPO3 site. In addition, some methods that had been marked as deprecated have been removed from the EXT:solr API. This release of EXT:tika adds compatibility with EXT:solr 10.

Thanks 

Thanks to all contributors

(patches, comments, bug reports, reviews, ... in alphabetical order)

  • Timo Hund

Also a big thanks to our partners that have joined the EB2019 program:

  • 21TORR GmbH
  • 3m5, Media GmbH
  • Absolut Research GmbH
  • AgenturWebfox GmbH
  • Amedick & Sommer Neue Medien GmbH
  • arndtteunissen GmbH
  • Arrabiata Solutions GmbH
  • artif GmbH & Co. KG
  • Atol Conseils & Développements
  • b13 GmbH
  • bgm business websolutions GmbH & Co KG
  • Bitmotion GmbH
  • BIBUS AG Group
  • Columbus Interactive GmbH
  • Consulting Piezunka und Schamoni - Information Technologies GmbH
  • cosmoblonde GmbH
  • CS2 AG
  • datamints GmbH
  • Diesel Technic AG
  • Die Medialen GmbH
  • Direction des Systèmes d’Information - Département du Morbihan
  • dörler engineering services
  • E-Magineurs
  • Fachhochschule für öffentliche Verwaltung NRW Zentralverwaltung
  • fixpunkt werbeagentur gmbh
  • Flowd GmbH
  • Frequentis Comsoft GmbH
  • GAYA - La Nouvelle Agence
  • Gernot Leitgab
  • Getdesigned GmbH
  • .hausformat GmbH
  • Haute école de travail social et de la santé - EESP
  • Hirsch & Wölfl GmbH
  • Hochschule Furtwangen
  • Hypo Tirol Bank AG
  • Intera Gesellschaft für Software-Entwicklung mbH
  • interactive tools GmbH - Agentur für digitale Medien
  • internezzo ag
  • iresults gmbh
  • ITK Rheinland
  • LOUIS INTERNET GmbH
  • Kassenärztliche Vereinigung Bayerns (KZVB)
  • KONVERTO AG
  • kraftwerk Agentur für neue Kommunikation GmbH
  • Landesinstitut für Schule und Medien Berlin-Brandenburg
  • Libéo
  • LINGNER CONSULTING NEW MEDIA GMBH
  • MaxServ B.V.
  • McLicense GmbH
  • MeinEinkauf AG
  • NEW.EGO GmbH
  • medien.de mde GmbH
  • mehrwert intermediale kommunikation GmbH
  • mellowmessage GmbH
  • mentronic . Digitale Kommunikation
  • MOSAIQ GmbH
  • pietzpluswild GmbH
  • plan2net GmbH
  • plan.net - agence conseil en stratégies digitales
  • Proud Nerds
  • +Pluswerk AG
  • punkt.de GmbH
  • Redkiwi
  • ressourcenmangel dresden GmbH
  • rrdata
  • RKW Rationalisierungs- und Innovationszentrum der Deutschen Wirtschaft e.V.
  • Site’nGo
  • SIWA Online GmbH
  • Stadt Wien - Wiener Wohnen Kundenservice GmbH
  • Stadtverwaltung Villingen-Schwenningen
  • Stefan Galinski Internetdienstleistungen
  • Studio Mitte Digital Media GmbH
  • TOUMORO
  • Ueberbit Gmbh
  • WACON Internet GmbH
  • webconsulting business services gmbh
  • webschuppen GmbH
  • Webstobe GmbH
  • webit! Gesellschaft für neue Medien mbH
  • wegewerk GmbH
  • werkraum Digitalmanufaktur GmbH
  • XIMA MEDIA GmbH

Special thanks to our premium EB 2019 partners:

  • jweiland.net
  • sitegeist media solutions GmbH

Thanks to everyone who helped in creating this release!

How to Get Involved 

There are many ways to get involved with Apache Solr for TYPO3:

Support us in 2019 by becoming an EB partner:

http://www.typo3-solr.com/en/contact/

or call:

+49 (0)69 - 2475218 0

Release 4.0.1 

We are happy to announce version 4.0.1 of EXT:tika.

New in this Release 

[BUGFIX] wrong width and height properties taken from EXIF
https://github.com/TYPO3-Solr/ext-tika/pull/124
[FEATURE] Tika version 1.24 supported
EXT:tika is now tested against version 1.24 of Apache Tika.

Thanks 

  • Rostyslav Matviyiv

Release 4.0.0 

We are happy to announce version 4.0.0 of EXT:tika.

New in this Release 

This release is a compatibility release for EXT:solr 9.0.0

Usage of solarium PHP-API 

Since we are using the solarium PHP-API in EXT:solr now, we want to use that in EXT:tika as well, when we use Apache Solr for tika extraction.

Tika version 1.20 supported 

EXT:tika is now tested against version 1.20 of Apache Tika.

Add MIME type mpeg/audio to the list of allowed MIME types for Solr Cell 

Adds mpeg/audio as an allowed MIME type for Solr Cell extraction.

Bugfixes 

Thanks 

Thanks to all contributors

(patches, comments, bug reports, reviews, ... in alphabetical order)

  • Helmut Hummel
  • Timo Hund

Also a big thanks to our partners that have joined the EB2019 program:

  • Amedick & Sommer Neue Medien GmbH
  • BIBUS AG Group
  • Bitmotion GmbH
  • CS2 AG
  • Gernot Leitgab
  • Getdesigned GmbH
  • Hirsch & Wölfl GmbH
  • ITK Rheinland
  • Kassenärztliche Vereinigung Bayerns (KZVB)
  • TOUMORO
  • Ueberbit Gmbh
  • XIMA MEDIA GmbH
  • b13 GmbH
  • bgm business websolutions GmbH & Co KG
  • datamints GmbH
  • medien.de mde GmbH
  • mehrwert intermediale kommunikation GmbH
  • mellowmessage GmbH
  • plan2net GmbH
  • punkt.de GmbH

Special thanks to our premium EB 2019 partners:

  • jweiland.net
  • sitegeist media solutions GmbH

Thanks to everyone who helped in creating this release!

How to Get Involved 

There are many ways to get involved with Apache Solr for TYPO3:

Support us in 2019 by becoming an EB partner:

http://www.typo3-solr.com/en/contact/

or call:

+49 (0)69 - 2475218 0

Release 3.1.2 

We are happy to announce version 3.1.2 of EXT:tika.

New in this Release 

[BUGFIX] wrong width and height properties taken from EXIF
https://github.com/TYPO3-Solr/ext-tika/pull/124
[FEATURE] Tika version 1.24 supported
EXT:tika is now tested against version 1.24 of Apache Tika.

Thanks 

  • Rostyslav Matviyiv

Release 3.1.0 

We are happy to announce version 3.1.0 of EXT:tika.

New in this Release 

This release contains only a few features and bugfixes.

Allow to preview extracted content from file module 

As a TYPO3 admin you can now preview the extracted content of a file from the file module by clicking "Tika Preview" in the context menu. This is useful for debugging and for checking which content ends up in Solr and why.

Tika version 1.18 supported 

EXT:tika is now tested against version 1.18 of Apache Tika.

Thanks 

Thanks to all contributors

(patches, comments, bug reports, reviews, ... in alphabetical order)

  • Timo Hund

Also a big thanks to our partners that have joined the EB2018 program:

  • 4eyes GmbH
  • Albervanderveen
  • Agentur Frontal AG
  • AlrweNWR Internet BV
  • Amedick & Sommer
  • AUSY SA
  • Bibus AG
  • Bitmotion GmbH
  • bgm Websolutions GmbH
  • bplusd interactive GmbH
  • Centre de gestion de la Fonction Publique Territoriale du Nord (Siège)
  • Citkomm services GmbH
  • Consulting Piezunka und Schamoni - Information Technologies GmbH
  • Cobytes GmbH
  • Cows Online GmbH
  • creativ clicks GmbH
  • DACHCOM.DIGITAL AG
  • Deutsches Literaturarchiv Marbach
  • food media Frank Wörner
  • Fachhochschule für öffentliche Verwaltung NRW
  • FTI Touristik GmbH
  • GAYA - La Nouvelle Agence
  • Hirsch & Wölfl GmbH
  • Hochschule Furtwangen
  • ijuice Agentur GmbH
  • Image Transfer GmbH
  • JUNGMUT Communications GmbH
  • Kreis Coesfeld
  • LINGNER CONSULTING NEW MEDIA GMBH
  • LOUIS INTERNET GmbH
  • L.N. Schaffrath DigitalMedien GmbH
  • MEDIA::ESSENZ
  • Mehr Demokratie e.V.
  • mehrwert intermediale kommunikation GmbH
  • Mercedes AMG GmbH
  • Petz & Co
  • pietzpluswild GmbH
  • pixelcreation GmbH
  • plan.net
  • Pluswerk AG
  • Pottkinder GmbH
  • PROVITEX GmbH
  • Publicis Pixelpark
  • punkt.de GmbH
  • PROFILE MEDIA GmbH
  • Q3i GmbH & Co. KG
  • ressourcenmangel an der panke GmbH
  • Roza Sancken
  • Site'nGo
  • SIWA Online GmbH
  • snowflake productions gmbh
  • Studio B12 GmbH
  • systime
  • SYZYGY Deutschland GmbH
  • Talleux & Zöllner GbR
  • TOUMORO
  • THE BRETTINGHAMS GmbH
  • TWT Interactive GmbH
  • T-Systems Multimedia Solutions GmbH
  • Typoheads GmbH
  • Q3i GmbH
  • Ueberbit GmbH
  • zdreicon GmbH
  • zimmer7 GmbH

Special thanks to our premium EB 2018 partners:

Thanks to everyone who helped in creating this release!

How to Get Involved 

There are many ways to get involved with Apache Solr for TYPO3:

Support us in 2018 by becoming an EB partner:

http://www.typo3-solr.com/en/contact/

or call:

+49 (0)69 - 2475218 0

Release 3.0.0 

We are happy to announce version 3.0.0 of EXT:tika.

New in this Release 

This release contains only a few features and bugfixes.

Compatibility for EXT:solr 8.0.0 

There were several changes in EXT:solr 8.0.0 that require adaptations in EXT:tika:

  • The ExtractionQuery was moved into "DomainSearchQuery"
  • Usage of the TYPO3_user_agent constant is deprecated
  • The Solr service was split into read and write services

https://github.com/TYPO3-Solr/ext-tika/pull/83 https://github.com/TYPO3-Solr/ext-tika/pull/82

Add size limit for extracted files 

Until now, EXT:tika tried to extract the content of a file no matter how big it was. For very large files this could lead to errors, and it was not possible to exclude them.

Now you can configure a limit in the extension configuration (fileSizeLimit). A file above this limit will not be used for extraction.

By default the limit is 500 MB.
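
The size check described above can be sketched as follows. This is an illustrative Python sketch, not the extension's PHP code: the function name is hypothetical, and it assumes the limit is expressed in megabytes of 1024 * 1024 bytes.

```python
# Illustrative sketch only; not taken from EXT:tika.
# Assumption: the configured limit is given in megabytes (1 MB = 1024 * 1024 bytes).
MEGABYTE = 1024 * 1024


def is_within_size_limit(file_size_bytes: int, limit_mb: int = 500) -> bool:
    """Return True if a file of this size may be sent to extraction.

    Files larger than the limit are skipped; a file exactly at the
    limit is still accepted.
    """
    return file_size_bytes <= limit_mb * MEGABYTE
```

A file of 10 MB passes the default limit, while a file of 501 MB is skipped before any extraction request is made.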

Thanks to SYZYGY for sponsoring this feature!

https://github.com/TYPO3-Solr/ext-tika/pull/77

Thanks 

Thanks to all contributors

(patches, comments, bug reports, reviews, ... in alphabetical order)

  • Rafael Kähm
  • Timo Hund

Also a big thanks to our partners that have joined the EB2018 program:

  • Albervanderveen
  • Amedick & Sommer
  • AUSY SA
  • bgm Websolutions GmbH
  • Citkomm services GmbH
  • Consulting Piezunka und Schamoni - Information Technologies GmbH
  • Cows Online GmbH
  • food media Frank Wörner
  • FTI Touristik GmbH
  • Hirsch & Wölfl GmbH
  • Hochschule Furtwangen
  • JUNGMUT Communications GmbH
  • Kreis Coesfeld
  • LOUIS INTERNET GmbH
  • L.N. Schaffrath DigitalMedien GmbH
  • Mercedes AMG GmbH
  • Petz & Co
  • Pluswerk AG
  • ressourcenmangel an der panke GmbH
  • Site'nGo
  • Studio B12 GmbH
  • systime
  • Talleux & Zöllner GbR
  • TOUMORO
  • TWT Interactive GmbH

Special thanks to our premium EB 2018 partners:

Thanks to everyone who helped in creating this release!

How to Get Involved 

There are many ways to get involved with Apache Solr for TYPO3:

Support us in 2018 by becoming an EB partner:

http://www.typo3-solr.com/en/contact/

or call:

+49 (0)69 - 2475218 0

Release 2.4.0 

We are happy to announce version 2.4.0 of EXT:tika. This is the release for TYPO3 CMS LTS 8.

New in this Release 

Support Apache Tika 1.16 

Since the Tika response changed in version 1.16, we adapted the detection that checks whether the Tika server is running.

https://github.com/TYPO3-Solr/ext-tika/pull/63 https://github.com/TYPO3-Solr/ext-tika/pull/71

Fix Solr Cell status check 

As the extract handler is configured for lazy startups, it is possible that it's not loaded while testing. This commit improves the status check by performing a test extraction, instead of checking the plugin list.
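
The idea of probing with a test extraction, rather than inspecting a plugin list, can be sketched like this. The sketch is in Python for illustration only and is not the extension's code; `extract` is a hypothetical callable standing in for a request to the extract handler.

```python
from typing import Callable


def solr_cell_is_available(extract: Callable[[bytes], str]) -> bool:
    """Probe the extract handler by running a small test extraction.

    `extract` is a hypothetical callable that sends the given bytes to the
    extract handler and returns the extracted text, raising on failure.
    A lazily started handler is loaded by the probe itself, so the result
    does not depend on whether it was already listed as loaded.
    """
    try:
        extract(b"status check probe")
    except Exception:
        # Any failure (handler missing, connection refused, ...) counts
        # as "not available" for the purposes of the status report.
        return False
    return True
```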

https://github.com/TYPO3-Solr/ext-tika/pull/67

Corrected several typos 

Some typos in the code and documentation have been fixed.

https://github.com/TYPO3-Solr/ext-tika/pull/68 https://github.com/TYPO3-Solr/ext-tika/pull/69

Improved documentation 

A paragraph on configuring the Tika app was added that describes how to exclude MIME types.

https://github.com/TYPO3-Solr/ext-tika/pull/70

Adjusted report to only require java for the app mode 

When you configure Tika in server mode, Java does not need to be installed locally, since the Tika server can also run on another node. Therefore the report checks have been changed: a missing Java installation triggers an error when the Tika app is used, and only a warning when the Tika server mode is used.
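
The resulting decision can be summarized in a small sketch. It is written in Python for illustration and is not the extension's report code; the mode labels "app" and "server" and the `Severity` names are assumptions.

```python
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


def java_check_severity(mode: str, java_available: bool) -> Severity:
    """Map the extraction mode and Java availability to a report severity.

    The mode labels "app" and "server" are hypothetical names for the
    Tika app mode and the Tika server mode.
    """
    if mode not in ("app", "server"):
        raise ValueError(f"unsupported mode: {mode!r}")
    if java_available:
        return Severity.OK
    # The app runs locally and cannot work without Java; the server may
    # run on another node, so a missing local Java is only a warning.
    return Severity.ERROR if mode == "app" else Severity.WARNING
```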

https://github.com/TYPO3-Solr/ext-tika/pull/73

Thanks 

Thanks to all contributors

(patches, comments, bug reports, reviews, ... in alphabetical order)

  • Markus Friedrich
  • Peter Kraume
  • Rafael Kähm
  • Timo Hund

Also a big thanks to our partners that have joined the EB2017 program:

  • .hausformat
  • AGENTUR FRONTAG AG
  • amarantus - media design & conding Mario Drengner & Enrico Nemack GbR
  • Amedick & Sommer Neue Medien GmbH
  • Andrea Pausch
  • Animate Agentur für interaktive Medien GmbH
  • artig GmbH & Co. KG
  • b:dreizehn GmbH
  • BIBUS AG Group
  • Bitmotion GmbH
  • cab services ag
  • Causal Sarl
  • CHIARI GmbH
  • Citkomm services GmbH
  • clickstorm GmbH
  • Connecta AG
  • Creative360
  • cron IT GmbH
  • CYBERhouse Agentur für interaktive Kommunikation GmbH
  • cyperfection GmbH
  • data-graphis GmbH
  • Deutsche Welthungerhilfe e.V.
  • Deutscher Ärzteverlag
  • Deutscher Volkshochschul-Verband
  • Die Medialen GmbH
  • die_schnittsteller gmbh
  • Dörfer engineering services
  • E-Magineurs
  • EYE Communications AG
  • Fachhochschule für öffentliche Verwaltung NRW Zentralverwaltung Gelsenkirchen
  • familie redlich AG
  • Fork Unstable Media GmbH
  • hauptsache.net GmbH
  • Havas Düsseldorf GmbH
  • Hirsch & Wölfl GmbH
  • Hochschule Furtwangen - IMZ Online Services
  • Hochschule Konstanz
  • Institut der deutschen Wirtschaft Köln Medien GmbH
  • iresults gmbh
  • ITK Rheinland
  • itl Institut für technische Literatur AG
  • jweiland.net
  • Kassenärztliche Vereinigung Rheinland-Pfalz
  • Kerstin Nägler Web & Social Media Beratung
  • Landesinstitut für Schule und Medien Berlin-Brandenburg
  • Leibniz Universität IT Services
  • Libéo
  • Lime Flavour GbR
  • LINGNER CONSULTING NEW MEDIA GMBH
  • LOUIS INTERNET
  • Maximilian Walter
  • MEDIA:ESSENZ
  • mehrwert intermediäre kommunikation GmbH
  • Mercedes-AMG GmbH
  • mlm media process management GmbH
  • n@work Internet Informationssystems GmbH
  • Netcreators
  • netz-haut GmbH
  • neuwerk interactive
  • Nintendo of Europe GmbH
  • Onedrop Solutions GmbH
  • Open New Media GmbH
  • Paints Multimedia GmbH
  • pixelcreation GmbH
  • plan2net
  • Pluswerk AG
  • polargold GmbH
  • punkt.de GmbH
  • Raiffeisen OnLine GmbH
  • ruhmesmeile GmbH
  • Rundfunk und Telekom Regulierung GmbH
  • Schweizer Alpen-Club SAC
  • sitegeist media solutions GmbH
  • Star Finanz-Software Entwicklung und Vertriebs GmbH
  • Stefan Galinski Internetdienstleistungen
  • Stratis - Toulon
  • Studio Mitte Digital Media GmbH
  • Studio 9 GmbH
  • Systime A/S
  • SYZYGY Deutschland GmbH
  • takomat Agentur GbR
  • THE BRETTINGHAMS GmbH
  • TOUMORO
  • Triplesense Reply GmbH
  • Typoheads GmbH
  • unternehmen online GmbH & Co. KG
  • Universität Bremen
  • VERDURE Medienteam GmbH
  • WACON Internet GmbH
  • webedit AG
  • Webstore GmbH
  • Webtech AG
  • wegewerk GmbH
  • Wohnungsbau- und Verwaltungsgesellschaft mbH Greifswald
  • XIMA MEDIA GmbH
  • zdreicom GmbH
  • zimmer7 GmbH

Thanks to everyone who helped in creating this release!

How to Get Involved 

There are many ways to get involved with Apache Solr for TYPO3:

Support us in 2017 by becoming an EB partner:

http://www.typo3-solr.com/en/contact/

or call:

+49 (0)69 - 2475218 0

Release 2.3.0 

We are happy to announce version 2.3.0 of EXT:tika. This is a compatibility release for TYPO3 CMS LTS 8.

Thanks 

Thanks to all contributors

(patches, comments, bug reports, reviews, ... in alphabetical order)

  • Ingo Renner
  • Thomas Hohn
  • Timo Hund

Also a big thanks to our partners that have joined the EB2016 program:

  • Arrabiata Solutions GmbH & Co. KG
  • avonis
  • Bank CIC AG
  • Bitmotion GmbH
  • Citkomm services GmbH
  • cron IT
  • CS2 AG
  • Cosmoblonde GmbH
  • Daniz online markting
  • datenwerk innovationsagentur gmbh
  • Die Medialen GmbH
  • die_schnittsteller GmbH
  • E-magineurs
  • Fernando Hernáez Lopez
  • Future Connection AG
  • Gernot Leitgab
  • .hausformat
  • Hirsch & Wölfl GmbH
  • hs-digital GmbH
  • IHK Neubrandenburg
  • internezzo AG
  • jweiland.net
  • L.N. Schaffrath DigitalMedien GmbH
  • mehrwert intermediale kommunikation GmbH
  • netlogix GmbH & Co. KG
  • Pixel Ink
  • Pixelpark AG
  • pixolith GmbH & Co. KG
  • polargold GmbH
  • portrino GmbH
  • Q3i GmbH & Co. KG
  • raphael gmbh
  • RUAG Corporate Services AG
  • sitegeist media solutions GmbH
  • ST3 Elkartea
  • Star Finanz-Software Entwicklung und Vertriebs GmbH
  • Stefan Galinski Internetdienstleistungen
  • Speedpartner GmbH
  • sunzinet AG
  • Systime A/S
  • SYZYGY Deutschland GmbH
  • tecsis GmbH
  • web-vision GmbH
  • websedit AG - Internetagentur
  • Webstobe GmbH
  • werkraum GmbH
  • WIND Internet
  • wow! solution
  • zdreicom AG

Thanks also to our partners who have already signed up for a 2017 partnership (EB2017):

  • Amedick & Sommer Neue Medien GmbH
  • cron IT GmbH
  • b:dreizehn GmbH
  • Die Medialen GmbH
  • Leibniz Universität IT Services, Hannover
  • LOUIS INTERNET
  • polargold GmbH
  • Mercedes-AMG GmbH
  • Triplesense Reply GmbH
  • zdreicom AG

Thanks to everyone who helped in creating this release!

How to Get Involved 

There are many ways to get involved with Apache Solr for TYPO3:

Support us in 2017 by becoming an EB partner:

http://www.typo3-solr.com/en/contact/

or call:

+49 (0)69 - 2475218 0

Release 2.2.0 

We are happy to announce version 2.2.0 of EXT:tika.

New in this Release 

The following features have been added in this release:

Get supported extract file type from Tika 

Instead of using a hardcoded list of supported file types, we now retrieve the supported types from Tika and allow extraction for these types.
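
A minimal sketch of this approach, in Python for illustration only: it assumes the types reported by Tika are already available as a list of MIME type strings, and both function names are hypothetical.

```python
from typing import Iterable, Set


def normalize_mime_types(reported: Iterable[str]) -> Set[str]:
    """Build a lookup set from the MIME types reported by the service.

    Entries are trimmed and lower-cased; empty entries are dropped.
    """
    return {mime.strip().lower() for mime in reported if mime.strip()}


def can_extract(mime_type: str, supported: Set[str]) -> bool:
    """Return True if the file's MIME type is among the reported types."""
    return mime_type.strip().lower() in supported
```

With this shape, a newly supported format becomes extractable as soon as the service reports it, without changing a list maintained in the extension.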

https://github.com/TYPO3-Solr/ext-tika/pull/31

Support Apache Tika 1.14 

Since the Tika response changed in version 1.14, we adapted the detection that checks whether the Tika server is running.

https://github.com/TYPO3-Solr/ext-tika/pull/44

Bugfixes 

Besides the features, the following bugfixes are included:

Disable Language Extraction when using Solr Cell 

Language extraction is not supported with Solr Cell. Therefore it is disabled when Solr Cell is used.

https://github.com/TYPO3-Solr/ext-tika/pull/41

Thanks 

Thanks to all contributors

(patches, comments, bug reports, reviews, ... in alphabetical order)

  • Ingo Renner
  • Pierrick Caillon
  • Thomas Hohn
  • Timo Hund

Also a big thanks to our partners that have joined the EB2016 program:

  • Arrabiata Solutions GmbH & Co. KG
  • avonis
  • Bank CIC AG
  • Bitmotion GmbH
  • Citkomm services GmbH
  • cron IT
  • CS2 AG
  • Cosmoblonde GmbH
  • Daniz online markting
  • datenwerk innovationsagentur gmbh
  • Die Medialen GmbH
  • die_schnittsteller GmbH
  • E-magineurs
  • Fernando Hernáez Lopez
  • Future Connection AG
  • Gernot Leitgab
  • .hausformat
  • Hirsch & Wölfl GmbH
  • hs-digital GmbH
  • IHK Neubrandenburg
  • internezzo AG
  • jweiland.net
  • L.N. Schaffrath DigitalMedien GmbH
  • mehrwert intermediale kommunikation GmbH
  • netlogix GmbH & Co. KG
  • Pixel Ink
  • Pixelpark AG
  • pixolith GmbH & Co. KG
  • polargold GmbH
  • portrino GmbH
  • Q3i GmbH & Co. KG
  • raphael gmbh
  • RUAG Corporate Services AG
  • sitegeist media solutions GmbH
  • ST3 Elkartea
  • Star Finanz-Software Entwicklung und Vertriebs GmbH
  • Stefan Galinski Internetdienstleistungen
  • Speedpartner GmbH
  • sunzinet AG
  • Systime A/S
  • SYZYGY Deutschland GmbH
  • tecsis GmbH
  • web-vision GmbH
  • websedit AG - Internetagentur
  • Webstobe GmbH
  • werkraum GmbH
  • WIND Internet
  • wow! solution
  • zdreicom AG

Thanks also to our partners who have already signed up for a 2017 partnership (EB2017):

  • Amedick & Sommer Neue Medien GmbH
  • cron IT GmbH
  • b:dreizehn GmbH
  • Die Medialen GmbH
  • Leibniz Universität IT Services, Hannover
  • LOUIS INTERNET
  • polargold GmbH
  • Mercedes-AMG GmbH
  • Triplesense Reply GmbH
  • zdreicom AG

Thanks to everyone who helped in creating this release!

How to Get Involved 

There are many ways to get involved with Apache Solr for TYPO3:

Support us in 2017 by becoming an EB partner:

http://www.typo3-solr.com/en/contact/

or call:

+49 (0)69 - 2475218 0

Release 2.1.0 

In this release we provide the compatibility changes needed to use EXT:tika with EXT:solr 4.0.0 and PHP 7.0.
