Files

The files indexer allows you to index content of files from the file system. Currently the indexer supports indexing of the following files: : PDF, DOC, PPT, XLS, DOCX, PPTX, XLSX.

Note: Until version 1.2 only PDF and PPT files could be indexed.

  • directory based (with FAL support)
  • content based with FAL suppport

System requirements

  • For PDF indexing you will need to have the external tools „pdfinfo“ and „pdftotext“ installed (in Ubuntu Linux they come with the package poppler-utils).
  • For PPT indexint you will need to have the external tool „catppt“ installed (in Ubuntu Linux it comes with the package catdoc).

Please use the extension manager settings to tell ke_search the filepaths where to find these tools.

Directory based file indexer with FAL support

You can specify the folders ke_search should index.Since the files are indexed directly from the file system, there’s no access check! Please make sure only public content is in the folders you make searchable.

You can select a FAL storage from where you want to index files. If you do so, FAL metadata will be indexed. Categories will be used to generate tags (like in the news indexer), this makes it possible to do facetting over files.

Configuration:

  • Set a title (only for internal use).
  • Set the record storage page of search data your search data folder.
  • Set the type to „Files”.
  • Select FAL storage or select “Don’t use FAL”.
  • Select one or more directory which contain your files to be indexed. They have to be subdirectories of „fileadmin/“. If you selected a FAL storage, the directories must be subdirectories of your storage.
  • Enter the list of allowed file extensions. Only files with extensions the indexer supports will be indexed. If you use FAL indexing, you can also provide other filetypes, eg. JPG. From these files the metadata will be indexed.

Content based file indexer with FAL support

This indexer detects files while indexing pages and content elements and indexes the files automatically. Supported content element types are “Text” / “Text with image” and “Filelinks”.

Just create an indexer configuration of type “pages” or “content elements”. Indexing of files will take place automatically.

File types which should be indexed can be specified in the indexer configuration. Leaving the field empty will have the effekt that no files will be indexed.

Content restrictions from the linking content elements will be taken into account.

FAL metadata will be indexed. Tags will be generated from categories (like in the news indexer).