EXT: powersearchindexlucene

Author: Kasper Skårhøj
Created: 2002-11-01T00:32:00
Changed: 2008-09-19T09:46:44
Author: Knut Möller / Metaways Infosystems GmbH
Email: k.moeller@metaways.de

Extension Key: powersearchindexlucene

Copyright 2008, Metaways Infosystems GmbH, Knut Möller <typo3@metaways.de>

This document is published under the Open Content License, available from http://www.opencontent.org/opl.shtml

The content of this document is related to TYPO3 - a GNU/GPL CMS/Framework available from www.typo3.com

Table of Contents

EXT: powersearchindexlucene

Introduction

What does it do?

Features

Requirements

Getting started

Install extension

Creating the index

Known problems and solutions

To-Do list

Team

Contributors

Change log

Introduction

The powersearchindexlucene extension integrates the popular full-text search engine Lucene into Typo3.

The Zend Framework library "Zend_Search_Lucene" is used to create and query the index, so the Zend Framework must be installed in a directory on PHP's include_path.

The extension is divided into indexer modules for different data sources and an interface to query the index. Indexer modules currently exist for Typo3 files (via DAM or standard Typo3 file handling) and Typo3 content.

For flexibility, the complete indexed search is split into several extensions with different responsibilities, as described below.

What does it do?

powersearchindexlucene provides indexing for Typo3 content and for user files stored in Typo3. The generated index is queried through a defined search interface; powersearchui is one implementation that uses this interface.

The required extensions and their dependencies are:

img-1

powersearch – contains several base classes for general search.

powersearchui – displays the search form and renders the results on a Typo3 page.

The default behaviour of the classes in the powersearchindexlucene or powersearchui directories can be overridden. This is mostly needed to adjust the user interface.

Features

  • High-performance search using the popular Lucene engine
  • Out-of-the-box Typo3 indexing
  • Extensibility – new indexers can be introduced and existing methods can be overridden

Included indexers:

  Indexer     Type                                  Comment
  T3Content   Typo3 standard text elements          HTML tags are stripped out
  T3Files     Files: PDF, TXT, DOC                  Files placed in Typo3
  T3Dam       Files: PDF, TXT, DOC                  Files linked via Typo3 DAM
  Files       Files: PDF, TXT, DOC                  Files in a special directory (not linked via Typo3)
  NewsML      Special indexer for NewsML database   Only included as an example

Requirements

  • Zend Framework (version 1.5 or later) is required for Lucene indexing and has to be set up in include_path
  • PHP >= 5.1.6 with PDO (MySQL driver)
  • powersearch – common base classes
  • For file indexing, the Unix tools "catdoc", "pdf2text" and "cat" have to be available, and their path has to be configured in the Extension Manager.
  • PHP's iconv must be available (with UTF-8 charset support) for Zend_Search_Lucene
  • powersearchui – optional, but required to query the index and display the results
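The requirements above can be checked before installing. The following is a minimal POSIX shell sketch, not part of the extension: it assumes the converters are looked up on PATH (the extension itself uses the location configured in the Extension Manager), and the function names and the `PHP_BIN` override are hypothetical. The PHP-side check relies on `Zend_Version::compareVersion()` from Zend Framework 1.x.

```shell
# Hypothetical helper: print each external tool not found on PATH and
# return 1 if any is missing. Defaults to the converters named in this
# manual; pass other names if your system differs (the xpdf/poppler PDF
# converter, for instance, is usually installed as "pdftotext").
missing_tools() {
    if [ "$#" -eq 0 ]; then
        set -- catdoc pdf2text cat
    fi
    missing_status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing_status=1
        fi
    done
    return "$missing_status"
}

# Hypothetical helper: ask PHP itself about the PHP-side requirements
# (PHP >= 5.1.6, PDO MySQL driver, iconv, Zend Framework >= 1.5 on the
# include_path). PHP_BIN overrides the PHP binary (default: php).
php_requirements_ok() {
    "${PHP_BIN:-php}" -r '
        $errors = array();
        if (version_compare(PHP_VERSION, "5.1.6", "<")) $errors[] = "PHP >= 5.1.6 required";
        if (!extension_loaded("pdo_mysql")) $errors[] = "PDO MySQL driver missing";
        if (!function_exists("iconv")) $errors[] = "iconv missing";
        if (!(@include_once "Zend/Version.php")) {
            $errors[] = "Zend Framework not found on include_path";
        } elseif (Zend_Version::compareVersion("1.5.0") > 0) {
            $errors[] = "Zend Framework >= 1.5 required";
        }
        foreach ($errors as $e) fwrite(STDERR, $e . "\n");
        exit(count($errors) ? 1 : 0);
    '
}
```

For example, `missing_tools || echo "install the missing converters first"` reports which converters still need installing, and `php_requirements_ok` exits non-zero and prints the reasons on stderr if the PHP environment falls short.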

Getting started

Install extension

To run the extension, the powersearch extension and Zend Framework (version 1.5 or later) must be installed.

The extension is installed and configured using the Extension Manager. The configuration options are explained below:

img-2

Alternative path...: can be used for custom modifications to the classes in "lib".

Used indexers: the indexer modules to run. The names are case-sensitive!
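Because the indexer names are case-sensitive, a typo such as "t3content" instead of "T3Content" is easy to miss. The sketch below assumes the setting holds a comma-separated list (an assumption, not confirmed by this manual) and compares each entry exactly against the names in the table of included indexers. `check_indexer_list` is a hypothetical helper, not part of the extension.

```shell
# Hypothetical helper: validate a comma-separated list of indexer names
# (the comma-separated format is an assumption). Matching is exact, so
# "t3content" is rejected while "T3Content" is accepted.
check_indexer_list() {
    bad=0
    old_ifs=$IFS
    IFS=','
    for name in $1; do
        # drop surrounding whitespace such as in "T3Content, Files"
        name=$(printf '%s' "$name" | tr -d '[:space:]')
        case "$name" in
            '') ;;  # ignore empty entries
            T3Content|T3Files|T3Dam|Files|NewsML) ;;
            *)
                echo "unknown indexer (names are case-sensitive): $name"
                bad=1
                ;;
        esac
    done
    IFS=$old_ifs
    return "$bad"
}
```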

Directory for index files: directory, relative to the Typo3 root, in which the index files are stored.

img-3

Text lengths: maximum length of the index fields; longer text is truncated. Small values are important to keep the index size down.

img-4

Directory with files to index: used by the "Files" indexer. The directory is traversed recursively.
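To preview what the "Files" indexer will pick up, the directory can be listed recursively. This is only an approximation: it assumes files are selected by the extensions named in this manual (PDF, TXT, DOC, matched case-insensitively), whereas the indexer's actual selection rules may differ. It also uses `find -iname`, which GNU and BSD find support but POSIX does not require. `list_indexable_files` is a hypothetical name.

```shell
# Hypothetical helper: recursively list PDF, TXT and DOC files below a
# directory, sorted. Approximates, but is not, the "Files" indexer's logic.
list_indexable_files() {
    find "$1" -type f \( -iname '*.pdf' -o -iname '*.txt' -o -iname '*.doc' \) | sort
}
```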

Location...: path to the converter tools ("catdoc", "pdf2text", "cat") used by the file indexers.

img-5

Optimize Index after update: runs the optimize() method of Zend_Search_Lucene, which merges the index segments into fewer files to speed up searching. This is CPU-intensive and can take a long time, depending on the index size.

Enable debug mode: debug output is written to PHP's error_log. This should be switched off in production!

Creating the index

The indexer is run from the PHP command line (CLI).

Before running the command, a backend user named "_CLI_powersearchindexlucene" has to be created. Admin rights are not required. Give this user a secure password so that nobody can log in with it; the CLI script itself does not need the password.

By default, the index files are placed in typo3temp/powersearchindexlucene/, so the system user that runs the CLI script needs write permission on that directory.
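One common way to give both the web server user and the CLI/cron user write access is a shared, group-writable index directory. The sketch below assumes such a shared group already exists (its name is a placeholder passed as the second argument) and that the first argument is the Typo3 root; `prepare_index_dir` is hypothetical. New index files are only group-writable if the writing process runs with a suitable umask (e.g. 002).

```shell
# Hypothetical helper: create the default index directory below the
# Typo3 root and make it writable for a shared group.
# Usage: prepare_index_dir SITE_ROOT [GROUP]
prepare_index_dir() {
    site_root=$1
    group=$2
    index_dir="$site_root/typo3temp/powersearchindexlucene"
    mkdir -p "$index_dir" || return 1
    if [ -n "$group" ]; then
        chgrp "$group" "$index_dir" || return 1
    fi
    # rwx for owner and group; the setgid bit (2) makes files created
    # inside inherit the directory's group on typical Unix systems
    chmod 2775 "$index_dir" || return 1
    echo "$index_dir"
}
```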

Example (the absolute script path has to be used):

php /var/www/site/htdocs/typo3conf/ext/powersearchindexlucene/cli/indexer.php

It is recommended to recreate the index regularly from a crontab entry, e.g. once a day at night.
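For the nightly run, a small wrapper can enforce the absolute script path before calling PHP. In this sketch, `run_indexer`, the `PHP_BIN` override and the wrapper path in the crontab comment are hypothetical; the indexer path follows the CLI example in this section.

```shell
# Hypothetical cron wrapper. PHP_BIN overrides the PHP binary (default: php).
run_indexer() {
    site_root=$1
    # The indexer script must be called with an absolute path.
    case "$site_root" in
        /*) ;;
        *)
            echo "site root must be an absolute path: $site_root" >&2
            return 2
            ;;
    esac
    "${PHP_BIN:-php}" "$site_root/typo3conf/ext/powersearchindexlucene/cli/indexer.php"
}

# Example crontab entry (every night at 03:15), assuming this file is saved
# as /usr/local/bin/powersearch-reindex.sh and ends with a line such as
#   run_indexer /var/www/site/htdocs
# 15 3 * * * /usr/local/bin/powersearch-reindex.sh >>/var/log/powersearch-reindex.log 2>&1
```

Running the cron job as a user that owns the index directory, or belongs to its group, may help avoid the permission conflict between the Apache user and the cron/CLI user described under Known problems and solutions.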

Known problems and solutions

  • Possible UTF-8 character problems in the index, or a PHP error "iconv...". Solutions:
      • Use at least Zend Framework 1.5.
      • Make sure iconv includes support for the UTF-8 charset.
  • Permission problem on index creation (Lucene chown problem): chown is executed every time, which conflicts with most environments, where the Apache user differs from the cron/CLI user. Workaround: a modified Zend class is included before the Zend Framework (in PowerSearchIndexerRunner.php).

To-Do list

Support a Startingpoint option to limit indexing to subtrees of the Typo3 page tree.

Team

Contributors

Change log


Beta 1.0.4
  • Reverted to existing index creation
  • Removed suggestion
  • Create index directory on installation
Beta 1.0.3
  • Don't create index if none exists (doesn't seem to work properly)
  • Todo list update
Beta 1.0.2
  • Documentation update
Beta 1.0.1
  • Minor bugfixes
  • Changed plugin description
Beta 1.0.0
  • First release to TER
