EXT: mnoGoSearch

Created:	2008-11-01T07:51:37
Changed by:	Dmitry Dulepov
Changed:	2009-04-16T14:00:25
Author:	Dmitry Dulepov
Email:	dmitry@typo3.org

Extension Key: mnogosearch

This document is published under the Open Content License available from http://www.opencontent.org/opl.shtml

The content of this document is related to TYPO3, a GNU/GPL CMS/Framework available from www.typo3.org

Table of Contents

EXT: mnoGoSearch
Introduction
What does it do?
Screenshots
Search results in the Frontend
Configuring what to index
Requirements
Support for this extension
Translations
Bugs
Users manual
Specifying web space to search
How mnoGoSearch decides what to index
Specifying pages to index
Excluding parts of the web site from indexing
Indexing only real content
Indexing records
Indexing files
Indexing large file collections
Indexing https pages
Creating search form
Using TypoScript
Using page module
Using HTML
Creating advanced search form
Creating page with search results
Plugin mode
Limiting search to a certain web space
Administration
Compiling and installing search engine
Compiling and installing PHP extension
Creating index database
Using mnogosearch binary and extension supplied with operating system
Adding cron job
Installing TYPO3 extension
Configuring Frontend plugin using TypoScript
Using Google Analytics to track your searches
FAQ
TYPO3SEARCH_xxx comments are not respected. What is wrong?
I experimented and messed up my index. How do I clear it?
I removed a page. How do I remove it from index?
I receive an error “Got error 139 from the database engine” while indexing
There seems to be a clone of mnoGoSearch called DataParkSearch. What is it?
What does “mnoGoSearch” mean?
Configuration
TypoScript reference
Command line tool parameters
Tutorial
Known problems
To-Do list
ChangeLog

Introduction

What does it do?

This extension provides an alternative search engine for TYPO3. Its features include high performance, relevance ranking, a true crawler, searching for word forms (go/goes, man/men), clone detection, a suggest mode for misspelled words, great scalability and a Google-like look. The extension can be configured to index and search pages, records and files. When searching thousands of pages, the performance of this extension is much better than that of any other existing TYPO3 search solution known to the author of the extension.

This extension requires external software to be installed on the server. The software can be downloaded from the http://www.mnogosearch.org/ web site. It is the search engine that works behind this extension and provides indexing and searching services. Additionally, the mnoGoSearch PHP module is required. This manual contains instructions on building the search engine and the PHP extension. Building can be performed even by inexperienced users if they follow the instructions exactly.

In general, the mnoGoSearch extension outperforms the standard indexed search extension. It is much faster and more feature-rich. It has all the features of indexed search but is much more efficient.

Screenshots

This section shows what mnoGoSearch looks like in action. The screenshots in this section come from different sites, therefore the visual styling also differs.

Search results in the Frontend

The following screenshot shows search results. Notice the file type icon in the first result (an OpenOffice document), the relevancy indicator (green bar), the size and the last modification date. The second result did not provide a last modification date, so none is displayed for it.

The extension uses a rich page browser to allow better navigation. The page browser can be customized to show as many page links as necessary:

Configuring what to index

The following screenshot shows the Backend configuration of the web space to be indexed. It says that the whole web site should be indexed:

Next, parts of the web site are prohibited from being indexed. These pages contain news and FAQ items. We will index them differently.

Finally we index FAQ items and news. Here is what the indexing configuration for news looks like:

The reasons to index news, FAQ and some other records like this will be explained later in this manual.

Requirements

The mnoGoSearch extension does not work on Windows servers because the corresponding PHP extension is not available for Windows. It works fine on Linux, Unix, FreeBSD and Mac OS X servers.

RealURL or CoolURI is necessary if some parts of the site have to be excluded from search. See “Specifying web space to search” for more information.

To compile the search engine and the PHP extension, gcc and the accompanying GNU build tools must be installed on the server.

Support for this extension

Free support for this extension is available through the TYPO3 mailing lists. The author does not provide free support by e-mail. Commercial support is available on request when time permits.

Translations

Translation of this extension happens only through the TYPO3 translation server. Please do not send translations to the author, as they will not be accepted. Instead, contact the TYPO3 translators using the corresponding TYPO3 mailing list.

Bugs

Bugs must be reported only through the issue tracker at http://forge.typo3.org/projects/extension-mnogosearch/issues . Bugs must not be sent by e-mail because such e-mails are not processed.

Users manual

This section describes what end users should do to enable searching web pages using mnoGoSearch and covers the various options for searching pages. If you are looking for a “Quick start”-like guide, check the “Tutorial” section first. It describes the workflow to get mnoGoSearch up and running quickly.

Specifying web space to search

This section describes how to specify what the extension will index and search. Often the whole web site can be indexed, but sometimes certain parts of the web site should not be indexed or should be indexed in a more efficient manner than just indexing pages. This section explains how to do it all.

How mnoGoSearch decides what to index

mnoGoSearch sees a web site as a hierarchical structure. When indexing, it needs to know where to start. Typically the start of the hierarchy is the root of the web site (like http://example.com/ ). If necessary, there can be several starting points (like http://example.com/products/ and http://example.com/services/ ). In this case search will be limited to the corresponding starting points and everything below them (for example, http://example.com/products/navigation/ ). Any pages outside of the configured starting points are not indexed and therefore not searchable.

The important point is that a web site can be indexed as a whole ( http://example.com/ ) or in parts. When indexing in parts, site URLs should be hierarchical, which implies usage of RealURL or CoolURI.

When the whole site is indexed, some pages may still need to be excluded. mnoGoSearch provides two ways to exclude pages from indexing. A single page can be excluded using the “No search” checkbox in the page properties. When multiple pages starting from a certain page should not be indexed (like checkout pages), mnoGoSearch allows whole hierarchies to be excluded by specifying the path to the hierarchy.

Specifying pages to index

To specify pages for indexing, an indexing configuration record should be created. While creating these records, it is important to keep in mind that mnoGoSearch works with URL path hierarchies.

The first step is to choose where indexing records are stored. Typically it will be the web site home page or a storage folder. It does not make much difference. However, it is good to be consistent and keep all indexing records for a web site on a single page. This makes it easy to see what is actually indexed and what is excluded from indexing.

To create an indexing configuration record, navigate to the page and use the List module. By default records are of the type “Server”. This is the simplest possible type. It specifies the indexing starting point as a path within the web site. For example, to index the full web site, it should be http://example.com/ (assuming that example.com is your web site domain). Note the trailing slash: it is necessary if the URL does not include any other path. Below is what such an indexing record looks like:

Additional options include the indexing period (24 hours is the default) and “Additional indexing configuration”. The latter allows mnoGoSearch configuration directives to be entered directly. They will be appended to the generated indexer configuration. Information about directives can be found at http://mnogosearch.org/doc33/ . Notice that this field is not validated and any wrong directive will result in a fatal error during indexing.

The next type of indexing record is “Realm”. A Realm is very similar to a “Server” but allows regular expressions or wildcards to be used to specify paths. For example, one can enter http://example.com/(news|faq)/.* as a path. Make sure that the correct comparison type is specified:

Excluding parts of the web site from indexing

To exclude parts of the web site from indexing, create an indexing configuration record as described above but set its method to “Disallow”. It will prohibit indexing of any pages starting from the given path. See the screenshot above (“Realm” record).

Notice that such records should appear in the List module before records that enable site indexing. The first matching record takes precedence when matching URLs. For example, consider http://example.com/page/?excludeMe=1. This order is correct:

Disallow: *?excludeMe=*
Allow: http://example.com/

The “Disallow” rule is checked first. If it matches, it is used. This means that http://example.com/page/ will be indexed but http://example.com/page/?excludeMe=1 will not. However, consider these rules in the opposite order:

Allow: http://example.com/
Disallow: *?excludeMe=*

Now both URLs will be indexed because both match the first rule, so the “Disallow” rule is never reached.

So when disallowing some pages from being indexed, always put the “Disallow” rule before the rule that allows indexing.
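The first-match behaviour described above can be illustrated with a small Python sketch. It is not the indexer's implementation: the function name `decide`, the rule tuples and the use of `fnmatch` wildcards (with the “Allow” starting point modelled as a prefix pattern) are assumptions made only for this example.

```python
from fnmatch import fnmatchcase

def decide(url, rules):
    """Return the method of the first rule whose pattern matches the URL."""
    for method, pattern in rules:
        # In fnmatch, "*" matches any run of characters and "?" matches any
        # single character, which includes a literal question mark.
        if fnmatchcase(url, pattern):
            return method
    return None  # no rule matched

# Hypothetical rule lists in the order they would appear in the List module.
correct_order = [("Disallow", "*?excludeMe=*"), ("Allow", "http://example.com/*")]
wrong_order = [("Allow", "http://example.com/*"), ("Disallow", "*?excludeMe=*")]

plain_url = "http://example.com/page/"
excluded_url = "http://example.com/page/?excludeMe=1"

print(decide(plain_url, correct_order))     # Allow
print(decide(excluded_url, correct_order))  # Disallow
print(decide(excluded_url, wrong_order))    # Allow: the Disallow rule is never reached
```

With the wrong order, the broad “Allow” pattern matches every URL first, which is why the excluded URL ends up in the index.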

Indexing only real content

To improve search relevancy, some parts of the page should be excluded from indexing. Such parts include navigation (menus), logo, copyright notice, statistics, partner links, etc. Typically only the real content should be included in the index.

Special HTML comments can be added to the page to tell the indexer which parts of the page should be indexed. There can be many such markers on a single page. Here is an HTML fragment that illustrates how to add such markers:

<body>
<div id="logo">My site</div>
<!-- menu -->
<ul>
       <li><a href="/products/">Products</a></li>
       ...
</ul>
<!-- content -->
<!--TYPO3SEARCH_begin-->
<div id="content">
    Here goes real web site content...
</div>
<!--TYPO3SEARCH_end-->
<ul id="partner-links">
       <li><a href="http://example.com/">Partner web site</a></li>
       ...
</ul>
<!-- extra content -->
<!--TYPO3SEARCH_begin-->
<div id="extra-content">
    Here goes another content block...
</div>
<!--TYPO3SEARCH_end-->
<div id="copyright">Copyright © My company.</div>
</body>

In the example above, content between the TYPO3SEARCH_begin and TYPO3SEARCH_end comments will be indexed. Content outside of these comments is not indexed, but all links found there will still be followed (added to the indexer queue). Notice that there must be no spaces or line breaks inside these comments. They must be spelled exactly as shown in the example above.

Note that TemplaVoila creates such markers automatically. Other templating engines do not add such markers automatically.
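To make the effect of the markers concrete, here is a minimal Python sketch that pulls out the parts of a page lying between the marker comments. It only illustrates the idea and is not how the extension or the indexer is implemented; the function name `indexable_parts` is made up for this example.

```python
import re

BEGIN = "<!--TYPO3SEARCH_begin-->"
END = "<!--TYPO3SEARCH_end-->"

def indexable_parts(html):
    """Return the fragments found between begin/end marker pairs."""
    # Non-greedy match so that each begin marker pairs with the nearest end marker.
    pattern = re.escape(BEGIN) + r"(.*?)" + re.escape(END)
    return [part.strip() for part in re.findall(pattern, html, flags=re.DOTALL)]

page = """
<div id="logo">My site</div>
<!--TYPO3SEARCH_begin-->
<div id="content">Real content</div>
<!--TYPO3SEARCH_end-->
<ul id="partner-links"><li>Partner</li></ul>
<!-- TYPO3SEARCH_begin -->
<div>Ignored: the marker above contains spaces</div>
<!-- TYPO3SEARCH_end -->
"""

print(indexable_parts(page))
```

The second block is skipped because its comments contain spaces, which mirrors the spelling requirement stated above.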

Indexing records

In certain cases indexing content as pages is not efficient. For example, it is more efficient to index news records as records than as pages. Indexing news as pages adds more content than necessary to the index, increases the load on the web server and lowers search relevance. When indexing news items as records, mnoGoSearch indexes only the title and text fields. Thus only the true news text is searchable.

The same applies to the FAQ (extension irfaq) and some other extensions that store information as records.

To index records, an indexing configuration should be created for them. Navigate to the page you have chosen to store indexing configurations on. Then create an indexing configuration record and set its type to “Records”. Next choose the table you want to index. The form will refresh. Here is what it looks like if the “News” table from the tt_news extension is chosen:

The form requires the title and text fields of the record to be selected. There must be one title field and one or more text fields to index. Text fields will be concatenated during indexing. Notice that no conversion is done on fields. Thus using “Archive date” in the form above will not be useful because this field is stored as an integer value in the database. Only true text fields should be selected.

The next parameter to specify is the URL parameters for the item's single view. For most extensions it looks like &tx_extkey_pi1[showUid]={field:uid} . For tt_news it looks as shown in the screenshot above. The & symbol at the beginning of the parameter is mandatory. {field:uid} is replaced with the uid of the record. No other substitutions are available.
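The placeholder substitution can be pictured with the following Python sketch. The helper name `single_view_params` and the record dictionary are hypothetical; the sketch only mirrors the two rules stated above (a leading & and a single {field:uid} placeholder) and says nothing about how the extension builds the final URL.

```python
def single_view_params(template, record):
    """Fill the {field:uid} placeholder with the uid of the given record."""
    if not template.startswith("&"):
        raise ValueError("URL parameters must start with '&'")
    # Only {field:uid} is substituted; any other text is left untouched.
    return template.replace("{field:uid}", str(record["uid"]))

params = single_view_params("&tx_extkey_pi1[showUid]={field:uid}", {"uid": 42})
print(params)  # &tx_extkey_pi1[showUid]=42
```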

It is possible to limit indexing to records from a certain storage folder. This way, for example, only the news records of the web site will be indexed and not any news imported into another sysfolder.

Indexing files

Files are indexed in the same way as pages. Specify the correct paths to the files ( http://example.com/fileadmin/ and http://example.com/uploads/ ) to allow indexing them. The rest is done automatically. Directories must show an index of the files in them (use Apache mod_autoindex).

To index files successfully, you must ensure that file parsing applications (like catdoc or pdftotext) are installed on the server in the default location, normally /usr/bin .

Currently mnoGoSearch supports indexing for the following file types:

Extension  Mime type                                Requires applications  Description
---------  ---------------------------------------  ---------------------  -----------
sxw, odt   application/vnd.oasis.opendocument.text  unzip                  OpenOffice document, requires unzip to be in the current execution path
doc        application/msword                       catdoc                 Microsoft Office document
xls        application/vnd.ms-excel                 xmltohtml              -
ppt        application/vnd.ms-powerpoint            pptohtml               -
pdf        application/pdf                          pdftotext              Adobe PDF
txt        text/plain                               -                      Plain text
html       text/html                                -                      HTML

Web servers must be configured to return the correct mime type when a file is downloaded. With Apache, use the AddType directive to map file extensions to a mime type:

AddType application/vnd.oasis.opendocument.text .sxw
AddType application/vnd.oasis.opendocument.text .odt

Indexing large file collections

If the number of files is large, it does not make sense to fetch them all using HTTP. In this case an additional directive should be entered into the “Additional configuration” field of the indexing configuration for files. This directive forces the indexer to access files locally instead of fetching them through HTTP. Assuming that files are published at http://example.com/fileadmin/fileserver/ and physically located at /path/to/fileadmin/fileserver/ , the following directive should be added:

Alias http://example.com/fileadmin/fileserver/ file:///path/to/fileadmin/fileserver/

Notice the correct number of slashes in paths.
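Conceptually, the directive above swaps one URL prefix for another before the document is fetched. The Python sketch below illustrates that prefix substitution only; it is not the indexer's code, and the function name `apply_alias` and the sample file name are invented for the example.

```python
def apply_alias(url, alias_from, alias_to):
    """Replace the aliased URL prefix; leave other URLs unchanged."""
    if url.startswith(alias_from):
        return alias_to + url[len(alias_from):]
    return url

ALIAS_FROM = "http://example.com/fileadmin/fileserver/"
ALIAS_TO = "file:///path/to/fileadmin/fileserver/"

local = apply_alias(ALIAS_FROM + "docs/manual.pdf", ALIAS_FROM, ALIAS_TO)
print(local)  # file:///path/to/fileadmin/fileserver/docs/manual.pdf
```

The example also shows why the slashes matter: both prefixes end with a slash, so the remainder of the path is appended without gaps or doubled separators.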

Indexing https pages

Indexing https pages with self-signed certificates is not possible directly. The mnoGoSearch indexer will refuse to index such pages because it does not consider the certificate valid. If obtaining a valid certificate is not an option, there is another way to index such pages. It requires a utility named “curl” to be installed on the server.

First, navigate to the web site root and execute the following command:

php typo3/cli_dispatch.phpsh mnogosearch -d | grep X-TYPO3

It will produce the output similar to:

HTTPHeader "X-TYPO3-mnogosearch: d3e203fdb699f7ba6ad7396fdba5c25a"

Note the part in quotes.

Next create a new file named curl.sh somewhere in the file system. If many sites run on the same host, it makes sense to put this file inside the web site space. Put the following content into this file:

#!/bin/sh
curl -i -k -H "X-TYPO3-mnogosearch: d3e203fdb699f7ba6ad7396fdba5c25a" "$1" 2>/dev/null

Note the part in quotes, it is taken from the output of the previous command. Do not copy this example! The header is unique for each site, even for sites running on the same server!

The -H option adds a special HTTP header to the HTTP request. This header tells the extension that the indexer is running. The extension will then exclude all content outside of TYPO3SEARCH_xxx markers from the indexed data. See the “Indexing only real content” chapter for more information.

This script will fetch https pages even if the certificate is self-signed. Make this file accessible and executable for the current user only:

chmod 0700 /path/to/curl.sh

Warning! Setting permissions like this is extremely important! The web server must be able neither to read this file nor to execute it. If permissions are not set correctly, the security of the web site will be compromised!

Next add the following line to the “Additional configuration” of the first indexing configuration you have:

Alias https:// exec:/path/to/curl.sh?https://

This will call this script for https:// scheme to fetch pages. Now https pages with self–signed certificates can be indexed too.

Make sure that /path/to/curl.sh points to the script. /path/to above is the placeholder for the real path.

Creating advanced search form

Creating an advanced search form needs a little more work than creating a simple search form. Currently the advanced search form displays only one additional field, which allows selecting the part of the web site to search. This field is hidden by default in the configuration until version 2.1.8, when it will become enabled by default.

To enable the advanced search form, the administrator should define more than one indexing configuration for the web site, for example “Everywhere” ( http://example.com/ ), “News only” (table: “News”) and “FAQ only” (table: “FAQ”). Next these configurations should be added to the search limit field in the plugin's flexform configuration, or their ID values should be added to the TypoScript property named “siteLimits”. Finally the selector should be enabled in TypoScript:

lib.advanced_form < plugin.tx_mnogosearch_pi1
lib.advanced_form.form.advanced {
       siteSelector = select
}

This code will render the selector as an HTML <select> element. Other possible options are radio and checkboxes .

It is possible to apply search limits but hide them in the advanced search form. For example, if the “FAQ” indexing configuration has an id value of 5, the following TypoScript will hide it from the form:

lib.advanced_form.form.advanced.siteSelector.exclude = 5

When there are several sites, the current site is typically set as the default for search. This is accomplished by using the default option, which is set to the id of the indexing configuration for the current site:

lib.advanced_form.form.advanced.siteSelector.default = 3

Another important feature of the advanced search form is the ability to search the whole web site. Notice that this is not the same as the “Everywhere” configuration above. That configuration defines searching http://example.com/ , which means that “FAQ” and “News” are not necessarily searched. To add a search over the whole web site, the following TypoScript property should be set:

lib.advanced_form.form.advanced.siteSelector.searchAll = 1

Now a new option will be added to the search form as the first item. It makes sure that the site is searched as if the search was submitted using the simple search form (using only the limits defined in the siteLimit TypoScript property or the plugin flexform configuration).

searchAll can be selected as the default option by setting the default value to an empty string.

But now there will be two “Everywhere” entries in the search form: one from searchAll and another from the indexing configuration. Hiding the second leaves only one option and makes a proper form. Assuming that the ID value for the “Everywhere” indexing configuration is 3, we should have the following TypoScript for the advanced search form:

lib.advanced_form < plugin.tx_mnogosearch_pi1
lib.advanced_form {
       # Add "Everywhere", "FAQ" and "News" limits to the search
       siteList = 3,4,5
       form.advanced {
               # Enable selector
               siteSelector = select
               siteSelector {
                       # Allow to search the whole site (configurations 3, 4, and 5)
                       searchAll = 1
                       # Exclude the configuration with id=3 from the advanced search form only
                       exclude = 3
               }
       }
}

Creating page with search results

To create a page with search results, an instance of the mnoGoSearch plugin should be inserted into the page.

Plugin mode

Plugin options contain only two controls: the plugin mode and an optional selector to limit search to a certain web space. The plugin mode can be set to one or more of:

Short search form: this form typically contains a short search box and a submit button.
Long search form: currently it is identical to the short search form but has a larger search box and different text on the button. In future this form will be extended to contain search limit controls.
Search results: this mode shows search results.

Limiting search to a certain web space

If the mnoGoSearch SQL database contains an index for several sites, search will return results for every site. Sometimes search results should be limited to the current domain only or even to a part of the domain.

To limit search results, the field named “Limit search to” should be used. If this field is empty, search is not limited.

When this field is not empty, it must contain indexing configuration records that meet two criteria:

they are of type “Server” or “Records” (“Realm” is not supported!)
for a “Server” type record the indexing method must be “Allow” (any other method is not supported)

When the field is not empty, mnoGoSearch takes all records from this field and returns only results that match these records. For example, suppose this field contains the following limits:

Server: http://example.com/products
Records: News, pid=12345

Then, even if the whole server is indexed (http://example.com/), search results will contain only pages with news and matching pages at http://example.com/products or below it in the page hierarchy.

Notice: it is impossible to use “Disallow” records! So it is not possible to disallow “News” and allow the rest of the site.

This feature behaves very similarly to Google's domain search feature. In Google it is possible to search for a term everywhere by typing “term” in the Google search box. However, it is also possible to limit search results to certain domains using “term site:example.com/products” in the Google search box.

See “Specifying web space to search” above for information about indexing configurations.

Administration

This section provides information about extension installation.

Compiling and installing search engine

While some platforms may include mnoGoSearch, it is advised that you download the mnoGoSearch source code, compile and install it. This will minimize compatibility problems between all three components of mnoGoSearch.

These instructions assume that MySQL is used as the database backend. No other backends were tested.

Download the Unix source code of mnoGoSearch from its web site at http://www.mnogosearch.org/download.html . Make sure that you download at least version 3.3.6 (earlier versions may work but are not supported by this document or the TYPO3 extension).

Note: mnoGoSearch version 3.3.7 is known to have a bug in the PHP extension module. Do not use version 3.3.7!

The server where mnoGoSearch is compiled must have the following packages installed:

automake
autoconf
gcc
php-devel
mysql
mysql-devel ( libmysqlclient15-dev for Debian, libmysqlclient-devel for SuSE)
zlib
zlib-devel
mc (not needed if you want to use editor other than mcedit )

The rest of this section assumes that the mnoGoSearch source code is downloaded to the /tmp directory. Unpack it using

tar xzf mnogosearch-3.3.6.tar.gz

This will create a directory mnogosearch-3.3.6 inside the /tmp directory with all files in it.

Now locate the file named configure.in and modify it to change the value of HAVE_PGSQL from 1 to 0 (near line 990):

cd mnogosearch-3.3.6
mcedit configure.in

While in the editor, press F7 (search), type HAVE_PGSQL and press ENTER. The line will look like:

AC_DEFINE([HAVE_PGSQL], [1], [Define if you want to use PostgreSQL])

Change 1 to 0. Press F2 to save and F10 to exit the editor.

Check the file named configure near line 26517 and see if the value of HAVE_PGSQL is 0. Change it if not.

Next the source code must be configured for the build:

./configure --prefix=/opt/mnogosearch --disable-mp3 --disable-news --without-debug --with-pgsql=no --with-freetds=no --with-oracle8=no --with-oracle8i=no --with-iodbc=no --with-unixODBC=no --with-db2=no --with-solid=no --with-openlink=no --with-easysoft=no --with-sapdb=no --with-ibase=no --with-ctlib=no --with-zlib --with-mysql --disable-syslog

All of the above must be entered on a single line.

On Mac OS X only, add the following to the end of the configure command:

CFLAGS="-arch x86_64 -arch i386" LDFLAGS="-arch x86_64 -arch i386" --enable-all-static

Warning: the quotes must be plain ASCII double quotes. If copy/paste produces typographic quotes, the command will not work, so type the line manually!

Now open include/udm_autoconf.h and locate

/* #undef HAVE_PGSQL */

near line 133 and change to

#undef HAVE_PGSQL

and save. Assuming that there were no errors in the configure run, build and install the search engine:

make
make install

Now check the /opt/mnogosearch directory. There should be subdirectories like bin , etc , var , sbin (and possibly some others).

Mac OS X users: read this post for additional information.

Compiling and installing PHP extension

Prepare and compile the PHP module:

cd /tmp/mnogosearch-3.3.6/php
phpize
./configure --with-mnogosearch=/opt/mnogosearch

On Mac OS X only, add the following to the end of the configure command:

CFLAGS="-arch x86_64 -arch i386"

This properly configures the mnoGoSearch PHP extension for the current PHP version.

The next step is extremely important to get the PHP extension compiled and working right. Open php_mnogo.c in a text editor and add

#undef HAVE_PGSQL

on a new line after

#include "php.h"

To build the extension, execute

make

Extension files will be placed in the "modules" subdirectory. Now check your php.ini to find where PHP extensions are located (search for extension_dir there) and copy mnogosearch.so from the modules subdirectory into the directory identified by extension_dir .

If extension_dir does not contain a file system path but something like ./ , run this from the shell:

php -i | grep extension_dir

This will give you the actual value of the extension_dir parameter.

Next add the following line to php.ini to enable the extension:

extension=mnogosearch.so

Restart Apache and check the error log for error messages. If you find nothing related to mnogosearch, then all went fine.

Creating index database

You need to create the mnoGoSearch index database manually. Do not use the TYPO3 database to store indexing data because TYPO3 may remove your tables (they are not defined as TYPO3 tables).

To create the index database, launch the mysql command line tool with the root account or another account that can create databases and grant privileges. The following is a transcript of a Linux shell session that creates the mnoGoSearch database. You can, however, use any other tool (such as your web hosting control panel) to create a new MySQL database.

prompt$ mysql -u root -p mysql
Enter password: ****
> create database mnogosearch;
> grant all privileges on mnogosearch.* to 'jane'@'localhost' identified by 'janescomplexpassword';
> exit;
prompt$

The "create database" statement above creates a new database named mnogosearch . The "grant all privileges" statement grants the user jane , who connects from localhost, permissions on all tables in the mnogosearch database. Notice the use of single quotes in the last statement.

You can choose any database name, but you need to remember it because you will need it during the next step.

Database connection parameters will be entered later in the extension configuration. See Extension configuration later in this manual.

Using mnogosearch binary and extension supplied with operating system

Using the mnogosearch binary and PHP extension supplied with the operating system of the server is possible but not recommended. The author found that those binaries do not always work correctly or are severely outdated. The author will not accept any bug reports for such versions.

Additionally, such versions are often installed in a location other than /opt/mnogosearch/ , which means that the TYPO3 extension will not work. It is strongly recommended to compile both the mnogosearch engine and the PHP extension. If this is not possible, the administrator must ensure that the mnogosearch binary is located in /opt/mnogosearch/ or make a symbolic link to the indexer binary at /opt/mnogosearch/sbin/ .

Adding cron job

A cron script should be run once a day to perform reindexing. The cron script must have access to mysql and the mnogosearch engine (the one installed at /opt/mnogosearch/ ).

To install the cron script, enter the following at the Linux shell prompt:

crontab -e

It will open an editor. If you did not configure any special editor, it will be the default Linux editor (either "vi" or "vim"). Press the "i" key and enter the following at the end of the file as one line:

0 3 * * * /path/to/php/php5 -q /path/to/web/site/typo3/cli_dispatch.phpsh mnogosearch -w -n >/dev/null 2>&1

Make sure to enter correct paths. The /path/to/web/site is the path to the directory where typo3conf/ is located.

The cron line executes the CLI script every day at 03:00. You can choose another hour by altering the second field, or create your own schedule entirely. See the cron manual page for more information.
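As a reminder of how the schedule part of the line is laid out, the following Python sketch splits a crontab entry into its five schedule fields and the command. The function name `split_cron_line` is made up for illustration, and the sketch handles only plain entries like the one above (no environment assignments or special strings such as @daily).

```python
def split_cron_line(line):
    """Split a crontab entry into its schedule fields and the command."""
    minute, hour, day_of_month, month, day_of_week, command = line.split(None, 5)
    return {
        "minute": minute,
        "hour": hour,
        "day_of_month": day_of_month,
        "month": month,
        "day_of_week": day_of_week,
        "command": command,
    }

entry = split_cron_line(
    "0 3 * * * /path/to/php/php5 -q /path/to/web/site/typo3/cli_dispatch.phpsh mnogosearch -w -n"
)
print(entry["minute"], entry["hour"])  # 0 3, that is 03:00 every day
```

Changing the hour field from 3 to, say, 4 would move the run to 04:00.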

Next create a new user in the TYPO3 Backend. The user name must be _cli_mnogosearch . Password and permissions do not matter. TYPO3 requires this user for CLI scripts. Nobody will be able to log in to the Backend with this user account.

Installing TYPO3 extension

When installing the extension, several options must be specified. The most important is the mnoGoSearch database. When specifying the database, make sure that there is a slash before the question mark in the database URL. Omitting the slash will result in problems connecting to the mnoGoSearch database while indexing.
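The slash requirement can be checked mechanically. The Python sketch below assumes a database URL of the general form scheme://user:password@host/dbname/?options; the sample URLs, including the user, password and option shown, and the helper name `has_slash_before_query` are assumptions made for illustration and are not taken from the extension.

```python
def has_slash_before_query(db_url):
    """Return True if the part before '?' ends with a slash (or there is no '?')."""
    before, separator, _options = db_url.partition("?")
    if not separator:
        return True  # no question mark, so the rule does not apply
    return before.endswith("/")

# Hypothetical example URLs.
good = "mysql://jane:janescomplexpassword@localhost/mnogosearch/?dbmode=blob"
bad = "mysql://jane:janescomplexpassword@localhost/mnogosearch?dbmode=blob"

print(has_slash_before_query(good))  # True
print(has_slash_before_query(bad))   # False
```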

Configuring Frontend plugin using TypoScript¶

Most of the extension options are configured through TypoScript. The extension already has sensible defaults, so settings rarely need to be changed. A full list of settings is shown in the “Configuration” section of this manual.

Using Google Analytics to track your searches¶

Google Analytics can track what users search for on a web site. The following screenshot shows how to configure Google Analytics to track mnoGoSearch searches:

If Google Analytics support is required, search forms must be submitted as GET requests.

FAQ¶

TYPO3SEARCH_xxx comments are not respected. What is wrong?¶

Here is the checklist:

Make sure that there are no spaces around the TYPO3SEARCH_xxx markers. They must look exactly like <!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end-->.
Check that the comments are in the right order and not nested. The first comment must be TYPO3SEARCH_begin, followed by TYPO3SEARCH_end. Two consecutive TYPO3SEARCH_begin or TYPO3SEARCH_end comments will break the indexing.
If the nc_staticfilecache extension is installed, check its version. Versions older than 2.3.2 conflict with the mnogosearch extension. Read the bug report at http://forge.typo3.org/issues/show/2291 and follow the instructions there.
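The first two checklist items can be checked mechanically on a saved copy of the rendered page. The following is a minimal sketch, not a tool shipped with the extension: the function name check_markers is hypothetical, and it relies on grep -o, which is widely available but not required by POSIX.

```shell
# check_markers FILE
# Hypothetical helper: reports TYPO3SEARCH comments in FILE that contain
# spaces, are repeated, or appear in the wrong order.
# Exits with status 0 when no problem is found and 1 otherwise.
check_markers() {
    # Extract every marker, including malformed ones with extra spaces,
    # one per line, in document order.
    grep -o '<!-- *TYPO3SEARCH_[a-z]* *-->' "$1" | awk '
        /^<!--TYPO3SEARCH_begin-->$/ {
            if (inside) { print "repeated TYPO3SEARCH_begin"; bad = 1 }
            inside = 1
            next
        }
        /^<!--TYPO3SEARCH_end-->$/ {
            if (!inside) { print "TYPO3SEARCH_end without TYPO3SEARCH_begin"; bad = 1 }
            inside = 0
            next
        }
        { print "malformed marker: " $0; bad = 1 }
        END {
            if (inside) { print "TYPO3SEARCH_begin without TYPO3SEARCH_end"; bad = 1 }
            exit bad + 0
        }
    '
}
```

To use it, save the HTML source of a Frontend page to a file and run check_markers on that file. A page without any markers passes the check.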

I experimented and messed up my index. How do I clear it?¶

Run the command line tool like this:

php typo3/cli_dispatch.phpsh mnogosearch -x -Cw

Answer YES (in capital letters!) to the prompt.

I removed a page. How do I remove it from index?¶

The extension does this automatically, so no manual action is needed. The same happens when a page is renamed: the old address is deleted from the index and the new address is added.

I receive an error “Got error 139 from the database engine” while indexing¶

This may happen if you use the blob database mode and InnoDB is your default database engine. To fix the problem, open the mysql command line tool for the search database and run this command:

alter table bdicti engine=myisam;

This converts the corresponding table to the MyISAM format. Next, restart the indexer.

There seems to be a clone of mnoGoSearch called DataParkSearch. What is it?¶

This is a fork of the mnoGoSearch project. It focuses on documentation and code quality improvements. It lacks a PHP extension, so it cannot be used as a replacement for mnoGoSearch in TYPO3. The mnoGoSearch PHP extension is incompatible with DataParkSearch.

What does “mnoGoSearch” mean?¶

The name is a play on words. The original developers are from Russia. “Mnogo” in Russian means “much”, “many” or “a lot”.

Configuration¶

This section contains a description of TypoScript configuration for the mnoGoSearch TYPO3 extension.

TypoScript reference¶

The table below lists top-level TypoScript properties.

`templateFile`¶

Property

templateFile

Data type

String

Description

Template file to use

Default

EXT:mnogosearch/resources/template.html

`mode`¶

Property

mode

Data type

String

Description

What to show in the plugin. Supported modes are:

short_form
long_form
search

Default

long_form,results

`form`¶

Property

form

Data type

->FORM

Description

Search form options

Default

`search`¶

Property

search

The table below lists TypoScript properties for search forms.

`siteSelector`¶

Property

siteSelector

Data type

string

Description

How to render the search limit selector. Valid values are:

disabled
select
radio
checkboxes

Default

`siteSelector.`¶

Property

siteSelector.

Property

pageBrowser

Data type

Description

Configuration of the “pagebrowse” plugin

`resultsPerPagenumber_stdWrap`¶

Property

resultsPerPagenumber_stdWrap

Data type

stdWrap

Description

All numbers in the summary will be wrapped with this stdWrap.

`resultTime_stdWrap`¶

Property

resultTime_stdWrap

Data type

stdWrap

Property

weightFactor

Data type

String

Description

A hexadecimal string that describes weight factors for different document sections. The default value gives priority to the title and then to the body text.

Details on this option can be found in the mnoGoSearch documentation (see “wf” there).

Command line tool parameters¶

This section shows the command line tool parameters. The list is identical to the output of typo3/cli_dispatch.phpsh mnogosearch help.

-c                Only check and create database if necessary. Do not reindex.
-d                Display generated indexer configuration and exit.
-n                Force reindexing of new URLs (normally should be set)
-p pid            Process indexing configuration only from this pid
-w                Create statistic for misspelled words. Useful only if
                  Ispell dictionaries are included to mnoGoSearch
                  configuration (see mnoGoSearch documentation)
--dry-run         Show what will be done (not applicable to -d and -E)
-h, --help, -?    Display this help message
-x                Pass the argument to mnoGoSearch indexer
-v level          Be verbose. Level is 0-5. Default is 0 (complete silence)
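A few typical combinations of these options are sketched below. The wrapper name mnogo is hypothetical and only saves typing. It assumes that the current directory is the site root (the directory where typo3conf/ is located) and that PHP can be started as php unless the PHP variable names another binary. The pid 42 is an arbitrary example value.

```shell
# mnogo ARGS...
# Hypothetical wrapper: runs the mnogosearch CLI script with the given options.
# Assumptions: the current directory is the site root and PHP is "php" on the
# PATH, unless the PHP environment variable points to another binary.
mnogo() {
    "${PHP:-php}" typo3/cli_dispatch.phpsh mnogosearch "$@"
}

# Typical invocations, using only options from the list above:
#   mnogo -d               # display the generated indexer configuration
#   mnogo --dry-run -v 3   # show what would be done, with verbose output
#   mnogo -v 3 -w          # manual run to verify that the site gets indexed
#   mnogo -p 42 -n         # process only the configuration from pid 42
```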

Tutorial¶

This section provides step by step instructions to get mnoGoSearch up and running.

The following steps should be done to use mnoGoSearch for searching the web site. Refer to the corresponding sections of this manual for more information.

Compile and install search engine and PHP extension

Create database for index

Create indexing configuration records

Run mnoGoSearch command line tool manually with -v 3 -w options to ensure that it indexes the web site

Install cron job for regular reindexing

Add search results page and configure Frontend plugin on it

Add search box to pages of your site

Known problems¶

None.

To-Do list¶

Extend search form with options to limit search to certain indexing configurations

ChangeLog¶

See the “ChangeLog” file in the extension.
