EXT: mnoGoSearch¶
Created: | 2008-11-01T07:51:37 |
---|---|
Changed by: | Dmitry Dulepov |
Changed: | 2009-04-16T14:00:25 |
Author: | Dmitry Dulepov |
Email: | dmitry@typo3.org |
Extension Key: mnogosearch
Copyright 2004-2009, Dmitry Dulepov, <dmitry@typo3.org>
This document is published under the Open Content License
available from http://www.opencontent.org/opl.shtml
The content of this document is related to TYPO3
- a GNU/GPL CMS/Framework available from www.typo3.org
Table of Contents¶
Introduction
Search results in the Frontend
Users manual
Specifying web space to search
How mnoGoSearch decides what to index
Excluding parts of the web site from indexing
Indexing large file collections
Creating advanced search form
Creating page with search results
Limiting search to a certain web space
Administration
Compiling and installing search engine
Compiling and installing PHP extension
Using mnogosearch binary and extension supplied with operating system
Configuring Frontend plugin using TypoScript
Using Google Analytics to track your searches
TYPO3SEARCH_xxx comments are not respected. What is wrong?
I experimented and messed up my index. How do I clear it?
I removed a page. How do I remove it from index?
I receive an error “Got error 139 from the database engine” while indexing
There seems to be a clone of mnoGoSearch called DataParkSearch. What is it?
What does “mnoGoSearch” mean?
Configuration
Command line tool parameters
Tutorial
Known problems
To-Do list
ChangeLog
Introduction¶
What does it do?¶
This extension provides an alternative search engine for TYPO3. It features high performance, relevancy, a true crawler, searching for word forms (go/goes, man/men), clone detection, a suggest mode for misspelled words, great scalability and a Google-like look. The extension can be configured to index and search pages, records and files. When searching thousands of pages, the performance of this extension is much better than that of any other existing TYPO3 search solution known to the author of the extension.
This extension requires external software to be installed on the server. The software can be downloaded from the http://www.mnogosearch.org/ web site. This software is the search engine that works behind this extension and provides indexing and searching services. Additionally the mnoGoSearch PHP module is required. This manual contains instructions on building the search engine and the PHP extension. Building can be performed even by inexperienced users if they follow the instructions exactly.
In general the mnoGoSearch extension outperforms the standard indexed search extension. It is much faster and more feature rich. It has all the features of indexed search but is much more efficient.
Screenshots¶
This section shows what mnoGoSearch looks like in action. Screenshots in this section come from different sites, therefore the visual styling also differs.
Search results in the Frontend¶
The following screenshot shows search results. Notice the file type icon in the first result (OpenOffice document), the relevancy indicator (green bar), the size and the last modification date. The second result did not provide a last modification date, so none is displayed for it.
The extension uses a rich page browser to allow better navigation. The page browser can be customized to show as many page links as necessary:
Configuring what to index¶
The following screenshot shows Backend configuration of web space to be indexed. It says that the whole web site should be indexed:
Next, parts of the web site are prohibited from being indexed. These pages contain news and FAQ items. We will index them differently.
Finally we index FAQ items and news. Here is what indexing of news looks like:
The reasons to index news, FAQ and some other records like this will be explained later in this manual.
Requirements¶
The mnoGoSearch extension does not work on Windows servers because the corresponding PHP extension is not available for Windows. It works fine on Linux, Unix, FreeBSD and Mac OS X servers.
RealURL or CoolURI is necessary if some parts of the site have to be excluded from search. See “Specifying web space to search” for more information.
To compile the search engine and the PHP extension, gcc and the accompanying GNU build tools must be installed on the server.
Support for this extension¶
Free support for this extension is available through the TYPO3 mailing lists. The author does not provide free support by e-mail. Commercial support is available on request when time permits.
Translations¶
Translation of this extension happens only through the TYPO3 translation server. Please do not send translations to the author as they will not be accepted. Instead contact TYPO3 translators using the corresponding TYPO3 mailing list.
Bugs¶
Bugs must be reported only through the issue tracker at http://forge.typo3.org/projects/extension-mnogosearch/issues. Bugs must not be sent by e-mail because such e-mails are not processed.
Users manual¶
This section describes how and what end users should do to enable searching web pages using mnoGoSearch. If you are looking for “Quick start”–like guide, you should check the “Tutorial” section first. It describes the workflow to get mnoGoSearch up and running quickly. This section describes various options to search pages.
Specifying web space to search¶
This section describes how to specify what the extension will search and index. Often the whole web site can be indexed but sometimes certain parts of the web site should not be indexed or should be indexed in a more efficient manner than just indexing pages. This section explains how to do it all.
How mnoGoSearch decides what to index¶
mnoGoSearch sees web sites as a hierarchical structure. When indexing, it needs to know where to start. Typically the start of the hierarchy is the root of the web site (like http://example.com/). But if necessary there can be many starting points (like http://example.com/products/ and http://example.com/services/). In this case search will be limited to the corresponding starting points and everything below them (e.g. http://example.com/products/navigation/). Any pages outside of the configured starting points are not indexed and therefore not searchable.
The important point in the information above is that a web site can be indexed as a whole (http://example.com/) or in parts. When indexing in parts, site URLs should be hierarchical, which implies usage of RealURL or CoolURI.
When the whole site is indexed, some pages may still need to be excluded. mnoGoSearch provides ways to disallow certain pages from indexing. A single page can be excluded by using the “No search” checkbox in the page properties. When a page and everything below it should not be indexed (like checkout pages), mnoGoSearch allows excluding the whole hierarchy by specifying its path.
Specifying pages to index¶
To specify pages for indexing, an indexing configuration record should be created. While creating these records, it is important to keep in mind that mnoGoSearch works with URL path hierarchies.
The first step in specifying pages for indexing is to choose where indexing records are stored. Typically it will be the web site home page or a storage folder. It does not make much difference. However it is good to be consistent and keep all indexing records for a web site on a single page. This makes it easy to see what is actually indexed and what is excluded from indexing.
To create an indexing configuration record, navigate to the page and use the List module to create it. By default records are of the type “Server”. This is the simplest possible type. It specifies the indexing starting point as a path within the web site. For example, to index the full web site, it should be http://example.com/ (assuming that example.com is your web site domain). Note the trailing slash: it is necessary if the URL does not include any other path. Below is what such indexing records look like:
Additional options include the indexing period (24 hours is the default) and “Additional indexing configuration”. The latter allows entering mnoGoSearch configuration directives directly. They will be appended to the generated indexer configuration. Information about directives can be found at http://mnogosearch.org/doc33/ . Notice that this field is not validated and any wrong directive will result in a fatal error during indexing.
The next type of indexing records is a “Realm”. A Realm is very similar to a “Server” but it allows using regular expressions or wildcards to specify paths. For example, one can enter http://example.com/(news|faq)/.* as a path. Make sure that the correct comparison type is specified:
Excluding parts of the web site from indexing¶
To exclude parts of the web site from indexing, create an indexing configuration record as described above but set the method to “Disallow”. It will prohibit any pages starting from the current path from being indexed. See the screenshot above (“Realm” record).
Notice that such records should appear in the List module before records that enable site indexing. The first record takes precedence when matching URLs. For example, consider http://example.com/page/?excludeMe=1. This order is correct:
- Disallow: *?excludeMe=*
- Allow: http://example.com/
This will first check the “disallow” rule. If it matches, it will be used. It means that http://example.com/page/ will be indexed but http://example.com/page/?excludeMe=1 will not. However consider these rules:
- Allow: http://example.com/
- Disallow: *?excludeMe=*
Now both URLs will be indexed because they both match the first rule, so the “disallow” rule is never reached.
So when disallowing some pages from being indexed, always put the disallow rule before the rule that allows indexing.
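The ordering rule above can be illustrated with a small Python sketch. It is not how the indexer is implemented: it only models "the first matching rule wins" using shell-style wildcards from Python's `fnmatch` module. The function name `is_indexed`, the rule tuples and the `http://example.com/*` pattern (standing in for a “Server” record) are assumptions made for this illustration.

```python
from fnmatch import fnmatchcase


def is_indexed(url, rules):
    """Return True if the first rule that matches the URL is an "allow" rule.

    rules is an ordered list of (method, pattern) tuples. This is a
    simplified model of "first matching record wins", not the real indexer.
    """
    for method, pattern in rules:
        # In fnmatch, "*" matches any run of characters and "?" matches
        # any single character (including a literal question mark).
        if fnmatchcase(url, pattern):
            return method == "allow"
    # No rule matched: the URL is outside of the configured web space.
    return False


correct_order = [("disallow", "*?excludeMe=*"), ("allow", "http://example.com/*")]
wrong_order = [("allow", "http://example.com/*"), ("disallow", "*?excludeMe=*")]

print(is_indexed("http://example.com/page/?excludeMe=1", correct_order))
print(is_indexed("http://example.com/page/?excludeMe=1", wrong_order))
```

With the correct order the excluded URL is rejected; with the wrong order the “allow” rule matches first and the URL is indexed anyway.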
Indexing only real content¶
To improve search relevancy some parts of the page should be excluded from indexing. Such parts include navigation (menu), logo, copyright, statistics, partner links, etc. Typically only the real content should be included in the index.
Special HTML comments can be added to the page to tell the indexer what parts of the page should be indexed. There can be many such markers on a single page. Here is an HTML fragment that illustrates how to add such markers:
<body>
<div id="logo">My site</div>
<!-- menu -->
<ul>
<li><a href="/products/">Products</a></li>
...
</ul>
<!-- content -->
<!--TYPO3SEARCH_begin-->
<div id="content">
Here goes real web site content...
</div>
<!--TYPO3SEARCH_end-->
<ul id="partner-links">
<li><a href="http://example.com/">Partner web site</a></li>
...
</ul>
<!-- extra content -->
<!--TYPO3SEARCH_begin-->
<div id="extra-content">
Here goes another content block...
</div>
<!--TYPO3SEARCH_end-->
<div id="copyright">Copyright © My company.</div>
</body>
In the example above content inside TYPO3SEARCH_xxx will be indexed and all links outside of these comments will be followed (added to the indexer queue). Notice that there must be no spaces or line breaks in these comments. They must be spelled exactly as shown in the example above.
Note that TemplaVoila creates such markers automatically. Other templating engines do not add such markers automatically.
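To check which parts of a page fall inside the markers, a sketch like the following can help. It is only an illustration of the marker rule described above, not the code used by the extension or the indexer; the function name `indexable_fragments` is made up for this example.

```python
import re

# The markers must be spelled exactly like this: no spaces, no line breaks.
MARKED_BLOCK = re.compile(
    r"<!--TYPO3SEARCH_begin-->(.*?)<!--TYPO3SEARCH_end-->",
    re.DOTALL,
)


def indexable_fragments(html):
    """Return the HTML fragments enclosed by TYPO3SEARCH_begin/end markers."""
    return [fragment.strip() for fragment in MARKED_BLOCK.findall(html)]


page = (
    '<body><div id="logo">My site</div>'
    "<!--TYPO3SEARCH_begin--><p>Real content</p><!--TYPO3SEARCH_end-->"
    '<ul id="partner-links"></ul>'
    "<!--TYPO3SEARCH_begin-->\n<p>Extra content</p>\n<!--TYPO3SEARCH_end-->"
    "</body>"
)
print(indexable_fragments(page))
```

A marker written with extra spaces, such as `<!-- TYPO3SEARCH_begin -->`, is not matched by this pattern, which mirrors the spelling requirement above.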
Indexing records¶
In certain cases indexing content as pages is not efficient. For example, it is more efficient to index news records as records than as pages. Indexing news as pages adds more content than necessary to the index, increases the load on the web server and lowers search relevance. When indexing news items as records, mnoGoSearch indexes only the title and text fields. Thus only the true news text is searchable.
The same applies to the FAQ (extension irfaq) and some other extensions that store information as records.
To index records, an indexing configuration for them should be created. To create an indexing configuration for records, navigate to the page of the web site you have chosen to store indexing configurations on. Then create an indexing configuration record and set its type to “Records”. Next choose the table you want to index. The form will refresh. Here is what it will look like if the “News” table from the tt_news extension is chosen:
The form requires the title and text fields of the record to be selected. There must be one title field and one or more text fields to index. Text fields will be concatenated during indexing. Notice that no conversion is done on fields. Thus using “Archive date” in the form above will not be useful because this field is stored as an integer value in the database. Only true text fields should be selected.
The next parameter to specify is the URL parameters for the item's single view. For most extensions it looks like &tx_extkey_pi1[showUid]={field:uid}. For tt_news it looks as shown on the screenshot above. The & symbol at the beginning of the parameter is mandatory. {field:uid} is replaced with the uid of the record. No other substitutions are available.
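The substitution can be pictured with a short Python sketch. This is only a model of the documented behaviour (a mandatory leading `&` and a single `{field:uid}` placeholder); the function name `single_view_params` is hypothetical and not part of the extension.

```python
def single_view_params(template, uid):
    """Build single view URL parameters for a record.

    template must start with "&"; "{field:uid}" is the only placeholder
    that is replaced, as described in the manual.
    """
    if not template.startswith("&"):
        raise ValueError("URL parameters must start with '&'")
    return template.replace("{field:uid}", str(int(uid)))


print(single_view_params("&tx_extkey_pi1[showUid]={field:uid}", 42))
```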
It is possible to limit indexing to records from a certain storage folder. This way, for example, only news records of the web site will be indexed and not any imported news in another sysfolder.
Indexing files¶
Indexing files is possible in the same way as indexing pages. Specify the correct paths to files (http://example.com/fileadmin/ and http://example.com/uploads/) to allow indexing them. The rest is done automatically. Directories must show an index of the files in them (use Apache mod_autoindex).
To index files successfully you must ensure that file parsing applications (like catdoc or pdftotext) are installed on the server in the default places (normally /usr/bin).
Currently mnoGoSearch supports indexing for the following file types:
| Extension | Mime type | Requires applications | Description |
|---|---|---|---|
| sxw, odt | application/vnd.oasis.opendocument.text | unzip | OpenOffice document, requires unzip to be in the current execution path |
| doc | application/msword | catdoc | Microsoft Office document |
| xls | application/vnd.ms-excel | xmltohtml | |
| ppt | application/vnd.ms-powerpoint | pptohtml | |
| pdf | application/pdf | pdftotext | Adobe PDF |
| txt | text/plain | | Plain text |
| html | text/html | | HTML |
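Before indexing files it can be useful to verify that the required parsing applications are present. The following Python sketch does this for a list of file extensions. The tool names are copied from the list above as spelled in this manual; the function name `missing_tools` and the injectable `which` parameter are assumptions made for this example.

```python
import shutil

# File extension -> external application, as listed in this manual.
REQUIRED_TOOLS = {
    "sxw": "unzip",
    "odt": "unzip",
    "doc": "catdoc",
    "xls": "xmltohtml",
    "ppt": "pptohtml",
    "pdf": "pdftotext",
}


def missing_tools(extensions, which=shutil.which):
    """Return a sorted list of required applications not found on PATH."""
    missing = set()
    for ext in extensions:
        tool = REQUIRED_TOOLS.get(ext.lower())
        # txt and html need no external application, so they are skipped.
        if tool is not None and which(tool) is None:
            missing.add(tool)
    return sorted(missing)


print(missing_tools(["doc", "pdf", "txt"]))
```

An empty list means that every application needed for the given file types was found in the current execution path.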
Web servers must be configured to return the correct mime type when a file is downloaded. With Apache, use the AddType directive to add a mime type:
AddType application/vnd.oasis.opendocument.text .sxw
AddType application/vnd.oasis.opendocument.text .odt
Indexing large file collections¶
If the number of files is large, it does not make sense to fetch them all using HTTP. In this case an additional directive should be added to the “Additional configuration” field of the indexing configuration for files. This directive forces the indexer to access files locally instead of fetching them through HTTP. Assuming that files are located at http://example.com/fileadmin/fileserver/ and physically at /path/to/fileadmin/fileserver/, the following directive should be added:
Alias http://example.com/fileadmin/fileserver/ file:///path/to/fileadmin/fileserver/
Notice the correct number of slashes in paths.
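The effect of the directive can be pictured as a prefix substitution on the URL. The Python sketch below is a simplified model under that assumption and is not taken from the mnoGoSearch source code; the function name `apply_alias` is made up for the illustration.

```python
def apply_alias(url, alias_from, alias_to):
    """Replace the alias_from prefix of url with alias_to, if it matches."""
    if url.startswith(alias_from):
        return alias_to + url[len(alias_from):]
    # URLs outside of the aliased space are left untouched.
    return url


ALIAS_FROM = "http://example.com/fileadmin/fileserver/"
# "file://" followed by the absolute path gives three slashes in a row.
ALIAS_TO = "file:///path/to/fileadmin/fileserver/"

print(apply_alias("http://example.com/fileadmin/fileserver/docs/a.pdf", ALIAS_FROM, ALIAS_TO))
```

Both prefixes end with a slash, so the remainder of the URL is appended without doubling or losing a path separator.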
Indexing https pages¶
Indexing https pages with self-signed certificates is not possible directly. The mnoGoSearch indexer will refuse to index such pages because it will not see the certificate as valid. If obtaining a valid certificate is not an option, there is another way to index such pages. For that a utility named “curl” should be installed on the server.
First, navigate to the web site root and execute the following command:
php typo3/cli_dispatch.phpsh mnogosearch -d | grep X-TYPO3
It will produce the output similar to:
HTTPHeader "X-TYPO3-mnogosearch: d3e203fdb699f7ba6ad7396fdba5c25a"
Note the part in quotes.
Next create a new file named curl.sh somewhere in the file system. If many sites run on the same host, it makes sense to put this file inside the web site space. Put the following content into this file:
#!/bin/sh
curl -i -k -H "X-TYPO3-mnogosearch: d3e203fdb699f7ba6ad7396fdba5c25a" $1 2>/dev/null
Note the part in quotes, it is taken from the output of the previous command. Do not copy this example! The header is unique for each site, even for sites running on the same server!
-H option adds a special HTTP header to the HTTP request. This header tells the extension that indexer is running. The extension will exclude all content outside of TYPO3SEARCH_xxx markers from indexed data. See “Indexing only real content” chapter for more information.
This script will fetch https pages even if the certificate is self-signed. Make this file accessible and executable for the current user only:
chmod 0700 /path/to/curl.sh
Warning! Setting permissions like this is extremely important! The web server must be able neither to read this file nor to execute it. If permissions are not set correctly, the security of the web site will be compromised!
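The permissions can be double-checked with a few lines of Python. This is only a convenience sketch using the standard `os` and `stat` modules; the function name `is_owner_only` is made up for this example, and /path/to/curl.sh remains a placeholder for the real path.

```python
import os
import stat


def is_owner_only(path):
    """Return True if neither group nor others have any permission on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


# Example call; replace the placeholder with the real location of the script:
# print(is_owner_only("/path/to/curl.sh"))
```

A file with mode 0700 passes this check, while modes such as 0750 or 0755 do not. The check says nothing about which user owns the file, so that still has to be verified separately.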
Next add the following lines to the “Additional configuration” of the first indexing configuration you have:
Alias https:// exec:/path/to/curl.sh?https://
This will call this script for the https:// scheme to fetch pages.
Now https pages with self-signed certificates can be indexed too.
Make sure that /path/to/curl.sh points to the script. /path/to above is a placeholder for the real path.
Creating search form¶
There are various ways to add search form to the page. You can use one or more ways. If search box appears on each page, TypoScript will work best. However two other options are also available.
Using TypoScript¶
To create a short simple form (equivalent to “macina_searchbox” for indexed search) on each page do the following in TypoScript:
lib.search_form < plugin.tx_mnogosearch_pi1
lib.search_form.mode = short_form
Now lib.search_form can be used for replacing a marker or as a TemplaVoila object. You can also change other options (like template file). See the “Configuration” section later in this manual.
Notice that by default form is not cacheable. Non–cacheable form will show search terms in the search box after the submission. This may slightly decrease web site performance. To avoid this the following line can be added after the two lines shown above:
lib.search_form = USER
See also “Using HTML” below to achieve even better performance when using search forms.
Using page module¶
When using Page module, insert mnoGoSearch plugin to the page and select the desired form in the plugin properties:
Using HTML¶
Instead of using the plugin or TypoScript it is possible to have the form directly in the web site template. Note that the form needs a proper “action” URL. If you plan to use Google Analytics to track search results, the method must be “get”. Otherwise it can be “post”. The following is the minimum required form markup for mnoGoSearch:
<form action="/search/" method="get">
<input type="text" name="tx_mnogosearch_pi1[q]" value="" />
<input type="submit" name="tx_mnogosearch_pi1[submit]" value="Search" />
</form>
This method is recommended for better web site performance. The plugin for this extension is defined as USER_INT, which means that the plugin is never cached. Having this plugin on every page may slightly lower web site performance. Notice that a form placed directly in HTML will show an empty search field after submission.
Creating advanced search form¶
Creating an advanced search form needs a little more work in addition to creating a simple search form. Currently the advanced search form displays only one additional field. This field allows selecting which part of the web site to search. This field is hidden by default in configuration until version 2.1.8 when it will become enabled by default.
To enable the advanced search form the administrator should define more than one indexing configuration for the web site. For example, the administrator can define configurations like “Everywhere” (http://example.com/), “News only” (table: “News”), “FAQ only” (table: “FAQ”). Next these configurations should be added to the search limit field in the plugin's flexform configuration or their ID values should be added to the TypoScript property named “siteLimits”. Finally the selector should be enabled in TypoScript:
lib.advanced_form < plugin.tx_mnogosearch_pi1
lib.advanced_form.form.advanced {
  siteSelector = select
}
This code will render the selector as an HTML <select> element. Other possible options are radio or checkboxes.
It is possible to apply search limits but hide them in the advanced search form. For example, if the “FAQ” indexing configuration has an id value of 5, the following TypoScript will hide it from the form:
lib.advanced_form.form.advanced.siteSelector.exclude = 5
When there are several sites, the current site is typically set as the default for search. This is accomplished by using the default option, which is set to the id of the indexing configuration for the current site:
lib.advanced_form.form.advanced.siteSelector.default = 3
Another important feature of the advanced search form is the ability to search the whole web site. Notice that it is not the same as the “Everywhere” configuration above. The “Everywhere” configuration defines searching http://example.com/, which means that “FAQ” and “News” are not necessarily searched. To add search for the whole web site the following TypoScript property should be set:
lib.advanced_form.form.advanced.siteSelector.searchAll = 1
Now a new option will be added to the search form as the first item. It will make sure that the site is searched as if the search was submitted using the simple search form (using only limits defined in the siteLimit TypoScript property or the plugin flexform configuration). searchAll can be selected as the default option by setting the default value to an empty string.
But now there will be two “Everywhere” entries in the search form: one from searchAll and another from the indexing configuration. Hiding the second will leave only one option and make a proper form. Assuming that the ID value for the “Everywhere” indexing configuration is 3, we should have the following TypoScript for the advanced search form:
lib.advanced_form < plugin.tx_mnogosearch_pi1
lib.advanced_form {
  # Add "Everywhere", "FAQ" and "News" limits to the search
  siteList = 3,4,5
  form.advanced {
    # Enable selector
    siteSelector = select
    siteSelector {
      # Allow to search the whole site (configurations 3, 4, and 5)
      searchAll = 1
      # Exclude the configuration with id=3 from the advanced search form only
      exclude = 3
    }
  }
}
Creating page with search results¶
To create a page with search results an instance of the mnoGoSearch plugin should be inserted to the page.
Plugin mode¶
Plugin options contain only two controls: plugin mode and optional selector to limit search to certain web space. Plugin mode can be set to one or more of:
- Short search form. This form typically contains a short search box and a submit button.
- Long search form. Currently it is identical to the short search form but has a larger search box and different text on the button. In the future this form will be extended to contain search limit controls.
- Search results. This mode shows search results.
Limiting search to a certain web space¶
If mnoGoSearch SQL database contains index for several sites, search will return results for every site. Sometimes search results should be limited to the current domain only or even to a part of the domain.
To limit search results to the domain, the field named “Limit search to” should be used. If this field is empty, search is not limited.
When this field is not empty, it must contain indexing configuration records that meet two criteria:
- they are of type “Server” or “Records” (“Realm” is not supported!)
- for “Server” type record indexing method must be “Allow” (any other method is not supported)
When this field is not empty, mnoGoSearch takes all entries from this field and returns only results that match them. For example, suppose this field contains the following limits:
- Server: http://example.com/products
- Records: News, pid=12345
Then, even if the whole server is indexed (http://example.com/), search results will contain only news records and pages that contain the search terms and are located at http://example.com/products or below it in the page hierarchy.
Notice: it is impossible to use “Disallow” records! So it is not possible to disallow “News” and allow the rest of the site.
This feature behaves very similarly to Google's domain search feature. In Google it is possible to search for a term everywhere by typing “term” in the Google search box. However it is also possible to limit search results to certain domains only by using “term site:example.com/products” in the Google search box.
See “Specifying web space to search” above for information about indexing configurations.
Administration¶
This section provides information about extension installation.
Compiling and installing search engine¶
While some platforms may include mnoGoSearch, it is advised that you download mnoGoSearch source code, compile and install it. This will minimize compatibility problems between all three components of mnoGoSearch.
These instructions assume that MySQL is used as a database backend. No other backends were tested.
Download the Unix source code of mnoGoSearch from its web site at http://www.mnogosearch.org/download.html . Make sure that you download at least version 3.3.6 (earlier versions may work but are not supported by this document or the TYPO3 extension).
Note: it is known that mnogosearch version 3.3.7 has a bug in the PHP extension module. Do not use version 3.3.7!
The server where mnoGoSearch is compiled must have the following packages installed:
- automake
- autoconf
- gcc
- php-devel
- mysql
- mysql-devel (libmysqlclient15-dev for Debian, libmysqlclient-devel for SuSE)
- zlib
- zlib-devel
- mc (not needed if you want to use an editor other than mcedit)
The following text assumes that the mnoGoSearch source code is downloaded to the /tmp directory. Unpack it using
tar xzf mnogosearch-3.3.6.tar.gz
This will create a directory mnogosearch-3.3.6 inside the /tmp directory with all files in it.
Now locate the file named configure.in and modify it to change the value of HAVE_PGSQL from 1 to 0 (near line 990):
cd mnogosearch-3.3.6
mcedit configure.in
While in the editor, press F7 (search), type HAVE_PGSQL and press ENTER. The line will look like:
AC_DEFINE([HAVE_PGSQL], [1], [Define if you want to use PostgreSQL])
Change 1 to 0. Press F2 to save and F10 to exit the editor.
Check the file named configure near line 26517 and see if the value of HAVE_PGSQL is 0. Change it if not.
Next source code must be configured for build:
./configure --prefix=/opt/mnogosearch --disable-mp3 --disable-news --without-debug --with-pgsql=no --with-freetds=no --with-oracle8=no --with-oracle8i=no --with-iodbc=no --with-unixODBC=no --with-db2=no --with-solid=no --with-openlink=no --with-easysoft=no --with-sapdb=no --with-ibase=no --with-ctlib=no --with-zlib --with-mysql --disable-syslog
All the above should come on a single line.
On Mac OS X only, add the following to the end of the configure command:
CFLAGS="-arch x86_64 -arch i386" LDFLAGS="-arch x86_64 -arch i386" --enable-all-static
Warning: copy/paste of the line above will not work. Type it manually!
Now open include/udm_autoconf.h and locate
/* #undef HAVE_PGSQL */
near line 133 and change it to
#undef HAVE_PGSQL
and save. Assuming that there were no errors in the configure execution, build and install the search engine:
make
make install
Now check the /opt/mnogosearch directory. There should be subdirectories like bin, etc, var, sbin (and maybe some others).
Mac OS X users: read this post for additional information.
Compiling and installing PHP extension¶
Prepare and compile PHP module:
cd /tmp/mnogosearch-3.3.6/php
phpize
./configure --with-mnogosearch=/opt/mnogosearch
On Mac OS X only, add the following to the end of the configure command:
CFLAGS="-arch x86_64 -arch i386"
This properly configures mnoGoSearch PHP extension for the current PHP version.
The next step is extremely important to get the PHP extension compiled and working right. Open php_mnogo.c in a text editor and add
#undef HAVE_PGSQL
on a new line after
#include "php.h"
To build the extension, execute
make
Extension files will be placed in the "modules" subdirectory. Now check your php.ini to find where PHP extensions are located (search for extension_dir there) and copy mnogosearch.so from the modules subdirectory into the directory identified by extension_dir.
If extension_dir does not contain a file system path but something like ./, run this from the shell:
php -i | grep extension_dir
This will give you the actual value of the extension_dir parameter.
Next add the following line to php.ini to enable the extension:
extension=mnogosearch.so
Restart Apache and check for any error message in error log. If you find nothing related to mnogosearch, then all went fine.
Creating index database¶
You need to create mnoGoSearch index database manually. Do not use TYPO3 database to store indexing data because TYPO3 may remove your tables (they are not defined as TYPO3 tables).
To create the index database, launch the mysql command line tool with the root account or an account that can create databases and grant privileges. The following is a snapshot of a Linux shell session that creates the mnoGoSearch database. You can, however, use any other tool (such as your web hosting control panel) to create a new MySQL database.
prompt$ mysql -u root -p mysql
Enter password: ****
> create database mnogosearch;
> grant all privileges on mnogosearch.* to 'jane'@'localhost' identified by 'janescomplexpassword';
> exit;
prompt$
The "create database" statement above creates a new database named mnogosearch. The "grant all privileges" statement grants permissions on all tables in the mnogosearch database to the user jane, who connects from localhost. Notice the use of single quotes in the last statement.
You can choose any database name but you need to remember it because you will need it during the next step.
Database connection parameters will be entered later in the extension configuration. See Extension configuration later in this manual.
Using mnogosearch binary and extension supplied with operating system¶
Using the mnogosearch binary and PHP extension supplied with the operating system of the server is possible but not recommended. The author found that those binaries do not always work correctly or are severely outdated. The author will not accept any bug reports for such versions.
Additionally such versions are often installed in a location other than /opt/mnogosearch/, which means that the TYPO3 extension will not work. It is strongly recommended to compile both the mnogosearch engine and the PHP extension. If that is not possible, the administrator must ensure that the mnogosearch binary is located in /opt/mnogosearch/ or make a symbolic link for the indexer binary at /opt/mnogosearch/sbin/.
Adding cron job¶
A cron script should be run once a day to perform reindexing in the database. The cron script must have access to mysql and the mnogosearch engine (the one installed at /opt/mnogosearch/).
To install a cron script, enter the following at Linux shell prompt:
crontab -e
It will open an editor. If you did not configure a specific editor, it will be the default Linux editor (either "vi" or "vim"). Press the "i" key and enter the following at the end of the file as a single line:
0 3 * * * /path/to/php/php5 -q /path/to/web/site/typo3/cli_dispatch.phpsh mnogosearch -w -n >/dev/null 2>&1
Make sure to enter the correct paths. The /path/to/web/site is the path to the directory where typo3conf/ is located.
The cron line executes the CLI script every day at 03:00. You can choose another hour by altering the second field, or create an entirely different schedule. See the cron manual on your Linux system for more information.
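As an illustration of changing the schedule, the sketch below prints a crontab entry for a chosen hour and minute. cron_line is a hypothetical helper; the default paths are the same placeholders as in the line above and must be replaced with real ones. Output is discarded with the portable `>/dev/null 2>&1` form, since cron usually runs commands through /bin/sh.

```shell
#!/bin/sh
# Sketch: print a crontab entry that runs the mnoGoSearch CLI script daily.
# cron_line is a hypothetical helper; the default paths are placeholders.
cron_line() {
    hour="$1"                          # second crontab field (0-23)
    minute="${2:-0}"                   # first crontab field (0-59)
    php="${3:-/path/to/php/php5}"
    site="${4:-/path/to/web/site}"
    printf '%s %s * * * %s -q %s/typo3/cli_dispatch.phpsh mnogosearch -w -n >/dev/null 2>&1\n' \
        "$minute" "$hour" "$php" "$site"
}

# Print an entry that runs every day at 01:30
cron_line 1 30

# To append the entry to the current user's crontab without opening an editor:
#   (crontab -l 2>/dev/null; cron_line 1 30) | crontab -
```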
Next, create a new user in the TYPO3 Backend. The user name must be _cli_mnogosearch. The password and permissions do not matter. TYPO3 requires this user for CLI scripts. Nobody will be able to log in to the Backend with this account.
Installing TYPO3 extension¶
When installing the extension, several options must be specified. The most important one is the mnoGoSearch database. When specifying the database, make sure that there is a slash before the question mark in the database URL. Omitting the slash will cause connection problems with the mnoGoSearch database during indexing.
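The slash rule can be checked mechanically before saving the configuration. The sketch below is an illustration under assumptions: check_db_url is a hypothetical helper, and the example URL in the comment follows the general mnoGoSearch database address form with the credentials from the MySQL session earlier in this manual; it is not taken from the extension itself.

```shell
#!/bin/sh
# Sketch: verify that the first '?' in a database URL is preceded by '/'.
# check_db_url is a hypothetical helper, not part of the extension.
check_db_url() {
    url="$1"
    before="${url%%\?*}"        # everything before the first '?'
    if [ "$before" = "$url" ]; then
        echo "no '?' in URL, nothing to check"
        return 0
    fi
    case "$before" in
        */) echo "ok" ;;
        *)  echo "missing '/' before '?'"
            return 1 ;;
    esac
}

# Example (assumed URL shape):
#   check_db_url 'mysql://jane:janescomplexpassword@localhost/mnogosearch/?dbmode=blob'
```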
Configuring Frontend plugin using TypoScript¶
Most of the extension options are configured through TypoScript. The extension ships with sensible defaults, so the settings rarely need to be changed. A full list of settings is shown in the “Configuration” section of this manual.
Using Google Analytics to track your searches¶
Google Analytics can track what users search for on a web site. The following screenshot shows how to configure Google Analytics to track mnoGoSearch searches:
If Google Analytics support is required, search forms must be submitted as “get” requests.
FAQ¶
TYPO3SEARCH_xxx comments are not respected. What is wrong?¶
Here is the checklist:
- make sure that there are no spaces around the TYPO3SEARCH_xxx markers. They must look exactly like <!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end-->
- check that the comments are in the right order and not nested. The first comment must be TYPO3SEARCH_begin, followed by TYPO3SEARCH_end. Two consecutive TYPO3SEARCH_begin or TYPO3SEARCH_end comments will break the indexing
- if the nc_staticfilecache extension is installed, check its version. Versions earlier than 2.3.2 conflict with the mnogosearch extension. Read the bug report at http://forge.typo3.org/issues/show/2291 and follow the instructions there
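The first two checks in the list can be automated for a saved copy of a page. The following is a minimal sketch, assuming grep supports the -o option (GNU and BSD grep do); check_markers is a hypothetical helper and not part of the extension. It only sees comments that contain the correctly spelled text TYPO3SEARCH_, so a misspelled marker name goes unnoticed.

```shell
#!/bin/sh
# Sketch: check TYPO3SEARCH markers in one HTML file.
# Reports markers with extra spaces, and begin/end markers that are
# nested, repeated or unbalanced. check_markers is a hypothetical helper.
check_markers() {
    # Print every HTML comment that mentions TYPO3SEARCH_ on its own line
    grep -o -e '<!--[^>]*TYPO3SEARCH_[a-z]*[^>]*-->' "$1" | awk '
        $0 != "<!--TYPO3SEARCH_begin-->" && $0 != "<!--TYPO3SEARCH_end-->" {
            print "malformed marker: " $0; bad = 1; next
        }
        /_begin/ { if (open)  { print "nested or repeated begin"; bad = 1 }; open = 1 }
        /_end/   { if (!open) { print "end without begin"; bad = 1 }; open = 0 }
        END      { if (open)  { print "begin without end"; bad = 1 }; exit bad + 0 }
    '
}

# Usage: save the rendered page to a file, then run
#   check_markers page.html && echo "markers look fine"
```

The function returns a non-zero status when it finds a problem, so it can be used in a shell condition as shown in the usage comment.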
I experimented and messed up my index. How do I clear it?¶
Run the command line tool like this:
php typo3/cli_dispatch.phpsh mnogosearch -x -Cw
Answer YES (in capital letters!) to the prompt.
I removed a page. How do I remove it from index?¶
The extension does this automatically; no manual action is needed. The same happens when a page is renamed: the old address is deleted from the index and the new address is added.
I receive an error “Got error 139 from the database engine” while indexing¶
This may happen if you use the database mode blob and InnoDB is your default database engine. To fix the problem, open the mysql command line tool for the search database and run this command:
alter table bdicti engine=myisam;
This converts the corresponding table to the MyISAM format. Next, restart the indexer.
There seems to be a clone of mnoGoSearch called DataParkSearch. What is it?¶
This is a fork of the mnoGoSearch project. It focuses on documentation and code quality improvements. It lacks a PHP extension, so it cannot be used as a replacement for mnoGoSearch in TYPO3. The mnoGoSearch PHP extension is incompatible with DataParkSearch.
What does “mnoGoSearch” mean?¶
It is a play on words. The original developers are from Russia. “Mnogo” in Russian means “much”, “many” or “a lot”.
Configuration¶
This section contains a description of TypoScript configuration for the mnoGoSearch TYPO3 extension.
TypoScript reference¶
The table below lists top–level TypoScript properties.
Property | Data type | Description | Default |
---|---|---|---|
templateFile | String | Template file to use | EXT:mnogosearch/resources/template.html |
mode | String | What to show in the plugin. Supported modes are: short_form, long_form, search | long_form,results |
form | ->FORM | Search form options | |
search | ->SEARCH | Search results options | |
->FORM¶
The table below lists TypoScript properties for search forms.
Property | Data type | Description | Default |
---|---|---|---|
resultsPage | Integer | ID of the page where search results are shown. This is mainly used by the short search form when it is located on each page of the site. | |
advanced | ->ADVANCED | Advanced search form configuration | |
-> ADVANCED¶
The table below lists TypoScript properties for the advanced search form.
Property | Data type | Description | Default |
---|---|---|---|
siteSelector | string | How to render the search limit selector. Valid values are: disabled, select, radio, checkboxes | |
siteSelector. | ->SELECTOR | Selector configuration | |
-> SELECTOR¶
The table below lists TypoScript properties for the search limit selector.
Property | Data type | Description | Default |
---|---|---|---|
searchAll | boolean | Enables the “Everywhere” item in the search selector. This item searches all indexing configurations for this plugin instance. | |
exclude | string | Comma-separated list of indexing configuration id values to exclude from the form only | |
default | mixed | Id of the indexing configuration to select by default | |
->SEARCH¶
The table below lists TypoScript properties for search results view.
Property | Data type | Description |
---|---|---|
excerptSize | Integer | Maximum size of the excerpt to show in search results |
excerptPadding | Integer | Maximum number of words to show before and after found terms |
excerptHighlight | stdWrap | Wraps found search terms in the excerpt |
extendedConfiguration | Array | This is an array of mnoGoSearch options (as defined by the search engine manual). Generally there is no need to use these options unless you know what you are doing. The key is the option name and the value is the option value. |
minimumWordLength | Integer | Minimum word length |
maximumWordLength | Integer | Maximum word length |
numberOfSections | Integer | Number of sections. This must correspond to the number of sections used during indexing. |
options | ->OPTIONS | Search options. Do not change these unless you know what you are doing. |
pageBrowser | | Configuration of the “pagebrowse” plugin |
resultsPerPagenumber_stdWrap | stdWrap | All numbers in the summary will be wrapped with this stdWrap. |
resultTime_stdWrap | stdWrap | Formats the “Last-modified” date for pages and files. The default configuration produces an empty string if the last modification date is empty. |
siteList | String | Comma-separated list of indexing configuration uid values for indexing records. Only paths that correspond to these indexing configurations will be searched. If empty, there is no limit. |
sortMode | String | RPD (relevancy, page rank, date) or DRP (date, relevancy, page rank) |
time_format | String | Format for the search time |
weightFactor | String | A hexadecimal string that describes weight factors for different document sections. The default value puts priority on the title and then on the body text. Details on this option can be found in the mnoGoSearch documentation (see “wf” there). |
Command line tool parameters¶
This section shows the command line tool parameters. The same text is printed by running typo3/cli_dispatch.phpsh mnogosearch help.
-c Only check and create database if necessary. Do not reindex.
-d Display generated indexer configuration and exit.
-n Force reindexing of new URLs (normally should be set)
-p pid Process indexing configuration only from this pid
-w Create statistic for misspelled words. Useful only if Ispell dictionaries are included to mnoGoSearch configuration (see mnoGoSearch documentation)
--dry-run Show what will be done (not applicable to -d and -E)
-h, --help, -? Display this help message
-x Pass the argument to mnoGoSearch indexer
-v level Be verbose. Level is 0-5. Default is 0 (complete silence)
Tutorial¶
This section provides step by step instructions to get mnoGoSearch up and running.
The following steps should be done to use mnoGoSearch for searching the web site. Refer to the corresponding sections of this manual for more information.
- Compile and install the search engine and the PHP extension
- Create the database for the index
- Create indexing configuration records
- Run the mnoGoSearch command line tool manually with the -v 3 -w options to ensure that it indexes the web site
- Install the cron job for regular reindexing
- Add a search results page and configure the Frontend plugin on it
- Add a search box to the pages of your site
Known problems¶
None.
To-Do list¶
Extend search form with options to limit search to certain indexing configurations