index_phash

This table contains references to TYPO3 pages or external documents. Its fields are:

phash

Field

phash

Description

7md5/int hash: an integer based on a 7-character md5 hash.

This is a unique representation of the 'page' indexed.

For TYPO3 pages this is a serialization of id, type, gr_list (see later), MP and cHashParams (which enables 'subcaching' with extra parameters). The same concept is used for TYPO3 caching, although the caching hash also includes the all-array and thus takes the template into account, which this hash does not. It is expected that template changes through conditions would not seriously alter the page content.

For external media this is a serialization of 1) the unique filename id and 2) any subpage indication (parallel to cHashParams). gr_list is NOT taken into consideration here!
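As an illustration of the 7md5/int idea, the sketch below hashes a serialized set of values with md5, keeps 7 hex characters and reads them as an integer. It is a Python sketch under stated assumptions, not the extension's code: the function names `md5_int_hash` and `page_phash`, the use of the first 7 characters, the JSON serialization (a stand-in for whatever serialization the indexer actually uses) and the sample values are all hypothetical.

```python
import hashlib
import json


def md5_int_hash(text: str) -> int:
    """Integer built from 7 hex characters of an md5 hash (assumed: the first 7)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest()[:7], 16)


def page_phash(page_id, page_type, gr_list, mp, chash_params) -> int:
    """Hypothetical phash for a TYPO3 page: hash of the serialized identifying values."""
    # JSON with sorted keys is a stand-in serialization, chosen so that the
    # same values always produce the same string.
    serialized = json.dumps(
        {
            "id": page_id,
            "type": page_type,
            "gr_list": gr_list,
            "MP": mp,
            "cHashParams": chash_params,
        },
        sort_keys=True,
    )
    return md5_int_hash(serialized)


# Same page, indexed for two different group lists (sample values only).
anonymous = page_phash(42, 0, "0,-1", "", {"item": "7"})
logged_in = page_phash(42, 0, "0,-2,1", "", {"item": "7"})
print(anonymous, logged_in)
```

Because 7 hex characters hold at most 28 bits, the result always fits in a signed 32-bit integer column.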

phash_grouping

Field

phash_grouping

Description

7md5/int hash.

This is a non-unique hash built exactly like phash, but WITHOUT the gr_list and, for external media, also without the subpage indication. This field therefore identifies a 'unique' page (or file), even though that page may be indexed two or more times due to gr_list. Use this field to GROUP BY in the search so you get only one hit per page when selecting with gr_list in mind.

Currently a search result neither groups nor limits by this field; instead, the result display may group the results into logical units.
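The grouping idea can be sketched with an in-memory SQLite table. This is only an illustration: the table here holds just four of the columns described in this section, the rows and hash values are made up, and the query is not taken from the extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE index_phash ("
    "phash INTEGER PRIMARY KEY, phash_grouping INTEGER, "
    "gr_list TEXT, item_title TEXT)"
)
rows = [
    (101, 9001, "0,-1", "Products"),    # page indexed for one group list
    (102, 9001, "0,-2,1", "Products"),  # same page indexed for another group list
    (103, 9002, "0,-1", "Contact"),
]
conn.executemany("INSERT INTO index_phash VALUES (?, ?, ?, ?)", rows)

# One hit per page: rows that differ only by gr_list share a phash_grouping.
hits = conn.execute(
    "SELECT phash_grouping, MIN(item_title), COUNT(*) "
    "FROM index_phash GROUP BY phash_grouping ORDER BY phash_grouping"
).fetchall()
print(hits)  # [(9001, 'Products', 2), (9002, 'Contact', 1)]
```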

item_mtime

Field

item_mtime

Description

Modification time:

For TYPO3 pages: The SYS_LASTCHANGED value.

For external media: The filemtime() value.

Depending on configuration, the file/page is not indexed again if its mtime has not changed compared to this value.

tstamp

Field

tstamp

Description

Timestamp of the indexing operation. You can configure min/max ages, which are checked against this timestamp.

A min-age defines how long a page must have been indexed before it is considered for indexing again.

A max-age defines an absolute point at which re-indexing will occur (unless the content has not changed according to an md5 hash).
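How tstamp, the min/max ages and item_mtime might interact can be sketched as a small decision function. The function name `needs_reindex`, its parameters, the convention that an age of 0 means "not configured", and the order of the checks are all assumptions made for illustration; the actual indexer may decide differently. The md5 content comparison mentioned for max-age would be a later step and is not modelled here.

```python
def needs_reindex(now, tstamp, item_mtime, current_mtime, min_age=0, max_age=0):
    """Return True if an already indexed item should be considered for re-indexing.

    now, tstamp, item_mtime and current_mtime are Unix timestamps;
    min_age and max_age are durations in seconds (0 = not configured).
    """
    # Max-age reached: re-index regardless of the modification time.
    if max_age and now > tstamp + max_age:
        return True
    # Modification time unchanged: nothing to do.
    if current_mtime == item_mtime:
        return False
    # Modification time changed, but the entry is only reconsidered
    # once it is older than the min-age.
    return now > tstamp + min_age


print(needs_reindex(1000, 0, 5, 5, max_age=500))    # True: max-age exceeded
print(needs_reindex(1000, 900, 5, 5))               # False: mtime unchanged
print(needs_reindex(1000, 900, 5, 6, min_age=200))  # False: younger than min-age
print(needs_reindex(1000, 900, 5, 6, min_age=50))   # True: changed and old enough
```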

cHashParams

Field

cHashParams

Description

The cHashParams.

For TYPO3 pages: These are used to regenerate the actual URL of the TYPO3 page in question.

For files: An empty array (not used).

item_type

Field

item_type

Description

An integer indicating the content type:

0 is TYPO3 pages.

1 and up are external files such as html (1), pdf (2), doc (3), txt (4) and so on. See the class.indexer.php file.

item_title

Field

item_title

Description

Title:

For TYPO3 pages, the page title

For files, the basename of the file (no path)

item_description

Field

item_description

Description

Short description of the item, taken from the top information on the page. Used in the search result.

data_page_id

Field

data_page_id

Description

For TYPO3 pages: The id

data_page_type

Field

data_page_type

Description

For TYPO3 pages: The type

data_filename

Field

data_filename

Description

For external files: The filepath (relative) or URL (not used yet)

contentHash

Field

contentHash

Description

md5 hash of the indexed content. Before re-indexing, this is compared with the hash of the content to be indexed; if they match, there is no need to re-index.
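The comparison can be sketched as follows. The helper names `content_hash` and `content_changed` and the sample strings are hypothetical; which text is hashed, and how it is normalized first, is decided by the indexer and not shown here.

```python
import hashlib


def content_hash(content: str) -> str:
    """md5 hex digest of the content to be indexed."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def content_changed(stored_hash: str, new_content: str) -> bool:
    """True if the new content differs from what was indexed before."""
    return content_hash(new_content) != stored_hash


stored = content_hash("Hello world")
print(content_changed(stored, "Hello world"))   # False: skip re-indexing
print(content_changed(stored, "Hello world!"))  # True: re-index
```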

crdate

Field

crdate

Description

The creation date of the index entry, NOT of the page/file (see item_crdate).

parsetime

Field

parsetime

Description

The parsetime of the indexing operation.

sys_language_uid

Field

sys_language_uid

Description

Contains the value of $GLOBALS["TSFE"]->sys_language_uid, which indicates the language of the indexed page.

item_crdate

Field

item_crdate

Description

The creation date of the item. For files, only the modification date can be read, so the filemtime() value is used here.

gr_list

Field

gr_list

Description

Contains the gr_list of the user initiating the indexing of the document.