Configuration records

Configuration used to be done with pageTS (see below). This is still possible and fully backwards compatible, but no longer recommended. Instead of writing pageTS, create a configuration record (table: tx_crawler_configuration) and place it on the topmost page of the page tree that the configuration should affect.

The fields in these records correspond to the pageTS keys described below.

Fields and their pageTS equivalents

General

Backend configuration record: General

Name
Corresponds to the “key” part in the pageTS setup, e.g. tx_crawler.crawlerCfg.paramSets.myConfigurationKeyName
Protocol for crawling
Force HTTP, HTTPS or keep the configured protocol
Processing instruction filter
List of processing instructions. See also: paramSets.[key].procInstrFilter
Base URL
Set baseUrl (most likely the same as the entry point configured in your site configuration)
Pids only
List of page IDs to which this configuration is limited. See also: paramSets.[key].pidsOnly
Exclude pages
Comma-separated list of page IDs that should not be crawled
Configuration
Parameter configuration. The values of the GET variables follow a special syntax. See also: paramSets.[key]
Processing instruction parameters
Options for processing instructions. These are defined by the respective third-party extensions. See also: paramSets.[key].procInstrParams
Crawl with FE user groups
User groups to set for the request. See also: paramSets.[key].userGroups and the hint in create-crawler-configuration
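
To illustrate how the record fields relate to the pageTS keys, the following is a minimal sketch of what an equivalent pageTS setup might look like. It is not a definitive configuration: all values are placeholders chosen for illustration. Assumptions: the key name myConfigurationKeyName is taken from the example above, the range expression &L=[0-1] is one possible form of the special parameter syntax, tx_indexedsearch_reindex is used as an example processing instruction, and the URL, page IDs and user group are invented.

```
# Sketch only: every value below is an illustrative placeholder.
# "Name" and "Configuration" fields: the key and its parameter configuration
tx_crawler.crawlerCfg.paramSets.myConfigurationKeyName = &L=[0-1]
tx_crawler.crawlerCfg.paramSets.myConfigurationKeyName {
    # "Processing instruction filter" field
    procInstrFilter = tx_indexedsearch_reindex
    # "Base URL" field (placeholder domain)
    baseUrl = https://www.example.com/
    # "Pids only" field (placeholder page IDs)
    pidsOnly = 1,5,13
    # "Crawl with FE user groups" field (placeholder group ID)
    userGroups = 1
}
```

With a configuration record, the same settings are entered in the corresponding fields instead, and the record is placed on the topmost page of the affected page tree.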

Access

Backend configuration record: Access

Hide
If activated, the configuration record is ignored.
Restrict access to
Restricts access to this configuration record to the selected backend user groups. If left empty, no restriction applies.