Formerly, configuration was done using pageTS (see below). This is
still possible (fully backwards compatible) but not recommended.
Instead of writing pageTS, simply create a configuration record (table:
tx_crawler_configuration) and put it on the topmost page of the
page tree you want this configuration to affect.
The fields in these records are related to the pageTS keys described below.
Fields and their pageTS equivalents
- Name
Corresponds to the "key" part in the pageTS setup, that is, the [key] in paramSets.[key].
- Protocol for crawling
Force HTTP, HTTPS or keep the configured protocol
- Processing instruction filter
List of processing instructions. See also: paramSets.[key].procInstrFilter
- Base URL
Set baseUrl (most likely the same as the entry point configured in your site configuration)
- Pids only
List of Page Ids to limit this configuration to. See also: paramSets.[key].pidsOnly
- Exclude pages
Comma-separated list of page ids which should not be crawled. You can exclude pages recursively by adding
uid+depth, e.g. 6+3, which ensures that page 6 and all pages up to 3 levels below it are not crawled.
- Configuration
Parameter configuration. The values of GET variables follow a special syntax. See also: paramSets.[key]
- Processing instruction parameters
Options for processing instructions. Will be defined in the respective third party modules. See also: paramSets.[key].procInstrParams
- Crawl with FE user groups
User groups to set for the request. See also: paramSets.[key].userGroups and the hint in create-crawler-configuration
- Hide
If activated, the configuration record is not taken into account.
- Restrict access to
Restricts access to this configuration record to selected backend user groups. Empty means no restriction is set.
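
To illustrate the uid+depth notation of the "Exclude pages" field, the following minimal Python sketch splits such a value into (page uid, depth) pairs. It is not the extension's own code: the function name parse_exclude_pages is hypothetical, and treating an entry without "+" as depth 0 (only that page) is an assumption made for this example.

```python
from typing import List, Tuple


def parse_exclude_pages(value: str) -> List[Tuple[int, int]]:
    """Parse an "Exclude pages" string into (page uid, depth) pairs.

    Hypothetical helper for illustration only.
    "6+3" is read as page 6 plus 3 levels below it.
    "12" is read as page 12 only (depth 0, an assumption of this sketch).
    """
    entries = []
    for raw in value.split(","):
        item = raw.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        # partition() returns (before, separator, after); the separator is
        # an empty string when the entry contains no "+"
        uid_text, plus, depth_text = item.partition("+")
        depth = int(depth_text) if plus else 0
        entries.append((int(uid_text), depth))
    return entries


print(parse_exclude_pages("6+3, 12"))  # prints [(6, 3), (12, 0)]
```

With the value 6+3, 12 the sketch yields page 6 with a depth of 3 and page 12 with a depth of 0. Malformed entries (for example a non-numeric uid) raise a ValueError instead of being skipped silently.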