Formerly, configuration was done using pageTS (see below). This is
still possible (fully backwards compatible) but no longer recommended.
Instead of writing pageTS, create a configuration record (table:
tx_crawler_configuration) and place it on the topmost page of the
page tree that this configuration should apply to.
The fields of these records correspond to the pageTS keys described below.
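The following minimal Python sketch illustrates what "placing a record on the topmost page" means. It assumes that a record applies to the page it is stored on and to every page beneath it, so the records relevant to a page are found by walking that page's rootline up to the root. The page tree, `CONFIG_RECORDS`, and both functions are hypothetical stand-ins. This is not the crawler's actual lookup code.

```python
from typing import Dict, List

# Hypothetical page tree: page uid -> parent uid (0 means "no parent").
PARENTS: Dict[int, int] = {1: 0, 2: 1, 3: 2, 4: 1, 5: 0}

# Hypothetical placement of configuration records: page uid -> record names.
CONFIG_RECORDS: Dict[int, List[str]] = {1: ["default"], 3: ["news"]}


def rootline(uid: int, parents: Dict[int, int]) -> List[int]:
    """Return the page itself followed by its ancestors, nearest first."""
    line: List[int] = []
    current = uid
    while current:  # stops once the walk reaches 0 (above the root page)
        line.append(current)
        current = parents.get(current, 0)
    return line


def configurations_for(uid: int) -> List[str]:
    """Collect every configuration record stored on the page or any ancestor."""
    found: List[str] = []
    for page in rootline(uid, PARENTS):
        found.extend(CONFIG_RECORDS.get(page, []))
    return found


print(configurations_for(3))  # records from page 3 first, then from page 1
```

With this tree, page 3 picks up both "news" and "default". Page 5 lies in a separate branch and gets no configuration at all.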
Fields and their pageTS equivalents
- Corresponds to the “key” part in the pageTS setup e.g.
- Protocol for crawling: force HTTP, force HTTPS, or keep the configured protocol
- Processing instruction filter: list of processing instructions. See also: paramSets.[key].procInstrFilter
- Base URL: the base URL used for crawling (most likely the same as the entry point configured in your site configuration)
- Pids only: list of page IDs to limit this configuration to. See also: paramSets.[key].pidsOnly
- Exclude pages: comma-separated list of page IDs that should not be crawled. To exclude pages recursively, append `+depth` to a page ID. For example, `6+3` excludes page 6 and all pages up to 3 levels below it.
- Parameter configuration: the values of GET variables follow a special syntax. See also: paramSets.[key]
- Processing instruction parameters: options for the processing instructions, defined by the respective third-party modules. See also: paramSets.[key].procInstrParams
- Crawl with FE user groups: frontend user groups to set for the request. See also: paramSets.[key].userGroups and the hint in create-crawler-configuration
- If activated, the configuration record is not taken into account.
- Restrict access to: restricts access to this configuration record to the selected backend user groups. If empty, no restriction is applied.
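The recursive `uid+depth` form of the "Exclude pages" field can be modeled in a few lines of Python. This is a minimal sketch of the semantics described above and not the crawler's implementation. The in-memory page tree and the helper names `parse_exclude` and `excluded_pages` are hypothetical. Tolerance of spaces and trailing commas is an assumption.

```python
from typing import Dict, List, Set, Tuple


def parse_exclude(spec: str) -> List[Tuple[int, int]]:
    """Split a value such as "6+3, 20" into (uid, depth) pairs; no "+" means depth 0."""
    entries: List[Tuple[int, int]] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty items, e.g. a trailing comma (an assumption)
        uid, _, depth = part.partition("+")
        entries.append((int(uid), int(depth) if depth else 0))
    return entries


def excluded_pages(spec: str, children: Dict[int, List[int]]) -> Set[int]:
    """Return each listed page plus its descendants down to the given depth."""
    excluded: Set[int] = set()
    for uid, depth in parse_exclude(spec):
        level = [uid]
        for _ in range(depth + 1):  # level 0 is the page itself
            excluded.update(level)
            level = [child for page in level for child in children.get(page, [])]
    return excluded


# Hypothetical page tree: parent uid -> child uids.
TREE: Dict[int, List[int]] = {6: [7, 8], 7: [9], 9: [10], 10: [11]}
print(sorted(excluded_pages("6+3, 20", TREE)))  # prints [6, 7, 8, 9, 10, 20]
```

Page 11 sits four levels below page 6, so `6+3` leaves it crawlable. An entry without `+` excludes only that single page.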
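The "special syntax" for GET parameter values is not spelled out in this section. The Python sketch below assumes the bracket notation familiar from crawler pageTS examples: `[a|b]` for a list of values and `[x-y]` for an inclusive integer range. Each parameter set expands into every combination of its values. Further forms, such as table lookups, are not modeled. The function names are hypothetical, and this illustrates the idea rather than reproducing the extension's parser.

```python
import itertools
import re
from typing import List


def expand_value(value: str) -> List[str]:
    """Expand "[1|2]" or "[0-3]" into a list of values; other values pass through."""
    match = re.fullmatch(r"\[(.*)\]", value)
    if not match:
        return [value]
    values: List[str] = []
    for item in match.group(1).split("|"):
        rng = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", item)
        if rng:  # inclusive integer range such as 0-3
            start, end = int(rng.group(1)), int(rng.group(2))
            values.extend(str(n) for n in range(start, end + 1))
        else:
            values.append(item.strip())
    return values


def expand_param_set(spec: str) -> List[str]:
    """Turn "&L=[0-1]&x=[5|7]" into one query string per value combination."""
    names: List[str] = []
    choices: List[List[str]] = []
    for pair in spec.split("&"):
        if not pair:
            continue  # skip the empty item produced by a leading "&"
        name, _, value = pair.partition("=")
        names.append(name)
        choices.append(expand_value(value))
    return [
        "&".join(f"{n}={v}" for n, v in zip(names, combo))
        for combo in itertools.product(*choices)
    ]


print(expand_param_set("&L=[0-1]&tx_news[id]=[5|7]"))
```

Two values for `L` times two values for `tx_news[id]` yield four query strings. This is why broad ranges can make the crawler queue grow quickly.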