With crawler release 9.1.0 we changed the data stored in the crawler queue from serialized data to JSON. If you experience problems with old data still in your database, you can flush the complete crawler queue, which should solve the problem.
We have built in a JsonCompatibilityConverter to ensure that this does not happen, but in case it does, run:
$ vendor/bin/typo3 crawler:flushQueue all
If you are using direct requests (see Extension Manager Configuration) and they do not return any result, or the scheduler tasks stall,
the cause can be a misconfigured trustedHostsPattern. It can be
changed with the following setting:
$GLOBALS['TYPO3_CONF_VARS']['SYS']['trustedHostsPattern'] = '<your-pattern>';
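Before changing the setting, it can help to check whether a candidate pattern accepts the host name the crawler requests. The following is a minimal sketch, not part of TYPO3 or the crawler: host_matches_pattern is a hypothetical helper, the example pattern and host names are made up, and it assumes the pattern is anchored at both ends and matched case-insensitively against the full host name. TYPO3 evaluates the pattern as a PHP (PCRE) regular expression while grep -E uses POSIX extended syntax, so only simple patterns are guaranteed to behave the same.

```shell
#!/bin/sh
# Hypothetical helper, not part of TYPO3 or the crawler extension.
# Assumption: the pattern is anchored (^...$) and matched case-insensitively.
host_matches_pattern() {
    pattern=$1
    host=$2
    # grep -E: extended regex, -q: quiet, -i: ignore case
    printf '%s\n' "$host" | grep -Eqi "^(${pattern})\$"
}

# Example pattern and host are placeholders; use your own values.
if host_matches_pattern '(www\.)?example\.com' 'www.example.com'; then
    echo "host is trusted"
else
    echo "host is rejected"
fi
```

If the host used for direct requests is rejected by this check, the pattern is a likely reason for empty results.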
The crawler does not process all entries when run from the command line. This can happen because PHP runs into a timeout. To avoid this, call the crawler like this:
php -d max_execution_time=512 vendor/bin/typo3 crawler:buildQueue
If you experience that the crawler only adds one URL to the queue, you are probably on a new setup, or you updated from TYPO3 8 LTS and some migrations have not been executed yet.
Please check the Upgrade Wizard and verify that Introduce URL parts (“slugs”) to all existing pages is marked as done. If it is not, perform this step.
See related issue: [BUG] Crawling Depth not respected #464
If you update the extension from an older version, you can run into the following error:
SQL error: 'Field 'sys_domain_base_url' doesn't have a default value'
Make sure to delete all unnecessary fields from the database tables. You can do
this in the backend via the Analyze Database Structure tool or, if you
have TYPO3 Console
installed, via the command line.
In some cases you get an error if the PHP path is not set correctly. It occurs when you select the Site Crawler in the Info module.
In this case you have to set the path to your PHP binary in the extension configuration.
Make sure to enter the correct path to your PHP binary. The path in the screenshot might differ from yours.
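If you are unsure which path to enter, you can look it up on the command line. This is a minimal sketch: resolve_binary_path is a hypothetical helper, and it assumes the PHP binary you want is the one found first on the PATH of your shell user. On many hosts the web server uses a different PHP binary than the command line, so treat the result as a starting point.

```shell
#!/bin/sh
# Hypothetical helper: print the path of a binary found on PATH.
# Returns a non-zero status if the binary cannot be found.
resolve_binary_path() {
    command -v "$1"
}

php_path=$(resolve_binary_path php) || php_path=""
if [ -n "$php_path" ]; then
    echo "PHP binary: $php_path"
else
    echo "php was not found on PATH" >&2
fi
```

The printed path is the value to compare against the one stored in the extension configuration.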
We have had a bug in the Crawler for a while, which I had difficulty figuring out. The bug is caused by a problem with the CrawlerHook in the TYPO3 Core.
Because this hook is removed in TYPO3 11, I will not try to provide a fix, only a workaround.
The problem appears when the Crawler Configuration and the Indexed_Search Configuration are stored on the same page. The workaround is to move the Indexed_Search Configuration to a different page. I have not experienced any side effects from this change, but if you do, please report them to me.
This workaround is for these two bugs:
If you would like to know more about what is going on, you can look at the core:
Here an int value is submitted instead of a string. This change goes back more than 8 years, so I am surprised that it was never a problem before.