Troubleshooting¶
Problem reading data in Crawler Queue¶
With crawler release 9.1.0, we changed the data stored in the crawler queue from serialized data to JSON. If you are experiencing problems with old data still in your database, you can flush your complete crawler queue, which should solve the problem.
We have built in a JsonCompatibilityConverter to ensure that this does not happen, but in case it does, run:
$ vendor/bin/typo3 crawler:flushQueue all
Make Direct Request doesn't work¶
If you are using direct requests (see Extension Manager Configuration) and they give no result, or the scheduler task stalls, the cause may be a misconfigured trustedHostsPattern. This can be changed in LocalConfiguration.php:
$GLOBALS['TYPO3_CONF_VARS']['SYS']['trustedHostsPattern'] = '<your-pattern>';
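To sanity-check a candidate pattern before saving it, the host check can be approximated on the command line. This is only a rough sketch: it assumes TYPO3 anchors the pattern at both ends and matches case-insensitively, and it uses grep -E (POSIX extended regular expressions) instead of PHP's PCRE, so patterns relying on PCRE-only syntax will behave differently. The function name host_matches_pattern and the example.org pattern are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when host $1 matches pattern $2.
# The pattern is anchored with ^ and $ and matched case-insensitively,
# which is an assumption about how TYPO3 applies trustedHostsPattern.
host_matches_pattern() {
    printf '%s\n' "$1" | grep -Eiq "^$2\$"
}

# Assumed example pattern; replace it with your own.
pattern='(www\.)?example\.org'

for host in www.example.org example.org example.org.evil.test; do
    if host_matches_pattern "$host" "$pattern"; then
        echo "$host: would be trusted"
    else
        echo "$host: would be rejected"
    fi
done
```

If the hostname the crawler requests is reported as rejected here, the pattern is a likely reason for direct requests returning nothing.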
Crawler won't process all entries from command line¶
The crawler won't process all entries when run from the command line. This might happen because PHP runs into a timeout. To avoid this, you can call the crawler like this:
php -d max_execution_time=512 vendor/bin/typo3 crawler:buildQueue
Crawler Count is 0 (zero)¶
If you find that the crawler queue only adds one URL to the queue, you are probably on a new setup, or, after an update from TYPO3 8 LTS, some migrations may not have been executed yet.
Please check the Upgrade Wizard and verify that the Introduce URL parts ("slugs") to all existing pages wizard is marked as done. If not, you should perform this step.
See related issue: [BUG] Crawling Depth not respected #464
Update from older versions¶
If you update the extension from older versions, you can run into the following error:
SQL error: 'Field 'sys_domain_base_url' doesn't have a default value'
Make sure to delete all unnecessary fields from the database tables. You can do this in the backend via the Analyze Database Structure tool or, if you have TYPO3 Console installed, via the command line:
vendor/bin/typo3cms database:updateschema
TYPO3 shows error if the PHP path is not correct¶
In some cases you get an error if the PHP path is not set correctly. It occurs when you select the Site Crawler in the Info module.

Error message in Info-module¶
In this case you have to set the path to your PHP in the Extension configuration.

Correct PHP path settings in Extension configuration¶
Please be sure to add the correct path to your PHP. The path in this screenshot might be different to your PHP path.
Info Module throws htmlspecialchars() expects parameter 1 to be string¶
We have had a bug in the Crawler for a while, which I had difficulties figuring out. The bug is caused by a problem with the CrawlerHook in the TYPO3 Core. As this hook is removed in TYPO3 11, I will not try to provide a fix for this, but only a workaround.
Workaround¶
The problem appears when the Crawler Configuration and the Indexed_Search Configuration are stored on the same page. The workaround is to move the Indexed_Search Configuration to a different page. I have not experienced any side effects from this change, but if you do, please report them to me.
This workaround is for these two bugs:
https://github.com/tomasnorre/crawler/issues/576 and https://github.com/tomasnorre/crawler/issues/739
If you would like to know more about what's going on, you can look at the core:
Here an int value is submitted instead of a string. This change goes back more than 8 years, so I am surprised that it was never a problem before.
Crawler Log shows "-" as result¶
Since Crawler v11.0.0, which introduced PHP 8.0 compatibility, we are affected by a bug in PHP itself: https://bugs.php.net/bug.php?id=81320. This bug turns the Crawler status into invalid JSON, so the correct result cannot be rendered. The result is displayed in the Crawler Log as -.
Even though the page is crawled correctly, the status is incorrect, which is of course not desired.
Workaround¶
One solution can be to remove the php8.0-uploadprogress package from your server. If its version is below 1.1.4, it will trigger the problem. Removing the package can of course be a problem if you depend on it.
If possible, it is better to update it to 1.1.4 or higher, which should solve the problem as well.
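To find out whether an installation is affected, the loaded extension version can be compared against 1.1.4. The following is a minimal sketch, assuming the php binary on the command line loads the same extensions as the PHP used for crawling (which is not always the case) and that sort supports the -V option (GNU coreutils). The helper name version_lt is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is strictly lower than $2.
# Relies on "sort -V" (version sort), available in GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

if command -v php >/dev/null 2>&1; then
    # phpversion() with an extension name returns its version, or false
    # (echoed as an empty string) when the extension is not loaded.
    installed="$(php -r 'echo phpversion("uploadprogress");')"
    if [ -z "$installed" ]; then
        echo "uploadprogress is not loaded in this PHP"
    elif version_lt "$installed" "1.1.4"; then
        echo "uploadprogress $installed is below 1.1.4 and may trigger the problem"
    else
        echo "uploadprogress $installed should not trigger the problem"
    fi
fi
```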
Site config baseVariants not used¶
An issue was reported that the Site Config baseVariants were not respected by the Crawler: https://github.com/tomasnorre/crawler/issues/851. It turned out that the crawler had problems with ApplicationContexts set in .htaccess, as in the following example:
<IfModule mod_rewrite.c>
# Rules to set ApplicationContext based on hostname
RewriteCond %{HTTP_HOST} ^(.*)\.my\-site\.localhost$
RewriteRule .? - [E=TYPO3_CONTEXT:Development]
RewriteCond %{HTTP_HOST} ^(.*)\.mysite\.info$
RewriteRule .? - [E=TYPO3_CONTEXT:Production/Staging]
RewriteCond %{HTTP_HOST} ^(.*)\.my\-site\.info$
RewriteRule .? - [E=TYPO3_CONTEXT:Production]
</IfModule>
Workaround¶
This problem isn't solved, but it can be bypassed by using helhum/dotenv-connector: https://github.com/helhum/dotenv-connector
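As a rough illustration of the bypass, the application context can be defined in a .env file instead of .htaccess, so that it no longer depends on Apache rewrite rules. The sketch below assumes the package was installed with composer require helhum/dotenv-connector and that the .env file lives in the project root next to composer.json (the package's default location, as far as we know). The context value is only an example.

```shell
# .env in the project root (assumed location)
# Example value; use the context that matches this installation.
TYPO3_CONTEXT=Production/Staging
```

Because the file holds a single value per installation, each environment (development, staging, production) needs its own .env file rather than hostname-based rules.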