Troubleshooting
Problem reading data in Crawler Queue
With crawler release 9.1.0, the data stored in the crawler queue changed from serialized data to JSON. If you are experiencing problems with old data still in your database, you can flush the complete crawler queue, which should solve the problem.
We have built in a Json
to ensure that this does not
happen, but if it does, run:
$ vendor/bin/typo3 crawler:flushQueue all
Make Direct Request doesn't work
If you are using direct requests (see Extension Manager Configuration) and they do not return any result, or the scheduler task stalls,
the cause can be a misconfigured trusted hosts pattern (trustedHostsPattern); this can be
changed in the Local
.
$GLOBALS['TYPO3_CONF_VARS']['SYS']['trustedHostsPattern'] = '<your-pattern>';
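To sanity-check a candidate pattern before saving it, a rough shell sketch can help. It assumes that the whole host name is matched against the pattern case-insensitively, and it uses grep's extended regular expressions as a stand-in for PHP's PCRE, so only simple patterns behave identically. The function name, the pattern and the example hosts are illustrative and not part of TYPO3 or the crawler.

```shell
# host_matches_pattern PATTERN HOST
# Succeeds if HOST matches PATTERN as a whole string (case-insensitive).
# Assumption: the trusted hosts check anchors the pattern at both ends.
host_matches_pattern() {
    printf '%s\n' "$2" | grep -Eiq "^(${1})\$"
}

pattern='(www\.)?example\.org'   # hypothetical pattern
for host in example.org www.example.org evil.example.com; do
    if host_matches_pattern "$pattern" "$host"; then
        echo "$host: trusted"
    else
        echo "$host: rejected"
    fi
done
```

If the host name used for your direct requests is reported as rejected here, the pattern is a likely cause of the missing results.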
Crawler won't process all entries from command line
The crawler may not process all entries when it is run from the command line. This can happen because PHP runs into a timeout. To avoid this, you can call the crawler like this:
php -d max_execution_time=512 vendor/bin/typo3 crawler:buildQueue
Crawler Count is 0 (zero)
If the crawler only adds one URL to the queue, you are probably on a new setup or have updated from TYPO3 8 LTS, and some migrations may not have been executed yet.
Please check the Upgrade Wizard and verify that Introduce URL parts ("slugs") to all existing pages is marked as done. If it is not, perform this step.
See related issue: [BUG] Crawling Depth not respected #464
Update from older versions
If you update the extension from an older version, you can run into the following error:
SQL error: 'Field 'sys_domain_base_url' doesn't have a default value'
Make sure to delete all unnecessary fields from the database tables. You can do
this in the backend via the Analyze Database Structure tool or, if you
have TYPO3 Console
installed, via the command-line command
vendor/
.
TYPO3 shows error if the PHP path is not correct
In some cases you get an error if the PHP path is not set correctly. It occurs when you select the Site Crawler in the Info module.

Error message in Info-module
In this case you have to set the path to your PHP in the Extension configuration.

Correct PHP path settings in Extension configuration
Please be sure to add the correct path to your PHP. The path in this screenshot might be different to your PHP path.
Info Module throws htmlspecialchars() expects parameter 1 to be string
We have had a bug in the Crawler for a while that was difficult to track down. The bug is caused by a problem with the CrawlerHook in the TYPO3 Core.
As this hook is removed in TYPO3 11, I will not try to provide a fix for this, but only a workaround.
Workaround
The problem appears when the Crawler Configuration and the Indexed_Search Configuration are stored on the same page. The workaround is to move the Indexed_Search Configuration to a different page. I have not experienced any side effects from this change, but if you do, please report them to me.
This workaround is for these two bugs:
https://github.com/tomasnorre/crawler/issues/576 and https://github.com/tomasnorre/crawler/issues/739
If you would like to know more about what's going on, you can look at the core:
Here an int value is submitted instead of a string. This change goes back more than 8 years, so it is surprising that it was never a problem before.
Crawler Log shows "-" as result
Since Crawler v11.0.0, which introduced PHP 8.0 compatibility, the crawler is affected by a bug in PHP itself:
https://bugs.php.net/bug.php?id=81320. This bug turns the Crawler status into invalid JSON, so the correct result
cannot be rendered. The result is displayed in the Crawler Log as -.
Even though the page is crawled correctly, the status is incorrect, which is of course not desired.
Workaround
One solution can be to remove the php8.
package from your server. A version of this package below
1.1.4 triggers the problem. Removing the package can of course be a problem if you depend on it.
If possible, it is better to update it to 1.1.4 or higher, which should solve the problem as well.
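The comparison against 1.1.4 can be scripted. The sketch below only compares two version strings; how you obtain the installed version depends on your system and is left as a placeholder. It relies on `sort -V` (available in GNU coreutils), and the function name is illustrative.

```shell
# version_lt A B: succeeds if version A is strictly lower than version B.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

installed="1.1.3"   # placeholder: replace with the version on your server
if version_lt "$installed" "1.1.4"; then
    echo "version $installed is affected, update to 1.1.4 or higher"
else
    echo "version $installed should not trigger the problem"
fi
```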
Site config baseVariants not used
An issue was reported that the Site Config baseVariants were not respected by the Crawler:
https://github.com/tomasnorre/crawler/issues/851. It turned out that the crawler had problems with the ApplicationContext
being set in .htaccess,
as in the following example.
<IfModule mod_rewrite.c>
# Rules to set ApplicationContext based on hostname
RewriteCond %{HTTP_HOST} ^(.*)\.my\-site\.localhost$
RewriteRule .? - [E=TYPO3_CONTEXT:Development]
RewriteCond %{HTTP_HOST} ^(.*)\.mysite\.info$
RewriteRule .? - [E=TYPO3_CONTEXT:Production/Staging]
RewriteCond %{HTTP_HOST} ^(.*)\.my\-site\.info$
RewriteRule .? - [E=TYPO3_CONTEXT:Production]
</IfModule>
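To see which context a given host name would receive from rules like these, the matching can be mimicked in shell. This is only an illustration of the three conditions in the example, using glob patterns in place of the regular expressions. The function name and the sample hosts are made up, and nothing here is executed by Apache or TYPO3.

```shell
# context_for_host HOST: print the TYPO3_CONTEXT the example rules would set.
context_for_host() {
    case "$1" in
        *.my-site.localhost) echo "Development" ;;
        *.mysite.info)       echo "Production/Staging" ;;
        *.my-site.info)      echo "Production" ;;
        *)                   echo "(unset)" ;;   # no rule matches
    esac
}

for host in dev.my-site.localhost stage.mysite.info www.my-site.info my-site.info; do
    printf '%s -> %s\n' "$host" "$(context_for_host "$host")"
done
```

Note that a bare `my-site.info` matches none of the rules, because each condition requires a dot in front of the domain name.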
Workaround
This problem isn't solved, but it can be bypassed by using helhum/dotenv-connector:
https://github.com/helhum/dotenv-connector
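As a rough sketch of that bypass: dotenv-connector loads variables from a `.env` file, so the context can be defined there instead of in .htaccess, which should make it visible to command-line runs as well. The snippet assumes the default setup, where the file is named `.env` and sits next to `composer.json`. The helper name and the context value are only examples.

```shell
# set_typo3_context FILE CONTEXT
# Appends a TYPO3_CONTEXT line to FILE unless the file already defines one.
set_typo3_context() {
    if [ -f "$1" ] && grep -q '^TYPO3_CONTEXT=' "$1"; then
        echo "TYPO3_CONTEXT already set in $1" >&2
        return 1
    fi
    printf 'TYPO3_CONTEXT="%s"\n' "$2" >> "$1"
}
```

Called as `set_typo3_context .env "Production/Staging"` from the project root, this adds the line `TYPO3_CONTEXT="Production/Staging"` to the file and leaves an existing definition untouched.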