Introduction 

About 

Broken Link Fixer (brofix) is an extension which enables you to conveniently check your website for broken links. This manual explains how to install, configure and use the extension.

The extension was started using the core EXT:linkvalidator source code and documentation but is now an independent project.

What does it do? 

Broken Link Fixer checks the links in your website, reports broken links and provides a way to fix these problems.

It includes the following features:

  • Broken Link Fixer can check all kinds of links. This includes internal links to pages and content elements, file links to files in the local file system and external links to files elsewhere on the web.
  • Broken Link Fixer checks a number of fields by default, for example header fields and text fields of content elements. It can be configured to check any field you like (via TSconfig).
  • A console command can be set up to check links automatically. It can also generate a report which is sent via email.
  • Broken Link Fixer is extendable. It provides hooks to check special types of links or override how the checking of external, file and page links works.

Difference to linkvalidator 

Broken Link Fixer was forked from linkvalidator but then developed independently, which made it possible to make more significant changes:

  • improved user interface with better handling of list of broken links:

    • sort (by page, link type, link target, error type etc.)
    • paginate (if more than 100 broken links are displayed)
    • filter the broken link list, e.g. by URL
    • "clickfilter": filter by content element or URL by click
    • possible to recheck for a specific URL by clicking a button "Check link again" - all broken link records with this target will be updated if status changes
    • more descriptive (flash) messages to show what is going on
    • visible hints for "stale" broken link records (e.g. if content element was edited after last link check)
  • more visibility of broken links by showing number of broken links for the page in the page module (if EXT:page_callouts is installed)
  • better handling of external links

    • possibility to exclude links from being checked to avoid false positives
    • link target cache to avoid frequent rechecking of external links
    • crawl delay: automatically delay between checking links of one domain
  • link checking

    • the scheduler task was replaced by a console command
    • it is not necessary to specify the start pid; if no pid is given, the site configuration is used
    • configuration from Global Configuration is used, if not explicitly specified in link configuration (e.g. from email address)
    • the broken link records are not removed and created again, but updated. In linkvalidator, the entire list (for current check criteria) is removed at beginning of link check. This might result in duplicates and in broken links missing during link check.
    • content fields are not checked if they are not editable in the BE. This includes permission checks (which linkvalidator also handles), but also checks via FormEngine - for example tt_content.bodytext is not editable in the BE for plugins. If the type (CType) of a content element is changed, this can be a problem with linkvalidator.
    • broken link records are automatically removed via DataHandler hook if a record is deleted.

Credits 

This extension is based on the TYPO3 core extension EXT:linkvalidator. It was forked from the source code of linkvalidator. Thus, it is based on the work of the original authors and maintainers.

Glossary 

false positives 

These are URLs which were falsely detected as broken: the URLs are valid, but Broken Link Fixer reports them as broken.

on-the-fly checking 

"On-the-fly" checking means almost immediate link checking as soon as the record is saved. This is in contrast to periodic link checking via the console command.

Installation 

If you are using Composer, you can install it like any other extension.

composer require sypets/brofix

If you are not using Composer, you can install Broken Link Fixer (brofix) using the Extension Manager.

page_callouts 

It is recommended to install the extension page_callouts as well, as it will supply the hook / event to show information about broken links in the page module.

Page module with information about broken links and link to broken link list.

Installation of page_callouts

composer require sypets/page-callouts

Setup quickstart 

  1. Set up the minimal configuration

    Also see the Configuration Reference for more configuration options.

  2. Check mail sending

    If an email should be sent on every link check performed via the console command, it is a good idea to check whether email sending is set up correctly and works. (Sending a mail is optional.)

    Go to Environment > Test Mail Setup

  3. Set up the console command brofix:checkLinks

Minimal configuration 

Page TSconfig 

# email recipients
mod.brofix.mail.recipients = recipient@example.org

# Add contact information here, such as an email address or a URL which contains an email address
mod.brofix.linktypesConfig.external.headers.User-Agent =  Mozilla/5.0 (compatible; Site link checker; +https://gratesturff.com/imprint.html)

# pid of a page of type folder - this is where the exclude link target
# records are stored
mod.brofix.excludeLinkTarget.storagePid = 20

Commands 

If you use the scheduler, select the task "Execute console commands" first.

Configuration Reference 

Backend user configuration 

Give your backend users / user groups permission to the "Check Links" (web_brofix) module.

Give backend users / user groups permission to the table tx_brofix_exclude_link_target, if they should be able to add URLs to the list of URLs not to be checked. (This requires a certain amount of prudence and understanding, otherwise this feature may be misused.)

In this case, you must also set TSconfig excludeLinkTarget.storagePid to a page of type system folder. The editors must have access to this page (to be able to save records on this page).

Global Configuration 

The global configuration affects not just brofix but the behaviour of other extensions as well.

If mod.brofix.mail.sendOnCheckLinks is 1, an email will be sent. You can override this in the console command. If an email should be sent, you should configure the recipient and sender address.

You can configure the following settings to set the from address globally (or you can set it specifically for brofix via TSconfig):

$GLOBALS['TYPO3_CONF_VARS']['MAIL']['defaultMailFromName'] = 'Webmaster';
$GLOBALS['TYPO3_CONF_VARS']['MAIL']['defaultMailFromAddress'] = 'webmaster@example.org';

This determines whether an HTML mail, a text mail or both are sent:

$GLOBALS['TYPO3_CONF_VARS']['MAIL']['format'] = 'both';

The template path is already added in this extension in ext_localconf.php, but only if the slot 901 is still free:

$GLOBALS['TYPO3_CONF_VARS']['MAIL']['templateRootPaths'][901]
   = 'EXT:brofix/Resources/Private/Templates/Email';
$GLOBALS['TYPO3_CONF_VARS']['MAIL']['partialRootPaths'][901]
   = 'EXT:brofix/Resources/Private/Partials';

If it is not, you need to set these paths yourself if a mail using the default template of this extension should be sent when link checking is performed.

Extension Configuration 

This is covered on a separate page: Extension configuration.

TSconfig 

This is covered on a separate page: TSconfig Reference.

Extension configuration 

Extension configuration is used for global settings which should be the same for the entire TYPO3 installation.

It is configured in the backend, via Settings | Extension Configuration, or using the file settings.php.

EXT:backend 

EXT:brofix 

EXT:brofix | linkTargetCacheExpiresLow 

External link target cache (in seconds) for checking

Checking tab

default:
0 (means use TSconfig value linkTargetCache.expiresLow)
available values:
any integer value

For a description see the TSconfig option linkTargetCache.expiresLow.

EXT:brofix | linkTargetCacheExpiresHigh 

External link target cache (in seconds) for checking

Checking tab

default:
0 (means use TSconfig value linkTargetCache.expiresHigh)
available values:
any integer value

This should be a slightly higher value than EXT:brofix | linkTargetCacheExpiresLow or 0.

For a description see the TSconfig option linkTargetCache.expiresHigh.

EXT:brofix | combinedErrorNonCheckableMatch 

(since TYPO3 v12)

Non-checkable match

Checking tab

default:
"regex:/^(httpStatusCode:(401|403):|libcurlErrno:60:SSL certificate problem: unable to get local issuer certificate)/"
available values:
either a regex starting with regex: or a string

If the result of checking a link target matches this, the link target (URL) is considered non-checkable. This status is written to the database table and displayed in the backend module. It is possible to filter by this status. By default, these links are not displayed (since the default filter in the backend shows only broken links).

Currently, these are the known statuses:

  • 1: broken
  • 2: ok
  • 3: not possible to check ("non-checkable")
  • 4: is excluded

This should also improve handling of Cloudflare-protected sites, as these typically return HTTP status code 403. The link is then no longer considered broken but "non-checkable", since the actual link check result cannot be obtained.

Which link checking results make a URL "non-checkable" can be configured via the Extension Configuration setting "combinedErrorNonCheckableMatch".

This can be either a regular expression (with the prefix "regex:" and enclosing delimiters, e.g. "/") or a list of strings, separated by comma.

This is matched against a combination of the link checking result, consisting of:

<errorType> ":" <errorCode> ":" <exceptionMessage>

To match HTTP status code 401, you could use:

httpStatusCode:401:

The possible errorTypes and errorCodes can be seen in the class ExternalLinktype or via the database field tx_brofix_broken_links.url_response.

This is the default value:

regex:/^(httpStatusCode:(401|403):|libcurlErrno:60:SSL certificate problem: unable to get local issuer certificate)/

EXT:brofix | excludeSoftrefs 

Do not use these softreference parsers (comma separated list) when parsing content

Checking tab

  • default: url
  • available values: any softref parser keys, separated by comma

This is a workaround for a TYPO3 core bug, see https://forge.typo3.org/issues/97937

EXT:brofix | excludeSoftrefsInFields 

In which fields should excludeSoftrefs apply

Checking tab

default:
"tt_content.bodytext"
available values:
any field names in the form table.field (as in the default), separated by comma

This is a workaround for a TYPO3 core bug, see https://forge.typo3.org/issues/97937

Usually, you will want to apply this in any rich text fields where link tags are used.

EXT:brofix | tcaProcessing 

Perform TCA processing

Checking tab

default:
"default"
available values:
"default" | "full"

Changes how the TCA processing is done. The default setting may not work for some configurations and especially for Flexforms. In that case, it should be set to "full". This setting is still experimental, so it is not on by default.

This setting results in 2 changes:

  1. Use of the FormDataGroup
  2. Whether the entire row is fetched for TCA processing. If the value is "full", the entire row is fetched. If the value is "default", only the fields defined in "searchFields" are fetched, in addition to some fields such as type, relevant fields for language evaluation and header.

By default, one of the following class names to use as FormDataGroup for TCA processing will be used based on the value of tcaProcessing:

  • "default": Sypets\Brofix\FormEngine\FieldShouldBeChecked
  • "full": Sypets\Brofix\FormEngine\FieldShouldBeCheckedWithFlexform

EXT:brofix | overrideFormDataGroup 

Override FormDataGroup for processing TCA

Checking tab

default:
"" (empty, which means the default FormDataGroup based on tcaProcessing is used)
available values:
any valid class name which implements FormDataGroupInterface, given as fully qualified class name, for example Myvendor\Myextension\FormEngine\MyFormdatagroup

Changes how the TCA processing is done.

EXT:brofix | showEditButtons 

(since TYPO3 v12)

Show button to edit entire record, only the field with a broken link or both.

Report tab

default:
"Both" (both buttons are displayed)
available values:
"Both", "Edit field", "Edit full"

EXT:brofix | traverseMaxNumberOfPagesInBackend 

Maximum number of pages to traverse in Backend ...

Report tab

default:
1000
available values:
any number, 0 turns feature off

Set the maximum number of pages traversed in the backend module. This should be limited so that loading the broken link list in the backend does not feel sluggish. A good rule of thumb is to keep the time required to load a page in the backend under 1 second. Depending on the performance of your site, you should use a limit such as 1000 (thousand).

$GLOBALS['TYPO3_CONF_VARS']['EXTENSIONS']['brofix']['traverseMaxNumberOfPagesInBackend'] = 1000;

TSconfig Reference 

Default configuration:

User-Agent 

required

Property
mod.brofix.linktypesConfig.external.headers.User-Agent
Data type
string
Description

This is what is sent as "User-Agent" to the external site when checking external URLs. It should contain a working URL with contact information.

User-Agent = Mozilla/5.0 (compatible; Mysite LinkChecker/1.1; +https://mysite.com/imprint.html)
Default
If not set, a default is automatically generated using the email address from Global Configuration $GLOBALS['TYPO3_CONF_VARS']['MAIL']['defaultMailFromAddress'].

linktypes 

optional

Property
mod.brofix.linktypes
Data type
string
Description

Comma separated list of hooks to load.

Possible values:

db: Check links to database records (pages, content elements).

file: Check links to files located in your local TYPO3 installation.

external: Check links to external files.

This list may be extended by other extensions providing a linktype checker.

Default
db,file,external
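
As a minimal sketch, external link checking can be switched off by listing only the remaining link types. The example uses only the values documented above; adjust the list to your needs.

```typoscript
# Page TSconfig: check internal links (db) and file links only,
# external URLs are not checked
mod.brofix.linktypes = db,file
```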

searchFields.[table] 

optional

Property
mod.brofix.searchFields.[table]
Data type
string
Description

Comma separated list of table fields in which to check for broken links. Broken Link Fixer only checks fields that have been defined in searchFields.

Broken Link Fixer ships with sensible defaults that work well for the TYPO3 core. Not all fields which contain links are currently checked though. You can configure additional fields for extensions.

Example

Only check for bodytext in tt_content:

mod.brofix.searchFields.tt_content = bodytext

Add checks for news and calendarize events:

mod.brofix.searchFields.tx_news_domain_model_news = bodytext
mod.brofix.searchFields.tx_calendarize_domain_model_event = description
Default
pages = media,url
tt_content = bodytext,header_link,records

excludeCtype 

optional

Property
mod.brofix.excludeCtype
Data type
string
Description
Exclude specific content types from link checking. 'html' is not checked by default, because the parsing for links does not always work correctly and may cause a number of links to be displayed as broken, which are in fact ok (false positives).
Default
html

doNotCheckContentOnPagesDoktypes 

optional

Property
mod.brofix.check.doNotCheckContentOnPagesDoktypes
Data type
string
Description

Comma separated list of page types on which content should not be checked. The page records themselves are still checked.

By default, this is the case for the page types shortcut and external link.

Default
3,4 (Link to external URL, shortcut)

doNotCheckPagesDoktypes 

optional

Property
mod.brofix.check.doNotCheckPagesDoktypes
Data type
string
Description
Comma separated list of page types which should not be checked. This means if a page has a doktype which is listed in this list, we do not do any link checking on the page.
Default
6,7,199,255 (Backend User section, Mount Point, Menu Separator, Recycler)

doNotTraversePagesDoktypes 

optional

Property
mod.brofix.check.doNotTraversePagesDoktypes
Data type
string
Description
Comma separated list of page types which should not be traversed. This means if a page has a doktype which is listed in this list, we do not do any link checking on subpages of these pages (and subpages of the subpages etc.).
Default
6,199,255 (Backend User section, Menu Separator, Recycler)
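
The three doktype lists can be adjusted together. The following sketch keeps the documented defaults and, purely as an assumption about what a site might want, additionally skips pages of type folder (doktype 254 in TYPO3). Whether this is sensible depends on where your records are stored, as records in folders would then not be checked. Since a plain assignment replaces the whole list, the default values are repeated.

```typoscript
# content on external link (3) and shortcut (4) pages is not checked (default)
mod.brofix.check.doNotCheckContentOnPagesDoktypes = 3,4
# default list plus folder (254): no link checking on these pages
mod.brofix.check.doNotCheckPagesDoktypes = 6,7,199,254,255
# subpages of these pages are not visited (default)
mod.brofix.check.doNotTraversePagesDoktypes = 6,199,255
```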

doNotCheckLinksOnWorkspace 

optional

Property
mod.brofix.check.doNotCheckLinksOnWorkspace
Data type
int
Description
This option enables or disables the checking of links that are created in workspaces. By default, links created in workspaces are checked and reported.
Default
0
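
A minimal example, assuming that the value 1 turns the option on (with the default 0, links in workspaces are checked):

```typoscript
# Page TSconfig: do not check links that were created in a workspace
mod.brofix.check.doNotCheckLinksOnWorkspace = 1
```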

reportHiddenRecords 

optional

Property
mod.brofix.reportHiddenRecords
Data type
int
Description

Whether links to hidden records should be treated as broken links.

Default
1
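
A minimal example, assuming that the value 0 turns the option off, so that links to hidden records are no longer reported:

```typoscript
# Page TSconfig: do not treat links to hidden records as broken
mod.brofix.reportHiddenRecords = 0
```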

depth 

optional

Property
mod.brofix.depth
Data type
int
Description
Default depth when checking with console command
Default
999 (for infinite)

Accept 

optional

Property
mod.brofix.linktypesConfig.external.headers.Accept
Data type
string
Description
HTTP request header "Accept". It is recommended to leave the default value and not change this.
Default
*/*

Accept-Language 

optional

Property
mod.brofix.linktypesConfig.external.headers.Accept-Language
Data type
string
Description
HTTP request header "Accept-Language". It is recommended to leave the default value and not change this.
Default
*

Accept-Encoding 

optional

Property
mod.brofix.linktypesConfig.external.headers.Accept-Encoding
Data type
string
Description
HTTP request header "Accept-Encoding". It is recommended to leave the default value and not change this.
Default
*

timeout 

optional

Property
mod.brofix.linktypesConfig.external.timeout
Data type
int
Description
Timeout for HTTP request.
Default
10

redirects 

optional

Property
mod.brofix.linktypesConfig.external.redirects
Data type
int
Description
Number of redirects to follow. If more redirects are necessary to reach the final destination URL, the link is treated as broken.
Default
5
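
For slow external sites, the timeout and the redirect limit can be tuned together. The values below are only illustrative, and the timeout is assumed to be given in seconds (the unit is not stated in the description above).

```typoscript
# Page TSconfig: allow more time per request, but follow fewer redirects
mod.brofix.linktypesConfig.external.timeout = 20
mod.brofix.linktypesConfig.external.redirects = 3
```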

excludeLinkTarget.storagePid 

required (if "exclude URL" functionality should be available for non-admin editors)

Property
mod.brofix.excludeLinkTarget.storagePid
Data type
int
Description

The pid of the storage folder which contains the excluded link target records. If you want to enable editors to add URLs to list of excluded URLs, you must change this (it must be != 0).

Create a central folder to store the excluded URLs or create one for each site.

Excluded link targets (=URLs) are treated as valid URLs. This can be used for the rare case that a URL is detected as broken, but is not broken. This may be the case for some sites which require login credentials, but also for common sites where the automatic link checking mechanism yields false results.

Default
0

excludeLinkTarget.allowed 

optional

Property
mod.brofix.excludeLinkTarget.allowed
Data type
string
Description

Allowed link types which can be excluded. By default, it is only possible to exclude external URLs. If you would like to make this available for page links too, add additional link types, e.g.

allowed = external,db

You can set it to empty to disable the "exclude URL" functionality:

allowed =
Default
external

linkTargetCache.expiresLow 

optional

Property
mod.brofix.linkTargetCache.expiresLow
Data type
int
Description

Time in seconds after which the link target cache expires. Whenever an external URL is checked or rechecked, the link target cache is used. Once the cache entry expires, the URL must be checked again.

The value means that the information for external URLs is retained for that time without having to access the external site.

2 different values are used for expiresLow and expiresHigh so that the target will usually not expire during the on-the-fly checking which would lead to delays.

As a rule of thumb, use the interval for full checking (e.g. 1 day for once a day checking) and multiply it by a factor of 1 to 10 for expiresLow. Add another interval to get expiresHigh.

The interval for expiresLow will be used for full checking via the console command.

# checking links daily, use 7 as factor:
#  1 day * 7 * (seconds per day)
#  1 * 7 * 24*60*60
linkTargetCache.expiresLow = 604800
#  1 * 8 * 24*60*60
linkTargetCache.expiresHigh = 691200
Default
604800 (7 days)

linkTargetCache.expiresHigh 

optional

Property
mod.brofix.linkTargetCache.expiresHigh
Data type
int
Description
See linkTargetCache.expiresLow for a description.
Default
691200 (8 days)
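
The rule of thumb for expiresLow and expiresHigh can be applied to other check intervals in the same way. As a worked sketch, assume links are checked once a week and a factor of 2 is chosen (both are assumptions for illustration, not recommendations of the extension):

```typoscript
# checking links weekly, use 2 as factor:
#  7 days * 2 * (seconds per day)
#  7 * 2 * 24*60*60 = 1209600
mod.brofix.linkTargetCache.expiresLow = 1209600
# add one more check interval (7 days): 21 * 24*60*60 = 1814400
mod.brofix.linkTargetCache.expiresHigh = 1814400
```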

crawlDelay.seconds 

optional

Property
mod.brofix.crawlDelay.seconds
Data type
int
Description

The minimum number of seconds that must have passed between checking 2 URLs of the same domain.

If the required time has already passed since a URL of the same domain was last checked, the wait is not performed.

This helps to prevent external sites from being bombarded with requests from our site.

This is a pragmatic approach to make sure that a minimum delay is used when checking URLs of the same site. As a site may have multiple domains or several domains may be used by the same site, this will not always get the desired result, but it is a "good enough" approach.

This will not be used for on-the-fly checking, only for checking via the console command.

crawlDelay.seconds = 10
Default
5

crawlDelay.nodelay 

optional

Property
mod.brofix.crawlDelay.nodelay
Data type
string
Description

Do not use the crawlDelay.seconds wait period for these domains

crawlDelay.nodelay = example.org,example.com
Default
empty

report.docsurl 

optional

Property
mod.brofix.report.docsurl
Data type
string
Description

Add a documentation URL. This will add an "i" button to the broken link report with a link to the documentation.

Add a link to the official documentation:

report.docsurl = https://docs.typo3.org/p/sypets/brofix/main/en-us/Index.html
Default
Empty

report.recheckButton 

optional

Property
mod.brofix.report.recheckButton
Data type
int
Description

Whether to show the "Check links" button. By default, the button is available for "admin" users, but not for regular editors.

Deactivate the button for non-admin users (default):

mod.brofix.report.recheckButton = -1

Activate button if depth=0 (current page) is selected:

mod.brofix.report.recheckButton = 0

Enable the button in User TSconfig with depth "infinite" (for a user or group):

page.mod.brofix.report.recheckButton = 999

If the current depth <= recheckButton, the button will be displayed. This makes it possible to control not only whether checking is possible, but also up to which depth.

Default
-1 (do not show button for non-admin users)

mail.recipients 

required

Property
mod.brofix.mail.recipients
Data type
string
Description
Set the recipient email address(es) of the report mail sent by the console command. Can be several, separated by comma.
Example
mod.brofix.mail.recipients = recipient@example.org
Default
This is empty by default. $GLOBALS['TYPO3_CONF_VARS']['MAIL']['defaultMailFromName'] and $GLOBALS['TYPO3_CONF_VARS']['MAIL']['defaultMailFromAddress'] are used if this is empty.

mail.fromname 

required (unless set in $GLOBALS['TYPO3_CONF_VARS']['MAIL']['defaultMailFromName'])

Property
mod.brofix.mail.fromname
Data type
string
Description
Set the from name of the report mail sent by the console command.
Example
mod.brofix.mail.fromname = Sender
Default
This is empty by default. $GLOBALS['TYPO3_CONF_VARS']['MAIL']['defaultMailFromName'] is used if this is empty.

mail.fromemail 

required (unless set in $GLOBALS['TYPO3_CONF_VARS']['MAIL']['defaultMailFromAddress'])

Property
mod.brofix.mail.fromemail
Data type
string
Description
Set the from email of the report mail sent by the console command.
Example
mod.brofix.mail.fromemail = sender@example.org
Default
This is empty by default. $GLOBALS['TYPO3_CONF_VARS']['MAIL']['defaultMailFromAddress'] is used if this is empty.

mail.replytoemail 

optional

Property
mod.brofix.mail.replytoemail
Data type
string
Description
Set the reply-to email of the report mail sent by the console command.
Default
Empty

mail.replytoname 

optional

Property
mod.brofix.mail.replytoname
Data type
string
Description
Set the reply-to name of the report mail sent by the console command.
Default
Empty

mail.subject 

optional

If this is not set explicitly, a subject will be auto-generated.

Property
mod.brofix.mail.subject
Data type
string
Description
Set the subject of the report mail.
Default
Empty, auto-generated

mail.template 

optional

Always uses the default template CheckLinksResults if not supplied.

Property
mod.brofix.mail.template
Data type
string
Description
Set the template name of the report mail. If $GLOBALS['TYPO3_CONF_VARS']['MAIL']['format'] equals 'both', CheckLinksResults.html and CheckLinksResults.txt must exist.
Default
CheckLinksResults

mail.language 

optional

Property
mod.brofix.mail.language
Data type
string
Description
Use this language for the report sent via email.
Default
en
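
Taken together, the mail options could be combined as in the following sketch. All addresses and the subject are placeholders. The property names for the from name and from email are assumed to follow the section headings (mail.fromname, mail.fromemail), and mail.sendOnCheckLinks is the switch mentioned under Global Configuration.

```typoscript
# Page TSconfig: send a report mail on every check via the console command
mod.brofix.mail.sendOnCheckLinks = 1
# several recipients, separated by comma
mod.brofix.mail.recipients = webmaster@example.org,editor@example.org
# sender (falls back to the global MAIL settings if left empty)
mod.brofix.mail.fromname = Link Checker
mod.brofix.mail.fromemail = noreply@example.org
mod.brofix.mail.replytoemail = webmaster@example.org
# optional: fixed subject instead of the auto-generated one
mod.brofix.mail.subject = Broken link report
mod.brofix.mail.language = en
```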

Usage quickstart 

  1. Switch to the broken link list

    When you see the message "Broken links were found" in the page module, click on the link that is displayed.

    You will now see the list of broken links.

    Alternatively, select "Check Links" in the left column.

  2. Start editing

    Click on the "Edit" action button for one of the items in the list.

  3. Fix the link

    In the rich text editor (RTE), the broken link should stand out (with yellow background and red border).

    Double click on it and the link browser will open.

    If the broken link is not in an RTE field, you may have to directly edit the field.

In general, broken link fixing is pretty straightforward, but there are some pitfalls which you may run into sooner or later. See the usage pitfalls section for some tips on how to deal with these.

The report 

This section covers some more details of the list of broken links.

depth selector 

What Broken Link Fixer will show depends on the currently selected page (in the page tree). Additionally, you can show broken link information on subpages. This depends on the selected depth:

  • This page: shows only broken links on current page
  • 1 level: shows broken links on current page and direct subpages of this page
  • 2 levels: additionally show broken links on subpages of the subpages
  • etc.

This is the same behaviour as in the "Pagetree Overview", also in the Info module.

Using a high level (e.g. "infinite") may be useful for getting an overview. But, depending on the number of pages and broken links found, working with this may feel sluggish and slow. It is recommended to use a low level when working with the list (e.g. "This page").

buttons 

  • Refresh display: This just reloads the list. It does not recheck links.
  • Check links (if available): This checks broken links, depending on the selected depth. For external URLs, the Link target cache is used.
  • i (if configured): This opens the documentation in another browser tab.

table 

Columns in the table:

Columns 1-3

  1. Page: The title and [page id] are shown.
  2. Element: The record in which the broken link was found. A language icon may be displayed. The header of the element (if available) and the [uid] are displayed.
  3. Type: This shows the type of the record (e.g. "Page Content", "Page", "News") and the field the broken link is found in (e.g. "Text", "Link").

Columns 5-8

  1. Link target: The link target (e.g. the URL or target page). You can click on it, to open it in another browser tab.
  2. Error: The error that occurred, e.g. "Page not found". Hover over the text with the mouse to see the original exception message.
  3. Checked: The last check time of the link target or when the element was last checked (whichever is older). Since a link target cache is used for external URLs, the check time may be before the check time of the element in column 2. If you loaded the URL in the browser and feel the information displayed is not up to date, you can press the "Recheck URL" button.

    If the record was edited after the last check, the broken link information may be outdated. This does not mean it is always outdated, as the record may have been edited without changing the links, but it indicates that it might be a good idea to recheck. In this case, you can click on the "Recheck URL" button.

    The following is displayed:

    • if (possibly) outdated: red background and a recheck icon.
  4. Action: Action buttons:

    • Edit: Edit the field containing this broken link
    • Recheck URL: Rechecks the URL and removes the broken link record if the URL is ok or the broken link is no longer in the record. This is the only checking action which will actually check external URLs and refresh the link target cache.
    • Exclude URL: Only use this for "false positives" (if the URL is ok, but displayed as broken). Always "Recheck URL" first. This button opens a form to create an "Exclude URL" record. Once this is stored, all broken link records related to this URL are removed. In all subsequent checks, the URL is treated as valid and is not rechecked!

Known problems 

The most relevant known problems currently concern only "external broken links". You can turn off external link checking entirely or use one of the other countermeasures.

False positives 

The main problem with external links is false positives, where the automatic link checking reports a problem even though the URL is ok.

Issues 

For more known problems see the list of issues:

Best Practices 

Development 

This covers extending Broken Link Fixer via an extension.

Examples 

Override ExternalLinktype 

We override the ExternalLinktype class to make some changes in how external link types are checked:

  1. a specific error type is not treated as error
  2. some specific domains are not checked
ext_localconf.php
use Myvendor\MyExtension\Linktype\MyExternalLinktype;

$GLOBALS['TYPO3_CONF_VARS']['EXTCONF']['brofix']['checkLinks']['external'] = MyExternalLinktype::class;
Classes/Linktype/MyExternalLinktype.php
<?php

declare(strict_types=1);

namespace Myvendor\MyExtension\Linktype;

use Sypets\Brofix\Linktype\ExternalLinktype;

class MyExternalLinktype extends ExternalLinktype
{
    public function checkLink(string $origUrl, array $softRefEntry, int $flags = 0): bool
    {
        // do some checking here, if $origUrl should get checked ..

        // pass the $flags argument through to the parent implementation
        $isValidUrl = parent::checkLink($origUrl, $softRefEntry, $flags);
        if (!$isValidUrl) {
            $exceptionMsg = $this->errorParams->getExceptionMsg();
            // most probably a certificate chain issue, which should be treated as an edge case false positive
            // curl(60): 'SSL certificate problem: unable to get local issuer certificate'
            if ($exceptionMsg === 'SSL certificate problem: unable to get local issuer certificate') {
                return true;
            }
        }
        return $isValidUrl;
    }
}

Changelog 

7.0 

6.2 

6.0.0-6.1.x 

3.2.0 

3.1.0 

3.0.0 

  • [BREAKING] Move the module from the Info module to its own module. This requires changes in the editor configuration: Give the editors permission to the "brofix" module.

2.3.0 

  • Rename branch master => main
  • Add module for "Manage Exclusions"

2.2.0 

Update to 2.2.0 requires updating the database.

  • Add support for TYPO3 v11
  • Add crdate to table. This will later make it possible to detect new broken links (or broken links recently detected).
  • Add start and stop time to check links email report
  • Change order of settings in check links email report
  • Add additional setting mod.brofix.mail.language to set the language of the email report.
  • Do not check records of default language if l18n_cfg is 1 or 3 ("Hide default language of page")
  • Also consider if records should be checked on page if rechecking URL or fields.
  • Optimize external link checking: Do not use extra headers Accept-Language and Accept-Encoding by default. This causes problems with some websites.
  • Optimize pagination: Do not show pagination controls if there is only one page

2.1.1 

  • Fix setting of depth=0 via CLI command brofix:checklinks (issue:69)
  • Fix fatal error: An exception was thrown by the CLI command checklinks if replytoemail was set (due to a call to a non-existent function). (issue:66)
  • Fix version constraints (in ext_emconf.php)

2.1.0 

!!! Change in SQL: It is required to do database compare and recheck links.

  • UI optimization: use card layout instead of table for small screens
  • Add editable restrictions: Show only broken links the editor has access to.

2.0.2 

  • "Check links" button is always available for admins, but deactivated for editors by default
  • Change styling for "Last check" field if information is considered "fresh".

2.0.1 

  • add "freshness" / "stale" information in "Last check" column in broken link report
  • add "Last check" time for the URL as well. Because of the "link target" cache, the "last check" information for the field and the URL may differ.
  • Add "Check links" button to report

2.0.0 

  • Support for TYPO3 9 was dropped
  • bugfix: Do not use FlashMessages in DataHandler hook
  • several improvements in broken link list
  • It is now possible to recheck URLs from the GUI via a button. This is configurable (report.recheckButton).
  • It is checked if record was edited after last check. In that case the broken link information may be "stale" (outdated). This is shown in the list along with the time of the last check.
  • The last check times of the URLs are shown as well. Since the check status is cached, these may differ from the time when the record was last checked. While showing different values may be confusing, it makes the behaviour more transparent.

1.0.4 

  • Add --send-email and --dry-run option to command controller
  • Add more output to command controller
  • Fix formatting of floats in email

> 1.0.0 

  • bug fixing

1.0.0 

Supports TYPO3 9 and 10

  • Initial version, change extension key to "brofix" (Broken Link Fixer)

GUI: page module

  • Shows a message in the page module if there are broken links on the page (depends on ext:page_callouts to add the hook)

GUI: RTE

  • Links to hidden or deleted content elements are also marked as broken in the RTE (as they are also reported as broken by brofix)

GUI: broken link report

  • List was decluttered:

    • Only page title is displayed (not full page path)
    • short localized error messages are used.
    • Show a short date format for "last checked" (only hours and minutes, not the date if timestamp is today)
  • Sorting: It is now possible to sort by page / element, content type, URL or error type
  • "Check links" tab was removed
  • All links types are always displayed, no need to check checkboxes
  • In the report, the broken links of the just edited record are displayed in a different color. This makes it easier to keep track if jumping back and forth from the edit form to the list of broken links.
  • Show an informational message if no page is selected in the page tree.
  • Reload list immediately, if form was changed. Since there is currently only a select list, it does not make sense to have to click a button additionally to changing the value in the select field.

Link checking

  • Use console command instead of scheduler task
  • Previously, all records in the broken links table (of currently to be checked pages) were removed at the beginning of the check. This resulted in inconsistent results, especially during checks which took longer than just a few minutes. Now, the records are not removed, but the existing records are updated (if they already exist) or are inserted (if new). This way, the link check results are mostly up-to-date.
  • Crawl delay: A minimum wait time between 2 checks of URLs of the same domain. The crawl delay is not used in on-the-fly checking.
  • Link target "cache": External URLs are now stored in a persistent "cache" table. The duration (expiration time) is configurable. Because the cache is used, on-the-fly checking is faster.
  • Exclude link targets: For the still existing problem of false positives, it is possible to exclude URLs (or domains) from link checking. Excluded URLs are treated as valid URLs. URLs can be excluded by clicking a button in the link list and then editing the record. Permissions for editors are restricted and must explicitly be granted.
  • A timestamp of the last check is added to the broken links table and obsolete records (e.g. belonging to a removed page) are removed at the end of the link checking.
  • Links in fields, which are not editable, are no longer checked. Previously, the fields which were configured to be checked were always checked, independently of the content type (CType) or page type (doktype). However, for some types, content is not relevant such as bodytext for plugins or the URL for normal page types. Furthermore, it is not possible to edit these as editor and the editor would get an error message. These broken links stayed in the list of broken links and there was no way to remove them. Now, only editable fields are checked.
  • Do not check records if in hidden gridelement.

Email report

  • Shows additional statistics, such as number of pages checked, number of links checked, percentage of broken links to number of checked links. Especially the percentage of broken links to total number of links can be used as an indicator for the "site health".
  • The number of broken links is added to the subject of the email. This way, it is not necessary to click on each email to see the most relevant numbers.

Development

  • Setup unit and functional tests (see Build directory)
  • Added .editorconfig

Feature - Add index url_hash for performance 

since version 6.2.0

An index is added to the database table tx_brofix_broken_links for the fields link_type, url_hash and check_status. A new field url_hash is introduced which generates a hash for the URL.

This may result in significant performance improvements when opening records with RTE fields and many links in the backend. When the RTE is opened, pre-processing is performed in order to mark the broken links as broken. For this, a database query is performed for each link.

Migration 

php vendor/bin/typo3 database:updateschema
php vendor/bin/typo3 upgrade:run brofix_urlHashUpgradeWizard

Breaking - Use LinkTargetResponse (Add check_status) 

A database field "check_status" was added to the tx_brofix_broken_links and tx_brofix_link_target_cache tables. It is now possible to save all links, not just the broken links to tx_brofix_broken_links (configurable by extension configuration).

For some status codes for external links (e.g. HTTP status code 401 and 403), the link targets are considered uncheckable - we cannot really know if they are broken or not. This is stored as separate status and it is possible to filter by this status in the broken link module.

Currently, these are the known statuses:

  • 1: broken
  • 2: ok
  • 3: not possible to check
  • 4: is excluded

This should also improve handling of Cloudflare-protected sites, as these typically return HTTP status code 403. Such links are no longer considered broken; they are now considered "not checkable", since the actual link check result cannot be obtained.

Which link checking results make a URL "uncheckable" can be configured via the Extension Configuration setting "combinedErrorNonCheckableMatch".

The value is either a regular expression (with the prefix "regex:" and enclosing delimiters, e.g. "/") or a comma-separated list of strings.

This is matched against a combination of the link checking result, consisting of:

<errorType> ":" <errorCode> ":" <exceptionMessage>

To match HTTP status code 401, you could use:

httpStatusCode:401:

This is the default value:

regex:/^(httpStatusCode:(401|403):|libcurlErrno:60:SSL certificate problem: unable to get local issuer certificate)/
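
The matching described above can be sketched in plain PHP. This is only an illustration and not the extension's actual code: the function names buildCombinedError() and isNonCheckable() are hypothetical, and treating each entry of the comma-separated list as a prefix of the combined string is an assumption.

```php
<?php

declare(strict_types=1);

/**
 * Builds the combined string "<errorType>:<errorCode>:<exceptionMessage>".
 */
function buildCombinedError(string $errorType, int $errorCode, string $exceptionMessage): string
{
    return $errorType . ':' . $errorCode . ':' . $exceptionMessage;
}

/**
 * Hypothetical helper: decides whether a link check result counts as "not checkable".
 *
 * $config is either "regex:<pattern including delimiters>" or a comma-separated list.
 * Assumption: list entries are compared as prefixes of the combined string.
 */
function isNonCheckable(string $combinedError, string $config): bool
{
    $regexPrefix = 'regex:';
    if (strncmp($config, $regexPrefix, strlen($regexPrefix)) === 0) {
        // Everything after "regex:" is used as the pattern, delimiters included.
        $pattern = substr($config, strlen($regexPrefix));
        return preg_match($pattern, $combinedError) === 1;
    }

    foreach (explode(',', $config) as $entry) {
        $entry = trim($entry);
        if ($entry !== '' && strncmp($combinedError, $entry, strlen($entry)) === 0) {
            return true;
        }
    }
    return false;
}

// The default value quoted above.
$default = 'regex:/^(httpStatusCode:(401|403):|libcurlErrno:60:SSL certificate problem: unable to get local issuer certificate)/';

var_dump(isNonCheckable(buildCombinedError('httpStatusCode', 403, ''), $default));
var_dump(isNonCheckable(buildCombinedError('httpStatusCode', 404, ''), $default));
```

With the default value, a 403 response is classified as not checkable, while a 404 response is still reported as broken.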

Impact 

There were some changes to the database and the LinktypeInterface. See "Migration" for necessary actions.

Also, there was a change to the backend module: A new filter "Check status:" was added to filter broken links by status. By default, only broken links are shown (as before this change).

Migration 

Update any custom classes implementing LinktypeInterface to address changes to the interface. In particular, the checkLink() method will now return LinkTargetResponse instead of int.

Also, a database update must be performed to address the changed schema.

The tables tx_brofix_broken_links and tx_brofix_link_target_cache should be emptied. This can be done by performing the Upgrade wizard "Truncate tables tx_brofix_broken_links and tx_brofix_link_target_cache".

Some language labels have been added, it is advised to join Crowdin and contribute translations.

Also, a new select field "Check status" was added to the Broken Link list module: It is recommended to advise editors about this, but it should not be a problem. Editors can just use the default selection (only show broken links).

Feature - Add button to jump to page layout 

since version 6.0.2

By default, a button to jump to the page layout is now displayed in the list of broken links.

The button is only displayed, if

  • the extension page_callouts is loaded
  • showPageLayoutButton is set in the brofix extension configuration

You should use the latest version of page_callouts so that the close button will be displayed in the page layout to jump back to the list of broken links.

If this button should not be displayed, it is possible to deactivate it in the extension configuration.

Migration 

No migration necessary.

Feature - More configuration options for sending emails 

There is an option "send-email" in the command / scheduler task which determines whether an email should be sent when the link checking is complete. There are now more options, which make it possible to send an email only when broken links were found, or only when new broken links were found.

The old values (0, 1, -1) are still supported and are mapped to the new values.

  • "never" : never send email (previously: 0)
  • "always": send email (previously: 1)
  • "any" : send email if any broken links were found
  • "new" : send email if new broken links were found
  • "auto" : do not override, use TSconfig mail.sendOnCheckLinks

If "auto" is used, the TSconfig configuration will be used which makes it possible to configure this for each site individually.
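
As an illustration, the mapping from old numeric values to new string values could look like the following sketch. The function normalizeSendEmailOption() is hypothetical and not part of brofix. The mapping of 0 and 1 follows the list above; mapping -1 to "auto" is an assumption, since the text does not state it.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: normalizes a "send-email" option value to one of the
 * new string values ("never", "always", "any", "new", "auto").
 */
function normalizeSendEmailOption(string $value): string
{
    $legacyMap = [
        '0' => 'never',
        '1' => 'always',
        '-1' => 'auto', // assumption: not stated explicitly in the text above
    ];
    $allowed = ['never', 'always', 'any', 'new', 'auto'];

    $value = trim($value);
    if (isset($legacyMap[$value])) {
        return $legacyMap[$value];
    }
    if (in_array($value, $allowed, true)) {
        return $value;
    }
    throw new InvalidArgumentException('Unsupported send-email value: ' . $value);
}

echo normalizeSendEmailOption('1'), "\n";
echo normalizeSendEmailOption('new'), "\n";
```

Rejecting unknown values with an exception, rather than silently falling back to a default, is a design choice of this sketch.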

Migration 

As the old values will still work, no change is necessary, but it is recommended to use the new string values instead of the old numeric values.

Info 

Feature - Make page callouts configurable 

since version 6.1.0

If EXT:page_callouts is installed, information is displayed in the page module if broken links exist.

Since this has a small performance impact and is not really necessary if broken links are fixed regularly, it is now configurable via:

  • extension configuration: "Show message in page module if broken links exist on page" [showPageCalloutBrokenLinksExist] (default: on)
  • user settings: "Show message in page module if broken links exist on page" [tx_brofix_showPageCalloutBrokenLinksExist] in tab "Broken links" (default: on)

The information is only displayed if the extension configuration option is set to true, the user setting is active and page_callouts is installed (and, of course, if broken links exist on that page).

Migration 

No migration necessary. It might make sense to inform the BE users about this.

Feature - Support checking in Flexforms 

This feature can be used and has been tested, but should be considered experimental until further notice!

For Flexform checking to fully work, you must set tcaProcessing to "full" in the extension configuration for brofix.

Since Flexforms consist of nested fields, checking this kind of field required modified functionality. It is now possible to also check Flexforms for broken links.

Implementation 

Which Flexform fields are visible is determined by the fields defined in the Flexform XML schema, just as is the case for other fields. When the values are written to the database field (e.g. tt_content.pi_flexform), this may include older fields which will no longer be displayed. However, if we get the schema from the processed TCA, we process only the fields which would be displayed in the Backend.

How do we determine which fields should be checked and how?

We use the type (and other TCA configuration in the Flexform schema) and only parse fields which have a type which might include links, e.g.

  • if the "softref" field is set, we get the list of softref parsers from this field
  • if "enableRichtext" is set (but softref not), we use the "typolink_tag" parser key
  • type "link" and type "input" with "renderType" "inputLink" use the "typolink" softref parser key
  • more field types (such as "file") will be supported in the future
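
The rules in the list above can be sketched as a small function. This is an illustrative sketch only: getSoftrefParserKeys() is a hypothetical name, and the function assumes it receives the "config" array of a single Flexform field; it does not reproduce how brofix actually walks the Flexform schema.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: derives the softref parser keys from the TCA "config"
 * array of one Flexform field. Returns an empty list if the field is not
 * expected to contain links.
 */
function getSoftrefParserKeys(array $fieldConfig): array
{
    $softref = trim((string)($fieldConfig['softref'] ?? ''));
    if ($softref !== '') {
        // "softref" holds a comma-separated list of parser keys.
        $keys = array_map('trim', explode(',', $softref));
        return array_values(array_filter($keys, static fn (string $key): bool => $key !== ''));
    }

    if (!empty($fieldConfig['enableRichtext'])) {
        return ['typolink_tag'];
    }

    $type = $fieldConfig['type'] ?? '';
    $renderType = $fieldConfig['renderType'] ?? '';
    if ($type === 'link' || ($type === 'input' && $renderType === 'inputLink')) {
        return ['typolink'];
    }

    return [];
}

print_r(getSoftrefParserKeys(['type' => 'text', 'enableRichtext' => true]));
print_r(getSoftrefParserKeys(['type' => 'link']));
```

An explicit "softref" setting takes precedence over the type-based defaults, matching the order of the list.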

Using this new feature 

  1. Add your Flexform fields to the search fields, for example:

    mod.brofix.searchFields.tt_content = bodytext,header_link,records,pi_flexform
  2. In the extension configuration for brofix, set tcaProcessing to "full"
  3. Check the fields in your Flexform configuration to make sure you are using a field configuration that brofix will check (see the "Implementation" section), such as type "link" (since TYPO3 v12), and set the correct softref.
  4. Check your links

Caveats 

This new feature comes with some caveats:

  1. It is not possible to edit the field with the broken link directly: When clicking the edit button in the broken link list, an edit dialog is opened for all fields in the flexform while for non-Flexform fields, the edit dialog will show only the affected field. The advantage of showing only the affected field is that it is easier to find the broken link, especially in non-RTE fields where the broken link is not highlighted. (The reason for this caveat is that it is not possible currently with core functionality using the record_edit route.)
  2. It is not possible to specify directly which fields in the Flexform will be checked. The fields which are checked are derived directly from the field configuration (type, renderType, enableRichtext and softref).
  3. In order for the Flexform processing to work the full record is fetched from the database. This makes the process possibly slightly slower and less efficient, but should not have a big impact.

Some of these caveats may be addressed in future releases.

Combination with extensions 

dce 

This new feature was tested with EXT:dce, but a problem was found. If the patch from the PR is applied, it should work:

Feature: issue 352 - Make it possible to edit the full record 

See

Description 

Previously, clicking the "pencil" button in the Broken Link Fixer report opened a form showing only the field with the broken link.

This is not ideal in some cases because relevant context is missing, for example when editing redirect records.

For this reason, it is now possible to also edit the full record, but this is configurable (see Extension Configuration).

Impact 

A new button is now displayed in the broken link list BE module, in addition to the already existing button. The buttons have the following functionality:

  1. button to edit only the field (same as before)
  2. button to edit the entire record (which contains an additional icon)

Whether this makes sense depends on which records / fields are checked and whether it is helpful to have more context. If not, the additional button can be deactivated in the Extension Configuration.
