Important: #94484 - Introduce HTML Sanitizer¶
See forge#94484
Description¶
To sanitize and purge XSS from markup during frontend rendering, new
custom HTML sanitizer has been introduced, based on masterminds/html5
.
Both \TYPO3\HtmlSanitizer\Builder\CommonBuilder
and
\TYPO3\HtmlSanitizer\Visitor\CommonVisitor
provide common configuration
which is in line with expected tags that are allowed in backend RTE.
Using a custom builder instance, it is possible to adjust for individual demands - however, configuration possibilities cannot be modified using TypoScript - basically since the existing syntax does not cover all necessary scenarios.
PHP API¶
The API is considered "internal", however it might be necessary to provide custom markup handling, add additional tags, attributes or values. The whole process of sanitization is based on an "allow-list" - everything that is not allowed, is automatically denied.
The following example is meant to give a brief overview of the behavior and corresponding possibilities.
<?php
use TYPO3\CMS\Core\Html\DefaultSanitizerBuilder;
use TYPO3\HtmlSanitizer\Behavior;
use TYPO3\HtmlSanitizer\Builder\BuilderInterface;
class MyCustomBuilder extends DefaultSanitizerBuilder implements BuilderInterface
{
public function createBehavior(): Behavior
{
// extends existing behavior, adds new tag
return parent::createBehavior()
->withName('my-custom')
->withTags(
(new Behavior\Tag('my-element', Behavior\Tag::ALLOW_CHILDREN))
->addAttrs(
(new Behavior\Attr('href'))->addValues(
new Behavior\RegExpAttrValue('#^(?:https?://|mailto:)#')
),
...$this->globalAttrs
)
);
}
}
As a result a new tag my-element
is which is allowed to
have any safe global attribute (
id
,class
,data-*
, ...)have attribute
href
, in case corresponding value either starting withhttp://
,http://
ormailto:
- evaluated from the given regular expression
TypoScript¶
stdWrap.htmlSanitize¶
New stdWrap
property htmlSanitize
has been introduced
to control sanitization of markup, removing tags, attributes or values that have
not been allowed explicitly.
htmlSanitize = [boolean]
whether to invoke sanitization (enabled per default).htmlSanitize.build = [string]
defines which specific builder (must be an instance of\TYPO3\HtmlSanitizer\Builder\BuilderInterface
) to be used for building a\TYPO3\HtmlSanitizer\Sanitizer
instance using a particular\TYPO3\HtmlSanitizer\Behavior
. This can either be a fully qualified class name or the name of a preset as defined in$GLOBALS['TYPO3_CONF_VARS']['SYS']['htmlSanitizer']
- per default,\TYPO3\CMS\Core\Html\DefaultSanitizerBuilder
is used.
10 = TEXT
10 {
value = <div><img src="invalid.file" onerror="alert(1)"></div>
htmlSanitize = 1
// htmlSanitize.build = default
// htmlSanitize.build = TYPO3\CMS\Core\Html\DefaultSanitizerBuilder
}
stdWrap.parseFunc¶
stdWrap.htmlSanitize
is enabled per default when
stdWrap.parseFunc
is invoked. This also includes Fluid
view-helper <f:format.html>
, since it invokes parseFunc
using lib.parseFunc_RTE
directly.
The following example shows how sanitization behavior - enabled per default - can be disabled. This is not recommended, but occasionally might be necessary.
// either disable globally
lib.parseFunc.htmlSanitize = 0
lib.parseFunc_RTE.htmlSanitize = 0
// or disable individually per use-case
10 = TEXT
10 {
value = <div><img src="invalid.file" onerror="alert(1)"></div>
parseFunc =< lib.parseFunc_RTE
parseFunc.htmlSanitize = 0
}
Troubleshooting¶
Since any invocation of stdWrap.parseFunc
triggers HTML
sanitization per default - except it is disabled explicitly - the following
example lead to lots of generated markup being sanitized - and was solved by
explicitly disabling it using htmlSanitize = 0
.
10 = FLUIDTEMPLATE
10 {
templateRootPaths {
// ...
}
variables {
// ...
}
stdWrap.parseFunc {
// replace --- with soft-hyphen
short.--- = ­
// sanitization of ALL MARKUP is NOT DESIRED here
htmlSanitize = 0
}
}
HTML sanitization should be used for user-submitted input like rich-text data - but not for the overall markup of a complete website.
Backend RTE configuration¶
Processing instructions for rich-text fields in the backend user interface
can be adjusted in a similar way, e.g. in Configuration/Processing.yaml
.
processing:
allowTags:
# ...
HTMLparser_db:
# ...
htmlSanitize:
# use default builder as configured in
# $GLOBALS['TYPO3_CONF_VARS']['SYS']['htmlSanitizer']
build: default
# disable individually per use-case
# htmlSanitize: false
Sanitization for persisting data can be needs to be enabled globally using corresponding
feature flag $GLOBALS['TYPO3_CONF_VARS']['SYS']['features']['security.backend.htmlSanitizeRte']
.
Debugging & Logging¶
In order to debug and log occurrences that have been modified during the sanitization
process, following configuration can be configured in corresponding LOG
section
of typo3conf/LocalConfiguration.php
.
// ...
'LOG' => [
'TYPO3' => [
'HtmlSanitizer' => [
'writerConfiguration' => [
\TYPO3\CMS\Core\Log\LogLevel::DEBUG => [
'TYPO3\CMS\Core\Log\Writer\FileWriter' => [
'logFileInfix' => 'html',
],
],
],
],
],
],
// ...
Which produces log entries in e.g. typo3temp/var/log/typo3_html_[hash-value].log
like below
Wed, 11 Aug 2021 09:03:08 +0200 [DEBUG] request="b62c11bcbd3d7"
component="TYPO3.HtmlSanitizer.Visitor.CommonVisitor":
Found invalid attribute a.href - {"behavior":"default","nodeName":"a","attrName":"href"}
Wed, 11 Aug 2021 09:03:08 +0200 [DEBUG] request="b62c11bcbd3d7"
component="TYPO3.HtmlSanitizer.Visitor.CommonVisitor":
Found invalid attribute div.onmouseover - {"behavior":"default","nodeName":"div","attrName":"onmouseover"}
Wed, 11 Aug 2021 09:03:08 +0200 [DEBUG] request="b62c11bcbd3d7"
component="TYPO3.HtmlSanitizer.Visitor.CommonVisitor":
Found unexpected tag script - {"behavior":"default","nodeName":"script"}