Important: #94484 - Introduce HTML Sanitizer
See forge#94484
Description
To sanitize and purge XSS from markup during frontend rendering, a new
custom HTML sanitizer has been introduced, based on masterminds/
.
Both \TYPO3\ and \TYPO3\ provide a common configuration that is in line with the tags expected to be allowed in the backend RTE.
A custom builder instance can be used to adjust the configuration to individual needs. However, this configuration cannot be modified using TypoScript, since the existing TypoScript syntax does not cover all necessary scenarios.
PHP API
The API is considered "internal"; however, it might be necessary to provide custom markup handling or to add tags, attributes or values. Sanitization is based on an "allow-list": everything that is not explicitly allowed is automatically denied.
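To illustrate the allow-list principle in isolation, here is a minimal, self-contained sketch in plain PHP using ext-dom's DOMDocument. It is not the TYPO3 sanitizer and does not use its API; the ALLOWED map and the sanitizeWithAllowList() function are hypothetical names chosen for this illustration only. In this sketch, any tag missing from the map is dropped together with its children, and any attribute not listed for its tag is removed.

```php
<?php
declare(strict_types=1);

// Hypothetical allow-list for this sketch only: tag name => allowed attribute names.
// It is much smaller than (and unrelated to) the TYPO3 default configuration.
const ALLOWED = [
    'div' => ['id', 'class'],
    'img' => ['src', 'alt'],
];

function sanitizeWithAllowList(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. about unknown tags) instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The meta tag makes libxml decode the fragment as UTF-8.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    // Copy the live node list into an array before modifying the tree.
    $elements = iterator_to_array($body->getElementsByTagName('*'));

    foreach ($elements as $element) {
        $tag = strtolower($element->nodeName);
        if (!isset(ALLOWED[$tag])) {
            // Tag is not on the allow-list: drop it together with its children.
            $element->parentNode->removeChild($element);
            continue;
        }
        // Collect attribute names first; removing while iterating the map is unsafe.
        $names = [];
        foreach ($element->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array(strtolower($name), ALLOWED[$tag], true)) {
                $element->removeAttribute($name);
            }
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

echo sanitizeWithAllowList(
    '<div onmouseover="alert(1)"><img src="a.png" onerror="alert(2)"></div><script>alert(3)</script>'
), PHP_EOL;
```

Unlike this sketch, the TYPO3 sanitizer can also restrict attribute values, as the RegExpAttrValue in the following example shows.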
The following example is meant to give a brief overview of the behavior and corresponding possibilities.
<?php
use TYPO3\CMS\Core\Html\DefaultSanitizerBuilder;
use TYPO3\HtmlSanitizer\Behavior;
use TYPO3\HtmlSanitizer\Builder\BuilderInterface;

class MyCustomBuilder extends DefaultSanitizerBuilder implements BuilderInterface
{
    public function createBehavior(): Behavior
    {
        // extends existing behavior, adds new tag
        return parent::createBehavior()
            ->withName('my-custom')
            ->withTags(
                (new Behavior\Tag('my-element', Behavior\Tag::ALLOW_CHILDREN))
                    ->addAttrs(
                        (new Behavior\Attr('href'))->addValues(
                            new Behavior\RegExpAttrValue('#^(?:https?://|mailto:)#')
                        ),
                        ...$this->globalAttrs
                    )
            );
    }
}
As a result, a new tag my-element is allowed, which may
- have any safe global attribute (id, class, data-*, ...)
- have the attribute href, in case its value starts with either http://, https:// or mailto: (evaluated using the given regular expression)
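Which href values pass can be checked with plain preg_match. The following sketch assumes the pattern is applied to the raw attribute value as-is (an assumption about the library internals); hrefMatches() is a hypothetical helper, not part of the TYPO3 API.

```php
<?php
declare(strict_types=1);

// The pattern used by the RegExpAttrValue in the MyCustomBuilder example.
const HREF_PATTERN = '#^(?:https?://|mailto:)#';

// Hypothetical helper: true if the value matches the pattern above.
function hrefMatches(string $value): bool
{
    return preg_match(HREF_PATTERN, $value) === 1;
}

$candidates = [
    'https://typo3.org/',
    'http://example.org/',
    'mailto:info@example.org',
    'javascript:alert(1)',
    'ftp://example.org/',
    'HTTPS://example.org/', // no "i" modifier, so uppercase schemes do not match
];

foreach ($candidates as $href) {
    printf("%-25s %s\n", $href, hrefMatches($href) ? 'matches' : 'no match');
}
```

Since the pattern has no case-insensitive modifier, a value such as HTTPS://example.org/ would not match; adding the "i" modifier would change that.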
TypoScript
stdWrap.htmlSanitize
A new stdWrap property htmlSanitize has been introduced
to control sanitization of markup, removing tags, attributes or values that have
not been explicitly allowed.
htmlSanitize = [boolean]
whether to invoke sanitization (enabled per default).
htmlSanitize.build = [string]
defines which specific builder (must be an instance of \TYPO3\HtmlSanitizer\Builder\BuilderInterface) is used for building a \TYPO3\HtmlSanitizer\Sanitizer instance using a particular \TYPO3\HtmlSanitizer\Behavior. This can either be a fully qualified class name or the name of a preset as defined in $GLOBALS['TYPO3_CONF_VARS']['SYS']['htmlSanitizer']; per default, \TYPO3\CMS\Core\Html\DefaultSanitizerBuilder is used.
10 = TEXT
10 {
    value = <div><img src="invalid.file" onerror="alert(1)"></div>
    htmlSanitize = 1
    // htmlSanitize.build = default
    // htmlSanitize.build = TYPO3\CMS\Core\Html\DefaultSanitizerBuilder
}
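The build property also accepts a preset name from $GLOBALS['TYPO3_CONF_VARS']['SYS']['htmlSanitizer']. Assuming presets are registered as plain class names keyed by preset name (as the "default" preset resolving to \TYPO3\CMS\Core\Html\DefaultSanitizerBuilder suggests), a custom preset could be added roughly like this in an extension's ext_localconf.php. The class \Vendor\MyExt\Html\MyCustomBuilder and the preset key "my-custom" are hypothetical.

```php
<?php
// ext_localconf.php of a hypothetical extension (sketch, not verified against a specific TYPO3 version).
// \Vendor\MyExt\Html\MyCustomBuilder is a hypothetical class, e.g. the MyCustomBuilder
// from the PHP API section placed in an extension namespace.
// The array key ('my-custom') is the preset name; it is independent of withName() in the builder.
$GLOBALS['TYPO3_CONF_VARS']['SYS']['htmlSanitizer']['my-custom'] = \Vendor\MyExt\Html\MyCustomBuilder::class;
```

With such an entry in place, htmlSanitize.build = my-custom would select the custom builder; referencing the fully qualified class name directly in htmlSanitize.build is the alternative that needs no registration.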
stdWrap.parseFunc
stdWrap.htmlSanitize is enabled per default when stdWrap.parseFunc is invoked. This also includes the Fluid
view-helper <f:, since it invokes parseFunc using lib. directly.
The following example shows how sanitization, which is enabled per default, can be disabled. This is not recommended, but might occasionally be necessary.
// either disable globally
lib.parseFunc.htmlSanitize = 0
lib.parseFunc_RTE.htmlSanitize = 0
// or disable individually per use-case
10 = TEXT
10 {
    value = <div><img src="invalid.file" onerror="alert(1)"></div>
    parseFunc =< lib.parseFunc_RTE
    parseFunc.htmlSanitize = 0
}
Troubleshooting
Since any invocation of stdWrap.parseFunc triggers HTML sanitization per default unless it is
explicitly disabled, the following example led to lots of generated markup being sanitized.
This was solved by explicitly disabling it using htmlSanitize = 0.
10 = FLUIDTEMPLATE
10 {
    templateRootPaths {
        // ...
    }
    variables {
        // ...
    }
    stdWrap.parseFunc {
        // replace --- with soft-hyphen
        short.--- = &shy;
        // sanitization of ALL MARKUP is NOT DESIRED here
        htmlSanitize = 0
    }
}
HTML sanitization should be applied to user-submitted input such as rich-text data, but not to the overall markup of a complete website.
Backend RTE configuration
Processing instructions for rich-text fields in the backend user interface
can be adjusted in a similar way, e.g. in Configuration/
.
processing:
  allowTags:
    # ...
  HTMLparser_db:
    # ...
  htmlSanitize:
    # use default builder as configured in
    # $GLOBALS['TYPO3_CONF_VARS']['SYS']['htmlSanitizer']
    build: default

  # disable individually per use-case
  # htmlSanitize: false
Sanitization of data being persisted needs to be enabled globally using the corresponding
feature flag $GLOBALS.
Debugging & Logging
To debug and log occurrences that have been modified during the sanitization
process, the following configuration can be added to the corresponding LOG section
of typo3conf/.
// ...
'LOG' => [
    'TYPO3' => [
        'HtmlSanitizer' => [
            'writerConfiguration' => [
                \TYPO3\CMS\Core\Log\LogLevel::DEBUG => [
                    \TYPO3\CMS\Core\Log\Writer\FileWriter::class => [
                        'logFileInfix' => 'html',
                    ],
                ],
            ],
        ],
    ],
],
// ...
This produces log entries, e.g. in typo3temp/,
like the following:
Wed, 11 Aug 2021 09:03:08 +0200 [DEBUG] request="b62c11bcbd3d7"
component="TYPO3.HtmlSanitizer.Visitor.CommonVisitor":
Found invalid attribute a.href - {"behavior":"default","nodeName":"a","attrName":"href"}
Wed, 11 Aug 2021 09:03:08 +0200 [DEBUG] request="b62c11bcbd3d7"
component="TYPO3.HtmlSanitizer.Visitor.CommonVisitor":
Found invalid attribute div.onmouseover - {"behavior":"default","nodeName":"div","attrName":"onmouseover"}
Wed, 11 Aug 2021 09:03:08 +0200 [DEBUG] request="b62c11bcbd3d7"
component="TYPO3.HtmlSanitizer.Visitor.CommonVisitor":
Found unexpected tag script - {"behavior":"default","nodeName":"script"}
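When many entries accumulate, it can help to count which tags and attributes are affected most often. The following is a small, self-contained sketch; tallyFindings() is a hypothetical helper, and its regular expression is tailored to the "Found invalid attribute ..." and "Found unexpected tag ..." messages shown above, so other message formats would need adjustments.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: counts sanitizer findings such as "invalid attribute a.href".
 *
 * @return array<string, int> finding => number of occurrences, most frequent first
 */
function tallyFindings(string $log): array
{
    $counts = [];
    preg_match_all('/Found (invalid attribute|unexpected tag) (\S+) - /', $log, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $finding = $match[1] . ' ' . $match[2];
        $counts[$finding] = ($counts[$finding] ?? 0) + 1;
    }
    arsort($counts);
    return $counts;
}

// Abbreviated sample: only the message part of each log entry.
$sample = <<<'LOG'
Found invalid attribute a.href - {"behavior":"default","nodeName":"a","attrName":"href"}
Found invalid attribute a.href - {"behavior":"default","nodeName":"a","attrName":"href"}
Found unexpected tag script - {"behavior":"default","nodeName":"script"}
LOG;

print_r(tallyFindings($sample));
```

In practice the input would be the contents of the log file written by the FileWriter configured above.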