.. include:: /Includes.rst.txt .. _important-94484: ============================================ Important: #94484 - Introduce HTML Sanitizer ============================================ See :issue:`94484` Description =========== To sanitize and purge XSS from markup during frontend rendering, new custom HTML sanitizer has been introduced, based on `masterminds/html5`. Both :php:`\TYPO3\HtmlSanitizer\Builder\CommonBuilder` and :php:`\TYPO3\HtmlSanitizer\Visitor\CommonVisitor` provide common configuration which is in line with expected tags that are allowed in backend RTE. Using a custom builder instance, it is possible to adjust for individual demands - however, configuration possibilities cannot be modified using TypoScript - basically since the existing syntax does not cover all necessary scenarios. PHP API ======= The API is considered "internal", however it might be necessary to provide custom markup handling, add additional tags, attributes or values. The whole process of sanitization is based on an "allow-list" - everything that is not allowed, is automatically denied. The following example is meant to give a brief overview of the behavior and corresponding possibilities. .. code-block:: php withName('my-custom') ->withTags( (new Behavior\Tag('my-element', Behavior\Tag::ALLOW_CHILDREN)) ->addAttrs( (new Behavior\Attr('href'))->addValues( new Behavior\RegExpAttrValue('#^(?:https?://|mailto:)#') ), ...$this->globalAttrs ) ); } } As a result a new tag :html:`my-element` is which is allowed to * have any safe global attribute (`id`, `class`, `data-*`, ...) * have attribute `href`, in case corresponding value either starting with `http://`, `http://` or `mailto:` - evaluated from the given regular expression TypoScript ========== stdWrap.htmlSanitize -------------------- New :typoscript:`stdWrap` property :typoscript:`htmlSanitize` has been introduced to control sanitization of markup, removing tags, attributes or values that have not been allowed explicitly. * `htmlSanitize = [boolean]` whether to invoke sanitization (enabled per default). * `htmlSanitize.build = [string]` defines which specific builder (must be an instance of :php:`\TYPO3\HtmlSanitizer\Builder\BuilderInterface`) to be used for building a :php:`\TYPO3\HtmlSanitizer\Sanitizer` instance using a particular :php:`\TYPO3\HtmlSanitizer\Behavior`. This can either be a fully qualified class name or the name of a preset as defined in :php:`$GLOBALS['TYPO3_CONF_VARS']['SYS']['htmlSanitizer']` - per default, :php:`\TYPO3\CMS\Core\Html\DefaultSanitizerBuilder` is used. .. code-block:: typoscript 10 = TEXT 10 { value =
htmlSanitize = 1 // htmlSanitize.build = default // htmlSanitize.build = TYPO3\CMS\Core\Html\DefaultSanitizerBuilder } stdWrap.parseFunc ----------------- :typoscript:`stdWrap.htmlSanitize` is enabled per default when :typoscript:`stdWrap.parseFunc` is invoked. This also includes Fluid view-helper :html:``, since it invokes :php:`parseFunc` using :typoscript:`lib.parseFunc_RTE` directly. The following example shows how sanitization behavior - enabled per default - can be disabled. This is not recommended, but occasionally might be necessary. .. code-block:: typoscript // either disable globally lib.parseFunc.htmlSanitize = 0 lib.parseFunc_RTE.htmlSanitize = 0 // or disable individually per use-case 10 = TEXT 10 { value =
parseFunc =< lib.parseFunc_RTE parseFunc.htmlSanitize = 0 } Troubleshooting --------------- Since any invocation of :typoscript:`stdWrap.parseFunc` triggers HTML sanitization per default - except it is disabled explicitly - the following example lead to lots of generated markup being sanitized - and was solved by explicitly disabling it using :typoscript:`htmlSanitize = 0`. .. code-block:: typoscript 10 = FLUIDTEMPLATE 10 { templateRootPaths { // ... } variables { // ... } stdWrap.parseFunc { // replace --- with soft-hyphen short.--- = ­ // sanitization of ALL MARKUP is NOT DESIRED here htmlSanitize = 0 } } HTML sanitization should be used for user-submitted input like rich-text data - but not for the overall markup of a complete website. Backend RTE configuration ========================= Processing instructions for rich-text fields in the backend user interface can be adjusted in a similar way, e.g. in :file:`Configuration/Processing.yaml`. .. code-block:: yaml processing: allowTags: # ... HTMLparser_db: # ... htmlSanitize: # use default builder as configured in # $GLOBALS['TYPO3_CONF_VARS']['SYS']['htmlSanitizer'] build: default # disable individually per use-case # htmlSanitize: false Sanitization for persisting data can be needs to be enabled globally using corresponding feature flag :php:`$GLOBALS['TYPO3_CONF_VARS']['SYS']['features']['security.backend.htmlSanitizeRte']`. Debugging & Logging =================== In order to debug and log occurrences that have been modified during the sanitization process, following configuration can be configured in corresponding :php:`LOG` section of :file:`typo3conf/LocalConfiguration.php`. .. code-block:: php // ... 'LOG' => [ 'TYPO3' => [ 'HtmlSanitizer' => [ 'writerConfiguration' => [ \TYPO3\CMS\Core\Log\LogLevel::DEBUG => [ 'TYPO3\CMS\Core\Log\Writer\FileWriter' => [ 'logFileInfix' => 'html', ], ], ], ], ], ], // ... Which produces log entries in e.g. :file:`typo3temp/var/log/typo3_html_[hash-value].log` like below .. code-block:: text Wed, 11 Aug 2021 09:03:08 +0200 [DEBUG] request="b62c11bcbd3d7" component="TYPO3.HtmlSanitizer.Visitor.CommonVisitor": Found invalid attribute a.href - {"behavior":"default","nodeName":"a","attrName":"href"} Wed, 11 Aug 2021 09:03:08 +0200 [DEBUG] request="b62c11bcbd3d7" component="TYPO3.HtmlSanitizer.Visitor.CommonVisitor": Found invalid attribute div.onmouseover - {"behavior":"default","nodeName":"div","attrName":"onmouseover"} Wed, 11 Aug 2021 09:03:08 +0200 [DEBUG] request="b62c11bcbd3d7" component="TYPO3.HtmlSanitizer.Visitor.CommonVisitor": Found unexpected tag script - {"behavior":"default","nodeName":"script"} .. index:: Backend, Frontend, ext:core