The external import setup

Importing an RSS feed into the tx_news_domain_model_news table poses a particular challenge: we want to store the URI of each news item in the related links table, which uses IRRE and a “parent” field to relate links to news items.

We will see the trick later. The first thing to note is the import order: since links are related to news items through their “parent” field, news items must be imported before links.

A second peculiarity is that links and news items come from the same data source, so we will import the RSS feed twice: once for the news items and once for the links.
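The import order can be pictured with a small sketch: each configuration carries a “priority” value (200 for the news items configured below), and configurations are run from the lowest value to the highest. This ordering rule is an assumption of the sketch, as are the links table name and its priority of 210; the function name is hypothetical.

```php
<?php
// Minimal sketch: order import configurations by their 'priority' value,
// ASSUMING lower values are processed first. The links entry (table name
// and priority 210) is a hypothetical placeholder for illustration.

/**
 * Returns configuration keys sorted by ascending priority.
 *
 * @param array<string, array{priority: int}> $configurations
 * @return list<string>
 */
function sortByPriority(array $configurations): array
{
    // uasort keeps the keys (table names) while comparing priorities numerically
    uasort(
        $configurations,
        static fn (array $a, array $b): int => $a['priority'] <=> $b['priority']
    );
    return array_keys($configurations);
}

$configurations = [
    'tx_news_domain_model_link' => ['priority' => 210], // hypothetical
    'tx_news_domain_model_news' => ['priority' => 200], // as in the setup below
];

$order = sortByPriority($configurations);
```

With these values, the news configuration comes first, which is exactly what the links import needs.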

Importing news items

We start with the news items. A new column, tx_externalimporttut_externalid, was added to the tx_news_domain_model_news table to store the external id of each item found in the RSS feed.
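The stored external id is what lets a repeated import recognize items it has already seen. The sketch below, which is not External Import's actual code, shows the general idea behind a reference field such as the one named by “referenceUid”: incoming rows whose external id already exists locally become updates, the others become inserts. The function name, the sample URLs and the uid 12 are hypothetical.

```php
<?php
// Minimal sketch of insert-vs-update decisions driven by a reference field.
// NOT External Import's code; names and sample data are hypothetical.

/**
 * @param array<string, int> $existing External id => local uid
 * @param list<array<string, string>> $incoming Imported rows
 * @param string $referenceField Column holding the external id
 * @return array{insert: list<array<string, string>>, update: array<int, array<string, string>>}
 */
function planSync(array $existing, array $incoming, string $referenceField): array
{
    $plan = ['insert' => [], 'update' => []];
    foreach ($incoming as $row) {
        $externalId = $row[$referenceField];
        if (isset($existing[$externalId])) {
            // Known external id: update the local record with that uid
            $plan['update'][$existing[$externalId]] = $row;
        } else {
            // Unknown external id: create a new local record
            $plan['insert'][] = $row;
        }
    }
    return $plan;
}

$existing = ['https://example.com/news/a' => 12];
$incoming = [
    ['tx_externalimporttut_externalid' => 'https://example.com/news/a', 'title' => 'A (edited)'],
    ['tx_externalimporttut_externalid' => 'https://example.com/news/b', 'title' => 'B'],
];
$plan = planSync($existing, $incoming, 'tx_externalimporttut_externalid');
```

Here the first row updates record 12 and the second is inserted as a new record.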

Here is the setup for the general section:

$GLOBALS['TCA']['tx_news_domain_model_news']['external']['general'] = [
    0 => [
        'connector' => 'feed',
        'parameters' => [
            'uri' => 'https://typo3.org/?type=100'
        ],
        'data' => 'xml',
        'nodetype' => 'item',
        'referenceUid' => 'tx_externalimporttut_externalid',
        'enforcePid' => true,
        'priority' => 200,
        'group' => 'externalimport_tut',
        'disabledOperations' => 'delete',
        'description' => 'Import of typo3.org news'
    ],
];

Note that we don’t use the same connector service as before. Indeed, we now need the “feed” sub-type, which is provided by the “svconnector_feed” extension. This connector specializes in fetching XML data from a remote or local source, defined by the “uri” property of the “parameters” array.

Next, we declare that the data will be provided in XML format and that the reference node type is “item”. With this instruction, External Import takes all nodes of type “item” and imports each of them as one record. The “enforcePid” property is set to true so that the import takes place only in the predefined page: existing news items entered somewhere else are left alone and never deleted. This is a useful precaution.
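To illustrate what selecting the “item” node type means, here is a minimal sketch using PHP's DOM extension: each <item> element of the feed becomes one row, keyed by the names of its child elements. The function name and the sample feed are made up, and External Import's own XML handling is more elaborate (it also supports XPath and attributes, for example).

```php
<?php
// Minimal sketch: turn every element of the given node type into one row.
// Sample feed and function name are hypothetical, not External Import's code.

/**
 * @return list<array<string, string>> One row per matching element, keyed by child tag name
 */
function extractItems(string $xml, string $nodeType): array
{
    $document = new DOMDocument();
    $document->loadXML($xml);
    $rows = [];
    foreach ($document->getElementsByTagName($nodeType) as $item) {
        $row = [];
        foreach ($item->childNodes as $child) {
            // Skip whitespace text nodes; keep only child elements
            if ($child instanceof DOMElement) {
                $row[$child->nodeName] = $child->textContent;
            }
        }
        $rows[] = $row;
    }
    return $rows;
}

$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First news</title>
      <link>https://example.com/news/1</link>
    </item>
    <item>
      <title>Second news</title>
      <link>https://example.com/news/2</link>
    </item>
  </channel>
</rss>
XML;

$rows = extractItems($xml, 'item');
```

The channel's own <title> is ignored because only children of <item> elements are read.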

Also note that the delete operation is disabled. This makes sense here, because an RSS feed normally contains only the latest news items: with deletion enabled, each import would remove the items brought in by previous imports that have since dropped out of the feed.
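The effect of disabling the delete operation can be sketched as follows: without it, local records whose external id no longer appears in the feed would be deleted. The function name is hypothetical and this is not External Import's implementation; only the comma-separated “disabledOperations” string mirrors the setup above.

```php
<?php
// Minimal sketch of a synchronization "delete" step. Hypothetical names.

/**
 * @param list<string> $localIds    External ids already stored on the storage page
 * @param list<string> $importedIds External ids found in the current feed
 * @param string $disabledOperations Comma-separated list, e.g. 'delete'
 * @return list<string> External ids whose records would be deleted
 */
function idsToDelete(array $localIds, array $importedIds, string $disabledOperations): array
{
    $disabled = array_map('trim', explode(',', $disabledOperations));
    if (in_array('delete', $disabled, true)) {
        return []; // deletion disabled: older news items are kept
    }
    // array_diff preserves keys, so reindex with array_values
    return array_values(array_diff($localIds, $importedIds));
}

$local = ['https://example.com/news/1', 'https://example.com/news/2'];
$feed  = ['https://example.com/news/2', 'https://example.com/news/3'];
```

With an empty setting, news item 1 would be deleted because it is no longer in the feed; with 'delete' it is kept.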

In the previous chapter, we said that we wanted to import only the news items in the “TYPO3 CMS” category. To do this, we need to read the <category> tag without storing it in the database, so we declare it as an additional field:

$GLOBALS['TCA']['tx_news_domain_model_news']['external']['additionalFields'] = [
    0 => [
        'category' => [
            'xpath' => './category[text()=\'TYPO3 CMS\']',
            'transformations' => [
                10 => [
                    'isEmpty' => [
                        'invalidate' => true
                    ]
                ]
            ]
        ]
    ]
];

The “xpath” property ensures that only items containing the following tag:

<category>TYPO3 CMS</category>

will have a value in the “category” field; for all other items, it stays empty. We can thus filter with the “isEmpty” transformation property, which tests whether a given value is empty. By default, it relies on the PHP empty() function, but it can also use the Symfony Expression Language for more sophisticated conditions. Since nothing special is declared here, empty() is used. The “invalidate” sub-property is set to true, meaning that records with an empty value are discarded from the imported dataset. As a result, only items in the “TYPO3 CMS” category are imported.
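The filtering described above can be reproduced in plain PHP with DOMXPath: the relative expression from the additional field is evaluated against each <item>, and rows with an empty result are dropped, mimicking “isEmpty” with “invalidate”. This is an illustrative sketch with made-up sample items and a hypothetical function name, not External Import's implementation.

```php
<?php
// Minimal sketch of the category filter. Not External Import's code.

/**
 * @return list<string> Titles of the items that survive the filter
 */
function filterByCategory(string $xml, string $expression): array
{
    $document = new DOMDocument();
    $document->loadXML($xml);
    $xpath = new DOMXPath($document);
    $kept = [];
    foreach ($xpath->query('//item') as $item) {
        // Relative query: matches a <category> child with that exact text
        $matches = $xpath->query($expression, $item);
        $category = $matches->length > 0 ? $matches->item(0)->textContent : '';
        if (empty($category)) {
            continue; // empty value: the row is invalidated (discarded)
        }
        $kept[] = $xpath->query('./title', $item)->item(0)->textContent;
    }
    return $kept;
}

$xml = <<<XML
<rss version="2.0"><channel>
  <item><title>Core release</title><category>TYPO3 CMS</category></item>
  <item><title>Community event</title><category>Community</category></item>
  <item><title>Multi-category</title><category>Events</category><category>TYPO3 CMS</category></item>
</channel></rss>
XML;

$titles = filterByCategory($xml, "./category[text()='TYPO3 CMS']");
```

Note that an item with several categories is kept as long as one of them is “TYPO3 CMS”.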

Let’s now look at the setup for the columns:

$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['title']['external'] = [
    0 => [
        'field' => 'title'
    ]
];
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['tx_externalimporttut_externalid']['external'] = [
    0 => [
        'field' => 'link',
        'transformations' => [
            10 => [
                'trim' => true
            ]
        ]
    ]
];
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['datetime']['external'] = [
    0 => [
        'field' => 'pubDate',
        'transformations' => [
            10 => [
                'userFunction' => [
                    'class' => \Cobweb\ExternalImport\Transformation\DateTimeTransformation::class,
                    'method' => 'parseDate'
                ]
            ]
        ]
    ]
];
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['teaser']['external'] = [
    0 => [
        'field' => 'description',
        'transformations' => [
            10 => [
                'trim' => true
            ]
        ]
    ]
];
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['bodytext']['external'] = [
    0 => [
        'field' => 'encoded',
        'transformations' => [
            10 => [
                'userFunction' => [
                    'class' => \Cobweb\ExternalimportTut\Transformation\LinkTransformation::class,
                    'method' => 'absolutizeUrls',
                    'parameters' => [
                        'host' => 'https://typo3.org'
                    ]
                ]
            ],
            20 => [
                'rteEnabled' => true
            ]
        ]
    ]
];
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['type']['external'] = [
    0 => [
        'transformations' => [
            10 => [
                'value' => 0
            ]
        ]
    ]
];
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['hidden']['external'] = [
    0 => [
        'transformations' => [
            10 => [
                'value' => 0
            ]
        ]
    ]
];

For most fields, the setup is as simple as if we were importing database records, thanks to the connector services, which abstract away the tedium of fetching data in different formats. The XML format, however, also allows more elaborate data retrieval using XPath expressions or attributes.
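As an illustration of the difference between plain fields, attributes and XPath, the sketch below reads the same <item> in three ways with PHP's DOM extension. The <guid> element with its isPermaLink attribute is sample data chosen for the example; it is not used in the setup above.

```php
<?php
// Minimal sketch: plain element, attribute and XPath access on one <item>.

$xml = '<item><title>News</title><guid isPermaLink="false">abc-123</guid>'
    . '<category>TYPO3 CMS</category><category>Releases</category></item>';

$document = new DOMDocument();
$document->loadXML($xml);
$item = $document->documentElement;
$xpath = new DOMXPath($document);

// Plain field: text content of the first <title> child
$title = $item->getElementsByTagName('title')->item(0)->textContent;

// Attribute: value of isPermaLink on <guid>
$isPermaLink = $item->getElementsByTagName('guid')->item(0)->getAttribute('isPermaLink');

// XPath: count the <category> children, which a plain field lookup cannot express
$categoryCount = (int) $xpath->evaluate('count(./category)', $item);
```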

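The most specific configuration above is for the “bodytext” field. Its content first goes through the “absolutizeUrls” user function from the tutorial extension, which, as its name suggests, makes URLs absolute using the given “host” (https://typo3.org). The “rteEnabled” property then indicates that the content is rich text and that RTE transformations should be applied upon saving. This helps ensure that such content can be edited correctly in an RTE-enabled field in the TYPO3 backend, although the varying quality of the source HTML makes it impossible to guarantee a completely smooth process. Note also that the “type” and “hidden” columns have no source field: a fixed “value” of 0 is set for every imported record.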
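The “absolutizeUrls” user function configured for “bodytext” belongs to the tutorial extension, and its code is not shown here. The following is a hypothetical stand-in showing one way such a transformation could work: it prefixes root-relative href and src attributes with the configured host and leaves absolute and protocol-relative URLs untouched. A regular expression is used for brevity; a DOM-based approach would cope better with irregular HTML.

```php
<?php
// HYPOTHETICAL sketch of a URL-absolutizing transformation. The tutorial's
// actual LinkTransformation::absolutizeUrls() may work differently.

function absolutizeUrls(string $html, string $host): string
{
    $host = rtrim($host, '/');
    return preg_replace_callback(
        // Match href="/..." or src="/..." but not protocol-relative "//..."
        '/\b(href|src)="\/(?!\/)([^"]*)"/i',
        static fn (array $m): string => $m[1] . '="' . $host . '/' . $m[2] . '"',
        $html
    );
}

$html = '<p><a href="/news/1">Read</a> <img src="//cdn.example.com/x.png"> '
    . '<a href="https://example.com/">Ext</a></p>';
$result = absolutizeUrls($html, 'https://typo3.org');
```

Only the first link is rewritten; the protocol-relative image URL and the already absolute link are left as they are.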