The external import setup
The import of an RSS feed into table tx_news_domain_model_news
poses a particular challenge. We want to store the URI of the news
item in the related links table, which uses IRRE and a "parent" field
to relate links to news items.
We will see later what the trick is. The first important thing to note is the order of import. Since it is links that are related to news items, we must import news before links.
A second peculiarity is that both links and news items are in the same source of data. Thus we will import the RSS feed twice.
Importing news items
We therefore start with the news items. A new column, tx_externalimporttut_externalid, was added to the tx_news_domain_model_news table. It stores the external id found in the RSS feed.
Here is the setup for the general section:
$GLOBALS['TCA']['tx_news_domain_model_news']['external']['general'] = [
    0 => [
        'connector' => 'feed',
        'parameters' => [
            'uri' => 'https://typo3.org/?type=100'
        ],
        'data' => 'xml',
        'nodetype' => 'item',
        'referenceUid' => 'tx_externalimporttut_externalid',
        'enforcePid' => true,
        'priority' => 200,
        'group' => 'externalimport_tut',
        'disabledOperations' => 'delete',
        'description' => 'Import of typo3.org news'
    ],
];
Note that we don't use the same connector service as before. Indeed,
we now need the "feed" sub-type, which is provided by extension
"svconnector_feed". This connector is specialized in getting XML data
from some source (remote or local), which is defined with the uri property inside the parameters array.
Next, we declare that the data will be provided in XML format and that the reference node type is "item". With this instruction, External Import will take all nodes of type "item" and import each of them. The enforcePid property is set to true so that the import takes place only in the predefined page and existing news items entered somewhere else are not deleted. This is a useful precaution to take.
Also note that the delete operation is disabled. This makes sense in this case, as an RSS feed normally contains only the latest news items. Thus if you don't want each import to delete the data from the previous import, the delete operation should be disabled.
In the previous chapter, we said that we wanted to import only the news items that are part of the "TYPO3 CMS" category. For this, we want to read the <category> tag, but not store it in the database. Thus we declare it as an additional field:
$GLOBALS['TCA']['tx_news_domain_model_news']['external']['additionalFields'] = [
    0 => [
        'category' => [
            'xpath' => './category[text()=\'TYPO3 CMS\']',
            'transformations' => [
                10 => [
                    'isEmpty' => [
                        'invalidate' => true
                    ]
                ]
            ]
        ]
    ]
];
With the "xpath" property, only items that contain the following:
<category>TYPO3 CMS</category>
will have a value in the "category" field. For all other records, it will be empty.
We can thus filter by using the "isEmpty" transformation property. This property tests whether a given value is empty or not. By default, it relies on the PHP empty() function, but it can also use the Symfony Expression Language for more sophisticated conditions. In this case, we have declared nothing special, so empty() will be used. We then set the "invalidate" sub-property to true, meaning that records which have an empty value will be discarded from the imported dataset. As a result, only items with the "TYPO3 CMS" category are imported.
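To make the filtering logic more tangible, here is a minimal standalone sketch that reproduces its effect with PHP's DOM extension. It is not External Import's actual code: the sample feed and the variable names are made up for illustration, and only the XPath expression is taken from the configuration above.

```php
<?php
// Standalone sketch, not External Import code: it mimics the effect of the
// "xpath" plus "isEmpty/invalidate" configuration on a tiny, made-up feed.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>New release</title>
      <category>TYPO3 CMS</category>
    </item>
    <item>
      <title>Community event</title>
      <category>Community</category>
    </item>
  </channel>
</rss>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$imported = [];
// 'nodetype' => 'item': every <item> node is a candidate record.
foreach ($document->getElementsByTagName('item') as $item) {
    // Same expression as in the configuration, evaluated relative to the item.
    $matches = $xpath->query("./category[text()='TYPO3 CMS']", $item);
    $category = $matches->length > 0 ? $matches->item(0)->nodeValue : '';
    // 'isEmpty' with 'invalidate' => true: records with an empty value are dropped.
    if (empty($category)) {
        continue;
    }
    $imported[] = $xpath->query('./title', $item)->item(0)->nodeValue;
}

print_r($imported);
```

Only the first item survives, because the second one has no <category> node matching the predicate and thus ends up with an empty "category" value.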
Let's now look at the setup for the columns:
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['title']['external'] = [
    0 => [
        'field' => 'title'
    ]
];
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['tx_externalimporttut_externalid']['external'] = [
    0 => [
        'field' => 'link',
        'transformations' => [
            10 => [
                'trim' => true
            ]
        ]
    ]
];
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['datetime']['external'] = [
    0 => [
        'field' => 'pubDate',
        'transformations' => [
            10 => [
                'userFunction' => [
                    'class' => \Cobweb\ExternalImport\Transformation\DateTimeTransformation::class,
                    'method' => 'parseDate'
                ]
            ]
        ]
    ]
];
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['teaser']['external'] = [
    0 => [
        'field' => 'description',
        'transformations' => [
            10 => [
                'trim' => true
            ]
        ]
    ]
];
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['bodytext']['external'] = [
    0 => [
        'field' => 'encoded',
        'transformations' => [
            10 => [
                'userFunction' => [
                    'class' => \Cobweb\ExternalimportTut\Transformation\LinkTransformation::class,
                    'method' => 'absolutizeUrls',
                    'parameters' => [
                        'host' => 'https://typo3.org'
                    ]
                ]
            ],
            20 => [
                'rteEnabled' => true
            ]
        ]
    ]
];
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['type']['external'] = [
    0 => [
        'transformations' => [
            10 => [
                // No source field: the same fixed value is stored for every record
                'value' => 0
            ]
        ]
    ]
];
$GLOBALS['TCA']['tx_news_domain_model_news']['columns']['hidden']['external'] = [
    0 => [
        'transformations' => [
            10 => [
                // No source field: the same fixed value is stored for every record
                'value' => 0
            ]
        ]
    ]
];
For most of the fields, the setup is just as simple as if we were importing database records, thanks to the connector services, which abstract away the tedious work of getting data in different formats. However, the XML format allows for more complicated retrieval of data via the use of XPath or attributes.
The only particular configuration above is for the "bodytext" field, which uses the "rteEnabled" property to indicate that the content from this field is rich text and that RTE transformations should be applied upon saving. This helps ensure that such content can be edited correctly in an RTE-enabled field in the TYPO3 backend, although the varying quality of available HTML makes it impossible to guarantee a 100% smooth process.
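The "bodytext" configuration also calls a user function, absolutizeUrls, before the RTE transformation. Its implementation is not shown in this chapter, so the following is only a hypothetical sketch of what such a transformation could look like. The class name LinkTransformationSketch, the regular expression and the method signature (the full record, the key of the field to transform, and the configured parameters) are assumptions made for illustration.

```php
<?php
// Hypothetical sketch: this is NOT the tutorial's LinkTransformation class.
final class LinkTransformationSketch
{
    /**
     * Prefixes root-relative href/src values with the configured host.
     *
     * @param array  $record     The full record being imported (assumed signature)
     * @param string $index      Key of the field to transform
     * @param array  $parameters Values from the 'parameters' configuration
     */
    public function absolutizeUrls(array $record, string $index, array $parameters): string
    {
        $host = rtrim($parameters['host'] ?? '', '/');
        $html = (string)($record[$index] ?? '');
        // Match href="/..." or src="/...", but not protocol-relative "//..." URLs.
        $result = preg_replace_callback(
            '#\b(href|src)=(["\'])/(?!/)#i',
            static fn (array $m): string => $m[1] . '=' . $m[2] . $host . '/',
            $html
        );
        // Fall back to the untouched content if the regular expression fails.
        return $result ?? $html;
    }
}

$transformation = new LinkTransformationSketch();
$record = [
    'encoded' => '<a href="/news/article">Read more</a> <img src="https://example.org/a.png">'
];
$output = $transformation->absolutizeUrls($record, 'encoded', ['host' => 'https://typo3.org']);
echo $output;
```

In this sketch the relative link becomes https://typo3.org/news/article, while the already absolute image URL is left untouched. Running such a step before "rteEnabled" means the RTE transformation works on content whose links already resolve outside of typo3.org.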