External Import 

Extension key

external_import

Package name

cobweb/external_import

Version

main

Language

en

Author

François Suter (Idéative), typo3@ideative.ch

License

This document is published under the Open Publication License.

Rendered

Sat, 10 Jan 2026 20:37:43 +0000


Tool for importing data from external sources into the TYPO3 database, using an extended TCA syntax. Provides a BE module, a Scheduler task, a command-line interface, reactions and an API.


A general presentation of the features provided by this extension.

Installing and upgrading the extension, with highlights of new features. General extension configuration.

How the extension works and what tools are available.

All the options available when setting up an import configuration.

Everything about events, user functions and all other entry points for programmatically enhancing the import process. Description of the main APIs.

Description of the known (and tricky) issues that are not (yet) solved.

Introduction 

This extension is designed to fetch data from external sources and store it in tables of the TYPO3 CMS database. The mapping between this external data and the TYPO3 CMS tables is done by extending the syntax of the TCA. A backend module provides a way to synchronize any table manually or to define a schedule for all synchronizations. Synchronizations can also be run using the command-line interface. Automatic scheduling can be defined using a Scheduler task. Finally, this extension provides reactions (starting with TYPO3 12) to import or delete data, responding to calls from remote sources.

The main point of importing external data into the TYPO3 CMS database is to be able to use standard TYPO3 CMS features on that data (such as enable fields, where available).

Connection to external applications is handled by a class of services called "connectors", the base of which is available as a separate extension (svconnector).

Data from several external sources can be stored in the same table, allowing data aggregation.

The extension also provides an API for receiving data from some other source. This data is stored into the TYPO3 CMS database using the same mapping process as when data is fetched directly by the extension.

This extension is quite flexible, thanks to the possibility of calling user functions to transform incoming data, listening to events to react to some part of the process or adding custom steps at any point in the process. It is also possible to create custom connectors for reading from a specific external source. Still, this extension was not designed for extensive data manipulation. It is assumed that the data received from the external source is in a "palatable" format. If the external data requires a lot of processing, it is probably better to put it through an ETL or ESB tool first, and then import it into TYPO3 CMS.

Please also check the extension externalimport_tut, which provides a tutorial for this extension.

More examples can be found in extension "externalimport_test", which is used for testing purposes. Its setup is not documented, but can be interesting to look at. This extension is distributed only via GitHub: https://github.com/cobwebch/externalimport_test

Alternatives 

Several extensions exist for importing data into TYPO3, including the system extension "impexp". Extension "impexp" is specifically designed to export data from a TYPO3 installation and import it again into TYPO3, using a specific file format ("T3D"). When the need is to move around data that is already in a TYPO3 installation, "impexp" is the logical choice. External Import differs by being designed to import data into TYPO3 from a large variety of sources outside TYPO3.

There are other extensions available, like xlsimport and importr, which were released years after External Import and which - as such - I never really looked into, since I had all the tools I needed. It is therefore hard to compare their features.

"xlsimport" is designed for one-time import of data in Excel and CSV format. It cannot be automated. An interface is provided for the mapping configuration, but it cannot be saved. The process is definitely quicker and lighter to set up than External Import, but is limited if you need to import the same data on a regular basis.

"importr" seems to come quite close to External Import in terms of features, although maybe with less flexibility in the data handling and fewer import sources (import resources can probably be added). It is probably easier to set up than External Import, since it allows for simply pointing to an Extbase model, plus a simple mapping of fields to import.

Questions and support 

If you have any questions about this extension, use the dedicated channel in the TYPO3 Slack workspace (#ext-external_import) or the issue tracker on GitHub (https://github.com/cobwebch/external_import/issues).

Please also check the Troubleshooting section in case your issue is already described there.

Keeping the developer happy 

Every encouragement keeps the developer ticking, so don't hesitate to send thanks or share your enthusiasm about the extension.

If you appreciate this work and want to show some support, please check https://www.monpetitcoin.com/en/support-me/.

Participating 

This tool can be used in a variety of situations and all use cases are certainly not covered by the current version. I will probably not have the time to implement any use case that I don't personally need. However you are welcome to join the development team if you want to bring in new features. If you are interested use GitHub to submit pull requests.

Sponsoring 

You are very welcome to support the further development of this extension. You will get mentioned here.

Credits 

The icon for the log table records is derived from an icon made by iconixar from www.flaticon.com.

Installation 

Installing this extension does nothing in and of itself. You still need to extend the TCA definition of some tables with the appropriate syntax and create specific connectors for the application you want to connect to.

TYPO3 CMS 12 or 13 is required, as well as the "scheduler" and "reactions" system extensions.

Upgrading and what's new 

Upgrade to 8.2.0 

XPath functions can be used in the columns configuration to directly return a string value. Previously XPath expressions could only be used to select a node or list of nodes in the XML structure.

A new event ModifyReactionResponseEvent is available to modify the response of a reaction before it is sent back. Both the response body and the HTTP return code may be changed.

Loading of the TCA has been encapsulated into a repository class, making it easier to follow the evolutions of the TYPO3 Core and allowing developers who might need it to dynamically manipulate the full TCA before the External Import configurations are extracted from it.

Upgrade to 8.1.0 

\Cobweb\ExternalImport\Importer::getContext() and \Cobweb\ExternalImport\Importer::setContext() have been deprecated in favor of \Cobweb\ExternalImport\Importer::getCallType() and \Cobweb\ExternalImport\Importer::setCallType(). These methods rely on the \Cobweb\ExternalImport\Enum\CallType enumeration which is used more consistently throughout External Import.

A new event ChangeConfigurationBeforeRunEvent makes it possible to modify the External Import configuration at run-time. This happens before any of the import steps is executed.

Upgrade to 8.0.0 

Configurations can now be part of several groups. As such, the "group" property is deprecated and is replaced with the groups property (which takes an array value rather than a string).

System extension "reactions" is now a requirement. The "Import external data" reaction can now target a group of configurations.

The logging mechanism has been changed to store the backend user's name rather than their id. This makes log entries much easier to read in the Log module and keeps them meaningful even if a user is removed. An update wizard is available for updating existing log records.

In version 7.2.0, a change was introduced to preserve null values from the imported data. It affected only fields with 'eval' => 'null' in their TCA. Since version 8.0.0, null values are also preserved for relation-type fields ("group", "select", "inline" and "file") which have no minitems property or 'minitems' => 0. This makes it effectively possible to remove existing relations. This is an important change of behavior, which - although more correct - may have unexpected effects on your data.

A new disabled flag makes it possible to completely hide a configuration.

Upgrade to 7.3.0 

This version introduces a new reaction dedicated to deleting already imported data.

Upgrade to 7.2.0 

The HandleDataStep process now keeps null values found in the imported data. This is an important change, but it has a concrete effect only if the target field is nullable (i.e. it has an eval property including null or has the nullable property set to true in its TCA configuration). In such cases, existing values will be set to null where they would have been left untouched before. It may also affect user functions in transformations where a null value was not expected to be found until now.
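As an illustration, a field counts as nullable in TYPO3 12 if its TCA configuration looks something like the following (the table and field names are hypothetical):

```php
<?php
// Hypothetical table and column; with 'nullable' => true, External Import
// will write null values found in the imported data to this field.
$GLOBALS['TCA']['tx_example_domain_model_product']['columns']['price'] = [
    'label' => 'Price',
    'config' => [
        'type' => 'input',
        'nullable' => true,
    ],
];
```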

Upgrade to 7.1.0 

External Import now supports PHP 8.2.

When running the preview mode from the backend module, some steps now provide a download button, to retrieve the data being handled in its current state.

When setting a fixed value, the new column configuration property should be preferred over the historical transformation property.

It is now possible to define explicitly the order in which columns are processed.

Upgrade to 7.0.0 

Support for old-style Connector services was dropped (i.e. connectors registered as TYPO3 Core Services). If you use custom connector services, make sure to update them (see the update instructions provided by extension "svconnector").

When editing Scheduler tasks in the External Import backend module, it is no longer possible to define a start date (this tiny feature was a lot of hassle to maintain across TYPO3 versions).

All hooks were removed. If you were still using hooks, please refer to the archived page about hooks to find replacement instructions.

A new ReportStep has been introduced, which triggers a webhook reporting about the just finished import run. In order for this step to run (and do the reporting) even when the process is aborted, steps now have the possibility to run despite the interruption. This actually fixes a bug with the ConnectorCallbackStep, which was never called when the process was aborted. If you use such post-processing, you can now report about failed imports if needed.

New stuff 

The arrayPath property is now available as both a general configuration option and a column configuration option. It has also been enriched with more capabilities.

A new exception \Cobweb\ExternalImport\Exception\InvalidRecordException was introduced, which can be used inside user functions to remove an entire record from the data to import if needed.

A new transformation property isEmpty is available for checking whether a given value can be considered empty. For maximum flexibility, it relies on the Symfony Expression Language.
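As a rough sketch of how this might look in a column configuration (the table, field and sub-property names below are illustrative assumptions; check the transformation properties reference for the authoritative syntax):

```php
<?php
// Hypothetical table, configuration index and field names.
// The "expression" is assumed to use the Symfony Expression Language,
// with "value" referring to the value currently being processed.
$GLOBALS['TCA']['tx_example_domain_model_product']['columns']['title']['external']['products'] = [
    'field' => 'name',
    'transformations' => [
        10 => [
            'isEmpty' => [
                'expression' => 'value === null or value === ""',
                // Assumed option: discard the record when the value is empty
                'invalidate' => true,
            ],
        ],
    ],
];
```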

It is also possible to set multiple mail recipients for the import report instead of a single one (see the extension configuration).

Upgrading from older versions 

In case you are upgrading from a very old version and proceeding step by step, you will find all the old upgrade instructions in the Appendix.

Other requirements 

As is mentioned in the introduction, this extension makes heavy use of an extended syntax for the TCA. If you are not familiar with the TCA, you are strongly advised to read up on it in the TCA Reference manual.

Extension configuration 

The extension has the following configuration options:

Storage PID
Defines a general page where all the imported records are stored. This can be overridden specifically for each table (see Administration below).
Log storage PID
Defines a page where log entries will be stored. The default is 0 (root page).
Force time limit
Sets a maximum execution time (in seconds) for the manual import processes (i.e. imports launched from the BE module). This time limit affects both PHP (where the default value is defined by max_execution_time) and the AJAX calls triggered by the BE module (where the default limit is 30 seconds). This is necessary if you want to run large imports. Setting this value to -1 preserves the default time limit.
Email for reporting

If an email address is entered here, a detailed report will be sent to this address after every automated synchronization. Multiple email addresses may be defined, separated by commas.

Mails are not sent after manual synchronizations started from the BE module. The mail address used for sending the report is $GLOBALS['TYPO3_CONF_VARS']['MAIL']['defaultMailFromAddress']. If it is not defined, the report will not be sent and an error will be logged.

Subject of email report
A label that will be prepended to the subject of the reporting mail. It may be convenient – for example – to use the server's name, in case you have several servers running the same imports.
Debug

Check to enable the extension to log some data during import runs. This may have an effect depending on the call context (e.g. in verbose mode on the command line, debug output will be sent to standard output). Debug output is routed using the Core Logger API. Hence if you wish to see more details, you may want to add specific configuration for the \Cobweb\ExternalImport\Importer class which centralizes logging. Example:

$GLOBALS['TYPO3_CONF_VARS']['LOG']['Cobweb']['ExternalImport']['Importer']['writerConfiguration'] = [
    // configuration for DEBUG level log entries
    \TYPO3\CMS\Core\Log\LogLevel::DEBUG => [
        // add a FileWriter
        \TYPO3\CMS\Core\Log\Writer\FileWriter::class => [
            // configuration for the writer
            'logFile' => 'typo3temp/logs/typo3_import.log'
        ]
    ]
];
Disable logging

Disables logging by the TYPO3 Core Engine. By default an entry will be written in the System > Log for each record touched by the import process. This may create quite a lot of log entries on large imports. Checking this box disables logging for all tables. It can be overridden at table level by the disableLog property.
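For instance, assuming the general configuration lives under the external section of a table's TCA (the table name and configuration index below are hypothetical), the table-level override might look like:

```php
<?php
// Hypothetical table name and configuration index; re-enables (or disables)
// Core Engine logging for this configuration only.
$GLOBALS['TCA']['tx_example_domain_model_product']['external']['general']['products']['disableLog'] = true;
```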

General considerations 

The purpose of this extension is to take data from somewhere else (called the "external source") than the local TYPO3 CMS database and store it into that local database. Data from the external source is matched to local tables and fields, using information stored in the TCA with the extended syntax provided by this extension.

The extension can either fetch the data from some external source or receive data from any kind of script using the provided API. Fetching data from an external source goes through a standardized process.

Connecting to an external source is achieved using connector services (see extension svconnector), that return the fetched data to the external import in either XML format or as a PHP array. Currently, the following connectors exist:

  • svconnector_csv for CSV and similar flat files
  • svconnector_feed for XML source files
  • svconnector_json for JSON source files
  • svconnector_sql for connecting to another database

It is quite easy to develop a custom connector, should that be needed.

The external data is mapped to one or more TYPO3 CMS tables using the extended TCA syntax. From then on the table can be synchronized with the external source. Every time a synchronization is started (either manually or according to a schedule), the connector service is called upon to fetch the data. Such tables are referred to as "synchronizable tables". This type of action is called "pulling data".
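As a minimal sketch of what such a configuration might look like (all table, index, column and parameter names below are illustrative assumptions; the Administration chapter is the authoritative reference for every property):

```php
<?php
// Hypothetical general configuration for a synchronizable table,
// reading a CSV file via the svconnector_csv connector.
$GLOBALS['TCA']['tx_example_domain_model_product']['external']['general']['products'] = [
    'connector' => 'csv',
    'parameters' => [
        // Connector parameters are assumptions; see svconnector_csv's manual
        'filename' => 'fileadmin/imports/products.csv',
        'delimiter' => ',',
    ],
    'data' => 'array',
    // Local column storing the external primary key
    'referenceUid' => 'external_id',
    // Page where imported records are stored
    'pid' => 42,
    'priority' => 10,
];

// Map one external field to one local column
$GLOBALS['TCA']['tx_example_domain_model_product']['columns']['title']['external']['products'] = [
    'field' => 'name',
];
```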

On the other hand this extension also provides an API that can be called up to pass data directly to the external import process. No connector services are used in this case. The extension is called on an as-needed basis by any script that uses it. As such it is not possible to synchronize those tables from the BE module, nor to schedule their synchronization. Such tables are referred to as "non-synchronizable tables". This type of action is called "pushing data".
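A push could be sketched roughly as follows, assuming the Importer class is the entry point and that its import() method accepts a table name, a configuration index and the raw data (this signature is an assumption; check the Developer's Guide for the actual API):

```php
<?php
use Cobweb\ExternalImport\Importer;
use TYPO3\CMS\Core\Utility\GeneralUtility;

// Hypothetical data set; its structure must match the column mapping
// defined for the targeted configuration.
$data = [
    ['external_id' => 1, 'name' => 'Foo'],
    ['external_id' => 2, 'name' => 'Bar'],
];

$importer = GeneralUtility::makeInstance(Importer::class);
// Assumed call: table name, configuration index, raw data
$messages = $importer->import('tx_example_domain_model_product', 'api', $data);
```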

Note that it is perfectly possible to also push data towards synchronizable tables. The reverse is not true (non-synchronizable tables cannot pull data).

It is perfectly possible to define several import configurations for the same table, thus pulling or pushing data from various sources into a single destination.

Synchronizations can be run in preview mode.

Using the backend modules 

The extension provides two backend modules. The "Data Import" module is the main one, displaying all configurations and allowing imports to be started manually. The second one, "Log", displays a list of all log entries generated during External Import runs.

Synchronizable tables 

The first function of the "Data Import" BE module – called "Tables with synchronization" – displays a list of all synchronizable tables. The various features are summarized in the picture below.

BE module overview for synchronizable tables

Overview of the synchronizable tables view with all available functions

Viewing configuration details 

Clicking on the information icon leads to a screen showing all the information about that particular configuration. The view consists of three tabs: the first one displays the general configuration, the second one displays the configuration for each column (including the additional fields) and the third one displays the list of steps that the process will go through, including any custom steps.

Inspecting TCA properties

Viewing the details of the TCA properties for External Import

If the configuration contains errors, they will be displayed in this detailed view.

Raised errors about wrong configuration

Viewing errors in the External Import configuration

Triggering a synchronization 

Clicking on the synchronize data button will immediately start the synchronization of the corresponding table. This may take quite some time if the data to import is large. If you move away from the BE module during that time, the process will abort. At the end of the process, flash messages will appear with the results:

Results of synchronization

Flash messages show the results of the synchronization

Running in preview mode 

Clicking on the preview button leads to the preview feature. For running a preview you first need to select a specific step from the process. The synchronization will run up to that step and stop. Preview data gets displayed if available. This depends on the step.

Again depending on the step, a download button may or may not appear. If it does, you can use it to retrieve a CSV file of the records being imported, in their state at the end of the previewed step. This makes it easier to explore the data when there is a lot of it.

Most importantly nothing permanent happens in preview mode. For example, data is not stored into the database.

Preview of a synchronization

The synchronization is run up to the Transform Data step and preview data is dumped to the screen

Setting up the automatic schedule 

The automatic scheduling facility relies on the Scheduler to run. On top of the normal Scheduler setup, there are some points you must pay particular attention to in the case of external import.

As can be seen in the above screenshot, the information whether the automatic synchronization is enabled or not is displayed for each table. It is possible to add or change that schedule, by clicking on the respective icons. This leads to an input form where you can choose a frequency, a task group and a start date (date of first execution; leave empty for immediate activation). The frequency can be entered as a number of seconds or using the same syntax as for cron jobs.

Automation input form

Input form for setting automated synchronization parameters

Clicking on the trash can icon cancels the automatic synchronization (a confirmation window will appear first).

At the top of the screen, before the list, it is possible to define a schedule for all tables. This means that all imports will be executed one after the other, in order of priority.

Automating all tables

Setting automated synchronization for all tables

The same input form appears as for individual automation settings.

Non-synchronizable tables 

The second function of the "Data Import" BE module – called "Tables without synchronization" – displays a list of non-synchronizable tables. This view is purely informative as no action can be taken for these tables. Only the detailed configuration information can be accessed.

BE module overview for non-synchronizable tables

Overview for non-synchronizable tables, with just the information icon

Logs 

As its name implies, the "Log" module displays a list of all log entries generated during External Import runs. The list is sortable and searchable. Each entry has a context, which gives an idea of how the run took place: triggered manually (via the backend module), run via the Scheduler or the command line, or called using the API. Any other status will appear as "Other".

There is also a duration associated with each log entry. This is actually the duration of the whole import run and will be the same for all log entries related to the same run.

There is not much more to it for now. It may gain new features in the future.

BE module overview of the import logs

List of import log entries

The Scheduler task 

The External Import process can be automated using the provided Scheduler task. The automation can be defined from the External Import backend module or directly from the Scheduler backend module.

The task provides two specific options:

View of the External Import Scheduler task options

The options of the External Import Scheduler task

Item to synchronize
Choose which import configuration to automate. If you choose "all", all configurations will be synchronized in order of priority. The selector also provides a choice of all available groups and of each individual configuration.
Storage page
This is the uid of a TYPO3 page. The imported data will be stored in that page, no matter what has been configured in the TCA or in the extension settings.

The command-line interface 

The External Import process can be called from the command line. It can be used to run a single synchronization, all of them or a group of them. When several synchronizations are run, they happen in order of increasing priority. The following operations are possible:

List all configurations available for synchronization
path/to/php path/to/bin/typo3 externalimport:sync --list
Synchronize everything
path/to/php path/to/bin/typo3 externalimport:sync --all
Synchronize a group of configurations
path/to/php path/to/bin/typo3 externalimport:sync --group=(group name)
Synchronize a single configuration
path/to/php path/to/bin/typo3 externalimport:sync --table=foo --index=bar

Forcing the storage page 

The storage flag can be used to pass the id of a page in the TYPO3 system where the imported data will be stored. This overrides both the TCA and the extension settings.

Running in preview mode 

Preview mode can be activated by using the preview flag and a Step class name as argument. The import process will stop after the given step and return some preview data (or not; that depends on the step). No permanent changes are made (e.g. nothing is saved to the database).

A typical command will look like:

path/to/php path/to/bin/typo3 externalimport:sync --table=foo --index=bar --preview='Cobweb\\ExternalImport\\Step\\TransformDataStep'

This will stop the process after the TransformDataStep and dump the transformed data to the standard output. Mind the correct syntax for defining the Step class (quoted, with no leading backslash).

Debugging on the command-line 

Debugging on the command-line is achieved by using the verbose flag, which is available for all commands. If global debugging is turned on (see the Extension configuration), debugged variables will be dumped along with the usual output from the External Import command. If global debugging is disabled, it can be enabled for a single run, by using the "debug" flag:

path/to/php path/to/bin/typo3 externalimport:sync --table=foo --index=bar --debug -v

Reactions (External Import endpoints) 

Starting with TYPO3 12, External Import provides reactions, i.e. endpoints which can be called by any third-party software to push data to import or to delete imported data.

Both reactions are defined in the same way. The expected payload is different, and this is explained further down.

The "Import external data" reaction will import the data defined in the payload as per the usual External Import process, inserting, updating and deleting records by matching the incoming data set with the existing data set. However the import reaction could also be used to import a single record (for example, if it is used as a webhook in a third-party application). In such a case, it is still easy to insert or update, but the deleting of records cannot be automated anymore. This is where the "Delete external data" reaction comes in. With it, one or more records can be targeted for deletion, using their external primary key to identify them.

Defining the reaction 

A reaction must be defined using the "Reactions" module in the TYPO3 backend. There can be more than one External Import reaction depending on your needs. Having several reactions allows you to distribute secret keys to different people.

Defining a reaction

Defining a reaction in the dedicated backend module

Choosing a configuration is optional. If one is chosen, the reaction will only execute if the incoming configuration matches the selected configuration. This provides better safety, but is more restrictive.

It is absolutely necessary to choose a BE user to impersonate, otherwise the data will not be stored. The easiest option is to choose the _cli_ user, but this may grant broader rights than needed. You can use another BE user or define a specific one, but make sure that it has the proper rights for writing to the table(s) targeted by the import.

External Import configuration 

The External Import configuration does not need anything special to be used by a reaction. However if it is only ever used by reactions, then it does not need connector information and can thus be a Non-synchronizable table.

Request payload 

To call the endpoint and trigger the External Import reaction, you need to call the URI given by the reaction and pass it the secret key in the headers. The payload in the request body comprises the following information:

table
The name of the table targeted by the reaction (not necessary when a configuration is explicitly defined).
index
The index of the targeted External Import configuration (not necessary when a configuration is explicitly defined).
group
Instead of defining a table and an index, it is also possible to define a group. In such a case, all configurations from the corresponding group will be executed in order of increasing priority. This is used only for the "Import external data" reaction. It is incompatible with a table and index definition. Defining both will trigger an error. It is not necessary when a group has been explicitly defined.
data

The data to handle.

For the "Import external data" reaction, this can be either a JSON array (for array-type data) or a string (for XML-type data).

For the "Delete external data" reaction, it must be a JSON array, with the item(s) to delete. The key for identifying the external data must be in a field called "external_id". Example:

{
    "table": "tx_externalimporttest_tag",
    "index": "api",
    "data": [
        {
            "external_id": "miraculous"
        },
        {
            "external_id": "rotten"
        }
    ]
}

If the incoming data cannot match this structure (but is still a JSON array), use the GetExternalKeyEvent event to extract the external key from the incoming data. If the incoming data does not match the above structure at all, you have to develop your own reaction.

pid (optional)

If defined, this uid from the "pages" table will override the pid property from the general configuration.

This is not used by the "Delete external data" reaction.
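Putting it together, a payload for the "Import external data" reaction might look like this (the table, index and the field names inside data are illustrative; the fields must match whatever your column mapping expects):

```json
{
    "table": "tx_externalimporttest_tag",
    "index": "api",
    "pid": 42,
    "data": [
        {"code": "happy", "name": "Happy tag"},
        {"code": "gloomy", "name": "Gloomy tag"}
    ]
}
```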

Here is how it could look (example made with Postman):

Request headers

The header with the URI, the accepted content type and the secret key

Request body

The body of the payload with the table name, configuration index and data to import

The delete reaction 

Since the "Delete external data" reaction is dedicated to deleting records, it is quite different from the other bits of code in External Import. As far as reaction payload is concerned, this has been discussed above.

Regarding the configuration, it is important to understand that most of it is not used by the delete process. In fact, the only properties used from the general configuration are:

  • referenceUid to know in which field the external primary key is stored.
  • enforcePid, which could be useful in a scenario where you import the same records to different places in your TYPO3 installation, and thus have external primary keys that are unique only per pid.
  • whereClause

Reaction response 

The response contains a success entry with value true or false.

If success is false, the response will contain an error entry (string) for the delete reaction or an errors entry (array of strings) for the import reaction. These contain information about what went wrong.

If success is true, the response will contain a message entry (string) for the delete reaction or a messages entry (array of strings) for the import reaction. These contain information about the number of operations performed.

Webhook (outgoing message) 

Starting with TYPO3 12, External Import provides a webhook, i.e. a message that can be sent to some third-party endpoint.

Defining the webhook 

A webhook must be defined using the "Webhooks" module in the TYPO3 backend, choosing the "... when an External Import run is completed" trigger. You can define several webhooks with the same trigger. Defining the webhook is essentially about setting the target URL and generating the "secret" using the field provided by TYPO3.

Defining a webhook

Defining a webhook in the dedicated backend module

The message is sent right after an import has completed, in the ReportStep of the import process. The payload of the message sent by External Import contains the following information:

  • the name of the table
  • the index of the import configuration
  • the description of the import configuration
  • all the messages reported by the process, in three categories (success, warnings and errors).

Mapping data 

In the Administration chapter, you will find explanations about how to map the data from the external source to existing or newly created tables in the TYPO3 CMS database. There are two mandatory conditions for this operation to succeed:

  • the external data must have the equivalent of a primary key
  • this primary key must be stored in some column of the TYPO3 CMS database, but not the uid column, which is internal to TYPO3 CMS.

The primary key in the external data is the key that is used to decide whether a given entry in the external data corresponds to a record already stored in the TYPO3 CMS database or if a new record should be created for that entry. Records in the TYPO3 CMS database that do not match primary keys in the external data can be deleted if desired.
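For illustration, the column holding the external primary key is declared with the referenceUid general configuration property, along the following lines (all names below are hypothetical; see the Administration chapter for details):

```php
<?php
// Hypothetical table, configuration index and column names.
// "tx_example_external_id" is the local column storing the external key.
$GLOBALS['TCA']['tx_example_domain_model_product']['external']['general']['products']['referenceUid']
    = 'tx_example_external_id';

// The external key itself must also be imported into that column
$GLOBALS['TCA']['tx_example_domain_model_product']['columns']['tx_example_external_id']['external']['products'] = [
    'field' => 'id',
];
```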

Import scenarios 

External Import offers many options, some of which can be combined. This can sometimes be confusing. This chapter attempts to explain some import scenarios in order to show what is possible with External Import. It is possible to create other scenarios than those shown below.

Above all else, the preview mode is your friend. Test and tune your configuration and check what data structure results using the preview at any step in the process.

The simplest scenario 

The simplest scenario is when one row/line of external data corresponds to one record in the TYPO3 database, possibly after some transformations. This is what this image tries to convey:

The simplest import scenario

One line of external data is read (red), it goes through some transformations (grey) and finally gets saved to the TYPO3 database (green).

Multiple values 

One particular scenario is when one or more fields in the external data contain multiple values, often comma-separated. What you probably want is to access each individual value, apply transformations to it and then reconcatenate it. This is what the multipleValuesSeparator property does. It takes each value, tries to match it to an entry in the given database table and concatenates all the mapped values again (with a comma). This can be represented as:

Import scenario with multiple values separator

The external data (red) contains values that correspond to keys in the TYPO3 database. The values are matched one by one (little magenta squares in the grey area) and concatenated again for saving to the TYPO3 database (green).
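A column configuration along these lines could be sketched as follows. This is a minimal sketch with invented table and field names, assuming a mapping transformation is used to match each value; see the multipleValuesSeparator and transformations properties for the authoritative syntax.

```php
'tags' => [
    'label' => 'Tags',
    'config' => [
        'type' => 'input',
    ],
    'external' => [
        0 => [
            // Field in the external data, e.g. containing "food,dogs,cats"
            'field' => 'tags',
            // Split the value on commas and handle each part individually
            'multipleValuesSeparator' => ',',
            // Map each individual value to a record in a (hypothetical) tags
            // table; the mapped values are concatenated again with a comma
            'transformations' => [
                10 => [
                    'mapping' => [
                        'table' => 'tx_myext_domain_model_tag',
                        'referenceField' => 'code',
                    ],
                ],
            ],
        ],
    ],
],
```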

Denormalized data 

One common scenario - particularly with flat (CSV) data - is to receive denormalized data. This means that the data itself represents a many-to-many relation between two sets of entities and that the total number of rows/lines does not represent the actual number of entities but the number of relationships between them. External Import takes care not to import duplicate entries and automatically filters on the defined external key (see property referenceUid).

However, if you do not add any specific configuration, only the first row is imported and the others are simply discarded. This may not be what you want. A schema for this situation could be:

Import scenario with denormalized data and no specific configuration

The black key and the white key represent the external keys. Among the four rows, there are only two different keys. And indeed, at the end of the process (green), only two records are created in the TYPO3 database.

The column with the pattern represents the denormalized data. During the process (grey), inside each row, this column may be mapped to some other database table (magenta squares), but then only the first row is actually stored.

Denormalized data with multiple rows 

The previous scenario may correspond to a real use case, but most likely does not, because it involves losing relationship information. One way to preserve it is to use the multipleRows property. It is defined at column-level and instructs External Import not to discard the excess data, but to keep it and merge it after all other transformations (it is assembled as a comma-separated list of values).

The result can be represented as:

Import scenario with denormalized data and multiple rows activated

Only two records are created but the many-to-many relations are preserved.
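As a minimal sketch (with invented table and field names), activating multipleRows on a column looks like this:

```php
'products' => [
    'label' => 'Products',
    'config' => [
        'type' => 'group',
        'allowed' => 'tx_myext_domain_model_product',
    ],
    'external' => [
        0 => [
            // Field in the external data holding the denormalized value
            'field' => 'product_sku',
            // Keep the values from all rows sharing the same external key
            // and merge them into a comma-separated list after the other
            // transformations have run
            'multipleRows' => true,
        ],
    ],
],
```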

Substructure fields with multiple rows 

Another scenario is that the external data is not a flat structure, but contains nested data. This is what the substructureFields property is for. It allows fetching a value inside a deeper structure. If there are multiple values, however, it will actually trigger an on-the-fly denormalization of the external data, as the schema below attempts to portray:

Import scenario with substructure fields and multiple rows activated

The structure nested inside the external data (little yellow squares inside the red bar) is extracted, leading to two rows during the process. The process may also add columns. If the fields of the substructure are mapped to names of already defined columns (from the column configuration or the additional fields), the values will be put into those fields (and replace any existing value). If they are mapped to different names, however, this will create new columns. A mix and match is possible.

In the schema above, the yellow column is new and the striped grey column represents an existing column which was "overridden" with values from the substructure.

Note that extra columns do not have a full definition like the other columns and thus don't go through the Transformation step (but are available in the rows for manipulation inside user functions or custom steps). They are also not stored to the database. If you map a substructure field to an existing column, it will both go through the Transformation step and be saved to the database.

As for the extra rows, they are collapsed back into a comma-separated list of values in the columns for which the multipleRows property was set.

Substructure fields with child records 

Starting from the same scenario as above, it is also possible to define child records with the children property instead of using multipleRows. In this case, the denormalized rows are not collapsed but each row is used to create a separate child record:

Import scenario with substructure fields and child records

Substructure fields may be used to fill children columns.
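A rough sketch of such a configuration, loosely modeled on the file-reference example from the test extension, could look like the code below. The names are invented and the exact sub-properties of children should be checked against its reference documentation.

```php
'pictures' => [
    'label' => 'Pictures',
    'config' => [
        'type' => 'inline',
        'foreign_table' => 'sys_file_reference',
    ],
    'external' => [
        0 => [
            'field' => 'pictures',
            'children' => [
                // Table in which the child records are created
                'table' => 'sys_file_reference',
                'columns' => [
                    // Value taken from the (denormalized) external data
                    'uid_local' => [
                        'field' => 'pictures',
                    ],
                    // Special value referring to the parent record being imported
                    'uid_foreign' => [
                        'field' => '__parent.id__',
                    ],
                ],
            ],
        ],
    ],
],
```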

Clearing the cache 

When data is imported into your TYPO3 CMS installation, you may want to clear the cache for a number of pages in order for the new data to be displayed as soon as it is available. One way to achieve this is to rely purely on TYPO3 CMS and use the TSconfig property:

TCEMAIN.clearCacheCmd = xx,yy
Copied!

on the page(s) where the data is stored to automatically trigger the clearing of the cache for the given pages (xx and yy) when any record they contain is modified or deleted, or some new record inserted.

This works fine but has one big drawback: it is triggered for each record. If you manipulate a lot of records, the cache clearing may be called hundreds or thousands of times, which can severely impact your site, especially if the cache is very large.

It is also possible to trigger the clearing of the cache after the whole import process has completed for a given configuration. Instead of using TSconfig, the configuration would be something like:

$GLOBALS['TCA']['tx_news_domain_model_news']['external']['general']['0']['clearCache'] = 'xx,yy';
Copied!

This will clear the cache for pages "xx" and "yy", but only after all records have been inserted, updated and deleted. The process still relies on DataHandler for clearing the cache of each page, so you may rely on the usual clear cache hooks if needed.

Besides page numbers, you can also use more general cache identifiers like "pages" (to clear the cache for all pages), cache tags, or any other value that can be used with TCEMAIN.clearCacheCmd.

Debugging 

There are many potential sources of error during synchronization, from wrong mapping configurations to missing user rights to PHP errors in user functions. When a synchronization is launched from the BE module a status is displayed when the operation is finished.

The extension tries to report at best on the success or failure of the operation. Turning on the "debug" mode (see the Configuration chapter) will provide additional information.

As described in the Configuration chapter, it is also possible to receive a detailed report by email. It will contain a general summary of what happened during synchronization, but also all error messages logged by the TYPO3 Core Engine, if any.

Troubleshooting 

This chapter tries to address a number of common issues.

The automatic synchronization is not being executed 

You may observe that the scheduled synchronization is not taking place at all. Even if the debug mode is activated and you look at the logs, you will see no call to external_import. This may happen when you set a too high frequency for synchronizations (like 1 minute for example). If the previous synchronization has not finished, the Scheduler will prevent the new one from taking place. The symptom is a message like "[scheduler]: Event is already running and multiple executions are not allowed, skipping! CRID: xyz, UID: nn" in the system log (SYSTEM > Log). In this case you should stop the current execution in the Scheduler backend module.

The manual synchronization never ends 

It may be that no results are reported during a manual synchronization and that the looping arrows continue spinning endlessly. This happens when something failed completely during the synchronization and the BE module received no response. See the advice in Debugging.

All the existing data was deleted 

The most likely cause is that the external data could not be fetched, resulting in zero items to import. If the delete operation is not disabled, External Import will take that as a sign that all existing data should be deleted, since the external source didn't provide anything.

There are various ways to protect yourself against that. Obviously you can disable the delete operation, so that no record ever gets deleted. If this is not desirable, you can use the minimumRecords option (see General TCA configuration). For example, if you always expect at least 100 items to be imported, set this option to 100. If fewer items than this are present in the external data, the import process will be aborted and nothing will get deleted.
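For example, aborting the import (and thus preventing any deletion) when fewer than 100 items are received could be set up like this (the table name is invented):

```php
// Abort the import process if the external source provides
// fewer than 100 items, leaving existing records untouched
$GLOBALS['TCA']['tx_myext_domain_model_item']['external']['general'][0]['minimumRecords'] = 100;
```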

Data on unrelated pages was deleted 

If all imported data should only be synchronized within a certain page, set enforcePid to 1 to prevent the import from altering or deleting data on pages with a different page ID.
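As a sketch with an invented table name and page ID, combining a storage page with enforcePid looks like this:

```php
// Store imported records on page 42...
$GLOBALS['TCA']['tx_myext_domain_model_item']['external']['general'][0]['pid'] = 42;
// ...and restrict update/delete operations to records on that page
$GLOBALS['TCA']['tx_myext_domain_model_item']['external']['general'][0]['enforcePid'] = true;
```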

Only a single entry was imported 

This generally happens when the referenceUid property is wrongly defined. External Import is unable to differentiate the records from the external source and each record overwrites the preceding one. In the end, only the last one is actually imported.

Can I leave out records with "empty" fields? 

A likely scenario is wanting to leave out records where a given field is empty. This is a difficult topic: first of all, what constitutes an "empty field" will vary depending on the incoming data and what handling is applied to it. What is more, one may want to filter the data at different points in the process (e.g. after the data is read or after it is transformed).

This is why there is no configuration property for "requiring" a field. Such a need is better addressed by creating a custom step, which can apply specific criteria at a precise point in the import process.
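As a sketch of what such a custom step might look like - assuming the base step class exposes the Data object as shown (the class and method names should be checked against the Developer's Guide) - records with an empty "name" field could be dropped like this:

```php
<?php
namespace MyVendor\MyExtension\Step;

use Cobweb\ExternalImport\Step\AbstractStep;

// Hypothetical custom step removing records whose "name" field is empty.
// The exact base-class API should be verified against the Developer's Guide.
class RemoveEmptyNameStep extends AbstractStep
{
    public function run(): void
    {
        $records = $this->getData()->getRecords();
        foreach ($records as $index => $record) {
            // Consider both missing and blank values as "empty"
            if (!isset($record['name']) || trim((string)$record['name']) === '') {
                unset($records[$index]);
            }
        }
        $this->getData()->setRecords(array_values($records));
    }
}
```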

Process overview 

The schema below provides an overview of the external import process:

Import process overview

The various steps of the external import process

The process consists of steps, each of which corresponds to a PHP class (found in Classes/Step). The steps are not the same when synchronizing (pulling) data as when using the API or a reaction (pushing). In the above schema, the steps with a gradient background belong to both processes. The ones with a single-color background are called only by the corresponding process.

Each step may affect the raw data (the data provided by the external source) and the so-called "records" (the data as it is transformed by External Import along the various steps). A step can also set an "abort" flag, which will interrupt the import process after the step has completed. The following steps will not be executed unless specifically designed to do so (this is indicated in the list below).

The following is an overview of what each step does:

class CheckPermissionsStep
Fully qualified name
\Cobweb\ExternalImport\Step\CheckPermissionsStep

Check permissions

This step checks whether the current user has the rights to modify the table into which data is being imported. If not, the process will abort.

class ValidateConfigurationStep
Fully qualified name
\Cobweb\ExternalImport\Step\ValidateConfigurationStep

Validate configuration

This step checks that the main configuration as well as each column configuration are valid. If any of them is not, the process will abort. The process will also abort if there is no general configuration or not a single column configuration.

class ValidateConnectorStep
Fully qualified name
\Cobweb\ExternalImport\Step\ValidateConnectorStep

Validate connector

This step checks whether a Connector has been defined for the synchronization process. In a sense, it is also a validation of the configuration, but restricted to a property used only when pulling data.

Up to that point, the \Cobweb\ExternalImport\Domain\Model\Data object contains no data at all.

class ReadDataStep
Fully qualified name
\Cobweb\ExternalImport\Step\ReadDataStep

Read data

This step reads the data from the external source using the defined Connector. It stores the result as the "raw data" of the \Cobweb\ExternalImport\Domain\Model\Data object.

class HandleDataStep
Fully qualified name
\Cobweb\ExternalImport\Step\HandleDataStep

Handle data

This step takes the raw data, which may be an XML structure or a PHP array, and turns it into an associative PHP array. The keys are the names of the columns being mapped and any additional fields declared with the additionalFields property. The values are those of the external data. The results are stored in the "records" of the \Cobweb\ExternalImport\Domain\Model\Data object.
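As an illustration with invented values, the step essentially turns the raw structure into records keyed by column names:

```php
// Raw data as read by the connector (illustrative values),
// e.g. from a CSV file with columns "code" and "label"
$rawData = [
    ['code' => 'attack', 'label' => 'Attack'],
    ['code' => 'defense', 'label' => 'Defense'],
];

// After the Handle data step, each record is an associative array
// whose keys are the names of the mapped TYPO3 columns (assuming,
// for this sketch, that "label" is mapped to a "name" column)
$records = [
    ['code' => 'attack', 'name' => 'Attack'],
    ['code' => 'defense', 'name' => 'Defense'],
];
```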

class ValidateDataStep
Fully qualified name
\Cobweb\ExternalImport\Step\ValidateDataStep

Validate data

This step checks that the external data passes whatever conditions have been defined. If this is not the case, the process is aborted.

class TransformDataStep
Fully qualified name
\Cobweb\ExternalImport\Step\TransformDataStep

Transform data

This step applies all the possible transformations to the external data, in particular mapping it to other database tables. The "records" in the \Cobweb\ExternalImport\Domain\Model\Data object are updated with the transformed values.

class StoreDataStep
Fully qualified name
\Cobweb\ExternalImport\Step\StoreDataStep

Store data

This is where data is finally stored to the database. Some operations related to MM relations also happen during this step. The "records" in the \Cobweb\ExternalImport\Domain\Model\Data object now contain the "uid" field.

class ClearCacheStep
Fully qualified name
\Cobweb\ExternalImport\Step\ClearCacheStep

Clear cache

This step runs whatever cache clearing has been configured.

class ConnectorCallbackStep
Fully qualified name
\Cobweb\ExternalImport\Step\ConnectorCallbackStep

Connector callback

In this step the connector is called again in case one wishes to perform some clean up operations on the source from which the data was imported (for example, mark the source data as having been imported). The postProcessOperations() method of the connector API is called.

This step is called even if the process was aborted, so that error handling can happen with regards to the connector.

class ReportStep
Fully qualified name
\Cobweb\ExternalImport\Step\ReportStep

Report

This last step of the process performs reporting, essentially writing all log entries. It also triggers the \Cobweb\ExternalImport\Event\ReportEvent, which itself triggers the end-of-run webhook message.

This step is called even if the process was aborted, so that errors can be reported.

It is possible to add custom Step classes at any point in the process. On top of this several steps trigger events which allow for further interactions with the default process.

Tutorial 

Extension externalimport_tut provides an extensive tutorial about external import. It makes use of many configuration options. All examples are discussed in the extension's manual.

Test extension 

Extension externalimport_test also contains many example configurations which are used for integration (functional) testing. The extension itself does not contain detailed documentation like the tutorial, but it is still a useful resource. The many scenarios and features covered in that extension are briefly mentioned below to help you find your way around it. The list is structured according to the files containing the TCA, located in Configuration/TCA or Configuration/TCA/Overrides.

tx_externalimporttest_bundle.php

Scenario: import of 1:n relationships (bundles to products) with denormalized data, preserving sorting order, using multipleRows and multipleSorting.

Additional usage of: additional fields, user function transformations, array path (at column level).

tx_externalimporttest_designer.php
Scenario: import data (designers) nested inside other data (products) in a XML structure using XPath (nodepath property).
tx_externalimporttest_invoice.php
Scenario: import denormalized data from a XML file with namespaced tags, using properties namespaces and fieldNS.
tx_externalimporttest_order.php
Scenario: import 1:n relationships (orders to products) from nested data into an IRRE structure. Usage of arrayPath (at general level), substructureFields and children properties.
tx_externalimporttest_product.php

Products are used for testing several scenarios. They are described below according to the configuration key:

  • base: usage of an EventListener (listening to \Cobweb\ExternalImport\Event\ProcessConnectorParametersEvent), of a custom step, of XPath at column level (property xpath); creation of 1:n relations to tags from comma-separated values (property multipleValuesSeparator) and creation of file references using both substructureFields and children properties.
  • more: simpler import scenario than "base", but from a similar XML structure and thus the same mapping. Tests the usage of the useColumnIndex property.
  • stable: same as "more", testing the disabling of both "update" and "delete" operations, using property disabledOperations.
  • products_for_stores: creation of m:n relations between stores and products, from the product side. Again usage of the children property for creating IRRE entries.
  • general_configuration_errors: as the name implies, this configuration contains many errors and is used for testing the general configuration validator.
  • updated_products: importing products that change name (for testing the updateSlugs property) and also that change "pid" (for testing the moving of records).
tx_externalimporttest_store.php
Scenario: import stores and their m:n relations to products, from the store side, again usage of the children property for creating IRRE entries.
tx_externalimporttest_tag.php

Like products, tags are used to test several scenarios:

  • 0: usage of a custom step to filter out some entries.
  • only-delete: this one is really specific to integration testing, as it is used to test the deletion of existing tags (loaded from a fixture during testing) when importing.
  • api: tests the usage of External Import as an API. See class \Cobweb\ExternalimportTest\Command\ImportCommand.
Overrides/pages.php
Scenario: importing some data (in this case products) as pages to test ordering and nesting (some pages are children of others). The configuration itself is very simple.
Overrides/sys_category.php

Two scenarios are tested here:

  • product_categories: simple import into an existing table, extending for storing the external id.
  • column_configuration_errors: this configuration contains many errors and is used for testing the column configuration validator.
Overrides/tx_externalimporttest_product.php
This is just used to demonstrate how to make a table categorizable and import categories relationships. It is related to the "base" configuration for products above.

Import configuration 

To start inserting data from an external source into your TYPO3 CMS tables, you must first extend their TCA with a specific syntax. This syntax comprises three parts:

  • general information ("General TCA configuration")
  • specific information for each column where data will be stored ("Columns configuration")
  • so-called "additional fields" which are read from the external source, but not saved

The first two parts are required, the third is optional.

This chapter describes all possible configuration options. For each property, a step or a more general scope is mentioned to help understand which part of the process it impacts. The names of the steps correspond to the process steps.

There are some code examples throughout this chapter. They are taken either from the External Import Tutorial or from the test extension: https://github.com/fsuter/externalimport_test. You are encouraged to refer to them for more examples and more details about each example (in the Tutorial).

User rights 

Before digging into the TCA specifics let's have a look at the topic of user rights. Since External Import relies on \TYPO3\CMS\Core\DataHandling\DataHandler for storing data, the user rights on the synchronized tables will always be enforced. However additional checks are performed in both the BE module and the automated tasks to avoid displaying sensitive data or throwing needless error messages.

When accessing the BE module, user rights are taken into account in that:

  • a user must have at least listing rights on a table to see it in the BE module.
  • a user must have modify rights on a table to be allowed to synchronize it manually or define an automated synchronization for it.

Furthermore explicit permissions must be set in the BE user group for allowing a user to run synchronizations from the BE module and to define Scheduler tasks. This is found at the bottom of the "Access Lists" tab.

Specific user permissions

Setting specific permissions for the BE module

DB mount points are not checked at this point, so the user may be able to start a synchronization and still get error messages if not allowed to write to the page where the imported data should be stored.

An automated synchronization will be run by the Scheduler. This means that the active user will be _cli_, who is an admin user. Thus no special setup is needed. The same is true for command-line calls.

General TCA configuration 

Here is an example of a typical general section syntax, containing two import configurations.

Each configuration must be identified with a key (in the example below, 0 and 'api'). The same keys need to be used again in the column configuration.

$GLOBALS['TCA']['tx_externalimporttest_tag'] = array_merge_recursive( $GLOBALS['TCA']['tx_externalimporttest_tag'], [
    'external' => [
         'general' => [
              0 => [
                   'connector' => 'csv',
                   'parameters' => [
                        'filename' => 'EXT:externalimport_test/Resources/Private/ImportData/Test/Tags.txt',
                        'delimiter' => ';',
                        'text_qualifier' => '"',
                        'encoding' => 'utf8',
                        'skip_rows' => 1
                   ],
                   'data' => 'array',
                   'referenceUid' => 'code',
                   'priority' => 5000,
                   'description' => 'List of tags'
              ],
              'api' => [
                   'data' => 'array',
                   'referenceUid' => 'code',
                   'description' => 'Tags defined via the import API'
              ]
         ]
    ],
]);
Copied!

All available properties are described below.

Properties 

Property Data type Scope/Step
additionalFields string Read data
arrayPath string Handle data (array)
arrayPathFlatten bool Handle data (array)
arrayPathSeparator string Handle data (array)
clearCache string Clear cache
columnsOrder string Transform data
connector string Read data
customSteps array Any step
data string Read data
dataHandler string Handle data
description string Display
disabled boolean General
disabledOperations string Store data
disableLog boolean Store data
enforcePid boolean Store data
group string Sync process
groups array Sync process
minimumRecords integer Validate data
namespaces array Handle data (XML)
nodetype string Handle data (XML)
nodepath string Handle data (XML)
parameters array Read data
pid integer Store data
priority integer Display/automated import
referenceUid string Store data
updateSlugs boolean Store data
useColumnIndex string or integer Configuration
whereClause string Store data

connector 

Type
string
Description

Connector service subtype.

Must be defined only for pulling data. Leave blank for pushing data. You will need to install the relevant connector extension. Here is a list of available extensions and their corresponding types:

Type Extension
csv svconnector_csv
json svconnector_json
sql svconnector_sql
feed svconnector_feed
Scope
Read data

parameters 

Type
array
Description

Array of parameters that must be passed to the connector service.

Not used when pushing data.

Scope
Read data

data 

Type
string
Description
The format in which the data is returned by the connector service. Can be either xml or array.
Scope
Read data

dataHandler 

Type
string
Description
A class name for replacing the standard data handlers. See the Developer's Guide for more details.
Scope
Handle data

disabled 

Type
bool
Description
A disabled configuration is completely ignored by External Import. It does not appear in any listing, nor will it ever be synchronized. This can be useful, for example, when you share a package between TYPO3 installations, but do not need to run the imports everywhere.
Scope
General

groups 

Type
array
Description
Any External Import configuration may belong to one or more groups. A group is just an arbitrary string. It is possible to execute the synchronization of all configurations in a given group in one go, in order of priority (lowest goes first). Group synchronization is available on the command line and in the Scheduler task.
Scope
Sync process

group 

Type
string
Description

This can be any arbitrary string of characters. All External Import configurations having the same value for the "group" property will form a group of configurations. It is then possible to execute the synchronization of all configurations in the group in one go, in order of priority (lowest goes first). Group synchronization is available on the command line and in the Scheduler task.

Scope
Sync process

nodetype 

Type
string
Description
Name of the reference nodes inside the XML structure, i.e. the children of these nodes correspond to the data that goes into the database fields (see also the description of the field attribute).
Scope
Handle data (XML)

nodepath 

Type
string
Description
XPath expression for selecting the reference nodes inside the XML structure. This is an alternative to the nodetype property and will take precedence if both are defined.
Scope
Handle data (XML)

arrayPath 

Type
string
Description

Pointer to a sub-array inside the incoming external data, as a list of keys separated by some marker. The sub-array pointed to will be used as the source of data in the subsequent steps, rather than the whole structure that was read during the ReadDataStep.

For more details on usage and available options, see the dedicated page.

Scope
Handle data (array)

arrayPathFlatten 

Type
bool
Description

When the special * segment is used in an arrayPath, the resulting structure is always an array. If the arrayPath target is actually a single value, this may not be desirable. When arrayPathFlatten is set to true, the result is preserved as a simple type.

Scope
Handle data (array)

arrayPathSeparator 

Type
string
Description
Separator to use in the arrayPath property. Defaults to / if this property is not defined.
Scope
Handle data (array)
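For example, if the connector returns a structure like ['response' => ['items' => [...]]], pointing the import at the items sub-array could look like this (the table name is invented):

```php
// Use $data['response']['items'] as the source of records,
// with the default "/" separator between path segments
$GLOBALS['TCA']['tx_myext_domain_model_item']['external']['general'][0]['arrayPath'] = 'response/items';
```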

referenceUid 

Type
string
Description

Name of the column where the equivalent of a primary key for the external data is stored.

Records for which this data does not exist are skipped (since version 6.1). This is tested with PHP's isset() function. If you think your data may contain empty values and you wish to skip them too, use the isEmpty transformation property with the invalidate option set to true.

Scope
Store data

priority 

Type
integer
Description

A level of priority for the execution of the synchronization. Some tables may need to be synchronized before others if foreign relations are to be established. This gives a clue to the user and a strict order for scheduled synchronizations (either when synchronizing all configurations or when synchronizing a group).

The lowest priority value goes first.

If priority is not defined, a default value of 1000 is applied (defined by class constant \Cobweb\ExternalImport\Importer::DEFAULT_PRIORITY).

Not used when pushing data.

Scope
Display/Automated import process

pid 

Type
integer
Description
ID of the page where the imported records should be stored. It can be omitted, in which case the general storage pid is used instead (see Configuration).
Scope
Store data

enforcePid 

Type
boolean
Description

If this is set to true, all operations regarding existing records will be limited to records stored in the defined pid (i.e. either the above property or the general extension configuration). This has two consequences:

  1. when checking for existing records, those records will be selected only from the defined pid.
  2. when checking for records to delete, only records from the defined pid will be affected

This is a convenient way of protecting records from operations started from within the external import process, so that it won't affect e.g. records created manually.

Scope
Store data

useColumnIndex 

Type
string or integer
Description

In a basic configuration the same index must be used for the general TCA configuration and for each column configuration. With this property it is possible to use a different index for the column configurations. The general configuration part has to exist with its own index (say "index A"), but the columns may refer to another index (say "index B") and thus their configuration does not need to be defined. Obviously the index referred to ("index B") must exist for columns.

The type may be a string or an integer, because a configuration key may also be either a string or an integer.

Since version 6.1, it is possible to define specific configurations for selected columns using the index from the general configuration ("index A"). It will not be overridden by the configuration corresponding to the index referred to with useColumnIndex property ("index B").

Example:

'stable' => [
    'connector' => 'feed',
    'parameters' => [
        'uri' => 'EXT:externalimport_test/Resources/Private/ImportData/Test/StableProducts.xml',
        'encoding' => 'utf8'
    ],
    'group' => 'Products',
    'data' => 'xml',
    'nodetype' => 'products',
    'referenceUid' => 'sku',
    'priority' => 5120,
    'useColumnIndex' => 'base',
    ...
],
Copied!

This general configuration makes reference to the "base" configuration. This means that all columns will use the "base" configuration, unless they have a configuration using specifically the "stable" index. So the "sku" column will use the configuration from the "base" index:

'sku' => [
    'exclude' => false,
    'label' => 'SKU',
    'config' => [
        'type' => 'input',
        'size' => 10
    ],
    'external' => [
        'base' => [
            'xpath' => './self::*[@type="current"]/item',
            'attribute' => 'sku'
        ],
        'products_for_stores' => [
            'field' => 'product'
        ],
        'updated_products' => [
            'field' => 'product_sku'
        ]
    ]
],
Copied!

However, the "name" column has a specific configuration corresponding to the "stable" index, so it will be used, and not the configuration from the "base" index:

'name' => [
    'exclude' => false,
    'label' => 'Name',
    'config' => [
        'type' => 'input',
        'size' => 30,
        'eval' => 'required,trim',
    ],
    'external' => [
        'base' => [
            'xpath' => './self::*[@type="current"]/item',
        ],
        'stable' => [
            'xpath' => './self::*[@type="current"]/item',
            'transformations' => [
                10 => [
                    'userFunction' => [
                        'class' => \Cobweb\ExternalimportTest\UserFunction\Transformation::class,
                        'method' => 'caseTransformation',
                        'parameters' => [
                            'transformation' => 'upper'
                        ]
                    ]
                ]
            ]
        ],
        'updated_products' => [
            'field' => 'name'
        ]
    ]
],
Scope
Configuration

columnsOrder 

Type
string
Description

By default, columns (regular columns or additional fields) are handled in alphabetical order whenever a loop is performed on all columns (typically in the \Cobweb\ExternalImport\Step\TransformDataStep class). This can be an issue when you need a specific column to be handled before another one.

With this property, you can define a comma-separated list of columns that will be handled in that specific order. It is not necessary to define an order for all columns. If only some columns are explicitly ordered, the rest will be handled after the ordered ones, in alphabetical order. The order is visually reflected in the backend module, when viewing the configuration details.
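As a sketch, assuming two hypothetical columns "category" and "subcategory" where the first must be transformed before the second:

```php
// Hypothetical example: handle "category" before "subcategory";
// all other columns follow in alphabetical order
'columnsOrder' => 'category,subcategory',
```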

Scope
Transform data (essentially)

customSteps 

Type
array
Description

As explained in the process overview, the import process goes through several steps, depending on its type. This property makes it possible to register additional steps. Each step can be placed before or after any existing step (including previously registered custom steps).

The configuration is a simple array, each entry being itself an array with three properties:

  • class (required): name of the PHP class containing the custom step.
  • position (required): states when the new step should happen. The syntax for position is made of the keyword before or after, followed by a colon (:) and the name of an existing step class.
  • parameters (optional): array which is passed as is to the custom step class when it is called during the import process. Inside the step, it can be accessed using $this->parameters.

Example:

'customSteps' => [
        [
                'class' => \Cobweb\ExternalimportTest\Step\EnhanceDataStep::class,
                'position' => 'after:' . \Cobweb\ExternalImport\Step\ValidateDataStep::class
        ]
],

If any element of the custom step declaration is invalid, the step will be ignored. More information is given in the Developer's Guide.

Scope
Any step

whereClause 

Type
string
Description

SQL condition that will restrict the records considered during the import process. Only records matching the condition will be updated or deleted. This condition comes on top of the "enforcePid" condition, if defined.

Scope
Store data

additionalFields 

Type
string
Description
This property is not part of the general configuration anymore. Please refer to the dedicated chapter.
Scope
Read data

updateSlugs 

Type
boolean
Description
Slugs are populated automatically for new records thanks to External Import relying on the \TYPO3\CMS\Core\DataHandling\DataHandler class. The same is not true for updated records. If you want record slugs to be updated when modified external data is imported, set this flag to true.
Scope
Store data

namespaces 

Type
array
Description

Associative array of namespaces that can be used in XPath queries. The keys correspond to prefixes and the values to URIs. The prefixes can then be used in XPath queries.

Example

Given the following declaration:

'namespaces' => array(
   'atom' => 'http://www.w3.org/2005/Atom'
)

an XPath query like:

atom:link

could be used. The prefixes used in XPath queries don't need to match the prefixes used in the actual XML source. The default namespace has to be registered too in order for XPath queries to succeed.

Scope
Handle data (XML)

description 

Type
string
Description
A purely descriptive piece of text, which should help you remember what this particular synchronization is all about. Particularly useful when a table is synchronized with multiple sources.
Scope
Display

disabledOperations 

Type
string
Description

Comma-separated list of operations that should not be performed. Possible operations are insert, update and delete. This way you can block any of these operations.

insert
The operation performed when new records are found in the external source.
update
Performed when a record already exists and only its data needs to be updated.
delete
Performed when a record is in the database, but is not found in the external source anymore.

See also the column-specific property disabledOperations.
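As a minimal sketch, an import that should never remove existing records, even when they disappear from the external source, could disable the delete operation:

```php
// Hypothetical example: block deletions; inserts and updates still happen
'disabledOperations' => 'delete',
```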

Scope
Store data

minimumRecords 

Type
integer
Description
Minimum number of items expected in the external data. If fewer items are present, the import is aborted. This can be used – for example – to protect the existing data against deletion when the fetching of the external data failed (in which case there are no items to import).
Scope
Validate data

disableLog 

Type
boolean
Description
Set to true to disable logging by the TYPO3 Core Engine. This setting will override the general "Disable logging" setting (see Configuration for more details).
Scope
Store data

clearCache 

Type
string
Description
Comma-separated list of cache identifiers for caches that should be cleared at the end of the import process. See Clearing the cache.
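For example (the second cache identifier is hypothetical):

```php
// Clear the page cache and a custom cache once the import is done
'clearCache' => 'pages,myext_productcache',
```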
Scope
Clear cache

Columns configuration 

You also need an "external" syntax for each column, to define which external data goes into that column and any handling that might apply. This is also an indexed array, and the indices used for each column must match the indices used in the general configuration. In its simplest form, this is just a reference to the external data's field name:

'code' => [
    'exclude' => 0,
    'label' => 'LLL:EXT:externalimport_tut/locallang_db.xml:tx_externalimporttut_departments.code',
    'config' => [
        'type' => 'input',
        'size' => 10,
        'max' => 4,
        'eval' => 'required,trim',
    ],
    'external' => [
        0 => [
            'field' => 'code'
        ]
    ]
],

The properties for the columns configuration are described below.

Properties 

Property Data type Step/Scope
arrayPath string Handle data (array)
arrayPathSeparator string Handle data (array)
arrayPathFlatten bool Handle data (array)
attribute string Handle data (XML)
attributeNS string Handle data (XML)
children Children records configuration Store data
disabledOperations string Store data
field string Handle data
fieldNS string Handle data (XML)
multipleRows boolean Store data
multipleSorting string Store data
substructureFields array Handle data
transformations Transformations configuration Transform data
value Simple type (string, integer, float, boolean) Handle data
xmlValue boolean Handle data (XML)
xpath string Handle data (XML)

value 

Type
Simple type (string, integer, float, boolean)
Description

Sets a fixed value, independent of the data being imported. For example, this might be used to set a flag for all imported records. Or you might want to use different types for different import sources.

This can be used for both array-type and XML-type data.

Scope
Handle data

field 

Type
string
Description

Name or index of the field (or node, in the case of XML data) that contains the data in the external source.

For array-type data, this information is mandatory. For XML-type data, it can be left out. In such a case, the value of the current node itself will be used, or an attribute of said node, if the attribute property is also defined.

Scope
Handle data

arrayPath 

Type
string
Description

Replaces the field property for pointing to a field located "deeper" inside a multidimensional array. The value is a string made up of the keys pointing into the array, separated by some character.

For more details on usage and available options, see the dedicated page.

Works only for array-type data.

If both "field" and "arrayPath" are defined, the latter takes precedence.
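As a minimal sketch, assuming hypothetical external data where each record is a nested array like ['data' => ['address' => ['city' => 'Geneva']]]:

```php
// Hypothetical example: reach the "city" value nested two levels down,
// using the default "/" separator
'external' => [
    0 => [
        'arrayPath' => 'data/address/city'
    ]
]
```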

Scope
Handle data (array)

arrayPathFlatten 

Type
bool
Description

When the special * segment is used in an arrayPath, the resulting structure is always an array. If the arrayPath target is actually a single value, this may not be desirable. When arrayPathFlatten is set to true, the result is preserved as a simple type.

Scope
Handle data (array)

arrayPathSeparator 

Type
string
Description
Separator to use in the arrayPath property. Defaults to / if this property is not defined.
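A custom separator is useful when the keys themselves contain the default / character. A hypothetical sketch:

```php
// Keys contain slashes, so use a pipe as path separator instead
'arrayPath' => 'data|path/to/file|name',
'arrayPathSeparator' => '|'
```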
Scope
Handle data (array)

attribute 

Type
string
Description

If the data is of type XML, use this property to retrieve the value from an attribute of the node rather than the value of the node itself.

This applies to the node selected with the field property or to the current node if field is not defined.
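As a sketch, assuming a hypothetical XML node like <product id="42">Sword</product>:

```php
// Read the "id" attribute of the current node
// instead of its value ("Sword")
'external' => [
    0 => [
        'attribute' => 'id'
    ]
]
```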

Scope
Handle data (XML)

xpath 

Type
string
Description

This property can be used to execute an XPath query relative to the node selected with the field property or (since version 2.3.0) directly on the current node if field is not defined.

The value will be taken from the first node returned by the query. If the attribute property is also defined, it will be applied to the node returned by the XPath query. If the XPath query is just a function (without selector), the resulting string will be returned.

Please see the namespaces property for declaring namespaces to use in an XPath query.
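As a sketch with hypothetical node and attribute names, combining xpath with the field and attribute properties:

```php
// Run an XPath query relative to the node selected by "field",
// then read an attribute from the first matching node
'external' => [
    0 => [
        'field' => 'product',
        'xpath' => './price[@currency="CHF"]',
        'attribute' => 'amount'
    ]
]
```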

Scope
Handle data (XML)

fieldNS 

Type
string
Description

Namespace for the given field. Use the full URI for the namespace, not a prefix.

Example

Given the following data to import:

<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
    <InvoiceLine>
        <cbc:ID>A1</cbc:ID>
        <cbc:LineExtensionAmount currencyID="USD">100.00</cbc:LineExtensionAmount>
        <cac:OrderReference>
            <cbc:ID>000001</cbc:ID>
        </cac:OrderReference>
    </InvoiceLine>
    ...
</Invoice>

getting the value in the <cbc:LineExtensionAmount> tag would require the following configuration:

'external' => [
    0 => [
        'fieldNS' => 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2',
        'field' => 'LineExtensionAmount'
    ]
]
Scope
Handle data (XML)

attributeNS 

Type
string
Description
Namespace for the given attribute. Use the full URI for the namespace, not a prefix. See fieldNS for example usage.
Scope
Handle data (XML)

substructureFields 

Type
array
Description

Makes it possible to read several values that are located inside nested data structures. Consider the following data source:

[
  {
    "order": "000001",
    "date": "2014-08-07",
    "customer": "Conan the Barbarian",
    "products": [
      {
        "product": "000001",
        "qty": 3
      },
      {
        "product": "000005",
        "qty": 1
      },
      {
        "product": "000101",
        "qty": 10
      },
      {
        "product": "000102",
        "qty": 2
      }
    ]
  },
  {
    "order": "000002",
    "date": "2014-08-08",
    "customer": "Sonja the Red",
    "products": [
      {
        "product": "000001",
        "qty": 1
      },
      {
        "product": "000005",
        "qty": 2
      },
      {
        "product": "000202",
        "qty": 1
      }
    ]
  }
]

The "products" field is actually a nested structure, from which we want to fetch the values from both product and qty. This can be achieved with the following configuration:

'products' => [
 'exclude' => 0,
 'label' => 'Products',
 'config' => [
    ...
 ],
 'external' => [
    0 => [
       'field' => 'products',
       'substructureFields' => [
          'products' => [
             'field' => 'product'
          ],
          'quantity' => [
             'field' => 'qty'
          ]
       ],
       ...
    ]
 ]
]

The keys of the configuration array correspond to the names of the columns where the values will be stored. The configuration of each element can use all the existing properties for retrieving data, such as field, arrayPath, attribute or xpath.

The substructure fields are searched for inside the structure selected with the "main" data pointer. In the example above, the whole "products" structure is first fetched, then the product and qty are searched for inside that structure.

The above example reads the values of the nested product field and puts them into the "products" column, and likewise qty into the "quantity" column. The fact that there are several entries multiplies the imported records, actually denormalising the data on the fly. The result would be something like:

order date customer products quantity
000001 2014-08-07 Conan the Barbarian 000001 3
000001 2014-08-07 Conan the Barbarian 000005 1
000001 2014-08-07 Conan the Barbarian 000101 10
000001 2014-08-07 Conan the Barbarian 000102 2
000002 2014-08-08 Sonja the Red 000001 1
000002 2014-08-08 Sonja the Red 000005 2
000002 2014-08-08 Sonja the Red 000202 1

Obviously if you have a single element in the nested structure, no denormalisation happens. Due to this denormalisation you probably want to use this property in conjunction with the multipleRows or children properties.

Scope
Handle data

multipleRows 

Type
boolean
Description

Set to true if you have denormalized data. This will tell the import process that there may be more than one row per record to import and that all values for the given column must be gathered and collapsed into a comma-separated list of values. See the Mapping data chapter for explanations about the impact of this flag.

If these values need to be sorted, use the multipleSorting property.

Scope
Store data

multipleSorting 

Type
string
Description

If the multipleRows need to be sorted, use this property to name the field which should be used for sorting. This can be any of the mapped fields, additional fields or substructure fields.

Scope
Store data

children 

Type
array (see Children records configuration)
Description
This property makes it possible to create nested structures and import them in one go. Typically, these may be "sys_file_reference" records for a field containing images. Use it whenever you are writing to a MM table into which you need to store specific properties (like "sys_file_reference"). For simple MM tables (like "sys_category_record_mm"), you don't need to create this children sub-structure: it is enough to gather a comma-separated list of "sys_category" primary keys.
Scope
Store data

transformations 

Type
array (see Transformations configuration)
Description

Array of transformation properties. The transformations will be executed as ordered by their array keys.

Example:

$GLOBALS['TCA']['fe_users']['columns']['starttime']['external'] = [
 0 => [
    'field' => 'start_date',
    'transformations' => [
       20 => [
          'trim' => true
       ],
       10 => [
          'userFunction' => [
             'class' => \Cobweb\ExternalImport\Transformation\DateTimeTransformation::class,
             'method' => 'parseDate'
          ]
       ]
    ]
 ]
];

The "userFunction" will be executed first (10) and the "trim" next (20).

Scope
Transform data

xmlValue 

Type
boolean
Description
When taking the value of a node inside an XML structure, the default behaviour is to retrieve this value as a string. If the node contains an XML sub-structure, its tags will be stripped. When setting this property to true, the XML structure of the child nodes is preserved.
Scope
Handle data (XML)

disabledOperations 

Type
string
Description

Comma-separated list of database operations from which the column should be excluded. Possible values are "insert" and "update".

See also the general property disabledOperations.

Scope
Store data

Additional fields configuration 

Additional fields are fields that are read from the external source but not saved to the database. They do not match TCA columns. They are most likely used in user functions and custom steps to prepare some other data, but are not persisted in the TYPO3 database.

Since External Import 5.0, additional fields are defined in their own "configuration space":

$GLOBALS['TCA']['tx_externalimporttest_tag'] = [
   'external' => [
      'additionalFields' => [
         0 => [
            'quantity' => [
               'field' => 'qty'
            ]
         ]
      ]
   ],
];

As usual the index (here 0) must match between the general configuration, the columns configuration and the additional fields configuration.

In the above example the "qty" field from the external data will be read and stored in the "quantity" column, which will be available for any processing, but not saved to the database.

All properties from the columns configuration can be used with additional fields too (although some may not make sense).

Transformations configuration 

A number of properties relate to transforming the data during the import process. All of these properties are used during the "Transform data" step. They are sub-properties of the transformations property.

Properties 

Property Data type Step/Scope
isEmpty array Transform data
mapping Mapping configuration Transform data
rteEnabled boolean Transform data
trim boolean Transform data
userFunction array Transform data
value simple type (string, integer, float, boolean) Transform data

mapping 

Type
Mapping configuration
Description
This property can be used to map values from the external data to values coming from some internal table. A typical example might be to match 2-letter country ISO codes to the uid of the "static_countries" table.
Scope
Transform data

value 

Type
Simple type (string, integer, float, boolean)
Description

With this property, it is possible to set a fixed value for a given field. For example, this might be used to set a flag for all imported records. Or you might want to use different types for different import sources.

Example:

EXT:my_extension/Configuration/Overrides/tx_sometable.php
$GLOBALS['TCA']['tx_sometable'] = array_replace_recursive($GLOBALS['TCA']['tx_sometable'],
[
  // ...
    'columns' => [
        'type' => [
            'external' => [
                0 => [
                    'transformations' => [
                        10 => [
                            // Default type
                            'value' => 0
                        ]
                    ],
                ],
                'another_import' => [
                    'transformations' => [
                        10 => [
                            // Another type
                            'value' => 1
                        ]
                    ],
                ]
            ]
        ],
     // ...
    ],
]);
Scope
Transform data

trim 

Type
boolean
Description

If set to true, every value for this column will be trimmed during the transformation step.

Scope
Transform data

rteEnabled 

Type
boolean
Description

If set to true when importing HTML data into an RTE-enabled field, the imported data will go through the usual RTE transformation process on its way to the database.

Scope
Transform data

userFunction 

Type
array
Description

This property can be used to define a function that will be called on each record to transform the data from the given field. See example below.

Example

Here is a sample setup referencing a user function:

$GLOBALS['TCA']['fe_users']['columns']['starttime']['external'] = [
 0 => [
    'field' => 'start_date',
    'transformations' => [
       10 => [
          'userFunction' => [
             'class' => \Cobweb\ExternalImport\Transformation\DateTimeTransformation::class,
             'method' => 'parseDate'
          ]
       ]
    ]
 ]
];

The definition of a user function takes three parameters:

class
(string) Required. Name of the class to be instantiated.
method
(string) Required. Name of the method that should be called.
parameters (formerly "params")
(array) Optional. Can contain any data, which will be passed to the method. This property used to be called "params". Backwards-compatibility is ensured for now, but please update your configuration as soon as possible.

In the example above, we are using a sample class provided by External Import that can be used to parse a date and either return it as a timestamp or format it using the PHP functions date() or strftime().

For more details about creating a user function, please refer to the Developer's Guide.

Scope
Transform data

isEmpty 

Type
array
Description

This property is used to assess if a value in the given column can be considered empty or not and, if yes, act on it. The action can be either to set a default value or to remove the entire record from the imported dataset.

Deciding whether a given value is "empty" is a bit tricky, since null, false, 0 or an empty string - to name a few - could all be considered empty depending on the circumstances. By default, this property will rely on the PHP function empty(). However it is also possible to evaluate an expression based on the values in the record using the Symfony Expression Language.

expression

(string) A condition using the Symfony Expression Language syntax. If it evaluates to true, the action (see below) will be triggered. The values in the record can be used, by simply referencing them with the column name.

If no expression is defined, the PHP function empty() is used.

See the Symfony documentation for reference.

invalidate
(bool) Set this property to true to discard the entire record from the imported dataset if the expression (or empty()) evaluated to true. invalidate takes precedence over default.
default
(mixed) If the expression (or empty()) evaluates to true, this value will be set in the record instead of the empty value.

Example

'store_code' => [
    'exclude' => 0,
    'label' => 'Code',
    'config' => [
        'type' => 'input',
        'size' => 10
    ],
    'external' => [
        0 => [
            'field' => 'code',
            'transformations' => [
                10 => [
                    'trim' => true
                ],
                20 => [
                    'isEmpty' => [
                        'expression' => 'store_code === ""',
                        'invalidate' => true
                    ]
                ],
            ]
        ]
    ]
],

In this example, the store_code field is compared with an empty string. Any record with an empty string in that column will be removed from the dataset.

Mapping configuration 

The external values can be matched to values from an existing TYPO3 CMS table, using the "mapping" property, which has its own set of properties. They are described below.

Properties 

Property Data type
default mixed
matchMethod string
matchSymmetric boolean
multipleValuesSeparator string
referenceField string
table string
valueField string
valueMap array
whereClause string

table 

Type
string
Description
Name of the table to read the mapping data from.
Scope
Transform data

referenceField 

Type
string
Description

Name of the field against which external values must be matched.

Scope
Transform data

valueField 

Type
string
Description

Name of the field to take the mapped value from. If not defined, this will default to "uid".

Scope
Transform data

whereClause 

Type
string
Description

SQL condition (without the "WHERE" keyword) to apply to the referenced table. This is typically meant to be a mirror of the foreign_table_where property of select-type fields.

However only one marker is supported in this case: ###PID_IN_USE### which will be replaced by the current storage pid. So if you have something like:

'foreign_table_where' => 'AND pid = ###PAGE_TSCONFIG_ID###'

in the TCA for your column, you should replace the marker with a hard-coded value for External Import, e.g.

'whereClause' => 'pid = 42'
Scope
Transform data

default 

Type
mixed
Description

Default value that will be used when a value cannot be mapped. If no default is defined, the field is unset for that record.

Scope
Transform data

valueMap 

Type
array
Description
Fixed hash table for mapping. Instead of using a database table to match external values to internal values, this property makes it possible to use a simple list of key-value pairs. The keys correspond to the external values.
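As a sketch with hypothetical values, mapping external single-letter status codes to internal integer values without querying a database table:

```php
// Keys are the external values, values are what gets stored
'mapping' => [
    'valueMap' => [
        'A' => 1, // active
        'I' => 0  // inactive
    ]
]
```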
Scope
Transform data

multipleValuesSeparator 

Type
string
Description

Set this property if the field to map contains several values, separated by some symbol (for example, a comma). The values will be split using the symbol defined in this property and each resulting value will go through the mapping process.

This makes it possible to handle 1:n or m:n relations, where the incoming values are all stored in the same field.

Scope
Transform data

matchMethod 

Type
string
Description

Value can be "strpos" or "stripos".

Normally mapping values are matched based on strict equality. This property can be used to match in a "softer" way: it will match if the external value is found inside the values pointed to by the referenceField property. "strpos" performs a case-sensitive match, while "stripos" is case-insensitive.

Caution should be exercised when this property is used. Since the matching is less strict it may lead to false positives. You should review the data after such an import.

Scope
Transform data

matchSymmetric 

Type
boolean
Description
This property complements matchMethod. If set to true, the import process will not only try to match the external value inside the mapping values, but also the reverse, i.e. the mapping values inside the external value.
Scope
Transform data

Examples 

Simple mapping 

Here's an example TCA configuration.

$GLOBALS['TCA']['fe_users']['columns']['tx_externalimporttut_department']['external'] = [
    0 => [
        'field' => 'department',
        'mapping' => [
            'table' => 'tx_externalimporttut_departments',
            'referenceField' => 'code'
        ]
    ]
];

The value found in the "department" field of the external data will be matched to the "code" field of the "tx_externalimporttut_departments" table, and thus create a relation between the "fe_users" and the "tx_externalimporttut_departments" table.

Mapping multiple values 

This second example demonstrates usage of the multipleValuesSeparator property.

The incoming data looks like:

<catalogue>
    <products type="current">
        <item sku="000001">Long sword</item>
        <tags>attack,metal</tags>
    </products>
    <products type="obsolete">
        <item index="000002">Solar cream</item>
    </products>
    <products type="current">
        <item sku="000005">Chain mail</item>
        <tags>defense,metal</tags>
    </products>
    <item sku="000014" type="current">Out of structure</item>
</catalogue>

and the external import configuration like:

$GLOBALS['TCA']['tx_externalimporttest_product']['columns']['tags']['external'] = [
  'base' => [
      'xpath' => './self::*[@type="current"]/tags',
      'transformations' => [
           10 => [
                'mapping' => [
                     'table' => 'tx_externalimporttest_tag',
                     'referenceField' => 'code',
                     'multipleValuesSeparator' => ','
                ]
           ]
      ]
  ]
];

The values in the <tags> nodes will be split on the comma and each will be matched to a tag from "tx_externalimporttest_tag" table, using the "code" field for matching.

This example is taken from the "externalimport_test" extension.

Child records configuration 

The "children" property is used to create nested structures, generally MM tables where additional information needs to be stored.

See the Mapping data chapter for an overview of import scenarios which may help understand this feature.

Example:

$GLOBALS['TCA']['tx_externalimporttest_product']['columns']['pictures']['external'] = [
   'base' => [
        'field' => 'Pictures', // remote db field
        'transformations' => [
            10 => [
                'userFunction' => [
                    'class' => \Cobweb\ExternalImport\Transformation\ImageTransformation::class,
                    'method' => 'saveImageFromUri',
                    'parameters' => [
                        'storage' => '1:importedpictures', // local folder for files
                    ]
                ]
            ]
        ],
        'children' => [
            'table' => 'sys_file_reference',
            'columns' => [
                'uid_local' => [
                    'field' => 'pictures'
                ],
                'uid_foreign' => [
                    'field' => '__parent.id__'
                ],
                'title' => [
                    'field' => 'picture_title'
                ],
                'tablenames' => [
                    'value' => 'tx_externalimporttest_product'
                ],
                'fieldname' => [
                    'value' => 'pictures'
                ],
                'table_local' => [
                    'value' => 'sys_file'
                ]
            ],
            'sorting' => [
                'source' => 'picture_order',
                'target' => 'sorting_foreign'
            ],
            'controlColumnsForUpdate' => 'uid_local, uid_foreign, tablenames, fieldname, table_local',
            'controlColumnsForDelete' => 'uid_foreign, tablenames, fieldname, table_local'
        ]
       ...
   ]
]

Properties 

Property Data type Step/Scope
columns array Store data
controlColumnsForUpdate string Store data
controlColumnsForDelete string Store data
disabledOperations string Store data
sorting array Store data
table string Store data

table 

Type
string
Description
Name of the nested table. This information is mandatory.
Scope
Store data

columns 

Type
array
Description

List of columns (database fields) needed for the nested table. This is an associative array, using the column name as the key. Then each column must have one of two properties:

value
This is a simple value that will be used for each entry into the nested table. Use it for invariants like the "tablenames" field of a MM table.
field

This is the name of a field that is available in the imported data. The value is copied from the current record. Note that such fields can be any of the mapped columns, any of the additionalFields or any of the substructureFields.

The special value __parent.id__ refers to the primary key of the current record and will typically be used for "uid_local" or "uid_foreign" fields in MM tables, depending on how the relation is built.

Scope
Store data

controlColumnsForUpdate 

Type
string
Description

Comma-separated list of columns that need to be used for checking if a child record already exists. All these columns must exist in the list of columns defined above. Defining this property ensures that existing relations are updated instead of being created anew.

This list should contain all columns that are significant for identifying a child record without ambiguity. In the example above, we have:

'controlColumnsForUpdate' => 'uid_local, uid_foreign, tablenames, fieldname, table_local',

These are all the columns that need to be queried in the "sys_file_reference" table to be sure that we are targeting the right record in the database. Any missing information might mean retrieving another record (for a different table or field, for example).

Scope
Store data

controlColumnsForDelete 

Type
string
Description

This is similar to controlColumnsForUpdate, but for finding out which existing relations are no longer relevant and need to be deleted. It is not the same list of fields, since you need to leave out the field which references the relation on the "other side". In the case of "sys_file_reference", you would leave out "uid_local", which is the reference to the "sys_file" table.

Scope
Store data

sorting 

Type
array
Description

External Import stores child records in the order in which they appear, which is generally the order in which they come in the external data source. It may be necessary to sort the child records differently, according to some other data available in the external source.

This property allows this. It is defined by two elements:

source

The name of the column containing the sorting value in the external data source. This column should ideally contain numerical values. If that is not the case, the values are cast to integer when they are used, so you need to make sure that the values contained in this column can be cast safely.

If the sorting value is missing for some records, a value of 0 will be used instead, putting those child records at the top of the list.

target
The name of the sorting field in the child record table.

Both elements are mandatory. Configuration validation will fail otherwise.

'sorting' => [
    'source' => 'picture_order',
    'target' => 'sorting_foreign'
],
Scope
Store data

disabledOperations 

Type
string
Description

Comma-separated list of operations which should not take place. This can be "insert" (no new child records), "update" (no update to existing child records) and/or "delete" (no removal of existing child records).
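For example, to prevent existing child records from ever being updated or removed, one might use:

```php
'disabledOperations' => 'update,delete',
```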

Scope
Store data

Array Path configuration 

Introduction 

The "arrayPath" property, which can apply to both the general configuration and the columns configuration has several options which can make it tricky to use once you try more complicated scenarios. Thus this dedicated chapter.

This property is like a path pointing to a specific part of a multidimensional array. The different parts of the path are separated by a marker, itself defined by the arrayPathSeparator property. If "arrayPathSeparator" is not set, the separator defaults to /.

Examples 

As a simple example, consider the following structure to import:

[
   'name' => 'Zaphod Beeblebrox',
   'book' => [
      'title' => 'Hitchiker\'s Guide to the Galaxy'
   ]
]

To import the title of the book (and not the book itself), use the following configuration:

[
   'arrayPath' => 'book/title'
]

If, for some reason, you needed a different separator, you could use something like:

[
   'arrayPath' => 'book#title',
   'arrayPathSeparator' => '#'
]

It is perfectly okay to use numerical indices in the path. With this structure:

[
   'series' => 'Hitchiker\'s Guide to the Galaxy',
   'books' => [
      'The Hitchiker\'s Guide to the Galaxy',
      'The Restaurant at the End of the Universe',
      'So long, and thanks for all the Fish'
      // etc.
   ]
]

and this configuration:

[
   'arrayPath' => 'books/0'
]

The result will be "The Hitchiker's Guide to the Galaxy". It is always the first element inside "books" that will be selected.

Conditions 

Conditions can be applied to each segment of the path using the Symfony Expression Language syntax, wrapped in curly braces. If the value being tested is an array, its items can be accessed directly in the expression. If the value is a simple type, it can be accessed in the expression with the key value.

See the Symfony documentation for reference on the Symfony Expression Language syntax.

Examples 

With the following data to import:

[
   'name' => 'Zaphod Beeblebrox',
   'book' => [
      'state' => 'new',
      'title' => 'Hitchiker\'s Guide to the Galaxy'
   ]
]

let's imagine two scenarios. First, we want to get the name of the character, but only if it's "Zaphod Beeblebrox". The configuration would be:

[
   'arrayPath' => 'name{value === \'Zaphod Beeblebrox\'}'
]

When the name is indeed "Zaphod Beeblebrox", the result will be "Zaphod Beeblebrox" too. When the name is anything else, the result will be null.

A second scenario is to take the title of the book, only if the book is new. That would be achieved with a configuration like:

[
   'arrayPath' => 'book{state === \'new\'}/title'
]

With the above data, the result will be "Hitchiker's Guide to the Galaxy", but for a book whose state is "used", the result would be null.

Such usage of conditions may seem a bit far-fetched at first, but can be quite interesting when combined (at a later stage in the import process) with the isEmpty property. However, conditions are much more interesting for looping on substructures and filtering them, as described next.

Looping and filtering 

The special segment * can be included in the path. It indicates that all values selected up to that point should be looped over, with the condition following the * applied to each of them (the * without a condition is useful for looping over an array with numerical indices). This effectively filters the currently selected elements. Further segments in the path are applied only to the resulting set.

The special segment * can be followed by the special segment ., which changes the way the selected elements are handled. This is best explained with examples.

Examples 

Let's consider the following structure to import:

[
    'test' => [
        'data' => [
            0 => [
                'status' => 'valid',
                'list' => [
                    0 => 'me',
                    1 => 'you'
                ]
            ],
            1 => [
                'status' => 'invalid',
                'list' => [
                    4 => 'we'
                ]
            ],
            2 => [
                'status' => 'valid',
                'list' => [
                    3 => 'them'
                ]
            ]
        ]
    ]
]

And let's say that we want to have all the items that are inside the "list" key, but only when the "status" is "valid". We would use the following configuration:

[
   'arrayPath' => 'test/data/*{status === \'valid\'}/list'
]

which would result in:

[
    0 => 'me',
    1 => 'you',
    2 => 'them'
]

This may not seem very intuitive at first. This is because this feature was designed to mimic what you might get from an XML structure with an XPath query. Consider the following structure:

<books>
   <book>
      <title>Foo</title>
      <authors>
         <author>A</author>
         <author>B</author>
      </authors>
   </book>
   <book>
      <title>Bar</title>
      <authors>
         <author>C</author>
      </authors>
   </book>
</books>

With an XPath like //author, you would get values "A", "B" and "C" in a single list, no matter what context surrounds them.

If you need to preserve the structure of the matched elements, you can add the special segment . after the * segment. This preserves the matched structure, to which you can apply further path segments. The above example would be modified like this:

[
   'arrayPath' => 'test/data/*{status === \'valid\'}/./list'
]

which changes the result to:

[
    0 => [
        0 => 'me',
        1 => 'you'
    ],
    1 => [
        3 => 'them'
    ]
]

If we change the structure to import to this:

[
    'test' => [
        'data' => [
            0 => [
                'status' => 'invalid',
                'list' => [
                    0 => 'me',
                    1 => 'you'
                ]
            ],
            1 => [
                'status' => 'invalid',
                'list' => [
                    4 => 'we'
                ]
            ],
            2 => [
                'status' => 'valid',
                'list' => [
                    3 => 'them'
                ]
            ]
        ]
    ]
]

making the first entry also "invalid" and using the same first condition:

[
   'arrayPath' => 'test/data/*{status === \'valid\'}/list'
]

we will have a single result:

[
    0 => 'them'
]

When we know that we have such a scenario, it may be more convenient to get the actual value as a result (i.e. "them") rather than a single-entry array. This is where the arrayPathFlatten property can be used. Modifying the configuration to:

[
   'arrayPath' => 'test/data/*{status === \'valid\'}/list',
   'arrayPathFlatten' => true
]

changes the result to simply:

'them'

Log cleanup 

The log table can be cleaned up automatically using the Table garbage collection Scheduler task.

A new entry for that task can be created with the following options:

  • Table to clean up: tx_externalimport_domain_model_log
  • Delete entries older than given number of days: 30 (default)

A pre-configuration exists in the ext_localconf.php file with a configuration of 180 days.

If you run a lot of imports, make sure that this table is cleaned up regularly.

Available APIs 

This chapter describes the various APIs and data models existing in this extension and which might be of use to developers.

Import API 

As mentioned earlier, External Import can be used from within another piece of code, just passing it data and benefiting from its mapping, transformation and storing features.

It is very simple to use this feature. You just need to assemble data in a format that External Import can understand (XML structure or PHP array) and call the appropriate method. All you need is an instance of class \Cobweb\ExternalImport\Importer and a single call.

$importer = \TYPO3\CMS\Core\Utility\GeneralUtility::makeInstance(\Cobweb\ExternalImport\Importer::class);
$messages = $importer->import($table, $index, $rawData);

The call parameters are as follows:

Name Type Description
$table string Name of the table to store the data into.
$index integer Index of the relevant external configuration.
$rawData mixed The data to store, either as XML (string) or PHP array.

The result is a multidimensional array of messages. The first dimension is a status and corresponds to the \TYPO3\CMS\Core\Messaging\AbstractMessage::ERROR, \TYPO3\CMS\Core\Messaging\AbstractMessage::WARNING and \TYPO3\CMS\Core\Messaging\AbstractMessage::OK constants. The second dimension is a list of messages. Your code should handle these messages as needed.
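As a minimal sketch of handling that result (using the same constants), error messages could be extracted like this:

```php
use TYPO3\CMS\Core\Messaging\AbstractMessage;

$messages = $importer->import($table, $index, $rawData);
foreach ($messages[AbstractMessage::ERROR] ?? [] as $errorMessage) {
    // Handle each error as needed, e.g. pass it to a logger
    // or display it to the user
}
```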

Data Model 

The data that goes through the import process is encapsulated in the \Cobweb\ExternalImport\Domain\Model\Data class. This class contains two member variables:

rawData
The data as it is read from the external source or as it is passed to the import API. Given the current capacities of External Import, this may be either a string representing an XML structure or a PHP array.
extraData

An array available for anyone to write into and read from. This is some kind of storage space where any type of data can be stored and passed from step to step.

On top of the usual getter and setter, use addExtraData($key, $data) to add some data to this array using the defined array key.

records
The data as structured by External Import, step after step.
downloadable
Indicates whether the records variable contains data that is appropriate for downloading as CSV. The download feature is available in the preview mode of the backend module.

There are getters and setters for each of these.

Configuration Model 

Whenever an import is run, the corresponding TCA configuration is loaded into an instance of the \Cobweb\ExternalImport\Domain\Model\Configuration class. The main member variables are:

table
The name of the table for which data is being imported.
index
The index of the configuration being used.
generalConfiguration
The general part of the External Import TCA configuration.
columnConfiguration
The columns configuration part of the External Import TCA configuration.
additionalFields
Array containing the list of additional fields. This should be considered a runtime cache for an often requested property.
countAdditionalFields
Number of additional fields. This is also a runtime cache.
steps
List of steps the process will go through. When the External Import configuration is loaded, the list of steps is established, based on the type of import (synchronized or via the API) and any custom steps. This ensures that custom steps are handled in a single place.
connector
The Configuration object also contains a reference to the Connector service used to read the external data, if any.

There are getters and setters for each of these.

Furthermore, the setExcludedFromSavingFlagForColumn() method makes it possible to programmatically exclude (or re-include) a field from being saved to the database. By default, all additional fields are excluded. Using this method should not be necessary in most normal usage scenarios.

The Importer class 

Beyond the import() method mentioned above, the \Cobweb\ExternalImport\Importer class also makes a number of internal elements available via getters:

getExtensionConfiguration
Get an array with the unserialized extension configuration.
getExternalConfiguration
Get the current instance of the Configuration model.
setContext/getContext

Define or retrieve the execution context. This is mostly informative and is used to set a context for the log entries. Expected values are "manual", "cli", "scheduler" and "api". Any other value can be set, but will not be interpreted by the External Import extension. In the Log module, such values will be displayed as "Other".

setCallType/getCallType
Define or retrieve the call type. This is based on the \Cobweb\ExternalImport\Enum\CallType enumeration. It is normally set by External Import itself, but can be set from the outside, especially when using External Import as an API (in which case, the call type should be set to \Cobweb\ExternalImport\Enum\CallType::Api).
setDebug/getDebug
Define or retrieve the debug flag. This makes it possible to programmatically turn debugging on or off.
setVerbose/getVerbose
Define or retrieve the verbosity flag. This is currently used only by the command-line utility for debugging output.

and a few more which are not as significant; anyone interested can explore them straight in the source code.

For reporting, the \Cobweb\ExternalImport\Importer class also provides the addMessage() method which takes as arguments a message and a severity (using the constants of the \TYPO3\CMS\Core\Messaging\AbstractMessage class).

The call context 

External Import may be called in various contexts (command line, Scheduler task, manual call in the backend or API call). While the code tries to be as generic as possible, it is possible to hit some limits in some circumstances. The "call context" classes have been designed for such situations.

A call context class must inherit from \Cobweb\ExternalImport\Context\AbstractCallContext and implement the necessary methods. There is currently a single method called outputDebug() which is supposed to display some debug output. Currently a specific call context exists only for the command line and makes it possible to display debugging information in the Symfony console.

The reporting utility 

The \Cobweb\ExternalImport\Utility\ReportingUtility class is in charge of giving feedback in various contexts, like sending an email once a synchronization is finished.

It provides a generic API for storing values from Step classes that could make sense in terms of reporting. Currently this is used only by the \Cobweb\ExternalImport\Step\StoreDataStep class which reports on the number of operations performed (inserts, updates, deletes and moves).

User functions 

The external import extension can call user functions for any field where external data is imported. Some sample functions are provided in Classes/Transformation/DateTimeTransformation.php and Classes/Transformation/ImageTransformation.php.

Basically, the function receives three parameters:

Name Type Description
$record array The complete record being handled. This makes it possible to refer to other fields of the same record during the transformation, if needed.
$index string The key of the field to transform. Modifying other fields in the record is not possible since the record is passed by value and not by reference. Only the field corresponding to this key should be transformed and returned.
$parameters array Additional parameters passed to the function. This will be very specific to each function and can even be completely omitted. External import will pass an empty array to the user function if the "parameters" property is not defined.

The function is expected to return only the value of the transformed field.

The class containing the user function may implement the \Cobweb\ExternalImport\ImporterAwareInterface (using the \Cobweb\ExternalImport\ImporterAwareTrait or not). In such a case, it will have access to the Importer instance simply by using $this->getImporter(). In particular, this makes it possible for user functions to check if the current run is operating in preview mode or in debug mode.

The function may throw the special exception \Cobweb\ExternalImport\Exception\CriticalFailureException. This will cause the "Transform Data" step to abort. More details in the chapter about critical exceptions.

The function may also throw the special exception \Cobweb\ExternalImport\Exception\InvalidRecordException. The related record will be removed from the imported dataset.

The function may throw any other kind of exception if the transformation it is supposed to apply to the value it receives fails. This will trigger the removal of this value from the imported dataset, thus preventing it from being further processed and eventually saved to the database.
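Putting the above together, here is a minimal sketch of a transformation user function (the namespace, class and method names are hypothetical; only the signature and exception behaviour come from the descriptions above):

```php
<?php

declare(strict_types=1);

namespace MyVendor\MyExtension\Transformation;

use Cobweb\ExternalImport\Exception\InvalidRecordException;
use Cobweb\ExternalImport\ImporterAwareInterface;
use Cobweb\ExternalImport\ImporterAwareTrait;

class UppercaseTransformation implements ImporterAwareInterface
{
    use ImporterAwareTrait;

    /**
     * Returns the value of the field designated by $index, in uppercase.
     */
    public function transform(array $record, string $index, array $parameters): string
    {
        $value = (string)($record[$index] ?? '');
        if ($value === '') {
            // Drop the whole record from the imported dataset
            throw new InvalidRecordException('Empty value for field ' . $index);
        }
        return mb_strtoupper($value);
    }
}
```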

Events 

Interrupting the process: critical exceptions 

One exception class plays a particular role: \Cobweb\ExternalImport\Exception\CriticalFailureException. It can be thrown from within a user function or an event and will cause the import process to abort.

The reason for this exception is to react to some critical issue that may happen during the call to a user function or inside an event listener and which affects the whole import process. For example, if you are transforming a date and a single record has an invalid date, you probably don't want to interrupt the whole process for this. You want to record the issue in some way, but not pull the hand brake. On the other hand, say that you are saving some files and the target file storage is not available: you will probably want to stop the process before every record is saved with its related files.

Such exception thrown from within any user function will cause the "Transform Data" step to abort. When thrown from within an event listener it may abort the "Transform Data", the "Handle Data", the "Validate Data" or the "Store Data" steps. For the latter, however, note that data may have already been saved depending on which event listener it is thrown from. Refer to the chapter about events for more details.

Make sure to include a helpful error message when throwing this exception.

Custom process steps 

Besides all the events, it is also possible to register custom process steps. How to register a custom step is covered in the Administration chapter. This section describes what a custom step can or should do and what resources are available from within a custom step class.

Parent class 

A custom step class must inherit from the abstract class \Cobweb\ExternalImport\Step\AbstractStep. If it does not, the step will be ignored during import. The parent class makes a lot of features available, some of which are described below.

If you want to use Dependency Injection in your custom step class, just remember to declare it as being public in your service configuration file.

Available resources 

A custom step class has access to the following member variables:

data
Instance of the object model encapsulating the data being processed ( \Cobweb\ExternalImport\Domain\Model\Data).
importer
Back-reference to the current instance of the \Cobweb\ExternalImport\Importer class.
parameters
Array of parameters declared in the configuration of the custom step.

See the API chapter for more information about these classes.

Furthermore, the custom step class can access a member variable called abortFlag. Setting this variable to true will cause the import process to be aborted after the custom step. Any such interruption is logged by the \Cobweb\ExternalImport\Importer class, albeit without any detail. If you feel the need to report on the reason for the interruption, do so from within the custom step class:

$this->getImporter()->addMessage(
     'Your message here...',
     FlashMessage::WARNING // or whatever error level
);

It is also possible to mark a custom step so that it is executed even if the process was aborted by a previous step. This is done by setting the executeDespiteAbort member variable to true in the constructor, using its setter:

public function __construct() {
    $this->setExecuteDespiteAbort(true);
}

In general, use the getters and setters to access the member variables.

Custom step basics 

A custom step class must implement the run() method. This method receives no arguments and returns nothing. All interactions with the process happen via the member variables described above and their API.

The main reason to introduce a custom step is to manipulate the data being processed. To read the data, use:

// Read the raw data or...
$rawData = $this->getData()->getRawData();
// Read the processed data
$records = $this->getData()->getRecords();

If you manipulate the data, you need to store it explicitly:

// Store the raw data or...
$this->getData()->setRawData($rawData);
// Store the processed data
$this->getData()->setRecords($records);

Another typical usage would be to interrupt the process entirely by setting the abortFlag variable to true, as mentioned above.

The rich API that is available makes it possible to do many things beyond these. For example, one could imagine changing the External Import configuration on the fly.

In general, the existing Step classes provide many examples of API usage and should help when creating a custom process step.

Preview mode 

It is very important that your custom step respects the preview mode. This has two implications:

  1. If relevant, you should return some preview data. For example, the TransformDataStep class returns the import data once transformations have been applied to it, the StoreDataStep class returns the TCE structure, and so on. There's an API for returning preview data:

    $this->getImporter()->setPreviewData(...);

    The preview data can be of any type.

  2. Most importantly, you must respect the preview mode and not make any persistent changes, like saving stuff to the database. Use the API to know whether preview mode is on or not:

    $this->getImporter()->isPreview();
  3. Indicate that the records of the Data object are downloadable if it makes sense (see the Data model API). This is done by overriding the hasDownloadableData() method of the \Cobweb\ExternalImport\Step\AbstractStep class to return true.

Example 

Finally here is a short example of a custom step class. Note how the API is used to retrieve the list of records (processed data), which is looped over and then saved again to the Data object.

In this example, the "name" field of every record is used to filter acceptable entries.

<?php

declare(strict_types=1);

namespace Cobweb\ExternalimportTest\Step;

use Cobweb\ExternalImport\Step\AbstractStep;

/**
 * Class demonstrating how to use custom steps for external import.
 *
 * @package Cobweb\ExternalimportTest\Step
 */
class TagsPreprocessorStep extends AbstractStep
{

    /**
     * Filters out some records from the raw data for the tags table.
     *
     * Any name containing an asterisk is considered censored and thus removed.
     */
    public function run(): void
    {
        $records = $this->getData()->getRecords();
        foreach ($records as $index => $record) {
            if (strpos($record['name'], '*') !== false) {
                unset($records[$index]);
            }
        }
        $records = array_values($records);
        $this->getData()->setRecords($records);
        $this->getData()->setDownloadable(true);
        // Set the filtered records as preview data
        $this->importer->setPreviewData($records);
    }

    /**
     * Define the data as being downloadable
     *
     * @return bool
     */
    public function hasDownloadableData(): bool
    {
        return true;
    }
}

Custom data handlers 

It is possible to use a custom data handler instead of the standard \Cobweb\ExternalImport\Importer::handleArray() and \Cobweb\ExternalImport\Importer::handleXML(). The value declared as a custom data handler is a class name:

$GLOBALS['TCA']['some_table']['external']['general'][0]['data'] = Foo\MyExtension\DataHandler\CustomDataHandler::class;

The class itself must implement the \Cobweb\ExternalImport\DataHandlerInterface interface, which contains only the handleData() method. This method will receive two arguments:

  • an array containing the raw data returned by the connector service
  • a reference to the calling \Cobweb\ExternalImport\Importer object

The method is expected to return a simple PHP array, with indexed entries, like the standard methods (\Cobweb\ExternalImport\Importer::handleArray() and \Cobweb\ExternalImport\Importer::handleXML()).
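A minimal sketch of such a class could look like this (the namespace, class name and mapping logic are hypothetical; the signature is assumed from the description of the two arguments above):

```php
<?php

declare(strict_types=1);

namespace MyVendor\MyExtension\DataHandler;

use Cobweb\ExternalImport\DataHandlerInterface;
use Cobweb\ExternalImport\Importer;

class CustomDataHandler implements DataHandlerInterface
{
    /**
     * Receives the raw data from the connector service and returns
     * a simple PHP array with indexed entries.
     */
    public function handleData(array $rawData, Importer $importer): array
    {
        $records = [];
        foreach ($rawData as $entry) {
            // Map each external entry to a flat associative array as needed
            $records[] = $entry;
        }
        return $records;
    }
}
```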

Dynamic TCA loading 

Retrieval of the TCA global array is encapsulated in a class called \Cobweb\ExternalImport\Domain\Repository\TcaDirectAccessRepository which implements the \Cobweb\ExternalImport\Domain\Repository\TcaRepositoryInterface interface. This system pursues three aims:

  1. encapsulating the retrieval of the TCA to simplify keeping up with evolutions in the TYPO3 Core (like the introduction of the TCA Schema in TYPO3 13).
  2. abstracting into a base class ( \Cobweb\ExternalImport\Domain\Repository\AbstractTcaRepository) all the logic for retrieving all External Import-related configuration from the TCA.
  3. allowing developers to perform dynamic manipulations on the TCA by providing their own TCA repository class through dependency injection. This is detailed below.

Custom TCA repository 

Although an event exists for manipulating a single import configuration, it is not unusual to have repetitive import configurations, sometimes implying a dynamic modification of the TCA. For such special cases, it may be useful to provide your own custom implementation of a TCA repository.

The recommended way is to extend the abstract class \Cobweb\ExternalImport\Domain\Repository\AbstractTcaRepository which implements all the methods related to extracting the External Import configurations from the TCA. The only method to implement is getTca(), where you can perform any processing you need. Then simply declare your repository as a service replacing \Cobweb\ExternalImport\Domain\Repository\TcaDirectAccessRepository, by placing in your extension's Services.yaml file the following:

services:
  _defaults:
    autowire: true
    autoconfigure: true
    public: false

  Vendor\ExtName\Import\DynamicTcaRepository:
    decorates: Cobweb\ExternalImport\Domain\Repository\TcaRepositoryInterface
    public: true

Upgrading instructions for older versions 

Upgrade to 6.3.0 

External Import now supports Connector services registered with the new system introduced by extension "svconnector" version 5.0.0, while staying compatible with older versions.

Another small new feature is the possibility to define a storage pid for the imported data on the command line or when creating a Scheduler task, which overrides storage information that might be found in the TCA or in the extension configuration.

Upgrade to 6.2.0 

The Substructure Preprocess event is now fired for both array-type and XML-type data (previously, only for array-type data). To know which type of data is being handled, a new getDataType() method is available. The structure returned after modification (by calling setStructure()) must be either an array or a \DOMNodeList, as opposed to just an array in older versions. Existing event listeners may need to be adapted.

Upgrade to 6.1.0 

Records which have no external key set (the value referenced by the referenceUid property) are now skipped during import. Indeed, it makes no sense to import records without such keys, as they can never be updated and, if several are created in a single import run, they will overwrite each other. Still, this is a change of behaviour and should be noted.

Upgrade to 6.0.0 

All properties that were deprecated in version 5.0.0 were removed and the backwards-compatibility layer was dropped. Please refer to the 5.0.0 upgrade instructions and check if you have applied all changes.

All hooks were marked as deprecated. They will be removed in version 7.0.0. You should migrate your code to use either custom process steps or the newly introduced PSR-14 events. See the hooks chapter for information about how to migrate each hook.

External Import is now configured for using the standard (Symfony) dependency injection mechanism. This means it is not necessary to instantiate the \Cobweb\ExternalImport\Importer class using Extbase's \TYPO3\CMS\Extbase\Object\ObjectManager anymore when using the Importer as an API.

The PHP code was cleaned up as much as possible and strict typing was declared in every class file. This may break your custom code if you were calling public methods without properly casting arguments.

Upgrade to 5.1.0 

There is a single change in version 5.1.0 that may affect existing imports: when a user function fails to handle the value it was supposed to transform (by throwing an exception), that value is now removed from the imported dataset. Before that it was left unchanged.

Upgrade to 5.0.0 

There are many changes in version 5.0.0, but backwards-compatibility has been provided for all of them (except the minor breaking change mentioned below). Please make sure to update your configuration as soon as possible, as backwards-compatibility will be dropped in version 5.1.0. Messages for deprecated configuration appear in the backend module when viewing the details of a configuration.

Changes 

The general configuration must now be placed in $GLOBALS['TCA'][table-name]['external']['general'] instead of $GLOBALS['TCA'][table-name]['ctrl']['external'].
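In terms of code, the move looks like this (the table name is a placeholder):

```php
// Before 5.0.0 (deprecated):
$GLOBALS['TCA']['some_table']['ctrl']['external'][0] = [ /* ... */ ];

// Since 5.0.0:
$GLOBALS['TCA']['some_table']['external']['general'][0] = [ /* ... */ ];
```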

The "additionalFields" property from the general configuration (and not from the "MM" property) has been moved to its own configuration space. Rather than $GLOBALS['TCA'][table-name]['ctrl']['external'][some-index]['additionalFields] it is now $GLOBALS['TCA'][table-name]['external']['additionalFields'][some-index]. Furthermore, it is no longer a simple comma-separated list of fields, but an array structure with all the same options as standard column configurations. For more details, see the relevant chapter.

The "MM" property is deprecated. It should not be used anymore. Instead the new multipleRows or children properties should be used according to your import scenario.

The "userFunc" property of the transformations configuration has been renamed to userFunction and its sub-property "params" has been renamed "parameters".

If both "insert" and "update" operations are disabled in the general configuration (using the disabledOperations property), External Import will now delete records that were not marked for update (even if the actual update does not take place). Previously, no records would have been deleted, because the entire matching of existing records was skipped.

Accessing the external configuration inside a custom step with $this->configuration or $this->getConfiguration() is deprecated. Use $this->getImporter()->getExternalConfiguration() instead.

The "scheduler" system extension is required instead of just being suggested.

New stuff 

It is possible to import nested structures using the children property. For example, you can now import data into some table and its images all in one go by creating a nested structure for the "sys_file_reference" table.

The multipleRows and multipleSorting properties allow for a much clearer handling of denormalized external sources.

Check out the revamped Mapping data chapter which should hopefully help you get a better picture of what is possible with External Import and how different properties (especially the new ones) can be combined.

Custom steps can now receive an array of arbitrary parameters.

Breaking changes 

The \Cobweb\ExternalImport\Step\StoreDataStep class puts the list of stored records into the "records" member variable of the \Cobweb\ExternalImport\Domain\Model\Data object. This used to be a simple list of records for the imported table. Since child tables are now supported, the structure has changed so that there's now a list of records for each table that was imported. The table name is the key in the first dimension of the array. If you were relying on this data in a custom step, you will need to update your code as no backward-compatibility was provided for this change.
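In a custom step running after \Cobweb\ExternalImport\Step\StoreDataStep, the difference is roughly the following (assuming a getRecords() accessor on the Data model):

```php
$records = $this->getData()->getRecords();

// Up to version 4.x: a simple list of records for the imported table
// e.g. [0 => [...], 1 => [...]]

// From version 5.0.0: one list of records per imported table,
// keyed by table name in the first dimension
// e.g. [
//     'tx_example_domain_model_item' => [0 => [...], 1 => [...]],
//     'sys_file_reference' => [0 => [...]]
// ]
```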

Upgrade to 4.1.0 

Version 4.1.0 introduces one breaking change: there are now custom permissions for backend users regarding usage of the backend module. On top of table-related permissions, users must be given explicit rights (via the user groups they belong to) to perform synchronizations or define Scheduler tasks. See the User rights chapter for more information.

Upgrade to 4.0.0 

Importer API changes 

The External Import configuration is now fully centralized in a \Cobweb\ExternalImport\Domain\Model\Configuration object. Every time you need some aspect of the configuration, you should get it via the instance of this class rather than through any other means. The most common use case was getting the name of the current table and index from the \Cobweb\ExternalImport\Importer class, using Importer::getTableName() and Importer::getIndex(). These methods are deprecated and should not be used anymore. Use instead:

$table = $importer->getExternalConfiguration()->getTable();
$index = $importer->getExternalConfiguration()->getIndex();

The Importer::synchronizeData() method was renamed to Importer::synchronize() and the Importer::importData() method was renamed to Importer::import(). The old methods were kept, but are deprecated.

The Importer::synchronizeAllTables() method should not be used anymore as it does not allow for a satisfying reporting. Instead a loop should be done on all configurations and Importer::synchronize() called inside the loop. See for example \Cobweb\ExternalImport\Command\ImportCommand::execute().
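Such a loop could be sketched as follows, modelled on ImportCommand::execute(). The ConfigurationRepository API used here (getOrderedConfigurations() and the shape of its return value) is an assumption; check the actual command class before copying this:

```php
use TYPO3\CMS\Core\Utility\GeneralUtility;

// Instead of $importer->synchronizeAllTables(), loop over all
// configurations and synchronize each one, so that results
// can be reported individually
$configurationRepository = GeneralUtility::makeInstance(
    \Cobweb\ExternalImport\Domain\Repository\ConfigurationRepository::class
);
$configurations = $configurationRepository->getOrderedConfigurations();
foreach ($configurations as $priority => $list) {
    foreach ($list as $configuration) {
        $messages = $importer->synchronize(
            $configuration['table'],
            $configuration['index']
        );
        // ...report the messages for this configuration...
    }
}
```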

Other deprecated methods are Importer::getColumnIndex() and Importer::getExternalConfig().

The Importer::getExistingUids() method was moved to a new class called \Cobweb\ExternalImport\Domain\Repository\UidRepository (which is a Singleton).

Transformation properties 

All column properties that are related to the "Transform data" scope have been grouped into a new property called transformations. This is an ordered array, which makes it possible to use transformation properties several times on the same field (e.g. calling several user functions) and to do that in a precise order. As an example, usage of such properties should be changed from:

$GLOBALS['TCA']['fe_users']['columns']['starttime']['external'] = [
      0 => [
            'field' => 'start_date',
            'trim' => true,
            'userFunc' => [
                  'class' => \Cobweb\ExternalImport\Task\DateTimeTransformation::class,
                  'method' => 'parseDate'
            ]
      ]
];

to:

$GLOBALS['TCA']['fe_users']['columns']['starttime']['external'] = [
      0 => [
            'field' => 'start_date',
            'transformations' => [
                  10 => [
                        'trim' => true
                  ],
                  20 => [
                        'userFunc' => [
                              'class' => \Cobweb\ExternalImport\Task\DateTimeTransformation::class,
                              'method' => 'parseDate'
                        ]
                  ]
            ]
      ]
];

If you want to preserve "old-style" order, the transformation properties were called in the following order up to version 3.0.x: "trim", "mapping", "value", "rteEnabled" and "userFunc". Also note that "value" was ignored if "mapping" was also defined. Now both will be taken into account if both exist (although that sounds rather like a configuration mistake).
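That legacy order can be reproduced explicitly by choosing the indices of the transformations array accordingly (all values below are hypothetical; in practice you would list only the properties you actually need):

```php
// Indices mirror the pre-3.0 evaluation order:
// trim, mapping, value, rteEnabled, userFunc
'transformations' => [
    10 => ['trim' => true],
    20 => ['mapping' => [/* ... */]],
    30 => ['value' => 42],
    40 => ['rteEnabled' => true],
    50 => ['userFunc' => [/* ... */]]
]
```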

A compatibility layer ensures that old-style transformation properties are preserved, but this is a temporary convenience, which will be removed in the next version. So please upgrade your configurations.

Renamed properties 

To continue the move towards unified naming conventions for properties started in version 3.0, the mapping and MM properties which had underscores in their names were renamed to lowerCamelCase.

The old properties are interpreted for backwards-compatibility, but this will be dropped in the next major version. The backend module will show you the deprecated properties.

Breaking changes 

While all hooks were preserved as is, in the sense that they still receive a back-reference to the \Cobweb\ExternalImport\Importer object, the processParameters hook was modified due to its particular usage (it is called in the backend module, so that processed parameters can be viewed when checking the configuration). It now receives a reference to the \Cobweb\ExternalImport\Domain\Model\Configuration object and not to the \Cobweb\ExternalImport\Importer object anymore. Please update your hooks accordingly.

Upgrade to 3.0.0 

The "excludedOperations" column configuration, which was deprecated since version 2.0.0, was entirely removed. The same goes for the "mappings.uid_foreign" configuration.

More importantly the Scheduler task was renamed from tx_externalimport_autosync_scheduler_Task to \Cobweb\ExternalImport\Task\AutomatedSyncTask. As such, existing Scheduler tasks need to be updated. An upgrade wizard is provided in the Install Tool. It will automatically migrate existing old tasks.

The update wizard shows that there are tasks to update

If there are no tasks to migrate, the External Import wizard will simply not show up. Otherwise just click on the "Execute" button and follow the instructions.

Several general TCA configuration properties were renamed, to respect a global lowerCamelCase naming convention. This is the list of properties and how they were renamed:

  • additional_fields => additionalFields
  • reference_uid => referenceUid
  • where_clause => whereClause
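For example (table name and values are hypothetical), a general configuration is renamed like this:

```php
// Old (underscore) names, up to version 2.x:
$GLOBALS['TCA']['tx_example_domain_model_item']['ctrl']['external'][0] = [
    'reference_uid' => 'code',
    'where_clause' => 'pid = 42'
];

// New lowerCamelCase names, from version 3.0.0:
$GLOBALS['TCA']['tx_example_domain_model_item']['ctrl']['external'][0] = [
    'referenceUid' => 'code',
    'whereClause' => 'pid = 42'
];
```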

Upgrade to 2.0.0 

The column configuration "excludedOperations" has been renamed to "disabledOperations", for consistency with the table configuration option. The "excludedOperations" property is preserved for now and logs an entry into the deprecation log. You are advised to rename this configuration if you use it, as support will be dropped at some point in the future.
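In a column configuration, the rename is simply (the value is hypothetical):

```php
// Deprecated:
'excludedOperations' => 'insert,update',

// Renamed, consistent with the table-level option:
'disabledOperations' => 'insert,update',
```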

Migrating hooks 

processParameters

(deprecated)

This allows for dynamic manipulation of the parameters array before it is passed to the connector.

Example

Let's assume that you are using the CSV connector and that you would like the filename to automatically adjust to the current year. Your parameters could be something like:

'parameters' => [
    'filename' => 'fileadmin/imports/data-%Y.csv'
]

Inside the hook, you could run strftime() on the filename parameter in order to replace "%Y" with the current year.

The hook receives the parameters array as the first argument and a reference to the current configuration object (an instance of class \Cobweb\ExternalImport\Domain\Model\Configuration) as second argument. It is expected to return the full parameters array, even if not modified.
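A hook implementation along these lines would perform the substitution described above. The class and method names are assumptions (check how your hook is registered), and date() is used here instead of the deprecated strftime():

```php
use Cobweb\ExternalImport\Domain\Model\Configuration;

class ProcessParametersHook
{
    /**
     * Replaces "%Y" in the "filename" parameter with the current year.
     *
     * @param array $parameters The connector parameters
     * @param Configuration $configuration The current import configuration
     * @return array The full (possibly modified) parameters array
     */
    public function processParameters(array $parameters, Configuration $configuration): array
    {
        if (isset($parameters['filename'])) {
            $parameters['filename'] = str_replace(
                '%Y',
                date('Y'),
                $parameters['filename']
            );
        }
        // The full parameters array must be returned, even if unmodified
        return $parameters;
    }
}
```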

preprocessRawRecordset

(deprecated)

This hook makes it possible to manipulate the data just after it was fetched from the remote source, but already transformed into a PHP array, no matter what the original format. The hook receives the full recordset and a back-reference to the calling object (an instance of class \Cobweb\ExternalImport\Importer) as parameters. It is expected to return a full recordset too.

This hook may throw the \Cobweb\ExternalImport\Exception\CriticalFailureException.

validateRawRecordset

(deprecated)

This hook is called during the data validation step. It is used to perform checks on the nearly raw data (it has only been through "preprocessRawRecordset") and to decide whether to continue the import or not. The hook receives the full recordset and a back-reference to the calling object (an instance of class \Cobweb\ExternalImport\Importer) as parameters. It is expected to return a boolean: true if the import may continue, false if it must be aborted. Note that if the minimum number of records condition is not met, the hooks are not called at all, as the import is aborted before that. If several methods are registered with the hook, the first method that returns false aborts the import; further methods are not called.
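As a sketch (the class and method names are assumptions), a validation that aborts the import when fewer than a given number of records arrive could look like:

```php
use Cobweb\ExternalImport\Importer;

class ValidateRawRecordsetHook
{
    /**
     * Aborts the import if fewer than 10 records were fetched.
     *
     * @param array $records The full (nearly raw) recordset
     * @param Importer $importer Back-reference to the calling Importer
     * @return bool true to continue the import, false to abort it
     */
    public function validateRawRecordset(array $records, Importer $importer): bool
    {
        return count($records) >= 10;
    }
}
```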

This hook may throw the \Cobweb\ExternalImport\Exception\CriticalFailureException.

preprocessRecordset

(deprecated)

Similar to "preprocessRawRecordset", but after the transformation step, so just before it is stored to the database. The hook receives the full recordset and a back-reference to the calling object (an instance of class \Cobweb\ExternalImport\Importer) as parameters. It is expected to return a full recordset too.

This hook may throw the \Cobweb\ExternalImport\Exception\CriticalFailureException.

updatePreProcess

(deprecated)

This hook can be used to modify a record just before it is updated in the database. The hook is called for each record that has to be updated. The hook receives the complete record and a back-reference to the calling object (an instance of class \Cobweb\ExternalImport\Importer) as parameters. It is expected to return the complete record.

This hook may throw the \Cobweb\ExternalImport\Exception\CriticalFailureException.

insertPreProcess

(deprecated)

Similar to the "updatePreProcess" hook, but for the insert operation.

This hook may throw the \Cobweb\ExternalImport\Exception\CriticalFailureException.

deletePreProcess

(deprecated)

This hook can be used to modify the list of records that will be deleted. As a first parameter it receives the name of the main table, as a second parameter a list of primary keys, corresponding to the records set for deletion. The third parameter is a reference to the calling object (again, an instance of class \Cobweb\ExternalImport\Importer). The method invoked is expected to return a list of primary keys too.

This hook may throw the \Cobweb\ExternalImport\Exception\CriticalFailureException. However note that the data will already have been saved.

datamapPostProcess

(deprecated)

This hook is called after all records have been updated or inserted using the TYPO3 Core Engine. It can be used for any follow-up operation. It receives as parameters the name of the affected table, the list of records keyed to their uid (including the new uids for the new records) and a back-reference to the calling object (an instance of class \Cobweb\ExternalImport\Importer). Each record contains an additional field called tx_externalimport:status which contains either "insert" or "update", depending on which operation was performed on the record.

This hook may throw the \Cobweb\ExternalImport\Exception\CriticalFailureException. However note that the data will already have been saved.

cmdmapPostProcess

(deprecated)

This hook is called after all records have been deleted using the TYPO3 Core Engine. It receives as parameters the name of the affected table, the list of uids of the deleted records and a back-reference to the calling object (an instance of class \Cobweb\ExternalImport\Importer).

This hook may throw the \Cobweb\ExternalImport\Exception\CriticalFailureException. However note that the data will already have been saved.
