User Manual

Indexing Documents

New documents may be indexed via the TYPO3 command line interface (CLI).

Index single document

The command kitodo:index is used for indexing a single document:

./vendor/bin/typo3 kitodo:index -d http://example.com/path/mets.xml -p 123 -s dlfCore1
Option Required Description Example
-d|--doc yes

This may be the UID of an existing document in tx_dlf_documents or the URL of a METS XML file. If the URL is already known as a location in tx_dlf_documents, the file is processed anyway and the records in the database and the solr index are updated.

Hint: Do not encode the URL! If the path contains spaces, enclose the URL in quotation marks.

123 or http://example.com/path/mets.xml
-p|--pid yes The page UID of the Kitodo.Presentation data folder. This folder holds all records of documents, metadata, structures, solrcores etc. 123
-s|--solr yes

This may be the UID of the solrcore record in tx_dlf_solrcores or, alternatively, the index name of the solr core.

The solr core must exist in table tx_dlf_solrcores on page “pid”. Otherwise an error is shown and the processing won’t start.

123 or ‘dlfCore1’
-o|--owner no This may be the UID of the library record in tx_dlf_libraries which should be set as the owner of the document. If omitted, the default is to try to read the ownership from the metadata field “owner”. 123
--dry-run no Nothing will be written to the database or the index. The solr setting is checked and the document's location URL is shown.
-q|--quiet no Do not output any message. Useful when using a wrapper script. The script may check the return value of the CLI job. This is always 0 on success and 1 on failure.
-v|--verbose no Show the processed document's UID and location together with the indexing parameters.
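
The wrapper-script use case mentioned for -q can be sketched as follows. This is a minimal example, not part of Kitodo.Presentation: the function name index_document and the variable TYPO3_BIN are assumptions made for illustration. It relies only on the return value described above (0 on success, 1 on failure) and quotes the URL so that a path with spaces is passed as a single argument.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: index one document quietly and report the result.
# TYPO3_BIN is an assumed variable so that the binary path can be overridden.
TYPO3_BIN="${TYPO3_BIN:-./vendor/bin/typo3}"

index_document() {
    local doc="$1" pid="$2" solr="$3"
    # "$doc" is quoted so that a URL containing spaces stays one argument.
    if "$TYPO3_BIN" kitodo:index -q -d "$doc" -p "$pid" -s "$solr"; then
        echo "indexed: $doc"
    else
        echo "failed: $doc" >&2
        return 1
    fi
}
```

A call could then look like: index_document "http://example.com/my path/mets.xml" 123 dlfCore1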

Reindex collections

The command kitodo:reindex reindexes one or more collections, or all documents on the given page:

# reindex collection with uid 1 on page 123 with solr core 'dlfCore1'
./vendor/bin/typo3 kitodo:reindex -c 1 -p 123 -s dlfCore1

# reindex collections with uid 1 and 4 on page 123 with solr core 'dlfCore1'
./vendor/bin/typo3 kitodo:reindex -c 1,4 -p 123 -s dlfCore1

# reindex all documents on page 123 with solr core 'dlfCore1'
./vendor/bin/typo3 kitodo:reindex -a -p 123 -s dlfCore1
Option Required Description Example
-a|--all no With this option, all documents from the given page are reindexed.
-c|--coll no This may be a single collection UID or a list of UIDs to reindex. 1 or 1,2,3
-p|--pid yes The page UID of the Kitodo.Presentation data folder. This folder holds all records of documents, metadata, structures, solrcores etc. 123
-s|--solr yes

This may be the UID of the solrcore record in tx_dlf_solrcores or, alternatively, the index name of the solr core.

The solr core must exist in table tx_dlf_solrcores on page “pid”. Otherwise an error is shown and the processing won’t start.

123 or ‘dlfCore1’
-o|--owner no This may be the UID of the library record in tx_dlf_libraries which should be set as the owner of the documents. If omitted, the default is to try to read the ownership from the metadata field “owner”. 123
--dry-run no Nothing will be written to the database or the index. All documents that would be processed in a real run are listed.
-q|--quiet no Do not output any message. Useful when using a wrapper script. The script may check the return value of the CLI job. This is always 0 on success and 1 on failure.
-v|--verbose no Show each processed document's UID and location with a timestamp and the number of processed documents out of the total.
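
The options above can be combined into a cautious reindex run: preview with --dry-run first, then run for real. The following is a minimal sketch. The names join_uids, reindex_collections and TYPO3_BIN are hypothetical, and it assumes that the dry run exits with a non-zero status when the solr core check fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: preview a reindex run, then execute it.
# TYPO3_BIN is an assumed variable so that the binary path can be overridden.
TYPO3_BIN="${TYPO3_BIN:-./vendor/bin/typo3}"

# Join collection UIDs with commas, as expected by -c (1 4 7 becomes 1,4,7).
join_uids() {
    local IFS=,
    echo "$*"
}

# Usage: reindex_collections <pid> <solr> <collection uid>...
reindex_collections() {
    local pid="$1" solr="$2"
    shift 2
    local colls
    colls="$(join_uids "$@")"
    # Step 1: dry run, nothing is written; stop here if it fails.
    "$TYPO3_BIN" kitodo:reindex --dry-run -c "$colls" -p "$pid" -s "$solr" || return 1
    # Step 2: the real run with the same parameters.
    "$TYPO3_BIN" kitodo:reindex -c "$colls" -p "$pid" -s "$solr"
}
```

For the second example above, the call would be: reindex_collections 123 dlfCore1 1 4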

Harvest OAI-PMH interface

The command kitodo:harvest harvests an OAI-PMH interface and indexes all fetched records:

# example
./vendor/bin/typo3 kitodo:harvest --lib=<UID> --pid=<PID> --solr=<CORE> --from=<timestamp> --until=<timestamp> --set=<set>

In order to use the command, you first have to configure a library in the backend, setting at least a label and oai_base. The latter should be a valid OAI-PMH base URL (e.g. https://digital.slub-dresden.de/oai/).

Option Required Description Example
-l|--lib yes This is the UID of the library record with the OAI interface that should be harvested. This library is also automatically set as the documents’ owner. 123
-p|--pid yes This is the page UID of the library record and therefore the page the documents are added to. 123
-s|--solr yes

This may be the UID of the solrcore record in tx_dlf_solrcores or, alternatively, the index name of the solr core.

The solr core must exist in table tx_dlf_solrcores on page “pid”. Otherwise an error is shown and the processing won’t start.

123 or ‘dlfCore1’
--from no This is the start date of the harvesting period in the format YYYY-MM-DD. The parameters from and until limit harvesting to the given period, e.g. for incremental updates. 2021-01-01
--until no This is the end date of the harvesting period in the format YYYY-MM-DD. The parameters from and until limit harvesting to the given period, e.g. for incremental updates. 2021-06-30
--set no This is the name of an OAI set. The parameter limits harvesting to the given set. ‘vd18’
--dry-run no Nothing will be written to the database or the index. All documents that would be processed in a real run are listed.
-q|--quiet no Do not output any message. Useful when using a wrapper script. The script may check the return value of the CLI job. This is always 0 on success and 1 on failure.
-v|--verbose no Show each processed document's UID and location with a timestamp and the number of processed documents out of the total.
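
The incremental updates mentioned for --from and --until could be scripted roughly as follows. This is only a sketch: harvest_since and TYPO3_BIN are hypothetical names, and how the start date is remembered between runs is left to the caller. The function checks that the start date is shaped like YYYY-MM-DD and uses the current date as the end of the period.

```shell
#!/usr/bin/env bash
# Hypothetical helper: harvest all records from a start date until today.
# TYPO3_BIN is an assumed variable so that the binary path can be overridden.
TYPO3_BIN="${TYPO3_BIN:-./vendor/bin/typo3}"

# Usage: harvest_since <library uid> <pid> <solr core> <from date>
harvest_since() {
    local lib="$1" pid="$2" solr="$3" from="$4"
    # Reject anything that does not look like YYYY-MM-DD before calling the CLI.
    case "$from" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "invalid date: $from" >&2; return 2 ;;
    esac
    local until_date
    until_date="$(date +%Y-%m-%d)"
    "$TYPO3_BIN" kitodo:harvest --lib="$lib" --pid="$pid" --solr="$solr" \
        --from="$from" --until="$until_date"
}
```

A nightly job might call: harvest_since 123 123 dlfCore1 2021-01-01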