User Manual

Indexing Documents

New documents may be indexed via the TYPO3 command line interface (CLI).

Index single document

The command kitodo:index is used for indexing a single document:

./vendor/bin/typo3 kitodo:index -d http://example.com/path/mets.xml -p 123 -s dlfCore1

| Option | Required | Description | Example |
|---|---|---|---|
| -d, --doc | yes | The UID of an existing document in tx_dlf_documents or the URL of a METS XML file. If the URL is already known as a location in tx_dlf_documents, the file is processed anyway and the records in the database and the Solr index are updated. Hint: Do not encode the URL! If the path contains spaces, enclose it in quotation marks. | 123 or http://example.com/path/mets.xml |
| -p, --pid | yes | The page UID of the Kitodo.Presentation data folder. This folder holds all records of documents, metadata, structures, Solr cores, etc. | 123 |
| -s, --solr | yes | The UID of the solrcore record in tx_dlf_solrcores or, alternatively, the index name of the Solr core. The Solr core must exist in table tx_dlf_solrcores on page "pid". Otherwise an error is shown and processing does not start. | 123 or 'dlfCore1' |
| -o, --owner | no | The UID of the library record in tx_dlf_libraries which should be set as the owner of the document. If omitted, the command tries to read the ownership from the metadata field "owner". | 123 |
| --dry-run | no | Nothing is written to the database or index. The Solr setting is checked and the document's location URL is shown. | |
| -q, --quiet | no | Do not output any message. Useful in wrapper scripts, which can check the return value of the CLI job: it is always 0 on success and 1 on failure. | |
| -v, --verbose | no | Show the UID and location of the processed document together with the indexing parameters. | |
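
The return value described for -q lends itself to wrapper scripts. The following is a minimal sketch under stated assumptions, not part of Kitodo.Presentation: the function name index_quietly and the TYPO3_BIN variable are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around kitodo:index. TYPO3_BIN and index_quietly are
# illustrative names, not part of Kitodo.Presentation.
TYPO3_BIN="${TYPO3_BIN:-./vendor/bin/typo3}"

# Usage: index_quietly <uid-or-url> <pid> <solr-core>
index_quietly() {
    # "$1" is quoted so that an unencoded URL containing spaces
    # is passed to the command as a single argument.
    if "$TYPO3_BIN" kitodo:index -q -d "$1" -p "$2" -s "$3"; then
        echo "indexed: $1"
    else
        echo "indexing failed: $1" >&2
        return 1
    fi
}
```

A call might look like index_quietly "http://example.com/my path/mets.xml" 123 dlfCore1. Because -q suppresses all output of the command itself, the wrapper relies only on the exit status.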

Reindex collections

The command kitodo:reindex reindexes one or more collections, or all documents on the given page:

# reindex collection with uid 1 on page 123 with solr core 'dlfCore1'
./vendor/bin/typo3 kitodo:reindex -c 1 -p 123 -s dlfCore1

# reindex collections with uid 1 and 4 on page 123 with solr core 'dlfCore1'
./vendor/bin/typo3 kitodo:reindex -c 1,4 -p 123 -s dlfCore1

# reindex all documents on page 123 with solr core 'dlfCore1'
./vendor/bin/typo3 kitodo:reindex -a -p 123 -s dlfCore1

| Option | Required | Description | Example |
|---|---|---|---|
| -a, --all | no | With this option, all documents on the given page are reindexed. | |
| -c, --coll | no | A single collection UID or a comma-separated list of UIDs to reindex. | 1 or 1,2,3 |
| -p, --pid | yes | The page UID of the Kitodo.Presentation data folder. This folder holds all records of documents, metadata, structures, Solr cores, etc. | 123 |
| -s, --solr | yes | The UID of the solrcore record in tx_dlf_solrcores or, alternatively, the index name of the Solr core. The Solr core must exist in table tx_dlf_solrcores on page "pid". Otherwise an error is shown and processing does not start. | 123 or 'dlfCore1' |
| -o, --owner | no | The UID of the library record in tx_dlf_libraries which should be set as the owner of the documents. If omitted, the command tries to read the ownership from the metadata field "owner". | 123 |
| --dry-run | no | Nothing is written to the database or index. All documents that would be processed on a real run are listed. | |
| -q, --quiet | no | Do not output any message. Useful in wrapper scripts, which can check the return value of the CLI job: it is always 0 on success and 1 on failure. | |
| -v, --verbose | no | Show the UID and location of each processed document, with a timestamp and the number of processed documents out of the total. | |
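
When collection UIDs come from a script rather than being typed by hand, they have to be joined into the comma-separated form that -c expects. The helper below is a small sketch of that step. The name join_uids is hypothetical and not part of Kitodo.Presentation.

```shell
#!/bin/sh
# Hypothetical helper: join_uids is an illustrative name.
# Joins all arguments with commas, the list format used by -c above.
# The function body runs in a subshell, so the IFS change stays local.
join_uids() (
    IFS=,
    printf '%s\n' "$*"
)

# Possible use, with a dry run to list the affected documents first:
# ./vendor/bin/typo3 kitodo:reindex --dry-run -c "$(join_uids 1 4)" -p 123 -s dlfCore1
# ./vendor/bin/typo3 kitodo:reindex -c "$(join_uids 1 4)" -p 123 -s dlfCore1
```

Running the dry run first shows which documents would be touched before anything is written to the database or index.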

Harvest OAI-PMH interface

The command kitodo:harvest harvests an OAI-PMH interface and indexes all fetched records:

# example
./vendor/bin/typo3 kitodo:harvest --lib=<UID> --pid=<PID> --solr=<CORE> --from=<timestamp> --until=<timestamp> --set=<set>

In order to use the command, you first have to configure a library in the backend, setting at least a label and oai_base. The latter should be a valid OAI-PMH base URL (e.g. https://digital.slub-dresden.de/oai/).
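
One way to check a base URL before entering it as oai_base is to send it the OAI-PMH Identify request, which every conforming repository answers. The sketch below only builds the request URL. The function name identify_url is hypothetical, and the commented curl call assumes that curl is installed.

```shell
#!/bin/sh
# Hypothetical helper: identify_url is an illustrative name.
# Appends the OAI-PMH "Identify" request to a base URL.
identify_url() {
    printf '%s?verb=Identify\n' "$1"
}

# Possible check (requires network access and curl):
# curl -fsS "$(identify_url https://digital.slub-dresden.de/oai/)"
```

A valid base URL should answer this request with an XML document describing the repository.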

| Option | Required | Description | Example |
|---|---|---|---|
| -l, --lib | yes | The UID of the library record with the OAI interface that should be harvested. This library is also automatically set as the documents' owner. | 123 |
| -p, --pid | yes | The page UID of the library record and therefore the page the documents are added to. | 123 |
| -s, --solr | yes | The UID of the solrcore record in tx_dlf_solrcores or, alternatively, the index name of the Solr core. The Solr core must exist in table tx_dlf_solrcores on page "pid". Otherwise an error is shown and processing does not start. | 123 or 'dlfCore1' |
| --from | no | A date in the format YYYY-MM-DD. Together, --from and --until limit harvesting to the given period, e.g. for incremental updates. | 2021-01-01 |
| --until | no | A date in the format YYYY-MM-DD. Together, --from and --until limit harvesting to the given period, e.g. for incremental updates. | 2021-06-30 |
| --set | no | The name of an OAI set. Harvesting is limited to the given set. | 'vd18' |
| --dry-run | no | Nothing is written to the database or index. All documents that would be processed on a real run are listed. | |
| -q, --quiet | no | Do not output any message. Useful in wrapper scripts, which can check the return value of the CLI job: it is always 0 on success and 1 on failure. | |
| -v, --verbose | no | Show the UID and location of each processed document, with a timestamp and the number of processed documents out of the total. | |
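
For incremental updates, --from and --until are typically filled in by a script. The following is a minimal sketch that checks the date format before starting a harvest. The names is_iso_date, harvest_period and TYPO3_BIN are hypothetical and not part of Kitodo.Presentation, and the check only covers the shape YYYY-MM-DD, not whether the date exists in the calendar.

```shell
#!/bin/sh
# Hypothetical wrapper around kitodo:harvest. All names defined here are
# illustrative and not part of Kitodo.Presentation.
TYPO3_BIN="${TYPO3_BIN:-./vendor/bin/typo3}"

# Succeeds only if the argument has the shape YYYY-MM-DD.
is_iso_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: harvest_period <lib-uid> <pid> <solr-core> <from> <until>
harvest_period() {
    if ! is_iso_date "$4" || ! is_iso_date "$5"; then
        echo "dates must be given as YYYY-MM-DD" >&2
        return 2
    fi
    "$TYPO3_BIN" kitodo:harvest --lib="$1" --pid="$2" --solr="$3" \
        --from="$4" --until="$5"
}
```

A call such as harvest_period 123 123 dlfCore1 2021-01-01 2021-06-30 harvests the first half of 2021. The return value 2 for a malformed date is chosen here to keep it distinct from the 0 and 1 of the CLI job itself.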