Linkservice

Classification: linkservice
Version: 3.0.0
Language: en
Keywords: seo, link, refresh, validation, crawler
Copyright: 2014
Author: Daniel Schledermann
Email: daniel@linkfactory.dk
License: This document is published under the Open Content License available from http://www.opencontent.org/opl.shtml
Rendered: 2017-09-14 16:23

The content of this document is related to TYPO3, a GNU/GPL CMS/Framework available from www.typo3.org.

Introduction

What does it do?

Linkservice is a tool that maintains the external links in your TYPO3 installation. It does so by crawling each link with the cURL HTTP client and watching for "301 Moved Permanently" responses.
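
The check itself can be pictured as a small cURL routine. The sketch below is illustrative, not the extension's actual code: it issues a HEAD-style request for a single URL and returns the redirect target if the server answers with a 301.

   <?php
   // Sketch only: check one URL and return the "Location" target of a
   // "301 Moved Permanently" response, or null for any other outcome.
   function findPermanentRedirect(string $url, int $timeout = 5): ?string
   {
       $handle = curl_init($url);
       curl_setopt_array($handle, [
           CURLOPT_NOBODY         => true,  // headers only, no body download
           CURLOPT_RETURNTRANSFER => true,
           CURLOPT_HEADER         => true,  // keep headers in the output
           CURLOPT_FOLLOWLOCATION => false, // we want to see the 301 ourselves
           CURLOPT_TIMEOUT        => $timeout,
       ]);
       $headers = curl_exec($handle);
       $status  = curl_getinfo($handle, CURLINFO_HTTP_CODE);
       curl_close($handle);

       if ($status === 301 && preg_match('/^Location:\s*(\S+)/mi', (string)$headers, $match)) {
           return $match[1];
       }
       return null;
   }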

Configuration

Configuration is done through the extension configuration when the extension is installed. No special care is necessary, since the defaults are workable in most cases. However, these settings can be tuned (a sketch of possible values follows the list):

  • field_linkservice - Define which fields the links should be refreshed in. The default is just tt_content.bodytext, but any RTE text field will do. A common addition could be tt_news.bodytext or even a field of a custom extension. The only requirements are that it is a text field and that the links are encoded in RTE style (<link http://.../>) or HTML style (<a href="http://.../">).
  • link_validity_period - The validity period for each link. It matters when a link appears in more than one record: if the crawler discovers the same link again within this period, it skips the HTTP session and uses a cached answer.
  • field_validity_period - The validity period for each field record. A field will not be rechecked if the last check happened within this period.
  • records_per_run - A performance setting that limits the number of records processed in each invocation of the crawler.
  • http_timeout - The timeout for each link check. The default is a mere 5 seconds. A longer timeout would delay the crawling process, and a slow response is probably just a stressed server. You can raise the timeout to up to 60 seconds if you want more standards-compliant behaviour.
  • generate_report - Makes a report viewable for the editors so they can get information about changed links.
  • log_retention - The retention period for entries in the crawler log.
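
In TYPO3 versions of this document's era, these values end up in the serialized extension configuration. The sketch below shows what a tuned setup might look like; apart from the documented field_linkservice and http_timeout defaults, every value (and the assumption that periods are given in seconds) is illustrative:

   <?php
   // Illustrative values only; the key names come from the list above.
   $GLOBALS['TYPO3_CONF_VARS']['EXT']['extConf']['linkservice'] = serialize([
       'field_linkservice'     => 'tt_content.bodytext,tt_news.bodytext',
       'link_validity_period'  => 86400,   // assumed: seconds (one day)
       'field_validity_period' => 604800,  // assumed: seconds (one week)
       'records_per_run'       => 100,     // records per crawler invocation
       'http_timeout'          => 5,       // documented default: 5 seconds
       'generate_report'       => 1,       // expose the report to editors
       'log_retention'         => 2592000, // assumed: keep log 30 days
   ]);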

Setting up scheduler jobs

For the crawler to function, the scheduler jobs must be activated. There are two jobs, and both should be enabled.
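
The jobs are registered by the extension itself; activation happens in the Scheduler backend module. For orientation, such a registration conventionally lives in ext_localconf.php and looks roughly like the sketch below, where the task names are hypothetical:

   <?php
   // Hypothetical task names; the real classes ship with the extension.
   $GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['scheduler']['tasks']['Tx_Linkservice_CrawlerTask'] = [
       'extension'   => 'linkservice',
       'title'       => 'Linkservice: crawl links',
       'description' => 'Checks external links and records permanent redirects',
   ];
   $GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['scheduler']['tasks']['Tx_Linkservice_CleanupTask'] = [
       'extension'   => 'linkservice',
       'title'       => 'Linkservice: clean up log',
       'description' => 'Removes crawler log entries older than log_retention',
   ];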

Handling codes

Logging

If logging is activated, any response other than "HTTP 200 OK" is logged, so that editors are able to review all links in the content.
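
As a sketch of that rule (again illustrative, with hypothetical names): only a plain 200 is considered healthy, and every other outcome produces a log record for editors to review.

   <?php
   // Sketch: turn one crawler response into a log record, or null when
   // the link answered "HTTP 200 OK" and nothing needs reviewing.
   // A timeout typically surfaces as status code 0.
   function buildLogEntry(string $url, int $statusCode, ?string $redirectTarget): ?array
   {
       if ($statusCode === 200) {
           return null; // healthy link, nothing to log
       }
       return [
           'url'    => $url,
           'status' => $statusCode,
           'target' => $redirectTarget, // set when a 301 supplied a new URL
           'tstamp' => time(),          // consulted later by log_retention
       ];
   }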

Editor usage