EXT: webparser

Author: Kasper Skårhøj
Created: 2002-11-01T00:32:00
Changed: 2006-05-04T19:16:09
Author: Reto Grimm
Email: t3rg@zeitwerk.com

Extension Key: webparser

Copyright 2006, Reto Grimm, <t3rg@zeitwerk.com>

This document is published under the Open Content License

available from http://www.opencontent.org/opl.shtml

The content of this document is related to TYPO3

- a GNU/GPL CMS/Framework available from www.typo3.com

Table of Contents

EXT: webparser

Introduction

What does it do?

Screenshots

Users manual

How to use it

The syntax of the parsercode field

The syntax of the plugin template

Known problems

Limitations

To-Do list

Changelog

Introduction

What does it do?

The webparser reads one or more URLs and parses the data for use in a frontend plugin. The results are cached for a period of validity to reduce transfer size and time. A built-in function can send you an info mail if the requested page has changed or is not accessible.

You can parse the content of external web pages with the commands described below and extract the information you are interested in. You can then insert this information via placeholders into a frontend plugin, which shows the content on your website.

This offers you the most flexible way to make use of the required data.

Please mind the legal aspects! In most cases the content of other web pages is protected by copyright. This extension should not be an invitation to steal content.

Screenshots

After installing the plugin with the Extension Manager, you can create and administer webparser sheets in the Web->List module.

img-1

The setup of a webparser sheet is easy: give it a name, insert a definition (see “The syntax of the parsercode field”) and set the period of validity.

img-2

Users manual

How to use it

After installation the plugin offers a new record type “websheet”. In the list view you can add new sheets with “Create new record” and edit them. The parsing definitions and the period of validity are stored in these sheets, together with the requested data. These webparser sheets can be located in a SysFolder or anywhere in the page tree (SysFolders are the better choice).

The definitions of the output layout are located in the plugin itself. The data stored in the sheets can be inserted into placeholders in this output layout.

The syntax of the parsercode field

The way to configure your webparser sheet is similar to TypoScript, but it is not TypoScript!

The following structures are possible:

array.key.command = value
array {
  otherkey.command = value
}
array {
  key {
    command = value
    othercom = otherval
  }
}

Values for array: inp, out and tmp

Only the content of the out array is stored and can be used in the output!

Values for keys: any name consisting of a-z, 0-9, - and _
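The equivalence of the dotted and the nested notation can be illustrated with a small Python sketch. This is not the extension's own parser (which is written for TYPO3); the function name `flatten` and the handling of blank lines and `#` comments are assumptions made for illustration only.

```python
def flatten(parsercode):
    """Expand brace blocks into dotted 'array.key.command = value' lines."""
    prefix = []   # names of the currently open blocks, outermost first
    lines = []
    for raw in parsercode.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                              # assumption: skip blanks and comments
        if line.endswith("{") and "=" not in line:
            prefix.append(line[:-1].strip())      # "array {" opens a block
        elif line == "}":
            prefix.pop()                          # "}" closes the innermost block
        else:
            lines.append(".".join(prefix + [line]))
    return lines


code = """
array {
  key {
    command = value
    othercom = otherval
  }
}
"""
for entry in flatten(code):
    print(entry)
# array.key.command = value
# array.key.othercom = otherval
```

Under these assumptions, all three structures shown above describe the same kind of `array.key.command = value` assignment.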

::

Command

=

Example

tmp.data = This is a test
tmp.data = 'This is a test '
tmp.data = {$tmp.otherkey}

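Description

Assign a value to the variable: plain text, quoted text (to preserve leading or trailing spaces), or the content of another variable via {$...}.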

::

Command

.=
.add

Example

tmp.data .= who say hello
tmp.data.add = who say hello

Description

Append data to the variable.

::

Command

.url

Example

inp.urldata.url = http://www.xyz.com/a.htm

Description

Read URL

::

Command

<
.cutAfter

Example

inp.test = This is <b>bold</b>
inp.test.cutAfter = <b>
# inp.test is now 'This is '

Description

Delivers the part of the string which is before the given string.

::

Command

>
.cutBefore

Example

inp.test = This is <b>bold</b>
inp.test.cutBefore = <b>
# inp.test is now 'bold</b>'

Description

Delivers the part of the string which is after the given string.
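The behaviour of .cutAfter and .cutBefore can be mirrored in Python as a rough sketch. The function names are hypothetical, and two details are assumptions because the manual does not state them: the first occurrence of the given string is used, and the value is left unchanged if the string is not found.

```python
def cut_after(text, marker):
    """Keep the part of text that comes before marker (like .cutAfter)."""
    head, found, _ = text.partition(marker)
    return head if found else text   # assumption: unchanged if marker is missing


def cut_before(text, marker):
    """Keep the part of text that comes after marker (like .cutBefore)."""
    _, found, tail = text.partition(marker)
    return tail if found else text   # assumption: unchanged if marker is missing


print(repr(cut_after("This is <b>bold</b>", "<b>")))    # 'This is '
print(repr(cut_before("This is <b>bold</b>", "<b>")))   # 'bold</b>'
```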

::

Command

.between

Example

inp.test.between = start|end

Description

Only the text between the first (!) start mark and the first end mark following it is returned. The two marks are separated by |.
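As an illustration of the "first start mark, first following end mark" rule, here is a minimal Python sketch. The function name `between` is hypothetical, and returning an empty string when a mark is missing is an assumption; the manual does not say what the extension does in that case.

```python
def between(text, spec):
    """Return the text between the first start mark and the next end mark.

    spec has the form 'start|end', as in the .between command.
    """
    start, _, end = spec.partition("|")
    begin = text.find(start)
    if begin == -1:
        return ""                    # assumption: empty result if start is missing
    begin += len(start)
    stop = text.find(end, begin)     # first end mark after the start mark
    if stop == -1:
        return ""                    # assumption: empty result if end is missing
    return text[begin:stop]


html = '<meta name="description" content="Web agency">'
print(between(html, '<meta name="description" content="|">'))   # Web agency
```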

::

Command

.split

Example

inp.test = 'This is a test'
inp.test.split = ' '
inp.data = {$tmp.1}
# inp.data is now 'is'

Description

Splits the given variable into the 'tmp' array. Existing values in tmp will be overwritten! You can then access any part of this array by its numerical index, starting at 0.
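The zero-based indexing can be seen in the following Python sketch, which models the tmp array as a dictionary. The function name is hypothetical, and whether other, non-numeric entries of tmp survive a split is an assumption here (they are kept).

```python
def split_into_tmp(tmp, value, separator):
    """Store the parts of value in tmp under the keys '0', '1', '2', ..."""
    for index, part in enumerate(value.split(separator)):
        tmp[str(index)] = part       # existing numeric entries are overwritten
    return tmp


tmp = {"0": "stale value"}
split_into_tmp(tmp, "This is a test", " ")
print(tmp["0"])   # This
print(tmp["1"])   # is
```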

::

Command

.removeTags

Example

tmp.htmldata.removeTags = *
tmp.htmldata.removeTags = a,div

Description

Removes all XML/HTML tags (*), or only the listed tags (in the example <a> and <div>), including their closing tags.
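A regular-expression based Python sketch of this behaviour is shown below. It only approximates what .removeTags is described to do: the function name is hypothetical, the matching is case-insensitive by assumption, and a regex like this is not a full HTML parser (for example, a `>` inside an attribute value would confuse it).

```python
import re


def remove_tags(html, names):
    """Strip all tags ('*') or only the comma-separated tag names."""
    if names.strip() == "*":
        pattern = r"</?[A-Za-z][^>]*>"
    else:
        wanted = "|".join(re.escape(name.strip()) for name in names.split(","))
        # \b keeps 'a' from also matching longer names such as <abbr>
        pattern = r"</?(?:%s)\b[^>]*>" % wanted
    return re.sub(pattern, "", html, flags=re.IGNORECASE)


sample = '<div><a href="x">link</a> <b>bold</b></div>'
print(remove_tags(sample, "a,div"))   # link <b>bold</b>
print(remove_tags(sample, "*"))       # link bold
```

Only the tags themselves are removed; the text between an opening and a closing tag stays in place.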

::

Command

.replace

Example

tmp.data.replace = old|new

Description

Substitute “old” with “new”.
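In Python terms the command corresponds roughly to the sketch below. The function name is hypothetical; replacing every occurrence and splitting the argument at the first | are assumptions, since the manual shows only the single example above.

```python
def replace(value, spec):
    """Apply an 'old|new' specification, as used by the .replace command."""
    old, _, new = spec.partition("|")
    if not old:
        return value                 # nothing to search for
    return value.replace(old, new)   # assumption: all occurrences are replaced


print(replace("old wine in old bottles", "old|new"))   # new wine in new bottles
```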

Configuration Commands

::

Command

config.errorCond

Example

config.errorCond = tmp.data == 'required'

Description

Not implemented

::

Command

config.errorMail

Example

config.errorMail = error@xyz.com

Description

Not implemented

::

Command

config.debug

Example

config.debug = 1

Description

Debug switch for additional information

::

Command

config.htmlspecialchars

Example

config.htmlspecialchars = 1

Description

Activate htmlspecialchars

Examples

Here are some examples of the webparser-sheet configuration.

inp.urldata.url = http://www.somedomain.com/somepage.html

# cut away the data after <endtoken>...
inp.urldata.cutAfter = <endtoken>

# split data by <tag> into array tmp...
inp.urldata.split = <tag>

# out.data is now the first element of tmp...
out.data = {$tmp.0}

# Second element i...
out.data = {$tmp.1}

# Remove a and img-tags...
out.data.removeTags = a,img

# open another domain...
inp.otherdomain.url = http://www.zeitwerk.com/

# take content between values...
inp.otherdomain.between = <meta name="description" content="|">

# add text to the result...
inp.otherdomain.add = Text to added

# add text with space...
inp.otherdomain.add = "Text with Space "

# If an error occurs send a mail....
config.errorMail = admin@somedomain.com

The syntax of the plugin template

The field “template” in the Frontend plugin is only a container for outputting the values.

img-3

You can use HTML code with placeholders. The syntax of a placeholder is

{$key}

key is the name of any value in the out array. For example, {$temp} stands for the value of 'out.temp'.
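How such a template is filled can be sketched in Python as follows. This is an illustration only, not the plugin's code: the function name `render` is hypothetical, the out array is modelled as a dictionary, and replacing unknown placeholders with an empty string is an assumption.

```python
import re


def render(template, out):
    """Replace every {$key} placeholder with the matching value from out."""
    def lookup(match):
        # assumption: placeholders without a value are rendered as empty text
        return str(out.get(match.group(1), ""))

    # keys may consist of a-z, 0-9, - and _
    return re.sub(r"\{\$([a-z0-9_-]+)\}", lookup, template)


out = {"temp": "21 degrees", "title": "Weather"}
print(render("<h1>{$title}</h1><p>{$temp}</p>", out))
# <h1>Weather</h1><p>21 degrees</p>
```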

Known problems

Limitations

On some URLs parsing will not be successful, because these pages deliver their content via JavaScript.

To-Do list

Many things, this is the initial alpha...

  • Conditions for extending error handling
  • Regular Expressions

Changelog

  • 2006-05-02 first public release
  • 2006-05-03 added flexforms, mail confirmation
