EXT: webparser¶
Author: | Kasper Skårhøj |
---|---|
Created: | 2002-11-01T00:32:00 |
Changed: | 2006-05-04T19:16:09 |
Author: | Reto Grimm |
Email: | t3rg@zeitwerk.com |
Extension Key: webparser
Copyright 2006, Reto Grimm, <t3rg@zeitwerk.com>
This document is published under the Open Content License
available from http://www.opencontent.org/opl.shtml
The content of this document is related to TYPO3
- a GNU/GPL CMS/Framework available from www.typo3.com
Table of Contents¶
EXT: webparser
Introduction
What does it do?
Screenshots
Users manual
How to use it
The syntax of the parsercode field
The syntax of the plugin template
Known problems
Limitations
To-Do list
Changelog
Introduction¶
What does it do?¶
The webparser reads one or more URLs and parses the data for use in a frontend plugin. The results are cached for a period of validity to reduce transfer size and transfer time. An included function lets you receive an info mail if the requested page has changed or is not accessible.
You can parse the content of external web pages with the given commands and extract the information you are interested in. You can then insert this information via placeholders into a frontend plugin, which shows the content on your website.
This gives you a very flexible way to make use of the required data.
Please pay attention to the legal aspects! In most cases the content of other web pages is protected by copyright. This extension should not be taken as an invitation to steal content.
Screenshots¶
After installing the plugin with the Extension Manager, you can create and administer webparser sheets in the Web->List module.
The setup of a webparser sheet is easy: give it a name, insert a definition (see “The syntax of the parsercode field”) and set the period of validity.
Users manual¶
How to use it¶
After installation the plugin offers a new record type “websheet”. In the list view you can add new sheets with “Create new record” and edit them. The parsing definitions and the period of validity are stored in these sheets, together with the requested data. Webparser sheets can be located in a SysFolder or anywhere in the page tree (SysFolders are the better choice).
The output layout is defined in the plugin itself. The data stored in the sheets can be inserted into this output layout via placeholders.
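The idea of a period of validity can be illustrated with a small sketch. It is not the extension's code (the extension is a TYPO3/PHP plugin); the class name `Sheet`, the `fetch` callable and the use of seconds as the unit are assumptions made only for this illustration.

```python
import time


class Sheet:
    """Hypothetical stand-in for a webparser sheet: cached data plus a validity period."""

    def __init__(self, validity_seconds, fetch):
        self.validity_seconds = validity_seconds  # assumption: validity measured in seconds
        self.fetch = fetch                        # callable that reads and parses the URLs
        self.data = None                          # last parsed result
        self.fetched_at = None                    # time of the last fetch

    def get(self, now=None):
        """Return cached data; fetch again only when the validity period has expired."""
        now = time.time() if now is None else now
        expired = (
            self.fetched_at is None
            or now - self.fetched_at >= self.validity_seconds
        )
        if expired:
            self.data = self.fetch()
            self.fetched_at = now
        return self.data
```

Within the validity period the remote page is not requested again, which is what reduces transfer size and time.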
The syntax of the parsercode field¶
The way to configure your webparser sheet is similar to TypoScript, but it is not TypoScript!
The following structures are possible:
array.key.command = value
array {
  otherkey.command = value
}
array {
  key {
    command = value
    othercom = otherval
  }
}
Values for array: inp, out and tmp
Only the content of the out array is stored and can be used in the output!
Values for keys: any name consisting of a-z, 0-9, - and _
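To make the relation between the dotted form and the nested block form concrete, here is a minimal Python sketch that expands blocks into dotted paths. It assumes, as in TypoScript, that a block is only shorthand for the dotted form; the function name `flatten` is hypothetical, and the sketch handles only plain `=` assignments and `#` comment lines.

```python
def flatten(parsercode):
    """Expand nested { } blocks into (dotted path, value) pairs."""
    prefix = []   # names of the blocks we are currently inside
    result = []
    for raw in parsercode.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                              # skip blank lines and comments
        if line.endswith("{"):
            prefix.append(line[:-1].strip())      # "array {" opens a block
        elif line == "}":
            prefix.pop()                          # "}" closes the innermost block
        else:
            # split at the first "=" only, so values may contain "=" themselves
            path, _, value = line.partition("=")
            result.append((".".join(prefix + [path.strip()]), value.strip()))
    return result
```

Under this assumption, `array { key { command = value } }` and `array.key.command = value` yield the same pair.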
Command: =
Example:
tmp.data = This is a test
tmp.data = 'This is a test '
tmp.data = {$tmp.otherkey}
Description: Assigns a value to the variable. The value can be plain text, quoted text (for example to keep a trailing space), or a reference to another variable in the form {$array.key}.

Command: .= or .add
Example:
tmp.data .= who say hello
tmp.data.add = who say hello
Description: Appends data to the variable.

Command: .url
Example:
inp.urldata.url = http://www.xyz.com/a.htm
Description: Reads the URL.

Command: < or .cutAfter
Example:
inp.test = This is <b>bold</b>
inp.test.cutAfter = <b>
# inp.test is now 'This is '
Description: Returns the part of the string that comes before the given string.

Command: > or .cutBefore
Example:
inp.test = This is <b>bold</b>
inp.test.cutBefore = <b>
# inp.test is now 'bold</b>'
Description: Returns the part of the string that comes after the given string.

Command: .between
Example:
inp.test.between = start|end
Description: Only the value between the first (!) start mark and the first following end mark is returned.

Command: .split
Example:
inp.test = 'This is a test'
inp.test.split = ' '
inp.data = {$tmp.1}
# inp.data is now 'is'
Description: Splits the given variable into the 'tmp' array. Existing values in tmp are overwritten! You can then use any part of this array via its numerical index.

Command: .removeTags
Example:
tmp.htmldata.removeTags = *
tmp.htmldata.removeTags = a,div
Description: Removes all XML/HTML tags (*), or only the listed tags (here <a> and <div>), including their closing tags.

Command: .replace
Example:
tmp.data.replace = old|new
Description: Substitutes “old” with “new”.
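The behaviour of the cutting commands can be mimicked in a few lines of Python. This is only a sketch of the semantics described above, not the extension's PHP implementation. The function names are hypothetical, and two points are assumptions that the manual does not state: the first occurrence of the given string is used, and the text is returned unchanged when the string is not found.

```python
def cut_after(text, mark):
    """Like .cutAfter: keep the part before mark, cut everything after it."""
    head, sep, _ = text.partition(mark)
    return head if sep else text      # assumption: unchanged if mark is missing


def cut_before(text, mark):
    """Like .cutBefore: keep the part after mark, cut everything before it."""
    _, sep, tail = text.partition(mark)
    return tail if sep else text      # assumption: unchanged if mark is missing


def between(text, spec):
    """Like .between with 'start|end': text between first start and next end."""
    start, end = spec.split("|", 1)
    return cut_after(cut_before(text, start), end)


def split_into_tmp(text, separator):
    """Like .split: the parts become tmp.0, tmp.1, ... (zero-based index)."""
    return dict(enumerate(text.split(separator)))
```

For example, `between('<meta name="description" content="Hello">', 'content="|">')` gives `Hello`, and `split_into_tmp('This is a test', ' ')[1]` gives `is`, matching the `{$tmp.1}` example.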
Configuration Commands¶
Command: config.errorCond
Example:
config.errorCond = tmp.data == 'required'
Description: Not implemented.

Command: config.errorMail
Example:
config.errorMail = error@xyz.com
Description: Not implemented.

Command: config.debug
Example:
config.debug = 1
Description: Debug switch that outputs additional information.

Command: config.htmlspecialchars
Example:
config.htmlspecialchars = 1
Description: Activates htmlspecialchars (the PHP function that converts special characters to HTML entities).
Examples¶
Here are some examples of the webparser-sheet configuration.
inp.urldata.url = http://www.somedomain.com/somepage.html
# cut away the data after <endtoken>...
inp.urldata.cutAfter = <endtoken>
# split data by <tag> into the array tmp...
inp.urldata.split = <tag>
# out.data now gets the first element of tmp...
out.data = {$tmp.0}
# Second element...
out.data = {$tmp.1}
# Remove a and img-tags...
out.data.removeTags = a,img
# open another domain...
inp.otherdomain.url = http://www.zeitwerk.com/
# take content between values...
inp.otherdomain.between = <meta name="description" content="|">
# add text to the result...
inp.otherdomain.add = Text to add
# add text with space...
inp.otherdomain.add = "Text with Space "
# If an error occurs, send a mail...
config.errorMail = admin@somedomain.com
The syntax of the plugin template¶
The “template” field in the frontend plugin is only a container for outputting the values.
You can use HTML code with placeholders. The syntax of a placeholder is
{$key}
key stands for any key in the out array. For example, {$temp} stands for the value of 'out.temp'.
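The placeholder substitution can be sketched in Python as follows. This is an illustration rather than the plugin's own code: the function name `render` is hypothetical, and replacing unknown keys with an empty string is an assumption, since the manual does not say what happens in that case.

```python
import re


def render(template, out):
    """Replace every {$key} placeholder in template with the value of out[key]."""

    def lookup(match):
        # assumption: a placeholder without a matching key becomes an empty string
        return out.get(match.group(1), "")

    # keys may consist of a-z, 0-9, - and _ (see the parsercode syntax)
    return re.sub(r"\{\$([a-z0-9_-]+)\}", lookup, template)
```

With `out = {"temp": "12 degrees"}`, the template `<p>Now: {$temp}</p>` is rendered as `<p>Now: 12 degrees</p>`.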
Known problems¶
Limitations¶
For some URLs parsing will not be successful, because these pages assemble their content with JavaScript.
To-Do list¶
Many things; this is the initial alpha release...
- Conditions for extended error handling
- Regular Expressions
Changelog¶
- 2006-05-02 first public release
- 2006-05-03 added flexforms, mail confirmation