.. You may want to use the usual include line. Uncomment and adjust the path.
   .. include:: ../Includes.txt

==============
EXT: webparser
==============

:Author: Kasper Skårhøj
:Created: 2002-11-01T00:32:00
:Changed: 2006-05-04T19:16:09
:Author: Reto Grimm
:Email: t3rg@zeitwerk.com
:Info 3:
:Info 4:

.. _EXT-webparser:

EXT: webparser
==============

Extension Key: **webparser**

Copyright 2006, Reto Grimm

This document is published under the Open Content License
available from http://www.opencontent.org/opl.shtml

The content of this document is related to TYPO3
\- a GNU/GPL CMS/Framework available from www.typo3.com

.. _Table-of-Contents:

Table of Contents
-----------------

- **EXT: webparser**
- **Introduction**

  - What does it do?
  - Screenshots

- **Users manual**

  - How to use it
  - The syntax of the parsercode field
  - The syntax of the plugin template

- **Known problems**

  - Limitations

- **To-Do list**
- **Changelog**

.. _Introduction:

Introduction
------------

.. _What-does-it-do:

What does it do?
^^^^^^^^^^^^^^^^

The webparser reads one or more URLs and parses the data for use in a
frontend plugin. The results are cached for a period of validity to
reduce transfer size and time. An included function lets you receive
an info mail if the requested page has changed or is not accessible.

You can parse the content of external web pages with the given
commands and extract the information you are interested in. You can
then insert this information via placeholders into a frontend plugin,
which shows the content on your website. This gives you the most
flexible way to make use of the required data.

Please pay attention to the legal aspects! In most cases the content
of other web pages is protected by copyright. This extension should
not be an invitation to steal content.

.. _Screenshots:

Screenshots
^^^^^^^^^^^

After installing the plugin with the Extension Manager you can create
and administer webparser sheets in the Web->List module.


|img-1|

The setup of a webparser sheet is easy: give it a name, insert a
definition (see “The syntax of the parsercode field”) and set the
period of validity.

|img-2|

.. _Users-manual:

Users manual
------------

.. _How-to-use-it:

How to use it
^^^^^^^^^^^^^

After installation the plugin offers a new record type “websheet”. In
the list view you can add new sheets with “Create new record” and edit
them.

The parsing definitions and the period of validity are stored in these
sheets, together with the requested data.

The webparser sheets can be located in a SysFolder or anywhere else in
the page tree (a SysFolder is the better choice).

The definition of the output layout is located in the plugin itself.
The data stored in the sheets can be inserted into this output layout
via placeholders.

.. _The-syntax-of-the-parsercode-field:

The syntax of the parsercode field
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The way to configure your webparser sheet looks similar to TypoScript,
but it is not TypoScript! The following structures are possible:

::

   array.key.command = value

   array {
     otherkey.command = value
   }

   array {
     key {
       command = value
       othercom = otherval
     }
   }

Values for array: **inp**, **out** and **tmp**. Only the content of
the out array is stored and can be used in the output!

Values for keys: any name consisting of **a-z, 0-9, - and \_**

.. ### BEGIN~OF~TABLE ###

.. container:: table-row

   Command
      ::

         =

   Example
      ::

         tmp.data = This is a test
         tmp.data = 'This is a test '
         tmp.data = {$tmp.otherkey}

   Description
      Assign data to the variable.

.. container:: table-row

   Command
      ::

         .=
         .add

   Example
      ::

         tmp.data .= who say hello
         tmp.data.add = who say hello

   Description
      Append data to the variable.

.. container:: table-row

   Command
      ::

         .url

   Example
      ::

         inp.urldata.url = http://www.xyz.com/a.htm

   Description
      Read the content of the URL.

.. container:: table-row

   Command
      ::

         <
         .cutAfter

   Example
      ::

         inp.test = This is <b>bold</b>
         inp.test.cutAfter = <b>
         # inp.test is now 'This is '

   Description
      Delivers the part of the string that comes before the given
      string.

.. container:: table-row

   Command
      ::

         >
         .cutBefore

   Example
      ::

         inp.test = This is <b>bold</b>
         inp.test.cutBefore = <b>
         # inp.test is now 'bold</b>'

   Description
      Delivers the part of the string that comes after the given
      string.

.. container:: table-row

   Command
      ::

         .between

   Example
      ::

         inp.test.between = start|end

   Description
      Only the value between the first (!) start mark and the first
      following end mark is returned.

.. container:: table-row

   Command
      ::

         .split

   Example
      ::

         inp.test = 'This is a test'
         inp.test.split = ' '
         inp.data = {$tmp.1}
         # inp.data is now 'is'

   Description
      Splits the given variable into the tmp array. Existing values in
      tmp will be overwritten! You can then use any part of this array
      via its numerical index.

.. container:: table-row

   Command
      ::

         .removeTags

   Example
      ::

         tmp.htmldata.removeTags = *
         tmp.htmldata.removeTags = a,div

   Description
      Removes all XML/HTML tags (\*) or only some tags (here a and
      div), including their closing tags.

.. container:: table-row

   Command
      ::

         .replace

   Example
      ::

         tmp.data.replace = old|new

   Description
      Substitutes “old” with “new”.

.. _Configuration-Commands:

Configuration Commands
""""""""""""""""""""""

.. container:: table-row

   Command
      ::

         config.errorCond

   Example
      ::

         config.errorCond = tmp.data == 'required'

   Description
      Not implemented

.. container:: table-row

   Command
      ::

         config.errorMail

   Example
      ::

         config.errorMail = error@xyz.com

   Description
      Not implemented

.. container:: table-row

   Command
      ::

         config.debug

   Example
      ::

         config.debug = 1

   Description
      Debug switch for additional information

.. container:: table-row

   Command
      ::

         config.htmlspecialchars

   Example
      ::

         config.htmlspecialchars = 1

   Description
      Activate htmlspecialchars

.. ###### END~OF~TABLE ######

.. _Examples:

Examples
""""""""

Here are some examples of the webparser sheet configuration.

::

   inp.urldata.url = http://www.somedomain.com/somepage.html

   # cut away the data before the given string...
   inp.urldata.cutBefore =

   # split the data into the array tmp...
   inp.urldata.split =

   # out.data now gets the first element of tmp...
   out.data = {$tmp.0}

   # out.data now gets the second element of tmp...
   out.data = {$tmp.1}

   # remove a and img tags...
   out.data.removeTags = a,img

   # open another domain...
   inp.otherdomain.url = http://www.zeitwerk.com/

   # take the content between two values...
   inp.otherdomain.between =

   # add text to the result...
   inp.otherdomain.add = Text to add

   # add text with a space...
   inp.otherdomain.add = "Text with Space "

   # if an error occurs, send a mail...
   config.errorMail = admin@somedomain.com

.. _The-syntax-of-the-plugin-template:

The syntax of the plugin template
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The “template” field in the frontend plugin is only a container for
outputting the values.

|img-3|

You can use HTML code with placeholders. The syntax of a placeholder
is

::

   {$key}

key refers to any key in the out array. For example, {$temp} stands
for the value of 'out.temp'.

.. _Known-problems:

Known problems
--------------

.. _Limitations:

Limitations
^^^^^^^^^^^

Parsing will not be successful on some URLs, because they generate
their content with JavaScript.

.. _To-Do-list:

To-Do list
----------

Many things, this is the initial alpha...

- Conditions for extending error handling

- Regular Expressions

.. _Changelog:

Changelog
---------

- 2006-05-02 first public release

- 2006-05-03 added flexforms, mail confirmation

|img-4|

.. ######CUTTER_MARK_IMAGES######

.. |img-1| image:: img-1.png
.. :align: left
.. :border: 0
.. :height: 126
.. :id: Grafik2
.. :name: Grafik2
.. :width: 473

.. |img-2| image:: img-2.png
.. :align: left
.. :border: 0
.. :height: 430
.. :id: Grafik1
.. :name: Grafik1
.. :width: 493

.. |img-3| image:: img-3.png
.. :align: left
.. :border: 0
.. :height: 284
.. :id: Grafik3
.. :name: Grafik3
.. :width: 532

.. |img-4| image:: img-4.png
.. :align: left
.. :border: 0
.. :height: 32
.. :id: Graphic1
.. :name: Graphic1
.. :width: 102
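
To tie the parsercode syntax and the plugin template together, here is
a minimal sketch of a sheet definition with a matching template. It is
an illustration only and has not been tested against the extension:
the URL, the key names page and headline, the h1 markers and the use
of {$inp.page} as a reference to an inp value are assumptions modeled
on the examples in this manual.

::

   # hypothetical page, fetched into inp.page
   inp.page.url = http://www.example.com/news.html

   # keep only the part between the first <h1> and the following </h1>
   inp.page.between = <h1>|</h1>

   # only values in the out array are stored and reach the template
   out.headline = {$inp.page}

   # strip any remaining tags from the result
   out.headline.removeTags = *

The “template” field of the frontend plugin could then contain:

::

   <p>Latest headline: {$headline}</p>

Doing all the cutting in the inp and tmp arrays and copying only the
final value to out keeps intermediate data out of the stored sheet.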