.. You may want to use the usual include line. Uncomment and adjust the path.
.. include:: ../Includes.txt

================
EXT: news feeder
================

:Author: Kasper Skårhøj
:Created: 2002-11-01T00:32:00
:Changed: 2014-11-05T10:14:25.790000000
:Author: Alex Tuveri, University of Udine
:Email: at@uniud.it
:Info 3: http://www.luxaeterna.it
:Info 4:

.. _EXT-news-feeder:

EXT: news feeder
================

Extension Key: **ttnews\_feeder**

Copyright 2000-2014, Alex Tuveri, University of Udine, **current version: 3.0.1 BETA**

This document is published under the Open Content License available from http://www.opencontent.org/opl.shtml

The content of this document is related to TYPO3, a GNU/GPL CMS/Framework available from www.typo3.com

.. _Table-of-Contents:

Table of Contents
-----------------

**EXT: news feeder**

**Introduction**

- What does it do?
- Screenshots
- Extension tested on...
- Stable, unstable or beta?

**User manual**

- News approval
- News Statistics
- Manual Check
- Site/Search Engine Test
- Delete news
- Clean Database
- Show Configuration
- Load site definitions
- FAQ

**Administration**

- Installation notes
- Configuration example
- FAQ

**Configuration**

- How to define a new engine/site
- Titles excluded, accredited and refused sites
- How to define and use keywords
- Test mode
- Production mode
- CRON mode
- Notes about the images
- FAQ
- Reference

**To Do**

**Known problems**

**To-Do list**

**Changelog**

.. _Introduction:

Introduction
------------

.. _What-does-it-do:

What does it do?
^^^^^^^^^^^^^^^^

If you want to fetch news from Google, AltaVista or Excite, this extension might fit your needs. With this extension you can also check ordinary sites (not only search engines), parse their pages and retrieve the news you need. This extension is not an RSS system for retrieving news from search engines; for that purpose you can use another extension downloadable from typo3.org.
**The advantage of ttnews\_feeder:** the product is very flexible and useful. Its main purpose is to get fresh news from search engines and single (dynamic or static) sites, manually or through CRON. The aim is to have a simple system to populate your TYPO3 site and give more interesting content to your visitors.

*With this extension you can*:

- fetch news from search engines (Google, Excite, etc.). You can define several parameters: keywords to search, keywords to exclude, how many news items to fetch, etc.;
- fetch news from virtually any static/dynamic site that does not export its news via RSS.

*Among other things you can define*:

- one or more sysfolders to store the news (each with its own configuration);
- one or more sysfolders to store your keywords and search parameters;
- the keywords to search on the requested engine, and the keywords to exclude;
- the relation between each keyword and the desired sites/engines;
- a category for each keyword: with this option the news is associated with the news categories and can be published on the site in different ways, according to your needs;
- image support: images are downloaded and stored on your server, resized and related to the news fetched;
- titles (or parts of titles) to exclude;
- undesired sites to exclude;
- accredited sites: if a news item comes from a recognized accredited site, it is automatically published in the FE;
- the run mode of each site: test mode, or production mode as CRON mode, MANUAL CHECK or CRON+MANUAL CHECK, to satisfy all needs;
- CRON mode, which keeps your DB clean of internal/external news without any operator intervention;
- a full email report in CRON mode for the administrator;
- a partial report for each person responsible for the news.

.. _Screenshots:

Screenshots
^^^^^^^^^^^

**Manual check**: as you can see, some records were accepted automatically and published, while others are waiting for approval. Photos and images are retrieved and stored on your server.
|img-1|

When you click **Manual check**, *ttnews\_feeder* connects to Google and to the other engines or static sites previously defined, and fetches the news according to the given parameters. An icon shows the *status* of each record; for example, News Feeder checks for duplicate records and marks them as refused.

You can run the **test mode** to **simulate the production mode**; this is a very convenient way to test one or more sites and switch them to production mode when everything works.

**WARNING!** Don't press the "Run manual check" button twice! After it has been pressed, some browsers such as MSIE 7+ seem to do nothing. Just wait for the results.

**News approval** (TYPO3 4.0.2+): three options, **suspend**, **delete** and **approve**.

|img-2|

**Test mode**: you can select individually the sites you need to test, or invert the selection. Hidden sites are not considered.

|img-3|

**Load sites definition**: configuring the most common search engines is very easy. Simply select what you want and click the button, and you are ready to run. Define one or more keywords and you can fetch the news!

|img-4|

.. _Extension-tested-on:

Extension tested on...
^^^^^^^^^^^^^^^^^^^^^^

This extension works fine and was tested successfully on TYPO3 3.8.1, 4.0, 4.0.2, 4.1.1 and 4.1.4, under PHP 4.4.x and later and under PHP 5.2.x and later.

.. _Stable-unstable-or-beta:

Stable, unstable or beta?
^^^^^^^^^^^^^^^^^^^^^^^^^

Since v1.0 News Feeder has been declared **stable**, because it reads the news and extracts the contents correctly (except in v1.1.20 to v1.1.22, because of changes in the HTML code of Google.it/.com). This extension works correctly (see the To-Do list and Known problems) and will be declared beta only if major problems cause serious instability. Some problems, however, may be caused by new site definitions that have not been loaded. At each update, **do not forget to reload the site definitions.**

.. _User-manual:

User manual
-----------

.. _News-approval:

**News approval**
^^^^^^^^^^^^^^^^^

When ttnews\_feeder is launched interactively or via CRON, it stores the news in the DB for the sites marked 'production'. News fetched from accredited sites is published immediately (to make this work, configure your TSconfig properly; parameter: clearCachePages). If you don't clear the cache, and the page cache is not cleared by other means, your news will not be available in the FE.

News approval is very easy. Select the **News approval** item from the top-right menu and wait. For each news item you will see some data and the URL. If you want to check the original page, click the URL and the page opens in a new window. Then click one of the radio buttons:

- **suspend**: keeps the news suspended, with no effect on its status;
- **delete**: deletes the news (it is hidden);
- **approve**: the news is approved and published.

Once you have decided what to do, press the **Confirm** button.

.. _News-Statistics:

**News Statistics**
^^^^^^^^^^^^^^^^^^^

Here you can see the statistics for news published, deleted, awaiting approval, etc.

.. _Manual-Check:

**Manual Check**
^^^^^^^^^^^^^^^^

Click **Web > News Feeder** and select your FEEDER FOLDER. Before running a Manual Check, I suggest you correctly define one or more sites and then test them through the 'Site/Search Engine Test' menu. Manual Check loads the retrieved news into your database; news fetched from accredited sites is immediately available online if you set the cache parameters correctly (see the clearCachePages parameter below).

.. _Site-Search-Engine-Test:

**Site/Search Engine Test**
^^^^^^^^^^^^^^^^^^^^^^^^^^^

**This option is for admins only.** Click **Web > News Feeder** and select your FEEDER FOLDER. You can define one or more search engines/sites to visit and easily fetch the news you need.
Once you have defined a site and marked it as 'test-site', you can check whether it works correctly and test the criteria loaded for exclusion or automatic approval. Note: this option visits all sites and repeats the visit for each associated keyword.

.. _Delete-news:

**Delete news**
^^^^^^^^^^^^^^^

This option lets you manually delete the loaded news, according to the preferences selected for each defined search engine/site. The news is not really deleted: it stays in your database as a record marked 'deleted'. This is very useful because News Feeder checks whether a title is already loaded, so all criteria keep working until the record is removed definitively.

.. _Clean-Database:

**Clean Database**
^^^^^^^^^^^^^^^^^^

Acts only on the records deleted with the previous option: they are definitively removed from your database after the number of days set in your preferences (see the option removeExternalOldNews for external news). It also helps you keep your database clean of your internal news (see the option removeMyOldNews).

*Images note*: this option also definitively removes the images related to the news it removes.

.. _Show-Configuration:

**Show Configuration**
^^^^^^^^^^^^^^^^^^^^^^

This is a simple report for each search engine, showing whether the engine is under test or hidden, and its other parameters for the delete and clean options.

.. _Load-site-definitions:

Load site definitions
^^^^^^^^^^^^^^^^^^^^^

**This option is for admins only.** It allows the admin user to *load* any of the predefined sites listed and checked.

**New site**: if the site was not loaded before, it is added to your database. Each new site is configured to run in test mode; to run it in production mode, edit the record properties and change the status.

**Update**: if the site was created earlier by **News Feeder**, the site (if checked) is updated automatically.
The update modifies only the fields containing the patterns used to extract records from the page, and the site URL to connect to. You must update your site definitions again when something goes wrong (e.g. you can no longer read news from Google.com).

*Important*: if you need to update your sites, remember that when running this option News Feeder does not connect to the Internet to download new definitions. You must reinstall the extension. The best way is to download it directly from typo3.org/extensions/ and avoid older versions (mirrors are often not up to date).

*Warning*: the update process overwrites the field values, and it identifies the listed records by their **creation date**. The only way to reuse data from a predefined site is to copy it, ONLY through the BE interface: the copy gets a new creation date, so you have a new site that will no longer be updated. This is useful if, for example, you are **Dutch** and need to copy the '*google.com news*' site, keeping the original and modifying the copy (e.g. '*google.nl news*'). Follow these steps:

- first, load the '*google.com news*' site definition;
- copy and paste it through the BE interface;
- rename the new (copied) site to *google.nl* according to your needs (adjust the name, URL, etc., after connecting to *google.nl* and doing some tests);
- edit the new (copied) site, apply your modifications and run a test;
- from then on, News Feeder will not touch the *google.nl* site definition; it will update only the google.com definition;
- if you want to *collaborate*, please send me a copy of your definition as an email attachment (you can save it from the BE by right-clicking in your window).

**Latest site update**

.. ### BEGIN~OF~TABLE ###

.. _news-google-com:

news.google.com
"""""""""""""""

.. container:: table-row

   Site name
         news.google.com

   Site type
         Search engine

   Review date
         NOT SUPPORTED (1)

.. _news-google-it:

news.google.it
""""""""""""""

.. container:: table-row

   Site name
         news.google.it

   Site type
         Search engine

   Review date
         NOT SUPPORTED (1)

.. _yahoo-com-news-english:

yahoo.com news (english)
""""""""""""""""""""""""

.. container:: table-row

   Site name
         yahoo.com news (english)

   Site type
         Search engine

   Review date
         Dec 2011

.. _yahoo-it-news-italian:

yahoo.it news (italian)
"""""""""""""""""""""""

.. container:: table-row

   Site name
         yahoo.it news (italian)

   Site type
         Search engine

   Review date
         Dec 2011

.. _yahoo-it-news-german:

yahoo.it news (german)
""""""""""""""""""""""

.. container:: table-row

   Site name
         yahoo.it news (german)

   Site type
         Search engine

   Review date
         Dec 2011

.. _it-bing-com-talian:

it.bing.com/ (italian)
""""""""""""""""""""""

.. container:: table-row

   Site name
         it.bing.com/ (italian)

   Site type
         Search engine

   Review date
         Dec 2011 (2)

.. _www-bing-com-deutsch:

www.bing.com (deutsch)
""""""""""""""""""""""

.. container:: table-row

   Site name
         www.bing.com (deutsch)

   Site type
         Search engine

   Review date
         Sept 2012 (2)

.. ###### END~OF~TABLE ######

(1) Since December 1, 2011 the news published via Google is rendered in the browser page using JavaScript, so it cannot be fetched. Within 1-2 months a new extension will be available to read the news via POP3 and store the fresh news in the DB.

(2) Bing detects the location of your server and returns news in the language of that location. "Deutsch" means that the news will be in German for connections to Bing from Germany.

**HINT**: re-edit your keywords and relate them to the new search engines to ensure fresh news for your site (you can use Yahoo, Bing, etc.).

**About yahoo.it/.com**: this engine shows images that News Feeder cannot fetch, because the published images are not related to any news.

.. _FAQ:

FAQ
^^^

None.

.. _Administration:

Administration
--------------

.. _Installation-notes:

Installation notes
^^^^^^^^^^^^^^^^^^

This extension is reserved for administrators only.
However, if you restrict access to your folders (this will be explained in detail in the future), you will be able to delegate news approval, deletion and other tasks to one or more BE users. This manual is under development, so to run the extension correctly I suggest you follow the configuration instructions step by step; see the next chapter.

*Legal issue: somewhere on your site, please cite the sites/engines visited.*

**It is recommended to read these steps carefully; otherwise it will be very difficult to run the extension correctly!**

Install the extension as an admin BE user. Right after installation, clear the /typo3conf cache and confirm the requested DB modifications. The extension requires the **tt\_news** extension and adds a new field to the tt\_news table: this field records which site each news item was fetched from.

Create a *sysfolder* (e.g. name it NEWS\_FEEDER) to store your configuration parameters, and take note of its PID:

a. within your site you can create one or more folders (suggested: create one folder);

b. edit the **page properties of your FOLDER, or of another page in your root line (above your page)**, and insert the following configuration lines in the TSconfig field (you can simply copy and paste them):

.. _Configuration-example:

Configuration example
^^^^^^^^^^^^^^^^^^^^^

(Copy and paste, then change the references.)

::

   mod.web_txttnewsfeederM1 {
     clearCachePages = 1,364,365,366,367,369,370,378,383
     useRandomTime = 1
     fetchImages = 1
     resizeImages = 1
     resizedJpgCompression = 60
     resizedImagePxWidth = 80
     maxImageByteSize = 20000
     maxImagePxWidth = 240
     maxImagePxHeight = 240
     useSubIfTitleIsEmpty = 1
     useTitleIfSubIsEmpty = 1
     backDays = 7
     suspendFlag = 0
     autosuspendLimit = 100
     maxRecordsPerSession = 30
     feederSysFolderPID = 353
     newsSysFolderPID = 360
     removeExternalOldNews = 20
     removeMyOldNews = 360
     debugFeed = 0
     charSet = cp1252
     cronWriteOnlyAccredited = 1
   }

**Images**

TEST/MANUAL CHECK: if you need to download **images**, you must define the following *News Feeder* parameters (as above):

::

   fetchImages = 1
   resizeImages = 1

CRON MODE: if you need to download **images**, remember that the pictures are written to uploads/pics. News Feeder automatically assigns them to the owner/group of the uploads/pics folder. However, you can force another owner/group by adding these two parameters to the configuration above:

::

   apacheOwner = www-data
   apacheGroup = www-data

Finally, you can set the compression quality for *JPG* files and the size limits (see Reference).

**Clear the cache**

In CRON mode, when the feeder has finished, you need to clear the cache of some pages. Use this parameter:

::

   mod.web_txttnewsfeederM1.clearCachePages = all

Depending on your needs you can also use 'pages' or 'temp\_CACHED' (see the TYPO3 API reference).

.. _FAQ:

FAQ
^^^

None.

.. _Configuration:

Configuration
-------------

**Before running this extension, read these steps carefully and configure it (see the Administration chapter); otherwise it will be very difficult to run it correctly!**

.. _How-to-define-a-new-engine-site:

**How to define a new engine/site**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Define your first engine**

Within the sysfolder assigned to the FEEDER, create your first site. The following example concerns the configuration parameters for the engine http://www.google.it.

As stated before, this manual is reserved for administrators only (see User manual). Thus the best way to put this extension to work is to follow the instructions below step by step. New documentation will be published in the future to explain how to configure a new site, study the HTML of a page, etc.

If you are an *admin* you can load a predefined engine or define a new one. To start as soon as possible, run News Feeder and select the last option from the drop-down menu: '*Load sites definition*'. This option creates a new engine from the definitions stored in a file shipped with this extension. News Feeder checks and creates a new engine for you:

::

   Google (test mode) news.google.it

This engine setup works fine and has been tested for a long time. The tag definitions inside refer to the Google News engine in ITALIAN (http://www.google.it); google.com news was tested on Jan 04, 2007 and works fine. **Contact me only** if the preloaded site definitions do not work correctly. Google.com recently changed its HTML output for the news, but since Jan 04, 2007 everything has been OK.

Now open your FEEDER folder (from the BE interface: List -> select your folder) and you will see what happened. Edit the 'Google (test mode) news.google.it' record and you will see the page with the parameters needed to fetch the news.

*Warning*: this extension works using GET variables, the PHP ``file()`` function to fetch the pages and the PHP ``eregi()`` function to accept or exclude sites/titles. If you are not familiar with them, please refer to http://www.php.net.
The extension does not use a browser (this may change in the future) and is therefore *unable* to send POST data.

*Brief explanation of the fields used*:

**Hide**: if the engine is hidden, it is not processed by *ttnews\_feeder*.

**Search engine name**: the site/engine name.

**Scheme**: default http://, alternative https://. Trick: to run your tests, save the remote page (using Mozilla, Explorer, etc.) to your hard disk and transfer it to your server. This avoids stressing the remote server while testing.

**Url**: the URL to connect to. Here you can use some markers, whose content is defined in the keywords table:

- ``###RECORDSTOVIEW###``: how many records to retrieve (e.g. 10, 20, 50, 100);
- ``###SEARCHKW###``: replaced with the search keywords;
- ``###EXCLUDEKW###``: replaced with the keywords to exclude.

**Charset**: select one of the listed items. All strings (title, subtitle, font) are converted to this charset. If you don't know what to choose, try cp1252. If you see unwanted characters, change this parameter until the problem disappears.

**Content unwrap**: a start and an end tag (or pieces of tags) that tell the extension what to fetch from the page. Content means the whole block of the page containing *all* the news.

**Section unwrap**: a start and an end tag (or pieces of tags) that tell the extension what to fetch from the Content (above) to extract each news item (title, subtitle, font, i.e. the news source, etc.).

**Title unwrap**: a start and an end tag (or pieces of tags) that tell the extension what to fetch from the Section (above) to extract the title.

**Subtitle, Font and Link unwrap**: like above.
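
The following is a minimal sketch, in Python, of how the Url markers and the nested start/end unwraps described above can be applied. It is not the extension's PHP code: the helper names (``build_url``, ``unwrap_all``, ``unwrap_first``), the sample HTML and the sample unwraps are hypothetical. It assumes that each unwrap holds a start part and an end part separated by the ``###SEP###`` marker, and that extraction takes the text between a start match and the next end match.

.. code-block:: python

   SEP = "###SEP###"  # assumed separator between the start and end parts of an unwrap


   def build_url(scheme, url, search_kw, exclude_kw, records):
       """Fill the Url field markers with the values of a keyword record."""
       return scheme + (
           url.replace("###SEARCHKW###", search_kw)
           .replace("###EXCLUDEKW###", exclude_kw)
           .replace("###RECORDSTOVIEW###", str(records))
       )


   def unwrap_all(text, unwrap):
       """Return every piece of text enclosed by the unwrap's start/end pair."""
       start, end = unwrap.split(SEP, 1)
       if not start or not end:
           raise ValueError("both parts of the unwrap must be non-empty")
       pieces, pos = [], 0
       while True:
           i = text.find(start, pos)
           if i == -1:
               break
           i += len(start)
           j = text.find(end, i)
           if j == -1:
               break
           pieces.append(text[i:j])
           pos = j + len(end)
       return pieces


   def unwrap_first(text, unwrap):
       """Return the first enclosed piece, or an empty string if none is found."""
       pieces = unwrap_all(text, unwrap)
       return pieces[0] if pieces else ""


   # Hypothetical page and unwraps, for illustration only.
   page = ('<div id="res"><li><a href="/n1">Title one</a></li>'
           '<li><a href="/n2">Title two</a></li></div>')
   content = unwrap_first(page, '<div id="res">' + SEP + "</div>")   # Content unwrap
   sections = unwrap_all(content, "<li>" + SEP + "</li>")            # Section unwrap
   titles = [unwrap_first(s, '">' + SEP + "</a>") for s in sections]  # Title unwrap
   links = [unwrap_first(s, 'href="' + SEP + '"') for s in sections]  # Link unwrap
   url = build_url("http://",
                   "news.google.it/news?hl=it&ned=it&q=###SEARCHKW######EXCLUDEKW###",
                   "bush", "+-powell", 10)

With the keyword values used later in this manual (``bush`` and ``+-powell``), ``url`` becomes ``http://news.google.it/news?hl=it&ned=it&q=bush+-powell``, and the nested unwraps yield the two titles and the relative links ``/n1`` and ``/n2``. Matching the nearest end part keeps the sketch simple; a page that repeats the end tag inside a section would need a more specific end part.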
**Subtitle extraction method**: if the title of the news and its subtitle are on the same page, select '**from search page (url above)**': the URL field is used to fetch the subtitle, i.e. from the same page. Otherwise you must select '**from target page, news link**'. This second option can slow down the extraction process, because News Feeder loads another page to examine and fetch the subtitle. That page depends on the extracted link (see **Link unwrap** below). If the text is long, it is truncated to the first 255 characters without cutting the last word (this is not a simple, crude crop!).

**Image unwrap, if any found in the section**: if the extracted section (captured with *Section unwrap*) contains an image and you have set the parameter fetchImages = 1 (boolean), News Feeder downloads the images it recognizes, according to the TYPO3 configuration and the parameters defined during installation. The images are stored in the /uploads/pics/ folder of your site. Images larger than the maxImageByteSize parameter are not written and are thus ignored.

All the extraction tags are divided by the marker ``###SEP###``; use this marker and the URL markers to design a new engine/site. If you need to define a new site, study the page carefully and define these unwraps correctly, then use the TEST MODE to check whether the site works correctly, and finally switch the site to production mode (MANUAL CHECK or CRON MODE).

**Link unwrap**: used to fetch the link that points to the site where the full news is published (see also Subtitle extraction method).

**Url to add to the extracted link**: sometimes a site (especially a *static* one) points to its internal news using only relative references (e.g. /index.php?id=28). If such a site is indexed by *ttnews\_feeder*, we cannot publish the relative path on our TYPO3 site, so the extension adds this URL to reconstruct the full (*absolute*) path.
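
A minimal Python sketch of two post-processing rules described in this manual: a relative link gets the 'Url to add to the extracted link' value prepended unless it already starts with http:// or https://, and a long subtitle is cut at 255 characters without splitting a word. The function names are hypothetical, and the exact truncation behaviour of the PHP extension (for example, what happens to a word that crosses the limit) is an assumption; here the partial word is dropped.

.. code-block:: python

   def absolutize(link, url_to_add):
       """Prepend the 'Url to add to the extracted link' value to relative links only."""
       if link.startswith(("http://", "https://")):
           return link  # already an absolute path
       return url_to_add + link


   def truncate_words(text, limit=255):
       """Cut text to at most `limit` characters without splitting the last word."""
       if len(text) <= limit:
           return text
       cut = text[:limit]
       if not text[limit].isspace():
           # The cut falls inside a word: drop that partial word.
           space = cut.rfind(" ")
           if space > 0:
               cut = cut[:space]
       return cut.rstrip()

For example, ``absolutize("/index.php?id=28", "http://www.example.com")`` returns ``http://www.example.com/index.php?id=28``, while an absolute link is returned unchanged.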
*Note*: if you are configuring a static/dynamic site and the image unwrap is set, this URL is also used to fetch the images. When News Feeder analyzes a URL, it checks whether it starts with 'http://' or 'https://' (absolute paths); if not, it builds the full URL by prepending this parameter to what was fetched.

**Autoclean** (interactive or CRON mode): if enabled, records that have expired according to the next field are deleted (not removed!).

**Autoclean backdays**: all news related to this site is considered for deletion after the number of days defined here. Deleted news is still present in the database and used for title/URL exclusion, but it is not available to visitors.

**Mode**: the running mode. The first time, please select **Test mode**.

**Check every n days**: the check frequency in CRON/Manual check mode. '0' means every day; otherwise enter the number of days between one check and the next. *Note*: if you leave this field empty, News Feeder uses 0.

**Notes**: internal notes. When you run an UPDATE this field is preserved, and News Feeder adds the UPDATE date and time.

.. _Titles-excluded-accredited-and-refused-sites:

Titles excluded, accredited and refused sites
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These tables are used to exclude or accredit sites; their use is intuitive and easy. A Title excluded record needs the URL related to the title, and you can use regular expressions (REGEXP). As stated before, please refer to the PHP site for the **REGEXP** syntax.

.. _How-define-and-use-keywords:

How to define and use keywords
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*Define your keywords*: within the system folder assigned to the FEEDER, create your keywords. The following example concerns the configuration parameters for the keywords. You can define several keywords and configure each of them individually to obtain different results.
Each keyword can be related to one or more sites:

**Hide**: if the keyword is hidden, it is not processed by *ttnews\_feeder*.

**keyword**: the search keyword. You must use the syntax of the desired search engine; for example, for Google you can fill this field with ``antivirus+security`` (use '+' as separator).

**but not...**: the keyword (or list of keywords) to exclude; typically Google uses ``+-microsoft+-HIV+-flu``.

**search engines**: select from the right-hand box the search engine(s) you want to explore with this keyword. Note that Google, Yahoo and Excite use the same keyword syntax. For sites that use a different syntax for keyword definition and exclusion, you must create a new keyword.

**Category**: here you can select one or more categories to relate to the news extracted and approved. This is very useful if you need to aggregate news on your site using the tt\_news plugin. Refer to the tt\_news documentation to learn how to create categories.

**Notes**: internal notes. Put here whatever you want to remember.

**I suggest** *you define one or more search engines first* and then define the keywords. You can associate (relate) each keyword with one or more search engines, but each configured keyword must respect the syntax of the selected search engine(s): Google, AltaVista and Excite use the same syntax. If the syntax is different, you must define another keyword for the desired search engine.

**How to define a keyword correctly**: to avoid errors, please follow the steps below:

- with your preferred browser, connect to the desired engine (e.g. *http://news.google.it*);
- fill in the search box and run a search, e.g. with the keywords *bush -powell* (search for *bush* news but avoid '*powell*' content);
- click the search button;
- note that the URL in the address bar has changed; for the example above you will see ``http://news.google.it/news?hl=it&ned=it&q=bush+-powell&btnG=Cerca+nelle+notizie``, where ``bush+-powell`` is the keyword part;
- now you can see how Google passes the GET variables;
- fill in the keyword field (see the previous paragraph, *Define your keywords*) with ``bush``;
- fill in the but not... field (see the previous paragraph, *Define your keywords*) with ``+-powell``;
- finally, associate your keyword with the search engine and run a test;
- when everything is OK, change your search engine properties, switching it to *production mode*.

.. _Test-mode:

**Test mode**
^^^^^^^^^^^^^

Once you have configured the extension and defined a keyword and a search engine, you can run a test. Test mode doesn't write any record to your DB, and it is a great way to check whether your *engine configuration* works well. To run test mode, click |img-5| and then, in the right frame, select the menu item:

::

   Test news engine/sites

Read the text, select the name of the site to test (or All) and click the button:

::

   Run site/engine test

*Note*: if you see nothing, you probably have not yet defined a keyword related to the site.

Test mode is very similar to production mode, except that nothing is written: in test mode the DB is only read, to check for records already stored. Each displayed record has an icon on the left and, on the right, a brief explanation (called the '*news status*'). Images are not written to your server; they are only displayed through a link to the remote site.

.. _Production-mode:

**Production mode**
^^^^^^^^^^^^^^^^^^^

Once you have configured the extension and tested the site/engine as explained, you can switch the site/engine status to production mode (see the engine configuration). When a site is in production mode, records are written to the DB. To run it, click |img-5| and then, in the right frame, select the menu item:

::

   Run Manual Check

Read the text and click the button:

::

   Run Manual Check

Wait a few seconds for it to finish and read what was fetched.

*Note*: if you have deleted a record (manually, or because it was automatically refused), the record is only hidden and remains stored in the DB.
It is deleted (removed definitively) only through the Clean DB menu item. Keep in mind that if you remove the records *definitively*, using *ttnews\_feeder* or other utilities, the extension can no longer check whether a certain news item is already stored, and the next manual (or CRON) check will load that news again. Images, if any, are written to your server in the *uploads/pics* folder according to the given parameters; images larger than *maxImageByteSize* are skipped.

.. _CRONmode:

**CRON mode**
^^^^^^^^^^^^^

**Since v3.0.1 you must remove all your old News Feeder CRONTAB entries and configure it as follows.**

First, add a new BE user to your site with the name:

::

   _cli_ttnewsfeeder

Set the parameter **newsBEOwner** (see Reference):

::

   mod.web_txttnewsfeederM1.newsBEOwner =

If you want to edit/display the fetched news, remember to set the uid above to '1' (usually the uid of the admin user); otherwise use another BE user's uid or, if you prefer, the uid of the user:

::

   _cli_ttnewsfeeder

It is your own choice, depending on security issues and on the privileges assigned to the various BE users.

**Since ttnews\_feeder v3.0.1**, I suggest you install and configure the system extension SCHEDULER, then configure a new task that fetches the news using ttnews\_feeder at the **desired interval**. Finally, set up the crontab by adding a line like this:

::

   5 * * * * php -q /var/www/www.example.com/web/typo3/cli_dispatch.phpsh scheduler

Please adjust the path /var/www... to your site and refer to the dispatcher configuration, which is part of the core. In some circumstances you will need to change the permissions of ttnews\_feeder\_cli.phpsh:

::

   chmod 0755 typo3conf/ext/ttnews_feeder/Classes/Cli/ttnews_feeder_cli.phpsh

**Warning!** News Feeder behaves the same way as in the BE.
So I suggest you try it in the BE first. Using CRON, News Feeder fetches the news and, for the accredited sites, publishes it immediately! This is a good way to automate your site, but it carries some risks, so choose carefully which sites to define as 'accredited'. The other news, coming from non-accredited sites, is stored in your database and you must approve it manually.

**Don't forget that you must define at least one keyword and/or engine and select the MODE: CRON MODE** *or* **CRON MODE+MANUAL MODE**.

**Suspend CRON mode**

You can *suspend* CRON (e.g. when you are on vacation) by setting:

::

   suspendFlag = 1

Set the parameter autosuspendLimit to a suitable value: when CRON detects that the news not yet approved exceeds the limit, it no longer fetches and stores news.

**How to receive a report via email**

If you are an admin, set up CRON as above and append ``| admin@your-domain.com`` to the end of the line:

::

   (...) ttnews_feeder_cli.phpsh | admin@your-domain.com

If you are the admin and several people are responsible for news approval, each for a different section, you receive the same report you see in interactive mode (BE) for all activated sections. If instead you want each person responsible for a certain section to receive an email with a report, configure the parameter newsResponsibleEmail in the mod TSconfig (see Reference). At each CRON run, each responsible person receives an email with their own report.

**Store only the accredited site records**

If you set cronWriteOnlyAccredited to '1' and the CRON TASK is active, News Feeder stores in the DB only the records coming from accredited sites. This is very useful if you need to fully automate the approval process, avoiding manual approval. Valid records that would normally wait for manual approval are stored in the DB marked as deleted, so that News Feeder can recognize them and reject them again on the next check.
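
The CRON rules above (suspendFlag, autosuspendLimit, cronWriteOnlyAccredited) can be summarized by the minimal Python sketch below. It is an illustration under stated assumptions, not the extension's code: the class and function names are hypothetical, the defaults mirror the configuration example, the limit is assumed to apply to the news still waiting for approval (as this section describes), and reaching the limit is treated as the stop condition.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class CronConfig:
       # Defaults mirror the configuration example; names mirror the TSconfig keys.
       suspend_flag: bool = False          # suspendFlag
       autosuspend_limit: int = 100        # autosuspendLimit
       write_only_accredited: bool = True  # cronWriteOnlyAccredited


   def should_fetch(cfg: CronConfig, pending: int) -> bool:
       """Decide whether a CRON run may fetch news at all.

       `pending` is assumed to be the number of fetched news awaiting approval.
       """
       if cfg.suspend_flag:
           return False  # suspended: per the manual, the run only cleans the DB
       return pending < cfg.autosuspend_limit


   def storage_status(cfg: CronConfig, accredited: bool) -> str:
       """Return how a valid, non-duplicate record would be stored by CRON."""
       if accredited:
           return "published"
       if cfg.write_only_accredited:
           # Stored as deleted so that later checks recognize and reject it again.
           return "deleted"
       return "awaiting approval"

With the example configuration, a record from a non-accredited site is stored as deleted; setting ``write_only_accredited=False`` leaves it waiting for manual approval instead.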
**Cron keeps your DB clean!** If you set suspendFlag to 1 and CRON TASK is active News Feeder will be launched and will keep clean your db, checking for records to delete and erase. .. _Notes-about-the-images: Notes about the images ^^^^^^^^^^^^^^^^^^^^^^ Images download is available only if you set to true (1) the fetchImagesparameter. However if you want that downloaded images are resized to a certain value (e.g. 100 px), you must to set the autoresizeImagesparameter too. If you set up autoresizeImagesto true (1) the images will be first resized and only **after** resized the images will be measured and accepted according to maxImageByteSize, maxImagePxWidth, maxImagePxHeightparameters. Values. **Check for extensions allowed** – News Feeder accept first the images extensions allowed by TYPO3 general configuration. Note that autoresize option is allowed only for JPEG, JPG, GIF, PNG images format. If autoresize is on and an image has not any of these format, it will accepted and measured as described above and, if it is oversized it will be refused. **Autoresize images –** I suggest to keep it on because you save disk- space in your server and you will have more and more images for your news because the images will be rarely refused. **Images quality –** First release with image support (v 1.1.16) was not tested with PNG format and could be improved. Please contact me if images will be displayed as not expected so I can introduce news code for resizing. **Images and tt\_news –** If you order News Feeder to resize images please keep note that all images will be resized from tt\_news extensions to create thumbnails in news listing and others. Please note that the best way to avoid low quality is to define some tt\_news parameters (max images width and max images eight) greater/equals of resizedImagePxWidth.The *height* will be calculated automatically from News Feeder. .. 

.. _FAQ:

FAQ
^^^

**Why can't I see anything in test mode?**

Check whether your configuration is correct (header unwrap, etc.), then check your site definition. A common error with engines is that they must be related to a keyword definition: if you have not loaded a keyword related to your (new) site, the site will not be visited.

**I have just loaded a new definition and run a manual test, but I can't see anything. Why?**

You can define several sites/engines, but to run them you must create at least one keyword and associate (relate) it with your engine. So, if you have just loaded a new engine (e.g. *Google*), load a new keyword and select the engine from the menu.

**When parsing 'news.google.it', a subtitle sometimes disappears. Why?**

The extension extracts the text using the 'unwrap' parameters passed through the search engine definition. Some *Google* records are formatted differently and the extension cannot extract them correctly. However, the title is always available.

**I'm Italian and I have loaded the news.google.COM site definition. Nothing works. Why?**

The extension connects to news.google.com, but Google redirects to the Italian service, news.google.it. The pages are formatted differently, and the extension cannot fetch records if the site is redirected.

.. _Reference:

Reference
^^^^^^^^^

The most important settings to guarantee a correct setup are:

- Define the pid of the *ttnews\_feeder* system folder.
- Define the uid of the BE user who will own the news.

Reference (TSconfig): ttnews\_feeder – **News Feeder**

.. ### BEGIN~OF~TABLE ###

.. _clearCachePages:

clearCachePages
"""""""""""""""

.. container:: table-row

   Property
         clearCachePages

   Data type
         int+/string

   Description
         List of all page pids you need to clear from the cache. This runs at the end of the process, so that fresh news from accredited sites is immediately available on the FE (since v2.1.1 you can also use: pages, all, temp\_CACHED).

   Default
         \-

.. _useSubIfTitleIsEmpty:

useSubIfTitleIsEmpty
""""""""""""""""""""

.. container:: table-row

   Property
         useSubIfTitleIsEmpty

   Data type
         boolean

   Description
         1 (true), 0 (false) – If set to 1 and the news Title field cannot be extracted (for whatever reason), it is replaced by the subtitle, truncated to 60 characters.

   Default
         1

.. _useTitleIfSubIsEmpty:

useTitleIfSubIsEmpty
""""""""""""""""""""

.. container:: table-row

   Property
         useTitleIfSubIsEmpty

   Data type
         boolean

   Description
         1 (true), 0 (false) – If set to 1 and the news Subtitle field cannot be extracted (for whatever reason), it is replaced by the Title, truncated to 250 characters.

   Default
         1

.. _BackDays:

BackDays
""""""""

.. container:: table-row

   Property
         BackDays

   Data type
         int+

   Description
         Under evaluation; currently not used.

   Default
         7

.. _suspendFlag:

suspendFlag
"""""""""""

.. container:: table-row

   Property
         suspendFlag

   Data type
         boolean

   Description
         Set to '1' if you are on vacation: this suspends all fetching through CRON.

   Default
         0

.. _autosuspendLimit:

autosuspendLimit
""""""""""""""""

.. container:: table-row

   Property
         autosuspendLimit

   Data type
         int+

   Description
         Works only in CRON mode. When this limit is reached (e.g. because no operator is available to approve fresh news during a vacation), no more news are accepted and stored in the DB. The counter keeps track only of approved news. This prevents the DB from being overloaded.

   Default
         100

.. _maxRecordsPerSession:

maxRecordsPerSession
""""""""""""""""""""

.. container:: table-row

   Property
         maxRecordsPerSession

   Data type
         int+

   Description
         Works only in MANUAL CHECK mode. When this limit is reached, no more news are accepted and stored in the DB. The counter keeps track only of approved news.

   Default
         30

.. _feederSysFolderPID:

feederSysFolderPID
""""""""""""""""""

.. container:: table-row

   Property
         feederSysFolderPID

   Data type
         int+

   Description
         The PID of the page where your configuration tables (keywords, sites/engines to visit, etc.) are stored.

   Default
         required

.. _newsSysFolderPID:

newsSysFolderPID
""""""""""""""""

.. container:: table-row

   Property
         newsSysFolderPID

   Data type
         int+

   Description
         The PID of the page where your EXTERNAL NEWS are stored. I suggest keeping your internal and external news separate, so that they are easier to inspect.

   Default
         ul

.. _newsBEOwner:

newsBEOwner
"""""""""""

.. container:: table-row

   Property
         newsBEOwner

   Data type
         int+

   Description
         Use this parameter only if you want the same user id written into the tt\_news table for all fetched news; otherwise the UID of the BE user running News Feeder is used.

   Default
         1

.. _removeExternalOldNews:

removeExternalOldNews
"""""""""""""""""""""

.. container:: table-row

   Property
         removeExternalOldNews

   Data type
         int+

   Description
         Age limit in days for external news. When this limit is reached, CRON (if used) removes the expired news; if you work in MANUAL CHECK mode, you remove them manually.

   Default
         50

.. _removeMyOldNews:

removeMyOldNews
"""""""""""""""

.. container:: table-row

   Property
         removeMyOldNews

   Data type
         string

   Description
         Age limit in days for your own news. When this limit is reached, CRON (if used) removes the expired news; if you work in MANUAL CHECK mode, you remove them manually.

   Default
         920

.. _charSet:

charSet
"""""""

.. container:: table-row

   Property
         charSet

   Data type
         String

   Description
         Charset for HTML conversion; accepts the same values as the encoding parameter of the PHP ``htmlentities`` function.

   Default
         cp1252

.. _maxImageByteSize:

maxImageByteSize
""""""""""""""""

.. container:: table-row

   Property
         maxImageByteSize

   Data type
         int+

   Description
         Maximum size, in bytes, of fetched images.

   Default
         15000

.. _fetchImages:

fetchImages
"""""""""""

.. container:: table-row

   Property
         fetchImages

   Data type
         bool

   Description
         Whether to fetch images from the site/engine (disabled by default).

   Default
         0

.. _maxImagePxWidth:

maxImagePxWidth
"""""""""""""""

.. container:: table-row

   Property
         maxImagePxWidth

   Data type
         int+

   Description
         If the width of a fetched image exceeds this limit, the image is refused.

   Default
         300

.. _maxImagePxHeight:

maxImagePxHeight
""""""""""""""""

.. container:: table-row

   Property
         maxImagePxHeight

   Data type
         int+

   Description
         If the height of a fetched image exceeds this limit, the image is refused.

   Default
         300

.. _resizeImages:

resizeImages
""""""""""""

.. container:: table-row

   Property
         resizeImages

   Data type
         bool

   Description
         Autoresize for downloaded images: if set, all images are resized according to the ``resizedImagePxWidth`` parameter.

   Default
         0

.. _resizedImagePxWidth:

resizedImagePxWidth
"""""""""""""""""""

.. container:: table-row

   Property
         resizedImagePxWidth

   Data type
         int+

   Description
         Target width, in pixels, for resized images. Works only if ``fetchImages`` and ``resizeImages`` are both set to 1 (true). Images are scaled to this width whether they are wider or narrower: e.g. with the default of 80, a downloaded image 120 pixels wide becomes 80 pixels wide, and one 60 pixels wide is enlarged to 80 pixels.

   Default
         80

.. _resizedJpgCompression:

resizedJpgCompression
"""""""""""""""""""""

.. container:: table-row

   Property
         resizedJpgCompression

   Data type
         int+

   Description
         Compression for output images with the JPG or JPEG extension; use 100 for no compression.

   Default
         70

.. _useRandomTime:

useRandomTime
"""""""""""""

.. container:: table-row

   Property
         useRandomTime

   Data type
         bool

   Description
         If enabled, the date and time assigned to fetched news are calculated randomly. Set it to '0' to disable this; this can be useful to list fetched news in the order of importance given by the search engine visited.

   Default
         1

.. _newsResponsibleEmail:

newsResponsibleEmail
""""""""""""""""""""

.. container:: table-row

   Property
         newsResponsibleEmail

   Data type
         String

   Description
         A valid email address. Each time CRON runs, an email containing a report is sent to this address.

   Default
         \-

.. _cronWriteOnlyAccredited:

cronWriteOnlyAccredited
"""""""""""""""""""""""

.. container:: table-row

   Property
         cronWriteOnlyAccredited

   Data type
         Bool

   Description
         If set to '1' and News Feeder is running under CRON, only the records of accredited sites are written to the database.

   Default
         \-

.. _apacheOwner:

apacheOwner
"""""""""""

.. container:: table-row

   Property
         apacheOwner

   Data type
         String

   Description
         CRON mode: downloaded images are assigned this owner. Default: owner of uploads/pics.

   Default
         Same as uploads/pics

.. _apacheGroup:

apacheGroup
"""""""""""

.. container:: table-row

   Property
         apacheGroup

   Data type
         String

   Description
         CRON mode: downloaded images are assigned this group. Default: group of uploads/pics.

   Default
         Same as uploads/pics

.. ###### END~OF~TABLE ######

[tsref:(cObject).web\_txttnewsfeederM1]

.. _To-Do:

To Do
-----

- **A new extension to read Google News via POP3 (planned for February 2012)**
- Improve settings (site definitions) and add some new engines.
- Integrate with the Scheduler.

.. _Known-problems:

Known problems
--------------

- **Since Dec 01, 2011** ttnews\_feeder cannot fetch google.it/.com/.de news, because Google delivers the news to your browser *exclusively* through JavaScript, so the news are not available in readable form. Within 1-2 months a new extension will be issued to read Google records via POP3.
- When running the feeder via CRON with two or more (different) BE folders, fetched news are stored improperly. **Please avoid using more than one folder**; this will be fixed soon.
- When running the feeder from the BE using the **SCHEDULER** (manually), you should see the fetched records on screen. SCHEDULER mode needs to be adjusted and does not yet work perfectly. Moreover, when I tried to add the code for the SCHEDULER, the scheduler refused to be configured and I got this error: ``PHP Fatal error: Class 'tx_ttnews_feeder_schedule' not found in /var/www/typo3_src-4.4.5/t3lib/class.t3lib_div.php on line 5260``. This is under evaluation; **instead**, use the manual configuration to run the feeder from CRON.
- If you run the extension behind a WEB ACCELERATOR, please disable it, because the images will not be calculated correctly. PHP does not use the WEB ACCELERATOR, and the images fetched are the same as on the remote site.
- Check your PHP memory limit. News Feeder was tested on a server with the limit set to 72 MB and image fetching enabled, and the extension ran very slowly. A value of 96 MB should allow it to work correctly. If you do not have access to the server configuration (e.g. your hosting plan is limited to 64 MB or less), consider disabling image downloads to reduce time and resource consumption. Please let me know if you face problems: *at(at)uniud.it*

.. _To-Do-list:

To-Do list
----------

**Some things to do:**

- Check for bugs under TYPO3 6.2.x; I have not tried downloading and reusing images.
- Test the base64 decoding of the image tag more extensively.
- A new BE menu with some information/logs about CRON mode.
- Improve output messages and logging for the update process.
- Keywords for static/dynamic sites.
- For each keyword, enable or disable image fetching.
- Documentation in Italian.
- Static/dynamic sites: add code to fetch the full news and import it into the DB (long text, news type = internal).
- Static/dynamic sites (not engines!): add a field to exclude undesired keywords.
- Static/dynamic sites (not engines!): add a field to relate fetched news to one or more news categories.

.. _Changelog:

Changelog
---------

- **05-10-2014 (v.3.0.1, beta) –** Minor manual modifications (Crontab section).
- **04-10-2014 (v.3.0.0, beta) – Now compatible with TYPO3 6.2; please do not install it on previous versions. Manual updated; CRON must be reconfigured.**
- **05-09-2012 (v.2.7.0) –** Some code changes to ensure 4.7.x compatibility. Not yet compatible with 6.x. Site definitions reviewed (Google is now removed).
- **05-12-2011 (v.2.5.0) –** New site definitions upgraded: Google is not supported (its record will be hidden after the upgrade). News engines: yahoo.com for DE, IT, EN; Bing for Italian.
- **04-02-2011 (v.2.4.8) –** New site definitions upgraded, minor bug fixed. Now works with the dispatcher from the BE.
- **29-10-2009 (v.2.3.3) – Guide updated,** new site definitions upgraded.
- **29-08-2009 (v.2.3.2) –** Documentation updated.
- **28-08-2009 (v.2.3.1) –** Modified the htmlspecialchars\_decode call, adding some code to ensure compatibility with PHP < 5.1; thanks to *Andreas Weigelt* for discovering this “bug”.
- **01-06-2009** **(v.2.2.12)** – Added the use of htmlspecialchars\_decode for the retrieved URL; unfortunately this feature restricts use to PHP 5.1+.
- **28-02-2009 (v.2.2.11) – Site definition updated, guide updated.** Google has just changed the format of its HTML pages, and news.google.it and news.google.com now have the same parameters.
- **04-05-2008 (v.2.2.2 and v.2.2.3) – Guide updated**, new site definition.
- **26-03-2008 (v.2.2.1) – Guide updated** (some small mistakes).
- **22-03-2008 (v.2.2.1) – Cron mode:** permissions of downloaded pictures are set by default to the owner/group of uploads/pics; you can override this parameter.
- **22-03-2008 (v.2.2.0) – Cron mode now downloads the images correctly.**
- **10-02-2007 (v.2.1.5) – Output suppressed (debug), site definition updated.**
- **27-12-2007 (v.2.1.1) – Bug fixes –** The clear-cache function now accepts more parameters (see Reference). Property: mod.web\_txttnewsfeederM1.clearCachePages = all. Clear-cache code modified to clear all caches, pages and lists of ids.
- **23-12-2007 (v.2.0.7) – Bug fixes –** Library class modified (if only one site was defined, news were not fetched). The Italian definition for google.it did not work correctly because its URL was not defined correctly. Guide updated.
- **27-07-2007 (v.2.0.4) – New TS config,** CLI mode report messages added/improved.
- **28-05-2007 (v.2.0.2) – Minor bug fixes,** CLI mode report messages added/improved.
- **22-05-2007 (v.2.0.1) – Two bug fixes –** External news not removed (all modes), mail not starting in CRON mode.
- **22-05-2007 (v.2.0.0) – Major release –** Now works in CRON mode; PHP code has been reviewed and heavily modified; a new class introduced; some improvements and minor bugs fixed. Guide updated for CRON and other topics.
- **06-01-2007 (v.1.2.2) –** New field for search engine/site charset (please update to this version and reload the site definitions).
- **06-01-2007 (v.1.2.1) –** Adjusted Google.news definition; images: some PHP code modified to preserve colors.
- **05-01-2007 (v.1.2) –** New site definitions for *google.it news*; modified some code for the update; new documentation.
- **04-01-2007 (v.1.1.21) –** New site definitions for *google.com news*; modified some code for the update.
- **02.01.2007 (v.1.1.16) –** Image autoresize feature implemented; the field *check every n days* is no longer required because of a TYPO3 bug in validating this type of field.
- **30.12.2006 (v.1.1.12 to v.1.1.15) –** Minor bug fixes.
- **22.12.2006 (v.1.1.11) –** The *add url* parameter adjusted for static/dynamic sites to allow remote image fetching; manual upgraded, a message replaced; font inserted before the subtitle in test/production mode.
- **17.12.2006 (v.1.1.10) –** All (small) bugs related to image management removed.
- **12.12.2006 –** Modified the userBEowner parameter; access bug fixed: non-admin users can now load, delete and remove records.
- **30.11.2006 –** New DB field to configure every how many days the check for a site/engine starts.
- **26.11.2006 –** New TSConf parameter useTitleIfSubIsEmpty: fills the subtitle with the title if the subtitle is empty; new TSConf parameter useRandomTime: you can enable/disable this feature (if disabled, the records are displayed according to the fetching order, all with the same hour and minute); title/subtitle check: if both are empty, the record is refused; image status introduced with refused/accepted and bytes message; mandatory field for titles and URLs to exclude; better specified that you can use REGEXP; DELETE option for images uploaded on approval; messages for images accepted/refused in test and production mode.
- **21.11.2006 –** Delete expired news: corrected the code to show how many records to clean.
- **20.11.2006 –** Image downloading support.
- **19.11.2006 –** Stable version with site definition update feature implemented.
- **14.11.2006 –** Problem discovered: you must copy/paste the code *not* into your feeder folder but into your root page properties!
- **11.11.2006 –** New parameter for charset conversion; new function: load site definitions; error messages improved; checkboxes to select one or more sites; online manual updated.
- **30.10.2006 –** Second version: manual upgraded, a new field introduced for the scheme. Minor changes; "cache not cleared" bug fixed.
- **25.10.2006 –** First version published.

.. ######CUTTER_MARK_IMAGES######

.. |img-1| image:: img-1.png
.. :align: left
.. :border: 0
.. :height: 283
.. :id: graphics4
.. :name: graphics4
.. :width: 513

.. |img-2| image:: img-2.png
.. :align: left
.. :border: 0
.. :height: 213
.. :id: graphics1
.. :name: graphics1
.. :width: 513

.. |img-3| image:: img-3.png
.. :align: left
.. :border: 0
.. :height: 363
.. :id: graphics2
.. :name: graphics2
.. :width: 478

.. |img-4| image:: img-4.png
.. :align: left
.. :border: 0
.. :height: 193
.. :id: graphics3
.. :name: graphics3
.. :width: 477

.. |img-5| image:: img-5.png
.. :align: left
.. :border: 0
.. :height: 17
.. :id: immagini2
.. :name: immagini2
.. :width: 91

.. |img-6| image:: img-6.png
.. :align: left
.. :border: 0
.. :height: 32
.. :id: Graphic1
.. :name: Graphic1
.. :width: 102