.. You may want to use the usual include line. Uncomment and adjust
   the path.

.. .. include:: ../Includes.txt

=================
textLang: TextCat
=================

:Author: Kasper Skårhøj
:Created: 2002-11-01T00:32:00
:Changed: 2005-07-20T14:36:12
:Author: René Fritz
:Email: r.fritz@colorcube.de
:Info 3:
:Info 4:

.. _textLang-Lang-guess:

textLang: Lang guess
====================

Extension Key: **cc\_langguess**

Copyright 2003-2005, René Fritz

This document is published under the Open Content License available from http://www.opencontent.org/opl.shtml

The content of this document is related to TYPO3, a GNU/GPL CMS/Framework available from www.typo3.com

.. _Table-of-Contents:

Table of Contents
-----------------

**textLang: Lang guess**

**Introduction**

**Users manual**

.. _Introduction:

Introduction
------------

This extension provides a service of the type 'textLang' that can be used to guess the language of a given text snippet. The service uses a Perl script that can detect around 70 languages. The script is included in this extension and does not have to be installed separately. Perl itself, however, must be installed.

This service type is used by the DAM extension.

Unlike cc\_textcat, this service works with different text encodings (charsets).

.. _The-Perl-script:

The Perl script
^^^^^^^^^^^^^^^

The Perl script uses a package developed by Maciej Ceglowski. The package expects the text content in UTF-8 encoding, whereas other solutions often expect each language in the encoding traditionally used for it.

- http://search.cpan.org/~mceglows/
- http://www.idlewords.com/lang/ident.pl?text=

The author writes: “I've been a big fan of TextCat, and wanted to see what happened if I combined the same algorithm for n-gram based identification with some intelligence about Unicode.
The result is a Unicode-friendly language identifier that makes some initial guesses based on script block. It relies on proper UTF-8 input to be happy.”

Other resources on language detection:

- http://www.let.rug.nl/~vannoord/TextCat/

.. _Users-manual:

Users manual
------------

The service can be used in your own extension like this:

::

   // Text whose language should be guessed (encoded as given in $conf['encoding'])
   $textExcerpt = 'This is a sample text in the English language';
   $lang_ISO_code = '';
   $conf = array();

   if (is_object($serviceObj = t3lib_div::makeInstanceService('textLang'))) {
       // Tell the service which charset $textExcerpt uses
       $conf['encoding'] = 'utf-8';
       $serviceObj->process($textExcerpt, '', $conf);
       // Fetch the code of the guessed language
       $lang_ISO_code = $serviceObj->getOutput();
       $serviceObj->__destruct();
       unset($serviceObj);
   }
   $content = 'The guessed language is: ' . $lang_ISO_code;

Pass the charset of the text with the option 'encoding'. If it is not set, the value of ``$TYPO3_CONF_VARS['BE']['forceCharset']`` is used.

If your Perl installation cannot be found, add its directory to the following variable in :code:`localconf.php`:

::

   // String, comma separated list:
   // list of absolute paths where external programs should be searched for
   $TYPO3_CONF_VARS['SYS']['binPath'] = '/some/special/path/to/your/binaries/';

.. _FAQ:

FAQ
^^^

.. _Q-The-service-seems-not-to-work-What-can-be-the-reason:

Q: The service seems not to work. What can be the reason?
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""

A: If the service does not work, it can have the following reasons:

- Perl is not installed
- Perl is installed in a path where it cannot be found
- Your web server or PHP is configured not to allow executing scripts
- Your web server or PHP is configured to allow executing scripts only in some special directories

|img-1| textLang: TextCat - 2

.. ######CUTTER_MARK_IMAGES######

.. |img-1| image:: img-1.png
.. :align: left
.. :border: 0
.. :height: 32
.. :id: Graphic1
.. :name: Graphic1
.. :width: 102
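
The following is a minimal diagnostic sketch for narrowing down which of the reasons listed above applies. It assumes a PHP 7.1 or later interpreter on a Unix-like system, and ``execAllowed()``, ``splitPathList()`` and ``findProgram()`` are hypothetical helpers written for this example, not part of cc\_langguess or the TYPO3 API. It checks whether PHP may call ``exec()`` and looks for an executable named ``perl`` first in a comma separated directory list (the ``binPath`` format) and then in the ``PATH`` environment variable. It does not reproduce the lookup TYPO3 itself performs.

::

   <?php
   // Hypothetical diagnostic helpers; NOT part of cc_langguess or TYPO3.
   // Assumes PHP 7.1+ on a Unix-like system.

   // True if exec() exists and is not listed in the disable_functions setting.
   function execAllowed(): bool
   {
       if (!function_exists('exec')) {
           return false;
       }
       $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
       return !in_array('exec', $disabled, true);
   }

   // Splits a comma separated directory list (the format used by
   // $TYPO3_CONF_VARS['SYS']['binPath']) into trimmed, non-empty entries.
   function splitPathList(string $list): array
   {
       $paths = array_map('trim', explode(',', $list));
       return array_values(array_filter($paths, function ($p) {
           return $p !== '';
       }));
   }

   // Returns the full path of the first executable file named $program found
   // in $dirs or, after that, in the PATH environment variable; null otherwise.
   function findProgram(string $program, array $dirs): ?string
   {
       $envDirs = explode(PATH_SEPARATOR, (string) getenv('PATH'));
       foreach (array_merge($dirs, $envDirs) as $dir) {
           if ($dir === '') {
               continue;
           }
           $candidate = rtrim($dir, '/') . '/' . $program;
           if (is_file($candidate) && is_executable($candidate)) {
               return $candidate;
           }
       }
       return null;
   }

   // Example: check the directory configured in localconf.php, then PATH.
   $binPath = '/some/special/path/to/your/binaries/';
   $perl = findProgram('perl', splitPathList($binPath));
   echo 'exec() allowed: ' . (execAllowed() ? 'yes' : 'no') . PHP_EOL;
   echo 'perl found at: ' . ($perl ?? 'not found') . PHP_EOL;
   if ($perl !== null && execAllowed()) {
       exec(escapeshellarg($perl) . ' -v', $output, $status);
       echo 'perl -v exit status: ' . $status . PHP_EOL;
   }

Run it under the same PHP configuration as the web server, since the command-line interpreter may use a different ``php.ini`` and therefore report different restrictions.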