.. You may want to use the usual include line. Uncomment and adjust the path. .. include:: ../Includes.txt =============================== EXT: robots.txt for multidomain =============================== :Created: 2010-02-18T17:33:18 :Changed: 2014-04-27T11:58:51.980000000 :Classification: dev_null_robots :Keywords: robots, robots.txt, multidomain :Author: Wolfgang Rotschek :Email: scotty@dev-null.at :Info 4: :Language: en, de |img-1| |img-2| EXT: robots.txt for multidomain - dev\_null\_robots .. _EXT-robots-txt-for-multidomain: EXT: robots.txt for multidomain =============================== Extension Key: dev\_null\_robots Language: en, de Keywords: robots, robots.txt, multidomain Copyright 2011, Wolfgang Rotschek, This document is published under the Open Content License available from http://www.opencontent.org/opl.shtml The content of this document is related to TYPO3 \- a GNU/GPL CMS/Framework available from www.typo3.org .. _Table-of-Contents: Table of Contents ----------------- `EXT: robots.txt for multidomain 1 <#__RefHeading__937_1075929538>`_ **`Introduction 3 <#__RefHeading__939_1075929538>`_** `What does Dev/Null Robots.txt do? 3 <#__RefHeading__941_1075929538>`_ `Features 3 <#__RefHeading__943_1075929538>`_ `Credits 3 <#__RefHeading__1827_726902644>`_ **`Users manual 4 <#__RefHeading__953_1075929538>`_** `Usage 4 <#__RefHeading__1347_1245499790>`_ **`Administration 5 <#__RefHeading__957_1075929538>`_** `Cooluri 5 <#__RefHeading__959_1075929538>`_ `RealURL 5 <#__RefHeading__1202_1613841388>`_ `.htaccess 5 <#__RefHeading__1206_1613841388>`_ **`Domain Record Configuration 6 <#__RefHeading__1218_1613841388>`_** `Crawler Config List 6 <#__RefHeading__1220_1613841388>`_ `Default Crawler Directives 6 <#__RefHeading__1222_1613841388>`_ **`Crawler-specific Configuration 7 <#__RefHeading__1224_1613841388>`_** **`Page Meta Tags 8 <#__RefHeading__634_1224899791>`_** **`Configuration 9 <#__RefHeading__961_1075929538>`_** `Reference 9 <#__RefHeading__963_1075929538>`_ `Output Defaults for robots.txt (useDefaults) 9 <#__RefHeading__945_1075929538>`_ **`Known problems 10 <#__RefHeading__967_1075929538>`_** `Feature request & Bugs 10 <#__RefHeading__969_1075929538>`_ **`To-Do list 11 <#__RefHeading__971_1075929538>`_** **`ChangeLog 12 <#__RefHeading__973_1075929538>`_** .. _Introduction: Introduction ------------ .. _What-does-Dev-Null-Robots-txt-do: What does *Dev/Null Robots.txt* do? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This extension generates domain specific robots.txt files. Configuration is done from in sys\_domain records and typoscript. .. _Features: Features ^^^^^^^^ - Preconfigured TS-Templates for robots.txt - Different robots.txt per domain - Supports RealURL auto-configuration .. _Credits: Credits ^^^^^^^ Automate RealURL configuration where possible – Jacob Floyd < `jacob@cognifire.com `_ > .. _Users-manual: Users manual ------------ - Download and install the extension with the extension manager - Configure URL-rewriting for your robots.txt (as necessary) .. _Usage: Usage ^^^^^ This extension works nearly out of the box, especially if you use RealURL with automatic configuration. If you don't have that, then only URL-rewriting has to be configured. (See the Administration section for help configuring URL-Rewriting) Assuming your website starts with id=1, make sure your robots.txt is working with the following url: http://www.yourdomain.tld/index.php?id=1&type=1964 Next, configure URL-rewriting for each domain, as necessary. (See *Administration* ) [Optional] Add additional robots.txt configuration in domain records. (See *Domain Record Configuration* and *Crawler-specific Configuration* ) When you're done, make sure that it works by visiting: http://www.yourdomain.tld/robots.txt [Optional] Add any page specific meta-data (See *Page Meta Tags* ) .. _generated: ((generated)) """"""""""""" .. _Example: Example ~~~~~~~ You can see a working robots.txt at: `http://www.dev- null.at/robots.txt `_ .. _Administration: Administration -------------- The extension is pre-configured to output a robots.txt whenever page type 1964 is requested. If your Templates already use type 1964, you can change the default value of typeNumin the Constant Editor. To get to the Constant Editor, go to the root of your website, edit the template, and choose “Constant Editor” from the drop down. Just remember, if you change the default typeNum, you must also manually configure your url-rewriting solution. Depending on your chosen URL-rewriting method, you may also have to configure rewriting the robots.txt file for each domain. Check out the following sections for configuration instructions for Cooluri, RealURL and .htaccess. .. _Cooluri: Cooluri ^^^^^^^ If you use the extension Cooluri for URL-rewriting you have to define the link for each domain that should serve a robots.txt |img-3| .. _RealURL: RealURL ^^^^^^^ If you use RealURL's automatic configuration, you're in luck; RealURL auto-configuration is supported out of the box. You may have to delete your realurl\_autoconf.php file in typo3conf to get RealURL to regenerate it. However, you must manually update your RealURL configuration if: (a) you have customized your RealURL re-writing configuration; or (b) you have changed the default typeNum setting. To manually make RealURL serve your robots.txt, add the following to your RealURL configuration (see the RealURL documentation for details on how to do this). The default configuration puts this block in the 'fileName' part of the configuration array. :: 'index' => array( 'robots.txt' => array( 'keyValues' => array( 'type' => '1964', ), ), Remember: If you change your typeNum you have to change it here too! .. _htaccess: .htaccess ^^^^^^^^^ This method is not recommended. It does not work in conjunction with RealURL, and n it hasn't been tested in any other siituations. You may have to create a rewrite rule for each domain. :: RewriteCond %{HTTP_HOST} www.yourdoman.tld RewriteRule ^(robots\.txt)$ index.php?id=1&type=1964 [L,NC] RewriteRule .* index.php [L] ### End: Rewrite stuff ### .. _Domain-Record-Configuration: Domain Record Configuration --------------------------- Domain records get two new fields: “crawler config list” and “Default crawler directives”. |img-4| .. _Crawler-Config-List: Crawler Config List ^^^^^^^^^^^^^^^^^^^ List of crawler specific configurations. This will be inserted before the default configuration. .. _Default-Crawler-Directives: Default Crawler Directives ^^^^^^^^^^^^^^^^^^^^^^^^^^ Default Disallow and Allow statements without User-agent. To disallow all for the given domain just enter :: Disallow: / .. _Crawler-specific-Configuration: Crawler-specific Configuration ------------------------------ Default crawler configurations can be linked to domain records. Each crawler configuration has two fields: |img-5| .. ### BEGIN~OF~TABLE ### .. _User-agent: User-agent ^^^^^^^^^^ .. container:: table-row Field User-agent Description String which identifies the crawler. .. _Crawler-config: Crawler config ^^^^^^^^^^^^^^ .. container:: table-row Field Crawler config Description Configuration for the crawler. .. ###### END~OF~TABLE ###### .. _Page-Meta-Tags: Page Meta Tags -------------- Additional or alternately you can add robots meta tags to each page. You can setup the default behavior through the constants editor. |img-6| Within the page on the register metadata you can limit the robots for the page. |img-7| The restrictions from the typo script setup and the page record are merged. This means a noindex in the setup is reflected on all pages even if the checkbox is unchecked. .. _generated: ((generated)) ^^^^^^^^^^^^^ .. _generated: ((generated)) """"""""""""" .. _Example: Example: ~~~~~~~~ :: .. _Important: Important: ~~~~~~~~~~ Currently there is no relation between the robots.txt file and the meta tags settings. .. _Configuration: Configuration ------------- You can configure the defaults for this extension by modifying the config.devnullrobotsTyposcript object. You can override these defaults on a per-domain basis, by modifying the appropriate fields in your sys\_domain record(s). .. _Reference: Reference ^^^^^^^^^ .. ### BEGIN~OF~TABLE ### .. _useDefault: useDefault """""""""" .. container:: table-row Property useDefault Data type option Description String option which default configuration from the TS-template has to be used. off no default configuration will be added allow use config.devnullrobots.defaultRules.defaultAllow disallow use config.devnullrobots.defaultRules.defaultDisallow useDefault is ignored when a domain record provides its own configuration. Default allow .. _sitemapName: sitemapName """"""""""" .. container:: table-row Property sitemapName Data type string Description If set, a *Sitemap* :directive will be written at the and of the robots.txt Default sitemap.xml .. _metaRobots: metaRobots """""""""" .. container:: table-row Property metaRobots Data type bool Description If set, a robots meta tag will be written into the pages header Default 0 .. _metaDefault: metaDefault """"""""""" .. container:: table-row Property metaDefault Data type int Description Bitmask for robots meta tag :: 0 index,follow 1 noindex,follow 2 index,nofollow 3 noindex,nofollow Default 0 .. _typeNum: typeNum """"""" .. container:: table-row Property typeNum Data type int Description typeNum for the TS-page object, responsible for the robots.txt file Default 1964 .. ###### END~OF~TABLE ###### [tsref:config.devnullrobots] .. _Output-Defaults-for-robots-txt-useDefaults: Output Defaults for robots.txt (useDefaults) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. _generated: ((generated)) """"""""""""" .. _Default-allow-output: Default 'allow' output ~~~~~~~~~~~~~~~~~~~~~~ This is the default Robots.txt output when useDefault = allow. If you would like to create your own defaults to be used across all of the sites in your TYPO3 install, modify config.devnullrobots.defaultRules.defaultAllowin your TS-Template. :: User-agent: * Disallow: /fileadmin Disallow: /t3lib Disallow: /typo3conf Disallow: /typo3temp Allow: /fileadmin/media .. _Default-disallow-output: Default 'disallow' output ~~~~~~~~~~~~~~~~~~~~~~~~~ This is the default Robots.txt output when useDefault = disallow. If you would like to create your own defaults to be used across all of the sites in your TYPO3 install, modify config.devnullrobots.defaultRules.defaultDisallowin your TS-Template. :: User-agent: * Disallow: / .. _Known-problems: Known problems -------------- Extension must be loaded after extension sourceopt .. _Feature-request-Bugs: Feature request & Bugs ^^^^^^^^^^^^^^^^^^^^^^ If you find a bug or a missing feature, please add it to the bugtracker of the extension at forge. The address is: `http://forge.typo3.org/projects/extension-dev\_null\_robots/issues `_ For posting a bug or feature request you need an account of typo3.org. You can see this extension in operation on one of my websites `http://www.dev-null.at `_ `http://www.letsdive.at `_ .. _To-Do-list: To-Do list ---------- .. _ChangeLog: ChangeLog --------- .. ### BEGIN~OF~TABLE ### .. _2-0-0: 2.0.0 ^^^^^ .. container:: table-row Version 2.0.0 Changes Typo3 6.2 compatibility .. _1-4-0: 1.4.0 ^^^^^ .. container:: table-row Version 1.4.0 Changes Added version dependency information Tested with Typo3 4.5, 4.7, 6.0 .. _1-3-0: 1.3.0 ^^^^^ .. container:: table-row Version 1.3.0 Changes Added RealURL Auto-configuration – thanks to Jacob Floyd .. _1-2-0: 1.2.0 ^^^^^ .. container:: table-row Version 1.2.0 Changes Site root doesn’t need to be on topmost level within pagetree anymore .. _1-1-1: 1.1.1 ^^^^^ .. container:: table-row Version 1.1.1 Changes Added german translation – thanks to Bernhard Eckl Added manual section for robot meta tagsReformatted manual .. _1-1-0: 1.1.0 ^^^^^ .. container:: table-row Version 1.1.0 Changes Support for meta tags Support for sitemap directive in robots.txt .. _1-0-1: 1.0.1 ^^^^^ .. container:: table-row Version 1.0.1 Changes Added documentation for RealURL .. _1-0-0: 1.0.0 ^^^^^ .. container:: table-row Version 1.0.0 Changes Initial upload .. ###### END~OF~TABLE ###### |img-2| 12 .. ######CUTTER_MARK_IMAGES###### .. |img-1| image:: img-1.png .. :align: left .. |img-2| image:: img-2.png .. :border: 0 .. :height: 21 .. :hspace: 9 .. :id: Grafik2 .. :name: Grafik2 .. :width: 87 .. |img-3| image:: img-3.png .. :align: left .. :border: 0 .. :height: 178 .. :id: Grafik1 .. :name: Grafik1 .. :width: 477 .. |img-4| image:: img-4.png .. :align: left .. :border: 0 .. :height: 202 .. :id: Grafik3 .. :name: Grafik3 .. :width: 316 .. |img-5| image:: img-5.png .. :align: left .. :border: 0 .. :height: 180 .. :id: Grafik4 .. :name: Grafik4 .. :width: 409 .. |img-6| image:: img-6.png .. :align: left .. :border: 0 .. :height: 177 .. :id: Grafik5 .. :name: Grafik5 .. :width: 559 .. |img-7| image:: img-7.png .. :align: left .. :border: 0 .. :height: 133 .. :id: Grafik7 .. :name: Grafik7 .. :width: 364