EXT: MOC Varnish

Created:2010-02-18T17:33:18
Changed by:Jan-Erik Revsbech
Changed:2011-07-11T16:36:38
Classification:moc_varnish
Keywords:forDevelopers, forIntermediates, for SysAdms, varnish
Author:Jan-Erik Revsbech
Email:janerik@mocsystems.com
Language:en


Extension Key: moc_varnish


Copyright 2000-2010, Jan-Erik Revsbech, <janerik@mocsystems.com>

This document is published under the Open Content License

available from http://www.opencontent.org/opl.shtml

The content of this document is related to TYPO3

- a GNU/GPL CMS/Framework available from www.typo3.org

Table of Contents

EXT: MOC Varnish
Introduction
  What does it do?
Installation and administration
Configuration
  TypoScript Reference
Tutorial
  Using Edge Side includes
VCL Examples
  Production ready VCL
How does it work?
  ESI
  Cache clearing
Known problems
To-Do list
ChangeLog

Introduction

What does it do?

This extension adapts TYPO3 to increase the performance of TYPO3 websites running behind Varnish.

Varnish is a front-end cache, or web accelerator, developed specifically for boosting performance. It is written by FreeBSD core developer Poul-Henning Kamp; more information can be found at http://www.varnish-cache.org/. This extension requires some specific setup of Varnish and therefore some knowledge of what Varnish is and how it works.

The extension provides several things:

  • Support for having all USER_INT objects rendered as Edge Side Includes (ESI).
  • Changes to the login procedure to handle content when logged in (@TODO)
  • Cache clearing in Varnish when the TYPO3 cache is cleared (currently only for sites with RealURL in a “normal” configuration).
  • Examples of Varnish VCL configuration files for TYPO3 hosting.

The extension and the included tutorial assume that you have Varnish up and running, and that you have at least enough experience with Varnish to compile and load configurations (yes, configurations need compiling, but Varnish does that for you!). To understand how it works, you should be familiar with TypoScript and have at least a general idea of how the TYPO3 cache works.

ESI

ESI, or Edge Side Includes, is a very cool feature of Varnish (and other web accelerators). It allows the web server to tell Varnish that a page is built out of several building blocks, each with its own cache definition. An example could be a news page with a “Latest news” box on every article view. The news article itself does not change very often and could be cached for months (if not forever, since the Varnish cache for that page is cleared whenever TYPO3 clears its own cache). The “Latest news” box, however, cannot be cached for months, since new articles are hopefully added much more frequently. ESI allows the CMS to tell Varnish that the page in general can be cached for one month, but that the specific block should not be cached, or should be cached for only one minute. This allows Varnish to handle extremely high loads while the website still has dynamic content.

The Edge-side include should really be thought of as old-fashioned server-side-includes, but with full cache management.

For TYPO3, this model is very much in line with TYPO3's own caching scheme. In TYPO3, USER_INT TypoScript objects allow a page to contain elements that are not cached, although the major part of the page is. This extension provides examples of how to set up Varnish with ESI so that all USER_INT objects are handled by Varnish as ESI includes. The performance boost is tremendous, but there are some minor catches, such as not being able to add dynamic header data, and having to handle default timeouts and users logged in to your website.
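To make the building-block idea concrete, here is a minimal Python sketch of what an ESI processor does with a cached page. It is an illustration only and says nothing about how Varnish is implemented; the names expand_esi and fetch_fragment, the sample URLs and the simplified tag pattern are all assumptions made for the example.

```python
import re

# Matches <esi:include src="..."/> tags; a simplification of real ESI syntax.
ESI_TAG = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def expand_esi(page_html, fetch_fragment):
    """Replace every ESI include tag with the fragment returned by fetch_fragment(src)."""
    return ESI_TAG.sub(lambda match: fetch_fragment(match.group(1)), page_html)


# The page skeleton is cached once; only the fragment is looked up per request.
cached_page = '<p>Article text</p><esi:include src="/latest-news"/>'
fragments = {"/latest-news": "<ul><li>Newest headline</li></ul>"}

html = expand_esi(cached_page, fragments.__getitem__)
print(html)
```

The point of the split is that cached_page and each fragment can have completely different lifetimes: the skeleton may live for a month while the fragment is refreshed every minute.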

Varnish and FE users

The whole point of a cache is that content is cached. But when users are logged in, we do not want to cache content across different users: we might accidentally cache one person's address details and display them to another person.

There are several ways to handle this challenge, and all of them require Varnish to know when a user is logged in to TYPO3 (and possibly who). This extension hooks into the login procedure and sets a cookie that Varnish can use for this purpose. How Varnish reacts is a matter of configuration.

One approach is simply to disable caching when a user is logged in (except for images, movies, CSS and possibly JS). This is the simplest approach and the easiest to configure. We provide VCL for this approach in the VCL examples folder.

Another approach is to tell Varnish to include the user uid in the cache hash generation. This way the same page is cached separately for each user. This approach is quite nice, as it allows content to be cached per user, so logged-in users also benefit from a faster website. However, the cache might fill up fast if many users log in.

In version 1.0, this is actually not included, as it requires a little work to make it stable.
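The two strategies for logged-in users can be sketched in a few lines of Python. This only illustrates the decision logic; it is not VCL and not part of the extension. The function names, the list of static suffixes and the use of SHA-256 for the key are assumptions made for the example.

```python
import hashlib

# Assumed list of static file types that are safe to cache for everyone.
STATIC_SUFFIXES = (".png", ".gif", ".jpg", ".css", ".js")


def may_use_cache(url, logged_in):
    """Strategy 1: bypass the cache for logged-in users, except for static files."""
    return (not logged_in) or url.lower().endswith(STATIC_SUFFIXES)


def cache_key(host, url, user_uid=None):
    """Strategy 2: add the user uid to the key so every user gets a separate entry."""
    parts = [host, url]
    if user_uid is not None:
        parts.append("user:%d" % user_uid)
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


anonymous = cache_key("example.org", "/news/")
first_user = cache_key("example.org", "/news/", user_uid=1)
second_user = cache_key("example.org", "/news/", user_uid=2)
print(len({anonymous, first_user, second_user}))
```

With the second strategy the number of cache entries grows with pages times users, which is why the cache can fill up quickly on sites with many logins.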

Cache clearing

Since caching is already built into TYPO3, but content is now also cached in Varnish, this extension gives TYPO3 a way to clear the Varnish cache for a page when the TYPO3 cache for that page is cleared. This way we can cache content longer in Varnish, as we know the cache is purged when the content is altered. There are a lot of caveats, specifically since many sites use RealURL and a specific page might be reachable under many URLs: multiple domains, old not-yet-expired URLs etc. We need to clear the cache for all of them when content is changed.

This extension tries to handle this automatic clearing of Varnish cache when TYPO3 cache is cleared.

Developing extensions for high-performance websites

This is a topic with enough material for a (large) blog post, but here are the key points when developing extensions for high-performance websites using Varnish.

Avoid user logins

Since it is hard and expensive to cache content for each user, most sites disable caching when a user is logged in. Perhaps you can limit the pages that display user-specific information to one branch of the site, or limit the user-specific content to AJAX calls (see below).

Use AJAX for dynamic content

Suppose you wish to do something for each user, based on sessions, for instance registering each time a user sees a given news article in order to build view statistics (this is done much better with Google Analytics, but it serves as an example). Instead of writing a USER_INT plugin (or installing one from the repository) that inserts a line into a log file (you would of course never write an extension that increments a counter on the news records), you make a small, lightweight eID script that is called through AJAX. That way the whole news page can be cached in Varnish, and only a lightweight AJAX call reaches TYPO3. Make the AJAX call with POST to make sure Varnish does not cache it (or write a rule that disables the Varnish cache for that script). This way, if your server is really busy, the statistics might not function properly, but your page is still visible, since the content is cached and served by Varnish.

Decide the ttl

When you program a plugin, decide what its time-to-live should be. A latest-news list might have a long cache lifetime (since TYPO3 clears the cache when new articles are created), but a “currently logged in users” list might need a shorter time-to-live. A plugin showing the latest Twitter updates might have a TTL of 60 seconds. Even though we want it to be 100% dynamic, a 60-second delay is probably acceptable, especially if it saves your site on the day you have 6000 simultaneous hits.

Installation and administration

All configuration is done in the Extension Manager, so you basically just have to install the extension.

To use automatic cache clearing, the extension currently relies on the RealURL path cache. When the cache for a given page uid is cleared in TYPO3, the URL for that page is looked up in the RealURL path cache, so the RealURL path cache must be enabled for this to work. We do have a solution for sites where the RealURL path cache is not used. Contact janerik@mocsystems.com for details.

Notice that currently the entire Varnish cache is cleared when a “clear all cache” is issued in TYPO3. This means that if you use one Varnish cache for several sites, this will purge the cache for all of them! In newer versions of Varnish (2.3, I think), it is possible to PURGE only URLs matching a given host. This only requires a simple change to the VCL configuration. See the example VCL esi-full.vcl.

Currently the cache is cleared by sending the request to the host you are logged in on. If you log in to the TYPO3 backend on http://mytest.com/, the PURGE request to Varnish is sent to http://mytest.com/. This could be made configurable later on.
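For readers who want to experiment with purging by hand, the request described above can be sketched in Python. The extension itself is written in PHP and its actual code is not shown here; purge_target and send_purge are hypothetical helper names, and the sketch assumes that your Varnish configuration accepts the PURGE method from your IP address.

```python
import http.client
from urllib.parse import urlsplit


def purge_target(page_url):
    """Split a page URL into the (host, path) pair a PURGE request would use."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.netloc, path


def send_purge(page_url, timeout=5):
    """Send an HTTP PURGE for page_url and return the response status code."""
    host, path = purge_target(page_url)
    connection = http.client.HTTPConnection(host, timeout=timeout)
    try:
        connection.request("PURGE", path)
        return connection.getresponse().status
    finally:
        connection.close()


# Only the pure helper is called here, so running the sketch needs no network.
print(purge_target("http://mytest.com/"))
```

Calling send_purge("http://mytest.com/") would contact whatever answers on that host name, which is exactly why the backend must be reached through a URL that resolves to the Varnish host.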

You need to include the MOC Varnish static TypoScript template to have ESI working correctly.

Needless to say, you need to have Varnish up and running, either on the same machine, or on another machine.

Configuration

This extension has some simple configuration options when installing via the Extension Manager. Basically, it allows you to switch different features on or off, such as ESI and automatic cache clearing.

TypoScript Reference

The extension provides some additional properties that can be set on USER_INT objects.

Property: max_age
Data type: int/stdWrap
Description: Sets the TTL for this specific USER_INT, in seconds. If not set, the default for the installation is used (see Extension configuration).
Default: Depends on EXT conf

Property: no_esi
Data type: boolean
Description: Disables ESI includes for this particular USER_INT. Use with caution, since the content will then be cached in Varnish along with the rest of the page! This will be deprecated in later versions and replaced by an option to enable specific USER_INTs.
Default:

[tsref:(cObject).USER_INT]

Example

Set the cache TTL to 10 seconds for pi1 of the moc_varnish_test extension:

plugin.tx_mocvarnishtest_pi1.max_age = 10

If you write your own extension, you could add a field such as tx_myext_maxage to the tt_content table and configure something like this:

plugin.tx_myext_pi1.max_age.field = tx_myext_maxage

This allows the editor to choose the cache time-to-live themselves.

Tutorial

This section contains some examples and tutorials. The tutorial takes you through a rather simple, yet effective configuration of TYPO3 and Varnish.

The tutorial assumes that you already have Varnish set up and running, and that you have just a little experience with configuring Varnish using VCL. Please consult the sparse, yet pretty good Varnish manual.

I have run all the examples on standard MacPorts packages as well as standard Debian packages.

Using Edge Side includes

To test Edge Side Includes, configure Varnish with ESI support for all files (or at least for files that are not images etc.).

Simple test

In your vcl_fetch function, add something like this to the top of the function:

# Respect force-reload and clear the cache accordingly. This means that a ctrl-reload will actually purge
# the cache for this URL.
if (req.http.Cache-Control ~ "no-cache") {
  set obj.ttl = 0s;
  # Make sure ESI includes are processed!
  esi;
  return (deliver);
}

if (req.url ~ "\.(png|gif|jpg|swf)$") {
  unset obj.http.set-cookie;
  set obj.http.X-Cacheable = "YES: png, gif, jpg and swf are always cached";
  return (deliver);
}


# Allow 24 hours of stale content before an error 500/404 is thrown.
# When a backend server is not responding,
# allow Varnish to serve stale content for 24 hours after its expiry.
set obj.grace = 24h;

# Allow Edge Side Includes
esi;
if (req.url == "/date.php") {
  set obj.ttl = 0s;
} else  {
  set obj.ttl = 24h;
}
return (deliver);

The last six lines tell Varnish to process ESI tags and never to cache date.php. All other content is cached for 24 hours. An example VCL for this is given in vcl-examples/esi-test.vcl in the extension root.

Just to test that it works, add two files to your website: test.php and date.php.

test.php:

<html>
  <body>
    The cached time is <?php echo date("d/m-Y H:i:s"); ?><br />
    The real time is: <esi:include src="date.php"/>
  </body>
</html>

date.php:

<?php
echo date("d/m-Y H:i:s");
?>

When hitting the test.php file in your browser (through Varnish) and continually refreshing (not a forced reload!), you should see the first line staying the same across all requests, while the second line changes with every refresh to display the current time.

TYPO3 Test

Now for the much more fun part. The same example but with TYPO3!

Taking the above VCL as a base, alter the vcl_fetch function to something like this:

# Respect force-reload and clear the cache accordingly. This means that a ctrl-reload will actually purge
# the cache for this URL.
if (req.http.Cache-Control ~ "no-cache") {
  set obj.ttl = 0s;
  # Make sure ESI includes are processed!
  esi;
  return (deliver);
}

if (req.url ~ "\.(png|gif|jpg|swf)$") {
  unset obj.http.set-cookie;
  set obj.http.X-Cacheable = "YES: png, gif, jpg and swf are always cached";
  return (deliver);
}


# Allow 24 hours of stale content before an error 500/404 is thrown.
# When a backend server is not responding,
# allow Varnish to serve stale content for 24 hours after its expiry.
set obj.grace = 24h;

# Allow Edge Side Includes
esi;
if (obj.http.X-ESI-RESPONSE) {
  set obj.ttl = 0s;
  return (pass);
} else  {
  set obj.ttl = 24h;
}

return (deliver);

The only change compared to the first configuration is that we now test for the HTTP header X-ESI-RESPONSE instead of testing for date.php specifically. This way, when we render the ESI include, we can simply add an X-ESI-RESPONSE header, and this little piece will be fetched on every request. MOC Varnish makes sure that this header is set.

There is an esi-test.vcl file in vcl-examples that can be used for this part of the tutorial.

Notice that ESI processing only happens if the content is not png, gif, jpg or swf. This could also be expanded to CSS and JS files, but for now only media files are excluded from ESI processing.

Now install the moc_varnish extension and clear cache. Make sure the “Convert USER_INT to ESI for Varnish” is enabled in the Extension manager. Also make sure that you have the Static TypoScript configuration for MOC Varnish loaded.

Install the moc_varnish_test extension, which provides a simple plugin that shows the current time on the server (for testing).

On a page of your own choice, insert the plugin “Varnish test”, which is found in the moc_varnish_test extension.

Make sure that you do not have config.xhtml_cleaning set; this will not work with Varnish (at least not in version 2.1).

Now view the page you inserted the plugin on. Refresh the page a couple of times (wait at least a second in between); you should see the timer show the correct time. If the timer is stuck and does not change when you update, something is wrong with your caching setup.

While refreshing your site, take a look at your Apache log files, and you will see that only a page with typeNum=978 is requested instead of the whole page. This is because Varnish caches the whole page for 24h, but the USER_INT is fetched once for each request.

To test this more thoroughly, we use the shell command siege. Siege is used to stress-test servers and is basically the same as ApacheBench (ab), just a little better (ab will do fine for the tests below, though). We run it in benchmark mode, so siege will try to make as many requests as possible to the server. That allows us to benchmark different configurations.

The command we use is

siege -b -c 10 -t10s  "http://edntest.local/test-10/"

Adjust the URL to your server configuration. The command makes 10 simultaneous connections for 10 seconds and tries to make as many requests as possible. First we run it without caching anything in Varnish and without the moc_varnish extension installed.

On my laptop, this gives 315 hits in ten seconds.

Now we enable Varnish with the above configuration, and watch the apache log while we benchmark the site.

What we should see in the log files is something like this:

127.0.0.1 - - [17/Oct/2010:13:49:56 +0200] "GET /test-10/?id=10&type=978&key=INT_SCRIPT.a1c1197e2fea1585dcef88c2f1365ecc&from_varnish=1 HTTP/1.1" 200 604 "-" "JoeDog/1.00 [en] (X11; I; Siege 2.67)"

We see mainly requests for type=978&key=INT_SC... but no requests for the test-10 page alone. This indicates that Varnish cached the main page but now requests the content of the USER_INT from TYPO3 for each user. Now let's see what this did to performance. We run the siege command again and find that we are now able to make only 284 hits in ten seconds.

This is actually worse than without Varnish! But thinking about it, it makes sense. For each request, Varnish has to fetch the content of the USER_INT script from the TYPO3 server, and the TYPO3 server has to initialize the whole front-end rendering just to render this single USER_INT object. Even worse, if there is more than one USER_INT on a page, Varnish will contact TYPO3 once for each USER_INT, making the result even worse!

Not impressed? Hold on, it gets better.

Forcing time to live

Let's change the Varnish configuration a little. Instead of making Varnish call the TYPO3 server once for each USER_INT, we allow Varnish to cache the USER_INT for 5 seconds. The whole page is still cached for 24h, but the single piece of HTML that the USER_INT generates is cached for only 5 seconds.

#Allow edgeside includes
esi;
if (obj.http.X-ESI-RESPONSE) {
  set obj.ttl = 5s;
} else  {
  set obj.ttl = 24h;
}

return (deliver);

Now change your VCL to the above and load it into Varnish. I find it easiest to just log in to the console with telnet and issue a vcl.load command. Just to make sure, I issue a url.purge .* as well after each reload. See the Varnish manual for a more complete guide on how to use the console.

Now while watching the apache logfile, I run the siege again. What I see in the log file is the following.

127.0.0.1 - - [17/Oct/2010:13:59:36 +0200] "GET /test-10/ HTTP/1.1" 200 3306 "-" "JoeDog/1.00 [en] (X11; I; Siege 2.67)"
127.0.0.1 - - [17/Oct/2010:13:59:36 +0200] "GET /test-10/?id=10&type=978&key=INT_SCRIPT.a1c1197e2fea1585dcef88c2f1365ecc&from_varnish=1 HTTP/1.1" 200 604 "-" "JoeDog/1.00 [en] (X11; I; Siege 2.67)"
127.0.0.1 - - [17/Oct/2010:13:59:41 +0200] "GET /test-10/?id=10&type=978&key=INT_SCRIPT.a1c1197e2fea1585dcef88c2f1365ecc&from_varnish=1 HTTP/1.1" 200 604 "-" "JoeDog/1.00 [en] (X11; I; Siege 2.67)"

Just three lines appear in the log file! First siege requests the test-10/ page. Varnish then caches this page for 24 hours (remember, I cleared the Varnish cache before the test). Then it requests the test-10/?type=978 page to fetch the content of the USER_INT. This is cached for 5 seconds, and since we run the test for 10 seconds, we get another request for the USER_INT 5 seconds later.
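If the log is too long to read by eye, the same check can be done mechanically. The following Python sketch counts full-page requests and ESI fragment requests in access-log lines of the format shown above. The function name count_requests is hypothetical, and treating type=978 as the marker for ESI fragments is taken from the log lines in this tutorial.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Extracts the request path from the quoted request line of an access-log entry.
REQUEST_LINE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def count_requests(log_lines):
    """Count full-page requests and ESI fragment requests (type=978) in access-log lines."""
    counts = {"page": 0, "esi": 0}
    for line in log_lines:
        match = REQUEST_LINE.search(line)
        if not match:
            continue  # skip lines that are not GET/HEAD requests
        query = parse_qs(urlsplit(match.group(1)).query)
        kind = "esi" if query.get("type") == ["978"] else "page"
        counts[kind] += 1
    return counts


esi_path = "/test-10/?id=10&type=978&key=INT_SCRIPT.a1c1197e2fea1585dcef88c2f1365ecc&from_varnish=1"
sample = [
    '127.0.0.1 - - [17/Oct/2010:13:59:36 +0200] "GET /test-10/ HTTP/1.1" 200 3306',
    '127.0.0.1 - - [17/Oct/2010:13:59:36 +0200] "GET %s HTTP/1.1" 200 604' % esi_path,
    '127.0.0.1 - - [17/Oct/2010:13:59:41 +0200] "GET %s HTTP/1.1" 200 604' % esi_path,
]
counts = count_requests(sample)
print(counts)
```

A healthy setup shows very few page requests and a number of ESI requests roughly equal to the test duration divided by the fragment TTL.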

Now let's see the result. In 10 seconds, siege was able to make 11367 requests! This is 36 times better than without Varnish!

Of course, the caveat is that the USER_INT is only updated every 5 seconds, so it really should not contain any user-specific content since it is cached across all connections!

Determining time-to-live in cache: Cache-control header

Instead of hardcoding varnish to cache each USER_INT object for 5 seconds, we really should let TYPO3 decide how long the individual elements should be cacheable. Some elements might be cached for several hours, and some might only be cacheable for 5 seconds. We can do this by using the cache control headers of the http protocol. Varnish looks at the “Cache-control: max-age=XXX” header to determine the time-to-live in the cache.

To use this feature, you must configure your site with the following TypoScript:

config.sendCacheHeaders = 1

This tells TYPO3 to send cache-control headers for normal pages. Normally TYPO3 will not send cache-control headers if any of the following applies:

  • You are logged into the backend
  • You are previewing workspace content
  • The no_cache parameter is set, either via TypoScript or via a GET variable
  • A user is logged in
  • The page contains external scripts
  • The page contains USER_INTs

The moc_varnish extension removes the last constraint (unless disabled in the Extension Manager). This is because we do wish to send cache-control headers even though the page has USER_INTs, since these are transformed into ESI tags and can thus be processed by Varnish. The other constraints still apply.

So for a page that now contains ESI tags, we need to handle the caching period of each ESI include. The moc_varnish extension allows each USER_INT to send its own max-age header. To test this, insert the following TypoScript snippet in your TS template:

plugin.tx_mocvarnishtest_pi1.max_age = 2

This tells TYPO3 to issue a Cache-control: max-age=2 header when serving the USER_INT for the ESI include. Change your VCL again, and completely discard the section where we test for obj.http.X-ESI-RESPONSE. Your vcl_fetch should now look something like this:

sub vcl_fetch {

  # Respect force-reload and clear the cache accordingly. This means that a ctrl-reload will actually purge
  # the cache for this URL.
  if (req.http.Cache-Control ~ "no-cache") {
    set obj.ttl = 0s;
    # Make sure ESI includes are processed!
    esi;
    return (deliver);
  }

  if (req.url ~ "\.(png|gif|jpg|swf)$") {
    unset obj.http.set-cookie;
    set obj.http.X-Cacheable = "YES: png, gif, jpg and swf are always cached";
    return (deliver);
  }

  # Allow 24 hours of stale content before an error 500/404 is thrown. When a backend server is not responding,
  # allow Varnish to serve stale content for 24 hours after its expiry.
  set obj.grace = 24h;

  # Allow Edge Side Includes
  esi;

  # Make sure that we remove all cache headers, so the browser does not cache it for us!
  remove obj.http.Cache-Control;
  remove obj.http.Expires;
  remove obj.http.Last-Modified;
  remove obj.http.ETag;
  remove obj.http.Pragma;

  # We rely on the Cache-control: max-age header to set the TTL in the cache. TYPO3 sends these headers.
  return (deliver);
}

Load this VCL into your Varnish configuration and purge the cache (purge.url .*). Now run the siege benchmark again and watch your Apache log files. You should see something like this:

127.0.0.1 - - [17/Oct/2010:21:55:50 +0200] "GET /test-10/ HTTP/1.1" 200 3306 "-" "JoeDog/1.00 [en] (X11; I; Siege 2.67)"
127.0.0.1 - - [17/Oct/2010:21:55:50 +0200] "GET /test-10/?id=10&type=978&key=INT_SCRIPT.13e8f885e27c06a6ff9ba2c2eaebe432&from_varnish=1 HTTP/1.1" 200 776 "-" "JoeDog/1.00 [en] (X11; I; Siege 2.67)"
127.0.0.1 - - [17/Oct/2010:21:55:52 +0200] "GET /test-10/?id=10&type=978&key=INT_SCRIPT.13e8f885e27c06a6ff9ba2c2eaebe432&from_varnish=1 HTTP/1.1" 200 776 "-" "JoeDog/1.00 [en] (X11; I; Siege 2.67)"
127.0.0.1 - - [17/Oct/2010:21:55:55 +0200] "GET /test-10/?id=10&type=978&key=INT_SCRIPT.13e8f885e27c06a6ff9ba2c2eaebe432&from_varnish=1 HTTP/1.1" 200 776 "-" "JoeDog/1.00 [en] (X11; I; Siege 2.67)"
127.0.0.1 - - [17/Oct/2010:21:55:57 +0200] "GET /test-10/?id=10&type=978&key=INT_SCRIPT.13e8f885e27c06a6ff9ba2c2eaebe432&from_varnish=1 HTTP/1.1" 200 776 "-" "JoeDog/1.00 [en] (X11; I; Siege 2.67)"

Since the cache was cleared, the first request made to Varnish is sent through to TYPO3. That is the first “GET /test-10/” in the log file. For this request, TYPO3 sends a Cache-control header corresponding to the lifetime of the TYPO3 cache, and since the default is 24 hours, Varnish inserts this page with a 24h TTL.

The page is then parsed for ESI includes, and Varnish fetches the USER_INT from the TYPO3 backend. This is the “GET /test-10/?type=978...” request. Since we told TYPO3 that this USER_INT has a max-age of 2 seconds, Varnish inserts it into its cache with a 2-second TTL and serves it for all subsequent requests in the next 2 seconds. After that, the ESI fragment corresponding to the USER_INT has outlived its TTL, so it is refetched from the TYPO3 backend and again inserted with a 2-second TTL. This happens every two seconds until the siege is over.
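The refetch rhythm described above can be reproduced with a small Python model. It is an idealized sketch, not Varnish's algorithm: ttl_from_cache_control and backend_fetch_times are hypothetical names, only the max-age directive is considered, and requests are assumed to arrive continuously so that every expiry triggers an immediate refetch.

```python
import re


def ttl_from_cache_control(header_value, default_ttl=0):
    """Return max-age in seconds from a Cache-Control header value, or default_ttl if absent."""
    match = re.search(r"max-age\s*=\s*(\d+)", header_value or "", re.IGNORECASE)
    return int(match.group(1)) if match else default_ttl


def backend_fetch_times(ttl, duration):
    """List the seconds at which a constantly requested object is fetched from the backend."""
    if ttl <= 0:
        raise ValueError("ttl must be positive for this simplified model")
    return list(range(0, duration, ttl))


ttl = ttl_from_cache_control("max-age=2")
fetches = backend_fetch_times(ttl, 10)
print(fetches)
```

In a real log the timestamps drift a little, because a fragment is only refetched by the first request that arrives after it has expired.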

The benchmark reached 8993 hits in 10 seconds. Pretty neat.
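To put the four siege runs from this tutorial side by side, the following Python snippet converts the reported hit counts into requests per second and a speed-up factor relative to the run without Varnish. The hit counts are the ones quoted in the text; the labels are only shorthand chosen for this example.

```python
DURATION_SECONDS = 10

# Hits per 10-second siege run, as reported in the tutorial.
runs = {
    "no Varnish": 315,
    "ESI, USER_INT ttl 0s": 284,
    "ESI, USER_INT ttl 5s": 11367,
    "ESI, max-age 2s": 8993,
}

baseline = runs["no Varnish"]
summary = {
    name: (hits / DURATION_SECONDS, hits / baseline)
    for name, hits in runs.items()
}
for name, (per_second, speedup) in summary.items():
    print("%-22s %8.1f req/s  %5.1fx" % (name, per_second, speedup))
```

The uncached ESI run lands below 1x, which is the slowdown discussed earlier, while both runs with a short fragment TTL are more than an order of magnitude faster than the baseline.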

Since we told TYPO3 to send cache-control headers and the page has a max-age of 24 hours, most browsers will respect this, cache the file locally on disk, and not request it from Varnish for the next 24 hours! So they won't see the USER_INT part update. To fix this, we tell Varnish to strip these headers before sending the content to the client. This is done by the following lines from the configuration above:

#Make sure that We remove all cache headers, so the Browser does not cache it for us!
remove obj.http.Cache-Control;
remove obj.http.Expires;
remove obj.http.Last-Modified;
remove obj.http.ETag;
remove obj.http.Pragma;

This concludes the first tutorial on running TYPO3 and Varnish. What we really should learn is that using it requires good knowledge of how the individual elements on a site work, in order to determine their TTL in the cache. If used incorrectly, you might end up with a slower site than without Varnish, especially if your site contains many USER_INTs that are fetched for every request.

The extension allows you to set a default value for how many seconds a USER_INT should be cached. This is very effective for boosting a complete site, but you must take care that no USER_INT creates content that is unique for each user, since content is cached across users.

VCL Examples

This section contains some Varnish examples that you are free to use. They should be properly commented to be easier to understand. Here we list the whole VCL as one file, but we generally recommend splitting the configuration into several files. We try to keep a folder for each main configuration, which we then include.

The extension is bundled with some sample VCL files in vcl-examples. Most of them are custom-made for the tutorial in this manual, but they can be used in production with very little modification.

Production ready VCL

The esi-full.vcl is a configuration ready for production. I'll walk through each part of the configuration here.

First we define our backend servers. In this case, Apache and Varnish are running on the same server, Varnish on port 80 and Apache on port 8080, so we define a single default backend server on localhost port 8080.

# This is the standard test VCL configuration used in the examples and documentation of
# moc_varnish. Use it at your own risk, but feel free to modify it and use it for any purpose.
# If you have good suggestions for improvement, please send me an email at janerik@mocsystems.com
#
#Backend definitions
#

#default is localhost, on my computer Apache is running on port 8080. Change to your specific needs. See the Varnish website for examples of
#how to configure multiple backends, load balancing etc.
backend default {
        .host = "127.0.0.1";
        .port = "8080";
}
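The req.backend.healthy test used later in vcl_recv is only meaningful if Varnish actually polls the backend; a backend without a probe is treated as healthy. The following is a sketch of the same backend with a health probe. The probe URL and all timing values are assumptions to adapt, and the available probe options should be checked against the Varnish version in use:

```vcl
backend default {
        .host = "127.0.0.1";
        .port = "8080";
        # Hypothetical health check: request "/" every 5 seconds and
        # consider the backend healthy if at least 3 of the last 5
        # probes succeeded.
        .probe = {
                .url = "/";
                .interval = 5s;
                .timeout = 1s;
                .window = 5;
                .threshold = 3;
        }
}
```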

We define the Access Control List “purge” that we later use for determining who can actually purge content from the cache.

acl purge {
        "localhost";
}
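If TYPO3 runs on a different machine than Varnish, the PURGE requests will not come from localhost, and the ACL has to list the web server as well. A sketch, where both addresses are made-up examples:

```vcl
acl purge {
        "localhost";
        "192.168.1.10";      # hypothetical address of the TYPO3 web server
        "192.168.2.0"/24;    # hypothetical subnet holding further web servers
}
```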

sub vcl_recv {

If the request is a PURGE request, we first check the client IP against our ACL and, if access is granted, we purge the URL.

if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
        error 405 "Not allowed.";
    }

    purge("req.url ~ " req.url " && req.http.host == " req.http.host);
    error 200 "Purged.";
}
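Note that the purge expression uses ~, so the requested URL is applied as a regular expression, and every cached URL on that host that matches it is purged as well (purging /about/ also hits /about/contact/). If only the exact URL should be purged, a variation using == could look like this. It is a sketch in the same Varnish 2.0 syntax, not the configuration shipped with the extension:

```vcl
if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
        error 405 "Not allowed.";
    }

    # Exact comparison: only objects stored for this URL and host are purged
    purge("req.url == " req.url " && req.http.host == " req.http.host);
    error 200 "Purged.";
}
```

The regular expression form is convenient when a whole subtree should be cleared with one request; the exact form avoids throwing away more of the cache than intended.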

If the backend server (TYPO3) is responding, we allow a grace period of 10 minutes, otherwise 24 hours. The first time an object is requested after its TTL has expired, Varnish asks the backend server for fresh content. This might take some time (compared to just retrieving it from the cache), and in the meantime Varnish serves the stale cached content to all other requests asking for the same object.

if (req.backend.healthy) {
    set req.grace = 10m;
} else {
    set req.grace = 24h;
}
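req.grace only controls how stale an object a request is willing to accept. For the 24 hour case to have any effect, the object must also still be in the cache that long after its TTL expired, which is governed by the object's own grace value. A sketch of the matching statement, in the Varnish 2.0 syntax used in this manual (the 24h value is chosen here to match the example above and is an assumption, not part of the shipped configuration):

```vcl
# Goes inside sub vcl_fetch, not vcl_recv.
# Keep objects for up to 24 hours past their TTL, so that stale
# content is still available while the backend is down.
set obj.grace = 24h;
```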

Default handling of the different request methods. Notice that we have already handled the PURGE request (which is not an RFC 2616 method).

if (req.request != "GET" &&
  req.request != "HEAD" &&
  req.request != "PUT" &&
  req.request != "POST" &&
  req.request != "TRACE" &&
  req.request != "OPTIONS" &&
  req.request != "DELETE") {
    /* Non-RFC2616 or CONNECT which is weird. */
    return (pipe);
}

Requests other than GET and HEAD (for example POST) are not cached, as they will probably contain data that we should not cache.

if (req.request != "GET" && req.request != "HEAD") {
    /* We only deal with GET and HEAD by default */
    return (pass);
}

#do not cache awstats subfolder.
if (req.url ~ "/awstats") {
    return (pass);
}

When backends are defined in round-robin load-balanced mode, backend login breaks, because TYPO3 uses the process ID of the running Apache process. The rule below makes sure that login requests are always piped directly to TYPO3. This might be obsolete (and undesired) when running Varnish 3, since the new random client director supports sticky sessions, which is a much cleaner solution. You can leave the rule out, but if you have problems logging in to the TYPO3 backend, try enabling it.

#logins need to go via pipe, so it doesn't break when there are multiple backends
if (req.url ~ "/typo3/index.php$") {
    return (pipe);
}

All images are always cached; this could very well be extended to CSS and JavaScript as well. If you have movies and other kinds of heavy content, you could add them to the list of always-cached content.

#Always cache all images
if (req.url ~ "\.(png|gif|jpg|swf)$") {
  return(lookup);
}
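Extending the rule to CSS and JavaScript only requires two more alternatives in the regular expression. The following is a sketch and not part of the shipped esi-full.vcl:

```vcl
#Always cache all images, stylesheets and scripts
if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
  return(lookup);
}
```

Because of the trailing $, URLs with a query string (such as style.css?v=2) are not matched by this pattern. If the list is extended here, the corresponding check on req.url in vcl_fetch should be extended in the same way, so that cookies are also stripped from these responses.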

This bit is rather important. We need to handle users logged into the TYPO3 backend specially. We have two options. The first is to disable caching for users logged into the backend, which we can detect with the be_typo_user cookie. This might cause some problems, since editors then see content differently than normal website users. Also, since we only check for the presence of the cookie, not its validity (that would require an expensive database lookup), a user who logs in to the backend and logs out again still has the cookie, and Varnish won't cache content for that user.

#Do not cache if either an Authorization header or the be_typo_user cookie is present
# Disabled for testing purposes
#if (req.http.Authorization || req.http.Cookie ~ ".*be_typo_user=.*") {
#    return (pass);
#}

The second option is simply not to cache requests for files in the typo3/ directory. We have experienced some problems with caching the whole TYPO3 backend, and have had much better experience with this approach.

    if (req.url ~ "^/typo3/.*") {
        return (pass);
    }


    return (lookup);
}


sub vcl_fetch {

When a forced reload is issued, we simply set the object's time to live to 0 seconds, forcing Varnish to fetch it from the backend again. This has the effect that if for some reason TYPO3 did not clear the Varnish cache, a forced reload will fix it. You might want to restrict this to internal users by using an ACL.

#Respect force-reload, and clear cache accordingly. This means that a ctrl-reload will actually purge
# the cache for this URL.
if (req.http.Cache-Control ~ "no-cache") {
        set obj.ttl = 0s;
        #Make sure ESI includes are processed!
        esi;
        return (deliver);
}
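Restricting the forced reload to internal users could be sketched as follows. The ACL name internal and its addresses are hypothetical; the ACL is declared at the top level of the VCL file, next to the purge ACL, while the condition replaces the one shown above inside vcl_fetch:

```vcl
# Top level of the VCL file: hypothetical list of trusted clients
acl internal {
        "localhost";
        "192.168.0.0"/16;    # made-up office network
}

# Inside sub vcl_fetch: only honour force-reload from trusted clients
if (req.http.Cache-Control ~ "no-cache" && client.ip ~ internal) {
        set obj.ttl = 0s;
        #Make sure ESI includes are processed!
        esi;
        return (deliver);
}
```

With this in place, a forced reload from any other address is handled like a normal request and cannot be used to bypass the cache.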

For image and Flash files, we remove any Set-Cookie header and add our own header to show why the file was cached.

if (req.url ~ "\.(png|gif|jpg|swf)$") {
   unset obj.http.set-cookie;
   set obj.http.X-Cacheable = "YES:png,gif,jpg and swf are always cached";
   return (deliver);
}

Allow ESI. Notice that all image files have already been delivered above, so only non-image content is processed for ESI. Otherwise we might, by very rare coincidence, have an image that contains the bytes for <esi:include.

#Allow edgeside includes
esi;

This is the place where we could force all content to have a TTL of 1 minute or some other value. In this example we rely on TYPO3 to send Cache-Control headers so Varnish can determine the TTL itself.

#Since we rely on TYPO3 to send the correct Cache-control headers, we do nothing except for removing the cache-control headers before output
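If you prefer to force a fixed TTL instead of relying on the headers from TYPO3, a one-line sketch in the same Varnish 2.0 syntax could be placed at this point of vcl_fetch (the 1m value is only an example):

```vcl
# Override whatever TTL TYPO3 asked for and cache everything for 1 minute
set obj.ttl = 1m;
```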

Be very sure to remove all Cache-Control headers before the content is sent to the client. We do not want the client's browser to cache any content itself; we want it to ask Varnish for the content.

#Make sure that we remove all cache headers, so the browser does not cache it for us!
remove obj.http.Cache-Control;
remove obj.http.Expires;
remove obj.http.Last-Modified;
remove obj.http.ETag;
remove obj.http.Pragma;

Finally, deliver the cached/fetched content.

        return (deliver);
}

The configuration below is simply a custom error page, giving a slightly more human-friendly message than “Guru meditation”, which is the default error message.

sub vcl_error {
    set obj.http.Content-Type = "text/html; charset=utf-8";

    synthetic {"
    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
     <html>
      <head>
      <title>"} obj.status " " obj.response {"</title>
      <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
     </head>
    <body>
     <script type="text/javascript">
       function show_moreinfo(var1){
         document.getElementById(var1).style.display="block";
         document.getElementById(var1+"_button").style.display="none";
       }
      </script>

      <div style="color:#A5C642;">
       Der er desv&aelig;rre et problem med at tilg&aring; den &oslash;nskede side.
       <br/>
       Pr&oslash;v venligst igen senere.
      </div>
      <br />
      <div style="color:#949494;">
       The requested page is not available.
       <br/>
       Please try again later.
      </div>
      <br />

      <span id="moreinfo" style="display:none;border:2px #a5c642 solid; width: 550px;">
       <span style="color:#949494;">
        <h2>More information: </h2>
        <h3>Error "} obj.status " " obj.response {"</h3>
       <p>"} obj.response {"</p>
       <p>XID: "} req.xid {"</p>
       </span>
      </span>
      <br />
      <input id="moreinfo_button" type="button" value="More information" onclick="show_moreinfo('moreinfo')"/>

      <br /><br />
      <div id="logo">
       <img src="[inline base64-encoded logo image omitted]"/>
     </div>
    </body>
   </html>
     "};
    deliver;
}

#custom error page

How does it work?

ESI

In order to handle ESI, we alter the front-end rendering of USER_INT objects. Normally, the generation simply inserts a marker, <!-- INT.SCRIPT.***, which is then processed just before the content is output. We simply change this to render an <esi:include instead. For this purpose, we have registered a new page type, 978, which can take such a marker and render the content. We rely on the page cache being available, because TYPO3 saves the TypoScript configuration for this particular USER_INT together with the page cache. This might sometimes result in Varnish requesting a certain ESI include that TYPO3 is unable to render because the page cache for the page is gone (for whatever reason). However, this should not happen as long as TYPO3 makes sure that the Varnish cache is cleared whenever the page cache is.

Cache clearing

Cache clearing is actually the hard part. The problem lies in the fact that the TYPO3 backend does not know the URL of a given page. Depending on whether SimulateStatic, CoolURI or RealURL is enabled (and how they are configured), one page might have different URLs.

Currently the extension is only compatible with RealURL running with URLs like http://mysite.com/about/contact/ and with PageCache enabled for RealURL. This allows us to look up all URLs for a page in the RealURL cache table.

When a URL (or possibly several) for a page is found, the web server makes a PURGE request to each URL. Currently we use the host that the backend is running on for the request, so you need to be logged in to the backend on a URL that the server resolves to the Varnish host. We are currently working on a solution where the Varnish URL can be specified in the Extension Manager.

Known problems

  • Varnish does not understand urlencoded src attributes in <esi:include tags, at least not in earlier versions, so the ESI includes fail if you have config.xhtml_cleaning enabled.
  • If you do not use RealURL, the Varnish cache is not cleared when the TYPO3 cache is cleared. This results in some pretty “funky” errors: the USER_INT object might render “Unable to find page cache” or similar errors, due to cache inconsistency. If this turns out to be a major problem, we will stop relying on the TYPO3 internal cache and save the USER_INT configuration ourselves.

To-Do list

  • Make cache clearing work better, e.g. enable it to clear subsections.
  • Make cache clearing possible not only by direct request, but also by telnetting to the Varnish management console.
  • Handle Frontend login. Provide examples for different ways of handling this.
  • More examples? Send me an e-mail with features you would like examples of.

ChangeLog

Version    Changes
1.0.0      Initial release
1.0.1      Updated Manual.
