Caching and Load Balancing

last modified October 22, 2008 by slinkp

Varnish and nginx setup

To improve the performance of OpenPlans-based sites, it is useful to set up a caching server between the user and the OpenPlans instance. To improve scalability, it helps to run multiple OpenPlans instances that share the same databases, with a load balancer delegating requests to the back-end servers as needed.

To make the caching and load-balancing servers easier to set up, a buildout is available that installs Varnish and nginx, configured specifically to sit in front of one or more OpenPlans instances. (NOTE: The OpenPlans instances must be set up separately; the buildout installs only Varnish and nginx, not OpenPlans.)

To use the buildout, run the following commands:

$ svn co https://svn.openplans.org/svn/buildouts/openplans-varnish-nginx/

$ cd openplans-varnish-nginx

# <OPTIONAL>

$ virtualenv --no-site-packages .

$ source ./bin/activate

# </OPTIONAL>

$ python bootstrap.py

$ ./bin/buildout

At this point, you will be interactively prompted to enter several pieces of information:

  • the hostname and port number you want for the Varnish server
  • the hostname and port number you want for the nginx server
  • the hostname and base port number for two OpenPlans instances

Once the build is complete, you should be able to start the servers with the following commands:

$ ./bin/nginxctl start
$ ./bin/start-varnish

The result will be a serving pipeline like so:

Varnish (caching) ---> nginx (round robin load balancing) ---> OpenPlans servers 
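As a rough illustration of what "round robin" means at the nginx stage, the sketch below hands successive requests to each back-end in turn. It is not the buildout's configuration (nginx does this internally), and the back-end addresses are hypothetical placeholders for whatever you entered at the prompts.

```python
from itertools import cycle

# Hypothetical back-end addresses; the real values are whatever you
# entered at the buildout prompts for the two OpenPlans instances.
BACKENDS = ["127.0.0.1:10000", "127.0.0.1:20000"]


def round_robin(backends):
    """Return an iterator that yields back-ends in a fixed repeating order."""
    return cycle(backends)


chooser = round_robin(BACKENDS)

# Four incoming requests are assigned alternately to the two back-ends.
assignments = [next(chooser) for _ in range(4)]
print(assignments)
```

Each instance receives every second request, regardless of how busy it is; this is why the instances need to share the same databases.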

Varnish will be set up with a relatively safe caching policy. It will not cache any pages unless they explicitly contain caching instructions in the form of a Cache-Control HTTP header. This pushes all caching policy decisions out of the caching server and into the applications themselves. This is a reasonable choice because OpenPlans consists of many different applications, each with varying levels of cacheability.
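The opt-in policy described above can be sketched as a small decision function. This is only an illustration of the idea, not the VCL that the buildout generates; the function name and the exact set of directives treated as "do not cache" are assumptions.

```python
def should_cache(response_headers):
    """Return True only when the back-end explicitly opted in to caching."""
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in response_headers.items()}

    cache_control = headers.get("cache-control")
    if cache_control is None:
        # No caching instructions from the application: never cache.
        return False

    # Keep only the directive names, e.g. "max-age=600" becomes "max-age".
    directives = {
        part.strip().split("=")[0].lower() for part in cache_control.split(",")
    }

    # Assumption: these directives rule out storing the page in a shared cache.
    if directives & {"no-store", "no-cache", "private"}:
        return False

    # Assumption: an explicit lifetime is what counts as an opt-in.
    return "max-age" in directives or "s-maxage" in directives


print(should_cache({}))
print(should_cache({"Cache-Control": "max-age=600"}))
```

A response with no Cache-Control header is always passed through uncached, which is what makes the policy safe for applications that have not been reviewed for cacheability.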

CacheFu configuration

Packaged with OpenCore (the Zope-based portion of the OpenPlans stack) is a suite of products called CacheFu that is used to manage the caching policy of any pages that are served up by Zope. Complete CacheFu documentation is beyond the scope of this document, but a brief overview of where the configuration lives and what the default policies are follows.

CacheFu configuration is handled via the cachesettings.xml file in the GenericSetup profile. For convenience, you can change this configuration through the Plone interface, available at {YOURSITE}/portal_cache_settings. Once the settings have been adjusted to your needs, you can use the portal_setup tool to export a new cachesettings.xml file for use in your own GenericSetup profiles.

By default, CacheFu is configured but inactive. You must activate the configuration (using the interface, or by changing and reapplying the profile) before any of the caching policy takes effect. The default configuration implements the following policy:

  • CSS and JS are cached aggressively, in the browser and the proxy server for 24 hours, even for authenticated users
  • Files and images are cached in the proxy cache for 24 hours if they are visible to anonymous users; otherwise they are not cached at all (this should maybe be changed to use an etag)
  • Wiki page and news item views are cached in the proxy cache for 10 minutes for anonymous users. (News items change much less often... maybe they should get a separate rule to cache for longer.) For authenticated users they are cached with an etag whose construction includes the member id and the last modified time.
  • Certain views (i.e. any view named either "view" or "summary") on certain types of folderish items, including the site root, the projects folder, projects themselves, and the mailing list folders, are cached in the proxy for 10 minutes for anonymous users. They are cached with an etag for authenticated users. Among the values used to generate the etag is "Time of last catalog change", which means that reindexing any object in the site invalidates these cached views.
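To make the anonymous/authenticated split in the wiki-page rule above more concrete, here is a minimal sketch of how such headers could be produced. It does not reproduce CacheFu's actual etag format or header values; the function name, the hashing step, and the exact Cache-Control strings are all assumptions chosen for illustration.

```python
import hashlib

PROXY_TTL_SECONDS = 10 * 60  # ten minutes, as in the wiki-page rule


def wiki_page_headers(member_id, last_modified):
    """Return illustrative caching headers for a wiki page view.

    member_id is None for anonymous users. Both the header values and
    the etag format are hypothetical, not what CacheFu really emits.
    """
    if member_id is None:
        # Anonymous: let the proxy keep the page, but not the browser.
        return {"Cache-Control": "max-age=0, s-maxage=%d" % PROXY_TTL_SECONDS}

    # Authenticated: build an etag from the member id and the last
    # modified time, so a change to either one yields a new etag.
    raw = "%s|%s" % (member_id, last_modified)
    etag = hashlib.sha1(raw.encode("utf-8")).hexdigest()
    return {
        "ETag": '"%s"' % etag,
        "Cache-Control": "max-age=0, must-revalidate, private",
    }


print(wiki_page_headers(None, "2008-10-22T12:00:00"))
print(wiki_page_headers("alice", "2008-10-22T12:00:00"))
```

For the folderish views, the same approach would add the time of the last catalog change to the string being hashed, which is why reindexing any object changes the etag.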