Clustering and Caching GeoServer

This documentation is no longer maintained. Please see the new GeoServer documentation at http://docs.geoserver.org

With a small budget to work with, I decided to go with a three-machine "cluster" of GeoServer boxes. First, I built one machine (dual Xeon 2.8 GHz with 1.5 GB RAM) and configured it to run GeoServer.

Squid listens on port 80 and forwards requests for anything not in its cache through to port 8080 (this is called "http accelerator mode" in Squid; see the Squid 'accelerator mode' FAQ/HOWTO). In addition, Squid is configured to broadcast out to its peers to find things not in its cache, in case another nearby server has already seen the requested content.

Note: This last part (linking the Squids as peers) was very hard, and I still don't have it right. It involves the "cache_peer" Squid directive, along with a bunch of other ACL-ish directives allowing cache access from the specific peers. Sometimes I get really fast access to previously generated images. Sometimes I don't. At some point I'll probably figure out what's going on and post a solution here.

I then cloned the hard disk two more times using g4u - Ghost 4 Unix (a.k.a. "slurpdisk") and a pair of IDE cables, and put the two "new" hard drives into the other computers. Then I changed their respective IP addresses and hostnames, and hooked them all up to a gigabit switch.

So now a request for a map to one of the three computers would go something like this:

  1. Request received by server-1 on port 80
  2. Broadcast message (via ICP) to server-2 and server-3: "Have you seen this request before?"
  3. Responses from server-2 and server-3: "No!"
  4. Forward the request through to server-1:8080 (where Tomcat/GeoServer is running)
  5. Cache the result (if and only if appropriate caching headers are present on the result)

 Note: as mentioned above, steps 2 and 3 are mostly broken for me. I may fix them later.
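The five steps above can be sketched as a small in-memory simulation. This is only an illustration of the intended flow, not how Squid is implemented: the `Node` class, the `peer_has` method (a stand-in for an ICP query), and the rule "cache only when Cache-Control carries a positive max-age" are all assumptions made for the example, and real Squid freshness rules are more involved.

```python
import re


def max_age(headers):
    """Return the max-age value (seconds) from a Cache-Control header, or None."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    return int(match.group(1)) if match else None


def is_cacheable(headers):
    """Simplified step 5: cache only when max-age is present and positive."""
    age = max_age(headers)
    return age is not None and age > 0


class Node:
    """One machine: a cache (Squid's role) in front of a local origin (GeoServer's role)."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin  # callable: url -> (headers, body)
        self.cache = {}
        self.peers = []

    def peer_has(self, url):
        """Hypothetical stand-in for answering an ICP query from a sibling."""
        return url in self.cache

    def handle(self, url):
        # Step 1: the request arrives; serve it from the local cache if possible.
        if url in self.cache:
            return self.cache[url], "local-hit"
        # Steps 2-3: ask each peer whether it has seen this request before.
        for peer in self.peers:
            if peer.peer_has(url):
                return peer.cache[url], "peer-hit:" + peer.name
        # Step 4: nobody has it, so forward to the local origin.
        headers, body = self.origin(url)
        # Step 5: keep the result only if the response headers allow it.
        if is_cacheable(headers):
            self.cache[url] = body
        return body, "miss"
```

With two nodes wired as each other's peers, the first request to one node is a miss, and the same request to the other node is then answered from the first node's cache. A response carrying "Cache-Control: max-age=0" is never stored in this sketch, so every request for it goes back to the origin.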

Great! Except I have three IP addresses and no way to distribute load across the three servers.

So I found an old P-III 733 and installed Debian 3.1 on it. I also installed "balance".

Balance is a TCP load-balancer. It lets us expose one external IP address as an end-point for all the machines, and then round-robin incoming requests through to the three different back-end servers.

So now we have one public IP address, which forwards requests at the TCP level (round-robin style) through to the back-end "cluster" of identically configured machines.
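Round-robin selection itself is simple. The sketch below only illustrates the idea and is not how balance is implemented; the `RoundRobinBalancer` class and the backend names are hypothetical. Each incoming connection would be handed to whatever `next_backend()` returns.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order, one per incoming connection."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        """Return the backend that should receive the next connection."""
        return next(self._cycle)


# Hypothetical backend addresses: the three identically configured machines.
balancer = RoundRobinBalancer(["server-1:80", "server-2:80", "server-3:80"])
```

Because the rotation ignores how busy each machine is, this works best when the back ends are identical, as they are here.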

The only remaining problem was how to maintain a consistent configuration across the different machines. The very day that I faced this problem, GeoServer released "GEOSERVER_DATA_DIR" support.

Sweet.

So I set up Samba on the P-III 733 load-balancer machine, and set up the three "workhorse" machines to mount the Samba shared directory. I then set up the workhorses to use the shared directory as their GEOSERVER_DATA_DIR (and actually linked their geoserver.war file onto that share, too, so deployment of a new version of GeoServer is simply copy-to-samba-share -> restart all three machines).

I then set up ssh and keychain on those machines, and wrote a script which performs some basic admin tasks like "start, stop, reload-config" on all machines at once, from a central place.  This script is available as an attachment to this page.
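For readers without access to the attachment, here is a minimal sketch of what such a script might look like. It is not the attached script: the hostnames and the remote commands in `REMOTE_COMMANDS` are hypothetical placeholders, and "reload-config" is left out because this page does not say how it was done. It assumes passwordless ssh (for example via keychain) is already working.

```python
import subprocess

# Hypothetical hostnames for the three workhorse machines.
HOSTS = ["server-1", "server-2", "server-3"]

# Hypothetical remote commands; substitute whatever starts and stops
# Tomcat/GeoServer on your machines.
REMOTE_COMMANDS = {
    "start": "/etc/init.d/tomcat start",
    "stop": "/etc/init.d/tomcat stop",
}


def build_ssh_command(host, action):
    """Return the argument list that runs `action` on `host` over ssh."""
    if action not in REMOTE_COMMANDS:
        raise ValueError("unknown action: " + action)
    return ["ssh", host, REMOTE_COMMANDS[action]]


def run_everywhere(action, hosts=HOSTS, runner=subprocess.call):
    """Run `action` on every host in turn and return {host: exit status}."""
    return {host: runner(build_ssh_command(host, action)) for host in hosts}
```

Calling `run_everywhere("stop")` followed by `run_everywhere("start")` would restart all three machines from one place. The `runner` parameter exists so the command construction can be checked without touching any real host.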

Note

Note that this work was groundbreaking in getting GeoServer working with caching, but some improved solutions have appeared of late. See the TileCache Tutorial for a really nice way to get WMS caching. These notes on load balancing and clustering are still relevant, though.

Added by Chris Holmes, last edited by Chris Holmes on Feb 23, 2007

Comments

peppo.herney@gmx.de says:

Hello,
I am trying to set up a configuration with GeoServer and Squid, so the same map does not have to be generated again and again. Everything works fine except nothing is cached. Squid is running in http accelerator mode, but always forwards the requests to GeoServer.
Are there any options that need to be set in Squid so that it caches the images?
Is there maybe another easy way to make caching possible?

Thank you for help.

Chris Holmes says:

Did you set the featureTypes in GeoServer to enable caching with a max age? You have to set it for each featureType in the featureType editor.

In the future I hope to have an easier way to make caching possible, but for now this is it.

thijsbrentjens@gmail.com says:

Another caching option might be OSCache (http://wiki.opensymphony.com/display/CACHE/Home). It's very easy to configure, well-documented and has lots of features. I've only tried it today, but it seems very promising to me.

I think it would be very easy to integrate this tool in Geoserver as well and would make a nice improvement.

thijsbrentjens@gmail.com says:

One addition: I'm not sure the license can cause any troubles, but it is derived from Apache's license, so I hope it's okay then.

Chris Holmes says:

Yeah, OSCache is the top of my list to try. I think one could make a plug-in on 1.4.x quite easily. I don't think apache license should be a difficulty, as I believe GPL and Apache grant exceptions for one another. And regardless we can make it a separate download as a plug-in. If anyone has time to explore it as a plug-in for GeoServer do let us know.

peppo.herney@gmx.de says:

Hello,

Thanks for your previous reply. Unfortunately I could not resolve the issue, even after adding the max age directive. Adding any number there did not change the header: "Cache-Control: max-age=0"
I also put it into a cacheability checker, and it said the object would be stale and did not have any information on freshness; therefore my Squid does not keep it in its cache.
Thanks for any hints.

ilyesjrad says:

When clustering GeoServer, do we need 3 machines?

Could we install 3 JVMs on one Linux server and work with them?
