Friday, November 21, 2008

Creating a Reverse Proxy Cache using Open Source Software - Part 1

As library OPACs have evolved, more and more content has been added to enhance the patron experience. Common content includes cover art, reviews, and even chapters of books. Libraries cannot provide this content on their own, so they use providers like Syndetics. Many libraries are now seeing their bandwidth maxed out as patrons move toward higher-bandwidth applications like streaming video. Because the enhanced content has to come across that same saturated Internet connection, catalog searches can slow down as the OPAC struggles to download the content and insert it into the results pages.

What can we do to speed this up?

One easy solution is to add more bandwidth -- if only we had unlimited budgets and a never-ending supply of bandwidth! Unfortunately, this may not be possible.

A second solution is to cache the content locally, so that delivery is sped up, particularly for those patrons who are inside your library. Instead of going out to the Internet for every piece of content, the content is fetched once, then pulled from an in-house cache server and delivered at internal network speeds -- far greater than Internet speeds. This is called a reverse proxy cache.


Creating a reverse proxy cache can be done at low or no cost to your library. The software can be obtained for free, and can potentially run on an older server or on an existing machine alongside other services you currently offer. In this example, we are going to use the Fedora operating system and the Squid proxy server. However, any Linux distribution that includes Squid should suffice.

Fedora is a free, Linux-based operating system. The software can be downloaded from the Fedora Project, http://fedoraproject.org/. There are about 6 CD-ROM images, or one DVD image, to download. Be patient, it may take a few hours. Once Fedora has been downloaded, burn the ISO images to CD-ROM or DVD, and you are ready to install.

When installing Fedora, make sure you are installing Squid. Squid should be in the Internet Servers group of programs, and should install by default if you tell the installer that the machine is going to be serving web pages.

Once the install is complete, we need to do some quick configuration of Squid. Squid will install a default configuration file, usually in /etc/squid. We're going to create a new squid.conf file, so you can back up and then delete the default configuration file. Using vi or another editor, create a new squid.conf file with the following lines:

http_port 80 defaultsite=syndetics.com
cache_peer 165.254.62.247 parent 80 0 no-query originserver
acl syndetics dstdomain syndetics.com
http_access allow syndetics

This configuration file works with Squid 3.0, which is the current version as of this writing. Please note that it references Syndetics -- if you do not use Syndetics, you will need to change both the domain references and the IP address.
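If you would rather script the backup-and-replace step than do it by hand, something like the following could work. It is only a sketch: write_squid_conf is a made-up helper name, /etc/squid is assumed as the default location, the ACL name content_site is arbitrary, and the domain and IP address default to the Syndetics values used above.

```shell
#!/bin/sh
# Sketch only: write_squid_conf is a hypothetical helper, not part of Squid.
# Usage: write_squid_conf [conf_dir] [site_domain] [origin_ip]
write_squid_conf() {
    conf_dir="${1:-/etc/squid}"        # assumed default config directory
    site="${2:-syndetics.com}"         # content provider's domain
    origin_ip="${3:-165.254.62.247}"   # content provider's IP address
    conf="$conf_dir/squid.conf"

    # Keep one copy of the packaged default before replacing it.
    if [ -f "$conf" ] && [ ! -e "$conf.orig" ]; then
        cp -p "$conf" "$conf.orig" || return 1
    fi

    # Write the four-directive reverse proxy configuration.
    cat > "$conf" <<EOF
http_port 80 defaultsite=$site
cache_peer $origin_ip parent 80 0 no-query originserver
acl content_site dstdomain $site
http_access allow content_site
EOF
}
```

Run as root, calling write_squid_conf with no arguments would reproduce the Syndetics setup; passing a different domain and IP address as the second and third arguments adapts it to another provider.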

Once the configuration file is in place, you must create Squid's swap (cache) directories. Issue this command:

# squid -z

Then you are ready to start Squid. On Fedora, the packaged init script handles this:

# service squid start

You can list Squid's command-line options by typing:

# squid -?

If Squid doesn't start, or you need more help, there is a wealth of information on configuring and troubleshooting Squid available at the Squid site: http://www.squid-cache.org/
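Once Squid is running, one quick sanity check is to request the same object twice through the proxy and look at the X-Cache response header that Squid adds ("MISS from ..." on the first fetch, "HIT from ..." once the object is cached). The helper below is a sketch for reading that header; cache_status is a made-up name, and the proxy host and object path in the usage comment are placeholders you would replace with your own.

```shell
#!/bin/sh
# Sketch only: cache_status is a hypothetical helper, not part of Squid.
# It reads HTTP response headers on stdin and prints HIT, MISS, or
# UNKNOWN (when no X-Cache header is present).
cache_status() {
    awk '
        BEGIN { status = "UNKNOWN" }
        # Header names are case-insensitive; the second field is HIT or MISS.
        tolower($1) == "x-cache:" { status = toupper($2) }
        END { print status }
    '
}

# Example usage (placeholders: your proxy host and a real object path):
#   curl -s -I -H 'Host: syndetics.com' http://YOUR-PROXY/PATH | cache_status
```

If the second request for the same object still reports MISS, the provider may be sending headers that forbid caching, which is worth checking before digging into the Squid configuration.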

Next time, I'll post about how we connected our new Squid reverse proxy cache to our ILS.
