NoSwitchport.com

Quick and Dirty Web Proxy Load Balancing Using PAC Files

Posted in Networking by shaw38 on October 21, 2010

While digging through the archives of the Cisco Ironport support knowledge base, I came across a pretty slick solution for load balancing client web traffic between two or more proxy servers in a PAC file-based deployment. So what is a PAC file? A PAC (proxy auto-config) file is a text file containing policy written in JavaScript, which the web browser evaluates each time it makes an HTTP request. The policy tells the browser which web traffic to send to a proxy server and which traffic should bypass the proxy and go direct. Typically, this is a fairly vanilla policy. For example (see comments for detail):

function FindProxyForURL(url, host)
{
    if (isPlainHostName(host) || dnsDomainIs(host, ".test.net"))    // Bypass for non-dotted hostname or test.net domain
        return "DIRECT";
    else if (isInNet(host, "10.0.0.0", "255.0.0.0"))                // Bypass proxy for RFC1918
        return "DIRECT";
    else if (isInNet(host, "172.16.0.0", "255.240.0.0"))            // Bypass proxy for RFC1918
        return "DIRECT";
    else if (isInNet(host, "192.168.0.0", "255.255.0.0"))           // Bypass proxy for RFC1918
        return "DIRECT";
    else if (isInNet(host, "127.0.0.0", "255.0.0.0"))               // Bypass proxy for loopback (RFC3330)
        return "DIRECT";
    else
        return "PROXY IPROXY01:8080; PROXY IPROXY02:8080; DIRECT";  // Try iproxy01, then iproxy02, then go direct
}

Using the PAC file above, all web traffic that does not match a condition returning “DIRECT” is sent only to iproxy01 in a steady state. If iproxy01 fails, traffic is sent to iproxy02. What if we have 40 Mbps of internet traffic we would like to load balance between the two? We could deploy WCCP and move to transparent redirection, but are there any options with a PAC file? Absolutely!
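To make the failover order concrete, here is a minimal sketch of how a well-formed PAC return string breaks down into an ordered list of candidates. The parseProxyList helper is hypothetical and written only for illustration; browsers do this parsing internally, and exactly when they give up on one entry and move to the next varies by implementation.

```javascript
// Hypothetical helper: split a PAC return string into its ordered entries.
// Browsers do this internally; this only illustrates the ordering.
function parseProxyList(pacResult) {
  return pacResult
    .split(";")
    .map(function (entry) { return entry.trim(); })
    .filter(function (entry) { return entry.length > 0; })
    .map(function (entry) {
      var parts = entry.split(/\s+/);
      // "DIRECT" has no host:port, so its target is null.
      return { type: parts[0], target: parts[1] || null };
    });
}

var candidates = parseProxyList("PROXY IPROXY01:8080; PROXY IPROXY02:8080; DIRECT");
candidates.forEach(function (c, i) {
  console.log((i + 1) + ". " + c.type + (c.target ? " " + c.target : ""));
});
// 1. PROXY IPROXY01:8080
// 2. PROXY IPROXY02:8080
// 3. DIRECT
```

The first entry is always tried first, which is why the static policy never sends traffic to iproxy02 while iproxy01 is healthy.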

Since a PAC file is written in JavaScript, the language’s standard built-in objects are at your disposal to manipulate policy as you see fit. We’ll need to instruct the web browser to send connections to either web proxy based on the result of some sort of algorithm. To accomplish this, we can write a JavaScript function using a couple of methods from the built-in Math object:

function selectRandomProxy()
{
    switch (Math.floor(Math.random() * 2))    // Randomly generate an integer of 0 or 1
    {
        case 0: return "PROXY IPROXY01:8080; PROXY IPROXY02:8080; DIRECT";    // iproxy01 first, iproxy02 as failover
        case 1: return "PROXY IPROXY02:8080; PROXY IPROXY01:8080; DIRECT";    // iproxy02 first, iproxy01 as failover
    }
}

This function (selectRandomProxy()) randomly selects either case 0, which lists iproxy01 first, or case 1, which lists iproxy02 first. In both cases the other proxy stays in the list as a failover. Math.random() returns a random value greater than or equal to 0 and less than 1 (e.g. 0.7234213). This value is then multiplied by 2. Math.floor() then rounds the result down to the nearest integer. For example, if Math.random() generates a value of 0.25, which is then multiplied by two (0.50), Math.floor() rounds this down to zero. If Math.random() generates a value of 0.75, which is then multiplied by two (1.50), Math.floor() rounds this down to one. A switch statement then compares the resulting integer against its list of cases and returns the proxy list for the matching case.
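The arithmetic can be checked by separating the random draw from the index calculation. The sketch below is illustrative only: pickIndex and buildProxyList are hypothetical names, and IPROXY03 is an invented third proxy used to show that the same approach should extend beyond two proxies.

```javascript
// Hypothetical helper: map a value r in [0, 1) to an integer index in [0, n).
function pickIndex(r, n) {
  return Math.floor(r * n);
}

// Hypothetical helper: rotate the proxy list so the chosen proxy comes first,
// keep the others as fallbacks, and finish with DIRECT.
function buildProxyList(proxies, first) {
  var ordered = proxies.slice(first).concat(proxies.slice(0, first));
  return ordered.map(function (p) { return "PROXY " + p; }).join("; ") + "; DIRECT";
}

console.log(pickIndex(0.25, 2)); // 0
console.log(pickIndex(0.75, 2)); // 1

// IPROXY03 is an assumption for illustration, not part of the original setup.
var proxies = ["IPROXY01:8080", "IPROXY02:8080", "IPROXY03:8080"];
console.log(buildProxyList(proxies, pickIndex(0.5, proxies.length)));
// PROXY IPROXY02:8080; PROXY IPROXY03:8080; PROXY IPROXY01:8080; DIRECT
```

In a real PAC file the first argument to pickIndex would be Math.random() rather than a fixed value.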

Now we’ll integrate this new function into our original PAC file:

function FindProxyForURL(url, host)
{
    if (isPlainHostName(host) || dnsDomainIs(host, ".test.net"))
        return "DIRECT";
    else if (isInNet(host, "10.0.0.0", "255.0.0.0"))
        return "DIRECT";
    else if (isInNet(host, "172.16.0.0", "255.240.0.0"))
        return "DIRECT";
    else if (isInNet(host, "192.168.0.0", "255.255.0.0"))
        return "DIRECT";
    else if (isInNet(host, "127.0.0.0", "255.0.0.0"))
        return "DIRECT";
    else
        return selectRandomProxy();    // Everything else is balanced across both proxies
}

function selectRandomProxy()
{
    switch (Math.floor(Math.random() * 2))
    {
        case 0: return "PROXY IPROXY01:8080; PROXY IPROXY02:8080; DIRECT";
        case 1: return "PROXY IPROXY02:8080; PROXY IPROXY01:8080; DIRECT";
    }
}

The web browser evaluates the configured PAC file prior to every new HTTP connection, which over time results in close to a 50/50 distribution of traffic between the two web proxies.
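A rough way to check that claim outside a browser is to run the policy in a plain JavaScript engine such as Node.js and tally which proxy comes first. In the sketch below, the three helper functions are simplified stand-ins for the ones a browser supplies (in particular, this isInNet only understands literal IPv4 addresses and performs no DNS lookup), ipv4ToInt is a hypothetical helper, and www.example.com is an arbitrary test host.

```javascript
// Simplified stand-ins for PAC helpers that a browser normally provides.
function isPlainHostName(host) {
  return host.indexOf(".") === -1;
}

function dnsDomainIs(host, domain) {
  return host.length >= domain.length &&
    host.substring(host.length - domain.length) === domain;
}

// Hypothetical helper: dotted-quad string to number, or null if not an IPv4 literal.
function ipv4ToInt(ip) {
  var parts = ip.split(".");
  if (parts.length !== 4) return null;
  var value = 0;
  for (var i = 0; i < 4; i++) {
    if (!/^\d+$/.test(parts[i]) || Number(parts[i]) > 255) return null;
    value = value * 256 + Number(parts[i]);
  }
  return value;
}

// Assumption: no DNS resolution, so hostnames never match a network here.
function isInNet(host, pattern, mask) {
  var h = ipv4ToInt(host), p = ipv4ToInt(pattern), m = ipv4ToInt(mask);
  if (h === null || p === null || m === null) return false;
  return ((h & m) >>> 0) === ((p & m) >>> 0);
}

function selectRandomProxy() {
  switch (Math.floor(Math.random() * 2)) {
    case 0: return "PROXY IPROXY01:8080; PROXY IPROXY02:8080; DIRECT";
    default: return "PROXY IPROXY02:8080; PROXY IPROXY01:8080; DIRECT";
  }
}

function FindProxyForURL(url, host) {
  if (isPlainHostName(host) || dnsDomainIs(host, ".test.net")) return "DIRECT";
  if (isInNet(host, "10.0.0.0", "255.0.0.0")) return "DIRECT";
  if (isInNet(host, "172.16.0.0", "255.240.0.0")) return "DIRECT";
  if (isInNet(host, "192.168.0.0", "255.255.0.0")) return "DIRECT";
  if (isInNet(host, "127.0.0.0", "255.0.0.0")) return "DIRECT";
  return selectRandomProxy();
}

// Tally which proxy is listed first over many evaluations.
var counts = {};
for (var n = 0; n < 10000; n++) {
  var first = FindProxyForURL("http://www.example.com/", "www.example.com").split(";")[0];
  counts[first] = (counts[first] || 0) + 1;
}
console.log(counts);
console.log(FindProxyForURL("http://10.1.2.3/", "10.1.2.3")); // DIRECT
```

The two counts should land near 5,000 each, with the exact split changing from run to run.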

A word of caution: There is no intelligence or session tracking in the load balancing decision. It’s completely stateless. During a single HTTP session, objects will be fetched through both web proxies. While this isn’t necessarily an issue from the perspective of the web proxies, it may wreak some havoc on web apps behind a load balancer that relies on session stickiness by source IP address. Because HTTP objects belonging to the same session are fetched from two different source IP addresses (the two web proxies), each one looks like a new session to the destination load balancer and may not be “stuck” to the same real server. As long as both proxies are PAT’d to the same address, this shouldn’t cause an issue. Also, if you are doing any type of SSL termination on your web proxies for content inspection, this will cause you some problems.
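One possible way to soften the stickiness problem, which is not part of the approach described in this post, is to replace the random draw with a deterministic choice derived from the destination host, so that every request to a given site leaves through the same proxy. The sketch below uses a simple character-sum hash; selectProxyByHost is a hypothetical name, and the split will only be roughly even across a varied mix of hostnames.

```javascript
// Hypothetical alternative to selectRandomProxy(): choose the proxy order
// from the destination hostname so a given site always uses the same proxy.
function selectProxyByHost(host) {
  var name = host.toLowerCase();   // hostnames are case-insensitive
  var sum = 0;
  for (var i = 0; i < name.length; i++) {
    sum += name.charCodeAt(i);     // simple character-sum hash
  }
  if (sum % 2 === 0) {
    return "PROXY IPROXY01:8080; PROXY IPROXY02:8080; DIRECT";
  }
  return "PROXY IPROXY02:8080; PROXY IPROXY01:8080; DIRECT";
}

// The same host always yields the same ordering, regardless of case.
console.log(selectProxyByHost("www.example.com") === selectProxyByHost("WWW.EXAMPLE.COM")); // true
```

In the PAC file, FindProxyForURL would return selectProxyByHost(host) in place of selectRandomProxy(). The trade-off is that traffic is balanced per destination host rather than per connection, so one very busy site lands entirely on one proxy.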

One Response

  1. GD said, on October 27, 2010 at 7:52 am

    PAC files and Java
    Be aware that Java also reads and (tries to) execute your PAC file. I believe that some Javas can break if you do anything too out of the ordinary. Not sure if it’s a bug or lack of features or even just that Java has stricter syntax checking than the browser… but troubleshooting it wasn’t nice |:

