Description of the Rate.d Load Balancing Algorithm
The Application Gateway System (AGS) balances load across a cluster of
servers. It uses an adaptive load balancing algorithm that avoids
introducing a single point of failure. The algorithm adapts quickly
to changes in configuration or actual load of the server machines.
The AGS load balancing algorithm requires some custom client code but
is far more flexible than the custom client solution (e.g., in
Netscape browsers which special-case the Netscape home
page). Compared to round robin DNS, it provides fast, automatic
adaptation to changes in load or configuration. Compared to Cisco's
LocalDirector, a commercially available packet rewriting device, the
fully distributed AGS implementation has no central point of failure
and provides more direct measurements of server load.
The Load Balancing Algorithm
The AGS load balancing algorithm works by running a low impact daemon
program, known as rate.d, on each server machine in the cluster.
Using IP broadcast or multicast, the client sends a single packet
containing a request for service to the cluster. The rate.d daemons
on all the participating servers receive and evaluate the request. The
algorithm selects a single server from the cluster to handle the
request, using a proportional share lottery (described below) to
balance the load. The selection is made by each daemon independently,
but each daemon usually makes the same selection. The selected server
sends a unicast response to the client, indicating that it will accept
the client's request for service. Once the client receives the
response, it initiates the actual service connection to the server
machine indicated in the usual way (e.g., by opening an FTP or HTTP
connection).
How does a rate.d daemon know that it is the most suitable one to
handle a request? The daemon makes job assignments based on load
information that is broadcast by each server several times a second.
Each daemon monitors these broadcasts and keeps a loosely consistent
view of the load on each server. The lottery is weighted so that the
number of jobs assigned to a particular server is inversely
proportional to its current load.
The lottery is held so that, when the daemons have exactly the same
load information, they make the same choice, and when they have
approximately the same information, they make the same choice with high
probability. The lottery does not depend on any other data, only a
hash of the request packet is used to determine the outcome. In
particular, the history of previous lotteries held has no effect.
This is important to insure that lost client requests do not affect
the stability of the algorithm.
In general, the load of a machine varies gradually. We can assume
that the load of a machine remains the same until another update is
received. This, together with the properties of the lottery, means
that losing an occasional broadcast update from another daemon does
not greatly affect the lottery outcome. The same is true when
updates and requests are received in different orders by different
machines. On the other hand, when no updates are received from a
particular machine for a longer period of time, that machine is deemed
to have crashed and is no longer considered as a candidate.
The rate.d algorithm has a fallback mechanism that takes care of
inconsistencies in the lottery outcomes and of lost packets. When more
than one machine responds to a request, the client simply uses the
first response. When no machine responds to a request, two timeouts
come into play.
First, as part of their regular update broadcasts, the rate.d daemons
also transmit a list of the request IDs to which they have recently
responded, and the other rate.d daemons track this information. When
the daemon that should have responded to a particular request fails to
indicate this in its update within a certain interval, the other
daemons repeat the lottery for this request, this time disregarding
the previous lottery winner. If necessary, this process is repeated.
The fallback mechanism takes care of three different sources of
inconsistencies between lottery outcomes: differences in relative
arrival times between requests and update broadcasts, loss of request
packets on some machines, and partial or total loss of update
broadcast packets. In rare cases, a second lottery is held and a
second response sent unnecessarily; in this case the client will just
ignore the second response.
The client can also time out and resubmit its request. The client
code must implement a timeout, because the request may be lost on its
way to the service cluster. The client timeout period is longer than
the lottery's internal timeout because of the higher latency of
wide-area communications.
Many other aspects and details of the AGS system can only be mentioned
in passing. The update broadcasts contain load information that is
calibrated by the capacity of the server, so that load can be balanced
across machines with different capacities. The rate.d implementation
supports running a number of different services on the same cluster,
possibly on different subsets of the machines in the cluster, all
handled by a single rate.d daemon per machine. A secure remote
management system allows control of system activity, including
activation and tuning of rate.d services.
|