Thread Pools and Erlang Models
Sizing, Danish Style
Folks in telecommunications and operations research have used Erlang models for almost a century. A. K. Erlang, a Danish telephone engineer, developed these models to help plan the capacity of the phone network and predict the grade of service that could be guaranteed, given some basic metrics about call volume and duration. Telephone networks are expensive to deploy, particularly when upgrading your trunk lines involves digging up large portions of rocky Danish ground or running cables under the North Sea.
The Erlang-B formula predicts the probability that an incoming call cannot be serviced, based on the call arrival rate, average call time, and number of lines available. Erlang-C is similar, but allows for calls to be queued while waiting for service. It predicts the probability that a call will be queued. It can also show when calls will never be serviced, because the rate of arriving calls exceeds the system's total capacity to serve them.
Erlang models are widely used in telecomm, including GPRS network sizing, trunk line sizing, call center staffing models, and other capacity planning arenas where request arrival is apparently random. In fact, you can use it to predict the capacity and wait time at a restaurant, bank branch, or theme park, too.
It should be pretty obvious that Erlang models are widely applicable in computer performance analysis, too. There's a rich body of literature on this subject that goes back to the dawn of the mainframe. Erlang models are the foundation of most capacity management groups. I'm not even going to scratch the surface here, except to show how some back-of-the-envelope calculations can help you save millions of dollars.
One Million Page Views
In my case, I wanted to look at thread pool sizing. Suppose you have an even 1,000,000 requests per hour to handle. This implies an arrival rate (or lambda) of 0.27777... requests per millisecond. (Erlang units are dimensionless, but you need to start with the same units of time, whether it's hours, days, or milliseconds.) I'm going to assume for the moment that the system is pretty fast, so it handles a request in 250 milliseconds, on average.
(Please note that there are many assumptions underneath simply statements like "on average". For the moment, I'll pretend that request processing time follows a normal distribution, even though any modern system is more likely to be bimodal.)
Table 1 shows a portion of the Erlang-C table for these parameters. Feel free to double-check my work with this spreadsheet or this short C program to compute the Erlang-B and Erlang-C values for various numbers of threads. (Thanks to Kenneth J. Christensen for the original program. I can only claim credit for the extra "for" loop.)
From Table 1, I can immediately see that anything less than 70 threads will never keep up. With less than 70 threads, the queue of unprocessed requests will grow without bound. I need at least 91 threads to get below a 1% chance that a request will be delayed by queueing.
Performance and Capacity
Now, what happens if the average request processing time goes up by 100 milliseconds on those same million requests? Adjusting the parameters, I get Table 2.
Now we need a minimum of 99 threads before we can even expect to keep up and we need 122 threads to get down under that 1% queuing threshold.
On the other hand, what about increasing performance by 100 millseconds per request? I'll let you run the calculator for that, but it looks to me like we need between 42 and 59 threads to meet the same thresholds.
That swing, from 150 to 350 milliseconds per request makes a huge difference in the number of concurrent threads your system must support to handle a million requests per hour---almost a factor of 3 times. Would you be willing to triple your hardware for the same request volume? Next time anyone says that "CPU is cheap", fold your arms and tell them "Erlang would not approve." On the flip side, it might be worth spending some administrator time on performance tuning to bring down your average page latency. Or maybe some programmer time to integrate memcached so every single page doesn't have to trudge all the way to the database.
Summary and Extension
Obviously, there's a lot more to performance analysis for web servers than this. Over time, I'll be mixing more analytic pieces with the pragmatic, hands-on posts that I usually make. It'll take some time. For one thing, I have to go back and learn about stochastic process and Markov chains. Pattern recognition and signal processing I've got. Advanced probability and statistics I don't got.
In fact, I'll offer a free copy of Release It to the first commenter who can show me how to derive an Erlang-like model that accounts for a) garbage collection times (bimodal processing time distribution), b) multiple coupled wait states during processing, c) non-equilibrium system states, and d) processing time that varies as a function of system utilization.