Thread Pools and Erlang Models

Sizing, Danish Style

Folks in telecommunications and operations research have used Erlang models for almost a century. A. K. Erlang, a Danish telephone engineer, developed these models to help plan the capacity of the phone network and predict the grade of service that could be guaranteed, given some basic metrics about call volume and duration. Telephone networks are expensive to deploy, particularly when upgrading your trunk lines involves digging up large portions of rocky Danish ground or running cables under the North Sea.

The Erlang-B formula predicts the probability that an incoming call cannot be serviced, based on the call arrival rate, average call time, and number of lines available. Erlang-C is similar, but allows for calls to be queued while waiting for service. It predicts the probability that a call will be queued. It can also show when calls will never be serviced, because the rate of arriving calls exceeds the system's total capacity to serve them.

Erlang models are widely used in telecomm, including GPRS network sizing, trunk line sizing, call center staffing models, and other capacity planning arenas where request arrival is apparently random. In fact, you can use it to predict the capacity and wait time at a restaurant, bank branch, or theme park, too.

It should be pretty obvious that Erlang models are widely applicable in computer performance analysis, too. There's a rich body of literature on this subject that goes back to the dawn of the mainframe. Erlang models are the foundation of most capacity management groups. I'm not even going to scratch the surface here, except to show how some back-of-the-envelope calculations can help you save millions of dollars.

One Million Page Views

In my case, I wanted to look at thread pool sizing. Suppose you have an even 1,000,000 requests per hour to handle. This implies an arrival rate (or lambda) of 0.27777... requests per millisecond. (Erlang units are dimensionless, but you need to start with the same units of time, whether it's hours, days, or milliseconds.) I'm going to assume for the moment that the system is pretty fast, so it handles a request in 250 milliseconds, on average.

(Please note that there are many assumptions underneath simply statements like "on average". For the moment, I'll pretend that request processing time follows a normal distribution, even though any modern system is more likely to be bimodal.)

Table 1 shows a portion of the Erlang-C table for these parameters. Feel free to double-check my work with this spreadsheet or this short C program to compute the Erlang-B and Erlang-C values for various numbers of threads. (Thanks to Kenneth J. Christensen for the original program. I can only claim credit for the extra "for" loop.)

Table 1. Erlang-C values at 250 ms / request

N	Pr_Queue (Erlang-C)
67	undef
68	undef
69	undef
70	0.921417281
71	0.791698369
72	0.676255938
73	0.574128540
74	0.484342834
75	0.405921606
76	0.337892350
77	0.279296163
78	0.229196685
79	0.186688788
80	0.150906701
81	0.121031288
82	0.096296202
83	0.075992736
84	0.059473196
85	0.046152756
86	0.035509802
87	0.027084849
88	0.020478191
89	0.015346497
90	0.011398581
91	0.008390600
92	0.006120940
93	0.004424999
94	0.003170077
95	0.002250524
96	0.001583268
97	0.001103786
98	0.000762573
99	0.000522098

From Table 1, I can immediately see that anything less than 70 threads will never keep up. With less than 70 threads, the queue of unprocessed requests will grow without bound. I need at least 91 threads to get below a 1% chance that a request will be delayed by queueing.

Performance and Capacity

Now, what happens if the average request processing time goes up by 100 milliseconds on those same million requests? Adjusting the parameters, I get Table 2.

Table 2. Erlang-C values at 350 ms / request

N	Pr_Queue (Erlang-C)
96	undef
97	undef
98	0.907100356
99	0.797290966
100	0.697789489
101	0.608014385
102	0.527376532
103	0.455282634
104	0.391138874
105	0.334354749
106	0.284347016
107	0.240543652
108	0.202387733
109	0.169341130
110	0.140887936
111	0.116537521
112	0.095827141
113	0.078324041
114	0.063626999
115	0.051367297
116	0.041209109
117	0.032849334
118	0.026016901
119	0.020471625
120	0.016002658
121	0.012426630
122	0.009585560
123	0.007344611
124	0.005589775
125	0.004225555

Now we need a minimum of 99 threads before we can even expect to keep up and we need 122 threads to get down under that 1% queuing threshold.

On the other hand, what about increasing performance by 100 millseconds per request? I'll let you run the calculator for that, but it looks to me like we need between 42 and 59 threads to meet the same thresholds.

That swing, from 150 to 350 milliseconds per request makes a huge difference in the number of concurrent threads your system must support to handle a million requests per hour---almost a factor of 3 times. Would you be willing to triple your hardware for the same request volume? Next time anyone says that "CPU is cheap", fold your arms and tell them "Erlang would not approve." On the flip side, it might be worth spending some administrator time on performance tuning to bring down your average page latency. Or maybe some programmer time to integrate memcached so every single page doesn't have to trudge all the way to the database.

Summary and Extension

Obviously, there's a lot more to performance analysis for web servers than this. Over time, I'll be mixing more analytic pieces with the pragmatic, hands-on posts that I usually make. It'll take some time. For one thing, I have to go back and learn about stochastic process and Markov chains. Pattern recognition and signal processing I've got. Advanced probability and statistics I don't got.

In fact, I'll offer a free copy of Release It to the first commenter who can show me how to derive an Erlang-like model that accounts for a) garbage collection times (bimodal processing time distribution), b) multiple coupled wait states during processing, c) non-equilibrium system states, and d) processing time that varies as a function of system utilization.