In my last post, I talked about using Erlang models to size the front end of a system. By using some fundamental capacity models that are almost a century old, you can estimate the number of request handling threads you need for a given traffic load and request duration.

Inside the Box

It gets tricky, though, when you start to consider what happens inside the server itself. Processing the request usually involves some kind of database interaction with a connection pool. (There are many ways to avoid database calls, or at least minimize the damage they cause. I'll address some of these in a future post, but you can also check out Two Ways to Boost Your Flagging Web Site for starters.) Database calls act like a kind of "interior" request that can be considered to have its own probability of queuing.

Exterior call to server becomes an "interior" call to a database.

Because this interior call can block, we have to consider what effects it will have on the duration of the exterior call. In particular, the exterior call must take at least the sum of the blocking time plus the processing time for the interior call.

At this point, we need to make a few assumptions about the connection pool. First, the connection pool is finite. Every connection pool should have a ceiling. If nothing else, the database server can only handle a finite number of connections. Second, I'm going to assume that the pool blocks when exhausted. That is, calling threads that can't get a connection right away will happily wait forever rather than abandoning the request. This is a simplifying assumption that I need for the math to work out. It's not a good configuration in practice!

With these assumption in place, I can predict the probability of blocking within the interior call. It's a formula closely related to the Erlang model from my last post, but with a twist. The Erlang models assume an essentially infinite pool of requestors. For this interior call, though, the pool of requestors is quite finite: it's the number of request handling threads for the exterior calls. Once all of those threads are busy, there aren't any left to generate more traffic on the interior call!

The formula to compute the blocking probability with a finite number of sources is the Engset formula. Like the Erlang models, Engset originated in the world of telephony. It's useful for predicting the outbound capacity needed on a private branch exchange (PBX), because the number of possible callers is known. In our case, the request handling threads are the callers and the connection pool is the PBX.

Practical Example

Using our 1,000,000 page views per hour from last time, Table 1 shows the Engset table for various numbers of connections in the pool. This assumes that the application server has a maximum of 40 request handling threads. This also supposes that the database processing time uses 200 milliseconds of the 250 milliseconds we measured for the exterior call.

NEngset(N,A,S)
0100.00000%
198.23183%
296.37740%
394.43061%
492.38485%
590.23293%
687.96709%
785.57891%
883.05934%
980.39867%
1077.58656%
1174.61210%
1271.46397%
1368.13065%
1464.60087%
1560.86421%
1656.91211%
1752.73932%
1848.34604%
1943.74105%
2038.94585%
2134.00023%
2228.96875%
2323.94730%
2419.06718%
2514.49235%
2610.40427%
276.97050%
284.30152%
292.41250%
301.21368%
310.54082%
320.21081%
330.07093%
340.02028%
350.00483%
360.00093%
370.00014%
380.00002%
390.00000%
400.00000%

Notice that when we get to 18 connections in the pool, the probability of blocking drops below 50%.  Also, notice how sharply the probability of blocking drops off around 23 to 31 connections in the pool. This is a decidedly nonlinear effect!

From this table, it's clear that even though there are 40 request handling threads that could call into this pool, there's not much point in having more than 30 connections in the pool. At 30 connections, the probability of blocking is already less than 1%, meaning that the queuing time is only going to add a few milliseconds to the average request.

Why do we care? Why not just crank up the connection pool size to 40? After all, if we did, then no request could ever block waiting for a connection. That would minimize latency, wouldn't it?

Yes, it would, but at a cost. Increasing the number of connections to the database by a third means more memory and CPU time on the database just managing those connections, even if they're idle. If you've got two app servers, then the database probably won't notice an extra 10 connections. Suppose you scale out at the app tier, though, and you now have 50 or 60 app servers. You'd better believe that the DB will notice an extra 500 to 600 connections. They'll affect memory needs, CPU utilization, and your ability to fail over correctly when a database node goes down.

Feedback and Coupling

There's a strong coupling between the total request duration in the interior call and the request duration for the exterior call. If we assume that every request must go through the database call, then the exterior response time must be strictly greater than the interior blocking time plus the interior processing time.

In practice, it actually gets a little worse than that, as this causal loop diagram illustrates.

 Time dependencies between the interior call and the exterior call.

It reads like this: "As the interior call blocking time increases, the exterior call duration increase. As the interior call blocking increases, the exterior call duration time increases." This type of representation helps clarify relations between the different layers. It's very often the case that you'll find feedback loops this way. Any time you do find a feedback loop, it means that slowdowns will produce increasing slowdowns. Blocking begets blocking, quickly resulting in a site hang.

Conclusions

Queues are like timing dots. Once you start seeing them, you'll never be able to stop. You might even start to think that your entire server farm looks like one vast, interconnected set of queues.

That's because it is.

People use database connection pools because creating new connections is very slow. Tuning your database connection pool size, however, is all about optimizing the cost of queueing against the cost of extra connections. Each connection consumes resources on the database server and in the application server. Striking the right balance starts by identifying the required exterior response time, then sizing the connection pool---or changing the architecture---so the interior blocking time doesn't break the SLA.

For much, much more on the topic of capacity modeling and analysis, I definitely recommend Neil Gunther's website, Performance Agora. His books are also a great---and very practical---way to start applying performance and capacity management.