I'm reading a fabulous book called "Inviting Disaster", by James R. Chiles. He discusses hundreds of engineering and mechanical disasters. Most of them caused serious loss of life.

There are several common themes:

1. Enormously complex systems that react in sometimes unpredictable ways

2. Inadequate testing, training, or preparedness for failures -- particularly for multiple concurrent failures

3. A chain of events leading to the "system fracture", usually exacerbated by human error

4. Politics or budget pressure causing otherwise responsible people to rush things out. This often involves whitewashing or pooh-poohing legitimate criticism and concern from the experts involved.

The parallels to some projects I've worked on are kind of eerie, particularly when he's talking about things like the DC-10 and the Hubble Space Telescope. In both of those cases, warning signs were visible during construction and early testing, but because each person involved had tunnel vision limited to their own silo, the clues got missed.

The scary part is that there is no solution here. Sometimes you can't even place the blame very squarely. When half a dozen people were involved in the unloading and handling of oxygen-generating cylinders on a ValuJet flight, no single individual really did anything wrong (or contrary to procedure, anyway). Still, the net effect of their actions cost the lives of every single person on that flight.

It's grim stuff, but it ought to be required reading. If you ever leave your house again, you'll be much better prepared for building and operating complex systems.

Technorati Tags: operations, systems