The 5 A.M. Production Problem
I've got a new piece up at InfoQ.com, discussing the limits of unit and functional testing:
"Functional testing falls short, however, when you want to build software to survive the real world. Functional testing can only tell you what happens when all parts of the system are behaving within specification. True, you can coerce a system or subsystem into returning an error response, but that error will still be within the protocol! If you're calling a method on a remote EJB that either returns "true" or "false" or it throws an exception, that's all it will do. No amount of functional testing will make that method return "purple". Nor will any functional test case force that method to hang forever, or return one byte per second.
One of my recurring themes in Release It is that every call to another system, without exception, will someday try to kill your application. It usually comes from behavior outside the specification. When that happens, you must be able to peel back the layers of abstraction, tear apart the constructed fictions of "concurrent users", "sessions", and even "connections", and get at what's really happening."