Wide Awake Developers

Root Cause Analysis as Storytelling

Humans are great storytellers and even better story-listeners. We love to hear stories so much that when there aren’t any available, we make them up on our own.

From an early age, children grasp the idea of narrative. Even if they don’t understand the forms of storytelling so much, you can hear a four-year-old weave a linked list of events from her day.

We look for stories behind everything. At a deep level, we want the world’s events to mean something. Effect follows cause, and causes have an actor to set them in motion.

Our sense of balance also demands that large effects should have large causes, with correspondingly large intent.

A drunk driver speeds through a red light, oblivious. A crossing car stops short. The shaken driver creeps home with a pounding pulse, full of queasy adrenaline. She unbuckles her daughter and hugs her tightly.

A drunk driver speeds through a red light, oblivious. A crossing car is in the intersection. The drunk smashes into it, right at the driver’s side door. The woman’s bloody face is hidden behind airbags. Her daughter sits in her new wheelchair for her mother’s funeral.

The difference between those stories is a matter of a split second in timing. There is absolutely no change in the motives or desires of anyone in the two vignettes. The first drunk, if caught, would get a jail term and a large fine. He would probably lose his driver’s license.

But most people would judge the motives of the second driver far more harshly. They would condemn him to a lengthy prison term and a lifetime ban on driving.

When we see a large effect, we expect a large cause, with a large intent.

The idea that some vast, horrible events strike randomly fills us with dread. People can’t bear the thought that a single unbalanced nobody can change the course of a nation’s history with one rifle shot, so they spend more than 50 years searching for “the truth.”

“Root Cause Analysis” expresses a desire for narrative. With the power of hindsight, we want to find out what went wrong, who did it, and how we can make sure it never happens again. But because we have the posterior event, we judge the prior probabilities differently. Any anomaly or blip suddenly becomes suspect.

People don’t look as hard at anomalies when nothing bad happens.

They don’t notice all the times the same weird log message pops up before … everything continues as normal.
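One way to check whether a "suspicious" log message actually means anything is to count how often it shows up before nothing at all. The sketch below is only an illustration: the function name `incident_rates` and the sample data are made up, and it assumes you can label each time window with whether the message appeared and whether an incident followed.

```python
def incident_rates(windows):
    """Compare incident rates with and without the log message.

    windows: list of (saw_message, incident_followed) boolean pairs,
    one pair per time window. Both labels are assumed to be available.
    Returns (rate_with_message, rate_without_message); a rate is None
    when there are no windows in that group.
    """
    with_msg = [incident for saw, incident in windows if saw]
    without_msg = [incident for saw, incident in windows if not saw]

    def rate(outcomes):
        return sum(outcomes) / len(outcomes) if outcomes else None

    return rate(with_msg), rate(without_msg)


# Hypothetical data: the message appeared in 100 windows (2 incidents)
# and was absent in 50 windows (1 incident).
sample = (
    [(True, False)] * 98
    + [(True, True)] * 2
    + [(False, False)] * 49
    + [(False, True)] * 1
)
rate_with, rate_without = incident_rates(sample)
```

In this invented sample both rates come out to 0.02, so the message carries no signal even though it preceded two incidents. Seen only in hindsight, it would still look like a culprit.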

When we look for “root cause,” what we are really trying to discern is not “what made this happen.” We are looking for something that would have stopped it from happening. We are building a counterfactual narrative—an alternate history—where that drunk driver dropped his keys in the parking lot and was thereby delayed a few crucial seconds.

Peel back the surface on a root cause analysis and you almost always see a formula that goes like this: “factor X” could have prevented this. “Factor X” was not present, therefore the bad event happened.

The catch is that there is usually an endless variety of possible counterfactuals. Often, more than one counterfactual narrative would have prevented the bad outcome equally well. Which one was the root cause? Non-existence of “factor X” or non-existence of “factor Y”?

Next time you have a bad incident, why not try to focus your efforts in a different way? Work on learning from the times that things don’t go wrong. And be explicit about looking for many possible interventions that would have prevented the problem. Then select ones with broad ability to prevent or impede many different problems.
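As a rough sketch of that last step, you could list, for each incident, every intervention that would plausibly have prevented it, then rank interventions by how many incidents they cover. Everything here is hypothetical: the function name `rank_interventions`, the incident names, and the intervention names are placeholders, and a raw count is only one possible measure of "broad ability."

```python
from collections import Counter


def rank_interventions(incidents):
    """Rank interventions by how many incidents each would have prevented.

    incidents: dict mapping an incident name to the set of interventions
    judged (by you, after the fact) to have been able to prevent it.
    Returns a list of (intervention, count) pairs, broadest first,
    with ties broken alphabetically.
    """
    counts = Counter()
    for preventers in incidents.values():
        counts.update(preventers)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical incident reviews.
reviews = {
    "outage-a": {"timeout", "circuit breaker"},
    "outage-b": {"timeout", "load shedding"},
    "outage-c": {"timeout", "circuit breaker"},
}
ranking = rank_interventions(reviews)
```

With this made-up data, "timeout" ranks first because it appears in all three reviews, even though no single review would have named it *the* root cause.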
