The Outcome Bias

Eight decisions. Four hidden pairs. The same call shown twice, once with a good result and once with a bad one. One 0-100 wisdom slider each. WIZ measures how much the result rewrote your verdict on the decision.

“Outcomes affect evaluations of decisions even when the decision-maker could not have known the outcome at the time of the choice.”
Jonathan Baron and John Hershey, JPSP 1988

In 1988 Jonathan Baron and John Hershey described a 55-year-old man with a heart condition. His cardiologist recommended bypass surgery with an 8% mortality risk. The man chose the surgery. Half the subjects in the study were told he recovered. Half were told he died on the table. The decision was identical in both versions; the outcome was the only thing that changed. Subjects rated the wisdom of the decision. The good-outcome group averaged 78. The bad-outcome group averaged 43. The thirty-five-point gap on identical decision inputs became the founding measurement of the outcome bias.

Walster (1966) had found the same pattern in driver-accident scenarios: identical driving was assigned 50% more blame when it injured a pedestrian. Mitchell and Kalb (1981) showed it in supervisor evaluations. Lipshitz (1989) in military command. Allison, Mackie, and Messick (1996) in group decisions. Marshall and Mowen (1993) in retail manager hiring. The empirical picture across the literature is consistent: a decision's outcome substantially rewrites the wisdom rating of the underlying choice, even when subjects are explicitly told to ignore the outcome and judge process only.

Annie Duke (2018) named the popular version “resulting” in her book on poker decision-making: the habit of grading a poker hand by whether it won rather than by whether it was the right play given the cards. Phil Ivey raises pre-flop with pocket kings, loses to a backdoor flush, and observers who are resulting say he should have folded. Folding would have been the wrong play. The raise was right; only the result was wrong.

You are about to see eight decision scenarios that form four hidden pairs: a bypass operation, a concentrated investment, letting a 16-year-old drive in winter, a startup skipping its private beta. Each pair runs the identical decision twice, once with a good outcome and once with a bad outcome. You move a 0-100 slider on the wisdom of each decision, judged on what the chooser knew at the time. At the end, WIZ averages your good-outcome ratings and your bad-outcome ratings separately and shows you the gap between the two averages. A pure process judge would have a gap of zero. The 1988 founding study found a gap of thirty-five. Most adult subjects sit between twenty and forty.
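The gap described above is plain arithmetic, and a short Python sketch makes it concrete. This is an illustration under stated assumptions, not WIZ's actual scoring code: the function name `outcome_gap`, the scenario labels, and the sample ratings are all hypothetical.

```python
from statistics import mean


def outcome_gap(ratings):
    """Return mean good-outcome rating minus mean bad-outcome rating.

    `ratings` is a list of (scenario, outcome, score) tuples, where
    outcome is "good" or "bad" and score is a 0-100 slider value.
    The tuple layout is an assumption made for this sketch.
    """
    good = [score for _, outcome, score in ratings if outcome == "good"]
    bad = [score for _, outcome, score in ratings if outcome == "bad"]
    if not good or not bad:
        raise ValueError("need at least one good-outcome and one bad-outcome rating")
    return mean(good) - mean(bad)


# Hypothetical slider values for the four pairs (invented for illustration).
example_ratings = [
    ("bypass", "good", 80), ("bypass", "bad", 50),
    ("investment", "good", 70), ("investment", "bad", 40),
    ("teen_driver", "good", 60), ("teen_driver", "bad", 35),
    ("startup_beta", "good", 75), ("startup_beta", "bad", 45),
]

gap = outcome_gap(example_ratings)
print(gap)  # 71.25 - 42.5 = 28.75
```

With these made-up ratings the gap is 28.75, inside the twenty-to-forty range. A judge who gave both halves of every pair the same score would get a gap of zero.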

The decision-time information is the same in both halves of every pair. Anything the outcome adds is the bias.