Superforecasting – Philip Tetlock & Dan Gardner
Great book about how to look at predictions and how to do them better. Based on a life’s work. Could/should be one for the podcast. Let’s see if I can get my notes out.
“The most important book on decision making since Daniel Kahneman’s Thinking, Fast and Slow.” —Jason Zweig, The Wall Street Journal
Everyone would benefit from seeing further into the future, whether buying stocks, crafting policy, launching a new product, or simply planning the week’s meals. Unfortunately, people tend to be terrible forecasters. As Wharton professor Philip Tetlock showed in a landmark 2005 study, even experts’ predictions are only slightly better than chance. However, an important and underreported conclusion of that study was that some experts do have real foresight, and Tetlock has spent the past decade trying to figure out why. What makes some people so good? And can this talent be taught?
In Superforecasting, Tetlock and coauthor Dan Gardner offer a masterwork on prediction, drawing on decades of research and the results of a massive, government-funded forecasting tournament. The Good Judgment Project involves tens of thousands of ordinary people—including a Brooklyn filmmaker, a retired pipe installer, and a former ballroom dancer—who set out to forecast global events. Some of the volunteers have turned out to be astonishingly good. They’ve beaten other benchmarks, competitors, and prediction markets. They’ve even beaten the collective judgment of intelligence analysts with access to classified information. They are “superforecasters.”
In this groundbreaking and accessible book, Tetlock and Gardner show us how we can learn from this elite group. Weaving together stories of forecasting successes (the raid on Osama bin Laden’s compound) and failures (the Bay of Pigs) and interviews with a range of high-level decision makers, from David Petraeus to Robert Rubin, they show that good forecasting doesn’t require powerful computers or arcane methods. It involves gathering evidence from a variety of sources, thinking probabilistically, working in teams, keeping score, and being willing to admit error and change course.
Superforecasting offers the first demonstrably effective way to improve our ability to predict the future—whether in business, finance, politics, international affairs, or daily life—and is destined to become a modern classic.
INTRO Today we discuss Superforecasting: The Art and Science of Prediction.
– We take a trip into the future and see what awaits us.
– To get there, the journey first takes us back to the past, where we look at historical examples such as the disastrous Bay of Pigs invasion and the successful raid on Osama bin Laden.
– In the end we answer: how can we, as business owners, make better predictions?
– And we make a few predictions ourselves, with the insights of Philip Tetlock and Dan Gardner in the back of our minds.
1 An Optimistic Skeptic
We are all forecasters. When we think about changing jobs, getting married, buying a home, making an investment, launching a product, or retiring, we decide based on how we expect the future will unfold. These expectations are forecasts. Often we do our own forecasting.
Every day, the news media deliver forecasts without reporting, or even asking, how good the forecasters who made the forecasts really are.
No feedback loop (compare Peak by Anders Ericsson).
accuracy of the predictions, he found that the average expert did about as well as random guessing.
Some are cautious. More are bold and confident. A handful claim to be Olympian visionaries able to see decades into the future. With few exceptions, they are not in front of the cameras because they possess any proven skill at forecasting. Accuracy is seldom even mentioned.
“chaos theory”: in nonlinear systems like the atmosphere, even small changes in initial conditions can mushroom to enormous proportions.
Butterfly example, or a better one: the chance of getting Mussolini, Hitler and Obama in the same world = 1/8
there are hard limits on predictability,
We live in a world where the actions of one nearly powerless man can have ripple effects around the world—ripples that affect us all to varying degrees.
In a world where a butterfly in Brazil can make the difference between just another sunny day in Texas and a tornado tearing through a town, it’s misguided to think anyone can see very far into the future.
We make mundane predictions like these routinely, while others just as routinely make predictions that shape our lives.
Unpredictability and predictability coexist uneasily in the intricately interlocking systems that make up our bodies, our societies, and the cosmos. How predictable something is depends on what we are trying to predict, how far into the future, and under what circumstances.
The consumers of forecasting—governments, business, and the public—don’t demand evidence of accuracy. So there is no measurement. Which means no revision. And without revision, there can be no improvement.
“I have been struck by how important measurement is to improving the human condition,” Bill Gates wrote. “You can achieve incredible progress if you set a clear goal and find a measure that will drive progress toward that goal….This may seem basic, but it is amazing how often it is not done and how hard it is to get right.”
Example: the business goal is to grow, but what will you measure it against? SMART goals: Specific, Measurable, Attainable, Realistic, Time-bound
Good Judgment Project
Foresight isn’t a mysterious gift bestowed at birth. It is the product of particular ways of thinking, of gathering information, of updating beliefs.
These habits of thought can be learned and cultivated by any intelligent, thoughtful, determined person. It may not even be all that hard to get started.
superforecasting demands thinking that is open-minded, careful, curious, and—above all—self-critical. It also demands focus.
commitment to self-improvement to be the strongest predictor of performance.
The result is a combination that can (sometimes) beat both humans and machines.
Same skill as needed for entrepreneurship?
2 Illusions of Knowledge
We have all been too quick to make up our minds and too slow to change them.
Ignorance and confidence remained defining features of medicine.
cargo cult science has the outward form of science but lacks what makes it truly scientific.
Cargo cult science comprises practices that have the semblance of being scientific but do not in fact follow the scientific method. The term was first used by physicist Richard Feynman during his 1974 commencement address at the California Institute of Technology.
What medicine lacked was doubt. “Doubt is not a fearful thing,” Feynman observed, “but a thing of very great value.” It’s what propels science forward.
The politicians would be blind men arguing over the colors of the rainbow.
THINKING ABOUT THINKING you can only share thoughts by introspecting; that is, by turning your attention inward and examining the contents of your mind. And introspection can only capture a tiny fraction of the complex processes whirling inside your head—and behind your decisions.
System 2 is the familiar realm of conscious thought. It consists of everything we choose to focus on. By contrast, System 1 is largely a stranger to us. It is the realm of automatic perceptual and cognitive operations—like those you are running right now to transform the print on this page into a meaningful sentence or to hold the book while reaching for a glass and taking a sip.
(See Thinking, Fast and Slow by Daniel Kahneman.)
System 1 follows a primitive psycho-logic: if it feels true, it is. “System 1 is designed to jump to conclusions from little evidence.” System 1 can only do its job of delivering strong conclusions at lightning speed if it never pauses to wonder whether the evidence at hand is flawed or inadequate, or if there is better evidence elsewhere. It must treat the available evidence as reliable and sufficient.
“The Dow rose ninety-five points today on news that…” A quick check will often reveal that the news that supposedly drove the market came out well after the market had risen. But that minimal level of scrutiny is seldom applied.
Like everyone else, scientists have intuitions. Indeed, hunches and flashes of insight—the sense that something is true even if you can’t prove it—have been behind countless breakthroughs. But scientists are trained to be cautious. They know that no matter how tempting it is to anoint a pet hypothesis as The Truth, alternative explanations must get a hearing.
best evidence that a hypothesis is true is often an experiment designed to prove the hypothesis is false, but which fails to do so.
Application in business? A/B test of the homepage. Change the workshop protocol. Remove a product you think is driving sales.
The key is doubt. If we can only teach one thing today: doubt = good.
our natural inclination is to grab on to the first plausible explanation and happily gather supportive evidence without checking its reliability. That is what psychologists call confirmation bias.
BAIT AND SWITCH
attribute substitution, but I call it bait and switch: when faced with a hard question, we often surreptitiously replace it with an easy one. That question becomes a proxy for the original question and if the answer is yes to the second question, the answer to the first also becomes yes.
How did we do this in 1) personal life, and 2) Queal?
1) “If I have money I will be happy” is a common thought. But money is only a proxy for what really matters (e.g. family connections).
2)
Like it or not, System 1 operations keep humming, nonstop, beneath the babbling brook of consciousness.
BLINKING AND THINKING
Popular books often draw a dichotomy between intuition and analysis—“blink” versus “think”—and pick one or the other as the way to go. I am more of a thinker than a blinker, but blink-think is another false dichotomy. The choice isn’t either/or, it is how to blend them in evolving situations.
Fire-fighter example: leave the house before the floor collapses.
There is nothing mystical about an accurate intuition like the fire commander’s. It’s pattern recognition.
Whether intuition generates delusion or insight depends on whether you work in a world full of valid cues you can unconsciously register for future use.
it’s often hard to know when there are enough valid cues to make intuition work. And even where it clearly can, caution is advisable.
bad forecasting rarely leads as obviously to harm as does bad medicine, it steers us subtly toward bad decisions and all that flows from them—including monetary losses, missed opportunities, unnecessary suffering, even war and death.
3 Keeping Score
There can’t be any ambiguity about whether a forecast is accurate or not.
It is far from unusual that a forecast that at first looks as clear as a freshly washed window proves too opaque to be conclusively judged right or wrong.
The truth is, the truth is elusive. Judging forecasts is much harder than often supposed.
Nostradamus ‘predictions’ as an example:
You will see, sooner and later, great changes made,
Extreme horrors and vengeances:
For as the moon is thus led by its angel,
The heavens draw near to the reckoning.
Century 1, Quatrain 56
An interpretation of this quatrain: “Great changes are being made! We have the ‘Yes We Can’ changes of Barack Obama but also the Arab Spring, the great changes in the Muslim world and the great changes that President Donald Trump will make. Also at this time China is building a real super army, Russia is preparing for a world war, NATO is planning to grow their army after President Trump warned them to spend more on military. North Korea is building an ICBM to reach the US Mainland and Iran is back into the picture!!! Do I need to say more? The heavens draw near to the reckoning, we are closer to the end of times than we think.”
intelligence and knowledge would improve forecasting but the benefits would taper off fast.
Sherman Kent https://en.wikipedia.org/wiki/Sherman_Kent
The key word in Kent’s work is estimate.
First, the word “possible” should be reserved for important matters where analysts have to make a judgment but can’t reasonably assign any probability. So something that is “possible” has a likelihood ranging from almost zero to almost 100%. Of course that’s not helpful, so analysts should narrow the range of their estimates whenever they can.
vague thoughts are easily expressed with vague language but when forecasters are forced to translate terms like “serious possibility” into numbers, they have to think carefully about how they are thinking, a process known as metacognition.
wrong-side-of-maybe fallacy.
Brier scores measure the distance between what you forecast and what actually happened.
benchmarks and comparability.
the average expert was roughly as accurate as a dart-throwing chimpanzee.
The critical factor was how they thought. One group tended to organize their thinking around Big Ideas. As ideologically diverse as they were, they were united by the fact that their thinking was so ideological. The other group consisted of more pragmatic experts who drew on many analytical tools.
Big Idea experts “hedgehogs” and the more eclectic experts “foxes.”
Foxes beat hedgehogs on both calibration and resolution. Foxes had real foresight. Hedgehogs didn’t.
the hedgehog’s one Big Idea doesn’t improve his foresight. It distorts it. And more information doesn’t help because it’s all seen through the same tinted glasses.
Lesson: take in all the facts, and don’t let ideology or wishful thinking blind you.
inverse correlation between fame and accuracy: the more famous an expert was, the less accurate he was. Animated by a Big Idea, hedgehogs tell tight, simple, clear stories that grab and hold audiences. People tend to find uncertainty disturbing and “maybe” underscores uncertainty with a bright red crayon. The simplicity and confidence of the hedgehog impairs foresight, but it calms nerves—which is good for the careers of hedgehogs.
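The Brier score from the Keeping Score notes is easy to compute by hand. Below is a minimal sketch for yes/no questions, assuming the original two-sided formulation that runs from 0 (perfect) to 2 (confidently wrong every time), with 0.5 for someone who always says 50%. The function name and the example numbers are my own, not taken from the book.

```python
def brier_score(forecasts, outcomes):
    """Mean two-sided Brier score for yes/no questions.

    forecasts: probabilities (0..1) that the event happens.
    outcomes:  booleans, True if the event actually happened.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally long, non-empty lists")
    total = 0.0
    for p, happened in zip(forecasts, outcomes):
        o = 1.0 if happened else 0.0
        # Squared error on the "yes" side plus squared error on the "no" side.
        total += (p - o) ** 2 + ((1 - p) - (1 - o)) ** 2
    return total / len(forecasts)


# Hypothetical track record: three forecasts and what actually happened.
print(brier_score([0.9, 0.3, 0.6], [True, False, True]))
```

Lower is better. A forecaster who says 70% and is right scores about 0.18 on that question, while the same 70% on something that does not happen scores about 0.98, which is how the score punishes confident misses.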
The Wisdom of Crowds. Aggregating the judgment of many consistently beats the accuracy of the average member of the group, and is often as startlingly accurate as Galton’s weight-guessers.
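A small sketch of why aggregation works, assuming independent guesses scattered around the true value. The true weight, the spread, and the crowd size below are made-up numbers for illustration, not Galton's data.

```python
import random
import statistics


def crowd_vs_individuals(true_value, guesses):
    """Compare the error of the averaged guess with the typical individual error."""
    crowd_estimate = statistics.mean(guesses)
    crowd_error = abs(crowd_estimate - true_value)
    typical_error = statistics.mean([abs(g - true_value) for g in guesses])
    return crowd_estimate, crowd_error, typical_error


rng = random.Random(42)   # fixed seed so the run is repeatable
TRUE_WEIGHT = 1200.0      # hypothetical weight of the animal
guesses = [rng.gauss(TRUE_WEIGHT, 150) for _ in range(100)]

estimate, crowd_error, typical_error = crowd_vs_individuals(TRUE_WEIGHT, guesses)
print(f"crowd estimate: {estimate:.0f}")
print(f"crowd error: {crowd_error:.1f}, typical individual error: {typical_error:.1f}")
```

The crowd's error can never be larger than the average individual error, because errors in opposite directions cancel. The cancelling only helps when the errors are not all in the same direction: a bias shared by everyone in the crowd survives the averaging.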
Ask 100 people to judge the weight of a cow and you get a very accurate prediction: the errors go up and down, and the average comes out about right.
Some reverently call it the miracle of aggregation but it is easy to demystify. The key is recognizing that useful information is often dispersed widely, with one person possessing a scrap, another holding a more important piece, a third having a few bits, and so on.
With valid information piling up and errors nullifying themselves, the net result was an astonishingly accurate estimate.
What I should have done is look at the problem from both perspectives—the perspectives of both logic and psycho-logic—and combine what I saw.
Unfortunately, aggregation doesn’t come to us naturally. The tip-of-your-nose perspective insists that it sees reality objectively and correctly, so there is no need to consult other perspectives. All too often we agree. We don’t consider alternative views—even when it’s clear that we should.
My fox/hedgehog model is not a dichotomy. It is a spectrum. But even expanding the categories to four doesn’t capture people’s styles of thinking. People can and do think differently in different circumstances—cool and calculating at work, perhaps, but intuitive and impulsive when shopping.
“All models are wrong,” the statistician George Box observed, “but some are useful.” The fox/hedgehog model is a starting point, not the end.
4 Superforecasters
The question “Was the IC’s judgment reasonable?” is challenging. But it’s a snap to answer “Was the IC’s judgment correct?” As I noted in chapter 2, a situation like that tempts us with a bait and switch: replace the tough question with the easy one, answer it, and then sincerely believe that we have answered the tough question.
Absent accuracy metrics, there is no meaningful way to hold intelligence analysts accountable for accuracy.
Quit pretending you know things you don’t and start running experiments.
Good Judgment Project. Give a short intro: ordinary people, asked for specific predictions, which they could adjust over time. World events from 1 month to 3 years ahead. E.g. revolution here, election there.
The great majority of my forecasters will have accurately predicted the coin toss about 50% of the time, and they can be found in the middle of the curve. But a few will get quite different results.
This is not complicated stuff. But it’s easy to misinterpret randomness. We don’t have an intuitive feel for it. Randomness is invisible from the tip-of-your-nose perspective. We can only see it if we step outside ourselves.
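To get a feel for how many "stars" pure luck produces, here is a short sketch using exact binomial arithmetic. The numbers (20 tosses, 1,000 forecasters, 15 correct calls as the bar for looking impressive) are assumptions of mine for illustration.

```python
from math import comb


def prob_at_least(correct, tosses):
    """Chance of calling at least `correct` of `tosses` fair coin flips right by guessing."""
    favourable = sum(comb(tosses, k) for k in range(correct, tosses + 1))
    return favourable / 2 ** tosses


TOSSES = 20          # hypothetical number of coin-toss "forecasts" per person
FORECASTERS = 1000   # hypothetical size of the tournament

p_star = prob_at_least(15, TOSSES)
expected_stars = FORECASTERS * p_star
print(f"chance of 15+ correct by luck: {p_star:.2%}")
print(f"expected lucky 'stars' among {FORECASTERS} guessers: {expected_stars:.1f}")
```

So even when nobody has any skill, a tournament of this size should be expected to throw up a couple of dozen people with a 75% hit rate, which is why one good year proves little on its own.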
A similar mistake can be found by rummaging through remaindered business books: a corporation or executive is on a roll, going from success to success, piling up money and fawning profiles in magazines. What comes next? Inevitably, it’s a book recounting the successes and assuring readers that they can reap similar successes simply by doing whatever the corporation or executive did. These stories may be true—or fairy tales.
Good to Great: very interesting, but those businesses didn’t do as well afterward.
Most things in life involve skill and luck, “regression to the mean.”
The placebo effect may have helped, but you probably would have felt better the next day even if you had received no treatment at all—thanks to regression to the mean, a fact that won’t occur to you unless you stop and think carefully, instead of going with the tip-of-your-nose conclusion. This modest little mistake is responsible for many of the things people believe but shouldn’t.
Keep regression to the mean in mind. Lesson: don’t trust all (or any) alternative medicine.
Mauboussin notes that slow regression is more often seen in activities dominated by skill, while faster regression is more associated with chance.
This suggests that being recognized as “super” and placed on teams of intellectually stimulating colleagues improved their performance enough to erase the regression to the mean we would otherwise have seen.
The correlation between how well individuals (GJP) do from one year to the next is about 0.65. Mostly, their results reflected skill.
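What a year-to-year correlation of about 0.65 implies can be sketched with the standard regression-to-the-mean formula. This assumes scores are standardized (measured in standard deviations from the group average) with the same spread in both years, which is my simplification and not something the book spells out.

```python
def expected_next_score(this_year, group_mean, correlation):
    """Best linear guess for next year's score given this year's score.

    Assumes both years have the same mean and spread (standardized scores).
    """
    return group_mean + correlation * (this_year - group_mean)


# A forecaster 2 standard deviations better than average this year:
print(expected_next_score(2.0, 0.0, 0.65))   # skill-dominated: keeps most of the lead
print(expected_next_score(2.0, 0.0, 0.10))   # luck-dominated: falls almost back to average
```

With a correlation of 0.65 a top performer is expected to keep roughly two thirds of the lead, which fits the point that slow regression signals skill and fast regression signals chance.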
5 Supersmart?
Not the most business-y chapter, mostly says: they aren’t savants and you can do it too!
fluid intelligence, or raw crunching power, crystallized intelligence—knowledge—using
We can be quite confident these are not average people. So to understand the role of intelligence and knowledge in superforecaster success, we have to go a step further.
They were people who volunteered to spend 1-2 hours per day doing this.
Regular forecasters scored higher on intelligence and knowledge tests than about 70% of the population. Superforecasters did better, placing higher than about 80% of the population.
intelligence and knowledge help but they add little beyond a certain threshold—so
not the crunching power that counts. It’s how you use it.
breaking down the question, we can better separate the knowable and the unknowable. So guessing—pulling a number out of the black box—isn’t eliminated. But we have brought our guessing process out into the light of day where we can inspect it. And the net result tends to be a more accurate estimate than whatever number happened to pop out of the black box when we first read the question.
base rate—how common something is within a broader class.
“outside view”—in contrast to the “inside view,” which is the specifics of the particular case.
The outside view is typically abstract, bare, and doesn’t lend itself so readily to storytelling. So even smart, accomplished people routinely fail to consider the outside view.
You may wonder why the outside view should come first. After all, you could dive into the inside view and draw conclusions, then turn to the outside view. Wouldn’t that work as well? Unfortunately, no, it probably wouldn’t. The reason is a basic psychological concept called anchoring.
Superforecasters constantly look for other views they can synthesize into their own.
Researchers have found that merely asking people to assume their initial judgment is wrong, to seriously consider why that might be, and then make another judgment, produces a second estimate which, when combined with the first, improves accuracy almost as much as getting a second estimate from another person. The same effect was produced simply by letting several weeks pass before asking people to make a second estimate. This approach, built on the “wisdom of the crowd” concept, has been called “the crowd within.”
“dragonfly eye” in operation. And yes, it is mentally demanding. Superforecasters pursue point-counterpoint discussions routinely, and they keep at them long past the point where most people would succumb to migraines.
“Need for cognition” is the psychological term for the tendency to engage in and enjoy hard mental slogs. People high in need for cognition are the sort who like crosswords and Sudoku puzzles, the harder, the better—and superforecasters score high in need-for-cognition tests.
“Big Five” traits is “openness to experience,”
Active open-mindedness (AOM)
central feature of superforecasters: they have a way with numbers.
There is no moat and no castle wall. While superforecasters do occasionally deploy their own explicit math models, or consult other people’s, that’s rare. The great majority of their forecasts are simply the product of careful thought and nuanced judgment.
In dealing with probabilities, he said, most people only have three settings: “gonna happen,” “not gonna happen,” and “maybe.”
Human beings have coped with uncertainty for as long as we have been recognizably human. And for almost all that time we didn’t have access to statistical models of uncertainty because they didn’t exist. It was remarkably late in history
Before that, people had no choice but to rely on the tip-of-your-nose perspective.
A parent willing to pay something to reduce her child’s risk of contracting a serious disease from 10% to 5% may be willing to pay two to three times as much to reduce the risk from 5% to 0%. Why is a decline from 5% to 0% so much more valuable than a decline from 10% to 5%? Because it delivers more than a 5% reduction in risk. It delivers certainty. Both 0% and 100% weigh far more heavily in our minds than the mathematical models of economists say they should.
Confidence and accuracy are positively correlated. But research shows we exaggerate the size of the correlation.
“Most people would identify science with certainty,”
Scientific results and theories seem to promise such certainty.”
uncertainty is an ineradicable element of reality. “Uncertainty is real,” Byers writes. “It is the dream of total certainty that is an illusion.” This is true both at the margin of scientific knowledge and at what currently appears to be its core.
All scientific knowledge is tentative. Nothing is chiseled in granite.
By rejecting certainty, everything became a matter of probability for Rubin, and he wanted as much precision as possible.
“epistemic” and “aleatory” uncertainty. Epistemic uncertainty is something you don’t know but is, at least in theory, knowable. If you wanted to predict the workings of a mystery machine, skilled engineers could, in theory, pry it open and figure it out. Mastering mechanisms is a prototypical clocklike forecasting challenge. Aleatory uncertainty is something you not only don’t know; it is unknowable. No matter how much you want to know whether it will rain in Philadelphia one year from now, no matter how many great meteorologists you consult, you can’t outguess the seasonal averages.
Superforecasters grasp this deep truth better than most. When they sense that a question is loaded with irreducible uncertainty—say, a currency-market question—they have learned to be cautious, keeping their initial estimates inside the shades-of-maybe zone between 35% and 65% and moving out tentatively.
“If you don’t get this elementary, but mildly unnatural, mathematics of elementary probability into your repertoire, then you go through a long life like a one-legged man in an ass-kicking contest.”
It’s a trenchant insight. When something unlikely and important happens it’s deeply human to ask “Why?”
“events happen for a reason and that there is an underlying order to life that determines how events turn out.” Meaning is a basic human need.
But as psychologically beneficial as this thinking may be, it sits uneasily with a scientific worldview.
Science doesn’t tackle “why” questions about the purpose of life. It sticks to “how” questions that focus on causation and probabilities.
those who had contemplated alternative paths in life saw the path taken as meant to be.
Natural as such thinking may be, it is problematic. Lay out the tangled chain of reasoning in a straight line and you see this: “The probability that I would meet the love of my life was tiny. But it happened. So it was meant to be. Therefore the probability that it would happen was 100%.” This is beyond dubious. It’s incoherent. Logic and psycho-logic are in tension.
illustration of how radically indeterminate the future is. “You tend to believe that history played out in a logical sort of sense, that people ought to have foreseen, but it’s not like that,” he told me. “It’s an illusion of hindsight.”
probabilistic thinkers take the outside view toward even profoundly identity-defining events, seeing them as quasi-random draws from distributions of once-possible worlds.
So finding meaning in events is positively correlated with wellbeing but negatively correlated with foresight. That sets up a depressing possibility: Is misery the price of accuracy? I don’t know.
Discuss!
7 Supernewsjunkies?
Unpack the question into components. Distinguish as sharply as you can between the known and unknown and leave no assumptions unscrutinized. Adopt the outside view and put the problem into a comparative perspective that downplays its uniqueness and treats it as a special case of a wider class of phenomena. Then adopt the inside view that plays up the uniqueness of the problem. Also explore the similarities and differences between your views and those of others—and pay special attention to prediction markets and other methods of extracting wisdom from crowds. Synthesize all these different views into a single vision as acute as that of a dragonfly. Finally, express your judgment as precisely as you can, using a finely grained scale of probability.
Short Summary
this process is as demanding as it sounds, taking a lot of time and mental energy. And yet it is literally just the beginning.
Forecasts aren’t like lottery tickets that you buy and file away until the big draw. They are judgments that are based on available information and that should be updated in light of changing information.
“When the facts change, I change my mind,”
superforecasters’ initial forecasts were at least 50% more accurate than those of regular forecasters.
Good updating requires the same skills used in making the initial forecast and is often just as demanding. It can even be more challenging.
So there are two dangers a forecaster faces after making the initial call. One is not giving enough weight to new information. That’s underreaction. The other danger is overreacting to new information, seeing it as more meaningful than it is, and adjusting a forecast too radically.
“belief perseverance.” People can be astonishingly intransigent—and capable of rationalizing like crazy to avoid acknowledging new information that upsets their settled beliefs.
Psycho-logic trumps logic. And when Kahan asks people who feel strongly that gun control increases risk, or diminishes it, to imagine conclusive evidence that shows they are wrong, and then asks if they would change their position if that evidence were handed to them, they typically say no. That belief block is holding up a lot of others. Take it out and you risk chaos, so many people refuse to even imagine
This extreme commitment leads to extreme reluctance to admit error, which explains why the men responsible for imprisoning 112,000 innocent people could be so dogged in their belief that the threat of sabotage was severe. Their commitment was massive. Warren was, deep down, a civil libertarian. Admitting to himself that he had unjustly imprisoned 112,000 people would have taken a sledgehammer to his mental tower. This suggests that superforecasters may have a surprising advantage: they’re not experts or professionals, so they have little ego invested in each forecast.
Psychologists call this the dilution effect, and given that stereotypes are themselves a source of bias we might say that diluting them is all to the good. Yes and no. Yes, it is possible to fight fire with fire, and bias with bias, but the dilution effect remains a bias.
Good updating is all about finding the middle passage.
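The middle passage between under- and overreaction is usually formalized with Bayes' rule in odds form: posterior odds = prior odds × likelihood ratio. The sketch below is my own illustration; the function name, the 20% prior, and the likelihood ratios are hypothetical numbers, not figures from the book.

```python
def bayes_update(prior, likelihood_ratio):
    """Update a probability with new evidence, using the odds form of Bayes' rule.

    prior:            probability before the news, strictly between 0 and 1.
    likelihood_ratio: how much more likely the news is if the event will happen
                      than if it will not (1.0 means the news is uninformative).
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio cannot be negative")
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical: we think there is a 20% chance a competitor enters our market.
print(bayes_update(0.20, 1.2))  # weak signal (a rumor): small nudge upward
print(bayes_update(0.20, 4.0))  # strong signal (they hire a local sales team): big move
```

Underreaction amounts to treating every likelihood ratio as if it were 1, and overreaction to treating every headline as if its ratio were huge. The hard part in practice is judging the ratio, which the formula cannot do for you.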
new belief should depend on two things—your prior belief (and all the knowledge that informed it) multiplied by the “diagnostic value” of the new information.
Example: the possibility of a competitor entering our market (Soylent). How to process competing info.
But there is no magical formula, just broad principles with lots of caveats. Superforecasters understand the principles but also know that their application requires nuanced judgments. And they would rather break the rules than make a barbarous forecast.
8 Perpetual Beta
“growth mindset,” your abilities are largely the product of effort—that you can “grow” to the extent that you are willing to work hard and learn.
Many people have what she calls a “fixed mindset”—the belief that we are who we are, and abilities can only be revealed, not created and developed.
“Only people with a growth mindset paid close attention to information that could stretch their knowledge. Only for them was learning a priority.”
To be a top-flight forecaster, a growth mindset is essential. The best illustration is the man who is reputed to have said—but didn’t—“When the facts change, I change my mind.”
Discussion: growth and fixed mindset in different domains of our knowledge/life? E.g. fixed about weight, growth about programming skills.
For Keynes, failure was an opportunity to learn—to identify mistakes, spot new alternatives, and try again. The one consistent belief of the “consistently inconsistent” John Maynard Keynes was that he could do better. Failure did not mean he had reached the limits of his ability. It meant he had to think hard and give it another go. Try, fail, analyze, adjust, try again: Keynes cycled through those steps ceaselessly.
We learn new skills by doing. We improve those skills by doing more. These fundamental facts are true of even the most demanding skills.
We need “tacit knowledge,” the sort we only get from bruising experience.
It should be equally obvious that learning to forecast requires trying to forecast. Reading books on forecasting is no substitute for the experience of the real thing.
informed practice. You need to know which mistakes to look out for—and which best practices really are best.
Fortune favors the prepared mind.
To learn from failure, we must know when we fail.
Unfortunately, most forecasters do not get the high-quality feedback that helps meteorologists and bridge players improve. There are two main reasons why.
Ambiguous language is a big one.
Vague language is elastic language. The students stretched it to fit their self-images, even though they thought they were judging the test objectively. The lesson for forecasters who would judge their own vague forecasts is: don’t kid yourself.
The second big barrier to feedback is time lag. Not only will you have to contend with ordinary forgetfulness, you are likely to be afflicted by what psychologists call hindsight bias. Once we know the outcome of something, that knowledge skews our perception of what we thought before we knew the outcome: that’s hindsight bias.
judgment calibrated in one context transfers poorly, if at all, to another.
“Be careful about making assumptions of expertise, ask experts if you can find them, reexamine your assumptions from time to time.”
prepares for the postmortem from the moment the tournament question is announced.
Even with a growth mindset, the forecaster who wants to improve has to have a lot of what my colleague Angela Duckworth dubbed “grit.” Grit is passionate perseverance of long-term goals, even in the face of frustration and failure. Married with a growth mindset, it is a potent force for personal progress.
Superforecasters are perpetual beta.
In philosophic outlook, they tend to be:
CAUTIOUS: Nothing is certain
HUMBLE: Reality is infinitely complex
NONDETERMINISTIC: What happens is not meant to be and does not have to happen
In their abilities and thinking styles, they tend to be:
ACTIVELY OPEN-MINDED: Beliefs are hypotheses to be tested, not treasures to be protected
INTELLIGENT AND KNOWLEDGEABLE, WITH A “NEED FOR COGNITION”: Intellectually curious, enjoy puzzles and mental challenges
REFLECTIVE: Introspective and self-critical
NUMERATE: Comfortable with numbers
In their methods of forecasting they tend to be:
PRAGMATIC: Not wedded to any idea or agenda
ANALYTICAL: Capable of stepping back from the tip-of-your-nose perspective and considering other views
DRAGONFLY-EYED: Value diverse views and synthesize them into their own
PROBABILISTIC: Judge using many grades of maybe
THOUGHTFUL UPDATERS: When facts change, they change their minds
GOOD INTUITIVE PSYCHOLOGISTS: Aware of the value of checking thinking for cognitive and emotional biases
In their work ethic, they tend to have:
A GROWTH MINDSET: Believe it’s possible to get better
GRIT: Determined to keep at it however long it takes
superforecasting appears to be roughly 75% perspiration, 25% inspiration.
Victims of Groupthink,
Janis’s hypothesis, “members of any small cohesive group tend to maintain esprit de corps by unconsciously developing a number of shared illusions and related norms that interfere with critical thinking and reality testing.” Groups that get along too well don’t question assumptions or confront uncomfortable facts.
Teams can cause terrible mistakes. They can also sharpen judgment and accomplish together what cannot be done alone. Managers tend to focus on the negative or the positive but they need to see both. As mentioned earlier, the term “wisdom of crowds” comes from James Surowiecki’s 2004 bestseller of the same name, but Surowiecki’s title was itself a play on the title of a classic 1841 book, Extraordinary Popular Delusions and the Madness of Crowds, which chronicled a litany of collective folly. Groups can be wise, or mad, or both. What makes the difference isn’t just who is in the group, Kennedy’s circle of advisers demonstrated. The group is its own animal.
teams might foster cognitive loafing. Why labor to master a complex problem when others will do the heavy lifting?
Example: people asked to scream ‘as loud as they can’ in a group each do less. But the All Blacks (NZ rugby) are probably louder together.
But groups also let people share information and perspectives. That’s good. It helps make dragonfly eye work, and aggregation is critical to accuracy.
In so many ways, a group can get people to abandon independent judgment and buy into errors. When that happens, the mistakes will pile up, not cancel out. This is the root of collective folly, whether it’s Dutch investors in the seventeenth century, who became collectively convinced that a tulip bulb was worth more than a laborer’s annual salary, or American home buyers in 2005, talking themselves into believing that real estate prices could only go up.
Since Socrates, good teachers have practiced precision questioning, but still it’s often not used when it’s needed most.
At the end of the year, the results were unequivocal: on average, teams were 23% more accurate than individuals. E.g. All Blacks.
There’s also the “premortem,” in which the team is told to assume a course of action has failed and to explain why—which makes team members feel safe to express doubts they may have about the leader’s plan.
“The team is so much more effective at gathering information than one person could ever be,” Paul told me. “There is simply no way that any individual could cover as much ground as a good team does. Even if you had unlimited hours, it would be less fruitful, given different research styles. Each team member brings something different.”
efficient market hypothesis (EMH), and it’s hard to square with what we have learned from psychology and experience. Markets make mistakes. Sometimes they lose their collective minds. But even if markets are far less efficient than ardent proponents of the EMH suppose, it is still very hard to consistently beat markets, which is why so few can plausibly claim to have done it.
Teams were not merely the sum of their parts. How the group thinks collectively is an emergent property of the group itself, a property of communication patterns among group members, not just the thought processes inside each member.
final feature of winning teams: the fostering of a culture of sharing. My Wharton colleague Adam Grant categorizes people as “givers,” “matchers,” and “takers.” Givers are those who contribute more to others than they receive in return; matchers give as much as they get; takers give less than they take.
the pro-social example of the giver can improve the behavior of others, which helps everyone, including the giver—which explains why Grant has found that givers tend to come out on top.
Combining uniform perspectives only produces more of the same, while slight variation will produce slight improvement. It is the diversity of the perspectives that makes the magic work.
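A rough way to see why diversity does the work, as my own sketch and not a model from the book: suppose each of n forecasters has the same error variance `sigma2`, and any two forecasters' errors are correlated with coefficient `rho` (both names and the equal-correlation setup are assumptions for illustration). The error variance of the averaged forecast is then sigma2 * (1/n + (1 - 1/n) * rho).

```python
def variance_of_average(n: int, sigma2: float, rho: float) -> float:
    """Error variance of the mean of n forecasts.

    Assumes (a simplification of mine, not the book's) that every forecaster
    has the same error variance sigma2 and that every pair of forecasters
    shares the same error correlation rho.
    """
    if n < 1:
        raise ValueError("need at least one forecaster")
    return sigma2 * (1.0 / n + (1.0 - 1.0 / n) * rho)


if __name__ == "__main__":
    # Ten forecasters: identical views (rho=1), overlapping views, independent views.
    for rho in (1.0, 0.5, 0.0):
        print(f"rho={rho}: variance={variance_of_average(10, 1.0, rho):.2f}")
```

With `rho = 1` (everyone makes the same mistakes) averaging ten people buys nothing. The gain only shows up as the errors become less correlated, which is what different perspectives are supposed to provide.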
Discuss: how to make predictions for your business. With team, open-minded (sharing), different perspectives.
10 The Leader’s Dilemma
Leaders must decide, and to do that they must make and use forecasts. The more accurate those forecasts are, the better, so the lessons of superforecasting should be of intense interest to them. But leaders must also act and achieve their goals. In a word, they must lead. And anyone who has led people may have doubts about how useful the lessons of superforecasting really are for leaders.
qualities of an effective leader. Confidence will be on everyone’s list. Leaders must be reasonably confident, and instill confidence in those they lead, because nothing can be accomplished without the belief that it can be. Decisiveness is another essential attribute. Leaders can’t ruminate endlessly. They need to size up the situation, make a decision, and move on. And leaders must deliver a vision—the goal that everyone strives together to achieve.
style of thinking that produces superforecasting and consider how it squares with what leaders must deliver. Leaders must be forecasters and leaders but it seems that what is required to succeed at one role may undermine the other.
Helmuth von Moltke was a German Field Marshal. The chief of staff of the Prussian Army for thirty years, he is regarded as the creator of a new, more modern method of directing armies in the field.
“everything is uncertain” was for Moltke an axiom whose implications needed to be teased out.
The most urgent is to never entirely trust your plan. “No plan of operations extends with certainty beyond the first encounter with the enemy’s main strength,”
“It is impossible to lay down binding rules”
“two cases never will be exactly the same.” Improvisation is essential.
Disagreement was not only permitted, it was expected, and even the instructor’s views could be challenged because he “understood himself to be a comrade among others,”
The fundamental message: think.
imperfect decision made in time was better than a perfect one made too late.
sharp line between deliberation and implementation: once a decision has been made, the mindset changes. Forget uncertainty and complexity. Act! “If one wishes to attack, then one must do so with resoluteness. Half measures are out of place,”
leader must possess unwavering determination to overcome obstacles and accomplish his goals—while remaining open to the possibility that he may have to throw out the plan and try something else.
“Once a course of action has been initiated it must not be abandoned without overriding reason,”
principle of Auftragstaktik. Usually translated today as “mission command,”
Pause moment: mission command. Example?
Decision-making power must be pushed down the hierarchy so that those on the ground—the first to encounter surprises on the evolving battlefield—can respond quickly. Of course those on the ground don’t see the bigger picture. If they made strategic decisions the army would lose coherence and become a collection of tiny units, each seeking its own ends. Auftragstaktik blended strategic coherence and decentralized decision making with a simple principle: commanders were to tell subordinates what their goal is but not how to achieve it.
“Tell them what to do, and they will surprise you with their ingenuity.”
leader “needs to figure out what’s the right move and then execute it boldly.” That’s the tension between deliberation and implementation that Moltke emphasized and Petraeus balanced in Iraq.
“We let our people know what we want them to accomplish. But—and it is a very big ‘but’—we do not tell them how to achieve those goals.” That is a near-perfect summary of “mission command.” The speaker is William Coyne, who was senior vice president of research and development at 3M, the famously innovative manufacturing conglomerate.
“Have backbone; disagree and commit” is one of Jeff Bezos’s fourteen leadership principles drilled into every new employee at Amazon. It continues: “Leaders are obligated to respectfully challenge decisions when they disagree, even when doing so is uncomfortable or exhausting. Leaders have conviction and are tenacious. They do not compromise for the sake of social cohesion. Once a decision is determined, they commit wholly.”
distinguishes what she is confident about—and what she’s not.
intellectual humility. It is a recognition that reality is profoundly complex, that seeing things clearly is a constant struggle, when it can be done at all, and that human judgment must therefore be riddled with mistakes.
Understanding what worked in the Wehrmacht requires engaging in the toughest of all forms of perspective taking: acknowledging that something we despise possesses impressive qualities. Forecasters who can’t cope with the dissonance risk making the most serious possible forecasting error in a conflict: underestimating your opponent.
11 Are They Really So Super?
WYSIATI—What You See Is All There Is—the mother of all cognitive illusions, the egocentric worldview that prevents us from seeing any world beyond the one visible from the tips of our noses.
The cognitive illusions that the tip-of-your-nose perspective sometimes generates are similarly impossible to stop. We can’t switch off the tip-of-our-nose perspective. We can only monitor the answers that bubble up into consciousness—and, when we have the time and cognitive capacity, use a ruler to check.
scope insensitivity thirty years ago when he asked a randomly selected group in Toronto, the capital city of Ontario, how much they would be willing to pay to clean up the lakes in a small region of the province. On average, people said about $10. Kahneman asked another randomly selected group how much they would be willing to pay to clean up every one of the 250,000 lakes in Ontario. They too said about $10. What people were responding to, Kahneman later wrote, was the prototypical image that came to mind.
To put this in Kahneman’s terms, the time frame is the “scope” of the forecast.
No matter how physically or cognitively demanding a task may be—cooking, sailing, surgery, operatic singing, flying fighter jets—deliberative practice can make it second nature.
But Taleb isn’t interested only in surprise. A black swan must be impactful. Indeed, Taleb insists that black swans, and black swans alone, determine the course of history. “History and societies do not crawl,” he wrote. “They make jumps.”
If black swans must be inconceivable before they happen, a rare species of event suddenly becomes a lot rarer.
Taleb also offers a more modest definition of a black swan as a “highly improbable consequential event.”
History does sometimes jump. But it also crawls, and slow, incremental change can be profoundly important.
“It’s funny that a 90% chance of failing people don’t like, but a 10% chance of changing the world people love.” This is black swan investing, and it’s similar to how Taleb himself traded—very successfully—before becoming an author. But it’s not the only way to invest. A very different way is to beat competitors by forecasting more accurately—for example, correctly deciding that there is a 68% chance of something happening when others foresee only a 60% chance. This is the approach of the best poker players. It pays off more often, but the returns are more modest, and fortunes are amassed slowly. It is neither superior nor inferior to black swan investing. It is different.
Taleb, Kahneman, and I agree there is no evidence that geopolitical or economic forecasters can predict anything ten years out beyond the excruciatingly obvious—“there will be conflicts”—and the odd lucky hits that are inevitable whenever lots of forecasters make lots of forecasts. These limits on predictability are the predictable results of the butterfly dynamics of nonlinear systems. In my EPJ research, the accuracy of expert predictions declined toward chance five years out.
If you have to plan for a future beyond the forecasting horizon, plan for surprise.
How to apply this?
“Plans are useless,” Eisenhower said about preparing for battle, “but planning is indispensable.”
Probability judgments should be explicit so we can consider whether they are as accurate as they can be. And if they are nothing but a guess, because that’s the best we can do, we should say so. Knowing what we don’t know is better than thinking we know what we don’t.
Kahneman and other pioneers of modern psychology have revealed that our minds crave certainty and when they don’t find it, they impose it. In forecasting, hindsight bias is the cardinal sin.
He posits that historical probabilities—all the possible ways the future could unfold—are distributed like wealth, not height. That means our world is vastly more volatile than most of us realize and we are at risk of grave miscalculations.
Savoring how history could have generated an infinite array of alternative outcomes and could now generate a similar array of alternative futures, is like contemplating the one hundred billion known stars in our galaxy and the one hundred billion known galaxies.
Kahneman, Taleb, and I agree on that much. But I also believe that humility should not obscure the fact that people can, with considerable effort, make accurate forecasts about at least some developments that really do matter. To be sure, in the big scheme of things, human foresight is puny, but it is nothing to sniff at when you live on that puny human scale.
12 What’s Next?
Tournaments could help society, too, by providing tools for structuring our thinking about what is likely to happen if we venture down one policy path or another. Vague expectations about indefinite futures are not helpful. Fuzzy thinking can never be proven wrong. And only when we are proven wrong so clearly that we can no longer deny it to ourselves will we adjust our mental models of the world—producing a clearer picture of reality. Forecast, measure, revise: it is the surest path to seeing better.
Whether the analyst’s forecast was accurate did not matter in the slightest.
It follows that the goal of forecasting is not to see what’s coming. It is to advance the interests of the forecaster and the forecaster’s tribe. Accurate forecasts may help do that sometimes, and when they do accuracy is welcome, but it is pushed aside if that’s what the pursuit of power requires. Earlier, I discussed Jonathan Schell’s 1982 warning that a holocaust would certainly occur in the near future “unless we rid ourselves of our nuclear arsenals,” which was clearly not an accurate forecast. Schell wanted to rouse readers to join the swelling nuclear disarmament movement. He did.
You don’t have to be a Marxist-Leninist to concede that Lenin had a point. Self and tribe matter. If forecasting can be co-opted to advance their interests, it will be. From this perspective, there is no need to reform and improve forecasting, and it will not change, because it is already serving its primary purpose well.
The same thinking is transforming charitable foundations, which are increasingly making funding contingent on stringent program evaluation. Those that deliver are expanded, those that fail are shuttered. In line with Bill Gates’s insistence on having clear goals and measures, the Gates Foundation, one of the world’s largest, is renowned for the rigor of its evaluations.
Perhaps the best reason to hope for change is the IARPA tournament itself. If someone had asked me a decade ago to list the organizations that most needed to get serious about forecasting but were least likely to do so, the intelligence community would have been at the top. Why? Kto-kogo. Evidence-based forecasting will improve their work in the long run, but it’s dangerous in the short run.
What would help is a sweeping commitment to evaluation: Keep score. Analyze results. Learn what works and what doesn’t. But that requires numbers, and numbers would leave the intelligence community vulnerable to the wrong-side-of-maybe fallacy, lacking any cover the next time they blow a big call.
As the cultural critic Leon Wieseltier put it in the New York Times, “There are ‘metrics’ for phenomena that cannot be metrically measured.”
The truly numerate know that numbers are tools, nothing more, and their quality can range from wretched to superb. Numbers must be constantly scrutinized and improved, which can be an unnerving process because it is unending. Progressive improvement is attainable. Perfection is not.
Credit scores may be miles from perfect but they’re a huge advance from that. Similarly, while I can’t claim my scoring system is perfect, it is a big improvement on judging a forecaster by the criteria used nowadays—titles, confidence, skill at spinning a story, number of books sold, CNN appearances, and time spent at Davos.
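The scoring system in the book is the Brier score. For my own use, here is a minimal sketch of the two-category version for yes/no questions (squared error summed over “yes” and “no”, so it runs from 0 to 2). The function name and the input layout are my own choices, not anything prescribed by the book.

```python
from typing import Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Two-category Brier score for yes/no questions.

    probabilities[i] is the forecast chance that question i resolves "yes";
    outcomes[i] is 1 if it did and 0 if it did not. The squared error is
    summed over both categories ("yes" and "no") and averaged over questions,
    so scores run from 0.0 (perfect) to 2.0 (confidently wrong every time).
    """
    if len(probabilities) != len(outcomes) or len(probabilities) == 0:
        raise ValueError("need one outcome per forecast, and at least one")
    total = 0.0
    for p, o in zip(probabilities, outcomes):
        # Error on the "yes" category plus error on the "no" category.
        total += (p - o) ** 2 + ((1 - p) - (1 - o)) ** 2
    return total / len(probabilities)
```

On this scale, always answering 50% scores 0.5 no matter what happens, so a forecaster only earns a better (lower) score by committing to numbers and being right about them.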
So we confront a dilemma. What matters is the big question, but the big question can’t be scored. The little question doesn’t matter but it can be scored, so the IARPA tournament went with it. You could say we were so hell-bent on looking scientific that we counted what doesn’t count. That is unfair. The questions in the tournament had been screened by experts to be both difficult and relevant to active problems on the desks of intelligence analysts. But it is fair to say these questions are more narrowly focused than the big questions we would all love to answer, like “How does this all turn out?”
That one tiny question doesn’t nail down the big question, but it does contribute a little insight. And if we ask many tiny-but-pertinent questions, we can close in on an answer for the big question.
call this Bayesian question clustering
In future research, I want to develop the concept and see how effectively we can answer unscorable “big questions” with clusters of little ones.
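The book does not give a formula for Bayesian question clustering, so this is only my guess at the simplest version: treat each resolved little question as evidence with a likelihood ratio (the “diagnostic value” of the new information) and multiply the prior odds of the big claim by each ratio. The function name is hypothetical, and multiplying the ratios assumes the little questions are independent given the big claim, which is a strong assumption.

```python
from typing import Sequence


def update_probability(prior: float, likelihood_ratios: Sequence[float]) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * product of LRs.

    Each likelihood ratio is
        P(observed answer | big claim true) / P(observed answer | big claim false).
    Multiplying them assumes the little questions are independent given the
    big claim, which real question clusters may well violate.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        if lr <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        odds *= lr
    # Convert odds back to a probability.
    return odds / (1.0 + odds)
```

For example, starting at 20% and seeing two little questions resolve in ways that are each twice as likely if the big claim is true, `update_probability(0.2, [2.0, 2.0])` moves the estimate to about 50%. A ratio near 1 barely moves it, which matches the idea that one tiny question only contributes a little insight.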
The psychological recipe for the ideal superforecaster may prove to be quite different from that for the ideal superquestioner,
Holy Grail of my research program: using forecasting tournaments to depolarize unnecessarily polarized policy debates and make us collectively smarter.
When a debate like “Keynesians versus Austerians” emerges, key figures could work together—with the help of trusted third parties—to identify what they disagree about and which forecasts would meaningfully test those disagreements. The key is precision. It’s one thing for Austerians to say a policy will cause inflation and Keynesians to say it won’t. But how much inflation? Measured by what benchmark? Over what time period? The result must be a forecast question that reduces ambiguity to an absolute minimum.
All we have to do is get serious about keeping score.
I can’t imagine a better description of the “try, fail, analyze, adjust, try again” cycle—and of the grit to keep at it and keep improving. Bill Flack is perpetual beta. And that, too, is why he is a superforecaster.
If you like what you’ve read about superforecasters and the Good Judgment Project, please consider joining us. You’ll be improving your own forecasting skills and helping science at the same time. You can learn more at www.goodjudgment.com.
Goal of forecasting should be better predictions, not pleasing the crowd/tribe.
Do this: https://www.farnamstreetblog.com/2015/12/ten-commandments-for-superforecasters/ (talk about process afterward on podcast)