Human Judgment is Heavy-Tailed: Empirical Evidence and Implications for the Aggregation of Estimates and Forecasts
Forthcoming in Management Science
How frequent are large disagreements in human judgment? The substantial literature on expert assessments of real-valued quantities and their aggregation almost universally assumes that errors follow a jointly normal distribution. We investigate this question empirically using 17 datasets that include over 20,000 estimates and forecasts. We find incontrovertible evidence of excess kurtosis, that is, of fat tails. Although the datasets differ widely in the degree of uncertainty about the quantity being assessed and in the expertise and sophistication of those making the assessments, the frequency with which an expert is in large disagreement with the consensus is consistent across them. Fitting a generalized normal distribution to the data, we find values for the shape parameter ranging from 1 to 1.6 (where 1 corresponds to the double-exponential, or Laplace, distribution and 2 to the normal distribution). These findings have important implications, in particular for the aggregation of expert estimates and forecasts. We describe optimal Bayesian aggregation with heavy tails and propose a simple average-median average heuristic that performs well for the range of empirically observed distributions.
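As a rough illustration of the two quantitative ideas in the abstract, the Python sketch below does two things. First, it evaluates the standard closed-form excess kurtosis of a generalized normal distribution with shape parameter beta, which is 3 at beta = 1 (double-exponential) and 0 at beta = 2 (normal). Second, it computes an "average-median average" under the assumption that the name means the equally weighted average of the arithmetic mean and the median of the estimates; the abstract does not spell out the weighting, so this reading is an assumption. The function names and the sample estimates are hypothetical and are not taken from the paper.

```python
import math
import statistics


def gennorm_excess_kurtosis(beta: float) -> float:
    """Excess kurtosis of a generalized normal distribution with shape beta.

    Uses the standard closed form
        Gamma(5/beta) * Gamma(1/beta) / Gamma(3/beta)**2 - 3,
    which equals 3 for beta = 1 (Laplace) and 0 for beta = 2 (normal).
    """
    g = math.gamma
    return g(5.0 / beta) * g(1.0 / beta) / g(3.0 / beta) ** 2 - 3.0


def average_median_average(estimates):
    """Equal-weight average of the mean and the median.

    ASSUMPTION: this is one plausible reading of the heuristic's name,
    not a formula quoted from the paper.
    """
    mean = statistics.fmean(estimates)
    median = statistics.median(estimates)
    return 0.5 * (mean + median)


if __name__ == "__main__":
    # Shape values spanning the empirically reported range, plus the normal case.
    for beta in (1.0, 1.6, 2.0):
        print(f"beta={beta}: excess kurtosis = {gennorm_excess_kurtosis(beta):.3f}")

    # Made-up estimates with one large disagreement (illustrative only).
    estimates = [9.0, 10.0, 11.0, 12.0, 30.0]
    print("mean   :", statistics.fmean(estimates))
    print("median :", statistics.median(estimates))
    print("average-median average:", average_median_average(estimates))
```

In the made-up sample, the single outlier pulls the mean well above the median, and the combined estimate lands between the two. This is the intuition for using such a compromise when the tails are heavier than normal but lighter than double-exponential.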