Predictions in the Future: White or Black Swans?

Gilbert F. White was a giant in the field of natural hazards and a former colleague of mine in Boulder at the University of Colorado, where he was an early director (beginning in 1970) of the Institute of Behavioral Science. Decades before that he had written his dissertation on how humans deal with floods, and his work led to the establishment in the early 1950s of a federal framework that graded the probability of floods. Now it is easy to ascertain the 100-year flood plain for any locale in the United States, since by law this is required of city and state planners. The city to which he moved, and in which we were colleagues, has its own connection to the subject of his research: Boulder experienced a massive flood that devastated the city about a century ago.

The 1894 Boulder Flood

The Boulder flood plain for 100- and 500-year floods was developed in part as a result of White’s activism in planning for floods. Gilbert White’s office was just outside the flood plain, up on a hill overlooking it, near where I am temporarily sitting at this instant. But his last house in Boulder was not. Anyone who followed the news this fall of the floods in Boulder, which many considered to be of the 100-year variety, may not know that Gilbert White’s advice probably saved many lives, as he argued for structures that could interact with floods in a way that diminishes risk (e.g., breakaway bridges). Gil was famous for many things, including the quote “Floods are ‘acts of God’ but flood losses are largely acts of man,” which was taken from his dissertation. In the 1980s he convinced the Boulder City Council that Boulder had previously experienced a flood even larger than the huge flood of 1894. As a result, building in the flood plain was restricted (a bit) and knockout bridges were built. I remember reading an article when I arrived in Boulder in the early 1980s about Gilbert’s warnings of a 100-year flood; it pictured Gilbert, then in his 70s, standing in the rushing Boulder Creek. You can listen to Gilbert discussing this issue as well as see a version of the Boulder floodplain.

The problem with rating something as a risk in the next 100 years is that people assume that it will never happen. A 100-year flood is defined as a flood that has a 1% chance of occurring in any given year. The field of extreme value theory was developed to model such rare events, but for most people a 1% chance is probably not something to worry about at night.

But suppose you calculate the 100-year probability every year anew, without any evidence about how things have gone in the past. In a certain sense that may seem correct. After all, the probability of getting a head in a coin toss doesn’t depend on how many tails there have been in a row. But suppose that we have a probability that is based on some temporal understanding, even a model with covariates that may move over time. Ignoring the fact that there had been no major flood in Boulder since 1894, even though the probability of a 100-year flood was never zero, would certainly underestimate the risk of a flood in 2013. And if a downward-biased estimate of risk were used in planning, then the impact of the flood, were it to arrive, would be much worse. It turns out that there is a 63% chance of a 100-year flood occurring at least once in 100 years. Suppose instead that it is the probability of an earthquake. The earthquake doesn’t happen in a given year, but the tectonic forces continue apace and there is a greater probability in subsequent years. Base rates are important, as we have discussed before, but so is the incorporation of new information and accounting for the passage of time. The years in this scenario are not independent. Lebanon in the 1970s was considered very safe and calm. But forces were afoot that continued to build each year until it was eventually considered to be one of the hotspots of the Middle East.

So suppose you are forecasting that a civil war will occur in Mozambique in August of 2013. Let it have a probability of 0.3. But no civil war occurred in August. Now it is October, and your monthly model has not been updated. How can you use it to forecast the (new) probability that a civil war will start in October? Has the likelihood of war dissipated? Stayed the same? Or has it increased?

The easy case: Suppose you have new information. You could re-estimate your model. But even with updated probabilities, the approach would not be completely satisfactory unless you also had some information about prior floods and non-floods.

Let p be the probability of an event at time t; p is between 0.0 and 1.0, by definition. The probability that the event doesn’t occur at time t is given, definitionally, by (1 – p), and thus the probability that it occurs is also given by the identity:

$\mathbf{1 - (1 - p)}$

Given that it didn’t occur at time t, what probability should we attach to the event having occurred by time t+1? If the same p applies in each period, it is the complement of the probability that the event happens in neither period:

1 – (probability it doesn’t occur at t+1) times (probability it didn’t occur at t), or

$\mathbf{1 - (1 -p) \times (1-p)}$

or the generalization to

$\mathbf{1 - (1 - p)^n}$

where n is the number of periods into the future at which the forecast is to be applied. This calculates the probability of at least one event over the subsequent n periods.
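As a minimal sketch, the formula can be written as a small Python function. The name `prob_at_least_once` is only illustrative, and the calculation assumes that the same probability p applies independently in every period, which is the assumption the flood example relies on.

```python
def prob_at_least_once(p, n):
    """Probability of at least one event in n periods, assuming the same
    per-period probability p applies independently in each period."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0.0 and 1.0")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


# Illustrative horizons for a 1%-per-year event such as the 100-year flood.
for years in (1, 10, 50, 100):
    print(years, round(prob_at_least_once(0.01, years), 3))
```

With p = 0.01 and n = 100 the function returns roughly 0.634, the 63% figure for the 100-year flood.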

So, assume that you are high and dry in year 100, that the probability of a 100-year flood is 0.01 in any given year, and that there hasn’t been a flood for the last century (the available data). What is the probability of a 100-year flood? As mentioned, it is about 63%, or

$\mathbf{1-(1 - 0.01)^{100}}$

This is the probability of at least one flood over the 100-year period. Based on this kind of reasoning, Gilbert White told us in the 1980s that we needed to be ready for a Boulder flood. A 1% probability is nothing to trifle with. He also knew why the probability of a flood was not trivial: the confluence of drainage surfaces and creeks, streams, and rivers was greater in Boulder than anywhere else in Colorado. A very substantial period of rain would overwhelm that drainage system.
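One way to convince yourself of the 63% figure is to simulate it. The sketch below draws many hypothetical centuries, treating each year as an independent 1% draw, and counts the share of centuries that see at least one flood. The function names, the number of trials, and the seed are all illustrative choices, and the independence assumption is the same simplification used in the formula.

```python
import random


def simulate_century(p, years, rng):
    """Return True if at least one flood occurs in a simulated run of years."""
    return any(rng.random() < p for _ in range(years))


def estimate_flood_share(p=0.01, years=100, trials=20000, seed=1894):
    """Share of simulated centuries with at least one flood.

    Assumes each year is an independent draw with the same probability p.
    """
    rng = random.Random(seed)  # fixed, arbitrary seed so runs are repeatable
    hits = sum(simulate_century(p, years, rng) for _ in range(trials))
    return hits / trials


estimate = estimate_flood_share()
exact = 1 - (1 - 0.01) ** 100
print(f"simulated: {estimate:.3f}, formula: {exact:.3f}")
```

The simulated share should land close to the value from the formula, with the gap shrinking as the number of trials grows.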

What is the point of this for conflict forecasting? The point is that if we are making predictions, our best models are likely to have a short time horizon. If we apply those models to the more distant future, they will get fuzzier as uncertainty increases, but the probabilities will grow in response to the passage of time. Thus, if we estimated that the monthly probability of an insurgency in Chad was 0.48 in September, but no insurgency occurred that month, the probability in October is predicted to be 0.55, somewhat higher; unless we update our estimates, this probability will grow to 0.90 after six months. And this assumes we have the correct model, whatever that might be. More specifically, it assumes that Chad is more like an earthquake with building pressures, and that the probability of a civil war is not independent from one period to the next. To be clear, it also assumes that the passage of time does not dissipate the prospect of conflict onset.

Boulder Flood near the Gilbert White Memorial, and not too far from where he was standing in the above picture.

Obviously, if we don’t re-estimate models, in all likelihood they will become uncalibrated. If we wait long enough, all probabilities will approach 1.0. That would present a serious upward bias problem. But by the same token, if we constantly re-estimate the probabilities so they can only forecast for a single period, we’re likely to bias things further away from the longer-term calibration. Some people use this kind of reasoning to justify a focus on black swans, but that may be ignoring part of the problem. Our models make predictions. We need to be careful about applying those predictions to the future, even for 100 days, let alone 100 years.
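To get a feel for how quickly an unrevised forecast drifts toward certainty, the sketch below counts how many periods it takes for the cumulative probability to pass a chosen threshold. The helper name `periods_until` and the thresholds are illustrative, and the calculation again assumes a constant, independent per-period probability that is not vanishingly small.

```python
def periods_until(p, target):
    """Smallest number of periods n for which 1 - (1 - p)**n reaches target,
    assuming a constant, independent per-period probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be greater than 0.0 and at most 1.0")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be at least 0.0 and below 1.0")
    n = 0
    prob_none = 1.0  # probability that nothing has happened so far
    while 1.0 - prob_none < target:
        prob_none *= 1.0 - p  # one more period without the event
        n += 1
    return n


# A 1%-per-period event: how long until the unrevised cumulative
# probability passes even odds, and then 99%?
print(periods_until(0.01, 0.5))
print(periods_until(0.01, 0.99))
```

For a 1% event the cumulative probability passes even odds after 69 periods and 99% after 459, which is why a model that is never re-estimated eventually predicts almost everything.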

There are three lessons in all of this:

1. Small probabilities should not be ignored.
2. Floods and other rare events happen even if the probability in a given year is low, because the cumulative probability over a long period of time is high.
3. Predictions need to be calibrated. If we predict that there is a 0.33 chance of something occurring, then about one in three of those predicted events should be expected to occur. If we predict that something has a 1% chance of occurring, then about 1% of the time, it will occur.
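Calibration can be checked against a forecasting record by grouping forecasts with similar probabilities and comparing the average forecast in each group with how often the event actually happened. The sketch below is one simple way to do that; the function name `calibration_table`, the ten equal-width bins, and the data are all made up for illustration and do not come from any real forecasting record.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into equal-width bins and compare the mean forecast
    in each bin with the observed event frequency.

    Returns a list of (mean_forecast, observed_frequency, count) tuples.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each forecast must be between 0.0 and 1.0")
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, y))
    rows = []
    for index in sorted(bins):
        pairs = bins[index]
        count = len(pairs)
        mean_forecast = sum(p for p, _ in pairs) / count
        observed = sum(y for _, y in pairs) / count
        rows.append((mean_forecast, observed, count))
    return rows


# Made-up record: 20 forecasts of 0.25 (5 events) and 10 of 0.75 (9 events).
forecasts = [0.25] * 20 + [0.75] * 10
outcomes = [1] * 5 + [0] * 15 + [1] * 9 + [0] * 1
table = calibration_table(forecasts, outcomes)
for mean_forecast, observed, count in table:
    print(f"forecast {mean_forecast:.2f}  observed {observed:.2f}  n={count}")
```

In this invented record the 0.25 forecasts are well calibrated, while the 0.75 forecasts came true more often than predicted.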

Like many things, this represents a stark tradeoff. Always predict a white swan, or always predict that you will eventually have a black swan? Maybe I should have named the blog predictivetradeoffs.com.