# Methodology: Statistical Model

We generate our Statistical Risk Assessment by identifying historical instances of mass killing, discerning the patterns that distinguished countries that experienced mass killing from those that did not, and then applying that model to the latest publicly available data to estimate the likelihood of a new mass killing in each of more than 160 countries.

## Methodology for Generating Our Statistical Risk Assessment

We generate our Statistical Risk Assessment with a statistical modeling approach involving five steps:

- Identifying historical episodes of state- and nonstate-led mass killing (1960–present for state-led, 1989–present for nonstate-led)
- Compiling data on potential “predictors” or “risk factors”—i.e., characteristics of countries that are thought to be associated with the likelihood of mass killing in the near future—from existing public sources
- Training different statistical algorithms on historical data (1960 to 2015) to identify a model that performs well at predicting the onset of mass killing within the training set
- Testing alternative models and selecting one that maximizes accuracy (as measured on a new dataset, not the one used for training the model), while still allowing for useful interpretation of the model
- Using current data on countries to make forecasts two years into the future (i.e., 2018 data for the 2019–2020 forecasts, 2019 data for the 2020–2021 forecasts, etc.); this generates an estimated risk (as a percentage chance of onset of mass killing) for each country, and a corresponding ranking

As of the 2017–18 assessment, the “winning” algorithm in our tests, which we employ, is a logistic regression model with “elastic-net” regularization. We feed a set of about 38 variables into this machine learning algorithm, which essentially looks for patterns in the data and identifies how these variables, or potential “risk factors,” are related to the onset of mass killing. Based on the model, factors associated with greater risk of mass killing include a history of mass killing, political killings that are frequently approved of or incited by top government leaders, coup attempts within the last five years, inequality in civil liberties across geographic regions, large population size, restrictions on religious freedom, high battle-related deaths, and a high infant mortality rate. You can find the model coefficients on each constituent variable in our GitHub repository.
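The general shape of this approach can be sketched with scikit-learn. The data below are synthetic, and the regularization settings (`l1_ratio`, `C`) are purely illustrative, not the values our pipeline uses:

```python
# Sketch of elastic-net logistic regression for onset data (synthetic).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 500, 38                       # country-years x risk factors
X = rng.normal(size=(n, p))
# Rare outcome driven by a few of the factors plus noise
logits = 1.5 * X[:, 0] + 1.0 * X[:, 1] - 4.0
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_std = StandardScaler().fit_transform(X)
model = LogisticRegression(
    penalty="elasticnet",            # mix of L1 and L2 regularization
    solver="saga",                   # solver that supports elastic-net
    l1_ratio=0.5,                    # balance of L1 (sparsity) vs. L2
    C=1.0,                           # inverse regularization strength
    max_iter=5000,
)
model.fit(X_std, y)

# Predicted probabilities serve as risk estimates; sorting them yields
# a risk ranking across countries.
risk = model.predict_proba(X_std)[:, 1]
ranking = np.argsort(-risk)
```

The L1 component of the elastic-net penalty shrinks some coefficients toward zero, which helps with interpretation when many candidate risk factors are correlated.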

### How accurate is the model?

We assessed the accuracy of this model in ways that mimicked how we use its results: we built our model on data from a period of years and then tested its accuracy on data for later years (i.e., we conducted out-of-sample testing). Our results indicate that about two out of every three countries that later experienced a new onset of mass killing had ranked among the top 30 countries in a given year's assessment. See the Accuracy page for more details based on analysis we conducted in 2020. Updated results are forthcoming.
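The accuracy measure described above can be computed in a few lines; the risk scores and onset indices below are invented for illustration:

```python
# Share of actual onsets that fell within the top-30 ranked countries.
import numpy as np

n_countries = 160
# Hypothetical predicted risks: country i gets risk (n_countries - i) /
# n_countries, so lower indices rank higher.
risk = (n_countries - np.arange(n_countries)) / n_countries
onset = np.zeros(n_countries, dtype=bool)
onset[[3, 17, 55]] = True            # hypothetical onsets; two in the top 30

top30 = set(np.argsort(-risk)[:30])  # indices of the 30 highest-risk countries
hits = sum(i in top30 for i in np.flatnonzero(onset))
recall = hits / onset.sum()          # share of onsets ranked in the top 30
```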

## Important Updates in 2017 Model Revision

Note: Because our data and methods changed in 2017, risk estimates and rankings from 2014 through 2016 should not be compared directly with results from 2017 onward.

### Perpetrator Groups

#### We now estimate the risk of both state-led and nonstate mass killing onsets.

Previous iterations of our Statistical Risk Assessment assessed risk of state-led mass killing only. Now we train our statistical model on episodes of state-led and nonstate-led mass killing, meaning the results reflect the likelihood of mass killing by either type of perpetrator.

### Data Sources

#### We systematically reviewed the data we use to generate forecasts to take advantage of new sources and guard against potential biases.

Our risk assessment relies on publicly available data on a variety of country characteristics. In 2017, we updated our data sources as new datasets became available—for example, measures of civil liberties and government repression. We also took special care to avoid using data that, in our judgment, could be susceptible to bias when coded or recoded retrospectively. This should help ensure that our model performs as well in practice as it does on historical data.

### Timing

#### We now estimate the probability of a mass killing onset over a two-year window.

Our updated model forecasts events that could occur anytime in the two calendar years following the year in which our risk factors are measured. Previously, our forecasts covered only the single year that followed. We believe the two-year forecasts are a better match for the time required to conduct additional analysis and planning, and to implement preventive actions while they are still timely. They are also statistically preferable: “rare events” make forecasting much more difficult, and onsets are less rare in two-year windows than in single-year windows.
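Constructing a two-year outcome label can be sketched as follows: for risk factors measured in year t, the label is 1 if an onset occurs in year t+1 or t+2. The country, years, and column names below are hypothetical:

```python
# Turn yearly onset indicators into two-year-window labels (toy data).
import pandas as pd

onsets = pd.DataFrame({
    "country": ["A"] * 5,
    "year":    [2015, 2016, 2017, 2018, 2019],
    "onset":   [0, 0, 1, 0, 0],
})

# Label for year t: onset in t+1 OR t+2, shifting within each country so
# one country's history never bleeds into another's.
g = onsets.groupby("country")["onset"]
onsets["onset_next2"] = (
    (g.shift(-1).fillna(0) + g.shift(-2).fillna(0)).clip(upper=1).astype(int)
)
```

With the single onset in 2017, the 2015 and 2016 rows are labeled 1 and the rest 0, illustrating how a two-year window makes positive cases less rare than a one-year window would.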

### Model

#### We tested several statistical algorithms and selected an approach that maximized forecasting accuracy and interpretability.

We sought to leverage statistical procedures that have shown promise on similar problems, using a process of model development and selection that follows best practices for political forecasting. This meant comparing different models on a task that closely mimicked true forecasting in practice. We identified one model whose accuracy compared well with alternatives (including an average of multiple models) and whose results could be interpreted with relative ease.

## Previous Statistical Risk Assessment Methodology (2014–16)

Previously, the project used an average of forecasts from three models representing different ideas about the origins of mass atrocities.

Drawing on work by Barbara Harff and the Political Instability Task Force, the first model emphasized characteristics of countries’ national politics that hint at a predilection to commit genocide or “politicide,” especially in the context of political instability. Key risk factors in Harff’s model include authoritarian rule, the political salience of elite ethnicity, evidence of an exclusionary elite ideology, and international isolation as measured by trade openness. We refer to this model as the "bad regime" model.

The second model took a more instrumental view of mass killing. It used statistical forecasts of future coup attempts and new civil wars as proxy measures of factors that could either spur incumbent rulers to lash out against threats to their power or usher in an insecure new regime that might do the same. We refer to this model as the "elite threat" model.

The third model was a machine-learning process called Random Forests applied to the risk factors identified by the other two models. The resulting algorithm was an amalgamation of theory and induction that took experts’ beliefs about the origins of mass killing as its jumping-off point. It also left more room for inductive discovery of contingent effects.

To get our single-best risk assessment, we averaged the forecasts from these three models. We preferred the average to a single model’s output because we knew from work in many fields—including meteorology and elections forecasting—that this “ensemble” approach generally produces more accurate assessments than we could expect to get from any one model alone. By averaging the forecasts, we learned from all three perspectives while hedging against the biases of any one of them.
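The ensemble amounted to an unweighted mean of the three models' forecasts for each country; the probabilities below are invented for illustration:

```python
# Toy sketch of the former three-model ensemble average.
import numpy as np

countries = ["A", "B", "C"]
bad_regime    = np.array([0.10, 0.40, 0.05])   # hypothetical forecasts
elite_threat  = np.array([0.20, 0.30, 0.15])
random_forest = np.array([0.12, 0.50, 0.10])

# Final assessment: unweighted mean across the three models.
ensemble = (bad_regime + elite_threat + random_forest) / 3
```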