
Using synthetic control methods to model the progression of COVID-19

A new way of using synthetic control methods to model predictions

What can we expect a country’s COVID-19 trajectory to look like, based on the experience of countries that are ahead in the outbreak?

To answer this question, we developed a new tool that adapts synthetic control methods to allow for comparative COVID-19 predictions.

The tool models the progression of the outbreak in a country of interest by creating a synthetic country – a combination of other countries – that has followed a similar COVID-19 trajectory. The experience of this synthetic country is then used to model what might happen in the country of interest in the near future.

Back up a bit. What are synthetic control methods?

Synthetic control methods, developed by Abadie and Gardeazabal (2003), create a “control region” for a geographic area where a policy change has taken place (“treatment region”). This control region is called “synthetic” because it is constructed using a combination of areas where the intervention has not taken place.

If the synthetic control region closely tracks the treatment region before the intervention takes place, it is likely to be a good predictor of what would have happened in the treatment region had the event or intervention not taken place. This is why synthetic control methods are typically used to create counterfactuals for evaluation purposes.
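To make the idea of a "combination of areas" concrete, here is a minimal sketch of how such weights could be chosen. It assumes the match is made on pre-intervention outcomes only, with weights that are non-negative and sum to one. The function name fit_simplex_weights and the Frank-Wolfe solver are illustrative choices, not the procedure used by Abadie and Gardeazabal or by our tool.

```python
def fit_simplex_weights(target, donors, n_iter=2000):
    """Find non-negative weights summing to 1 that reduce the squared
    pre-intervention gap between the target and the weighted donors.

    target: list of T pre-intervention outcomes for the treatment region.
    donors: list of J lists, each holding T outcomes for one untreated area.
    Solver: Frank-Wolfe steps with exact line search (an illustrative choice).
    """
    n_donors, n_periods = len(donors), len(target)
    weights = [1.0 / n_donors] * n_donors  # start from equal weights

    for _ in range(n_iter):
        synthetic = [sum(w * d[t] for w, d in zip(weights, donors))
                     for t in range(n_periods)]
        residual = [s - y for s, y in zip(synthetic, target)]
        # Gradient of the squared error for each weight (factor 2 dropped,
        # which does not change which donor has the smallest gradient).
        grad = [sum(r * d[t] for t, r in enumerate(residual)) for d in donors]
        best = min(range(n_donors), key=grad.__getitem__)
        # Move the synthetic series towards the single donor `best`.
        direction = [donors[best][t] - synthetic[t] for t in range(n_periods)]
        denom = sum(x * x for x in direction)
        if denom == 0.0:
            break
        step = -sum(r * x for r, x in zip(residual, direction)) / denom
        step = max(0.0, min(1.0, step))  # stay inside the set of valid weights
        if step == 0.0:
            break  # no vertex improves the fit any further
        weights = [(1.0 - step) * w for w in weights]
        weights[best] += step
    return weights


# Hypothetical data: the target is 25% of area 0 plus 75% of area 1.
example_weights = fit_simplex_weights([2.5, 2.0, 1.5], [[1, 2, 3], [3, 2, 1]])
# example_weights is close to [0.25, 0.75]
```

Restricting the weights to be non-negative and to sum to one keeps the synthetic region inside the range of the real areas it is built from, which is what makes it interpretable as a blend of them.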

But what if we use them to make predictions?

How is this model different?

Synthetic control approaches compare the outcomes in a location of interest with its synthetic counterfactual at the same point in time. The innovation we offer is to move that point in time and create many versions of potential comparison countries by shifting timelines forwards, backwards, or both.
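The timeline shifting described above can be sketched as follows. This is an illustration under our own simplifying assumptions: each series is a list of daily values indexed from a common first calendar day, and the names shift_series, build_donor_pool and the lags argument are hypothetical rather than taken from the tool itself.

```python
def shift_series(series, lag):
    """Return {day_index: value} for a donor series moved `lag` days later.

    series: list of daily values, index 0 = first calendar day.
    lag > 0 moves the donor forwards in time, so a country that is `lag`
    days ahead in the outbreak lines up with the country of interest and
    supplies values for `lag` days beyond its last observed day.
    lag < 0 moves the donor backwards in time.
    """
    return {day + lag: value for day, value in enumerate(series)}


def build_donor_pool(donor_series, lags):
    """Create one candidate comparison series per (country, lag) pair."""
    return {
        (country, lag): shift_series(series, lag)
        for country, series in donor_series.items()
        for lag in lags
    }


# Hypothetical donor "A" with four days of data, shifted three different ways.
pool = build_donor_pool({"A": [10, 20, 40, 80]}, lags=[-1, 0, 2])
# pool[("A", 2)] maps days 2, 3, 4, 5 to the values 10, 20, 40, 80
```

Each shifted copy is treated as a separate candidate, so one real country can enter the synthetic comparator several times at different lags.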

This allows us to make projections forward in time. In this approach there is no “event” or “treatment” that we are interested in. We are interested in:

(i) creating a good synthetic match for a country of interest during a “training period”;

(ii) verifying whether the synthetic comparator produces accurate estimates during a “validation” period during which we have both actual and synthetic data; and finally

(iii) projecting forward.
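Steps (ii) and (iii) can be sketched in a few lines, assuming the weights from step (i) have already been fitted on the training period. The function names, the argument layout and the choice of mean absolute percentage error as the validation measure are our assumptions for illustration; the tool may assess accuracy differently.

```python
def synthetic_series(weights, donors, days):
    """Weighted combination of donor values for the requested day indices."""
    return [sum(w * d[day] for w, d in zip(weights, donors)) for day in days]


def validate_and_project(actual, donors, weights, train_end, valid_end, horizon):
    """Check the synthetic comparator on held-out days, then project forward.

    actual:  observed values for days 0..valid_end-1 in the country of
             interest (must be non-zero on validation days).
    donors:  list of already time-shifted donor series covering days
             0..valid_end+horizon-1.
    weights: one weight per donor, fitted on days 0..train_end-1.
    Returns (mean absolute percentage error on validation days, projection).
    """
    valid_days = range(train_end, valid_end)
    fitted = synthetic_series(weights, donors, valid_days)
    errors = [abs(f - actual[day]) / actual[day]
              for f, day in zip(fitted, valid_days)]
    mape = sum(errors) / len(errors)
    future_days = range(valid_end, valid_end + horizon)
    projection = synthetic_series(weights, donors, future_days)
    return mape, projection


# Hypothetical numbers: two donors, equal weights, two validation days.
mape, projection = validate_and_project(
    actual=[1.5, 3.0, 5.0, 6.0],
    donors=[[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12]],
    weights=[0.5, 0.5],
    train_end=2, valid_end=4, horizon=2,
)
```

A large validation error is a signal that the synthetic comparator has drifted away from the country of interest and that the projection should be treated with caution.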

What insights does our model offer on the COVID-19 crisis?

The COVID-19 pandemic is a good test case for this idea, because not all countries are at the same point in the outbreak.

As a comparative tool, our model offers a different perspective from epidemiological models, which are based on parameters specific to the disease. Epidemiological modeling relies on assumptions about transmission and mortality rates, the period of incubation and communicability, and the sizes of the susceptible, infected and recovered populations. Modelers also have to estimate how much government interventions, such as social distancing, affect transmission rates.

The key assumption of the synthetic controls approach is that, on average, the experience of the synthetic control region (in terms of how cases or deaths are recorded, and how social distancing and other measures are implemented) will closely mimic that of the country of interest. While that will not always be the case, given that each country is tracing its own path through this outbreak, there are similarities in both the trajectory of the outbreak and the measures that are being implemented around the world. The tool can therefore be informative both in its predictions and in its deviations.

To illustrate how it works, here is an example with data from the United States. We used data from 24 March to 7 April to create a synthetic “United States”. We then used data from the following week (7–15 April) to assess how well our synthetic United States tracks the actual trajectory of total deaths in the real country. Finally, we projected forward: we estimate that the United States will see approximately 37,500 COVID-19 related deaths by 22 April, about one week from now.
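For readers who want to reproduce the windows in this example, the lengths of the three periods can be worked out as below. The year 2020 is our assumption, since the text gives only days and months, and the variable names are illustrative.

```python
from datetime import date

YEAR = 2020  # assumption: the example states day and month only

train_start = date(YEAR, 3, 24)   # start of the training period
train_end = date(YEAR, 4, 7)      # end of training, start of validation
valid_end = date(YEAR, 4, 15)     # end of validation, start of projection
project_to = date(YEAR, 4, 22)    # date of the projected death toll

training_days = (train_end - train_start).days    # 14
validation_days = (valid_end - train_end).days    # 8
horizon_days = (project_to - valid_end).days      # 7
```

Note that 7 April appears as both the last training day and the first validation day in the description above, so the day counts here are differences between dates rather than inclusive counts.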

So what can we learn from the model?

Using a synthetic controls approach to build a predictive model comes with limitations. First, we cannot model the countries where the outbreak started, because the tool relies on the experience of regions that are further ahead in the outbreak. Second, the tool doesn’t work well for outliers, and it doesn’t take into account innovations (in testing or treatment) that might affect the curve of the outbreak and were not available to the countries that experienced the outbreak first. Finally, as with any model, our predictions are only as good as the data that goes in: the model relies on official reporting of cases and deaths, which we know is compiled to different standards in different countries. This also means that the trajectories of countries with very low numbers of cases and deaths cannot be modeled.

Despite these limitations, we hope that this approach offers researchers a new way to use synthetic controls for predictive purposes, and policy makers an alternative way to model the trajectory of this outbreak in the short term.

We are making incremental improvements to the model as we go, and we welcome feedback. Over the next few weeks we will compare the performance of the tool against alternatives and draw conclusions about how well its predictions hold up in the short and long term.

Should the tool show promising results, we will explore ways of using it to help governments in the countries where we work to model the future trajectory of the epidemic, and we will consider other potential applications.

—

References:

Abadie, A. & Gardeazabal, J. (2003). ‘The economic costs of conflict: a case study of the Basque Country’. The American Economic Review, 93(1), pp. 113–132.

This tool was developed by David Wickland, Research Analyst, Laterite Ethiopia, and Dimitri Stoelinga, Managing Partner.
