Attrition, the loss of survey respondents between rounds of a longitudinal study, is a common occurrence in development research. It can happen for many reasons: a respondent may refuse to participate, feel unwell at the time of the interview, or simply be impossible to relocate, and geographical and social factors can also play a role. Whatever the reasons may be, attrition is almost always non-random and affects some types of respondents more than others. This means it can threaten the internal validity of the study.

This leads to the question: how can we re-weight an endline sample (affected by non-random attrition) in such a way that it is as comparable as possible to the sample collected at baseline?

Testing for comparability

The first step in our process is deciding whether or not attrition has affected our sample enough to warrant re-weighting. This can be done by using a joint orthogonality test, which examines whether two groups (baseline and endline samples) are significantly different over a range of chosen characteristics. This method is preferable to other common comparison techniques such as balance tables or bivariate regressions, as we can compare a set of characteristics as a whole rather than individually.
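As a rough sketch, one common way to run a joint orthogonality test is as an F-test on a linear probability model that regresses the round indicator on the chosen characteristics, testing whether all slope coefficients are jointly zero. The post does not prescribe an implementation, so the function below is purely illustrative:

```python
import numpy as np

def joint_orthogonality_F(D, X):
    """Joint orthogonality test via a linear probability model.

    D: (n,) binary indicator (0 = baseline, 1 = endline)
    X: (n, k) matrix of chosen characteristics
    Returns the F statistic and its two degrees of freedom. All slope
    coefficients are jointly zero under the null (groups are comparable).
    """
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])            # add an intercept
    beta, *_ = np.linalg.lstsq(Xc, D, rcond=None)    # unrestricted fit
    resid = D - Xc @ beta
    rss_full = resid @ resid
    rss_rest = ((D - D.mean()) ** 2).sum()           # intercept-only fit
    F = ((rss_rest - rss_full) / k) / (rss_full / (n - k - 1))
    return F, k, n - k - 1
```

The F statistic can then be compared against the critical value of an F(k, n-k-1) distribution; a large value indicates the two samples differ jointly across the characteristics.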

If the test finds that our two groups are significantly different, there is a risk that comparisons between them, without further adjustments, could be misleading.

Propensity score weights

Luckily for us, we can deploy propensity scores to re-weight our sample.

Propensity score weights are often used to balance the characteristics of two groups (such as accounting for selection bias in non-experimental studies, or non-random attrition between baseline & endline samples).

We can use a logit or probit regression model to construct our weights. The dependent variable in the model is a binary variable that distinguishes between the baseline and endline samples. The independent variables are baseline characteristics that are statistically different between the two groups (as discovered during orthogonality testing) but do not change as a result of the intervention, for example age, gender, or location. Note that our data must be stacked in long format (baseline and endline observations appended into one dataset, with one row per individual per round) in order to construct this model correctly.

Using this model (and your statistical software of choice), we can now predict a propensity score and create propensity score weights. These will be the inverse of the propensity score (1/ps) for the endline sample, and the inverse of one minus the propensity score (1/(1-ps)) for the baseline sample.
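As a minimal sketch of this step with numpy (the function names and the Newton-Raphson fitting routine are illustrative, not from the post), we fit the logit on the stacked data and then apply the 1/ps and 1/(1-ps) rules described above:

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson.

    X must already include an intercept column; y is the binary outcome.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                       # score vector
        H = X.T @ (X * (p * (1 - p))[:, None])     # information matrix
        beta += np.linalg.solve(H, grad)
    return beta

def propensity_weights(X, endline):
    """endline: 1 for endline rows, 0 for baseline rows.

    Returns (weights, propensity scores): 1/ps for the endline sample
    and 1/(1-ps) for the baseline sample.
    """
    Xc = np.column_stack([np.ones(len(endline)), X])
    beta = fit_logit(Xc, endline)
    ps = 1 / (1 + np.exp(-Xc @ beta))
    return np.where(endline == 1, 1 / ps, 1 / (1 - ps)), ps
```

In practice, your statistical software's built-in logit command and predict function do the same job; the point is simply that the weight is the inverse of each observation's predicted probability of being in its own group.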

We can apply these weights to our chosen comparability test in order to examine whether our sample has been adequately balanced.

Things to keep in mind

Be sure to check for large outliers in the weights that you have constructed. Outliers give certain observations outsized influence and can skew your results. They can be dealt with by trimming or capping extreme weights, or by replacing them with the mean weight of the stratum the observation belongs to.
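One simple way to cap extreme weights, sketched below, is to winsorize them at a chosen percentile (the 95th here is an arbitrary illustration; the right threshold depends on your data):

```python
import numpy as np

def trim_weights(w, upper_pct=95):
    """Cap weights above the chosen percentile to limit outlier influence."""
    cap = np.percentile(w, upper_pct)
    return np.minimum(w, cap)
```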

More variation in the weights means less statistical power. In other words, weighting typically reduces the precision of our estimates, which shows up as larger standard errors: just as a smaller sample size inflates the standard error, so does the use of highly variable weights.
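A standard way to quantify this loss of power (not mentioned in the post by name, but widely used) is Kish's approximate effective sample size, (Σw)² / Σw², which equals n for equal weights and shrinks as the weights become more variable:

```python
import numpy as np

def effective_n(w):
    """Kish's approximate effective sample size: (sum w)^2 / sum w^2."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w @ w)
```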

Sampling weights are commonly used to ensure that a sample in a study is representative of the population it was drawn from. If your study uses them, propensity score weights can easily be combined with sampling weights by multiplying the two to form a combined weight variable. If we instead run the analysis with the propensity score weights alone, ignoring the sampling weights, the population size implied by the analysis will not be in line with the actual population size.
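Combining the two is then a simple element-wise multiplication; the numbers below are made up purely for illustration:

```python
import numpy as np

# hypothetical per-observation weights
ps_w = np.array([1.25, 0.80, 1.10, 0.95])    # propensity score weights
samp_w = np.array([150.0, 150.0, 320.0, 320.0])  # sampling (expansion) weights

combined = ps_w * samp_w  # use this as the analysis weight
```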

Other techniques also exist, such as entropy balancing (Hainmueller 2012), which uses an algorithm to create weights that balance a set of given variables/characteristics. It can be a useful check to try other techniques and see how the weights they produce may differ.



This post was contributed by Oliver Budd, a Research Analyst based in Amsterdam, Netherlands