Update 4/3/2016: I’ve since included a more comprehensive notebook here. Use this version instead of the code posted below.
I need to run some simulations using propensity score matching, but I haven’t been able to find a Python module that does it, so I’ve taken it upon myself to implement it. This is very much a work in progress, with the emphasis on progress. I’m definitely the tortoise in this race: slow and steady.
Propensity score matching is a method to match case-control pairs in observational studies (or treated-control pairs in quasi-experimental studies) to better estimate the effect of the treatment or exposure on the outcome of interest. We first estimate the “propensity” of being assigned to the treatment group given the other measured covariates, then match pairs with similar propensities. The idea is that the individuals in each pair have similar levels of the measured confounding variables, so comparing outcomes within pairs lets us see the effect of the treatment with those other things held roughly constant.
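To make the first step concrete, here is a minimal sketch of estimating propensities with a logistic regression fitted by plain gradient ascent. It uses only the standard library and simulated toy data; the names `fit_propensity` and `propensity`, the learning rate, and the iteration count are all assumptions made for illustration, not the code from my notebook. In practice you would probably reach for an existing logistic regression routine instead.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_propensity(X, t, lr=0.1, n_iter=1000):
    """Fit P(treated = 1 | covariates) by batch gradient ascent.

    X: list of covariate rows, t: list of 0/1 treatment indicators.
    Returns the weights, intercept first. (Hypothetical helper.)
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, ti in zip(X, t):
            xi = [1.0] + list(row)
            err = ti - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(p + 1):
                grad[j] += err * xi[j]
        # Step along the average log-likelihood gradient
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity(w, row):
    # Estimated probability of treatment for one covariate row
    xi = [1.0] + list(row)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))

# Simulated toy data: one covariate that raises the chance of treatment
rng = random.Random(0)
X, t = [], []
for _ in range(200):
    x = rng.gauss(0, 1)
    X.append([x])
    t.append(1 if rng.random() < sigmoid(1.5 * x) else 0)

w = fit_propensity(X, t)
scores = [propensity(w, row) for row in X]
```

The `scores` list holds one estimated propensity per individual; those are the numbers that get compared in the matching step.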
The tricky thing about propensity score matching is that there’s no single best way to do it. If your estimated propensities are wrong, then you’re screwed right off the bat. But assuming they’re alright, how do you pick the “best” case-control pairs? You could try every possible pairing and minimize the within-pair differences in propensity, but that’s computationally intensive. What’s typically done is greedy matching, but even then there are a number of choices to make. In what order do we match cases to controls? Do we match with replacement, allowing one control to be matched to more than one case, or without replacement, so that each control is used at most once? Do we use a caliper to set a maximum allowed difference in propensities, and if so, how do we pick the caliper?
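Here is a small sketch of the difference between trying every pairing and matching greedily, both without replacement. The function names and the four propensity values are made up for illustration; the toy numbers are chosen so that the order-dependent greedy pass ends up with a worse total than the exhaustive search.

```python
from itertools import permutations

def total_distance(cases, controls, assignment):
    # Sum of within-pair propensity differences for one assignment,
    # where assignment[i] is the index of the control given to case i
    return sum(abs(c - controls[j]) for c, j in zip(cases, assignment))

def exhaustive_match(cases, controls):
    # Try every way of giving each case its own distinct control
    best = min(
        permutations(range(len(controls)), len(cases)),
        key=lambda a: total_distance(cases, controls, a),
    )
    return list(best)

def greedy_match(cases, controls):
    # Each case, in the order given, takes the closest control still available
    available = set(range(len(controls)))
    assignment = []
    for c in cases:
        j = min(available, key=lambda k: abs(c - controls[k]))
        available.remove(j)
        assignment.append(j)
    return assignment

cases = [0.50, 0.40]      # made-up propensities for the cases
controls = [0.42, 0.70]   # made-up propensities for the controls

optimal = exhaustive_match(cases, controls)
greedy = greedy_match(cases, controls)
optimal_total = total_distance(cases, controls, optimal)
greedy_total = total_distance(cases, controls, greedy)
```

The first case grabs the 0.42 control because it is closest, which leaves the second case stuck with 0.70; the exhaustive search swaps them and gets a smaller total. The catch is that the brute-force version checks a number of assignments that grows factorially with the sample size, so it is only usable on tiny examples like this one.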
I thought I’d share my IPython notebook for this code so far because I’m really enjoying using it. Before, I was just using a basic text editor to write code and copying and pasting it into the terminal, but this is much more elegant. I’ve posted the bit of code that I’ve written so far to GitHub. I set it up to do greedy matching with no replacement and randomized the order of matching the treatment group to controls, with the default caliper set to 0.05 arbitrarily. My IPython notebook uses the Dehejia-Wahba sample data from “Evaluating the Econometric Evaluations of Training Programs,” available freely here.
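For a rough idea of what that setup looks like, here is a minimal sketch of greedy matching without replacement, with a randomized matching order and a caliper. This is not the code from the notebook: the function name `greedy_caliper_match`, the dict-based inputs, and the toy propensities are all assumptions for illustration. Only the 0.05 default caliper comes from the description above.

```python
import random

def greedy_caliper_match(treated, controls, caliper=0.05, seed=None):
    """Greedy matching without replacement (hypothetical helper).

    treated, controls: dicts mapping an id to a propensity score.
    Returns a dict mapping treated id -> matched control id.
    Treated units with no control within the caliper stay unmatched.
    """
    rng = random.Random(seed)
    order = list(treated)
    rng.shuffle(order)           # randomize the order of matching
    available = dict(controls)   # controls not yet used
    matches = {}
    for tid in order:
        if not available:
            break
        # Closest remaining control by absolute propensity difference
        cid = min(available, key=lambda c: abs(available[c] - treated[tid]))
        if abs(available[cid] - treated[tid]) <= caliper:
            matches[tid] = cid
            del available[cid]   # no replacement: each control used once
    return matches

# Made-up propensities
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
controls = {"c1": 0.31, "c2": 0.60, "c3": 0.20, "c4": 0.45}

matches = greedy_caliper_match(treated, controls, caliper=0.05, seed=1)
```

In this toy data, `t1` and `t2` each have a control within 0.05, while `t3` has none and is dropped, so the result does not depend on the shuffle. With real data the random order can change which pairs you get, which is one reason to fix the seed when you want reproducible matches.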