Matchit March 2012

matchit@lists.gking.harvard.edu

3 participants
1 discussions

matchit() silently adds a propensity score to X when method=='genetic'

by John G. Bullock

Hello, I am writing on behalf of myself and Rocío Titiunik.

As we see it, matchit() serves as a front end to GenMatch() when method=='genetic'. So it should be possible to get the same results from both the MatchIt and Matching packages. But this proves difficult. Discrepancies seem to occur because matchit() silently adds a column of propensity scores to the X matrix that it passes to GenMatch(). I'm appending some code that we developed to demonstrate this point.

The benefits (in most cases) of adding a propensity score are clear, but it seems problematic that this addition occurs without any notification. For example, a user of MatchIt may claim to have matched on a set of predictors, unaware that he has also matched on the propensity score. Or, if a user thinks that he has matched on one propensity score (because he included it in X when calling matchit()), he will be unaware that he has also matched on a second, matchit()-generated propensity score that is based on the first propensity score. In either case, he will also be unable to replicate his MatchIt results with Matching or vice versa, even though MatchIt is supposed to be a wrapper for Matching. And third-party readers of code that includes a call to matchit() will likely have no idea that a propensity score has been added.

We can't see any explicit mention of this propensity-score-adding feature in the MatchIt manual. To promote future replication efforts, can a line be added to the MatchIt manual about the addition of a propensity score to X? Or can a note be added to matchit() output whenever method=='genetic' is used?
Thank you,
John Bullock

###
library(Matching)
library(MatchIt)
data(lalonde)
X <- with(lalonde, cbind(age, educ, black, hispan, married, nodegree, re74, re75))

# GET ESTIMATE FROM MATCHIT()
set.seed(5678)
m.out1 <- matchit(treat ~ X, data=lalonde, method='genetic', estimand='ATT',
                  ties=TRUE, print.level=1, pop.size=150, wait.generations=1,
                  max.generations=10, hard.generation.limit=TRUE,
                  unif.seed=1945, int.seed=1906)

# Create a vector of "parameters at the solution" reported by matchit().
# Behind the scenes, these weights have been produced by GenMatch().
weights <- c(8.121767e+02, 7.735192e+02, 5.938936e+02, 4.079661e+02,
             3.665018e+02, 1.361011e+02, 9.644896e+02, 6.083636e+02,
             2.346772e+00)
ATT.MatchIt <- with(match.data(m.out1), weighted.mean(re78[treat==1], weights[treat==1])) -
  with(match.data(m.out1), weighted.mean(re78[treat==0], weights[treat==0]))
print(ATT.MatchIt)  # 939.2

# GET ESTIMATE FROM MATCH()
# The Match() estimate is -952.3 -- very different.
Match(Y=lalonde$re78, Tr=lalonde$treat, X=X, estimand="ATT",
      Weight.matrix=diag(weights), ties=TRUE)$est

# NOTE DISCREPANCY BETWEEN ncol(X) AND length(weights)
# There are only 8 variables in X, so why is matchit() producing weights for nine variables?
ncol(X)          # 8
length(weights)  # 9

# ADD PROPENSITY SCORE TO X AND RE-ESTIMATE WITH MATCH()
glm1 <- glm(treat ~ age+educ+black+hispan+married+nodegree+re74+re75,
            family=binomial, data=lalonde)
X2 <- with(lalonde, cbind(age, educ, black, hispan, married, nodegree, re74, re75))
X2 <- cbind(glm1$fitted, X2)
Match(Y=lalonde$re78, Tr=lalonde$treat, X=X2, estimand="ATT",
      Weight.matrix=diag(weights), ties=TRUE)$est  # 939.2
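
For anyone who wants to see the column-count issue without installing either package, here is a minimal base-R sketch. It only illustrates the bookkeeping: a diagonal Weight.matrix needs one weight per column of X, so one extra weight suggests one extra column. The helper add_pscore() and the toy data are hypothetical, invented for this illustration, and are not part of MatchIt or Matching. Prepending the score as the first column simply mirrors the final step of the code above; it is an assumption about, not a description of, what matchit() does internally.

```r
# Hypothetical helper (not part of MatchIt or Matching): prepend a logistic
# propensity score, estimated from X, as the first column of X.
add_pscore <- function(treat, X) {
  fit <- glm(treat ~ X, family = binomial)
  cbind(pscore = fitted(fit), X)
}

# Toy data, invented purely for illustration (not the lalonde data).
set.seed(1)
n <- 200
X <- cbind(age = rnorm(n, 30, 8), educ = rnorm(n, 11, 2))
treat <- rbinom(n, 1, plogis(-0.5 + 0.03 * (X[, "age"] - 30)))

X2 <- add_pscore(treat, X)

# Suppose a matching routine reported this many weights at the solution.
n_weights <- 3

# The weights fit X2 (covariates plus propensity score) but not X.
if (ncol(X) != n_weights && ncol(X2) == n_weights) {
  message("weights match X only after a propensity-score column is added")
}
```

A check of this kind, comparing ncol(X) against the number of reported weights, is a cheap way to notice that the matching was done on more columns than the user supplied.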
