Hello,
I am writing on behalf of myself and Rocío Titiunik.
As we see it, matchit() serves as a front end to GenMatch()
when method=='genetic'. So it should be possible to get the same
results from both the MatchIt and Matching packages. But this
proves difficult. Discrepancies seem to occur because matchit()
silently adds a column of propensity scores to the X matrix that it
passes to GenMatch(). I'm appending some code that we developed to
demonstrate this point.
The benefits (in most cases) of adding a propensity score are
clear, but it seems problematic that this addition occurs without
any notification. For example, a user of MatchIt may claim to have
matched on a set of predictors, unaware that he has also matched on
the propensity score. Or, if a user thinks that he has matched on
one propensity score (because he included it in X when calling
matchit()), he will be unaware that he has also matched on a second,
matchit()-generated propensity score that is based on the first
propensity score. In either case, he will also be unable to
replicate his MatchIt results with Matching or vice versa, even
though MatchIt is supposed to be a wrapper for Matching. And
third-party readers of code that includes a call to matchit() will
likely have no idea that a propensity score has been added.
We can't see any explicit mention of this
propensity-score-adding feature in the MatchIt manual. To promote
future replication efforts, can a line be added to the MatchIt
manual about the addition of a propensity score to X? Or can a note
be added to matchit() output whenever method=='genetic' is used?
Thank you,
John Bullock
###
library(Matching)
library(MatchIt)
data(lalonde)
X <- with(lalonde, cbind(age, educ, black, hispan, married,
nodegree, re74, re75))
# GET ESTIMATE FROM MATCHIT()
set.seed(5678)
m.out1 <- matchit(treat ~ X, data=lalonde, method='genetic',
                  estimand='ATT', ties=TRUE, print.level=1,
                  pop.size=150, wait.generations=1,
                  max.generations=10, hard.generation.limit=TRUE,
                  unif.seed=1945, int.seed=1906)
# Create a vector of "parameters at the solution" reported by matchit().
# Behind the scenes, these weights have been produced by GenMatch().
weights <- c(8.121767e+02, 7.735192e+02, 5.938936e+02, 4.079661e+02,
3.665018e+02, 1.361011e+02,
9.644896e+02, 6.083636e+02, 2.346772e+00)
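# Inside with(), `weights` is the weights column of match.data(m.out1),
# not the vector of GenMatch() parameters defined above.
ATT.MatchIt <- with(match.data(m.out1),
                    weighted.mean(re78[treat==1], weights[treat==1])) -
               with(match.data(m.out1),
                    weighted.mean(re78[treat==0], weights[treat==0]))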
print(ATT.MatchIt) # 939.2
# GET ESTIMATE FROM MATCH()
# The Match() estimate is -952.3 -- very different.
Match(Y=lalonde$re78, Tr=lalonde$treat, X=X, estimand="ATT",
Weight.matrix=diag(weights), ties=TRUE)$est
# NOTE DISCREPANCY BETWEEN ncol(X) AND length(weights)
# There are only 8 variables in X, so why is matchit() producing
# weights for 9 variables?
ncol(X) # 8
length(weights) # 9
# ADD PROPENSITY SCORE TO X AND RE-ESTIMATE WITH MATCH()
glm1 <- glm(treat ~ age+educ+black+hispan+married+nodegree+re74+re75,
            family=binomial, data=lalonde)
X2 <- with(lalonde, cbind(age, educ, black, hispan, married,
nodegree, re74, re75))
X2 <- cbind(fitted(glm1), X2)  # prepend the logit propensity score
Match(Y=lalonde$re78, Tr=lalonde$treat, X=X2, estimand="ATT",
Weight.matrix=diag(weights), ties=TRUE)$est # 939.2
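
For readers without either package installed, the mechanism can be illustrated with a toy example in base R. This is only a sketch under stated assumptions: the data are simulated, the names toy_X, toy_treat, and nearest_control() are hypothetical, and the distance is a simple diagonally weighted Euclidean distance on standardized columns, not the exact computation performed by Match() or GenMatch(). It shows only that adding a propensity-score column to the matching matrix adds a term to every pairwise distance, so the nearest control can change for some treated units.

```r
# Toy illustration (simulated data; not the Matching/MatchIt internals).
set.seed(1)
n <- 200
toy_X <- cbind(age = rnorm(n, 30, 8), educ = rnorm(n, 10, 2))
toy_treat <- rbinom(n, 1, plogis(-1 + 0.05 * (toy_X[, "age"] - 30) +
                                   0.2 * (toy_X[, "educ"] - 10)))

# Logit propensity score, prepended as an extra matching column.
toy_ps <- fitted(glm(toy_treat ~ toy_X, family = binomial))
toy_X2 <- cbind(ps = toy_ps, toy_X)

# For each treated unit, return the row index of the closest control
# under a diagonally weighted squared distance on standardized columns.
nearest_control <- function(M, w, treat) {
  Z <- scale(M)
  tr <- which(treat == 1)
  co <- which(treat == 0)
  sapply(tr, function(i) {
    d <- colSums(w * (t(Z[co, , drop = FALSE]) - Z[i, ])^2)
    co[which.min(d)]
  })
}

m_without <- nearest_control(toy_X,  rep(1, ncol(toy_X)),  toy_treat)
m_with    <- nearest_control(toy_X2, rep(1, ncol(toy_X2)), toy_treat)

# Share of treated units whose matched control differs once the
# propensity score is included among the matching variables.
print(mean(m_without != m_with))
```

Any nonzero share printed here is enough to make the two matched samples, and hence the two estimates, differ.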