Hello, I have 500 individuals in my treatment group and 400,000 in my
control group. I am trying to find the nearest match on three covariates.
If I match only on the first two covariates, then there is no problem. But
when I add the third covariate, I get the message: "Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred." To help
identify the problem, I have tried the matching procedure using only a small
subsample of the treatment group (e.g., the first 10 individuals or the last
10 individuals). But no matter which subsample I use, I still get the exact
same warning message. Can somebody please tell me how to correct this
problem? PS I still get the same warning message when I match using only
the third covariate by itself.
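A minimal sketch of one way to check whether a covariate separates the two groups, which is a common cause of this warning from a logit fit. It runs on simulated data, not on my data; the names d, treat and x3 are hypothetical, and it assumes the propensity score is estimated with a logistic regression via glm.

```r
# Simulated data (hypothetical names): x3 perfectly separates the groups.
set.seed(1)
d <- data.frame(treat = rep(c(0, 1), times = c(200, 20)))
d$x3 <- ifelse(d$treat == 1, runif(nrow(d), 10, 20), runif(nrow(d), 0, 5))

# Range of x3 within each group; non-overlapping ranges mean separation.
ranges <- tapply(d$x3, d$treat, range)
print(ranges)

# Fit the logit model and collect any warnings it raises.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(treat ~ x3, data = d, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(msgs)
```

If the ranges of the third covariate do not overlap between treated and control units in the real data, the warning would be expected no matter which subsample of the treatment group is used.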
--
Geoffrey Smith
Visiting Assistant Professor
Department of Finance
WP Carey School of Business
Arizona State University
Hi there,
I tried to match my data with matchit using nearest-neighbor matching and got only zeros in the percent balance improvement table. I found this odd, so I ran the same data through Sekhon's Matching package, where the balance did improve. I don't understand why this happened. Does anyone have an explanation for what could cause only zeros in the percent balance improvement summary? Let me add that this occurred only with the nearest and optimal methods; full and genetic matching produced nonzero results.
Your answers/thoughts will be much appreciated!
Best regards,
Ana
Hi there,
I'm wondering what Zelig actually does in the following situation (code below). Is this the so-called regression adjustment after propensity score matching, as recommended by Rubin?
m.out <- matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75, data = lalonde)
z.out <- zelig(re78 ~ distance, data = match.data(m.out, "control"), model = "ls")
x.out <- setx(z.out, fn = NULL, data = match.data(m.out, "treat"), cond = TRUE)
s.out <- sim(z.out, x = x.out)
summary(s.out)
I have a few more questions:
1. Any idea how to extract the variance ratio between the treated and control groups before and after matching?
2. I understand that summary(m.out) reports the balance improvement, but how can I tell that the covariate distributions of the treated and control groups are close enough after matching (that is, the difference between the distributions is not significant), so that I can conclude that sufficient balance has been achieved and that estimating the ATT or ATE is sensible? Does the matchit package run any distribution similarity tests behind the scenes without reporting them?
3. Zelig provides us with an ATT estimate. What if we are interested in an ATE estimate instead?
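For question 1, a minimal sketch of what I mean by the variance ratio, computed by hand in base R. The helper var_ratio is hypothetical (not part of matchit), and the unweighted version is only appropriate when every matched unit has weight 1, which I am assuming is the case for 1:1 nearest-neighbor matching without replacement.

```r
# Hypothetical helper: variance of a covariate among treated units
# divided by its variance among control units.
var_ratio <- function(x, treat) {
  var(x[treat == 1]) / var(x[treat == 0])
}

# Toy data to show the call; treat is a 0/1 indicator.
x     <- c(1, 2, 3, 10, 20, 30)
treat <- c(0, 0, 0, 1, 1, 1)
print(var_ratio(x, treat))

# With the objects from the code above (requires MatchIt, not run here):
# var_ratio(lalonde$age, lalonde$treat)   # before matching
# md <- match.data(m.out)
# var_ratio(md$age, md$treat)             # after matching
```

What I would like to know is whether matchit already stores this quantity somewhere, so that it does not have to be recomputed covariate by covariate.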
Many thanks in advance for all your answers and thoughts!
Best regards,
Ana
Hi there,
Does anyone know why the matchit function returns an error whenever a "hull" option is used (either "hull.both", "hull.control" or "hull.treat")? Everything works well with the other discard options.
This is the error message:
>m.out.base <- matchit(formula=f, data=d, method="nearest", discard="hull.control")
[1] "Preprocessing data ..."
[1] "Performing convex hull test ..."
[1] "Calculating distances ...."
[1] "Calculating the geometric variance..."
[1] "Calculating cumulative frequencies ..."
[1] "Finishing up ..."
Error in weights.matrix(match.matrix, treat, discarded) :
No units were matched
Any help will be greatly appreciated!
I think the "hull" option works well with the lalonde data, but I don't understand why it fails with my data. Can someone also please explain the exact calculation behind "hull", and whether the value of the "hull" can be extracted?
Many thanks in advance.
Best regards,
Ana
Hi,
I'm trying to implement the ideas in Harder, Stuart and Anthony (2010): to separate the estimation step from the application step, and in that way to combine various matching methods with various distance measures.
But I have a problem with the Mahalanobis distance measure in conjunction with subclassification. Since the Mahalanobis measure doesn't produce a propensity score, the distance object from matchit() is filled with NA values. In principle this is not a problem: I just have to specify method="subclass" and distance="mahalanobis" to obtain a model, even if the distance object is useless in this case.
But that model produces only one subclass instead of the five subclasses I specified. It comes with this message:
Notice :
Due to discreteness in data, fewer subclasses generated
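My guess at the mechanism, as a base R sketch rather than the actual matchit code: if the subclass boundaries are taken as quantiles of the distance measure (an assumption on my part), then a distance vector with no distinct values cannot supply five distinct cut points. The helper n_bins is hypothetical.

```r
# Hypothetical helper: number of distinct intervals that k quantile-based
# cut points can define on a distance vector.
n_bins <- function(distance, k) {
  probs  <- seq(0, 1, length.out = k + 1)
  breaks <- unique(quantile(distance, probs = probs, na.rm = TRUE))
  length(breaks) - 1
}

continuous <- seq(0.01, 1, length.out = 100)  # distinct values throughout
constant   <- rep(0.5, 100)                   # no variation at all
all_na     <- rep(NA_real_, 100)              # like my distance object

print(n_bins(continuous, 5))
print(n_bins(constant, 5))
print(n_bins(all_na, 5))
```

If that is what happens, the NA-filled distance object would be a problem after all, and not only useless.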
Thanks,
François