Dear all,
I am having problems combining method="optimal" with a caliper.
Despite my specifying a caliper, and despite matchit recognising that it cannot match
every "Treated" unit with a "Control", no "Treated" observations are discarded.
For example, my syntax:
#--
out <- matchit(formula, data = dat.final, method = "optimal",
               distance = "linear.logit", ratio = 1, caliper = 0.1,
               m.order = "random")
Warning messages:
1: In fullmatch(d, min.controls = ratio, max.controls = ratio, omit.fraction = (n0 - :
Without 'data' argument the order of the match is not guaranteed
to be the same as your original data.
2: In fullmatch.matrix(d, min.controls = ratio, max.controls = ratio, :
The problem is infeasible with the given constraints; some units were omitted to allow a
match.
summary(out)
Sample sizes:
          Control Treated
All          5783     677
Matched       677     677
Unmatched    5106       0
Discarded       0       0
#--
To reiterate, I am fine with some treated units being omitted, but none are, despite
the warning saying that some units were omitted.
When I look at the match.matrix and extract the pairs, I can see that the difference in
distances for some pairs is much greater than 0.1*sd(distance measure).
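
The pair check described above can be sketched in base R roughly as follows. The data are simulated stand-ins, not the real data: `distance` and `match.matrix` are hypothetical objects shaped like a named distance vector and a one-column match matrix (rows named by treated unit, entries naming the matched control), and the caliper width is assumed to be 0.1 standard deviations of the distance measure.

```r
# Simulated stand-ins (hypothetical data, for illustration only)
set.seed(1)
distance <- c(rnorm(5, mean = 0.5), rnorm(20, mean = 0))
names(distance) <- c(paste0("t", 1:5), paste0("c", 1:20))

# One matched control per treated unit, rows named by treated unit
match.matrix <- matrix(paste0("c", sample(20, 5)), ncol = 1,
                       dimnames = list(paste0("t", 1:5), "1"))

# Caliper width, assumed to be in standard deviations of the distance
caliper <- 0.1
width <- caliper * sd(distance)

# Absolute difference in distance within each matched pair
pair_gap <- abs(distance[rownames(match.matrix)] - distance[match.matrix[, 1]])

# Pairs whose gap exceeds the caliper width
violations <- pair_gap[pair_gap > width]
print(violations)
```

Any entry in `violations` is a pair that a strictly enforced caliper would not have allowed.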
I have also tried changing the "omit.fraction" option, with no success. Choosing
discard="control" does not work either.
Am I doing something incorrect or is there a problem?
I have also noticed that, for method="nearest" with a non-zero caliper, the
documentation says (pg 26): "if a caliper is specified, a control
unit within the caliper for a treated unit is randomly selected as the match for that
treated unit"
Is this correct? This would seem to defeat the purpose of nearest neighbour matching. For
instance, if my caliper were large and almost all distances fell within it, then I
would just be matching at random. Do they instead mean: if a caliper is specified, then the
closest control unit within the caliper is selected as the match for that treated unit?
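
To make the second reading concrete, here is a minimal base R sketch of greedy 1:1 nearest neighbour matching in which the closest available control is taken only if it lies within the caliper. It illustrates the behaviour asked about, not what matchit actually does internally; the function name `nearest_within_caliper` and its arguments are hypothetical.

```r
# Greedy 1:1 matching without replacement (illustrative sketch only).
# d_treated, d_control: distance measure for treated and control units
# caliper_width: maximum allowed absolute difference in distance
nearest_within_caliper <- function(d_treated, d_control, caliper_width) {
  available <- rep(TRUE, length(d_control))
  matched <- rep(NA_integer_, length(d_treated))
  for (i in seq_along(d_treated)) {
    gap <- abs(d_control - d_treated[i])
    gap[!available] <- Inf            # controls already used are excluded
    j <- which.min(gap)               # closest remaining control
    if (length(j) == 1 && gap[j] <= caliper_width) {
      matched[i] <- j
      available[j] <- FALSE
    }                                 # otherwise the treated unit stays NA
  }
  matched
}

# The third treated unit has no control within the caliper
result <- nearest_within_caliper(c(0.1, 0.5, 3), c(0.12, 0.45, 0.11, 0.9), 0.1)
print(result)
```

Under this reading a large caliper changes nothing about which control is chosen; it only determines whether the closest one is accepted.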
I also get an error when I use distance="linear.logit",
method="nearest", m.order="random", with caliper=0.2 (I am trying to
find 1:1 matches for 300 treated from 1600 controls). I also set verbose=TRUE to see what
is going on.
The error that comes up is:
Error in matchedc[goodmatch == clabels] <- itert :
replacement has length zero
But also with verbose=TRUE
Nearest neighbor matching...
Matching Treated: 10%...20%...30%...40%...50%...60%...70%...80%...90%...100%...
When I calculate the linear.logit distance myself using a glm and plot the
logit(propensity score), there is a large overlap. Rough calculations suggest I would
immediately be unable to find matches for 3 treated units, but it is possible to find
matches for all the others. To complicate matters, it works for some random seeds but not
others, and it works for all cases if I use m.order="largest".
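
A rough calculation of this kind might look like the sketch below. It uses simulated data and a hypothetical single covariate `x`, not the real model, and it assumes the caliper is 0.2 standard deviations of the linear predictor. The count is only a lower bound on unmatched treated units, because in 1:1 matching without replacement a control can be used up by another treated unit.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(42)
n <- 500
x <- rnorm(n)
treat <- rbinom(n, 1, plogis(-1.5 + x))

# Linear predictor of a logistic regression = logit of the propensity score
fit <- glm(treat ~ x, family = binomial)
lin_logit <- predict(fit, type = "link")

# Caliper width, assumed to be in standard deviations of the distance
width <- 0.2 * sd(lin_logit)

lt <- lin_logit[treat == 1]
lc <- lin_logit[treat == 0]

# Distance from each treated unit to its closest control
nearest_gap <- sapply(lt, function(v) min(abs(lc - v)))

# Treated units with no control at all inside the caliper
n_unmatchable <- sum(nearest_gap > width)
print(n_unmatchable)
```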
To confuse the situation further, distance="logit" works fine over 10000
simulations in 8 different scenarios!
In summary, there seem to be several bugs. Or am I missing something?
Best,
Leesa.