I am using MatchIt to do a 1:4 nearest neighbor match with 11 treatment and
about 500 potential control schools. Over the past few weeks, I have run a
number of matches as the scoring of the to-be-matched metrics has changed
slightly. The solutions yield very different matches in terms of which control
schools are selected; in some cases there is only a 25% overlap in schools from
one analysis to the next.
I suppose the instability is most likely due to the changes in the metrics,
but I also wonder whether there is a random number seed that I should set at
the start of each matching process.
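If any step of the chosen matching method does involve random tie-breaking, fixing the RNG state before each run makes the results reproducible. A minimal sketch, with illustrative object and variable names rather than the actual data:

library(MatchIt)
set.seed(20170101)  # fix the RNG state so any random tie-breaking is repeatable
m.out <- matchit(treat ~ x1 + x2 + x3, data = schools,
                 method = "nearest", ratio = 4)  # 1:4 nearest neighbor match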
Thanks
Bill
--
William N. Dudley, PhD
Professor - Public Health Education
The School of Health and Human Sciences
The University of North Carolina at Greensboro
437-L Coleman Building
Greensboro, NC 27402-6170
Visit my Web Site <http://www.uncg.edu/phe/faculty/dudley.html>
VOICE 336.256.2475
Hi everyone,
I am trying to match control cities to only one treatment city on six covariates.
As far as I understand propensity score matching, this would violate the common support assumption, since the treatment city's covariates would imply a treatment probability of 1 given those covariates (hence the warning "glm.fit: fitted probabilities numerically 0 or 1 occurred"). Therefore, I cannot apply PSM.
Using the Mahalanobis distance, I get no warning, but the balance is a lot worse than with logit.
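For concreteness, a sketch of the Mahalanobis-based call described above, with illustrative data set and covariate names rather than the actual ones:

library(MatchIt)
m.mahal <- matchit(treat ~ cov1 + cov2 + cov3 + cov4 + cov5 + cov6,
                   data = cities, method = "nearest",
                   distance = "mahalanobis")  # no propensity model is fitted here
summary(m.mahal)  # inspect covariate balance for the Mahalanobis match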
How would you suggest I proceed?
Your help is greatly appreciated.
Best regards,
Justus
--
Justus Kirchhoff
Peking University Zhongguan Xinyuan (Global Village)
(No. 126 Zhongguancun North Street, Haidian District, Beijing)
justus.kirchhoff(a)student.hu-berlin.de
Hello everyone,
First, thank you so much for your beautiful work on the MatchIt package. I use MatchIt regularly in research and am also teaching a course on Quasi-Experimental Designs and Propensity Score matching this semester. Your work has been of great help.
Second, I have a question about caliper matching. The current literature (e.g., Austin's work) recommends calipers of 0.2 standard deviations of the logit of the propensity score. In the past, I computed the standard deviation of the propensity score (m.out$distance, in the probability metric). Recently, based on the literature recommendations, I have begun computing the standard deviation on the logit of the propensity score (logit metric rather than probability metric), which makes more sense, given that the logit is linear.
However, I wondered what distance metric the MatchIt package is using to conduct the matching process (e.g., Nearest Neighbor). Is it being conducted on the propensity score metric or the logit metric? The standard deviations of the propensity score can be quite different from the standard deviation of the logit. Moreover, I struggle with whether it makes sense to even compute a standard deviation of the propensity score in the probability metric.
Any guidance would be helpful. I also noticed that the matchit code offers the option to simply state the caliper distance -- is that on the metric of the logit or the propensity score (probability metric)?
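A minimal sketch, not a statement about MatchIt's internals, of comparing the two scales and forming an Austin-style caliper on the logit, assuming m.out is a fitted matchit object whose distance measure is a propensity score:

ps <- m.out$distance             # propensity scores on the probability metric
lp <- log(ps / (1 - ps))         # the same scores on the logit metric
sd(ps); sd(lp)                   # the two standard deviations can differ substantially
caliper_logit <- 0.2 * sd(lp)    # 0.2 SD of the logit, per Austin's recommendation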
Third, when conducting Mahalanobis distance, is the matrix of Mahalanobis distances stored anywhere accessible? Depending on sample size, the matrix would have to be huge, so I'm not sure what I would even do with it, other than illustrate it to my class. (I have not peeked into the source code to check.)
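For the classroom-illustration use case, a sketch of building such a matrix by hand, which is not necessarily what MatchIt stores internally; X is assumed to be a numeric covariate matrix and z the treatment indicator:

S_inv <- solve(cov(X))                 # inverse covariance of the covariates
Xt <- X[z == 1, , drop = FALSE]        # treated rows
Xc <- X[z == 0, , drop = FALSE]        # control rows
d  <- apply(Xt, 1, function(row)       # rows of d = controls, columns = treated units
        sqrt(mahalanobis(Xc, center = row, cov = S_inv, inverted = TRUE)))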
Thank you for any help or guidance you can offer! My students also thank you.
Best,
Jeanne
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
S. Jeanne Horst, PhD
Center for Assessment and Research Studies
Associate Professor, Department of Graduate Psychology
James Madison University
1122 Lakeview Hall; 298 Port Republic Road; MSC 6806
Harrisonburg, VA 22807
Office Phone: (540) 568-7103
http://www.psyc.jmu.edu/assessment/people/horst.html
http://www.jmu.edu/assessment/
http://www.jmu.edu/learningimprovement/
"Learning to live with ambiguity is learning to live with how life really is, full of complexities and strange surprises." James Hollis
"Fall seven times, stand up eight." Japanese Proverb
"Walk to the beat of your own tuba." Dove Chocolate
Hi all, I have been using MatchIt with success to match datasets together, usually with the "nearest" method. However, I'm also running into situations where, rather than matching one dataset to another, I just want to match to a set of target criteria.
As a simple example I may want to match a group to targets such as - Male:100, Female:100, CityA: 50, CityB: 100, CityC: 50.
Often this isn't possible and I simply want to get as close as possible while still hitting the overall target (200 in above case for example).
Am I horribly over-thinking this? I've been experimenting with creating fake groups to match to, but it seems like I may be overcomplicating things.
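In case it helps to make the "fake group" idea concrete, here is a sketch under the assumption that you build a synthetic target data set with the desired marginals and match the real pool against it. All names are illustrative, and the crossing of gender and city below is arbitrary, since only the margins were specified:

library(MatchIt)
target <- data.frame(target_flag = 1,
                     gender = rep(c("Male", "Female"), each = 100),
                     city   = rep(c("CityA", "CityB", "CityC"), times = c(50, 100, 50)))
pool    <- transform(real_data, target_flag = 0)  # real_data is the group to select from
stacked <- rbind(target, pool[, c("target_flag", "gender", "city")])
m <- matchit(target_flag ~ gender + city, data = stacked, method = "nearest")

match.data(m) would then contain the 200 pool members closest to the target profile; whether the margins are actually hit depends on what the pool contains, so this is only a starting point.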
Thanks in advance,
Eric
Dear Pr. Imai,
First, I would like to thank you for your R documentation. It is very clear and very useful.
To introduce myself, I'm a physiotherapist and also a clinical research assistant; I have a French university degree in statistics, and I'm interested in using propensity scores to analyze retrospective data.
Nevertheless, despite your enlightening documentation, I still have a few unanswered questions.
We are studying Single-Event Multilevel Surgery (SEMLS), in which multiple orthopedic procedures may be combined in the same operation in order to improve the gait of cerebral palsy patients. The "independent" variable of interest is one type of surgical procedure (treatment == 1, control == 0); in fact, 8 other surgical procedures may be combined in the data set. We have many variables to explore (over 1000), but we focus on 10 variables that we hypothesize could improve after surgery (and should not be biased by the effects of the other surgical procedures). We consider propensity score matching the best way to make our two samples comparable in order to assess the effect of this specific procedure.
We tried to perform propensity score matching with the MatchIt package using method = "nearest", and we did not obtain comparable samples, probably because of some outliers. So we introduced caliper = 0.1 to limit the matching. (I don't know whether this is a correct strategy, but it works...)
We then assessed the overall effect of this procedure.
Studying the data, it seems that some patient subgroups, defined with respect to some of the parameters used in the matching, may have better outcomes. The question is:
Can we perform a subgroup analysis after PSM has been carried out on the whole population?
Or should we instead first define the candidate subgroups and then perform the matching a posteriori on those subpopulations only?
A bonus question:
What is the difference between this approach:
mod_match <- matchit(treatment ~ var1 + var2 + var3 + var4 + var5,
                     method = "nearest", data = data)
and performing a logistic regression, computing the propensity score with predict() (to create a variable called SP), and then doing:
matchit(treatment ~ SP, data = data, method = "nearest", ratio = 1)
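Spelled out, the second approach would look something like the sketch below, reusing the variable names from the question:

ps_model <- glm(treatment ~ var1 + var2 + var3 + var4 + var5,
                data = data, family = binomial(link = "logit"))
data$SP <- predict(ps_model, type = "response")  # propensity score stored as a column
mod_sp  <- matchit(treatment ~ SP, data = data, method = "nearest", ratio = 1)

One difference to keep in mind is that the second call asks matchit() to estimate a new propensity model with SP as its only predictor, so the distance it matches on is a transformation of SP rather than SP itself, and the two calls may therefore not return identical matches.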
Thank you in advance for your answers,
Best regards,
Anne-Laure GUINET
Clinical Research Assistant
Physiotherapist
anne-laure.guinet(a)fondationpoidatz.com
01 60 65 82 65
1 rue Ellen Poidatz
77310 Saint Fargeau Ponthierry
FRANCE
I'm currently experiencing what I believe to be a bug in MatchIt while using
R 3.3.2 (64-bit) on a Windows machine. In my case the full matching is not
optimal and can in fact lead to a negative balance improvement (i.e., balance
worsening) for the covariates specified.
Here is a toy example I made to illustrate the issue:
####################
library(MatchIt)
treat = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
x = c(10, 20, 30, 40, 100, 102, 11, 21, 31, 41, 101)
data = as.data.frame(cbind(treat, x))
matched = matchit(treat ~ x, data = data, distance = "mahalanobis", method = "full")
x[which( matched$subclass == 1)]
x[which( matched$subclass == 2)]
x[which( matched$subclass == 3)]
x[which( matched$subclass == 4)]
x[which( matched$subclass == 5)]
summary(matched)
####################
The optimal match in this case is clear: 10 is to be matched with 11, 20
with 21, 30 with 31, 40 with 41, and 101 with 100 and 102. This is indeed
the match given by other matchit distances (logit, log, etc.) and gives a
91% balance improvement in the covariate x.
However, with the mahalanobis distance I get a rather strange matching. For
example, 40 is matched with 11, while 102 is matched with 41. The percent
balance “improvement” is -76%.
I am wondering whether this is the intended behavior for matchit, and whether
there is a workaround to produce the matching one would expect for these data.
Best regards,
Guillaume
Dear all,
If I want to use a different distance option that is not in the package,
how do I combine it with the package?
I want to use linear regression, lm(), to calculate the propensity
scores and then do the matching.
Do I need to create a function similar to the following and use it together
with the matchit function, or is there another solution?
distance2logit <- function(formula, data, ...) {
  res <- glm(formula, data, family = binomial(logit), ...)
  return(list(model = res, distance = fitted(res)))
}
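If the goal is an lm()-based score, one possibility is to mirror that template, as sketched below; whether matchit() will actually pick up a user-supplied distance function like this depends on the MatchIt version installed, so please treat it as an assumption to verify rather than a documented API:

distance2linear <- function(formula, data, ...) {
  res <- lm(formula, data = data, ...)               # linear probability model
  return(list(model = res, distance = fitted(res)))  # fitted values used as the score
}

An alternative that sidesteps the question is to fit lm() yourself, attach the fitted values to the data as a new column, and then match on that single variable.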
Thank you so much
Wan-Yi Chou (May Chou), Master
Management Science, National Chiao Tung University
國立交通大學管理科學研究所
周宛誼 | Email: mayritaspring(a)gmail.com
Hi Ignacio, here's the simplest explanation of weights I could come up with
for one method (CEM), but it applies more generally to matching with
replacement: j.mp/CEMweights
Gary
--
Gary King - Albert J. Weatherhead III University Professor - Director,
IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org - King(a)Harvard.edu - @KingGary <https://twitter.com/kinggary> -
617-500-7570 - Assistant <king-assist(a)iq.harvard.edu>: 617-495-9271
Hi everyone,
When you do matching with replacement you have to use weights because some
observations are used multiple times. Can somebody explain what the
consequences of ignoring those weights would be when running OLS? My intuition
is that I would end up with a biased estimator. Is this correct? Is it possible
to sign the bias? Is there a paper that discusses this?
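For concreteness, a sketch of the weighted outcome regression with illustrative names, assuming m.out is a matchit object fit with replace = TRUE; match.data() returns the matched data with a weights column, and dropping the weights argument from the lm() call below is exactly the "ignoring the weights" scenario:

library(MatchIt)
md  <- match.data(m.out)  # matched data set, including a weights column
fit <- lm(outcome ~ treat, data = md, weights = weights)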
Thanks,
Ignacio
I'm sure it is mentioned (probably in our paper somewhere). The costs and
benefits are not methodological; they are more of a choice about what
quantity of interest you are willing to try to estimate.
Gary
--
Gary King - Albert J. Weatherhead III University Professor - Director,
IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org - King(a)Harvard.edu - @KingGary <https://twitter.com/kinggary> -
617-500-7570 - Assistant <king-assist(a)iq.harvard.edu>: 617-495-9271
On Wed, Jan 11, 2017 at 2:06 PM, Ignacio Martinez <ignacio82(a)gmail.com>
wrote:
> Thanks a lot, Gary. Is there any literature that talks about this case? I
> imagine that there are pluses and minuses to those approaches.
>
>
>
> On Wed, Jan 11, 2017 at 2:04 PM Gary King <king(a)harvard.edu> wrote:
>
>> one simple possibility is to switch 0s to 1s and 1s to 0s. if that
>> really won't work for you, then you could match with (a lot of)
>> replacement.
>>
>> Gary
>> --
>> Gary King - Albert J. Weatherhead III University Professor - Director,
>> IQSS <http://iq.harvard.edu/> - Harvard University
>> GaryKing.org - King(a)Harvard.edu - @KingGary
>> <https://twitter.com/kinggary> - 617-500-7570 -
>> Assistant <king-assist(a)iq.harvard.edu>: 617-495-9271
>>
>> On Wed, Jan 11, 2017 at 2:01 PM, Ignacio Martinez <ignacio82(a)gmail.com>
>> wrote:
>>
>> Hi everyone,
>>
>> Is there a paper that talks about matching when the sample has more
>> treatment observations than control observations? Is there an algorithm
>> that works better for this case? Can somebody explain to me why optimal
>> matching does not work at all in this case?
>>
>> Thanks,
>>
>> Ignacio
>>
>>
>>
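To make the label-flip suggestion in the quoted message concrete, a minimal sketch with illustrative names; note, per the point above, that the choice changes the quantity of interest being estimated:

data$flipped <- 1 - data$treat  # 0s become 1s and 1s become 0s
m_flip <- matchit(flipped ~ x1 + x2, data = data, method = "nearest")
# or keep the original labels and allow controls to be reused:
m_rep <- matchit(treat ~ x1 + x2, data = data, method = "nearest", replace = TRUE)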