Matchit

matchit@lists.gking.harvard.edu

373 discussions

Re: [matchit] MatchIt issue with mahalanobis distance and full matching

by Jonas Geldmann

Dear Guillaume,

Have you found a solution to this issue with MDM? I have been recommended to stay away from Propensity Score Matching and MDM is one of the alternatives suggested to avoid assuming complete randomization. But this does not sound too encouraging.

Best,
Jonas

_____________________________________________________
Postdoctoral Research Fellow
Conservation Science Group
Department of Zoology
University of Cambridge
The David Attenborough Building
Pembroke Street, Cambridge, CB2 3QZ
Phone: +44 7412 885 112
Danish: +45 2990 5192
Skype: jgeldmann

6 years, 8 months

Fixed subscript out of bounds using my advice from 2007

by Janet Rosenbaum

Dear all,

I fixed the subscript out of bounds problem in matchit using a solution that I discovered 10 years ago and sent to the matchit mailing list. Here is my post from 2007:

https://lists.gking.harvard.edu/pipermail/matchit/2007-September/000029.html

I have a dataset called Small without missing values. I get a subscript out of bounds error. When I create a new dataset that is exactly the same as the old dataset, matchit works. See the code below. The two data frames have the same data types and the same row names. I do not know how they differ, but for some reason only the second one works.

I can't give you this data to replicate the problem, for data confidentiality reasons, but perhaps someone else has data that creates this error.

> sum(apply(apply(Small, 1, is.na), 1, sum))
[1] 0
> model1<-matchit(formula= treatment ~ male, exact=c("black"), mahvars=c("age1"), data=Small, method="nearest", distance = "logit", caliper=0.25, replace=T, calclosest=T, ratio=3, verbose=T)
Nearest neighbor matching...
Matching Treated: Error in mahvars[pool, , drop = F] : subscript out of bounds
> dim(Small)
[1] 7756    8
>
> nrow(Small)
[1] 7756
> Small2=Small[1:nrow(Small),]
>
> model1<-matchit(formula= treatment ~ male, exact=c("black"), mahvars=c("age1"), data=Small2, method="nearest", distance = "logit", caliper=0.25, replace=T, calclosest=T, ratio=3, verbose=T)
Nearest neighbor matching...
Matching Treated: 10%...20%...30%...40%...50%...60%...70%...80%...90%...100%...Done
> is.data.frame(Small)
[1] TRUE
> is.data.frame(Small2)
[1] TRUE
> is.matrix(Small)
[1] FALSE
> is.matrix(Small2)
[1] FALSE
> rownames(Small)[1:4]
[1] "1" "2" "3" "4"
> rownames(Small2)[1:4]
[1] "1" "2" "3" "4"

Janet

Janet Rosenbaum, Ph.D.
Assistant Professor of Epidemiology
School of Public Health, SUNY Downstate Medical Center, Brooklyn, NY
janet(a)post.harvard.edu

6 years, 9 months
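The post above reports two data frames that print the same but behave differently. A minimal base R sketch of one way to look for hidden differences is below. The data and the helper name `compare_frames` are hypothetical, made up for illustration; this does not reproduce or diagnose the poster's confidential data, and the cause of the error in that case is not stated in the thread.

```r
# Hypothetical stand-in data: filtering rows keeps the original row names,
# so a filtered frame can differ from a "fresh" copy only in its attributes.
full <- data.frame(treatment = c(0, 1, 0, 1, 0, 1),
                   age1      = c(15, 16, 15, 17, 16, 15))
Small  <- full[full$age1 < 17, ]   # drops the fourth row, keeps row names
Small2 <- Small
rownames(Small2) <- NULL           # reset row names to 1..n

# Hypothetical helper: compare values, row names, and column classes separately.
compare_frames <- function(a, b) {
  list(
    same_values    = identical(as.list(a), as.list(b)),
    same_row_names = identical(rownames(a), rownames(b)),
    same_classes   = identical(lapply(a, class), lapply(b, class)),
    identical      = identical(a, b)
  )
}

res <- compare_frames(Small, Small2)
str(res)
```

Checking only the first few row names, as in the transcript above, can miss a difference further down; comparing the full vectors and the remaining attributes (for example with `attributes(Small)`) may be more informative.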

Error in mahvars[pool, , drop = F] : subscript out of bounds

by Janet Rosenbaum

Dear all,

I'm another person who gets the following error with the Mahalanobis matching option for matchit:

Error in mahvars[pool, , drop = F] : subscript out of bounds

I have done Mahalanobis matching a thousand times before without problems. Now I've exported a slightly different version of the data: slightly different sample size, slightly different set of variables. Running one matchit command with the first version of the data is fine. Running the exact same matchit command with the second version of the data gives the error; if I take out the mahvars option, the command runs fine. Which is unfortunate because Mahalanobis matching seems to work with the earlier version of the data.

An earlier poster said that the row names may be different, but in my case, I don't have row names for either dataset. Both datasets are csv files exported from Stata with the same command, with only slight variations. Stata version is the same. I did upgrade R today from 3.4.0 to 3.4.1.

I replicated the problem by exporting a small version of my data file with only 8 variables: I can do nearest-neighbor matching without the mahvars option, but it doesn't work if I specify the option. I can do full matching with this data.

Any ideas how to proceed or how I can figure out what might be causing the problem? I tried debugging the data and looked at the variable pool noted in the error message, but I didn't know where to go from there.

I recall having had a student in my class of 17 last spring who was the only student in the class who had this error, and I wasn't able to help them either; the student just used full matching for their term paper.

> dim(Small)
[1] 7756    8
> names(Small)
[1] "treatment" "male"      "age1"      "black"     "gonorrhea" "chlamydia" "tricho"
[8] "any_sti3"
> model1<-matchit(formula= treatment ~ male, exact=c("black"), mahvars=c("age1"), data=Small, method="nearest", distance = "logit", caliper=0.25, replace=T, calclosest=T, ratio=3, verbose=T)
Nearest neighbor matching...
Matching Treated: Error in mahvars[pool, , drop = F] : subscript out of bounds
> model0<-matchit(formula= treatment ~ male + age1, exact=c("black"), data=Small, method="nearest", distance = "logit", caliper=0.25, replace=T, calclosest=T, ratio=3, verbose=T)
Nearest neighbor matching...
Matching Treated: 10%...20%...30%...40%...50%...60%...70%...80%...90%...100%...Done
> model0

Call:
matchit(formula = treatment ~ male + age1, data = Small, method = "nearest", distance = "logit", exact = c("black"), caliper = 0.25, replace = T, calclosest = T, ratio = 3, verbose = T)

Sample sizes:
          Control Treated
All          7375     381
Matched      1061     381
Unmatched    6314       0
Discarded       0       0

>
> model.full<-matchit(formula= treatment ~ male + priv_ah + black + age1, data=Small, method="full", verbose=T)
Error in eval(predvars, data, env) : object 'priv_ah' not found
> model.full

Call:
matchit(formula = suspended_lastyr2 ~ male + priv_ah + black + age1, data = Small, method = "full", verbose = T)

Sample sizes:
          Control Treated
All          7375     381
Matched      7375     381
Discarded       0       0

Thanks,
Janet

Janet Rosenbaum, Ph.D.
Assistant Professor of Epidemiology
School of Public Health, SUNY Downstate Medical Center, Brooklyn, NY
janet(a)post.harvard.edu

6 years, 9 months

Re: [matchit] Please help me : Error in matchit () Missing values exist in the data

by Kosuke Imai

Will do.

Kosuke Imai
Professor, Department of Politics
Center for Statistics and Machine Learning
Princeton University
http://imai.princeton.edu

On Thu, Jun 22, 2017 at 6:08 PM, kgmacau <kgmacau(a)163.com> wrote:

> Dear Prof. Imai,
>
> Many thanks. Running sum(is.na(real_data)) returns a positive number, which indicates that a variable has missing values. However, I do not use this variable in the matchit function, yet it still affects the function. Would it be possible to remove this limitation in a future update of the package?
>
> Thanks & Regards,
> Ning
>
> At 2017-06-22 13:49:33, "Kosuke Imai" <kimai(a)princeton.edu> wrote:
>
> Hi Ning,
>
> Please send your inquiry to the matchit mailing list, to which I'm forwarding your email. As for your question, can you try sum(is.na(real_data))? If you get a positive number, it does mean that you have missing data. "subclass" tells you which subclass each observation belongs to (after full matching).
>
> Good luck,
> Kosuke
>
> Kosuke Imai
> Professor, Department of Politics
> Center for Statistics and Machine Learning
> Princeton University
> http://imai.princeton.edu
>
> On Wed, Jun 21, 2017 at 1:17 AM, kgmacau <kgmacau(a)163.com> wrote:
>
>> Dear Prof. Ho and Imai,
>>
>> I am implementing the propensity score matching method for my research and came across the error below when running the matchit function:
>>
>> library(MatchIt)
>> result <- matchit(group~sex+age+primary_disease
>>                   +kps+rpa+gpa+primary_control
>>                   +extracranial_metastasis
>>                   +extracranial_metastasis_control
>>                   +primary_treatment
>>                   +past_treatment
>>                   +N_brain_metastases,
>>                   data=real_data,
>>                   method = "full")
>>
>> Error in matchit(group ~ sex + age + primary_disease + kps + rpa + gpa + :
>>   Missing values exist in the data
>>
>> I checked my data and found no missing values, and I searched for the same error online but did not find any solution. So I do not know why this error occurred. I would highly appreciate it if you could tell me how to solve this error.
>>
>> Besides, we can present the matched results in data via match.data(result), in which the argument "subclass" specifies the variable name used to store the subclass indicator. Does this argument indicate the matched information for subjects? If not, what argument in what function can present the matched subject lists? Thank you very much and look forward to your reply.
>>
>> Thanks & Regards,
>>
>> Ning Li
>> Programmer analyst
>> PPD Biostatistics Department, Beijing, China
>> Email: ning.li(a)ppdi.com
>> Tel: 86 17701386979

6 years, 10 months
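The exchange above describes a missing value in a column that is not part of the matching formula still triggering the "Missing values exist in the data" error. A minimal base R sketch of one workaround is to find which columns contain missing values and pass only the formula's variables as `data`. The data frame contents, the `notes` column, and the name `model_data` are hypothetical; whether later MatchIt releases changed this behaviour is not stated in the thread.

```r
# Hypothetical data: `notes` has a missing value but is not used for matching.
real_data <- data.frame(
  group = c(1, 0, 1, 0),
  sex   = c("F", "M", "F", "M"),
  age   = c(61, 58, 70, 66),
  notes = c("a", NA, "c", "d")
)

# Count missing values per column rather than over the whole data frame.
na_per_column <- colSums(is.na(real_data))
print(na_per_column[na_per_column > 0])

# Keep only the variables named in the matching formula.
f <- group ~ sex + age
model_data <- real_data[, all.vars(f), drop = FALSE]
print(sum(is.na(model_data)))
```

`model_data` could then be supplied as the `data` argument in place of `real_data`. Keeping the original row order makes it straightforward to join other columns back on afterwards.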

Re: [matchit] Please help me : Error in matchit () Missing values exist in the data

by Kosuke Imai

Hi Ning,

Please send your inquiry to the matchit mailing list, to which I'm forwarding your email. As for your question, can you try sum(is.na(real_data))? If you get a positive number, it does mean that you have missing data. "subclass" tells you which subclass each observation belongs to (after full matching).

Good luck,
Kosuke

Kosuke Imai
Professor, Department of Politics
Center for Statistics and Machine Learning
Princeton University
http://imai.princeton.edu

On Wed, Jun 21, 2017 at 1:17 AM, kgmacau <kgmacau(a)163.com> wrote:

> Dear Prof. Ho and Imai,
>
> I am implementing the propensity score matching method for my research and came across the error below when running the matchit function:
>
> library(MatchIt)
> result <- matchit(group~sex+age+primary_disease
>                   +kps+rpa+gpa+primary_control
>                   +extracranial_metastasis
>                   +extracranial_metastasis_control
>                   +primary_treatment
>                   +past_treatment
>                   +N_brain_metastases,
>                   data=real_data,
>                   method = "full")
>
> Error in matchit(group ~ sex + age + primary_disease + kps + rpa + gpa + :
>   Missing values exist in the data
>
> I checked my data and found no missing values, and I searched for the same error online but did not find any solution. So I do not know why this error occurred. I would highly appreciate it if you could tell me how to solve this error.
>
> Besides, we can present the matched results in data via match.data(result), in which the argument "subclass" specifies the variable name used to store the subclass indicator. Does this argument indicate the matched information for subjects? If not, what argument in what function can present the matched subject lists? Thank you very much and look forward to your reply.
>
> Thanks & Regards,
>
> Ning Li
> Programmer analyst
> PPD Biostatistics Department, Beijing, China
> Email: ning.li(a)ppdi.com
> Tel: 86 17701386979

6 years, 11 months

Different "methods" per variable

by Jonas Geldmann

Dear mailing list,

I have a question about MatchIt that might be very naïve or simple, but I am very new to the package. I am matching on a series of variables, some continuous (e.g. temperature and elevation) and some categorical (country and landcover class). As a result, I need method = "exact" for some variables (e.g. country) and method = "nearest", distance = "logit" for others (e.g. temperature). I am not sure whether I can specify different methods for different variables in the same matchit call.

Thank you in advance.

Sincerely,
Jonas

_____________________________________________________
Postdoctoral Research Fellow
Conservation Science Group
Department of Zoology
University of Cambridge
Downing Street, Cambridge CB2 3EJ
Phone: +44 7412 885 112
Danish: +45 2990 5192
Skype: jgeldmann

6 years, 12 months

Re: [matchit] MatchIt Package

by Kosuke Imai

Try creating larger strata.

Kosuke Imai
Professor, Department of Politics
Center for Statistics and Machine Learning
Princeton University
http://imai.princeton.edu

On Mon, May 15, 2017 at 12:30 PM, Guogen Shan <guogen.shan(a)unlv.edu> wrote:

> Dear Prof. Imai,
>
> Thanks for your prompt response. This strata approach works great when the control group has at least one subject having the same age as one subject from the treatment group. The control group in our study is not large enough to meet that assumption. Any comments? Thank you.
>
> Guogen
>
> On May 15, 2017 6:52 AM, "Kosuke Imai" <kimai(a)princeton.edu> wrote:
>
>> You can stratify the age variable and then do a matching with an exact restriction on that strata.
>>
>> Kosuke Imai
>> Professor, Department of Politics
>> Center for Statistics and Machine Learning
>> Princeton University
>> http://imai.princeton.edu
>>
>> On Sun, May 14, 2017 at 11:58 PM, Guogen Shan <guogen.shan(a)unlv.edu> wrote:
>>
>>> Dear Prof. Imai,
>>>
>>> Thanks for developing the MatchIt package. I have a data set from a treatment, and I want to match it with another data base (control), by age, race, education. Age is the primary match criterion, followed by race and education. In other words, I want to give more weight to the age difference than to the race difference. Any suggestion how to use your package for this type of matching?
>>>
>>> Thank you.
>>>
>>> Guogen

7 years
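The advice in this thread is to stratify age and, when some treated strata have no controls, to make the strata larger. A minimal base R sketch of that idea is below. The ages, the assumed 0 to 100 age range, and the helper names `age_stratum` and `unmatched_strata` are all hypothetical, chosen only for illustration.

```r
# Hypothetical ages for a small treated group and a small control pool.
treated_age <- c(23, 37, 41, 58)
control_age <- c(25, 33, 44, 52, 60)

# Bin ages into left-closed intervals of a given width (assumes ages in 0-100).
age_stratum <- function(age, width) {
  cut(age, breaks = seq(0, 100, by = width), right = FALSE)
}

# Treated strata that contain no control subject at a given bin width.
unmatched_strata <- function(width) {
  tr <- as.character(unique(age_stratum(treated_age, width)))
  ct <- as.character(unique(age_stratum(control_age, width)))
  setdiff(tr, ct)
}

narrow <- unmatched_strata(5)    # 5-year bins
wide   <- unmatched_strata(10)   # 10-year bins
print(narrow)
print(wide)
```

With these made-up ages, 5-year bins leave some treated strata without any control, while 10-year bins leave none. The resulting stratum factor could then be stored as a column and named in the `exact` argument, in the same way `exact = c("black")` is used elsewhere in this archive.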

Re: [matchit] MatchIt Package

by Kosuke Imai

You can stratify the age variable and then do a matching with an exact restriction on that strata.

Kosuke Imai
Professor, Department of Politics
Center for Statistics and Machine Learning
Princeton University
http://imai.princeton.edu

On Sun, May 14, 2017 at 11:58 PM, Guogen Shan <guogen.shan(a)unlv.edu> wrote:

> Dear Prof. Imai,
>
> Thanks for developing the MatchIt package. I have a data set from a treatment, and I want to match it with another data base (control), by age, race, education. Age is the primary match criterion, followed by race and education. In other words, I want to give more weight to the age difference than to the race difference. Any suggestion how to use your package for this type of matching?
>
> Thank you.
>
> Guogen

7 years

Calculate ATE

by 周宛誼

Dear all,

May I ask a question about the MatchIt package (version 2.4-22) in R (3.3.2)? When I try to calculate the ATE with the code below, it doesn't work. Could you help me solve the problem?

When I run

> s.out1$qi$att.ev

I get the error:

Error in s.out1$qi$att.ev : object of type 'closure' is not subsettable

I think it cannot find the qi object in s.out1 because it is a function? So I can't get the att.ev object from qi? Do you have any suggestions for calculating the ATE?

The code is below:

m.out1 <- matchit(treat ~ age + educ + black + hispan + nodegree + married + re74 + re75, method = "nearest", data = lalonde)

z.out1 <- zelig(re78 ~ age + educ + black + hispan + nodegree + married + re74 + re75, data = match.data(m.out1, "control"), model = "ls")
x.out1 <- setx(z.out1, data = match.data(m.out1, "treat"), cond = TRUE)
s.out1 <- sim(z.out1, x = x.out1)

z.out2 <- zelig(re78 ~ age + educ + black + hispan + nodegree + married + re74 + re75, data = match.data(m.out1, "treat"), model = "ls")
x.out2 <- setx(z.out2, data = match.data(m.out1, "control"), cond = TRUE)
s.out2 <- sim(z.out2, x = x.out2)

ate.all <- c(s.out1$qi$att.ev, -s.out2$qi$att.ev)

Thank you so much.

Best,
Wan-Yi Chou

Wan-Yi Chou (May Chou), Master
Management Science, National Chiao Tung University
國立交通大學管理科學研究所
周宛誼 | Email mayritaspring(a)gmail.com

7 years

Match gets worse with fewer covariates

by William Dudley

I am using MatchIt nearest and mahalanobis distance to match 11 treatment schools with a pool of candidate schools with a 1:4 match.

My question: if I exclude school size from the match, my std. differences in the matched sample are higher on a few covariates than if I leave in size. I would think that fewer covariates should lead to better matching on the remaining covariates.

I provide the syntax and output for the match with and without size. The first analysis provides a good match (all std. differences < .25), but when I drop size as a covariate some std. differences rise above .25.

I apologize for the lengthy output but I wanted to be sure that I provided all information.

Thanks,
Bill

> x2.out <- matchit(tx ~ schlvl + collcred + colltake +
+                   drop + PovPct + racePct + size +
+                   Znat + Zstate,
+                   data = MI317, method = "nearest",
+                   exact = c("schlvl"), ratio=4, Distance ="mahalanobis")
>
> x2.data <- match.data(x2.out)
>
> summary(x2.out, standardize=TRUE)

Call:
matchit(formula = tx ~ schlvl + collcred + colltake + drop + PovPct + racePct + size + Znat + Zstate, data = MI317, method = "nearest", exact = c("schlvl"), ratio = 4, Distance = "mahalanobis")

Summary of balance for all data:
         Means Treated Means Control SD Control Std. Mean Diff. eCDF Med eCDF Mean eCDF Max
distance        0.0378        0.0213     0.0209          0.6086   0.2245    0.2219   0.4085
schlvl          0.8182        0.7581     0.4287          0.1486   0.0301    0.0301   0.0601
collcred        0.3208        0.4780     0.6604         -0.3593   0.0794    0.0792   0.1565
colltake        0.1534        0.2005     0.1830         -0.2834   0.0867    0.0903   0.2227
drop            0.0178        0.0149     0.0229          0.1530   0.0602    0.0764   0.2412
PovPct          0.4701        0.4663     0.2008          0.0261   0.0523    0.0634   0.1853
racePct         0.2985        0.2208     0.2946          0.2439   0.0654    0.0802   0.2366
size          622.0909      704.9556   489.4964         -0.2255   0.0504    0.0581   0.1569
Znat           -0.3471        0.0094     0.9994         -0.4011   0.0806    0.0845   0.2110
Zstate         -0.2346        0.0072     1.0008         -0.2900   0.0927    0.0979   0.2700

Summary of balance for matched data:
         Means Treated Means Control SD Control Std. Mean Diff. eCDF Med eCDF Mean eCDF Max
distance        0.0378        0.0382     0.0270         -0.0127   0.0227    0.0376   0.1136
schlvl          0.8182        0.8182     0.3902          0.0000   0.0000    0.0000   0.0000
collcred        0.3208        0.3932     0.6002         -0.1656   0.0455    0.0597   0.1364
colltake        0.1534        0.1537     0.1405         -0.0024   0.0455    0.0622   0.1818
drop            0.0178        0.0165     0.0236          0.0674   0.0455    0.0838   0.2955
PovPct          0.4701        0.4376     0.1763          0.2193   0.0909    0.0860   0.2045
racePct         0.2985        0.2274     0.3155          0.2231   0.0682    0.0818   0.2045
size          622.0909      548.7045   374.6600          0.1997   0.0909    0.1070   0.3182
Znat           -0.3471       -0.2540     0.9097         -0.1047   0.0455    0.0553   0.1591
Zstate         -0.2346       -0.1521     0.9579         -0.0989   0.0455    0.0628   0.1818

Percent Balance Improvement:
         Std. Mean Diff.  eCDF Med eCDF Mean  eCDF Max
distance         97.9195   89.8776   83.0534   72.1848
schlvl          100.0000  100.0000  100.0000  100.0000
collcred         53.9154   42.7252   24.5279   12.8806
colltake         99.1641   47.5687   31.0922   18.3539
drop             55.9415   24.5053   -9.6510  -22.4924
PovPct         -739.5158  -73.7303  -35.4728  -10.3858
racePct           8.5286   -4.2017   -2.0508   13.5554
size             11.4384  -80.3636  -84.1409 -102.8037
Znat             73.8822   43.6364   34.5807   24.5873
Zstate           65.8771   50.9881   35.8666   32.6544

Sample sizes:
          Control Treated
All           496      11
Matched        44      11
Unmatched     452       0
Discarded       0       0

*************************************
Matchit without school size
NOTE racePct and colltake get worse

Call:
matchit(formula = tx ~ schlvl + collcred + colltake + drop + PovPct + racePct + Znat + Zstate, data = MI317, method = "nearest", exact = c("schlvl"), ratio = 4, Distance = "mahalanobis")

Summary of balance for all data:
         Means Treated Means Control SD Control Std. Mean Diff. eCDF Med eCDF Mean eCDF Max
distance        0.0378        0.0213     0.0208          0.6133   0.2183    0.2186   0.3990
schlvl          0.8182        0.7581     0.4287          0.1486   0.0301    0.0301   0.0601
collcred        0.3208        0.4780     0.6604         -0.3593   0.0794    0.0792   0.1565
colltake        0.1534        0.2005     0.1830         -0.2834   0.0867    0.0903   0.2227
drop            0.0178        0.0149     0.0229          0.1530   0.0602    0.0764   0.2412
PovPct          0.4701        0.4663     0.2008          0.0261   0.0523    0.0634   0.1853
racePct         0.2985        0.2208     0.2946          0.2439   0.0654    0.0802   0.2366
Znat           -0.3471        0.0094     0.9994         -0.4011   0.0806    0.0845   0.2110
Zstate         -0.2346        0.0072     1.0008         -0.2900   0.0927    0.0979   0.2700

Summary of balance for matched data:
         Means Treated Means Control SD Control Std. Mean Diff. eCDF Med eCDF Mean eCDF Max
distance        0.0378        0.0371     0.0247          0.0265   0.0455    0.0488   0.1364
schlvl          0.8182        0.8182     0.3902          0.0000   0.0000    0.0000   0.0000
collcred        0.3208        0.2884     0.3777          0.0740   0.0455    0.0506   0.1591
colltake        0.1534        0.1282     0.1294          0.1513   0.0682    0.0750   0.1591
drop            0.0178        0.0185     0.0269         -0.0370   0.0455    0.0554   0.1818
PovPct          0.4701        0.4500     0.1593          0.1358   0.0682    0.0694   0.1818
racePct         0.2985        0.2023     0.2746          0.3017   0.0682    0.0868   0.2500
Znat           -0.3471       -0.2889     0.7923         -0.0655   0.0455    0.0559   0.1818
Zstate         -0.2346       -0.1612     0.7852         -0.0880   0.0455    0.0591   0.1591

Percent Balance Improvement:
         Std. Mean Diff.  eCDF Med eCDF Mean  eCDF Max
distance         95.6857   79.1772   77.6937   65.8245
schlvl          100.0000  100.0000  100.0000  100.0000
collcred         79.4135   42.7252   36.0128   -1.6393
colltake         46.6206   21.3531   16.9131   28.5597
drop             75.8229   24.5053   27.5071   24.6201
PovPct         -419.9426  -30.2977   -9.4203    1.8793
racePct         -23.7074   -4.2017   -8.2357   -5.6545
Znat             83.6764   43.6364   33.8539   13.8141
Zstate           69.6610   50.9881   39.6640   41.0726

Sample sizes:
          Control Treated
All           496      11
Matched        44      11
Unmatched     452       0
Discarded       0       0

--
William N. Dudley, PhD
Professor - Public Health Education
The School of Health and Human Sciences
The University of North Carolina at Greensboro
437-L Coleman Building
Greensboro, NC 27402-6170
Visit my Web Site <http://www.uncg.edu/phe/faculty/dudley.html>
VOICE 336.256 2475

7 years
