Matchit

matchit@lists.gking.harvard.edu

373 discussions

by Lawrence, Emily

Thanks for your interest in MatchIt! We have replaced this mailing list with a GitHub issues forum, so if you have a question, bug report, or anything else, please post an issue at https://github.com/kosukeimai/MatchIt/issues. Please also see the new MatchIt website, https://kosukeimai.github.io/MatchIt/index.html.

Best,
Dan Ho, Kosuke Imai, Gary King, Liz Stuart, and Noah Greifer

3 years, 3 months

Is the matched sample from MatchIt probabilistic?

by Hong Chen

Hello,

I am very interested in applying MatchIt to a health study, but I noticed that each time I ran matchit() using nearest neighbor matching with a caliper of 0.2 and with replacement, the resulting matched set had a slightly different number of matched treated and control subjects, even though the PS model was the same. I expected the matched set to be the same every time, but maybe my intuitions about PS matching are not very good. To illustrate this, I have included R code below:

====
library(MatchIt)
m.out <- matchit(treat ~ re74 + re75 + educ + factor(race) + age,
                 data = lalonde, method = "nearest", ratio = 3,
                 caliper = 0.2, replacement = TRUE)
m.out

# Run 1
# Sample sizes:
#           Control Treated
# All           429     185
# Matched       178     115
# Unmatched     251      70
# Discarded       0       0

# Run 2
# Sample sizes:
#           Control Treated
# All           429     185
# Matched       178     116
# Unmatched     251      69
# Discarded       0       0
====

Thank you,
Hong
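A minimal sketch for testing whether the differences between runs come from random number generation, assuming MatchIt 4.x (where `lalonde` contains a `race` variable). Two hedged notes: `matchit()` documents the argument for matching with replacement as `replace`, so `replacement = TRUE` in the post may have had no effect; and if any step of the matching makes random choices, calling `set.seed()` before each run should make the output reproducible. The helper name `run_match()` and the seed value 1234 are arbitrary choices for illustration.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Run the specification from the post after resetting R's random seed.
# Note: the documented argument is `replace`, not `replacement`.
run_match <- function(seed) {
  set.seed(seed)  # makes any random choices during matching repeatable
  matchit(treat ~ re74 + re75 + educ + factor(race) + age,
          data = lalonde, method = "nearest", ratio = 3,
          caliper = 0.2, replace = TRUE)
}

m1 <- run_match(1234)
m2 <- run_match(1234)

# With the same seed, both runs should select the same controls.
identical(m1$match.matrix, m2$match.matrix)

# Counts of matched (weight > 0) units by treatment group.
table(treat = lalonde$treat, matched = m1$weights > 0)
```

Comparing `run_match(1)` with `run_match(2)` would then indicate whether the seed alone is enough to change the matched counts.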

3 years, 8 months

Re: [matchit] Possibly unexpected behavior using matchit for cem

by Gary King

Hi Alberto, this is an interesting discovery! I don't recall intending this, and so we should figure out why we did this. Can I ask that you send this to the CEM mailing list at GaryKing.org/cem so we can include Stefano and Beppe in the conversation?

Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director, IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org <http://garyking.org/> - King(a)Harvard.edu - @KingGary <https://twitter.com/kinggary> - 617-500-7570 - Assistant <king-assist(a)iq.harvard.edu>: 617-495-9271

On Sat, Jun 6, 2020 at 10:58 AM Alberto Palomar <alberto.palomares.ny(a)gmail.com> wrote:
> Hello,
> Firstly thank you for this very useful package. [...]

3 years, 11 months

Possibly unexpected behavior using matchit for cem

by Alberto Palomar

Hello,

Firstly, thank you for this very useful package. I came to it via the CEM package while exploring how some versions of PSM would compare on clinical data we are analyzing to retrospectively estimate how effective a certain therapy is for moderately severe hospitalized Covid-19 cases. While doing this, I ran into some unexpected behavior depending on whether the treatment variable is coded as an integer or as a factor, which I do not think I understand after reading the package documentation, vignettes, and the explanation of weights (at https://docs.google.com/document/d/1xQwyLt_6EXdNpA685LjmhjO20y5pZDZYwe2qeNo…).

Briefly, when the treatment variable is a factor, the output of matchit with the cem method appears to use the opposite convention for assigning weights [i.e., *un*treated weights are 1 and treated weights are presumably (m_C/m_T)*(m^s_T/m^s_C)] from what comes out of cem directly (regardless of whether the treatment variable is an integer or a factor in the cem input data). As a result, after applying match.data() to the matchit outputs, the results of subsequent weighted regressions (I have tested lm and glm) also differ depending on whether the treatment variable is coded as an integer or as a factor. Is this expected behavior?

Here is a somewhat minimal example using the LaLonde data, with apologies for its inelegance:

####
library(cem)
library(MatchIt)

data("LeLonde")

# set up a data frame with treatment as integer (Le) and another with
# treatment.f as factor (Le.f)
Le <- data.frame(na.omit(LeLonde))
Le.f <- Le
Le.f$treated <- as.factor(Le.f$treated)
colnames(Le.f)[which(colnames(Le.f) == 'treated')] <- 'treated.f'

# for simplicity, match only on age, re74, and q1; here make lists of
# the other variables to drop in the cem commands
LeLonde_vars_to_match <- c("age", "re74", "q1")
LeLonde_vars_to_drop <- setdiff(names(Le), LeLonde_vars_to_match)
LeLonde_vars_to_drop.f <- setdiff(names(Le.f), LeLonde_vars_to_match)

# use cem directly to match treated/untreated on age, re74, and q1;
# first with treated as integer, second with treated.f as factor
mat.cem <- cem(treatment = "treated", data = Le, drop = LeLonde_vars_to_drop,
               eval.imbalance = TRUE, keep.all = TRUE)
mat.f.cem <- cem(treatment = "treated.f", data = Le.f, drop = LeLonde_vars_to_drop.f,
                 eval.imbalance = TRUE, keep.all = TRUE)

identical(mat.cem$w, mat.f.cem$w)
# TRUE: weights from cem do not depend on whether the treatment
# variable is integer or factor

# use matchit with method "cem" to match treated/untreated on age, re74,
# and q1; first with treated as integer, second with treated.f as factor
mat.mi <- matchit(treated ~ age + re74 + q1, Le, method = "cem")
mat.f.mi <- matchit(treated.f ~ age + re74 + q1, Le.f, method = "cem")

identical(mat.mi$weights, mat.f.mi$weights)
# FALSE: weights from matchit with method "cem" do depend on whether the
# treatment variable is integer or factor, seemingly through a different
# choice of whether control or treated weights are set to 1. The same
# observations are selected by the match, as suggested by
# plot(match.data(mat.mi)$re78, match.data(mat.f.mi)$re78) being linear
# with slope 1.

identical(mat.cem$w, mat.mi$weights)
# TRUE: the convention from cem coincides with matchit's method "cem"
# when the treatment variable is integer

# generate matched datasets, first with treated as integer, second with
# treated.f as factor
matched.data.mi <- match.data(mat.mi)
matched.data.f.mi <- match.data(mat.f.mi)

# fit linear models for re78 ~ treated, first with treated as integer,
# second with treated.f as factor
mod.mi <- lm(re78 ~ treated, matched.data.mi, weights = matched.data.mi$weights)
mod.f.mi <- lm(re78 ~ treated.f, matched.data.f.mi, weights = matched.data.f.mi$weights)
identical(mod.mi$coefficients["treated"], mod.f.mi$coefficients["treated.f1"])
# FALSE: the estimates depend on whether the treatment variable is
# integer or factor

mod.mi.g <- glm(re78 ~ treated, data = matched.data.mi, weights = matched.data.mi$weights)
mod.f.mi.g <- glm(re78 ~ treated.f, data = matched.data.f.mi, weights = matched.data.f.mi$weights)
identical(mod.mi.g$coefficients["treated"], mod.f.mi.g$coefficients["treated.f1"])
# FALSE: the estimates depend on whether the treatment variable is
# integer or factor
####

There is a separate peculiarity when using cem directly with the treatment variable as a factor: the att() command fails with error messages saying that treated.f is not a factor. I have not delved into that as much yet, and if it resists further analysis I might post it to the CEM mailing list.

In the short term, I think a reasonable solution is simply not to cast the treatment variable as a factor in matchit when using cem (I am sorry if this was advised somewhere in the documentation and I missed it), but it might be worth considering whether matchit should tolerate a factor treatment variable as well (or raise a runtime warning about possibly different output).

Thank you,
Alberto
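Alberto's short-term workaround, passing matchit() a numeric 0/1 treatment instead of a factor, can be made explicit with a small helper. This is a sketch: `to_binary_treat()` is a hypothetical name, not part of MatchIt or cem, and it assumes the treated group is identified by a known label such as "1". The block also shows a separate pitfall in the comparisons above: `identical()` compares names as well as values, so coefficients named `treated` and `treated.f1` can never be identical, even when the numbers agree.

```r
# Hypothetical helper (not part of MatchIt or cem): recode a two-valued
# treatment variable to an explicit 0/1 integer vector.
to_binary_treat <- function(x, treated_level) {
  x_chr <- as.character(x)  # use the labels, not factor level codes
  if (anyNA(x_chr)) stop("treatment contains missing values")
  if (length(unique(x_chr)) != 2L) stop("treatment must have exactly two values")
  if (!treated_level %in% x_chr) stop("treated_level not found: ", treated_level)
  as.integer(x_chr == treated_level)
}

f <- factor(c("0", "1", "1", "0"))
to_binary_treat(f, treated_level = "1")  # 0 1 1 0
as.integer(f)                            # 1 2 2 1 (level codes, not labels)

# identical() also compares names, so differently named coefficients
# are never identical; compare unnamed values with all.equal() instead.
b_int <- c(treated = 1.5)
b_fac <- c(treated.f1 = 1.5)
identical(b_int, b_fac)                          # FALSE
isTRUE(all.equal(unname(b_int), unname(b_fac)))  # TRUE
```

Applied to the post's objects, this would look like `Le.f$treated.f <- to_binary_treat(Le.f$treated.f, "1")` before calling matchit(). Going through `as.character()` matters because `as.integer()` on a factor returns its internal codes (1 and 2), not the original 0/1 labels.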

3 years, 11 months

Fwd: Doubt MatchIt function

by JESUS HERNANDO SARRIA PEDROZA

Hello, I'm new to this mailing list. I have a question about the MatchIt results. Below you will find a description of my question; I would be very grateful if you could help me. Lastly, how can I save a MatchIt output like this one (match_model) to my hard disk?

match_model <- matchit(IO ~ AGE + SIZE2008 + EBTA2008 + R_LTEB2008 + R_TASH2008 + R_EBOP2008 + R_CLTA2008, data = data123.psm_nm, method = "nearest", distance = "mahalanobis", discard = "both", ratio = 1, caliper = 0.25)

Thank you so much.
JSarria

---------- Forwarded message ---------
From: JESUS HERNANDO SARRIA PEDROZA <jsarria(a)ucm.es>
Date: Thu, Mar 26, 2020 at 14:17
Subject: Re: Doubt MatchIt function
To: King, Gary <king(a)harvard.edu>

Thank you Dr. King. The attached dataset contains financial variables such as leverage, size, age, EBITDA margin, and debt quality, plus the variable IO, which identifies treated (1) and control (0) units. DD_OPRE0812 is the outcome: the difference-in-differences of income between 2008 and 2012, in percent, used to measure the impact of the treatment (a soft loan). MN_OPRE0608 is the average income between 2006 and 2008, used to test the parallel-trends assumption required for difference-in-differences. The treatment year is 2009.

I ran an "lm" model (in the attached R script) with the IO variable and the covariates used in the matching preprocessing to calculate the impact. The coefficient on IO is the impact, as in the conclusion section of your paper MatchingFrontier: Automated Matching for Causal Inference.

To check the results, please run steps 2 and 3 twice and compare the output: they are different. Section 4 contains some results that I obtained. P.S.: Please update the path used to load the dataset.

Finally, I used MatchingFrontier with this dataset but did not get good balance, which is why I am trying Mahalanobis distance.

Thanks again Dr. King. I look forward to your reply.
Best regards,
JSarria

On Thu, Mar 26, 2020 at 12:31, Gary King (<thegaryking(a)gmail.com>) wrote:
> Hi Jesus, thanks for your note. Why don't you send a note to the matching email list with an explanation of exactly what you ran, perhaps with some code, and what you are seeing, and we or someone will help you figure it out.
> Best,
> Gary
>
> On Thu, Mar 26, 2020 at 5:38 AM JESUS HERNANDO SARRIA PEDROZA <jsarria(a)ucm.es> wrote:
>> Dear Dr. King, I hope you are healthy and managing well despite the situation.
>>
>> I was in contact with you some months ago. Could you please help me with a question? It is about the matchit function, specifically the Mahalanobis distance option. Why does each run of matchit yield different results? I have read the MatchIt documentation, but it only says *"We reestimate the matching procedure until we achieve the best balance possible. The running examples here are meant merely to illustrate, not to suggest that we've achieved the best balance"*. Why does this occur? When I run a parametric analysis after the matching preprocessing, the estimated parameters differ between runs. So how can I tell when I have the best balance? Is there a function in the package for this, or could I do it with a loop?
>>
>> Finally, I have ordered my dataset so that the first 5000 rows are controls and the last 300 are treated. Does this influence the results, or does the data need to be in random order?
>>
>> Thank you so much.
>>
>> Jesús S.
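The save question at the top of this message has a standard base-R answer: `saveRDS()` writes any R object, including a matchit result, to a single file, and `readRDS()` restores it. The sketch below assumes MatchIt 4.x and uses the package's `lalonde` data with an illustrative Mahalanobis specification, since the poster's dataset is not available; the caliper from the post is omitted for simplicity, and `tempdir()` stands in for a folder on your hard disk.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Illustrative specification on the package's lalonde data (not the
# poster's dataset): 1:1 nearest-neighbor matching on Mahalanobis distance.
match_model <- matchit(treat ~ age + educ + re74 + re75, data = lalonde,
                       method = "nearest", distance = "mahalanobis")

# Save the entire matchit object to a single .rds file.
path <- file.path(tempdir(), "match_model.rds")
saveRDS(match_model, path)

# Later, possibly in a new R session, restore it.
match_model2 <- readRDS(path)
identical(match_model$match.matrix, match_model2$match.matrix)

# Optionally, also export the matched data as a CSV file.
write.csv(match.data(match_model), file.path(tempdir(), "matched_data.csv"),
          row.names = FALSE)
```

The .rds file keeps the full object (match matrix, weights, call), whereas the CSV keeps only the matched data, which is convenient for use outside R.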

4 years, 1 month

About missing values when running matchit

by Y. Chen

Hi all,

I want to run matchit on my computer. Here is my code:

mout <- matchit(treat ~ Age + Gender..M.F. + stage, data = D1, method = "nearest", distance = "logit", na.action = na.pass)

It always gives me the error message "Missing values exist in the data". I tried both with and without na.action; neither works. I checked all four variables in the formula, and for each of them sum(is.na(D1$var)) = 0. What is the problem?

Thanks, everyone.
Best,
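The thread does not record what caused this error. One plausible check (an assumption, not a confirmed diagnosis) is whether missing values sit in columns of D1 that are not in the formula, in case the version in use checks the whole data frame. The hypothetical helper `na_report()` below counts NAs inside and outside the formula variables; the toy data frame is invented for illustration.

```r
# Hypothetical helper: count missing values in the formula variables and in
# every other column of the data frame.
na_report <- function(formula, data) {
  vars <- all.vars(formula)
  absent <- setdiff(vars, names(data))
  if (length(absent) > 0) stop("not found in data: ", paste(absent, collapse = ", "))
  others <- setdiff(names(data), vars)
  list(in_formula = colSums(is.na(data[vars])),
       elsewhere  = colSums(is.na(data[others])))
}

# Invented toy data: no NAs in the formula variables, one NA in another column.
D1 <- data.frame(treat = c(0, 1, 0, 1), Age = c(50, 61, 47, 55),
                 stage = c(1, 2, 2, 3), notes = c("a", NA, "b", "c"))
f <- treat ~ Age + stage
na_report(f, D1)

# Keep only the formula variables (and complete rows) before calling matchit().
D1_sub <- D1[complete.cases(D1[all.vars(f)]), all.vars(f)]
```

If `elsewhere` reports NAs, passing `D1_sub` instead of `D1` to matchit() would rule that cause in or out.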

4 years, 3 months

Questions about categorical variable in matchit()

by YY J

Dear MatchIt authors,

Hope this email finds you well. I have some questions about categorical variables in matchit(). When I load the data using read.csv(), I keep stringsAsFactors = TRUE (the default), so the categorical variables are stored as factors with n levels. For example:

gender:    factor w/ 2 levels "Female","male": 1 1 1 2
Treatment: factor w/ 2 levels "Treatment",..: 2 2 2 2 2 2 2 2 2 2 ...
RELIGION:  factor w/ 3 levels "Catholic","None",..: 1 2 3 3 3 3 3 1 1 1 ...

I wonder:

1. Does matchit() work with factors now?
2. If not, could I just convert the categorical variables to numeric, i.e., gender to 1 and 2, religion to 1, 2, 3, and treatment to 1, 2?
3. Or do I need not only to convert the categorical variables to numeric but also to code them manually as dummies, i.e., convert gender to 0 and 1, convert religion to n-1 dummy variables (Catholic vs. Other; Catholic vs. None), and convert treatment to 0 and 1?

Thank you in advance for your help.
Yanyi
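Option 3 above, which is also what Dr. Rosenbaum's reply in this archive recommends, can be done in base R with `model.matrix()`. A sketch on made-up data that mirrors the variables described; newer MatchIt versions may also accept factors directly in the formula, so check the documentation for the version in use.

```r
# Hypothetical data mimicking the factors described in the post.
d <- data.frame(
  Treatment = factor(c("Treatment", "Control", "Treatment", "Control")),
  gender    = factor(c("Female", "male", "Female", "male")),
  RELIGION  = factor(c("Catholic", "None", "Other", "Catholic"))
)

# A 0/1 treatment indicator, with the treated level named explicitly.
d$treat <- as.integer(d$Treatment == "Treatment")

# model.matrix() expands each factor into (levels - 1) dummy columns,
# using the first level as the reference; [, -1] drops the intercept.
X <- model.matrix(~ gender + RELIGION, data = d)[, -1]
colnames(X)  # "gendermale" "RELIGIONNone" "RELIGIONOther"
```

Recoding religion as 1, 2, 3 (option 2) would instead treat it as a single numeric covariate with an implied ordering and spacing, which a nominal variable does not have.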

4 years, 11 months

Re: [matchit] Matchit Digest, Vol 183, Issue 1

by YY J

Dear Dr. Rosenbaum: Thank you so much for your confirmation!
Yanyi

On Wed, Jun 5, 2019 at 12:05 PM Janet Rosenbaum <Janet.Rosenbaum(a)downstate.edu> wrote:
> Convert to dummies.
>
> On Jun 5, 2019, at 12:00 PM, matchit-request(a)lists.gking.harvard.edu wrote:
> > 1. Questions about categorical variable in matchit() (YY J)
> > [...]

4 years, 11 months

[Question for help] Get pscore for unmatched data

by 张波

Hello all,

I want to get propensity scores (pscore) for the unmatched data as well. Is there a method I can use?

Thank you very much for your help!
Bo
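A matchit object keeps the estimated distance (here, the propensity score) for every unit, matched or not, in its `distance` component. The sketch assumes MatchIt 4.x (for `distance = "glm"` and `match.data(..., drop.unmatched = FALSE)`) and uses the package's `lalonde` data rather than the poster's.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

m.out <- matchit(treat ~ age + educ + re74 + re75, data = lalonde,
                 method = "nearest", distance = "glm")

# Propensity scores for all units, matched or unmatched.
ps_all <- m.out$distance

# Unmatched units have a matching weight of 0.
unmatched <- m.out$weights == 0
summary(ps_all[unmatched])

# The same scores from a direct logistic regression, as a cross-check.
ps_glm <- fitted(glm(treat ~ age + educ + re74 + re75,
                     family = binomial, data = lalonde))
all.equal(unname(ps_all), unname(ps_glm))

# match.data() can also return every unit, with its distance and weight.
all_data <- match.data(m.out, drop.unmatched = FALSE)
```

Fitting the `glm()` directly is a fallback for versions where the matchit object is organized differently.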

5 years

Do results of full matching depend on order of input data?

by Widdecke, Kai Arne

Hello all,

When using the matchit function for full matching, the results depend on the order of the input data frame: if the order of the data is changed, the results change, too. This is surprising, because in my understanding, the optimal full matching algorithm should yield a single best solution. Am I missing something, or is this an error? Similar differences occur with optimal matching (method = "optimal").

Below you will find a reproducible example. In my understanding, the subclasses should be identical for the two data sets, which they are not.

Thank you very much for your help!
Kai

library(MatchIt)

# create data
set.seed(42)
nr <- c(1:100)
x1 <- rnorm(100, mean=50, sd=20)
x2 <- c(rep("a", 20), rep("b", 60), rep("c", 20))
x3 <- rnorm(100, mean=230, sd=2)
outcome <- rnorm(100, mean=500, sd=20)
group <- c(rep(0, 50), rep(1, 50))
df <- data.frame(x1=x1, x2=x2, outcome=outcome, group=group, row.names=nr, nr=nr)
df_neworder <- df[order(outcome),]  # re-order data.frame

# perform matching
model_oldorder <- matchit(group~x1, data=df, method="full", distance="logit")
model_neworder <- matchit(group~x1, data=df_neworder, method="full", distance="logit")

# store matching results
matcheddata_oldorder <- match.data(model_oldorder, distance="pscore")
matcheddata_neworder <- match.data(model_neworder, distance="pscore")

# Results based on original data.frame
head(matcheddata_oldorder[order(nr),], 10)
         x1 x2  outcome group nr    pscore weights subclass
1 69.773776  a 489.1769     0  1 0.5409943     1.0       27
2 63.949637  a 529.2733     0  2 0.5283582     1.0       32
3 52.217666  a 526.7928     0  3 0.5028106     0.5       17
4 48.936397  a 492.9255     0  4 0.4956569     1.0        9
5 36.501507  a 512.9301     0  5 0.4685876     1.0       16

# Results based on re-ordered data.frame
head(matcheddata_neworder[order(matcheddata_neworder$nr),], 10)
         x1 x2  outcome group nr    pscore weights subclass
1 69.773776  a 489.1769     0  1 0.5409943     1.0       25
2 63.949637  a 529.2733     0  2 0.5283582     1.0       31
3 52.217666  a 526.7928     0  3 0.5028106     0.5       15
4 48.936397  a 492.9255     0  4 0.4956569     1.0        7
5 36.501507  a 512.9301     0  5 0.4685876     2.0       14

Kai Widdecke, M. Sc.
Universität Hamburg
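Subclass numbers are arbitrary labels, so two runs can give different numbers to identical groups; the comparison that matters is whether the same units end up together. The hypothetical helper `same_partition()` below (not a MatchIt function) checks this in base R by cross-tabulating the two labelings. In the output above, the weight of unit 5 changes from 1.0 to 2.0, which suggests the grouping itself changed and not only the labels.

```r
# Hypothetical helper: TRUE if two subclass assignments group the same units
# together, ignoring the arbitrary labels given to each subclass.
# `a` and `b` are subclass vectors named by unit ID (e.g. the `nr` column).
same_partition <- function(a, b) {
  ids <- sort(names(a))
  if (!identical(ids, sort(names(b)))) return(FALSE)
  # Compare as character so unused factor levels do not create empty rows.
  tab <- table(as.character(a[ids]), as.character(b[ids]), useNA = "ifany")
  # Equal partitions: each subclass in `a` maps to exactly one in `b`,
  # and vice versa.
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

a  <- c(u1 = 1, u2 = 1, u3 = 2, u4 = 2)
b  <- c(u3 = 7, u4 = 7, u1 = 5, u2 = 5)  # same groups, other labels/order
c2 <- c(u1 = 1, u2 = 2, u3 = 1, u4 = 2)  # different groups
same_partition(a, b)   # TRUE
same_partition(a, c2)  # FALSE
```

Applied to the post's objects, this would be `same_partition(setNames(matcheddata_oldorder$subclass, matcheddata_oldorder$nr), setNames(matcheddata_neworder$subclass, matcheddata_neworder$nr))`.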

5 years
