Matchit June 2020

matchit@lists.gking.harvard.edu

2 participants
2 discussions

Re: [matchit] Possibly unexpected behavior using matchit for cem

by Gary King

Hi Alberto, this is an interesting discovery! I don't recall intending this, and so we should figure out why we did this. Can I ask that you send this to the CEM mailing list at GaryKing.org/cem so we can include Stefano and Beppe in the conversation? Gary -- *Gary King* - Albert J. Weatherhead III University Professor - Director, IQSS <http://iq.harvard.edu/> - Harvard University GaryKing.org <http://garyking.org/> - King(a)Harvard.edu - @KingGary <https://twitter.com/kinggary> - 617-500-7570 - Assistant <king-assist(a)iq.harvard.edu>: 617-495-9271 On Sat, Jun 6, 2020 at 10:58 AM Alberto Palomar < alberto.palomares.ny(a)gmail.com> wrote: > Hello, > Firstly thank you for this very useful package. I came to it via > the CEM package in the course of trying to explore how some versions > of PSM would compare on some clinical data we are analyzing to > retrospectively make an estimate of how effective a certain therapy is > for moderately severe hospitalized Covid-19 cases. While doing this > there seemed to be some unexpected behavior with respect to whether > the treatment variable is included as an integer or as a factor, which > I do not think I understand after reading the package documentation, > vignettes, and the explanation of weights (at > > https://docs.google.com/document/d/1xQwyLt_6EXdNpA685LjmhjO20y5pZDZYwe2qeNo… > ). > > Briefly, when the treatment variable is a factor, the output of > matchit with the cem method appears to use the opposite convention for > assigning weights [i.e. *un*treated weights are 1 and treated weights > are presumably (m_C/m_T)*(m^s_T/m^s_C) ] from what comes out of cem > directly (regardless of whether the treatment variable is integer or > factor in the cem input data). So then after applying match.data() to > the matchit outputs, the results of subsequent regressions (have > tested lm and glm) with the corresponding weights also differ > depending on whether the treatment variable is coded as integer or as > a factor. Is this an expected behavior? > > Here is a somewhat minimal example using the LaLonde data, with > apologies for its inelegance: > > #### > library(cem) > library(MatchIt) > > data("LeLonde") > #set up a data frame with treatment as integer (Le) and another with > treatment.f as factor (Le.f) > Le <- data.frame(na.omit(LeLonde)) > Le.f <- Le > Le.f$treated <- as.factor(Le.f$treated) > colnames(Le.f)[which(colnames(Le.f) == 'treated')] <- 'treated.f' > > #for simplicity will match only on age, re74, and q1; here make lists > of the other variables to drop in the cem commands > LeLonde_vars_to_match <- c("age", "re74", "q1") > LeLonde_vars_to_drop <- setdiff(names(Le), LeLonde_vars_to_match) > LeLonde_vars_to_drop.f <- setdiff(names(Le.f), LeLonde_vars_to_match) > > #use cem directly to match between treated/untreated on age, re74, and > q1; first with treated as integer, second with treated.f as factor > mat.cem <- cem(treatment = "treated", data = Le, drop = > LeLonde_vars_to_drop, eval.imbalance = TRUE, keep.all = TRUE) > mat.f.cem <- cem(treatment = "treated.f", data = Le.f, drop = > LeLonde_vars_to_drop.f, eval.imbalance = TRUE, keep.all = TRUE) > > identical(mat.cem$w, mat.f.cem$w) #is true, weights from cem do not > depend on whether treatment variable is integer or factor > > #use matchit with method cem to match between treated/untreated on > age, re74, and q1; first with treated as integer, second with > treated.f as factor > mat.mi <- matchit(treated ~ age + re74 + q1, Le, method = "cem") > mat.f.mi <- matchit(treated.f ~ age + re74 + q1, Le.f, method = "cem") > > identical(mat.mi$weights, mat.f.mi$weights) #is false, weights from > matchit with method cem do depend on whether treatment variable is > integer or factor, seemingly by different choice of whether control or > treated weights are set to 1, as the same data entries are selected by > the match, as suggested by plot(match.data(mat.mi)$re78, > match.data(mat.f.mi)$re78) being linear with slope 1 > identical(mat.cem$w, mat.mi$weights) #is true, indicating that the > convention from cem coincides with using matchit with method cem when > the treatment variable is integer > > #generate matched datasets, first with treated as integer, second with > treated.f as factor > matched.data.mi <- match.data(mat.mi) > matched.data.f.mi <- match.data(mat.f.mi) > > #compute the linear models for re78 ~ treated, first with treated as > integer, second with treated.f as factor > mod.mi <- lm(re78 ~ treated, matched.data.mi, weights = > matched.data.mi$weights) > mod.f.mi <- lm(re78 ~ treated.f, matched.data.f.mi, weights = > matched.data.f.mi$weights) > identical(mod.mi$coefficients["treated"], > mod.f.mi$coefficients["treated.f1"]) #is false, the estimates depend > on whether the treatment variable is integer or factor > > mod.mi.g <- glm(re78 ~ treated, data = matched.data.mi, weights = > matched.data.mi$weights) > mod.f.mi.g <- glm(re78 ~ treated.f, data = matched.data.f.mi, weights > = matched.data.f.mi$weights) > identical(mod.mi.g$coefficients["treated"], > mod.f.mi.g$coefficients["treated.f1"]) #is false, the estimates depend > on whether the treatment variable is integer or factor > > #### > > There is a separate peculiarity when using cem directly with the > treatment variable as a factor, where the att() command fails with > error messages saying that treated.f is not a factor, but I have not > delved into that as much yet and if it resists further analysis might > post it to the CEM mailing list. > > In the short term I think a reasonable solution is just not to cast > the treatment variable as a factor in matchit when using cem - am > sorry if this was advised in the documentation somewhere and I missed > it - but it might be worth considering if matchit should be able to > tolerate having the treatment variable as a factor as well (or raise a > runtime alert about possibly different output). > > Thank you, > Alberto > - > --- > MatchIt mailing list served by HUIT > List Address: matchit(a)lists.gking.harvard.edu > Subscribe/Unsubscribe: http://lists.gking.harvard.edu/mailman/listinfo/ei > MatchIt Software and Documentation: http://gking.harvard.edu/matchit/ > Browse/Search <http://gking.harvard.edu/matchit/Browse/Search> List > Archive: http://lists.gking.harvard.edu/mailman/private/matchit/ > Matchit mailing list > Matchit(a)lists.gking.harvard.edu > > To unsubscribe from this list or get other information: > > https://lists.gking.harvard.edu/mailman/listinfo/matchit >

3 years, 11 months

Possibly unexpected behavior using matchit for cem

by Alberto Palomar

Hello, Firstly thank you for this very useful package. I came to it via the CEM package in the course of trying to explore how some versions of PSM would compare on some clinical data we are analyzing to retrospectively make an estimate of how effective a certain therapy is for moderately severe hospitalized Covid-19 cases. While doing this there seemed to be some unexpected behavior with respect to whether the treatment variable is included as an integer or as a factor, which I do not think I understand after reading the package documentation, vignettes, and the explanation of weights (at https://docs.google.com/document/d/1xQwyLt_6EXdNpA685LjmhjO20y5pZDZYwe2qeNo…). Briefly, when the treatment variable is a factor, the output of matchit with the cem method appears to use the opposite convention for assigning weights [i.e. *un*treated weights are 1 and treated weights are presumably (m_C/m_T)*(m^s_T/m^s_C) ] from what comes out of cem directly (regardless of whether the treatment variable is integer or factor in the cem input data). So then after applying match.data() to the matchit outputs, the results of subsequent regressions (have tested lm and glm) with the corresponding weights also differ depending on whether the treatment variable is coded as integer or as a factor. Is this an expected behavior? Here is a somewhat minimal example using the LaLonde data, with apologies for its inelegance: #### library(cem) library(MatchIt) data("LeLonde") #set up a data frame with treatment as integer (Le) and another with treatment.f as factor (Le.f) Le <- data.frame(na.omit(LeLonde)) Le.f <- Le Le.f$treated <- as.factor(Le.f$treated) colnames(Le.f)[which(colnames(Le.f) == 'treated')] <- 'treated.f' #for simplicity will match only on age, re74, and q1; here make lists of the other variables to drop in the cem commands LeLonde_vars_to_match <- c("age", "re74", "q1") LeLonde_vars_to_drop <- setdiff(names(Le), LeLonde_vars_to_match) LeLonde_vars_to_drop.f <- setdiff(names(Le.f), LeLonde_vars_to_match) #use cem directly to match between treated/untreated on age, re74, and q1; first with treated as integer, second with treated.f as factor mat.cem <- cem(treatment = "treated", data = Le, drop = LeLonde_vars_to_drop, eval.imbalance = TRUE, keep.all = TRUE) mat.f.cem <- cem(treatment = "treated.f", data = Le.f, drop = LeLonde_vars_to_drop.f, eval.imbalance = TRUE, keep.all = TRUE) identical(mat.cem$w, mat.f.cem$w) #is true, weights from cem do not depend on whether treatment variable is integer or factor #use matchit with method cem to match between treated/untreated on age, re74, and q1; first with treated as integer, second with treated.f as factor mat.mi <- matchit(treated ~ age + re74 + q1, Le, method = "cem") mat.f.mi <- matchit(treated.f ~ age + re74 + q1, Le.f, method = "cem") identical(mat.mi$weights, mat.f.mi$weights) #is false, weights from matchit with method cem do depend on whether treatment variable is integer or factor, seemingly by different choice of whether control or treated weights are set to 1, as the same data entries are selected by the match, as suggested by plot(match.data(mat.mi)$re78, match.data(mat.f.mi)$re78) being linear with slope 1 identical(mat.cem$w, mat.mi$weights) #is true, indicating that the convention from cem coincides with using matchit with method cem when the treatment variable is integer #generate matched datasets, first with treated as integer, second with treated.f as factor matched.data.mi <- match.data(mat.mi) matched.data.f.mi <- match.data(mat.f.mi) #compute the linear models for re78 ~ treated, first with treated as integer, second with treated.f as factor mod.mi <- lm(re78 ~ treated, matched.data.mi, weights = matched.data.mi$weights) mod.f.mi <- lm(re78 ~ treated.f, matched.data.f.mi, weights = matched.data.f.mi$weights) identical(mod.mi$coefficients["treated"], mod.f.mi$coefficients["treated.f1"]) #is false, the estimates depend on whether the treatment variable is integer or factor mod.mi.g <- glm(re78 ~ treated, data = matched.data.mi, weights = matched.data.mi$weights) mod.f.mi.g <- glm(re78 ~ treated.f, data = matched.data.f.mi, weights = matched.data.f.mi$weights) identical(mod.mi.g$coefficients["treated"], mod.f.mi.g$coefficients["treated.f1"]) #is false, the estimates depend on whether the treatment variable is integer or factor #### There is a separate peculiarity when using cem directly with the treatment variable as a factor, where the att() command fails with error messages saying that treated.f is not a factor, but I have not delved into that as much yet and if it resists further analysis might post it to the CEM mailing list. In the short term I think a reasonable solution is just not to cast the treatment variable as a factor in matchit when using cem - am sorry if this was advised in the documentation somewhere and I missed it - but it might be worth considering if matchit should be able to tolerate having the treatment variable as a factor as well (or raise a runtime alert about possibly different output). Thank you, Alberto

3 years, 11 months

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

Matchit June 2020