Matchit January 2017

matchit@lists.gking.harvard.edu

5 participants
9 discussions

Re: [matchit] Matching with replacement and weights

by Gary King

Hi Ignacio,

Here's the simplest explanation of weights I could come up with for one method (CEM), but it applies more generally to matching with replacement: j.mp/CEMweights

Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director, IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org - King(a)Harvard.edu - @KingGary <https://twitter.com/kinggary> - 617-500-7570 - Assistant <king-assist(a)iq.harvard.edu>: 617-495-9271

On Thu, Jan 12, 2017 at 9:59 AM, Ignacio Martinez <ignacio82(a)gmail.com> wrote:
> Hi everyone,
>
> When you do matching with replacement you have to use weights because some observations are used multiple times. Can somebody explain what would be the consequences of ignoring those weights when running OLS? My intuition is that I would end up with a biased estimator. Is this correct? Is it possible to sign the bias? Is there a paper that discusses this?
>
> Thanks,
>
> Ignacio
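A small worked example may help make the question concrete. The sketch below uses made-up toy data (the unit labels, outcomes, and match assignments are all hypothetical, not from this thread): one control is matched to two treated units, so it gets weight 2. It compares OLS with and without those weights against the simple average of within-pair differences.

```r
# Hypothetical toy data: 3 treated units, 2 matched controls.
# Control c1 was matched (with replacement) to both t1 and t2; c2 to t3.
d <- data.frame(
  id    = c("t1", "t2", "t3", "c1", "c2"),
  treat = c(1, 1, 1, 0, 0),
  y     = c(5, 6, 7, 1, 4),
  w     = c(1, 1, 1, 2, 1)   # weight = number of times each unit is used
)

# Average of the three within-pair differences (t1-c1, t2-c1, t3-c2).
pair_est <- mean(c(5 - 1, 6 - 1, 7 - 4))

# OLS using the matching weights reproduces the pair-based estimate.
fit_weighted <- lm(y ~ treat, data = d, weights = w)

# OLS ignoring the weights counts c1 only once, so the control mean shifts.
fit_unweighted <- lm(y ~ treat, data = d)

est_weighted   <- coef(fit_weighted)[["treat"]]
est_unweighted <- coef(fit_unweighted)[["treat"]]
print(c(pairs = pair_est, weighted = est_weighted, unweighted = est_unweighted))
```

In this toy case the unweighted estimate differs from the matched-pair estimate only because the reused control happens to have a lower outcome than the other control; with different data the discrepancy could go the other way, so the example does not by itself settle whether the bias can be signed in general.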

7 years, 3 months

Matching with replacement and weights

by Ignacio Martinez

Hi everyone,

When you do matching with replacement you have to use weights because some observations are used multiple times. Can somebody explain what would be the consequences of ignoring those weights when running OLS? My intuition is that I would end up with a biased estimator. Is this correct? Is it possible to sign the bias? Is there a paper that discusses this?

Thanks,

Ignacio

7 years, 3 months

Re: [matchit] Matching with more treatments than comparisons observations?

by Gary King

I'm sure it is mentioned (probably in our paper somewhere). The costs and benefits are not methodological; they are more of a choice about what quantity of interest you are willing to try to estimate.

Gary

On Wed, Jan 11, 2017 at 2:06 PM, Ignacio Martinez <ignacio82(a)gmail.com> wrote:
> Thanks a lot Gary. Is there any literature that talks about this case? I imagine that there are pluses and minuses to those approaches.
>
> On Wed, Jan 11, 2017 at 2:04 PM Gary King <king(a)harvard.edu> wrote:
>> One simple possibility is to switch 0s to 1s and 1s to 0s. If that really won't work for you, then you could match with (a lot of) replacement.
>>
>> Gary
>>
>> On Wed, Jan 11, 2017 at 2:01 PM, Ignacio Martinez <ignacio82(a)gmail.com> wrote:
>>
>> Hi everyone,
>>
>> Is there a paper that talks about matching when the sample has more treatment observations than control observations? Is there an algorithm that works better for this case? Can somebody explain to me why optimal matching does not work at all in this case?
>>
>> Thanks,
>>
>> Ignacio

7 years, 3 months

Re: [matchit] Matching with more treatments than comparisons observations?

by Gary King

One simple possibility is to switch 0s to 1s and 1s to 0s. If that really won't work for you, then you could match with (a lot of) replacement.

Gary

On Wed, Jan 11, 2017 at 2:01 PM, Ignacio Martinez <ignacio82(a)gmail.com> wrote:
> Hi everyone,
>
> Is there a paper that talks about matching when the sample has more treatment observations than control observations? Is there an algorithm that works better for this case? Can somebody explain to me why optimal matching does not work at all in this case?
>
> Thanks,
>
> Ignacio
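To illustrate the "switch 0s to 1s and 1s to 0s" suggestion, here is a minimal sketch on hypothetical data (the data frame `d` and the column name `treat_flipped` are made up for illustration). After the flip, the smaller group plays the role of the "treated" group, so every unit in it can receive its own match from the larger group without replacement.

```r
# Hypothetical data: 4 treated units but only 2 controls.
d <- data.frame(
  treat = c(1, 1, 1, 1, 0, 0),
  x     = c(0.2, 0.5, 0.7, 0.9, 0.4, 0.8)
)

# Flip the indicator: original controls become the "treated" group.
d$treat_flipped <- 1 - d$treat

n_treated_before <- sum(d$treat)          # 4: more treated than controls
n_treated_after  <- sum(d$treat_flipped)  # 2: fewer "treated" than "controls"
print(table(original = d$treat, flipped = d$treat_flipped))
```

The price of the flip is the one raised elsewhere in this thread: matching on `treat_flipped` targets an effect defined for the original control group (with the sign of the contrast reversed) rather than for the original treated group, so it changes the quantity of interest rather than just the algorithm.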

7 years, 3 months

Matching with more treatments than comparisons observations?

by Ignacio Martinez

Hi everyone,

Is there a paper that talks about matching when the sample has more treatment observations than control observations? Is there an algorithm that works better for this case? Can somebody explain to me why optimal matching does not work at all in this case?

Thanks,

Ignacio

7 years, 3 months

Re: [matchit] Subset of top matches

by Gary King

For the first, you can take the 25 matches of any T to any C that are smallest. For the second, there is one graph it will calculate if it has the outcome variable, but you don't need it or can ignore it.

Gary

On Tue, Jan 3, 2017 at 10:50 PM, Juan Tellez <juan.f.tellez(a)gmail.com> wrote:
> Thank you Gary. On the first solution, I imagine I would have to specify which 25 of the 50 treated units to find matches for? And on the second solution, since I am designing the survey I do not yet have an outcome to measure. The MakeFrontier() function call requires an outcome to run. Is there some workaround or is my thinking mistaken?
>
> On Tue, Jan 3, 2017 at 8:55 AM, Gary King <king(a)harvard.edu> wrote:
>> Hi Juan, You could ask matchit for the lowest imbalance on a "greedy" basis, say 25 treated units with the closest controls or some such. Alternatively, if you add one component -- a specific overall imbalance metric -- you have a well defined mathematical problem. To rephrase, you'd like the subset with 25 treated and 25 (or more) controls that has the lowest level of imbalance among the (huge number of) all possible such subsets. If so, this paper <http://j.mp/1dRDMrE> on the matching balance frontier can calculate this.
>>
>> Best of luck with your research,
>>
>> Gary
>>
>> On Tue, Jan 3, 2017 at 9:26 PM, Juan Tellez <juan.f.tellez(a)gmail.com> wrote:
>>> Hello,
>>>
>>> I am planning a survey where I have 50ish treated municipalities and hundreds to choose from as potential controls. The tricky part of this matching exercise is that the survey will, in the end, only sample 25 of the 50 treated municipalities. I don't particularly care which 25 of the 50 are chosen; what I am effectively looking for is the top 25 treatment-control pairs from my sample.
>>>
>>> Is it possible to do this in MatchIt with a distance measure like Mahalanobis? The MatchIt package is understandably conservative about discarding treatment observations, and when I use it to match I generally end up with around 50 matched treated units. How might I go about this? Thank you.
>>>
>>> --
>>> Best,
>>>
>>> Juan Fernando Tellez
>>> PhD Candidate
>>> Department of Political Science
>>> Duke University
>
> --
> Best,
>
> Juan Fernando Tellez
> PhD Candidate
> Department of Political Science
> Duke University
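One possible reading of "take the 25 matches of any T to any C that are smallest" can be sketched in base R without an outcome variable. Everything below is an assumption for illustration: the simulated covariates, the sample sizes, and the greedy rule (repeatedly take the closest remaining treated-control pair, then retire both units). It is not MatchIt's own algorithm.

```r
set.seed(1)

# Hypothetical covariates: 50 treated and 300 potential control municipalities.
n_t <- 50; n_c <- 300; k <- 25
X     <- matrix(rnorm((n_t + n_c) * 3), ncol = 3)
treat <- c(rep(1, n_t), rep(0, n_c))

S  <- cov(X)                           # covariance used for the Mahalanobis metric
Xt <- X[treat == 1, , drop = FALSE]
Xc <- X[treat == 0, , drop = FALSE]

# D[i, j] = squared Mahalanobis distance between treated i and control j.
D <- t(apply(Xt, 1, function(x) mahalanobis(Xc, center = x, cov = S)))

sel_t <- integer(k); sel_c <- integer(k); sel_d <- numeric(k)
for (m in seq_len(k)) {
  idx <- which(D == min(D), arr.ind = TRUE)[1, ]  # closest remaining pair
  sel_t[m] <- idx[1]
  sel_c[m] <- idx[2]
  sel_d[m] <- D[idx[1], idx[2]]
  D[idx[1], ] <- Inf   # this treated unit is now used
  D[, idx[2]] <- Inf   # this control is now used (no replacement)
}

# Row indices refer to positions within the treated and control subsets.
top_pairs <- data.frame(treated = sel_t, control = sel_c, dist = sel_d)
print(head(top_pairs))
```

Because each step takes the global minimum of what remains, the selected distances come out in non-decreasing order, and no treated unit or control appears twice. A greedy rule like this is not guaranteed to minimize total imbalance across all possible 25-pair subsets, which is the problem the matching frontier approach mentioned above addresses.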

7 years, 4 months

Re: [matchit] Subset of top matches

by Gary King

Hi Juan,

You could ask matchit for the lowest imbalance on a "greedy" basis, say 25 treated units with the closest controls or some such. Alternatively, if you add one component -- a specific overall imbalance metric -- you have a well defined mathematical problem. To rephrase, you'd like the subset with 25 treated and 25 (or more) controls that has the lowest level of imbalance among the (huge number of) all possible such subsets. If so, this paper <http://j.mp/1dRDMrE> on the matching balance frontier can calculate this.

Best of luck with your research,

Gary

On Tue, Jan 3, 2017 at 9:26 PM, Juan Tellez <juan.f.tellez(a)gmail.com> wrote:
> Hello,
>
> I am planning a survey where I have 50ish treated municipalities and hundreds to choose from as potential controls. The tricky part of this matching exercise is that the survey will, in the end, only sample 25 of the 50 treated municipalities. I don't particularly care which 25 of the 50 are chosen; what I am effectively looking for is the top 25 treatment-control pairs from my sample.
>
> Is it possible to do this in MatchIt with a distance measure like Mahalanobis? The MatchIt package is understandably conservative about discarding treatment observations, and when I use it to match I generally end up with around 50 matched treated units. How might I go about this? Thank you.
>
> --
> Best,
>
> Juan Fernando Tellez
> PhD Candidate
> Department of Political Science
> Duke University

7 years, 4 months

Subset of top matches

by Juan Tellez

Hello,

I am planning a survey where I have 50ish treated municipalities and hundreds to choose from as potential controls. The tricky part of this matching exercise is that the survey will, in the end, only sample 25 of the 50 treated municipalities. I don't particularly care which 25 of the 50 are chosen; what I am effectively looking for is the top 25 treatment-control pairs from my sample.

Is it possible to do this in MatchIt with a distance measure like Mahalanobis? The MatchIt package is understandably conservative about discarding treatment observations, and when I use it to match I generally end up with around 50 matched treated units. How might I go about this? Thank you.

--
Best,

Juan Fernando Tellez
PhD Candidate
Department of Political Science
Duke University

7 years, 4 months

Combining full matching and exact matching with MatchIt

by Jessika Golle

Dear All,

Unfortunately, I have problems combining full matching and exact matching. I want to do exact matching on the cluster variable "IDA", and full matching on all other variables. The propensity score I calculated previously is called "PS" and I use it as the distance measure. The dataset is called "impute1". The treatment variable is "teilgenommen" and it is predicted by many covariates (e.g., t1_age + STLT1nmath + STLT1ndeu).

In the documentation of the package I read that it is very simple to combine nearest neighbor matching and exact matching. I tried to use this approach for full matching but it did not work. I am not very good at writing complex functions or even understanding very complex functions. If anybody can give me some advice, I would appreciate it.

I tried the following:

First:

    test <- matchit(teilgenommen ~ t1_age + STLT1nmath + STLT1ndeu + sgeschlecht +
                      s1kt01flu + hisei + Zgc + Zgf + s1int_deu + s1int_mat +
                      s1int_mnk + s1int_spr + s1ria_i + s1ria_r + s1ria_a +
                      s1ria_s + s1ria_e + s1ria_c + s1ec + s1abs + s1eng +
                      s1sdq_au + s1sdq_sp + s1sdq_el + s1sdq_pe + s1sdq_sw +
                      s1sdq_aa + s1sdq_ak + s1ske_le + s1ske_sc + s1ske_ma +
                      s1lem_ml + s1lem_ms + s1lem_mr + s1sc + s1fee + s1sf_la +
                      s1sf_lb + s1sf_ma + s1sf_mb + s1sf_ea + s1sf_eb + e1hh +
                      e1em + e1ex + e1ag + e1co + e1op + e1sozk + Zgc_c + Zgf_c +
                      teilgenommen_c,
                    data = impute1,
                    distance = impute1$PS,
                    discard = "control",
                    method = "full",
                    exact = c("IDA"),
                    replace = F)
    summary(test)

This gave the same result as without the argument exact = c("IDA").

Second:

    test <- matchit(teilgenommen ~ t1_age + STLT1nmath + STLT1ndeu + sgeschlecht +
                      s1kt01flu + hisei + Zgc + Zgf + s1int_deu + s1int_mat +
                      s1int_mnk + s1int_spr + s1ria_i + s1ria_r + s1ria_a +
                      s1ria_s + s1ria_e + s1ria_c + s1ec + s1abs + s1eng +
                      s1sdq_au + s1sdq_sp + s1sdq_el + s1sdq_pe + s1sdq_sw +
                      s1sdq_aa + s1sdq_ak + s1ske_le + s1ske_sc + s1ske_ma +
                      s1lem_ml + s1lem_ms + s1lem_mr + s1sc + s1fee + s1sf_la +
                      s1sf_lb + s1sf_ma + s1sf_mb + s1sf_ea + s1sf_eb + e1hh +
                      e1em + e1ex + e1ag + e1co + e1op + e1sozk + Zgc_c + Zgf_c +
                      teilgenommen_c,
                    data = impute1,
                    distance = impute1$PS,
                    discard = "control",
                    method = "full",
                    #within=exactMatch(teilgenommen ~ IDC + PS, data = impute1),
                    within = exactMatch(teilgenommen ~ IDA, data = impute1),
                    #exact = c("IDA"),
                    replace = F)
    summary(test)

This gave the same result as without the argument within = exactMatch(teilgenommen ~ IDA, data = impute1), plus a warning:

    Warning: In fullmatch.matrix(d, ...) :
      Ignoring non-null 'within' argument. When using 'fullmatch' with
      pre-formed distances, please combine them using '+'.

Third:

    test <- matchit(teilgenommen ~ t1_age + STLT1nmath + STLT1ndeu + sgeschlecht +
                      s1kt01flu + hisei + Zgc + Zgf + s1int_deu + s1int_mat +
                      s1int_mnk + s1int_spr + s1ria_i + s1ria_r + s1ria_a +
                      s1ria_s + s1ria_e + s1ria_c + s1ec + s1abs + s1eng +
                      s1sdq_au + s1sdq_sp + s1sdq_el + s1sdq_pe + s1sdq_sw +
                      s1sdq_aa + s1sdq_ak + s1ske_le + s1ske_sc + s1ske_ma +
                      s1lem_ml + s1lem_ms + s1lem_mr + s1sc + s1fee + s1sf_la +
                      s1sf_lb + s1sf_ma + s1sf_mb + s1sf_ea + s1sf_eb + e1hh +
                      e1em + e1ex + e1ag + e1co + e1op + e1sozk + Zgc_c + Zgf_c +
                      teilgenommen_c,
                    data = impute1,
                    distance = impute1$PS,
                    discard = "control",
                    method = "full",
                    within=exactMatch(teilgenommen ~ IDC + PS, data = impute1),
                    #within = exactMatch(teilgenommen ~ IDA, data = impute1),
                    #exact = c("IDA"),
                    replace = F)
    summary(test)

With this combination of arguments it took much more time; however, the result was the same as before and the warning was also the same. Furthermore, I did not exactly know what I was doing here.

    Warning: In fullmatch.matrix(d, ...) :
      Ignoring non-null 'within' argument. When using 'fullmatch' with
      pre-formed distances, please combine them using '+'.

Fourth:

    test <- fullmatch(glm(teilgenommen ~ t1_age + STLT1nmath + STLT1ndeu +
                            sgeschlecht + s1kt01flu + hisei + Zgc + Zgf +
                            s1int_deu + s1int_mat + s1int_mnk + s1int_spr +
                            s1ria_i + s1ria_r + s1ria_a + s1ria_s + s1ria_e +
                            s1ria_c + s1ec + s1abs + s1eng + s1sdq_au + s1sdq_sp +
                            s1sdq_el + s1sdq_pe + s1sdq_sw + s1sdq_aa + s1sdq_ak +
                            s1ske_le + s1ske_sc + s1ske_ma + s1lem_ml + s1lem_ms +
                            s1lem_mr + s1sc + s1fee + s1sf_la + s1sf_lb + s1sf_ma +
                            s1sf_mb + s1sf_ea + s1sf_eb + e1hh + e1em + e1ex +
                            e1ag + e1co + e1op + e1sozk + Zgc_c + Zgf_c +
                            teilgenommen_c,
                          family = binomial, data = impute1),
                      data = impute1,
                      distance = impute1$PS,
                      discard = "control",
                      method = "full",
                      #within=exactMatch(teilgenommen ~ IDC + PS, data = impute1),
                      within = exactMatch(teilgenommen ~ IDA, data = impute1),
                      replace = F)
    summary(test)

This was my last try and I did not understand the output:

    Structure of matched sets:
    5+:1  4:1  3:1  2:1  1:1  1:2  1:3  1:4 1:5+  0:1
       9    9   11   21  101   38    9   15   77   71
    Effective Sample Size: 406
    (equivalent number of matched pairs).

However, no warning occurred.

I am looking forward to hearing from you.

Kind regards,
Jessika

------------------------------------------------------------
Dr. Jessika Golle
University of Tübingen
Hector Research Institute of Education Sciences and Psychology
Europastr. 6
72072 Tübingen
Phone: +49 (0)7071/29-76124
Fax: +49 (0)7071/29-5371
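The warning quoted above asks for pre-formed distances to be combined with '+'. As far as I understand it, the idea is that an exact-matching constraint can be written as a distance matrix that is 0 within a cluster and infinite across clusters, so that adding it to the propensity score distance forbids cross-cluster matches. The base R sketch below illustrates only that mechanism on hypothetical data; the column names mirror the post (IDA, PS), but the values are made up, and it does not call optmatch or MatchIt.

```r
# Hypothetical data: 2 treated and 3 control units in clusters "a" and "b".
d <- data.frame(
  treat = c(1, 1, 0, 0, 0),
  IDA   = c("a", "b", "a", "b", "b"),
  PS    = c(0.60, 0.30, 0.55, 0.35, 0.20)
)
t_idx <- which(d$treat == 1)
c_idx <- which(d$treat == 0)

# Treated-by-control matrix of absolute propensity score differences.
ps_dist <- abs(outer(d$PS[t_idx], d$PS[c_idx], "-"))

# Exact-matching constraint: 0 within the same cluster, Inf otherwise.
same_cluster  <- outer(d$IDA[t_idx], d$IDA[c_idx], "==")
exact_penalty <- ifelse(same_cluster, 0, Inf)

# Combining with '+' leaves within-cluster distances unchanged and makes
# every cross-cluster pair infinitely far apart (i.e., never matched).
combined <- ps_dist + exact_penalty
print(combined)
```

In optmatch, my understanding is that `exactMatch()` produces this kind of constraint and `match_on()` produces the distance, and that their sum is what should be passed to `fullmatch()` in place of a separate `within` argument; please check the optmatch documentation for the exact call before relying on this.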

7 years, 4 months
