by Adamakis, Sotirios (Customer Analytics & Decision)
Hi, I have the same problem. Which option turns off the calculation of L1?
Regards,
Sotiris
________________________________
If you use CEM and turn off the calculation of L1, it's very fast and can
deal with very large data sets.
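[A minimal sketch of that suggestion, on toy data. The formula and data frame are made up, and the `eval.imbalance` switch in the `cem` package is an assumption — check `?matchit` and `?cem` in your installed versions for the exact options.]

```r
library(MatchIt)  # assumes MatchIt (and the cem package it calls) is installed

# Toy data standing in for a real, large data set.
set.seed(1)
df <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))
df$treat <- rbinom(1000, 1, plogis(df$x1))

# Coarsened exact matching bins the covariates instead of computing
# pairwise distances, which is why it scales to very large samples.
m.out <- matchit(treat ~ x1 + x2, data = df, method = "cem")
summary(m.out)

# If the L1 imbalance computation itself is the bottleneck, the cem
# package's own interface exposes a switch for it (argument name is
# an assumption -- verify against ?cem in your version):
# library(cem)
# c.out <- cem("treat", data = df, eval.imbalance = FALSE)
```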
Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director,
IQSS - Harvard University
GKing.Harvard.edu <http://gking.harvard.edu/> - King at Harvard.edu -
@kinggary <http://twitter.com/kinggary> - 617-500-7570 - Asst 495-9271 -
Fax 812-8581
On Mon, Feb 27, 2012 at 12:14 PM, Donny Baum <donnybaum at gmail.com> wrote:
> Has anyone come across problems of performance using MatchIt with large
> datasets? I am trying to perform nearest neighbor matching with 10
> subclassifications on a sample of 400,000 (50,000 treatment cases, 350,000
> untreated) with about 25 covariates. I was able to get one round of
> successful results after about 8 hours of waiting for R to produce the
> output. Is this typical of using MatchIt with large data? Is there any way
> to increase the speed or otherwise work around this?
>
> Any help would be great.
>
> Cheers,
>
> Don Baum
>
> --
> Don Baum
> Ph.D. Candidate/ Graduate Assistant
> Comparative and International Development Education
> Educational Policy and Administration
> University of Minnesota
>
>
>
> -
> ---
> MatchIt mailing list served by HUIT
> List Address: matchit at lists.gking.harvard.edu
> Subscribe/Unsubscribe: http://lists.gking.harvard.edu/mailman/listinfo/ei
> MatchIt Software and Documentation: http://gking.harvard.edu/matchit/
> Browse/Search List Archive:
> http://lists.gking.harvard.edu/mailman/private/matchit/
> Matchit mailing list
> Matchit at lists.gking.harvard.edu
> https://lists.gking.harvard.edu/mailman/listinfo/matchit
Thank you very much, Kosuke,
I will check my variables right now.
May I ask you a little methodological question?
When I use the (logit) propensity score matching method, the average propensity score for the treated is about 12%, because only a small part of the treated have scores between 50% and 80%.
Does this mean that there is an omitted variable somewhere? Or is there no absolute reference level, so that I should assess this figure relative to the average score of the control group?
I was expecting a much higher propensity score for the treated, especially because when I run the same logistic specification of the probability of being treated in another stat package (in order to check the significance of the coefficients), all my variables are highly significant at the 1% level.
What do you think?
Many thanks
On 27 February 2012 04:36, Kosuke Imai <kimai(a)princeton.edu> wrote:
> My guess is that there is a perfect collinearity among your variables.
>
> Kosuke
>
> Department of Politics
> Princeton University
> http://imai.princeton.edu
>
> On Feb 24, 2012, at 5:32 PM, Francesco wrote:
>
>> Dear Matchit list,
>>
>> I am using matchit for my work, and I really appreciate the excellent work you have done so far.
>> I have a question: I performed a nearest neighbor matching procedure with a large dataset (40,000 individuals, 15 variables). When I use the standard propensity score as the distance, everything works fine: the matching is quite good.
>> However, if I specify the "mahalanobis" distance, I get an error saying:
>>
>> "Lapack dgesv : le système est exactement singulier" (the system is exactly singular).
>>
>> Do you have an idea of what might cause this problem? I have no missing data.
>>
>> Many thanks
>
Has anyone come across problems of performance using MatchIt with large
datasets? I am trying to perform nearest neighbor matching with 10
subclassifications on a sample of 400,000 (50,000 treatment cases, 350,000
untreated) with about 25 covariates. I was able to get one round of
successful results after about 8 hours of waiting for R to produce the
output. Is this typical of using MatchIt with large data? Is there any way
to increase the speed or otherwise work around this?
Any help would be great.
Cheers,
Don Baum
--
Don Baum
Ph.D. Candidate/ Graduate Assistant
Comparative and International Development Education
Educational Policy and Administration
University of Minnesota
Dear Matchit list,
I am using matchit for my work, and I really appreciate the excellent work
you have done so far.
I have a question: I performed a nearest neighbor matching procedure with a
large dataset (40,000 individuals, 15 variables). When I use the standard
propensity score as the distance, everything works fine: the matching is
quite good.
However, if I specify the "mahalanobis" distance, I get an error saying:
"Lapack dgesv : le système est exactement singulier" (the system is exactly
singular).
Do you have an idea of what might cause this problem? I have no missing
data.
Many thanks
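[For what it's worth, that "exactly singular" error means the covariance matrix of the covariates cannot be inverted, which is exactly what perfect collinearity among the variables produces. A base-R sketch of the failure and a direct diagnostic (no MatchIt needed):]

```r
# The Mahalanobis distance requires inverting the covariance matrix of
# the covariates; perfect collinearity makes that matrix singular.
set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + x2            # x3 is an exact linear combination of x1 and x2

X <- cbind(x1, x2, x3)
S <- cov(X)

# solve(S) fails here with a LAPACK "singular" error -- the same
# failure MatchIt surfaces when computing Mahalanobis distances.
inv_ok <- tryCatch({ solve(S); TRUE }, error = function(e) FALSE)

# Diagnose it directly: the covariate matrix has rank 2, not 3, so one
# of the collinear columns should be dropped. A full set of category
# dummies plus an intercept (the "dummy variable trap") is a common cause.
rank_X <- qr(X)$rank
```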
Hi,
I'm trying to implement the example on page 18 of the MatchIt manual. The example shows a way to estimate the ATT.
I can't implement the example because I'm using model="ls.mixed", so my data frame has to be in long format. Concretely, when I extract my data frame with match.data() to restructure it in long format, I can't use the option data=match.data(mydata, "control") within Zelig.
Is there a workaround? I tried to use matchit() with the long-format data frame, but it gives wrong matching, since each case has multiple observations in that format.
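[One possible workaround, as a sketch under assumptions: your long data carries a subject identifier (here called `id`), and matching is done on a wide, one-row-per-subject frame. Run matchit() on the wide frame, then merge the resulting weights into the long frame by id, instead of reshaping match.data() itself.]

```r
# Toy stand-ins: 'wide' has one row per subject (what matchit() expects);
# 'long' has repeated measures per subject for the multilevel model.
wide <- data.frame(id = 1:4, treat = c(1, 1, 0, 0))
long <- data.frame(id   = rep(1:4, each = 3),
                   time = rep(1:3, times = 4),
                   y    = c(5, 6, 7, 4, 5, 6, 3, 4, 5, 8, 9, 10))

# Pretend these weights came from match.data(m.out) on 'wide';
# here subject 4 was left unmatched.
wide$weights <- c(1, 1, 1, 0)

# Carry the matching weights into the long data by subject id,
# then keep only rows belonging to matched subjects.
long <- merge(long, wide[, c("id", "treat", "weights")], by = "id")
long_matched <- long[long$weights > 0, ]

nrow(long_matched)   # 9: three time points for each of the 3 matched subjects
```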
Thanks,
François Maurice
Hi,
I'm using MatchIt and trying to understand which methods produce ATE estimates and which produce ATT estimates.
In the documentation, Section 5 (Frequently Asked Questions: How Exactly Are the Weights Created?), it says:
"These weights are constructed to estimate the average treatment effect on the treated, [...]".
Is there a way with MatchIt to estimate the ATE? To be concrete, I'm using experimental data with a control group almost three times the size of the treated group. I'm using the following four methods with ratio=2 in matchit():
Nearest: drops some controls
Subclassification: keeps all controls
Nearest with exact: drops some controls
Genetic: drops some controls
Since subclassification keeps all controls, can that yield an ATE estimate, or do I need to build my own weights to make sure it is the ATE?
In general, does MatchIt produce only matched sets with weights that can be used to estimate the ATT?
And if I use Zelig after MatchIt, is there a way to produce an ATE estimate?
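[On the subclassification question: ATE-style weights can be built by hand from the subclass table. This is the classical stratification estimator, not an official MatchIt feature, and the variable names below are made up: a unit in subclass s and treatment arm t gets weight n_s / n_{s,t}, so each arm is reweighted to the subclass mix of the full sample rather than of the treated group (which is what ATT weights do).]

```r
# Toy inputs: a subclass label and a treatment indicator per unit,
# standing in for matchit() output with method = "subclass".
set.seed(7)
n        <- 200
subclass <- sample(1:5, n, replace = TRUE)
treat    <- rbinom(n, 1, 0.3)

# ATE weights via stratification: unit in subclass s, arm t gets n_s / n_{s,t}.
n_s  <- table(subclass)
n_st <- table(subclass, treat)
w    <- as.numeric(n_s[as.character(subclass)]) /
        as.numeric(n_st[cbind(as.character(subclass), as.character(treat))])

# Sanity check: within the treated arm, the weighted subclass distribution
# now matches the subclass distribution of the FULL sample.
wtab <- tapply(w[treat == 1], subclass[treat == 1], sum)
all.equal(as.numeric(wtab / sum(wtab)), as.numeric(n_s / sum(n_s)))
```

(If any subclass has no units in one arm, the corresponding weight is undefined; such subclasses have to be merged or dropped before weighting.)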
Thanks,
François Maurice
Hi again!
I am comparing optimal matching to two nearest neighbor matching methods.
I ran the PSM in MatchIt and I am seeing some odd results in the
distributions of my covariates for the optimal method. The standardized
mean differences are very low on average across my simulation runs; however,
the treated and control groups are very dissimilar under the optimal matching
method, while the covariate distributions under the nearest neighbor methods
are very similar. How can I explain that?
Thanks!
Shane Phillips
>
> Could you please verify that the code below is properly constructed?
>
> nearest0 <- matchit(treat ~ age+lunch+iep+gender+cogat+mapfall+itbs, data = Run0031, method="nearest", exact=c("lunch","iep"))
> nearest1 <- matchit(treat ~ age+lunch+iep+gender+cogat+mapfall+itbs, data = Run0031, method="nearest", caliper=0.1)
> opt <- matchit(treat ~ age+lunch+iep+gender+cogat+mapfall+itbs, data = Run0031, method="optimal", ratio=1)
>
> This is the segment of my code for doing PSM on one run of my data using 3 different methods.
>
> Thanks!
>
> Shane
>
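[A definition check sometimes helps with balance puzzles like the one above: low average SMDs can coexist with dissimilar distributions, because the SMD only compares means. Here is a hand-rolled standardized mean difference (a sketch; standardizing by the treated-group SD is one common convention for ATT-oriented matching, and I believe it is what MatchIt's summary() reports):]

```r
# Standardized mean difference, standardizing by the treated-group SD.
# Optional weights allow applying it to a weighted matched sample.
smd <- function(x, treat, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  (m1 - m0) / sd(x[treat == 1])
}

# Example: a covariate shifted by 0.5 SD in the treated group, so the
# unmatched SMD should come out near 0.5.
set.seed(3)
treat <- rep(c(1, 0), c(50, 450))
x     <- rnorm(500) + 0.5 * treat

smd(x, treat)
```

Comparing full distributions (e.g. with plot(m.out) or empirical QQ plots) rather than SMDs alone is one way to see the mismatch you describe.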
Good morning!
I used MatchIt to run a simulation comparing three different types of
propensity score matching techniques: 1-to-1 nearest neighbor including
exact matching on two dichotomous variables, 1-to-1 nearest neighbor with a
0.1 SD caliper, and 1-to-1 optimal matching. After conducting 1000 runs of
1600 cases each (134 treated cases and 1466 possible control cases),
optimal matching showed the lowest average standardized mean difference, but
there was much more variability in the standardized mean difference values
than in the other two methods. How can I explain this? All of the methods
used the same data, and there was not much competition for controls. The
nearest neighbor methods used the default order settings. Please help!
Thanks,
Shane Phillips