by Adamakis, Sotirios (Customer Analytics & Decision)
Hi, I have the same problem. Which option turns off the calculation of L1?
Regards,
Sotiris
________________________________
If you use CEM and turn off the calculation of L1, it's very fast and can
deal with very large data sets.
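[A minimal sketch of that suggestion, on toy data. The formula and data frame are made up, and the `eval.imbalance` switch in the `cem` package is an assumption — check `?matchit` and `?cem` in your installed versions for the exact options.]

```r
library(MatchIt)  # assumes MatchIt (and the cem package it calls) is installed

# Toy data standing in for a real, large data set.
set.seed(1)
df <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))
df$treat <- rbinom(1000, 1, plogis(df$x1))

# Coarsened exact matching bins the covariates instead of computing
# pairwise distances, which is why it scales to very large samples.
m.out <- matchit(treat ~ x1 + x2, data = df, method = "cem")
summary(m.out)

# If the L1 imbalance computation itself is the bottleneck, the cem
# package's own interface exposes a switch for it (argument name is
# an assumption -- verify against ?cem in your version):
# library(cem)
# c.out <- cem("treat", data = df, eval.imbalance = FALSE)
```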
Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director,
IQSS - Harvard University
GKing.Harvard.edu <http://gking.harvard.edu/> - King at Harvard.edu -
@kinggary <http://twitter.com/kinggary> - 617-500-7570 - Asst 495-9271 -
Fax 812-8581
On Mon, Feb 27, 2012 at 12:14 PM, Donny Baum <donnybaum at gmail.com> wrote:
> Has anyone come across problems of performance using MatchIt with large
> datasets? I am trying to perform nearest neighbor matching with 10
> subclassifications on a sample of 400,000 (50,000 treatment cases, 350,000
> untreated) with about 25 covariates. I was able to get one round of
> successful results after about 8 hours of waiting for R to produce the
> output. Is this typical of using MatchIt with large data? Is there any way
> to increase the speed or otherwise work around this?
>
> Any help would be great.
>
> Cheers,
>
> Don Baum
>
> --
> Don Baum
> Ph.D. Candidate/ Graduate Assistant
> Comparative and International Development Education
> Educational Policy and Administration
> University of Minnesota
>
>
>
> -
> ---
> MatchIt mailing list served by HUIT
> List Address: matchit at lists.gking.harvard.edu
> Subscribe/Unsubscribe: http://lists.gking.harvard.edu/mailman/listinfo/ei
> MatchIt Software and Documentation: http://gking.harvard.edu/matchit/
> Browse/Search List Archive:
> http://lists.gking.harvard.edu/mailman/private/matchit/
> Matchit mailing list
> Matchit at lists.gking.harvard.edu
> https://lists.gking.harvard.edu/mailman/listinfo/matchit
Thank you very much, Kosuke,
I will check my variables right now.
May I ask you a little methodological question?
When I use the (logit) propensity score matching method, the average propensity score for the treated is about 12%, because only a small part of the treated have scores between 50% and 80%.
Does this mean that there is an omitted variable somewhere? Or is there no absolute reference level, so that I should assess this figure relative to the average score of the control group?
I was expecting a much higher propensity score for the treated, especially because when I run the same logistic specification of the probability of being treated in another stat package (in order to check the significance of the coefficients), all my variables are highly significant at the 1% level.
What do you think?
Many thanks
On 27 February 2012 04:36, Kosuke Imai <kimai(a)princeton.edu> wrote:
> My guess is that there is a perfect collinearity among your variables.
>
> Kosuke
>
> Department of Politics
> Princeton University
> http://imai.princeton.edu
>
> On Feb 24, 2012, at 5:32 PM, Francesco wrote:
>
>> Dear Matchit list,
>>
>> I am using matchit for my work, and I really appreciate the excellent work you have done so far.
>> I have a question: I performed a nearest neighbor matching procedure with a large dataset (40,000 individuals, 15 variables). When I use the standard propensity score as the distance, everything works fine: the matching is quite good.
>> However, if I specify the "mahalanobis" distance, I get an error saying:
>>
>> "Lapack dgesv : le système est exactement singulier" (the system is exactly singular).
>>
>> Do you have an idea of what might cause this problem? I have no missing data.
>>
>> Many thanks
>
Has anyone come across problems of performance using MatchIt with large
datasets? I am trying to perform nearest neighbor matching with 10
subclassifications on a sample of 400,000 (50,000 treatment cases, 350,000
untreated) with about 25 covariates. I was able to get one round of
successful results after about 8 hours of waiting for R to produce the
output. Is this typical of using MatchIt with large data? Is there any way
to increase the speed or otherwise work around this?
Any help would be great.
Cheers,
Don Baum
--
Don Baum
Ph.D. Candidate/ Graduate Assistant
Comparative and International Development Education
Educational Policy and Administration
University of Minnesota
Dear Matchit list,
I am using matchit for my work, and I really appreciate the excellent work
you have done so far.
I have a question: I performed a nearest neighbor matching procedure with a
large dataset (40,000 individuals, 15 variables). When I use the standard
propensity score as the distance, everything works fine: the matching is
quite good.
However, if I specify the "mahalanobis" distance, I get an error saying:
"Lapack dgesv : le système est exactement singulier" (the system is exactly
singular).
Do you have an idea of what might cause this problem? I have no missing
data.
Many thanks
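[For what it's worth, that "exactly singular" error means the covariance matrix of the covariates cannot be inverted, which is exactly what perfect collinearity among the variables produces. A base-R sketch of the failure and a direct diagnostic (no MatchIt needed):]

```r
# The Mahalanobis distance requires inverting the covariance matrix of
# the covariates; perfect collinearity makes that matrix singular.
set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + x2            # x3 is an exact linear combination of x1 and x2

X <- cbind(x1, x2, x3)
S <- cov(X)

# solve(S) fails here with a LAPACK "singular" error -- the same
# failure MatchIt surfaces when computing Mahalanobis distances.
inv_ok <- tryCatch({ solve(S); TRUE }, error = function(e) FALSE)

# Diagnose it directly: the covariate matrix has rank 2, not 3, so one
# of the collinear columns should be dropped. A full set of category
# dummies plus an intercept (the "dummy variable trap") is a common cause.
rank_X <- qr(X)$rank
```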
Hi,
I'm trying to implement the example on page 18 of the MatchIt manual. The example shows a way to estimate the ATT.
I can't implement the example because I'm using model="ls.mixed", so my data frame has to be in long format. Concretely, when I extract my data frame with match.data() to restructure it in long format, I can't use the option data=match.data(mydata, "control") within Zelig.
Is there a workaround? I tried to use matchit() with the long-format data frame, but it gives wrong matching, since each case has multiple observations in that format.
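[One possible workaround, as a sketch under assumptions: your long data carries a subject identifier (here called `id`), and matching is done on a wide, one-row-per-subject frame. Run matchit() on the wide frame, then merge the resulting weights into the long frame by id, instead of reshaping match.data() itself.]

```r
# Toy stand-ins: 'wide' has one row per subject (what matchit() expects);
# 'long' has repeated measures per subject for the multilevel model.
wide <- data.frame(id = 1:4, treat = c(1, 1, 0, 0))
long <- data.frame(id   = rep(1:4, each = 3),
                   time = rep(1:3, times = 4),
                   y    = c(5, 6, 7, 4, 5, 6, 3, 4, 5, 8, 9, 10))

# Pretend these weights came from match.data(m.out) on 'wide';
# here subject 4 was left unmatched.
wide$weights <- c(1, 1, 1, 0)

# Carry the matching weights into the long data by subject id,
# then keep only rows belonging to matched subjects.
long <- merge(long, wide[, c("id", "treat", "weights")], by = "id")
long_matched <- long[long$weights > 0, ]

nrow(long_matched)   # 9: three time points for each of the 3 matched subjects
```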
Thanks,
François Maurice
Hi,
I'm using MatchIt and trying to understand which methods produce ATE estimates and which produce ATT estimates.
In the documentation, Section 5 (Frequently Asked Questions: How Exactly Are the Weights Created?), it says:
"These weights are constructed to estimate the average treatment effect on the treated, [...]".
Is there a way with MatchIt to estimate the ATE? To be concrete, I'm using experimental data with a control group almost three times the size of the treated group. I'm using the following four methods with ratio=2 in matchit():
Nearest: drops some controls
Subclassification: keeps all controls
Nearest with exact: drops some controls
Genetic: drops some controls
Since subclassification keeps all controls, can that yield an ATE estimate, or do I need to build my own weights to make sure it is the ATE?
In general, does MatchIt produce only matched sets with weights that can be used to estimate the ATT?
And if I use Zelig after MatchIt, is there a way to produce an ATE estimate?
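[On the subclassification question: ATE-style weights can be built by hand from the subclass table. This is the classical stratification estimator, not an official MatchIt feature, and the variable names below are made up: a unit in subclass s and treatment arm t gets weight n_s / n_{s,t}, so each arm is reweighted to the subclass mix of the full sample rather than of the treated group (which is what ATT weights do).]

```r
# Toy inputs: a subclass label and a treatment indicator per unit,
# standing in for matchit() output with method = "subclass".
set.seed(7)
n        <- 200
subclass <- sample(1:5, n, replace = TRUE)
treat    <- rbinom(n, 1, 0.3)

# ATE weights via stratification: unit in subclass s, arm t gets n_s / n_{s,t}.
n_s  <- table(subclass)
n_st <- table(subclass, treat)
w    <- as.numeric(n_s[as.character(subclass)]) /
        as.numeric(n_st[cbind(as.character(subclass), as.character(treat))])

# Sanity check: within the treated arm, the weighted subclass distribution
# now matches the subclass distribution of the FULL sample.
wtab <- tapply(w[treat == 1], subclass[treat == 1], sum)
all.equal(as.numeric(wtab / sum(wtab)), as.numeric(n_s / sum(n_s)))
```

(If any subclass has no units in one arm, the corresponding weight is undefined; such subclasses have to be merged or dropped before weighting.)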
Thanks,
François Maurice
Hi again!
I am comparing optimal matching to two nearest neighbor matching methods.
I ran the PSM in MatchIt and I am seeing some odd results in the
distributions of my covariates for the optimal method. The standardized
mean differences are very low on average across my simulation runs; however,
the treated and control groups are very dissimilar under the optimal matching
method, while the covariate distributions under the nearest neighbor methods
are very similar. How can I explain that?
Thanks!
Shane Phillips
>
> Could you please verify that the code below is properly constructed?
>
> nearest0 <- matchit(treat ~ age+lunch+iep+gender+cogat+mapfall+itbs, data = Run0031, method="nearest", exact=c("lunch","iep"))
> nearest1 <- matchit(treat ~ age+lunch+iep+gender+cogat+mapfall+itbs, data = Run0031, method="nearest", caliper=0.1)
> opt <- matchit(treat ~ age+lunch+iep+gender+cogat+mapfall+itbs, data = Run0031, method="optimal", ratio=1)
>
> This is the segment of my code for doing PSM on one run of my data using 3 different methods.
>
> Thanks!
>
> Shane
>
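[A definition check sometimes helps with balance puzzles like the one above: low average SMDs can coexist with dissimilar distributions, because the SMD only compares means. Here is a hand-rolled standardized mean difference (a sketch; standardizing by the treated-group SD is one common convention for ATT-oriented matching, and I believe it is what MatchIt's summary() reports):]

```r
# Standardized mean difference, standardizing by the treated-group SD.
# Optional weights allow applying it to a weighted matched sample.
smd <- function(x, treat, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  (m1 - m0) / sd(x[treat == 1])
}

# Example: a covariate shifted by 0.5 SD in the treated group, so the
# unmatched SMD should come out near 0.5.
set.seed(3)
treat <- rep(c(1, 0), c(50, 450))
x     <- rnorm(500) + 0.5 * treat

smd(x, treat)
```

Comparing full distributions (e.g. with plot(m.out) or empirical QQ plots) rather than SMDs alone is one way to see the mismatch you describe.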
Good morning!
I used MatchIt to run a simulation comparing three different types of
propensity score matching techniques: 1-to-1 nearest neighbor including
exact matching on two dichotomous variables, 1-to-1 nearest neighbor with a
0.1 SD caliper, and 1-to-1 optimal matching. After conducting 1000 runs of
1600 cases each (134 treated cases and 1466 possible control cases),
optimal matching showed the lowest average standardized mean difference, but
there was much more variability in the standardized mean difference values
than in the other two methods. How can I explain this? All of the methods
used the same data, and there was not much competition for controls. The
nearest neighbor methods used the default order settings. Please help!
Thanks,
Shane Phillips