Unfortunately, I don't think we have an automated procedure for everything. You would have to multiply impute the data, do the matching on each imputed data set, and then combine the results in zelig() using the mi() function. But this does not require any programming. You can simply run the same matching procedure on each data set via matchit() and then feed the resulting multiple matched data sets into zelig().
Good luck,
Kosuke
Department of Politics
Princeton University
http://imai.princeton.edu
On Sep 13, 2011, at 6:02 PM, Pingaul jb wrote:
> Dear Professor,
> I’m a post-doctoral student at Montreal University. I’m currently in Columbia, working on propensity scores with a colleague and using MatchIt and Zelig. First, congratulations on your packages, which are very flexible.
>
> My question is about multiple imputation and propensity scores with these packages. From what I understand, combining both approaches would involve:
>
> 1/ Doing multiple imputation and testing which variables to include.
>
> 2/ Propensity score analysis on each imputed data set and pooling the overall balance to check if it is ok (or on each data set?).
>
> 3/ Calculation of the quantities of interest for each data set
>
> 4/ Pooling the quantities across data sets.
>
> I would like to know if there is existing syntax to run the MatchIt analysis on all of the imputed data sets and check the overall balance without having to do it manually. Also, in theory, the number of individuals retained after propensity score matching and the weights can differ across imputed data sets. Does that mean we have to run the final analysis on each one and then pool the results with a specific procedure that accounts for the possibly varying Ns? I normally use the mice package for multiple imputation, but it seems that Zelig handles Amelia output. My colleague seems to be able to do all that in Stata, but I’m not sure how to make the three R packages work together.
>
> I would be very happy if you could point me to a reference or a place where I can find the syntax to do that (I’ve been using R for some time, so I can use packages easily, but I have no programming skills).
>
>
> Best Regards!
>
>
>
> Jean-Baptiste
>
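On step 2 of the question (checking balance on each imputed data set), one option is to compute a weighted standardized mean difference for every covariate in every matched data set, then look at the average and the worst case across imputations. The sketch below is base R only. std_mean_diff() and balance_across() are hypothetical helpers, not MatchIt functions, and the code assumes each matched data set has a 0/1 treatment column and a weights column like the one match.data() returns. The toy data are simulated.

```r
# Weighted standardized mean difference for one covariate.
# Standardized by the unweighted SD of the treated group (one common convention, assumed here).
std_mean_diff <- function(x, treat, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  (m1 - m0) / sd(x[treat == 1])
}

# Apply to every covariate in every matched data set.
# Returns a covariates-by-imputations matrix.
balance_across <- function(matched_list, covariates,
                           treat = "treat", weights = "weights") {
  out <- sapply(matched_list, function(d) {
    sapply(covariates, function(v) std_mean_diff(d[[v]], d[[treat]], d[[weights]]))
  })
  matrix(out, nrow = length(covariates),
         dimnames = list(covariates, paste0("imp", seq_along(matched_list))))
}

# Toy stand-in for three matched, imputed data sets
set.seed(1)
make_toy <- function(n = 200) {
  treat <- rbinom(n, 1, 0.4)
  data.frame(treat = treat,
             age = rnorm(n, 30 + 2 * treat, 5),
             educ = rnorm(n, 10, 2),
             weights = 1)
}
matched_list <- list(make_toy(), make_toy(), make_toy())

smd <- balance_across(matched_list, c("age", "educ"))
rowMeans(abs(smd))       # average absolute imbalance per covariate
apply(abs(smd), 1, max)  # worst imputation per covariate
```

Looking at the per-imputation columns as well as the row summaries helps catch a single badly matched data set that an average would hide.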
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/
Good morning
If I run
<<<
susan.lsmixed.out <- zelig(formula = unprot_vag_sex ~ married + age + TREATMENT.ARM*time + highest_grade + income + tag(1|id),
data = susanMI.out$imputations, model = "ls.mixed")
summary(susan.lsmixed.out)
>>>>
I get an error
Error in x$coef : $ operator is invalid for atomic vectors
Searching the archives, I see that others have had similar problems. Is there a workaround?
summary(susan.lsmixed.out[[1]])
works fine; should I then average across the five imputed data sets?
thanks!
Peter
Peter L. Flom, PhD
Statistical Consultant
Website: http://www DOT statisticalanalysisconsulting DOT com/
Writing: http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter: @peterflom
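On Peter's last question: averaging the coefficients across the five fits gives the usual pooled point estimate, but the pooled standard errors also need the between-imputation variance (Rubin's rules). A minimal base R sketch follows. pool_rubin() is a hypothetical helper, and the numbers are made up for illustration rather than taken from the model above; in practice the inputs would be the coefficient and standard-error columns pulled from each per-imputation summary.

```r
# Combine estimates from m imputed data sets with Rubin's rules.
# est, se: m-by-p matrices (rows = imputations, columns = coefficients).
pool_rubin <- function(est, se) {
  est <- as.matrix(est)
  se <- as.matrix(se)
  m <- nrow(est)
  qbar <- colMeans(est)              # pooled point estimate
  ubar <- colMeans(se^2)             # within-imputation variance
  b <- apply(est, 2, var)            # between-imputation variance
  total <- ubar + (1 + 1 / m) * b    # total variance
  data.frame(estimate = qbar, se = sqrt(total))
}

# Illustrative inputs for five imputations and two coefficients
est <- cbind(age     = c(0.10, 0.12, 0.08, 0.11, 0.09),
             married = c(-0.50, -0.45, -0.55, -0.52, -0.48))
se  <- cbind(age     = rep(0.02, 5),
             married = rep(0.10, 5))

pooled <- pool_rubin(est, se)
pooled
```

The pooled standard error is never smaller than the average within-imputation one, because the extra term carries the uncertainty due to the missing data.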
-
Hello,
I am trying to run a multilevel probit model using Zelig, but keep receiving
the following error message: " in .deparseTag(TT.vars[[vind]]) : wrong use
of tag function!!"
A simplified version of the model I am trying to run is:
z.out <- zelig(formula= list(mu=investment.binary ~ edlevel +
tag(1 + edlevel, gamma | country),
gamma = ~ tag(GDPpc06.full| country)), data=data2006.mod1,
model="probit.mixed")
What I would like to do is allow the intercept and the edlevel variable
listed within the first tag() to vary by country as a function of the
GDPpc06.full variable, all of which are included in the same dataframe. I
followed the syntax here -
http://cran.r-project.org/web/packages/Zelig/vignettes/probit.mixed.pdf -
but I think that I am incorrectly specifying the gamma part of the syntax,
which may be causing the error.
I *am* able to get the model to run when I allow the intercept and edlevel
variable to vary using the following syntax:
z.out <- zelig(investment.binary ~ edlevel +
tag(1 + edlevel | country),
data=data2006.mod1, model="probit.mixed")
However, this syntax does not allow me to specify that the intercept and
edlevel variable should vary as a function of GDPpc06.full, as in the first
model specified above. I have tried including multiple tags at the
non-group level of the model specification - i.e. one for the intercept and
one for the edlevel variable - but this does not seem to work either.
Do you have any suggestions for how to fix the syntax?
Sincerely,
Jason
--
Jason I. McMann
PhD Student | Department of Politics
Princeton University | jmcmann(a)princeton.edu
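One way to see what the gamma equation is asking for is to write out the two levels and substitute the second into the first. The notation below (i for respondents, j for countries, \Phi for the standard normal CDF, \gamma for the country-level coefficients, u for the country-level errors, GDP for GDPpc06.full) is assumed for illustration and is not taken from the Zelig vignette.

```latex
\begin{align*}
\Pr(y_{ij} = 1 \mid u_{0j}, u_{1j}) &= \Phi\left(\beta_{0j} + \beta_{1j}\,\mathrm{edlevel}_{ij}\right) \\
\beta_{0j} &= \gamma_{00} + \gamma_{01}\,\mathrm{GDP}_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\,\mathrm{GDP}_j + u_{1j} \\
\Pr(y_{ij} = 1 \mid u_{0j}, u_{1j}) &= \Phi\left(\gamma_{00} + \gamma_{01}\,\mathrm{GDP}_j
  + \gamma_{10}\,\mathrm{edlevel}_{ij}
  + \gamma_{11}\,\mathrm{GDP}_j\,\mathrm{edlevel}_{ij}
  + u_{0j} + u_{1j}\,\mathrm{edlevel}_{ij}\right)
\end{align*}
```

Read this way, the country-level predictor enters the fixed part as a main effect plus an interaction with edlevel, while the random part is still the intercept and the edlevel slope by country. If that is the intended model, adding GDPpc06.full and edlevel:GDPpc06.full to the formula that already runs, and keeping tag(1 + edlevel | country), may be a workaround. This has not been tested here.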
The argument "weights" takes the name of the variable. So, you should try something like:
weights = "weights"
Best,
Kosuke
Department of Politics
Princeton University
http://imai.princeton.edu
On Sep 19, 2011, at 11:45 AM, Pingaul jb wrote:
> Dear professor,
> Thanks for your answer! I ended up building on MatchIt to write a few quick functions to help with matching under multiple imputation (equivalents of matchit, summary and match.data). I don't think they are very elegant, but I am sending them to you anyway now that they are done (with a csv file of example data).
> More importantly, I get an error with syntax adapted from Ho et al. (2011) that uses MatchIt to calculate the ATT. The syntax in the article uses method = "nearest" without replacement; I tried it with replacement. It therefore seems that I need to introduce weights when estimating the model on the controls. But when I apply the resulting model to the treated, I run into a problem with the weights having a different length. To make sure the control group is well matched, I think I must include the weights anyway, but I’m unsure how to do it. Below is my syntax with the lalonde data.
> library(MatchIt)
> library(Zelig)
> data(lalonde)
> m.out0 <- matchit(treat ~ age + educ + black + hispan + nodegree + married + re74 + re75, method = "nearest", replace = TRUE, data = lalonde)
> datacontrol <- match.data(m.out0, "control")
> summary(m.out0)
> datatreat <- match.data(m.out0, "treat")
> z.out1 <- zelig(re78 ~ age + educ + black + hispan + nodegree + married + re74 + re75, data = datacontrol, weights = datacontrol$weights, model = "ls")
> x.out1 <- setx(z.out1, data = datatreat, cond = TRUE)
> s.out1 <- sim(z.out1, x = x.out1)
>
>
> Best regards,
>
> Jean-Baptiste
>
> --- On Thu 15.9.11, Kosuke Imai <kimai(a)Princeton.EDU> wrote:
>
> From: Kosuke Imai <kimai(a)Princeton.EDU>
> Subject: Re: MatchIt Zelig and multiple imputation
> To: "Pingaul jb" <pingaultjb(a)yahoo.fr>
> Cc: "matchit" <matchit(a)lists.gking.harvard.edu>, "zelig(a)lists.gking.harvard.edu" <zelig(a)lists.gking.harvard.edu>
> Date: Thursday, September 15, 2011, 5:03
>
> Unfortunately, I don't think we have an automated procedure for everything. You would have to multiply impute the data, do the matching on each imputed data set, and then combine the results in zelig() using the mi() function. But this does not require any programming. You can simply run the same matching procedure on each data set via matchit() and then feed the resulting multiple matched data sets into zelig().
>
> Good luck,
> Kosuke
>
> Department of Politics
> Princeton University
> http://imai.princeton.edu
> <MatchItMI.txt><DataExample.csv>
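Per Kosuke's reply, the weights argument takes the name of the column, so the zelig() call above would use weights = "weights" instead of weights = datacontrol$weights. The same estimation idea can be sketched in base R as a rough stand-in for the zelig()/setx()/sim() steps (it is not the Zelig implementation): fit the weighted outcome model on the matched controls, predict the untreated outcome for each treated unit, and average the differences. att_from_controls() is a hypothetical helper, the data are simulated, and the weights are a placeholder for what match.data() would supply.

```r
# ATT from a weighted outcome model fitted on matched controls only.
att_from_controls <- function(formula, control, treated, weights = "weights") {
  # Copy the weights to a fixed column name so lm() finds them inside `data`
  control$.w <- control[[weights]]
  fit <- lm(formula, data = control, weights = .w)
  y_name <- all.vars(formula)[1]               # outcome variable name
  y0_hat <- predict(fit, newdata = treated)    # predicted outcome without treatment
  mean(treated[[y_name]] - y0_hat)
}

# Simulated example with a true treatment effect of 500
set.seed(2)
n <- 300
d <- data.frame(age = rnorm(n, 30, 5), treat = rbinom(n, 1, 0.3))
d$re78 <- 1000 + 50 * d$age + 500 * d$treat + rnorm(n, 0, 100)
d$weights <- 1   # placeholder; real matching weights would go here

att <- att_from_controls(re78 ~ age,
                         control = d[d$treat == 0, ],
                         treated = d[d$treat == 1, ])
att
```

Because the weights live in the control data frame and are looked up there, they always have the same length as the data used for fitting, which avoids the length mismatch that appears when a separate vector is passed along to the treated units.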
-
It's possible that as it is implemented, the bprobit in R does not take an endogenous variable...
Best,
Kosuke
Department of Politics
Princeton University
http://imai.princeton.edu
On Sep 20, 2011, at 2:06 AM, Stefanie Schurer wrote:
> Hi Kosuke,
>
> I am having a problem with Zelig’s bivariate probit model. I am trying to replicate a study by Carrasco (2001, JBES), which jointly estimates labour supply and fertility with a bivariate probit model. I can replicate her binary response results in R, but once I use Zelig’s bprobit command I get 14 error messages and totally nonsensical results. The problem is that when I estimate the exact same model with Stata’s biprobit command, I am able to replicate Carrasco’s results. Is there a known bug in bprobit that I am not aware of?
>
> This is what I programmed (please note that this is a recursive model in which f = fertility is an endogenous RHS variable, and thus is separately modelled in the second equation, using an instrument for identification “dsex”).
>
> Any help would be highly appreciated as I intend to teach this to my third year econometrics students next Thursday.
>
> Cheers,
> Stefi
>
> ######
>
> fml <- list(mu1 = dhw ~ f + ags26l + fxag26l + educ2 + educ3 + drace + age + income + dhwl, mu2 = f ~ ags26l + educ2 + educ3 + drace + age + income + dsex)
> z.out <- zelig(fml, model = "blogit", data = mydata)
> z.out
>
> Below are the errors I get
>
> Warning messages:
> 1: glm.fit: algorithm did not converge
> 2: In checkwz(wz, M = M, trace = trace, wzeps = control$wzepsilon) :
> 805 elements replaced by 1.819e-12
> 3: In tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, ... :
> fitted values close to 0 or 1
> 4: In checkwz(wz, M = M, trace = trace, wzeps = control$wzepsilon) :
> 2064 elements replaced by 1.819e-12
> 5: In tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, ... :
> fitted values close to 0 or 1
> 6: In tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, ... :
> fitted values close to 0 or 1
> 7: In tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, ... :
> fitted values close to 0 or 1
> 8: In tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, ... :
> fitted values close to 0 or 1
> 9: In tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, ... :
> fitted values close to 0 or 1
> 10: In tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, ... :
> fitted values close to 0 or 1
> 11: In tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, ... :
> fitted values close to 0 or 1
> 12: In tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, ... :
> fitted values close to 0 or 1
> 13: In tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, ... :
> fitted values close to 0 or 1
> 14: In eval(expr, envir, enclos) :
> iterations terminated because half-step sizes are very small
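For reference, the recursive bivariate probit that Stefanie describes, with fertility f as an endogenous regressor in the labour supply equation, can be written as follows. The symbols are assumed for illustration: x_1 and x_2 are the exogenous covariates of each equation (x_2 includes the instrument dsex), \delta is the coefficient on f, and \rho is the error correlation.

```latex
\begin{align*}
\mathrm{dhw}_i^* &= x_{1i}'\beta_1 + \delta f_i + \varepsilon_{1i}, & \mathrm{dhw}_i &= 1[\mathrm{dhw}_i^* > 0] \\
f_i^* &= x_{2i}'\beta_2 + \varepsilon_{2i}, & f_i &= 1[f_i^* > 0] \\
(\varepsilon_{1i}, \varepsilon_{2i}) &\sim N\!\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)
\end{align*}
```

The observed f_i on the right-hand side of the first equation is the feature that, as Kosuke suggests, the implementation may not support.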
-
Hi,
I'm trying to write elegant code, but I have trouble with setx(). I defined two objects: tx.var, which contains the name of the treatment variable, and causal.model, which contains the model used in zelig() (see the code below). Everything works fine except in setx(). When I specify tx.var instead of TREAT, which is the name of the treatment variable, sim() produces a zero effect. But when I specify TREAT, sim() produces a quantity. Is there something I can do to correct this?
tx.var <- c("TREAT")
causal.model <- QASBAT ~ TREAT + T + T2 + T3 + tag(T | ID.factor)
(some other code here)
z.out.1 <- zelig(formula= as.formula(causal.model), data=matched.1.mtch.long, model="ls.mixed")
x.out.0.1 <- setx(z.out.1, fn=NULL, tx.var=0)
x.out.1.1 <- setx(z.out.1, fn=NULL, tx.var=1)
s.out.1 <- sim(z.out.1, x=x.out.0.1, x1=x.out.1.1)
Thanks,
François Maurice, B. Sc., A. Stat.
Master's candidate
Department of Sociology
Université de Montréal
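A likely cause: in setx(z.out.1, fn=NULL, tx.var=0), R treats tx.var as a literal argument name, so setx() is asked to set a variable called "tx.var" rather than TREAT. A common base R idiom is to build the named argument list first and pass it with do.call(). The sketch below uses a hypothetical stand-in, show_args(), so that it runs without Zelig; whether setx() behaves identically when called through do.call() is an assumption that has not been checked here.

```r
# Stand-in for setx(): reports the names of the extra arguments it received.
show_args <- function(obj, ...) names(list(...))

tx.var <- "TREAT"

# The argument is literally named "tx.var"; the value of tx.var is never used
literal <- show_args("model", tx.var = 0)

# Build the argument list so that the *value* of tx.var becomes the name
args0 <- c(list("model"), setNames(list(0), tx.var))
programmatic <- do.call(show_args, args0)

literal
programmatic
```

With Zelig, the analogous call might be written as do.call(setx, c(list(z.out.1, fn = NULL), setNames(list(0), tx.var))), and likewise with 1 in place of 0 for the treated scenario.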