Unfortunately, I don't think we have an automated procedure for everything. You would have to multiply impute the data, do matching on each imputed data set, and then combine them in zelig() using the mi() function. But this does not require any programming. You can simply run the same matching procedure on each data set via matchit() and then feed the resulting matched data sets into zelig().
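In outline, that workflow might look like the following (an illustrative sketch only, assuming the Amelia, MatchIt, and Zelig 3.x APIs; the data set and all variable names are hypothetical):

```r
## Sketch only: multiply impute, run the same matching on each imputed
## data set, then pass all matched sets to zelig() via mi().
## mydata, treat, x1, x2, y are hypothetical names.
library(Amelia)
library(MatchIt)
library(Zelig)

a.out <- amelia(mydata, m = 5)                  # multiple imputation
matched <- lapply(a.out$imputations, function(d) {
  m.out <- matchit(treat ~ x1 + x2, data = d)   # same procedure per set
  match.data(m.out)                             # extract matched data
})

z.out <- zelig(y ~ treat + x1 + x2, model = "ls",
               data = mi(matched[[1]], matched[[2]], matched[[3]],
                         matched[[4]], matched[[5]]))
```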
Good luck,
Kosuke
Department of Politics
Princeton University
http://imai.princeton.edu
On Sep 13, 2011, at 6:02 PM, Pingaul jb wrote:
> Dear Professor,
> I’m a post-doctoral student at Montreal University. I’m currently in Columbia, working on propensity scores with a colleague and using MatchIt and Zelig. First, congratulations on your packages, which are very flexible.
>
> My question is about multiple imputation and propensity scores with these packages. From what I understand, combining both approaches would involve:
>
> 1/ Doing multiple imputation and testing which variables to include.
>
> 2/ Propensity score analysis on each imputed data set, pooling the overall balance to check whether it is acceptable (or checking balance on each data set?).
>
> 3/ Calculation of the quantities of interest for each data set.
>
> 4/ Pooling the quantities across data sets.
>
> I would like to know if there is a written syntax to perform the MatchIt analysis on all of the imputed data sets, without having to do it manually, and to check the overall balance. Also, in theory, the number of individuals retained after propensity score matching and the weights can differ across imputed data sets, so do we have to perform the final analysis on each one and then pool the data with a specific procedure that takes the possibly varying Ns into account? I normally use the mice package for multiple imputation, but it seems that Zelig handles Amelia. My colleague seems to be able to do all of that in Stata, but I’m not sure how to make the three R packages work together.
>
> I would be very happy if you could point me to a reference or a place where I can find the syntax to do that (I’ve been using R for some time, so I can use packages easily, but I have no programming skills).
>
>
> Best Regards!
>
>
>
> Jean-Baptiste
>
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/
Please let me know how I can install the ZeligMultinomial package. I want to
use mlogit, which according to the manual (page 50) is found in that
package.
I tried the command:
install.packages("ZeligMultinomial", repos="http://r.iq.harvard.edu/",
type="source")
But received the following message:
Warning: dependency 'MNP' is not available
trying URL 'http://r.iq.harvard.edu/src/contrib/ZeligMultinomial_0.5-4.tar.gz'
Content type 'application/x-gzip' length 9730 bytes
opened URL
==================================================
downloaded 9730 bytes
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_PAPER failed, using "C"
ERROR: dependency 'MNP' is not available for package 'ZeligMultinomial'
* removing
'/Library/Frameworks/R.framework/Versions/2.14/Resources/library/ZeligMultinomial'
The downloaded packages are in
'/private/var/folders/UL/ULLu+bi5GR8IM1G-7XZBJU+++TI/-Tmp-/Rtmp6jw6M9/downloaded_packages'
Warning message:
In install.packages("ZeligMultinomial", repos = "http://r.iq.harvard.edu/",
:
installation of package 'ZeligMultinomial' had non-zero exit status
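The key line above is the dependency error. For reference, assuming MNP is available on CRAN, one possible fix is to install the missing dependency first and then retry the source install:

```r
## Install the missing dependency (MNP) from CRAN first,
## then retry the source install of ZeligMultinomial.
install.packages("MNP")
install.packages("ZeligMultinomial",
                 repos = "http://r.iq.harvard.edu/", type = "source")
```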
***
Zelig experts,
[I apologize in advance for the long email.]
I am working with a colleague to figure out a good way to place a
confidence interval around an average treatment effect for the treated
(ATT) when using the Zelig sim(z.out) function and sample size is not
particularly large. It would be great to get some advice about whether
we are correctly using sim() and whether our alternative approach
makes sense.
My (cursory) understanding of sim() is that the ATT point estimate
and confidence interval are based on the posterior distribution of the
1,000 conditional expected values for the counterfactual. One concern
is that this can produce an appropriate confidence interval
asymptotically but may be too narrow in finite samples.
As context, we were asked to estimate the effect of attending a magnet
school versus a comparison school (for simplicity, assume strong
ignorability, even though it probably doesn't hold, and set aside the
within-school nesting) on a test score. We tried to equate the treatment
and control groups using inverse probability of treatment weighting
(IPTW). After running
x.out1 <- setx(z.out, data =data.t, cond = TRUE)
s.out <- sim(z.out, x = x.out1)
We get the following:
> summary(s.out)
Model: ls
Number of simulations: 1000
Mean Values of Observed Data (n = 156)
(Intercept) ZMath1011 ZRead1011 ZWrite1011
1.00000000 0.06322742 0.05521872 -0.28610638
Pooled Expected Values: E(Y|X)
mean sd 2.5% 97.5%
0.03321294 0.80429472 -1.56758254 1.44453357
Pooled Average Treatment Effect for the Treated: Y - EV
mean sd 2.5% 97.5%
0.05244464 0.01945088 0.01379341 0.08863216
We're worried the sd of 0.02 only reflects between-imputation variance
and not within-sample variance, so we pulled out the expected value
matrix and recalculated the ATT & standard error treating the 1,000
expected values as 1,000 multiply imputed data sets and then used
Little & Rubin combination rules to get total variance:
> ## Merge Expected Values with Main Treatment-Unit Data File ##
> id<-data.t$SASID # vector of student ids
> ev<-s.out$qi$ev # matrix of expected values & students
>
> datar <- NULL
> for (i in 1:ncol(ev)) { # loop over each treatment student
+ tmp <- cbind(c(1:nrow(ev)),rep(id[i],nrow(ev)),ev[,i])
+ datar <- rbind(datar,tmp)
+ }
>
> datar <- data.frame(datar)
> names(datar) <- c("m","SASID","EV")
>
> datat<-merge(datar,data.t, by="SASID") # merge with main data set
>
> ## Calculate ATT ##
>
> datat$ATT<-datat$ZMath1112-datat$EV # individual level effect
> att.m<-aggregate(datat$ATT,by=list(datat$m),mean) # mean ATT per imputation
> att.v<-aggregate(datat$ATT,by=list(datat$m),var) # variance of ATT per imputation
>
> W <- mean(att.v$x) # average within variance
> B <- sum((att.m$x-mean(att.m$x))^2)/(nrow(ev)-1) # between variance
> T <- sqrt(W/ncol(ev)) + (1+(1/nrow(ev)))*B # total standard error
>
> # ATT point estimate & standard error #
> mean(att.m$x); T
[1] 0.05244464
[1] 0.02913942
> # ATT confidence interval #
> mean(att.m$x)-2*T; mean(att.m$x)+2*T
[1] -0.005834205
[1] 0.1107235
So using this approach returns the same point estimate, but a somewhat
larger standard error (0.029 vs. 0.019). As a point of reference, if
you just run a regression on the full sample (weighted by IPTW) you
get ATT=0.053 (se=0.031).
We would like to estimate the ATT for different subgroups as well as
the overall ATT, and sample size will really become an issue for some
subgroups. Our main question is whether you think our approach is
appropriate or whether we should stick with the sd & confidence
interval produced by sim(z.out) ... or if there's something better we
should do.
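For reference, the standard combining rules give total variance T = W + (1 + 1/m)B and a standard error of sqrt(T), where W is the average within-imputation variance, B the between-imputation variance, and m the number of imputations. A toy base-R illustration with made-up numbers:

```r
## Toy illustration of the combining rules; est and se are made-up
## per-imputation point estimates and standard errors.
est <- c(0.05, 0.06, 0.04, 0.055, 0.045)   # per-imputation estimates
se  <- c(0.02, 0.021, 0.019, 0.02, 0.022)  # per-imputation SEs
m <- length(est)
W <- mean(se^2)                    # average within-imputation variance
B <- var(est)                      # between-imputation variance
total.se <- sqrt(W + (1 + 1/m) * B)
```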
Thank you,
Jordan Rickles
Hi,
I'm trying to run a weighted multinomial logit, but the "mlogit" family
doesn't allow weights. I tried to implement an external function from the
VGAM package [not multinomial(), which seems to be Zelig's built-in, but
vglm()] using the zelig2 mechanism, but I couldn't make it work.
Is there any other way to use weights in an mlogit model?
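For reference, one possible route outside Zelig, assuming VGAM's vglm() accepts prior weights (the data set and all variable names here are hypothetical):

```r
## Sketch only: a weighted multinomial logit fit directly with VGAM,
## bypassing Zelig; mydata, choice, x1, x2, w are hypothetical names.
library(VGAM)

fit <- vglm(choice ~ x1 + x2, family = multinomial(),
            weights = w, data = mydata)   # w = prior weights column
summary(fit)
```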
thanks,
Rogério J. Barbosa
Researcher at Centre for Metropolitan Studies/Cebrap
São Paulo - Brazil
The zelig() and setx() functions are working fine for me but I am having trouble with the sim() command.
I get the following error:
Error in mvrnorm(num, mu = coef(object), Sigma = vcov(object)) : incompatible arguments
I am running a negative binomial model with a large number of covariates (about 300) and a large data set (about 150,000 observations).
I would be interested in bootstrapping as an alternative, but I need a way to bootstrap with smaller subsets of my dataset or I run out of memory, and I am not sure how to do this either.
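One sketch of that idea, an m-out-of-n bootstrap that refits on random subsamples to keep memory manageable, assuming MASS::glm.nb (the data set and variable names are hypothetical):

```r
## Sketch only: bootstrap SEs from subsamples (m-out-of-n bootstrap);
## mydata, y, x1, x2 are hypothetical names.
library(MASS)

n.sub  <- 10000   # subsample size, well below the full ~150,000
n.boot <- 200     # number of bootstrap replicates

boot.coefs <- replicate(n.boot, {
  idx <- sample(nrow(mydata), n.sub, replace = TRUE)
  coef(glm.nb(y ~ x1 + x2, data = mydata[idx, ]))  # refit on subsample
})
## rescale the subsample SEs to the full sample size, since the
## estimator's variance shrinks roughly like 1/n
apply(boot.coefs, 1, sd) * sqrt(n.sub / nrow(mydata))
```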
Running R version 2.15.0.
Using Zelig (Version 3.5.5, built: 2010-01-20)
Thanks!
________________________________
Mr. Louis Merlin, AICP
Doctoral Student
UNC CH Department of City and Regional Planning