By default, sim() does 1,000 simulations, split over the number of MI data
sets. Since you have 1,000 MI data sets, that is 1 simulation per data set.
I would increase the number of simulations to something reasonable per
data set OR (much preferred) reduce the number of MI data sets.
With 1,000 MI data sets, you are averaging over the simulation uncertainty
that the MI procedure is supposed to introduce. More is not better in this
case. I'd go with something like 5 or 10 data sets.
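For example (a sketch only -- 'mydata' and the formula are placeholders, and the
exact interface for passing imputed data sets varies by Zelig version):

```r
library(Amelia)  # multiple imputation
library(Zelig)

# Impute a small number of data sets (m = 5); sim() then splits its
# default 1,000 simulations as 200 per imputed data set.
a.out <- amelia(mydata, m = 5)            # 'mydata' is a placeholder
z.out <- zelig(y ~ x1 + x2, model = "ls",
               data = a.out$imputations)
s.out <- sim(z.out, x = setx(z.out))
summary(s.out)
```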
If you have too few observations, I would also suggest bootstrapping (as a
precursor to using MI) to see how unstable the model is.
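A minimal sketch of that stability check in base R, with synthetic stand-in data
(the variable names only mirror yours; the real check would refit your actual
model on your data):

```r
# Nonparametric bootstrap of the outcome model to gauge stability.
# Synthetic data stand in for the real file; names are illustrative.
set.seed(1)
n <- 156
dat <- data.frame(ZMath1011 = rnorm(n), ZRead1011 = rnorm(n),
                  ZWrite1011 = rnorm(n))
dat$ZMath1112 <- 0.5 * dat$ZMath1011 + rnorm(n)

boot.coefs <- replicate(500, {
  idx <- sample(n, replace = TRUE)   # resample rows with replacement
  coef(lm(ZMath1112 ~ ZMath1011 + ZRead1011 + ZWrite1011, data = dat[idx, ]))
})
apply(boot.coefs, 1, sd)             # bootstrap SE per coefficient;
                                     # large values flag instability
```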
Best, Olivia
On Fri, Sep 21, 2012 at 12:22 PM, <jrickles(a)ucla.edu> wrote:
Zelig experts,
[I apologize in advance for the long email.]
I am working with a colleague to figure out a good way to place a
confidence interval around an average treatment effect for the treated
(ATT) when using Zelig's sim() function and the sample size is not
particularly large. It would be great to get some advice about whether we
are using sim() correctly and whether our alternative approach makes
sense.
My (cursory) understanding of sim() is that the ATT point estimate and
confidence interval are based on the posterior distribution of the 1,000
conditional expected values for the counterfactual. One concern is that
this can produce an appropriate confidence interval asymptotically but may
be too narrow in finite samples.
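Concretely, that interval comes from quantiles of the simulated ATT draws; a
self-contained sketch, with synthetic numbers standing in for the real
s.out$qi$ev matrix and outcomes:

```r
# Each row of 'ev' is one simulation of the counterfactual expected values
# for the treated units; the ATT draw is mean(Y - EV) per simulation.
# (Synthetic values stand in for the real s.out$qi$ev and observed scores.)
set.seed(1)
ev <- matrix(rnorm(1000 * 156, sd = 0.8), nrow = 1000)  # sims x treated units
y.obs <- rnorm(156, mean = 0.05)                        # observed outcomes
att.sims <- apply(ev, 1, function(e) mean(y.obs - e))   # one ATT per simulation
c(mean = mean(att.sims), quantile(att.sims, c(0.025, 0.975)))
```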
As context, we were asked to estimate the effect of attending a magnet
school versus a comparison school on a test score (for simplicity, assume
strong ignorability, even though it probably doesn't hold, and set aside
the within-school nesting). We tried to equate the treatment & control
groups using inverse probability of treatment weighting (IPTW). After running
x.out1 <- setx(z.out, data = data.t, cond = TRUE)
s.out <- sim(z.out, x = x.out1)
We get the following:
summary(s.out)
Model: ls
Number of simulations: 1000
Mean Values of Observed Data (n = 156)
(Intercept) ZMath1011 ZRead1011 ZWrite1011
1.00000000 0.06322742 0.05521872 -0.28610638
Pooled Expected Values: E(Y|X)
mean sd 2.5% 97.5%
0.03321294 0.80429472 -1.56758254 1.44453357
Pooled Average Treatment Effect for the Treated: Y - EV
mean sd 2.5% 97.5%
0.05244464 0.01945088 0.01379341 0.08863216
We're worried that the sd of 0.02 reflects only between-imputation variance
and not within-sample variance, so we pulled out the expected-value matrix,
recalculated the ATT & standard error treating the 1,000 expected values as
1,000 multiply imputed data sets, and used the Little & Rubin combination
rules to get the total variance:
## Merge Expected Values with Main Treatment-Unit Data File ##
> id <- data.t$SASID                 # vector of student ids
> ev <- s.out$qi$ev                  # matrix of expected values (simulations x students)
> datar <- NULL
> for (i in 1:ncol(ev)) {            # loop over each treatment student
+   tmp <- cbind(1:nrow(ev), rep(id[i], nrow(ev)), ev[, i])
+   datar <- rbind(datar, tmp)
+ }
> datar <- data.frame(datar)
> names(datar) <- c("m", "SASID", "EV")
> datat <- merge(datar, data.t, by = "SASID")    # merge with main data set
> ## Calculate ATT ##
> datat$ATT <- datat$ZMath1112 - datat$EV        # individual-level effect
> att.m <- aggregate(datat$ATT, by = list(datat$m), mean)   # mean ATT per imputation
> att.v <- aggregate(datat$ATT, by = list(datat$m), var)    # variance of ATT per imputation
> W <- mean(att.v$x)                                        # average within variance
> B <- sum((att.m$x - mean(att.m$x))^2) / (nrow(ev) - 1)    # between variance
> T <- sqrt(W/ncol(ev)) + (1 + (1/nrow(ev)))*B              # total standard error
> # ATT point estimate & standard error #
> mean(att.m$x); T
[1] 0.05244464
[1] 0.02913942
> # ATT confidence interval #
> mean(att.m$x) - 2*T; mean(att.m$x) + 2*T
[1] -0.005834205
[1] 0.1107235
So using this approach returns the same point estimate, but a somewhat
larger standard error (0.029 vs. 0.019). As a point of reference, if you
just run a regression on the full sample (weighted by IPTW) you get
ATT=0.053 (se=0.031).
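For reference, the Little & Rubin combining rule we are trying to apply can be
stated compactly: total variance T = Wbar + (1 + 1/m)*B, with pooled SE =
sqrt(T), where Wbar is the average within-imputation variance of the estimate
(in our setup, att.v$x divided by the number of treated units). A small
self-contained sketch (the helper name is illustrative):

```r
# Rubin's rules: pool m point estimates and their within-imputation variances.
# T = Wbar + (1 + 1/m) * B ; pooled SE = sqrt(T).
pool_rubin <- function(est, var_within) {
  m <- length(est)
  Wbar <- mean(var_within)   # average within-imputation variance of the estimate
  B <- var(est)              # between-imputation variance
  total <- Wbar + (1 + 1/m) * B
  c(estimate = mean(est), se = sqrt(total))
}

# Toy call with made-up numbers (not our data):
pool_rubin(c(0.04, 0.05, 0.06), c(0.0004, 0.0005, 0.0004))
```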
We would like to estimate the ATT for different subgroups as well as the
overall ATT, and sample size will really become an issue for some
subgroups. Our main question is whether you think our approach is
appropriate or whether we should stick with the sd & confidence interval
produced by sim(z.out) ... or if there's something better we should do.
Thank you,
Jordan Rickles
--
Zelig Mailing List, served by HUIT
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/mailman/listinfo/zelig
Zelig program information: http://gking.harvard.edu/zelig/