Zelig combines quantities of interest from multiply imputed data sets.
ANOVA tables aren't really quantities of interest; they are normally
intermediate quantities. You can, though, take the combined simulations and
compute any quantity you might like.
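A minimal sketch of that combining step, with made-up draws standing in for the per-dataset simulations (in practice these would come from Zelig; all numbers here are illustrative):

```r
# Sketch: stack simulated parameter draws from each imputed dataset and
# compute any derived quantity from the pooled draws.
set.seed(1)
m <- 5                      # number of imputed datasets
draws_per_set <- 1000
# pretend each imputed dataset produced simulated draws of one coefficient
sims <- lapply(1:m, function(i) rnorm(draws_per_set, mean = 0.5 + 0.02 * i, sd = 0.1))
pooled <- unlist(sims)      # combined simulations across all imputations

# any quantity of interest, e.g. a point estimate and a 95% interval
c(estimate = mean(pooled), quantile(pooled, c(0.025, 0.975)))
```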
On your other question: we're putting multilevel models and many others into
Zelig now.
Gary
On Sun, 27 Nov 2005, Leo Gürtler wrote:
Dear Prof King,
thank you very much, this looks very interesting. I immediately tried Zelig
but encountered a problem while trying anova on multiple imputed datasets
this anova works without any imputation:
data(macro)
# Estimate model:
z.out1 <- zelig(unem ~ gdp + capmob + trade, model = "normal",
                data = macro)
summary(z.out1)
Call:
zelig(formula = unem ~ gdp + capmob + trade, model = "normal",
data = macro)
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-5.301  -2.077  -0.319   1.979   7.772
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.18129    0.45057   13.72  < 2e-16 ***
gdp         -0.32360    0.06282   -5.15  4.4e-07 ***
capmob       1.42194    0.16644    8.54  4.2e-16 ***
trade        0.01985    0.00561    3.54  0.00045 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 7.543)
Null deviance: 3664.8 on 349 degrees of freedom
Residual deviance: 2609.9 on 346 degrees of freedom
AIC: 1706
Number of Fisher Scoring iterations: 2
anova(z.out1)
Analysis of Deviance Table
Model: gaussian, link: identity
Response: unem
Terms added sequentially (first to last)
       Df Deviance Resid. Df Resid. Dev
NULL                     349       3665
gdp     1      428       348       3237
capmob  1      532       347       2705
trade   1       95       346       2610
-> this does not work:
data(immi1, immi2, immi3, immi4, immi5)
z.out <- zelig(ipip ~ wage1992 + prtyid + ideol, model = "normal",
               data = list(immi1, immi2, immi3, immi4, immi5))
summary(z.out)
Model: normal
Number of multiply imputed data sets: 5
Combined results:
Call:
zelig(formula = ipip ~ wage1992 + prtyid + ideol, model = "normal",
data = list(immi1, immi2, immi3, immi4, immi5))
Coefficients:
              Value Std. Error  t-stat   p-value
(Intercept)  3.43930    0.09233 37.2483 4.810e-92
wage1992    -0.24000    0.13256 -1.8105 7.724e-02
prtyid       0.00967    0.01405  0.6885 4.965e-01
ideol        0.05795    0.02156  2.6878 1.425e-02
For combined results from datasets i to j, use summary(x, subset = i:j).
For separate results, use print(summary(x), subset = i:j).
anova(z.out)
Error in anova(z.out) :
no applicable method for "anova"
^^^^^^^^^^^
Is there a trick to get ANOVAs from multiply imputed datasets pooled by
Zelig? (anova is not mentioned in the documentation.)
Last: is Zelig compatible with lme from Pinheiro & Bates, i.e. is it possible
to fit multilevel models?
thanks a lot,
best
leo gürtler
The trick is to think of the final quantity of interest you might want and
to combine according to Rubin's rules or, much more conveniently, by
simulation. Zelig or Clarify will do that automatically for you.
Best of luck with your research,
Gary King
-----Original Message-----
From: "Leo Gürtler" <leog(a)anicca-vijja.de>
Date: Saturday, Nov 26, 2005 10:22 am
Subject: question related to multiple imputation
Dear Prof. King,
I visited your webpage on missing data because I am searching for a method
to pool ANOVA tables and to estimate the effect size and power of a mixed-
effects model in the case of multiple imputation. I am not a statistician but a
psychologist. I work with R, but I have not found a package that seems to
fulfill this need.
(1) After pooling the estimates of the beta coefficients (e.g. by using pool()
from MICE or mi.inference() from NORM in R), is this also necessary for the
p-values of F-statistics, i.e. how does one pool ANOVAs in R? Or, if the
betas are statistically significant, is it OK to proceed with the ANOVA from a
single imputation? It should not be, but then how does one pool the F-statistics
etc. as described by Rubin's rules? That would only be possible with a
standard error for the F-statistic, but how does one obtain it?
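For a scalar estimate with a standard error, Rubin's rules themselves are short; a minimal sketch (the numbers are made up for illustration; MICE's pool() implements this for full model objects):

```r
# Sketch of Rubin's rules for pooling one scalar estimate across m imputations:
# pooled estimate is the mean; total variance combines within- and
# between-imputation variance.
pool_rubin <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)                  # pooled point estimate
  ubar <- mean(se^2)                 # within-imputation variance
  b    <- var(est)                   # between-imputation variance
  tvar <- ubar + (1 + 1/m) * b       # total variance
  list(estimate = qbar, se = sqrt(tvar))
}

# toy example: estimates and standard errors from 5 imputed datasets
res <- pool_rubin(est = c(0.52, 0.48, 0.50, 0.55, 0.47),
                  se  = c(0.10, 0.11, 0.10, 0.12, 0.10))
res$estimate  # 0.504
```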
(2) How does one proceed in the same manner with the AIC/BIC/log-likelihood
that serve as criteria in linear mixed-effects models (Pinheiro & Bates)? How
does one obtain standard errors for these statistics?
(3) How does one determine effect sizes and power? Are there formulas, or how
can this be done empirically with multiple imputation? So far, what I do not
understand is how to implement some kind of effect size for the detection of
sample differences.
(4) How "good" is multiple imputation compared to bootstrapping a single
imputation? According to the literature, about 5-10 imputations are enough to
get unbiased estimates by pooling the results according to Rubin's rules.
But these rules are quite simple, so I thought "why not bootstrap via
the residuals?", i.e.
a) pool the data _prior_ to the analysis by averaging over the
imputations, and then
b) bootstrap this dataset via its residuals, like:
1- fit model1
2- determine the p-value
3- model2 = fit_model1 + bootstrap(residuals_model1)
4- fit model2
5- repeat 1-4 x times (number of simulations)
6- plot the p-values to determine power
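The steps above can be sketched on a toy dataset (all variable names and numbers are illustrative, not from any real study):

```r
# Residual-bootstrap power check, following steps 1-6 above,
# applied to a single (e.g. averaged) dataset.
set.seed(1)
n   <- 100
x   <- rnorm(n)
y   <- 1 + 0.3 * x + rnorm(n)          # toy data standing in for the pooled dataset
dat <- data.frame(x = x, y = y)

fit1    <- lm(y ~ x, data = dat)       # step 1: fit model1
res1    <- resid(fit1)
fitted1 <- fitted(fit1)

nsim  <- 200                           # step 5: number of bootstrap replications
pvals <- replicate(nsim, {
  ystar <- fitted1 + sample(res1, n, replace = TRUE)  # step 3: resample residuals
  fit2  <- lm(ystar ~ x, data = dat)                  # step 4: refit
  summary(fit2)$coefficients["x", "Pr(>|t|)"]         # step 2: p-value for x
})

mean(pvals < 0.05)                     # step 6: empirical power for the x effect
```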
Thank you very much for your answer. I hope it is OK to ask you these
questions (maybe they are quite simple for a statistician), but after
thinking about them alone I decided it was necessary to seek a reasonable
answer by contacting people. In my environment I do not know anybody who
can answer my questions.
With best regards,
Leo Gürtler / Germany (Berg)
PS: I know that measuring effect sizes in this proposed way is a post-hoc
procedure and therefore not the best way at all, but in this case