Zelig combines quantities of interest from multiply imputed data sets.
ANOVA tables aren't really quantities of interest; they are normally
intermediate quantities. You can, though, take the combined simulations and
compute any quantity you might like.
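A minimal sketch of that combining step, with made-up draws standing in for the per-dataset simulations (in practice these would come from Zelig; all numbers here are illustrative):

```r
# Sketch: stack simulated parameter draws from each imputed dataset and
# compute any derived quantity from the pooled draws.
set.seed(1)
m <- 5                      # number of imputed datasets
draws_per_set <- 1000
# pretend each imputed dataset produced simulated draws of one coefficient
sims <- lapply(1:m, function(i) rnorm(draws_per_set, mean = 0.5 + 0.02 * i, sd = 0.1))
pooled <- unlist(sims)      # combined simulations across all imputations

# any quantity of interest, e.g. a point estimate and a 95% interval
c(estimate = mean(pooled), quantile(pooled, c(0.025, 0.975)))
```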
On your other question: we're putting multilevel models and many others into
Zelig now.
Gary
On Sun, 27 Nov 2005, Leo Gürtler wrote:
Dear Prof King,
thank you very much, this looks very interesting. I immediately tried Zelig
but encountered a problem while trying anova on multiple imputed datasets
this anova works without any imputation:
data(macro)
# Estimate model:
z.out1 <- zelig(unem ~ gdp + capmob + trade, model = "normal",
                data = macro)
summary(z.out1)
Call:
zelig(formula = unem ~ gdp + capmob + trade, model = "normal",
data = macro)
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-5.301  -2.077  -0.319   1.979   7.772
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.18129    0.45057   13.72  < 2e-16 ***
gdp         -0.32360    0.06282   -5.15  4.4e-07 ***
capmob       1.42194    0.16644    8.54  4.2e-16 ***
trade        0.01985    0.00561    3.54  0.00045 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 7.543)
Null deviance: 3664.8 on 349 degrees of freedom
Residual deviance: 2609.9 on 346 degrees of freedom
AIC: 1706
Number of Fisher Scoring iterations: 2
anova(z.out1)
Analysis of Deviance Table
Model: gaussian, link: identity
Response: unem
Terms added sequentially (first to last)
       Df Deviance Resid. Df Resid. Dev
NULL                     349       3665
gdp     1      428       348       3237
capmob  1      532       347       2705
trade   1       95       346       2610
-> this does not work:
data(immi1, immi2, immi3, immi4, immi5)
z.out <- zelig(ipip ~ wage1992 + prtyid + ideol, model = "normal",
               data = list(immi1, immi2, immi3, immi4, immi5))
summary(z.out)
Model: normal
Number of multiply imputed data sets: 5
Combined results:
Call:
zelig(formula = ipip ~ wage1992 + prtyid + ideol, model = "normal",
data = list(immi1, immi2, immi3, immi4, immi5))
Coefficients:
              Value Std. Error  t-stat   p-value
(Intercept)  3.43930    0.09233 37.2483 4.810e-92
wage1992    -0.24000    0.13256 -1.8105 7.724e-02
prtyid       0.00967    0.01405  0.6885 4.965e-01
ideol        0.05795    0.02156  2.6878 1.425e-02
For combined results from datasets i to j, use summary(x, subset = i:j).
For separate results, use print(summary(x), subset = i:j).
anova(z.out)
Error in anova(z.out) :
no applicable method for "anova"
^^^^^^^^^^^
Is there a trick to get ANOVAs from multiply imputed datasets pooled by
Zelig? (anova is not mentioned in the documentation.)
Last: is Zelig compatible with lme from Pinheiro & Bates, i.e. is it possible
to fit multilevel models?
thanks a lot,
best
leo gürtler
The trick is to think of the final quantity of interest you might want and
to combine according to Rubin's rules or, much more conveniently, by
simulation. Zelig or Clarify will do that automatically for you.
Best of luck with your research,
Gary King
-----Original Message-----
From: "Leo Gürtler" <leog(a)anicca-vijja.de>
Date: Saturday, Nov 26, 2005 10:22 am
Subject: question related to multiple imputation
Dear Prof. King,
I visited your webpage on missing data because I am searching for a method
to pool ANOVA tables and to estimate the effect size and power of a mixed-
effects model in the case of multiple imputation. I am not a statistician but a
psychologist. I work with R, but I have not found a package that seems to
fulfill this need.
(1) After pooling the estimates of the beta coefficients (e.g. by using pool()
from MICE or mi.inference() from NORM in R), is this also necessary for the
p-values of F-statistics, i.e. how does one pool ANOVAs in R? Or, if the
betas are statistically significant, is it OK to proceed with the ANOVA from a
single imputation? It should not be, but then how does one pool the F-statistics
etc. as described by Rubin's rules? That would only be possible with a
standard error for the F-statistic, but how does one obtain it?
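For a scalar estimate with a standard error, Rubin's rules themselves are short; a minimal sketch (the numbers are made up for illustration; MICE's pool() implements this for full model objects):

```r
# Sketch of Rubin's rules for pooling one scalar estimate across m imputations:
# pooled estimate is the mean; total variance combines within- and
# between-imputation variance.
pool_rubin <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)                  # pooled point estimate
  ubar <- mean(se^2)                 # within-imputation variance
  b    <- var(est)                   # between-imputation variance
  tvar <- ubar + (1 + 1/m) * b       # total variance
  list(estimate = qbar, se = sqrt(tvar))
}

# toy example: estimates and standard errors from 5 imputed datasets
res <- pool_rubin(est = c(0.52, 0.48, 0.50, 0.55, 0.47),
                  se  = c(0.10, 0.11, 0.10, 0.12, 0.10))
res$estimate  # 0.504
```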
(2) How does one proceed in the same manner with the AIC/BIC/log-likelihood
that serve as criteria in linear mixed-effects models (Pinheiro & Bates)? How
does one obtain standard errors for these statistics?
(3) How does one determine effect sizes and power? Are there formulas, or how
can this be done empirically with multiple imputation? So far, what I do not
understand is how to implement some kind of effect size for the detection of
sample differences.
(4) How "good" is multiple imputation compared to bootstrapping a single
imputation? According to the literature, about 5-10 imputations are enough to
get unbiased estimates by pooling the results according to Rubin's rules.
But these rules are quite simple, so I thought "why not bootstrap via
the residuals?", i.e.
a) pool the data _prior_ to the analysis by averaging over the
imputations, and then
b) bootstrap this dataset via its residuals, like:
1- fit model1
2- determine the p-value
3- model2 = fit_model1 + bootstrap(residuals_model1)
4- fit model2
5- repeat 1-4 x times (number of simulations)
6- plot the p-values to determine power
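The steps above can be sketched on a toy dataset (all variable names and numbers are illustrative, not from any real study):

```r
# Residual-bootstrap power check, following steps 1-6 above,
# applied to a single (e.g. averaged) dataset.
set.seed(1)
n   <- 100
x   <- rnorm(n)
y   <- 1 + 0.3 * x + rnorm(n)          # toy data standing in for the pooled dataset
dat <- data.frame(x = x, y = y)

fit1    <- lm(y ~ x, data = dat)       # step 1: fit model1
res1    <- resid(fit1)
fitted1 <- fitted(fit1)

nsim  <- 200                           # step 5: number of bootstrap replications
pvals <- replicate(nsim, {
  ystar <- fitted1 + sample(res1, n, replace = TRUE)  # step 3: resample residuals
  fit2  <- lm(ystar ~ x, data = dat)                  # step 4: refit
  summary(fit2)$coefficients["x", "Pr(>|t|)"]         # step 2: p-value for x
})

mean(pvals < 0.05)                     # step 6: empirical power for the x effect
```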
Thank you very much for your answer. I hope it is OK to ask you these
questions (maybe they are quite simple for a statistician), but after
thinking about them alone I decided it was necessary to seek a reasonable
answer by contacting people. In my environment I do not know anybody who
can answer my questions.
With best regards,
Leo Gürtler / Germany (Berg)
PS: I know that measuring effect sizes in this proposed way is a post-hoc
procedure and therefore not the best way at all, but in this case