Hello,I have estimated a logistic regression model with Zelig using the mi
command to combine multiple imputed datasets. I understand that the summary
output is the combined result and that I can also print out individual
results by dataset. However, in calculating predicted probabilities, I am
wondering which coefficients are being used by Zelig - would these be from
the combined results? In addition, when setting the independent variables
to their means (setx command) - would this be some weighted mean of that
variable across those five datasets? Finally, could someone clarify the
difference between the expected values and the predicted values that are
given in the output.
Thank you,
Amber Wichowsky
University of Wisconsin-Madison
Dear Colleagues:
I am using Zelig to estimate a population model and would like to do
some counterfactual simulation based on the results. Both the
dependent (whether have a newly born boy) and the main independent
variable of interest (whether had a prior abortion) are binary ones.
Zelig can handle questions like: what is the difference in probability
of having a boy between those who had prior abortion and those who had
not. I want to go a step further and ask: give the estimated model, if
50% of the total women had abortion, what the percentage of boys will
be in all the newly born babies? What if only 25% of the women had
abortion? And what about 75% of women had abortion? etc. Is there an
easy way to do this? Thanks.
Best,
Shige
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/
Dear Colleagues:
In the process of simulating the sex ratio at birth using Zelig, I
encountered a memory problem. Here is my program:
-------------------
z.out <- zelig(...)
x.0 <- setx(z.out, abort = 0, fn = NULL)
x.1 <- setx(z.out, abort = 1, fn = NULL)
s.out <- sim(z.out, x=x.0, x1=x.1)
summary(s.out)
-------------------
The first three statements ran fine. But I got error message with the
fourth statement saying something like "cannot allocate vector size of
380 MB". If I get rid of the "fn = NULL" option, the problem goes
away. is this because my data is too big? I am running Ubuntu linux on
a core2duo laptop with 2 GB of memory. My data file contains about
50,000 observations.
The workaround (other than buying a new computer with 64-bit
processor) is draw a random sample from the data file and do the
counterfactual simulation on that smaller data, something like:
------------------------
z.out <- zelig(...)
sd <- d[sample(1:nrow(d), 2000, replace = F),]
x.0 <- setx(z.out, abort = 0, fn = NULL, data=sd)
x.1 <- setx(z.out, abort = 1, fn = NULL, data=sd)
s.out <- sim(z.out, x=x.0, x1=x.1)
summary(s.out)
-----------------------
Is this a viable option? Are there other options?
Thanks.
Best,
Shige
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/
Hi Gregor,
You are correct that the predicted values should be draws from the
normal distribution. Thanks for catching this!
Additionally, the code benefits from your clean-up proposals ...
pulling definitions out of loops, adding stop() errors, updating
dependencies, etc. I appreciate your edits. My code can virtually
always benefit from an editor!
Finally, the log-normal support is a straightforward extension.
Thanks for the additional utility.
So, in short, thank you for your careful attention and contribution
to the public good.
Much appreciation,
Delia
************************************************
Delia Bailey
Washington University in St. Louis
School of Law, CERL
Campus Box 1120
One Brookings Drive
St. Louis, MO 63130
dbailey(a)wustl.edu
http://delia.wustl.edu
************************************************
Begin forwarded message:
> From: Gregor Gorjanc <gregor.gorjanc(a)bfro.uni-lj.si>
> Date: January 8, 2008 1:03:25 PM EST
> To: zelig(a)lists.gking.harvard.edu
> Subject: [zelig] Re: Predicted values for lm.mixed + patches
>
> Hi!
>
> I fixed the issue with predicted values for lm.mixed plus some
> other minor
> tweaks. I uploaded the files as noted bellow. Please let me know
> what you
> think about these fixes/additions.
>
> Regards, Gregor
>
> ----------------------------------------------------------------------
> ----
>
> http://gregor.gorjanc.googlepages.com/qi.mixed.R
>
> - fixed issue with predicted values
>
> - added two errors i.e. stop() if family or link not provided by
> qi.mixed.R is being used. Additionally, I added support for log-
> normal
> model to above file, so that one can use
>
> z.out1 <- zelig(income ~ education + age + female + tag(1 | state),
> data=voteincome, model="ls.mixed", family=gaussian
> (link="log"))
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> ----------------------------------------------------------------------
> ----
>
> http://gregor.gorjanc.googlepages.com/ls.mixed.Rnw
>
> - added note about support for log-normal mixed model
>
> ----------------------------------------------------------------------
> ----
>
> http://gregor.gorjanc.googlepages.com/param.mixed.R
>
> - removed duplicated code so that it is easier to understand what
> is done
>
> ----------------------------------------------------------------------
> ----
>
> http://gregor.gorjanc.googlepages.com/gamma.mixed.Rnw
> http://gregor.gorjanc.googlepages.com/logit.mixed.Rnw
> http://gregor.gorjanc.googlepages.com/poisson.mixed.Rnw
> http://gregor.gorjanc.googlepages.com/probit.mixed.Rnw
>
> - changed Matrix package to lme4, since lmer is now back in lme4
>
> -
> Zelig Mailing List, served by Harvard-MIT Data Center
> Send messages: zelig(a)lists.gking.harvard.edu
> [un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
> Zelig program information: http://gking.harvard.edu/zelig/
>
>
************************************************
Delia Bailey
Washington University in St. Louis
School of Law, CERL
Campus Box 1120
One Brookings Drive
St. Louis, MO 63130
dbailey(a)wustl.edu
http://delia.wustl.edu
************************************************
I apologize if I am missing something ovious. I am analyzing the Current
Population Survey (CPS) which uses the multi-stage stratified samples.
__________________________________________________________________________
This email message is for the sole use of the intended recipient(s) and
may contain confidential information. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply email and destroy all copies
of the original message.
"lm.mixed", "logit.mixed", "probit.mixed", "gamma.mixed", "poisson.mixed"
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/
Hello!
I noticed that the predicted vaues (PR) in ls.mixed are not caulculated as in
other {poisson,gamma,probit,logit}.mixed models. In ls.mixed the PR are equal
to
Y = Xbeta + Zb
while PR in other *.mixed models are equal to
rdist(linkinv(eta))
where
eta = Xbeta + Zb
Current implementation for PR in ls.mixed is not predicting the values from
the normal distribution, but just adding variability from beta and b posteriors
to fitted values (Xbeta + Zb).
If I got the whole thing right, the code in qi.mixed.R (from line 95)
should be
else if (class(object) == "lmer"){
ev <- betas %*% t(as.matrix(fTerms))
mu <- ev
## For predicted values, add in random effects draws
for (i in 1:length(rTerms)){
mu <- mu + gammas[[names(rTerms[i])]] %*%
t(as.matrix(rTerms[[i]]))
}
## Sample from normal distribution with given mean and variance
n <- length(mu[,1])
for (i in 1:ncol(mu)){
pr[,i] <- rnorm(n, mean=mu[,i], sd=attr(VarCorr(object), "sc"))
}
}
Of course ls.mixed vignette should be updated. My proposal is:
The predicted values (qi$pr) are draws from the gaussian (normal) distribution
defined by mean $\mu_{ij}$ and variance $\sigma^2$,
$$\mu_{ij} = X_{ij}\beta + Z_{ij} b_{i}$$
given $X_{ij}$ and $Z_{ij}$ and simulations of $\beta$ and $b_{i}$ from
their posterior distributions. The estimated variance covariance matrices
are taken as correct and are themselves not simulated.
Additionally the code in qi.mixed.R uses length(mut[,i]) or length(mut[i,])
within each loop of prediction. I propose the following change starting on
line 65
n <- length(mut[, 1])
if (family == "binomial"){
ev <- theta
for (i in 1:ncol(mut)){
pr[,i] <- as.character(rbinom(n, 1, mut[,i]))
}
if (!is.null(y)) {
if (NCOL(y) > 1) {
y <- y[,1]
}
}
}
else if (family == "Gamma"){
ev <- theta * 1/alpha
n <- length(mut[i,])
for (i in 1:nrow(mut)){
pr[i,] <- rgamma(n, shape = mut[i,], scale= 1/alpha)
}
}
else if (family == "poisson"){
ev <- theta
for (i in 1:ncol(mut)){
pr[,i] <- rpois(n, lambda = mut[,i])
}
}
Regards, Gregor
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/