Good morning
If I run
<<<
susan.lsmixed.out <- zelig(formula = unprot_vag_sex ~ married + age + TREATMENT.ARM*time + highest_grade + income + tag(1|id),
data = susanMI.out$imputations, model = "ls.mixed")
summary(susan.lsmixed.out)
>>>
I get an error:
Error in x$coef : $ operator is invalid for atomic vectors
Searching the archives, I see that others have had similar problems. Is there a workaround?
summary(susan.lsmixed.out[[1]])
works fine; should I then average across the five imputed data sets?
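If averaging is indeed the right idea, would hand-pooling with Rubin's rules be reasonable? A rough sketch (assuming each element of susan.lsmixed.out is an lme4-style fit exposing fixef() and vcov()):
<<<
library(lme4)
m      <- 5                                                   # number of imputations
fits   <- lapply(seq_len(m), function(i) susan.lsmixed.out[[i]])
est    <- sapply(fits, fixef)                                 # one column of coefficients per imputation
within <- sapply(fits, function(f) diag(as.matrix(vcov(f))))  # within-imputation variances
qbar   <- rowMeans(est)                                       # pooled point estimates
b      <- apply(est, 1, var)                                  # between-imputation variance
se     <- sqrt(rowMeans(within) + (1 + 1/m) * b)              # Rubin's total standard error
cbind(estimate = qbar, se = se)
>>>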
thanks!
Peter
Peter L. Flom, PhD
Statistical Consultant
Website: http://www.statisticalanalysisconsulting.com/
Writing: http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter: @peterflom
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/
Maybe the problem is that you are creating the interaction terms
separately. In R, you should do the following:
y ~ x1 + x2 + x1:x2
if you want to include an interaction term in addition to the main terms.
setx() assumes such an input.
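For example, these two formulas specify the same model, since * expands to the main effects plus the interaction:

y ~ x1 + x2 + x1:x2
y ~ x1 * x2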
Kosuke
--
Department of Politics
Princeton University
http://imai.princeton.edu
On Tue, 25 May 2010, Eric McGhee wrote:
> I'm taking this off-list, because I have a feeling my questions are not
> of general interest. However, I hope you can help me through a few more
> questions.
>
> The standard error of the estimate I described below seems enormously
> high to me. Taken literally, it suggests that a dichotomous variable
> that once had a standard error of 0.013 (since the mean of the var is
> 0.504 and the number of cases in the data set is 1518) suddenly has a
> standard error of 0.1446 when translated into probabilities through a
> logit model. A confidence interval of 0.1446*1.96=0.283 makes
> statistical significance basically impossible. Even apparently enormous
> shifts in the mean are reduced to mush.
>
> The irony is that if I set all vars to their sample means (a possibly
> unrealistic extrapolation), the problem goes away. I get a very
> manageable standard error. So a prediction I'm less certain about looks
> more solid.
>
> Some background: I'm trying to replicate a procedure in the literature
> that simulates "full information" by first interacting all the
> independent variables in the model with a variable indicating how well
> informed a respondent is, and then generating predicted values with all
> respondents set equal to the fully-informed condition. Even though this
> literature has been careful to generate bootstrapped errors and the
> like, I've never seen them produce a standard error this large.
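> As a rough sketch (with hypothetical names vote, educ, income, info, and dat
> standing in for my actual variables and data), the procedure I'm replicating
> looks like this:
>
> # assumption: setx() keeps the other covariates at their observed values when
> # fn = NULL and only the named variable (info) is overridden
> z.fi <- zelig(vote ~ (educ + income) * info, model = "logit", data = dat)
> x.fi <- setx(z.fi, info = max(dat$info), fn = NULL)  # everyone "fully informed"
> s.fi <- sim(z.fi, x = x.fi)
> summary(s.fi)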
>
> I feel like I must be missing something. Isn't there any way to
> generate conditional predictions without losing all precision in the
> estimates?
>
> Thanks again for any help you can provide.
>
> Best,
> Eric
>
> Eric McGhee | Research Fellow | PPIC | 415-291-4439
>
> Any opinions expressed in this message are those of the author alone and
> do not necessarily reflect any position of the Public Policy Institute
> of California.
>
>
> -----Original Message-----
> From: Kosuke Imai [mailto:kimai@Princeton.Edu]
> Sent: Tuesday, May 25, 2010 10:52 AM
> To: Eric McGhee
> Cc: zelig(a)lists.gking.harvard.edu
> Subject: RE: [zelig] simple question
>
> "sd" is the standard deviation of posterior distribution, which is
> equivalent to standard error asymptotically. So, you can interpret it
> as
> standard error.
>
> Kosuke
>
>
When using the fn=NULL option in setx, how does one calculate the
standard error of the resulting expected values? Is it just "sd" from
the output divided by the square root of the number of cases in the
data? Or something else? I'm confused because "sd" for conditional
prediction is much larger than "sd" for an unconditional prediction (like
setting all vars to their sample means), even though the former seems
grounded in more real information.
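For concreteness, here is what I had in mind (just a sketch, assuming the sim() output stores the expected-value draws as s.out$qi$ev, with one row per simulation and one column per case when fn = NULL):
ev  <- s.out$qi$ev              # simulated expected values (object name is an assumption)
avg <- rowMeans(ev)             # average expected value across cases, per simulation
mean(avg)                       # point estimate
sd(avg)                         # is this the standard error, or should it be divided by sqrt(n)?
quantile(avg, c(0.025, 0.975))  # 95% simulation interval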
Thanks,
Eric
Eric McGhee
Research Fellow
PUBLIC POLICY
INSTITUTE OF CALIFORNIA
500 Washington Street, Suite 600
San Francisco, CA 94111
tel 415 291 4439
fax 415 291 4401
web www.ppic.org <http://www.ppic.org>
Any opinions expressed in this message are those of the author alone and
do not necessarily reflect any position of the Public Policy Institute
of California.
Hi everyone,
I'm sorry to ask what might be a basic R question, but when I run:
impa.model <- zelig(impa_nm ~ age + age_sqr + hh_income_hundred + urban +
female, model = "logit.survey", weights = ~weight, ids = ~county, data =
my.data)
I get the warning:
In res$call <- as.call(zelig.call) :
Reached total allocation of 8175Mb: see help(memory.size)
After running summary(impa.model) I get the output from the logit model,
but the setx commands I have written afterwards give the same above warning
message (see output below).
My computer has 8 GB of RAM, so I don't think I can allocate more memory. Even
though I have a large dataset (2.5 million observations), is it really
possible that the program requires so much memory?
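Would trimming the data frame down to just the model variables before calling zelig() help? Something like this (a sketch using my column names):
# keep only the columns the model actually uses, then free memory
vars    <- c("impa_nm", "age", "age_sqr", "hh_income_hundred",
             "urban", "female", "weight", "county")
my.data <- my.data[, vars]
gc()
# memory.limit()  # on Windows this reports/raises the allocation cap (in MB)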
Thank you kindly,
Prashant
#rest of code and output after the zelig command line
> summary(impa.model)
Call:
zelig(formula = impa_nm ~ age + age_sqr + hh_income_hundred +
urban + female, model = "logit.survey", data = my.data,
weights = ~weight, ids = ~county)
Survey design:
svydesign(data = data, ids = ids, probs = probs, strata = strata,
fpc = fpc, nest = nest, check.strata = check.strata, weight = weights)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.227e+00 2.916e-02 -144.968 < 2e-16 ***
age 1.700e-02 1.129e-03 15.057 < 2e-16 ***
age_sqr 4.049e-04 1.187e-05 34.120 < 2e-16 ***
hh_income_hundred -2.853e-03 1.017e-04 -28.040 < 2e-16 ***
urban -7.631e-02 2.056e-02 -3.711 0.000222 ***
femalefemale -1.704e-01 8.307e-03 -20.513 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 0.9801836)
Number of Fisher Scoring iterations: 7
WARNING INFORMATION
1: In class(y) <- oldClass(x) :
Reached total allocation of 8175Mb: see help(memory.size)
2: In class(y) <- oldClass(x) :
Reached total allocation of 8175Mb: see help(memory.size)
3: Reached total allocation of 8175Mb: see help(memory.size)
4: Reached total allocation of 8175Mb: see help(memory.size)
5: In ifelse(y > mu, d.res, -d.res) :
Reached total allocation of 8175Mb: see help(memory.size)
6: In ifelse(y > mu, d.res, -d.res) :
Reached total allocation of 8175Mb: see help(memory.size)
> impair.urb15f.out <- setx(impa.model, urban = 1, female = 1, age = 15,
hh_income_hundred = quantile(hh_income_hundred, .05:1))
> impair.urb40f.out <- setx(impa.model, urban = 1, female = 1, age = 40,
hh_income_hundred = quantile(hh_income_hundred, .05:1))
WARNING INFORMATION:
1: In as.list.data.frame(X) :
Reached total allocation of 8175Mb: see help(memory.size)
2: In as.list.data.frame(X) :
Reached total allocation of 8175Mb: see help(memory.size)
Hi,
I apologize in advance if this is a basic question. I am using zelig and
logit.survey and trying to simulate simple quantities of interest using setx
and sim. After a standard sim command, I get the error
Argument eta must be a nonempty numeric vector
I have checked my syntax against standard examples but am still not sure
what the problem could be. Any help would be most appreciated.
An example of some of the code is:
impa.model <- zelig(impa ~ age + age_sqr + hh_income + urban + female, model
= "logit.survey", weights = ~weight, ids = ~county, data = my.data)
impa.urb20pf.age <- setx(impa.model, urban = 1, female = 1, age = 0:100,
hh_income = quantile(hh_income, .2))
impa.urb50pf.age <- setx(impa.model, urban = 1, female = 1, age = 0:100,
hh_income = quantile(hh_income, .5))
#(everything runs fine up to here)
urbf.age.out <- sim(impa.model, x = impa.urb20pf.age, x1 = impa.urb50pf.age)
Additional background: I have a large dataset (2 million plus observations)
but with no missing data. All of the variables in the logit equation are
numeric (binary or continuous), while the cluster identifier is a
string/categorical variable.
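In case it helps narrow things down, the next thing I plan to try is a single-profile version of the same call (a sketch; note that hh_income is pulled explicitly from my.data here):
x.lo <- setx(impa.model, urban = 1, female = 1, age = 40,
             hh_income = quantile(my.data$hh_income, 0.2))
x.hi <- setx(impa.model, urban = 1, female = 1, age = 40,
             hh_income = quantile(my.data$hh_income, 0.5))
sim(impa.model, x = x.lo, x1 = x.hi)  # if this works, the vector-valued age = 0:100 may be the trigger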
Thank you so much in advance,
Prashant
I'm having trouble getting Zelig to run simulations from a multinomial
logit model. The DV is called "defhow2" and has five categories, and
I'm running the following syntax:
ppicjan10 <- read.dta("Jan 2010.regonly.dta", convert.factors=FALSE)
ppicjan10$defhow2 <- factor(ppicjan10$defhow2, levels=c(1,2,3,4,8),
labels=c("cuts", "taxes",
"cuts+taxes", "borrow", "other/dk"))
z.out <- zelig(as.factor(defhow2) ~ age + income + educ + lat + sex2 + own2 +
                 cv + ba + osc + oth + pid7 + ideo5 +
                 ageXginfo + incXginfo + educXginfo + latXginfo + ownXginfo +
                 cvXginfo + baXginfo + oscXginfo + othXginfo + pidXginfo +
                 ideoXginfo + ginfo,
               model = "mlogit", data = ppicjan10, baseline = "other/dk")
x.out <- setx(z.out, fn=NULL)
s.out <- sim(z.out, x=x.out)
Everything is fine until the simulation command, where I get the
following error:
Error in factor(pr, levels = sort(unique(pr)), labels = ynames) :
invalid labels; length 5 should be 1 or 4
Resources online suggest that missing data is often to blame for this
error message, so I eliminated all of my missing data just to see if I
could get it to work in principle. No luck; same message. Does anyone
know what might be going on?
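One more thing I plan to check is whether every level of defhow2 actually appears in the estimation sample, since an empty category could make the simulated labels mismatch (a sketch of the check):
table(ppicjan10$defhow2, useNA = "ifany")       # do all five categories occur?
ppicjan10$defhow2 <- factor(ppicjan10$defhow2)  # re-declaring the factor drops unused levels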
Eric McGhee
Research Fellow
PUBLIC POLICY
INSTITUTE OF CALIFORNIA
500 Washington Street, Suite 600
San Francisco, CA 94111
tel 415 291 4439
fax 415 291 4401
web www.ppic.org <http://www.ppic.org>
Any opinions expressed in this message are those of the author alone and
do not necessarily reflect any position of the Public Policy Institute
of California.