Good morning
If I run
<<<
susan.lsmixed.out <- zelig(formula = unprot_vag_sex ~ married + age + TREATMENT.ARM*time + highest_grade + income + tag(1|id),
data = susanMI.out$imputations, model = "ls.mixed")
summary(susan.lsmixed.out)
>>>
I get an error:
Error in x$coef : $ operator is invalid for atomic vectors
Searching the archives, I see that others have had similar problems. Is there a workaround?
summary(susan.lsmixed.out[[1]])
works fine; should I then average across the five imputed data sets?
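If averaging is indeed the right idea, would hand-pooling with Rubin's rules be reasonable? A rough sketch (assuming each element of susan.lsmixed.out is an lme4-style fit exposing fixef() and vcov()):
<<<
library(lme4)
m      <- 5                                                   # number of imputations
fits   <- lapply(seq_len(m), function(i) susan.lsmixed.out[[i]])
est    <- sapply(fits, fixef)                                 # one column of coefficients per imputation
within <- sapply(fits, function(f) diag(as.matrix(vcov(f))))  # within-imputation variances
qbar   <- rowMeans(est)                                       # pooled point estimates
b      <- apply(est, 1, var)                                  # between-imputation variance
se     <- sqrt(rowMeans(within) + (1 + 1/m) * b)              # Rubin's total standard error
cbind(estimate = qbar, se = se)
>>>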
thanks!
Peter
Peter L. Flom, PhD
Statistical Consultant
Website: http://www.statisticalanalysisconsulting.com/
Writing: http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter: @peterflom
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/
Maybe the problem is that you are creating the interaction terms
separately. In R, you should do the following:
y ~ x1 + x2 + x1:x2
if you want to include an interaction term in addition to the main terms.
setx() assumes such an input.
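For example, these two formulas specify the same model, since * expands to the main effects plus the interaction:

y ~ x1 + x2 + x1:x2
y ~ x1 * x2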
Kosuke
--
Department of Politics
Princeton University
http://imai.princeton.edu
On Tue, 25 May 2010, Eric McGhee wrote:
> I'm taking this off-list, because I have a feeling my questions are not
> of general interest. However, I hope you can help me through a few more
> questions.
>
> The standard error of the estimate I described below seems enormously
> high to me. Taken literally, it suggests that a dichotomous variable
> that once had a standard error of 0.013 (since the mean of the var is
> 0.504 and the number of cases in the data set is 1518) suddenly has a
> standard error of 0.1446 when translated into probabilities through a
> logit model. A confidence interval of 0.1446*1.96=0.283 makes
> statistical significance basically impossible. Even apparently enormous
> shifts in the mean are reduced to mush.
>
> The irony is that if I set all vars to their sample means (a possibly
> unrealistic extrapolation), the problem goes away. I get a very
> manageable standard error. So a prediction I'm less certain about looks
> more solid.
>
> Some background: I'm trying to replicate a procedure in the literature
> that simulates "full information" by first interacting all the
> independent variables in the model with a variable indicating how well
> informed a respondent is, and then generating predicted values with all
> respondents set equal to the fully-informed condition. Even though this
> literature has been careful to generate bootstrapped errors and the
> like, I've never seen them produce a standard error this large.
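> As a rough sketch (with hypothetical names vote, educ, income, info, and dat
> standing in for my actual variables and data), the procedure I'm replicating
> looks like this:
>
> # assumption: setx() keeps the other covariates at their observed values when
> # fn = NULL and only the named variable (info) is overridden
> z.fi <- zelig(vote ~ (educ + income) * info, model = "logit", data = dat)
> x.fi <- setx(z.fi, info = max(dat$info), fn = NULL)  # everyone "fully informed"
> s.fi <- sim(z.fi, x = x.fi)
> summary(s.fi)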
>
> I feel like I must be missing something. Isn't there any way to
> generate conditional predictions without losing all precision in the
> estimates?
>
> Thanks again for any help you can provide.
>
> Best,
> Eric
>
> Eric McGhee | Research Fellow | PPIC | 415-291-4439
>
> Any opinions expressed in this message are those of the author alone and
> do not necessarily reflect any position of the Public Policy Institute
> of California.
>
>
> -----Original Message-----
> From: Kosuke Imai [mailto:kimai@Princeton.Edu]
> Sent: Tuesday, May 25, 2010 10:52 AM
> To: Eric McGhee
> Cc: zelig(a)lists.gking.harvard.edu
> Subject: RE: [zelig] simple question
>
> "sd" is the standard deviation of posterior distribution, which is
> equivalent to standard error asymptotically. So, you can interpret it
> as
> standard error.
>
> Kosuke
>
>
When using the fn=NULL option in setx, how does one calculate the
standard error of the resulting expected values? Is it just "sd" from
the output divided by the square root of the number of cases in the
data? Or something else? I'm confused because "sd" for conditional
prediction is much larger than "sd" for an unconditional prediction (like
setting all vars to their sample means), even though the former seems
grounded in more real information.
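For concreteness, here is what I had in mind (just a sketch, assuming the sim() output stores the expected-value draws as s.out$qi$ev, with one row per simulation and one column per case when fn = NULL):
ev  <- s.out$qi$ev              # simulated expected values (object name is an assumption)
avg <- rowMeans(ev)             # average expected value across cases, per simulation
mean(avg)                       # point estimate
sd(avg)                         # is this the standard error, or should it be divided by sqrt(n)?
quantile(avg, c(0.025, 0.975))  # 95% simulation interval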
Thanks,
Eric
Eric McGhee
Research Fellow
PUBLIC POLICY
INSTITUTE OF CALIFORNIA
500 Washington Street, Suite 600
San Francisco, CA 94111
tel 415 291 4439
fax 415 291 4401
web www.ppic.org <http://www.ppic.org>
Any opinions expressed in this message are those of the author alone and
do not necessarily reflect any position of the Public Policy Institute
of California.
Hi everyone,
I'm sorry to ask what might be a basic R question, but when I run:
impa.model <- zelig(impa_nm ~ age + age_sqr + hh_income_hundred + urban +
female, model = "logit.survey", weights = ~weight, ids = ~county, data =
my.data)
I get the warning:
In res$call <- as.call(zelig.call) :
Reached total allocation of 8175Mb: see help(memory.size)
After running summary(impa.model) I get the output from the logit model,
but the setx commands I have written afterwards give the same above warning
message (see output below).
My computer has 8 GB of RAM, so I don't think I can allocate more memory. Even
though I have a large dataset (2.5 million observations), is it really
possible that the program requires so much memory?
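Would trimming the data frame down to just the model variables before calling zelig() help? Something like this (a sketch using my column names):
# keep only the columns the model actually uses, then free memory
vars    <- c("impa_nm", "age", "age_sqr", "hh_income_hundred",
             "urban", "female", "weight", "county")
my.data <- my.data[, vars]
gc()
# memory.limit()  # on Windows this reports/raises the allocation cap (in MB)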
Thank you kindly,
Prashant
#rest of code and output after the zelig command line
> summary(impa.model)
Call:
zelig(formula = impa_nm ~ age + age_sqr + hh_income_hundred +
urban + female, model = "logit.survey", data = my.data,
weights = ~weight, ids = ~county)
Survey design:
svydesign(data = data, ids = ids, probs = probs, strata = strata,
fpc = fpc, nest = nest, check.strata = check.strata, weight = weights)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.227e+00 2.916e-02 -144.968 < 2e-16 ***
age 1.700e-02 1.129e-03 15.057 < 2e-16 ***
age_sqr 4.049e-04 1.187e-05 34.120 < 2e-16 ***
hh_income_hundred -2.853e-03 1.017e-04 -28.040 < 2e-16 ***
urban -7.631e-02 2.056e-02 -3.711 0.000222 ***
femalefemale -1.704e-01 8.307e-03 -20.513 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 0.9801836)
Number of Fisher Scoring iterations: 7
WARNING INFORMATION
1: In class(y) <- oldClass(x) :
Reached total allocation of 8175Mb: see help(memory.size)
2: In class(y) <- oldClass(x) :
Reached total allocation of 8175Mb: see help(memory.size)
3: Reached total allocation of 8175Mb: see help(memory.size)
4: Reached total allocation of 8175Mb: see help(memory.size)
5: In ifelse(y > mu, d.res, -d.res) :
Reached total allocation of 8175Mb: see help(memory.size)
6: In ifelse(y > mu, d.res, -d.res) :
Reached total allocation of 8175Mb: see help(memory.size)
> impair.urb15f.out <- setx(impa.model, urban = 1, female = 1, age = 15,
hh_income_hundred = quantile(hh_income_hundred, .05:1))
> impair.urb40f.out <- setx(impa.model, urban = 1, female = 1, age = 40,
hh_income_hundred = quantile(hh_income_hundred, .05:1))
WARNING INFORMATION:
1: In as.list.data.frame(X) :
Reached total allocation of 8175Mb: see help(memory.size)
2: In as.list.data.frame(X) :
Reached total allocation of 8175Mb: see help(memory.size)
Hi,
I apologize in advance if this is a basic question. I am using zelig and
logit.survey and trying to simulate simple quantities of interest using setx
and sim. After a standard sim command, I get the error
Argument eta must be a nonempty numeric vector
I have checked my syntax against standard examples but am still not sure
what the problem could be. Any help would be most appreciated.
An example of some of the code is:
impa.model <- zelig(impa ~ age + age_sqr + hh_income + urban + female, model
= "logit.survey", weights = ~weight, ids = ~county, data = my.data)
impa.urb20pf.age <- setx(impa.model, urban = 1, female = 1, age = 0:100,
hh_income = quantile(hh_income, .2))
impa.urb50pf.age <- setx(impa.model, urban = 1, female = 1, age = 0:100,
hh_income = quantile(hh_income, .5))
#(everything runs fine up to here)
urbf.age.out <- sim(impa.model, x = impa.urb20pf.age, x1 = impa.urb50pf.age)
Additional background: I have a large dataset (2 million plus observations)
but with no missing data. All of the variables in the logit equation are
numeric (binary or continuous), while the cluster identifier is a
string/categorical variable.
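In case it helps narrow things down, the next thing I plan to try is a single-profile version of the same call (a sketch; note that hh_income is pulled explicitly from my.data here):
x.lo <- setx(impa.model, urban = 1, female = 1, age = 40,
             hh_income = quantile(my.data$hh_income, 0.2))
x.hi <- setx(impa.model, urban = 1, female = 1, age = 40,
             hh_income = quantile(my.data$hh_income, 0.5))
sim(impa.model, x = x.lo, x1 = x.hi)  # if this works, the vector-valued age = 0:100 may be the trigger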
Thank you so much in advance,
Prashant
I'm having trouble getting Zelig to run simulations from a multinomial
logit model. The DV is called "defhow2" and has five categories, and
I'm running the following syntax:
ppicjan10 <- read.dta("Jan 2010.regonly.dta", convert.factors=FALSE)
ppicjan10$defhow2 <- factor(ppicjan10$defhow2, levels=c(1,2,3,4,8),
labels=c("cuts", "taxes",
"cuts+taxes", "borrow", "other/dk"))
z.out <- zelig(as.factor(defhow2) ~ age + income + educ + lat + sex2 + own2 +
                 cv + ba + osc + oth + pid7 + ideo5 +
                 ageXginfo + incXginfo + educXginfo + latXginfo + ownXginfo +
                 cvXginfo + baXginfo + oscXginfo + othXginfo + pidXginfo +
                 ideoXginfo + ginfo,
               model = "mlogit", data = ppicjan10, baseline = "other/dk")
x.out <- setx(z.out, fn=NULL)
s.out <- sim(z.out, x=x.out)
Everything is fine until the simulation command, where I get the
following error:
Error in factor(pr, levels = sort(unique(pr)), labels = ynames) :
invalid labels; length 5 should be 1 or 4
Resources online suggest that missing data is often to blame for this
error message, so I eliminated all of my missing data just to see if I
could get it to work in principle. No luck; same message. Does anyone
know what might be going on?
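One more thing I plan to check is whether every level of defhow2 actually appears in the estimation sample, since an empty category could make the simulated labels mismatch (a sketch of the check):
table(ppicjan10$defhow2, useNA = "ifany")       # do all five categories occur?
ppicjan10$defhow2 <- factor(ppicjan10$defhow2)  # re-declaring the factor drops unused levels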
Eric McGhee
Research Fellow
PUBLIC POLICY
INSTITUTE OF CALIFORNIA
500 Washington Street, Suite 600
San Francisco, CA 94111
tel 415 291 4439
fax 415 291 4401
web www.ppic.org <http://www.ppic.org>
Any opinions expressed in this message are those of the author alone and
do not necessarily reflect any position of the Public Policy Institute
of California.