Hello,
This question is less an issue of an EI error and more one of
interpretation.
My collaborator and I have been experimenting with Zelig's EI routine.
We're considering an example in public health, where EI type problems
are common (though investigators at times seem rather oblivious to the
problem). In this case, we have data on mental health centers across
the country, and I'm interested in the relationship between race and the
use of psychotropic medications. The actual individual-specific data
show a strong relationship--African-American kids are less likely to be
medicated.
| medbeh.1: Taking
| medication for
| behavioral/emotional
| problems
black | No Yes | Total
-----------+----------------------+----------
0 | 6,343 7,555 | 13,898
| 45.64 54.36 | 100.00
-----------+----------------------+----------
1 | 2,617 2,391 | 5,008
| 52.26 47.74 | 100.00
-----------+----------------------+----------
Total | 8,960 9,946 | 18,906
| 47.39 52.61 | 100.00
So, black kids are 6.5 percentage points less likely to be medicaid.
The data are nested within 44 sites, and when one collapses the data to
the site level, one can generate a table like the following:
Since the % medicated and % black are now continuous variables, I could
calculate a table with blacksite (%black>.33) and medsite (% med >.50)
So, I can do an analogous site-level analysis
. tab blacksite medsite , row
| medsite
blacksite | 0 1 | Total
-----------+----------------------+----------
0 | 6 15 | 21
| 28.57 71.43 | 100.00
-----------+----------------------+----------
1 | 11 12 | 23
| 47.83 52.17 | 100.00
-----------+----------------------+----------
Total | 17 27 | 44
| 38.64 61.36 | 100.00
So, site-level analyses show an exaggerated relationship.
You can see this, too, by looking at the comparable logit coefficients
. logit medbeh_1 black
Iteration 0: log likelihood = -13078.918
Iteration 1: log likelihood = -13046.625
Iteration 2: log likelihood = -13046.625
Logistic regression Number of obs =
18906
LR chi2(1) =
64.59
Prob > chi2 =
0.0000
Log likelihood = -13046.625 Pseudo R2 =
0.0025
------------------------------------------------------------------------------
medbeh_1 | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------+----------------------------------------------------------------
black | -.2651747 .0330207 -8.03 0.000 -.3298941
-.2004552
_cons | .1748578 .0170299 10.27 0.000 .1414798
.2082357
------------------------------------------------------------------------------
. glm medbeh black , link(logit)
/{I probably should have weighted this by the number of kids in each site.)/
Iteration 0: log likelihood = 13.480109
Iteration 1: log likelihood = 13.486325
Iteration 2: log likelihood = 13.486326
Generalized linear models No. of obs
= 44
Optimization : ML Residual df
= 42
Scale parameter =
.0332277
Deviance = 1.395562768 (1/df) Deviance =
.0332277
Pearson = 1.395562768 (1/df) Pearson =
.0332277
Variance function: V(u) = 1 [Gaussian]
Link function : g(u) = ln(u/(1-u)) [Logit]
AIC =
-.5221057
Log likelihood = 13.48632594 BIC =
-157.5404
------------------------------------------------------------------------------
| OIM
medbeh | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------+----------------------------------------------------------------
black | -.3436786 .5148366 -0.67 0.504 -1.35274
.6653826
_cons | .2926308 .211924 1.38 0.167 -.1227326
.7079941
------------------------------------------------------------------------------
so, the site-level analyses exaggerates the relationships.
I wasn't sure EI would give me "the answer", but I didn't expect it to
be so off--the log odds implied by the expected values reported by Zelig
with ei is
-.570. It's actually a worse estimate that just naive, site-level analyses.
Any insights into the poor performance of EI in this case? Is it just a
case of "not every statistical method works every time"? Is there
substantive I can learn from this?
thanks--michael
E. Michael Foster
Professor
School of Public Health
UNC-CH
Dear All,
I try to run a gamma mixed model in zelig, but got the following error message:
Error in eval(expr, envir, enclos) :
non-positive values not allowed for the gamma family
Besides the dependent structure among observations, a large proportion (about 75%) of my response variable is zero. I was advised that zelig gamma mixed model can handle my zero-augmented data. Is that true?
The code for my model is:
z.out1 <- zelig (OutDegr ~ Pr65 + Pr80 + Pr00 + Semipr65 + Semipr80 + Semipr00 + tag(1 | cityid), data = wc3, model = "gamma.mixed")
Could anybody give me some help?
Thank you very much!
Xiulian
Hi,
I'm not familiar with these kinds of models, but I wouldn't be surprised
if the convergence has not been achieved at n = 300. It's also possible
that variance parameters have uncertainty estimation which has not been
taken into account (although I'm not 100 % sure about if this is indeed
the case, and I'm ccing to the contributor of this model to see if she has
any insight). If this is the case, you might want to choose the bootstrap
option.
Kosuke
--
Department of Politics
Princeton University
http://imai.princeton.edu
On Mon, 30 Mar 2009, Zhang, Hui wrote:
> Dear Kosuke,
>
> Thank you so much for your response. After I update all the three, my R,
> lme4 and Zelig packages, to the newest version, the problem disappeared.
>
>
> Then I tested the Zelig package for binary data GLMM modeling. However,
> I did not get the result as supposed.
>
> Basically, I generate the dataset from a GLMM model to and then test the
> type I error using Zelig fitting. It did not give the correct number
> even for large sample size.
>
> I attached both the simulation strategy and R code in this email. Do you
> think that is because I did not use the the Package "Zelig" correctly or
> it has deficit?
>
> Thank you so much!
>
> Hui
>
> -----Original Message-----
> From: Kosuke Imai [mailto:kimai@Princeton.EDU]
> Sent: Sunday, March 29, 2009 9:34 PM
> To: Zhang, Hui
> Cc: Gary King; Delia Bailey; Tu, Xin; zelig(a)lists.gking.harvard.edu
> Subject: Re: a question regarding R package "Zelig"
>
> Can you try the latest version of Zelig and lme4 and do:
>
> demo(ls.mixed)
>
> It works for me and should work for you too... Note that you should do
>
> install.packages("Zelig", repos = "http://gking.harvard.edu")
>
> for Zelig.
>
> Thanks,
> Kosuke
>
>
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/
Can you try the latest version of Zelig and lme4 and do:
demo(ls.mixed)
It works for me and should work for you too... Note that you should do
install.packages("Zelig", repos = "http://gking.harvard.edu")
for Zelig.
Thanks,
Kosuke
--
Department of Politics
Princeton University
http://imai.princeton.edu
On Wed, 25 Mar 2009, Zhang, Hui wrote:
> Dear Professors King and Imai,
>
>
>
> I am a graduate student in University of Rochester Medical Center. I am
> using your R package "Zelig" for Generalized Linear Mixed-effects Model
> with binary data.
>
>
>
> This is an excellent package and I really want to use it. However, I
> keep getting trouble. When I initially used it, it always gave an error
> ""terms" is not a slot in class "mer". I figured out the reason "this is
> because of the new version of "lme4" you have installed. Current version
> of Zelig is tested against "lme4" version 0.99875-9". Then, I changed my
> "lme4" package to the version 0.99875-9.
>
>
>
> Now, the Zelig starts working. But I started to get another error very
> frequently "Error in asMethod(object) : matrix is not symmetric [1,2]".
> According to a website
> (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q3/001283.html
> <https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q3/001283.html>
> ), the solution should be "Update your version of the lme4 package. The
> version you are using is quite out of date.".
>
> As you can see, I was put in a dilemma on "lme4" version when I use
> Zelig.
>
> I am just wondering if you could please give me an suggestion.
>
> Thank you very much.
>
> Hui
>
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Hui Zhang
>
> Doctoral Student
>
> University of Rochester, Department of Biostatistics and Computational
> Biology,
>
> 601 Elmwood Avenue Box 630,
>
> Rochester, NY 14642
>
> Website:www.urmc.rochester.edu/smd/biostat/people/student/zhang/index.ht
> ml
>
>
>
>
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/
Folks,
I'm having trouble with a new installation of R on a macbook pro (OS X).
This code used to work on my PC, but now doesn't. I'm wondering what the
problem might be. Any help would be much appreciated.
-Don
Here's what I've been running (and running up against):
> z.out <- zelig(bc.guilty.bin ~
+ male + white,
+ model = "logit",
+ data = sd.data)
How to cite this model in Zelig:
Kosuke Imai, Gary King, and Oliva Lau. 2008. "logit: Logistic Regression for
Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau,
"Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
>
> summary(z.out)
Call:
zelig(formula = bc.guilty.bin ~ male + white, model = "logit",
data = sd.data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.101 -0.854 -0.803 1.321 1.606
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.331 0.157 -2.11 0.03490 *
male 0.148 0.156 0.95 0.34367
white -0.637 0.170 -3.74 0.00018 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 970.45 on 765 degrees of freedom
Residual deviance: 956.24 on 763 degrees of freedom
AIC: 962.2
Number of Fisher Scoring iterations: 4
>
> x.out <- setx(z.out, male = 1, white = 1 )
>
> s.out <- sim(z.out, x = x.out)
Error in sim(z.out, x = x.out) :
unused argument(s) (list(coefficients = c(-0.33055376498435,
0.147796121774405, -0.636859852954591), residuals = c(-1.38006476068037,
-1.38006476068037, -1.44060015352681, -1.44060015352681, -1.44060015352681,
-1.44060015352681, -1.44060015352681, -1.44060015352681, -1.44060015352681,
3.26963152871244, -1.38006476068037, -1.38006476068037, 2.20052341781859,
3.26963152871244, -1.38006476068037, -1.38006476068037, -1.38006476068037,
-1.44060015352681, -1.44060015352681, -1.44060015352681, -1.44060015352681,
-1.83297000721323,
>
> summary(s.out)
Error in summary(s.out) : object "s.out" not found
Hi, Ken.
Thanks for taking the time to report the bug. I took a look at the
plot.ci function and it wasn't handling interaction terms properly.
I'm attaching a revised plot.ci function that will run your code
without error. To use it, simply launch R, load Zelig, then
source("plot.ci.R") or copy and paste the attached function into R.
It should be released in the next version of Zelig.
Yours,
Olivia Lau
p.s. -- For everyoen: In the future, put the () after traceback. If
you just type "traceback", you get the traceback function itself. If
you type "traceback()", you get the log of error messages. (More
useful!) Thanks.
-----------------
On 10 Mar 2009, at 01:26, Kosuke Imai wrote:
> Hmmm.. It's hard to tell from this. Can you send us traceback()? Also, please make sure you update both R and Zelig to their latest version.
>
> Best,
> Kosuke
>
> --
> Department of Politics
> Princeton University
> http://imai.princeton.edu
>
> On Mon, 9 Mar 2009, Kenneth Benoit wrote:
>
>> Hi - I am having the following problem with the plot.ci method:
>>
>> [note that pspend_total is the total spending,
>> incumb is 0 or 1 depending on incumbent or challenger]
>>
>>> z.out <- zelig(wonseat ~ pspend_total*incumb+m, model="probit", data=dail)
>>
>> How to cite this model in Zelig:
>> Kosuke Imai, Gary King, and Oliva Lau. 2007. "probit: Probit Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
>>>
>>> x.incumb <- setx(z.out, pspend_total=1:30, incumb=1, m=4)
>>> x.chall <- setx(z.out, pspend_total=1:30, incumb=0, m=4)
>>> s.out <- sim(z.out, x=x.incumb, x1=x.chall)
>>> plot.ci(s.out, xlab="% Candidate Spending in Constituency",
>>
>> + ylab="Probability of Winning a Seat")
>> Error in plot.ci(s.out, xlab = "% Candidate Spending in Constituency", :
>> x and x1 in vary on different dimensions.
>>
>>
>> Any ideas before I have to dig in and compute this from the return values of sim()?? Help much appreciated!
>>
>> Ken
>>
>>
>> Kenneth Benoit
>> Professor of Quantitative Social Sciences
>> Head, Department of Political Science
>> Trinity College
>> Dublin 2, Ireland
>> http://kenbenoit.net
>> Tel: 353-1-896-2491
Hi - I am having the following problem with the plot.ci method:
[note that pspend_total is the total spending,
incumb is 0 or 1 depending on incumbent or challenger]
> z.out <- zelig(wonseat ~ pspend_total*incumb+m, model="probit",
data=dail)
How to cite this model in Zelig:
Kosuke Imai, Gary King, and Oliva Lau. 2007. "probit: Probit
Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary
King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
> x.incumb <- setx(z.out, pspend_total=1:30, incumb=1, m=4)
> x.chall <- setx(z.out, pspend_total=1:30, incumb=0, m=4)
> s.out <- sim(z.out, x=x.incumb, x1=x.chall)
>
> plot.ci(s.out, xlab="% Candidate Spending in Constituency",
+ ylab="Probability of Winning a Seat")
Error in plot.ci(s.out, xlab = "% Candidate Spending in
Constituency", :
x and x1 in vary on different dimensions.
Any ideas before I have to dig in and compute this from the return
values of sim()?? Help much appreciated!
Ken
Kenneth Benoit
Professor of Quantitative Social Sciences
Head, Department of Political Science
Trinity College
Dublin 2, Ireland
http://kenbenoit.net
Tel: 353-1-896-2491
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/