hi -- thanks for responding --
(looking at the note, I can see that I had a regrettable grammar failure--a
rather long week)
Can you say more about "various diagnostics may also help you choose the right
model"? You don't mean influence diagnostics and the things one would use
for individual-level data, do you?
I realize aggregating destroys information. I think part of what is going
on here is that the within- and between-relationships are working in
different directions -- I suspect that the performance of the methods also
must depend on the variation of the within relationship across the higher
units -- is there any research that links the structure of the overall model
to the performance of the EI methods?
This set of methods seems so applicable to issues of public health --
thanks
Michael Foster
Professor
UNC School of Public Health
On Fri, Apr 3, 2009 at 7:17 PM, Gary King <king(a)harvard.edu> wrote:
There are lots of ecological inference models. Zelig includes three
_classes_ of models (it doesn't yet include the one from my book, although
we're working on it). Each one of these of course includes many specific
models. Each one of these makes assumptions about the process by which the
individual data get aggregated into your areal units. If you get that
process right in choosing the model, or if the unit-level observations imply
relatively narrow bounds (regardless of whether the model is right or
wrong), then you'll get good estimates from the aggregate data. Various
diagnostics may also help you choose the right model. Of course,
information is destroyed when you aggregate and so it's no surprise that
sometimes you can't recover the individual relationships.
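
To make the point about bounds concrete, here is a minimal Python sketch (not Zelig code) of the deterministic method-of-bounds calculation for a single areal unit. The function name `unit_bounds` and the example numbers are hypothetical. `x` is the share of the unit's population in the group of interest and `t` is the unit's overall outcome rate.

```python
def unit_bounds(x, t):
    """Deterministic bounds on the within-group rate for one unit.

    x: share of the unit's population in the group (0 < x <= 1)
    t: overall outcome rate in the unit (0 <= t <= 1)
    Accounting identity: t = x * b_group + (1 - x) * b_rest,
    with both rates constrained to lie in [0, 1].
    """
    if not (0 < x <= 1 and 0 <= t <= 1):
        raise ValueError("need 0 < x <= 1 and 0 <= t <= 1")
    lower = max(0.0, (t - (1 - x)) / x)  # b_rest pushed to 1
    upper = min(1.0, t / x)              # b_rest pushed to 0
    return lower, upper


# Hypothetical units, not taken from the data in this thread.
narrow = unit_bounds(0.9, 0.5)  # the group dominates the unit
wide = unit_bounds(0.1, 0.5)    # the group is a small minority
print(narrow, wide)
```

When the group makes up most of a unit, the bounds are tight and the aggregate data are informative whatever model is used. When the group is a small minority, the bounds can span the whole unit interval and the estimate rests almost entirely on the model's assumptions.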
Gary
---
http://gking.harvard.edu
On 4/3/2009 4:56 PM, Michael Foster wrote:
Forgive me if this was a double post; I never saw it appear on the list.
Hello,
This question is less an issue of an EI error and more one of
interpretation.
My collaborator and I have been experimenting with Zelig's EI routine.
We're considering an example in public health, where EI type problems
are common (though investigators at times seem rather oblivious to the
problem). In this case, we have data on mental health centers across
the country, and I'm interested in the relationship between race and the
use of psychotropic medications. The actual individual-specific data
show a strong relationship--African-American kids are less likely to be
medicated.
           | medbeh.1: Taking
           | medication for
           | behavioral/emotional
           | problems
     black |        No        Yes |     Total
-----------+----------------------+----------
         0 |     6,343      7,555 |    13,898
           |     45.64      54.36 |    100.00
-----------+----------------------+----------
         1 |     2,617      2,391 |     5,008
           |     52.26      47.74 |    100.00
-----------+----------------------+----------
     Total |     8,960      9,946 |    18,906
           |     47.39      52.61 |    100.00
So, black kids are about 6.6 percentage points less likely to be medicated.
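
As a cross-check, the percentage-point gap and the log odds ratio can be recomputed directly from the cell counts in the table above. This is a small Python sketch rather than Stata. The helper names are hypothetical and the counts are copied from the table. The log odds ratio it prints should agree with the `black` coefficient in the individual-level logit output further down.

```python
import math

# Cell counts copied from the individual-level table above.
counts = {
    "non_black": {"no": 6343, "yes": 7555},
    "black": {"no": 2617, "yes": 2391},
}


def share_yes(cell):
    """Proportion of the row taking medication."""
    return cell["yes"] / (cell["yes"] + cell["no"])


def log_odds(cell):
    """Log odds of taking medication within the row."""
    return math.log(cell["yes"] / cell["no"])


# Difference in medication rates, black minus non-black, in percentage points.
gap_pp = 100 * (share_yes(counts["black"]) - share_yes(counts["non_black"]))

# Log odds ratio; with a single binary regressor this equals the logit slope.
log_or = log_odds(counts["black"]) - log_odds(counts["non_black"])

print(f"gap in percentage points: {gap_pp:.2f}")
print(f"log odds ratio: {log_or:.4f}")
```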
The data are nested within 44 sites, and when one collapses the data to
the site level, one can generate a table like the following:
Since the % medicated and % black are now continuous variables, I could
calculate a table with blacksite (% black > .33) and medsite (% med > .50).
So, I can do an analogous site-level analysis:
. tab blacksite medsite, row

           |        medsite
 blacksite |         0          1 |     Total
-----------+----------------------+----------
         0 |         6         15 |        21
           |     28.57      71.43 |    100.00
-----------+----------------------+----------
         1 |        11         12 |        23
           |     47.83      52.17 |    100.00
-----------+----------------------+----------
     Total |        17         27 |        44
           |     38.64      61.36 |    100.00
So, site-level analyses show an exaggerated relationship.
You can see this, too, by looking at the comparable logit coefficients
. logit medbeh_1 black
Iteration 0: log likelihood = -13078.918
Iteration 1: log likelihood = -13046.625
Iteration 2: log likelihood = -13046.625
Logistic regression                               Number of obs   =      18906
                                                  LR chi2(1)      =      64.59
                                                  Prob > chi2     =     0.0000
Log likelihood = -13046.625                       Pseudo R2       =     0.0025

------------------------------------------------------------------------------
    medbeh_1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |  -.2651747   .0330207    -8.03   0.000    -.3298941   -.2004552
       _cons |   .1748578   .0170299    10.27   0.000     .1414798    .2082357
------------------------------------------------------------------------------
. glm medbeh black , link(logit)
(I probably should have weighted this by the number of kids in each site.)
Iteration 0: log likelihood = 13.480109
Iteration 1: log likelihood = 13.486325
Iteration 2: log likelihood = 13.486326
Generalized linear models                          No. of obs      =        44
Optimization     : ML                              Residual df     =        42
                                                   Scale parameter =  .0332277
Deviance         =  1.395562768                    (1/df) Deviance =  .0332277
Pearson          =  1.395562768                    (1/df) Pearson  =  .0332277

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             = -.5221057
Log likelihood   =  13.48632594                    BIC             = -157.5404

------------------------------------------------------------------------------
             |                 OIM
      medbeh |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |  -.3436786   .5148366    -0.67   0.504     -1.35274    .6653826
       _cons |   .2926308    .211924     1.38   0.167    -.1227326    .7079941
------------------------------------------------------------------------------
So, the site-level analysis exaggerates the relationship.
I wasn't sure EI would give me "the answer", but I didn't expect it to
be so far off: the log odds implied by the expected values reported by Zelig
with ei is -.570. That is actually a worse estimate than the naive site-level
analysis.
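
For a sense of scale, the three log odds figures quoted in this message can be put on the odds-ratio scale by exponentiating. The Python sketch below does only that. The dictionary labels are illustrative, and the numbers are copied from the two Stata outputs above and from the -.570 reported for ei.

```python
import math

# Log odds figures quoted in this message (labels are illustrative).
estimates = {
    "individual-level logit": -0.2651747,
    "site-level glm": -0.3436786,
    "Zelig ei (as reported)": -0.570,
}

# Exponentiate to get odds ratios, and measure each gap from the
# individual-level benchmark on the log odds scale.
benchmark = estimates["individual-level logit"]
odds_ratios = {label: math.exp(b) for label, b in estimates.items()}
gaps = {label: b - benchmark for label, b in estimates.items()}

for label in estimates:
    print(f"{label:24s} OR = {odds_ratios[label]:.2f}  gap = {gaps[label]:+.3f}")
```

The odds ratios come out at roughly 0.77, 0.71, and 0.57, so on this scale the ei figure sits considerably further from the individual-level benchmark than the site-level glm does.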
Any insights into the poor performance of EI in this case? Is it just a
case of "not every statistical method works every time"? Is there
something substantive I can learn from this?
thanks--michael
E. Michael Foster
Professor
School of Public Health
UNC-CH
--
Michael Foster
Chapel Hill, North Carolina