New subject: Fwd: EI and Zelig

15 Mar 2009

 Hello,

This question is less an issue of an EI error and more one of
interpretation.

My collaborator and I have been experimenting with Zelig's EI routine.
We're considering an example in public health, where EI-type problems
are common (though investigators at times seem rather oblivious to the
problem).  In this case, we have data on mental health centers across
the country, and I'm interested in the relationship between race and the
use of psychotropic medications.  The actual individual-level data
show a strong relationship: African-American kids are less likely to be
medicated.

          |   medbeh.1: Taking
          |    medication for
          | behavioral/emotional
          |       problems
    black |        No        Yes |     Total
-----------+----------------------+----------
        0 |     6,343      7,555 |    13,898
          |     45.64      54.36 |    100.00
-----------+----------------------+----------
        1 |     2,617      2,391 |     5,008
          |     52.26      47.74 |    100.00
-----------+----------------------+----------
    Total |     8,960      9,946 |    18,906
          |     47.39      52.61 |    100.00

So, black kids are about 6.6 percentage points less likely to be medicated (47.74% vs. 54.36%).
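The percentage-point gap can be checked directly from the cell counts in the child-level table. This is a minimal sketch; the dictionary layout and names are illustrative, and only the counts come from the table.

```python
# Counts from the child-level tabulation above (black x medbeh.1).
counts = {
    0: {"no": 6343, "yes": 7555},  # non-black children
    1: {"no": 2617, "yes": 2391},  # black children
}

def pct_medicated(row):
    """Row percentage of children taking medication."""
    return 100.0 * row["yes"] / (row["yes"] + row["no"])

p_nonblack = pct_medicated(counts[0])
p_black = pct_medicated(counts[1])
gap = p_nonblack - p_black  # percentage-point difference

print(f"non-black: {p_nonblack:.2f}%  black: {p_black:.2f}%  gap: {gap:.2f} points")
```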

The data are nested within 44 sites. When the data are collapsed to
the site level, % medicated and % black become continuous variables, so
I created two site-level indicators, blacksite (% black > .33) and
medsite (% medicated > .50), and did an analogous site-level analysis.
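The collapse-and-dichotomize step might look like the minimal sketch below. The record layout (site, black, medicated, each coded 0/1), the function name, and the two toy sites are hypothetical; only the cutoffs (.33 for % black, .50 for % medicated) come from the text.

```python
from collections import defaultdict

# Hypothetical child-level records (NOT the real data): (site_id, black, medicated).
records = [
    ("A", 1, 0), ("A", 0, 1), ("A", 0, 1),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0),
]

def collapse_to_sites(rows):
    """Return {site: (share_black, share_medicated, n_kids)}."""
    totals = defaultdict(lambda: [0, 0, 0])  # black count, medicated count, n
    for site, black, med in rows:
        t = totals[site]
        t[0] += black
        t[1] += med
        t[2] += 1
    return {s: (b / n, m / n, n) for s, (b, m, n) in totals.items()}

sites = collapse_to_sites(records)

# Dichotomize as in the text: blacksite = share black > .33, medsite = share medicated > .50.
flags = {s: (int(pb > 0.33), int(pm > 0.50)) for s, (pb, pm, _) in sites.items()}
print(flags)
```

Note that dichotomizing throws away most of the within-site variation; the glm fit below uses the continuous shares instead.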

. tab blacksite medsite , row

          |        medsite
blacksite |         0          1 |     Total
-----------+----------------------+----------
        0 |         6         15 |        21
          |     28.57      71.43 |    100.00
-----------+----------------------+----------
        1 |        11         12 |        23
          |     47.83      52.17 |    100.00
-----------+----------------------+----------
    Total |        17         27 |        44
          |     38.64      61.36 |    100.00

So, site-level analyses show an exaggerated relationship.
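The row percentages in the site-level table can be reproduced the same way. Keep in mind this is the share of sites above the .50 cutoff, not the share of children medicated, so the gap is suggestive rather than directly comparable to the child-level gap in the first table. The names here are illustrative.

```python
# Site counts from the tabulation above: keys are blacksite, values are
# (number of sites with medsite == 0, number with medsite == 1).
site_table = {0: (6, 15), 1: (11, 12)}

def row_pct_high_med(row):
    """Percentage of sites in a row whose medication share exceeds .50."""
    low, high = row
    return 100.0 * high / (low + high)

pct_low_black = row_pct_high_med(site_table[0])
pct_high_black = row_pct_high_med(site_table[1])
site_gap = pct_low_black - pct_high_black  # percentage-point gap across sites

print(f"{pct_low_black:.2f}% vs {pct_high_black:.2f}%: gap {site_gap:.2f} points")
```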

You can see this, too, by looking at the comparable logit coefficients:

. logit medbeh_1 black

Iteration 0:   log likelihood = -13078.918
Iteration 1:   log likelihood = -13046.625
Iteration 2:   log likelihood = -13046.625

Logistic regression                               Number of obs   =      18906
                                                  LR chi2(1)      =      64.59
                                                  Prob > chi2     =     0.0000
Log likelihood = -13046.625                       Pseudo R2       =     0.0025

------------------------------------------------------------------------------
   medbeh_1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      black |  -.2651747   .0330207    -8.03   0.000    -.3298941   -.2004552
      _cons |   .1748578   .0170299    10.27   0.000     .1414798    .2082357
------------------------------------------------------------------------------
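Because the only regressor is a binary indicator, this logit is saturated, so its estimates have a closed form: the constant is the log odds of medication for non-black kids, the black coefficient is the log odds ratio, and the standard errors follow Woolf's formula (square root of the sum of reciprocal cell counts). A minimal check using the table's counts, with illustrative variable names:

```python
import math

# Cell counts from the child-level tabulation (black x medbeh.1).
n00, n01 = 6343, 7555  # non-black: not medicated, medicated
n10, n11 = 2617, 2391  # black: not medicated, medicated

# Saturated model: the MLE reproduces the observed log odds in each group.
cons = math.log(n01 / n00)               # log odds, non-black
beta_black = math.log(n11 / n10) - cons  # log odds ratio, black vs. non-black

# Large-sample standard errors (Woolf's formula).
se_cons = math.sqrt(1 / n00 + 1 / n01)
se_black = math.sqrt(1 / n00 + 1 / n01 + 1 / n10 + 1 / n11)

print(f"_cons = {cons:.7f} (se {se_cons:.7f})")
print(f"black = {beta_black:.7f} (se {se_black:.7f})")
```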

. glm medbeh black , link(logit)
(I probably should have weighted this by the number of kids in each site.)

Iteration 0:   log likelihood =  13.480109
Iteration 1:   log likelihood =  13.486325
Iteration 2:   log likelihood =  13.486326

Generalized linear models                          No. of obs      =        44
Optimization     : ML                              Residual df     =        42
                                                   Scale parameter =  .0332277
Deviance         =  1.395562768                    (1/df) Deviance =  .0332277
Pearson          =  1.395562768                    (1/df) Pearson  =  .0332277

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             = -.5221057
Log likelihood   =  13.48632594                    BIC             = -157.5404

------------------------------------------------------------------------------
            |                 OIM
     medbeh |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      black |  -.3436786   .5148366    -0.67   0.504     -1.35274    .6653826
      _cons |   .2926308    .211924     1.38   0.167    -.1227326    .7079941
------------------------------------------------------------------------------

So, the site-level analysis exaggerates the relationship (-.344 vs. -.265).
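The glm call above uses the default Gaussian family (see the [Gaussian] variance function in the output), so each of the 44 sites counts equally in what amounts to a nonlinear least-squares fit. One alternative, in the spirit of the aside about weighting by the number of kids, is to swap in a binomial log likelihood weighted by site size. The sketch below is a minimal pure-Python Newton-Raphson version of that swapped-in technique; the function names and the toy data are hypothetical, and it is not what Stata or Zelig does internally. Even when weighted, it still regresses on the site's share black rather than on each child's race, so it remains an ecological regression.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def weighted_fractional_logit(x, y, n, iters=25):
    """Fit logit(E[y]) = a + b*x by maximizing
    sum_i n_i * [y_i*log(p_i) + (1 - y_i)*log(1 - p_i)] with Newton-Raphson.

    x: share black per site, y: share medicated per site,
    n: number of children per site (used as weights).
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Score (g_a, g_b) and information matrix [[h_aa, h_ab], [h_ab, h_bb]].
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = expit(a + b * xi)
            r = ni * (yi - p)
            w = ni * p * (1.0 - p)
            g_a += r
            g_b += r * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: (a, b) += inverse(information) * score.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Hypothetical toy data (NOT the 44-site data): shares generated exactly
# from a = 0.3, b = -0.4, so the fit should recover those values.
x = [0.1, 0.3, 0.5, 0.7]
n = [100, 250, 80, 400]
y = [expit(0.3 - 0.4 * xi) for xi in x]
a_hat, b_hat = weighted_fractional_logit(x, y, n)
print(a_hat, b_hat)
```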

I wasn't sure EI would give me "the answer", but I didn't expect it to
be so far off: the log odds ratio implied by the expected values that
Zelig's ei reports is -.570.  That is actually a worse estimate than the
naive site-level analysis.
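To compare the EI result with the logit coefficients, the two group-specific medication rates can be converted into a log odds ratio. The EI expected values themselves are not given here, so this minimal sketch checks the conversion with the observed child-level rates instead, then puts the three quoted estimates on the odds-ratio scale. Function names are illustrative.

```python
import math

def logit(p):
    """Log odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))

def implied_log_odds_ratio(rate_black, rate_nonblack):
    """Log odds ratio (black vs. non-black) implied by two group-level
    medication rates, such as a pair of EI expected values."""
    return logit(rate_black) - logit(rate_nonblack)

# Sanity check with the observed child-level rates from the first table:
# this should reproduce the logit coefficient on black.
lor_child = implied_log_odds_ratio(2391 / 5008, 7555 / 13898)

# The three log odds ratios quoted in the message, as odds ratios.
estimates = {"child-level logit": -0.2651747, "site-level glm": -0.3436786, "EI": -0.570}
odds_ratios = {name: math.exp(v) for name, v in estimates.items()}
for name, orat in odds_ratios.items():
    print(f"{name:>18}: OR = {orat:.3f}")
```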

Any insights into the poor performance of EI in this case?  Is it just a
case of "not every statistical method works every time"?  Is there
something substantive I can learn from this?

thanks--michael

E. Michael Foster
Professor
School of Public Health
UNC-CH