Professor King and other ei practitioners:
I am hoping for some clarification and/or advice. My polisci
colleague and I study political movements and elections in Ecuador. We
have analyzed some of the results of the 2002 elections there, primarily
the votes for President in the first and second rounds. We are
particularly interested in the voting differences between indigenous
peoples (hereafter Indians) and others (mestizos, blanco-mestizos,
etc.). I won't go into the methodological problems of estimating
ethnicity here. We have the data at the parish level (the closest thing
to a precinct) and have looked at the relationship between %Indian and %
voting for Presidential Candidate G, who was in an alliance with an
indigenous-led political movement. The hypothesis is simple: a larger
proportion of Indians than of non-Indians should have voted for candidate
G. For the 943 parishes we first ran the "Goodman regression" and found
that the estimated proportion of Indians voting for candidate G is .465,
and for non-Indians .163. Then, using the EzI program, we ran the regular
(first-stage) ei, and the "Aggregate Quantities of Interest" are .463 for
Indians and .164 for non-Indians. These results, and others, agree
closely between the "no-intercept" OLS regression and ei.
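For readers less familiar with it, the "Goodman regression" can be sketched as a no-intercept least-squares fit of T_i = b_b * X_i + b_w * (1 - X_i), where X_i is the Indian share of parish i and T_i is its vote share for candidate G. The Python below is a minimal illustration under those assumptions: the function name `goodman_regression` and the optional parish-size weights are hypothetical, not part of ei or EzI, and the data are synthetic.

```python
def goodman_regression(x, t, weights=None):
    """Goodman's ecological regression, fit by (weighted) least squares.

    Hypothetical helper for illustration; not part of ei or EzI.
    Model (no intercept): t_i = b_b * x_i + b_w * (1 - x_i)
      x: fraction Indian in each parish
      t: fraction of the vote for candidate G in each parish
      weights: optional parish sizes; equal weights if omitted
    Returns (b_b, b_w): estimated shares of Indians and non-Indians voting for G.
    """
    if weights is None:
        weights = [1.0] * len(x)
    # Accumulate the weighted normal equations for the columns x and (1 - x)
    s_aa = s_ac = s_cc = s_at = s_ct = 0.0
    for xi, ti, wi in zip(x, t, weights):
        a, c = xi, 1.0 - xi
        s_aa += wi * a * a
        s_ac += wi * a * c
        s_cc += wi * c * c
        s_at += wi * a * ti
        s_ct += wi * c * ti
    det = s_aa * s_cc - s_ac * s_ac
    if abs(det) < 1e-12:
        raise ValueError("x has no variation; b_b and b_w are not identified")
    # Cramer's rule for the 2x2 system [[s_aa, s_ac], [s_ac, s_cc]] @ (b_b, b_w)
    b_b = (s_at * s_cc - s_ac * s_ct) / det
    b_w = (s_aa * s_ct - s_ac * s_at) / det
    return b_b, b_w


if __name__ == "__main__":
    # Synthetic illustration only (not the Ecuador data)
    x = [0.1, 0.3, 0.5, 0.7, 0.9]
    t = [0.45 * xi + 0.15 * (1 - xi) for xi in x]
    print(goodman_regression(x, t))
```

Because X_i + (1 - X_i) = 1, the same fit can be obtained from an ordinary regression of T on X with an intercept: the intercept estimates b_w, and intercept plus slope estimates b_b.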
So, it seems we have two results: 1) a much higher proportion of
Indians voted for candidate G than did non-Indians and,
2) there is no "ecological fallacy" or aggregation problem with that
conclusion. The thing is, we have a very important "control" variable:
region. There are very strong regional differences in Ecuador in voting
patterns. There are three regions: Coast, Sierra, and Oriente (jungle).
The third is not very important since only 3% of the population lives
there. Candidate G is a Sierra (and Oriente) candidate and he did not
receive a lot of support on the Coast (especially in the first round).
This confounds the ethnic differences because Indians live
overwhelmingly in the Sierra, not on the coast.
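The regional confounding described above can be illustrated with a deliberately extreme toy example (entirely hypothetical numbers, not the Ecuador data): within each region Indians and non-Indians vote for G at the same rate, but the Sierra has both more Indians and more support for G. Reading a pooled fit of T on X through the Goodman model (b_w = intercept, b_b = intercept + slope) still reports a large ethnic gap. The helpers `simple_ols` and `goodman_from_line` are introduced only for this sketch.

```python
def simple_ols(x, y):
    """Ordinary least squares of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def goodman_from_line(parishes):
    """Hypothetical helper: (b_b, b_w) from a fit of T on X with an intercept."""
    x = [p[0] for p in parishes]
    t = [p[1] for p in parishes]
    intercept, slope = simple_ols(x, t)
    # Goodman model: b_w = intercept, b_b = intercept + slope
    return intercept + slope, intercept


# Hypothetical parishes: (fraction Indian, fraction voting for G).
# In each region both groups vote for G at the same rate, so the true gap is zero.
sierra = [(0.3, 0.4), (0.5, 0.4), (0.7, 0.4)]
coast = [(0.0, 0.1), (0.05, 0.1), (0.1, 0.1)]

pooled_b, pooled_w = goodman_from_line(sierra + coast)
sierra_b, sierra_w = goodman_from_line(sierra)
coast_b, coast_w = goodman_from_line(coast)
print(f"pooled: b_b={pooled_b:.3f}  b_w={pooled_w:.3f}")
print(f"sierra: b_b={sierra_b:.3f}  b_w={sierra_w:.3f}")
print(f"coast:  b_b={coast_b:.3f}  b_w={coast_w:.3f}")
```

In this toy case the pooled fit reports b_b of about 0.63 and b_w of about 0.11, while each within-region fit returns equal rates for the two groups (0.40 in the Sierra, 0.10 on the Coast), so the entire pooled "gap" comes from region.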
I have read numerous other articles using ei, including those that go
on to a "second stage," as well as the exchange between Herron and Shotts
and Adolph and King. But we are not interested in explaining the variance
across parishes in the ei estimates of the proportion of Indians voting
for candidate G, which is essentially what those researchers have been
doing. If the ecological inference issue has been
resolved, i.e., a higher % of Indians really did vote for candidate G
than did non-Indians, would it not be appropriate to just return to
simple OLS regression if I want to explain variance in votes for
candidate G across parishes, with %Indian and two dummy variables
representing the Coast and Oriente regions of Ecuador? The aim is to
settle whether a higher proportion of Indians than of non-Indians voted
for candidate G within the Sierra, which is the case. By the way, we also
estimated this with "regular" ei using only the parishes in the Sierra,
which once again produced estimates very close to those of the Goodman
regression.
SO, I JUST WANT TO KNOW WHETHER USING OLS REGRESSION WITH OUR
ORIGINAL DEPENDENT VARIABLE, THE %INDIAN PREDICTOR AND A COUPLE OF
CONTROL VARIABLES IS AN ACCEPTABLE APPROACH. (By the way, in the OLS
regressions we do weight the cases by size of parish).
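The specification asked about here (parish vote share for G regressed on %Indian plus Coast and Oriente dummies, weighted by parish size) can be sketched in plain Python as weighted least squares solved through the normal equations. The names `solve_linear`, `weighted_ols`, and `region_design` and all data values are hypothetical. Read through the Goodman model, the intercept is non-Indian support for G in the Sierra (the omitted category), each dummy shifts support for both groups in its region by the same amount, and the %Indian coefficient is a single Indian/non-Indian gap assumed common to all three regions.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: check for collinear predictors")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_ols(rows, y, w):
    """Weighted least squares: solve (X'WX) beta = X'Wy (hypothetical helper)."""
    k = len(rows[0])
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for xi, yi, wi in zip(rows, y, w):
        for i in range(k):
            xtwy[i] += wi * xi[i] * yi
            for j in range(k):
                xtwx[i][j] += wi * xi[i] * xi[j]
    return solve_linear(xtwx, xtwy)


def region_design(pct_indian, region):
    """Columns: intercept, %Indian, Coast dummy, Oriente dummy (Sierra is the baseline)."""
    return [
        [1.0, p, 1.0 if r == "Coast" else 0.0, 1.0 if r == "Oriente" else 0.0]
        for p, r in zip(pct_indian, region)
    ]


if __name__ == "__main__":
    # Hypothetical parishes, not the Ecuador data
    pct = [0.1, 0.5, 0.9, 0.2, 0.6, 0.3, 0.7, 0.05, 0.4]
    reg = ["Sierra"] * 3 + ["Coast"] * 3 + ["Oriente"] * 3
    size = [1200, 800, 450, 3000, 2500, 900, 300, 150, 400]
    share_g = [0.20 + 0.30 * p - 0.10 * (r == "Coast") + 0.05 * (r == "Oriente")
               for p, r in zip(pct, reg)]
    const, b_indian, b_coast, b_oriente = weighted_ols(region_design(pct, reg), share_g, size)
    print(const, b_indian, b_coast, b_oriente)
```

This specification assumes the Indian/non-Indian gap is the same in every region; the Sierra-only ei run mentioned above is one way to check the gap within a single region.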
I appreciate any comments,
Scott H. Beck, Professor
Department of Sociology and Anthropology
East Tennessee State University
Johnson City, TN 37614
Tel.: 423-439-6648
Email: r30scott(a)etsu.edu
Gary,
As an applied researcher, I'm curious as to your reaction to the Herron
and Shotts article in the January 2004 AJPS. They essentially argue that
any meaningful application involving EI needs to employ the extended
model, and they explain how to do so.
Obviously, if you have serious disagreements with their argument, you
cannot address their points in full in an email to a listserv.
I guess something I'm wondering about in particular is the relationship
between the information contained in the bounds and how EI responds to
aggregation bias. H&S make a convincing argument (as far as I am
concerned, at this point anyway) that this "logical inconsistency" means
aggregation bias will be passed along to contaminate any second-stage EI-R
effect estimates. However, it seems intuitive to me that this problem is
not necessarily absolutely corrupting, but more a matter of degree. They
show with the Burden and Kimball data that in that case this "logical
inconsistency" was a real problem. However, the bounds in those data were
quite atrocious. I can imagine that if you have very informative data
(à la AKHS 2003) and then run extended models with some covariates, these
extended models might remove some of the aggregation bias from the
resulting second-stage estimates (in fact, that's what they are designed
to do). We don't have Monte Carlo simulations on this question, only one
example with poor data, so it remains open to what degree this "logical
inconsistency" actually matters for the estimates.
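The "information contained in the bounds" refers to the deterministic method-of-bounds (Duncan-Davis) limits that each precinct's X_i and T_i place on its group-specific rates: beta_b lies in [max(0, (T_i - (1 - X_i)) / X_i), min(1, T_i / X_i)], and symmetrically for beta_w. A minimal Python sketch follows; the function names and the unweighted mean-width summary are illustrative assumptions, not a diagnostic taken from H&S or AKHS.

```python
def precinct_bounds(x, t):
    """Duncan-Davis bounds for one precinct (hypothetical helper).

    x: share of the population in group b (0 < x < 1)
    t: share with the outcome (e.g., voting for the candidate)
    Returns ((lo_b, hi_b), (lo_w, hi_w)).
    """
    if not 0.0 < x < 1.0:
        raise ValueError("bounds for both groups need 0 < x < 1")
    lo_b = max(0.0, (t - (1.0 - x)) / x)
    hi_b = min(1.0, t / x)
    lo_w = max(0.0, (t - x) / (1.0 - x))
    hi_w = min(1.0, t / (1.0 - x))
    return (lo_b, hi_b), (lo_w, hi_w)


def mean_bound_width(xs, ts):
    """Unweighted average width of the group-b bounds (1.0 = no information)."""
    widths = [hi - lo for (lo, hi), _ in (precinct_bounds(x, t) for x, t in zip(xs, ts))]
    return sum(widths) / len(widths)


if __name__ == "__main__":
    # Homogeneous precinct: tight bounds for the majority group, none for the minority
    print(precinct_bounds(0.9, 0.5))
    # Evenly mixed precinct with a 50% outcome: group-b bounds are uninformative
    print(precinct_bounds(0.5, 0.5))
```

Summaries like this are one rough way to gauge how informative a data set is before asking how much the "logical inconsistency" could matter in practice.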
I guess I'm wondering if this is logic you might agree with, and I'm
curious as to your other thoughts. I really should have discussed this
with you at PolMeth, but the article only came to my attention later.
Greg
Gregory A. Pettis
ABD, Political Science
UNC Chapel Hill
Polling Fellow, Elon University
CB # 2203
Elon, N.C. 27244
(336) 278-5239
-
EI mailing list served by Harvard-MIT Data Center
Subscribe/Unsubscribe: http://lists.hmdc.harvard.edu/?info=ei