On Wed, 13 Jul 2005, Barbaresco Gabriele wrote:
> Dear Prof. King,
>
> I have read with great interest your following papers on logit and
> rare events:
>
> Logistic regression in rare events
> Explaining rare events in international relations
>
> I really appreciated them, also due to the lack of academic literature
> on this important topic. I am currently working on a model to forecast
> company failures, and so I am dealing with logit and rare events.
>
> Referring to the second paper, the less technical one, I'd like to be
> advised on the following points:
>
> 1) Page 697: I see that the standard errors of the betas depend on the
> estimated probability, which in rare events is usually very far from
> 0.5, but greater (if the model works well) for Y=1 than for Y=0.
> So, you argue, the ones carry more information. To reduce these
> errors, can I balance the sample (50% ones and 50% zeros), or is it
> better to use the original sample and weight the observations so that
> ones and zeros count equally? My simulations suggest that rebalancing
> the sample leads to better Wald tests and also to better pseudo
> R-squared measures of fit. In doing so the estimated probabilities
> range from 0 to 1, and 0.5 could be used as the cutoff.
if you can 'balance' the sample by collecting additional data (the 1s),
then that's great. but if you're talking about discarding 0s, that's not
generally a good idea (although there are possible exceptions unrelated to
your point). in statistics, more data are better.
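
A minimal sketch of the point above, not taken from the papers' own code: it simulates rare-event data, fits a logit on the full sample and on a "balanced" subsample that discards most 0s, and compares the slope standard errors. The data-generating values, the helper name fit_logit, and the sample size are all hypothetical. The intercept adjustment for the subsample is the prior correction, assumed here to take the form b0 - ln[((1 - tau)/tau) * (ybar/(1 - ybar))], with tau the population fraction of 1s and ybar the subsample fraction of 1s.

```python
import numpy as np

def fit_logit(X, y, iters=30):
    """Newton-Raphson maximum-likelihood logit (hypothetical helper).
    Returns (coefficients, standard errors)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    # Standard errors from the inverse information matrix at the estimate.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))

# Simulated rare-event data; all values below are illustrative assumptions.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
true_beta = np.array([-4.0, 1.0])  # gives only a few percent of 1s
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

# Full sample: every 0 is kept.
beta_full, se_full = fit_logit(X, y)

# "Balanced" sample: all 1s plus an equal-sized random draw of 0s.
ones = np.flatnonzero(y == 1)
zeros = rng.choice(np.flatnonzero(y == 0), size=ones.size, replace=False)
keep = np.concatenate([ones, zeros])
beta_bal, se_bal = fit_logit(X[keep], y[keep])

# Prior correction of the subsample intercept (assumed form, see lead-in).
tau = y.mean()         # population fraction of 1s, known here by construction
ybar = y[keep].mean()  # 0.5 in the balanced subsample
b0_corrected = beta_bal[0] - np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

print("slope SE, full sample:    ", se_full[1])
print("slope SE, balanced sample:", se_bal[1])
print("intercept, full vs corrected balanced:", beta_full[0], b0_corrected)
```

In a setup like this the slope standard error from the balanced subsample should come out larger than the full-sample one, since the discarded 0s still carried some information. The apparently better fit statistics in a balanced sample do not change that, and the uncorrected intercept (and hence a 0.5 cutoff) refers to the subsample rather than to the population.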
> 2) Page 704: here again it seems to me that the bias in the coefficient
> on the constant is smaller the bigger the sample (n stands in the
> denominator) and the closer the probability is to 0.5. I work with a
> sample of around 180,000 items; if I rebalance the sample as above,
> can I reduce this source of bias too?
probably not.
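
A rough back-of-the-envelope check, assuming the bias approximation being discussed has the intercept-only form (pi_bar - 0.5) / (n * pi_bar * (1 - pi_bar)), which matches the description in the question (n in the denominator, zero at 0.5). The failure rate of 1% and the function name intercept_bias are hypothetical, not from the thread.

```python
import math

def intercept_bias(n, pi_bar):
    """Approximate bias of the estimated constant in an intercept-only logit,
    using the assumed form (pi_bar - 0.5) / (n * pi_bar * (1 - pi_bar))."""
    return (pi_bar - 0.5) / (n * pi_bar * (1.0 - pi_bar))

pi_bar = 0.01      # hypothetical share of failures, for illustration only
n_full = 180_000   # sample size mentioned in the question
n_small = 2_000    # a much smaller sample, for comparison

bias_full = intercept_bias(n_full, pi_bar)
bias_small = intercept_bias(n_small, pi_bar)

# Approximate standard error of the constant in the same intercept-only model.
se_full = math.sqrt(1.0 / (n_full * pi_bar * (1.0 - pi_bar)))

print(f"bias at n={n_full}: {bias_full:.6f}")
print(f"bias at n={n_small}: {bias_small:.6f}")
print(f"approx. SE at n={n_full}: {se_full:.4f}")
```

Under these assumed numbers the bias at n = 180,000 is on the order of 0.0003 in absolute value, roughly a hundredth of the approximate standard error, so there is little bias left for rebalancing to remove, while discarding 0s shrinks n.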
>
> In a nutshell, it seems to me that rebalancing the sample can reduce
> some of the problems you highlight, but it also seems that you never
> suggest such a solution, so I wonder if I have misunderstood the
> meaning of your paper.
we thought about it and tried it when writing the paper, and there is
probably still something going on that might be useful from a theoretical
perspective, but from your perspective, I wouldn't discard any data if you
don't have to.
Gary King
---
Gary King
Institute for Quantitative Social Science
Harvard University, 34 Kirkland St, Cambridge, MA 02138
http://GKing.Harvard.Edu, King(a)Harvard.Edu
Direct 617-495-2027, Assistant 495-9271, eFax 812-8581
> Sorry for my boring questions, and thank you very much for your kind
> assistance.
>
> Best regards,
>
> Mr. Gabriele Barbaresco
> Research Dept. - Mediobanca
> Milan - Italy
> www.mediobanca.it
> www.mbres.it
-
relogit mailing list served by Harvard-MIT Data Center
List Address: relogit(a)latte.harvard.edu
Subscribe/Unsubscribe:
http://lists.hmdc.harvard.edu/?info=relogit