I haven't read the Weiss & Provost study, but what you report they suggest
makes sense for some methods. For logistic regression (which is a simple
classifier) the corrections relogit implements will deal with the bias as
best as is currently known. Subsetting shouldn't be helpful, although you
never know what happens in practice, of course.
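
To make the idea of a correction concrete, here is a minimal sketch of one of the adjustments described in King & Zeng (2001), the prior correction of the intercept. It applies when the sample was selected on the outcome (for example, all events plus a subsample of non-events). The function name and arguments are hypothetical and chosen for illustration only; this is not Relogit's code, and it assumes the population event fraction tau is known from outside the sample.

```python
import math

def prior_correct_intercept(b0, ybar, tau):
    """Adjust a logit intercept estimated on an outcome-selected sample.

    b0   : intercept estimated from the sample
    ybar : fraction of events in the sample
    tau  : fraction of events in the population (assumed known)
    """
    if not (0.0 < ybar < 1.0 and 0.0 < tau < 1.0):
        raise ValueError("ybar and tau must lie strictly between 0 and 1")
    # Subtract the log of the ratio between sample odds and population odds.
    return b0 - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))
```

When the sample event fraction equals the population fraction the adjustment is zero; the slope coefficients are left untouched by this particular correction.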
Unfortunately, Relogit is not available for SPSS or Excel, but if you
don't have Stata you can use the Relogit module as part of Zelig which
runs under R, all three of which are open source and free. See
http://gking.harvard.edu/zelig/
Gary
---
Gary King
Institute for Quantitative Social Science
Harvard University, 34 Kirkland St, Cambridge, MA 02138
http://GKing.Harvard.Edu, email: King(a)Harvard.Edu
Direct 617-495-2027, Assistant 495-9271, eFax 812-8581
On Fri, 22 Apr 2005, Derek K.Y. Wong wrote:
Dear Prof. King,
I am conducting an analysis on the risk of aircraft crashes, comparing data
from accidents and normal flights. Given the rare occurrence of accidents, I
have an extremely unbalanced data set.
A discussion on ALLSTAT
(
http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind04&L=allstat&D=0&a…
7323) on this topic mentions your paper (Logistic Regression in Rare Events
Data, 2001) as well as other methods to deal with rare events in logistic
regression. One of them is to conduct logistic regression on subsets.
Each subset has a 50:50 ratio of events to non-events, achieved by selecting
all events and a random selection of the same number of non-events. The
final parameter values would be defined by averaging the subsets' results.
This method, apparently, is based on findings of Weiss & Provost (The Effect
of Class Distribution on Classifier Learning: An Empirical Study, 2001),
which deals with machine learning and data-mining methods.
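
For readers who want to see the subset procedure described above spelled out, the following is a rough sketch in Python using only NumPy. The function names, the number of subsets, and the Newton-Raphson fitting routine are assumptions made for illustration; they are not taken from the ALLSTAT discussion or from Weiss & Provost.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Ordinary maximum-likelihood logistic regression via Newton-Raphson."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))   # fitted probabilities
        w = p * (1.0 - p)                        # Newton weights
        grad = X1.T @ (y - p)                    # score vector
        hess = X1.T @ (X1 * w[:, None])          # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def balanced_subset_average(X, y, n_subsets=20, seed=0):
    """Average logit coefficients over 50:50 subsets.

    Each subset keeps every event and draws, without replacement,
    an equal number of non-events.
    """
    rng = np.random.default_rng(seed)
    events = np.flatnonzero(y == 1)
    nonevents = np.flatnonzero(y == 0)
    fits = []
    for _ in range(n_subsets):
        drawn = rng.choice(nonevents, size=len(events), replace=False)
        idx = np.concatenate([events, drawn])
        fits.append(fit_logit(X[idx], y[idx]))
    return np.mean(fits, axis=0)  # first entry is the intercept
```

Calling `balanced_subset_average(X, y)` with a 2-D array of covariates and a 0/1 outcome vector returns the averaged coefficients. Note that the intercept obtained this way reflects the artificial 50:50 event rate of the subsets rather than the event rate in the full data, and the sketch has no safeguard against perfectly separated subsets.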
Given my lack of training in this field, I am unsure whether it is
appropriate to treat logistic regression as a classifier/learner in the
data-mining sense, such that Weiss & Provost's conclusions also apply to the
statistical technique. Your opinion on this matter would be greatly
appreciated.
Also, would it be possible to implement the corrections suggested in your
Logistic Regression in Rare Events Data paper with SPSS/Excel?
Thank you for your kind attention; I look forward to hearing from you soon.
Yours sincerely,
Derek Wong
Research Assistant
Transport Studies Group
Civil & Building Engineering Dept.
Loughborough University
Leics. LE11 3TU
UK
Tel.: +44 (0)1509 263171 ext. 4681
Fax: +44 (0)1509 223981
Project website:
http://civil-unrest.lboro.ac.uk/cvaja/index.htm
-
relogit mailing list served by Harvard-MIT Data Center
List Address: relogit(a)latte.harvard.edu
Subscribe/Unsubscribe:
http://lists.hmdc.harvard.edu/?info=relogit