Dear Prof. King,
I am conducting an analysis of the risk of aircraft crashes, comparing data
from accidents and normal flights. Given the rare occurrence of accidents, I
have an extremely unbalanced data set.
A discussion on ALLSTAT
(http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind04&L=allstat&D=0&I=-1&P=157323)
on this topic mentions your paper (Logistic Regression in Rare Events
Data, 2001) as well as other methods to deal with rare events in logistic
regression. One of them is to conduct logistic regression on subsets.
Each subset has a 50:50 ratio of events to non-events, achieved by selecting
all events and a random selection of the same number of non-events. The
final parameter values would be defined by averaging the subsets' results.
This method, apparently, is based on the findings of Weiss & Provost (The Effect
of Class Distribution on Classifier Learning: An Empirical Study, 2001),
which deals with machine learning and data-mining methods.
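
To make the procedure concrete, here is a rough sketch of the subset-averaging
method as I understand it, assuming Python with NumPy. The function names
(fit_logit, averaged_balanced_logit), the number of subsets and the synthetic
data are my own illustrative assumptions; they are not taken from the ALLSTAT
discussion or from either paper.

```python
import numpy as np


def fit_logit(X, y, n_iter=25, ridge=1e-8):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column. The tiny ridge term only
    guards against a numerically singular Hessian.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta


def averaged_balanced_logit(X, y, n_subsets=20, seed=0):
    """Average logit coefficients over 50:50 subsets.

    Each subset keeps every event and an equally sized random draw
    (without replacement) of non-events.
    """
    rng = np.random.default_rng(seed)
    events = np.flatnonzero(y == 1)
    non_events = np.flatnonzero(y == 0)
    fits = []
    for _ in range(n_subsets):
        sampled = rng.choice(non_events, size=events.size, replace=False)
        idx = np.concatenate([events, sampled])
        fits.append(fit_logit(X[idx], y[idx]))
    return np.mean(fits, axis=0)


# Synthetic rare-event data (hypothetical; roughly 1% events).
rng = np.random.default_rng(42)
n = 20000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-5.0, 1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_avg = averaged_balanced_logit(X, y)
# beta_avg[1] estimates the slope; beta_avg[0] reflects the 50:50 sampling
# ratio rather than the event rate in the full data.
print(beta_avg)
```

In this sketch the averaged intercept is driven by the artificial 50:50 ratio,
so it should not be read as the baseline risk in the full data set.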
Given my lack of training in this field, I am unsure whether it is
appropriate to treat logistic regression as a classifier/learner in the
data-mining sense, such that Weiss & Provost's conclusions also apply to the
statistical technique. Your opinion on this matter would be greatly
appreciated.
Also, would it be possible to implement the corrections suggested in your
Logistic Regression in Rare Events Data paper with SPSS/Excel?
Thank you for your kind attention; I look forward to hearing from you soon.
Yours sincerely,
Derek Wong
Research Assistant
Transport Studies Group
Civil & Building Engineering Dept.
Loughborough University
Leics. LE11 3TU
UK
Tel.: +44 (0)1509 263171 ext. 4681
Fax: +44 (0)1509 223981
Project website: http://civil-unrest.lboro.ac.uk/cvaja/index.htm
-
relogit mailing list served by Harvard-MIT Data Center
List Address: relogit(a)latte.harvard.edu
Subscribe/Unsubscribe: http://lists.hmdc.harvard.edu/?info=relogit