I think you need to remove the NAs in your data set. When there are NAs among
the estimated coefficients, sim() cannot handle the simulation.
Kosuke
--
Department of Politics
Princeton University
Dear Kosuke,
I appreciate your help very much. In future, I'll send my emails to the Zelig mailing
list.
I have attached to this message a data set that appears in many textbooks, including
Bill Greene's Econometric Analysis and David Hensher's Applied Choice Analysis.
The data are in the public domain and are not restricted by copyright.
The data set comprises 210 individuals who completed a stated preference survey on
intercity travel choices in Australia. The four alternatives were air, bus, rail, and
car. The data set contains the following variables:
id       individual's unique ID
alt      alternatives available to the individual
hinc     household income in '000s
psize    size of the traveling party, i.e., the number of commuters traveling together
aasc     alternative-specific constant for air
tasc     alternative-specific constant for train
basc     alternative-specific constant for bus
casc     alternative-specific constant for car
psizea   size of the traveling party interacted with the mode air
mode     chosen alternative
twait    terminal waiting time in minutes
invc     cost of travel
invt     in-vehicle travel time in minutes
gc       generalized cost, a composite of in-vehicle travel time and out-of-pocket costs
mc       mode choice: 1 if the mode is chosen, 0 otherwise
hinca    household income interacted with mode air
hincb    household income interacted with mode bus
hinct    household income interacted with mode train
psizeb   size of the traveling party interacted with mode bus
psizet   size of the traveling party interacted with mode train
t        auxiliary time variable for the coxph model: 1 if the alternative is chosen, 2 otherwise
choice   the dependent variable
In Stata, one can use either clogit or asclogit to estimate McFadden's logit model.
In R, there are two alternatives: mlogit, a package written for McFadden's logit, or
the coxph model, which can be tricked into maximizing the same likelihood function as
the other routines. I used Zelig to estimate coxph.
The trick with coxph is to create a new variable (t in this case) that takes the value
2 if the alternative is not chosen by the individual and 1 otherwise. mc is the
traditional dependent variable that takes the value 1 if the alternative is chosen and 0
otherwise.
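A minimal sketch of this trick with the survival package (which supplies Surv(), coxph() and clogit()), using simulated data rather than the attached file: the attribute x and the random choices are hypothetical. Because each individual has exactly one event at t = 1 while all four alternatives are still at risk, the stratified Cox fit should reproduce the conditional logit estimate from survival's own clogit(), which builds the same likelihood internally.

```r
library(survival)  # provides Surv(), coxph() and clogit()

set.seed(1)
alts <- c("air", "bus", "car", "train")
n_id <- 50  # hypothetical number of individuals (the attached data have 210)

# Long format: one row per individual x alternative
d <- expand.grid(alt = alts, id = seq_len(n_id), stringsAsFactors = FALSE)
d$x <- rnorm(nrow(d))  # hypothetical alternative-specific attribute

# Mark exactly one chosen alternative per individual (random, for illustration only)
chosen <- sapply(split(seq_len(nrow(d)), d$id),
                 function(rows) rows[sample.int(length(rows), 1)])
d$mc <- 0
d$mc[chosen] <- 1

# The trick: the chosen alternative "fails" at t = 1, the others are censored at t = 2,
# so each stratum (individual) has one event whose risk set is all four alternatives
d$t <- ifelse(d$mc == 1, 1, 2)

fit_cox    <- coxph(Surv(t, mc) ~ x + strata(id), data = d)
fit_clogit <- clogit(mc ~ x + strata(id), data = d)

# The two estimates should agree up to numerical tolerance
cbind(coxph = coef(fit_cox), clogit = coef(fit_clogit))
```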
I ran the following Zelig command to estimate the coxph model:
z1 <- zelig(Surv(t, mc) ~ invt + twait + gc + hinc*alt + strata(id),
            model = "coxph", data = h09, na.action = na.exclude)
I get the following error:
ERROR: Invalid status value
However, when I replace mc with choice in the above script, the model is estimated
without any problem and generates the following output:
                       coef  se(coef)  exp(coef)       z        p
invt               -0.00449   0.00121     0.9955  -3.723  2.0e-04
twait              -0.09709   0.01053     0.9075  -9.218  0.0e+00
gc                  0.00747   0.00683     1.0075   1.095  2.7e-01
hinc                     NA   0.00000         NA      NA       NA
alt[T.bus]          0.43194   0.90091     1.5402   0.479  6.3e-01
alt[T.car]         -3.58484   0.98922     0.0277  -3.624  2.9e-04
alt[T.train]        1.82237   0.82629     6.1865   2.205  2.7e-02
hinc:alt[T.bus]    -0.02389   0.01639     0.9764  -1.457  1.4e-01
hinc:alt[T.car]     0.00153   0.01226     1.0015   0.125  9.0e-01
hinc:alt[T.train]  -0.06274   0.01567     0.9392  -4.005  6.2e-05
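The NA reported for hinc is expected in this setup: household income takes the same value for all four alternatives faced by an individual, so within each stratum it cancels out of the conditional likelihood and its main effect is not identified; coxph reports such aliased terms as NA. The sketch below uses simulated data (all values hypothetical) to reproduce the NA and then drops the unidentified main effect by coding income-by-mode interactions as explicit columns, in the spirit of the hinca/hincb/hinct variables listed above, with air as the base alternative. The column names hinc_bus, hinc_car and hinc_train are illustrative, and whether this removes the sim() failure discussed in this thread has not been tested here.

```r
library(survival)

set.seed(2)
alts <- c("air", "bus", "car", "train")
n_id <- 100                                    # hypothetical sample size
d <- expand.grid(alt = alts, id = seq_len(n_id), stringsAsFactors = FALSE)
d$alt  <- factor(d$alt, levels = alts)         # air is the reference level
d$invt <- rnorm(nrow(d))                       # varies across alternatives
d$hinc <- rep(runif(n_id, 10, 70), each = 4)   # same value on all of an individual's rows

chosen <- sapply(split(seq_len(nrow(d)), d$id),
                 function(rows) rows[sample.int(length(rows), 1)])
d$mc <- 0
d$mc[chosen] <- 1
d$t <- ifelse(d$mc == 1, 1, 2)

# hinc * alt expands to hinc + alt + hinc:alt; the hinc main effect is constant
# within each stratum, so coxph flags it as singular and reports NA
fit_na <- coxph(Surv(t, mc) ~ invt + hinc * alt + strata(id), data = d)
coef(fit_na)["hinc"]

# Same model without the unidentified term: explicit income-by-mode columns
for (a in c("bus", "car", "train")) {
  d[[paste0("hinc_", a)]] <- d$hinc * (d$alt == a)
}
fit_ok <- coxph(Surv(t, mc) ~ invt + alt + hinc_bus + hinc_car + hinc_train +
                  strata(id), data = d)
coef(fit_ok)  # no NA entries; the maximized log-likelihood matches fit_na
```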
The only difference between choice and mc is that mc is a factor variable with value
labels, whereas choice does not have any labels.
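One way to avoid the "Invalid status value" error without editing the data file is to recode the labelled factor into the numeric 0/1 status that Surv() expects. A minimal sketch, where the four-element mc and the time vector t_obs are hypothetical stand-ins rather than columns of the attached data. Recent versions of the survival package interpret a factor status as a multi-state outcome rather than a 0/1 event, so an explicit conversion is the safer route either way.

```r
library(survival)

# Hypothetical factor-coded choice indicator with "no"/"yes" value labels,
# standing in for mc (the four alternatives faced by one individual)
mc <- factor(c("no", "yes", "no", "no"), levels = c("no", "yes"))
t_obs <- c(2, 1, 2, 2)  # the auxiliary time variable t described above

# Recode from the labels rather than with as.numeric(mc), which would return
# the internal level codes (1/2) whose meaning depends on the order of the levels
mc01 <- as.integer(mc == "yes")
mc01                    # 0 1 0 0

s <- Surv(t_obs, mc01)  # ordinary right-censored response; the chosen row is the event
attr(s, "type")         # "right"
```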
Lastly, I cannot simulate using the following code:
x.out <- setx(z.out, strata = "id")
s.out <- sim(z.out, x = x.out)
In this particular application of coxph, the simulated values for each individual
(stratum) should sum to 1.
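The sum-to-one property can be checked by hand. A minimal sketch on simulated data, using plain survival and MASS calls rather than Zelig's setx()/sim(), whose handling of strata is not shown here: coefficient vectors are drawn from a multivariate normal approximation to the estimates (MASS::mvrnorm() with coef() and vcov()), and for each draw the logit probabilities exp(x_j'b) / sum_k exp(x_k'b) are computed over the four alternatives faced by one individual, so each row sums to 1 by construction. The attributes and the choice of individual 1 are hypothetical.

```r
library(survival)

set.seed(3)
alts <- c("air", "bus", "car", "train")
n_id <- 100                                   # hypothetical sample size
d <- expand.grid(alt = alts, id = seq_len(n_id), stringsAsFactors = FALSE)
d$invt  <- rnorm(nrow(d))                     # hypothetical attributes
d$twait <- rnorm(nrow(d))
chosen <- sapply(split(seq_len(nrow(d)), d$id),
                 function(rows) rows[sample.int(length(rows), 1)])
d$mc <- 0
d$mc[chosen] <- 1
d$t <- ifelse(d$mc == 1, 1, 2)

fit <- coxph(Surv(t, mc) ~ invt + twait + strata(id), data = d)

# Draws of the coefficient vector from its asymptotic normal distribution
beta_draws <- MASS::mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))

# Attributes of the four alternatives faced by one individual (id 1)
X1 <- as.matrix(d[d$id == 1, c("invt", "twait")])

# For each draw: P(j) = exp(x_j'b) / sum_k exp(x_k'b) over that individual's alternatives
probs <- t(apply(beta_draws, 1, function(b) {
  e <- exp(drop(X1 %*% b))
  e / sum(e)
}))
colnames(probs) <- d$alt[d$id == 1]

rowSums(probs)[1:5]  # each draw sums to 1
colMeans(probs)      # simulated expected choice probabilities for this individual
```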
Many thanks for your consideration.
Sincerely, Murtaza
-----Original Message-----
From: Kosuke Imai [mailto:kimai@Princeton.EDU]
Sent: Monday, July 27, 2009 10:58 AM
To: Murtaza Haider, Professor
Cc: Gary King; zelig(a)lists.gking.harvard.edu
Subject: RE: Thank you for Zelig
Hi,
Can you send us code that replicates this error using the data set we
provide with coxph? That way, we can determine whether this error is
general or something specific to your data set. Also, it would be great
if you could send your queries to the zelig mailing list (cc'd) so that
other users can give you suggestions and/or benefit from the discussion.
Thanks,
Kosuke
--
Department of Politics
Princeton University
http://imai.princeton.edu
On Fri, 24 Jul 2009, Murtaza Haider, Professor wrote:
Greetings:
I apologize for the delay in this response.
I've been away on vacation. Thank you for suggesting multinomial probit. I have
tested it and it works fine.
There is another way of using Zelig to estimate
McFadden's logit models. Because the likelihood functions are similar, one
can trick the Cox proportional hazards model, which is available in your package, into
estimating McFadden's logit model.
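The similarity can be written out. Treat each individual i as a stratum, let c_i be the alternative i chose, J_i the set of alternatives i faced, and x_{ij} the attributes of alternative j. With the t variable described above (1 for the chosen alternative, 2 and censored for the others), the only event in stratum i occurs at t = 1, when the whole choice set is still at risk, so the risk set R_i(1) equals J_i and the Cox partial likelihood coincides with McFadden's conditional logit likelihood:

```latex
L_{\text{Cox}}(\beta)
  = \prod_{i=1}^{N} \frac{\exp(x_{i c_i}^{\top}\beta)}{\sum_{j \in R_i(1)} \exp(x_{ij}^{\top}\beta)}
  = \prod_{i=1}^{N} \frac{\exp(x_{i c_i}^{\top}\beta)}{\sum_{j \in J_i} \exp(x_{ij}^{\top}\beta)}
  = \prod_{i=1}^{N} P_i(c_i),
\qquad
P_i(j) = \frac{\exp(x_{ij}^{\top}\beta)}{\sum_{k \in J_i} \exp(x_{ik}^{\top}\beta)}
```

Since each stratum has a single event, there are no ties and the choice of tie-handling method does not matter. One consequence: any term of x_{ij}'β that is the same for every j in J_i cancels from the numerator and denominator, so individual-level variables enter only through interactions with the alternatives.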
When I used the coxph model in your package, I
ran into two issues. First, the choice variable in my data set was a factor variable
with value labels. The algorithm did not work until I stripped the "yes/no" value labels
from the binary (1/0) variable.
Second, I have not been able to simulate. The sim() call that creates s.out
returns the message "subscript out of bounds." Here are the commands I ran:
z.out <- zelig(Surv(t, choice) ~ invt + twait + gc + aasc + tasc + basc + hinca + strata(id),
               model = "coxph", data = h09, na.action = na.exclude)
x.out <- setx(z.out, strata = "id")
s.out <- sim(z.out, x = x.out)
Sincerely, Murtaza
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu