I am trying to run a logit using weighted data. I get results, but
also the following warning message:
"In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!"
Based on my read of the R help archives, it seems that the problem is
that the binomial model does not like the weights, since they lead to
non-integer success counts.
My question is: are my results "wrong," or can I use them despite the
warning? If I can't use the results, what model can I use with the
weights?
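For reference, here is a minimal, self-contained sketch of the situation (the data are simulated, not mine). From what I have read, family = quasibinomial is a common workaround: it fits by the same iteratively reweighted least squares, so the point estimates are identical, but it does not assume an integer success count and so does not emit the warning. I am not certain it is the "right" model for survey weights, only that it reproduces the estimates without the message.

```r
# Simulated illustration of the "non-integer #successes" warning and the
# quasibinomial workaround. All names here are made up for the example.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))       # 0/1 outcome
w <- runif(n, 0.5, 2)              # non-integer, survey-style weights

# binomial + non-integer weights -> warns, but still returns estimates
fit1 <- suppressWarnings(glm(y ~ x, family = binomial, weights = w))

# quasibinomial: same point estimates, no warning
fit2 <- glm(y ~ x, family = quasibinomial, weights = w)

all.equal(coef(fit1), coef(fit2))  # TRUE
```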
Thanks much,
-c
--
Casey A. Klofstad
University of Miami
Department of Political Science
Coral Gables, FL
klofstad(a)gmail.com
http://moya.bus.miami.edu/~cklofstad
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/
Dear Zelig team,
Thank you for releasing Zelig 3.3-0. I had actually been wondering when plots
with confidence intervals would be available for the Cox model, so I was very
happy when you introduced the plot.surv() command. However, I tried
it on my data and got the following error message, with the traceback()
pasted below. I would appreciate any help you could provide.
> low <- setx(z.out,pexec_t=0)
> high <- setx(z.out,pexec_t=1)
>
> s.out1 <- sim(z.out,x=low)
> s.out2 <- sim(z.out,x=high)
> out <- list(s.out1,s.out2)
> plot.surv(x = out, duration = data$.t, censor=data$adverse.backslide,
+ type="line", plottimes=FALSE, plotcensor=FALSE,
+ main="Survival", xlab="Time", ylab="Survival")
Error in plot.window(...) : need finite 'xlim' values
> traceback()
5: plot.window(...)
4: localWindow(xlim, ylim, log, asp, ...)
3: plot.default(times, survest[-1], type = "n", xlim = c(0, max(duration)),
...)
2: plot(times, survest[-1], type = "n", xlim = c(0, max(duration)),
...)
1: plot.surv(x = out, duration = data$.t, censor = data$adverse.backslide,
       type = "line", plottimes = FALSE, plotcensor = FALSE,
       main = "Survival", xlab = "Time", ylab = "Survival")
Thank you,
Jose Aleman, Ph. D.
Assistant Professor
Political Science Department
Fordham University
Hi everyone,
We just released a new version of Zelig, 3.3-0. This is a stable
release for the latest version of R (R 2.7.0). As always, you can
install the latest version via:
install.packages("Zelig", repos = "http://gking.harvard.edu")
This version includes two key updates. Patrick Lam has contributed a
new version of the coxph model so that time-varying covariates can be
used. John Graves contributed a new graphical function, plot.surv, which
plots confidence intervals for survival curves. See the Zelig manual for
details and examples.
We welcome contributions from others to the Zelig project. If you are
interested, please let us know.
Best,
Kosuke
---------------------------------------------------------
Kosuke Imai Office: Corwin Hall 041
Assistant Professor Phone: 609-258-6601
Department of Politics eFax: 973-556-1929
Princeton University Email: kimai(a)Princeton.Edu
Princeton, NJ 08544-1012 http://imai.princeton.edu/
---------------------------------------------------------
Hi Jose,
I think I've figured out why you are getting different answers from R and
Stata. It has to do with how your data are set up. In your dataset,
your start and stop variables (entrydate and elapsed) are not set up as
intervals; rather, elapsed is a cumulative duration. In your
panel dataset the unit of observation is the country-year, and for each
period your entrydate is the same (0, for example) while your elapsed is a
cumulative count of days. So your variables look something like this:
year, entrydate, elapsed
1950, 0, 364
1951, 0, 729
1952, 0, 1095
1953, 0, 1460
So the elapsed variable is simply a cumulative count of the number of days
that have passed since the observation became at risk. However, what R
needs is for the data to be in interval form. So it has to look something
like this:
year, start, stop
1950, 0, 364
1951, 364, 729
1952, 729, 1095
1953, 1095, 1460
In Stata, the stset command converts the former structure into the latter
structure. R does not do that by itself, so you have to feed in variables
that conform to the latter structure. Luckily, since your data seems to
have been processed in Stata already, the variables that Stata created are
in the dataset. The start and stop variables you need are ".t0" and ".t" if
you are reading the data in R, and "_t0" and "_t" if you are reading it in
Stata. If you hadn't processed the data in Stata beforehand using the stset
command, you would have to code it in R, probably with a for loop or
tapply().
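For what it's worth, a minimal base-R sketch of that conversion, matching the toy table above (the id column is made up for illustration; your unit identifier will differ):

```r
# Toy panel where 'elapsed' is a cumulative day count within each unit.
df <- data.frame(id      = c(1, 1, 1, 1),
                 year    = 1950:1953,
                 elapsed = c(364, 729, 1095, 1460))

# Within each unit, the stop of an interval is the cumulative 'elapsed'
# itself, and the start is the previous record's stop (0 for the first).
df$stop  <- df$elapsed
df$start <- ave(df$elapsed, df$id,
                FUN = function(e) c(0, head(e, -1)))

df[, c("year", "start", "stop")]
#   year start stop
# 1 1950     0  364
# 2 1951   364  729
# 3 1952   729 1095
# 4 1953  1095 1460
```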
So if you run the following in R, it should give you the same answers as
Stata:
z.out <- zelig(Surv(.t0, .t, adverse.backslide) ~ hinst + lparcomp + pexec +
inequality + primary.sector + development + lgrowth + ethnicity + mil.pop +
demop + strikep + riotp + guerrillap + urban.100k + pmilexec, model =
"coxph", data = statadata, robust = TRUE, cluster = "new.period")
Also, note that your R formula includes a pmilexec variable that is not in
your Stata code.
Hope that helps. I'm also forwarding this response to the Zelig list in
case anybody else runs into the same problem.
Patrick
On Fri, May 30, 2008 at 4:51 PM, <aleman(a)fordham.edu> wrote:
> Hi Patrick,
>
> Thanks a lot for the clarification. I was actually thinking of doing what
> you advised. I went ahead and tried both models in R and Stata,
> but I am still getting different results. I'm using the same start, stop,
> and event variables; the only difference is that in R I have designated
> failure type -2 as "adverse backslide". Otherwise the start is 'entrydate'
> and the stop is 'elapsed'. I'm also clustering by id() in R, in this case
> by specifying the cluster() option to adjust the standard errors by
> 'new.period' as in Stata. This automatically implies robust standard
> errors. This is the R output I get:
>
> Call:
> zelig(formula = Surv(entrydate, elapsed, adverse.backslide) ~
> hinst + lparcomp + pexec + inequality + primary.sector +
> development + lgrowth + ethnicity + mil.pop + demop +
> strikep + riotp + guerrillap + urban.100k + pmilexec,
> model = "coxph", data = statadata, robust = TRUE, cluster =
> "new.period")
>
> n=2136 (21 observations deleted due to missingness)
> coef robust se se(coef) exp(coef) z p
> hinst 1.0374 0.4000 0.3208 2.822 2.593 9.5e-03
> lparcomp 1.7180 0.3429 0.3409 5.574 5.010 5.4e-07
> pexec 1.9956 0.8302 0.7161 7.356 2.404 1.6e-02
> inequality 0.0501 0.1233 0.0846 1.051 0.406 6.8e-01
> primary.sector 0.0297 0.0312 0.0293 1.030 0.950 3.4e-01
> development -0.6638 0.1957 0.2330 0.515 -3.392 6.9e-04
> lgrowth 0.0161 0.0473 0.0456 1.016 0.340 7.3e-01
> ethnicity -2.2800 1.1489 1.0665 0.102 -1.984 4.7e-02
> mil.pop 0.0554 0.0488 0.0441 1.057 1.135 2.6e-01
> demop -0.9414 0.7162 0.6967 0.390 -1.314 1.9e-01
> strikep 0.4360 0.6677 0.7750 1.546 0.653 5.1e-01
> riotp 1.3968 0.6268 0.5989 4.042 2.229 2.6e-02
> guerrillap 1.5876 0.4864 0.5607 4.892 3.264 1.1e-03
> urban.100k 0.0111 0.0391 0.0323 1.011 0.284 7.8e-01
> pmilexec -0.3709 1.4837 1.3112 0.690 -0.250 8.0e-01
>
> exp(coef) exp(-coef) lower .95 upper .95
> hinst 2.822 0.354 1.2884 6.181
> lparcomp 5.574 0.179 2.8462 10.915
> pexec 7.356 0.136 1.4455 37.438
> inequality 1.051 0.951 0.8257 1.339
> primary.sector 1.030 0.971 0.9689 1.095
> development 0.515 1.942 0.3508 0.756
> lgrowth 1.016 0.984 0.9262 1.115
> ethnicity 0.102 9.777 0.0108 0.972
> mil.pop 1.057 0.946 0.9606 1.163
> demop 0.390 2.564 0.0958 1.588
> strikep 1.546 0.647 0.4178 5.724
> riotp 4.042 0.247 1.1833 13.808
> guerrillap 4.892 0.204 1.8857 12.691
> urban.100k 1.011 0.989 0.9365 1.092
> pmilexec 0.690 1.449 0.0377 12.644
>
> Rsquare= 0.04 (max possible= 0.105 )
> Likelihood ratio test= 87.9 on 15 df, p=2.48e-12
> Wald test = 121 on 15 df, p=0
> Score (logrank) test = 119 on 15 df, p=0, Robust = 15.4 p=0.421
>
> (Note: the likelihood ratio and score tests assume independence of
> observations within a cluster, the Wald and robust score tests do
> not).
>
> If you take a look at the first column in the attached word document,
> however, you will see that the coefficients and their signs are not similar
> when I compute the model in stata. The stata code I used to run this model
> is:
>
> stset elapsed, id(new_period) failure(transition==-2) origin(time
> entrydate) scale(365.25)
> stcox hinst lparcomp pexec inequality primary_sector development lgrowth
> ethnicity mil_pop demop strikep riotp guerrillap urban_100k, nohr robust
>
> I would appreciate any thoughts/ideas you may have as to why I'm not
> getting similar results.
>
>
> Yours truly,
>
>
> Jose (See attached file: backsliders.out)
>
>
>
> "Patrick Lam"
> <plam(a)fas.harvard
> .edu> To
> Sent by: aleman(a)fordham.edu
> pkphlam(a)gmail.com cc
> zelig(a)lists.gking.harvard.edu
> Subject
> 05/30/2008 02:41 Re: [zelig] Question regarding
> PM multiple records per subject in cox
> proportional hazards model
>
>
>
>
>
>
>
>
>
>
> Dear Jose,
>
> If I understand your question correctly, you are trying to estimate a
> competing risks model with multiple events per subject (time-varying
> covariates).
>
> As you say, the first part involves estimating J different models, where
> you create J different event (censor) variables. For the jth model, the
> variable will take on a value of 1 if the jth event occurs and 0 if not
> (either the subject is truly censored or one of the other J-1 events
> occurred).
>
> As for the multiple events per subject part, R handles this in a similar
> way to Stata. Each record per subject consists of an interval with a start
> time and an end time. Therefore, you need start time, stop time, and event
> variables on the left hand side of the formula in the Surv() function.
> Start time is either numerical or a date that indicates the beginning of an
> interval, stop time indicates the end of an interval, and event is the
> event variable indicating whether an event occurs at the end of the
> interval. There is no need for an id() variable in R. Strictly speaking,
> the estimation of the model does not require knowledge of which subject
> each record belongs to. What is important are the start and stop times and
> the event variable. See help("Surv") for more details, including how to
> deal with other types of censoring.
>
> So sample syntax for Zelig would be:
>
> z.out <- zelig( Surv(start, stop, event) ~ x1 + x2, data = data, model =
> "coxph")
> x.out <- setx(z.out)
> s.out <- sim(z.out, x = x.out)
>
> I believe the equivalent Stata syntax would be something like this:
>
> stset stop, id(id) failure(event) origin(start)
>
> Details about estimation with time-varying covariates are included in the
> next version of the documentation for coxph in Zelig.
>
> Patrick
>
> On Fri, May 30, 2008 at 11:07 AM, <aleman(a)fordham.edu> wrote:
>
> Dear Zelig users,
>
> A colleague and I are trying to use the survival package to run Cox
> models in Zelig. We are using a partitioned likelihood approach to model
> several kinds of events with Cox models. Since we have panel data, we
> have multiple records per subject and the possibility that a country may
> experience more than one type of event. Practically speaking, this will
> require us to estimate four models, one for each risk. The basic idea is
> that we treat the J-1 remaining risks as if they were right-censored
> cases, which isolates the risk of interest. In Stata this is easy: you
> just use the id() option. In R this doesn't seem to be so straightforward,
> because the cluster() function is the equivalent of the cluster() option
> in Stata, but the equivalent of Stata's id() option does not seem to have
> been implemented. Another alternative is to use the strata() option to
> stratify by event type, but the usual and mostly valid complaint against
> that model is that covariate effects are constrained to be equal across
> risks. Is there a way to model multiple records per subject in R while
> allowing covariate effects to vary across risks?
>
> Thanks,
>
> Jose Aleman, Ph.D.
> Assistant Professor
> Political Science Department
> Fordham University
>