I am trying to run a logit using weighted data. I get results, but
also the following warning message:
"In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!"
Based on my read of the R help archives, it seems that the problem is
that the binomial model does not like the weights, since they lead to
non-integer success counts.
My question is: are my results "wrong," or can I use them despite the
warning? If I can't use the results, what model can I use with the
weights?
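For reference, here is a minimal, self-contained sketch of the situation (the data are simulated, not mine). From what I have read, family = quasibinomial is a common workaround: it fits by the same iteratively reweighted least squares, so the point estimates are identical, but it does not assume an integer success count and so does not emit the warning. I am not certain it is the "right" model for survey weights, only that it reproduces the estimates without the message.

```r
# Simulated illustration of the "non-integer #successes" warning and the
# quasibinomial workaround. All names here are made up for the example.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))       # 0/1 outcome
w <- runif(n, 0.5, 2)              # non-integer, survey-style weights

# binomial + non-integer weights -> warns, but still returns estimates
fit1 <- suppressWarnings(glm(y ~ x, family = binomial, weights = w))

# quasibinomial: same point estimates, no warning
fit2 <- glm(y ~ x, family = quasibinomial, weights = w)

all.equal(coef(fit1), coef(fit2))  # TRUE
```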
Thanks much,
-c
--
Casey A. Klofstad
University of Miami
Department of Political Science
Coral Gables, FL
klofstad(a)gmail.com
http://moya.bus.miami.edu/~cklofstad
-
Zelig Mailing List, served by Harvard-MIT Data Center
Send messages: zelig(a)lists.gking.harvard.edu
[un]subscribe Options: http://lists.gking.harvard.edu/?info=zelig
Zelig program information: http://gking.harvard.edu/zelig/
Dear Zelig team,
Thank you for releasing Zelig 3.3-0. I had actually been wondering when plots
with confidence intervals would be available for the Cox model, so I was very
happy when you introduced the plot.surv() command. However, I tried
it on my data and got the following error message, with the traceback()
pasted below. I would appreciate any help you could provide.
> low <- setx(z.out,pexec_t=0)
> high <- setx(z.out,pexec_t=1)
>
> s.out1 <- sim(z.out,x=low)
> s.out2 <- sim(z.out,x=high)
> out <- list(s.out1,s.out2)
> plot.surv(x = out, duration = data$.t, censor=data$adverse.backslide,
+ type="line", plottimes=FALSE, plotcensor=FALSE,
+ main="Survival", xlab="Time", ylab="Survival")
Error in plot.window(...) : need finite 'xlim' values
> traceback()
5: plot.window(...)
4: localWindow(xlim, ylim, log, asp, ...)
3: plot.default(times, survest[-1], type = "n", xlim = c(0, max(duration)),
...)
2: plot(times, survest[-1], type = "n", xlim = c(0, max(duration)),
...)
1: plot.surv(x = out, duration = data$.t, censor = data$adverse.backslide,
       type = "line", plottimes = FALSE, plotcensor = FALSE,
       main = "Survival", xlab = "Time", ylab = "Survival")
Thank you,
Jose Aleman, Ph. D.
Assistant Professor
Political Science Department
Fordham University
Hi everyone,
We just released a new version of Zelig, 3.3-0. This is a stable
release for the latest version of R (R 2.7.0). As always, you can
install the latest version via:
install.packages("Zelig", repos = "http://gking.harvard.edu")
This version includes two key updates. Patrick Lam has contributed a
new version of the coxph model so that time-varying covariates can be
used. John Graves contributed a new graphical function, plot.surv, which
plots confidence intervals for survival curves. See the Zelig manual for
details and examples.
We welcome contributions from others to the Zelig project. If you are
interested, please let us know.
Best,
Kosuke
---------------------------------------------------------
Kosuke Imai Office: Corwin Hall 041
Assistant Professor Phone: 609-258-6601
Department of Politics eFax: 973-556-1929
Princeton University Email: kimai(a)Princeton.Edu
Princeton, NJ 08544-1012 http://imai.princeton.edu/
---------------------------------------------------------
Hi Jose,
I think I've figured out why you are getting different answers from R and
Stata. It has to do with how your data are set up. In your dataset,
your start and stop variables (entrydate and elapsed) are not set up as
intervals; rather, elapsed is a cumulative duration. In your
panel dataset the unit of observation is the country-year, and for each
period your entrydate is the same (0, for example) while your elapsed is a
cumulative count of days. So your variables look something like this:
year, entrydate, elapsed
1950, 0, 364
1951, 0, 729
1952, 0, 1095
1953, 0, 1460
So the elapsed variable is simply a cumulative count of the number of days
that have passed since the observation became at risk. However, what R
needs is for the data to be in interval form. So it has to look something
like this:
year, start, stop
1950, 0, 364
1951, 364, 729
1952, 729, 1095
1953, 1095, 1460
In Stata, the stset command converts the former structure into the latter
structure. R does not do that by itself, so you have to feed in variables
that conform to the latter structure. Luckily, since your data seems to
have been processed in Stata already, the variables that Stata created are
in the dataset. The start and stop variables you need are ".t0" and ".t" if
you are reading the data in R, and "_t0" and "_t" if you are reading it in
Stata. If you hadn't processed the data in Stata beforehand using the stset
command, you would have to code it in R, probably with a for loop or
tapply().
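For what it's worth, a minimal base-R sketch of that conversion, matching the toy table above (the id column is made up for illustration; your unit identifier will differ):

```r
# Toy panel where 'elapsed' is a cumulative day count within each unit.
df <- data.frame(id      = c(1, 1, 1, 1),
                 year    = 1950:1953,
                 elapsed = c(364, 729, 1095, 1460))

# Within each unit, the stop of an interval is the cumulative 'elapsed'
# itself, and the start is the previous record's stop (0 for the first).
df$stop  <- df$elapsed
df$start <- ave(df$elapsed, df$id,
                FUN = function(e) c(0, head(e, -1)))

df[, c("year", "start", "stop")]
#   year start stop
# 1 1950     0  364
# 2 1951   364  729
# 3 1952   729 1095
# 4 1953  1095 1460
```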
So if you run the following in R, it should give you the same answers as
Stata:
z.out <- zelig(Surv(.t0, .t, adverse.backslide) ~ hinst + lparcomp + pexec +
inequality + primary.sector + development + lgrowth + ethnicity + mil.pop +
demop + strikep + riotp + guerrillap + urban.100k + pmilexec, model =
"coxph", data = statadata, robust = TRUE, cluster = "new.period")
Also, note that your R formula includes a pmilexec variable that is not in
your Stata code.
Hope that helps. I'm also forwarding this response to the Zelig list in
case anybody else runs into the same problem.
Patrick
On Fri, May 30, 2008 at 4:51 PM, <aleman(a)fordham.edu> wrote:
> Hi Patrick,
>
> Thanks a lot for the clarification. I was actually thinking of doing what
> you advised. I went ahead and tried both models in R and Stata,
> but I am still getting different results. I'm using the same start, stop,
> and event variables; the only difference is that in R I have designated
> failure type -2 as "adverse backslide". Otherwise the start is 'entrydate'
> and the stop is 'elapsed'. I'm also clustering by id() in R, in this case
> by specifying the cluster() option to adjust the standard errors by
> 'new.period' as in Stata. This automatically implies robust standard
> errors. This is the R output I get:
>
> Call:
> zelig(formula = Surv(entrydate, elapsed, adverse.backslide) ~
> hinst + lparcomp + pexec + inequality + primary.sector +
> development + lgrowth + ethnicity + mil.pop + demop +
> strikep + riotp + guerrillap + urban.100k + pmilexec,
> model = "coxph", data = statadata, robust = TRUE, cluster =
> "new.period")
>
> n=2136 (21 observations deleted due to missingness)
> coef robust se se(coef) exp(coef) z p
> hinst 1.0374 0.4000 0.3208 2.822 2.593 9.5e-03
> lparcomp 1.7180 0.3429 0.3409 5.574 5.010 5.4e-07
> pexec 1.9956 0.8302 0.7161 7.356 2.404 1.6e-02
> inequality 0.0501 0.1233 0.0846 1.051 0.406 6.8e-01
> primary.sector 0.0297 0.0312 0.0293 1.030 0.950 3.4e-01
> development -0.6638 0.1957 0.2330 0.515 -3.392 6.9e-04
> lgrowth 0.0161 0.0473 0.0456 1.016 0.340 7.3e-01
> ethnicity -2.2800 1.1489 1.0665 0.102 -1.984 4.7e-02
> mil.pop 0.0554 0.0488 0.0441 1.057 1.135 2.6e-01
> demop -0.9414 0.7162 0.6967 0.390 -1.314 1.9e-01
> strikep 0.4360 0.6677 0.7750 1.546 0.653 5.1e-01
> riotp 1.3968 0.6268 0.5989 4.042 2.229 2.6e-02
> guerrillap 1.5876 0.4864 0.5607 4.892 3.264 1.1e-03
> urban.100k 0.0111 0.0391 0.0323 1.011 0.284 7.8e-01
> pmilexec -0.3709 1.4837 1.3112 0.690 -0.250 8.0e-01
>
> exp(coef) exp(-coef) lower .95 upper .95
> hinst 2.822 0.354 1.2884 6.181
> lparcomp 5.574 0.179 2.8462 10.915
> pexec 7.356 0.136 1.4455 37.438
> inequality 1.051 0.951 0.8257 1.339
> primary.sector 1.030 0.971 0.9689 1.095
> development 0.515 1.942 0.3508 0.756
> lgrowth 1.016 0.984 0.9262 1.115
> ethnicity 0.102 9.777 0.0108 0.972
> mil.pop 1.057 0.946 0.9606 1.163
> demop 0.390 2.564 0.0958 1.588
> strikep 1.546 0.647 0.4178 5.724
> riotp 4.042 0.247 1.1833 13.808
> guerrillap 4.892 0.204 1.8857 12.691
> urban.100k 1.011 0.989 0.9365 1.092
> pmilexec 0.690 1.449 0.0377 12.644
>
> Rsquare= 0.04 (max possible= 0.105 )
> Likelihood ratio test= 87.9 on 15 df, p=2.48e-12
> Wald test = 121 on 15 df, p=0
> Score (logrank) test = 119 on 15 df, p=0, Robust = 15.4 p=0.421
>
> (Note: the likelihood ratio and score tests assume independence of
> observations within a cluster, the Wald and robust score tests do
> not).
>
> If you take a look at the first column in the attached word document,
> however, you will see that the coefficients and their signs are not similar
> when I compute the model in stata. The stata code I used to run this model
> is:
>
> stset elapsed, id(new_period) failure(transition==-2) origin(time
> entrydate) scale(365.25)
> stcox hinst lparcomp pexec inequality primary_sector development lgrowth
> ethnicity mil_pop demop strikep riotp guerrillap urban_100k, nohr robust
>
> I would appreciate any thoughts/ideas you may have as to why I'm not
> getting similar results.
>
>
> Yours truly,
>
>
> Jose (See attached file: backsliders.out)
>
>
>
> "Patrick Lam"
> <plam(a)fas.harvard
> .edu> To
> Sent by: aleman(a)fordham.edu
> pkphlam(a)gmail.com cc
> zelig(a)lists.gking.harvard.edu
> Subject
> 05/30/2008 02:41 Re: [zelig] Question regarding
> PM multiple records per subject in cox
> proportional hazards model
>
>
>
>
>
>
>
>
>
>
> Dear Jose,
>
> If I understand your question correctly, you are trying to estimate a
> competing risks model with multiple events per subject (time-varying
> covariates).
>
> As you say, the first part involves estimating J different models, where
> you create J different event (censor) variables. For the jth model, the
> variable will take on a value of 1 if the jth event occurs and 0 if not
> (either the subject is truly censored or one of the other J-1 events
> occurred).
>
> As for the multiple events per subject part, R handles this in a similar
> way to Stata. Each record per subject consists of an interval with a start
> time and an end time. Therefore, you need start time, stop time, and event
> variables on the left hand side of the formula in the Surv() function.
> Start time is either numerical or a date that indicates the beginning of an
> interval, stop time indicates the end of an interval, and event is the
> event variable indicating whether an event occurs at the end of the
> interval. There is no need for an id() variable in R. Strictly speaking,
> the estimation of the model does not require knowledge of which subject
> each record belongs to. What is important are the start and stop times and
> the event variable. See help("Surv") for more details, including how to
> deal with other types of censoring.
>
> So sample syntax for Zelig would be:
>
> z.out <- zelig( Surv(start, stop, event) ~ x1 + x2, data = data, model =
> "coxph")
> x.out <- setx(z.out)
> s.out <- sim(z.out, x = x.out)
>
> I believe the equivalent Stata syntax would be something like this:
>
> stset stop, id(id) failure(event) origin(start)
>
> Details about estimation with time-varying covariates are included in the
> next version of the documentation for coxph in Zelig.
>
> Patrick
>
> On Fri, May 30, 2008 at 11:07 AM, <aleman(a)fordham.edu> wrote:
>
> Dear Zelig users,
>
> A colleague and I are trying to use the survival package to run Cox
> models in Zelig. We are using a partitioned likelihood approach to model
> several kinds of events with Cox models. Since we have panel data, we
> have multiple records per subject and the possibility that a country may
> experience more than one type of event. Practically speaking, this will
> require us to estimate four models, one for each risk. The basic idea is
> that we treat the J-1 remaining risks as if they were right-censored
> cases, which isolates the risk of interest. In Stata this is easy: you
> just use the id() option. In R this doesn't seem to be so straightforward,
> because the cluster() function is the equivalent of the cluster() option
> in Stata, but the equivalent of Stata's id() option does not seem to have
> been implemented. Another alternative is to use the strata() option to
> stratify by event type, but the usual and mostly valid complaint against
> that model is that covariate effects are constrained to be equal across
> risks. Is there a way to model multiple records per subject in R while
> allowing covariate effects to vary across risks?
>
> Thanks,
>
> Jose Aleman, Ph.D.
> Assistant Professor
> Political Science Department
> Fordham University
>