Dichotomous variables coded as 0s and 1s (or any other numeric values) are
set by setx() to their means. Dichotomous variables stored as factor
variables are set to their mode. For details, see the documentation at:
The distinction between numeric and factor variables is even more useful
for variables with more than two categories. For example, a variable like
religion (coded as 1=Jewish, 2=Catholic, 3=Protestant, 4=Other) is clearly
a factor rather than a numeric variable. We can see this because the mean
of this variable would be nonsense.
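A minimal base-R sketch of the rule described above, using made-up data. It does not call setx() itself. It only shows why a mean is a reasonable default for a numeric 0/1 variable but meaningless for a multi-category code, while a mode works for factors. The helper mode_of() is hypothetical and is not part of Zelig.

```r
# Hypothetical helper (not Zelig's internals): the most frequent level of a factor.
mode_of <- function(f) {
  counts <- table(f)
  names(counts)[which.max(counts)]
}

# A 0/1 dummy stored as numeric: a mean-based default gives a proportion.
war_num <- c(0, 0, 0, 1)
mean(war_num)                 # 0.25

# The same dummy stored as a factor: the natural "typical" value is the mode.
war_fac <- factor(war_num)
mode_of(war_fac)              # "0"

# A four-category religion code: the mean of the numeric codes matches no category.
religion <- factor(c(1, 2, 2, 3, 4, 2),
                   levels = 1:4,
                   labels = c("Jewish", "Catholic", "Protestant", "Other"))
mean(as.numeric(religion))    # 2.333..., not a religion
mode_of(religion)             # "Catholic"
```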
If you're importing data from Stata, for example, all variables will be
numeric (or character) since Stata doesn't know the difference between
numeric and factor data types.
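If some imported numeric codes are really categories, you can declare them as factors after import. This is a sketch under assumptions. The data frame imported is a hypothetical stand-in for data read with foreign::read.dta(), and its columns and labels are illustrative only.

```r
# Hypothetical stand-in for imported data: all columns arrive as numeric codes.
imported <- data.frame(onset = c(0, 1, 0, 0),
                       relig = c(1, 3, 2, 3))

# Declare the categorical code as a factor with readable labels.
imported$relig <- factor(imported$relig, levels = 1:4,
                         labels = c("Jewish", "Catholic", "Protestant", "Other"))

sapply(imported, class)   # onset: "numeric", relig: "factor"
```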
Gary
Gary King, King(a)Harvard.Edu
Center for Basic Research in the Social Sciences
34 Kirkland Street, Rm. 2, Harvard U, Cambridge, MA 02138
Direct (617) 495-2027 | Assistant (617) 495-9271 | HU-MIT DC (617) 495-4734 | eFax (617) 812-8581
On Tue, 11 May 2004, Steve Purpura wrote:
Thanks, Olivia.
As long as we have this example to play with, let's examine another area
that is open for question: how does setx() work with Boolean independent
variables?
If I call x.out<-setx(z.out) when some of the independent variables are
0/1, I probably need to do something special to tell Zelig not to use the
mean/median for them. Right? Does the default behavior depend on the data
type of the independent variable?
Steve
-----Original Message-----
From: Olivia Lau [mailto:olau@fas.harvard.edu]
Sent: Sunday, May 09, 2004 11:40 PM
To: Steve Purpura
Cc: zelig(a)latte.harvard.edu
Subject: Re: [zelig] Problem with setx and character variables
Dear Steve,
Thanks for pointing this out. We'll take a look. Meanwhile, you can get
around this problem by cutting out the columns "country", "casename",
"ended", "ethwar", and "waryrs" -- I don't think you use them in your
specifications. You can use

data <- f.laitin[, c(....the columns you want to keep in quotes ...)]

to do this.
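Here is a small sketch of the workaround on a toy data frame. The real f.laitin has many more columns, so the four below are only stand-ins. Option 1 follows the advice above. Option 2 is a generic alternative that drops every character column, so setx() never sees an unsupported type.

```r
# Toy stand-in for f.laitin (assumed columns; the real data has many more).
f.laitin <- data.frame(country  = c("A", "B"),
                       casename = c("x", "y"),
                       onset    = c(0, 1),
                       warl     = c(1, 0),
                       stringsAsFactors = FALSE)

# Option 1: list the columns to keep by name.
data <- f.laitin[, c("onset", "warl")]

# Option 2: keep only the non-character columns.
data2 <- f.laitin[, !sapply(f.laitin, is.character), drop = FALSE]

names(data)    # "onset" "warl"
names(data2)   # "onset" "warl"
```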
In addition, you might want to be careful about how you use attach(). If you
have a variable named steve in a data frame and a vector named steve in
your workspace, R will get confused and pick up the vector rather than
the column in the data frame when you specify your formula. I
generally avoid attach() whenever possible, and always detach() before
running zelig() or any other model-fitting command. In your case, you need
to remember to column bind (cbind()) myonset to your data frame before
running the regression. This is also good practice for creating
replication data sets in the future.
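The following toy example shows the name clash described above. It also shows the alternative of keeping the model variable inside the data frame. The names df and steve are illustrative. model.frame() is used instead of zelig() so the sketch runs with base R alone.

```r
# A data frame column and a workspace vector that share the name 'steve'.
df <- data.frame(steve = c(1, 2, 3))
steve <- c(10, 20, 30)

attach(df)      # R reports that df's 'steve' is masked by .GlobalEnv
print(steve)    # 10 20 30: the workspace vector, not the column
detach(df)

# Put the dependent variable into the data frame itself.
myonset <- c(TRUE, FALSE, TRUE)
df <- cbind(df, myonset = as.numeric(myonset))

# With data = df, the formula's variables are taken from df's columns first.
mf <- model.frame(myonset ~ steve, data = df)
mf$steve        # 1 2 3
```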
We'll let you know when the fix is ready.
Yours,
Olivia Lau
On Sun, 9 May 2004, Steve Purpura wrote:
> I'm trying to replicate Fearon and Laitin's "Ethnicity, Insurgency,
> and Civil War" data (search for repdata.zip associated with Laitin's
> web site). This replication pointed out an annoying problem with
> setx() and sim() while using model = "logit".
>
> Error in setx.default(z.out.my) : character is not a supported variable type (use factor or numeric).
>
> It would be nice if Zelig could just convert the data as appropriate.
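Until such a conversion is built in, you can convert the data yourself before calling zelig(). This is a sketch under assumptions. chars_to_factors() is a hypothetical helper, not a Zelig function. It only makes every column factor or numeric, which is what the error message asks for. Whether setx() then succeeds on the full replication data has not been checked here.

```r
# Hypothetical helper (not part of Zelig): turn character columns into factors.
chars_to_factors <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], factor)
  df
}

example <- data.frame(country = c("Algeria", "Angola"),
                      onset   = c(0, 1),
                      stringsAsFactors = FALSE)
fixed <- chars_to_factors(example)
vapply(fixed, class, character(1))   # country: "factor", onset: "numeric"
```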
>
>
> ###
> ### Stephen Purpura
> ###
> ###
> options(digits=3,scipen=12)
> library(car)
> library(lattice)
> library(modreg)
> library(foreign)
> library(Zelig)
>
> ######################################################################
> ###
> ###
> ### http://www.stanford.edu/group/ethnic/publicdata/publicdata.html
> ###
> ###
> wd<-"z:/Documents and Settings/Steve/My Documents/UW POLS503/Assignment 2"
> setwd(wd)
>
> f.laitin<-read.dta('repdata.dta')
> attach(f.laitin)
>
> onset[onset==4]<-0
>
> ###
> ### Build Table #1, Col 1 from the paper
> ###
> z.out.1<-zelig(formula = as.factor(onset) ~ warl + gdpenl + lpopl1 +
>   lmtnest + ncontig + Oil + nwstate + instab + polity2l + ethfrac +
>   relfrac, model = "logit", data = f.laitin)
> summary(z.out.1)
>
> pred.1<-trunc(plogis(predict(z.out.1))/.5)
>
> pred<-trunc((1/(1+exp(-predict(z.out.1))))/.5)
> table(pred,z.out.1$y)
>
> ###
> ### col #2
> ### TODO: if second > .049999
> ###
> z.out.2<-zelig(formula = as.factor(ethonset) ~ warl + gdpenl + lpopl1
> + lmtnest + ncontig + Oil + nwstate + instab + polity2l + ethfrac +
> relfrac, model = "logit", data = f.laitin)
> summary(z.out.2)
>
>
> ###
> ### col #3
> ###
> z.out.3<-zelig(formula = as.factor(onset) ~ warl + gdpenl + lpopl1 +
>   lmtnest + ncontig + Oil + nwstate + instab + anocl + deml + ethfrac +
>   relfrac, model = "logit", data = f.laitin)
> summary(z.out.3)
>
> ###
> ### col #4
> ###
> z.out.4<-zelig(formula = as.factor(emponset) ~ empwarl + empgdpenl +
> emplpopl + emplmtnest + empncontig + Oil + nwstate + instab +
> empethfrac, model = "logit", data = f.laitin)
> summary(z.out.4)
>
> ###
> ### col #5
> ###
> z.out.5<-zelig(formula = as.factor(cowonset) ~ cowwarl + gdpenl +
> lpopl1 + lmtnest + ncontig + Oil + nwstate + instab + anocl + deml +
> ethfrac + relfrac, model = "logit", data = f.laitin)
> summary(z.out.5)
>
>
> #### logit dependent variable
> # major onset variables are:
> # onset
> # ethonset
> # emponset
> # colonset
> # cowonset
> # sdonset
> #
> # so let's check their values first
>
> table(onset,exclude=NULL)
> table(ethonset,exclude=NULL)
> table(emponset,exclude=NULL)
> table(colonset,exclude=NULL)
> table(sdonset,exclude=NULL)
> table(cowonset,exclude=NULL)
>
>
> ###
> ### Our variable
> ###
> m.onset<-onset
> m.ethonset<-ethonset
> m.emponset<-emponset
> m.cowonset<-cowonset
> m.colonset<-colonset
> m.sdonset<-sdonset
>
> m.onset[m.onset>1]<-0
> m.ethonset[m.ethonset>1]<-0
> m.emponset[m.emponset>1]<-0
> m.cowonset[m.cowonset>1]<-0
> m.colonset[m.colonset>1]<-0
> m.sdonset[m.sdonset>1]<-0
>
> # there are NAs in colonset and cowonset; I will set these to zero
> # and check. Goofy code, but it gets around the problem with NA.
> ccolonset<-rep(0,length(m.colonset))
> ccowonset<-rep(0,length(m.cowonset))
> ccolonset[m.colonset==1]<-1
> ccowonset[m.cowonset==1]<-1
> m.colonset<-ccolonset
> m.cowonset<-ccowonset
> table(m.colonset,exclude=NULL)
> table(m.cowonset,exclude=NULL)
>
> myonset<-m.onset | m.ethonset | m.emponset | m.cowonset | m.colonset |
> m.sdonset
>
> table(m.onset)
> table(m.ethonset)
> table(m.emponset)
> table(m.cowonset)
> table(m.colonset)
> table(m.sdonset)
>
> table(myonset)
>
> z.out.my<-zelig(formula = myonset ~ warl + gdpenl + lpopl1 + lmtnest +
> ncontig + Oil + nwstate + instab + polity2l + ethfrac + relfrac +
> anocl + deml, model = "logit", data = f.laitin)
> summary(z.out.my)
> write.table(z.out.my$coeff, file = "myonset.csv", sep = ",", col.names = NA)
> x.out<-setx(z.out.my) ## failure
> s.out<-sim(z.out.my,x=x.out)
> summary(s.out)
> plot(s.out)
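The NA handling in the quoted script can be shortened. The sketch below uses toy vectors in place of the real colonset and cowonset columns. The expression x %in% 1 is TRUE only where the code is exactly 1, so NAs and codes above 1 both become 0 in one step. Wrapping the combined | in as.numeric() keeps myonset a 0/1 number rather than a logical. This is an alternative formulation, not the original author's code.

```r
# Toy stand-ins for two of the six onset codes (values are made up).
m.colonset <- c(0, 1, NA, 4, 1)
m.cowonset <- c(NA, 0, 0, 1, 0)

# Exactly 1 -> 1; anything else, including NA and codes > 1, -> 0.
m.colonset <- as.numeric(m.colonset %in% 1)
m.cowonset <- as.numeric(m.cowonset %in% 1)

# Combined onset indicator stored as 0/1.
myonset <- as.numeric(m.colonset | m.cowonset)
myonset   # 0 1 0 1 1
```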
>
>
>
> -
> Zelig Mailing List, served by Harvard-MIT Data Center
> Send messages: zelig(a)latte.harvard.edu
> [un]subscribe Options: http://lists.hmdc.harvard.edu/?info=zelig