Dear Jack,
It is great to hear that you are using the anchors R library. Your
comments are very useful for helping us improve the documentation,
and I hope the following answers your questions.
> First, the cutpoints for the ordered probit model (i.e. "gamma1.cut1.ones")
> are intransitive (.0955, -.6177, -.3790, -.2718).
The cutpoints (taus) in the ordered probit are parameterized in the
same manner as in the chopit model; see equation 8 in
http://gking.harvard.edu/vign/software/anchors/docs/node7.html
So we have
tau_1 = gamma1.cut1
tau_2 = tau_1 + exp(gamma1.cut2)
...
tau_n = tau_{n-1} + exp(gamma1.cut{n})
Converting the ordered probit gammas

        cut1    cut2    cut3    cut4
gamma  0.0955 -0.6177 -0.3790 -0.2718

using this method, we get the taus:

        cut1    cut2    cut3    cut4
tau    0.0955  0.6346  1.3192  2.0812
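The conversion above can be checked numerically. Here is a short sketch in Python (illustrative only, not part of the anchors library) of the cumulative-exponential mapping from gammas to taus:

```python
import math

def gammas_to_taus(gammas):
    """Map ordered-probit cut parameters (gammas) to cutpoints (taus):
    tau_1 = gamma_1, then tau_n = tau_{n-1} + exp(gamma_n)."""
    taus = [gammas[0]]
    for g in gammas[1:]:
        taus.append(taus[-1] + math.exp(g))
    return taus

# The gammas reported for the ordered probit model:
gammas = [0.0955, -0.6177, -0.3790, -0.2718]
taus = gammas_to_taus(gammas)
print([round(t, 4) for t in taus])
# -> [0.0955, 0.6347, 1.3192, 2.0812]  (matches the tau row above up to rounding)
```

Because each increment is exp(gamma_n) > 0, the taus are guaranteed to be strictly increasing even when the raw gammas are not monotone.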
> Second, the reported lnse.self is 0.0000, with NaN standard error.
The default normalization, and the flexible ways of specifying
alternative normalizations, are described at:
http://gking.harvard.edu/vign/software/anchors/docs/node8.html
http://gking.harvard.edu/vign/software/anchors/docs/node9.html
The R code is designed to allow the user to choose the substantively
useful manner in which to define the location and scale of the latent
variables.
The default is to mimic the ordered probit normalization, which omits
the intercept from mu = X'beta and sets the variance of the
self-questions equal to 1. Since all variances in the statistical
model are parameterized in their log form, we have lnse.self = 0. To
map between the variables in the R code and the model, by default
sigma = 1
lnse.self = log( sigma ) = 0
Since lnse.self is fixed in this normalization, there is no standard
error, and hence the value is printed as NaN.
Note that if you include an intercept in mu = X'beta, this intercept
will be constrained to be zero (and not estimated), also with no
standard error (hence NaN). In the example R code, the intercept is
not included in the definition of the covariates of mu.
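As an illustration of why a fixed parameter is reported with a NaN standard error (a hypothetical sketch, not the anchors code itself): standard errors come from the inverse Hessian over the free parameters only, so a parameter fixed by the normalization has no entry there and can only be printed as NaN:

```python
import math

# Hypothetical parameter table (names mimic the chopit output, the
# numbers are made up): (name, estimate, is_free). lnse.self is fixed
# at log(sigma) = log(1) = 0 by the default normalization.
params = [
    ("mu.age",    0.42,          True),
    ("lnse.self", math.log(1.0), False),  # fixed by normalization
]

# Variances (diagonal of the inverse Hessian) exist only for the FREE
# parameters; 0.0025 is an illustrative value.
free_var = {"mu.age": 0.0025}

ses = {}
for name, est, free in params:
    ses[name] = math.sqrt(free_var[name]) if free else float("nan")
    print(f"{name:10s} {est: .4f}  se = {ses[name]:.4f}")
```

The last line of output reproduces the pattern in the question: lnse.self shows an estimate of 0.0000 with se = nan.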
> (the lnse.re is also 0, but this makes sense since there is only one
> measure of efficacy in the example).
Yes, exactly. By default chopit() will estimate a random effect (but
this can be turned off as an option). If, however, there is only one
self-question, then the following warning is printed in the log of the
run:
WARNING! Cannot estimate random effect with only one self-question
WARNING! DISABLING random effect estimation for this run.
> Finally, all of the coefficients and standard errors that are not in
> the model cause warnings (about 50 in all) and are reported NaN in
> the output.
The warnings you note are produced at intermediate stages when the
BFGS optimization algorithm tries out regions of the parameter space
that result in probabilities of 0 or 1 for a choice category.
Calculations of log(p) = log(0) or log(1-p) = log(1-1) produce NaN.
If a category with zero probability affects the fit of the model at
any stage of the estimation (because that category was chosen by a
respondent), an important warning will be issued. For example, the
output using the chopitsim data currently produces very terse
warnings such as:
WARNING: boosting cases N = 8
This indicates that 8 respondents chose a category with zero
probability. (I will make this a more verbose warning!) Away from the
maximum likelihood optimum, it is easy to propose parameter values
that break the model by producing a mu many standard deviations from
the mean of the latent dimension of y. To help the estimation
algorithm escape these regions of the parameter space far from the
optimum, the probability is boosted by a small amount to make it
nonzero.
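The safeguard described above can be sketched as follows (in Python, purely as an illustration of the idea, not the library's actual implementation; EPS is a made-up floor): when a chosen category gets probability zero mid-optimization, the log-likelihood would be undefined, so the probability is boosted before taking the log:

```python
import math

EPS = 1e-10  # hypothetical floor; the boost used inside chopit may differ

def safe_log_lik(chosen_probs):
    """Sum log-probabilities of the categories respondents actually chose,
    boosting any zero probability up to EPS so log() stays finite."""
    boosted = sum(1 for p in chosen_probs if p < EPS)
    if boosted:
        print(f"WARNING: boosting cases N = {boosted}")
    return sum(math.log(max(p, EPS)) for p in chosen_probs)

# Mid-optimization, a bad mu can push some chosen categories to probability 0:
ll = safe_log_lik([0.31, 0.0, 0.52, 0.0])
print(math.isfinite(ll))  # True: the log-likelihood stays finite, though very negative
```

The boosted cases contribute a huge penalty (log(EPS) each), so the optimizer is still pushed away from such regions, but the arithmetic never produces NaN or -Inf.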
With the real data I have analyzed, at the optimum there is never a
chosen category with a zero probability, but this will be a function
of the data and could occur. A user should be profoundly sceptical if
the warnings occur after the model converges or during the calculation
of the numerical Hessian.
> I should note also that the nonparametric example seems to work fine.
Great!
> I assume that the data are not the original survey data, but rather
> simulated (this was the case for the gllamm implementation as well).
Yes, these are simulated data. A description of the data is provided
along with the on-line documentation for the functions. For example,
from the R command line or inside a batch file, you can type
help(poleff)
for the description. Similarly,
help(chopit)
help(anchors)
will bring up other useful summaries of the functions and an overview
of the library. More detailed information about the library and its
functions (with more example code) is available at
http://gking.harvard.edu/vign/software/anchors/docs/
Please let me know if I can be of assistance in clarifying the
use of the R library.
Sincerely,
Jonathan
__________________________________________________________________
Jonathan Wand
Research Fellow, Center for Basic Research in the Social Sciences (CBRSS)
Harvard University
34 Kirkland Street, Cambridge, MA 02138
jwand(a)latte.harvard.edu
www.fas.harvard.edu/~jwand
617.496.2260 (tel)  617.496.5149 (fax)
__________________________________________________________________
Jack Buckley writes:
Hi, all.
I have previously used gllamm to estimate several chopit models, but I
recently decided to switch to R to see if there are any gains in
computational efficiency (and easier stochastic simulation). I am wondering
if anyone else has estimated the model with the example data (China and
Mexico's political efficacy).
When I estimate the model using the unaltered data, I get a few curious
results:
First, the cutpoints for the ordered probit model (i.e. "gamma1.cut1.ones")
are intransitive (.0955, -.6177, -.3790, -.2718)
Second, the reported lnse.self is 0.0000, with NaN standard error (the
lnse.re is also 0, but this makes sense since there is only one measure of
efficacy in the example).
Finally, all of the coefficients and standard errors that are not in the
model cause warnings (about 50 in all) and are reported NaN in the output.
I should note also that the nonparametric example seems to work fine.
If anyone else has tried the examples, did they get the same results?
I assume that the data are not the original survey data, but rather
simulated (this was the case for the gllamm implementation as well). Thus,
all of the points I mention may be either artifacts of the simulated data
(especially the first point) or normal operation of the program, but I just
want to be sure before I estimate a more complex, multiple measures model.
I am using R 1.6.0 (the October 2002 release) with the precompiled binaries
for Windows, on a PIII450 box.
Thanks,
Jack
___________________
Jack Buckley
Department of Political Science
State University of New York at Stony Brook
sbuckley(a)ic.sunysb.edu
Voice:(631) 632-4353
Fax: (631) 632-4116
Web: www.sinc.sunysb.edu/Stu/sbuckley
-
vign mailing list served by Harvard-MIT Data Center
List Address: vign(a)latte.harvard.edu
Subscribe/Unsubscribe:
http://lists.hmdc.harvard.edu/?info=vign