Posted to user@mahout.apache.org by Ilya Gluhovsky <il...@gmail.com> on 2011/11/11 03:21:19 UTC

logistic regression mishaps

Hi Everyone,

I am running the example that comes with the distribution:

mahout org.apache.mahout.classifier.sgd.TrainLogistic \
    --passes 1000 \
    --rate 50 --lambda 0.001 \
    --input /shared_data/test/donut.csv \
    --features 21 \
    --output /shared_data/test/donut_model \
    --target color \
    --categories 2 \
    --predictors x y xx xy yy a b c --types n n


1.  Documentation:  Is there a better source for learning about these
parameters than --help?

2.  "features" seems to refer to some internal feature representation tricks
inside Mahout.  However, even in this toy example, changing it from the example
value of 21 to, say, 15, 30, or 100 makes a big difference to the coefficient
vector.  Any thoughts on this apparent instability?

3.  Is there any way to get JSON out of the --output file donut_model?  It
appears to be some sort of binary file.
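
On point 2, here is a minimal sketch of what I suspect is going on, assuming
--features sets the size of a hashed feature vector.  This is not Mahout's
actual encoder: the bucket() and encode() helpers and the MD5-based hash are
hypothetical, and the row values are made up rather than taken from donut.csv.
It only shows that the slot each predictor lands in depends on the vector
size, so coefficient vectors of different sizes would not line up position by
position.

```python
import hashlib

def bucket(name, num_features):
    """Map a predictor name to a slot index in a vector of num_features slots.

    Hypothetical stand-in for a feature hasher; NOT Mahout's hash function.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_features

def encode(row, num_features):
    """Hash a dict of numeric predictors into a fixed-size vector.

    Predictors that collide on the same slot are summed together.
    """
    vec = [0.0] * num_features
    for name, value in row.items():
        vec[bucket(name, num_features)] += value
    return vec

# Predictor names from the command above; the values are made up.
predictors = ["x", "y", "xx", "xy", "yy", "a", "b", "c"]
row = {name: 1.0 for name in predictors}

for n in (15, 21, 30, 100):
    # Show which slot each predictor occupies for this vector size.
    layout = {name: bucket(name, n) for name in predictors}
    print(n, layout)
```

If that is roughly how it works, then with a small vector some predictors
could also share a slot, which would change the fitted coefficients further.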

Thanks a lot!
Ilya.