Posted to user@mahout.apache.org by jun li <ju...@gmail.com> on 2010/09/02 09:28:25 UTC

Bayes classifier classifies all of 20news-group as unknown when using the mapreduce method

Hi,
When I use the sequential method to classify the 20news-groups dataset, everything works.
But when I change the method to mapreduce, the confusion matrix becomes all
zeros, and the output file shows every document classified as unknown.

My shell scripts follow.

train.sh:
MAHOUT_HOME=/home/lijun/mahout-0.3
$HADOOP_HOME/bin/hadoop fs -put $MAHOUT_HOME/examples/20news-input 20news-input
hadoop \
    jar \
    $MAHOUT_HOME/examples/target/mahout-examples-0.3.job \
    org.apache.mahout.classifier.bayes.TrainClassifier \
    -i 20news-input \
    -o newsmodel-ng1 \
    -ng 1 \
    -type bayes \
    -source hdfs

test.sh:
hadoop \
    jar \
    $MAHOUT_HOME/examples/target/mahout-examples-0.3.job \
    org.apache.mahout.classifier.bayes.TestClassifier \
    -m newsmodel-ng1 \
    -d 20news-input \
    -ng 1 \
    -type bayes \
    -source hdfs \
    -v \
    -method mapreduce

(Only the -method line was changed; everything else is untouched.)

With mapreduce, the resulting confusion matrix is all zeros,
and the output file shows every document classified as unknown:
 ../bin/mahout seqdumper -s 20news-input-output/part-00000
Input Path: 20news-input-output/part-00000
Key class: class org.apache.mahout.common.StringTuple Value Class: class org.apache.hadoop.io.DoubleWritable
Key: [__CT, alt.atheism, unknown]: Value: 799.0
Key: [__CT, comp.graphics, unknown]: Value: 973.0
Key: [__CT, comp.os.ms-windows.misc, unknown]: Value: 985.0
Key: [__CT, comp.sys.ibm.pc.hardware, unknown]: Value: 982.0
Key: [__CT, comp.sys.mac.hardware, unknown]: Value: 961.0
Key: [__CT, comp.windows.x, unknown]: Value: 980.0
Key: [__CT, misc.forsale, unknown]: Value: 972.0
Key: [__CT, rec.autos, unknown]: Value: 990.0
Key: [__CT, rec.motorcycles, unknown]: Value: 994.0
Key: [__CT, rec.sport.baseball, unknown]: Value: 994.0
Key: [__CT, rec.sport.hockey, unknown]: Value: 999.0
Key: [__CT, sci.crypt, unknown]: Value: 991.0
Key: [__CT, sci.electronics, unknown]: Value: 981.0
Key: [__CT, sci.med, unknown]: Value: 990.0
Key: [__CT, sci.space, unknown]: Value: 987.0
Key: [__CT, soc.religion.christian, unknown]: Value: 997.0
Key: [__CT, talk.politics.guns, unknown]: Value: 910.0
Key: [__CT, talk.politics.mideast, unknown]: Value: 940.0
Key: [__CT, talk.politics.misc, unknown]: Value: 775.0
Key: [__CT, talk.religion.misc, unknown]: Value: 628.0
Count: 20
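
To confirm from the dump that no document received a real label, the per-class counts can be tallied with a short awk filter. This is only a sketch: it assumes the seqdumper line format shown above, and tally_unknown is a name made up for illustration, not a Mahout command.

```shell
# Hypothetical helper: read seqdumper output on stdin and report how many
# documents ended up in the "unknown" column of the confusion matrix.
tally_unknown() {
    awk '/^Key: \[__CT, / {
        # Strip brackets, commas and colons so the line splits into
        # Key __CT <actual label> <predicted label> Value <count>
        gsub(/[][,:]/, "", $0)
        total += $6
        if ($4 == "unknown") unknown += $6
    } END {
        printf "%d of %d documents classified as unknown\n", unknown, total
    }'
}
```

It would be used as ../bin/mahout seqdumper -s 20news-input-output/part-00000 | tally_unknown. The twenty values listed above sum to 18828, all of them under unknown.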

I suspect the bug occurs during model loading, before the mapper runs.
Any suggestions or patches?
Thanks.



-- 
Li Jun

Re: Bayes classifier classifies all of 20news-group as unknown when using the mapreduce method

Posted by Xiaomeng Wan <sh...@gmail.com>.
The model isn't being loaded correctly; try giving it the full path.
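
One reading of this advice is to pass absolute HDFS paths instead of the relative names used in test.sh. The sketch below assumes the model was written under the default HDFS home directory /user/<name>; the helper names hdfs_abs_path and run_test and the HDFS_USER variable are made up for illustration, and the real location should be checked with hadoop fs -ls first.

```shell
# Hypothetical helper, not part of Mahout: turn a relative HDFS name into an
# absolute path, assuming the default home directory layout /user/<name>.
hdfs_abs_path() {
    case "$1" in
        /*|hdfs://*) printf '%s\n' "$1" ;;   # already absolute, keep as-is
        *) printf '/user/%s/%s\n' "${HDFS_USER:-$USER}" "$1" ;;
    esac
}

# Same flags as the original test.sh, but with absolute model and data paths.
run_test() {
    hadoop jar "$MAHOUT_HOME/examples/target/mahout-examples-0.3.job" \
        org.apache.mahout.classifier.bayes.TestClassifier \
        -m "$(hdfs_abs_path newsmodel-ng1)" \
        -d "$(hdfs_abs_path 20news-input)" \
        -ng 1 -type bayes -source hdfs -v -method mapreduce
}
```

Calling run_test on a machine with Hadoop configured runs the same test with the resolved paths; whether this alone fixes the all-unknown result is not confirmed in this thread.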

Regards,
Xiaomeng
