Posted to user@mahout.apache.org by jun li <ju...@gmail.com> on 2010/09/02 09:28:25 UTC
Bayes classifier classifies all of 20news-group as unknown when using the
mapreduce method.
Hi,
When I use the sequential method to classify the 20news-groups dataset, everything works.
But when I change the method to mapreduce, the confusion matrix becomes all
zeros, and the output file shows every document classified as unknown.
The following are my shell scripts.
train.sh:
MAHOUT_HOME=/home/lijun/mahout-0.3
$HADOOP_HOME/bin/hadoop fs -put $MAHOUT_HOME/examples/20news-input 20news-input
hadoop \
jar \
$MAHOUT_HOME/examples/target/mahout-examples-0.3.job \
org.apache.mahout.classifier.bayes.TrainClassifier \
-i 20news-input \
-o newsmodel-ng1 \
-ng 1 \
-type bayes \
-source hdfs
test.sh:
hadoop \
jar \
$MAHOUT_HOME/examples/target/mahout-examples-0.3.job \
org.apache.mahout.classifier.bayes.TestClassifier \
-m newsmodel-ng1 \
-d 20news-input \
-ng 1 \
-type bayes \
-source hdfs \
-v \
-method mapreduce   # only this line is changed; everything else is untouched
When using mapreduce, every entry of the result matrix is 0,
and the output file shows that every document is classified as unknown:
../bin/mahout seqdumper -s 20news-input-output/part-00000
Input Path: 20news-input-output/part-00000
Key class: class org.apache.mahout.common.StringTuple Value Class: class org.apache.hadoop.io.DoubleWritable
Key: [__CT, alt.atheism, unknown]: Value: 799.0
Key: [__CT, comp.graphics, unknown]: Value: 973.0
Key: [__CT, comp.os.ms-windows.misc, unknown]: Value: 985.0
Key: [__CT, comp.sys.ibm.pc.hardware, unknown]: Value: 982.0
Key: [__CT, comp.sys.mac.hardware, unknown]: Value: 961.0
Key: [__CT, comp.windows.x, unknown]: Value: 980.0
Key: [__CT, misc.forsale, unknown]: Value: 972.0
Key: [__CT, rec.autos, unknown]: Value: 990.0
Key: [__CT, rec.motorcycles, unknown]: Value: 994.0
Key: [__CT, rec.sport.baseball, unknown]: Value: 994.0
Key: [__CT, rec.sport.hockey, unknown]: Value: 999.0
Key: [__CT, sci.crypt, unknown]: Value: 991.0
Key: [__CT, sci.electronics, unknown]: Value: 981.0
Key: [__CT, sci.med, unknown]: Value: 990.0
Key: [__CT, sci.space, unknown]: Value: 987.0
Key: [__CT, soc.religion.christian, unknown]: Value: 997.0
Key: [__CT, talk.politics.guns, unknown]: Value: 910.0
Key: [__CT, talk.politics.mideast, unknown]: Value: 940.0
Key: [__CT, talk.politics.misc, unknown]: Value: 775.0
Key: [__CT, talk.religion.misc, unknown]: Value: 628.0
Count: 20
I suspect the bug is in model loading, before the mappers run.
Any suggestions or patches?
Thanks.
--
Li Jun
Re: Bayes classifier classifies all of 20news-group as unknown when using the
mapreduce method.
Posted by Xiaomeng Wan <sh...@gmail.com>.
The model isn't being loaded correctly. Try giving it the full path.
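As a rough sketch of what that could look like for the test.sh above: the script below passes absolute HDFS paths instead of relative ones. The /user/lijun prefix is only an assumption about where the relative paths resolved during training; check the actual location with hadoop fs -ls first. Whether the full path alone resolves the problem in Mahout 0.3 is not confirmed here.

```shell
#!/bin/sh
# Sketch only: same test.sh as above, but with absolute HDFS paths.
MAHOUT_HOME=/home/lijun/mahout-0.3

# ASSUMPTION: hypothetical HDFS home directory; relative HDFS paths such as
# "newsmodel-ng1" normally resolve under the user's HDFS home.
HDFS_HOME=/user/lijun
MODEL_PATH="$HDFS_HOME/newsmodel-ng1"
DATA_PATH="$HDFS_HOME/20news-input"

if command -v hadoop >/dev/null 2>&1; then
  # Confirm the model directory really exists at the absolute path.
  hadoop fs -ls "$MODEL_PATH"

  # Re-run the classifier test, passing full paths for model and data.
  hadoop \
    jar \
    "$MAHOUT_HOME/examples/target/mahout-examples-0.3.job" \
    org.apache.mahout.classifier.bayes.TestClassifier \
    -m "$MODEL_PATH" \
    -d "$DATA_PATH" \
    -ng 1 \
    -type bayes \
    -source hdfs \
    -v \
    -method mapreduce
else
  echo "hadoop not found on PATH; nothing was run" >&2
fi
```

If the hadoop fs -ls step lists nothing, the model was written somewhere else and MODEL_PATH needs adjusting before the test is meaningful.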
Regards,
Xiaomeng