Posted to user@mahout.apache.org by Robin Anil <ro...@gmail.com> on 2010/12/08 06:08:50 UTC

Re: NPE in bayes wiki example

Did you try what I mentioned?

On Tue, Nov 30, 2010 at 8:11 AM, Robin Anil <ro...@gmail.com> wrote:

>
> On Tue, Nov 30, 2010 at 7:47 AM, Divya <di...@k2associates.com.sg> wrote:
>
>> Hi,
>>
>> Thanks for the advice, Robin.
>> But most of the time I don’t get a response to the issues I am facing; that’s
>> why I rephrase them and post again.
>>
> Responses are usually delayed because they depend on how much free time each
> of us has. The Mahout community is made up of people who contribute as much as
> they can when they find time, as it is not part of our day-to-day work. So in
> the time we get, unless we see the details of the problem, we can't do anything
> other than ask you again for details, and this round trip keeps the
> conversation going. I can point to many tutorials (I read through them myself
> before hacking away on Mahout), like this one:
> http://www.catb.org/~esr/faqs/smart-questions.html which will help you
> understand a bit more about why people on mailing lists behave the way you
> have perceived.
>
>>
>
>> Maybe someone can understand my problem and will be able to help me,
>> as I am a newbie to Mahout and don’t have any experience in this field.
>>
> We do want more newbies coming in to Mahout :)
>
>> I am trying to run the Wikipedia classification example.
>> I have downloaded the Wikipedia data set and split it into chunks (1 MB
>> each).
>> I am using one of the chunk files as my input data for the Wikipedia
>> example.
>>
>>
>> The steps I followed are:
>> 1. Created the train input data set from one chunk of the Wikipedia data
>>    set and subjects.txt, using the wikipediaDataSetCreator CLI.
>> 2. Repeated the first step, but used another chunk of the Wikipedia
>>    data set to create the test input data.
>> 3. Trained the classifier by passing the train input data set.
>> 4. Tested the classifier by passing the trained data set as model and the
>>    test input data set as testdir.
>>
>> Now the issue: when I run testclassifier with the trained data set
>> as model and the train input data set as testdir, I can view the result in
>> the form of a confusion matrix.
>> But when I run testclassifier with the trained data set
>> as model and the test input data set (which I created in the second step) as
>> testdir, I get a null pointer exception, as shown in the mail below.
>>
> Now I get what you are talking about. Can you do one thing: train
> the model using the test input dataset and then try to classify the test
> dataset? I want to check whether there is any corruption in the test dataset
> that is causing this NPE.
>
>
>
>
>>
>> Name                                                          Size
>>
>> Initial Train input data set                                  2 MB (two chunks)
>> Initial Test input data set                                   1 MB (one chunk)
>> Train data set after wikipediaDataSetCreator (part-r-00000)   154 KB
>> Test data set after wikipediaDataSetCreator (part-r-00000)    43 KB
>> Train model data set (trainer-thetaNormalizer)                1 KB
>> Train model data set (trainer-tfIdf)                          311 KB
>> Train model data set (trainer-weights\Sigma_j)                215 KB
>> Train model data set (trainer-weights\Sigma_kSigma_j)         1 KB
>> Train model data set (trainer-weights\Sigma_k)                1 KB
>>
>>
> The model sizes look fine. In fact, model loading didn't seem to have any
> issues, as per the logs you posted.
>
>> I hope I will get a solution to my issue now.
>>
>> Thanks much
>> Regards,
>> Divya
>>
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Robin Anil [mailto:robin.anil@gmail.com]
>> Sent: Monday, November 29, 2010 7:34 PM
>> To: user@mahout.apache.org
>> Subject: Re: NPE in bayes wiki example
>>
>> Hi Divya, I am kind of overwhelmed by the flurry of emails from you and the
>> replies. I am currently not able to make head or tail of the problem you
>> are facing. It would be really helpful if you could write a bit more about
>> the input files, the commands you ran, the output files generated, their
>> sizes, and so on, and maybe use a single email thread for all Bayes
>> classifier related problems. I guarantee you, I will be able to solve your
>> issues with the Bayes classifier much faster.
>>
>> Regards
>> Robin
>>
>> On Mon, Nov 29, 2010 at 12:54 PM, Divya <di...@k2associates.com.sg>
>> wrote:
>>
>> > Hi,
>> >
>> > Steps I followed are below :
>> >
>> > $ bin/mahout wikipediaDataSetCreator \
>> >     -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Traininput \
>> >     -o examples/bin/work/wikipedia/wikipediaClassification/train-subject \
>> >     -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>> >
>> > $ bin/mahout wikipediaDataSetCreator \
>> >     -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Testinput \
>> >     -o examples/bin/work/wikipedia/wikipediaClassification/test-subject \
>> >     -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>> >
>> > $ bin/mahout trainclassifier \
>> >     -i examples/bin/work/wikipedia/wikipediaClassification/train-subject \
>> >     -o examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model
>> >
>> > $ bin/mahout testclassifier \
>> >     -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model \
>> >     -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
>> >
>> >
>> > Regards,
>> > Divya
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Grant Ingersoll [mailto:gsingers@apache.org]
>> > Sent: Saturday, November 27, 2010 8:54 PM
>> > To: user@mahout.apache.org
>> > Subject: Re: NPE in bayes wiki example
>> >
>> > Can you provide all the steps you have done up to this point?
>> >
>> > -Grant
>> >
>> > On Nov 25, 2010, at 12:57 AM, Divya wrote:
>> >
>> > > Hi,
>> > >
>> > > I am getting null pointer exception when I pass my test input data to
>> > > testclassifier
>> > >
>> > >
>> > >
>> > > $ bin/mahout testclassifier \
>> > >     -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model \
>> > >     -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
>> > > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
>> > > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
>> > > 10/11/25 13:51:36 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/test-subject}
>> > > 10/11/25 13:51:36 INFO bayes.TestClassifier: Testing Bayes Classifier
>> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
>> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
>> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
>> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: 8.048212844092422
>> > > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
>> > > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
>> > > 10/11/25 13:51:39 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
>> > >
>> > > Exception in thread "main" java.lang.NullPointerException
>> > >        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:102)
>> > >        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:118)
>> > >        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
>> > >        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:90)
>> > >        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:68)
>> > >        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:266)
>> > >        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:186)
>> > >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >        at java.lang.reflect.Method.invoke(Method.java:597)
>> > >        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> > >        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> > >        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
>> > >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >        at java.lang.reflect.Method.invoke(Method.java:597)
>> > >        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> > >
>> > >
>> > >
>> > > My categories file is subjects.txt, which has two entries: History and
>> > > Science.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > But when I pass the train input data, I do get to see the results:
>> > >
>> > >
>> > >
>> > > $ bin/mahout testclassifier \
>> > >     -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model \
>> > >     -d examples/bin/work/wikipedia/wikipediaClassification/train-subject
>> > > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
>> > > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
>> > > 10/11/25 13:51:54 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/train-subject}
>> > > 10/11/25 13:51:54 INFO bayes.TestClassifier: Testing Bayes Classifier
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: 8.048212844092422
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
>> > > 10/11/25 13:51:55 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
>> > > 10/11/25 13:51:55 INFO bayes.TestClassifier: Classified instances from part-r-00000
>> > > 10/11/25 13:51:55 INFO bayes.TestClassifier:
>> > > =======================================================
>> > > Summary
>> > > -------------------------------------------------------
>> > > Correctly Classified Instances          :          2           100%
>> > > Incorrectly Classified Instances        :          0             0%
>> > > Total Classified Instances              :          2
>> > >
>> > > =======================================================
>> > > Confusion Matrix
>> > > -------------------------------------------------------
>> > > a       <--Classified as
>> > > 2        |  2           a     = history
>> > > Default Category: unknown: 1
>> > >
>> > > 10/11/25 13:51:55 INFO driver.MahoutDriver: Program took 953 ms
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > Can someone please explain the reason behind this?
>> > >
>> > >
>> > >
>> > > Thanks
>> > >
>> > > Regards,
>> > >
>> > > Divya
>> > >
>> >
>> > --------------------------
>> > Grant Ingersoll
>> > http://www.lucidimagination.com/
>> >
>> > Search the Lucene ecosystem docs using Solr/Lucene:
>> > http://www.lucidimagination.com/search
>> >
>> >
>> >
>>
>>
>

RE: NPE in bayes wiki example

Posted by Divya <di...@k2associates.com.sg>.
Thanks Robin 

-----Original Message-----
From: Robin Anil [mailto:robin.anil@gmail.com] 
Sent: Wednesday, December 08, 2010 1:09 PM
To: user@mahout.apache.org
Subject: Re: NPE in bayes wiki example

Did you try what I mentioned?



RE: NPE in bayes wiki example

Posted by Divya <di...@k2associates.com.sg>.
@Robin - I tried what you suggested and it worked.
Why didn’t it work previously?
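
The thread as archived does not record an answer to this question. One plausible reading of the logs above, offered as an assumption rather than a confirmed diagnosis: the model loaded in both runs reports only the label history, while subjects.txt has two entries, so the test chunk may contain documents carrying a label the model never saw. If a confusion matrix keeps its labels in a map and unboxes the looked-up index, an unknown label produces a NullPointerException of the kind shown in the stack trace. The Java sketch below illustrates that failure mode only; ConfusionSketch and its methods are hypothetical names written for this note, not Mahout's ConfusionMatrix source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; this is NOT Mahout's ConfusionMatrix source.
public class ConfusionSketch {
    private final Map<String, Integer> labelMap = new HashMap<>();
    private final int[][] counts;

    // Assign each label known to the model a row/column index.
    public ConfusionSketch(List<String> modelLabels) {
        for (String label : modelLabels) {
            labelMap.putIfAbsent(label, labelMap.size());
        }
        counts = new int[labelMap.size()][labelMap.size()];
    }

    // Unguarded lookup: if a label is missing, Map.get returns null and
    // unboxing that null into an int throws NullPointerException.
    public int getCountUnguarded(String correct, String classified) {
        int row = labelMap.get(correct);
        int col = labelMap.get(classified);
        return counts[row][col];
    }

    // Guarded variant: fail with a message that names the unknown label.
    public void addInstance(String correct, String classified) {
        Integer row = labelMap.get(correct);
        Integer col = labelMap.get(classified);
        if (row == null || col == null) {
            throw new IllegalArgumentException(
                "Label not in model: " + (row == null ? correct : classified));
        }
        counts[row][col]++;
    }

    public static void main(String[] args) {
        // The model in the logs appears to know only "history".
        ConfusionSketch matrix = new ConfusionSketch(Arrays.asList("history"));
        matrix.addInstance("history", "history");
        System.out.println(matrix.getCountUnguarded("history", "history"));
        try {
            // A test document labelled "science" is unknown to this model.
            matrix.getCountUnguarded("science", "history");
        } catch (NullPointerException e) {
            System.out.println("NPE for a label the model never saw");
        }
    }
}
```

If this reading is right, training on the test data set, as Robin suggested, would guarantee that every label in the test data is also in the model, which is consistent with that run succeeding.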


-----Original Message-----
From: Robin Anil [mailto:robin.anil@gmail.com] 
Sent: Wednesday, December 08, 2010 1:09 PM
To: user@mahout.apache.org
Subject: Re: NPE in bayes wiki example

Did you try what I mentioned?

On Tue, Nov 30, 2010 at 8:11 AM, Robin Anil <ro...@gmail.com> wrote:

>
> On Tue, Nov 30, 2010 at 7:47 AM, Divya <di...@k2associates.com.sg> wrote:
>
>> Hi,
>>
>> Thanks for the advice Robin.
>> But most of the time I don’t get response of issues I am facing that’s why
>>  I reframe it and post it again.
>>
> The responses are usually delayed based on availability of free time for
> all of us. Mahout community is made up of people who contribute as much as
> they can when they find time as it is not part of our day to day work. So in
> the time we get, unless we see details of the problem, we can't do anything
> other than ask you again for details and this round trip keeps the
> conversation going. I can point to many tutorials(even I read through them
> before hacking away on Mahout) like this one
> http://www.catb.org/~esr/faqs/smart-questions.html which will help you
> understand a bit more of why people behave on mailing lists they way you
> would have perceived.
>
>>
>
> May someone can understand my problem and would be able to help me.
>> As I am new bee to Mahout and don’t have any experience in this field.
>>
>> We do want more new-bees coming in to Mahout :)
>
>> I am trying run the Wikipedia classification example.
>> I have downloaded Wikipedia data set and created chunks of that data(1 MB
>> each).
>> I am using one of the chunk file for as my input data for Wikipedia
>> example.
>>
>>
>> Steps I followed are :
>> 1.Created train input data set using one of the chunk of Wikipedia data
>> set and subjects.txt with the help of wikipediaDataSetCreator CLI.
>> 2.Repeated the first step but here the  used another chunk of Wikipedia
>> data set to create test input data.
>> 3.Train the classifier by passing train input data set.
>> 4.Test the classifier by passing train input data set as model and test
>> input data set as testdir.
>>
>> Now the issue is when I try to testclassifier by passing trained data set
>> as model and train input data set as testdir I am able to view the result in
>> form of confusion matrix.
>> But when I try to test classifier by passing by passing trained data set
>> as model and test input data set(which I have created in second step) as
>> testdir I get null pointer exception as shown in below mail.
>>
> Now I get what you are talking about. Can you do one thing: train the model
> using the test input dataset and then try to classify the test dataset?
> I want to check whether there is any corruption in the test dataset that is
> causing this NPE.
>
>
>
>
>>
>> Name                                                          Size
>>
>> Initial train input data set                                  2 MB (two chunks)
>> Initial test input data set                                   1 MB (one chunk)
>> Train data set after wikipediaDataSetCreator (part-r-00000)   154 KB
>> Test data set after wikipediaDataSetCreator (part-r-00000)    43 KB
>> Train model data set (trainer-thetaNormalizer)                1 KB
>> Train model data set (trainer-tfIdf)                          311 KB
>> Train model data set (trainer-weights\Sigma_j)                215 KB
>> Train model data set (trainer-weights\Sigma_kSigma_j)         1 KB
>> Train model data set (trainer-weights\Sigma_k)                1 KB
>>
>>
> The model sizes look fine. In fact, model loading didn't seem to have any
> issue, as per the logs you posted.
>
>> I hope I will get a solution to my issue now.
>>
>> Thanks much
>> Regards,
>> Divya
>>
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Robin Anil [mailto:robin.anil@gmail.com]
>> Sent: Monday, November 29, 2010 7:34 PM
>> To: user@mahout.apache.org
>> Subject: Re: NPE in bayes wiki example
>>
>> Hi Divya, I am kind of overwhelmed by the flurry of emails from you and
>> the replies. I am currently not able to make head or tail of the problem
>> you are facing. It would be really helpful if you could write a bit more
>> about the input files, the commands you ran, the output files generated,
>> their sizes, and so on, and maybe use a single email thread for all Bayes
>> classifier related problems. I guarantee you, I will be able to solve your
>> issues with the Bayes classifier much faster.
>>
>> Regards
>> Robin
>>
>> On Mon, Nov 29, 2010 at 12:54 PM, Divya <di...@k2associates.com.sg>
>> wrote:
>>
>> > Hi,
>> >
>> > Steps I followed are below :
>> >
>> > $ bin/mahout wikipediaDataSetCreator -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Traininput -o examples/bin/work/wikipedia/wikipediaClassification/train-subject -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>> >
>> > $ bin/mahout wikipediaDataSetCreator -i D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/Testinput -o examples/bin/work/wikipedia/wikipediaClassification/test-subject -c $MAHOUT_HOME/examples/src/test/resources/subjects.txt
>> >
>> > $ bin/mahout trainclassifier -i examples/bin/work/wikipedia/wikipediaClassification/train-subject -o examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model
>> >
>> > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
>> >
>> >
>> > Regards,
>> > Divya
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Grant Ingersoll [mailto:gsingers@apache.org]
>> > Sent: Saturday, November 27, 2010 8:54 PM
>> > To: user@mahout.apache.org
>> > Subject: Re: NPE in bayes wiki example
>> >
>> > Can you provide all the steps you have done up to this point?
>> >
>> > -Grant
>> >
>> > On Nov 25, 2010, at 12:57 AM, Divya wrote:
>> >
>> > > Hi,
>> > >
>> > > I am getting a null pointer exception when I pass my test input data to
>> > > testclassifier:
>> > >
>> > >
>> > >
>> > > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/test-subject
>> > > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
>> > > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
>> > > 10/11/25 13:51:36 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/test-subject}
>> > > 10/11/25 13:51:36 INFO bayes.TestClassifier: Testing Bayes Classifier
>> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
>> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
>> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
>> > > 10/11/25 13:51:38 INFO io.SequenceFileModelReader: 8.048212844092422
>> > > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
>> > > 10/11/25 13:51:39 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
>> > > 10/11/25 13:51:39 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
>> > >
>> > > Exception in thread "main" java.lang.NullPointerException
>> > >        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:102)
>> > >        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:118)
>> > >        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
>> > >        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:90)
>> > >        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:68)
>> > >        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:266)
>> > >        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:186)
>> > >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >        at java.lang.reflect.Method.invoke(Method.java:597)
>> > >        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> > >        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> > >        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
>> > >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >        at java.lang.reflect.Method.invoke(Method.java:597)
>> > >        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> > >
>> > >
>> > >
>> > > My categories file is subjects.txt, which has two entries: History and
>> > > Science.
>> > >
>> > > But when I pass the train input data, I do get to see the results:
>> > >
>> > >
>> > >
>> > > $ bin/mahout testclassifier -m examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model -d examples/bin/work/wikipedia/wikipediaClassification/train-subject
>> > > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
>> > > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
>> > > 10/11/25 13:51:54 INFO bayes.TestClassifier: Loading model from: {basePath=examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/wikipedia/wikipediaClassification/train-subject}
>> > > 10/11/25 13:51:54 INFO bayes.TestClassifier: Testing Bayes Classifier
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_j/part-00000
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_k/part-00000
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-weights/Sigma_kSigma_j/part-00000
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: 8.048212844092422
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-thetaNormalizer/part-00000
>> > > 10/11/25 13:51:55 INFO io.SequenceFileModelReader: file:/D:/mahout-0.4/examples/bin/work/wikipedia/wikipediaClassification/wikipedia-subject-model/trainer-tfIdf/trainer-tfIdf/part-00000
>> > > 10/11/25 13:51:55 INFO datastore.InMemoryBayesDatastore: history -23722.080627413125 23722.080627413125 -1.0
>> > > 10/11/25 13:51:55 INFO bayes.TestClassifier: Classified instances from part-r-00000
>> > > 10/11/25 13:51:55 INFO bayes.TestClassifier:
>> > > =======================================================
>> > > Summary
>> > > -------------------------------------------------------
>> > > Correctly Classified Instances          :          2           100%
>> > > Incorrectly Classified Instances        :          0             0%
>> > > Total Classified Instances              :          2
>> > >
>> > > =======================================================
>> > > Confusion Matrix
>> > > -------------------------------------------------------
>> > > a       <--Classified as
>> > > 2        |  2           a     = history
>> > > Default Category: unknown: 1
>> > >
>> > > 10/11/25 13:51:55 INFO driver.MahoutDriver: Program took 953 ms
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > Can someone please explain the reason behind this?
>> > >
>> > >
>> > >
>> > > Thanks
>> > >
>> > > Regards,
>> > >
>> > > Divya
>> > >
>> >
>> > --------------------------
>> > Grant Ingersoll
>> > http://www.lucidimagination.com/
>> >
>> > Search the Lucene ecosystem docs using Solr/Lucene:
>> > http://www.lucidimagination.com/search
>> >
>> >
>> >
>>
>>
>
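
The thread does not confirm what causes the NPE. One possibility that fits the quoted logs (the loaded model reports only "history", while subjects.txt lists History and Science) is that the test set contains a label the model never saw during training. The sketch below is hypothetical and is not Mahout's ConfusionMatrix source; the class and method names are made up for illustration. It only shows how looking up a label index in a map and unboxing the result can throw a NullPointerException for an unseen label, and how a defensive lookup avoids it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; this is NOT Mahout's ConfusionMatrix code. */
public class LabelLookupSketch {

    /** Assigns each known label an index, in order of first appearance. */
    public static Map<String, Integer> buildLabelMap(List<String> labels) {
        Map<String, Integer> labelMap = new HashMap<>();
        for (String label : labels) {
            labelMap.putIfAbsent(label, labelMap.size());
        }
        return labelMap;
    }

    /** Unsafe lookup: get() returns null for an unseen label, and unboxing null throws an NPE. */
    public static int unsafeIndex(Map<String, Integer> labelMap, String label) {
        return labelMap.get(label);
    }

    /** Defensive lookup: returns -1 for a label the map has never seen. */
    public static int safeIndex(Map<String, Integer> labelMap, String label) {
        Integer index = labelMap.get(label);
        return index == null ? -1 : index;
    }

    public static void main(String[] args) {
        // Assumed scenario: the model knows only "history", as in the quoted logs.
        Map<String, Integer> labelMap = buildLabelMap(Arrays.asList("history"));

        System.out.println(safeIndex(labelMap, "history")); // prints 0
        System.out.println(safeIndex(labelMap, "science")); // prints -1

        try {
            unsafeIndex(labelMap, "science");
        } catch (NullPointerException e) {
            System.out.println("NPE for unseen label: science");
        }
    }
}
```

If this is what is happening, comparing the set of labels in the train-subject and test-subject outputs would show it; Robin's suggestion of training on the test dataset exercises the same idea, since every test label would then be known to the model.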


RE: NPE in bayes wiki example

Posted by Divya <di...@k2associates.com.sg>.
Thanks Robin

-----Original Message-----
From: Robin Anil [mailto:robin.anil@gmail.com] 
Sent: Wednesday, December 08, 2010 1:09 PM
To: user@mahout.apache.org
Subject: Re: NPE in bayes wiki example

Did you try what I mentioned?
