Posted to user@mahout.apache.org by Vijay Santhanam <vi...@gmail.com> on 2011/07/04 10:52:15 UTC

20news

Hi All,

I'm new to Mahout and I'm interested in experimenting with its classifiers.

Right now, I'm just trying to get up and running with the demos and
examples.

After checking out the mahout trunk, I've tried running the classification
example 20news, but after running the ./examples/bin/build/20news-bayes.sh
script I get the following error during the classification phase.

Does anyone else get the same thing? Or have any recommendations about how
to fix it?
I'd just like to get a sample classifier working before I embark on my own
classification journey.


INFO: Loading model from: {basePath=examples/bin/work/20news-bydate/bayes-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/20news-bydate/bayes-test-input}
Jul 4, 2011 6:28:25 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Testing Bayes Classifier
Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Read 50000 feature weights
Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Read 100000 feature weights
Jul 4, 2011 6:28:28 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: 193370.88331085522
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: rec.sport.baseball -129829.34738930278 531784.7805631821 -0.2441388925268003
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: sci.crypt -193023.42370049533 531784.7805631821 -0.3629728242618669
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: rec.sport.hockey -167853.6159738822 531784.7805631821 -0.31564200802459647
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: talk.politics.guns -203524.0148974065 531784.7805631821 -0.3827187658170024
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: soc.religion.christian -163900.9258713857 531784.7805631821 -0.308209132457322
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: sci.electronics -142854.1677345925 531784.7805631821 -0.26863154598614886
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: comp.os.ms-windows.misc -531784.7805631821 531784.7805631821 -1.0
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: misc.forsale -143454.70176448982 531784.7805631821 -0.26976082619845826
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: talk.religion.misc -139428.73484148504 531784.7805631821 -0.2621901565024562
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: alt.atheism -139569.06867597546 531784.7805631821 -0.2624540486626301
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: comp.windows.x -178029.10523376046 531784.7805631821 -0.33477660839638973
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: talk.politics.mideast -193075.00789450994 531784.7805631821 -0.36306982627452317
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: comp.sys.ibm.pc.hardware -138410.02049984262 531784.7805631821 -0.2602745049477736
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: comp.sys.mac.hardware -125200.9927438868 531784.7805631821 -0.23543545682389364
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: sci.space -192437.0009266271 531784.7805631821 -0.3618700797018455
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: rec.motorcycles -143142.20855440624 531784.7805631821 -0.26917319522159455
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: rec.autos -141800.97549909537 531784.7805631821 -0.2666510601317365
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: comp.graphics -166882.18654471825 531784.7805631821 -0.3138152738556811
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: talk.politics.misc -165196.84193278523 531784.7805631821 -0.3106460507535303
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: sci.med -192698.5183245711 531784.7805631821 -0.36236185270382393
Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
    at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
    at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
    at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
    at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
    at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)


Any help is greatly appreciated.

Regards,
-- 
 Vijay Santhanam
 Software Engineer

Re: 20news

Posted by Sean Owen <sr...@gmail.com>.
I committed a change to make the parsing bits I found in .bayes. split on both
space and tab. You can try again. I confess I don't know this code, and there
are a lot of little pieces of parsing here and there, so I don't know if this is
the heart of the issue.

On Mon, Jul 4, 2011 at 4:08 PM, Vijay Santhanam
<vi...@gmail.com>wrote:

> Hi Sean,
>
> Thanks for responding.
>
> I would expect the sequential classifier's tokenizer to be identical to the one
> used by the parallel classifier.
>
> If that's not possible, then NGrams should perhaps be configurable as to
> where it finds its first token (i.e. the label).
>
> I'm very new to Hadoop and this world, so I'm not sure what I'm looking at
> when the classifier goes into MapReduce execution.
>
> -V
>
>
>

Re: 20news

Posted by Vijay Santhanam <vi...@gmail.com>.
Hi Sean,

Thanks for responding.

I would expect the sequential classifier's tokenizer to be identical to the one
used by the parallel classifier.

If that's not possible, then NGrams should perhaps be configurable as to
where it finds its first token (i.e. the label).

I'm very new to Hadoop and this world, so I'm not sure what I'm looking at
when the classifier goes into MapReduce execution.

-V


On Tue, Jul 5, 2011 at 12:46 AM, Sean Owen <sr...@gmail.com> wrote:

> This could be my doing. I noticed that various bits of code split
> input files in different ways: StringTokenizer, Pattern, Splitter. And
> using different delimiters: space, space/tab, or the weird collection
> of delimiters from StringTokenizer. (BTW StringTokenizer is all but
> deprecated for this reason.) So I tried to move towards Splitter, or
> Pattern where that made more sense.
>
> So I have recently tried to standardize how things like NGrams
> tokenizes to make it all work more the same. I tried to guess and
> preserve the intent of the tokenization, but it did change in several
> places as a result, and this could be the issue here.
>
> So: what class is tokenizing, what do you expect it tokenize on? We
> can easily add "tab" to what NGrams tokenizes on for instance.
>
> Sean
>



-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087

Re: 20news

Posted by Sean Owen <sr...@gmail.com>.
This could be my doing. I noticed that various bits of code split
input files in different ways: StringTokenizer, Pattern, Splitter. And
using different delimiters: space, space/tab, or the weird collection
of delimiters from StringTokenizer. (BTW StringTokenizer is all but
deprecated for this reason.) So I tried to move towards Splitter, or
Pattern where that made more sense.

So I have recently tried to standardize how things like NGrams
tokenize, to make it all work more consistently. I tried to guess and
preserve the intent of the tokenization, but it did change in several
places as a result, and this could be the issue here.

So: which class is tokenizing, and what do you expect it to tokenize on? We
can easily add "tab" to what NGrams tokenizes on, for instance.

Sean
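
To make the difference between these splitting approaches concrete, here is a minimal Java sketch that uses only JDK classes. The class name, method names, and sample line are made up for illustration; this is not Mahout's code, and Guava's Splitter is left out only to keep the example free of dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Illustrative only: compares how different delimiter choices split the same line.
class DelimiterComparison {

    // Splits on a single space only; tabs stay inside tokens.
    static final Pattern SPACE = Pattern.compile(" ");
    // Splits on runs of spaces and/or tabs.
    static final Pattern SPACE_OR_TAB = Pattern.compile("[ \t]+");

    // StringTokenizer's default delimiters are space, tab, newline,
    // carriage return and form feed.
    static List<String> withStringTokenizer(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    static List<String> withPattern(Pattern delimiter, String line) {
        return Arrays.asList(delimiter.split(line));
    }

    public static void main(String[] args) {
        // Hypothetical input line: label, a tab, then space-separated words.
        String line = "alt.atheism\tfrom mathew subject";
        // First token is "alt.atheism".
        System.out.println(withStringTokenizer(line));
        // First token is "alt.atheism<TAB>from" because the tab is not a delimiter.
        System.out.println(withPattern(SPACE, line));
        // First token is "alt.atheism" again.
        System.out.println(withPattern(SPACE_OR_TAB, line));
    }
}
```

If the first token is treated as the label, only the space-only split produces a label that carries the tab and the following word.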

On Mon, Jul 4, 2011 at 1:23 PM, Robin Anil <ro...@gmail.com> wrote:
> Are you using some non-standard Java character encoding?
>
>
> On Mon, Jul 4, 2011 at 5:23 PM, Vijay Santhanam
> <vi...@gmail.com>wrote:
>
>> Hi,
>>
>> Okay, I replaced all the tab characters with space characters for each file
>> in the bayes-test-input folder and now the classifier completes without
>> error.
>>
>> Tomorrow I'll investigate why the trainer parses the tab-separated label
>> correctly, but the classifier does not. Actually, I know why the
>> classifier doesn't extract the correct label: because
>> org.apache.mahout.common.nlp.NGrams tokenizes on spaces only.
>>
>> The other mystery is why it works for everyone else except poor me :(
>>
>> If anyone has any ideas I'd love to hear them.
>>
>> Cheers,
>> Vijay
>>
>>
>>
>> On Mon, Jul 4, 2011 at 9:16 PM, Vijay Santhanam
>> <vi...@gmail.com>wrote:
>>
>> > Hi,
>> >
>> > I got debugger running w/ eclipse so I could watch what was happening under
>> > the hood.
>> >
>> > Here's the exception again
>> > Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
>> >     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>> >     at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
>> >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
>> >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
>> >     at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
>> >     at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
>> >     at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
>> >     at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>> >
>> > Notice the "Label not found: alt.atheism\tfrom"
>> >
>> > That's an invalid label in the confusion matrix. I think it SHOULD be just
>> > alt.atheism. I'm not sure how the \tfrom is getting in there, but it is.
>> > Perhaps it has something to do with the way my test data was formatted.
>> >
>> > I'll keep digging....
>> >
>> > Thanks,
>> > Vijay
>> >
>> >
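
The failure in the quoted trace can be illustrated with a small sketch. LabelCounts below is a hypothetical stand-in and not Mahout's ConfusionMatrix; it only assumes that counts are keyed by the labels known from training and that an unknown label is rejected with an IllegalArgumentException, which is what the trace suggests.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for per-label bookkeeping; not Mahout's ConfusionMatrix.
class LabelCounts {
    private final Map<String, Integer> counts = new HashMap<>();

    LabelCounts(Collection<String> knownLabels) {
        for (String label : knownLabels) {
            counts.put(label, 0);
        }
    }

    // Rejects any label that was not registered up front.
    void increment(String label) {
        if (!counts.containsKey(label)) {
            throw new IllegalArgumentException("Label not found: " + label);
        }
        counts.put(label, counts.get(label) + 1);
    }

    int getCount(String label) {
        Integer count = counts.get(label);
        if (count == null) {
            throw new IllegalArgumentException("Label not found: " + label);
        }
        return count;
    }

    public static void main(String[] args) {
        LabelCounts matrix = new LabelCounts(Arrays.asList("alt.atheism", "sci.space"));
        matrix.increment("alt.atheism"); // known label: accepted
        try {
            // Label with a stray tab and the next word attached: rejected.
            matrix.increment("alt.atheism\tfrom");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The lookup itself behaves as designed; the problem is upstream, where the label string is extracted from the test input.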

Re: 20news

Posted by Vijay Santhanam <vi...@gmail.com>.
Sorry, I think I asked the wrong question.

I'm asking about training and classification (i.e. post-preparation -- after
steps 3/4 in https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html) phases.

In parallel mode, what class is used for extracting the "label" for a
document?
In sequential mode, NGrams is used which is extracting the "label" before
the first space character. -- I worked that out by following the debugger.

The problem is with using NGrams for extracting a document's label in
sequential mode.

I guess if you can successfully train/classify the 20news example in
sequential mode, something is up with my UTF8 encoding on my mac.

Do you follow me?
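
A minimal sketch of the behavior described above, assuming each line of the prepared input starts with the label, then a tab, then the document text (which is consistent with the "alt.atheism\tfrom" label in the exception). The helper names are hypothetical and this is not the NGrams implementation.

```java
// Illustrative only: two ways of pulling the label off the front of an input line.
class LabelExtraction {

    // Label is everything before the first space character,
    // which is the behavior observed in the debugger.
    static String labelBeforeFirstSpace(String line) {
        int end = line.indexOf(' ');
        return end < 0 ? line : line.substring(0, end);
    }

    // Label is everything before the first space or tab.
    static String labelBeforeFirstSpaceOrTab(String line) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ' ' || c == '\t') {
                return line.substring(0, i);
            }
        }
        return line;
    }

    public static void main(String[] args) {
        // Hypothetical tab-separated input line.
        String line = "alt.atheism\tfrom mathew subject";
        System.out.println(labelBeforeFirstSpace(line).equals("alt.atheism\tfrom")); // true
        System.out.println(labelBeforeFirstSpaceOrTab(line).equals("alt.atheism"));  // true
    }
}
```

Replacing the tabs in the test input with spaces, as tried earlier in the thread, makes both helpers agree, which would explain why that workaround lets the classifier finish.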




On Mon, Jul 4, 2011 at 10:29 PM, Robin Anil <ro...@gmail.com> wrote:

> We are using the default lucene tokenizer. You can also pass in a tokenizer
> via the command line.
>
>
>
> On Mon, Jul 4, 2011 at 5:55 PM, Vijay Santhanam
> <vi...@gmail.com>wrote:
>
> > No sir.
> >
> > UTF-8 all the way.
> >
> > When doing non-sequential training and classification, what class is used
> > for tokenization?
> >
> > I get the feeling different tokenizer classes are used for sequential and
> > parallel training/classification.
> >
> >
> >
> >
> > On Mon, Jul 4, 2011 at 10:23 PM, Robin Anil <ro...@gmail.com> wrote:
> > >
> > > > Are you using some non-standard Java character encoding?
> > > >
> > > >
> > > > On Mon, Jul 4, 2011 at 5:23 PM, Vijay Santhanam
> > > > <vi...@gmail.com>wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Okay, I replaced all the tab characters with space characters for each file
> > > > > in the bayes-test-input folder and now the classifier completes without
> > > > > error.
> > > > >
> > > > > Tomorrow I'll investigate why the trainer parses the tab-separated label
> > > > > correctly, but the classifier does not. Actually, I know why the
> > > > > classifier doesn't extract the correct label: because
> > > > > org.apache.mahout.common.nlp.NGrams tokenizes on spaces only.
> > > > >
> > > > > The other mystery is why it works for everyone else except poor me :(
> > > > >
> > > > > If anyone has any ideas I'd love to hear them.
> > > > >
> > > > > Cheers,
> > > > > Vijay
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Jul 4, 2011 at 9:16 PM, Vijay Santhanam
> > > > > <vi...@gmail.com>wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I got debugger running w/ eclipse so I could watch what was happening under
> > > > > > the hood.
> > > > > >
> > > > > > Here's the exception again
> > > > > > Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
> > > > > >     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> > > > > >     at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
> > > > > >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
> > > > > >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
> > > > > >     at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
> > > > > >     at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
> > > > > >     at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
> > > > > >     at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
> > > > > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > > > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > > > > >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > > > > >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > > > > >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> > > > > >
> > > > > > Notice the "Label not found: alt.atheism\tfrom"
> > > > > >
> > > > > > That's an invalid label in the confusion matrix. I think it SHOULD be just
> > > > > > alt.atheism. I'm not sure how the \tfrom is getting in there, but it is.
> > > > > > Perhaps it has something to do with the way my test data was formatted.
> > > > > >
> > > > > > I'll keep digging....
> > > > > >
> > > > > > Thanks,
> > > > > > Vijay
> > > > >
> > > > >
> > > > >
> > > > > > On Mon, Jul 4, 2011 at 8:52 PM, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> > > > > >
> > > > > >> Hi Robin,
> > > > > >>
> > > > > >> The console dump was too large for pastebin, so I uploaded it here --
> > > > > >> http://dl.dropbox.com/u/7881451/build-20news-bayes-console-output.txt
> > > > > >>
> > > > > >> I performed a fresh checkout only hours ago, and I used script
> > > > > >> examples/bin/build-20news-bayes.sh
> > > > > >> I've opted to avoid hadoop, but from what I can tell the model was created
> > > > > >> with success.
> > > > > >>
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Vijay
> > > > > >>
> > > > > >>
> > > > > >> On Mon, Jul 4, 2011 at 8:28 PM, Robin Anil <ro...@gmail.com> wrote:
> > > > > >>
> > > > > >>> Can you send me the console dump
> > > > > >>> Command line + Log written by the program and put it on say pastebin
> > > > > >>>
> > > > > >>> Robin
> > > > > >>>
> > > > > >>> On Mon, Jul 4, 2011 at 3:48 PM, Vijay Santhanam
> > > > > >>> <vi...@gmail.com>wrote:
> > > > > >>>
> > > > > >>> > I tried deleting all the folders from the test and train data except for
> > > > > >>> > alt.atheism, but I get the identical error.
> > > > > >>> >
> > > > > >>> > I might try debugging the problem in eclipse rather than from commandline,
> > > > > >>> > but Eclipse doesn't quite want to work either.
> > > > > >>> >
> > > > > >>> >
> > > > > >>> > On Mon, Jul 4, 2011 at 8:02 PM, Vijay Santhanam
> > > > > >>> > <vi...@gmail.com>wrote:
> > > > > >>> >
> > > > > >>> > > Thanks anyway Sergey. Could you perhaps upload your bayes-model folder so I
> > > > > >>> > > could try that out?
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > > On Mon, Jul 4, 2011 at 7:57 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
> > > > > >>> > >
> > > > > >>> > >> Well, that's strange. Sorry, I can't help you at the moment, maybe
> > > > > >>> > >> someone else in the mailing list could.
> > > > > >>> > >>
> > > > > >>> > >> On 4 July 2011 13:49, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> > > > > >>> > >> > Hi Sergey,
> > > > > >>> > >> >
> > > > > >>> > >> > Yes, there were no errors.
> > > > > >>> > >> >
> > > > > >>> > >> > And all the model data seems to have been populated into bayes-model folder.
> > > > > >>> > >> > Also, each main folder in bayes-model has a _SUCCESS file.
> > > > > >>> > >> >
> > > > > >>> > >> > See the tarball of my trained model here,
> > > > > >>> > >> > http://dl.dropbox.com/u/7881451/bayes-model.tar.gz
> > > > > >>> > >> > Please compare it to your trained model if possible, I would like to know if
> > > > > >>> > >> > it's different in any way.
> > > > > >>> > >> >
> > > > > >>> > >> > Perhaps it's corrupted in some way.
> > > > > >>> > >> >
> > > > > >>> > >> > Thanks,
> > > > > >>> > >> > Vijay
> > > > > >>> > >> >
> > > > > >>> > >> >
> > > > > >>> > >> >
> > > > > >>> > >> > On Mon, Jul 4, 2011 at 7:39 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
> > > > > >>> > >> >
> > > > > >>> > >> >> Stop, did you _train_ the classifier successfully before running the
> > > > > >>> > >> >> _test_?
> > > > > >>> > >> >>
> > > > > >>> > >> >> On 4 July 2011 13:30, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> > > > > >>> > >> >> > Hi Sergey,
> > > > > >>> > >> >> >
> > > > > >>> > >> >> > I've tried using both the sh script file and following the instructions at
> > > > > >>> > >> >> > https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html - like you suggested.
> > > > > >>> > >> >> > Both return the same results.
> > > > > >>> > >> >> >
> > > > > >>> > >> >> > I've uploaded my bayes-test-input folder to dropbox, the first file is
> > > > > >>> > >> >> > here...
> > > > > >>> > >> >> > http://dl.dropbox.com/u/7881451/bayes-test-input/alt.atheism.txt
> > > > > >>> > >> >> >
> > > > > >>> > >> >> > Thanks,
> > > > > >>> > >> >> > Vijay
> > > > > >>> > >> >> >
> > > > > >>> > >> >> > On Mon, Jul 4, 2011 at 7:23 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
> > > > > >>> > >> >> >
> > > > > >>> > >> >> >> Paste somewhere your  bayes-test-input file.
> > > > > >>> > >> >> >>
> > > > > >>> > >> >> >> On 4 July 2011 13:20, Sergey Bartunov <sbos.net@gmail.com> wrote:
> > > > > >>> > >> >> >> > Yes, I worked WITH hadoop, but there should be no difference.
> > > > > >>> > >> >> >> >
> > > > > >>> > >> >> >> > Why do you use examples/bin/build/20news-bayes.sh instead of direct
> > > > > >>> > >> >> >> > running bin/mahout? Is it the same?
> > > > > >>> > >> >> >> >
> > > > > >>> > >> >> >> > On 4 July 2011 13:12, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> > > > > >>> > >> >> >> >> Thanks Sergey,
> > > > > >>> > >> >> >> >>
> > > > > >>> > >> >> >> >> I'm still receiving the same error after following those steps.
> > > > > >>> > >> >> >> >> I've chosen not to use hadoop - does yours work WITH hadoop?
> > > > > >>> > >> >> >> >>
> > > > > >>> > >> >> >> >> A few bits of info that might be relevant.
> > > > > >>> > >> >> >> >>
> > > > > >>> > >> >> >> >> My examples/bin/work folder contains the expected folders from test data
> > > > > >>> > >> >> >> >> preparation and training...
> > > > > >>> > >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003 20news-bydate-test
> > > > > >>> > >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003 20news-bydate-train
> > > > > >>> > >> >> >> >> drwxr-xr-x   3 Vijay  staff  102  4 Jul 19:03 bayes-model
> > > > > >>> > >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 18:20 bayes-test-input
> > > > > >>> > >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 17:49 bayes-train-input
> > > > > >>> > >> >> >> >>
> > > > > >>> > >> >> >> >>
> > > > > >>> > >> >> >> >> I appreciate your help, do you have any other suggestions?
> > > > > >>> > >> >> >> >>
> > > > > >>> > >> >> >> >> Regards,
> > > > > >>> > >> >> >> >> Vijay
> > > > > >>> > >> >> >> >>
> > > > > >>> > >> >> >> >>
> > > > > >>> > >> >> >> >> On Mon, Jul 4, 2011 at 6:58 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
> > > > > >>> > >> >> >> >>
> > > > > >>> > >> >> >> >>> When I started with Mahout I had the same errors. In my case, I just
> > > > > >>> > >> >> >> >>> didn't run PrepareTwentyNewsgroups. You may try to accurately repeat
> > > > > >>> > >> >> >> >>> all steps from https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html
> > > > > >>> > >> >> >> >>>
> > > > > >>> > >> >> >> >>> On 4 July 2011 12:52, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> > > > > >>> > >> >> >> >>> > Hi All,
> > > > > >>> > >> >> >> >>> >
> > > > > >>> > >> >> >> >>> > I'm new to Mahout and I'm interested in experimenting with its classifiers.
> > > > > >>> > >> >> >> >>> >
> > > > > >>> > >> >> >> >>> > Right now, I'm just trying to get up and running with the demos and
> > > > > >>> > >> >> >> >>> > examples.
> > > > > >>> > >> >> >> >>> >
> > > > > >>> > >> >> >> >>> > After checking out the mahout trunk, I've tried running the classification
> > > > > >>> > >> >> >> >>> > example 20news, but after running the ./examples/bin/build/20news-bayes.sh
> > > > > >>> > >> >> >> >>> > script I get the following error during the classification phase.
> > > > > >>> > >> >> >> >>> >
> > > > > >>> > >> >> >> >>> > Does anyone else get the same thing? Or have any recommendations about how
> > > > > >>> > >> >> >> >>> > to fix it?
> > > > > >>> > >> >> >> >>> > I'd just like to get a sample classifier working before I embark on my own
> > > > > >>> > >> >> >> >>> > classification journey.
> > > > > >>> > >> >> >> >>> >
> > > > > >>> > >> >> >> >>> >
> > > > > >>> > >> >> >> >>> > INFO: Loading model from: {basePath=examples/bin/work/20news-bydate/bayes-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/20news-bydate/bayes-test-input}
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:25 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: Testing Bayes Classifier
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: Read 50000 feature weights
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: Read 100000 feature weights
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:28 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: 193370.88331085522
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: rec.sport.baseball -129829.34738930278 531784.7805631821 -0.2441388925268003
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: sci.crypt -193023.42370049533 531784.7805631821 -0.3629728242618669
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: rec.sport.hockey -167853.6159738822 531784.7805631821 -0.31564200802459647
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: talk.politics.guns -203524.0148974065 531784.7805631821 -0.3827187658170024
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: soc.religion.christian -163900.9258713857 531784.7805631821 -0.308209132457322
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: sci.electronics -142854.1677345925 531784.7805631821 -0.26863154598614886
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: comp.os.ms-windows.misc -531784.7805631821 531784.7805631821 -1.0
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: misc.forsale -143454.70176448982 531784.7805631821 -0.26976082619845826
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: talk.religion.misc -139428.73484148504 531784.7805631821 -0.2621901565024562
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: alt.atheism -139569.06867597546 531784.7805631821 -0.2624540486626301
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: comp.windows.x -178029.10523376046 531784.7805631821 -0.33477660839638973
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: talk.politics.mideast -193075.00789450994 531784.7805631821 -0.36306982627452317
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: comp.sys.ibm.pc.hardware -138410.02049984262 531784.7805631821 -0.2602745049477736
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: comp.sys.mac.hardware -125200.9927438868 531784.7805631821 -0.23543545682389364
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: sci.space -192437.0009266271 531784.7805631821 -0.3618700797018455
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: rec.motorcycles -143142.20855440624 531784.7805631821 -0.26917319522159455
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: rec.autos -141800.97549909537 531784.7805631821 -0.2666510601317365
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: comp.graphics -166882.18654471825 531784.7805631821 -0.3138152738556811
> > > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > > >>> > >> >> >> >>> > INFO: talk.politics.misc -165196.84193278523 531784.7805631821
> > > > >>> > >> >> >> >>> > -0.3106460507535303
> > > > >>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM
> > > org.slf4j.impl.JCLLoggerAdapter
> > > > >>> info
> > > > >>> > >> >> >> >>> > INFO: sci.med -192698.5183245711
> 531784.7805631821
> > > > >>> > >> >> >> -0.36236185270382393
> > > > >>> > >> >> >> >>> > Exception in thread "main"
> > > > >>> > java.lang.IllegalArgumentException:
> > > > >>> > >> >> Label
> > > > >>> > >> >> >> not
> > > > >>> > >> >> >> >>> > found: alt.atheism from
> > > > >>> > >> >> >> >>> >  at
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >>
> > > > >>> > >> >>
> > > > >>> > >>
> > > > >>> >
> > > > >>>
> > > >
> > com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> > > > >>> > >> >> >> >>> > at
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>>
> > > > >>> > >> >> >>
> > > > >>> > >> >>
> > > > >>> > >>
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
> > > > >>> > >> >> >> >>> >  at
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>>
> > > > >>> > >> >> >>
> > > > >>> > >> >>
> > > > >>> > >>
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
> > > > >>> > >> >> >> >>> > at
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>>
> > > > >>> > >> >> >>
> > > > >>> > >> >>
> > > > >>> > >>
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
> > > > >>> > >> >> >> >>> >  at
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>>
> > > > >>> > >> >> >>
> > > > >>> > >> >>
> > > > >>> > >>
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
> > > > >>> > >> >> >> >>> > at
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>>
> > > > >>> > >> >> >>
> > > > >>> > >> >>
> > > > >>> > >>
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
> > > > >>> > >> >> >> >>> >  at
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>>
> > > > >>> > >> >> >>
> > > > >>> > >> >>
> > > > >>> > >>
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
> > > > >>> > >> >> >> >>> > at
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>>
> > > > >>> > >> >> >>
> > > > >>> > >> >>
> > > > >>> > >>
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
> > > > >>> > >> >> >> >>> >  at
> > > > sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> > > > >>> > >> Method)
> > > > >>> > >> >> >> >>> > at
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>>
> > > > >>> > >> >> >>
> > > > >>> > >> >>
> > > > >>> > >>
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > > >>> > >> >> >> >>> >  at
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>>
> > > > >>> > >> >> >>
> > > > >>> > >> >>
> > > > >>> > >>
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > > >>> > >> >> >> >>> > at
> > java.lang.reflect.Method.invoke(Method.java:597)
> > > > >>> > >> >> >> >>> >  at
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>>
> > > > >>> > >> >> >>
> > > > >>> > >> >>
> > > > >>> > >>
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > > > >>> > >> >> >> >>> > at
> > > > >>> > >> >> >>
> > > > >>> >
> > org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > > > >>> > >> >> >> >>> >  at
> > > > >>> > >> >>
> > > > org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>> > Any help is great appreciated.
> > > > >>> > >> >> >> >>> >
> > > > >>> > >> >> >> >>> > Regards,
> > > > >>> > >> >> >> >>> > --
> > > > >>> > >> >> >> >>> >  Vijay Santhanam
> > > > >>> > >> >> >> >>> >  Software Engineer



-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087

Re: 20news

Posted by Robin Anil <ro...@gmail.com>.
We are using the default Lucene tokenizer. You can also pass in a tokenizer
via the command line.
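
To illustrate why the choice of tokenizer matters for the error in this thread: the exception reports the label as "alt.atheism from", which is what you would get if a tab-separated "label<TAB>text" line were split on single spaces only. The sketch below uses nothing but the JDK. The class and method names are made up for illustration, the sample line is invented, and it does not reproduce the Lucene tokenizer or Mahout's own parsing code.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: plain JDK string splitting, not Lucene or Mahout code.
class LabelSplitSketch {

    // Splits on single space characters only, so a tab stays inside a token.
    static List<String> splitOnSpaces(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Splits on any run of whitespace, so tabs separate tokens as well.
    static List<String> splitOnWhitespace(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // Assumed input shape: label, a tab, then the document text.
        String line = "alt.atheism\tfrom mathew subject faq";

        // The first token is what would end up being treated as the label.
        System.out.println("[" + splitOnSpaces(line).get(0) + "]");
        System.out.println("[" + splitOnWhitespace(line).get(0) + "]");
    }
}
```

With the space-only split the first token still contains the tab and the word after it, whereas the whitespace split yields just the label. That would be consistent with a label lookup failing on tab-separated input, though it is only a guess at the mechanism.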



On Mon, Jul 4, 2011 at 5:55 PM, Vijay Santhanam <vi...@gmail.com> wrote:

> No sir.
>
> UTF-8 all the way.
>
> When doing non-sequential training and classification, what class is used
> for tokenization?
>
> I get the feeling different tokenizer classes are used for sequential and
> parallel training/classification.
>
>
>
>
> On Mon, Jul 4, 2011 at 10:23 PM, Robin Anil <ro...@gmail.com> wrote:
>
> > Are you using some non-standard Java character encoding?
> >
> >
> > On Mon, Jul 4, 2011 at 5:23 PM, Vijay Santhanam <vi...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Okay, I replaced all the tab characters with space characters for each file
> > > in the bayes-test-input folder and now the classifier completes without
> > > error.
> > >
> > > Tomorrow I'll investigate why the trainer parses the tab-separated label
> > > correctly but the classifier does not. Actually, I know why the
> > > classifier doesn't extract the correct label:
> > > org.apache.mahout.common.nlp.NGrams tokenizes via spaces only.
> > >
> > > The other mystery is why it works for everyone else except poor me :(
> > >
> > > If anyone has any ideas I'd love to hear them.
> > >
> > > Cheers,
> > > Vijay
> > >
> > >
> > >
> > > On Mon, Jul 4, 2011 at 9:16 PM, Vijay Santhanam <vi...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I got the debugger running with Eclipse so I could watch what was happening
> > > > under the hood.
> > > >
> > > > Here's the exception again
> > > > Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
> > > >     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> > > >     at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
> > > >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
> > > >     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
> > > >     at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
> > > >     at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
> > > >     at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
> > > >     at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
> > > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > > >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > > >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > > >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> > > >
> > > > Notice the "Label not found: alt.atheism\tfrom"
> > > >
> > > > That's an invalid label in the confusion matrix. I think it SHOULD be just
> > > > alt.atheism. I'm not sure how the \tfrom is getting in there, but it is.
> > > > Perhaps it has something to do with the way my test data was formatted.
> > > >
> > > > I'll keep digging....
> > > >
> > > > Thanks,
> > > > Vijay
> > > >
> > > >
> > > >
> > > > On Mon, Jul 4, 2011 at 8:52 PM, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> > > >
> > > >> Hi Robin,
> > > >>
> > > >> The console dump was too large for pastebin, so I uploaded it here:
> > > >> http://dl.dropbox.com/u/7881451/build-20news-bayes-console-output.txt
> > > >>
> > > >> I performed a fresh checkout only hours ago, and I used the script
> > > >> examples/bin/build-20news-bayes.sh
> > > >> I've opted to avoid hadoop, but from what I can tell the model was created
> > > >> with success.
> > > >>
> > > >>
> > > >> Thanks,
> > > >> Vijay
> > > >>
> > > >>
> > > >> On Mon, Jul 4, 2011 at 8:28 PM, Robin Anil <ro...@gmail.com> wrote:
> > > >>
> > > >>> Can you send me the console dump
> > > >>> Command line + Log written by the program and put it on say pastebin
> > > >>>
> > > >>> Robin
> > > >>>
> > > >>> On Mon, Jul 4, 2011 at 3:48 PM, Vijay Santhanam <vi...@gmail.com> wrote:
> > > >>>
> > > >>> > I tried deleting all the folders from the test and train data except for
> > > >>> > alt.atheism, but I get the identical error.
> > > >>> >
> > > >>> > I might try debugging the problem in Eclipse rather than from the command
> > > >>> > line, but Eclipse doesn't quite want to work either.
> > > >>> >
> > > >>> >
> > > >>> > On Mon, Jul 4, 2011 at 8:02 PM, Vijay Santhanam <vi...@gmail.com> wrote:
> > > >>> >
> > > >>> > > Thanks anyway Sergey. Could you perhaps upload your bayes-model folder so I
> > > >>> > > could try that out?
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > > >>> > > On Mon, Jul 4, 2011 at 7:57 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
> > > >>> > >
> > > >>> > >> Well, that's strange. Sorry, I can't help you at the moment, maybe
> > > >>> > >> someone else in the mailing list could.
> > > >>> > >>
> > > >>> > >> On 4 July 2011 13:49, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> > > >>> > >> > Hi Sergey,
> > > >>> > >> >
> > > >>> > >> > Yes, there were no errors.
> > > >>> > >> >
> > > >>> > >> > And all the model data seems to have been populated into the bayes-model
> > > >>> > >> > folder. Also, each main folder in bayes-model has a _SUCCESS file.
> > > >>> > >> >
> > > >>> > >> > See the tarball of my trained model here,
> > > >>> > >> > http://dl.dropbox.com/u/7881451/bayes-model.tar.gz
> > > >>> > >> > Please compare it to your trained model if possible, I would like to know if
> > > >>> > >> > it's different in any way.
> > > >>> > >> >
> > > >>> > >> > Perhaps it's corrupted in some way.
> > > >>> > >> >
> > > >>> > >> > Thanks,
> > > >>> > >> > Vijay
> > > >>> > >> >
> > > >>> > >> >
> > > >>> > >> >
> > > >>> > >> > On Mon, Jul 4, 2011 at 7:39 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
> > > >>> > >> >
> > > >>> > >> >> Stop, did you _train_ the classifier successfully before running the
> > > >>> > >> >> _test_?
> > > >>> > >> >>
> > > >>> > >> >> On 4 July 2011 13:30, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> > > >>> > >> >> > Hi Sergey,
> > > >>> > >> >> >
> > > >>> > >> >> > I've tried using both the sh script file and following the instructions at
> > > >>> > >> >> > https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html - like you suggested.
> > > >>> > >> >> > Both return the same results.
> > > >>> > >> >> >
> > > >>> > >> >> > I've uploaded my bayes-test-input folder to dropbox, the first file is
> > > >>> > >> >> > here...
> > > >>> > >> >> > http://dl.dropbox.com/u/7881451/bayes-test-input/alt.atheism.txt
> > > >>> > >> >> >
> > > >>> > >> >> > Thanks,
> > > >>> > >> >> > Vijay
> > > >>> > >> >> >
> > > >>> > >> >> > On Mon, Jul 4, 2011 at 7:23 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
> > > >>> > >> >> >
> > > >>> > >> >> >> Paste your bayes-test-input file somewhere.
> > > >>> > >> >> >>
> > > >>> > >> >> >> On 4 July 2011 13:20, Sergey Bartunov <sbos.net@gmail.com> wrote:
> > > >>> > >> >> >> > Yes, I worked WITH hadoop, but there should be no difference.
> > > >>> > >> >> >> >
> > > >>> > >> >> >> > Why do you use examples/bin/build/20news-bayes.sh instead of
> > > >>> > >> >> >> > running bin/mahout directly? Is it the same?
> > > >>> > >> >> >> >
> > > >>> > >> >> >> > On 4 July 2011 13:12, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> > > >>> > >> >> >> >> Thanks Sergey,
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >> I'm still receiving the same error after following those steps.
> > > >>> > >> >> >> >> I've chosen not to use hadoop - does yours work WITH hadoop?
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >> A few bits of info that might be relevant.
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >> My examples/bin/work folder contains the expected folders from test data
> > > >>> > >> >> >> >> preparation and training...
> > > >>> > >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003 20news-bydate-test
> > > >>> > >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003 20news-bydate-train
> > > >>> > >> >> >> >> drwxr-xr-x   3 Vijay  staff  102  4 Jul 19:03 bayes-model
> > > >>> > >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 18:20 bayes-test-input
> > > >>> > >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 17:49 bayes-train-input
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >> I appreciate your help, do you have any other suggestions?
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >> Regards,
> > > >>> > >> >> >> >> Vijay
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >> On Mon, Jul 4, 2011 at 6:58 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >>> When I started with Mahout I had the same errors. In my case, I just
> > > >>> > >> >> >> >>> didn't run PrepareTwentyNewsgroups. You may try to accurately repeat
> > > >>> > >> >> >> >>> all steps from https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html
> > > >>> > >> >> >> >>>

Re: 20news

Posted by Vijay Santhanam <vi...@gmail.com>.
No sir.

UTF-8 all the way.

When doing non-sequential training and classification, what class is used
for tokenization?

I get the feeling different tokenizer classes are used for sequential and
parallel training/classification.
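
For anyone following along, the label mismatch described in the quoted messages below can be reproduced in isolation. This is only a minimal sketch, not Mahout code: the class and method names are made up for illustration, and the space-only split merely stands in for the behaviour attributed to org.apache.mahout.common.nlp.NGrams earlier in this thread.

```java
// Hypothetical demo class; it is not part of Mahout.
class LabelSplitDemo {

    // Takes the first token when splitting on single spaces only.
    // A tab after the label is NOT treated as a separator here.
    static String firstTokenBySpace(String line) {
        return line.split(" ")[0];
    }

    // Takes the first token when splitting on any run of whitespace,
    // so a tab after the label also ends the label.
    static String firstTokenByWhitespace(String line) {
        return line.trim().split("\\s+")[0];
    }

    public static void main(String[] args) {
        // Assumed input shape: label, a tab, then the document text.
        String line = "alt.atheism\tfrom mathew subject alt atheism faq";
        System.out.println("[" + firstTokenBySpace(line) + "]");
        System.out.println("[" + firstTokenByWhitespace(line) + "]");
    }
}
```

With a tab-separated line, the space-only split returns the label with the tab and the next word still attached, which would match the "alt.atheism\tfrom" seen in the exception. The whitespace split returns the bare label.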




On Mon, Jul 4, 2011 at 10:23 PM, Robin Anil <ro...@gmail.com> wrote:

> Are you using some non-standard Java character encoding?
>
>
> On Mon, Jul 4, 2011 at 5:23 PM, Vijay Santhanam
> <vi...@gmail.com>wrote:
>
> > Hi,
> >
> > > Okay, I replaced all the tab characters with space characters for each
> > > file in the bayes-test-input folder and now the classifier completes
> > > without error.
> > >
> > > Tomorrow I'll investigate why the trainer parses the tab-separated label
> > > correctly, but the classifier does not. Actually, I know why the
> > > classifier doesn't extract the correct label: because
> > > org.apache.mahout.common.nlp.NGrams tokenizes via spaces only.
> >
> > The other mystery is why it works for everyone else except poor me :(
> >
> > If anyone has any ideas I'd love to hear it.
> >
> > Cheers,
> > Vijay
> >
> >
> >
> > On Mon, Jul 4, 2011 at 9:16 PM, Vijay Santhanam
> > <vi...@gmail.com>wrote:
> >
> > > Hi,
> > >
> > > I got debugger running w/ eclipse so I could watch what was happening
> > under
> > > the hood.
> > >
> > > > Here's the exception again
> > > > Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
> > > >  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> > > >  at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
> > > >  at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
> > > >  at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
> > > >  at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
> > > >  at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
> > > >  at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
> > > >  at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
> > > >  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > >  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >  at java.lang.reflect.Method.invoke(Method.java:597)
> > > >  at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > > >  at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > > >  at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> > > >
> > > > Notice the "Label not found: alt.atheism\tfrom"
> > > >
> > > > That's an invalid label in the confusion matrix. I think it SHOULD be just
> > > > alt.atheism. I'm not sure how the \tfrom is getting in there, but it is.
> > > > Perhaps it has something to do with the way my test data was formatted.
> > >
> > > I'll keep digging....
> > >
> > > Thanks,
> > > Vijay
> > >
> > >
> > >
> > > On Mon, Jul 4, 2011 at 8:52 PM, Vijay Santhanam <
> > vijay.santhanam@gmail.com
> > > > wrote:
> > >
> > >> Hi Robin,
> > >>
> > > >> The console dump was too large for pastebin, so I uploaded it here:
> > > >> http://dl.dropbox.com/u/7881451/build-20news-bayes-console-output.txt
> > > >>
> > > >> I performed a fresh checkout only hours ago, and I used the script
> > > >> examples/bin/build-20news-bayes.sh
> > > >> I've opted to avoid hadoop, but from what I can tell the model was
> > > >> created with success.
> > >>
> > >>
> > >> Thanks,
> > >> Vijay
> > >>
> > >>
> > >> On Mon, Jul 4, 2011 at 8:28 PM, Robin Anil <ro...@gmail.com>
> > wrote:
> > >>
> > >>> Can you send me the console dump
> > >>> Command line + Log written by the program and put it on say pastebin
> > >>>
> > >>> Robin
> > >>>
> > >>> On Mon, Jul 4, 2011 at 3:48 PM, Vijay Santhanam
> > >>> <vi...@gmail.com>wrote:
> > >>>
> > >>> > I tried deleting all the folders from the test and train data
> except
> > >>> for
> > >>> > alt.atheism, but I get the identical error.
> > >>> >
> > >>> > I might try debugging the problem in eclipse rather than from
> > >>> commandline,
> > >>> > but Eclipse doesn't quite want to work either.
> > >>> >
> > >>> >
> > >>> > On Mon, Jul 4, 2011 at 8:02 PM, Vijay Santhanam
> > >>> > <vi...@gmail.com>wrote:
> > >>> >
> > >>> > > Thanks anyway Sergey. Could you perhaps upload your bayes-model
> > >>> folder so
> > >>> > I
> > >>> > > could try that out?
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > On Mon, Jul 4, 2011 at 7:57 PM, Sergey Bartunov <sbos.net@
> > gmail.com
> > >>> > >wrote:
> > >>> > >
> > >>> > >> Well, that's strange. Sorry, I can't help you at the moment,
> maybe
> > >>> > >> someone else in the mailing list could.
> > >>> > >>
> > >>> > >> On 4 July 2011 13:49, Vijay Santhanam <
> vijay.santhanam@gmail.com>
> > >>> > wrote:
> > >>> > >> > Hi Sergey,
> > >>> > >> >
> > >>> > >> > Yes, there were no errors.
> > >>> > >> >
> > > >>> > >> > And all the model data seems to have been populated into the
> > > >>> > >> > bayes-model folder.
> > > >>> > >> > Also, each main folder in bayes-model has a _SUCCESS file.
> > > >>> > >> >
> > > >>> > >> > See the tarball of my trained model here,
> > > >>> > >> > http://dl.dropbox.com/u/7881451/bayes-model.tar.gz
> > > >>> > >> > Please compare it to your trained model if possible, I would like
> > > >>> > >> > to know if it's different in any way.
> > > >>> > >> >
> > > >>> > >> > Perhaps it's corrupted in some way.
> > >>> > >> >
> > >>> > >> > Thanks,
> > >>> > >> > Vijay
> > >>> > >> >
> > >>> > >> >
> > >>> > >> >
> > >>> > >> > On Mon, Jul 4, 2011 at 7:39 PM, Sergey Bartunov <sbos.net@
> > >>> gmail.com>
> > >>> > >> wrote:
> > >>> > >> >
> > >>> > >> >> Stop, did you _train_ the classifier successfully before
> > running
> > >>> the
> > >>> > >> >> _test_?
> > >>> > >> >>
> > >>> > >> >> On 4 July 2011 13:30, Vijay Santhanam <
> > vijay.santhanam@gmail.com
> > >>> >
> > >>> > >> wrote:
> > >>> > >> >> > Hi Sergey,
> > >>> > >> >> >
> > >>> > >> >> > I've tried using both the sh script file and following the
> > >>> > >> instructions
> > >>> > >> >> at
> > >>> > >> >> > https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html -
> > like
> > >>> you
> > >>> > >> >> suggested.
> > >>> > >> >> > Both return the same results.
> > >>> > >> >> >
> > >>> > >> >> > I've uploaded my bayes-test-input folder to dropbox, the
> > first
> > >>> file
> > >>> > >> is
> > >>> > >> >> > here...
> > >>> > >> >> >
> > >>> http://dl.dropbox.com/u/7881451/bayes-test-input/alt.atheism.txt
> > >>> > >> >> >
> > >>> > >> >> > Thanks,
> > >>> > >> >> > Vijay
> > >>> > >> >> >
> > >>> > >> >> > On Mon, Jul 4, 2011 at 7:23 PM, Sergey Bartunov <sbos.net@
> > >>> > gmail.com>
> > >>> > >> >> wrote:
> > >>> > >> >> >
> > >>> > >> >> >> Paste somewhere your  bayes-test-input file.
> > >>> > >> >> >>
> > >>> > >> >> >> On 4 July 2011 13:20, Sergey Bartunov <sbos.net@gmail.com
> >
> > >>> wrote:
> > >>> > >> >> >> > Yes, I worked WITH hadoop, but there should be no
> > >>> difference.
> > >>> > >> >> >> >
> > >>> > >> >> >> > Why do you use examples/bin/build/20news-bayes.sh
> instead
> > of
> > >>> > >> direct
> > >>> > >> >> >> > running bin/mahout? Is it the same?
> > >>> > >> >> >> >
> > >>> > >> >> >> > On 4 July 2011 13:12, Vijay Santhanam <
> > >>> > vijay.santhanam@gmail.com>
> > >>> > >> >> wrote:
> > >>> > >> >> >> >> Thanks Sergey,
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> I'm still receiving the same error after following
> those
> > >>> steps.
> > >>> > >> >> >> >> I've chosen not to use hadoop - does yours work WITH
> > >>> hadoop?
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> A few bits of info that might be relevant.
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> My examples/bin/work folder contains the expected
> folders
> > >>> from
> > >>> > >> test
> > >>> > >> >> data
> > >>> > >> >> >> >> preparation and training...
> > >>> > >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003
> > >>> > 20news-bydate-test
> > >>> > >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003
> > >>> > >> 20news-bydate-train
> > >>> > >> >> >> >> drwxr-xr-x   3 Vijay  staff  102  4 Jul 19:03
> bayes-model
> > >>> > >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 18:20
> > >>> bayes-test-input
> > >>> > >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 17:49
> > >>> bayes-train-input
> > >>> > >> >> >> >>
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> I appreciate your help, do you have any other
> > suggestions?
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> Regards,
> > >>> > >> >> >> >> Vijay
> > >>> > >> >> >> >>
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> On Mon, Jul 4, 2011 at 6:58 PM, Sergey Bartunov <
> > sbos.net@
> > >>> > >> gmail.com>
> > >>> > >> >> >> wrote:
> > >>> > >> >> >> >>
> > >>> > >> >> >> >>> When I started with Mahout I had the same errors. In
> my
> > >>> case,
> > >>> > I
> > >>> > >> just
> > >>> > >> >> >> >>> didn't run PrepareTwentyNewsgroups. You may try to
> > >>> accurately
> > >>> > >> repeat
> > >>> > >> >> >> >>> all steps from
> > >>> > >> >> https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html
> > >>> > >> >> >> >>>
> > >>> > >> >> >> >>>
> > >>> > >> >> >> >>
> > >>> > >> >> >> >>
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> --
> > >>> > >> >> >> >>  Vijay Santhanam
> > >>> > >> >> >> >>  Software Engineer
> > >>> > >> >> >> >>  http://au.linkedin.com/in/vijaysanthanam
> > >>> > >> >> >> >>  0407525087
> > >>> > >> >> >> >>
> > >>> > >> >> >> >
> > >>> > >> >> >>
> > >>> > >> >> >
> > >>> > >> >> >
> > >>> > >> >> >
> > >>> > >> >> > --
> > >>> > >> >> >  Vijay Santhanam
> > >>> > >> >> >  Software Engineer
> > >>> > >> >> >  http://au.linkedin.com/in/vijaysanthanam
> > >>> > >> >> >  0407525087
> > >>> > >> >> >
> > >>> > >> >>
> > >>> > >> >
> > >>> > >> >
> > >>> > >> >
> > >>> > >> > --
> > >>> > >> >  Vijay Santhanam
> > >>> > >> >  Software Engineer
> > >>> > >> >  http://au.linkedin.com/in/vijaysanthanam
> > >>> > >> >  0407525087
> > >>> > >> >
> > >>> > >>
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > --
> > >>> > >  Vijay Santhanam
> > >>> > >  Software Engineer
> > >>> > >  http://au.linkedin.com/in/vijaysanthanam
> > >>> > >  0407525087
> > >>> > >
> > >>> >
> > >>> >
> > >>> >
> > >>> > --
> > >>> >  Vijay Santhanam
> > >>> >  Software Engineer
> > >>> >  http://au.linkedin.com/in/vijaysanthanam
> > >>> >  0407525087
> > >>> >
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >>  Vijay Santhanam
> > >>  Software Engineer
> > >>  http://au.linkedin.com/in/vijaysanthanam
> > >>  0407525087
> > >>
> > >
> > >
> > >
> > > --
> > >  Vijay Santhanam
> > >  Software Engineer
> > >  http://au.linkedin.com/in/vijaysanthanam
> > >  0407525087
> > >
> >
> >
> >
> > --
> >  Vijay Santhanam
> >  Software Engineer
> >  http://au.linkedin.com/in/vijaysanthanam
> >  0407525087
> >
>



-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087

Re: 20news

Posted by Robin Anil <ro...@gmail.com>.
Are you using some non-standard Java character encoding?
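
One quick way to check this is to print what the JVM itself reports. The following is a small sketch using only standard JDK calls; the class name EncodingCheck is made up for this example and has nothing to do with Mahout.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration only.
class EncodingCheck {

    // Name of the charset the JVM uses by default for byte/char conversion.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The system property the default charset is usually derived from.
        System.out.println("file.encoding property: " + System.getProperty("file.encoding"));
        System.out.println("default charset: " + defaultCharsetName());
    }
}
```

If the reported charset is not UTF-8, a commonly used workaround is to start the JVM with -Dfile.encoding=UTF-8, though whether that matters for this particular error is not established in the thread.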


On Mon, Jul 4, 2011 at 5:23 PM, Vijay Santhanam
<vi...@gmail.com>wrote:

> Hi,
>
> Okay, I replaced all the tab characters with space characters for each file
> in the bayes-test-input folder and now the classifier completes without
> error.
>
> Tomorrow I'll investigate why the trainer parses the tab-separated label
> correctly, but the classifier does not. Actually, I know why the
> classifier doesn't extract the correct label: because
> org.apache.mahout.common.nlp.NGrams tokenizes via spaces only.
>
> The other mystery is why it works for everyone else except poor me :(
>
> If anyone has any ideas I'd love to hear it.
>
> Cheers,
> Vijay
>
>
>
> On Mon, Jul 4, 2011 at 9:16 PM, Vijay Santhanam
> <vi...@gmail.com>wrote:
>
> > Hi,
> >
> > I got debugger running w/ eclipse so I could watch what was happening
> under
> > the hood.
> >
> > Here's the exception again
> > Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
> >  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> >  at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
> >  at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
> >  at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
> >  at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
> >  at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
> >  at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
> >  at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
> >  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >  at java.lang.reflect.Method.invoke(Method.java:597)
> >  at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >  at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >  at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> >
> > Notice the "Label not found: alt.atheism\tfrom"
> >
> > That's an invalid label in the confusion matrix. I think it SHOULD be just
> > alt.atheism. I'm not sure how the \tfrom is getting in there, but it is.
> > Perhaps it has something to do with the way my test data was formatted.
> >
> > I'll keep digging....
> >
> > Thanks,
> > Vijay
> >
> >
> >
> > On Mon, Jul 4, 2011 at 8:52 PM, Vijay Santhanam <
> vijay.santhanam@gmail.com
> > > wrote:
> >
> >> Hi Robin,
> >>
> >> The console dump was too large for pastebin, so I uploaded it here:
> >> http://dl.dropbox.com/u/7881451/build-20news-bayes-console-output.txt
> >>
> >> I performed a fresh checkout only hours ago, and I used the script
> >> examples/bin/build-20news-bayes.sh
> >> I've opted to avoid hadoop, but from what I can tell the model was
> >> created with success.
> >>
> >>
> >> Thanks,
> >> Vijay
> >>
> >>
> >> On Mon, Jul 4, 2011 at 8:28 PM, Robin Anil <ro...@gmail.com>
> wrote:
> >>
> >>> Can you send me the console dump
> >>> Command line + Log written by the program and put it on say pastebin
> >>>
> >>> Robin
> >>>
> >>> On Mon, Jul 4, 2011 at 3:48 PM, Vijay Santhanam
> >>> <vi...@gmail.com>wrote:
> >>>
> >>> > I tried deleting all the folders from the test and train data except
> >>> for
> >>> > alt.atheism, but I get the identical error.
> >>> >
> >>> > I might try debugging the problem in eclipse rather than from
> >>> commandline,
> >>> > but Eclipse doesn't quite want to work either.
> >>> >
> >>> >
> >>> > On Mon, Jul 4, 2011 at 8:02 PM, Vijay Santhanam
> >>> > <vi...@gmail.com>wrote:
> >>> >
> >>> > > Thanks anyway Sergey. Could you perhaps upload your bayes-model
> >>> folder so
> >>> > I
> >>> > > could try that out?
> >>> > >
> >>> > >
> >>> > >
> >>> > > On Mon, Jul 4, 2011 at 7:57 PM, Sergey Bartunov <sbos.net@
> gmail.com
> >>> > >wrote:
> >>> > >
> >>> > >> Well, that's strange. Sorry, I can't help you at the moment, maybe
> >>> > >> someone else in the mailing list could.
> >>> > >>
> >>> > >> On 4 July 2011 13:49, Vijay Santhanam <vi...@gmail.com>
> >>> > wrote:
> >>> > >> > Hi Sergey,
> >>> > >> >
> >>> > >> > Yes, there were no errors.
> >>> > >> >
> >>> > >> > And all the model data seems to have been populated into the
> >>> > >> > bayes-model folder.
> >>> > >> > Also, each main folder in bayes-model has a _SUCCESS file.
> >>> > >> >
> >>> > >> > See the tarball of my trained model here,
> >>> > >> > http://dl.dropbox.com/u/7881451/bayes-model.tar.gz
> >>> > >> > Please compare it to your trained model if possible, I would like
> >>> > >> > to know if it's different in any way.
> >>> > >> >
> >>> > >> > Perhaps it's corrupted in some way.
> >>> > >> >
> >>> > >> > Thanks,
> >>> > >> > Vijay
> >>> > >> >
> >>> > >> >
> >>> > >> >
> >>> > >> > On Mon, Jul 4, 2011 at 7:39 PM, Sergey Bartunov <sbos.net@
> >>> gmail.com>
> >>> > >> wrote:
> >>> > >> >
> >>> > >> >> Stop, did you _train_ the classifier successfully before
> running
> >>> the
> >>> > >> >> _test_?
> >>> > >> >>
> >>> > >> >> On 4 July 2011 13:30, Vijay Santhanam <
> vijay.santhanam@gmail.com
> >>> >
> >>> > >> wrote:
> >>> > >> >> > Hi Sergey,
> >>> > >> >> >
> >>> > >> >> > I've tried using both the sh script file and following the
> >>> > >> instructions
> >>> > >> >> at
> >>> > >> >> > https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html -
> like
> >>> you
> >>> > >> >> suggested.
> >>> > >> >> > Both return the same results.
> >>> > >> >> >
> >>> > >> >> > I've uploaded my bayes-test-input folder to dropbox, the
> first
> >>> file
> >>> > >> is
> >>> > >> >> > here...
> >>> > >> >> >
> >>> http://dl.dropbox.com/u/7881451/bayes-test-input/alt.atheism.txt
> >>> > >> >> >
> >>> > >> >> > Thanks,
> >>> > >> >> > Vijay
> >>> > >> >> >
> >>> > >> >> > On Mon, Jul 4, 2011 at 7:23 PM, Sergey Bartunov <sbos.net@
> >>> > gmail.com>
> >>> > >> >> wrote:
> >>> > >> >> >
> >>> > >> >> >> Paste somewhere your  bayes-test-input file.
> >>> > >> >> >>
> >>> > >> >> >> On 4 July 2011 13:20, Sergey Bartunov <sb...@gmail.com>
> >>> wrote:
> >>> > >> >> >> > Yes, I worked WITH hadoop, but there should be no
> >>> difference.
> >>> > >> >> >> >
> >>> > >> >> >> > Why do you use examples/bin/build/20news-bayes.sh instead
> of
> >>> > >> direct
> >>> > >> >> >> > running bin/mahout? Is it the same?
> >>> > >> >> >> >
> >>> > >> >> >> > On 4 July 2011 13:12, Vijay Santhanam <
> >>> > vijay.santhanam@gmail.com>
> >>> > >> >> wrote:
> >>> > >> >> >> >> Thanks Sergey,
> >>> > >> >> >> >>
> >>> > >> >> >> >> I'm still receiving the same error after following those
> >>> steps.
> >>> > >> >> >> >> I've chosen not to use hadoop - does yours work WITH
> >>> hadoop?
> >>> > >> >> >> >>
> >>> > >> >> >> >> A few bits of info that might be relevant.
> >>> > >> >> >> >>
> >>> > >> >> >> >> My examples/bin/work folder contains the expected folders
> >>> from
> >>> > >> test
> >>> > >> >> data
> >>> > >> >> >> >> preparation and training...
> >>> > >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003
> >>> > 20news-bydate-test
> >>> > >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003
> >>> > >> 20news-bydate-train
> >>> > >> >> >> >> drwxr-xr-x   3 Vijay  staff  102  4 Jul 19:03 bayes-model
> >>> > >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 18:20
> >>> bayes-test-input
> >>> > >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 17:49
> >>> bayes-train-input
> >>> > >> >> >> >>
> >>> > >> >> >> >>
> >>> > >> >> >> >> I appreciate your help, do you have any other
> suggestions?
> >>> > >> >> >> >>
> >>> > >> >> >> >> Regards,
> >>> > >> >> >> >> Vijay
> >>> > >> >> >> >>
> >>> > >> >> >> >>

Re: 20news

Posted by Vijay Santhanam <vi...@gmail.com>.
Hi,

Okay, I replaced all the tab characters with space characters for each file
in the bayes-test-input folder and now the classifier completes without
error.

Tomorrow I'll investigate why the trainer parses the tab-separated label
correctly but the classifier does not. Actually, I know why the
classifier doesn't extract the correct label: it's because
org.apache.mahout.common.nlp.NGrams tokenizes on spaces only.
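
The tokenization issue described above can be illustrated with a minimal Java sketch. This is not Mahout's NGrams code: the class name, the method names, and the sample line (a label, a tab, then some text) are assumptions made up for illustration. It only shows how splitting on the space character leaves a tab inside the first token, and how either splitting on any whitespace or replacing tabs with spaces beforehand yields the bare label.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names are illustrative and not part of Mahout's API.
final class LabelSplitSketch {

    // Splits on the space character only, so a tab stays inside the first token.
    static List<String> splitOnSpaces(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Splits on any run of whitespace (spaces or tabs), so the label comes out clean.
    static List<String> splitOnWhitespace(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    // The workaround from this message: turn every tab into a space first.
    static String tabsToSpaces(String line) {
        return line.replace('\t', ' ');
    }

    public static void main(String[] args) {
        // Assumed input format: label, a tab, then the document text.
        String line = "alt.atheism\tfrom mathew subject alt atheism faq";

        // Prints alt.atheism\tfrom (the tab is shown escaped for readability).
        System.out.println(splitOnSpaces(line).get(0).replace("\t", "\\t"));
        // Prints alt.atheism
        System.out.println(splitOnWhitespace(line).get(0));
        // Prints alt.atheism
        System.out.println(splitOnSpaces(tabsToSpaces(line)).get(0));
    }
}
```

Under these assumptions, the first call reproduces the "alt.atheism\tfrom" label seen in the exception, while the other two return the label the confusion matrix expects.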

The other mystery is why it works for everyone else except poor me :(

If anyone has any ideas I'd love to hear them.

Cheers,
Vijay



On Mon, Jul 4, 2011 at 9:16 PM, Vijay Santhanam
<vi...@gmail.com>wrote:

> Hi,
>
> I got the debugger running in Eclipse so I could watch what was happening
> under the hood.
>
> Here's the exception again:
> Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>     at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
>     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
>     at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
>     at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
>     at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
>     at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
>     at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>
> Notice the "Label not found: alt.atheism\tfrom"
>
> That's an invalid label in the confusion matrix. I think it SHOULD be just
> alt.atheism. I'm not sure how the \tfrom is getting in there, but it is.
> Perhaps it has something to do with the way my test data was formatted.
>
> I'll keep digging....
>
> Thanks,
> Vijay



-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087

Re: 20news

Posted by Vijay Santhanam <vi...@gmail.com>.
Hi,

I got the debugger running in Eclipse so I could watch what was happening
under the hood.

Here's the exception again:
Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
    at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
    at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
    at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
    at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
    at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

Notice the "Label not found: alt.atheism\tfrom"

That's an invalid label in the confusion matrix. I think it SHOULD be just
alt.atheism. I'm not sure how the \tfrom is getting in there, but it is.
Perhaps it has something to do with the way my test data was formatted.
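
To make the failure mode concrete, here is a minimal Java sketch of a label-indexed count table that rejects any label it was not built with. It is not Mahout's ConfusionMatrix: the class name, the constructor, and the two sample labels are assumptions for illustration, and the check is written in plain Java rather than with Guava's Preconditions. It only shows why a label carrying a stray "\tfrom" suffix is treated as unknown.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Mahout's ConfusionMatrix: a count table keyed by
// label that refuses labels it was not constructed with.
final class ConfusionMatrixSketch {
    private final Map<String, Integer> labelIndex = new LinkedHashMap<>();
    private final int[][] counts;

    ConfusionMatrixSketch(List<String> labels) {
        for (String label : labels) {
            labelIndex.put(label, labelIndex.size());
        }
        counts = new int[labels.size()][labels.size()];
    }

    // Records one test document: its true label and the label the classifier chose.
    void addInstance(String correctLabel, String classifiedLabel) {
        counts[indexOf(correctLabel)][indexOf(classifiedLabel)]++;
    }

    int getCount(String correctLabel, String classifiedLabel) {
        return counts[indexOf(correctLabel)][indexOf(classifiedLabel)];
    }

    // Looks up a label's row/column; an unknown label is an argument error.
    private int indexOf(String label) {
        Integer index = labelIndex.get(label);
        if (index == null) {
            throw new IllegalArgumentException("Label not found: " + label);
        }
        return index;
    }

    public static void main(String[] args) {
        ConfusionMatrixSketch matrix =
            new ConfusionMatrixSketch(Arrays.asList("alt.atheism", "sci.space"));
        matrix.addInstance("alt.atheism", "alt.atheism");
        try {
            // The label extracted from the test file still carries "\tfrom".
            matrix.addInstance("alt.atheism\tfrom", "alt.atheism");
        } catch (IllegalArgumentException e) {
            // Prints: Label not found: alt.atheism\tfrom (tab shown escaped)
            System.out.println(e.getMessage().replace("\t", "\\t"));
        }
    }
}
```

In this sketch the clean label is counted normally, while the tab-suffixed one fails the lookup, which matches the shape of the exception above.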

I'll keep digging....

Thanks,
Vijay



On Mon, Jul 4, 2011 at 8:52 PM, Vijay Santhanam
<vi...@gmail.com>wrote:

> Hi Robin,
>
> The console dump was too large for pastebin, so I uploaded it here --
> http://dl.dropbox.com/u/7881451/build-20news-bayes-console-output.txt
>
> I performed a fresh checkout only hours ago, and I used script
> examples/bin/build-20news-bayes.sh
> I've opted to avoid hadoop, but from what I can tell the model was created
> with success.
>
>
> Thanks,
> Vijay
>
>
> On Mon, Jul 4, 2011 at 8:28 PM, Robin Anil <ro...@gmail.com> wrote:
>
>> Can you send me the console dump (the command line plus the log written
>> by the program)? Put it on, say, pastebin.
>>
>> Robin
>>
>> On Mon, Jul 4, 2011 at 3:48 PM, Vijay Santhanam
>> <vi...@gmail.com>wrote:
>>
>> > I tried deleting all the folders from the test and train data except for
>> > alt.atheism, but I get the identical error.
>> >
>> > I might try debugging the problem in Eclipse rather than from the command
>> > line, but Eclipse doesn't quite want to work either.
>> >
>> >
>> > On Mon, Jul 4, 2011 at 8:02 PM, Vijay Santhanam
>> > <vi...@gmail.com>wrote:
>> >
>> > > Thanks anyway Sergey. Could you perhaps upload your bayes-model folder
>> > > so I could try that out?
>> > >
>> > >
>> > >
>> > > On Mon, Jul 4, 2011 at 7:57 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
>> > >
>> > >> Well, that's strange. Sorry, I can't help you at the moment, maybe
>> > >> someone else in the mailing list could.
>> > >>
>> > >> On 4 July 2011 13:49, Vijay Santhanam <vi...@gmail.com> wrote:
>> > >> > Hi Sergey,
>> > >> >
>> > >> > Yes, there were no errors.
>> > >> >
>> > >> > And all the model data seems to have been populated into the
>> > >> > bayes-model folder.
>> > >> > Also, each main folder in bayes-model has a _SUCCESS file.
>> > >> >
>> > >> > See the tarball of my trained model here,
>> > >> > http://dl.dropbox.com/u/7881451/bayes-model.tar.gz
>> > >> > Please compare it to your trained model if possible, I would like to
>> > >> > know if it's different in any way.
>> > >> >
>> > >> > Perhaps it's corrupted in some way.
>> > >> >
>> > >> > Thanks,
>> > >> > Vijay
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Mon, Jul 4, 2011 at 7:39 PM, Sergey Bartunov <sbos.net@
>> gmail.com>
>> > >> wrote:
>> > >> >
>> > >> >> Stop, did you _train_ the classifier successfully before running
>> the
>> > >> >> _test_?
>> > >> >>
>> > >> >> On 4 July 2011 13:30, Vijay Santhanam <vi...@gmail.com>
>> > >> wrote:
>> > >> >> > Hi Sergey,
>> > >> >> >
>> > >> >> > I've tried using both the sh script file and following the
>> > >> instructions
>> > >> >> at
>> > >> >> > https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html - like
>> you
>> > >> >> suggested.
>> > >> >> > Both return the same results.
>> > >> >> >
>> > >> >> > I've uploaded my bayes-test-input folder to dropbox, the first
>> file
>> > >> is
>> > >> >> > here...
>> > >> >> >
>> http://dl.dropbox.com/u/7881451/bayes-test-input/alt.atheism.txt
>> > >> >> >
>> > >> >> > Thanks,
>> > >> >> > Vijay
>> > >> >> >
>> > >> >> > On Mon, Jul 4, 2011 at 7:23 PM, Sergey Bartunov <sbos.net@
>> > gmail.com>
>> > >> >> wrote:
>> > >> >> >
>> > >> >> >> Paste somewhere your  bayes-test-input file.
>> > >> >> >>
>> > >> >> >> On 4 July 2011 13:20, Sergey Bartunov <sb...@gmail.com>
>> wrote:
>> > >> >> >> > Yes, I worked WITH hadoop, but there should be no difference.
>> > >> >> >> >
>> > >> >> >> > Why do you use examples/bin/build/20news-bayes.sh instead of
>> > >> direct
>> > >> >> >> > running bin/mahout? Is it the same?
>> > >> >> >> >
>> > >> >> >> > On 4 July 2011 13:12, Vijay Santhanam <
>> > vijay.santhanam@gmail.com>
>> > >> >> wrote:
>> > >> >> >> >> Thanks Sergey,
>> > >> >> >> >>
>> > >> >> >> >> I'm still receiving the same error after following those
>> steps.
>> > >> >> >> >> I've chosen not to use hadoop - does yours work WITH hadoop?
>> > >> >> >> >>
>> > >> >> >> >> A few bits of info that might be relevant.
>> > >> >> >> >>
>> > >> >> >> >> My examples/bin/work folder contains the expected folders
>> from
>> > >> test
>> > >> >> data
>> > >> >> >> >> preparation and training...
>> > >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003
>> > 20news-bydate-test
>> > >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003
>> > >> 20news-bydate-train
>> > >> >> >> >> drwxr-xr-x   3 Vijay  staff  102  4 Jul 19:03 bayes-model
>> > >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 18:20
>> bayes-test-input
>> > >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 17:49
>> bayes-train-input
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> I appreciate your help, do you have any other suggestions?
>> > >> >> >> >>
>> > >> >> >> >> Regards,
>> > >> >> >> >> Vijay
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> On Mon, Jul 4, 2011 at 6:58 PM, Sergey Bartunov <sbos.net@
>> > >> gmail.com>
>> > >> >> >> wrote:
>> > >> >> >> >>
>> > >> >> >> >>> When I started with Mahout I had the same errors. In my
>> case,
>> > I
>> > >> just
>> > >> >> >> >>> didn't run PrepareTwentyNewsgroups. You may try to
>> accurately
>> > >> repeat
>> > >> >> >> >>> all steps from
>> > >> >> https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html
>> > >> >> >> >>>
>> > >> >> >> >>> On 4 July 2011 12:52, Vijay Santhanam <
>> > >> vijay.santhanam@gmail.com>
>> > >> >> >> wrote:
>> > >> >> >> >>> > Hi All,
>> > >> >> >> >>> >
>> > >> >> >> >>> > I'm new to Mahout and I'm interested in experimenting
>> with
>> > >> it's
>> > >> >> >> >>> classifiers.
>> > >> >> >> >>> >
>> > >> >> >> >>> > Right now, I'm just trying to get up and running with the
>> > >> demo's
>> > >> >> and
>> > >> >> >> >>> > examples.
>> > >> >> >> >>> >
>> > >> >> >> >>> > After checking out the mahout trunk, I've tried running
>> the
>> > >> >> >> >>> classification
>> > >> >> >> >>> > example 20news, but after running the
>> > >> >> >> >>> ./examples/bin/build/20news-bayes.sh
>> > >> >> >> >>> > script I get the following error during the
>> classification
>> > >> phase.
>> > >> >> >> >>> >
>> > >> >> >> >>> > Does anyone else get the same thing? Or have any
>> > >> recommendations
>> > >> >> >> about
>> > >> >> >> >>> how
>> > >> >> >> >>> > to fix it?
>> > >> >> >> >>> > I'd just like to get a sample classifier working before I
>> > >> embark
>> > >> >> on
>> > >> >> >> my
>> > >> >> >> >>> own
>> > >> >> >> >>> > classification journey.
>> > >> >> >> >>> >
>> > >> >> >> >>> >
>> > >> >> >> >>> > INFO: Loading model from:
>> > >> >> >> >>> > {basePath=examples/bin/work/20news-bydate/bayes-model,
>> > >> >> >> >>> classifierType=bayes,
>> > >> >> >> >>> > alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false,
>> > >> >> >> encoding=UTF-8,
>> > >> >> >> >>> > defaultCat=unknown,
>> > >> >> >> >>> >
>> > testDirPath=examples/bin/work/20news-bydate/bayes-test-input}
>> > >> >> >> >>> > Jul 4, 2011 6:28:25 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: Testing Bayes Classifier
>> > >> >> >> >>> > Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: Read 50000 feature weights
>> > >> >> >> >>> > Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: Read 100000 feature weights
>> > >> >> >> >>> > Jul 4, 2011 6:28:28 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: 193370.88331085522
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: rec.sport.baseball -129829.34738930278
>> > 531784.7805631821
>> > >> >> >> >>> > -0.2441388925268003
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: sci.crypt -193023.42370049533 531784.7805631821
>> > >> >> >> -0.3629728242618669
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: rec.sport.hockey -167853.6159738822
>> 531784.7805631821
>> > >> >> >> >>> > -0.31564200802459647
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: talk.politics.guns -203524.0148974065
>> > 531784.7805631821
>> > >> >> >> >>> > -0.3827187658170024
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: soc.religion.christian -163900.9258713857
>> > >> 531784.7805631821
>> > >> >> >> >>> > -0.308209132457322
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: sci.electronics -142854.1677345925
>> 531784.7805631821
>> > >> >> >> >>> > -0.26863154598614886
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: comp.os.ms-windows.misc -531784.7805631821
>> > >> 531784.7805631821
>> > >> >> >> -1.0
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: misc.forsale -143454.70176448982 531784.7805631821
>> > >> >> >> >>> > -0.26976082619845826
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: talk.religion.misc -139428.73484148504
>> > 531784.7805631821
>> > >> >> >> >>> > -0.2621901565024562
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: alt.atheism -139569.06867597546 531784.7805631821
>> > >> >> >> >>> -0.2624540486626301
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: comp.windows.x -178029.10523376046
>> 531784.7805631821
>> > >> >> >> >>> > -0.33477660839638973
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: talk.politics.mideast -193075.00789450994
>> > >> 531784.7805631821
>> > >> >> >> >>> > -0.36306982627452317
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: comp.sys.ibm.pc.hardware -138410.02049984262
>> > >> >> 531784.7805631821
>> > >> >> >> >>> > -0.2602745049477736
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: comp.sys.mac.hardware -125200.9927438868
>> > >> 531784.7805631821
>> > >> >> >> >>> > -0.23543545682389364
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: sci.space -192437.0009266271 531784.7805631821
>> > >> >> >> -0.3618700797018455
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: rec.motorcycles -143142.20855440624
>> 531784.7805631821
>> > >> >> >> >>> > -0.26917319522159455
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: rec.autos -141800.97549909537 531784.7805631821
>> > >> >> >> -0.2666510601317365
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: comp.graphics -166882.18654471825 531784.7805631821
>> > >> >> >> >>> > -0.3138152738556811
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: talk.politics.misc -165196.84193278523
>> > 531784.7805631821
>> > >> >> >> >>> > -0.3106460507535303
>> > >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter
>> info
>> > >> >> >> >>> > INFO: sci.med -192698.5183245711 531784.7805631821
>> > >> >> >> -0.36236185270382393
>> > >> >> >> >>> > Exception in thread "main"
>> > java.lang.IllegalArgumentException:
>> > >> >> Label
>> > >> >> >> not
>> > >> >> >> >>> > found: alt.atheism from
>> > >> >> >> >>> >  at
>> > >> >> >> >>> >
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>> > >> >> >> >>> > at
>> > >> >> >> >>> >
>> > >> >> >> >>>
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
>> > >> >> >> >>> >  at
>> > >> >> >> >>> >
>> > >> >> >> >>>
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
>> > >> >> >> >>> > at
>> > >> >> >> >>> >
>> > >> >> >> >>>
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
>> > >> >> >> >>> >  at
>> > >> >> >> >>> >
>> > >> >> >> >>>
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
>> > >> >> >> >>> > at
>> > >> >> >> >>> >
>> > >> >> >> >>>
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
>> > >> >> >> >>> >  at
>> > >> >> >> >>> >
>> > >> >> >> >>>
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
>> > >> >> >> >>> > at
>> > >> >> >> >>> >
>> > >> >> >> >>>
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
>> > >> >> >> >>> >  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> > >> Method)
>> > >> >> >> >>> > at
>> > >> >> >> >>> >
>> > >> >> >> >>>
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > >> >> >> >>> >  at
>> > >> >> >> >>> >
>> > >> >> >> >>>
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >> >> >> >>> > at java.lang.reflect.Method.invoke(Method.java:597)
>> > >> >> >> >>> >  at
>> > >> >> >> >>> >
>> > >> >> >> >>>
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> > >> >> >> >>> > at
>> > >> >> >>
>> > org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> > >> >> >> >>> >  at
>> > >> >> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>> > >> >> >> >>> >
>> > >> >> >> >>> >
>> > >> >> >> >>> > Any help is great appreciated.
>> > >> >> >> >>> >
>> > >> >> >> >>> > Regards,
>> > >> >> >> >>> > --
>> > >> >> >> >>> >  Vijay Santhanam
>> > >> >> >> >>> >  Software Engineer
>> > >> >> >> >>> >
>> > >> >> >> >>>
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> --
>> > >> >> >> >>  Vijay Santhanam
>> > >> >> >> >>  Software Engineer
>> > >> >> >> >>  http://au.linkedin.com/in/vijaysanthanam
>> > >> >> >> >>  0407525087
>> > >> >> >> >>
>> > >> >> >> >
>> > >> >> >>
>> > >> >> >
>> > >> >> >
>> > >> >> >
>> > >> >> > --
>> > >> >> >  Vijay Santhanam
>> > >> >> >  Software Engineer
>> > >> >> >  http://au.linkedin.com/in/vijaysanthanam
>> > >> >> >  0407525087
>> > >> >> >
>> > >> >>
>> > >> >
>> > >> >
>> > >> >
>> > >> > --
>> > >> >  Vijay Santhanam
>> > >> >  Software Engineer
>> > >> >  http://au.linkedin.com/in/vijaysanthanam
>> > >> >  0407525087
>> > >> >
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > >  Vijay Santhanam
>> > >  Software Engineer
>> > >  http://au.linkedin.com/in/vijaysanthanam
>> > >  0407525087
>> > >
>> >
>> >
>> >
>> > --
>> >  Vijay Santhanam
>> >  Software Engineer
>> >  http://au.linkedin.com/in/vijaysanthanam
>> >  0407525087
>> >
>>
>
>
>
> --
>  Vijay Santhanam
>  Software Engineer
>  http://au.linkedin.com/in/vijaysanthanam
>  0407525087
>



-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087

Re: 20news

Posted by Vijay Santhanam <vi...@gmail.com>.
Hi Robin,

The console dump was too large for pastebin, so I uploaded it here --
http://dl.dropbox.com/u/7881451/build-20news-bayes-console-output.txt

I performed a fresh checkout only hours ago, and I used the script
examples/bin/build-20news-bayes.sh
I've opted to avoid Hadoop, but from what I can tell the model was created
successfully.
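
A quick way to double-check that the training jobs finished is to look for Hadoop's _SUCCESS marker file in each output folder. This is a minimal sketch, assuming every subfolder of bayes-model is the output directory of one job. The function name folders_missing_marker is made up for illustration.

```python
from pathlib import Path


def folders_missing_marker(model_dir, marker="_SUCCESS"):
    """List subfolders of model_dir that do not contain the marker file.

    Assumption: each immediate subfolder is one job's output directory.
    """
    root = Path(model_dir)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and not (child / marker).exists()
    )
```

An empty list from folders_missing_marker("examples/bin/work/bayes-model") only indicates that each job reported completion. It does not show that the model contents are what the classifier expects.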


Thanks,
Vijay





-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087

Re: 20news

Posted by Robin Anil <ro...@gmail.com>.
Can you send me the console dump?
Put the command line plus the log written by the program on, say, pastebin.

Robin

On Mon, Jul 4, 2011 at 3:48 PM, Vijay Santhanam
<vi...@gmail.com> wrote:

> I tried deleting all the folders from the test and train data except for
> alt.atheism, but I get the identical error.
>
> I might try debugging the problem in Eclipse rather than from the command line,
> but Eclipse doesn't quite want to work either.
>
>
> On Mon, Jul 4, 2011 at 8:02 PM, Vijay Santhanam
> <vi...@gmail.com> wrote:
>
> > Thanks anyway Sergey. Could you perhaps upload your bayes-model folder so I
> > could try that out?
> >
> >
> >
> > On Mon, Jul 4, 2011 at 7:57 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
> >
> >> Well, that's strange. Sorry, I can't help you at the moment, maybe
> >> someone else on the mailing list could.
> >>
> >> On 4 July 2011 13:49, Vijay Santhanam <vi...@gmail.com> wrote:
> >> > Hi Sergey,
> >> >
> >> > Yes, there were no errors.
> >> >
> >> > And all the model data seems to have been populated into the bayes-model
> >> > folder.
> >> > Also, each main folder in bayes-model has a _SUCCESS file.
> >> >
> >> > See the tarball of my trained model here,
> >> > http://dl.dropbox.com/u/7881451/bayes-model.tar.gz
> >> > Please compare it to your trained model if possible, I would like to
> >> > know if it's different in any way.
> >> >
> >> > Perhaps it's corrupted in some way.
> >> >
> >> > Thanks,
> >> > Vijay
> >> >
> >> >
> >> >
> >> > On Mon, Jul 4, 2011 at 7:39 PM, Sergey Bartunov <sb...@gmail.com> wrote:
> >> >
> >> >> Stop, did you _train_ the classifier successfully before running the
> >> >> _test_?
> >> >>
> >> >> On 4 July 2011 13:30, Vijay Santhanam <vi...@gmail.com> wrote:
> >> >> > Hi Sergey,
> >> >> >
> >> >> > I've tried using both the sh script file and following the
> >> >> > instructions at
> >> >> > https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html - like you
> >> >> > suggested.
> >> >> > Both return the same results.
> >> >> >
> >> >> > I've uploaded my bayes-test-input folder to dropbox, the first file
> >> >> > is here...
> >> >> > http://dl.dropbox.com/u/7881451/bayes-test-input/alt.atheism.txt
> >> >> >
> >> >> > Thanks,
> >> >> > Vijay
> >> >> >
> >> >> > On Mon, Jul 4, 2011 at 7:23 PM, Sergey Bartunov <sbos.net@gmail.com> wrote:
> >> >> >
> >> >> >> Paste your bayes-test-input file somewhere.
> >> >> >>
> >> >> >> On 4 July 2011 13:20, Sergey Bartunov <sb...@gmail.com> wrote:
> >> >> >> > Yes, I worked WITH hadoop, but there should be no difference.
> >> >> >> >
> >> >> >> > Why do you use examples/bin/build/20news-bayes.sh instead of
> >> >> >> > running bin/mahout directly? Is it the same?
> >> >> >> >
> >> >> >> > On 4 July 2011 13:12, Vijay Santhanam <vijay.santhanam@gmail.com> wrote:
> >> >> >> >> Thanks Sergey,
> >> >> >> >>
> >> >> >> >> I'm still receiving the same error after following those steps.
> >> >> >> >> I've chosen not to use hadoop - does yours work WITH hadoop?
> >> >> >> >>
> >> >> >> >> A few bits of info that might be relevant.
> >> >> >> >>
> >> >> >> >> My examples/bin/work folder contains the expected folders from
> >> >> >> >> test data preparation and training...
> >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003 20news-bydate-test
> >> >> >> >> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003 20news-bydate-train
> >> >> >> >> drwxr-xr-x   3 Vijay  staff  102  4 Jul 19:03 bayes-model
> >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 18:20 bayes-test-input
> >> >> >> >> drwxr-xr-x  22 Vijay  staff  748  4 Jul 17:49 bayes-train-input
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> I appreciate your help, do you have any other suggestions?
> >> >> >> >>
> >> >> >> >> Regards,
> >> >> >> >> Vijay
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Mon, Jul 4, 2011 at 6:58 PM, Sergey Bartunov <sbos.net@
> >> gmail.com>
> >> >> >> wrote:
> >> >> >> >>
> >> >> >> >>> When I started with Mahout I had the same errors. In my case,
> I
> >> just
> >> >> >> >>> didn't run PrepareTwentyNewsgroups. You may try to accurately
> >> repeat
> >> >> >> >>> all steps from
> >> >> https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html
> >> >> >> >>>
> >> >> >> >>> On 4 July 2011 12:52, Vijay Santhanam <
> >> vijay.santhanam@gmail.com>
> >> >> >> wrote:
> >> >> >> >>> > Hi All,
> >> >> >> >>> >
> >> >> >> >>> > I'm new to Mahout and I'm interested in experimenting with
> >> it's
> >> >> >> >>> classifiers.
> >> >> >> >>> >
> >> >> >> >>> > Right now, I'm just trying to get up and running with the
> >> demo's
> >> >> and
> >> >> >> >>> > examples.
> >> >> >> >>> >
> >> >> >> >>> > After checking out the mahout trunk, I've tried running the
> >> >> >> >>> classification
> >> >> >> >>> > example 20news, but after running the
> >> >> >> >>> ./examples/bin/build/20news-bayes.sh
> >> >> >> >>> > script I get the following error during the classification
> >> phase.
> >> >> >> >>> >
> >> >> >> >>> > Does anyone else get the same thing? Or have any
> >> recommendations
> >> >> >> about
> >> >> >> >>> how
> >> >> >> >>> > to fix it?
> >> >> >> >>> > I'd just like to get a sample classifier working before I
> >> embark
> >> >> on
> >> >> >> my
> >> >> >> >>> own
> >> >> >> >>> > classification journey.
> >> >> >> >>> >
> >> >> >> >>> >
> >> >> >> >>> > INFO: Loading model from:
> >> >> >> >>> > {basePath=examples/bin/work/20news-bydate/bayes-model,
> >> >> >> >>> classifierType=bayes,
> >> >> >> >>> > alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false,
> >> >> >> encoding=UTF-8,
> >> >> >> >>> > defaultCat=unknown,
> >> >> >> >>> >
> testDirPath=examples/bin/work/20news-bydate/bayes-test-input}
> >> >> >> >>> > Jul 4, 2011 6:28:25 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: Testing Bayes Classifier
> >> >> >> >>> > Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: Read 50000 feature weights
> >> >> >> >>> > Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: Read 100000 feature weights
> >> >> >> >>> > Jul 4, 2011 6:28:28 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: 193370.88331085522
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: rec.sport.baseball -129829.34738930278
> 531784.7805631821
> >> >> >> >>> > -0.2441388925268003
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: sci.crypt -193023.42370049533 531784.7805631821
> >> >> >> -0.3629728242618669
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: rec.sport.hockey -167853.6159738822 531784.7805631821
> >> >> >> >>> > -0.31564200802459647
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: talk.politics.guns -203524.0148974065
> 531784.7805631821
> >> >> >> >>> > -0.3827187658170024
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: soc.religion.christian -163900.9258713857
> >> 531784.7805631821
> >> >> >> >>> > -0.308209132457322
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: sci.electronics -142854.1677345925 531784.7805631821
> >> >> >> >>> > -0.26863154598614886
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: comp.os.ms-windows.misc -531784.7805631821
> >> 531784.7805631821
> >> >> >> -1.0
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: misc.forsale -143454.70176448982 531784.7805631821
> >> >> >> >>> > -0.26976082619845826
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: talk.religion.misc -139428.73484148504
> 531784.7805631821
> >> >> >> >>> > -0.2621901565024562
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: alt.atheism -139569.06867597546 531784.7805631821
> >> >> >> >>> -0.2624540486626301
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: comp.windows.x -178029.10523376046 531784.7805631821
> >> >> >> >>> > -0.33477660839638973
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: talk.politics.mideast -193075.00789450994
> >> 531784.7805631821
> >> >> >> >>> > -0.36306982627452317
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: comp.sys.ibm.pc.hardware -138410.02049984262
> >> >> 531784.7805631821
> >> >> >> >>> > -0.2602745049477736
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: comp.sys.mac.hardware -125200.9927438868
> >> 531784.7805631821
> >> >> >> >>> > -0.23543545682389364
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: sci.space -192437.0009266271 531784.7805631821
> >> >> >> -0.3618700797018455
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: rec.motorcycles -143142.20855440624 531784.7805631821
> >> >> >> >>> > -0.26917319522159455
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: rec.autos -141800.97549909537 531784.7805631821
> >> >> >> -0.2666510601317365
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: comp.graphics -166882.18654471825 531784.7805631821
> >> >> >> >>> > -0.3138152738556811
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: talk.politics.misc -165196.84193278523
> 531784.7805631821
> >> >> >> >>> > -0.3106460507535303
> >> >> >> >>> > Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> >> >> >> >>> > INFO: sci.med -192698.5183245711 531784.7805631821
> >> >> >> -0.36236185270382393
> >> >> >> >>> > Exception in thread "main"
> java.lang.IllegalArgumentException:
> >> >> Label
> >> >> >> not
> >> >> >> >>> > found: alt.atheism from
> >> >> >> >>> >  at
> >> >> >> >>> >
> >> >> >>
> >> >>
> >>
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> >> >> >> >>> > at
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >>
> >> >>
> >>
> org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
> >> >> >> >>> >  at
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >>
> >> >>
> >>
> org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
> >> >> >> >>> > at
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >>
> >> >>
> >>
> org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
> >> >> >> >>> >  at
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >>
> >> >>
> >>
> org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
> >> >> >> >>> > at
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >>
> >> >>
> >>
> org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
> >> >> >> >>> >  at
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >>
> >> >>
> >>
> org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
> >> >> >> >>> > at
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >>
> >> >>
> >>
> org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
> >> >> >> >>> >  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> >> Method)
> >> >> >> >>> > at
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >>
> >> >>
> >>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> >> >> >>> >  at
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >>
> >> >>
> >>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >> >> >>> > at java.lang.reflect.Method.invoke(Method.java:597)
> >> >> >> >>> >  at
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >>
> >> >>
> >>
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >> >> >> >>> > at
> >> >> >>
> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >> >> >> >>> >  at
> >> >> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> >> >> >> >>> >
> >> >> >> >>> >
> >> >> >> >>> > Any help is great appreciated.
> >> >> >> >>> >
> >> >> >> >>> > Regards,
> >> >> >> >>> > --
> >> >> >> >>> >  Vijay Santhanam
> >> >> >> >>> >  Software Engineer
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >>  Vijay Santhanam
> >> >> >> >>  Software Engineer
> >> >> >> >>  http://au.linkedin.com/in/vijaysanthanam
> >> >> >> >>  0407525087
> >> >> >> >>
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> >  Vijay Santhanam
> >> >> >  Software Engineer
> >> >> >  http://au.linkedin.com/in/vijaysanthanam
> >> >> >  0407525087
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> >  Vijay Santhanam
> >> >  Software Engineer
> >> >  http://au.linkedin.com/in/vijaysanthanam
> >> >  0407525087
> >> >
> >>
> >
> >
> >
> > --
> >  Vijay Santhanam
> >  Software Engineer
> >  http://au.linkedin.com/in/vijaysanthanam
> >  0407525087
> >
>
>
>
> --
>  Vijay Santhanam
>  Software Engineer
>  http://au.linkedin.com/in/vijaysanthanam
>  0407525087
>

Re: 20news

Posted by Vijay Santhanam <vi...@gmail.com>.
I tried deleting all the folders from the test and train data except for
alt.atheism, but I get the identical error.

I might try debugging the problem in Eclipse rather than from the command line,
but Eclipse doesn't quite want to work either.
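
For anyone debugging this, here is a minimal sketch of how a "Label not found"
failure of this kind can arise. It assumes the confusion matrix only accepts
labels it was constructed with and rejects anything else with a precondition
check. The class LabelCounts is hypothetical and is not Mahout's
ConfusionMatrix. The idea that the looked-up label was "alt.atheism from"
rather than "alt.atheism" is only one possible reading of the exception
message, not a confirmed diagnosis.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a confusion matrix's label bookkeeping.
// This is NOT Mahout's ConfusionMatrix; it only illustrates the failure mode.
class LabelCounts {
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    LabelCounts(List<String> knownLabels) {
        // Register every label up front with a count of zero.
        for (String label : knownLabels) {
            counts.put(label, 0);
        }
    }

    // Rejects any label that was not registered, in the same spirit as a
    // Preconditions.checkArgument guard.
    void increment(String label) {
        if (!counts.containsKey(label)) {
            throw new IllegalArgumentException("Label not found: " + label);
        }
        counts.merge(label, 1, Integer::sum);
    }

    int count(String label) {
        return counts.getOrDefault(label, 0);
    }

    public static void main(String[] args) {
        LabelCounts matrix = new LabelCounts(Arrays.asList("alt.atheism", "sci.med"));
        matrix.increment("alt.atheism");           // exact match: accepted
        try {
            matrix.increment("alt.atheism from");  // extra text in the label: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real cause is similar, deleting the other category folders would not
help, because the mismatch would come from how each label string is parsed
rather than from which categories are present.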





-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087

Re: 20news

Posted by Vijay Santhanam <vi...@gmail.com>.
Thanks anyway Sergey. Could you perhaps upload your bayes-model folder so I
could try that out?



On Mon, Jul 4, 2011 at 7:57 PM, Sergey Bartunov <sb...@gmail.com> wrote:

> Well, that's strange. Sorry, I can't help you at the moment, maybe
> someone else in the mailing list could.
> >  0407525087
> >
>



-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087

Re: 20news

Posted by Sergey Bartunov <sb...@gmail.com>.
Well, that's strange. Sorry, I can't help you at the moment; maybe
someone else on the mailing list can.


Re: 20news

Posted by Vijay Santhanam <vi...@gmail.com>.
Hi Sergey,

Yes, there were no errors.

And all the model data seems to have been populated into the bayes-model folder.
Also, each main folder in bayes-model has a _SUCCESS file.

See the tarball of my trained model here,
http://dl.dropbox.com/u/7881451/bayes-model.tar.gz
Please compare it to your trained model if possible; I would like to know if
it's different in any way.

Perhaps it's corrupted in some way.

Thanks,
Vijay






-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087

Re: 20news

Posted by Sergey Bartunov <sb...@gmail.com>.
Stop, did you _train_ the classifier successfully before running the _test_?


Re: 20news

Posted by Vijay Santhanam <vi...@gmail.com>.
Hi Sergey,

I've tried both the sh script and the instructions at
https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html, as you suggested.
Both give the same error.

I've uploaded my bayes-test-input folder to Dropbox. The first file is
here:
http://dl.dropbox.com/u/7881451/bayes-test-input/alt.atheism.txt

Thanks,
Vijay

On Mon, Jul 4, 2011 at 7:23 PM, Sergey Bartunov <sb...@gmail.com> wrote:

> Paste somewhere your  bayes-test-input file.



-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087

Re: 20news

Posted by Sergey Bartunov <sb...@gmail.com>.
Please paste your bayes-test-input file somewhere.
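
For anyone who would rather inspect such a file locally, here is a minimal sketch of a label check. It assumes (this is not confirmed anywhere in this thread) that each line of a prepared input file is a label, a tab, then the document text. LabelCheck, firstField and unknownLabels are hypothetical names for illustration and are not part of Mahout.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Mahout.
class LabelCheck {

    // Assumption: a prepared line looks like "<label><TAB><document text>".
    // If there is no tab, the whole line is treated as the label.
    static String firstField(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? line : line.substring(0, tab);
    }

    // Collects every label seen in the lines that is not in knownLabels.
    static Set<String> unknownLabels(List<String> lines, Set<String> knownLabels) {
        Set<String> unknown = new LinkedHashSet<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String label = firstField(line);
            if (!knownLabels.contains(label)) {
                unknown.add(label);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> known = new LinkedHashSet<>(Arrays.asList("alt.atheism", "sci.med"));
        List<String> lines = Arrays.asList(
            "alt.atheism\tfrom someone subject something",  // tab-separated
            "alt.atheism from someone subject something");  // space instead of tab
        // Any label printed here would not match a label the model knows.
        System.out.println(unknownLabels(lines, known));
    }
}
```

To use it on a real file, read the lines with java.nio.file.Files.readAllLines and pass in the 20 newsgroup names as the known labels. A non-empty result would suggest the labels in the test input do not line up with the labels the model was trained on.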

On 4 July 2011 13:20, Sergey Bartunov <sb...@gmail.com> wrote:
> Yes, I worked WITH hadoop, but there should be no difference.
>
> Why do you use examples/bin/build/20news-bayes.sh instead of direct
> running bin/mahout? Is it the same?

Re: 20news

Posted by Sergey Bartunov <sb...@gmail.com>.
Yes, I ran it WITH Hadoop, but that should make no difference.

Why do you use examples/bin/build/20news-bayes.sh instead of running
bin/mahout directly? Does it do the same thing?

On 4 July 2011 13:12, Vijay Santhanam <vi...@gmail.com> wrote:
> Thanks Sergey,
>
> I'm still receiving the same error after following those steps.
> I've chosen not to use hadoop - does yours work WITH hadoop?
>
> A few bits of info that might be relevant.
>
> My examples/bin/work folder contains the expected folders from test data
> preparation and training...
> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003 20news-bydate-test
> drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003 20news-bydate-train
> drwxr-xr-x   3 Vijay  staff  102  4 Jul 19:03 bayes-model
> drwxr-xr-x  22 Vijay  staff  748  4 Jul 18:20 bayes-test-input
> drwxr-xr-x  22 Vijay  staff  748  4 Jul 17:49 bayes-train-input
>
>
> I appreciate your help, do you have any other suggestions?
>
> Regards,
> Vijay

Re: 20news

Posted by Vijay Santhanam <vi...@gmail.com>.
Thanks Sergey,

I'm still receiving the same error after following those steps.
I've chosen not to use Hadoop - does yours work WITH Hadoop?

A few bits of info that might be relevant.

My examples/bin/work folder contains the expected folders from test data
preparation and training...
drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003 20news-bydate-test
drwxr-xr-x@ 22 Vijay  staff  748 18 Mar  2003 20news-bydate-train
drwxr-xr-x   3 Vijay  staff  102  4 Jul 19:03 bayes-model
drwxr-xr-x  22 Vijay  staff  748  4 Jul 18:20 bayes-test-input
drwxr-xr-x  22 Vijay  staff  748  4 Jul 17:49 bayes-train-input


I appreciate your help. Do you have any other suggestions?

Regards,
Vijay


On Mon, Jul 4, 2011 at 6:58 PM, Sergey Bartunov <sb...@gmail.com> wrote:

> When I started with Mahout I had the same errors. In my case, I just
> didn't run PrepareTwentyNewsgroups. You may try to accurately repeat
> all steps from https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html



-- 
 Vijay Santhanam
 Software Engineer
 http://au.linkedin.com/in/vijaysanthanam
 0407525087

Re: 20news

Posted by Sergey Bartunov <sb...@gmail.com>.
When I started with Mahout I had the same errors. In my case, I simply
hadn't run PrepareTwentyNewsgroups. Try repeating all the steps from
https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html exactly.
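
To show why a label mismatch surfaces as an IllegalArgumentException during testing, here is a rough sketch of a confusion matrix built over a fixed set of labels. It is an illustrative stand-in, not Mahout's ConfusionMatrix: the class name TinyConfusionMatrix and its internals are assumptions, and only the general shape (addInstance ends in a label lookup that rejects unknown labels) is taken from the stack trace in the original post.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in; NOT the real org.apache.mahout.classifier.ConfusionMatrix.
class TinyConfusionMatrix {
    private final Map<String, Integer> index = new HashMap<>();
    private final int[][] counts;

    // The set of valid labels is fixed when the matrix is created.
    TinyConfusionMatrix(List<String> labels) {
        for (String label : labels) {
            index.put(label, index.size());
        }
        counts = new int[labels.size()][labels.size()];
    }

    // Records one test document: its true label and the label the classifier chose.
    void addInstance(String correctLabel, String classifiedLabel) {
        counts[indexOf(correctLabel)][indexOf(classifiedLabel)]++;
    }

    int getCount(String correctLabel, String classifiedLabel) {
        return counts[indexOf(correctLabel)][indexOf(classifiedLabel)];
    }

    // Any label that was not registered up front is rejected here.
    private int indexOf(String label) {
        Integer i = index.get(label);
        if (i == null) {
            throw new IllegalArgumentException("Label not found: " + label);
        }
        return i;
    }

    public static void main(String[] args) {
        TinyConfusionMatrix matrix =
            new TinyConfusionMatrix(Arrays.asList("alt.atheism", "sci.med"));
        matrix.addInstance("alt.atheism", "sci.med"); // both labels known: counted
        try {
            matrix.addInstance("alt.atheism from", "sci.med"); // unknown label
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In a design like this, the classifier itself can finish scoring every category and the failure only appears when the result is recorded, which matches the order of the log lines and the exception in the original post.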

On 4 July 2011 12:52, Vijay Santhanam <vi...@gmail.com> wrote:
> Hi All,
>
> I'm new to Mahout and I'm interested in experimenting with it's classifiers.
>
> Right now, I'm just trying to get up and running with the demo's and
> examples.
>
> After checking out the mahout trunk, I've tried running the classification
> example 20news, but after running the ./examples/bin/build/20news-bayes.sh
> script I get the following error during the classification phase.
>
> Does anyone else get the same thing? Or have any recommendations about how
> to fix it?
> I'd just like to get a sample classifier working before I embark on my own
> classification journey.
>
>
> INFO: Loading model from: {basePath=examples/bin/work/20news-bydate/bayes-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/20news-bydate/bayes-test-input}
> Jul 4, 2011 6:28:25 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Testing Bayes Classifier
> Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Read 50000 feature weights
> Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Read 100000 feature weights
> Jul 4, 2011 6:28:28 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: 193370.88331085522
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: rec.sport.baseball -129829.34738930278 531784.7805631821 -0.2441388925268003
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: sci.crypt -193023.42370049533 531784.7805631821 -0.3629728242618669
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: rec.sport.hockey -167853.6159738822 531784.7805631821 -0.31564200802459647
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: talk.politics.guns -203524.0148974065 531784.7805631821 -0.3827187658170024
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: soc.religion.christian -163900.9258713857 531784.7805631821 -0.308209132457322
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: sci.electronics -142854.1677345925 531784.7805631821 -0.26863154598614886
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: comp.os.ms-windows.misc -531784.7805631821 531784.7805631821 -1.0
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: misc.forsale -143454.70176448982 531784.7805631821 -0.26976082619845826
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: talk.religion.misc -139428.73484148504 531784.7805631821 -0.2621901565024562
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: alt.atheism -139569.06867597546 531784.7805631821 -0.2624540486626301
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: comp.windows.x -178029.10523376046 531784.7805631821 -0.33477660839638973
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: talk.politics.mideast -193075.00789450994 531784.7805631821 -0.36306982627452317
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: comp.sys.ibm.pc.hardware -138410.02049984262 531784.7805631821 -0.2602745049477736
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: comp.sys.mac.hardware -125200.9927438868 531784.7805631821 -0.23543545682389364
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: sci.space -192437.0009266271 531784.7805631821 -0.3618700797018455
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: rec.motorcycles -143142.20855440624 531784.7805631821 -0.26917319522159455
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: rec.autos -141800.97549909537 531784.7805631821 -0.2666510601317365
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: comp.graphics -166882.18654471825 531784.7805631821 -0.3138152738556811
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: talk.politics.misc -165196.84193278523 531784.7805631821 -0.3106460507535303
> Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: sci.med -192698.5183245711 531784.7805631821 -0.36236185270382393
> Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
> at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
> at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
> at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
> at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
> at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
> at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>
>
> Any help is greatly appreciated.
>
> Regards,
> --
>  Vijay Santhanam
>  Software Engineer
>