Posted to dev@mahout.apache.org by Robin Anil <ro...@gmail.com> on 2009/10/07 14:08:31 UTC
Re: Classify() method results anomoly - help!
Hi Sandra,
I tested the priority queue implementation, and there does seem to be a
problem with Hadoop's priority queue implementation:
import org.apache.hadoop.util.PriorityQueue;
PriorityQueue<ClassifierResult> queue = new ClassifierResultPriorityQueue(3);
queue.insert(new ClassifierResult("label1", 5));
queue.insert(new ClassifierResult("label2", 4));
queue.insert(new ClassifierResult("label3", 3));
queue.insert(new ClassifierResult("label4", 2));
queue.insert(new ClassifierResult("label5", 1));
assertEquals("Incorrect Size", 3, queue.size());
log.info(queue.pop().toString());
log.info(queue.pop().toString());
log.info(queue.pop().toString());
09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label3', score=3.0}
09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label4', score=2.0}
09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label5', score=1.0}
label1 and label2 were missing, and I couldn't explain this behaviour.
I changed it to java.util.PriorityQueue, so it's working now.
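
One possible explanation, offered as a sketch rather than a confirmed
diagnosis: as far as I recall, insert() on Hadoop's PriorityQueue, once the
queue is full, replaces the current top() with the new element unless the new
element is lessThan() the top. If ClassifierResultPriorityQueue orders results
so that the highest score is popped first (an assumption; its lessThan() is not
shown in this thread), that rule throws away the highest scores. The class and
method names below are hypothetical, and the rule is modelled on top of
java.util.PriorityQueue instead of calling Hadoop itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: models the assumed eviction rule, not Hadoop's actual code.
final class BoundedQueueSketch {

    /**
     * Inserts the scores into a queue capped at maxSize, then pops everything.
     * When the queue is full, a new score replaces the head unless it orders
     * strictly before the head under the given comparator.
     */
    static List<Double> insertAndDrain(double[] scores, int maxSize, Comparator<Double> order) {
        PriorityQueue<Double> queue = new PriorityQueue<>(maxSize, order);
        for (double score : scores) {
            if (queue.size() < maxSize) {
                queue.add(score);
            } else if (order.compare(score, queue.peek()) >= 0) {
                queue.poll();      // evict the current head
                queue.add(score);  // keep the new score in its place
            }
        }
        List<Double> popped = new ArrayList<>();
        while (!queue.isEmpty()) {
            popped.add(queue.poll());
        }
        return popped;
    }

    public static void main(String[] args) {
        double[] scores = {5, 4, 3, 2, 1};
        // Highest-first ordering: the head is the best score, so it gets evicted.
        System.out.println(insertAndDrain(scores, 3, Comparator.reverseOrder())); // prints [3.0, 2.0, 1.0]
        // Lowest-first ordering: the head is the worst score, so the top 3 survive.
        System.out.println(insertAndDrain(scores, 3, Comparator.naturalOrder())); // prints [3.0, 4.0, 5.0]
    }
}
```

Under these assumptions the highest-first ordering reproduces the log output
above (3, 2, 1), while the lowest-first ordering keeps the three best scores;
the popped list then has to be reversed to present the best category first.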
On Wed, Sep 30, 2009 at 6:43 PM, Sandra Clover <sc...@consultant.com> wrote:
> Hi Robin, thanks for the reply, for updating the documentation, and for
> your advice. I'll try the trunk version. To answer your question, I am
> using Mahout version 0.1 and Hadoop 0.19.2. Hope this helps... Thanks
> again, Robin. Sandra.
>
> ----- Original Message -----
> From: "Robin Anil"
> To: mahout-user@lucene.apache.org
> Subject: Re: Classify() method results anomoly - help!
> Date: Wed, 30 Sep 2009 18:08:05 +0530
>
>
> Hi Sandra, those scores indicate the relative score, not the
> probability. Thanks for bringing this to our notice; I will fix the
> documentation. You may try the trunk and see if the former error still
> occurs. Also, could you tell me the version of Hadoop you are using?
>
>
>
> On Wed, Sep 30, 2009 at 5:30 PM, Sandra Clover wrote:
>
> > Thanks Grant, I'll look into that. I've also been having a look at the
> > numbers returned from the getScore() method. I have noticed a range
> > from 0 to around 20000.243434+, with numbers in between like
> > 1659.930763537123. According to the API documentation for this method:
> > "The label and the associated score(Usually probabilty)". This does not
> > look like probability to me. I was kind of expecting an answer between 0
> > and 1, or 0 and 100, or something like that. Are these results typical or
> > indicative of some sort of bug? Once again, comments/suggestions
> > appreciated. Sandra.
> >
> >
> >
> > ----- Original Message -----
> > From: "Grant Ingersoll"
> > To: mahout-user@lucene.apache.org
> > Subject: Re: Classify() method results anomoly - help!
> > Date: Tue, 29 Sep 2009 16:02:46 -0400
> >
> >
> >
> > On Sep 29, 2009, at 8:47 AM, Sandra Clover wrote:
> >
> > > Hi, I'm using Mahout 0.1 for document classification (using the
> > > distributed Bayesian Network) and I'm getting some answers back. I
> > > have noticed one thing that is really bugging me. I'm wondering, can you
> > > help please?
> > > Problem: Concerning the Classify() method, there are 2 constructors in
> > > the API. The first one returns just one answer (according to the API it
> > > returns "the single best category"). The second constructor says that
> > > it will "return the top numResults, ranked by score". My problem is that I
> > > have compared and contrasted the results of both techniques. I have
> > > noticed that the single best category does not appear at *all* in the
> > > range of categories given by the second constructor! Strange, no? I would
> > > have expected it to come top of the list. I have gone to a
> > > value of 20 deep in the numResults level and have not even seen the best
> > > category. Has anyone encountered this before? I would appreciate any
> > > comments/suggestions/user-experience that you may like to share. Thanks,
> > > Sandra.
> > >
> >
> > That sounds like a bug. Can you try out the trunk version of
> > Mahout and see if it is still there? A lot of the classification
> > stuff has been reworked recently (I'm not even sure at the moment
> > that those two classify methods are even still in the code!)