Posted to dev@mahout.apache.org by Robin Anil <ro...@gmail.com> on 2009/10/07 14:08:31 UTC

Re: Classify() method results anomoly - help!

Hi Sandra,
I tested the priority queue implementation, and there does seem to be a
problem with Hadoop's priority queue implementation:

    import org.apache.hadoop.util.PriorityQueue;

    PriorityQueue<ClassifierResult> queue = new ClassifierResultPriorityQueue(3);
    queue.insert(new ClassifierResult("label1", 5));
    queue.insert(new ClassifierResult("label2", 4));
    queue.insert(new ClassifierResult("label3", 3));
    queue.insert(new ClassifierResult("label4", 2));
    queue.insert(new ClassifierResult("label5", 1));

    assertEquals("Incorrect Size", 3, queue.size());
    log.info(queue.pop().toString());
    log.info(queue.pop().toString());
    log.info(queue.pop().toString());

09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label3', score=3.0}
09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label4', score=2.0}
09/10/07 16:58:39 INFO common.ClassifierResultPriorityQueueTest: ClassifierResult{category='label5', score=1.0}

label1 and label2 were missing. I couldn't explain this behaviour.

I changed it to java.util.PriorityQueue, so it's working now.
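
For reference, here is a minimal sketch of how the top N results can be
kept with java.util.PriorityQueue. It is only an illustration: the Result
class and the topN helper are hypothetical stand-ins, not Mahout's
ClassifierResult or the code that went into trunk. The idea is to use a
min-heap ordered by score and evict the lowest-scoring entry whenever the
heap grows beyond N.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {

    // Hypothetical stand-in for a classifier result (label plus score).
    static final class Result {
        final String label;
        final double score;

        Result(String label, double score) {
            this.label = label;
            this.score = score;
        }
    }

    // Returns the n highest-scoring results, best first.
    static List<Result> topN(List<Result> results, int n) {
        Comparator<Result> byScore = Comparator.comparingDouble((Result r) -> r.score);

        // Min-heap: the head is always the lowest score currently kept.
        PriorityQueue<Result> heap = new PriorityQueue<>(Math.max(1, n), byScore);
        for (Result r : results) {
            heap.add(r);
            if (heap.size() > n) {
                heap.poll(); // evict the lowest score, keeping the n best
            }
        }

        // The heap's iteration order is unspecified, so sort explicitly.
        List<Result> best = new ArrayList<>(heap);
        best.sort(byScore.reversed());
        return best;
    }

    public static void main(String[] args) {
        List<Result> input = Arrays.asList(
                new Result("label1", 5), new Result("label2", 4),
                new Result("label3", 3), new Result("label4", 2),
                new Result("label5", 1));
        for (Result r : topN(input, 3)) {
            System.out.println(r.label + " " + r.score);
        }
    }
}
```

With the same five inputs as in the test above, this keeps label1, label2
and label3, which is the result I expected from the original queue.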


On Wed, Sep 30, 2009 at 6:43 PM, Sandra Clover <sc...@consultant.com> wrote:

> Hi Robin,
> Thanks for the reply, for updating the documentation, and for your
> advice. I'll try the trunk version. To answer your question, I am
> using Mahout version 0.1 and Hadoop 0.19.2. Hope this helps.
> Thanks again, Robin.
> Sandra.
>
>  ----- Original Message -----
>  From: "Robin Anil"
>  To: mahout-user@lucene.apache.org
>  Subject: Re: Classify() method results anomoly - help!
>   Date: Wed, 30 Sep 2009 18:08:05 +0530
>
>
>  Hi Sandra, those scores are indicative of the relative score, not the
>  probability. Thanks for bringing this to our notice; I will fix the
>  documentation. You may try the trunk and see whether the former error
>  still occurs. Also, could you tell me the version of Hadoop you are
>  using?
>
>
>
>   On Wed, Sep 30, 2009 at 5:30 PM, Sandra Clover wrote:
>
>  > Thanks Grant, I'll look into that. I've also been looking at the
>  > numbers returned from the getScore() method. I have noticed a range
>  > from 0 to around 20000.243434+, with numbers in between like
>  > 1659.930763537123. According to the API documentation for this
>  > method: "The label and the associated score(Usually probabilty)".
>  > This does not look like probability to me. I was expecting an
>  > answer between 0 and 1, or 0 and 100, or something like that. Are
>  > these results typical, or indicative of some sort of bug? Once
>  > again, comments/suggestions appreciated.
>  > Sandra.
>  >
>  >
>  >
>  > ----- Original Message -----
>  > From: "Grant Ingersoll"
>  > To: mahout-user@lucene.apache.org
>  > Subject: Re: Classify() method results anomoly - help!
>  > Date: Tue, 29 Sep 2009 16:02:46 -0400
>  >
>  >
>  >
>  > On Sep 29, 2009, at 8:47 AM, Sandra Clover wrote:
>  >
>  > > Hi, I'm using Mahout 0.1 for document classification (using the
>  > > distributed Bayesian Network) and I'm getting some answers back.
>  > > I have noticed one thing that is really bugging me, and I'm
>  > > wondering whether you can help.
>  > > Problem: Concerning the Classify() method, there are 2
>  > > constructors in the API. The first one returns just one answer
>  > > (according to the API it returns "the single best category"). The
>  > > second constructor says that it will "return the top numResults,
>  > > ranked by score". My problem is that I have compared the results
>  > > of both techniques, and I have noticed that the single best
>  > > category does not appear at *all* in the range of categories given
>  > > by the second constructor! Strange, no? I would have expected it
>  > > to come top of the list. I have gone 20 deep in the numResults
>  > > level and have not even seen the best category. Has anyone
>  > > encountered this before? I would appreciate any
>  > > comments/suggestions/user experience that you may like to share.
>  > > Thanks,
>  > > Sandra.
>  > >
>  >
>  > That sounds like a bug. Can you try out the trunk version of
>  > Mahout and see if it is still there? A lot of the classification
>  > stuff has been reworked recently (I'm not even sure at the moment
>  > that those two classify methods are even still in the code!)
>  >
>