Posted to user@mahout.apache.org by Sandra Clover <sc...@consultant.com> on 2009/09/29 14:47:09 UTC

Classify() method results anomaly - help!

Hi,

I'm using Mahout 0.1 for document classification (using the distributed
Bayesian Network) and I'm getting some answers back. I have noticed one
thing that is really bugging me, and I'm wondering if you can help.

Problem: there are two versions of the Classify() method in the API. The
first returns just one answer (according to the API it returns "the
single best category"). The second says that it will "return the top
numResults, ranked by score". I have compared the results of both, and
the single best category does not appear at *all* in the list of
categories returned by the second version! Strange, no? I would have
expected it to come top of the list. I have gone as deep as a numResults
value of 20 and have still not seen the best category.

Has anyone encountered this before? I would appreciate any comments,
suggestions or user experience that you would like to share.

Thanks,
Sandra.
 



Re: Classify() method results anomaly - help!

Posted by Grant Ingersoll <gs...@apache.org>.
On Sep 29, 2009, at 8:47 AM, Sandra Clover wrote:

> Hi,
>
> I'm using Mahout 0.1 for document classification (using the distributed
> Bayesian Network) and I'm getting some answers back. I have noticed one
> thing that is really bugging me, and I'm wondering if you can help.
>
> Problem: there are two versions of the Classify() method in the API. The
> first returns just one answer (according to the API it returns "the
> single best category"). The second says that it will "return the top
> numResults, ranked by score". I have compared the results of both, and
> the single best category does not appear at *all* in the list of
> categories returned by the second version! Strange, no? I would have
> expected it to come top of the list. I have gone as deep as a numResults
> value of 20 and have still not seen the best category.
>
> Has anyone encountered this before? I would appreciate any comments,
> suggestions or user experience that you would like to share.
>
> Thanks,
> Sandra.

That sounds like a bug. Can you try out the trunk version of Mahout
and see if it is still there? A lot of the classification code has
been reworked recently (I'm not sure at the moment that those two
classify methods are even still in the code!)
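
When retesting against trunk, it may help to state the expected behaviour as a small check: the label returned by the single-best call should equal the first entry of the top-N call. The following is only a sketch of that invariant. The class ClassifyConsistencyCheck, its methods, and the "higher score is better" convention are all hypothetical stand-ins and are not Mahout's API; the real classifier calls would need to be substituted in.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a classifier. Nothing here is Mahout code.
// Assumption: scores map category label -> score, and a higher score is better.
public class ClassifyConsistencyCheck {

    // Analogue of "the single best category": the label with the highest score.
    public static String classifyBest(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() > bestScore) {
                bestScore = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    // Analogue of "return the top numResults, ranked by score".
    public static List<String> classifyTopN(Map<String, Double> scores, int numResults) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        // Sort by descending score; List.sort is stable, so ties keep insertion order.
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(numResults, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    // The invariant under test: the single best label heads the ranked list.
    public static boolean isConsistent(Map<String, Double> scores, int numResults) {
        List<String> top = classifyTopN(scores, numResults);
        return !top.isEmpty() && top.get(0).equals(classifyBest(scores));
    }

    public static void main(String[] args) {
        // Made-up scores, purely for illustration.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("sports", 0.2);
        scores.put("politics", 0.7);
        scores.put("tech", 0.1);

        System.out.println("best  = " + classifyBest(scores));
        System.out.println("top 2 = " + classifyTopN(scores, 2));
        System.out.println("consistent = " + isConsistent(scores, 20));
    }
}
```

If the same comparison fails when run against the real classifier for a given document, that document and its two result sets would make a compact bug report.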