Posted to dev@mahout.apache.org by Isabel Drost <is...@apache.org> on 2011/12/21 19:50:45 UTC

Status naive bayes [Was: Re: SequenceFile cast problems]

On 14.12.2011 Grant Ingersoll wrote:
> While Ted answered the Dissector question, your original issue, I believe,
> is that Mahout currently has two different NB implementations. 
> trainclassifier/testclassifier use the old, word based package which
> requires Text as input.  The new package, which TrainNaiveBayesJob uses,
> requires VectorWritables.

While reading that thread it occurred to me that this is sort of confusing for 
users. What is the reason for keeping both implementations? Would it make sense 
to keep only the vector-based version?


Isabel

Re: Status naive bayes [Was: Re: SequenceFile cast problems]

Posted by Grant Ingersoll <gs...@apache.org>.
I asked the same question a few months back and got no reply.  I think the goal is to move to the vector-based version, but I'm not convinced that the new one is totally correct yet.  Robin was the primary author of both, but I haven't heard more from him on it.


On Dec 21, 2011, at 1:50 PM, Isabel Drost wrote:

> On 14.12.2011 Grant Ingersoll wrote:
>> While Ted answered the Dissector question, your original issue, I believe,
>> is that Mahout currently has two different NB implementations. 
>> trainclassifier/testclassifier use the old, word based package which
>> requires Text as input.  The new package, which TrainNaiveBayesJob uses,
>> requires VectorWritables.
> 
> While reading that thread it occurred to me that this is sort of confusing for 
> users. What is the reason for keeping both implementations? Would it make sense 
> to keep only the vector-based version?
> 
> 
> Isabel

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com