Posted to dev@mahout.apache.org by Xiaobo Gu <gu...@gmail.com> on 2011/08/14 06:32:18 UTC

What's the difference between bayes and naivebayes?

Hi,
1. What's the difference between them from the algorithm point of view? Do
they support only categorical predictors?
2. What are their input file format requirements?
org.apache.mahout.naivebayes.* requires
SequenceFile<Text,VectorWritable>, while org.apache.mahout.bayes.*
requires a tab-separated text file without a header. Why not use the
same input format?
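
To make the gap between the two formats concrete, here is a minimal sketch in plain Java with no Mahout or Hadoop classes. It assumes each line of the tab-separated file looks like "label<TAB>token token ...", which is my guess at the layout and not something confirmed here. It turns such a line into a label plus a sparse term-count map, a rough stand-in for one (Text, VectorWritable) record. The names InputFormatSketch, LabeledVector and parseLine are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class InputFormatSketch {

    /** A label plus a sparse term-count vector; stands in for a (Text, VectorWritable) pair. */
    static final class LabeledVector {
        final String label;
        final Map<Integer, Double> counts; // term index -> number of occurrences

        LabeledVector(String label, Map<Integer, Double> counts) {
            this.label = label;
            this.counts = counts;
        }
    }

    /**
     * Parses one headerless line, ASSUMED to be "label<TAB>token token ...".
     * Each previously unseen token gets the next free index in the shared dictionary.
     */
    static LabeledVector parseLine(String line, Map<String, Integer> dictionary) {
        String[] parts = line.split("\t", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected label<TAB>text: " + line);
        }
        Map<Integer, Double> counts = new TreeMap<>();
        for (String token : parts[1].trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty text part
            }
            Integer index = dictionary.get(token);
            if (index == null) {
                index = dictionary.size();
                dictionary.put(token, index);
            }
            counts.merge(index, 1.0, Double::sum); // accumulate the term count
        }
        return new LabeledVector(parts[0], counts);
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        LabeledVector v = parseLine("sports\tgoal match goal", dictionary);
        System.out.println(v.label + " -> " + v.counts);
    }
}
```

The point of the sketch is that the vector format needs a dictionary shared across all documents, while the text format leaves tokenization to the trainer. That may be one reason the two packages expect different inputs.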


Regards,

Xiaobo Gu

Re: What's the difference between bayes and naivebayes?

Posted by Xiaobo Gu <gu...@gmail.com>.
What about my second question?


Re: What's the difference between bayes and naivebayes?

Posted by Sebastian Schelter <ss...@apache.org>.
On 14.08.2011 08:53, Xiaobo Gu wrote:
> Can you show me the paper, please? And do you mean they are the same
> thing, so we should always use naivebayes.*, provided we can prepare the
> input data as required?

Here's a link to the paper: 
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8572&rank=1

--sebastian


Re: What's the difference between bayes and naivebayes?

Posted by Xiaobo Gu <gu...@gmail.com>.
Can you show me the paper, please? And do you mean they are the same
thing, so we should always use naivebayes.*, provided we can prepare the
input data as required?

Regards,

Xiaobo Gu


Re: What's the difference between bayes and naivebayes?

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Xiaobo,

as far as I recall, the paper on which Mahout's NB implementation is
based consists of two parts: the first part describes techniques to
generally improve NB's prediction quality on skewed input data and the
like, while the second part shows how to handle textual data.

I think that bayes.* is an older implementation that includes the first
and the second part of the paper, while naivebayes.* is a newer one that
only contains the general algorithm described in the first part of the
paper.

--sebastian
