Posted to dev@mahout.apache.org by jun li <ju...@gmail.com> on 2010/08/30 09:21:36 UTC

How large a training set does the Mahout Bayes classifier support?

I once trained a naive Bayes classifier on a large training set (DMOZ)
using the LingPipe package, but it ran out of memory, i.e., it exceeded
the Java heap size limit.

Has anyone tried training a Mahout Bayes classifier for text on a large
training set?
Thanks.


-- 
Li Jun

Re: How large a training set does the Mahout Bayes classifier support?

Posted by Robin Anil <ro...@gmail.com>.
Training can be of arbitrary size; there is no limit. Classification needs to
load the model data into memory, so that is where you are limited. You can
prune low-frequency words to greatly reduce the model size without affecting
precision much.
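
As a rough illustration of the pruning idea, here is a minimal sketch in plain
Java. It is not Mahout's API: the class name VocabularyPruner, the prune
method, and the minCount threshold are all hypothetical, and it assumes the
term counts already sit in an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: drops rare terms from a count map.
class VocabularyPruner {

    // Returns a new map holding only the terms seen at least minCount times.
    static Map<String, Integer> prune(Map<String, Integer> termCounts, int minCount) {
        Map<String, Integer> kept = new HashMap<>();
        for (Map.Entry<String, Integer> entry : termCounts.entrySet()) {
            if (entry.getValue() >= minCount) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("mahout", 120);
        counts.put("bayes", 45);
        counts.put("xyzzy", 1); // a rare term that the threshold removes

        // With minCount = 2, only "mahout" and "bayes" remain.
        System.out.println(prune(counts, 2).keySet());
    }
}
```

A suitable threshold depends on the corpus, so it is worth checking precision
on held-out data after pruning.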

Robin

On Mon, Aug 30, 2010 at 1:01 PM, Ted Dunning <te...@gmail.com> wrote:

> With Naive Bayes, you should be able to train with a nearly arbitrarily
> large data set.  The only limit will be keeping a list of the unique words
> in memory.

Re: How large a training set does the Mahout Bayes classifier support?

Posted by Ted Dunning <te...@gmail.com>.
With Naive Bayes, you should be able to train with a nearly arbitrarily
large data set.  The only limit will be keeping a list of the unique words
in memory.
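
To make the memory argument concrete, here is a single-process sketch in plain
Java. It is not how Mahout implements training; the class StreamingCounter and
its methods are hypothetical, and whitespace tokenization is an assumption.
Each document is counted and then discarded, so memory grows with the number
of distinct (label, word) pairs rather than with the number of documents.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Mahout code: accumulates per-label word counts
// one document at a time, without retaining the documents themselves.
class StreamingCounter {

    // label -> (word -> count); this map is the only state that grows.
    private final Map<String, Map<String, Long>> counts = new HashMap<>();

    // Folds one labeled document into the counts.
    void observe(String label, String text) {
        Map<String, Long> perLabel = counts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                perLabel.merge(word, 1L, Long::sum);
            }
        }
    }

    // Returns how often word was seen under label, or 0 if never seen.
    long count(String label, String word) {
        Map<String, Long> perLabel = counts.get(label);
        if (perLabel == null) {
            return 0L;
        }
        Long value = perLabel.get(word);
        return value == null ? 0L : value;
    }

    public static void main(String[] args) {
        StreamingCounter counter = new StreamingCounter();
        // In practice these would be read one by one from disk or a stream.
        counter.observe("sports", "ball game ball");
        counter.observe("tech", "java heap size");
        System.out.println(counter.count("sports", "ball"));
    }
}
```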
