Posted to user@mahout.apache.org by enyun <co...@126.com> on 2012/01/04 12:02:24 UTC

How to specify the prior distribution for bayes based classifier method?

Hi all,

I'm trying to use the Mahout Bayes classifier to solve a classification problem,
but the results are very bad.
When I dug into the source code, I found that the prior class distribution is not included in the model (for example, in the 20-news case).
Is this the intended behavior, or is it a bug?

P(c|d) = p(c) * p(d|c) / p(d) = p(c) * p(t1|c) * p(t2|c) * ... * p(tn|c) / p(d)
Here, p(c) is ignored in the actual prediction step.

thanks,
enyun
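
The decision rule in the question above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mahout's implementation: the function names (train, log_scores, predict), the use_prior switch, the add-one smoothing, and the toy corpus are all made up for this example. It scores each class in log space, drops p(d) because it is the same for every class, and lets you turn the log p(c) term on or off to see when it changes the prediction.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (label, tokens). Returns (class_counts, term_counts, vocab)."""
    class_counts = Counter()
    term_counts = {}
    vocab = set()
    for label, tokens in docs:
        class_counts[label] += 1
        term_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab

def log_scores(tokens, class_counts, term_counts, vocab, use_prior=True):
    """log p(c) + sum_i log p(t_i|c); p(d) is omitted since it does not depend on c."""
    total_docs = sum(class_counts.values())
    scores = {}
    for c in class_counts:
        total_terms = sum(term_counts[c].values())
        # The prior contributes a single additive term in log space.
        score = math.log(class_counts[c] / total_docs) if use_prior else 0.0
        for t in tokens:
            # Add-one (Laplace) smoothing, an assumption of this sketch.
            score += math.log((term_counts[c][t] + 1) / (total_terms + len(vocab)))
        scores[c] = score
    return scores

def predict(tokens, model, use_prior=True):
    scores = log_scores(tokens, *model, use_prior=use_prior)
    return max(scores, key=scores.get)

# Hypothetical, deliberately imbalanced toy corpus: 3 "sport" docs, 1 "tech" doc.
toy_docs = [("sport", ["game", "game", "ball"])] * 3 + [("tech", ["ball", "code"])]
model = train(toy_docs)

print(predict(["ball"], model, use_prior=True))   # prior included
print(predict(["ball"], model, use_prior=False))  # prior ignored
```

On this toy corpus the two calls disagree: with the prior the majority class "sport" wins, and without it "tech" wins, because p(ball|tech) = 2/5 is slightly larger than p(ball|sport) = 1/3. With longer documents the summed likelihood terms usually outweigh the single prior term.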

Re: How to specify the prior distribution for bayes based classifier method?

Posted by Ted Dunning <te...@gmail.com>.
When NB fails, it is usually due to over-fitting because the training data
is relatively small, not because the prior is ignored.

See Rennie's paper for more discussion.
http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf

Can you say more about your data size?
