Posted to dev@mahout.apache.org by Ted Dunning <te...@gmail.com> on 2008/05/13 00:54:31 UTC

interesting thesis regarding naive bayes

The idea here is that Naive Bayes can be significantly improved by taking
into account the variable selection process.

The benefits are twofold:

a) estimates of error on unseen data are much more accurate when the
variables that were considered but not selected are accounted for

b) the classifier itself can perform somewhat better, both because some
interdependence between variables is allowed and because the
regularization parameters are chosen more accurately (they are set by
minimizing expected error).
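Point (a) is easy to demonstrate even without the thesis's Bayesian machinery. The toy sketch below (my own illustration, not code from the thesis or from Mahout) runs Gaussian naive Bayes on pure-noise data: when feature selection happens once on the full data set, cross-validated accuracy looks well above chance; when selection is redone inside each fold — i.e., the selection process is accounted for — the estimate falls back toward 50%, which is the truth here.

```python
import math
import random

random.seed(0)
n, p, k, folds = 100, 500, 10, 5  # samples, noise features, kept features, CV folds

# Pure-noise data: the labels carry no real signal, so an honest
# error estimate should hover around 50% accuracy.
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [random.randint(0, 1) for _ in range(n)]

def select_features(rows, labels):
    """Keep the k features most correlated (in absolute value) with the labels."""
    my = sum(labels) / len(labels)
    scores = []
    for j in range(p):
        col = [r[j] for r in rows]
        mx = sum(col) / len(col)
        num = sum((a - mx) * (b - my) for a, b in zip(col, labels))
        den = math.sqrt(sum((a - mx) ** 2 for a in col)
                        * sum((b - my) ** 2 for b in labels))
        scores.append((abs(num / den) if den else 0.0, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def nb_fit(rows, labels, feats):
    """Gaussian naive Bayes: per-class prior, mean, and variance per feature."""
    model = {}
    for c in (0, 1):
        members = [r for r, l in zip(rows, labels) if l == c]
        means = [sum(r[j] for r in members) / len(members) for j in feats]
        varis = [sum((r[j] - m) ** 2 for r in members) / len(members) + 1e-9
                 for j, m in zip(feats, means)]
        model[c] = (len(members) / len(rows), means, varis)
    return model

def nb_predict(model, row, feats):
    best, best_ll = None, None
    for c, (prior, means, varis) in model.items():
        ll = math.log(prior)
        for j, m, v in zip(feats, means, varis):
            ll += -0.5 * math.log(2 * math.pi * v) - (row[j] - m) ** 2 / (2 * v)
        if best_ll is None or ll > best_ll:
            best, best_ll = c, ll
    return best

def cv_accuracy(select_inside_fold):
    hits = 0
    if not select_inside_fold:
        feats = select_features(X, y)  # selection has already seen the test labels
    for f in range(folds):
        test = list(range(f, n, folds))
        train = [i for i in range(n) if i % folds != f]
        if select_inside_fold:
            feats = select_features([X[i] for i in train], [y[i] for i in train])
        model = nb_fit([X[i] for i in train], [y[i] for i in train], feats)
        hits += sum(nb_predict(model, X[i], feats) == y[i] for i in test)
    return hits / n

acc_naive = cv_accuracy(select_inside_fold=False)   # optimistically biased
acc_honest = cv_accuracy(select_inside_fold=True)   # close to chance
print(f"selection outside CV: {acc_naive:.2f}")
print(f"selection inside CV:  {acc_honest:.2f}")
```

The thesis goes further than this resampling fix — it corrects the bias analytically — but the sketch shows why an uncorrected estimate is untrustworthy once selection is in the loop.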

I haven't digested how hard this would be to integrate into anything we
might have, but one interesting aspect is that the additional run-time is
negligible, which means the bias correction would probably not need to be
parallelized.

http://math.usask.ca/~longhai/doc/thesis/thesis.abstract.html

-- 
ted