Posted to dev@mahout.apache.org by Ted Dunning <te...@gmail.com> on 2010/07/19 19:52:32 UTC

Re: naive bayes

Drew reminds me that posting to the list is good form.

He asked about SGD and how it worked.  My answer was a nutshell explanation.

On Mon, Jul 19, 2010 at 10:40 AM, Drew Farris <dr...@gmail.com> wrote:

> Thanks for the explanation -- in all it sounds pretty elegant. You
> think in terms of numbers of examples to avoid the problem of
> unbalanced training sets and you train on the features that are most
> interesting in terms of providing new information. Time for me to
> start reading the code.
>
> (Would this be worth passing along to the list, or is it reiterating
> something that has already been mentioned there?)
>
> On Mon, Jul 19, 2010 at 12:43 PM, Ted Dunning <te...@gmail.com> wrote:
> > The basic idea is very, very simple.  You take an example, figure out
> > a small change to the classifier that would make it do better for that
> > example, and change the classifier a little bit in that direction.
> > There are a few tricks.
> > The stochastic part is also straightforward.  The idea is that you
> > think of taking samples randomly from a distribution rather than from
> > a set of input examples.  The practical effect is that you no longer
> > think in terms of passes through the training data, but rather in
> > terms of the number of examples seen.  You can batch updates if you
> > like, but the batch size is not determined by the number of examples
> > you have lying around.  This allows convergence in less than a single
> > pass through the data (if the problem is appropriate and the data
> > large enough).
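
Here is the same point as a tiny driver loop, again just a sketch; assume it
lives in the same hypothetical class as sgdStep above, and that the Example
holder, the annealing schedule, and the stopping rule are all invented for
illustration:

    import java.util.Iterator;

    // A tiny example holder; label is 0 or 1.
    class Example {
      double[] features;
      int label;
    }

    // Train by consuming a stream of examples.  The stopping condition is a
    // count of examples seen, not a number of passes over a fixed file; the
    // stream could sample from a distribution with replacement, and training
    // can stop before a single full pass if that turns out to be enough.
    static double[] train(Iterator<Example> stream, int dim, long maxExamples) {
      double[] weights = new double[dim];
      for (long seen = 0; seen < maxExamples && stream.hasNext(); seen++) {
        Example ex = stream.next();
        double rate = 0.5 / (1.0 + seen / 1000.0);      // simple global annealing
        sgdStep(weights, ex.features, ex.label, rate);  // step from the sketch above
      }
      return weights;
    }
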
> > A second wrinkle in MAHOUT-228 is the confidence-weighted learning
> > hack.  The idea is that a new training example that shows you need to
> > update the classifier is likely to have a combination of features that
> > you have seen many times and features that you have rarely seen.  For
> > the features you have seen often, you probably don't want to learn
> > much, while for the features you have rarely seen, you probably want
> > to learn a bunch.  In MAHOUT-228, I don't do the mathematically clever
> > update that Mark Dredze suggests.  Instead, I just anneal the learning
> > rate on each term separately.  The results are very impressive.  This
> > also takes care of IDF weighting and stop lists.
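
The per-term annealing can be sketched like this.  To be clear, this is not
the actual MAHOUT-228 update, and the 1/sqrt(count) schedule is an arbitrary
choice for the example; it just shows the shape of the idea:

    // Each feature keeps its own update count, and its learning rate decays
    // as that count grows.  Rarely seen features keep a large rate and learn
    // a lot from each sighting; very common features end up with a tiny rate
    // and learn almost nothing, which is what makes explicit IDF weighting
    // and stop lists largely unnecessary.
    static void perTermStep(double[] weights, long[] counts,
                            double[] x, int label, double baseRate) {
      double dot = 0.0;
      for (int i = 0; i < weights.length; i++) {
        dot += weights[i] * x[i];
      }
      double p = 1.0 / (1.0 + Math.exp(-dot));
      double error = p - label;
      for (int i = 0; i < weights.length; i++) {
        if (x[i] == 0.0) {
          continue;                  // only touch features present in this example
        }
        double rate = baseRate / Math.sqrt(1.0 + counts[i]);  // per-term decay
        weights[i] -= rate * error * x[i];
        counts[i]++;
      }
    }
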
> >
> > http://leon.bottou.org/projects/sgd
> >
> > http://alias-i.com/lingpipe/demos/tutorial/logistic-regression/read-me.html
> > http://videolectures.net/icml08_pereira_cwl/
> >
> > On Mon, Jul 19, 2010 at 9:26 AM, Drew Farris <dr...@gmail.com> wrote:
> >>
> >> I've spent a small amount of time with MAHOUT-228, enough to realize
> >> that I need to understand more details of the SGD approach in addition
> >> to diving into the code :)
> >
>