Posted to dev@mahout.apache.org by Isabel Drost-Fromm <is...@apache.org> on 2013/11/29 10:12:54 UTC

Moving valuable docs to Mahout site [Was: Re: Detecting high bias and variance in AdaptiveLogisticRegression classification]

Hi,

while going through our wiki docs to convert them to the Apache CMS, I
got the impression that we could improve a lot, in particular when it
comes to explaining the strengths and limitations of particular
approaches.

While reading the mail below (and several other valuable mails
on user@), I started to wonder whether texts like this could be the
basis for more detailed docs.

The future-work items could go on a separate page that tracks
potential future work. One could think about using JIRA as well, but
larger items like these do not look like they will be done within the
next few months unless someone with a particular interest steps up to
work on them...

What do you think?


Isabel


On Tue, 26 Nov 2013 23:26:11 -0800
Ted Dunning <te...@gmail.com> wrote:

> Well, first off, let me say that I am now much less of a fan of the
> magical cross-validation approach, and of adaptation based on it,
> than I was when I wrote the ALR code.  There are definitely legs in
> the ideas, but my implementation has a number of flaws.
> 
> For example:
> 
> a) the way that I provide for handling multiple passes through the
> data is very easy to screw up.  I think that simply separating the
> data entirely might be a better approach.
> 
> b) for truly on-line learning, where no repeated passes through the
> data will ever occur, cross-validation is not the best choice.  It is
> much better in those cases to use what Google researchers described
> in [1].
> 
> c) it is clear from several reports that the evolutionary algorithm
> prematurely shuts down the learning rate.  I think that Adagrad-like
> learning rates are more reliable.  See [1] again for one of the more
> readable descriptions of this.  See also [2] for another view on
> adaptive learning rates.
> 
> d) item (c) is also related to the way that learning rates are
> adapted in the underlying OnlineLogisticRegression.  That needs to be
> fixed.
> 
> e) asynchronous parallel stochastic gradient descent with mini-batch
> learning is where we should be headed.  I do not have time to write
> it, however.
> 
> All this aside, I am happy to help in any way that I can, given my
> recent time limits.
> 
> 
> [1] http://research.google.com/pubs/pub41159.html
> 
> [2] http://www.cs.jhu.edu/~mdredze/publications/cw_nips_08.pdf
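
A few rough sketches of the points above, in case they help when this
gets turned into docs.

For (b), the usual alternative to a cross-validation split in a truly
on-line setting is to score each example just before training on it and
to accumulate that loss; the Google paper in [1] describes this style of
evaluation (often called progressive validation). The sketch below is
only an illustration against the Mahout 0.x SGD classes
(OnlineLogisticRegression with an L1 prior); the Example holder, the
hyperparameter values, and the class name are made up for the example
and are not part of ALR.

    import java.util.List;

    import org.apache.mahout.classifier.sgd.L1;
    import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
    import org.apache.mahout.math.Vector;

    // Sketch only: single-pass, on-line learning with progressive
    // validation.  Every example is scored *before* the model trains on
    // it, so the accumulated log-loss is an honest estimate of held-out
    // performance without a separate cross-validation split.
    public class ProgressiveValidationSketch {

      // Hypothetical holder for one (label, features) pair.
      public static class Example {
        final int label;        // 0 or 1
        final Vector features;
        Example(int label, Vector features) {
          this.label = label;
          this.features = features;
        }
      }

      public static double trainAndScore(List<Example> stream, int numFeatures) {
        OnlineLogisticRegression model =
            new OnlineLogisticRegression(2, numFeatures, new L1())
                .lambda(1.0e-4)
                .learningRate(0.5);

        double logLoss = 0;
        for (Example ex : stream) {
          // Score first ...
          double p = model.classifyScalar(ex.features);   // P(label == 1)
          p = Math.min(1 - 1.0e-12, Math.max(1.0e-12, p));
          logLoss += ex.label == 1 ? -Math.log(p) : -Math.log(1 - p);
          // ... then train on the same example exactly once.
          model.train(ex.label, ex.features);
        }
        return logLoss / Math.max(1, stream.size());
      }
    }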
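
For (c) and (d), this is roughly what an AdaGrad-like rate looks like:
every feature keeps the running sum of its squared gradients, and its
step size is eta0 / sqrt(sum), so frequently-seen features cool down
while rare features keep learning. This is a self-contained sketch of
the update rule, not the current OnlineLogisticRegression code; the
class and parameter names are invented for the example.

    // Sketch of an AdaGrad-style update for binary logistic regression.
    public class AdagradLogisticSketch {

      private final double[] weights;
      private final double[] gradSquareSum;   // per-feature gradient history
      private final double eta0;              // base learning rate
      private static final double EPS = 1.0e-8;

      public AdagradLogisticSketch(int numFeatures, double eta0) {
        this.weights = new double[numFeatures];
        this.gradSquareSum = new double[numFeatures];
        this.eta0 = eta0;
      }

      public double predict(double[] x) {
        double dot = 0;
        for (int j = 0; j < x.length; j++) {
          dot += weights[j] * x[j];
        }
        return 1.0 / (1.0 + Math.exp(-dot));
      }

      /** One SGD step on a single example x with label y in {0, 1}. */
      public void train(double[] x, int y) {
        double error = predict(x) - y;         // dLoss/dScore for log-loss
        for (int j = 0; j < x.length; j++) {
          if (x[j] == 0.0) {
            continue;                          // untouched features keep their rate
          }
          double g = error * x[j];             // per-feature gradient
          gradSquareSum[j] += g * g;
          double eta = eta0 / Math.sqrt(gradSquareSum[j] + EPS);
          weights[j] -= eta * g;
        }
      }
    }

The point is that the rate is never driven to zero by a global
annealing schedule; it only shrinks for a feature in proportion to how
much evidence that feature has already seen.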
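
And for (e), one possible shape for asynchronous, mini-batch SGD is the
Hogwild-style one: worker threads pull mini-batches, compute one
averaged gradient per batch, and apply it to a shared weight array
without locking. The sketch below is a bare illustration of that
structure (no regularization, no learning-rate schedule, no fault
handling) and all names are invented for the example; it is not a
design proposal for Mahout.

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch of asynchronous mini-batch SGD for logistic regression.
    // Updates to the shared weights can race; with sparse gradients the
    // collisions are rare enough that this tends to work in practice.
    public class AsyncMiniBatchSgdSketch {

      private final double[] weights;   // shared, updated without locks
      private final double eta;

      public AsyncMiniBatchSgdSketch(int numFeatures, double eta) {
        this.weights = new double[numFeatures];
        this.eta = eta;
      }

      private double predict(double[] x) {
        double dot = 0;
        for (int j = 0; j < x.length; j++) {
          dot += weights[j] * x[j];
        }
        return 1.0 / (1.0 + Math.exp(-dot));
      }

      /** Average the gradient over one mini-batch and apply it once. */
      private void updateOnBatch(List<double[]> xs, List<Integer> ys) {
        double[] grad = new double[weights.length];
        for (int i = 0; i < xs.size(); i++) {
          double[] x = xs.get(i);
          double error = predict(x) - ys.get(i);
          for (int j = 0; j < x.length; j++) {
            grad[j] += error * x[j];
          }
        }
        for (int j = 0; j < weights.length; j++) {
          weights[j] -= eta * grad[j] / xs.size();   // racy, deliberately unlocked
        }
      }

      /** Workers claim mini-batches from a shared counter, asynchronously. */
      public void train(final List<List<double[]>> batchesX,
                        final List<List<Integer>> batchesY,
                        int numThreads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        final AtomicInteger next = new AtomicInteger(0);
        for (int t = 0; t < numThreads; t++) {
          pool.execute(new Runnable() {
            @Override
            public void run() {
              int i;
              while ((i = next.getAndIncrement()) < batchesX.size()) {
                updateOnBatch(batchesX.get(i), batchesY.get(i));
              }
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
      }
    }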