Posted to dev@mahout.apache.org by "Olivier Grisel (JIRA)" <ji...@apache.org> on 2010/01/19 10:46:54 UTC

[jira] Issue Comment Edited: (MAHOUT-228) Need sequential logistic regression implementation using SGD techniques

    [ https://issues.apache.org/jira/browse/MAHOUT-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802165#action_12802165 ] 

Olivier Grisel edited comment on MAHOUT-228 at 1/19/10 9:46 AM:
----------------------------------------------------------------

bq. Are you sure that this is correct? The lazy regularization update should be applied before any coefficient is used for prediction or for update. Is eager regularization after the update necessary?

I made it eager only for the coefficients that have just been updated by the current training step; regularization of the remaining coefficients is still delayed until the next "classify(instance)" call that touches them.
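To make this concrete, here is a rough, self-contained sketch of the scheme (plain Java arrays instead of Mahout vectors; names such as LazyL1Sgd, beta, updateSteps, lambda and learningRate are placeholders, not the exact fields of the patch):

{code}
// Toy sketch of lazy + eager L1 regularization for SGD logistic regression.
public class LazyL1Sgd {
  private final double[] beta;       // model coefficients
  private final int[] updateSteps;   // step at which each coefficient was last regularized
  private final double lambda;       // L1 strength
  private final double learningRate;
  private int step = 0;

  public LazyL1Sgd(int numFeatures, double lambda, double learningRate) {
    this.beta = new double[numFeatures];
    this.updateSteps = new int[numFeatures];
    this.lambda = lambda;
    this.learningRate = learningRate;
  }

  // Soft-thresholding: the L1 shrinkage that can drive a coefficient to exactly zero.
  private void shrink(int j, double amount) {
    beta[j] = Math.signum(beta[j]) * Math.max(0.0, Math.abs(beta[j]) - amount);
  }

  // Lazy part: apply the shrinkage skipped since coefficient j was last touched.
  private void catchUp(int j) {
    int missed = step - updateSteps[j];
    if (missed > 0) {
      shrink(j, missed * learningRate * lambda);
      updateSteps[j] = step;
    }
  }

  public double classify(int[] features, double[] values) {
    double sum = 0.0;
    for (int k = 0; k < features.length; k++) {
      catchUp(features[k]);            // regularize before the coefficient is used
      sum += beta[features[k]] * values[k];
    }
    return 1.0 / (1.0 + Math.exp(-sum));
  }

  public void train(int[] features, double[] values, int target) {
    double error = target - classify(features, values);
    for (int k = 0; k < features.length; k++) {
      int j = features[k];
      beta[j] += learningRate * error * values[k];  // SGD update
      // Eager part: catchUp() just marked j as regularized for this step,
      // so shrink the freshly updated value here, otherwise this step's
      // L1 penalty would never reach it and beta would go dense.
      shrink(j, learningRate * lambda);
    }
    step++;
  }
}
{code}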

If we do not do this (or find some equivalent workaround), the coefficients are only regularized on the classify(instance) call and are therefore marked as regularized for the current step value, while at the same time the training update makes the coefficients touched at the current step non-null, which ends up producing a completely dense parameter set.

While this is not a big deal as long as beta uses a DenseMatrix representation, it prevents us from measuring the real impact of the lambda value through the sparsity of the parameters. On problems leading to very sparse models, using some kind of SparseRowMatrix may well be decisive performance-wise, and in that case the sparsity-inducing ability of L1 should be preserved.

Maybe lazy regularization could also be implemented in a simpler / more readable way by fully regularizing beta every "regularizationSkip" training steps (IIRC this is what Leon Bottou's SvmSgd2 does, but it adds yet another hyperparameter to fiddle with).
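That variant would look something like this (a sketch of train() only, reusing the shrink() helper from the block above; regularizationSkip is the extra hyperparameter, and classify() would no longer need the per-coefficient catch-up or the updateSteps array):

{code}
// Variant: regularize the whole beta vector every regularizationSkip steps
// instead of tracking per-coefficient staleness in updateSteps.
public void train(int[] features, double[] values, int target) {
  double error = target - classify(features, values);
  for (int k = 0; k < features.length; k++) {
    beta[features[k]] += learningRate * error * values[k];  // SGD update
  }
  step++;
  if (step % regularizationSkip == 0) {
    // one bulk shrinkage covering the skipped steps, applied to every coefficient
    for (int j = 0; j < beta.length; j++) {
      shrink(j, regularizationSkip * learningRate * lambda);
    }
  }
}
{code}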

There might also be a way to keep the lazy regularization mostly as it is and rethink the updateSteps bookkeeping so that it does not break the sparsity of L1. Maybe it is just a matter of moving the step++; call after the classify(instance); call. I don't remember whether I tried that in the first place...
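If so, the ordering inside the training step would be roughly this (sketched against the actual train(instance) code rather than the toy class above, so instance and target are just placeholders):

{code}
double error = target - classify(instance);  // lazy reg catches up to the previous step value
step++;                                      // only now move on to the new step
// ... gradient update on the non-zero features of instance: since updateSteps
// now lags one behind step, the next catch-up will also apply this step's
// L1 penalty to the freshly updated coefficients.
{code}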

> Need sequential logistic regression implementation using SGD techniques
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-228
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-228
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Classification
>            Reporter: Ted Dunning
>             Fix For: 0.3
>
>         Attachments: logP.csv, MAHOUT-228-3.patch, r.csv, sgd-derivation.pdf, sgd-derivation.tex, sgd.csv
>
>
> Stochastic gradient descent (SGD) is often fast enough for highly scalable learning (see Vowpal Wabbit, http://hunch.net/~vw/).
> I often need to have a logistic regression in Java as well, so that is a reasonable place to start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.