Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2018/05/08 22:31:00 UTC

[jira] [Updated] (MADLIB-1210) Add momentum methods to MLP

     [ https://issues.apache.org/jira/browse/MADLIB-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan updated MADLIB-1210:
------------------------------------
    Description: 
Story

As a data scientist,
I want to use momentum methods in MLP,
so that I get significantly better convergence behavior.

Details

Adding momentum will bring the MADlib MLP algorithm closer to the state of the art.

1) Implement a classical momentum term, default value ~0.9 (update rule sketched below).

Ref [1]:
"Momentum update is another approach that almost always enjoys better converge rates on deep networks." 

2) Implement Nesterov momentum, default TRUE (variant sketched below).

Ref [2]:
"Nesterov Momentum is a slightly different version of the momentum update that has recently been gaining popularity. It enjoys stronger theoretical converge guarantees for convex functions and in practice it also consistently works slightly better than standard momentum."

Acceptance

1) Find or create a dataset that can be used to compare momentum, with and without Nesterov, against plain mini-batch gradient descent and SGD. Usefulness should be judged on convergence behavior, covering both speed and final accuracy. A toy version of this comparison is sketched below.
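
A minimal sketch of such a comparison, using an ill-conditioned quadratic as a hypothetical stand-in for the real dataset (plain SGD falls out as the mu=0 case; lr, step count, and the objective are arbitrary choices for illustration):

    import numpy as np

    A = np.diag([1.0, 50.0])            # ill-conditioned toy objective
    loss = lambda w: 0.5 * w @ A @ w    # f(w) = 0.5 * w^T A w
    grad = lambda w: A @ w

    def trial(mu, nesterov, lr=0.01, steps=200):
        w, v = np.array([1.0, 1.0]), np.zeros(2)
        for _ in range(steps):
            g = grad(w + mu * v) if nesterov else grad(w)
            v = mu * v - lr * g
            w = w + v
        return loss(w)

    # Lower final loss after the same number of steps indicates faster
    # convergence; momentum and Nesterov should beat plain SGD here.
    for name, mu, nest in [("sgd", 0.0, False),
                           ("momentum", 0.9, False),
                           ("nesterov", 0.9, True)]:
        print(name, trial(mu, nest))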

References

[1] http://cs231n.github.io/neural-networks-3/#sgd
[2] http://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf (linked from [1])
[3] http://ruder.io/optimizing-gradient-descent/index.html#gradientdescentoptimizationalgorithms
[4] http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
[5] https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

  was:
Story

As a data scientist,
I want to use momentum methods in MLP,
so that I get significantly better convergence behavior.

Details

Adding momentum will bring the MADlib MLP algorithm closer to the state of the art.

1) Implement momentum term, default value ~0.9

Ref [1]:
"Momentum update is another approach that almost always enjoys better converge rates on deep networks." 

2) Implement Nesterov momentum, default TRUE

Ref [2]:
"Nesterov Momentum is a slightly different version of the momentum update that has recently been gaining popularity. It enjoys stronger theoretical converge guarantees for convex functions and in practice it also consistently works slightly better than standard momentum."

Acceptance

TBD

References

[1] http://cs231n.github.io/neural-networks-3/#sgd
[2] http://ruder.io/optimizing-gradient-descent/index.html#gradientdescentoptimizationalgorithms
[3] http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html



> Add momentum methods to MLP
> ---------------------------
>
>                 Key: MADLIB-1210
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1210
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Neural Networks
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v1.15
>


