You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2021/03/20 08:32:44 UTC

[GitHub] [incubator-mxnet] khaotik opened a new pull request #20065: [FEATURE] AdaBelief operator

khaotik opened a new pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065


   ## Description ##
   #19950
   Hello, this is my first PR to apache MXNet. Glad to work with the community.
   
   ## Checklist ##
   ### Essentials ###
   - [X] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
   - [X] Changes are complete (i.e. I finished coding on this PR)
   - [X] All changes have test coverage
   - [X] Code is well-documented
   
   ### Changes ###
   - added C++ operator under `src/operator/contrib/adabelief*`
   - briefly modified `src/operator/contrib/adamw*` to avoid symbol name clashes
   - `src/operator/contrib/adabelief*`
   
   ## Comments ##
   - This is mostly a copycat of AdamW optimizer, and therefore fair amount of code duplication. It might be a good idea to perform a refactor regarding adam-like optimizers later.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065#issuecomment-803593717


   OK the CI is working for this PR now. There are some formatting issues to address in the sanity check: https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fsanity/detail/PR-20065/2/pipeline#step-75-log-169


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] khaotik commented on pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
khaotik commented on pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065#issuecomment-804419813


   @mxnet-bot run ci [centos-cpu]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] khaotik commented on pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
khaotik commented on pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065#issuecomment-803791285


   Here are failed tests:
   
   ```
   tests/python/unittest/test_operator.py::test_layer_norm
   tests/python/unittest/test_operator.py::test_norm
   ```
   
   I can't see immediate relationship with my PR here. Do they just fail at random?
   
   @mxnet-bot run ci [centos-cpu, unix-cpu]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065#issuecomment-804044263


   > Here are failed tests:
   > 
   > 
   > 
   > ```
   > 
   > tests/python/unittest/test_operator.py::test_layer_norm
   > 
   > tests/python/unittest/test_operator.py::test_norm
   > 
   > ```
   > 
   > 
   > 
   > I can't see immediate relationship with my PR here. Do they just fail at random?
   > 
   > 
   > 
   > @mxnet-bot run ci [centos-cpu, unix-cpu]
   
   Yes I believe those are just flaky and we can overcome with rerunning the failed pipelines for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065#issuecomment-803791306


   Jenkins CI successfully triggered : [centos-cpu, unix-cpu]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065#issuecomment-804044315


   Jenkins CI successfully triggered : [unix-cpu, centos-cpu]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha merged pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
szha merged pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065#issuecomment-804419872


   Jenkins CI successfully triggered : [centos-cpu]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065#issuecomment-803638870


   @khaotik that might have been the host running out of memory.
   
   There are some additional errors related to AMP: https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/PR-20065/1/pipeline/#step-170-log-891.
   
   The way to fix it is to include the new operators in [this list](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/amp/lists/symbol_fp16.py#L40), which signifies that when AMP is enabled, these operators need not to be casted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065#issuecomment-803272227


   Hey @khaotik , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [website, windows-gpu, unix-cpu, edge, miscellaneous, sanity, centos-cpu, clang, windows-cpu, centos-gpu, unix-gpu]
   *** 
   _Note_: 
    Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065#issuecomment-803502766


   @khaotik thanks for the contribution! It looks like CI fails to start due to [a merge failure](https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fsanity/detail/PR-20065/1/pipeline/). Perhaps you could squash the commits into a single commit, rebase to the latest master, and then do a force push?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] khaotik commented on pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
khaotik commented on pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065#issuecomment-803619360


   Hmm, test failed due to gcc "internal compiler error" during build phase. I find this a bit puzzling.
   
   ```
   [2021-03-21T15:52:36.650Z] FAILED: CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o 
   
   [2021-03-21T15:52:36.650Z] /usr/local/bin/ccache /usr/bin/g++-7  -DDMLC_CORE_USE_CMAKE -DDMLC_LOG_FATAL_THROW=1 -DDMLC_LOG_STACK_TRACE_SIZE=0 -DDMLC_MODERN_THREAD_LOCAL=0 -DDMLC_STRICT_CXX11 -DDMLC_USE_CXX11 -DDMLC_USE_CXX11=1 -DDMLC_USE_CXX14 -DMSHADOW_FORCE_STREAM -DMSHADOW_INT64_TENSOR_SIZE=1 -DMSHADOW_IN_CXX11 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_CUDA=1 -DMSHADOW_USE_CUDNN -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_SSE -DMXNET_USE_BLAS_OPEN=1 -DMXNET_USE_CUDA=1 -DMXNET_USE_ILP64_LAPACKE=1 -DMXNET_USE_INTGEMM=1 -DMXNET_USE_LAPACK=1 -DMXNET_USE_LAPACKE_INTERFACE=1 -DMXNET_USE_LIBJPEG_TURBO=0 -DMXNET_USE_NVTX=1 -DMXNET_USE_ONEDNN=1 -DMXNET_USE_OPENCV=1 -DMXNET_USE_OPENMP=1 -DMXNET_USE_OPERATOR_TUNING=1 -DMXNET_USE_SIGNAL_HANDLER=1 -DUSE_CUDNN -D__USE_XOPEN2K8 -Dmxnet_EXPORTS -I/work/mxnet/3rdparty/onednn/include -I3rdparty/onednn/include -I/work/mxnet/include -I/work/mxnet/src -I/work/mxnet/3rdparty/tvm/nnvm/include -I/work/mxnet/3rdparty/tvm/include -I/work/mxnet/3rdparty/dmlc-core/includ
 e -I/work/mxnet/3rdparty/dlpack/include -I/work/mxnet/3rdparty/mshadow -I3rdparty/intgemm -I/work/mxnet/3rdparty/intgemm -I/work/mxnet/3rdparty/onednn/src/../include -I/work/mxnet/3rdparty/miniz -I3rdparty/dmlc-core/include -isystem /usr/include/opencv4 -isystem /usr/local/cuda/include -D_GLIBCXX_ASSERTIONS  -Wall -Wno-sign-compare -O3 -g -fopenmp -O2 -g -DNDEBUG -fPIC   -Werror -Wno-error=unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -mf16c -fopenmp -std=gnu++1z -MD -MT CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o -MF CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o.d -o CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o -c /work/mxnet/src/operator/numpy/np_kron.cc
   
   [2021-03-21T15:52:36.650Z] g++-7: internal compiler error: Killed (program cc1plus)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on pull request #20065: [FEATURE] AdaBelief operator

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #20065:
URL: https://github.com/apache/incubator-mxnet/pull/20065#issuecomment-830145684


   @khaotik thanks for the contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org