Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2021/03/30 13:59:57 UTC

[GitHub] [incubator-mxnet] wms2537 opened a new pull request #20104: [FEATURE] Add int16 support.

wms2537 opened a new pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104


   ## Description ##
   Adds int16 support to the MXNet backend, as in #20066.
   
   ## Checklist ##
   ### Essentials ###
   - [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage
   - [x] Code is well-documented
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be made.
   - Interesting edge cases to note here
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-820332768


   Jenkins CI successfully triggered : [windows-gpu, unix-cpu, unix-gpu, windows-cpu]





[GitHub] [incubator-mxnet] szha commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-821225002


   @bgawrych looks like the compiler is killed when compiling the einsum op: https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-20104/6/pipeline#step-192-log-785
   
   This could be due to the large memory consumption of g++-7, which causes the OS to kill the process.





[GitHub] [incubator-mxnet] szha commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-821240863


   We could consider breaking down the einsum source code into smaller pieces so that each piece takes less memory to compile. One thing that's unclear to me is why oneDNN pushes the size across the limit and whether anything can be done there.





[GitHub] [incubator-mxnet] wms2537 commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
wms2537 commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-821234972


   Any solutions to this? @szha 





[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-820173283


   Undefined action detected. 
   Permissible actions are : run ci [all], run ci [job1, job2] 
   Example : @mxnet-bot run ci [all] 
   Example : @mxnet-bot run ci [centos-cpu, clang]





[GitHub] [incubator-mxnet] wms2537 commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
wms2537 commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-820910612


   @mxnet-bot run ci [unix-cpu,unix-gpu,windows-cpu,windows-gpu]
   





[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-820242786


   Unauthorized access detected. 
   Only following 3 categories can trigger CI : 
   PR Author, MXNet Committer, Jenkins Admin.





[GitHub] [incubator-mxnet] bgawrych commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-821019672


   On my machine I successfully built unix-cpu with oneDNN using Docker and the script from the CI folder. @szha, @leezu, I don't see any failure in the logs. Could this be some kind of CI issue?





[GitHub] [incubator-mxnet] bgawrych commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-824851735


   @szha It's probably not caused by oneDNN itself, but rather by the additional code generated when building with oneDNN (OOM is also triggered without oneDNN, see unix-gpu CUDA+cuDNN, and there are successful builds with oneDNN too).
   
   The RFC for this problem, https://github.com/apache/incubator-mxnet/issues/19688, has a calculation where a function can be generated in 640 ways, but with this change it is 1000 ways (+2 data types), so memory consumption goes up significantly.
   
   There are many large files like:
   264M    ./src/operator/numpy/linalg/np_norm_forward.cc.s
   253M    ./src/operator/numpy/np_broadcast_reduce_op_value.cc.s
   209M    ./src/operator/tensor/indexing_op.cc.s
   
   Is CI always using c5.18xlarge? Is a single instance running multiple builds?
   Maybe increasing the swap size or limiting the number of threads (ninja -j ) would help?
   
   @akarbown 
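
A rough way to see why two extra data types matter so much: when a kernel is instantiated through nested type switches, the number of generated variants is the product of the number of types at each level. The sketch below only illustrates that arithmetic. `count_variants` is a hypothetical helper, not an MXNet function, and the 8 x 8 x 10 factorization of 640 is an assumption chosen to match the figure quoted above; the RFC has the actual calculation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MXNet): given the number of types handled
// at each nested type-switch level, return how many distinct kernel variants
// the compiler has to generate. It is simply the product of the counts.
inline std::size_t count_variants(const std::vector<std::size_t>& types_per_level) {
  std::size_t total = 1;
  for (std::size_t n : types_per_level) {
    total *= n;
  }
  return total;
}

// Assumed, illustrative factorizations:
//   before the change: count_variants({8, 8, 10})   -> 640
//   after the change:  count_variants({10, 10, 10}) -> 1000
```

Under that assumption the number of instantiations grows by more than half even though only two types were added, which is consistent with the compiler needing noticeably more memory per translation unit.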





[GitHub] [incubator-mxnet] wms2537 commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
wms2537 commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-820332706


   @mxnet-bot run ci [unix-cpu,unix-gpu,windows-cpu,windows-gpu]





[GitHub] [incubator-mxnet] bgawrych commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-820173239


   I can't find any reason why CI is failing this time, only:
   ```[2021-04-12T12:45:34.003Z] Cannot contact mxnetlinux-cpu_n9hxuz5xzz: java.lang.InterruptedException```
   @wms2537 you can try to rerun the CI tests by commenting
   ```
   @mxnet-bot run ci [unix-cpu,unix-gpu,windows-cpu,windows-gpu]
   ```





[GitHub] [incubator-mxnet] bgawrych commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-817695297


   @wms2537 in the full CI logs of the windows-cpu run there is the following error:
   ```
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\src\operator\tensor\./broadcast_reduce_op.h(1553): error C2666: '*': 18 overloads have similar conversions
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\include\mxnet\./ndarray.h(1249): note: could be 'mxnet::NDArray mxnet::operator *(const mxnet::NDArray &,const mxnet::real_t &)'
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\include\mxnet\./ndarray.h(1242): note: or       'mxnet::NDArray mxnet::operator *(const mxnet::NDArray &,const mxnet::NDArray &)'
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'mshadow::bfloat::bf16_t mshadow::bfloat::operator *(mshadow::bfloat::bf16_t,mshadow::bfloat::bf16_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(float,mshadow::bfloat::bf16_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(mshadow::bfloat::bf16_t,float)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'double mshadow::bfloat::operator *(double,mshadow::bfloat::bf16_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'double mshadow::bfloat::operator *(mshadow::bfloat::bf16_t,double)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(int8_t,mshadow::bfloat::bf16_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(mshadow::bfloat::bf16_t,int8_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(uint8_t,mshadow::bfloat::bf16_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(mshadow::bfloat::bf16_t,uint8_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(int32_t,mshadow::bfloat::bf16_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(mshadow::bfloat::bf16_t,int32_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(uint32_t,mshadow::bfloat::bf16_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(mshadow::bfloat::bf16_t,uint32_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(int64_t,mshadow::bfloat::bf16_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(mshadow::bfloat::bf16_t,int64_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(uint64_t,mshadow::bfloat::bf16_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\3rdparty\mshadow\mshadow\./bfloat.h(170): note: or       'float mshadow::bfloat::operator *(mshadow::bfloat::bf16_t,uint64_t)' [found using argument-dependent lookup]
   [2021-04-08T17:00:39.673Z] C:\jenkins_slave\workspace\build-cpu\src\operator\tensor\./broadcast_reduce_op.h(1553): note: or       'built-in C++ operator*(OType, float)'
   ```
   
   This is probably caused by the lack of int16 support in the bfloat header, here:
   https://github.com/apache/incubator-mxnet/blob/957733d8b6ccdc8de6c54d9ac4e6c6ce9420e77b/3rdparty/mshadow/mshadow/bfloat.h#L48-L55
   
   and here:
   https://github.com/apache/incubator-mxnet/blob/957733d8b6ccdc8de6c54d9ac4e6c6ce9420e77b/3rdparty/mshadow/mshadow/bfloat.h#L87-L94
   
   Will you try to fix it?
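
The general mechanism behind an error like C2666 can be shown with a much smaller overload set. The sketch below is not the code from bfloat.h: `half_like` and `mul` are hypothetical stand-ins, and the overload list is deliberately shorter than the real one. It assumes only that a call with an `int16_t` argument has several candidates that each need a conversion of the same rank, so the compiler cannot pick one; adding an exact-match overload removes the ambiguity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 16-bit float type such as bf16_t.
struct half_like {
  float value;
};

// Overloads for a few arithmetic types, mimicking a header that was written
// before int16_t was supported.
inline float mul(half_like a, float b) { return a.value * b; }
inline float mul(half_like a, double b) { return a.value * static_cast<float>(b); }
inline float mul(half_like a, std::int8_t b) { return a.value * static_cast<float>(b); }
inline float mul(half_like a, std::int64_t b) { return a.value * static_cast<float>(b); }

// Without the overload below, mul(h, std::int16_t{2}) does not compile:
// int16_t -> float, double, int8_t and int64_t are all conversions of the
// same rank, so no candidate is better than the others.
// With it, the int16_t argument is an exact match and this overload wins.
inline float mul(half_like a, std::int16_t b) { return a.value * static_cast<float>(b); }
```

With the last overload in place, `mul(half_like{1.5f}, std::int16_t{2})` selects it as an exact match. The real header also has `int32_t` and unsigned overloads as well as user-defined conversions, so the actual candidate ranking on MSVC may differ from this sketch.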





[GitHub] [incubator-mxnet] szha commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
szha commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-815984568


   @mxnet-bot run ci [centos-cpu,centos-gpu,windows-cpu,windows-gpu]





[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-810277922


   Hey @wms2537 , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [edge, unix-cpu, sanity, windows-gpu, miscellaneous, website, unix-gpu, centos-cpu, centos-gpu, clang, windows-cpu]
   *** 
   _Note_: 
    Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   





[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-815984649


   Jenkins CI successfully triggered : [centos-cpu, windows-cpu, windows-gpu, centos-gpu]





[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-815704783


   Jenkins CI successfully triggered : [edge, centos-cpu, clang, windows-cpu, windows-gpu, unix-cpu, website, sanity, centos-gpu, unix-gpu, miscellaneous]





[GitHub] [incubator-mxnet] wms2537 commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
wms2537 commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-815704576


    @mxnet-bot run ci [all]





[GitHub] [incubator-mxnet] bgawrych commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-815630114


   @wms2537 Can you retrigger CI? I've successfully built MXNet with this PR, but in CI it's not compiling:
   [2021-04-01T14:45:48.117Z] Cannot contact mxnetlinux-cpu_2cu10y3j6k: java.lang.InterruptedException
   
   





[GitHub] [incubator-mxnet] bgawrych commented on pull request #20104: [FEATURE] Add int16 support.

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #20104:
URL: https://github.com/apache/incubator-mxnet/pull/20104#issuecomment-820242734


   @mxnet-bot run ci [unix-cpu,unix-gpu,windows-cpu,windows-gpu]

