Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2022/02/05 21:18:29 UTC

[GitHub] [incubator-mxnet] DickJC123 opened a new pull request #20876: [FEATURE] [WIP] Add g5 instance to CI

DickJC123 opened a new pull request #20876:
URL: https://github.com/apache/incubator-mxnet/pull/20876


   ## Description ##
   @josephevans is in the process of adding a g5 instance to the CI, for MXNet testing on A100.
   This PR will first enable the CI on the g5 instance, which will expose the need for some test tolerance adjustments, since A100 uses reduced-mantissa-width TF32 calculations by default on float32 data.
   I will then add the fixing commits to this PR to get a clean CI before merging.
   
   See the related PR https://github.com/apache/incubator-mxnet-ci/pull/43 [not yet merged]
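
To illustrate why TF32 can require looser test tolerances, here is a minimal sketch that emulates TF32 input rounding in pure Python. It is not MXNet code: the helper names (`to_fp32`, `to_tf32`, `dot`) are hypothetical, the round-to-nearest-even mode is an assumption about the emulation rather than a statement about the hardware, and the accumulation is done in double precision for simplicity. The only fact relied on is that TF32 keeps 10 explicit mantissa bits where float32 keeps 23.

```python
import math
import random
import struct

TF32_MANTISSA_BITS = 10   # explicit mantissa bits kept by TF32
FP32_MANTISSA_BITS = 23   # explicit mantissa bits in IEEE float32


def to_fp32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_tf32(x: float) -> float:
    """Emulate TF32 input rounding by keeping 10 mantissa bits of a float32.

    Assumption: round to nearest, ties to even. NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = FP32_MANTISSA_BITS - TF32_MANTISSA_BITS  # 13 low bits are discarded
    lsb = (bits >> drop) & 1
    bits = (bits + (1 << (drop - 1)) - 1 + lsb) & 0xFFFFFFFF
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a, b, convert):
    """Dot product with each input passed through `convert`, summed in double."""
    return sum(convert(x) * convert(y) for x, y in zip(a, b))


random.seed(0)
a = [random.uniform(0.5, 1.5) for _ in range(1024)]
b = [random.uniform(0.5, 1.5) for _ in range(1024)]

ref = dot(a, b, to_fp32)    # float32 inputs
tf32 = dot(a, b, to_tf32)   # TF32-rounded inputs
rel_err = abs(tf32 - ref) / abs(ref)

print(f"relative error: {rel_err:.2e}")
print("passes rel_tol=1e-2:", math.isclose(tf32, ref, rel_tol=1e-2))
print("passes rel_tol=1e-7:", math.isclose(tf32, ref, rel_tol=1e-7))
```

Each rounded input carries a relative error of at most 2^-11, so a test that compares against a float32 reference with a tolerance tuned for 23 mantissa bits may fail even though the computation is behaving as designed.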
   
   ## Checklist ##
   ### Essentials ###
   - [X] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage
   - [ ] Code is well-documented
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be made.
   - Interesting edge cases to note here
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] DickJC123 commented on pull request #20876: [FEATURE] [WIP] Add g5 instance to CI

Posted by GitBox <gi...@apache.org>.
DickJC123 commented on pull request #20876:
URL: https://github.com/apache/incubator-mxnet/pull/20876#issuecomment-1032842894


   @josephevans Could you take a look at what I've done so far, and perhaps troubleshoot why I'm seeing the error `There are no nodes with the label ‘mxnetlinux-gpu-g5’` on the CI page https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-20876/3/pipeline/247 ? Thanks.





[GitHub] [incubator-mxnet] ptrendx merged pull request #20876: [FEATURE] Add g5 instance to CI

Posted by GitBox <gi...@apache.org>.
ptrendx merged pull request #20876:
URL: https://github.com/apache/incubator-mxnet/pull/20876


   





[GitHub] [incubator-mxnet] DickJC123 commented on pull request #20876: [FEATURE] Add g5 instance to CI

Posted by GitBox <gi...@apache.org>.
DickJC123 commented on pull request #20876:
URL: https://github.com/apache/incubator-mxnet/pull/20876#issuecomment-1047328429


   I've encountered a test failure of test_countsketch here: https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/PR-20876/16/pipeline
   
   I see where threads might write outside the output tensor bounds, so I'm pushing a fix to this PR.
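
As a rough illustration of the kind of bug described above, the following sketch emulates a count-sketch kernel launched with more threads than there are input elements. It is not the actual fix from this PR, whose details are not shown in this thread. The function names, the flat row-major layout, and the `block_size` parameter are all hypothetical, and the sequential loop stands in for a parallel launch (a real GPU kernel would also need atomic adds).

```python
def count_sketch_kernel(tid, data, h, s, out, n_rows, in_dim, out_dim):
    """Body of one emulated GPU thread; `tid` is the flat thread index."""
    if tid >= n_rows * in_dim:
        # Guard 1: surplus threads in the last block have no element to process.
        return
    row, col = divmod(tid, in_dim)
    bucket = h[col]
    if not 0 <= bucket < out_dim:
        # Guard 2: a hash index outside the output row must not be written.
        return
    out[row * out_dim + bucket] += s[col] * data[row * in_dim + col]


def count_sketch(data, h, s, n_rows, in_dim, out_dim, block_size=4):
    """Emulate a launch whose thread count is rounded up to whole blocks."""
    out = [0.0] * (n_rows * out_dim)
    n_elems = n_rows * in_dim
    n_threads = -(-n_elems // block_size) * block_size  # ceiling to a block multiple
    for tid in range(n_threads):  # sequential stand-in for the parallel launch
        count_sketch_kernel(tid, data, h, s, out, n_rows, in_dim, out_dim)
    return out


# Two rows of three features, hashed into two buckets with signs +1, -1, +1.
result = count_sketch(
    data=[1, 2, 3, 4, 5, 6], h=[0, 1, 0], s=[1, -1, 1],
    n_rows=2, in_dim=3, out_dim=2,
)
print(result)
```

Here six elements are covered by eight emulated threads, so the last two threads rely on the first guard. Without it, thread 6 would compute an output index past the end of `out`, which Python reports as an `IndexError` but which on a GPU would be a silent out-of-bounds write.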





[GitHub] [incubator-mxnet] DickJC123 commented on pull request #20876: [FEATURE] Add g5 instance to CI

Posted by GitBox <gi...@apache.org>.
DickJC123 commented on pull request #20876:
URL: https://github.com/apache/incubator-mxnet/pull/20876#issuecomment-1061175582


   Gentle ping for additional reviews and an eventual merge.  This PR contains a few unrelated CI fixes that could help PR development generally.





[GitHub] [incubator-mxnet] TristonC edited a comment on pull request #20876: [FEATURE] Add g5 instance to CI

Posted by GitBox <gi...@apache.org>.
TristonC edited a comment on pull request #20876:
URL: https://github.com/apache/incubator-mxnet/pull/20876#issuecomment-1062211843


   @szha @ptrendx  Could you please help review and merge this PR? It passed all the checks and is ready to be merged. 





[GitHub] [incubator-mxnet] TristonC commented on pull request #20876: [FEATURE] Add g5 instance to CI

Posted by GitBox <gi...@apache.org>.
TristonC commented on pull request #20876:
URL: https://github.com/apache/incubator-mxnet/pull/20876#issuecomment-1062211843


   @szha @josephevans Could you please help review and merge this PR? It passed all the checks and is ready to be merged. 





[GitHub] [incubator-mxnet] DickJC123 commented on pull request #20876: [FEATURE] Add g5 instance to CI

Posted by GitBox <gi...@apache.org>.
DickJC123 commented on pull request #20876:
URL: https://github.com/apache/incubator-mxnet/pull/20876#issuecomment-1068465192


   To pass CI, this PR included a fix to test_countsketch, which also resolves https://github.com/apache/incubator-mxnet/issues/10988.





[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20876: [FEATURE] [WIP] Add g5 instance to CI

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20876:
URL: https://github.com/apache/incubator-mxnet/pull/20876#issuecomment-1030700482


   Hey @DickJC123 , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [centos-gpu, windows-gpu, unix-cpu, unix-gpu, miscellaneous, sanity, edge, website, centos-cpu, clang, windows-cpu]
   *** 
   _Note_: 
   Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
   All CI tests must pass before the PR can be merged. 
   

