Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/17 17:38:13 UTC

[GitHub] [incubator-mxnet] leezu opened a new issue #18745: Run Large Tensor Tests as part of PR tests

leezu opened a new issue #18745:
URL: https://github.com/apache/incubator-mxnet/issues/18745


   While there is an effort to enable the large tensor feature by default (https://github.com/apache/incubator-mxnet/pull/18625), the feature is only tested nightly, and the tests are in fact failing: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/master/752/pipeline/96
   
   As this will be a default feature, and especially as the test run takes only 30 minutes, can we enable it as part of the PR checks?
   
   https://github.com/apache/incubator-mxnet/pull/18744
   https://github.com/apache/incubator-mxnet/pull/18718
   https://github.com/apache/incubator-mxnet/pull/18715


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] leezu commented on issue #18745: Run Large Tensor Tests as part of PR tests

Posted by GitBox <gi...@apache.org>.

leezu commented on issue #18745:
URL: https://github.com/apache/incubator-mxnet/issues/18745#issuecomment-660319898


   Why does the run linked above only take 26m then?



[GitHub] [incubator-mxnet] leezu commented on issue #18745: Run Large Tensor Tests as part of PR tests

Posted by GitBox <gi...@apache.org>.

leezu commented on issue #18745:
URL: https://github.com/apache/incubator-mxnet/issues/18745#issuecomment-660323091


   @access2rohit there's only one long-running test:
   
   ```
   [2020-02-25T20:36:21.873Z] OK (SKIP=1)
   [2020-02-25T20:36:21.873Z] + nosetests-3.4 tests/nightly/test_large_vector.py:test_nn
   [2020-02-25T21:25:14.016Z] .
   [2020-02-25T21:25:14.016Z] ----------------------------------------------------------------------
   [2020-02-25T21:25:14.016Z] Ran 1 test in 2936.624s
   ```
   
   All other tests run in around 30 minutes. Could the long-running test be optimized to take less than 1 hour? In any case, I think the other tests could be moved to run on every PR?
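
For reference, the duration in the log excerpt above can be converted to minutes with a small script. This is only an illustrative sketch: the `summarize` helper and the regular expression are assumptions written for this example, not part of the MXNet CI tooling, and they only match summary lines of the form shown in the log.

```python
import re
from typing import List, Tuple

# Matches nose summary lines such as "Ran 1 test in 2936.624s".
# The pattern is an assumption based on the single log line quoted in this thread.
RAN_LINE = re.compile(r"Ran (\d+) tests? in ([\d.]+)s")


def summarize(log: str) -> List[Tuple[int, float]]:
    """Return (number of tests, duration in minutes) for each summary line in the log."""
    return [(int(count), float(seconds) / 60.0)
            for count, seconds in RAN_LINE.findall(log)]


if __name__ == "__main__":
    excerpt = "[2020-02-25T21:25:14.016Z] Ran 1 test in 2936.624s"
    for count, minutes in summarize(excerpt):
        print(f"{count} test(s) took about {minutes:.1f} minutes")
```

Applied to the quoted line, this reports the single `test_nn` run at roughly 49 minutes.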



[GitHub] [incubator-mxnet] access2rohit edited a comment on issue #18745: Run Large Tensor Tests as part of PR tests

Posted by GitBox <gi...@apache.org>.

access2rohit edited a comment on issue #18745:
URL: https://github.com/apache/incubator-mxnet/issues/18745#issuecomment-660324294


   We can split up and parallelize the jobs to run on multiple workers as part of multiple test suites in CI. That will speed things up but will consume more CI resources, so the CI run cost per PR will increase (depending on which platforms you want it tested on in CI, and for both CPU and GPU builds, to ensure correctness and consistency in behaviour). What do you think?
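
One way to picture the split described above is a greedy, longest-first assignment of tests to workers. The following is a minimal sketch under stated assumptions, not the project's actual CI configuration: `shard_tests` is a hypothetical helper, and every test name and duration other than `test_nn` (2936.624s, taken from the log quoted in this thread) is made up for illustration.

```python
from typing import Dict, List


def shard_tests(durations: Dict[str, float], num_workers: int) -> List[List[str]]:
    """Assign tests to workers, longest first, always picking the least-loaded worker."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    shards: List[List[str]] = [[] for _ in range(num_workers)]
    loads = [0.0] * num_workers
    # Sort by duration, longest first, so the slowest test is placed before the rest.
    for name, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        idx = loads.index(min(loads))  # least-loaded worker so far
        shards[idx].append(name)
        loads[idx] += seconds
    return shards


if __name__ == "__main__":
    # Only test_nn's duration comes from the thread; the others are hypothetical.
    durations = {
        "test_nn": 2936.624,
        "hypothetical_test_a": 600.0,
        "hypothetical_test_b": 600.0,
        "hypothetical_test_c": 600.0,
    }
    for worker, tests in enumerate(shard_tests(durations, num_workers=2)):
        print(worker, tests)
```

With these example numbers the long-running test gets a worker to itself, so wall-clock time is bounded by that single test while the total worker time, and hence the CI cost, stays the same or grows with per-worker setup overhead.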



[GitHub] [incubator-mxnet] access2rohit commented on issue #18745: Run Large Tensor Tests as part of PR tests

Posted by GitBox <gi...@apache.org>.

access2rohit commented on issue #18745:
URL: https://github.com/apache/incubator-mxnet/issues/18745#issuecomment-660322071


   > Why does the run linked above only take 26m then?
   
   I think the last successful run should be considered. If we use it: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/master/609/pipeline
   
   it takes roughly 90 minutes. @leezu do you agree?



[GitHub] [incubator-mxnet] leezu commented on issue #18745: Run Large Tensor Tests as part of PR tests

Posted by GitBox <gi...@apache.org>.

leezu commented on issue #18745:
URL: https://github.com/apache/incubator-mxnet/issues/18745#issuecomment-660325168


   As mentioned, I don't think 30 minutes is an issue here, as this is a default feature and will run in parallel with other pipelines that already take 60 minutes.



[GitHub] [incubator-mxnet] access2rohit commented on issue #18745: Run Large Tensor Tests as part of PR tests

Posted by GitBox <gi...@apache.org>.

access2rohit commented on issue #18745:
URL: https://github.com/apache/incubator-mxnet/issues/18745#issuecomment-660315952


   @leezu simply running all tests within test_large_array.py takes more than 90 minutes. I don't think it's practical to enable them on CI. They will simply time out on all PRs.

