Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/23 03:12:41 UTC

[GitHub] [incubator-mxnet] szha opened a new issue #18144: test_numpy_op.py::test_np_empty_like hangs

szha opened a new issue #18144:
URL: https://github.com/apache/incubator-mxnet/issues/18144


   ## Description
   test_numpy_op.py::test_np_empty_like hangs on unix-gpu
   
   http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-18025/59/pipeline/425
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu edited a comment on issue #18144: test_numpy_op.py::test_np_empty_like hangs

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #18144:
URL: https://github.com/apache/incubator-mxnet/issues/18144#issuecomment-618797968


   I agree. #18090 should have been linked; it may have been left out unintentionally.





[GitHub] [incubator-mxnet] leezu commented on issue #18144: test_numpy_op.py::test_np_empty_like hangs

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18144:
URL: https://github.com/apache/incubator-mxnet/issues/18144#issuecomment-618763195


   @haojin2 you can check #18090 for the evidence.





[GitHub] [incubator-mxnet] haojin2 commented on issue #18144: test_numpy_op.py::test_np_empty_like hangs

Posted by GitBox <gi...@apache.org>.
haojin2 commented on issue #18144:
URL: https://github.com/apache/incubator-mxnet/issues/18144#issuecomment-618787001


   @leezu I understand the goal, but my point is that we should avoid putting unrelated info in the issue's description (the hang in the first provided link is not related at all), shouldn't we? It would have been better if a link to #18090 had been provided in the first place to avoid such confusion.





[GitHub] [incubator-mxnet] leezu commented on issue #18144: test_numpy_op.py::test_np_empty_like hangs

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18144:
URL: https://github.com/apache/incubator-mxnet/issues/18144#issuecomment-618560077


   In the linked CI run, `test_numpy_op.py::test_np_empty_like` is not run and so can't be responsible for the hang. There must therefore be other triggers besides `test_np_empty_like`.
   
   Related https://github.com/apache/incubator-mxnet/issues/18090





[GitHub] [incubator-mxnet] leezu edited a comment on issue #18144: test_numpy_op.py::test_np_empty_like hangs

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #18144:
URL: https://github.com/apache/incubator-mxnet/issues/18144#issuecomment-618763195


   @haojin2 you can check #18090 for the evidence. In the above commit, the problem is that only `empty_like` is disabled but not the other numpy operators relying on CustomOp. With those disabled as well in https://github.com/apache/incubator-mxnet/pull/18151, CI has passed without a hang 2 times in a row so far. You're right that this doesn't fix the root cause. The objective here is to restore CI stability.
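As a rough illustration of temporarily disabling a group of tests, here is a minimal sketch using only the standard library's `unittest.skip`. The decorator name `skip_custom_op` and the test name are hypothetical, and this is not the actual change made in the pull request above.

```python
import unittest

# Reason string shown in the test report; the wording is an assumption.
SKIP_REASON = "temporarily disabled while the CI hang is investigated (see #18090)"


def skip_custom_op(test_func):
    """Hypothetical helper: mark a test that relies on CustomOp as skipped."""
    return unittest.skip(SKIP_REASON)(test_func)


@skip_custom_op
def test_hypothetical_custom_op_based_operator():
    # Placeholder body; it never executes while the skip is in place.
    raise AssertionError("this body should not run")
```

Keeping the reason in one place makes it easy to find and re-enable every affected test once the root cause is fixed.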





[GitHub] [incubator-mxnet] leezu edited a comment on issue #18144: test_numpy_op.py::test_np_empty_like hangs

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #18144:
URL: https://github.com/apache/incubator-mxnet/issues/18144#issuecomment-618763195


   @haojin2 you can check #18090 for the evidence. In the above commit, the problem is that only `empty_like` is disabled but not the other numpy operators relying on CustomOp. With those disabled as well in https://github.com/apache/incubator-mxnet/pull/18151, CI has passed without a hang 2 times in a row so far.





[GitHub] [incubator-mxnet] leezu commented on issue #18144: test_numpy_op.py::test_np_empty_like hangs

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18144:
URL: https://github.com/apache/incubator-mxnet/issues/18144#issuecomment-618797968


   I agree.





[GitHub] [incubator-mxnet] haojin2 commented on issue #18144: test_numpy_op.py::test_np_empty_like hangs

Posted by GitBox <gi...@apache.org>.
haojin2 commented on issue #18144:
URL: https://github.com/apache/incubator-mxnet/issues/18144#issuecomment-618758610


   Wow, this issue is VERY INTERESTING. In the first [link](http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-18025/59/pipeline/425) given in the issue description, I don't see `test_np_empty_like` being run at all, and the last test run before the final timeout process kill was `test_np_bincount`. Also, as @leezu pointed out in the comment above, even removing `test_np_empty_like` does not solve the issue. So, to conclude, I have not yet seen any solid evidence that `test_np_empty_like` is the root cause of the hang.
   To be clear, I'm not saying we shouldn't re-implement `empty_like` with a native implementation in the future; I simply want to suggest that you may be attacking the wrong target at the moment.
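The check described above (which tests appear in the log at all, and which one started last without reporting a result) can be sketched as a small script. This assumes log lines shaped like `pytest -v` output, `<file>.py::<test_name> <OUTCOME>`; the function names and the sample log are hypothetical and not taken from the actual Jenkins run.

```python
import re

# Assumed line shape: "<file>.py::<test_name>" optionally followed by an outcome.
# A test that hangs would print its id but never an outcome.
TEST_LINE = re.compile(
    r"(?P<test_id>\S+\.py::\S+)"
    r"(?:\s+(?P<outcome>PASSED|FAILED|SKIPPED|ERROR|XFAIL|XPASS))?"
)


def summarize_log(lines):
    """Return (started, unfinished) lists of test ids, in log order."""
    started, unfinished = [], []
    for line in lines:
        match = TEST_LINE.search(line)
        if match is None:
            continue  # not a test line
        started.append(match.group("test_id"))
        if match.group("outcome") is None:
            unfinished.append(match.group("test_id"))
    return started, unfinished


def was_run(test_name, started):
    """True if any started test id refers to test_name (ignoring parameters)."""
    return any(
        test_id.split("::")[-1].split("[")[0] == test_name for test_id in started
    )


# Hypothetical excerpt, not copied from the real CI log.
sample_log = [
    "test_numpy_op.py::test_np_sum PASSED",
    "test_numpy_op.py::test_np_bincount",
    "timeout reached, killing process",
]
started, unfinished = summarize_log(sample_log)
```

On this sample, `unfinished` holds only the `test_np_bincount` id and `was_run("test_np_empty_like", started)` is false, which is the kind of evidence the comment refers to.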





[GitHub] [incubator-mxnet] haojin2 edited a comment on issue #18144: test_numpy_op.py::test_np_empty_like hangs

Posted by GitBox <gi...@apache.org>.
haojin2 edited a comment on issue #18144:
URL: https://github.com/apache/incubator-mxnet/issues/18144#issuecomment-618787001


   @leezu I understand the goal, but my point is that we should avoid putting unrelated info in the issue's description (the hang in the first provided link is not related at all), shouldn't we? It would have been better if a link to #18090 had been provided in the first place to avoid such confusion, don't you agree?

