Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/12/01 22:27:26 UTC

[GitHub] [tvm] tqchen opened a new issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

tqchen opened a new issue #7010:
URL: https://github.com/apache/tvm/issues/7010


   
   https://ci.tlcpack.ai/job/tvm/job/main/245/execution/node/218/log/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] altanh commented on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
altanh commented on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-737546289


   Per discussion with @tkonolige, we're fairly sure the abort is caused by `libomp` conflicts between different third-party libraries (e.g. PyTorch and ONNX).





[GitHub] [tvm] altanh commented on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
altanh commented on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-738383860


   I think we should keep this issue open but rename it to something like "dependency libomp conflict" (or open a new one), since the problem might arise again in the future.





[GitHub] [tvm] tqchen commented on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-738105336


   It would be great to propose a fix. Is this related to the fact that we are using PyTorch for gradient testing? Ideally we should move that to a separate test suite.
   
   By default, we should use numerical gradient checking that is independent of other frameworks.





[GitHub] [tvm] tqchen edited a comment on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-738105336


   It would be great to propose a fix, given that the flaky error happens quite frequently.
   
   Is this related to the fact that we are using PyTorch for gradient testing? Ideally we should move that to a separate test suite. By default, we should use numerical gradient checking that is independent of other frameworks.
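As a sketch of the kind of framework-independent check suggested above, the following compares an analytic gradient against a central finite-difference estimate, (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps), using only NumPy. The helper names `numerical_grad` and `check_analytic_grad`, the step size, and the tolerances are illustrative assumptions, not TVM's own test utilities.

```python
import numpy as np


def numerical_grad(f, x, eps=1e-4):
    """Central finite-difference gradient of a scalar-valued f at x.

    Each entry is (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps), so the check
    needs nothing beyond NumPy and the function under test.
    """
    x = np.array(x, dtype=np.float64)  # private copy; the caller's array is untouched
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig  # restore before perturbing the next entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad


def check_analytic_grad(f, grad_f, x, rtol=1e-4, atol=1e-6):
    """Raise AssertionError if grad_f(x) disagrees with the numerical estimate."""
    np.testing.assert_allclose(grad_f(x), numerical_grad(f, x), rtol=rtol, atol=atol)


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((3, 4))
    # f(x) = sum(sin(x) * x) has the analytic gradient cos(x) * x + sin(x).
    check_analytic_grad(
        lambda v: np.sum(np.sin(v) * v),
        lambda v: np.cos(v) * v + np.sin(v),
        x,
    )
    print("gradient check passed")
```

Central differences have O(eps^2) truncation error, which is why a fairly loose `rtol` still catches real mistakes in a gradient definition without pulling in PyTorch or any other framework.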





[GitHub] [tvm] tqchen closed issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
tqchen closed issue #7010:
URL: https://github.com/apache/tvm/issues/7010


   





[GitHub] [tvm] altanh commented on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
altanh commented on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-738177491


   I agree. I think we should first address #7017 to confirm it's the same failure that is happening on CI, and then look into removing the dependencies. If we can't remove a dependency (as in the case of `test_onnx.py` and `test_dlpack.py`), I propose sandboxing by dependency, so that files with conflicting dependencies always run in separate pytest processes. If a single file uses two conflicting dependencies, I'm not sure how to proceed; we may need to build the dependencies with a special libomp configuration on the CI machine (at least we could cache this?).
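A minimal sketch of the dependency-based sandboxing idea: given a hypothetical, hand-maintained map from test files to the third-party packages they import, greedily pack files into batches whose combined dependencies contain no known-conflicting group, then run each batch in its own pytest process. The conflict group `{"torch", "onnxruntime"}`, the file-to-package map, and the helper names are assumptions for illustration. A file that imports a whole conflict group by itself still ends up alone in one process, so this does not solve the single-file case raised above.

```python
import subprocess
import sys


def has_conflict(deps, conflict_groups):
    """True if `deps` contains every package of at least one conflict group."""
    return any(group <= deps for group in conflict_groups)


def plan_batches(file_deps, conflict_groups):
    """Greedily pack test files into batches whose combined deps are conflict-free.

    `file_deps` maps a test file to the set of third-party packages it imports.
    A file that imports a whole conflict group by itself gets a batch of its own.
    """
    batches = []  # list of (files, combined_deps) pairs
    for path in sorted(file_deps):
        deps = set(file_deps[path])
        target = None
        if not has_conflict(deps, conflict_groups):
            for files, combined in batches:
                if not has_conflict(combined | deps, conflict_groups):
                    target = (files, combined)
                    break
        if target is None:
            batches.append(([path], deps))
        else:
            target[0].append(path)
            target[1].update(deps)
    return [files for files, _ in batches]


def run_batches(batches):
    """Run each batch in its own pytest process; return the exit codes."""
    return [subprocess.call([sys.executable, "-m", "pytest", *files]) for files in batches]


if __name__ == "__main__":
    # Hypothetical map: which potentially conflicting packages each file imports.
    file_deps = {
        "test_onnx.py": {"onnxruntime"},
        "test_dlpack.py": {"torch"},
        "test_grad.py": {"torch"},
        "test_plain.py": set(),
    }
    for batch in plan_batches(file_deps, [{"torch", "onnxruntime"}]):
        print(batch)
    # prints:
    # ['test_dlpack.py', 'test_grad.py', 'test_plain.py']
    # ['test_onnx.py']
```

Greedy packing keeps the number of pytest processes small without trying to find an optimal partition; with only a handful of conflicting packages that trade-off seems reasonable.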





[GitHub] [tvm] tqchen commented on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-826806166


   Closing for now, as the original flaky issue is fixed.





[GitHub] [tvm] tkonolige edited a comment on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
tkonolige edited a comment on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-737547902


   The error message is:
   ```
   OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
   OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/
   ```
   
   PyTorch loading:
   ```
   dyld: loaded: <F7FFAF24-7A9F-35EA-B715-F2A2F250F575> /Users/tristan/Library/Python/3.8/lib/python/site-packages/torch/lib/libtorch_global_deps.dylib
   dyld: loaded: <52F67CC7-A4B0-3F4D-A80D-7DC28D4A776A> /Users/tristan/Library/Python/3.8/lib/python/site-packages/torch/lib/../.dylibs/libiomp5.dylib
   ```
   
   ONNX loading:
   ```
   dyld: loaded: <C903042A-EFCF-3557-AB7C-155BA03165D0> /usr/local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
   dyld: loaded: <FF7BABED-D8CA-3F78-BCE2-F0C293919D70> /usr/local/opt/libomp/lib/libomp.dylib
   ```
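The `dyld: loaded:` lines above can be checked mechanically. This sketch (all helper names are hypothetical) extracts library paths from that log format, or on Linux from `/proc/self/maps`, and groups any OpenMP runtimes it finds by family. More than one family in a single process matches the "multiple copies of the OpenMP runtime" situation the hint describes, though not every combination necessarily aborts. The list of runtime basenames (LLVM `libomp`, Intel `libiomp5`, GNU `libgomp`) is an assumption and may not be exhaustive.

```python
import os
import re

# Assumed basename prefixes of common OpenMP runtimes: LLVM, Intel, GNU.
# Vendored copies such as "libgomp-<hash>.so.1" are matched via the "-" separator.
_OPENMP_RE = re.compile(r"^(libomp|libiomp5|libgomp)[.\-]")


def paths_from_dyld_log(text):
    """Extract library paths from 'dyld: loaded: <UUID> /path' lines (macOS)."""
    prefix = "dyld: loaded:"
    paths = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        rest = line[len(prefix):].strip()
        if rest.startswith("<") and ">" in rest:
            rest = rest.split(">", 1)[1].strip()  # drop the image UUID
        paths.append(rest)
    return paths


def paths_from_proc_maps(maps_path="/proc/self/maps"):
    """Paths of files mapped into the current process (Linux only)."""
    paths = set()
    with open(maps_path) as maps:
        for line in maps:
            # Fields: address perms offset dev inode [pathname]
            fields = line.split(maxsplit=5)
            if len(fields) == 6 and fields[5].startswith("/"):
                paths.add(fields[5].strip())
    return sorted(paths)


def openmp_runtimes(paths):
    """Group OpenMP runtime libraries found in `paths` by runtime family."""
    found = {}
    for path in paths:
        match = _OPENMP_RE.match(os.path.basename(path))
        if match:
            found.setdefault(match.group(1), []).append(path)
    return found


if __name__ == "__main__":
    if os.path.exists("/proc/self/maps"):
        runtimes = openmp_runtimes(paths_from_proc_maps())
        if len(runtimes) > 1:
            print("multiple OpenMP runtimes loaded:", runtimes)
        else:
            print("OpenMP runtimes loaded:", runtimes)
```

Running the log lines above through `paths_from_dyld_log` and then `openmp_runtimes` should report both a `libiomp5` entry (from PyTorch) and a `libomp` entry (from ONNX Runtime).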





[GitHub] [tvm] altanh commented on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
altanh commented on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-737491908


   I can't reproduce this locally on the current main branch.





[GitHub] [tvm] altanh commented on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
altanh commented on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-737555189


   Relevant issue on onnxruntime GitHub: https://github.com/microsoft/onnxruntime/issues/5369





[GitHub] [tvm] altanh commented on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
altanh commented on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-738218571


   @tkonolige found that the `pytest-xdist` package supports passing the `--forked` argument to `pytest`. This seems to fix the problem when running the contrib tests.
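For context, `--forked` (the option comes from the `pytest-forked` plugin, which `pytest-xdist` pulled in as a dependency at the time) runs each test in a forked subprocess, so an abort in one test is reported for that test instead of killing the whole run. The sketch below shows only that isolation idea with `os.fork`; it is POSIX-only, the helper name `run_in_forked_child` is hypothetical, and it is not the plugin's implementation. Note that a forked child inherits every library the parent has already loaded, so this kind of isolation can only keep apart runtimes that are first loaded inside the tests themselves.

```python
import os
import signal


def run_in_forked_child(fn):
    """Run fn() in a forked child; return its exit code, or -signum if it was killed.

    POSIX-only sketch of per-test process isolation.
    """
    pid = os.fork()
    if pid == 0:  # child process
        code = 0
        try:
            fn()
        except BaseException:
            code = 1
        # os._exit skips atexit handlers and flushing stdio buffers inherited from the parent.
        os._exit(code)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_in_forked_child(lambda: None))  # 0
    # An abort (as in the "Fatal Python error: Aborted" above) only kills the child.
    print(run_in_forked_child(os.abort) == -signal.SIGABRT)  # True
    print("parent still running")
```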





[GitHub] [tvm] tqchen commented on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-736860281


   cc @altanh @jroesch @antinucleon: it would be great if you could take a look.





[GitHub] [tvm] altanh commented on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
altanh commented on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-736863288


   I suspect some recent PR might have broken something. This is the error: `tests/python/relay/test_op_grad_level2.py::test_conv2d_grad Fatal Python error: Aborted.`
   
   It doesn't seem to me like a numerical issue with the gradient.






[GitHub] [tvm] tqchen commented on issue #7010: [TEST][FLAKY] test_op_grad_level2.py::test_conv2d_grad.py

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #7010:
URL: https://github.com/apache/tvm/issues/7010#issuecomment-737379507


   Another related failure: https://ci.tlcpack.ai/job/tvm/job/main/250/execution/node/233/log/

