
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2021/08/06 12:30:57 UTC

[GitHub] [incubator-mxnet] tpyl opened a new pull request #20491: [BUGFIX] [MXNET-1456] Added support for OptimizeForBackend in C++ API

tpyl opened a new pull request #20491:
URL: https://github.com/apache/incubator-mxnet/pull/20491


   Fixes TensorRT support in C++ API. Updated unit tests / examples.
   
   ## Description ##
   TensorRT support is broken in the C++ API (see e.g. [#19550](https://github.com/apache/incubator-mxnet/issues/19550) and [#20307](https://github.com/apache/incubator-mxnet/issues/20307)). This PR adds support for OptimizeForBackend, which mirrors the modern Python approach (http://mxnet.apache.org/versions/1.8.0/api/python/docs/tutorials/performance/backend/tensorrt/tensorrt.html).
   
   
   ### Changes ###
   - Add mxnet::cpp::Symbol::OptimizeForBackend 
   - Add an option to pass flags to mxnet::cpp::Executor (required to pass static_alloc and static_shape, which TensorRT needs)
   - Update example/inference/imagenet_inference.cpp to reflect the new way of using TensorRT (used by ...example/inference/unit_test_imagenet_inference.sh )
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] tpyl commented on pull request #20491: [BUGFIX] [MXNET-1456] Added support for OptimizeForBackend in C++ API

Posted by GitBox <gi...@apache.org>.

tpyl commented on pull request #20491:
URL: https://github.com/apache/incubator-mxnet/pull/20491#issuecomment-1004978787


   @nswamy @aaronmarkham 
   It looks like unix-gpu (e.g. ubuntu_gpu_cu114), which this PR bumps up to Ubuntu 20.04 because installing TensorRT is by far the most straightforward with TensorRT8 + Ubuntu 20.04, suffers from the same reference leak issue that has been flagged for CentOS. 
   
   Could this be a clue as to why the reference leaks happen? I don't think this is an issue strictly introduced by this PR. 
   
   Similarly, it is hard to see how the segfault for centos-gpu could be caused by the changes in this PR. None of the TensorRT code is compiled in for the centos-gpu tests and none of the CentOS CI code is changed. (The only place where we build mxnet with -DUSE_TENSORRT=1 is in build_ubuntu_gpu_tensorrt()). 
   
   I would be grateful for any suggestions on how to get the tests passing for this PR. 
   ```
   [2022-01-04T06:01:03.508Z] ==================================== ERRORS ====================================
   [2022-01-04T06:01:03.508Z] _ ERROR at teardown of test_np_standard_binary_funcs[lshape5-rshape5-add-add-True-numeric-<lambda>-None--1.0-1.0] _
   [2022-01-04T06:01:03.508Z] [gw1] linux -- Python 3.8.10 /usr/bin/python3
   [2022-01-04T06:01:03.508Z] 
   [2022-01-04T06:01:03.508Z] request = <SubRequest 'check_leak_ndarray' for <Function test_np_standard_binary_funcs[lshape5-rshape5-add-add-True-numeric-<lambda>-None--1.0-1.0]>>
   [2022-01-04T06:01:03.508Z] 
   [2022-01-04T06:01:03.508Z]     @pytest.fixture(autouse=True)
   [2022-01-04T06:01:03.508Z]     def check_leak_ndarray(request):
   [2022-01-04T06:01:03.508Z]         garbage_expected = request.node.get_closest_marker('garbage_expected')
   [2022-01-04T06:01:03.508Z]         if garbage_expected:  # Some tests leak references. They should be fixed.
   [2022-01-04T06:01:03.508Z]             yield  # run test
   [2022-01-04T06:01:03.508Z]             return
   [2022-01-04T06:01:03.508Z]     
   [2022-01-04T06:01:03.508Z]         if 'centos' in platform.platform():
   [2022-01-04T06:01:03.508Z]             # Multiple tests are failing due to reference leaks on CentOS. It's not
   [2022-01-04T06:01:03.508Z]             # yet known why there are more memory leaks in the Python 3.6.9 version
   [2022-01-04T06:01:03.508Z]             # shipped on CentOS compared to the Python 3.6.9 version shipped in
   [2022-01-04T06:01:03.508Z]             # Ubuntu.
   [2022-01-04T06:01:03.508Z]             yield
   [2022-01-04T06:01:03.508Z]             return
   ```
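
The log excerpt above cuts the fixture off before the leak check itself. As a rough sketch only, and assuming the check is built on the standard `gc` module's `DEBUG_SAVEALL` flag (the excerpt does not confirm this), a reference-leak check of this kind could look like the following. `FakeArray`, `check_leak`, `leaky` and `clean` are hypothetical names used for illustration; they are not MXNet APIs.

```python
import gc
from contextlib import contextmanager


class FakeArray:
    """Hypothetical stand-in for an array type such as NDArray."""


@contextmanager
def check_leak(array_type):
    """Fail if the body leaves array_type objects behind in reference cycles."""
    gc.collect()                    # clear out garbage from earlier code
    old_flags = gc.get_debug()
    gc.set_debug(gc.DEBUG_SAVEALL)  # unreachable objects are kept in gc.garbage
    del gc.garbage[:]
    try:
        yield                       # run the code under test
        gc.collect()
        leaked = [obj for obj in gc.garbage if isinstance(obj, array_type)]
    finally:
        gc.set_debug(old_flags)     # always restore the previous gc settings
        del gc.garbage[:]
    assert not leaked, f"{len(leaked)} {array_type.__name__} object(s) leaked"


def leaky():
    a = FakeArray()
    a.me = a        # reference cycle: only the cycle collector can free it


def clean():
    a = FakeArray()  # freed by reference counting when the function returns


if __name__ == "__main__":
    with check_leak(FakeArray):
        clean()      # passes silently
    try:
        with check_leak(FakeArray):
            leaky()
    except AssertionError as err:
        print("leak detected:", err)
```

Objects that are freed by plain reference counting never reach `gc.garbage`, so only arrays caught in reference cycles are reported. A check like this would therefore be sensitive to differences between Python builds and platforms, which is consistent with the CentOS-specific skip in the excerpt.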



[GitHub] [incubator-mxnet] tpyl commented on pull request #20491: [BUGFIX] [MXNET-1456] Added support for OptimizeForBackend in C++ API

Posted by GitBox <gi...@apache.org>.

tpyl commented on pull request #20491:
URL: https://github.com/apache/incubator-mxnet/pull/20491#issuecomment-897097311


   The failed check seems unrelated to the PR (a CUDA out-of-memory error). What are the next steps here?



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20491: [BUGFIX] [MXNET-1456] Added support for OptimizeForBackend in C++ API

Posted by GitBox <gi...@apache.org>.

mxnet-bot commented on pull request #20491:
URL: https://github.com/apache/incubator-mxnet/pull/20491#issuecomment-894227306


   Hey @tpyl , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [edge, sanity, miscellaneous, centos-cpu, unix-cpu, unix-gpu, centos-gpu, website, windows-cpu, clang, windows-gpu]
   *** 
   _Note_: 
    Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   



[GitHub] [incubator-mxnet] tpyl commented on pull request #20491: [BUGFIX] [MXNET-1456] Added support for OptimizeForBackend in C++ API

Posted by GitBox <gi...@apache.org>.

tpyl commented on pull request #20491:
URL: https://github.com/apache/incubator-mxnet/pull/20491#issuecomment-995167708


   run ci [unix-gpu]



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20491: [BUGFIX] [MXNET-1456] Added support for OptimizeForBackend in C++ API

Posted by GitBox <gi...@apache.org>.

mxnet-bot commented on pull request #20491:
URL: https://github.com/apache/incubator-mxnet/pull/20491#issuecomment-897901941


   Undefined action detected. 
   Permissible actions are : run ci [all], run ci [job1, job2] 
   Example : @mxnet-bot run ci [all] 
   Example : @mxnet-bot run ci [centos-cpu, clang]
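
The permissible actions listed above require the job list to be wrapped in square brackets. A minimal sketch of how such a comment could be validated follows; it is not the bot's actual implementation. `parse_ci_command` and the regular expression are hypothetical, the job names are copied from the bot's "CI supported jobs" list, and requiring the `@mxnet-bot` mention is an assumption based on the bot's own examples.

```python
import re

# Job names copied from the "CI supported jobs" list in the bot's welcome message.
SUPPORTED_JOBS = {
    "edge", "sanity", "miscellaneous", "centos-cpu", "unix-cpu", "unix-gpu",
    "centos-gpu", "website", "windows-cpu", "clang", "windows-gpu",
}

# Assumed grammar: "@mxnet-bot run ci [job1, job2]" with mandatory brackets.
COMMAND = re.compile(r"@mxnet-bot\s+run\s+ci\s+\[([^\]]*)\]")


def parse_ci_command(comment):
    """Return the requested job names, or None if the comment is not a valid command."""
    match = COMMAND.search(comment)
    if match is None:
        return None
    jobs = [name.strip() for name in match.group(1).split(",") if name.strip()]
    if jobs == ["all"]:
        return sorted(SUPPORTED_JOBS)
    if not jobs or any(name not in SUPPORTED_JOBS for name in jobs):
        return None
    return jobs


if __name__ == "__main__":
    print(parse_ci_command("@mxnet-bot run ci [centos-gpu]"))  # accepted
    print(parse_ci_command("@mxnet-bot run ci centos-gpu"))    # rejected: no brackets
```

Under this grammar, `@mxnet-bot run ci centos-gpu` is rejected because the brackets are missing, while `@mxnet-bot run ci [centos-gpu]` is accepted.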



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20491: [BUGFIX] [MXNET-1456] Added support for OptimizeForBackend in C++ API

Posted by GitBox <gi...@apache.org>.

mxnet-bot commented on pull request #20491:
URL: https://github.com/apache/incubator-mxnet/pull/20491#issuecomment-897902994


   Jenkins CI successfully triggered : [centos-gpu]



[GitHub] [incubator-mxnet] bartekkuncer commented on pull request #20491: [BUGFIX] [MXNET-1456] Added support for OptimizeForBackend in C++ API

Posted by GitBox <gi...@apache.org>.

bartekkuncer commented on pull request #20491:
URL: https://github.com/apache/incubator-mxnet/pull/20491#issuecomment-999486068


   Hi @tpyl , I saw that as a part of your change you are changing oneDNN submodule. Are those changes intentional? If so please tell me what is their purpose.



[GitHub] [incubator-mxnet] tpyl commented on pull request #20491: [BUGFIX] [MXNET-1456] Added support for OptimizeForBackend in C++ API

Posted by GitBox <gi...@apache.org>.

tpyl commented on pull request #20491:
URL: https://github.com/apache/incubator-mxnet/pull/20491#issuecomment-999781513


   > Hi @tpyl , I saw that as a part of your change you are changing oneDNN submodule. Are those changes intentional? If so please tell me what is their purpose.
   
   I think it was accidental, not sure how that happened. Maybe due to switching between branches and forgetting to update submodules every time. I reverted that change now. 



[GitHub] [incubator-mxnet] tpyl commented on pull request #20491: [BUGFIX] [MXNET-1456] Added support for OptimizeForBackend in C++ API

Posted by GitBox <gi...@apache.org>.

tpyl commented on pull request #20491:
URL: https://github.com/apache/incubator-mxnet/pull/20491#issuecomment-897901895


   @mxnet-bot run ci centos-gpu
   



[GitHub] [incubator-mxnet] tpyl commented on pull request #20491: [BUGFIX] [MXNET-1456] Added support for OptimizeForBackend in C++ API

Posted by GitBox <gi...@apache.org>.

tpyl commented on pull request #20491:
URL: https://github.com/apache/incubator-mxnet/pull/20491#issuecomment-897902954


   @mxnet-bot run ci [centos-gpu]



[GitHub] [incubator-mxnet] tpyl commented on pull request #20491: [BUGFIX] [MXNET-1456] Added support for OptimizeForBackend in C++ API

Posted by GitBox <gi...@apache.org>.

tpyl commented on pull request #20491:
URL: https://github.com/apache/incubator-mxnet/pull/20491#issuecomment-996229346


   > LGTM! Could you also help add related tests into https://github.com/apache/incubator-mxnet/blob/master/cpp-package/tests/ci_test.sh . Thanks!
   
   I did some work on the ci_tests. But for the TensorRT tests to run, you would have to set the default Docker runtime to nvidia and manually run `./build.py -p ubuntu_tensorrt_cu114 /work/runtime_functions.sh citest_cpp_package`

