Posted to dev@mxnet.apache.org by Clement Fuji Tsang <cf...@nvidia.com> on 2019/01/14 19:16:58 UTC

Proposal on TensorRT reformat using Subgraph API

Hi,

The current TensorRT implementation tweaks simple_bind, and you pass the parameters through the function's shared_params.

The problems with the current implementation:
 - You have to use the symbol API (there is currently no solution for the module API)
 - You have to set an environment variable and use the specific binding function, which is not very user-friendly

Some TensorRT constraints:
 - We have to go through the ONNX representation to use onnx-tensorrt (there is currently no NNVM-to-TensorRT implementation).
 - The ONNX model must contain attributes and information such as shapes, dtypes, context, and weight values to instantiate the TensorRT engine properly (this is a TensorRT engine requirement, not an ONNX requirement).

Here is a proposal using the Subgraph API:
As most attribute inference is done at the binding level, we cannot create the NNVM and instantiate the TensorRT engine at the Subgraph API level.

What we can do is use the same kind of solution that we have for CuDNNConv, which calls the CuDNN find only once (see: https://github.com/apache/incubator-mxnet/blob/d22b323df5cfd2d330a321a3daf6880e108eb90c/src/operator/nn/convolution.cu#L39): create the NNVM and instantiate the TensorRT engine during the first forward pass, then simply reuse the existing TensorRT engine for all subsequent forward passes.
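
To make the idea concrete, here is a minimal sketch in plain Python of the "build on first forward, reuse afterwards" pattern. It does not use MXNet or TensorRT: LazyEngineOp and build_toy_engine are hypothetical names, and the toy builder only stands in for the real "export to ONNX, then build the TensorRT engine" step.

```python
from typing import Callable, List, Optional, Sequence

# An "engine" here is just a callable; a real one would be a TensorRT engine.
Engine = Callable[[Sequence[float]], List[float]]


def build_toy_engine(input_len: int) -> Engine:
    """Hypothetical stand-in for the expensive engine construction.

    It needs the input shape, which is only known once data arrives.
    """
    def run(x: Sequence[float]) -> List[float]:
        assert len(x) == input_len, "engine was built for a different shape"
        return [2.0 * v for v in x]
    return run


class LazyEngineOp:
    """Toy operator that builds its engine during the first forward call."""

    def __init__(self, builder: Callable[[int], Engine]) -> None:
        self._builder = builder
        self._engine: Optional[Engine] = None
        self.build_count = 0  # only here to show the engine is built once

    def forward(self, x: Sequence[float]) -> List[float]:
        if self._engine is None:
            # First call: shapes are now known, so the engine can be built.
            self._engine = self._builder(len(x))
            self.build_count += 1
        # Every later call reuses the cached engine.
        return self._engine(x)
```

Calling forward twice on the same LazyEngineOp builds the engine once and reuses it the second time. This sketch assumes the input shape does not change between calls; handling a shape change would need an extra check.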

It doesn't require any change to the Subgraph API, and I believe that if it follows the same behavior as CuDNNConv it should be a valid approach, but I'm looking for your approval.

One problem arises, on which I hope to start a discussion here:
Some variable nodes will be partitioned away from the main graph and will be contained in the subgraphs (as the weights have to be contained inside the ONNX model / TensorRT engine).
So we need to find a way to load the weights inside the TensorRT node, if possible without adding any new function and still relying on whatever functions users are currently calling to load weights.
I'm thinking about a way to interact directly with the node inside the subgraph (so we would have to modify Getters) and embed the weight values directly inside a node attribute (the same way we embed the subgraph), which may or may not be used by the node.
Ideally, the solution could be used by other future users of the Subgraph API if they need the weight values inside the node for whatever reason.
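
As a rough illustration of the weight-embedding idea, the following plain-Python sketch routes weights into a node attribute when the weight belongs to a subgraph. Everything in it is hypothetical (Node, SubgraphNode, set_params, and the "param_names" / "params" attribute keys); it is not the MXNet API, only one possible shape for the behavior described above.

```python
from typing import Any, Dict, Iterable, List, Optional


class Node:
    """Toy graph node with a name, an operator name and free-form attributes."""

    def __init__(self, name: str, op: str,
                 attrs: Optional[Dict[str, Any]] = None) -> None:
        self.name = name
        self.op = op
        self.attrs: Dict[str, Any] = dict(attrs or {})


class SubgraphNode(Node):
    """Toy subgraph node that records which weights were partitioned into it."""

    def __init__(self, name: str, param_names: Iterable[str]) -> None:
        super().__init__(name, "_subgraph_op")
        self.attrs["param_names"] = list(param_names)
        self.attrs["params"] = {}  # weight values get embedded here


def set_params(nodes: List[Node], arg_params: Dict[str, Any]) -> Dict[str, Any]:
    """Embed weights owned by a subgraph node; return the ones left over.

    The caller keeps using a single "load weights" entry point; the routing
    into subgraph nodes happens behind it.
    """
    remaining = dict(arg_params)
    for node in nodes:
        for pname in node.attrs.get("param_names", []):
            if pname in remaining:
                node.attrs.setdefault("params", {})[pname] = remaining.pop(pname)
    return remaining
```

With this layout, the weights listed in a subgraph node's "param_names" end up in that node's "params" attribute, and set_params returns whatever still belongs to the outer graph. The node is then free to use or ignore the embedded values.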


Let me know your thoughts on it,

Clement Fuji Tsang

