Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/09/14 14:36:52 UTC

[GitHub] [incubator-mxnet] QueensGambit opened a new issue #16173: Saving and loading of cudNN optimization and graph fusion

URL: https://github.com/apache/incubator-mxnet/issues/16173
 
 
   Hello everyone,
   
   There are several tasks which are executed repeatedly when binding MXNet graphs and which produce the same result as long as the graph is unchanged.
   In theory, these results could be saved to disk and reloaded later.
   These tasks include cuDNN autotuning, TensorRT graph fusion and Intel MKL-DNN graph optimization.
   
   Here is a short overview:
   
   ## cuDNN convolution autotuning
   
   * **Description:** runs performance tests for the convolution layers to determine which convolution algorithms are fastest for the given computation graph
   * **Indicated by** (a minimal example for the environment-variable switch mentioned in this message follows the list below):
   ```
   incubator-mxnet/src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97:
   Running performance tests to find the best convolution algorithm, 
   this can take a while...
   (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
   ```
   * **Optimization time**:  medium (e.g. 9 seconds)
   * **Urgency**: high
   * **Note**: An experimental version for exporting and loading the cuDNN optimization has already been implemented by @KellenSunderland and his team (https://github.com/apache/incubator-mxnet/issues/14539).
   * **Related Issues**:  https://github.com/apache/incubator-mxnet/issues/10567, https://github.com/apache/incubator-mxnet/issues/14539
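   
   The environment-variable switch mentioned in the log message above can be toggled from Python; a minimal sketch (the variable is read by MXNet at runtime, so it should be set before the model is bound):
   
   ```python
   import os
   
   # Disable cuDNN convolution autotuning for this process. Set the variable
   # before binding / running the first forward pass of the model.
   os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
   
   import mxnet as mx  # imported after setting the variable to be on the safe side
   ```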
   
   ## TensorRT graph fusion
   
   * **Description**: attempts to fuse multiple CUDA operations into a single kernel, which saves memory-transfer time
   * **Indicated by**: multiple log messages, for example when the current GPU does not support fp16:
   ```
   ../src/operator/subgraph/tensorrt/onnx_to_tensorrt.cc:121:
    TensorRT can't use fp16 on this platform
   ```
   * **Optimization time**: long (e.g. 24 seconds)
   * **Urgency**: high
   
   ## MKLDNN graph optimization
   * **Description**: applies MKL-DNN optimization passes to Convolution and FullyConnected layers; it is enabled by `MXNET_SUBGRAPH_BACKEND=MKLDNN`
   * **Indicated by**:
   ```
   src/operator/subgraph/build_subgraph.cc:686:
   start to execute MKLDNN convolution optimization pass.
   src/operator/subgraph/build_subgraph.cc:686:
   start to execute MKLDNN FullyConnected optimization pass.
   ```
   * **Optimization time**: very fast (e.g. < 1 second)
   * **Urgency**: low
   
   ## Experimental support
   
   First, I suggest adding two experimental API methods (Python, C++, ...) which handle each optimization technique independently and can be called via `model.save_cache()` and `model.load_cache()`.
   These methods are only supported for static graphs (e.g. after `net.hybridize()` in the case of Gluon).
   
   ```python
   def load_cache(filename, type='cudnn'):
       """Load optimization cache from a file previously saved by `save_cache`.

       Parameters
       ----------
       filename : str
           Path to cache file.
       type : str, default 'cudnn'
           Must be in {'cudnn', 'tensorrt', 'mkldnn'}.

       References
       ----------
       `Saving and Loading Optimization Cache for Models \
       `_
       """
       if type == 'cudnn':
           raise NotImplementedError
           # _load_cudnn_cache(filename)
       elif type == 'tensorrt':
           raise NotImplementedError
           # _load_tensorrt_cache(filename)
       elif type == 'mkldnn':
           raise NotImplementedError
           # _load_mkldnn_cache(filename)
       else:
           raise ValueError("type must be in {'cudnn', 'tensorrt', 'mkldnn'}")
   ```
   
   ```python
   def save_cache(filename, type='cudnn'):
       """Save the optimization cache for the bound graph.

       Must be run after `model.bind(optimize=True)`.

       Parameters
       ----------
       filename : str
           Path to cache file.
       type : str, default 'cudnn'
           Must be in {'cudnn', 'tensorrt', 'mkldnn'}.

       References
       ----------
       `Saving and Loading Optimization Cache for Models \
       `_
       """
       if type == 'cudnn':
           raise NotImplementedError
           # _save_cudnn_cache(filename)
       elif type == 'tensorrt':
           raise NotImplementedError
           # _save_tensorrt_cache(filename)
       elif type == 'mkldnn':
           raise NotImplementedError
           # _save_mkldnn_cache(filename)
       else:
           raise ValueError("type must be in {'cudnn', 'tensorrt', 'mkldnn'}")
   ```
   This addition requires a new boolean parameter `optimize` for the `bind()` methods, which is set to `True` by default for backward compatibility:
   ```python
   def bind(self, ctx, args, args_grad=None, grad_req='write',
            aux_states=None, group2ctx=None, shared_exec=None, optimize=True):
       """
       # ...
       optimize : boolean, optional
           When set to True, the graph is optimized with the current back-end
           (e.g. cuDNN, TensorRT, MKL-DNN) during binding.
       """
   ```
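   
   To make the intended workflow concrete, here is a usage sketch. `save_cache`, `load_cache` and the `optimize` flag are the proposed additions from above (they do not exist yet); the checkpoint name, input shape and cache file name are only placeholders:
   
   ```python
   import mxnet as mx
   
   # placeholder model checkpoint ('resnet-50-symbol.json' / 'resnet-50-0000.params')
   sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)
   data_shapes = [('data', (1, 3, 224, 224))]
   
   # First run: bind with optimization enabled and persist the tuning result.
   model = mx.mod.Module(symbol=sym, context=mx.gpu(0), label_names=None)
   model.bind(data_shapes=data_shapes, for_training=False, optimize=True)  # proposed flag
   model.set_params(arg_params, aux_params)
   model.save_cache('resnet-50.cudnn.cache', type='cudnn')                 # proposed method
   
   # Later runs: reload the cache so that the autotuning / fusion step can be skipped.
   model = mx.mod.Module(symbol=sym, context=mx.gpu(0), label_names=None)
   model.load_cache('resnet-50.cudnn.cache', type='cudnn')                 # proposed method
   model.bind(data_shapes=data_shapes, for_training=False, optimize=False)
   model.set_params(arg_params, aux_params)
   ```
   
   Whether `load_cache()` should be called before or after `bind()` is an open design question; the sketch assumes loading beforehand so that `bind()` can pick up the cached results.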
   
   ## Automatic integration with `bind()`
   As soon as caching and reloading have left experimental status, we can consider integrating them as an automatic procedure into `bind()`.
   
   As a unified module, I imagine the following process:
   
   For every model a unique fingerprint is generated, similar in length to a git commit hash.
   The last 7 characters of the fingerprint determine the file name of the cache file (a sketch of the fingerprint generation follows the ingredient lists below).
   The fingerprint is based on the following ingredients:
   
   For **cuDNN convolution autotuning**:
   * MXNet version (major, minor)
   * model structure
   * CUDA version (major, minor)
   * cuDNN version (major, minor)
   
   For **TensorRT graph optimization**:
   * MXNet version (major, minor)
   * model structure
   * CUDA version (major, minor)
   * cuDNN version (major, minor)
   * TensorRT version (major, minor)
   
   For **MKL-DNN graph optimization**:
   * MXNet version (major, minor)
   * model structure
   * MKL-DNN version (major, minor)
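   
   As a sketch of how such a fingerprint could be generated (the helper is hypothetical and only illustrates hashing the ingredients listed above):
   
   ```python
   import hashlib
   
   import mxnet as mx
   
   def optimization_fingerprint(symbol, backend_versions):
       """Hypothetical helper: builds a short fingerprint from the MXNet version,
       the model structure and the relevant back-end versions."""
       hasher = hashlib.sha1()
       hasher.update(mx.__version__.encode())      # MXNet version
       hasher.update(symbol.tojson().encode())     # model structure as serialized graph
       for name, version in sorted(backend_versions.items()):
           hasher.update('{}:{}'.format(name, version).encode())
       # the last 7 hex characters determine the cache file name, similar to a short git hash
       return hasher.hexdigest()[-7:]
   
   # Example for the cuDNN cache; the file name and version numbers are placeholders
   # and would be queried from the runtime in a real implementation.
   sym = mx.sym.load('resnet-50-symbol.json')
   cache_file = optimization_fingerprint(sym, {'cuda': '10.1', 'cudnn': '7.6'}) + '.cache'
   ```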
   
   On every `bind()` call, MXNet generates the fingerprint and attempts to load the corresponding cache file from the current working directory if it exists.
   If the file is not found or loading fails, the optimization is run and the result is saved to `<fingerprint_digits>.cache` afterwards.
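   
   Roughly, the lookup inside `bind()` could then work as sketched below; `_load_cache_file`, `_run_optimization_passes` and `_save_cache_file` are hypothetical internal helpers, and `optimization_fingerprint` is the sketch from above:
   
   ```python
   import os
   
   def bind_with_cache(symbol, backend_versions):
       """Sketch of the automatic cache handling inside bind()."""
       cache_file = optimization_fingerprint(symbol, backend_versions) + '.cache'
       if os.path.isfile(cache_file):
           try:
               _load_cache_file(cache_file)      # hypothetical helper
               return
           except (IOError, ValueError):
               pass                              # corrupt/incompatible cache: fall through
       _run_optimization_passes(symbol)          # hypothetical helper
       _save_cache_file(cache_file)              # hypothetical helper
   ```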
   
   ---
   
   Is an important detail missing, or do you recommend changes in certain aspects (e.g. naming conventions)?
   I am interested to hear your thoughts on this.
   
   
   Best regards,
   ~Johannes Czech
