Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/10/04 22:58:15 UTC

[GitHub] [incubator-mxnet] leezu opened a new issue #16375: [RFC] Deferred compute in imperative interface to unify imperative and symbolic interface

URL: https://github.com/apache/incubator-mxnet/issues/16375
 
 
   A new **deferred computation** (DC) argument to the imperative MXNet APIs is
   proposed. If enabled, memory allocation and computation are deferred as long as
   possible. Users can export the computational graph recorded during deferred
   computation, which enables hybridization support.
   
   Arrays for which DC is enabled are called **lazy**. Other arrays are called
   **normal**. In-place operations on lazy arrays are unsupported.
   
   Storage allocation and computation for lazy arrays are deferred until their
   results are required, either by conversion to numpy or by use as input to an
   operator creating a normal array. Accessing attributes such as `shape` can also
   trigger computation if the attribute can't be inferred.
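
The deferral semantics described above can be illustrated with a toy model. The sketch below is pure Python and does not use MXNet; `LazyArray`, `is_deferred` and `tolist` are hypothetical names chosen for this illustration, and plain lists stand in for array data.

```python
class LazyArray:
    """Toy stand-in for a lazy array; not the MXNet NDArray."""

    def __init__(self, fn=None, inputs=(), value=None):
        self._fn = fn            # operation to run later (None for normal arrays)
        self._inputs = inputs    # arrays this result depends on
        self._value = value      # concrete data once computed

    @property
    def is_deferred(self):
        return self._value is None

    def __add__(self, scalar):
        return LazyArray(lambda xs: [v + scalar for v in xs], (self,))

    def __mul__(self, other):
        return LazyArray(lambda xs, ys: [a * b for a, b in zip(xs, ys)],
                         (self, other))

    def tolist(self):
        """Conversion to concrete data triggers the deferred computation."""
        if self._value is None:
            args = [inp.tolist() for inp in self._inputs]
            self._value = self._fn(*args)
            self._fn, self._inputs = None, ()  # release references to inputs
        return self._value


x = LazyArray(value=[1, 2, 3])   # a "normal" array that already holds data
y = (x + 5) * (x + 5)            # recorded, but nothing is computed yet
deferred_before = y.is_deferred
result = y.tolist()              # computation happens here
deferred_after = y.is_deferred
```

In this toy model, only the conversion in `tolist` forces evaluation; building the expression itself allocates no result data.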
   
   ## C API
   
   ### Deferred Compute (DC) Mode
   
   `MXImperativeDeferredInvokeEx`, an “alias” of `MXImperativeInvokeEx`, is
   introduced. It creates lazy arrays from the operator and its (normal or lazy)
   input arrays.
   
   ``` c
   /*!
    * \brief invoke a nnvm op and imperative function creating lazy ndarray
    * \param creator the op
    * \param num_inputs number of input NDArrays
    * \param inputs input NDArrays
    * \param num_outputs number of output NDArrays
    * \param outputs output NDArrays
    * \param num_params number of keyword parameters
    * \param param_keys keys for keyword parameters
    * \param param_vals values for keyword parameters
    * \param out_stypes output ndarrays' stypes
    * \return 0 when success, -1 when failure happens
    */
   MXNET_DLL int MXImperativeDeferredInvokeEx(AtomicSymbolCreator creator,
                                              int num_inputs,
                                              NDArrayHandle *inputs,
                                              int *num_outputs,
                                              NDArrayHandle **outputs,
                                              int num_params,
                                              const char **param_keys,
                                              const char **param_vals,
                                              const int **out_stypes);
   ```
   
   ### Checks and explicit trigger
   
   ``` c
   /*!
    * \brief Check if array's computation is deferred.
    * \param handles ndarray handles to be checked
    * \param num_handles number of ndarray handles to be checked
    * \param status pointer to array of num_handles integers to hold the result.
    */
   MXNET_DLL int MXNDArrayGetIsDeferredCompute(NDArrayHandle *handles,
                                                int num_handles,
                                                int *status);
   /*!
    * \brief Trigger deferred computation.
    * \param handles ndarray handles to trigger computation of.
    * \param num_handles number of ndarray handles to trigger computation of
    *
    * Deferred computation of input arrays for specified handles is triggered if
    * required. Arrays that are already computed are ignored.
    */
   MXNET_DLL int MXNDArrayTriggerDeferredCompute(NDArrayHandle *handles,
                                                 int num_handles);
   ```
   
   ### Exporting to symbol
   
   The computational graph recorded in deferred computation mode can be exported to
   symbol. Users must specify all inputs and outputs, to define the part of the
   graph they are interested in exporting.
   
   It is an error if any output depends on an input that is not specified, or
   otherwise cannot be computed from the specified inputs. Equally, providing an
   input that is not connected to any output is an error.
   
   ``` C
   /*!
    * \brief Extract the graph constructed during deferred computation mode as a
    * Symbol.
    * \param input_handles ndarray handles of inputs
    * \param output_handles ndarray handles of outputs
    * \param input_names names associated with the inputs of the returned Symbol
    * \param output_names names associated with the outputs of the returned Symbol
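    * \param num_inputs number of input handles (and input names)
    * \param num_outputs number of output handles (and output names)
    * \param out grouped output symbol handle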
    *
    * Construct a Symbol for the subgraph of the deferred computation graph
    * spanning from the input_handles to the output_handles. Requires that
    * input_handles and output_handles are connected in the tracked computational
    * graph. The input_handles are required to have been used as arguments to an
    * operator that is part of the tracked subgraph. All inputs of the
    * computational graph must be specified.
    */
   MXNET_DLL int MXNDArrayGetDeferredComputeSymbol(NDArrayHandle *input_handles,
                                                   NDArrayHandle *output_handles,
                                                   const char** input_names,
                                                   const char** output_names,
                                                   int num_inputs,
                                                   int num_outputs,
                                                   SymbolHandle *out);
   ```
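
The two error conditions above can be illustrated on a toy graph. The sketch below is pure Python and does not use MXNet; `check_export` and the `producers` mapping are hypothetical names introduced only for this illustration, and the real check would operate on the tracked nnvm graph.

```python
def check_export(producers, inputs, outputs):
    """Validate an export request on a toy graph.

    producers maps each computed array name to the names of the arrays it was
    computed from; names absent from producers are leaves of the graph.
    """
    used_inputs = set()
    visited = set()
    stack = list(outputs)
    while stack:
        name = stack.pop()
        if name in visited:
            continue
        visited.add(name)
        if name in inputs:
            used_inputs.add(name)          # specified inputs bound the subgraph
        elif name in producers:
            stack.extend(producers[name])  # keep walking towards the inputs
        else:
            raise ValueError(f"output depends on unspecified input {name!r}")
    unused = set(inputs) - used_inputs
    if unused:
        raise ValueError(f"inputs not connected to any output: {sorted(unused)}")
    return True


# Toy graph for: t = x + 5; y = t * t; z = x ** 2
producers = {"t": ["x"], "y": ["t", "t"], "z": ["x"]}
ok = check_export(producers, {"x"}, ["y", "z"])
```

Calling `check_export(producers, set(), ["y"])` raises because `y` depends on the unspecified leaf `x`, and `check_export(producers, {"x", "w"}, ["y"])` raises because `w` is not connected to any output.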
   
   **Basic Python usage example**
   Example without Gluon.
   
   ``` python
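   import mxnet as mx

   # deferred_compute and export are the helpers proposed in this RFC;
   # their import location is not specified here.
   x = mx.np.arange(80).reshape((8, 10))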
   with deferred_compute():
       y = (x + 5) * (x + 5)
       z = x**2
   s = export(inputs={'x': x}, outputs={'y': y, 'z': z})
   assert s.list_inputs() == ['x']
   assert s.list_outputs() == ['y', 'z']
   ```
   
   
   
   ## Implementation (C++)
   
   ### `NDArray`
   
   ``` C++
   class NDArray {
    public:
   
      [...]
   
     /*!
      * \brief constructs a new dynamic NDArray
      * \param shape the shape of array
      * \param ctx context of NDArray
      * \param delay_alloc whether to delay the allocation (true for DC mode)
      * \param dtype data type of this ndarray
      */
     NDArray(const mxnet::TShape &shape, Context ctx,
             bool delay_alloc = false, int dtype = mshadow::default_type_flag)
         : ptr_(std::make_shared<Chunk>(shape, ctx, delay_alloc, dtype)),
           shape_(shape),
           dtype_(dtype),
           storage_type_(kDefaultStorage),
           autograd_entry_(nullptr) {
     }
   
      [...]
      
     /*!
      * \brief Block until all the pending write operations with respect
      *    to current NDArray are finished, and read can be performed.
      *
      * If this is an array with deferred computation, computation is triggered.
      */
     inline void WaitToRead() const;
     /*!
      * \brief Block until all the pending read/write operations with respect
      *    to current NDArray are finished, and write can be performed.
      *
      * If this is an array with deferred computation, computation is triggered.
      */
     inline void WaitToWrite() const;
   
      [...]
   
    private:
   
      [...]
   
     /*! \brief node entry for autograd */
     nnvm::NodeEntry autograd_entry_;  // renamed from entry_
     /*! \brief node entry for deferred computation tracking */
     nnvm::NodeEntry deferredcompute_entry_;
     
     /*!
      * \brief Perform deferred computation.
      *
      * Applicable if current array is associated with deferredcompute_entry_ and
      * DCInfo. If so, compute this and all dependent NDArrays.
      *
      * Triggered automatically if needed by WaitToRead
      */
     void DeferredCompute() const;
   
      [...]
   
   };
   ```
   
   ### `DCInfo`
   
   ``` C++
     /*! \brief DCInfo stores NDArrays required to perform the deferred computation
      *  of its owning NDArray.
      *
      *  Once deferred computation is completed, DCInfo::Clear should be executed
      *  to release references to input data.
      */
     class DCInfo {
     public:
       explicit DCInfo(std::vector<NDArray> inputs) : inputs_(inputs) {}
   
       static DCInfo& Get(const nnvm::NodePtr& node) {
         return dmlc::get<DCInfo>(node->info);
       }
   
       static void Clear(const nnvm::NodePtr& node) {
         if (node == nullptr || node->info.empty()) return;
         DCInfo& info = Get(node);
         node->info.clear();
       }
   
       static DCInfo& Create(const nnvm::NodePtr& node) {
         node->info.construct<DCInfo>();
         return Get(node);
       }
   
     private:
       std::vector<NDArray> inputs_;
   
     };
   ```
   
   ### Execution
   
   **Trigger execution from within `NDArray`**
   
   `NDArray::WaitToRead` and `NDArray::WaitToWrite` are extended to trigger
   execution, calling `NDArray::TriggerDeferredCompute`. `TriggerDeferredCompute`
   is a no-op if no `DCInfo` is associated with the current array, i.e. if it is
   already computed.
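
The trigger behaviour described above (compute on demand, no-op when nothing is pending, release input references afterwards) can be sketched with a toy model. This is pure Python, not the proposed C++ implementation; `Node`, `trigger` and the `calls` log are hypothetical names, with `Node.info` loosely playing the role of `DCInfo`.

```python
class Node:
    """Toy graph node; `info` holds the inputs needed to compute the value."""

    def __init__(self, op=None, inputs=(), value=None):
        self.op = op
        self.info = list(inputs) if op is not None else None
        self.value = value


calls = []  # records every op execution, to show that work is done only once


def trigger(node):
    """Compute `node` and everything it depends on; no-op if already computed."""
    if node.info is None:          # nothing pending for this node
        return node.value
    args = [trigger(dep) for dep in node.info]
    node.value = node.op(*args)
    calls.append(node.op.__name__)
    node.info = None               # analogous to DCInfo::Clear: drop input references
    return node.value


def add_five(a):
    return a + 5


def square(a):
    return a * a


x = Node(value=3)
t = Node(add_five, [x])
y = Node(square, [t])

first = trigger(y)    # runs add_five, then square
second = trigger(y)   # already computed: returns the stored value, runs nothing
```

Dropping the input list after computation mirrors the stated purpose of `DCInfo::Clear`: once a result exists, the node no longer needs to keep its inputs alive.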
   
   **Explicit C API to trigger execution**
   Users can also manually trigger the computation of specified arrays.
   
   ``` C
   MXNET_DLL int MXNDArrayTriggerDeferredCompute(NDArrayHandle *handles,
                                                 int num_handles);
   ```
   
   **Implementation**
   Operations on the graph are pushed to the engine for asynchronous execution via `RunGraph`.
   
   ### FAQ
   
   **How about Autograd, `NDArray.autograd_entry_` and `AGInfo`?**
   Autograd inside deferred computation (DC) mode can be supported.
   
   Relation of Autograd and DC: While autograd’s `RecordOp` provides recording
   functionality similar to deferred computation, the autograd graph is not the
   same as the computational graph. `NDArray::Detach()` detaches a node from the
   autograd graph by deleting `NDArray.entry_` (renamed to `autograd_entry_` in
   this proposal), but the `NodeEntry` is still required to reconstruct the
   computational history of how the NDArray came to be.
   
   **Are reqs like `kInPlace` supported?**
   No. For now only `kWriteTo` is supported in DC mode.
   
   The plan is to replace in-place operations with `kWriteTo` operations, writing
   to a new (lazy) array. The framework should be smart enough to decide when to
   reuse memory and when not to. Users shouldn’t be required to specify that they
   want an in-place operation.
   
   **How is the context attribute handled, specifically context changes?**
   
   Cross-device copy must be represented as an operator (`CrossDeviceCopyOp`),
   which requires special handling in the graph executor.
   
   **How is incomplete shape information handled?**
   Accessing the `shape` property triggers computation if the shape can't be inferred completely.
   Users can access `static_shape` if they want to avoid triggering computation.
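
The difference between `shape` and `static_shape` can be sketched with a toy one-dimensional model. This is pure Python and not the proposed implementation; `ToyLazy` and its `triggered` flag are hypothetical, and using `None` to mark an unknown dimension is an assumption made for this illustration.

```python
class ToyLazy:
    """Toy 1-D lazy array whose length may be unknown until it is computed."""

    def __init__(self, static_shape, compute):
        self.static_shape = static_shape   # None marks an unknown dimension
        self._compute = compute
        self._data = None
        self.triggered = False             # set once computation has run

    @property
    def shape(self):
        if None not in self.static_shape:
            return self.static_shape       # fully inferred: no computation needed
        if self._data is None:
            self._data = self._compute()   # shape is data dependent: compute now
            self.triggered = True
        return (len(self._data),)


values = [3, -1, 4, -1, 5]

# Elementwise op: output length is known from the input length.
doubled = ToyLazy((len(values),), lambda: [2 * v for v in values])

# Filtering op: output length depends on the data.
kept = ToyLazy((None,), lambda: [v for v in values if v > 0])

doubled_shape = doubled.shape              # no computation triggered
kept_static = kept.static_shape            # still (None,), nothing computed
kept_shape = kept.shape                    # forces the computation
```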
   
   ## Python (Gluon)
   
   Based on DC, hybridization in Gluon is simplified:
   
   Instead of implementing `def hybrid_forward(self, F, x, ...)` in `HybridBlock`,
   users can opt to implement `def forward(self, x, ...)` in `HybridBlock`.
   
   Hybridization based on DC works by the HybridBlock performing the following
   steps (if it is not called by a parent block being hybridized):
   
   - keeping a reference to the input arrays and a reference to the parameter
     arrays to pass them to `MXNDArrayGetDeferredComputeSymbol`;
   - enabling deferred compute mode;
   - running `forward`;
   - exporting to symbol, creating a CachedOp and running the CachedOp.
   
   
   An (internal) global context variable tracks if hybridization is ongoing. If set
   to False and a Block is called that is to be hybridized, the global context
   variable is set to True and the Block goes through all four steps outlined
   above; finally the context variable is set back to False after the *export to
   Symbol* step is finished.
   
   **Usage example**
   
   ``` python
   class Net(nn.HybridBlock):  
       def forward(self, x, ...):
           ...
   ```
   
   **Hybridizing `gluon.Block`s?**
   
   DC could be used to support hybridizing `Block` if all logic can be traced. A
   separate effort may add logic to detect these cases and add hybridization
   support based on DC. For now we rely on the user to signify hybridization
   support by subclassing `HybridBlock`.
   
   ### Parameter Shape Inference
   
   For a HybridBlock that uses DC for hybridization, we request users to
   implement `HybridBlock.infer_shape` to infer the parameters' shapes given the
   inputs.
   
   Currently, if `HybridBlock.infer_shape` is not implemented, backward shape
   inference is used to infer the shape of parameters. However, backward shape
   inference is not supported in all cases (cf. #14253,
   https://github.com/apache/incubator-mxnet/issues/14983#issuecomment-493729329)
   and relying on it for parameter shape inference is brittle. Thus, for
   consistency and simplicity, we require an `infer_shape` implementation when
   using hybridization based on DC.
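
As an illustration of what such an `infer_shape` might do, the sketch below models a dense layer in pure Python without MXNet. `ToyParameter` and `ToyDense` are hypothetical, the `infer_shape` signature (taking the input shape as a tuple) is an assumption since the RFC does not fix it, and `0` is used here to mark a dimension that is not yet known.

```python
class ToyParameter:
    """Toy parameter that only tracks its shape."""

    def __init__(self, shape):
        self.shape = shape  # 0 marks a dimension that is not yet known


class ToyDense:
    """Toy dense layer: weight has shape (units, in_units)."""

    def __init__(self, units):
        self.weight = ToyParameter((units, 0))   # in_units unknown until first call
        self.bias = ToyParameter((units,))

    def infer_shape(self, x_shape):
        # Forward inference: the last input dimension determines in_units.
        in_units = x_shape[-1]
        self.weight.shape = (self.weight.shape[0], in_units)


dense = ToyDense(units=4)
dense.infer_shape((8, 10))   # a batch of 8 samples with 10 features each
```

The parameter shape is derived directly from the input shape, so no backward shape inference through the graph is needed.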
   
