Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/01/30 12:30:37 UTC

[GitHub] [incubator-mxnet] RuRo edited a comment on issue #14373: Passing parameters to HybridBlocks and not using them

URL: https://github.com/apache/incubator-mxnet/issues/14373#issuecomment-580229041
 
 
   There is a similar problem when there are unused parameters.
   For example, you can have a model like this:
   ```python
   import mxnet as mx

   class Test(mx.gluon.nn.HybridBlock):
       def __init__(self, mode, *args, **kwargs):
           super().__init__(*args, **kwargs)
           self.mode = mode
           with self.name_scope():
               self.d1 = mx.gluon.nn.Dense(2)
               self.d2 = mx.gluon.nn.Dense(3)

       def hybrid_forward(self, F, x, *args, **kwargs):
           o1 = self.d1(x)
           o2 = self.d2(x)
           if self.mode:
               return o1  # the o2 output path is not used
           else:
               return o1, o2
   ```
   
   Currently, this model will not hybridize successfully when `mode == True`, because the weights in the `o2` path are "unused".
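
   A minimal way to reproduce it (roughly what produced the traceback below; the `initialize` and `hybridize` calls are my assumption from the cached-op frames in it):
   ```python
   t = Test(mode=True)
   t.initialize()
   t.hybridize()
   t(mx.nd.array([10]))  # warns about unused parameters, then fails with ValueError
   ```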
   
   <details>
   <summary>Full traceback</summary>
   
   ```python
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py:694: UserWarning: Parameter test4_dense1_weight, test4_dense1_bias is not used by any computation. Is this intended?
     out = self.forward(*args)
   ---------------------------------------------------------------------------
   DeferredInitializationError               Traceback (most recent call last)
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
      1012         try:
   -> 1013             cargs = [args_without_none[i] if is_arg else i.data()
      1014                      for is_arg, i in self._cached_op_args]
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in <listcomp>(.0)
      1012         try:
   -> 1013             cargs = [args_without_none[i] if is_arg else i.data()
      1014                      for is_arg, i in self._cached_op_args]
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
       564                                "instead." % (self.name, str(ctx), self._stype))
   --> 565         return self._check_and_get(self._data, ctx)
       566 
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/parameter.py in _check_and_get(self, arr_list, ctx)
       230         if self._deferred_init:
   --> 231             raise DeferredInitializationError(
       232                 "Parameter '%s' has not been initialized yet because initialization was " \
   
   DeferredInitializationError: Parameter 'test4_dense0_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
   
   During handling of the above exception, another exception occurred:
   
   KeyError                                  Traceback (most recent call last)
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
       973         try:
   --> 974             self.infer_shape(*args)
       975         except Exception as e:
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in infer_shape(self, *args)
      1074         """Infers shape of Parameters from inputs."""
   -> 1075         self._infer_attrs('infer_shape', 'shape', *args)
      1076 
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _infer_attrs(self, infer_fn, attr, *args)
      1070         for i in self.collect_params().values():
   -> 1071             setattr(i, attr, sdict[i.name])
      1072 
   
   KeyError: 'test4_dense1_weight'
   
   During handling of the above exception, another exception occurred:
   
   ValueError                                Traceback (most recent call last)
   <ipython-input-48-a18f0aa96b25> in <module>
   ----> 1 t(mx.nd.array([10]))
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in __call__(self, *args)
       692             hook(self, args)
       693 
   --> 694         out = self.forward(*args)
       695 
       696         for hook in self._forward_hooks.values():
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in forward(self, x, *args)
      1150                                      'Find all contexts = {}'.format(ctx_set))
      1151                 with ctx:
   -> 1152                     return self._call_cached_op(x, *args)
      1153             with ctx:
      1154                 try:
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
      1014                      for is_arg, i in self._cached_op_args]
      1015         except DeferredInitializationError:
   -> 1016             self._deferred_infer_shape(*args)
      1017             cargs = []
      1018             for is_arg, i in self._cached_op_args:
   
   /usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
       976             error_msg = "Deferred initialization failed because shape"\
       977                         " cannot be inferred. {}".format(e)
   --> 978             raise ValueError(error_msg)
       979 
       980     def _call_cached_op(self, *args):
   
   ValueError: Deferred initialization failed because shape cannot be inferred. 'test4_dense1_weight'
   ```
   
   </details>
   
   Having unused parameters is useful, since you might want your pretrain/finetune/evaluation networks to behave differently while remaining compatible with `.save_parameters` and `.load_parameters`, without resorting to `allow_missing` and `ignore_extra`.
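
   For illustration, this is roughly the round trip that unused parameters would enable with the `Test` block above (the file name is arbitrary):
   ```python
   # Train with both output paths, then evaluate with only one, sharing one parameter file.
   train_net = Test(mode=False)
   train_net.initialize()
   train_net(mx.nd.array([10]))              # first forward pass completes deferred init
   train_net.save_parameters('test.params')

   eval_net = Test(mode=True)
   eval_net.load_parameters('test.params')   # no allow_missing/ignore_extra needed
   ```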
   
   I think this issue could be fixed without changing the inner workings too much by adding an `F.nodiscard(o2)` operator. It would be a no-op in `nd` mode and would somehow mark the output as a required computation in `sym` mode. I'm not sure how feasible something like that is.
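
   A sketch of how the proposed operator would be used (`F.nodiscard` is hypothetical, it does not exist in MXNet):
   ```python
   def hybrid_forward(self, F, x, *args, **kwargs):
       o1 = self.d1(x)
       o2 = self.d2(x)
       if self.mode:
           F.nodiscard(o2)  # proposed: keep o2 (and d2's weights) in the graph without returning it
           return o1
       else:
           return o1, o2
   ```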
   
   My current workaround is something like
   ```python
        return F.broadcast_add(o1, F.sum(0.0 * o2))  # adds a zero contribution from o2, keeping it in the graph
   ```
   which is both really ugly and potentially inefficient, since it forces the otherwise unneeded computation of `o2`.
   
   If the `F.nodiscard` option is too hard to implement, something like
   ```python
   o1 = F.depends_on(o1, o2)
   ```
   could also work. It would be semantically equivalent to `F.broadcast_add(o1, F.sum(0.0 * o2))`, but without performing any actual computation.
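
   Until something like that exists, the proposed semantics can only be approximated with a small helper (a hypothetical sketch, not MXNet API; in symbolic mode it falls back to the ugly trick above):
   ```python
   import mxnet as mx

   def depends_on(F, output, *deps):
       """Hypothetical helper: make `output` depend on `deps` for graph purposes."""
       if F is mx.ndarray:
           # Imperative mode: deps are already computed, so this is a true no-op.
           return output
       # Symbolic mode: no zero-cost primitive exists, so force a zero contribution.
       extra = F.add_n(*[F.sum(0.0 * d) for d in deps])
       return F.broadcast_add(output, extra)
   ```
   Inside `hybrid_forward` this would be called as `o1 = depends_on(F, o1, o2)`.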
