Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/20 20:17:42 UTC

[GitHub] [incubator-mxnet] sxjscience opened a new issue #18116: [Gluon] Gluon load_parameters/save_parameters does not respect the prefix flag and will be problematic when we share the parameters

sxjscience opened a new issue #18116:
URL: https://github.com/apache/incubator-mxnet/issues/18116


   I find that the load/save logic in Gluon does not respect the `prefix` in the network.
   
   Consider the following example: I created two networks, `Foo` and `Foo2`, each containing a single dense layer with `prefix='layer_'` but stored under a different attribute name. One is called `self.l1` and the other is called `self.l2`. At first glance, because these two layers **share the same prefix**, we should be able to share the parameters, i.e., directly load the parameters saved from `foo` into `foo2`.
   
   However, the following code will trigger an error:
   
   ```python
   import mxnet as mx
   from mxnet.gluon import HybridBlock, nn
   import tempfile
   import os
   mx.npx.set_np()
   
   
   class Foo(HybridBlock):
       def __init__(self, prefix=None, params=None):
           super().__init__(prefix=prefix, params=params)
           with self.name_scope():
               self.l1 = nn.Dense(16, prefix='layer_')
   
       def hybrid_forward(self, F, x):
           return self.l1(x)
   
   
   class Foo2(HybridBlock):
       def __init__(self, prefix=None, params=None):
           super().__init__(prefix=prefix, params=params)
           with self.name_scope():
               self.l2 = nn.Dense(16, prefix='layer_')
   
       def hybrid_forward(self, F, x):
           return self.l2(x)
   
   
   foo = Foo()
   foo.initialize()
   foo(mx.np.ones((32, 6)))
   foo2 = Foo2()
   with tempfile.TemporaryDirectory() as dir_path:
       foo.save_parameters(os.path.join(dir_path, 'test.params'))
       foo2.load_parameters(os.path.join(dir_path, 'test.params'))
   ```
   
   Error message:
   ```
   AssertionError: Parameter 'l2.weight' is missing in file '/tmp/tmpkf_n3w3s/test.params', which contains parameters: 'l1.weight', 'l1.bias'. Set allow_missing=True to ignore missing parameters.
   ```
   
   Thus, Gluon keys the saved parameters by attribute name rather than by the `prefix`, which breaks this way of sharing parameters.
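
   To make the two naming schemes concrete, here is a minimal sketch run against the `foo` object created above (before the failing `load_parameters` call). It assumes MXNet 1.x, where `_collect_params_with_prefix()` is the private helper that `save_parameters` uses internally; relying on a private method here is only for illustration.

   ```python
   # Sketch: compare prefix-based names with the attribute-based "structural"
   # names that save_parameters() actually writes to disk.
   print(foo.collect_params().keys())               # prefix-based, e.g. 'foo0_layer_weight'
   print(foo._collect_params_with_prefix().keys())  # structural, e.g. 'l1.weight'
   ```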
   
   To understand the scale of the problem, consider the following example, in which we create a network with 4 dense layers that all share a single set of parameters. When we call `save_parameters`, the saved file should ideally contain only one copy of the weights. However, it actually contains 4 copies. This is **not acceptable in a deployment setting**, where there is a hard constraint on the size of the artifact.
   
   ```python
   import mxnet as mx
   from mxnet.gluon import HybridBlock, nn
   import tempfile
   import os
   mx.npx.set_np()
   
   
   class Foo(HybridBlock):
       def __init__(self, prefix=None, params=None):
           super().__init__(prefix=prefix, params=params)
           with self.name_scope():
               self.l1 = nn.Dense(2048, prefix='layer_')
               self.l2 = nn.Dense(2048, params=self.l1.collect_params())
               self.l3 = nn.Dense(2048, params=self.l1.collect_params())
               self.l4 = nn.Dense(2048, params=self.l1.collect_params())
   
       def hybrid_forward(self, F, x):
           return self.l4(self.l3(self.l2(self.l1(x))))
   
   
   class Foo2(HybridBlock):
       def __init__(self, prefix=None, params=None):
           super().__init__(prefix=prefix, params=params)
           with self.name_scope():
               self.l1 = nn.Dense(2048, prefix='layer_')
   
       def hybrid_forward(self, F, x):
           return self.l1(x)
   
   
   foo = Foo()
   foo.initialize()
   foo(mx.np.ones((32, 2048)))
   foo2 = Foo2(params=foo.collect_params())
   
   with tempfile.TemporaryDirectory() as dir_path:
       foo.save_parameters(os.path.join(dir_path, 'foo1_save_parameters.params'))
       foo2.save_parameters(os.path.join(dir_path, 'foo2_save_parameters.params'))
       print('Keys by collect_params():', foo.collect_params().keys())
       print('Keys by loading the shared parameters:', mx.npx.load(os.path.join(dir_path, 'foo1_save_parameters.params')).keys())    
       print('Four shared layer artifact size:', os.stat(os.path.join(dir_path, 'foo1_save_parameters.params')).st_size)
       print('One layer artifact size:', os.stat(os.path.join(dir_path, 'foo2_save_parameters.params')).st_size)
   
   ```
   
   The output is as follows. The artifact saved by `foo.save_parameters()` is roughly 4 times the size of the one saved by `foo2.save_parameters()`, even though the two networks hold exactly the same shared parameters and the two files should therefore be about the same size.
   ```
   Keys by collect_params(): odict_keys(['foo3_layer_weight', 'foo3_layer_bias'])
   Keys by loading the shared parameters: dict_keys(['l1.weight', 'l1.bias', 'l2.weight', 'l2.bias', 'l3.weight', 'l3.bias', 'l4.weight', 'l4.bias'])
   Four shared layer artifact size: 67142080
   One layer artifact size: 16785544
   ```



[GitHub] [incubator-mxnet] leezu commented on issue #18116: [Gluon] Gluon load_parameters/save_parameters does not respect the prefix flag and will be problematic when we share the parameters

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18116:
URL: https://github.com/apache/incubator-mxnet/issues/18116#issuecomment-616837981


   Let's track changing the default for MXNet 2 in this issue.



[GitHub] [incubator-mxnet] leezu commented on issue #18116: [Gluon] Gluon load_parameters/save_parameters does not respect the prefix flag and will be problematic when we share the parameters

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18116:
URL: https://github.com/apache/incubator-mxnet/issues/18116#issuecomment-616807687


   Fixed by https://github.com/apache/incubator-mxnet/commit/8c44af4eba798b379c374c15582f9aea7dd7d8fd ?
   
   You need to use `deduplicate=True`.
   
   We should make this the default in MXNet 2.
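
   For reference, a minimal sketch of that call applied to the shared-layer `foo` from the second example in the issue description (this assumes an MXNet build that already contains the commit above, so `save_parameters` accepts the `deduplicate` keyword):

   ```python
   # Sketch: save shared parameters only once (reuses `foo`, `os`, `tempfile`
   # from the second example above; `deduplicate` must be supported by the build).
   with tempfile.TemporaryDirectory() as dir_path:
       dedup_path = os.path.join(dir_path, 'foo1_dedup.params')
       foo.save_parameters(dedup_path, deduplicate=True)
       # Shared Parameter objects are written a single time, so this artifact
       # should be roughly a quarter of the non-deduplicated size reported above.
       print('Deduplicated artifact size:', os.stat(dedup_path).st_size)
   ```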



[GitHub] [incubator-mxnet] sxjscience commented on issue #18116: [Gluon] Gluon load_parameters/save_parameters deduplicate by default

Posted by GitBox <gi...@apache.org>.
sxjscience commented on issue #18116:
URL: https://github.com/apache/incubator-mxnet/issues/18116#issuecomment-617007019


   @leezu Should we submit a PR to change the default behavior? I think we should fix it as early as possible because we rely on `save_parameters()` to generate the model zoos.



[GitHub] [incubator-mxnet] sxjscience commented on issue #18116: [Gluon] Gluon load_parameters/save_parameters does not respect the prefix flag and will be problematic when we share the parameters

Posted by GitBox <gi...@apache.org>.
sxjscience commented on issue #18116:
URL: https://github.com/apache/incubator-mxnet/issues/18116#issuecomment-616816136


   Confirmed that using `save_parameters(..., deduplicate=True)` will solve this problem.
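
   A quick way to double-check this, sketched against the second example in the issue description (reusing `foo`, `mx`, `os`, and `tempfile` from there; which structural key survives deduplication is an implementation detail, so the comments only describe the expected shape of the output):

   ```python
   # Sketch: inspect a deduplicated artifact. Instead of four copies
   # ('l1.*' through 'l4.*'), only one weight/bias pair should remain.
   with tempfile.TemporaryDirectory() as dir_path:
       path = os.path.join(dir_path, 'foo1_dedup.params')
       foo.save_parameters(path, deduplicate=True)
       print(mx.npx.load(path).keys())  # expect a single '*.weight' and '*.bias'
       print(os.stat(path).st_size)     # comparable to the one-layer artifact size
   ```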

