Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/11/21 01:36:19 UTC

[GitHub] [incubator-mxnet] chinakook opened a new issue #19574: symbol.tojson memory leak

chinakook opened a new issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574


   ## Description
   The current symbol `tojson` API has a memory leak. I found this while debugging https://github.com/dmlc/gluon-cv/blob/849411ed56632cd854850b07142087d599f97dcb/gluoncv/data/batchify.py#L396 . In a multiprocessing training environment, memory keeps growing because symbols are pickled frequently, and pickling calls `tojson()`, which appears to leak.
   
   ## To Reproduce
   Minimal example that reproduces the leak:
   ```
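   import mxnet as mx
   from mxnet.gluon.model_zoo.vision import resnet101_v1
   
   net = resnet101_v1()
   net.collect_params().initialize(ctx=mx.gpu(0))
   
   # Build the symbolic graph and serialize it to JSON repeatedly;
   # memory usage keeps growing as the loop runs.
   for i in range(2000):
       y = net(mx.sym.var(name='data'))
       y.tojson()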
   ```
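
To quantify the growth, one option is to sample the process's peak resident set size around the loop. The helper below is only a sketch and not part of MXNet: `peak_rss` and `rss_growth` are hypothetical names, the code relies on the Unix-only `resource` module from the standard library, and `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS. It accepts any zero-argument callable, so the loop body above could be passed in as `lambda: net(mx.sym.var(name='data')).tojson()`.

```python
import resource


def peak_rss():
    """Return the peak resident set size of this process so far.

    Units are platform dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def rss_growth(fn, iterations=2000):
    """Call fn() `iterations` times and return the increase in peak RSS."""
    before = peak_rss()
    for _ in range(iterations):
        fn()
    return peak_rss() - before


if __name__ == "__main__":
    # Stand-in workload; replace it with the tojson() call from the report.
    growth = rss_growth(lambda: [0] * 1000, iterations=100)
    print("peak RSS grew by", growth)
```

A value that keeps increasing as the iteration count is raised would be consistent with a leak. A single measurement is not conclusive, because allocators can hold on to memory that has already been freed.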
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org


[GitHub] [incubator-mxnet] austinmw edited a comment on issue #19574: symbol.tojson memory leak

Posted by GitBox <gi...@apache.org>.
austinmw edited a comment on issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574#issuecomment-773807199


   @chinakook Hi, do you happen to know how I can patch `FasterRCNNTrainBatchify` to resolve this (if that's possible)?


[GitHub] [incubator-mxnet] chinakook commented on issue #19574: symbol.tojson memory leak

Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574#issuecomment-773867205


   > @chinakook Hi, do you happen to know how I can patch `FasterRCNNTrainBatchify` to resolve this (if that's possible)?
   ```
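       # Assumes `mx` is already imported in the enclosing module (import mxnet as mx).
       def __getstate__(self):
           # Serialize the feature symbols to JSON strings for pickling.
           return self._num_shards, self._img_pad, self._label_pad, self.NUM_ELEMENTS, \
                   [x.tojson(remove_amp_cast=False) for x in self._feat_sym]
   
       def __setstate__(self, state):
           # Rebuild the feature symbols from their JSON strings when unpickling.
           self._num_shards, self._img_pad, self._label_pad, self.NUM_ELEMENTS, json_strs = state
           self._feat_sym = tuple(mx.symbol.load_json(x) for x in json_strs)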
   ```
   Adding this code to `class FasterRCNNTrainBatchify` should resolve it.
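
For context, the sketch below shows the pickling hooks this patch relies on, using stand-ins so that it runs without MXNet. `FakeSymbol`, `load_json`, and `Batchify` are hypothetical names for illustration only; they mimic the shape of `Symbol.tojson`, `mx.symbol.load_json`, and `FasterRCNNTrainBatchify`, not their behavior. The point it illustrates is that `pickle` calls `__getstate__` on every dump, so each time the object is pickled, every symbol it holds is serialized again.

```python
import json
import pickle


class FakeSymbol:
    """Stand-in for mx.symbol.Symbol (hypothetical, for illustration)."""

    tojson_calls = 0  # counts serializations across all instances

    def __init__(self, name):
        self.name = name

    def tojson(self, remove_amp_cast=True):
        FakeSymbol.tojson_calls += 1
        return json.dumps({"name": self.name})


def load_json(json_str):
    """Stand-in for mx.symbol.load_json."""
    return FakeSymbol(json.loads(json_str)["name"])


class Batchify:
    """Stand-in for a batchify object that holds symbols."""

    def __init__(self, feat_sym, num_shards=1):
        self._feat_sym = tuple(feat_sym)
        self._num_shards = num_shards

    def __getstate__(self):
        # Called by pickle on every dump: symbols become JSON strings.
        return self._num_shards, [x.tojson(remove_amp_cast=False) for x in self._feat_sym]

    def __setstate__(self, state):
        # Called by pickle on load: rebuild the symbols from JSON.
        self._num_shards, json_strs = state
        self._feat_sym = tuple(load_json(x) for x in json_strs)


batchify = Batchify([FakeSymbol("a"), FakeSymbol("b")])
for _ in range(3):
    restored = pickle.loads(pickle.dumps(batchify))

print(FakeSymbol.tojson_calls)  # 3 dumps x 2 symbols = 6
print([s.name for s in restored._feat_sym])  # ['a', 'b']
```

Since `tojson` still runs on every dump in this pattern, any leak inside `tojson` itself would presumably still be exercised; that is an inference from the sketch, not something verified against MXNet.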


[GitHub] [incubator-mxnet] austinmw commented on issue #19574: symbol.tojson memory leak

Posted by GitBox <gi...@apache.org>.
austinmw commented on issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574#issuecomment-774101192


   @chinakook Thanks for your response! I tried this patch, but the leak still occurs. Do you happen to know whether another class, such as `ForwardBackwardTask` or `FasterRCNNDefaultTrainTransform`, needs modification too?


[GitHub] [incubator-mxnet] chinakook commented on issue #19574: symbol.tojson memory leak

Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574#issuecomment-775808569


   @austinmw However, this fix was reverted in 1.x.
   https://github.com/dmlc/gluon-cv/pull/1528
   With my fixes, gluoncv trains normally with MXNet 1.x, so it's time to re-merge the memory fix.


[GitHub] [incubator-mxnet] chinakook commented on issue #19574: symbol.tojson memory leak

Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574#issuecomment-775804734


   @austinmw I used a modified MXNet 2.0 version with all memory bugs fixed by @leezu. There are no leaks with that patch.


[GitHub] [incubator-mxnet] szha commented on issue #19574: symbol.tojson memory leak

Posted by GitBox <gi...@apache.org>.
szha commented on issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574#issuecomment-774106076


   cc @mk-61 

