Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/11/21 01:36:19 UTC
[GitHub] [incubator-mxnet] chinakook opened a new issue #19574: symbol.tojson memory leak
chinakook opened a new issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574
## Description
The symbol `tojson` API currently leaks memory. I found this while debugging https://github.com/dmlc/gluon-cv/blob/849411ed56632cd854850b07142087d599f97dcb/gluoncv/data/batchify.py#L396 . In a multiprocessing training environment memory keeps growing, because symbols are pickled frequently and each pickling calls tojson(), which may leak memory.
## To Reproduce
Minimal example to reproduce this memory leak:
```
import mxnet as mx
from mxnet.gluon.model_zoo.vision import resnet101_v1

net = resnet101_v1()
net.collect_params().initialize(ctx=mx.gpu(0))

# Build the symbolic graph and serialize it to JSON repeatedly.
for i in range(2000):
    y = net(mx.sym.var(name='data'))
    y.tojson()
```
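One way to confirm the growth is to sample the process's peak resident set size around a loop like the one above. The following is only a sketch, assuming a Unix-like system (the standard-library `resource` module is not available on Windows); `peak_rss` and `measure_peak_rss_growth` are hypothetical helper names, not part of MXNet.

```python
import resource


def peak_rss():
    """Return this process's peak resident set size (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def measure_peak_rss_growth(fn, iterations, report_every=500):
    """Call fn() repeatedly and return how much the peak RSS grew over the run."""
    start = peak_rss()
    for i in range(1, iterations + 1):
        fn()
        if i % report_every == 0:
            # Print intermediate growth so a steady climb is visible during the run.
            print(f"iter {i}: peak RSS grew by {peak_rss() - start}")
    return peak_rss() - start
```

With `net` defined as in the example above, a call such as `measure_peak_rss_growth(lambda: net(mx.sym.var(name='data')).tojson(), 2000)` would report the growth. Peak RSS never decreases, so this is a coarse signal only: a value that keeps climbing across reports is consistent with a leak but does not locate it.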
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org
[GitHub] [incubator-mxnet] austinmw edited a comment on issue #19574: symbol.tojson memory leak
Posted by GitBox <gi...@apache.org>.
austinmw edited a comment on issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574#issuecomment-773807199
@chinakook Hi, do you happen to know how I can patch `FasterRCNNTrainBatchify` to resolve this (if that's possible)?
[GitHub] [incubator-mxnet] chinakook commented on issue #19574: symbol.tojson memory leak
Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574#issuecomment-773867205
> @chinakook Hi, do you happen to know how I can patch `FasterRCNNTrainBatchify` to resolve this (if that's possible)?
```
def __getstate__(self):
    return self._num_shards, self._img_pad, self._label_pad, self.NUM_ELEMENTS, \
        [x.tojson(remove_amp_cast=False) for x in self._feat_sym]

def __setstate__(self, state):
    self._num_shards, self._img_pad, self._label_pad, self.NUM_ELEMENTS, json_strs = state
    self._feat_sym = tuple([mx.symbol.load_json(x) for x in json_strs])
```
Adding this code to the `FasterRCNNTrainBatchify` class can solve it.
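For readers unfamiliar with these hooks, here is a minimal sketch of how `pickle` uses `__getstate__` and `__setstate__`. It runs without MXNet: `FakeSymbol` and `Batchify` are hypothetical stand-ins for an MXNet symbol and for `FasterRCNNTrainBatchify`, with fewer fields than the real class.

```python
import json
import pickle


class FakeSymbol:
    """Hypothetical stand-in for an MXNet symbol, serialized only through JSON."""

    def __init__(self, name):
        self.name = name

    def tojson(self):
        return json.dumps({"name": self.name})

    @staticmethod
    def load_json(text):
        return FakeSymbol(json.loads(text)["name"])


class Batchify:
    """Hypothetical stand-in for FasterRCNNTrainBatchify."""

    def __init__(self, num_shards, feat_sym):
        self._num_shards = num_shards
        self._feat_sym = tuple(feat_sym)

    def __getstate__(self):
        # pickle.dumps calls this; the symbols are stored as JSON strings.
        return self._num_shards, [x.tojson() for x in self._feat_sym]

    def __setstate__(self, state):
        # pickle.loads calls this on a new instance without running __init__.
        self._num_shards, json_strs = state
        self._feat_sym = tuple(FakeSymbol.load_json(x) for x in json_strs)


original = Batchify(2, [FakeSymbol("p2"), FakeSymbol("p3")])
restored = pickle.loads(pickle.dumps(original))
print(restored._num_shards, [s.name for s in restored._feat_sym])
# prints: 2 ['p2', 'p3']
```

Every `pickle.dumps` call runs `tojson()` once per symbol, so if the object is pickled repeatedly to hand it to worker processes, any per-call leak in `tojson()` accumulates.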
[GitHub] [incubator-mxnet] austinmw commented on issue #19574: symbol.tojson memory leak
Posted by GitBox <gi...@apache.org>.
austinmw commented on issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574#issuecomment-774101192
@chinakook Thanks for your response! I tried this patch, but the leak still occurs. Do you happen to know whether another class, such as `ForwardBackwardTask` or `FasterRCNNDefaultTrainTransform`, needs modification too?
[GitHub] [incubator-mxnet] chinakook commented on issue #19574: symbol.tojson memory leak
Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574#issuecomment-775808569
@austinmw However, this fix was reverted in 1.x.
https://github.com/dmlc/gluon-cv/pull/1528
After my fixes, gluoncv trains normally with MXNet 1.x, so it's time to re-merge the memory fix.
[GitHub] [incubator-mxnet] chinakook commented on issue #19574: symbol.tojson memory leak
Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574#issuecomment-775804734
@austinmw I used a modified MXNet 2.0 version with all memory bugs fixed by @leezu. There are no leaks with that patch.
[GitHub] [incubator-mxnet] szha commented on issue #19574: symbol.tojson memory leak
Posted by GitBox <gi...@apache.org>.
szha commented on issue #19574:
URL: https://github.com/apache/incubator-mxnet/issues/19574#issuecomment-774106076
cc @mk-61