Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2022/03/28 18:00:14 UTC

[GitHub] [incubator-mxnet] ptrendx commented on issue #20959: GPU memory leak when using gluon.data.DataLoader with num_workers>0

ptrendx commented on issue #20959:
URL: https://github.com/apache/incubator-mxnet/issues/20959#issuecomment-1080969760


   The workaround skips the cleanup for all engines, not just the NaiveEngine.
   
   So, the general problem here is that when you create the dataloader, it creates a pool of workers by forking the main process, which copies everything, including the engine and the resources it holds. Each forked process then destroys its copy of the engine to become a much leaner dataloader worker. This would normally destroy the stream the engine uses, but with the workaround commit in place, the stream is not destroyed. The underlying problem is that CUDA does not in fact survive forking, and the fact that it seems to work is just a lucky coincidence. That is why the spawn method should be used to fix the dataloader: with it, the worker processes do not inherit anything from the parent and start from a clean state, with nothing copied to destroy.
   In principle, it should end up working the same way as it does currently, via shared memory, so there should be no visible differences compared to the current behavior (if anything, it should be slightly faster, since it would not need to spend time destroying the copied engine during dataloader construction). I guess the error that @TristonC encounters means that there is an additional issue in the dataloader: it somehow depends on some variable copied from the parent process in order to initiate the communication channel with the parent.
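
   To illustrate the spawn behavior described above, here is a minimal sketch using only the Python standard library. It is not the MXNet dataloader code: `PARENT_STATE`, `init_engine`, `report_engine`, `worker_context` and `run_workers` are hypothetical names, and the dictionary is only a stand-in for the engine and its CUDA stream.

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor

# Stand-in for per-process state such as an engine holding a CUDA stream.
# Hypothetical: this is not MXNet's engine, only a module-level marker.
PARENT_STATE = {"engine": None}


def init_engine():
    """Pretend the parent process initializes GPU-backed state."""
    PARENT_STATE["engine"] = f"engine-owned-by-pid-{os.getpid()}"


def report_engine(_):
    """Run inside a worker: return what that worker sees of the state."""
    return PARENT_STATE["engine"]


def worker_context():
    """Context whose workers start in a fresh interpreter instead of a fork."""
    return mp.get_context("spawn")


def run_workers(num_workers=2):
    """Start spawned workers and collect the state each one observes."""
    with ProcessPoolExecutor(max_workers=num_workers,
                             mp_context=worker_context()) as pool:
        return list(pool.map(report_engine, range(num_workers)))


if __name__ == "__main__":
    # Only the parent runs this block; spawned workers re-import the module
    # without executing it, so they never call init_engine().
    init_engine()
    print("parent sees:", PARENT_STATE["engine"])
    print("spawned workers see:", run_workers())
```

   When run as a script, the parent should report its initialized state while each spawned worker reports `None`, because nothing was copied from the parent. This also shows why a worker that relies on a variable set in the parent breaks under spawn: anything it needs, such as a communication channel, has to be passed to it explicitly.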


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


