Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/06/13 18:51:15 UTC

[GitHub] ThomasDelteil commented on issue #11243: weird gpu memory usage

ThomasDelteil commented on issue #11243: weird gpu memory usage
URL: https://github.com/apache/incubator-mxnet/issues/11243#issuecomment-397046550
 
 
   @dwSun MXNet is built on asynchronous execution. When you load data or run a forward or backward pass, the operations are enqueued on the MXNet backend and executed once their parent dependencies are available.
   
   With your current script, there are no blocking operations, so the Python loop runs through your epoch and keeps adding "copy to GPU" operations. These operations have no parent dependencies, so they can be executed immediately, while the actual training is still not complete when you reach the end of your epoch loop. After a few epochs, you will clog up your GPU memory.
   
   `print(total_train_loss.asscalar()/training_samples)`
   
   `.asscalar()` is `.asnumpy()[0]`, so it triggers a synchronous operation that copies the result back to the CPU. When you add this line, the training isn't "slow"; it is running at its real speed, because every 500 iterations your script has to wait for the computation to complete and for the result to be returned to the CPU.
   
   If your dataset is small and fits in GPU memory, you can call `mx.nd.waitall()` at the end of each epoch, but that means your entire dataset will be copied to the GPU. This makes training pretty fast, since the data for each batch is already available when the batch starts. However, you might then run into an out-of-memory (OOM) error. In that case you can, for example, keep track of your loss with `loss_acc += loss.sum().asscalar()`, which forces a copy to the CPU on every batch.
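   
   To make the mechanism concrete, here is a pure-Python toy model. It is not MXNet code and not how MXNet's engine is actually implemented: two `ThreadPoolExecutor` workers stand in for the backend, and the hypothetical `ToyDevice` class and `run_epoch` function just count how many batch buffers are resident in "GPU memory". It compares a loop that reads the loss back on every batch (like `loss.sum().asscalar()`) with one that only waits at the end of the epoch (like `mx.nd.waitall()`).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ToyDevice:
    """Hypothetical stand-in for GPU memory: counts resident batch buffers."""

    def __init__(self):
        self._lock = threading.Lock()
        self.resident = 0
        self.peak = 0

    def alloc(self):
        # A "copy to GPU" op: the batch now occupies device memory.
        with self._lock:
            self.resident += 1
            self.peak = max(self.peak, self.resident)

    def free(self):
        # The batch buffer is released once its training step has finished.
        with self._lock:
            self.resident -= 1


def run_epoch(num_batches, sync_every_batch, step_seconds=0.005):
    device = ToyDevice()
    # Copies have no parent dependencies, so this worker runs them right away.
    copy_engine = ThreadPoolExecutor(max_workers=1)
    # Forward/backward steps are slow and must wait for their batch to arrive.
    compute_engine = ThreadPoolExecutor(max_workers=1)

    def train_step(copy_future):
        copy_future.result()      # dependency: the batch must be on the device
        time.sleep(step_seconds)  # pretend forward + backward pass
        device.free()
        return 0.5                # pretend per-batch loss

    loss_acc = 0.0
    pending = []
    for _ in range(num_batches):
        # Both submits return immediately, like asynchronous NDArray calls.
        copy_future = copy_engine.submit(device.alloc)
        loss_future = compute_engine.submit(train_step, copy_future)
        if sync_every_batch:
            # Blocking read, analogous to loss_acc += loss.sum().asscalar()
            loss_acc += loss_future.result()
        else:
            pending.append(loss_future)

    # Block on everything still queued, analogous to mx.nd.waitall().
    for loss_future in pending:
        loss_acc += loss_future.result()
    copy_engine.shutdown()
    compute_engine.shutdown()
    return device.peak, loss_acc


if __name__ == "__main__":
    print(run_epoch(50, sync_every_batch=True))   # (1, 25.0)
    print(run_epoch(50, sync_every_batch=False))  # peak depends on timing
```

   With `sync_every_batch=True` the peak stays at a single resident batch, because the next copy is only issued after the previous step has finished. Without it, the copy worker runs ahead of the slow training steps, so the peak grows towards the number of batches in the epoch, which is the toy equivalent of the GPU memory growth described above.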

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services