
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/01/05 05:37:26 UTC

[GitHub] jeremiedb commented on issue #7968: [R] Transfer Learning using VGG-16

URL: https://github.com/apache/incubator-mxnet/issues/7968#issuecomment-355477039

I just ran a few tests with different ResNet models and also experienced crashes.

The issue appears to be tied to memory that isn't released during training. There was no problem with ResNet34 or ResNet50, but it became problematic with ResNet101. Have you checked GPU usage (with nvidia-smi) right after launching training to confirm you have the same issue?

I also noticed an apparent memory leak when training models with large embeddings. A quick workaround is to call `gc()` inside the training loop every couple of batches (there's no need to add a `gc()` to the eval data loop). You can add it in either `mx.model.FeedForward.create` or `mx.model.buckets` (I only tried the latter, but it should work for the usual training function too). The good news is that it doesn't noticeably slow down training: fine-tuning ResNet101 no longer crashed, and GPU memory stayed below 4 GB with 8 samples.
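A minimal sketch of where the `gc()` call goes, in plain R. The actual training loop lives inside `mx.model.buckets` / `mx.model.FeedForward.create` and is not reproduced here; `train_one_batch`, `train_with_gc` and `gc_every` are hypothetical names used only for illustration, and the batches are dummy matrices rather than real MXNet data:

```r
# Hypothetical stand-in for one training step; in mxnet this work happens
# inside the batch loop of mx.model.buckets or mx.model.FeedForward.create.
train_one_batch <- function(batch) {
  # Allocate a temporary object to mimic per-batch garbage
  tmp <- matrix(runif(1e4), nrow = 100)
  sum(tmp %*% batch)
}

# Run the loop and force a garbage collection every `gc_every` batches.
# Returns how many times gc() was called.
train_with_gc <- function(batches, gc_every = 2) {
  n_gc <- 0
  for (i in seq_along(batches)) {
    train_one_batch(batches[[i]])
    # Explicit collection so unreferenced objects are finalized now,
    # rather than whenever R decides to collect on its own
    if (i %% gc_every == 0) {
      gc(verbose = FALSE)
      n_gc <- n_gc + 1
    }
  }
  n_gc
}

# Eight dummy 100 x 1 "batches"
batches <- replicate(8, matrix(runif(100), nrow = 100), simplify = FALSE)
train_with_gc(batches, gc_every = 2)  # returns 4
```

Calling `gc()` every couple of batches rather than after every batch keeps the overhead small. The assumption here is that memory held by unreferenced objects (such as NDArray handles) is only released once R's collector finalizes them, which would explain why an explicit collection helps.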

@thirdwing Any idea whether this memory issue could be handled better than with gc()? If performance isn't affected, I wonder whether a quick PR adding the gc() call would be worthwhile.
