Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/11/28 23:52:45 UTC

[GitHub] ThomasDelteil edited a comment on issue #13447: Rewrite dataloader, improves responsiveness and reliability

URL: https://github.com/apache/incubator-mxnet/pull/13447#issuecomment-442650646
 
 
   Thanks, that's a massive improvement, especially for datasets that get iterated over in a small number of iterations (small datasets or big batch sizes). The killing and respawning of workers was one reason why I was increasingly questioning the fact that Gluon privileges the concept of epochs over number of iterations when measuring progress through training for checkpointing, iterating, etc.
   
   The next improvements I would love to see for the data loader would address the cold start issue at the beginning of each epoch:
   - Prefetching the first batch of the next epoch before the previous epoch is completed.
   - Or, when the queue is empty, prefetching the first batch in a distributed manner across workers to avoid the cold start problem with large batch sizes. That would become secondary if the first point is addressed.
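
To make the first point concrete, here is a minimal sketch in plain Python of what prefetching the next epoch's first batch could look like. It is not the Gluon DataLoader implementation: `EpochPrefetcher` and `make_loader` are hypothetical names, and the sketch assumes that `make_loader()` returns a fresh iterable of batches each time it is called.

```python
import threading


class EpochPrefetcher:
    """Hypothetical wrapper (not part of MXNet): warms up the next epoch in the background.

    `make_loader` is an assumed zero-argument callable that returns a fresh
    iterable of batches for one epoch.
    """

    def __init__(self, make_loader):
        self._make_loader = make_loader
        # Start fetching the first batch of the first epoch right away.
        self._pending = self._start_warmup()

    def _start_warmup(self):
        slot = {}

        def work():
            try:
                it = iter(self._make_loader())
                # Pulling one item forces the loader to produce its first batch.
                slot["head"] = [next(it)]
                slot["rest"] = it
            except StopIteration:
                # Empty epoch: nothing to hand over.
                slot["head"], slot["rest"] = [], iter(())
            except Exception as exc:
                # Re-raised in the consumer thread by __iter__.
                slot["error"] = exc

        thread = threading.Thread(target=work, daemon=True)
        thread.start()
        return thread, slot

    def __iter__(self):
        thread, slot = self._pending
        thread.join()  # usually a no-op: the batch was fetched during the previous epoch
        if "error" in slot:
            raise slot["error"]
        # Begin warming up the following epoch while this one is being consumed.
        self._pending = self._start_warmup()
        yield from slot["head"]
        yield from slot["rest"]
```

Iterating the wrapper once per epoch (`for batch in prefetcher: ...`) then hides the first-batch latency of every epoch except the first. The sketch does not check whether a given loader tolerates two live iterators at once, so that would need to be verified before using this pattern with a real data loader.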
