Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/07/27 17:25:25 UTC

[GitHub] [incubator-mxnet] Neutron3529 commented on issue #15655: Performance regression for gluon dataloader with large batch size

URL: https://github.com/apache/incubator-mxnet/issues/15655#issuecomment-515700522
 
 
   > Data loaders are different from data iterators.
   > Data iterators load all of the data into memory before iterating over it, while data loaders load data into memory as they iterate, so they are slower than data iterators.
   >
   > **So why use data loaders if data iterators are faster?**
   > Sometimes we deal with massive datasets that can't fit into memory, and for those datasets data iterators won't work.
   > In practice we should use data iterators when the dataset fits in memory (for example, small datasets); otherwise we have to use data loaders.
   > For more on **data iterators vs. data loaders**, check [this](https://mxnet.incubator.apache.org/versions/master/architecture/note_data_loading.html) out.
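The quoted distinction can be sketched in plain Python. This is a toy illustration only; `EagerIterator` and `LazyLoader` are hypothetical names, not the actual MXNet classes:

```python
class EagerIterator:
    """Toy data iterator: every batch is materialized in memory up front."""
    def __init__(self, data, batch_size):
        # Slice all batches before iteration begins (fast loop, big footprint).
        self.batches = [data[i:i + batch_size]
                        for i in range(0, len(data), batch_size)]

    def __iter__(self):
        return iter(self.batches)


class LazyLoader:
    """Toy data loader: each batch is built only when it is requested."""
    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size

    def __iter__(self):
        # Per-batch work happens inside the loop (small footprint, slower loop).
        for i in range(0, len(self.data), self.batch_size):
            yield self.data[i:i + self.batch_size]


data = list(range(10))
# Both yield the same batches; only *when* the slicing work happens differs.
print(list(EagerIterator(data, 4)))  # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(list(LazyLoader(data, 4)))     # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```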
   
   Maybe you are right.
   BUT... did you test how slow the DataLoader can be?
   It takes 18 s to finish a single pass over MNIST:
   ```python
   import mxnet as mx
   from mxnet import nd

   def data_xform(data):
       """Move channel axis to the beginning, cast to float32, and normalize to [0, 1]."""
       return nd.moveaxis(data, 2, 0).astype('float32') / 255

   # MNIST is small enough to fit in memory, so loader overhead dominates here.
   train_data = mx.gluon.data.vision.MNIST(train=True).transform_first(data_xform)
   val_data = mx.gluon.data.vision.MNIST(train=False).transform_first(data_xform)
   batch_size = 100  # setting this to 10000 produces the same result
   train_loader = mx.gluon.data.DataLoader(train_data, shuffle=True, batch_size=batch_size)
   val_loader = mx.gluon.data.DataLoader(val_data, shuffle=False, batch_size=batch_size)

   for i, j in train_loader:
       pass
   ```
   It took 18 s for me to finish that final loop (tested with both mxnet-cu100mkl and mxnet-mkl).
   This is not just the data loader being inherently **slower** than the data iter; a gap this large looks like a regression.
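For anyone reproducing the measurement, a minimal timing harness using only the standard library; the `range(600)` iterable is a stand-in, so substitute the real `train_loader` to reproduce the 18 s figure:

```python
import time

def time_epoch(loader):
    """Return wall-clock seconds spent draining one full pass over `loader`."""
    start = time.perf_counter()
    for _ in loader:
        pass  # mimic the benchmark loop: iterate, do no work per batch
    return time.perf_counter() - start

# Stand-in iterable (600 batches of MNIST at batch_size=100 in the original).
print(f"epoch took {time_epoch(range(600)):.2f}s")
```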

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services