You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/11/12 06:42:03 UTC

[GitHub] leezu commented on issue #13209: [WIP] Allow dataloader iterator to be reused

leezu commented on issue #13209: [WIP] Allow dataloader iterator to be reused
URL: https://github.com/apache/incubator-mxnet/pull/13209#issuecomment-437773300
 
 
   Thanks for your work @zhreshold . I think you can just remove the `reset()` method. The logic should instead be handled in `_MultiWorkerIter.__iter__` and `_SameProcessIter.__iter__`. That would simplify the use case of your example:
   
   ```
   data_loader = mx.gluon.DataLoader(dataset, batch_size, num_workers=100)
   it = iter(data_loader) # explicitly cache the iterator
   
   for i in range(num_epochs):
       # it.reset() # Deleted. Not necessary anymore.
       for batch in it: # This calls it.__iter__ which resets the iterator
           # training code
   ```
   
   Still does not affect the existing use-case:
   
   ```
   data_loader = mx.gluon.DataLoader(dataset, batch_size, num_workers=100)
   for i in range(epochs):
       for batch in data_loader:
           # training code
   ```
   
   However, I wonder if there is a cleaner way to represent the process pool. The current solution may be a bit confusing to users. It is not very intuitive that the iterator must be cached for good performance. Explicitly specifying the resource (ie. process pool) used for the execution may help to clarify? Are you familiar with the `concurrent.futures` pools? I feel like we are reimplementing a slightly different way of this Python standard libray API here. It may be better to find abstractions that just rely on the Python standard API. It may also be easier to understand for new users? For example, the concept of a executor may be useful. For multiprocessing, there is https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor
   
   The last paragraph is not a critic of your PR but I want to raise the point that a overhaul of the API may help improve clarity. We can introduce a dependency on the [backport of `concurrent.futures`](https://www.google.com/search?q=concurrent.futures+python+2&ie=utf-8&oe=utf-8&client=firefox-b-ab) to Py2 for Python 2 users (Python 2 is EOL soon). @szha 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services