Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/01/30 20:05:24 UTC

[GitHub] zhreshold commented on issue #13945: dataloader crashes with threads and slow downs with processes

zhreshold commented on issue #13945: dataloader crashes with threads and slow downs with processes
URL: https://github.com/apache/incubator-mxnet/issues/13945#issuecomment-459089100
 
 
   @mfiore 
   
   Is your dataset small? Do you know how many batches there are in each epoch? If it is small, the prefetching step will push all the batch workloads to the worker pool at once, and you will need to wait until the first worker finishes its job before the first batch comes back.
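   
   To illustrate that point, here is a minimal sketch (not the actual DataLoader code) of the prefetch-all pattern: every batch job is pushed to the pool up front with `apply_async`, and the iterator then blocks on the first `get()` until the first worker has finished. `load_batch` and the batch indices below are placeholders.
   
   ```python
   from multiprocessing import Pool
   
   def load_batch(batch_idx):
       # placeholder for "fetch and collate the samples of one batch"
       return [i * i for i in batch_idx]
   
   if __name__ == '__main__':
       batches = [[0, 1], [2, 3], [4, 5]]   # e.g. output of a batch sampler
       with Pool(2) as pool:
           # prefetch: push every batch workload at once
           buffer = [pool.apply_async(load_batch, (b,)) for b in batches]
           for ret in buffer:
               # ret is a multiprocessing.pool.AsyncResult; get() is the only
               # point of synchronization, so the first iteration waits here
               # until the first worker finishes its job
               print(ret.get())
   ```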
   
   Regarding your questions:
   
   1. The async return value is a `multiprocessing.pool.AsyncResult` object; it is only synchronized when `ret.get()` is called (line 443), as in the sketch above.
   2. As mentioned in 1., you cannot inspect the results in `self._data_buffer`, because it is filled with those async objects rather than ready data.
   3. `The problem seem to be related to the "next" statement at line 426`: the crash looks like it comes from the dataset itself, not the dataloader. Since you are using RecordIO, one possible reason is that the record file's seek is not thread-safe (see https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/recordio.py#L268). A temporary workaround is to add a mutex guarding this line: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/recordio.py#L313 (a sketch of such a guard follows this list).
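   
   For 3., a rough sketch of the kind of mutex guard meant here (an illustration of the workaround, not a patch to recordio.py itself; the class name is made up): wrap the indexed reads in a `threading.Lock` so the seek-then-read sequence cannot be interleaved by concurrent worker threads.
   
   ```python
   import threading
   
   from mxnet import recordio
   
   class ThreadSafeIndexedRecordIO(recordio.MXIndexedRecordIO):
       """Illustrative wrapper: read_idx seeks and then reads, so serialize
       it behind a mutex to keep threads from interleaving the shared
       file-pointer movement."""
   
       def __init__(self, idx_path, uri, flag, key_type=int):
           super(ThreadSafeIndexedRecordIO, self).__init__(idx_path, uri, flag, key_type)
           self._lock = threading.Lock()
   
       def read_idx(self, idx):
           # only one thread at a time may move the file pointer and read
           with self._lock:
               return super(ThreadSafeIndexedRecordIO, self).read_idx(idx)
   ```
   
   If adding the lock makes the threaded dataloader stable, that would confirm the seek path is the culprit.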

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services