Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/11/05 21:18:30 UTC

[GitHub] ThomasDelteil opened a new issue #13126: Can't break loop when using > 1 worker with DataLoader

ThomasDelteil opened a new issue #13126: Can't break loop when using > 1 worker with DataLoader
URL: https://github.com/apache/incubator-mxnet/issues/13126
 
 
   ## Description
   
   When a DataLoader uses more than 1 worker, starting the iteration and breaking out of the loop straight away makes the workers crash and throw an error.
   
   ## Environment info (Required)
   
   ```
   MXNet 1.3.1
   ```
   
   ## Minimum reproducible example
   
   ```python
   
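   import mxnet as mx
   batch_size = 64  # assumed value; the original report does not define batch_size
   train_dataset = mx.gluon.data.vision.MNIST(train=True).transform_first(mx.gluon.data.vision.transforms.ToTensor())
   train_data = mx.gluon.data.DataLoader(train_dataset, shuffle=True, last_batch='rollover', batch_size=batch_size, num_workers=2)
   for batch in train_data:  # loop implied by the description; not shown in the original report
       break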
   ```
   ```
   Exception in thread Thread-5:
   Traceback (most recent call last):
     File "/home/ubuntu/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
       self.run()
     File "/home/ubuntu/anaconda3/lib/python3.6/threading.py", line 864, in run
       self._target(*self._args, **self._kwargs)
     File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 195, in fetcher_loop
       idx, batch = data_queue.get()
     File "/home/ubuntu/anaconda3/lib/python3.6/multiprocessing/queues.py", line 337, in get
       return _ForkingPickler.loads(res)
     File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 57, in rebuild_ndarray
       fd = fd.detach()
     File "/home/ubuntu/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
       return reduction.recv_handle(conn)
     File "/home/ubuntu/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
       return recvfds(s, 1)[0]
     File "/home/ubuntu/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 153, in recvfds
       msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(bytes_size))
   ConnectionResetError: [Errno 104] Connection reset by peer
   
   ```
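
The traceback above shows the fetcher thread (`fetcher_loop`) failing inside `data_queue.get()` with a `ConnectionResetError`, presumably because the other end of the connection went away once the loop was abandoned. As a rough illustration of the general producer/consumer pattern involved, here is a minimal standard-library sketch in which the consumer tells the fetcher thread to stop before leaving. This is not MXNet's implementation and not necessarily the fix for this issue; `_STOP`, the simplified `fetcher_loop`, and `iterate_and_break` are hypothetical names made up for this example, and it uses an in-process `queue.Queue` rather than a multiprocessing queue.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel that tells the fetcher thread to exit


def fetcher_loop(data_queue, fetched):
    """Drain data_queue until the stop sentinel arrives (simplified stand-in)."""
    while True:
        item = data_queue.get()
        if item is _STOP:
            break
        fetched.append(item)


def iterate_and_break(num_batches=5):
    """Start a fetcher thread, break out of the loop early, then shut down cleanly."""
    data_queue = queue.Queue()
    fetched = []
    fetcher = threading.Thread(target=fetcher_loop, args=(data_queue, fetched))
    fetcher.start()
    try:
        for i in range(num_batches):
            data_queue.put(i)
            break  # the consumer leaves the loop straight away
    finally:
        # Signal the fetcher before abandoning the queue, so it is never
        # left blocked on a queue that nobody will feed again.
        data_queue.put(_STOP)
        fetcher.join(timeout=5)
    return fetched, fetcher.is_alive()


if __name__ == "__main__":
    print(iterate_and_break())
```

The point of the `finally` block is that the stop signal is sent however the loop ends, whether by `break`, an exception, or normal exhaustion.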
