Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/01/20 14:39:03 UTC

[GitHub] OElesin opened a new issue #13942: Error using executing MXNet DataLoader

OElesin opened a new issue #13942: Error using executing MXNet DataLoader
URL: https://github.com/apache/incubator-mxnet/issues/13942
 
 
   When I iterate over an MXNet DataLoader, I get the error shown below, but only on AWS Batch (which runs jobs in Docker containers). The same code works on my MacBook and on AWS SageMaker. See the code below:
   ```python
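       import multiprocessing
       import time

       from mxnet.gluon.data import DataLoader

       # dataset, net, ctx, features, BATCH_SIZE and n_print are assumed to be defined earlier.
       data_loader = DataLoader(
           dataset, batch_size=BATCH_SIZE, last_batch='keep',
           shuffle=False, num_workers=multiprocessing.cpu_count()
       )
       tick = time.time()  # start of the current timing window
       for i, (data, label) in enumerate(data_loader):
           data = data.as_in_context(ctx)
           if i % n_print == 0 and i > 0:
               print(
                   "{0} batches, {1} images, {2:.3f} img/sec".format(
                       i, i*BATCH_SIZE, BATCH_SIZE*n_print/(time.time()-tick)
                   )
               )
               tick = time.time()
           output = net(data)
           # Write only the rows this batch produced; with last_batch='keep' the final batch may be short.
           start = i * BATCH_SIZE
           features[start:start + len(output), :] = output.asnumpy().squeeze()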
   ```
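
For reference, the row range that each batch occupies in a preallocated output buffer can be sketched in plain Python. This is only an illustration: `batch_slices` is a hypothetical helper, not part of MXNet, and it assumes batches arrive in order and that the last batch is kept even when it is short (as with `last_batch='keep'`).

```python
def batch_slices(n_items, batch_size):
    """Yield (start, stop) row ranges for consecutive batches.

    The final range is shortened when n_items is not a multiple of
    batch_size, mirroring a data loader that keeps a short last batch.
    """
    for start in range(0, n_items, batch_size):
        yield start, min(start + batch_size, n_items)


# Example: 10 items in batches of 4 gives two full batches and one short one.
print(list(batch_slices(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```

Each `(start, stop)` pair would correspond to a slice such as `features[start:stop, :]` in the loop above.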
   
   Error message:
   ```
   save(x)
   File "/usr/lib/python2.7/pickle.py", line 286, in save
   f(self, obj) # Call unbound method with explicit self
   File "/usr/lib/python2.7/multiprocessing/forking.py", line 66, in dispatcher
   rv = reduce(obj)
   File "/usr/local/lib/python2.7/dist-packages/mxnet/gluon/data/dataloader.py", line 43, in reduce_ndarray
   return rebuild_ndarray, data._to_shared_mem()
   File "/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.py", line 200, in _to_shared_mem
   self.handle, ctypes.byref(shared_pid), ctypes.byref(shared_id)))
   File "/usr/local/lib/python2.7/dist-packages/mxnet/base.py", line 149, in check_call
   raise MXNetError(py_str(_LIB.MXGetLastError()))
   MXNetError: [14:48:14] src/operator/tensor/../tensor/elemwise_unary_op.h:301: Check failed: inputs[0].dptr_ == outputs[0].dptr_ (0x7fe0beffc040 vs. 0x7fe0bf001600)
   Stack trace returned 10 entries:
   [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x17ec9d) [0x7fe11ec74c9d]
   [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x17f068) [0x7fe11ec75068]
   [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x8f7034) [0x7fe11f3ed034]
   [bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2825020) [0x7fe12131b020]
   [bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x27a3ad8) [0x7fe121299ad8]
   [bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x27a3b13) [0x7fe121299b13]
   [bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x27ab954) [0x7fe1212a1954]
   [bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x27af461) [0x7fe1212a5461]
   [bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x27ac01b) [0x7fe1212a201b]
   [bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7fe130197c80]
   ```
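
The traceback fails inside `_to_shared_mem`, which is reached when the DataLoader pickles an NDArray to pass it between processes. One thing that may be worth checking, as an assumption rather than a confirmed cause, is how much shared memory the container exposes compared with the machines where the code works. The sketch below uses only the standard library; `shared_memory_report` is a hypothetical helper name, and `/dev/shm` is assumed to be the shared-memory mount, which is the usual location on Linux.

```python
import os


def shared_memory_report(path="/dev/shm"):
    """Return (total_bytes, free_bytes) for the filesystem at `path`.

    Returns None when the path does not exist, for example on a system
    without a /dev/shm mount.
    """
    if not os.path.exists(path):
        return None
    stats = os.statvfs(path)  # available on Unix-like systems
    total = stats.f_frsize * stats.f_blocks
    free = stats.f_frsize * stats.f_bavail
    return total, free


report = shared_memory_report()
if report is None:
    print("/dev/shm not found")
else:
    print("shared memory: total={0} bytes, free={1} bytes".format(*report))
```

Running this inside the AWS Batch container and on a working machine would show whether the two environments differ. If the container value turns out to be small, two things to try are setting `num_workers=0`, so that batches are loaded in the main process, or starting the container with a larger shared-memory size (for plain Docker, the `--shm-size` option of `docker run`).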
   Does anyone have any ideas about what could cause this?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services