Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/12/22 12:10:38 UTC

[GitHub] [incubator-mxnet] hadjiprocopis opened a new issue #17144: ImageRecordIter : sometimes crashes when number of threads > 1

URL: https://github.com/apache/incubator-mxnet/issues/17144
 
 
   Hi,
   I am using the Perl API (AI::MXNet v1.4, the latest from CPAN) with MXNet v1.5.1.
   I am on Linux (Fedora 30, kernel 5.3.13-200), and my Perl version is v5.28.2.
   
   I compiled MXNet from the source archive apache-mxnet-src-1.5.1-incubating.tar.gz (I had too many problems with the git repo).
   
   The problem arises with the following piece of Perl code:
   
   ```
   #!/usr/bin/perl
   
   use strict;
   use warnings;
   
   use AI::MXNet qw/mx/;
   
   my $batch_size = 4;
   # num channels, height, width
   my $img_shape = [3, 256, 256];
   my $training_file = 'training.bin';
   # set src/io/iter_image_recordio_2.cc threadget=1
   my $train_dataiter = mx->io->ImageRecordIter({
   	'path_imgrec' => $training_file,
   	'path_imglist' => 'training.lst',
   	# num channels, height, width
   	'data_shape' => $img_shape,
   	'batch_size' => $batch_size,
   	'label_width' => 1,
   });
   my $batch = $train_dataiter->next();
   
   ```
   The problem is that, roughly half the time, the above code causes a segmentation fault at `res.release()` in the following function:
   ```
   
   inline size_t ImageRecordIOParser2<DType>::ParseChunk(DType* data_dptr, real_t*$
     const size_t current_size, dmlc::InputSplit::Blob * chunk)
   ```
   This function is in the file `src/io/iter_image_recordio_2.cc`.
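
Because the crash is intermittent, it may help to measure how often it happens before and after any change. The following is a minimal sketch, assuming a POSIX system where a process killed by a signal reports a negative return code. The helper name `count_segfaults` is hypothetical and not part of MXNet.

```python
import signal
import subprocess


def count_segfaults(cmd, runs=20):
    """Run `cmd` `runs` times and return how many runs were killed by SIGSEGV."""
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, subprocess reports "killed by signal N" as return code -N.
        if result.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes
```

For example, `count_segfaults(["perl", "dataloader.pl"], runs=20)` would give a rough crash rate for the Perl script, which can then be compared against a build with `threadget=1`.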
   
   The problem disappears when I set the number of OpenMP threads to 1 by adding `threadget=1;` in that function. (Setting the environment variable with `export OMP_NUM_THREADS=1` seems to have no effect in my case, which is why I edited the C++ file.)
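
To rule out the shell as the reason the variable has no effect, the thread variables can be set explicitly in the environment of the child process. This is only a sketch: the helper names `single_thread_env` and `run_single_threaded` are hypothetical, and it confirms only that the variables reach the process, not that the image parser honors them.

```python
import os
import subprocess

# Thread-related variables mentioned in this report.
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "MXNET_CPU_WORKER_NTHREADS",
    "MXNET_CPU_PRIORITY_NTHREADS",
)


def single_thread_env(base=None):
    """Return a copy of the environment with every thread variable set to "1"."""
    env = dict(os.environ if base is None else base)
    for name in THREAD_VARS:
        env[name] = "1"
    return env


def run_single_threaded(cmd):
    """Run `cmd` with the thread variables forced to 1 and return its exit code."""
    return subprocess.run(cmd, env=single_thread_env()).returncode
```

Calling `run_single_threaded(["perl", "dataloader.pl"])` guarantees the script sees the variables. If the crash still occurs, the thread count is presumably being decided elsewhere.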
   
   I assume the input training file is correct because the corresponding Python code works fine:
   
   ```
   import mxnet as mx
   import numpy as np
   import matplotlib.pyplot as plt
   import cv2
   
   # this works
   
   dataiter = mx.io.ImageRecordIter(
     path_imgrec="training.bin",
     path_imglist="training.lst",
     data_shape=(3,256,256),
     batch_size=4,
     label_width=1
   )
   
   batch = dataiter.next() # first batch.
   images = batch.data[0] # This will contain 4 (=batch_size) images, each of 3x256x256.
   
   for i in range(4):
       plt.subplot(1,4,i+1)
       plt.imshow(images[i].asnumpy().astype(np.uint8).transpose((1,2,0)))
   plt.show()
   
   ```
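
Since the Python version works, it could also be used to compare thread counts without recompiling. The sketch below assumes that `preprocess_threads` is the documented `ImageRecordIter` option for the number of decoding threads. I have not verified that it controls `threadget` in the C++ code. The helper names are hypothetical.

```python
def record_iter_kwargs(preprocess_threads=None):
    """Build the ImageRecordIter arguments used in this report."""
    kwargs = dict(
        path_imgrec="training.bin",
        path_imglist="training.lst",
        data_shape=(3, 256, 256),  # (channels, height, width)
        batch_size=4,
        label_width=1,
    )
    if preprocess_threads is not None:
        # Assumption: `preprocess_threads` sets the number of decoding threads.
        kwargs["preprocess_threads"] = preprocess_threads
    return kwargs


def first_batch(preprocess_threads=None):
    """Create the iterator and fetch one batch (needs MXNet and the data files)."""
    import mxnet as mx  # imported lazily so the helper above works without MXNet

    dataiter = mx.io.ImageRecordIter(**record_iter_kwargs(preprocess_threads))
    return dataiter.next()
```

Running `first_batch(1)` and `first_batch(4)` repeatedly would show whether the Python binding is also sensitive to the thread count.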
   
   To debug this, I have the following settings in `config.mk`:
   ```
   DEV = 1
   DEBUG = 1
   USE_CUDA = 0
   ENABLE_CUDA_RTC = 1
   USE_CUDNN = 0
   USE_NVTX = 0
   USE_NCCL = 0
   USE_OPENCV = 1
   USE_LIBJPEG_TURBO = 0
   USE_OPENMP = 1
   USE_MKLDNN =
   USE_JEMALLOC = 0
   USE_GPERFTOOLS = 1
   USE_CPP_PACKAGE = 1
   ```
   Any ideas as to why this crash happens? Will I ever be able to decode images using multiple threads?
   
   I also have the following 3 side questions:
   1) What is the corresponding C++ code for the above Perl (and Python) script?
   2) Any ideas on how to control the number of threads via environment variables? It seems that `OMP_NUM_THREADS`, `MXNET_CPU_PRIORITY_NTHREADS`, and `MXNET_CPU_WORKER_NTHREADS` have absolutely no effect in my case (e.g. neither `OMP_NUM_THREADS=1 dataloader.pl` nor `export OMP_NUM_THREADS=1; dataloader.pl` affects the number of threads in the aforementioned function, `ImageRecordIOParser2()` in `src/io/iter_image_recordio_2.cc`).
   3) I need C++ examples!
   
