Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/11/21 14:07:01 UTC

[GitHub] aijanai opened a new issue #8750: terminate called after throwing an instance of 'std::bad_alloc'

URL: https://github.com/apache/incubator-mxnet/issues/8750
 
 
   ## Description
   Feeding a dataset of 10000 sentences (in one-hot representation) always results in an abrupt `std::bad_alloc` as soon as we reach the `fit` invocation.
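
   Roughly, the training setup looks like this (only a sketch: the network symbol, sizes and optimizer here are placeholders; the real code is `rnn_reverse_string.py` in the attached archive, and `OneHotIterator` is the iterator pasted at the bottom of this issue):

   ```python
   import mxnet as mx

   # Sketch only: `net`, the sizes and the optimizer are placeholders;
   # the real script is rnn_reverse_string.py in the attached archive.
   train_iter = OneHotIterator(
       data, label,
       data_names='data', max_len_data=60, vocab_size_data=30000,
       label_names='softmax_label', max_len_label=60, vocab_size_label=30000,
       batch_size=10)

   mod = mx.mod.Module(symbol=net, data_names=['data'], label_names=['softmax_label'])
   mod.fit(train_iter, optimizer='adam', num_epoch=10)  # <- aborts here with std::bad_alloc
   ```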
   
   ## Environment info (Required)
   
   ```
   ----------Python Info----------
   Version      : 3.5.2
   Compiler     : GCC 5.4.0 20160609
   Build        : ('default', 'Sep 14 2017 22:51:06')
   Arch         : ('64bit', 'ELF')
   ------------Pip Info-----------
   Version      : 9.0.1
   Directory    : /home/ai_ja_nai/pischool/mxnet/venv3/local/lib/python3.5/site-packages/pip
   ----------MXNet Info-----------
   Version      : 0.12.1
   Directory    : /home/ai_ja_nai/pischool/mxnet/venv3/local/lib/python3.5/site-packages/mxnet
   Commit Hash   : e0c7906693f0c79b0ce34a4d777c26a6bf1903c1
   ----------System Info----------
   Platform     : Linux-4.4.0-98-generic-x86_64-with-Ubuntu-16.04-xenial
   system       : Linux
   node         : aiMacBookPro
   release      : 4.4.0-98-generic
   version      : #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017
   ----------Hardware Info----------
   machine      : x86_64
   processor    : x86_64
   Architecture:          x86_64
   CPU op-mode(s):        32-bit, 64-bit
   Byte Order:            Little Endian
   CPU(s):                4
   On-line CPU(s) list:   0-3
   Thread(s) per core:    2
   Core(s) per socket:    2
   Socket(s):             1
   NUMA node(s):          1
   Vendor ID:             GenuineIntel
   CPU family:            6
   Model:                 69
   Model name:            Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
   Stepping:              1
   CPU MHz:               2900.531
   CPU max MHz:           2900,0000
   CPU min MHz:           800,0000
   BogoMIPS:              4800.02
   Virtualization:        VT-x
   L1d cache:             32K
   L1i cache:             32K
   L2 cache:              256K
   L3 cache:              3072K
   NUMA node0 CPU(s):     0-3
   Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
   ----------Network Test----------
   Setting timeout: 10
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0745 sec, LOAD: 0.1277 sec.
   Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1750 sec, LOAD: 4.3842 sec.
   Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0590 sec, LOAD: 0.6070 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0557 sec, LOAD: 0.3190 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0468 sec, LOAD: 0.1626 sec.
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0343 sec, LOAD: 0.8834 sec.
   
   ```
   
   Package used (Python/R/Scala/Julia):
   I'm using Python
   
   
   
   ## Error Message:
   terminate called after throwing an instance of 'std::bad_alloc'
     what():  std::bad_alloc
   Aborted (core dumped)
   
   
   ## Minimum reproducible example
   `python rnn_reverse_string.py` (find it in the attached archive)
   [crashing_mxnet.tar.gz](https://github.com/apache/incubator-mxnet/files/1491572/crashing_mxnet.tar.gz)
   
   
   ## Steps to reproduce
   
   1. test the sanity of the iterator with `test_train_set.py`
   2. try to execute the RNN (a trivial network that is supposed to learn how to reverse a sentence) and enjoy the crash
   
   ## What have you tried to solve it?
   
   1. Tried to isolate the problem in the iterator. The iterator is based on the `SimpleIter` shown in the Loading Data tutorial, amended according to the main differences found in Sockeye's `ParallelBucketSentenceIter`. The iterator can output War and Peace, provided as the `english` file, though most of the memory is consumed by the integer representation and the vocabulary loaded into RAM; try using a small subset, like 1000 sentences (a rough memory estimate is sketched after the iterator code below).
   2. Tried to use an on-the-fly one-hot encoding of the sentences to save some memory. It crashes with any data size, and I can't find the issue in my iterator. I'm pasting the code below for reference:
   
   
   ```python
   import mxnet as mx
   
   # A very simple iterator that, given lists of sentences encoded as integers,
   # emits the batches in one-hot form (to work around memory allocation issues).
   
   class OneHotIterator(mx.io.DataIter):
       def __init__(self,
                    data, label,
                    data_names, max_len_data, vocab_size_data,
                    label_names, max_len_label, vocab_size_label,
                    batch_size=10):
           self._provide_data = [
               mx.io.DataDesc(
                   name=data_names,
                   shape=(batch_size, max_len_data, vocab_size_data),
                   layout='NTC')
           ]
           self._provide_label = [
               mx.io.DataDesc(
                   name=label_names,
                   shape=(batch_size, max_len_label, vocab_size_label),
                   layout='NTC')
           ]
           self.num_batches = len(data)//batch_size
           self.batch_size = batch_size
           self.cur_data_pointer = 0
           self.cur_batch = 0
           self.vocab_size_data = vocab_size_data
           self.vocab_size_label = vocab_size_label
           self.data = data
           self.label = label
   
       def __iter__(self):
           return self
   
       def reset(self):
           self.cur_batch = 0
           self.cur_data_pointer = 0
   
       def __next__(self):
           return self.next()
   
       @property
       def provide_data(self):
           return self._provide_data
   
   
       @property
       def provide_label(self):
           return self._provide_label
   
       def next(self):
           if self.cur_batch < self.num_batches:  # yield all num_batches full batches
               self.cur_batch += 1
   
               data_batch = []
               label_batch = []
   
               # gather one batch of integer-encoded sentences
               for i in range(self.batch_size):
                   data_batch.append(self.data[self.cur_data_pointer])
                   label_batch.append(self.label[self.cur_data_pointer])
                   self.cur_data_pointer += 1

               # expand to one-hot on the fly, one batch at a time
               data = [mx.nd.one_hot(mx.nd.array(data_batch), self.vocab_size_data)]
               label = [mx.nd.one_hot(mx.nd.array(label_batch), self.vocab_size_label)]
               
               return mx.io.DataBatch(
                   data,
                   label,
                   provide_data=self._provide_data,
                   provide_label=self._provide_label
               )
           else:
               raise StopIteration
   ```
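
   As a very rough back-of-the-envelope check of why the fully materialised one-hot dataset cannot fit in RAM (the sentence length and vocabulary size below are placeholders, not the exact values from my data):

   ```python
   # Placeholder sizes, for illustration only; the real max_len and vocab_size
   # come from the attached dataset.
   batch_size = 10
   max_len = 60          # tokens per sentence
   vocab_size = 30000    # one-hot width
   bytes_per_value = 4   # mx.nd.one_hot returns float32 by default

   one_batch = batch_size * max_len * vocab_size * bytes_per_value
   print("one one-hot batch  : %.1f MB" % (one_batch / 2**20))     # ~68.7 MB

   # materialising all 10000 sentences in one-hot form at once instead:
   full_corpus = 10000 * max_len * vocab_size * bytes_per_value
   print("full one-hot corpus: %.1f GB" % (full_corpus / 2**30))   # ~67.1 GB
   ```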
   
