Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/11/21 14:07:01 UTC
[GitHub] aijanai opened a new issue #8750: terminate called after throwing an instance of 'std::bad_alloc'
URL: https://github.com/apache/incubator-mxnet/issues/8750
## Description
Feeding a dataset of 10,000 sentences (in one-hot representation) always results in an abrupt `std::bad_alloc` as soon as the `fit` invocation is reached.
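A back-of-the-envelope calculation suggests why a dense one-hot representation of the whole dataset can exhaust RAM. The sentence length and vocabulary size below are illustrative assumptions, not values from this report:

```python
# Rough footprint of materializing the whole dataset as dense one-hot float32.
num_sentences = 10_000
max_len = 50            # assumed maximum sentence length
vocab_size = 30_000     # assumed vocabulary size
bytes_per_float32 = 4

total_bytes = num_sentences * max_len * vocab_size * bytes_per_float32
print(f"{total_bytes / 2**30:.1f} GiB")  # tens of GiB: enough to trigger bad_alloc
```

Even with moderate assumptions the dense tensor runs into tens of gigabytes, far beyond the 8 GB typical of the laptop hardware listed below.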
## Environment info (Required)
```
----------Python Info----------
Version : 3.5.2
Compiler : GCC 5.4.0 20160609
Build : ('default', 'Sep 14 2017 22:51:06')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 9.0.1
Directory : /home/ai_ja_nai/pischool/mxnet/venv3/local/lib/python3.5/site-packages/pip
----------MXNet Info-----------
Version : 0.12.1
Directory : /home/ai_ja_nai/pischool/mxnet/venv3/local/lib/python3.5/site-packages/mxnet
Commit Hash : e0c7906693f0c79b0ce34a4d777c26a6bf1903c1
----------System Info----------
Platform : Linux-4.4.0-98-generic-x86_64-with-Ubuntu-16.04-xenial
system : Linux
node : aiMacBookPro
release : 4.4.0-98-generic
version : #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 69
Model name: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
Stepping: 1
CPU MHz: 2900.531
CPU max MHz: 2900,0000
CPU min MHz: 800,0000
BogoMIPS: 4800.02
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
----------Network Test----------
Setting timeout: 10
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0745 sec, LOAD: 0.1277 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1750 sec, LOAD: 4.3842 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0590 sec, LOAD: 0.6070 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0557 sec, LOAD: 0.3190 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0468 sec, LOAD: 0.1626 sec.
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0343 sec, LOAD: 0.8834 sec.
```
Package used (Python/R/Scala/Julia):
I'm using Python
## Error Message:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
## Minimum reproducible example
`python rnn_reverse_string.py` (find it in the attached archive)
[crashing_mxnet.tar.gz](https://github.com/apache/incubator-mxnet/files/1491572/crashing_mxnet.tar.gz)
## Steps to reproduce
1. test the sanity of the iterator with `test_train_set.py`
2. try to execute the RNN (a trivial RNN which is supposed to learn how to reverse a sentence) and enjoy the crash
## What have you tried to solve it?
1. Tried to isolate the problem in the iterator. The iterator is based on the `SimpleIter` described in the Loading Data tutorial, amended with the key differences found in Sockeye's `ParallelBucketSentenceIter`. The iterator can output all of War and Peace (provided as the `english` file), though most of the memory is consumed by the integer representation and the vocabulary, both held in RAM. Try using a small subset, like 1000 sentences.
2. Tried to use on-the-fly one-hot encoding of the sentences to save memory. It crashes with any data size, and I can't find the issue in my iterator. I'm pasting the code for ease:
```python
import mxnet as mx


class OneHotIterator(mx.io.DataIter):
    """A very simple iterator that, given a list of sentences encoded as
    integer IDs, emits batches in one-hot form (to avoid materializing the
    whole dataset and overcome memory-allocation issues)."""

    def __init__(self,
                 data, label,
                 data_names, max_len_data, vocab_size_data,
                 label_names, max_len_label, vocab_size_label,
                 batch_size=10):
        self._provide_data = [
            mx.io.DataDesc(
                name=data_names,
                shape=(batch_size, max_len_data, vocab_size_data),
                layout='NTC')
        ]
        self._provide_label = [
            mx.io.DataDesc(
                name=label_names,
                shape=(batch_size, max_len_label, vocab_size_label),
                layout='NTC')
        ]
        self.num_batches = len(data) // batch_size
        self.batch_size = batch_size
        self.cur_data_pointer = 0
        self.cur_batch = 0
        self.vocab_size_data = vocab_size_data
        self.vocab_size_label = vocab_size_label
        self.data = data
        self.label = label

    def __iter__(self):
        return self

    def reset(self):
        self.cur_batch = 0
        self.cur_data_pointer = 0

    def __next__(self):
        return self.next()

    @property
    def provide_data(self):
        return self._provide_data

    @property
    def provide_label(self):
        return self._provide_label

    def next(self):
        # Compare against num_batches (not num_batches - 1), so the last
        # full batch is not silently dropped.
        if self.cur_batch < self.num_batches:
            self.cur_batch += 1
            data_batch = []
            label_batch = []
            for _ in range(self.batch_size):
                data_batch.append(self.data[self.cur_data_pointer])
                label_batch.append(self.label[self.cur_data_pointer])
                self.cur_data_pointer += 1
            # One-hot encode on the fly: data from data_batch with the data
            # vocabulary, labels from label_batch with the label vocabulary.
            data = [mx.nd.one_hot(mx.nd.array(data_batch),
                                  self.vocab_size_data)]
            label = [mx.nd.one_hot(mx.nd.array(label_batch),
                                   self.vocab_size_label)]
            return mx.io.DataBatch(
                data,
                label,
                provide_data=self._provide_data,
                provide_label=self._provide_label
            )
        else:
            raise StopIteration
```
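For contrast, the per-batch footprint of this on-the-fly iterator is modest, which is why encoding one batch at a time should be viable where encoding the full dataset is not. The sizes below are the same illustrative assumptions as above:

```python
# Per-batch footprint of on-the-fly one-hot encoding (float32).
batch_size = 10
max_len = 50            # assumed maximum sentence length
vocab_size = 30_000     # assumed vocabulary size
bytes_per_float32 = 4

batch_bytes = batch_size * max_len * vocab_size * bytes_per_float32
print(f"{batch_bytes / 2**20:.1f} MiB per batch")  # tens of MiB, not GiB
```

If each batch is only tens of MiB yet `fit` still aborts with `std::bad_alloc`, the allocation failure is likely happening elsewhere, e.g. in how the symbol graph or the unrolled RNN is shaped, rather than in the batch tensors themselves.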