Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/09/27 18:45:39 UTC
[GitHub] [incubator-mxnet] chunhanl opened a new issue #16302: Error while running unit test test_gluon_data.py:test_dataloader_context
URL: https://github.com/apache/incubator-mxnet/issues/16302
## Description
Error while running the unit test test_gluon_data.py:test_dataloader_context
## Environment info (Required)
```
----------Python Info----------
Version : 3.7.4
Compiler : GCC 7.3.0
Build : ('default', 'Aug 13 2019 20:35:49')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.2.3
Directory : /home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version : 1.6.0
Directory : /home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet
Commit hash file "/home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library : ['/home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/libmxnet.so']
Build features:
✔ CUDA
✖ CUDNN
✖ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✔ OPENMP
✖ SSE
✔ F16C
✔ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✔ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✖ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------System Info----------
Platform : Linux-4.15.0-64-generic-x86_64-with-debian-stretch-sid
system : Linux
node : leo-ubuntu
release : 4.15.0-64-generic
version : #73~16.04.1-Ubuntu SMP Fri Sep 13 09:56:18 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz
Stepping: 3
CPU MHz: 2540.259
CPU max MHz: 3400.0000
CPU min MHz: 800.0000
BogoMIPS: 6385.21
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0128 sec, LOAD: 0.7847 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0766 sec, LOAD: 5.3545 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0233 sec, LOAD: 0.4215 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0211 sec, LOAD: 0.2097 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0109 sec, LOAD: 0.5540 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0164 sec, LOAD: 0.1282 sec.
----------Environment----------
```
## Build info (Required if built from source)
Compiler (gcc/clang/mingw/visual studio):
MXNet commit hash:
c5007ea01ad1b88e43cc669385a4a76b7020ce94
Build config:
```
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#-------------------------------------------------------------------------------
# Template configuration for compiling mxnet
#
# If you want to change the configuration, please use the following
# steps. Assume you are on the root directory of mxnet. First copy the this
# file so that any local changes will be ignored by git
#
# $ cp make/config.mk .
#
# Next modify the according entries, and then compile by
#
# $ make
#
# or build in parallel with 8 threads
#
# $ make -j8
#-------------------------------------------------------------------------------
#---------------------
# choice of compiler
#--------------------
ifndef CC
export CC = gcc
endif
ifndef CXX
export CXX = g++
endif
ifndef NVCC
export NVCC = nvcc
endif
# Enable profiling
USE_PROFILER = 1
# whether compile with options for MXNet developer
DEV = 0
# whether compile with debug
DEBUG = 1
# whether to turn on segfault signal handler to log the stack trace
USE_SIGNAL_HANDLER =
# the additional link flags you want to add
ADD_LDFLAGS =
# the additional compile flags you want to add
ADD_CFLAGS =
# whether to build operators written in TVM
USE_TVM_OP = 0
#---------------------------------------------
# matrix computation libraries for CPU/GPU
#---------------------------------------------
# whether use CUDA during compile
USE_CUDA = 0
# add the path to CUDA library to link and compile flag
# if you have already add them to environment variable, leave it as NONE
# USE_CUDA_PATH = /usr/local/cuda
USE_CUDA_PATH = NONE
# whether to enable CUDA runtime compilation
ENABLE_CUDA_RTC = 1
# whether use CuDNN R3 library
USE_CUDNN = 0
# whether to use NVTX when profiling
USE_NVTX = 0
#whether to use NCCL library
USE_NCCL = 0
#add the path to NCCL library
USE_NCCL_PATH = NONE
# whether use opencv during compilation
# you can disable it, however, you will not able to use
# imbin iterator
USE_OPENCV = 1
# Add OpenCV include path, in which the directory `opencv2` exists
USE_OPENCV_INC_PATH = NONE
# Add OpenCV shared library path, in which the shared library exists
USE_OPENCV_LIB_PATH = NONE
#whether use libjpeg-turbo for image decode without OpenCV wrapper
USE_LIBJPEG_TURBO = 0
#add the path to libjpeg-turbo library
USE_LIBJPEG_TURBO_PATH = NONE
# use openmp for parallelization
USE_OPENMP = 1
# whether use MKL-DNN library: 0 = disabled, 1 = enabled
# if USE_MKLDNN is not defined, MKL-DNN will be enabled by default on x86 Linux.
# you can disable it explicity with USE_MKLDNN = 0
USE_MKLDNN =
# whether use NNPACK library
USE_NNPACK = 0
# choose the version of blas you want to use
# can be: mkl, blas, atlas, openblas
# in default use atlas for linux while apple for osx
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S), Darwin)
USE_BLAS = apple
else
USE_BLAS = atlas
endif
# whether use lapack during compilation
# only effective when compiled with blas versions openblas/apple/atlas/mkl
USE_LAPACK = 1
# path to lapack library in case of a non-standard installation
USE_LAPACK_PATH =
# add path to intel library, you may need it for MKL, if you did not add the path
# to environment variable
USE_INTEL_PATH = NONE
# If use MKL only for BLAS, choose static link automatically to allow python wrapper
ifeq ($(USE_BLAS), mkl)
USE_STATIC_MKL = 1
else
USE_STATIC_MKL = NONE
endif
#----------------------------
# Settings for power and arm arch
#----------------------------
ARCH := $(shell uname -a)
ifneq (,$(filter $(ARCH), armv6l armv7l powerpc64le ppc64le aarch64))
USE_SSE=0
USE_F16C=0
else
USE_SSE=1
endif
#----------------------------
# F16C instruction support for faster arithmetic of fp16 on CPU
#----------------------------
# For distributed training with fp16, this helps even if training on GPUs
# If left empty, checks CPU support and turns it on.
# For cross compilation, please check support for F16C on target device and turn off if necessary.
USE_F16C =
#----------------------------
# distributed computing
#----------------------------
# whether or not to enable multi-machine supporting
USE_DIST_KVSTORE = 0
# whether or not allow to read and write HDFS directly. If yes, then hadoop is
# required
USE_HDFS = 0
# path to libjvm.so. required if USE_HDFS=1
LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server
# whether or not allow to read and write AWS S3 directly. If yes, then
# libcurl4-openssl-dev is required, it can be installed on Ubuntu by
# sudo apt-get install -y libcurl4-openssl-dev
USE_S3 = 0
#----------------------------
# performance settings
#----------------------------
# Use operator tuning
USE_OPERATOR_TUNING = 1
# Use gperftools if found
# Disable because of #8968
USE_GPERFTOOLS = 0
# path to gperftools (tcmalloc) library in case of a non-standard installation
USE_GPERFTOOLS_PATH =
# Link gperftools statically
USE_GPERFTOOLS_STATIC =
# Use JEMalloc if found, and not using gperftools
USE_JEMALLOC = 1
# path to jemalloc library in case of a non-standard installation
USE_JEMALLOC_PATH =
# Link jemalloc statically
USE_JEMALLOC_STATIC =
#----------------------------
# additional operators
#----------------------------
# path to folders containing projects specific operators that you don't want to put in src/operators
EXTRA_OPERATORS =
#----------------------------
# other features
#----------------------------
# Create C++ interface package
USE_CPP_PACKAGE = 0
# Use int64_t type to represent the total number of elements in a tensor
# This will cause performance degradation reported in issue #14496
# Set to 1 for large tensor with tensor size greater than INT32_MAX i.e. 2147483647
# Note: the size of each dimension is still bounded by INT32_MAX
USE_INT64_TENSOR_SIZE = 0
# Python executable. Needed for cython target
PYTHON = python
#----------------------------
# plugins
#----------------------------
# whether to use caffe integration. This requires installing caffe.
# You also need to add CAFFE_PATH/build/lib to your LD_LIBRARY_PATH
# CAFFE_PATH = $(HOME)/caffe
# MXNET_PLUGINS += plugin/caffe/caffe.mk
# WARPCTC_PATH = $(HOME)/warp-ctc
# MXNET_PLUGINS += plugin/warpctc/warpctc.mk
# whether to use sframe integration. This requires build sframe
# git@github.com:dato-code/SFrame.git
# SFRAME_PATH = $(HOME)/SFrame
# MXNET_PLUGINS += plugin/sframe/plugin.mk
```
## Error Message:
```
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1269705535 to reproduce.
test_gluon_data.test_dataloader_context ... ok
ERROR
======================================================================
ERROR: test suite for <module 'test_gluon_data' from '/home/leo/workspace/incubator-mxnet/tests/python/unittest/test_gluon_data.py'>
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/nose/suite.py", line 229, in run
self.tearDown()
File "/home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/nose/suite.py", line 352, in tearDown
self.teardownContext(ancestor)
File "/home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/nose/suite.py", line 368, in teardownContext
try_run(context, names)
File "/home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/nose/util.py", line 471, in try_run
return func()
File "/home/leo/workspace/incubator-mxnet/tests/python/unittest/common.py", line 272, in teardown
mx.nd.waitall()
File "/home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/ndarray/ndarray.py", line 188, in waitall
check_call(_LIB.MXNDArrayWaitAll())
File "/home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/base.py", line 254, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [11:36:49] /home/leo/workspace/incubator-mxnet/src/engine/./../common/cuda_utils.h:321: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA: invalid device ordinal
Stack trace:
[bt] (0) /home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7ff5faee4f82]
[bt] (1) /home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/libmxnet.so(mxnet::common::cuda::DeviceStore::SetDevice(int)+0xd0) [0x7ff5faee6dc0]
[bt] (2) /home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/libmxnet.so(mxnet::common::cuda::DeviceStore::DeviceStore(int, bool)+0x48) [0x7ff5faee6e28]
[bt] (3) /home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/libmxnet.so(mxnet::storage::PinnedMemoryStorage::Alloc(mxnet::Storage::Handle*)+0x56) [0x7ff5fd341656]
[bt] (4) /home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/libmxnet.so(mxnet::StorageImpl::Alloc(mxnet::Storage::Handle*)+0x56) [0x7ff5fd340306]
[bt] (5) /home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/libmxnet.so(mxnet::NDArray::CheckAndAlloc() const+0x1cf) [0x7ff5faff685f]
[bt] (6) /home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/libmxnet.so(void mxnet::CopyFromToDnsImpl<mshadow::cpu, mshadow::cpu>(mxnet::NDArray const&, mxnet::NDArray const&, mxnet::RunContext)+0x570) [0x7ff5fd1b5070]
[bt] (7) /home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/libmxnet.so(void mxnet::CopyFromToImpl<mshadow::cpu, mshadow::cpu>(mxnet::NDArray const&, mxnet::NDArray const&, mxnet::RunContext, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&)+0x4a3) [0x7ff5fd1bdbb3]
[bt] (8) /home/leo/anaconda3/envs/mxnet_dev/lib/python3.7/site-packages/mxnet-1.6.0-py3.7.egg/mxnet/libmxnet.so(+0x371e197) [0x7ff5fd1a9197]
----------------------------------------------------------------------
Ran 1 test in 7.134s
FAILED (errors=1)
```
## Minimum reproducible example
## Steps to reproduce
1. nosetests --verbosity=3 --nocapture tests/python/unittest/test_gluon_data.py:test_dataloader_context
## What have you tried to solve it?
1. Pretty sure it is related to the following snippet in the test:
```
# use pinned memory with custom device id
loader3 = gluon.data.DataLoader(dataset, 8, pin_memory=True,
pin_device_id=custom_dev_id)
for _, x in enumerate(loader3):
assert x.context == context.cpu_pinned(custom_dev_id)
```
when `custom_dev_id` is set to 1; I'm not sure what caused this error.
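One plausible reading of the stack trace, assuming this machine has a single GPU: `PinnedMemoryStorage::Alloc` calls `DeviceStore::SetDevice(pin_device_id)`, and `cudaSetDevice(1)` fails with `invalid device ordinal` because ordinal 1 does not exist when only device 0 is present. A minimal sketch of a guard the test could apply before choosing a custom device id — `num_gpus` here stands in for a runtime GPU count such as `mxnet.context.num_gpus()` (the helper name is hypothetical, not an MXNet API):

```python
def safe_pin_device_id(requested_id, num_gpus):
    """Clamp a requested pin_device_id to a valid CUDA device ordinal.

    CUDA ordinals run 0..num_gpus-1; passing anything outside that range
    to cudaSetDevice raises 'invalid device ordinal', which is the error
    seen in the stack trace above.
    """
    if num_gpus <= 0:
        # No visible GPUs: fall back to ordinal 0 (the default pinning path).
        return 0
    return min(requested_id, num_gpus - 1)
```

On a single-GPU box, `safe_pin_device_id(1, 1)` returns 0, so the pinned-memory allocation would stay on an existing device instead of failing during teardown.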
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services