You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/10/30 22:05:30 UTC
[GitHub] [incubator-mxnet] larroy opened a new issue #16675: MaskRCNN unable
to train with master, works with previous revisions
larroy opened a new issue #16675: MaskRCNN unable to train with master, works with previous revisions
URL: https://github.com/apache/incubator-mxnet/issues/16675
## Description
I can't train mask rcnn with latest revisions of MXNet:
https://gluon-cv.mxnet.io/build/examples_instance/train_mask_rcnn_coco.html
This revision works:
e9e267ef7 - (Sat, 14 Sep 2019 09:33:08 -0700) reminisce - Fix remaining errors reported by D2L (#16157)
This doesn't:
86ed5f5c0 - (Mon, 28 Oct 2019 01:24:05 -0700) Huang, Gua.. - [NumPy][Operator] NumPy operator `may_share_memory` and `shares_memory` (#16533) (upstream/v1.6.x)
I see very low throughput, high CPU usage and low GPU usage or it gets stuck completely.
This can be reproduced either from source or from the latest pip builds, so I don't think it's my environment or my build options.
This is my build environment:
```
USE_CUDA: "ON" # Build with CUDA support
USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
USE_NCCL: "ON" # Use NVidia NCCL with CUDA
USE_OPENCV: "ON" # Build with OpenCV support
USE_OPENMP: "PLATFORM" # Build with Openmp support
USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
USE_LAPACK: "ON" # Build with lapack support
USE_MKL_IF_AVAILABLE: "OFF" # Use MKL if found
USE_MKLML_MKL: "OFF" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_MKLDNN: "OFF" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
USE_JEMALLOC: "ON" # Build with Jemalloc support
USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
USE_CPP_PACKAGE: "OFF" # Build C++ Package
USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
USE_GPROF: "OFF" # Compile with gprof (profiling) flag
USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
INSTALL_EXAMPLES: "OFF" # Install the example source files.
USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
CMAKE_BUILD_TYPE: "Release"
CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
CMAKE_C_COMPILER_LAUNCHER: "ccache"
CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
```
## Diagnose
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
Stepping: 4
CPU MHz: 3134.070
BogoMIPS: 5000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 33792K
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
----------Python Info----------
Version : 3.6.8
Compiler : GCC 8.3.0
Build : ('default', 'Oct 7 2019 12:59:55')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 19.3.1
Directory : /home/piotr/mxnet/py3_venv/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.6.0
Directory : /home/piotr/mxnet/python/mxnet
Commit hash file "/home/piotr/mxnet/python/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library : ['/home/piotr/mxnet/python/mxnet/../../build/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✔ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✔ OPENMP
✖ SSE
✔ F16C
✔ JEMALLOC
✔ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✖ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✖ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------System Info----------
Platform : Linux-4.15.0-1052-aws-x86_64-with-Ubuntu-18.04-bionic
system : Linux
node : 18-232-106-45
release : 4.15.0-1052-aws
version : #54-Ubuntu SMP Tue Oct 1 15:43:26 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0020 sec, LOAD: 0.4104 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0190 sec, LOAD: 0.0444 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0222 sec, LOAD: 0.3929 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0184 sec, LOAD: 0.3812 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0017 sec, LOAD: 0.0803 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0063 sec, LOAD: 0.0893 sec.
----------Environment----------
(END)
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services