Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/10/16 14:31:28 UTC

[GitHub] larroy commented on issue #12453: GPU tests are unstable

URL: https://github.com/apache/incubator-mxnet/issues/12453#issuecomment-430260571
 
 
   This is failing again on a p3.2xlarge GPU instance, using the following commands:
   
    time ci/build.py --docker-registry mxnetci --platform ubuntu_build_cuda --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh build_ubuntu_gpu_mkldnn && time ci/build.py --docker-registry mxnetci --nvidiadocker --platform ubuntu_gpu --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh unittest_ubuntu_python3_gpu
   
   
   ERROR
   test_operator_gpu.test_ndarray_equal ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1470664036 to reproduce.
   ERROR
   test_operator_gpu.test_size_array ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1016858059 to reproduce.
   ERROR
   test invalid sparse operator will throw a exception ... ok
   test_operator_gpu.test_ndarray_not_equal ... ok
   test_operator_gpu.test_nadam ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=440311246 to reproduce.
   ERROR
   test check_format for sparse ndarray ... [13:03:09] src/operator/tensor/./.././../common/../operator/mxnet_op.h:649: Check failed: (err) == (cudaSuccess) Name: mxnet_generic_kernel ErrStr:too many resources requested for launch
   /work/runtime_functions.sh: line 722:     8 Aborted                 (core dumped) nosetests-3.4 $NOSE_COVERAGE_ARGUMENTS --with-xunit --xunit-file nosetests_gpu.xml --verbose tests/python/gpu
   build.py: 2018-10-16 13:03:10,500 Waiting for status of container dd18847ed3fd for 600 s.
   build.py: 2018-10-16 13:03:10,644 Container exit status: {'StatusCode': 134, 'Error': None}
   build.py: 2018-10-16 13:03:10,644 Stopping container: dd18847ed3fd
   build.py: 2018-10-16 13:03:10,646 Removing container: dd18847ed3fd
   build.py: 2018-10-16 13:03:10,716 Execution of ['/work/runtime_functions.sh', 'unittest_ubuntu_python3_gpu'] failed with status: 134
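
To re-run only the failing tests with the seeds printed above, the log can be scanned for seeded tests that are immediately followed by an ERROR line. The following is a minimal sketch, not part of the CI tooling: the helper names `failing_seeds` and `repro_command` are hypothetical, it assumes the ERROR line after a seed message belongs to that test, and it assumes that `test_operator_gpu` lives at `tests/python/gpu/test_operator_gpu.py` (inferred from the nosetests invocation in the log). The generated command would most likely need to run inside the same container as the original test run.

```python
import re

# Matches lines such as:
#   test_operator_gpu.test_nadam ... [INFO] Setting ... MXNET_TEST_SEED=440311246 to reproduce.
SEED_LINE = re.compile(
    r"^\s*(?P<module>\w+)\.(?P<test>\w+) \.\.\. \[INFO\] .*MXNET_TEST_SEED=(?P<seed>\d+)"
)


def failing_seeds(log_text):
    """Return (module, test, seed) for each seeded test directly followed by an ERROR line."""
    lines = log_text.splitlines()
    found = []
    # Pair every line with the one after it to check the reported status.
    for current, following in zip(lines, lines[1:]):
        match = SEED_LINE.match(current)
        if match and following.strip() == "ERROR":
            found.append((match["module"], match["test"], int(match["seed"])))
    return found


def repro_command(module, test, seed, test_dir="tests/python/gpu"):
    """Build a nosetests command line for one test with a fixed seed.

    test_dir is an assumption taken from the nosetests call in the log.
    """
    return (
        f"MXNET_TEST_SEED={seed} nosetests-3.4 --verbose "
        f"{test_dir}/{module}.py:{test}"
    )


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py < nosetests_output.log
    for module, test, seed in failing_seeds(sys.stdin.read()):
        print(repro_command(module, test, seed))
```

Tests that pass (for example `test_ndarray_not_equal` above) are skipped because their line ends in `ok` and carries no seed message.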
   
