Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/30 03:22:45 UTC
[GitHub] [incubator-mxnet] xidulu opened a new issue #18198: with_seed() broken when running GPU unit test
xidulu opened a new issue #18198:
URL: https://github.com/apache/incubator-mxnet/issues/18198
## Description
Ran the GPU unit tests with the following command:
`DMLC_LOG_STACK_TRACE_DEPTH=10 MXNET_MODULE_SEED=781106105 MXNET_ENGINE_TYPE=NaiveEngine pytest tests/python/gpu/test_operator_gpu.py`
### Error Message
```
tests/python/gpu/test_operator_gpu.py .........s.s...................... [ 5%]
.........................FFFFFsF.FFFFFFFFFFFFFFFFFFFFFFFFFF.FFFFFFFFFFFF [ 16%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.FFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 28%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFsFssFFFFFF [ 39%]
FFFFFFFFFFFFFFFFsFFFFFFFFFFFFFFFFFFFsFFFFFFFFFFFsFFFFFFFsFFFsFFFFFFFFFFF [ 50%]
FFFFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFsFFFFFFFFFFFFFFFFFFFFFFFFF [ 62%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFsFFFFFFFFFsFFFFFF.FFFFFFFFFFFF [ 73%]
FFFFFFFFFFFFFFFFFFF....F.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF....FFFFFFFFFFFFF [ 84%]
FFFxxxFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 96%]
FFFFFFFFFFFFFFFFFFFFFFFF [100%]
=================================== FAILURES ===================================
_______________________ test_batchnorm_backwards_notrain _______________________
args = (), kwargs = {}, test_count = 1, env_seed_str = None, i = 0
this_test_seed = 1871614074, log_level = 10
post_test_state = ('MT19937', array([ 793462385, 4162567913, 2690816661, 3146259572, 1379942102,
894119658, 364406528, 36749442..., 3314795127, 3420630909, 2538379262,
3698999054, 2822638424, 471751221, 3037373484], dtype=uint32), 1, 0, 0.0)
    @functools.wraps(orig_test)
    def test_new(*args, **kwargs):
        test_count = int(os.getenv('MXNET_TEST_COUNT', '1'))
        env_seed_str = os.getenv('MXNET_TEST_SEED')
        for i in range(test_count):
            if seed is not None:
                this_test_seed = seed
                log_level = logging.INFO
            elif env_seed_str is not None:
                this_test_seed = int(env_seed_str)
                log_level = logging.INFO
            else:
                this_test_seed = np.random.randint(0, np.iinfo(np.int32).max)
                log_level = logging.DEBUG
            post_test_state = np.random.get_state()
            np.random.seed(this_test_seed)
>           mx.random.seed(this_test_seed)
tests/python/unittest/common.py:206:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
python/mxnet/random.py:96: in seed
check_call(_LIB.MXRandomSeed(seed_state))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ret = -1
    def check_call(ret):
        """Check the return value of C API call.
        This function will raise an exception when an error occurs.
        Wrap every API call with this function.
        Parameters
        ----------
        ret : int
            return value from API calls.
        """
        if ret != 0:
>           raise get_last_ffi_error()
E mxnet.base.MXNetError: Traceback (most recent call last):
E [bt] (5) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(MXRandomSeed+0x1a) [0x7f89ce0a0c0a]
E [bt] (4) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(mxnet::resource::ResourceManagerImpl::SeedRandom(unsigned int)+0x30b) [0x7f89d11b081b]
E [bt] (3) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(mxnet::engine::NaiveEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool)+0x43b) [0x7f89ce1d08eb]
E [bt] (2) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete), mxnet::resource::ResourceManagerImpl::ResourceParallelRandom<mshadow::gpu>::SeedOne(unsigned long, unsigned int)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&)+0x1e) [0x7f89d11ac6ce]
E [bt] (1) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(mxnet::common::random::RandGenerator<mshadow::gpu, float>::Seed(mshadow::Stream<mshadow::gpu>*, unsigned int)+0x1e9) [0x7f89d1240f55]
E [bt] (0) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7f) [0x7f89cdf9b24f]
E File "../src/common/random_generator.cu", line 58
E Name: Check failed: err == cudaSuccess (10 vs. 0) : rand_generator_seed_kernel ErrStr:invalid device ordinal
python/mxnet/base.py:246: MXNetError
---------------------------- Captured stderr setup -----------------------------
WARNING:root:Unable to import numpy/mxnet. Skipping seeding for numpy/mxnet.
------------------------------ Captured log setup ------------------------------
WARNING root:conftest.py:177 Unable to import numpy/mxnet. Skipping seeding for numpy/mxnet.
____________________ test_create_sparse_ndarray_gpu_to_cpu _____________________
```
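The wrapper shown in the traceback picks a seed in a fixed priority order before calling `mx.random.seed`, which is where the failure surfaces. As a rough illustration of that order only, here is a minimal standard-library sketch. The helper name `pick_test_seed`, its parameters, and the returned source label are hypothetical and not part of MXNet, and `random.randrange` stands in for the `np.random.randint` call in the original.

```python
import os
import random

# Same upper bound as np.iinfo(np.int32).max in the traceback.
INT32_MAX = 2**31 - 1


def pick_test_seed(fixed_seed=None, environ=None):
    """Hypothetical helper mirroring the seed priority seen in the traceback.

    Returns a (seed, source) pair, where source names which branch was taken.
    """
    environ = os.environ if environ is None else environ
    env_seed_str = environ.get("MXNET_TEST_SEED")

    if fixed_seed is not None:
        # A seed passed to the decorator wins over everything else.
        return fixed_seed, "decorator"
    if env_seed_str is not None:
        # Otherwise the MXNET_TEST_SEED environment variable is used.
        return int(env_seed_str), "environment"
    # Fall back to a random seed in [0, INT32_MAX); stdlib stand-in for NumPy.
    return random.randrange(0, INT32_MAX), "random"


if __name__ == "__main__":
    print(pick_test_seed(environ={"MXNET_TEST_SEED": "7"}))
```

Note that the command in this report sets `MXNET_MODULE_SEED`, not `MXNET_TEST_SEED`, which is consistent with `env_seed_str = None` and a randomly drawn `this_test_seed` in the traceback. The error itself is raised later, inside the GPU seeding call, so the seed selection above is not the failing step.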
## Environment
We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
```
curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
# paste outputs here
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-mxnet] leezu commented on issue #18198: with_seed() broken when running GPU unit test
Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18198:
URL: https://github.com/apache/incubator-mxnet/issues/18198#issuecomment-622242111
Yes, it's very likely
[GitHub] [incubator-mxnet] haojin2 commented on issue #18198: with_seed() broken when running GPU unit test
Posted by GitBox <gi...@apache.org>.
haojin2 commented on issue #18198:
URL: https://github.com/apache/incubator-mxnet/issues/18198#issuecomment-621595040
Is this possibly caused by #18025? @szha @leezu