Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/30 03:22:45 UTC

[GitHub] [incubator-mxnet] xidulu opened a new issue #18198: with_seed() broken when running GPU unit test

xidulu opened a new issue #18198:
URL: https://github.com/apache/incubator-mxnet/issues/18198


   ## Description
   
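   Ran the GPU unit tests with the following command; nearly every test fails while seeding the GPU random number generator: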
   `DMLC_LOG_STACK_TRACE_DEPTH=10 MXNET_MODULE_SEED=781106105 MXNET_ENGINE_TYPE=NaiveEngine pytest tests/python/gpu/test_operator_gpu.py`
   
   ### Error Message
   ```
   tests/python/gpu/test_operator_gpu.py .........s.s...................... [  5%]
   .........................FFFFFsF.FFFFFFFFFFFFFFFFFFFFFFFFFF.FFFFFFFFFFFF [ 16%]
   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.FFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 28%]
   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFsFssFFFFFF [ 39%]
   FFFFFFFFFFFFFFFFsFFFFFFFFFFFFFFFFFFFsFFFFFFFFFFFsFFFFFFFsFFFsFFFFFFFFFFF [ 50%]
   FFFFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFsFFFFFFFFFFFFFFFFFFFFFFFFF [ 62%]
   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFsFFFFFFFFFsFFFFFF.FFFFFFFFFFFF [ 73%]
   FFFFFFFFFFFFFFFFFFF....F.FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF....FFFFFFFFFFFFF [ 84%]
   FFFxxxFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 96%]
   FFFFFFFFFFFFFFFFFFFFFFFF                                                 [100%]
   
   =================================== FAILURES ===================================
   _______________________ test_batchnorm_backwards_notrain _______________________
   
   args = (), kwargs = {}, test_count = 1, env_seed_str = None, i = 0
   this_test_seed = 1871614074, log_level = 10
   post_test_state = ('MT19937', array([ 793462385, 4162567913, 2690816661, 3146259572, 1379942102,
           894119658,  364406528, 36749442..., 3314795127, 3420630909, 2538379262,
          3698999054, 2822638424,  471751221, 3037373484], dtype=uint32), 1, 0, 0.0)
   
       @functools.wraps(orig_test)
       def test_new(*args, **kwargs):
           test_count = int(os.getenv('MXNET_TEST_COUNT', '1'))
           env_seed_str = os.getenv('MXNET_TEST_SEED')
           for i in range(test_count):
               if seed is not None:
                   this_test_seed = seed
                   log_level = logging.INFO
               elif env_seed_str is not None:
                   this_test_seed = int(env_seed_str)
                   log_level = logging.INFO
               else:
                   this_test_seed = np.random.randint(0, np.iinfo(np.int32).max)
                   log_level = logging.DEBUG
               post_test_state = np.random.get_state()
               np.random.seed(this_test_seed)
   >           mx.random.seed(this_test_seed)
   
   tests/python/unittest/common.py:206: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
   python/mxnet/random.py:96: in seed
       check_call(_LIB.MXRandomSeed(seed_state))
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
   
   ret = -1
   
       def check_call(ret):
           """Check the return value of C API call.
       
           This function will raise an exception when an error occurs.
           Wrap every API call with this function.
       
           Parameters
           ----------
           ret : int
               return value from API calls.
           """
           if ret != 0:
   >           raise get_last_ffi_error()
   E           mxnet.base.MXNetError: Traceback (most recent call last):
   E             [bt] (5) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(MXRandomSeed+0x1a) [0x7f89ce0a0c0a]
   E             [bt] (4) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(mxnet::resource::ResourceManagerImpl::SeedRandom(unsigned int)+0x30b) [0x7f89d11b081b]
   E             [bt] (3) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(mxnet::engine::NaiveEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool)+0x43b) [0x7f89ce1d08eb]
   E             [bt] (2) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete), mxnet::resource::ResourceManagerImpl::ResourceParallelRandom<mshadow::gpu>::SeedOne(unsigned long, unsigned int)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&)+0x1e) [0x7f89d11ac6ce]
   E             [bt] (1) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(mxnet::common::random::RandGenerator<mshadow::gpu, float>::Seed(mshadow::Stream<mshadow::gpu>*, unsigned int)+0x1e9) [0x7f89d1240f55]
   E             [bt] (0) /home/ubuntu/mxnet_master_develop/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7f) [0x7f89cdf9b24f]
   E             File "../src/common/random_generator.cu", line 58
   E           Name: Check failed: err == cudaSuccess (10 vs. 0) : rand_generator_seed_kernel ErrStr:invalid device ordinal
   
   python/mxnet/base.py:246: MXNetError
   ---------------------------- Captured stderr setup -----------------------------
   WARNING:root:Unable to import numpy/mxnet. Skipping seeding for numpy/mxnet.
   ------------------------------ Captured log setup ------------------------------
   WARNING  root:conftest.py:177 Unable to import numpy/mxnet. Skipping seeding for numpy/mxnet.
   ____________________ test_create_sparse_ndarray_gpu_to_cpu _____________________
   
   ```
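For readers unfamiliar with the seeding pattern visible in the traceback, here is a minimal sketch of it. This is not the code from `tests/python/unittest/common.py`: the name `with_seed_sketch` is hypothetical, the standard-library `random` module stands in for `np.random` and `mx.random`, and the `try`/`finally` handling (logging the seed on failure, restoring the saved state) is an assumption, since the traceback stops at the `mx.random.seed` call.

```python
import functools
import logging
import os
import random


def with_seed_sketch(seed=None):
    """Hypothetical, simplified stand-in for the decorator in the traceback."""
    def decorator(orig_test):
        @functools.wraps(orig_test)
        def test_new(*args, **kwargs):
            env_seed_str = os.getenv('MXNET_TEST_SEED')
            # Same precedence as in the traceback: explicit seed, then the
            # environment variable, then a freshly drawn random seed.
            if seed is not None:
                this_test_seed = seed
            elif env_seed_str is not None:
                this_test_seed = int(env_seed_str)
            else:
                this_test_seed = random.randint(0, 2**31 - 2)
            # Save the RNG state so it can be put back after the test.
            saved_state = random.getstate()
            # In the real decorator, mx.random.seed() is called at this point,
            # which is where the "invalid device ordinal" error is raised.
            random.seed(this_test_seed)
            try:
                return orig_test(*args, **kwargs)
            except Exception:
                # Assumed behaviour: report the seed needed to reproduce.
                logging.error('Test failed; rerun with MXNET_TEST_SEED=%d',
                              this_test_seed)
                raise
            finally:
                random.setstate(saved_state)
        return test_new
    return decorator


@with_seed_sketch(seed=781106105)
def draw_three():
    # With a fixed seed, every call sees the same random stream.
    return [random.random() for _ in range(3)]
```

Because the seeding call runs before the test body, a failure inside it (as in this report) fails every decorated test regardless of what the test itself does.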
   
   ## Environment
   
   We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
   ```
   curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
   
   # paste outputs here
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on issue #18198: with_seed() broken when running GPU unit test

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18198:
URL: https://github.com/apache/incubator-mxnet/issues/18198#issuecomment-622242111


   Yes, it's very likely





[GitHub] [incubator-mxnet] haojin2 commented on issue #18198: with_seed() broken when running GPU unit test

Posted by GitBox <gi...@apache.org>.
haojin2 commented on issue #18198:
URL: https://github.com/apache/incubator-mxnet/issues/18198#issuecomment-621595040


   Is this possibly caused by #18025? @szha @leezu 

