Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/12/03 18:11:15 UTC

[GitHub] samskalicky commented on issue #13438: libc getenv is not threadsafe

URL: https://github.com/apache/incubator-mxnet/issues/13438#issuecomment-443810081
 
 
   In an attempt to reproduce the issue (assuming it comes from multiple processes initializing at the same time), I created a C++ engine unit test by modifying tests/cpp/engine/threaded_engine_test.cc and adding the following two functions:
   ```
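   #include <execinfo.h>  // backtrace, backtrace_symbols_fd
   #include <signal.h>    // signal, SIGSEGV
   #include <unistd.h>    // STDERR_FILENO, _exit
   #include <cstdio>      // fprintf
   
   void handler(int sig) {
     void *array[10];
   
     // get void*'s for all entries on the stack
     int size = backtrace(array, 10);
   
     // print out all the frames to stderr
     fprintf(stderr, "Error: signal %d:\n", sig);
     backtrace_symbols_fd(array, size, STDERR_FILENO);
     // _exit skips atexit handlers, which are not safe to run from a signal handler
     _exit(1);
   }
   
   TEST(Engine, envsegfault) {
     signal(SIGSEGV, handler);
     mxnet::Engine* engine = mxnet::engine::CreateThreadedEnginePerDevice();
   
     // stop and start the engine repeatedly to exercise its initialization path
     for (int i = 0; i < 10000; ++i) {
       engine->Stop();
       engine->Start();
     }
   }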
   ```
   
   The handler attempts to capture the segfault and print the stack trace, and the envsegfault test creates a threaded engine object and repeatedly stops and starts it to trigger the SetEnv call in initialize.cc and GetEnv call in threaded_engine_perdevice.cc.
   
   Then I wrote this Python script to run the test in multiple independent processes at once:
   
   ```
   import multiprocessing
   import os
   import sys
   
   def mxnet_worker():
       # Run the gtest case 100 times; stop at the first non-zero status.
       for i in range(100):
           ret = os.system('./mxnet_unit_tests --gtest_filter=Engine.envsegfault')
           if ret != 0:
               print('Failed@@@@@@@@@@@@@@@@')
               sys.exit(2)
   
   
   if __name__ == '__main__':
       # Ten independent worker processes, each launching the test binary.
       workers = [multiprocessing.Process(target=mxnet_worker) for i in range(10)]
       for p in workers:
           p.daemon = True
           p.start()
   
       for p in workers:
           p.join()
   ```
   
   I was able to trigger a segfault, but I hadn't added the handler yet, so no stack trace was printed. I need to run it again and try to reproduce the crash with the handler in place.
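
For the next run, it may help to capture each child's stderr and exit status so the backtrace is kept. The following is only a sketch, not the script that was actually run. `run_once` and `describe` are hypothetical helper names, and the crashing child is a stand-in for `./mxnet_unit_tests`. It assumes a POSIX system, where `subprocess` reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def run_once(cmd):
    """Run cmd once; return (returncode, captured stderr text)."""
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.PIPE, universal_newlines=True)
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # On POSIX a negative code means the child was killed by that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    # Hypothetical stand-in for the test binary: a child that sends itself SIGSEGV.
    crash = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    code, err = run_once(crash)
    print(describe(code))
    print(err, end="")
```

With the SIGSEGV handler installed in the test, the child would be expected to exit with status 1 instead of dying from the signal, and the backtrace written to stderr would end up in the second return value of `run_once`.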
