Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/10/02 21:10:35 UTC

[GitHub] [incubator-mxnet] zachgk opened a new issue #16359: Flaky Scala Nightly Release Profiler

zachgk opened a new issue #16359: Flaky Scala Nightly Release Profiler
URL: https://github.com/apache/incubator-mxnet/issues/16359
 
 
   There is a flaky test on the Scala nightly Jenkins pipeline that occasionally causes it to fail. Sample failure:
   ```
   - Example CI: Test GAN MNIST
   [ScalaTest-main-running-DiscoverySuite] INFO org.apache.mxnetexamples.profiler.ProfilerSuite - Running profiler test...
   [ScalaTest-main-running-DiscoverySuite] INFO org.apache.mxnetexamples.profiler.ProfilerSuite - profile file save to /tmp
   terminate called after throwing an instance of 'dmlc::Error'
     what():  [20:31:17] src/c_api/c_api_profile.cc:141: Check failed: !thread_profiling_data.calls_.empty(): 
   Stack trace:
     [bt] (0) /tmp/mxnet6726847146594737253/libmxnet.so(+0x49240b) [0x7feab739740b]
     [bt] (1) /tmp/mxnet6726847146594737253/libmxnet.so(mxnet::on_exit_api()+0x38a) [0x7feab947e7ea]
     [bt] (2) /tmp/mxnet6726847146594737253/libmxnet.so(MXExecutorFree+0x27) [0x7feab9451af7]
     [bt] (3) [0x7feb71018407]

   Aborted (core dumped)
   ```
   
   See the pipeline at http://jenkins.mxnet-ci.amazon-ml.com/job/restricted-publish-artifacts/job/master and a sample failure at http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-publish-artifacts/detail/master/121/pipeline/. The failing test suite has also been run at other times without this error. A CPU build of the same commit did not show errors, and the same actual build (binary, tests, jar) was also run on both centos7 and ubuntu18.04 without issues. So the failure seems to be rare and to come from something nondeterministic at execution time rather than from the build or the platform.
   
   After some initial investigation by @samskalicky and me, the test that was running appears to be the Scala ProfilerSuite (https://github.com/apache/incubator-mxnet/blob/master/scala-package/examples/src/test/scala/org/apache/mxnetexamples/profiler/ProfilerSuite.scala#L33). It sets the profiler to running with a temp file as its target, runs through a number of tests, and then stops the profiler. None of the tests seem to have run before this error occurred, so the failure probably happens when the profiler is started. Starting the profiler runs the Scala method https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/org/apache/mxnet/Profiler.scala#L46, which calls the JNI method https://github.com/apache/incubator-mxnet/blob/master/scala-package/native/src/main/native/org_apache_mxnet_native_c_api.cc#L2699, which in turn calls `MXSetProfilerState(1)` in the engine.
   
   @samskalicky:
   It looks like it fails at this line:
   src/c_api/c_api_profile.cc:141: Check failed: !thread_profiling_data.calls_.empty()
   
   Here's the relevant code:
   https://github.com/apache/incubator-mxnet/blob/f5ba7358d7ff0629f48445cf9dc1ce7fe2fd8e84/src/c_api/c_api_profile.cc#L130-L141 
   
   So it looks like we push data on line 130 and check whether there is any data on line 141.
   
   `on_enter_api` is called at the beginning of an API call.
   `on_exit_api` is called when that same API call exits.
   
   Is it possible that Scala is enabling profiling while some API calls are already in flight, so that profiling is disabled when an API is entered but enabled by the time it exits?
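
The hypothesis above can be modeled with a small standalone C++ sketch. This is not the MXNet implementation: the `profiling_enabled` flag, the `ThreadProfilingData` struct, the two `simulate_*` helpers, and the idea that both hooks are gated on the same flag are all assumptions made for illustration. Only the names `on_enter_api`, `on_exit_api`, `thread_profiling_data.calls_`, and `MXExecutorFree` are taken from the log and comment above. Where the real code aborts on a failed check, this sketch returns `false`.

```cpp
#include <atomic>
#include <string>
#include <vector>

// Hypothetical stand-in for the profiler on/off state (not MXNet's real flag).
static std::atomic<bool> profiling_enabled{false};

// Hypothetical per-thread record of API calls currently in flight.
struct ThreadProfilingData {
  std::vector<std::string> calls_;
};
static thread_local ThreadProfilingData thread_profiling_data;

// Assumed behavior: a record is pushed only if profiling is on at entry.
void on_enter_api(const char* name) {
  if (profiling_enabled.load()) {
    thread_profiling_data.calls_.push_back(name);
  }
}

// Assumed behavior: if profiling is on at exit, a matching record must exist.
// Returns false where the real code would hit the failed check and abort.
bool on_exit_api() {
  if (!profiling_enabled.load()) return true;
  if (thread_profiling_data.calls_.empty()) return false;
  thread_profiling_data.calls_.pop_back();
  return true;
}

// Profiling is switched on between entry and exit, as if another thread
// had called MXSetProfilerState(1) while this API call was in flight.
bool simulate_toggle_mid_call() {
  profiling_enabled = false;
  thread_profiling_data.calls_.clear();
  on_enter_api("MXExecutorFree");  // nothing pushed: profiling is off
  profiling_enabled = true;        // state changes mid-call
  return on_exit_api();            // finds an empty stack
}

// Profiling is switched on before the API call starts.
bool simulate_enabled_before_call() {
  profiling_enabled = true;
  thread_profiling_data.calls_.clear();
  on_enter_api("MXExecutorFree");  // record pushed
  return on_exit_api();            // record found and popped
}
```

Under these assumptions, `simulate_toggle_mid_call()` returns `false` and `simulate_enabled_before_call()` returns `true`, which would match a failure that only appears when the profiler is started while another API call, such as the `MXExecutorFree` seen in the stack trace, is still running.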
