Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/03/07 08:17:30 UTC

[GitHub] rahul003 commented on issue #9152: tutorial for distributed training

URL: https://github.com/apache/incubator-mxnet/pull/9152#issuecomment-371059065
 
 
   @TaoLv Sorry, I missed your comment. You can profile a worker process the same way as in the single-machine case:
   ```
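   import mxnet as mx
   # kv: the distributed KVStore used for training, e.g. kv = mx.kvstore.create('dist_sync')
   mx.profiler.profiler_set_config(mode='all', filename=str(kv.rank) + 'profile_output.json')
   mx.profiler.profiler_set_state('run')
   # Code to be profiled goes here...
   mx.profiler.profiler_set_state('stop')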
   ```
   Note the use of the rank above: it ensures that each worker saves its profile to a different path.
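
If the workers should write into a shared directory, the per-rank naming could be factored into a small helper. This is only a sketch: `profile_filename` and its `directory` and `suffix` parameters are hypothetical names, not part of MXNet, and the default simply mirrors the `str(kv.rank) + 'profile_output.json'` pattern from the snippet.

```python
import os


def profile_filename(rank, directory=".", suffix="profile_output.json"):
    """Build a per-worker profile path (hypothetical helper, not an MXNet API).

    The rank is prepended to the suffix, so workers with different ranks
    never write to the same file.
    """
    return os.path.join(directory, str(rank) + suffix)
```

The result would then be passed as `filename=profile_filename(kv.rank)` to `profiler_set_config`.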
   
   In the resulting profile, look for the KVStore Push/Pull operators to see the time taken for communication.
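
To total the communication time without opening a trace viewer, something like the following might help. It is a rough sketch under a few assumptions: that the profile file is JSON in the Chrome trace event style (a `traceEvents` list whose entries carry `name`, `ph`, `ts` and, for complete events, `dur`, with times in microseconds), and that the push/pull events have "KVStore" in their name. The function names and the event names in the usage example are made up for illustration; check them against an actual profile file.

```python
import json
from collections import defaultdict


def load_trace(path):
    """Load a profile file, assumed to be valid JSON."""
    with open(path) as f:
        return json.load(f)


def kvstore_time_us(trace, keyword="KVStore"):
    """Sum durations (in microseconds) of events whose name contains keyword.

    Handles complete events (ph == "X", using "dur") and begin/end pairs
    (ph == "B" / "E", matched per name, pid and tid).
    """
    # Accept either {"traceEvents": [...]} or a bare list of events.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    open_begins = defaultdict(list)  # (name, pid, tid) -> stack of begin timestamps
    for ev in events:
        name = ev.get("name", "")
        if keyword not in name:
            continue
        key = (name, ev.get("pid"), ev.get("tid"))
        ph = ev.get("ph")
        if ph == "X":
            totals[name] += ev.get("dur", 0)
        elif ph == "B":
            open_begins[key].append(ev["ts"])
        elif ph == "E" and open_begins[key]:
            totals[name] += ev["ts"] - open_begins[key].pop()
    return dict(totals)


if __name__ == "__main__":
    # Synthetic events with hypothetical names, only to show the call.
    sample = {"traceEvents": [
        {"name": "KVStorePush", "ph": "B", "ts": 100, "pid": 0, "tid": 1},
        {"name": "KVStorePush", "ph": "E", "ts": 350, "pid": 0, "tid": 1},
        {"name": "Convolution", "ph": "B", "ts": 400, "pid": 0, "tid": 1},
        {"name": "Convolution", "ph": "E", "ts": 900, "pid": 0, "tid": 1},
    ]}
    print(kvstore_time_us(sample))
```

For a real run, `kvstore_time_us(load_trace(path))` would be called once per worker file and the totals compared across ranks.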
   
   I'll add a proper profiling section to the tutorial once https://github.com/apache/incubator-mxnet/pull/9933 is merged; that PR makes it easy to profile the server processes too.
