Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/01/02 18:51:49 UTC

[GitHub] YutingZhang commented on issue #13593: Low CPU usage of MXNet in subprocesses

URL: https://github.com/apache/incubator-mxnet/issues/13593#issuecomment-450949556
 
 
   @pengzhao-intel @TaoLv @anirudh2290 @zhreshold Thank you all for your help, and happy new year! This problem seems more complicated than it first appeared (there may have been multiple problems from the beginning). @zhreshold's fix solved the problem in most cases. 
   However, I found that calling `asnumpy` in each worker makes the processes interfere with one another. This does not seem to be a problem for the GPU version of MXNet running on a GPU machine. It seems to happen only on a **CPU-only machine (I tested on c5.18xlarge with `mxnet-mkl`)**.
   
   Code (one-line difference):
   ```
   import argparse
   import sys
   from concurrent import futures
   import time
   import numpy as np
   mx=None
   
   
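   def run(need_import):
       # `global` must come before the import below: on Python 3.6+,
       # assigning `mx` before declaring it global is a SyntaxError.
       global mx
       if need_import:
           import mxnet as mx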
       A = mx.nd.random.uniform(low=0, high=1, shape=(5000, 5000))
       while True:
           A = mx.nd.dot(A, A)
           A.asnumpy()    # ******** only difference ***********
   
   def parse_args():
       parser = argparse.ArgumentParser("benchmark mxnet cpu")
       parser.add_argument('--num-workers', '-j', dest='num_workers', type=int, default=0)
       parser.add_argument('--late-import', action='store_true')
       return parser.parse_args()
   
   def main(args):
   
       if args.num_workers == 0:
           print("Main process")
           try:
               run(need_import=args.late_import)
           except KeyboardInterrupt:
               pass
       else:
           print("Subprocesses")
           ex = futures.ProcessPoolExecutor(args.num_workers)
   
           for _ in range(args.num_workers):
               ex.submit(run, need_import=args.late_import)
           while True:
               try:
                   time.sleep(10000)
               except KeyboardInterrupt:
                   ex.shutdown(wait=False)
                   break
       print("Stopped")
   
   
   if __name__ == "__main__":
       args = parse_args()
       if not args.late_import:
           import mxnet as mx
       main(args)
   ```
   
   Launching 10 workers (`python3 mxnet_cpu_test.py --num-workers=10`) gives the CPU usage below. Setting `MXNET_MP_WORKER_NTHREADS` does not affect the results.
   ![image](https://user-images.githubusercontent.com/7865903/50606321-3042e480-0e7a-11e9-892a-2066a6030caf.png)
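
Since `MXNET_MP_WORKER_NTHREADS` made no difference, one further experiment that might help rule out thread oversubscription is to cap the OpenMP threads of each worker. The following is only a sketch under assumptions: `threads_per_worker`, `init_worker`, and `make_executor` are hypothetical helpers, not part of the script above, and it has not been verified that the cap changes the behavior reported here. It relies on the standard OpenMP variable `OMP_NUM_THREADS` being set before MXNet is imported in the worker, so it would only be meaningful together with `--late-import`.

```python
import os
from concurrent import futures


def threads_per_worker(num_workers, total_cpus=None):
    """Split the visible CPUs evenly across workers, at least 1 thread each."""
    if total_cpus is None:
        total_cpus = os.cpu_count() or 1
    return max(1, total_cpus // max(1, num_workers))


def init_worker(num_threads):
    # Runs once in each worker process before any submitted task.
    # Assumption: the OpenMP runtime reads this variable when it is first
    # loaded, so it must be set before `import mxnet` happens in the worker.
    os.environ["OMP_NUM_THREADS"] = str(num_threads)


def make_executor(num_workers):
    # The `initializer` and `initargs` parameters require Python 3.7+.
    n = threads_per_worker(num_workers)
    return futures.ProcessPoolExecutor(
        num_workers, initializer=init_worker, initargs=(n,)
    )
```

In the script above, `make_executor(args.num_workers)` would stand in for the direct `futures.ProcessPoolExecutor(args.num_workers)` call.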
   
   But running it only in the main process is fine:
   ![image](https://user-images.githubusercontent.com/7865903/50606810-e1964a00-0e7b-11e9-94cb-b0f61dbbea36.png)
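
The screenshots show CPU utilization only. A per-worker throughput number would make the two cases easier to compare across machines. A minimal sketch, assuming a hypothetical helper `iterations_per_second` (not part of the script above) that times any zero-argument callable using only the standard library:

```python
import time


def iterations_per_second(step, duration=5.0, clock=time.perf_counter):
    """Call `step()` repeatedly for about `duration` seconds; return the rate."""
    start = clock()
    count = 0
    while True:
        step()
        count += 1
        elapsed = clock() - start
        # Stop once the time budget is used up; the `elapsed > 0` check
        # avoids dividing by zero when `duration` is 0.
        if elapsed >= duration and elapsed > 0:
            return count / elapsed
```

In `run`, the body of the `while True` loop (the `mx.nd.dot` call followed by `asnumpy`) could be wrapped in a small function and passed as `step`, printing the returned rate together with `os.getpid()` so that each worker reports its own throughput.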
   
   
   By the way, I found another issue with `mxnet` (the CPU non-MKL version): when you run MXNet in a subprocess, it interferes with many non-MXNet functions (e.g., `cv2.cvtColor`), and the subprocess gets stuck in those calls. This did not happen with `mxnet==1.3.1`; it started in one of the nightly builds. We should probably create a new ticket for this.
   
    
   
