Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/12/09 23:51:54 UTC

[GitHub] YutingZhang opened a new issue #13593: Low CPU usage of MXNet

URL: https://github.com/apache/incubator-mxnet/issues/13593
 
 
   MXNet has low CPU usage when running CPU operations in multi-process scenarios. Specifically, when MXNet computation runs in a subprocess, MXNet uses only 1 or 2 CPUs to do its job. The behavior differs across MXNet variants (see below) and across machines ...
   
   This was tested on the 20181207 version; other versions (e.g., 1.3.1) show similar problems.
   
   Code to reproduce the issue
   
   Filename: `mxnet_cpu_test.py`
   ```python
   import argparse
   import sys
   from concurrent import futures
   import time
   import numpy as np
   mx=None
   
   
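   def run(need_import):
       # `global` must come before the import: declaring it after the name
       # is bound is a SyntaxError on Python 3.6+.
       global mx
       if need_import:
           import mxnet as mx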
       A = mx.nd.random.uniform(low=0, high=1, shape=(5000, 5000))
       while True:
           A = mx.nd.dot(A, A)
   
   def parse_args():
       parser = argparse.ArgumentParser("benchmark mxnet cpu")
       parser.add_argument('--num-workers', '-j', dest='num_workers', type=int, default=0)
       parser.add_argument('--late-import', action='store_true')
       return parser.parse_args()
   
   def main(args):
   
       if args.num_workers == 0:
           print("Main process")
           try:
               run(need_import=args.late_import)
           except KeyboardInterrupt:
               pass
       else:
           print("Subprocesses")
           ex = futures.ProcessPoolExecutor(args.num_workers)
   
           for _ in range(args.num_workers):
               ex.submit(run, need_import=args.late_import)
           while True:
               try:
                   time.sleep(10000)
               except KeyboardInterrupt:
                   ex.shutdown(wait=False)
                   break
       print("Stopped")
   
   
   if __name__ == "__main__":
       args = parse_args()
       if not args.late_import:
           import mxnet as mx
       main(args)
   
   ```
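
The script above loops forever, so CPU usage has to be read off a process monitor. As a rough numerical alternative, the ratio of process CPU time to wall-clock time approximates how many cores a process keeps busy. The sketch below is not part of the original report: `cores_used` and `busy` are hypothetical helper names, and `busy` is only a single-threaded stand-in workload.

```python
import time


def cores_used(work):
    """Run work() and return CPU time divided by wall time for this process."""
    wall_start = time.perf_counter()
    # process_time() sums user and system CPU time over all threads
    # of the current process (child processes are not included).
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


def busy(seconds=0.5):
    """Hypothetical stand-in workload: spin on one thread for `seconds`."""
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass


if __name__ == "__main__":
    print("approximate cores used: %.2f" % cores_used(busy))
```

To apply this to MXNet, the workload would presumably need to run a bounded number of `mx.nd.dot` calls and end with a blocking call such as `wait_to_read()`, since MXNet operators execute asynchronously. The measurement has to run inside each worker, because `time.process_time()` does not count child processes.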
   
   Detailed experiments:
   
   - Run in the main process:
   `python3 mxnet_cpu_test.py --num-workers=0`
   ![image](https://user-images.githubusercontent.com/7865903/49704337-a3807000-fbc6-11e8-9118-0c7034e52cf9.png)
   Working fine for all mxnet variants (GPU or CPU-only).
   
   - Run in two subprocesses:
   -- `mxnet-cu90` on p3.16x:
   `python3 mxnet_cpu_test.py --num-workers=2`
   ![image](https://user-images.githubusercontent.com/7865903/49704395-420cd100-fbc7-11e8-9607-a0c907b2057a.png)
   It uses only 2 CPUs per subprocess.
   -- `mxnet-mkl` on p3.16x:
   `python3 mxnet_cpu_test.py --num-workers=2`
   ![image](https://user-images.githubusercontent.com/7865903/49704444-14745780-fbc8-11e8-8754-81b90af4f876.png)
   Same here. It uses only 2 CPUs per subprocess.
   -- `mxnet-mkl` on **CPU-only machine c5.18x**:
   `python3 mxnet_cpu_test.py --num-workers=2`
   ![image](https://user-images.githubusercontent.com/7865903/49704457-3d94e800-fbc8-11e8-8831-9136465fad1f.png)
   **Even worse.** It uses only 1.5 CPUs per subprocess.
   -- However, for vanilla CPU-version `mxnet` on c5.18x:
   `python3 mxnet_cpu_test.py --num-workers=2`
   ![image](https://user-images.githubusercontent.com/7865903/49704510-e2afc080-fbc8-11e8-8548-2505a7070205.png)
   It is working better. At least, it uses 5 CPUs per subprocess.
   -- Weirdly, still vanilla CPU-version `mxnet` but on **GPU machine p3.16x**:
   `python3 mxnet_cpu_test.py --num-workers=2`
   ![image](https://user-images.githubusercontent.com/7865903/49704532-1a1e6d00-fbc9-11e8-8009-95519fd9f1ef.png)
   It is working worse, i.e., 2 CPUs per subprocess.
   - This problem seems related to how MXNet manages threads in each subprocess. If I do not `import mxnet` in the main process and instead `import mxnet` in each subprocess:
   `python3 mxnet_cpu_test.py --num-workers=2 --late-import`
   ![image](https://user-images.githubusercontent.com/7865903/49704599-d11ae880-fbc9-11e8-8460-e7d1e53abb13.png)
   Then everything is working fine. 
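
One way to narrow this down might be to compare what the main process and each worker can see: the CPU affinity mask and any thread-count environment variables. The sketch below uses only the standard library. `cpu_visibility` is a hypothetical helper name, and the choice of variables to inspect is an assumption, since the report above does not mention any of them.

```python
import os
from concurrent import futures

# Assumption: these thread-count variables may be relevant here;
# the original report does not mention them.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "MXNET_CPU_WORKER_NTHREADS")


def cpu_visibility():
    """Report how many CPUs this process may run on and its thread settings."""
    if hasattr(os, "sched_getaffinity"):
        # Linux only: the set of CPUs this process is allowed to run on.
        usable = len(os.sched_getaffinity(0))
    else:
        usable = os.cpu_count() or 1
    env = {name: os.environ.get(name) for name in THREAD_ENV_VARS}
    return {"pid": os.getpid(), "usable_cpus": usable, "env": env}


if __name__ == "__main__":
    print("main:  ", cpu_visibility())
    with futures.ProcessPoolExecutor(2) as ex:
        jobs = [ex.submit(cpu_visibility) for _ in range(2)]
        for job in jobs:
            print("worker:", job.result())
```

Calling `cpu_visibility()` from inside `run` in `mxnet_cpu_test.py`, with and without `--late-import`, would show whether the two cases differ in affinity or environment. If they do not, the difference presumably lies in thread state inside the library itself.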
   
