Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/01/02 18:51:49 UTC
[GitHub] YutingZhang commented on issue #13593: Low CPU usage of MXNet in subprocesses
URL: https://github.com/apache/incubator-mxnet/issues/13593#issuecomment-450949556
@pengzhao-intel @TaoLv @anirudh2290 @zhreshold Thank you all for your help, and happy new year! This problem seems more complicated than it first appeared (there may have been several problems to begin with). @zhreshold's fix solved the problem in most cases.
However, I found that if we call `asnumpy` in each worker, the processes interfere with one another. This does not seem to be a problem for the GPU version of MXNet running on a GPU machine. It seems to happen only on a **CPU-only machine (I tested on c5.18xlarge with `mxnet-mkl`)**.
Code (one-line difference):
```
import argparse
import sys
from concurrent import futures
import time
import numpy as np

mx = None


def run(need_import):
    # `global` must come before any binding of `mx` in this function;
    # declaring it after the import is a SyntaxError on Python 3.6+.
    global mx
    if need_import:
        import mxnet as mx
    A = mx.nd.random.uniform(low=0, high=1, shape=(5000, 5000))
    while True:
        A = mx.nd.dot(A, A)
        A.asnumpy()  # ******** only difference ***********


def parse_args():
    parser = argparse.ArgumentParser("benchmark mxnet cpu")
    parser.add_argument('--num-workers', '-j', dest='num_workers', type=int, default=0)
    parser.add_argument('--late-import', action='store_true')
    return parser.parse_args()


def main(args):
    if args.num_workers == 0:
        # Run the benchmark loop directly in the main process.
        print("Main process")
        try:
            run(need_import=args.late_import)
        except KeyboardInterrupt:
            pass
    else:
        # Run one benchmark loop per worker process.
        print("Subprocesses")
        ex = futures.ProcessPoolExecutor(args.num_workers)
        for _ in range(args.num_workers):
            ex.submit(run, need_import=args.late_import)
        while True:
            try:
                time.sleep(10000)
            except KeyboardInterrupt:
                ex.shutdown(wait=False)
                break
    print("Stopped")


if __name__ == "__main__":
    args = parse_args()
    if not args.late_import:
        # Early import: MXNet is loaded in the parent before workers start.
        import mxnet as mx
    main(args)
```
Launch 10 workers (`python3 mxnet_cpu_test.py --num-workers=10`). `MXNET_MP_WORKER_NTHREADS` does not affect the results.
![image](https://user-images.githubusercontent.com/7865903/50606321-3042e480-0e7a-11e9-892a-2066a6030caf.png)
But running it only in the main process is fine:
![image](https://user-images.githubusercontent.com/7865903/50606810-e1964a00-0e7b-11e9-94cb-b0f61dbbea36.png)
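To compare the two cases with a number rather than a CPU monitor screenshot, the loop body in `run` could be timed. The following is only a minimal sketch and not part of the script above: `iterations_per_second`, `step` and `busy_step` are hypothetical names, and `busy_step` is a pure-Python stand-in for the `mx.nd.dot` + `asnumpy` body so that the snippet runs without MXNet.

```python
import time


def iterations_per_second(step, duration=1.0):
    """Call step() repeatedly for about `duration` seconds; return calls per second."""
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        step()
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed


def busy_step():
    # Placeholder workload (assumption). In the benchmark this would be the
    # loop body of run(): A = mx.nd.dot(A, A); A.asnumpy()
    sum(i * i for i in range(10000))


if __name__ == "__main__":
    print("%.1f iterations/s" % iterations_per_second(busy_step, duration=1.0))
```

Printing this value from each worker and from the main-process run would give per-process throughput that can be compared directly between the two setups.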
By the way, I found another issue with `mxnet` (the CPU non-MKL version): when you run MXNet in a subprocess, it interferes with many non-MXNet functions (e.g., `cv2.cvtColor`), and the subprocess gets stuck in those functions. This did not happen with `mxnet==1.3.1`; it started happening in one of the nightly builds. We should probably create a new ticket for this.
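If the hang is related to forking a process in which MXNet has already been loaded (an assumption, not something confirmed here), one thing that could be tried is starting the workers with the `spawn` start method, so that each worker begins with a fresh interpreter. A minimal sketch using only the standard library: `make_executor` and `square` are hypothetical names, and the `mp_context` argument requires Python 3.7 or later.

```python
import multiprocessing
from concurrent import futures


def make_executor(num_workers, start_method="spawn"):
    """Create a process pool whose workers are started with `start_method`."""
    ctx = multiprocessing.get_context(start_method)
    return futures.ProcessPoolExecutor(max_workers=num_workers, mp_context=ctx)


def square(x):
    # Stand-in task (assumption); the real worker would import mxnet
    # and call the function under test, e.g. cv2.cvtColor.
    return x * x


if __name__ == "__main__":
    with make_executor(2) as ex:
        print(list(ex.map(square, range(4))))
```

With `spawn`, worker functions must be defined at module top level and the entry point must sit under `if __name__ == "__main__":`, since each worker re-imports the main module. Whether this avoids the hang described above is untested.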