Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/12/05 21:52:23 UTC

[GitHub] [incubator-mxnet] anirudh2290 commented on issue #16431: [RFC] MXNet Multithreaded Inference Interface

URL: https://github.com/apache/incubator-mxnet/issues/16431#issuecomment-562335146
 
 
   Thanks for the thoughtful and valuable comments @arcadiaphy.
   
   > I've deployed many models with the Scala API and run them in multiple threads. The whole system has run smoothly in a production environment for more than 2 months.
   
   > The inference backend is the graph executor, which is created per thread with shared model parameters. The executors can be reshaped dynamically and independently in each thread according to the shape of the input data.
   
   Yes, if I am not mistaken, this is very similar to how the C Predict API supports multi-threaded inference today.
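   For readers less familiar with that pattern, here is a minimal Python sketch of it (one executor per thread, parameters shared); the checkpoint prefix, the `data` input name and the shapes are placeholders, and, as discussed in this thread, this style of deployment typically relies on the NaiveEngine since the threaded engine is not safe to drive from multiple frontend threads:

```python
import threading
import mxnet as mx

# Placeholder checkpoint prefix and input shape.
sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
input_shape = (1, 3, 224, 224)

def worker(batch):
    # bind() uses the loaded parameter NDArrays directly, so every thread's
    # executor shares the same weights; only the 'data' array is per thread.
    args = dict(arg_params)
    args['data'] = mx.nd.zeros(input_shape)
    exe = sym.bind(ctx=mx.cpu(), args=args, aux_states=aux_params, grad_req='null')
    exe.arg_dict['data'][:] = batch
    exe.forward(is_train=False)
    print(exe.outputs[0].asnumpy().shape)

threads = [threading.Thread(target=worker, args=(mx.nd.ones(input_shape),))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```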
   
   > Like what's mentioned above, the dependency engine is not thread safe, so if you run it in the threaded engine, deadlocks and core dumps will happen. Therefore, the naive engine is the only option left. Without dependency scheduling, any write dependency on model parameters is likely to be executed simultaneously and mess up the internal data. If MKLDNN is used to accelerate inference, you will get non-deterministic results per inference because MXNet stealthily reorders the data in the NDArray (a write dependency is involved) for MKLDNN operators. I've used a temporary method to address this issue which is not suitable for an official PR.
   
   This is a very useful point. In my proposal I was concentrating mostly on the ThreadedEngine and not the NaiveEngine, though recently I added tests for the NaiveEngine in my PR and everything seemed to be working fine. So far I have not been able to reproduce the correctness issue that you mention with MKLDNN (the hidden write) and the NaiveEngine, but that could be because the Reorder doesn't happen in the spawned thread. Here is my test: https://github.com/apache/incubator-mxnet/pull/16654/files#diff-1335fbaf3930b1438d9be18edb07a1a6R1384 . I am not sure whether something changed with MKLDNN 1.0 or my test doesn't catch that use case; I will dig more into this.
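   For reference, here is a rough sketch (separate from the actual test linked above, with the model and shapes as placeholders) of how one might probe for the non-deterministic MKLDNN output issue under the NaiveEngine: run the same network from several spawned threads and compare every output against a single-threaded reference.

```python
import os
os.environ['MXNET_ENGINE_TYPE'] = 'NaiveEngine'  # set before mxnet is imported

import threading
import numpy as np
import mxnet as mx

sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)  # placeholder checkpoint
data = mx.nd.random.uniform(shape=(1, 3, 224, 224))                 # placeholder input

def run_once():
    # Each call binds its own executor but shares the parameter NDArrays.
    args = dict(arg_params)
    args['data'] = data.copy()
    exe = sym.bind(ctx=mx.cpu(), args=args, aux_states=aux_params, grad_req='null')
    exe.forward(is_train=False)
    return exe.outputs[0].asnumpy()

reference = run_once()
results = []

def worker():
    results.append(run_once())  # list.append is atomic under the GIL

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# If the hidden MKLDNN reorder race were hit, some outputs would differ.
for out in results:
    assert np.allclose(reference, out, atol=1e-5)
```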
   
   
   > Multithreaded inference should be used with caution. Sharing model parameters can reduce the memory footprint of your program, but a lot of memory is consumed by global resources (temporary workspace, random number generator, ...) or the MKLDNN op cache, which are stored in static thread_local variables. So the number of threads is the most important factor for the memory footprint: any thread performing an MXNet operation, even a trivial imperative invocation of an operator, will incur memory overhead by creating its own set of thread_local variables. I've spent so much time tracking down memory leaks, and the best solution is to limit the thread count.
   
   > A new method to do multithreaded inference with the threaded engine is very welcome here. It will solve the above issues automatically and ensure correct results by enforcing dependency checking.
   
   Yes, the earlier approach, which has one graph executor per thread, can consume a lot of memory for global resources. Sharing the cached op will alleviate that pain. As you know, we still have a lot of customers using the graph executor as the backend, so it would be a great addition if you are interested in contributing towards making the graph executor thread safe for inference use cases as well.
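   To make the contrast with the one-executor-per-thread approach concrete, here is an illustrative sketch of the usage pattern this proposal aims to support: a single hybridized Gluon block (backed by one cached op internally) shared by several inference threads. This is exactly the kind of pattern that is not guaranteed to be safe with the engine today; the model and shapes below are placeholders.

```python
import threading
import mxnet as mx
from mxnet.gluon.model_zoo import vision

# One network instance, hybridized so that a single cached op backs it.
net = vision.resnet18_v1(pretrained=True, ctx=mx.cpu())
net.hybridize(static_alloc=True)
net(mx.nd.ones((1, 3, 224, 224)))  # warm-up call builds the cached op

def worker():
    # All threads share the same block, cached op and parameter NDArrays;
    # the proposal is what would make this safe under dependency checking.
    out = net(mx.nd.random.uniform(shape=(1, 3, 224, 224)))
    print(out.shape)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```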
