Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/07/18 03:24:08 UTC

[GitHub] [incubator-mxnet] arcadiaphy commented on issue #15574: fix naive engine for multi-threaded inference

arcadiaphy commented on issue #15574: fix naive engine for multi-threaded inference
URL: https://github.com/apache/incubator-mxnet/pull/15574#issuecomment-512649728
 
 
   @apeforest @ZhennanQin @marcoabreu @anirudh2290 
   Multi-threaded inference is very common in deployment, so we need a solution for this scenario and a workable example for users. Let me sum up what I know so far.
   
   Since the threaded engine uses multi-threaded scheduling for operator computation, a natural idea is to use it for parallel inference, as covered in this [proposal](https://cwiki.apache.org/confluence/display/MXNET/Parallel+Inference+in+MXNet). There are two main problems with this kind of usage in mxnet:
   
   - The threaded engine is not thread-safe, so we need to restrict computation pushing and result pulling to a single thread; otherwise we'll hit all kinds of errors in the engine.
   - For computation pushing, one thread is OK, but when we pull results using functions like `asnumpy`, the call blocks that thread and increases latency (see the sketch after this list).
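   
   Below is a minimal sketch of that single-dispatcher pattern, assuming a tiny stand-in model (the symbol, shapes, and helper names are illustrative, not from the PR): worker threads only enqueue requests, and one dispatcher thread does every engine interaction, both the push and the blocking `asnumpy` pull.
   
   ```python
   import queue
   import threading
   import mxnet as mx
   
   # Tiny stand-in model; a real deployment would load a trained symbol.
   data = mx.sym.Variable('data')
   net = mx.sym.FullyConnected(data, num_hidden=10, name='fc')
   exe = net.simple_bind(ctx=mx.cpu(), data=(1, 100))
   for arr in exe.arg_dict.values():
       arr[:] = 1.0  # dummy parameter values
   
   request_q = queue.Queue()
   
   def dispatcher():
       # The only thread allowed to talk to the engine.
       while True:
           item = request_q.get()
           if item is None:  # shutdown sentinel
               break
           batch, reply_q = item
           exe.arg_dict['data'][:] = batch        # push computation
           exe.forward(is_train=False)            # push computation
           reply_q.put(exe.outputs[0].asnumpy())  # pull: blocks this thread
   
   def worker(i):
       reply_q = queue.Queue()
       request_q.put((mx.nd.ones((1, 100)), reply_q))
       print('worker %d got output with shape %s' % (i, reply_q.get().shape))
   
   disp = threading.Thread(target=dispatcher)
   disp.start()
   workers = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
   for t in workers:
       t.start()
   for t in workers:
       t.join()
   request_q.put(None)
   disp.join()
   ```
   
   The blocking `asnumpy` in the dispatcher is exactly the latency problem above: every request waits behind the previous pull.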
   
   Due to these problems and the lack of a workable example, I focus on the naive engine instead. The only example in mxnet now is the cpp predict one using the `MXPredCreateMultiThread` API, which creates an executor for each thread while sharing the model parameters, a very seductive option with regard to both speed and memory footprint. In a production environment, the naive engine looks like a more usable approach, since it restricts each inference to a single thread. I also think this approach is the basis of the more advanced methods: if you can't make it work with the naive engine, then very likely you can't make it work with the threaded engine either. There are two problems with this method too (a sketch of the per-thread-executor pattern follows the list):
   
   - The naive engine is not thread-safe, which is what I want to fix in this PR. Without the fix, the cpp example is broken. Maybe we'll find a better fix once the bigger parallel-inference solution is planned, but using `thread_local` is the easiest way for now.
   - There may also be some issues in the MKLDNN calls; I've found a very strange phenomenon, described in #15576. I hope this can be fixed too, since MKLDNN is so good.
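   
   To make the per-thread-executor idea concrete, here is a rough Python analogue of what the cpp predict example does (this does not use `MXPredCreateMultiThread` itself, and the model, shapes, and names are illustrative): select the naive engine, create the parameter NDArrays once, and give each thread its own executor bound over them, so every inference stays on one thread.
   
   ```python
   import os
   os.environ['MXNET_ENGINE_TYPE'] = 'NaiveEngine'  # must be set before importing mxnet
   
   import threading
   import mxnet as mx
   
   data = mx.sym.Variable('data')
   net = mx.sym.FullyConnected(data, num_hidden=10, name='fc')
   
   # Parameters are created once and shared across threads; only the
   # input is per-thread, so the weights are not duplicated in memory.
   shared_params = {
       'fc_weight': mx.nd.ones((10, 100)),
       'fc_bias': mx.nd.zeros((10,)),
   }
   
   def infer(tid):
       # Each thread binds its own executor over the shared parameter
       # NDArrays and keeps the whole inference on this thread.
       args = dict(shared_params)
       args['data'] = mx.nd.ones((1, 100))
       exe = net.bind(ctx=mx.cpu(), args=args)
       exe.forward(is_train=False)
       # With the naive engine every op runs synchronously here, so
       # push and pull never cross thread boundaries.
       print('thread %d output[0, 0] = %f' % (tid, exe.outputs[0].asnumpy()[0, 0]))
   
   threads = [threading.Thread(target=infer, args=(i,)) for i in range(4)]
   for t in threads:
       t.start()
   for t in threads:
       t.join()
   ```
   
   Without the `thread_local` fix in this PR, the naive engine's shared state makes even this restricted pattern unsafe.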

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services