Posted to dev@mxnet.apache.org by kellen sunderland <ke...@gmail.com> on 2018/05/10 14:42:36 UTC

Parallel Inference Proposal

Hello MXNet developers,



I’ve recently been speaking with users who’d like to run parallel inference
requests with MXNet on their service.  They’ll do this on GPUs, and due to
resource constraints, they’d like to do this without duplicating their
model’s weights in memory.  They’d also like to run inference with a low
degree of buffering/batching, as latency is important.  I’ve created a wiki
page with a small proposal that I hope will make running parallel inference
a little easier.  I’d like to discuss the proposal in this thread and would
particularly appreciate it if core devs could correct me if I’ve made any
incorrect assumptions in the doc.


Proposal here:
https://cwiki.apache.org/confluence/display/MXNET/Parallel+Inference+in+MXNet
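

For anyone who wants a concrete picture of the weight-sharing part, here is
a rough sketch (an illustration against the existing Symbol/Executor Python
API, not code from the proposal doc) of binding several executors against
the same parameter NDArrays, so each executor owns its input buffer while
the weights sit in device memory only once:

    import mxnet as mx

    ctx = mx.cpu()  # mx.gpu(0) in the GPU setting described above
    batch_shape = (1, 128)

    # A toy network standing in for a loaded model symbol.
    data = mx.sym.Variable('data')
    net = mx.sym.FullyConnected(data=data, num_hidden=64, name='fc1')
    net = mx.sym.Activation(data=net, act_type='relu')
    net = mx.sym.FullyConnected(data=net, num_hidden=10, name='fc2')

    # One set of weights, created once and shared by every executor.
    shared_params = {
        'fc1_weight': mx.nd.random.normal(shape=(64, 128), ctx=ctx),
        'fc1_bias':   mx.nd.zeros((64,), ctx=ctx),
        'fc2_weight': mx.nd.random.normal(shape=(10, 64), ctx=ctx),
        'fc2_bias':   mx.nd.zeros((10,), ctx=ctx),
    }

    # Each executor reuses the shared weight NDArrays but owns its own
    # input buffer, so the weights are held in device memory only once.
    executors = []
    for _ in range(4):
        args = dict(shared_params)
        args['data'] = mx.nd.zeros(batch_shape, ctx=ctx)
        executors.append(net.bind(ctx, args, grad_req='null'))

    # Example of a single request on one executor.
    out = executors[0].forward(is_train=False,
                               data=mx.nd.ones(batch_shape, ctx=ctx))[0]

The sketch only illustrates the weight sharing; making it safe and easy to
drive those executors in parallel is what the proposal itself addresses.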



If people are OK with the proposal I can open a Jira ticket, PR, etc.  If
people are curious about perf implications I can also do some benchmarking.



Thanks in advance for the feedback,

-Kellen

Re: Parallel Inference Proposal

Posted by Can Balioglu <cb...@fastmail.com>.
Hi Kellen,

Great to see some progress on this, as it is one of the major problems we
face right now. Your approach seems to be a good fit for a short-/mid-term
solution. Have you also considered using some sort of signaling? As far as
I understand from your proposal and the example code, leveraging the
'can_read' attribute requires busy waiting in the main thread. An approach
similar to Unix signals, where the caller registers a handler that gets
invoked when an NDArray is ready, can potentially offer greater scalability.
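
To make the contrast concrete, here is a minimal sketch of the two
patterns; 'can_read' and 'register_ready_callback' below are hypothetical
placeholders for whatever API the proposal would expose, not existing
MXNet calls:

    import threading
    import time

    def busy_wait(result):
        # Polling: the caller spins (burning a thread) until the output
        # NDArray is flagged as readable.
        while not result.can_read:
            time.sleep(0.001)
        return result.asnumpy()

    def wait_with_signal(result, register_ready_callback):
        # Signaling: the engine invokes a handler once the NDArray is
        # ready, so the caller can sleep on an event instead of spinning.
        ready = threading.Event()
        register_ready_callback(result, ready.set)
        ready.wait()
        return result.asnumpy()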

-Can


Re: Parallel Inference Proposal

Posted by Hagay Lupesko <lu...@gmail.com>.
Good suggestion, Kellen!

I like the idea; it will solve an existing deficiency in MXNet that has
only been worked around so far. As an example, the recently added Scala
inference API (part of 1.2RC) implemented a dispatcher in Scala to work
around that limitation.
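
For context, the general shape of that workaround (a rough Python sketch of
the dispatcher pattern, not the actual Scala implementation) is a single
worker thread that owns the executor and serializes requests coming from
many caller threads:

    import queue
    import threading
    from concurrent.futures import Future

    class InferenceDispatcher:
        def __init__(self, predict_fn):
            # predict_fn would wrap a single bound executor/module.
            self._predict_fn = predict_fn
            self._requests = queue.Queue()
            self._worker = threading.Thread(target=self._run, daemon=True)
            self._worker.start()

        def submit(self, batch):
            # Callers on any thread enqueue work and block on
            # future.result() for the prediction.
            future = Future()
            self._requests.put((batch, future))
            return future

        def _run(self):
            # All MXNet calls happen on this one thread.
            while True:
                batch, future = self._requests.get()
                try:
                    future.set_result(self._predict_fn(batch))
                except Exception as exc:
                    future.set_exception(exc)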

It would be great to better understand the changes you are planning in
finer detail, though.

Hagay
