Posted to issues@mxnet.apache.org by "Anirudh Subramanian (JIRA)" <ji...@apache.org> on 2018/05/02 21:20:00 UTC

[jira] [Comment Edited] (MXNET-323) Broadcasting ops are slow

    [ https://issues.apache.org/jira/browse/MXNET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461632#comment-16461632 ] 

Anirudh Subramanian edited comment on MXNET-323 at 5/2/18 9:19 PM:
-------------------------------------------------------------------

Concentrating on `mx.sym.broadcast_add`, which seems to be the slowest. I ran perf on the script below, which calls forward 1000 times. Over those 1000 runs, the bottleneck still seems to be the binary_broadcast_kernel itself:
{quote}{{ 47.75% python libmxnet.so [.] _ZN5mxnet2op8mxnet_op6KernelINS1_23binary_broadcast_kernelILi2EfNS0_10mshadow_op4plusEEEN7mshadow3cpuEE8LaunchExIJNS_9OpReqTypeENS7_5ShapeILi2EEES}}
 {{ 16.07% python libmxnet.so [.] _ZN5mxnet2op12OperatorTuneIfE18GetOMPLoopOverheadEm._omp_fn.0}}
 {{ 11.77% python [unknown] [k] 0xffffffff8181fae3}}
 {{ 4.85% python libomp.so [.] _Z19__kmp_wait_templateI11kmp_flag_64EvP8kmp_infoPT_iPv}}
 {{ 3.94% python [unknown] [k] 0xffffffff81823dda}}
{quote}
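To make the report easier to read, the percentages can be grouped by shared object. The sketch below only re-adds the five numbers quoted above; the `rows` list and the `share` name are introduced here for illustration and are not part of perf or MXNet.

```python
from collections import defaultdict

# Rows copied from the perf report above: (percent, shared object).
# "[unknown]" is how perf labelled the two unresolved kernel-space ([k]) entries.
rows = [
    (47.75, "libmxnet.so"),
    (16.07, "libmxnet.so"),
    (11.77, "[unknown]"),
    (4.85, "libomp.so"),
    (3.94, "[unknown]"),
]

# Sum the sampled share per shared object.
share = defaultdict(float)
for percent, shared_object in rows:
    share[shared_object] += percent

# Print the largest contributors first.
for shared_object, percent in sorted(share.items(), key=lambda kv: -kv[1]):
    print(f"{shared_object:12s} {percent:6.2f}%")
```

Grouped this way, the two libmxnet.so symbols (the broadcast kernel and GetOMPLoopOverhead) together account for 63.82% of the samples listed.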
I am trying to understand more about how the operator is implemented.
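As background for that question, here is a minimal pure-Python sketch of generic stride-based broadcasting. It is not taken from the MXNet source and is not claimed to match binary_broadcast_kernel; the function names `broadcast_strides` and `broadcast_add` are hypothetical, and both inputs are assumed to have the same number of dimensions. It shows that a broadcasting add has to map every output position back to an index in each operand, which is extra work compared with a same-shape elementwise add where one flat index serves all three arrays.

```python
def broadcast_strides(shape, out_shape):
    """Element strides for `shape`; axes of size 1 get stride 0 (broadcast)."""
    strides, step = [], 1
    for dim, out_dim in zip(reversed(shape), reversed(out_shape)):
        if dim != out_dim and dim != 1:
            raise ValueError("shapes are not broadcast-compatible")
        strides.append(0 if dim == 1 else step)
        step *= dim
    return tuple(reversed(strides))


def broadcast_add(lhs, lshape, rhs, rshape):
    """Add two flat row-major lists, broadcasting size-1 axes (illustrative only)."""
    if len(lshape) != len(rshape):
        raise ValueError("this sketch assumes equal numbers of dimensions")
    out_shape = tuple(max(l, r) for l, r in zip(lshape, rshape))
    lstride = broadcast_strides(lshape, out_shape)
    rstride = broadcast_strides(rshape, out_shape)

    size = 1
    for dim in out_shape:
        size *= dim

    out = [0.0] * size
    for flat in range(size):
        # Unravel the flat output index into a coordinate per axis.
        coord, rem = [], flat
        for dim in reversed(out_shape):
            coord.append(rem % dim)
            rem //= dim
        coord.reverse()
        # Map the coordinate to an index in each operand via its strides.
        lidx = sum(c * s for c, s in zip(coord, lstride))
        ridx = sum(c * s for c, s in zip(coord, rstride))
        out[flat] = lhs[lidx] + rhs[ridx]
    return out_shape, out


# A (2, 3) array plus a (1, 3) array.
shape_2d, result_2d = broadcast_add([1, 2, 3, 4, 5, 6], (2, 3), [10, 20, 30], (1, 3))
# The case from this issue in miniature: a vector plus a single element.
shape_1d, result_1d = broadcast_add([1.0] * 4, (4,), [1.0], (1,))
print(shape_2d, result_2d)
print(shape_1d, result_1d)
```

Whether this per-element index arithmetic explains the profile above is an open question here; the sketch only illustrates where such a cost could come from.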

I used the following script to check performance:
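{quote}import mxnet as mx
import time

# Symbolic inputs for the broadcast add.
a = mx.sym.var('a')
b = mx.sym.var('b')

# a_ holds 2**20 ones; b2 is a single element that gets broadcast against it.
a_ = mx.nd.ones((2**20,))
# b1 has the same shape as a_ and is not used in this run.
b1 = mx.nd.ones((2**20,))
b2 = mx.nd.ones((1,))

func2 = mx.sym.broadcast_add(a, b).bind(mx.cpu(), {'a': a_, 'b': b2})
for i in range(1000):
    # wait_to_read() blocks until the asynchronous forward pass has finished.
    func2.forward()[0].wait_to_read()
{quote}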
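The script imports `time` but does not use it. A minimal timing helper along the following lines could report wall-clock numbers for the same loop. The name `time_calls` is hypothetical, and the workload timed here is a stand-in so the block runs without MXNet installed.

```python
import time


def time_calls(fn, runs=1000):
    """Call fn() `runs` times; return (total seconds, mean seconds per call)."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    total = time.perf_counter() - start
    return total, total / runs


# Stand-in workload; replace with the forward call being measured.
total, mean = time_calls(lambda: sum(range(1000)), runs=100)
print(f"total {total:.6f}s, mean {mean * 1e6:.2f}us per call")
```

With the script above, the callable would be `lambda: func2.forward()[0].wait_to_read()`, so that each timed call includes waiting for the result. Timing a second executor bound with `b1` in place of `b2` would give a same-shape baseline to compare against.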



> Broadcasting ops are slow
> -------------------------
>
>                 Key: MXNET-323
>                 URL: https://issues.apache.org/jira/browse/MXNET-323
>             Project: Apache MXNet
>          Issue Type: Improvement
>            Reporter: Lai Wei
>            Assignee: Anirudh Subramanian
>            Priority: Major
>
> https://github.com/apache/incubator-mxnet/issues/8219



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
