Posted to issues@mxnet.apache.org by "Anirudh Subramanian (JIRA)" <ji...@apache.org> on 2018/05/02 21:20:00 UTC
[jira] [Comment Edited] (MXNET-323) Broadcasting ops are slow
[ https://issues.apache.org/jira/browse/MXNET-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461632#comment-16461632 ]
Anirudh Subramanian edited comment on MXNET-323 at 5/2/18 9:19 PM:
-------------------------------------------------------------------
Concentrating on `mx.sym.broadcast_add`, which seems to be the slowest. I ran perf over 1000 runs of the script below. Even over 1000 runs, the bottleneck still seems to be the binary_broadcast_kernel itself:
{quote}{{ 47.75% python libmxnet.so [.] _ZN5mxnet2op8mxnet_op6KernelINS1_23binary_broadcast_kernelILi2EfNS0_10mshadow_op4plusEEEN7mshadow3cpuEE8LaunchExIJNS_9OpReqTypeENS7_5ShapeILi2EEES}}
{{ 16.07% python libmxnet.so [.] _ZN5mxnet2op12OperatorTuneIfE18GetOMPLoopOverheadEm._omp_fn.0}}
{{ 11.77% python [unknown] [k] 0xffffffff8181fae3}}
{{ 4.85% python libomp.so [.] _Z19__kmp_wait_templateI11kmp_flag_64EvP8kmp_infoPT_iPv}}
{{ 3.94% python [unknown] [k] 0xffffffff81823dda}}
{quote}
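To see how the samples split across binaries, the percentages in the perf output above can be summed per shared object. The following is only a rough sketch: the helper name `overhead_by_object` is hypothetical, and it assumes each line has the form "percent, command, shared object, ..." as in the report quoted above.

```python
from collections import defaultdict

# perf report lines copied from the comment above (symbols left truncated as posted)
PERF_LINES = """\
47.75% python libmxnet.so [.] _ZN5mxnet2op8mxnet_op6KernelINS1_23binary_broadcast_kernelILi2EfNS0_10mshadow_op4plusEEEN7mshadow3cpuEE8LaunchExIJNS_9OpReqTypeENS7_5ShapeILi2EEES
16.07% python libmxnet.so [.] _ZN5mxnet2op12OperatorTuneIfE18GetOMPLoopOverheadEm._omp_fn.0
11.77% python [unknown] [k] 0xffffffff8181fae3
4.85% python libomp.so [.] _Z19__kmp_wait_templateI11kmp_flag_64EvP8kmp_infoPT_iPv
3.94% python [unknown] [k] 0xffffffff81823dda
"""


def overhead_by_object(report):
    """Sum the overhead percentage per shared object (third column)."""
    totals = defaultdict(float)
    for line in report.splitlines():
        parts = line.split()
        # Skip anything that does not start with a percentage column.
        if len(parts) < 3 or not parts[0].endswith('%'):
            continue
        totals[parts[2]] += float(parts[0].rstrip('%'))
    return dict(totals)


if __name__ == '__main__':
    for shared_object, pct in sorted(overhead_by_object(PERF_LINES).items(),
                                     key=lambda kv: -kv[1]):
        print('%-12s %6.2f%%' % (shared_object, pct))
```

For the five lines listed, this attributes roughly 63.8% of the samples to libmxnet.so, 15.7% to unresolved kernel addresses, and 4.85% to libomp.so.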
I am still trying to understand how the operator is implemented.
I used the following script to check performance:
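{quote}import mxnet as mx
import time

# Symbolic inputs for the broadcast add
a = mx.sym.var('a')
b = mx.sym.var('b')
a_ = mx.nd.ones((2**20,))
b1 = mx.nd.ones((2**20,))  # same shape as a_; not used in this run
b2 = mx.nd.ones((1,))  # single element, broadcast against a_
func2 = mx.sym.broadcast_add(a, b).bind(mx.cpu(), {'a': a_, 'b': b2})
# Run the forward pass 1000 times, blocking until each result is ready
for i in range(1000):
    func2.forward()[0].wait_to_read()
{quote}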
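The script above imports `time` and defines `b1` but uses neither, so one possible follow-up is to time the broadcasting case against a same-shape baseline. The sketch below is an assumption about how that comparison could be set up, not the measurement used for the perf numbers: the helper names `mean_seconds` and `build_cases` are hypothetical, and using `mx.sym.elemwise_add` as the no-broadcast baseline is my choice rather than something stated in this comment.

```python
import time


def mean_seconds(fn, runs=1000):
    """Return the mean wall-clock time of one call to fn over `runs` calls."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def build_cases():
    """Bind the broadcasting case from the script and a same-shape baseline."""
    # Imported lazily so that mean_seconds has no MXNet dependency.
    import mxnet as mx

    a = mx.sym.var('a')
    b = mx.sym.var('b')
    a_ = mx.nd.ones((2**20,))
    b1 = mx.nd.ones((2**20,))  # same shape as a_
    b2 = mx.nd.ones((1,))      # single element, broadcast against a_

    bcast = mx.sym.broadcast_add(a, b).bind(mx.cpu(), {'a': a_, 'b': b2})
    # Assumption: elemwise_add serves as the baseline without broadcasting.
    plain = mx.sym.elemwise_add(a, b).bind(mx.cpu(), {'a': a_, 'b': b1})
    return {
        'broadcast_add, b=(1,)': lambda: bcast.forward()[0].wait_to_read(),
        'elemwise_add, b=(2**20,)': lambda: plain.forward()[0].wait_to_read(),
    }
```

With MXNet installed, calling `mean_seconds(fn)` on each callable returned by `build_cases()` gives a per-call time for both cases. The `wait_to_read()` call is kept inside the timed callable so that the asynchronous engine has finished before the clock stops.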
> Broadcasting ops are slow
> -------------------------
>
> Key: MXNET-323
> URL: https://issues.apache.org/jira/browse/MXNET-323
> Project: Apache MXNet
> Issue Type: Improvement
> Reporter: Lai Wei
> Assignee: Anirudh Subramanian
> Priority: Major
>
> https://github.com/apache/incubator-mxnet/issues/8219