Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/12/22 05:21:09 UTC

[GitHub] [incubator-mxnet] reminisce commented on issue #17097: [RFC][mxnet 2.0][item 10.1] MXNet Imperative Op Invocation Overhead

reminisce commented on issue #17097: [RFC][mxnet 2.0][item 10.1] MXNet Imperative Op Invocation Overhead
URL: https://github.com/apache/incubator-mxnet/issues/17097#issuecomment-568233782

@ptrendx Yes, there is an ongoing effort to profile the engine code flow using VTune. We hope the exercise will pinpoint the hotspots that account for most of the latency. Within the pure C++ part, the time is also split roughly 50/50 between setup code (shape/type inference, memory allocation, dependency setup) and op scheduling.
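
Separately from the VTune work on the C++ side, the Python-side cost of a single call can be roughly estimated with the standard `timeit` module. The snippet below is only a sketch: `per_call_ns` and `noop` are hypothetical names, and `noop` is a stand-in for whatever imperative call is being measured.

```python
import timeit


def per_call_ns(fn, number=100_000, repeat=5):
    """Return the best observed per-call time of fn() in nanoseconds."""
    # timeit.repeat returns the total seconds for `number` calls, once per repeat.
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    # The minimum is usually the least noisy estimate of the call cost.
    return min(totals) / number * 1e9


def noop():
    # Hypothetical stand-in for the call under test (assumption, not MXNet code).
    return None


if __name__ == "__main__":
    print(f"noop: {per_call_ns(noop):.1f} ns per call")
```

The number includes the interpreter's own function-call overhead, so it is best used to compare two binding approaches against each other rather than as an absolute figure.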

For the "fast path" data structures, I'm summarizing the items as follows (including the ones suggested by @sxjscience):

- `tuple` and `list`, since NumPy semantics accept either one to represent shapes and axes.
- `str`, because `einsum` takes a string argument and the op can be used heavily in transformer models.
- `py_slice`, `Ellipsis`, `None` for basic indexing. We can go one step further by moving the whole indexing dispatch logic to the backend.
- NumPy scalars.
- `mx.context.Context`. A single call to `mx.cpu()` can take as long as 600 ns using ctypes. One idea is to do it the pybind way by creating a Python binding for the backend `Context` class.
- `np.dtype`. Similar to `Context`.
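
To make the idea of a "fast path" concrete, here is a minimal pure-Python sketch of dispatching on the exact argument type with a single dictionary lookup. It is not MXNet code: the `_FAST_PATH` table, the tag strings, and `classify` are all hypothetical, and a real implementation would live in the backend binding layer rather than in Python.

```python
import numbers

# Hypothetical tags; a real backend would use its own type codes.
_FAST_PATH = {
    tuple: "shape_or_axes",
    list: "shape_or_axes",
    str: "string",
    slice: "index",
    type(Ellipsis): "index",
    type(None): "index",
    int: "scalar",
    float: "scalar",
}


def classify(arg):
    """Return the fast-path tag for arg, or "slow_path" if none applies."""
    # An exact-type lookup is one dict access, cheaper than an isinstance chain.
    tag = _FAST_PATH.get(type(arg))
    if tag is not None:
        return tag
    # Fallback for other numeric types registered with the numbers ABCs.
    if isinstance(arg, numbers.Number):
        return "scalar"
    return "slow_path"


if __name__ == "__main__":
    for value in [(2, 3), [0, 1], "ij,jk->ik", slice(0, 2), ..., None, 1.5, object()]:
        print(repr(value), "->", classify(value))
```

The point of the table is that the common argument types are recognized without walking a chain of type checks; anything unrecognized falls through to a generic, slower conversion.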
