Posted to dev@mxnet.apache.org by Alexander <no...@github.com.INVALID> on 2020/11/12 12:07:35 UTC

[apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

## Problem statement
Our team uses MXNet for training and for inference. Recently we decided to run inference on Android devices, so we compiled MXNet using the Android NDK, and it works fine. Now we intend to accelerate inference on mobile devices using the [Android NN API](https://developer.android.com/ndk/guides/neuralnetworks), which Android has supported since version 8.1. This API serves as a common interface to hardware GPU/accelerator drivers and exposes its functionality in the form of operators (ANEURALNETWORKS_CONV_2D, ANEURALNETWORKS_AVERAGE_POOL_2D, ...).
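
To make that concrete, below is a minimal sketch of driving the API (the NNAPI functions and constants are real; the wrapper function and shapes are illustrative assumptions):

```cpp
// Sketch: build a one-operation (ReLU) Android NN model via the NDK C API.
#include <android/NeuralNetworks.h>

bool BuildReluModel(ANeuralNetworksModel** out) {
  if (ANeuralNetworksModel_create(out) != ANEURALNETWORKS_NO_ERROR) return false;
  ANeuralNetworksModel* model = *out;

  // One float32 NHWC tensor type, reused for both input and output.
  uint32_t dims[4] = {1, 224, 224, 3};
  ANeuralNetworksOperandType ttype = {
      ANEURALNETWORKS_TENSOR_FLOAT32, 4, dims, 0.0f, 0};
  ANeuralNetworksModel_addOperand(model, &ttype);  // operand 0: input
  ANeuralNetworksModel_addOperand(model, &ttype);  // operand 1: output

  uint32_t ins[1] = {0}, outs[1] = {1};
  ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_RELU, 1, ins, 1, outs);
  ANeuralNetworksModel_identifyInputsAndOutputs(model, 1, ins, 1, outs);
  return ANeuralNetworksModel_finish(model) == ANEURALNETWORKS_NO_ERROR;
}
```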

## Proposed solutions
My task is to implement a proxy between MXNet and Android NN using the subgraph API, and I am already about halfway there. I have implemented the selector, the subgraph property, and operator registration, and I have implemented the addition of the major operators to the Android NN model based on the partitioned graph. The design is similar to the TensorRT subgraph integration, but we don't use ONNX as an intermediate format. So the question is: is it wise to implement a subgraph backend for running inference on mobile devices using a framework that was not originally intended for mobile inference? The MXNet footprint in our APK is about 150 MB, which is pretty heavy. I use MXNet 1.7. Will there be a lightweight version of MXNet in the future, like TFLite for TensorFlow? Also, any suggestions and thoughts about a more appropriate solution for our problem are welcome!
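
Since the selector is half of that work, here is an illustrative sketch of what it looks like, modeled on the existing subgraph properties under `src/operator/subgraph/` (the exact base-class interface varies between MXNet versions, so treat the class and header names as assumptions):

```cpp
#include <string>
#include <unordered_set>
#include "operator/subgraph/subgraph_property.h"  // MXNet-internal header

// Accept only nodes whose op has a direct Android NN equivalent; the
// partitioner then groups maximal connected runs of accepted nodes.
class AndroidNNSelector : public mxnet::op::SubgraphSelector {
 public:
  bool Select(const nnvm::Node &n) override {
    static const std::unordered_set<std::string> supported = {
        "Convolution", "Pooling", "Activation", "FullyConnected"};
    return !n.is_variable() && supported.count(n.op()->name) > 0;
  }
  bool SelectInput(const nnvm::Node &n, const nnvm::Node &new_node) override {
    return Select(new_node);
  }
  bool SelectOutput(const nnvm::Node &n, const nnvm::Node &new_node) override {
    return Select(new_node);
  }
};
```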

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Sam Skalicky <no...@github.com.INVALID>.
I did a quick set of builds from v1.x using make with different flags and looked at the size of libmxnet.so. Each subsequent row adds new build flags to the previous. 

Build Flags | libmxnet.so size [bytes]
------------ | -------------
None | 168573168
+USE_MKLDNN=0 | 131479368
+USE_INTGEMM=0 | 131251704
+USE_INT64_TENSOR_SIZE=0 | 131251704
+USE_CPP_PACKAGE=0 USE_DIST_KVSTORE=0 | 131251704
+USE_OPENCV=0 | 130949000
+USE_TVM_OP=0 | 130949000
+USE_NNPACK=0 | 130949000

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-726285689

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Sam Skalicky <no...@github.com.INVALID>.
Using this set of build flags, I removed different sets of ops in v1.x on x86 and measured the libmxnet.so size. Each subsequent row removes additional ops from the previous.

`make USE_MKLDNN=0 USE_INTGEMM=0 USE_INT64_TENSOR_SIZE=0 USE_DIST_KVSTORE=0 USE_CPP_PACKAGE=0 USE_OPENCV=0 USE_TVM_OP=0 USE_NNPACK=0 -j`

Ops removed | libmxnet.so size [bytes]
------------ | -------------
top-level src/operator | 122415272
quantization | 121641000
image | 121023728
numpy | 77072424
fusion | 77031648
tvmop | 77031608
nnpack | 77031568
custom | 76693704

Trying to remove any more from `nn` or `tensor` is a fluster cluck; all of those ops are used all over the place in other MXNet sources.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-726322649

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Alexander <no...@github.com.INVALID>.
Hi folks. I have a question for the MXNet team related to the previous discussion. In our AndroidNN design we have reached a point where we need to pass devices to the AndroidNN backend. Usually, other backends (MKL, TensorRT) get a device through the Context. The problem is that the Context supports a limited list of devices (CPU, GPU). AndroidNN, on the other hand, supports a different set of devices (CPU, GPU, NPU, ...) with Android-specific indexes acquired via the ANeuralNetworks_getDevice API. So we need a custom context, and we have two choices:

1. Modify the existing Context by adding additional fields and defining a preprocessor flag MXNET_USE_ANDROIDNN in CMake, so that a user who passes the USE_ANDROIDNN option to CMake gets the custom context. This solution is motivated by the fact that if there is a structure for passing devices, we should use it. Previous backends were comfortable with the provided set of devices; now it's time to add support for new devices.
2. Pass all custom options, including the device name and id, through the MXOptimizeForBackend API, whose options_map was designed for passing custom options to a backend. We would then use those options when partitioning the graph by adding the custom device to each subgraph as a node attribute. Later, based on that attribute, the backend will create a model for the given device.

Thank you for your response!

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-751628566

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Sam Skalicky <no...@github.com.INVALID>.
How much of the binary size comes from FCompute and associated functions versus the other Operator functions (attribute parsing, inputs/outputs, shape/type/storageType inference, etc)? Any guesses?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-728235060

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Leonard Lausen <no...@github.com.INVALID>.
> I mean mxnet size in apk is about 150 MB that is pretty thick. I use mxnet 1.7. Will there a lightweight version of mxnet in future like TFlite for tensorflow?

It would be nice to have a lightweight version, but no-one is working on it currently. One workaround is to manually delete the operator implementation files that you don't need.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-726249168

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Sam Skalicky <no...@github.com.INVALID>.
Long term, what would we want to do to exclude ops from the build? Would we want to do something like this:
https://github.com/samskalicky/incubator-mxnet/commit/f2184ceab711bf1081165d6e0c5dbca958111dae
where we set a flag like `__EXCLUDE_ALL_OPS__` and then set flags specifically for the ops we want to include, like `__INCLUDE_OP_NORM__`?
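
Concretely, each registration site could be guarded like this hypothetical illustration (`norm` is a real MXNet op; the flag names follow the scheme above):

```cpp
// Inside the op's .cc file: registration compiles out by default and is
// opted back in with a per-op flag at build time.
#if !defined(__EXCLUDE_ALL_OPS__) || defined(__INCLUDE_OP_NORM__)
NNVM_REGISTER_OP(norm)
.set_num_inputs(1)
.set_num_outputs(1);
#endif
```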

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-727653641

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by "github-actions[bot]" <no...@github.com.INVALID>.
Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue.
Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly.
If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on [contributing to MXNet](https://mxnet.apache.org/community/contribute) and our [development guides wiki](https://cwiki.apache.org/confluence/display/MXNET/Developments).

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-726038617

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by dmitry-markeshov <no...@github.com.INVALID>.
Hi, we've built MXNet excluding unused operators and reached a size of ~20 MB. But the main objective is performance: we believe AndroidNN allows MXNet models to be computed on the GPU.
The size should be minimal as well, because a lot of operators are simply translated to AndroidNN calls.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-726290390

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Sam Skalicky <no...@github.com.INVALID>.
Disentangling MXNet ops would be a good refactoring project, but it would be a lot of work. We may have to do it anyway to satisfy the licensing issue between Apache and Nvidia, so it might be worth doing. But as @leezu pointed out, no one is currently working on this.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-726325750

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Sam Skalicky <no...@github.com.INVALID>.
@dmitry-markeshov @AlexanderSerov The other thing you can do is run your subgraphing pass on x86 and remove the operators that will be executed by your custom backend. Then, when you load the optimized model on Android, you don't need to have those operators compiled into the MXNet build.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-727629237

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Przemyslaw Tredak <no...@github.com.INVALID>.
Yes, currently the structure we have is
 - `operator_name.cc` which contains operator definition (+ all the infershape/type etc.) and `FCompute<cpu>`
 - `operator_name.cu` which contains just `FCompute<gpu>`

We should change that to something like:
 - `src/operator/operator_name.cc` which contains all the device independent operator definition
 - `src/operator_impl/cpu/operator_name.cc` which contains just `FCompute<cpu>`
 - `src/operator_impl/cuda/operator_name.cu` which contains just `FCompute<gpu>`

This would make it possible for a subgraph backend to replace whatever it needs, as all the operator definitions would still exist. And I agree: together with the external ops functionality, we could make it so `libmxnet.so` contains just the operator definitions, while separate `.so` files contain the actual implementations for different platforms.
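
As a concrete (hypothetical) illustration of that split for an op named `my_op`, relying on the existing ability of `NNVM_REGISTER_OP` to attach attributes to the same op from multiple translation units:

```cpp
// src/operator/my_op.cc: device-independent definition only.
NNVM_REGISTER_OP(my_op)
.set_num_inputs(1)
.set_num_outputs(1)
.set_attr<mxnet::FInferShape>("FInferShape", ElemwiseShape<1, 1>);

// src/operator_impl/cpu/my_op.cc: the CPU kernel is attached separately, so
// a build (or a subgraph backend) can drop this file without losing the
// operator definition. MyOpForward is a hypothetical kernel functor.
NNVM_REGISTER_OP(my_op)
.set_attr<FCompute>("FCompute<cpu>", MyOpForward<cpu>);
```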

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-728182152

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Alexander <no...@github.com.INVALID>.
Closed #19521 as not planned.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#event-7671464763
You are receiving this because you are subscribed to this thread.

Message ID: <ap...@github.com>

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Sam Skalicky <no...@github.com.INVALID>.
Hi @AlexanderSerov great question, let me try and break it down.

> In our AndroidNN design we have reached a point where we need to pass devices to the AndroidNN backend. Usually, other backends (MKL, TensorRT) get a device through the Context. The problem is that the Context supports a limited list of devices (CPU, GPU). AndroidNN, on the other hand, supports a different set of devices (CPU, GPU, NPU, ...) with Android-specific indexes acquired via the ANeuralNetworks_getDevice API.

Contexts in MXNet are chosen by users with the expectation that they can run their whole model with any particular context. The MXNet community has worked hard to maintain parity between CPU and GPU contexts so that the majority of models can be executed on either context successfully. A Context in MXNet needs to be able to support executing all currently supported operators. 

How operators are implemented (custom C++, BLAS libraries, or custom NN libraries like MKL) is not visible at the context level; rather, these are build-time configurations. For example, an MXNet user will use the same CPU context whether they use MKL or OpenBLAS as the BLAS library, and whether or not they choose to use MKLDNN/oneDNN. We consider this a huge usability feature in MXNet, compared to having many contexts to enable each feature. Most users will find the build config that works best for them and stick to it. Having a single build with all those features enabled is not what most users want; inevitably they end up trying to minimize disk space and device memory and to reduce the size of the MXNet binary (as in our discussion above).

> So we need a custom context, and we have two choices:
> 1. Modify the existing Context by adding additional fields and defining a preprocessor flag MXNET_USE_ANDROIDNN in CMake, so that a user who passes the USE_ANDROIDNN option to CMake gets the custom context. This solution is motivated by the fact that if there is a structure for passing devices, we should use it. Previous backends were comfortable with the provided set of devices; now it's time to add support for new devices.

In general, adding new build flags to enable custom backends in MXNet is acceptable, especially when the flag would only be enabled for a particular platform (i.e., ARM or Android). Adding support for new backends that would be generally applicable to all CPU types requires much more careful consideration and testing.

> 2. Pass all custom options, including the device name and id, through the MXOptimizeForBackend API, whose options_map was designed for passing custom options to a backend. We would then use those options when partitioning the graph by adding the custom device to each subgraph as a node attribute. Later, based on that attribute, the backend will create a model for the given device.

Amazon is using this second option for its implementations of custom backends for the Inferentia and Elastic Inference Accelerator (EIA) devices, and also for its integration of TVM compilation for MXNet models in the SageMaker Neo service. This is the preferable way to get started. It will allow you to build your custom backend, easily stay current with MXNet versions (upgrading between versions with MXNet extensions is far easier than upgrading a custom fork of the whole MXNet codebase), and simplify the distribution of your custom backend with MXNet.
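
For reference, a rough sketch of what option 2 can look like on the backend side with the 1.x subgraph API (method signatures vary slightly between releases, and names like `annd_device` and `_sg_androidnn` are placeholders):

```cpp
// Options passed to MXOptimizeForBackend arrive in PrePartition as
// options_map; stash the requested device and stamp it onto every
// subgraph node that gets created.
class AndroidNNProperty : public mxnet::op::SubgraphProperty {
 public:
  void PrePartition(const nnvm::Graph &g,
                    const std::unordered_map<std::string, std::string> &options_map) override {
    auto it = options_map.find("annd_device");  // placeholder option key
    device_ = (it != options_map.end()) ? it->second : "android-cpu";
  }
  nnvm::ObjectPtr CreateSubgraphNode(const nnvm::Symbol &sym,
                                     const int subgraph_id = 0) const override {
    nnvm::ObjectPtr n = nnvm::Node::Create();
    n->attrs.op = Op::Get("_sg_androidnn");  // placeholder subgraph op
    n->attrs.name = "sg_androidnn_" + std::to_string(subgraph_id);
    n->attrs.dict["device"] = device_;       // device carried as a node attr
    n->attrs.subgraphs.push_back(std::make_shared<nnvm::Symbol>(sym));
    return n;
  }
 private:
  std::string device_;
};
```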

The MXNet community is currently reconsidering how MXNet has been architected for a variety of reasons (licensing issues, maintainability, etc.) and looking to make the codebase more modular. So having a modular backend is future-proofing your efforts as well. And it doesn't limit your contribution either: if your custom backend becomes popular, you can always start a discussion about making it a default part of the MXNet codebase in the future.


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-751914619

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Przemyslaw Tredak <no...@github.com.INVALID>.
I did not do any experiments, but I would be very surprised if it is not at least 50%: you only have a few very small functions for all the infer passes, while you need at least 8 variants for each `MSHADOW_TYPE_SWITCH`, and especially on the NumPy side there are functions that use those switches nested (so 64+ variants), not to mention that those functions tend to be larger.
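
To illustrate the combinatorics, a sketch of the nested-switch pattern using the real `MSHADOW_TYPE_SWITCH` macro (`CastKernelCall` is a hypothetical stand-in for the kernel body):

```cpp
// Inside a hypothetical FCompute body working on TBlobs:
void CastForward(const mxnet::TBlob &in, const mxnet::TBlob &out) {
  MSHADOW_TYPE_SWITCH(in.type_flag_, IType, {
    MSHADOW_TYPE_SWITCH(out.type_flag_, OType, {
      // The compiler emits this body once per (IType, OType) pair: with
      // ~8 supported dtypes, that is 64 template instantiations.
      CastKernelCall<IType, OType>(out, in);
    });
  });
}
```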

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-728284228

Re: [apache/incubator-mxnet] [RFC] Integration with AndroidNN (#19521)

Posted by Marco de Abreu <no...@github.com.INVALID>.
How about we just build and load the ops externally, based on your external operator system, so people can delete the ones they don't want? Then we would no longer embed them into the main .so file, but ship them alongside it.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/19521#issuecomment-727686426