Posted to dev@mxnet.apache.org by Marco de Abreu <ma...@apache.org> on 2019/04/04 19:29:32 UTC

[MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Hello,

I'd like to start a discussion about something that I've found troublesome
to maintain in the current version: backend choices being made at compile
time.

Right now, the different backends and accelerators (CPU, CUDA, MKL, AWS
Elastic Inference, (future) AMD, OpenBLAS, TVM, etc.) are scattered across
the different layers of MXNet. On the one hand, compile-time flags decide
which backends are compiled into the binary; on the other hand, further
choices can be made in the frontend at runtime.

At the moment, we have a lot of conditional build logic that picks
different parts. With the addition of MKLML and later MKLDNN, the clear
separation of CPU and GPU got somewhat broken up. While there are some
places where each backend's code lives, in the end we resort to some files
containing a lot of conditional logic for the different backends (sorry I
can't provide links right now since I'm on mobile). To me this seems like a
residue of the fast development style of the early days (more preprocessor
statements and less object orientation), combined with organic growth as
new accelerators were added. Seeing how much AMD had to hack to fit in
their implementation, it seems we have to make this part more
developer-friendly.

At the moment, every new flavour of MXNet has to be entirely recompiled.
This makes it hard for users to figure out which options to use, and it
makes testing harder for us, since the overhead of testing every single
combination of compile parameters would be overwhelming.

I'd propose a clear, class-hierarchy-based structure for accelerators,
operators and memory management. This structure can then be implemented by
the different backends. To reduce the compile burden, we would introduce
dynamic loading and split the different backends into modules. These could
then be developed, maintained and compiled on their own and placed in a
"module" folder to be loaded at runtime. Adding a new accelerator would be
a matter of placing the precompiled binary into the folder. The detailed
configuration of that backend would then be done at runtime - the user
shouldn't have to worry, at the point of downloading MXNet, whether they
want MKL, MKLDNN, OpenBLAS, ATLAS, TVM, CUDA or whatever else there is. I
have an idea how we could help the user choose, but that's outside the
scope of this proposal.
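
As a rough illustration of the "module" folder idea, the following Python
sketch loads backends from a directory at runtime. It is only an analogy
for what a native implementation would do with shared libraries: the
Backend interface, the create_backend() entry point and the cpu_backend.py
plugin are all hypothetical names, not part of MXNet.

```python
import importlib.util
import pathlib
import tempfile
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface that every backend module would implement."""

    name: str

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether this backend can run on the current machine."""

    @abstractmethod
    def add(self, a, b):
        """Stand-in for a real operator: element-wise addition."""


def load_backends(module_dir):
    """Import every *.py file in module_dir and collect usable backends."""
    backends = {}
    for path in sorted(pathlib.Path(module_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        # Assumed convention: the core hands the interface to the plugin.
        module.Backend = Backend
        spec.loader.exec_module(module)
        backend = module.create_backend()  # assumed entry point name
        if isinstance(backend, Backend) and backend.is_available():
            backends[backend.name] = backend
    return backends


# Source of a hypothetical plugin, as it would sit in the "module" folder.
PLUGIN_SOURCE = '''
class CpuBackend(Backend):
    name = "cpu"

    def is_available(self):
        return True

    def add(self, a, b):
        return [x + y for x, y in zip(a, b)]


def create_backend():
    return CpuBackend()
'''


def demo():
    """Write one plugin into a temporary folder, load it and use it."""
    with tempfile.TemporaryDirectory() as module_dir:
        pathlib.Path(module_dir, "cpu_backend.py").write_text(PLUGIN_SOURCE)
        backends = load_backends(module_dir)
        return sorted(backends), backends["cpu"].add([1, 2], [3, 4])
```

In this sketch, adding a backend means dropping another file into the
folder; a backend whose is_available() returns False is simply skipped,
which is where a runtime check for a CUDA device or an installed library
could go.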

This would allow us to have a "core" MXNet that takes care of the engine,
scheduling, communication and all the other crucial parts. At the same
time, we could make MXNet less of a monolith and have clear interfaces.
This would also act as a forcing function, because the different parts
could no longer be intermingled but would have to follow the common
interface.

Of course this raises the question of what these interfaces would look
like. For operators, I'd like to propose taking inspiration from (or fully
adopting) ONNX. For memory management and other backend-specific things, we
could look at the current implementations and find common ground.

Back when I had a community-driven project, we heavily used this kind of
modularity and it brought great benefits - apart from the fact that our
core was closed source. It allowed community developers to act entirely
independently of other parts and even to add their own logic without
having to touch the core. Thinking about companies that implement their own
backends or have specially tweaked operators that they don't want to
disclose, this structure would save them from having to fork the project
and then spend a lot of effort porting their changes to the latest source
release. Instead, they would maintain their module and we as the MXNet
community would only have to maintain these interfaces.

Right now this is a lot of prose and basically a brain dump of my thoughts.
I'd be happy to follow up with details, but first I'd be curious what the
community thinks about this design.

Best regards,
Marco

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Posted by Junru Shao <ju...@gmail.com>.

+1 Thanks Marco for sharing this!

It is great to see people agree with this feature and we actually have been
planning for this for a while. We would love to share this plan as soon as
possible.


On Mon, Apr 8, 2019 at 9:42 AM Tianqi Chen <tq...@cs.washington.edu> wrote:

> Just to clarify: I am not questioning the usefulness of the separation.
> I just want to highlight the technical challenges here, based on our past
> experience.
>
> Crossing DLL boundaries in C++ can create quite a lot of problems,
> especially when some of the dependencies use a different version of the
> compiler or follow static packaging, or simply because of the dynamic
> linking differences on Windows. These problems could make this direction
> less appealing compared to focusing effort on other things.
>
> Technically, as a first step, it is possible to arrange things (e.g. via
> registration) so that a change to a dependency does not touch the global
> header files, and changing a certain component won't trigger a global
> recompile in CMake. This is also a required step toward some modularity.
>
> For plugins, solutions that use a C ABI can be used for certain plugin
> modules.
>
> Some of the discussion has been tied to what the interface should look
> like. I think we should use separate threads for these and put more
> thought into them.
>
> Tianqi
>
>
>
> On Sun, Apr 7, 2019 at 4:39 PM kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > I think we can make some incremental progress. My thoughts were along
> > the lines of plugins (thinking about what happens with the VLC project).
> > At process launch time we could gather some information about our
> > execution environment (either through configuration, or by convention
> > looking at our folder structure and libraries available). We could then
> > later load the components we need after understanding if we're using a
> > CUDA backend and what operators or subgraph components we would need.
> > Advantages would be that we would move a lot of the current conditional
> > compile logic to runtime, and automate a lot of it. It would also make
> > packaging binaries for targeted environments a little easier. As an
> > example we could compile once, then remove CUDA-focused libraries for
> > systems that are going to run on CPUs.
> >
> > On Sun, Apr 7, 2019 at 2:45 PM Tianqi Chen <tq...@cs.washington.edu>
> > wrote:
> >
> > > While I personally like the idea, this can be fairly technically
> > > challenging, and I would caution against it versus pushing for good
> > > features and just allowing runtime configuration.
> > >
> > > The main problem here is the C++ ABI. There is no standard C++ ABI
> > > across compilers, which means that resorting to runtime DLLs and
> > > dynamic loading brings all sorts of technical problems, especially
> > > when multiple modules depend on the same third-party dependency (the
> > > CUDA runtime, for example). There is no ready-made solution here,
> > > especially given the explosion of backend variants and dependencies
> > > in C++.
> > > A partial solution could be achieved through the sole use of a C ABI.
> > > Combining this with code generation can result in some simplifications
> > > and enable some runtime-loadable modules. TVM does this, and perhaps
> > > MXNet could reuse some of those components for operator libraries.
> > > Similarly, having a customizable operator library that is loadable via
> > > a C ABI might be possible.
> > >
> > > So to summarize: while I really like the idea of dynamically loadable
> > > modules, my past experience suggests that this will bring a lot of
> > > additional engineering burden and technical debt without significant
> > > benefit. I would suggest starting by supporting something simple, like
> > > a plugin module, before moving toward the general direction.
> > >
> > > Tianqi
> > >
> > > On Sun, Apr 7, 2019 at 1:31 PM kellen sunderland <
> > > kellen.sunderland@gmail.com> wrote:
> > >
> > > > Strongly support the idea of runtime loadable components in MXNet.
> > > > There's no reason (other than perhaps engineering effort) we can't
> > > > have a single compilation of MXNet that finds dependencies and
> > > > chooses execution paths intelligently (or based on configuration) at
> > > > runtime.
> > > >
>

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Posted by Tianqi Chen <tq...@cs.washington.edu>.

+1.

While I personally like Slack, I don't think we should treat Slack as a
public archive. "everything that happens (also) happens in dev@"

Tianqi



On Fri, Apr 12, 2019 at 1:19 AM Marco de Abreu <ma...@gmail.com>
wrote:

> I'd prefer if we keep discussions on the dev-list instead of slack - feel
> free to open another thread.
>
> -Marco

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Posted by Marco de Abreu <ma...@gmail.com>.

I'd prefer if we keep discussions on the dev-list instead of slack - feel
free to open another thread.

-Marco

Pedro Larroy <pe...@gmail.com> wrote on Fri, Apr 12, 2019 at 02:24:

> I will respond in slack, so we don't derail the original thread's
> topic with my points.
>
> Looking forward to your proposal.
>
> On Thu, Apr 11, 2019 at 1:00 PM Junru Shao <ju...@gmail.com>
> wrote:
> >
> > I don't have any ideas about the following issues:
> >
> > 1) Reducing the abuse of inlined code by moving more logic to
> > implementation files and improving scoping, which will also speed up
> > compilation
> > 2) Reducing the runtime of some unit tests
> > 3) Improving MXNet startup time
> >
> > Will be super interested to hear about your ideas :-)
> >
> >
> > On Thu, Apr 11, 2019 at 12:52 PM Junru Shao <ju...@gmail.com>
> wrote:
> >
> > > We have a systematic solution that avoids the ABI headache. I am busy
> > > with some errands, and will share our proposal here as soon as I can.
> > > This will be a very interesting topic to discuss. Let's work hard
> > > together and make it perfect :-)
> > >
> > > On Thu, Apr 11, 2019 at 12:43 PM Pedro Larroy <
> > > pedro.larroy.lists@gmail.com> wrote:
> > >
> > >> Thanks Marco for raising this issue. I think we can certainly make
> > >> some improvements in modularization and build. At the same time,
> > >> Tianqi's point of view is important to consider and on point. I see
> > >> a high risk of overengineering in such an endeavor.
> > >>
> > >> I also see increased complexity, difficulty debugging, C++ ABI
> > >> headaches, API compatibility, crashes inside a binary module, etc.
> > >> which I don't want to deal with as a developer or even as an MXNet
> > >> user. Does somebody have answers to these problems?
> > >>
> > >> If somebody thinks they have a good solution, by all means propose a
> > >> design in the wiki; I think we are all open. Personally I see several
> > >> other lower-hanging fruits that need our attention:
> > >>  * Simplifying our build logic
> > >>  * CUDA selection in CMake
> > >>  * Reducing the abuse of inlined code by moving more logic to
> > >>    implementation files and improving scoping, which will also speed
> > >>    up compilation (some units take more than 5 minutes and lots of
> > >>    RAM to build on a top-of-the-line CPU core)
> > >>  * Reducing the runtime of some unit tests
> > >>  * Improving MXNet startup time
> > >>  * Thread safety
> > >> And other improvements in our codebase that would bring immediate
> > >> benefits without the risk of overengineering a plugin system. I also
> > >> question our bandwidth for such an endeavor.
> > >>
> > >> I would say, let's apply the KISS principle, let's make the project
> > >> fast to build, easy to work on, well documented and easy to contribute
> > >> to before building the next Netscape browser. Otherwise we could save
> > >> ourselves this exercise and switch to Rust directly.
> > >>
> > >> Pedro.
> > >>
> > >>
> > >>
> > >> On Mon, Apr 8, 2019 at 9:42 AM Tianqi Chen <tq...@cs.washington.edu>
> > >> wrote:
> > >> >
> > >> > Just to clarify. I am not questioning the usefulness of the
> separation.
> > >> > Just want to highlight the technical challenges here based on our
> past
> > >> > experiences.
> > >> >
> > >> > Crossing DLL boundaries in C++ can create quite a lot of problems,
> > >> > especially some of the dependencies used a different version of the
> > >> > compiler, follows static packaging or simply because of the dynamic
> > >> linking
> > >> > difference in windows. These problems could make this direction move
> > >> less
> > >> > appealing compared to focusing effort on other things.
> > >> >
> > >> > Technically, as a first step, it is possible to make dependencies
> change
> > >> > not change the global header files and via registration so that
> changing
> > >> > certain component won't trigger a global recompile in CMake. This is
> > >> also a
> > >> > required step toward some modularity.
> > >> >
> > >> > For plugins, solutions that use C ABI can be used for certain plugin
> > >> > modules.
> > >> >
> > >> > Some of the discussion has been tied to what the interface should
> look
> > >> > like. I think we should use different threads for these and puts in
> more
> > >> > thoughts.
> > >> >
> > >> > Tianqi
> > >> >
> > >> >
> > >> >
> > >> > On Sun, Apr 7, 2019 at 4:39 PM kellen sunderland <
> > >> > kellen.sunderland@gmail.com> wrote:
> > >> >
> > >> > > I think we can make some incremental progress.  My thoughts were
> > >> along the
> > >> > > lines of plugins (thinking about what happens with the VLC
> project).
> > >> At
> > >> > > process launch time we could gather some information about our
> > >> execution
> > >> > > environment (either through configuration, or by convention
> looking
> > >> at our
> > >> > > folder structure and libraries available).  We could then later
> load
> > >> the
> > >> > > components we need after understanding if we're using a CUDA
> backend
> > >> and
> > >> > > what operators or subgraph components we would need.  Advantages
> > >> would be
> > >> > > that we would move a lot of the current conditional compile logic
> to
> > >> > > runtime, and automate a lot of it.  It would also make packaging
> > >> binaries
> > >> > > for targeted environments a little easier.  As an example we could
> > >> compile
> > >> > > once, then remove CUDA focused libraries for systems that are
> going
> > >> to run
> > >> > > on CPUs.
> > >> > >
> > >> > > On Sun, Apr 7, 2019 at 2:45 PM Tianqi Chen <
> tqchen@cs.washington.edu>
> > >> > > wrote:
> > >> > >
> > >> > > > While I personally like the idea. This can be something that is
> > >> fairly
> > >> > > > technical challenging and I would caution against this idea vs
> > >> pushing
> > >> > > for
> > >> > > > good features and just allow runtime configuration.
> > >> > > >
> > >> > > > The main problem here is due to the C++ ABI. There is no
> standard
> > >> c++ ABI
> > >> > > > across compilers, which means resorting to runtime DLL and
> dynamic
> > >> > > loading
> > >> > > > brings all sorts of technical problems, especially when multiple
> > >> modules
> > >> > > > depend on the same third dependency(CUDA runtime).
> > >> > > > There is no good to go solution can be made here, especially
> given
> > >> the
> > >> > > > explosion of the backend variants and dependencies in C++.
> > >> > > > A partial solution could be achieved, through the sole use of C
> ABI.
> > >> > > > Combing this with code generation can result in some
> > >> simplifications and
> > >> > > > enable some runtime loadable module. TVM does this, and perhaps
> > >> MXNet
> > >> > > could
> > >> > > > reuse some of that component for operator libraries. Similarly,
> > >> having a
> > >> > > > customizable operator library that is loadable via C ABI might
> be
> > >> > > possible.
> > >> > > >
> > >> > > > So to summarize, while I really like the idea of dynamically
> > >> loadable
> > >> > > > modules. My past experience suggests that this will bring a lot
> of
> > >> > > > additional engineering burden and technical debts without
> > >> significant
> > >> > > > benefit. I would suggest starting by supporting something simple
> > >> like a
> > >> > > > plugin module, before moving toward the general direction.
> > >> > > >
> > >> > > > Tianqi
> > >> > > >
> > >> > > > On Sun, Apr 7, 2019 at 1:31 PM kellen sunderland <
> > >> > > > kellen.sunderland@gmail.com> wrote:
> > >> > > >
> > >> > > > > Strongly support the idea of runtime loadable components in
> MXNet.
> > >> > > > There's
> > >> > > > > no reason (other than perhaps engineering effort) we can't
> have a
> > >> > > single
> > >> > > > > compilation of MXNet that finds dependencies and chooses
> execution
> > >> > > paths
> > >> > > > > intelligently (or based on configuration) at runtime.
> > >> > > > >
> > >> > > > > On Thu, Apr 4, 2019 at 12:29 PM Marco de Abreu <
> > >> marcoabreu@apache.org>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hello,
> > >> > > > > >
> > >> > > > > > I'd like to start a discussion about something that I've
> noticed
> > >> > > being
> > >> > > > > > troublesome to maintain in the current version: Backend
> choices
> > >> being
> > >> > > > > made
> > >> > > > > > at compile time.
> > >> > > > > >
> > >> > > > > > Right now, the different backends and accelerators (CPU,
> cuda,
> > >> mkl,
> > >> > > AWS
> > >> > > > > > elastic inference, (future) AMD, openblas,TVM, etc) are all
> > >> scattered
> > >> > > > > > across the different layers of MXNet. On one hand, we have
> > >> compile
> > >> > > time
> > >> > > > > > flags that decide which backends are being compiled into the
> > >> binary,
> > >> > > > > while
> > >> > > > > > at the same time choices can be made in the frontend during
> > >> runtime.
> > >> > > > > >
> > >> > > > > > At the moment, we have a lot of conditional build logic that
> > >> picks
> > >> > > > > > different parts. With the addition of MKLML and later
> MKLDNN the
> > >> > > clear
> > >> > > > > > separation of CPU and GPU got kind of broken up. While we
> have
> > >> some
> > >> > > > > places
> > >> > > > > > where each code lives, in the end we resort to some files
> > >> containing
> > >> > > a
> > >> > > > > lot
> > >> > > > > > of conditional logic for the different backends (sorry I
> can't
> > >> > > provide
> > >> > > > > > links right now since I'm on mobile). To me this seems like
> a
> > >> residue
> > >> > > > of
> > >> > > > > > the fast development style from the early days (more
> processor
> > >> > > > statement
> > >> > > > > > and less object orientation) while also having organic
> growth
> > >> with
> > >> > > new
> > >> > > > > > accelerators. When I see how much AMD had to hack to fit in
> > >> their
> > >> > > > > > implementation, it seemed like we have to make this part
> more
> > >> > > developer
> > >> > > > > > friendly.
> > >> > > > > >
> > >> > > > > > At the moment, every new flavour of MXNet has to be entirely
> > >> > > > recompiled.
> > >> > > > > > This makes it hard for users to figure out which options to
> use,
> > >> > > while
> > >> > > > it
> > >> > > > > > makes it harder for us to test since the overhead to test
> every
> > >> > > single
> > >> > > > > > combination of compile parameters would be overwhelming.
> > >> > > > > >
> > >> > > > > > I'd propose to have a clear class hierarchy based structure
> for
> > >> > > > > > accelerators, operators and memory management. This
> structure
> > >> can
> > >> > > then
> > >> > > > be
> > >> > > > > > implemented by the different backends. To reduce the compile
> > >> burden,
> > >> > > we
> > >> > > > > > would introduce dynamic loading and split the different
> > >> backends into
> > >> > > > > > modules. These could then be developed, maintained and
> compiled
> > >> on
> > >> > > > their
> > >> > > > > > own and then placed in a "module" folder to be loaded at
> > >> runtime.
> > >> > > > Adding
> > >> > > > > a
> > >> > > > > > new accelerator would be a matter of placing the precompiled
> > >> binary
> > >> > > > into
> > >> > > > > > the folder. The detailed configuration of that Backend would
> > >> then be
> > >> > > > done
> > >> > > > > > on runtime - the user shouldn't worry at the point of
> > >> downloading
> > >> > > mxnet
> > >> > > > > > whether they want mkl, MKLDNN, mkl, openblas, atlas, TVM,
> cuda
> > >> or
> > >> > > what
> > >> > > > > ever
> > >> > > > > > else there is. I have an idea how we could help the user
> > >> choosing,
> > >> > > but
> > >> > > > > > that's outside the scope of this proposal.
> > >> > > > > >
> > >> > > > > > This would allow us to have a "core" MXNet that takes care
> of
> > >> the
> > >> > > > engine,
> > >> > > > > > scheduling, communication and all the other crucial parts.
> On
> > >> the
> > >> > > other
> > >> > > > > > hand we could make MXNet less of a monolith and have clear
> > >> > > interfaces.
> > >> > > > > This
> > >> > > > > > would also act as a forcing function because the different
> parts
> > >> > > > wouldn't
> > >> > > > > > be intermingled but have to follow the common interface.
> > >> > > > > >
> > >> > > > > > Of course this comes with the question what these interfaces
> > >> would
> > >> > > look
> > >> > > > > > like. For operators, I'd like to propose getting inspiring
> (or
> > >> fully
> > >> > > > > > adapting) ONNX. For memory management and other Backend
> specific
> > >> > > things
> > >> > > > > we
> > >> > > > > > could look at the current implementations and find a common
> > >> ground.
> > >> > > > > >
> > >> > > > > > Back when I had a community driven project, we heavily used
> this
> > >> > > > > modularity
> > >> > > > > > and it brought great benefits - besides the fact that our
> core
> > >> was
> > >> > > > closed
> > >> > > > > > source. It allowed community developers to act entirely
> > >> independent
> > >> > > > from
> > >> > > > > > other parts and even allowed them to add their own logic
> without
> > >> > > having
> > >> > > > > to
> > >> > > > > > touch the core. Thinking about companies that implement
> their
> > >> own
> > >> > > > > backends
> > >> > > > > > or have special tweaked operators without wanting to
> disclose
> > >> them,
> > >> > > > this
> > >> > > > > > structure would avoid them having to fork the project and
> then
> > >> spend
> > >> > > a
> > >> > > > > lot
> > >> > > > > > of effort porting the changes to the latest source release
> > >> versions.
> > >> > > > > > Instead, they would maintain their module and we as MXNet
> > >> community
> > >> > > > would
> > >> > > > > > only have to maintain these interfaces.
> > >> > > > > >
> > >> > > > > > Right now this is a lot of prosa and basically a brain dump
> of
> > >> my
> > >> > > > > thoughts.
> > >> > > > > > I'd be happy to follow up with details, but first I'd be
> > >> curious what
> > >> > > > the
> > >> > > > > > community thinks about this design.
> > >> > > > > >
> > >> > > > > > Best regards,
> > >> > > > > > Marco
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >>
> > >
>

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Posted by Pedro Larroy <pe...@gmail.com>.

I will respond in Slack so that my points don't derail the original
thread's topic.

Looking forward to your proposal.

On Thu, Apr 11, 2019 at 1:00 PM Junru Shao <ju...@gmail.com> wrote:
>
> I don't have any ideas about the following issues:
>
> 1) Reducing the abuse of inlined code by moving more logic to
> implementation files and improving scoping, which will also speed up
> compilation
> 2) Reducing the runtime of some unit tests
> 3) Improving MXNet startup time
>
> I will be super interested to hear your ideas :-)

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Posted by Junru Shao <ju...@gmail.com>.

I don't have any ideas about the following issues:

1) Reducing the abuse of inlined code by moving more logic to
implementation files and improving scoping, which will also speed up
compilation
2) Reducing the runtime of some unit tests
3) Improving MXNet startup time

I will be super interested to hear your ideas :-)


On Thu, Apr 11, 2019 at 12:52 PM Junru Shao <ju...@gmail.com> wrote:

> We have a systematic solution that avoids the ABI headache. I am struggling
> with some errands, and will share our proposal here as soon as I can.
> This will be a very interesting topic to discuss. Let's work hard together
> and make it perfect :-)
>
> On Thu, Apr 11, 2019 at 12:43 PM Pedro Larroy <
> pedro.larroy.lists@gmail.com> wrote:
>
>> Thanks Marco for raising this issue. I think we can certainly make some
>> improvements in modularization and build. At the same time Tianqi's
>> point of view is important to consider and on point. I see a high risk
>> of overengineering in such an endeavor.
>>
>> I also see increased complexity, difficulty debugging, C++ ABI
>> headaches, API compatibility issues, crashes inside a binary module, etc.,
>> which I don't want to deal with as a developer or even as an MXNet
>> user. Does somebody have answers to these problems?
>>
>> If somebody thinks they have a good solution, by all means propose a
>> design in the wiki; I think we are all open. Personally I see several
>> other lower-hanging fruits which need our attention:
>>  * Simplifying our build logic
>>  * CUDA selection in CMake
>>  * Reducing the abuse of inlined code by moving more logic to
>> implementation files and improving scoping, which will also speed up
>> compilation (some units take more than 5 minutes to build and lots of
>> RAM on a top-of-the-line CPU core)
>>  * Reducing the runtime of some unit tests
>>  * Improving MXNet startup time
>>  * Thread safety
>> These and other improvements in our codebase would bring immediate
>> benefits without the risks of overengineering a plugin system. I
>> also question our bandwidth for such an endeavor.
>>
>> I would say, let's apply the KISS principle: let's make the project
>> fast to build, easy to work on, well documented and easy to contribute
>> to before building the next Netscape browser. Otherwise we could save
>> ourselves this exercise and switch to Rust directly.
>>
>> Pedro.
>>
>>
>>
>> On Mon, Apr 8, 2019 at 9:42 AM Tianqi Chen <tq...@cs.washington.edu>
>> wrote:
>> >
>> > Just to clarify: I am not questioning the usefulness of the separation.
>> > I just want to highlight the technical challenges here based on our past
>> > experiences.
>> >
>> > Crossing DLL boundaries in C++ can create quite a lot of problems,
>> > especially when some of the dependencies use a different version of the
>> > compiler or follow static packaging, or simply because of the dynamic
>> > linking differences on Windows. These problems could make this direction
>> > less appealing compared to focusing effort on other things.
>> >
>> > Technically, as a first step, it is possible to make dependency changes
>> > go through registration instead of changing the global header files, so
>> > that changing a certain component won't trigger a global recompile in
>> > CMake. This is also a required step toward some modularity.
>> >
>> > For plugins, solutions that use the C ABI can be used for certain plugin
>> > modules.
>> >
>> > Some of the discussion has been tied to what the interface should look
>> > like. I think we should use different threads for these and put in more
>> > thought.
>> >
>> > Tianqi
>> >
>> >
>> >
>> > On Sun, Apr 7, 2019 at 4:39 PM kellen sunderland <
>> > kellen.sunderland@gmail.com> wrote:
>> >
>> > > I think we can make some incremental progress.  My thoughts were along
>> > > the lines of plugins (thinking about what happens with the VLC
>> > > project).  At process launch time we could gather some information
>> > > about our execution environment (either through configuration, or by
>> > > convention looking at our folder structure and libraries available).
>> > > We could then later load the components we need after understanding if
>> > > we're using a CUDA backend and what operators or subgraph components we
>> > > would need.  Advantages would be that we would move a lot of the
>> > > current conditional compile logic to runtime, and automate a lot of it.
>> > > It would also make packaging binaries for targeted environments a
>> > > little easier.  As an example we could compile once, then remove CUDA
>> > > focused libraries for systems that are going to run on CPUs.
>> > >
>> > > On Sun, Apr 7, 2019 at 2:45 PM Tianqi Chen <tq...@cs.washington.edu>
>> > > wrote:
>> > >
>> > > > While I personally like the idea, this can be fairly technically
>> > > > challenging, and I would caution against this idea vs pushing for
>> > > > good features and just allowing runtime configuration.
>> > > >
>> > > > The main problem here is the C++ ABI. There is no standard C++ ABI
>> > > > across compilers, which means resorting to runtime DLLs and dynamic
>> > > > loading brings all sorts of technical problems, especially when
>> > > > multiple modules depend on the same third-party dependency (CUDA
>> > > > runtime).
>> > > > There is no ready-made solution here, especially given the explosion
>> > > > of the backend variants and dependencies in C++.
>> > > > A partial solution could be achieved through the sole use of the C
>> > > > ABI. Combining this with code generation can result in some
>> > > > simplifications and enable some runtime loadable modules. TVM does
>> > > > this, and perhaps MXNet could reuse some of that component for
>> > > > operator libraries. Similarly, having a customizable operator library
>> > > > that is loadable via the C ABI might be possible.
>> > > >
>> > > > So to summarize, while I really like the idea of dynamically loadable
>> > > > modules, my past experience suggests that this will bring a lot of
>> > > > additional engineering burden and technical debt without significant
>> > > > benefit. I would suggest starting by supporting something simple like
>> > > > a plugin module, before moving toward the general direction.
>> > > >
>> > > > Tianqi
>> > > >
>> > > > On Sun, Apr 7, 2019 at 1:31 PM kellen sunderland <
>> > > > kellen.sunderland@gmail.com> wrote:
>> > > >
>> > > > > Strongly support the idea of runtime loadable components in MXNet.
>> > > > > There's no reason (other than perhaps engineering effort) we can't
>> > > > > have a single compilation of MXNet that finds dependencies and
>> > > > > chooses execution paths intelligently (or based on configuration)
>> > > > > at runtime.
>> > > > >
>> > > > > On Thu, Apr 4, 2019 at 12:29 PM Marco de Abreu
>> > > > > <marcoabreu@apache.org> wrote:
>> > > > >
>> > > > > > Hello,
>> > > > > >
>> > > > > > I'd like to start a discussion about something that I've noticed
>> > > > > > being troublesome to maintain in the current version: Backend
>> > > > > > choices being made at compile time.
>> > > > > >
>> > > > > > Right now, the different backends and accelerators (CPU, CUDA,
>> > > > > > MKL, AWS Elastic Inference, (future) AMD, OpenBLAS, TVM, etc.) are
>> > > > > > all scattered across the different layers of MXNet. On one hand,
>> > > > > > we have compile-time flags that decide which backends are being
>> > > > > > compiled into the binary, while at the same time choices can be
>> > > > > > made in the frontend during runtime.
>> > > > > >
>> > > > > > At the moment, we have a lot of conditional build logic that picks
>> > > > > > different parts. With the addition of MKLML and later MKLDNN the
>> > > > > > clear separation of CPU and GPU got kind of broken up. While we
>> > > > > > have some places where each code lives, in the end we resort to
>> > > > > > some files containing a lot of conditional logic for the different
>> > > > > > backends (sorry I can't provide links right now since I'm on
>> > > > > > mobile). To me this seems like a residue of the fast development
>> > > > > > style from the early days (more preprocessor statements and less
>> > > > > > object orientation) while also having organic growth with new
>> > > > > > accelerators. When I see how much AMD had to hack to fit in their
>> > > > > > implementation, it seemed like we have to make this part more
>> > > > > > developer friendly.
>> > > > > >
>> > > > > > At the moment, every new flavour of MXNet has to be entirely
>> > > > > > recompiled. This makes it hard for users to figure out which
>> > > > > > options to use, while it makes it harder for us to test since the
>> > > > > > overhead to test every single combination of compile parameters
>> > > > > > would be overwhelming.
>> > > > > >
>> > > > > > I'd propose to have a clear class hierarchy based structure for
>> > > > > > accelerators, operators and memory management. This structure can
>> > > > > > then be implemented by the different backends. To reduce the
>> > > > > > compile burden, we would introduce dynamic loading and split the
>> > > > > > different backends into modules. These could then be developed,
>> > > > > > maintained and compiled on their own and then placed in a "module"
>> > > > > > folder to be loaded at
>> runtime.
>> > > > Adding
>> > > > > a
>> > > > > > new accelerator would be a matter of placing the precompiled
>> binary
>> > > > into
>> > > > > > the folder. The detailed configuration of that Backend would
>> then be
>> > > > done
>> > > > > > on runtime - the user shouldn't worry at the point of
>> downloading
>> > > mxnet
>> > > > > > whether they want mkl, MKLDNN, mkl, openblas, atlas, TVM, cuda
>> or
>> > > what
>> > > > > ever
>> > > > > > else there is. I have an idea how we could help the user
>> choosing,
>> > > but
>> > > > > > that's outside the scope of this proposal.
>> > > > > >
>> > > > > > This would allow us to have a "core" MXNet that takes care of
>> the
>> > > > engine,
>> > > > > > scheduling, communication and all the other crucial parts. On
>> the
>> > > other
>> > > > > > hand we could make MXNet less of a monolith and have clear
>> > > interfaces.
>> > > > > This
>> > > > > > would also act as a forcing function because the different parts
>> > > > wouldn't
>> > > > > > be intermingled but have to follow the common interface.
>> > > > > >
>> > > > > > Of course this comes with the question what these interfaces
>> would
>> > > look
>> > > > > > like. For operators, I'd like to propose getting inspiring (or
>> fully
>> > > > > > adapting) ONNX. For memory management and other Backend specific
>> > > things
>> > > > > we
>> > > > > > could look at the current implementations and find a common
>> ground.
>> > > > > >
>> > > > > > Back when I had a community driven project, we heavily used this
>> > > > > modularity
>> > > > > > and it brought great benefits - besides the fact that our core
>> was
>> > > > closed
>> > > > > > source. It allowed community developers to act entirely
>> independent
>> > > > from
>> > > > > > other parts and even allowed them to add their own logic without
>> > > having
>> > > > > to
>> > > > > > touch the core. Thinking about companies that implement their
>> own
>> > > > > backends
>> > > > > > or have special tweaked operators without wanting to disclose
>> them,
>> > > > this
>> > > > > > structure would avoid them having to fork the project and then
>> spend
>> > > a
>> > > > > lot
>> > > > > > of effort porting the changes to the latest source release
>> versions.
>> > > > > > Instead, they would maintain their module and we as MXNet
>> community
>> > > > would
>> > > > > > only have to maintain these interfaces.
>> > > > > >
>> > > > > > Right now this is a lot of prosa and basically a brain dump of
>> my
>> > > > > thoughts.
>> > > > > > I'd be happy to follow up with details, but first I'd be
>> curious what
>> > > > the
>> > > > > > community thinks about this design.
>> > > > > >
>> > > > > > Best regards,
>> > > > > > Marco
>> > > > > >
>> > > > >
>> > > >
>> > >
>>
>

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Posted by Junru Shao <ju...@gmail.com>.

We have a systematic solution that avoids the ABI headache. I am tied up
with some errands, and will share our proposal here as soon as I can.
This will be a very interesting topic to discuss. Let's work hard together
and make it perfect :-)

On Thu, Apr 11, 2019 at 12:43 PM Pedro Larroy <pe...@gmail.com>
wrote:

> Thanks Marco for raising this issue. I think we can certainly do some
> improvements in modularization and build. At the same time Tianqi's
> point of view is important to consider and on point. I see a high risk
> of overengineering in such endeavor.
>
> I also see increased complexity, difficulty debugging, C++ ABI
> headaches, API compatibility, crashes inside a binary module, etc.
> which I don't want to deal with as a developer or even as an MXNet
> user. Does somebody have answers to these problems?
>
> If somebody thinks they have a good solution, by all means propose a
> design in the wiki, I think we are all open. Personally I see several
> other lower hanging fruits which need our attention:
>  * Simplifying our build logic,
>  * Cuda selection in CMake,
>  * Reducing the abuse of inlined code moving more logic to
> implementation files and improve scoping which will also speed up
> compilation, (some units take more than 5 minutes to build and lots of
> RAM in a top of the line CPU core)
>  * Reduce runtime of some unit tests
> And other  improvements in our codebase that would bring immediate
> benefits without the risks of overengineering of a plugin system. I
> also question our bandwidth for such an endeavor.
>  * Improve MXNet startup time.
>  * Thread safety
>
> I would say, let's apply the KISS principle, let's make the project
> fast to build, easy to work on, well documented and easy to contribute
> to before building the next Netscape browser. Otherwise we could save
> ourselves this exercise and switch to Rust directly.
>
> Pedro.

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Posted by Pedro Larroy <pe...@gmail.com>.

Thanks Marco for raising this issue. I think we can certainly make some
improvements in modularization and build. At the same time Tianqi's
point of view is important to consider and on point. I see a high risk
of overengineering in such an endeavor.

I also see increased complexity, difficulty debugging, C++ ABI
headaches, API compatibility, crashes inside a binary module, etc.
which I don't want to deal with as a developer or even as an MXNet
user. Does somebody have answers to these problems?

If somebody thinks they have a good solution, by all means propose a
design in the wiki; I think we are all open. Personally I see several
other lower-hanging fruit which need our attention:
 * Simplifying our build logic
 * CUDA selection in CMake
 * Reducing the abuse of inlined code by moving more logic to
implementation files and improving scoping, which will also speed up
compilation (some units take more than 5 minutes and lots of RAM to
build on a top-of-the-line CPU core)
 * Reducing the runtime of some unit tests
 * Improving MXNet startup time
 * Thread safety
And other improvements in our codebase that would bring immediate
benefits without the risks of overengineering a plugin system. I
also question our bandwidth for such an endeavor.

I would say, let's apply the KISS principle, let's make the project
fast to build, easy to work on, well documented and easy to contribute
to before building the next Netscape browser. Otherwise we could save
ourselves this exercise and switch to Rust directly.

Pedro.



On Mon, Apr 8, 2019 at 9:42 AM Tianqi Chen <tq...@cs.washington.edu> wrote:
>
> Just to clarify. I am not questioning the usefulness of the separation.
> Just want to highlight the technical challenges here based on our past
> experiences.
>
> Crossing DLL boundaries in C++ can create quite a lot of problems,
> especially some of the dependencies used a different version of the
> compiler, follows static packaging or simply because of the dynamic linking
> difference in windows. These problems could make this direction move less
> appealing compared to focusing effort on other things.
>
> Technically, as a first step, it is possible to make dependencies change
> not change the global header files and via registration so that changing
> certain component won't trigger a global recompile in CMake. This is also a
> required step toward some modularity.
>
> For plugins, solutions that use C ABI can be used for certain plugin
> modules.
>
> Some of the discussion has been tied to what the interface should look
> like. I think we should use different threads for these and puts in more
> thoughts.
>
> Tianqi

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Posted by Tianqi Chen <tq...@cs.washington.edu>.

Just to clarify: I am not questioning the usefulness of the separation.
I just want to highlight the technical challenges here based on our past
experience.

Crossing DLL boundaries in C++ can create quite a lot of problems,
especially when some of the dependencies use a different version of the
compiler or follow static packaging, or simply because of the differences
in dynamic linking on Windows. These problems could make this direction less
appealing compared to focusing effort on other things.

Technically, as a first step, it is possible to make dependency changes
not touch the global header files, and go through registration instead, so
that changing a certain component won't trigger a global recompile in CMake.
This is also a required step toward some modularity.
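
As a rough illustration of the registration idea (a sketch only, not MXNet
code; every name below is hypothetical): the core keeps a single lookup
table, and each backend adds itself to it from its own file, so the core
never has to name a backend directly. It is written in Python for brevity;
in C++ the same pattern is commonly done with static registrar objects
defined in each backend's implementation file.

```python
from typing import Callable, Dict, List

# Hypothetical registry: the only thing the "core" knows about backends.
_BACKENDS: Dict[str, Callable[[], str]] = {}


def register_backend(name: str):
    """Return a decorator that records a backend factory under `name`."""
    def decorator(factory: Callable[[], str]) -> Callable[[], str]:
        if name in _BACKENDS:
            raise ValueError(f"backend {name!r} is already registered")
        _BACKENDS[name] = factory
        return factory
    return decorator


def available_backends() -> List[str]:
    """List the names of all backends registered so far."""
    return sorted(_BACKENDS)


def create_backend(name: str) -> str:
    """Look a backend up by name; the core never imports it directly."""
    if name not in _BACKENDS:
        raise LookupError(
            f"no backend named {name!r}; available: {available_backends()}"
        )
    return _BACKENDS[name]()


# A backend would normally live in its own file and register on import.
# The string return value stands in for a real backend object.
@register_backend("cpu")
def _make_cpu_backend() -> str:
    return "cpu-backend"
```

Adding or changing a backend then only touches that backend's own file and
the table entry it creates, not a header shared by everything else.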

For plugins, solutions that use C ABI can be used for certain plugin
modules.
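
A minimal sketch of what loading through a C ABI can look like from the
frontend side, using Python's standard ctypes module. This is not how MXNet
or TVM load their libraries; the helper name load_c_symbol is hypothetical,
and the C math library stands in for a plugin so that the example runs
without building anything.

```python
import ctypes
import ctypes.util


def load_c_symbol(library_name, symbol, restype, argtypes):
    """Try to load `symbol` from a shared library through the C ABI.

    Returns a callable, or None if the library or the symbol is missing,
    so the caller can fall back to another implementation at runtime.
    """
    path = ctypes.util.find_library(library_name)
    if path is None:
        return None  # library not installed on this system
    try:
        lib = ctypes.CDLL(path)
        fn = getattr(lib, symbol)
    except (OSError, AttributeError):
        return None  # library failed to load or does not export the symbol
    # Only plain C types cross the boundary, so compiler-specific C++
    # details (name mangling, STL layout) never come into play.
    fn.restype = restype
    fn.argtypes = argtypes
    return fn


# Stand-in for a plugin entry point: `cos` from the C math library.
cos = load_c_symbol("m", "cos", ctypes.c_double, [ctypes.c_double])
if cos is not None:
    print(cos(0.0))
else:
    print("math library not found; a fallback would be used here")
```

The point of the design is that a missing or incompatible library is
detected at runtime and reported as "not available" instead of failing the
build or the whole process.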

Some of the discussion has been tied to what the interface should look
like. I think we should use different threads for these and put more
thought into them.

Tianqi



On Sun, Apr 7, 2019 at 4:39 PM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> I think we can make some incremental progress.  My thoughts were along the
> lines of plugins (thinking about what happens with the VLC project).  At
> process launch time we could gather some information about our execution
> environment (either through configuration, or by convention looking at our
> folder structure and libraries available).  We could then later load the
> components we need after understanding if we're using a CUDA backend and
> what operators or subgraph components we would need.  Advantages would be
> that we would move a lot of the current conditional compile logic to
> runtime, and automate a lot of it.  It would also make packaging binaries
> for targeted environments a little easier.  As an example we could compile
> once, then remove CUDA focused libraries for systems that are going to run
> on CPUs.
>
> On Sun, Apr 7, 2019 at 2:45 PM Tianqi Chen <tq...@cs.washington.edu>
> wrote:
>
> > While I personally like the idea. This can be something that is fairly
> > technical challenging and I would caution against this idea vs pushing
> for
> > good features and just allow runtime configuration.
> >
> > The main problem here is due to the C++ ABI. There is no standard c++ ABI
> > across compilers, which means resorting to runtime DLL and dynamic
> loading
> > brings all sorts of technical problems, especially when multiple modules
> > depend on the same third dependency(CUDA runtime).
> > There is no good to go solution can be made here, especially given the
> > explosion of the backend variants and dependencies in C++.
> > A partial solution could be achieved, through the sole use of C ABI.
> > Combing this with code generation can result in some simplifications and
> > enable some runtime loadable module. TVM does this, and perhaps MXNet
> could
> > reuse some of that component for operator libraries. Similarly, having a
> > customizable operator library that is loadable via C ABI might be
> possible.
> >
> > So to summarize, while I really like the idea of dynamically loadable
> > modules. My past experience suggests that this will bring a lot of
> > additional engineering burden and technical debts without significant
> > benefit. I would suggest starting by supporting something simple like a
> > plugin module, before moving toward the general direction.
> >
> > Tianqi
>

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Posted by kellen sunderland <ke...@gmail.com>.

I think we can make some incremental progress.  My thoughts were along the
lines of plugins (similar to how the VLC project handles them).  At
process launch time we could gather some information about our execution
environment (either through configuration, or by convention, looking at our
folder structure and the libraries available).  We could then load the
components we need once we know whether we're using a CUDA backend and
which operators or subgraph components we need.  The advantages would be
that we would move a lot of the current conditional compile logic to
runtime, and automate a lot of it.  It would also make packaging binaries
for targeted environments a little easier.  As an example, we could compile
once, then remove CUDA-focused libraries for systems that are going to run
on CPUs.
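
A minimal sketch of the "by convention" discovery step described above,
written in Python for brevity. Everything named here is an assumption made
for illustration: the `libmxnet_backend_<name>` file naming scheme, the
function names, and the idea of a single module folder are hypothetical and
not part of MXNet. The sketch only finds candidate libraries and applies an
optional explicit configuration; it does not load anything.

```python
import os

# Hypothetical naming convention: libmxnet_backend_<name>.<so|dylib|dll>
PREFIX = "libmxnet_backend_"
SUFFIXES = (".so", ".dylib", ".dll")


def discover_backends(module_dir):
    """Return {backend name: library path} for files matching the convention."""
    found = {}
    if not os.path.isdir(module_dir):
        return found
    for entry in sorted(os.listdir(module_dir)):
        if not entry.startswith(PREFIX):
            continue
        for suffix in SUFFIXES:
            if entry.endswith(suffix):
                name = entry[len(PREFIX):-len(suffix)]
                if name:  # ignore a bare "libmxnet_backend_.so"
                    found[name] = os.path.join(module_dir, entry)
                break
    return found


def select_backends(found, requested=None):
    """Explicit configuration wins; otherwise use everything discovered."""
    if requested is None:
        return dict(found)
    missing = [name for name in requested if name not in found]
    if missing:
        raise LookupError("requested backends not found: " + ", ".join(missing))
    return {name: found[name] for name in requested}
```

With this shape, the packaging example from the message falls out naturally:
deleting the CUDA library from the folder simply makes `discover_backends`
stop reporting it, with no recompilation of the core.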

On Sun, Apr 7, 2019 at 2:45 PM Tianqi Chen <tq...@cs.washington.edu> wrote:

> While I personally like the idea, this can be fairly technically
> challenging, and I would caution against it in favor of pushing for
> good features and just allowing runtime configuration.
>
> The main problem here is the C++ ABI. There is no standard C++ ABI
> across compilers, which means resorting to runtime DLLs and dynamic loading
> brings all sorts of technical problems, especially when multiple modules
> depend on the same third-party dependency (CUDA runtime).
> There is no ready-made solution here, especially given the
> explosion of backend variants and dependencies in C++.
> A partial solution could be achieved through the sole use of a C ABI.
> Combining this with code generation can result in some simplifications and
> enable some runtime-loadable modules. TVM does this, and perhaps MXNet could
> reuse some of that component for operator libraries. Similarly, having a
> customizable operator library that is loadable via a C ABI might be possible.
>
> So to summarize: while I really like the idea of dynamically loadable
> modules, my past experience suggests that this will bring a lot of
> additional engineering burden and technical debt without significant
> benefit. I would suggest starting by supporting something simple like a
> plugin module, before moving toward the general direction.
>
> Tianqi

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Posted by Tianqi Chen <tq...@cs.washington.edu>.

While I personally like the idea, this can be fairly technically
challenging, and I would caution against it in favor of pushing for
good features and just allowing runtime configuration.

The main problem here is the C++ ABI. There is no standard C++ ABI
across compilers, which means resorting to runtime DLLs and dynamic loading
brings all sorts of technical problems, especially when multiple modules
depend on the same third-party dependency (CUDA runtime).
There is no ready-made solution here, especially given the
explosion of backend variants and dependencies in C++.
A partial solution could be achieved through the sole use of a C ABI.
Combining this with code generation can result in some simplifications and
enable some runtime-loadable modules. TVM does this, and perhaps MXNet could
reuse some of that component for operator libraries. Similarly, having a
customizable operator library that is loadable via a C ABI might be possible.
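
As a rough illustration of why a plain C ABI is the easier boundary, the
sketch below loads a shared library at runtime and checks that it exports
the expected C symbols before using it. It is written in Python with the
standard `ctypes` module and is not how TVM or MXNet implement this. The
helper name `load_c_plugin` is hypothetical, and the demo assumes a POSIX
system, where passing `None` opens the symbols already loaded into the
current process (including the C library's `strlen`).

```python
import ctypes


def load_c_plugin(path, required_symbols):
    """Open a shared library and verify it exports the given C symbols.

    path=None opens the current process itself (POSIX only), which keeps
    this demo free of any extra build step.
    """
    lib = ctypes.CDLL(path)
    # ctypes raises AttributeError for unknown symbols, so hasattr() works
    # as an "is this symbol exported?" check.
    missing = [name for name in required_symbols if not hasattr(lib, name)]
    if missing:
        raise ImportError("plugin is missing C symbols: " + ", ".join(missing))
    return lib


# Demo: treat the already-loaded C library as the "plugin".
libc = load_c_plugin(None, ["strlen"])
# A C ABI carries no type information, so the caller declares signatures.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
```

Because only C symbols and plain C types cross the boundary, the library can
be built with any compiler; the cost is that signatures have to be declared
and kept in sync by hand or by code generation.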

So to summarize: while I really like the idea of dynamically loadable
modules, my past experience suggests that this will bring a lot of
additional engineering burden and technical debt without significant
benefit. I would suggest starting by supporting something simple like a
plugin module, before moving toward the general direction.

Tianqi

On Sun, Apr 7, 2019 at 1:31 PM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Strongly support the idea of runtime loadable components in MXNet.  There's
> no reason (other than perhaps engineering effort) we can't have a single
> compilation of MXNet that finds dependencies and chooses execution paths
> intelligently (or based on configuration) at runtime.

Re: [MXNET 2.0 Wishlist] [DISCUSS] Backend choices during runtime

Posted by kellen sunderland <ke...@gmail.com>.

Strongly support the idea of runtime loadable components in MXNet.  There's
no reason (other than perhaps engineering effort) we can't have a single
compilation of MXNet that finds dependencies and chooses execution paths
intelligently (or based on configuration) at runtime.

On Thu, Apr 4, 2019 at 12:29 PM Marco de Abreu <ma...@apache.org>
wrote:

> Hello,
>
> I'd like to start a discussion about something that I've noticed being
> troublesome to maintain in the current version: Backend choices being made
> at compile time.
>
> Right now, the different backends and accelerators (CPU, cuda, mkl, AWS
> elastic inference, (future) AMD, openblas,TVM, etc) are all scattered
> across the different layers of MXNet. On one hand, we have compile time
> flags that decide which backends are being compiled into the binary, while
> at the same time choices can be made in the frontend during runtime.
>
> At the moment, we have a lot of conditional build logic that picks
> different parts. With the addition of MKLML and later MKLDNN the clear
> separation of CPU and GPU got kind of broken up. While we have some places
> where each code lives, in the end we resort to some files containing a lot
> of conditional logic for the different backends (sorry I can't provide
> links right now since I'm on mobile). To me this seems like a residue of
> the fast development style from the early days (more preprocessor statements
> and less object orientation) while also having organic growth with new
> accelerators. When I see how much AMD had to hack to fit in their
> implementation, it seemed like we have to make this part more developer
> friendly.
>
> At the moment, every new flavour of MXNet has to be entirely recompiled.
> This makes it hard for users to figure out which options to use, while it
> makes it harder for us to test since the overhead to test every single
> combination of compile parameters would be overwhelming.
>
> I'd propose to have a clear class hierarchy based structure for
> accelerators, operators and memory management. This structure can then be
> implemented by the different backends. To reduce the compile burden, we
> would introduce dynamic loading and split the different backends into
> modules. These could then be developed, maintained and compiled on their
> own and then placed in a "module" folder to be loaded at runtime. Adding a
> new accelerator would be a matter of placing the precompiled binary into
> the folder. The detailed configuration of that Backend would then be done
> at runtime - the user shouldn't worry at the point of downloading mxnet
> whether they want mkl, MKLDNN, openblas, atlas, TVM, cuda or whatever
> else there is. I have an idea how we could help the user choose, but
> that's outside the scope of this proposal.
>
> This would allow us to have a "core" MXNet that takes care of the engine,
> scheduling, communication and all the other crucial parts. On the other
> hand we could make MXNet less of a monolith and have clear interfaces. This
> would also act as a forcing function because the different parts wouldn't
> be intermingled but have to follow the common interface.
>
> Of course this comes with the question of what these interfaces would look
> like. For operators, I'd like to propose taking inspiration from (or fully
> adopting) ONNX. For memory management and other Backend specific things we
> could look at the current implementations and find a common ground.
>
> Back when I had a community driven project, we heavily used this modularity
> and it brought great benefits - besides the fact that our core was closed
> source. It allowed community developers to act entirely independent from
> other parts and even allowed them to add their own logic without having to
> touch the core. Thinking about companies that implement their own backends
> or have special tweaked operators without wanting to disclose them, this
> structure would avoid them having to fork the project and then spend a lot
> of effort porting the changes to the latest source release versions.
> Instead, they would maintain their module and we as MXNet community would
> only have to maintain these interfaces.
>
> Right now this is a lot of prose and basically a brain dump of my thoughts.
> I'd be happy to follow up with details, but first I'd be curious what the
> community thinks about this design.
>
> Best regards,
> Marco
>
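
The class-hierarchy idea in the quoted proposal could look roughly like the
sketch below. It is shown in Python purely to keep it short; the real
interfaces would presumably live in C++ or behind a C ABI. All names
(`Backend`, `CpuBackend`, `register_backend`, `get_backend`) and the two
methods are hypothetical and only illustrate the shape: a common interface
covering memory management and operator execution, with backends registering
themselves against it at runtime.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical common interface that every backend module implements."""

    name = "abstract"

    @abstractmethod
    def allocate(self, num_bytes):
        """Return a buffer of num_bytes owned by this backend."""

    @abstractmethod
    def run(self, op_name, *inputs):
        """Execute the named operator on the given inputs."""


class CpuBackend(Backend):
    """Toy reference backend using plain Python lists."""

    name = "cpu"

    def allocate(self, num_bytes):
        return bytearray(num_bytes)

    def run(self, op_name, *inputs):
        ops = {"add": lambda a, b: [x + y for x, y in zip(a, b)]}
        if op_name not in ops:
            raise NotImplementedError(op_name + " is not supported on " + self.name)
        return ops[op_name](*inputs)


_REGISTRY = {}


def register_backend(backend):
    """Called by a module when it is loaded at runtime."""
    _REGISTRY[backend.name] = backend


def get_backend(name):
    """Look up a backend chosen at runtime, e.g. from user configuration."""
    if name not in _REGISTRY:
        raise LookupError("no backend registered under " + repr(name))
    return _REGISTRY[name]
```

The core would only ever talk to `Backend`; a new accelerator then means a
new subclass plus a `register_backend` call, without touching the core.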