Posted to dev@mxnet.apache.org by Seb Kiureghian <se...@gmail.com> on 2017/10/03 03:02:20 UTC

Re: What's everyone working on?

It would be awesome if MXNet were the first DL framework to support Nvidia
Volta. What do you all think about cutting a v0.12 release once that
integration is ready?

On Wed, Sep 27, 2017 at 10:38 PM, Jun Wu <wu...@gmail.com> wrote:

> I had been working on the sparse tensor project with Haibin. After it was
> wrapped up for the first stage, I started my work on the quantization
> project (INT-8 inference). The benefits of using quantized models for
> inference include much higher inference throughput than FP32 model with
> acceptable accuracy loss and compact models saved on small devices. The
> work currently aims at quantizing ConvNets, and we will consider expanding
> it to RNN networks after getting good results for images. Meanwhile, it's
> expected to support quantization on CPU, GPU, and mobile devices.
>
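
For context, the INT-8 idea Jun describes above boils down to mapping FP32
values onto 8-bit integers with a per-tensor scale, so the heavy matrix math
can run in int8 while results are still interpreted in FP32. Below is a
minimal, framework-agnostic NumPy sketch (illustrative only, not the actual
MXNet implementation; the function names are made up):

    import numpy as np

    def quantize_int8(x):
        # Symmetric per-tensor quantization: largest magnitude maps to +/-127.
        scale = np.abs(x).max() / 127.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # Recover an FP32 approximation of the original tensor.
        return q.astype(np.float32) * scale

    # Toy check of the accuracy loss on a random "weight" tensor.
    w = np.random.randn(64, 64).astype(np.float32)
    q, s = quantize_int8(w)
    print("max abs error:", np.abs(w - dequantize(q, s)).max())

The throughput gain comes from doing convolutions and matrix multiplications
with int8 operands and int32 accumulators; the scales keep the final outputs
in FP32, which is why the accuracy loss can stay acceptable.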

Re: What's everyone working on?

Posted by Chris Olivier <cj...@gmail.com>.
+1


On Mon, Oct 2, 2017 at 8:04 PM Dominic Divakaruni <
dominic.divakaruni@gmail.com> wrote:

> 👏
>
> On Mon, Oct 2, 2017 at 8:02 PM Seb Kiureghian <se...@gmail.com> wrote:
>
> > It would be awesome if MXNet were the first DL framework to support
> Nvidia
> > Volta. What do you all think about cutting a v0.12 release once that
> > integration is ready?
> >
> > On Wed, Sep 27, 2017 at 10:38 PM, Jun Wu <wu...@gmail.com> wrote:
> >
> > > I had been working on the sparse tensor project with Haibin. After it
> was
> > > wrapped up for the first stage, I started my work on the quantization
> > > project (INT-8 inference). The benefits of using quantized models for
> > > inference include much higher inference throughput than FP32 model with
> > > acceptable accuracy loss and compact models saved on small devices. The
> > > work currently aims at quantizing ConvNets, and we will consider
> > expanding
> > > it to RNN networks after getting good results for images. Meanwhile,
> it's
> > > expected to support quantization on CPU, GPU, and mobile devices.
> > >
> >
> --
>
>
> Dominic Divakaruni
> 206.475.9200 Cell
>

Re: What's everyone working on?

Posted by Dominic Divakaruni <do...@gmail.com>.
On a separate but equally exciting note, how about we start talking about a
future 1.0 release and what everyone would want it to look like? I'll start a
separate thread. :)

On Mon, Oct 2, 2017 at 9:07 PM, Dominic Divakaruni <
dominic.divakaruni@gmail.com> wrote:

> Seb is talking about support for Cuda 9 and cuDNN 7. Pull requests below.
> @ptrendx and Dick Carter are working through some performance issues but
> should be done in a week (hopefully).
>
> Jun, Bhavin,
> Tensor RT runtime is a different subject. Nvidia is helping build a
> converter for MXNet models. Not sure on the ETA. Tensor RT helps accelerate
> vision models on the V100, TX2, P4/40 etc...
>
>
>    - Enabling persistent batch norm with cuDNN 7:
>    https://github.com/apache/incubator-mxnet/pull/7876
>    - Making mixed precision work with all optimizers:
>    https://github.com/apache/incubator-mxnet/pull/7654
>    - Faster IO pipeline needed for Volta:
>    https://github.com/apache/incubator-mxnet/pull/7152
>    - Expose Tell in RecordIO reader:
>    https://github.com/dmlc/dmlc-core/pull/301
>
>
> On Mon, Oct 2, 2017 at 8:44 PM, Bhavin Thaker <bh...@gmail.com>
> wrote:
>
>> Hi Seb: please use a different email thread for new topics of discussion.
>>
>> Hi Jun: I think Seb may be referring to Volta V100 support in MXNet and
>> NOT
>> P4/P40 inference accelerators.
>>
>> Corrections/clarifications welcome.
>>
>> Bhavin Thaker.
>>
>> On Mon, Oct 2, 2017 at 8:22 PM Jun Wu <wu...@gmail.com> wrote:
>>
>> > Thanks for your attention, Seb. We are inclined to be cautious on what
>> can
>> > claim for this project. TensorRT has already supported converting
>> > TensorFlow and Caffe models to its compatible format for fast inference,
>> > but not MXNet. In this sense, it may not be fair to claim MXNet as the
>> > first one supporting Nvidia Volta.
>> >
>> > What we are working on is more experimental and research oriented. We
>> want
>> > to get the first-hand materials in our own hands by building a INT-8
>> > inference prototype and have a thorough understanding on its strength
>> and
>> > limitation, rather than handing it off completely to TensorRT, which is
>> > transparent to us. Considering that the project is experimental, it's
>> still
>> > too early to make a conclusion here as there are plenty of known/unknown
>> > issues and unfinished work.
>> >
>> > On the other hand, we are glad to hear that Nvidia is working on
>> supporting
>> > model conversion from MXNet to TensorRT (Dom please correct me if I'm
>> > mistaken). It would be super beneficial to MXNet on INT-8 if they could
>> > open-source their work as we would be able to maintain and add new
>> features
>> > on our side.
>> >
>> >
>> > On Mon, Oct 2, 2017 at 8:04 PM, Dominic Divakaruni <
>> > dominic.divakaruni@gmail.com> wrote:
>> >
>> > > 👏
>> > >
>> > > On Mon, Oct 2, 2017 at 8:02 PM Seb Kiureghian <se...@gmail.com>
>> > wrote:
>> > >
>> > > > It would be awesome if MXNet were the first DL framework to support
>> > > Nvidia
>> > > > Volta. What do you all think about cutting a v0.12 release once that
>> > > > integration is ready?
>> > > >
>> > > > On Wed, Sep 27, 2017 at 10:38 PM, Jun Wu <wu...@gmail.com>
>> wrote:
>> > > >
>> > > > > I had been working on the sparse tensor project with Haibin.
>> After it
>> > > was
>> > > > > wrapped up for the first stage, I started my work on the
>> quantization
>> > > > > project (INT-8 inference). The benefits of using quantized models
>> for
>> > > > > inference include much higher inference throughput than FP32 model
>> > with
>> > > > > acceptable accuracy loss and compact models saved on small
>> devices.
>> > The
>> > > > > work currently aims at quantizing ConvNets, and we will consider
>> > > > expanding
>> > > > > it to RNN networks after getting good results for images.
>> Meanwhile,
>> > > it's
>> > > > > expected to support quantization on CPU, GPU, and mobile devices.
>> > > > >
>> > > >
>> > > --
>> > >
>> > >
>> > > Dominic Divakaruni
>> > > 206.475.9200 Cell
>> > >
>> >
>>
>
>
>
> --
>
>
> Dominic Divakaruni
> 206.475.9200 Cell
>



-- 


Dominic Divakaruni
206.475.9200 Cell

Re: What's everyone working on?

Posted by Dominic Divakaruni <do...@gmail.com>.
Seb is talking about support for CUDA 9 and cuDNN 7. Pull requests below.
@ptrendx and Dick Carter are working through some performance issues but
should be done in a week (hopefully).

Jun, Bhavin,
The TensorRT runtime is a different subject. Nvidia is helping build a
converter for MXNet models; not sure about the ETA. TensorRT helps accelerate
vision models on the V100, TX2, P4/P40, etc.


   - Enabling persistent batch norm with cuDNN 7:
   https://github.com/apache/incubator-mxnet/pull/7876
   - Making mixed precision work with all optimizers (see the sketch after this list):
   https://github.com/apache/incubator-mxnet/pull/7654
   - Faster IO pipeline needed for Volta:
   https://github.com/apache/incubator-mxnet/pull/7152
   - Expose Tell in RecordIO reader:
   https://github.com/dmlc/dmlc-core/pull/301
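
For readers unfamiliar with the mixed-precision item above: the usual trick is
to run forward/backward in FP16 but keep an FP32 master copy of the weights
for the optimizer update, so tiny gradient contributions are not lost to FP16
rounding. A rough NumPy sketch of that idea (illustrative only, under the
assumption that this is the pattern the PR adopts; it is not the actual MXNet
optimizer code):

    import numpy as np

    def sgd_step_mixed_precision(weight_fp16, master_fp32, grad_fp16, lr):
        # Accumulate the update in FP32, then refresh the FP16 copy used for
        # compute from the FP32 master weights.
        master_fp32 -= lr * grad_fp16.astype(np.float32)
        weight_fp16[:] = master_fp32.astype(np.float16)
        return weight_fp16, master_fp32

    # Toy usage: updates this small would round away in pure FP16 arithmetic.
    w16 = np.ones(4, dtype=np.float16)
    w32 = w16.astype(np.float32)
    g16 = np.full(4, 1e-4, dtype=np.float16)
    for _ in range(100):
        w16, w32 = sgd_step_mixed_precision(w16, w32, g16, lr=0.1)
    print(w32)  # drifts below 1.0 as expected; pure FP16 would stay at 1.0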


On Mon, Oct 2, 2017 at 8:44 PM, Bhavin Thaker <bh...@gmail.com>
wrote:

> Hi Seb: please use a different email thread for new topics of discussion.
>
> Hi Jun: I think Seb may be referring to Volta V100 support in MXNet and NOT
> P4/P40 inference accelerators.
>
> Corrections/clarifications welcome.
>
> Bhavin Thaker.
>
> On Mon, Oct 2, 2017 at 8:22 PM Jun Wu <wu...@gmail.com> wrote:
>
> > Thanks for your attention, Seb. We are inclined to be cautious on what
> can
> > claim for this project. TensorRT has already supported converting
> > TensorFlow and Caffe models to its compatible format for fast inference,
> > but not MXNet. In this sense, it may not be fair to claim MXNet as the
> > first one supporting Nvidia Volta.
> >
> > What we are working on is more experimental and research oriented. We
> want
> > to get the first-hand materials in our own hands by building a INT-8
> > inference prototype and have a thorough understanding on its strength and
> > limitation, rather than handing it off completely to TensorRT, which is
> > transparent to us. Considering that the project is experimental, it's
> still
> > too early to make a conclusion here as there are plenty of known/unknown
> > issues and unfinished work.
> >
> > On the other hand, we are glad to hear that Nvidia is working on
> supporting
> > model conversion from MXNet to TensorRT (Dom please correct me if I'm
> > mistaken). It would be super beneficial to MXNet on INT-8 if they could
> > open-source their work as we would be able to maintain and add new
> features
> > on our side.
> >
> >
> > On Mon, Oct 2, 2017 at 8:04 PM, Dominic Divakaruni <
> > dominic.divakaruni@gmail.com> wrote:
> >
> > > 👏
> > >
> > > On Mon, Oct 2, 2017 at 8:02 PM Seb Kiureghian <se...@gmail.com>
> > wrote:
> > >
> > > > It would be awesome if MXNet were the first DL framework to support
> > > Nvidia
> > > > Volta. What do you all think about cutting a v0.12 release once that
> > > > integration is ready?
> > > >
> > > > On Wed, Sep 27, 2017 at 10:38 PM, Jun Wu <wu...@gmail.com>
> wrote:
> > > >
> > > > > I had been working on the sparse tensor project with Haibin. After
> it
> > > was
> > > > > wrapped up for the first stage, I started my work on the
> quantization
> > > > > project (INT-8 inference). The benefits of using quantized models
> for
> > > > > inference include much higher inference throughput than FP32 model
> > with
> > > > > acceptable accuracy loss and compact models saved on small devices.
> > The
> > > > > work currently aims at quantizing ConvNets, and we will consider
> > > > expanding
> > > > > it to RNN networks after getting good results for images.
> Meanwhile,
> > > it's
> > > > > expected to support quantization on CPU, GPU, and mobile devices.
> > > > >
> > > >
> > > --
> > >
> > >
> > > Dominic Divakaruni
> > > 206.475.9200 Cell
> > >
> >
>



-- 


Dominic Divakaruni
206.475.9200 Cell

Re: What's everyone working on?

Posted by Bhavin Thaker <bh...@gmail.com>.
Hi Seb: please use a different email thread for new topics of discussion.

Hi Jun: I think Seb may be referring to Volta V100 support in MXNet and NOT
P4/P40 inference accelerators.

Corrections/clarifications welcome.

Bhavin Thaker.

On Mon, Oct 2, 2017 at 8:22 PM Jun Wu <wu...@gmail.com> wrote:

> Thanks for your attention, Seb. We are inclined to be cautious on what can
> claim for this project. TensorRT has already supported converting
> TensorFlow and Caffe models to its compatible format for fast inference,
> but not MXNet. In this sense, it may not be fair to claim MXNet as the
> first one supporting Nvidia Volta.
>
> What we are working on is more experimental and research oriented. We want
> to get the first-hand materials in our own hands by building a INT-8
> inference prototype and have a thorough understanding on its strength and
> limitation, rather than handing it off completely to TensorRT, which is
> transparent to us. Considering that the project is experimental, it's still
> too early to make a conclusion here as there are plenty of known/unknown
> issues and unfinished work.
>
> On the other hand, we are glad to hear that Nvidia is working on supporting
> model conversion from MXNet to TensorRT (Dom please correct me if I'm
> mistaken). It would be super beneficial to MXNet on INT-8 if they could
> open-source their work as we would be able to maintain and add new features
> on our side.
>
>
> On Mon, Oct 2, 2017 at 8:04 PM, Dominic Divakaruni <
> dominic.divakaruni@gmail.com> wrote:
>
> > 👏
> >
> > On Mon, Oct 2, 2017 at 8:02 PM Seb Kiureghian <se...@gmail.com>
> wrote:
> >
> > > It would be awesome if MXNet were the first DL framework to support
> > Nvidia
> > > Volta. What do you all think about cutting a v0.12 release once that
> > > integration is ready?
> > >
> > > On Wed, Sep 27, 2017 at 10:38 PM, Jun Wu <wu...@gmail.com> wrote:
> > >
> > > > I had been working on the sparse tensor project with Haibin. After it
> > was
> > > > wrapped up for the first stage, I started my work on the quantization
> > > > project (INT-8 inference). The benefits of using quantized models for
> > > > inference include much higher inference throughput than FP32 model
> with
> > > > acceptable accuracy loss and compact models saved on small devices.
> The
> > > > work currently aims at quantizing ConvNets, and we will consider
> > > expanding
> > > > it to RNN networks after getting good results for images. Meanwhile,
> > it's
> > > > expected to support quantization on CPU, GPU, and mobile devices.
> > > >
> > >
> > --
> >
> >
> > Dominic Divakaruni
> > 206.475.9200 Cell
> >
>

Re: What's everyone working on?

Posted by Jun Wu <wu...@gmail.com>.
Thanks for your attention, Seb. We are inclined to be cautious about what we
can claim for this project. TensorRT already supports converting TensorFlow
and Caffe models to its compatible format for fast inference, but not MXNet
models. In this sense, it may not be fair to claim MXNet as the first
framework supporting Nvidia Volta.

What we are working on is more experimental and research oriented. We want
first-hand experience, gained by building an INT-8 inference prototype
ourselves and developing a thorough understanding of its strengths and
limitations, rather than handing the problem off completely to TensorRT,
whose internals are opaque to us. Since the project is experimental, it's
still too early to draw conclusions here; there are plenty of known and
unknown issues and unfinished work.

On the other hand, we are glad to hear that Nvidia is working on supporting
model conversion from MXNet to TensorRT (Dom, please correct me if I'm
mistaken). It would be super beneficial to MXNet's INT-8 story if they could
open-source that work, as we would then be able to maintain it and add new
features on our side.
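
To make the "prototype and measure" approach concrete: a common first step in
an INT-8 inference prototype is post-training calibration, where activation
ranges are collected on a small calibration set, turned into per-tensor
scales, and the quantized layer's output is compared against the FP32 output
to quantify the accuracy loss. A hedged, framework-agnostic NumPy sketch of
that loop (illustrative only, not the project's actual code):

    import numpy as np

    def calibrate_scale(batches):
        # Naive max-abs calibration: one symmetric scale for the whole tensor.
        return max(np.abs(b).max() for b in batches) / 127.0

    def quantized_dense(x, w, x_scale, w_scale):
        # int8 inputs/weights, int32 accumulation, FP32 output.
        xq = np.clip(np.round(x / x_scale), -127, 127).astype(np.int8)
        wq = np.clip(np.round(w / w_scale), -127, 127).astype(np.int8)
        acc = xq.astype(np.int32) @ wq.astype(np.int32)
        return acc.astype(np.float32) * (x_scale * w_scale)

    rng = np.random.default_rng(0)
    w = rng.standard_normal((128, 10)).astype(np.float32)
    calib = [rng.standard_normal((32, 128)).astype(np.float32) for _ in range(8)]
    x_scale, w_scale = calibrate_scale(calib), np.abs(w).max() / 127.0
    x = rng.standard_normal((32, 128)).astype(np.float32)
    ref, approx = x @ w, quantized_dense(x, w, x_scale, w_scale)
    print("relative error:", np.abs(ref - approx).max() / np.abs(ref).max())

Measuring this kind of error layer by layer, and end to end on real models, is
what tells you where INT-8 holds up and where it does not.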


On Mon, Oct 2, 2017 at 8:04 PM, Dominic Divakaruni <
dominic.divakaruni@gmail.com> wrote:

> 👏
>
> On Mon, Oct 2, 2017 at 8:02 PM Seb Kiureghian <se...@gmail.com> wrote:
>
> > It would be awesome if MXNet were the first DL framework to support
> Nvidia
> > Volta. What do you all think about cutting a v0.12 release once that
> > integration is ready?
> >
> > On Wed, Sep 27, 2017 at 10:38 PM, Jun Wu <wu...@gmail.com> wrote:
> >
> > > I had been working on the sparse tensor project with Haibin. After it
> was
> > > wrapped up for the first stage, I started my work on the quantization
> > > project (INT-8 inference). The benefits of using quantized models for
> > > inference include much higher inference throughput than FP32 model with
> > > acceptable accuracy loss and compact models saved on small devices. The
> > > work currently aims at quantizing ConvNets, and we will consider
> > expanding
> > > it to RNN networks after getting good results for images. Meanwhile,
> it's
> > > expected to support quantization on CPU, GPU, and mobile devices.
> > >
> >
> --
>
>
> Dominic Divakaruni
> 206.475.9200 Cell
>

Re: What's everyone working on?

Posted by Dominic Divakaruni <do...@gmail.com>.
👏

On Mon, Oct 2, 2017 at 8:02 PM Seb Kiureghian <se...@gmail.com> wrote:

> It would be awesome if MXNet were the first DL framework to support Nvidia
> Volta. What do you all think about cutting a v0.12 release once that
> integration is ready?
>
> On Wed, Sep 27, 2017 at 10:38 PM, Jun Wu <wu...@gmail.com> wrote:
>
> > I had been working on the sparse tensor project with Haibin. After it was
> > wrapped up for the first stage, I started my work on the quantization
> > project (INT-8 inference). The benefits of using quantized models for
> > inference include much higher inference throughput than FP32 model with
> > acceptable accuracy loss and compact models saved on small devices. The
> > work currently aims at quantizing ConvNets, and we will consider
> expanding
> > it to RNN networks after getting good results for images. Meanwhile, it's
> > expected to support quantization on CPU, GPU, and mobile devices.
> >
>
-- 


Dominic Divakaruni
206.475.9200 Cell