Posted to dev@mxnet.apache.org by Anirudh Subramanian <an...@gmail.com> on 2019/04/29 10:59:43 UTC

Proposal for Conversion from FP32 to Mixed Precision Models

Hi all,

I have created a doc for conversion from FP32 to Mixed Precision Models:
https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models

I look forward to your feedback on the same.

Thanks,
Anirudh

Re: Proposal for Conversion from FP32 to Mixed Precision Models

Posted by Anirudh Subramanian <an...@gmail.com>.
Hi Tao,

I covered in the doc that it is specifically about inference. I can add
another section to the FAQ to explain why INT8 quantization is not included.

Anirudh

RE: Proposal for Conversion from FP32 to Mixed Precision Models

Posted by "Lv, Tao A" <ta...@intel.com>.
Thank you, Anirudh! I'm just a little surprised that when we talk about mixed precision models we don't talk about training, and when we talk about inference, INT8 quantization is not mentioned.

Re: Proposal for Conversion from FP32 to Mixed Precision Models

Posted by Anirudh Subramanian <an...@gmail.com>.
Hi Zach,

I checked the QuantizeGraph pass and I think it could probably benefit from
a CSE pass to eliminate the additional quantize/quantize_v2 nodes. Having
said that, I think it may still be overkill to add another NNVM pass just to
have a generic common subexpression elimination pass. Currently, this
elimination logic takes only an additional 3 to 6 lines of code in each of
the two NNVM passes. Also, a generic common subexpression elimination pass
has its own associated maintenance costs. I think it is better to continue
with the current approach and revisit this need in the future as we add more
NNVM passes.
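
For illustration, here is a rough Python-style sketch of that inline
elimination logic (hypothetical names; the real ReducePrecision and
QuantizeGraph passes are NNVM passes written in C++): a cast of a given entry
to a given dtype is created once and then reused.

    # Cache casts by (source entry, target dtype) so each cast node is built once.
    cast_cache = {}

    def cast_once(entry, target_dtype, make_cast_node):
        """Return a cached cast of `entry` to `target_dtype`, creating it on
        first use. `make_cast_node` would build e.g. an amp_cast or
        quantize_v2 node; both names here are placeholders."""
        key = (entry, target_dtype)
        if key not in cast_cache:
            cast_cache[key] = make_cast_node(entry, target_dtype)
        return cast_cache[key]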

Anirudh

Re: Proposal for Conversion from FP32 to Mixed Precision Models

Posted by Anirudh Subramanian <an...@gmail.com>.
Hi Zach,

You raise an interesting point. Thank you for the pointer!

Incorporating a CSE pass comes with its own cost, and the advantage it brings
is to make the ReducePrecision NNVM pass more lightweight. Since the
amortized cost of the ReducePrecision pass is O(1), it shouldn't matter much
from a performance point of view whether we add it or not.

From a maintenance point of view, I would agree that separating these two
pieces of logic can be helpful if we have other workflows which require the
original pass followed by a CSE pass. Currently, as far as I know, only the
ReducePrecision pass would use it. I will check to see if a CSE pass can
benefit other NNVM passes apart from ReducePrecision, such as the
quantization pass, and will get back.

Anirudh

Re: Proposal for Conversion from FP32 to Mixed Precision Models

Posted by Zach Kimberg <za...@gmail.com>.
I have one suggestion. In the current design, there are additional maps from
each input entry to its casted entry for each target dtype, in order to avoid
creating duplicate casts. Instead of creating these, another option is to
apply a general-purpose Common Subexpression Elimination (CSE) [1] pass
afterwards. So, you would run the mixed precision pass, which creates the
duplicates, and then the CSE pass, which would remove all of them.

This design is common in existing compilers like LLVM because maintaining
and testing the passes is much easier when they are kept as simple as
possible. The CSE pass can also be reused as necessary for other passes that
could create duplicates, or to remove duplicate expressions in general. This
tutorial [2] talks about it a bit.
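
For concreteness, here is a minimal sketch of what such a generic CSE pass
could look like in Python. This is illustrative only: Node is a stand-in for
a graph node, not the actual NNVM data structures.

    from collections import namedtuple

    # A toy node: operator name, list of input nodes, hashable tuple of attributes.
    Node = namedtuple("Node", ["op", "inputs", "attrs"])

    def cse(topo_order):
        """Map each node to a canonical representative; nodes with the same op,
        (canonicalized) inputs and attrs collapse to the first occurrence."""
        seen = {}    # structural key -> representative node
        canon = {}   # id(node) -> representative node
        for node in topo_order:  # nodes visited in topological order
            inputs = tuple(canon.get(id(i), i) for i in node.inputs)
            key = (node.op, tuple(id(i) for i in inputs), node.attrs)
            canon[id(node)] = seen.setdefault(key, node)
        return canon

Two identical amp_cast nodes casting the same input to float16 would then map
to a single representative, which is exactly the kind of duplication the
mixed precision pass may introduce.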

Zach

[1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
[2] - https://blog.regehr.org/archives/1603

Re: Proposal for Conversion from FP32 to Mixed Precision Models

Posted by Anirudh Subramanian <an...@gmail.com>.
Hi Tao,

Thanks for raising this question! I thought about the existing quantization
workflow and whether it can be included with the AMP API. Although
quantization can be considered a form of mixed precision, there are
differences. For example, only a small number of operators can be quantized
compared to the operators that can run in FP16 precision. Thus, overriding
operators to run in the original dtype vs. the target dtype doesn't make much
sense for quantization.

Also, the quantization workflow may require a calibration dataset to
calibrate the min and max values, together with a calib_mode.
Arriving at a common API for quantization with calibration and mixed
precision inference (FP16 and BF16) may make the API too complicated and not
very easy to use. I understand that this may cause some confusion, as people
may try to use a target_dtype of int8, but I think it's still better than
causing user confusion with the API usage.

Also, when we move the quantize_model APIs outside contrib we can consider
adding them under the AMP namespace. The challenge would then be to educate
users on the difference between "quantize" and "convert".
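
To make the contrast concrete, here is an illustrative sketch (not a drop-in
script) of the two call shapes. quantize_model is the existing
mxnet.contrib.quantization API; the convert_model call uses the name and
target_dtype argument from this proposal and is commented out since it does
not exist yet. An FP32 checkpoint named 'model_fp32' is assumed to be on disk.

    import mxnet as mx
    from mxnet.contrib.quantization import quantize_model

    sym, arg_params, aux_params = mx.model.load_checkpoint('model_fp32', 0)

    # Existing INT8 workflow: calibration data and calib_mode are part of the API.
    calib_iter = mx.io.NDArrayIter(mx.nd.random.uniform(shape=(64, 3, 224, 224)),
                                   batch_size=32)
    qsym, qargs, qaux = quantize_model(sym, arg_params, aux_params, ctx=mx.cpu(),
                                       calib_mode='naive', calib_data=calib_iter,
                                       num_calib_examples=64)

    # Proposed mixed precision workflow: only a target dtype, no calibration.
    # csym, cargs, caux = amp.convert_model(sym, arg_params, aux_params,
    #                                       target_dtype='float16')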

Anirudh

RE: Proposal for Conversion from FP32 to Mixed Precision Models

Posted by "Lv, Tao A" <ta...@intel.com>.
Thank you for the explanation. Sorry, I didn't realize the proposal is for inference only.

Then how do you think the amp_cast and amp_multicast operators in this proposal can work with the existing INT8 quantization workflow, which I think should also be considered 'mixed precision'?

Re: Proposal for Conversion from FP32 to Mixed Precision Models

Posted by Anirudh Subramanian <an...@gmail.com>.
Hi Tao,

The proposed APIs, "convert_model" and "convert_block", are mainly for
inference use cases, where customers bring an FP32 model and convert it to a
mixed precision model to get improved performance without losing accuracy.
The PR https://github.com/apache/incubator-mxnet/pull/14173 is supposed to
handle the training use cases, and this proposal doesn't cover the AMP
feature added in that PR. I think ptrendx@ and canoerst@ are better equipped
to answer questions 1 and 2.

> - more generally, what will be saved when users want to serialize their
> model to disk?

Let's say users want to save a converted mixed precision model used for
inference to disk. Both will be saved: the symbol with the amp_cast and
amp_multicast operators, and the params (which are casted if necessary).
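
As a sketch of what that flow might look like (convert_model and the amp
namespace are taken from this proposal and may differ in the final API; an
FP32 checkpoint named 'model_fp32' is assumed to be on disk):

    import mxnet as mx
    from mxnet.contrib import amp  # namespace assumed per the proposal

    sym, arg_params, aux_params = mx.model.load_checkpoint('model_fp32', 0)
    csym, cargs, caux = amp.convert_model(sym, arg_params, aux_params,
                                          target_dtype='float16')

    # The saved symbol now contains amp_cast/amp_multicast nodes, and the
    # params are casted where necessary.
    csym.save('model_amp-symbol.json')
    params = {'arg:%s' % k: v for k, v in cargs.items()}
    params.update({'aux:%s' % k: v for k, v in caux.items()})
    mx.nd.save('model_amp-0000.params', params)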

Anirudh


RE: Proposal for Conversion from FP32 to Mixed Precision Models

Posted by "Lv, Tao A" <ta...@intel.com>.
Thank you for sharing this, Anirudh.

Curious to know:
- what will be saved in a training checkpoint or snapshot? Can it be resumed on another platform which might not support the lower precision the previous one used?
- what will be saved in the final symbol.json and params file when training is finished?
- more generally, what will be saved when users want to serialize their model to disk?

Thank you,
-tao
