You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@mxnet.apache.org by "Zhao, Patric" <pa...@intel.com> on 2018/11/28 12:06:37 UTC

LSTM regression (was RE: Include MKLDNN into default mxnet pip package)

Hi Anirudh,

The LSTM performance bug is fixed by MKL-DNN and PR  in here (https://github.com/apache/incubator-mxnet/pull/13417).

I am still working on MKL-DNN team to get a patch release for MXNet 1.4 in 1 or 2 days.

Will update the status soon.

Thanks everyone.

--Patric

> -----Original Message-----
> From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> Sent: Tuesday, November 27, 2018 6:16 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
> 
> Hi Tao,
> 
> I agree with Steffen that we can start with a stable release for MKLDNN for
> 1.4.0. For your suggestion on using 0.17, can you provide info on what
> versioning mechanism MKLDNN uses. Once a MKLDNN release is out and
> there are some regressions found like the LSTM regression, would it be
> possible to do a patch release for it or maintain a release branch for it ?
> 
> Anirudh
> 
> On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A <ta...@intel.com> wrote:
> 
> > Hi Steffen,
> >
> > I think all the commits on MKL-DNN master branch are well tested for
> > MKL-DNN development team. If we really want to have a release commit
> > in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
> >
> > Thank you,
> > Tao
> >
> > Sent from my iPhone
> >
> > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel
> > > <st...@gmail.com>
> > wrote:
> > >
> > > +1 to make MKL-DNN default.
> > > I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369
> > > as open issue to be addressed for 1.4.0 I do agree that we should
> > > move to a model to include released
> > dependencies
> > > instead of just taking bleeding edge snapshots.
> > > However, speed of development is important as well.
> > > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> > development
> > > team provide us with a well tested tag/commit id to include in 1.4.0
> > > release?
> > > Steffen
> > >
> > >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <ta...@intel.com>
> wrote:
> > >>
> > >> Thanks for the information, Kellen and Naveen.
> > >>
> > >> Better than onnx-tensorrt, MKL-DNN has already provided versioning
> > >> and release tags. My concern is that as MKL-DNN is still under
> > >> intensive development, if it has a new feature or bug fix on its
> > >> master branch,
> > do we
> > >> really want to wait for next release to get it supported in MXNet?
> > >>
> > >> Take the LSTM regression as an example, probably MKL-DNN will give
> > >> a fix or improvement on its master branch soon, do we need to wait
> > >> for 0.18 release to get it fixed for mxnet user? AFAIK, tensorflow
> > >> is also using normal commit id, not release, as the dependency for MKL-
> DNN.
> > >>
> > >> Regarding the LSTM regression, we are using internal JIRA tickets
> > >> rather than github issues to track the defects of MKL-DNN. But I
> > >> agree with
> > you,
> > >> we need update the progress of it in Alex's issue.
> > >>
> > >> Thanks,
> > >> -tao
> > >>
> > >> -----Original Message-----
> > >> From: kellen sunderland [mailto:kellen.sunderland@gmail.com]
> > >> Sent: Thursday, November 22, 2018 10:55 AM
> > >> To: dev@mxnet.incubator.apache.org
> > >> Subject: Re: Include MKLDNN into default mxnet pip package
> > >>
> > >> Agree with your point about other repos also not being based on
> > versioning
> > >> Tao.  I would point out that I've given some that I've worked with
> > similar
> > >> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
> > >>
> > >>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy
> <mn...@gmail.com>
> > wrote:
> > >>>
> > >>> Tao,
> > >>>
> > >>> You are right there are many submodules in 3rd party. We have to
> > >>> start somewhere and I believe this one is a good candidate to start
> with.
> > >>> This is not to cater to release of MXNet or to tie them with the
> > >>> releases of the submodules but instead to pick only stable
> > >>> releases and not to pick up bleeding edge commits from the tip of
> > >>> the master, this gives us confidence in the submodule that MXNet
> > >>> users are depending on that especially if we make MKLDNN the default.
> > >>>
> > >>> Good to know it is known already as a regression.Alex has created
> > >>> this issue https://github.com/apache/incubator-mxnet/issues/13369,
> > >>> please add details and link the corresponding issue in MKLDNN(I
> > >>> couldn't
> > find).
> > >>>
> > >>> -Naveen
> > >>>
> > >>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <ta...@intel.com>
> wrote:
> > >>>>
> > >>>> Here are my answers for the questions from Kellen and Naveen
> > >>>> about MKL-DNN. It doesn't mean that I'm supportive for making
> > >>>> MKL-DNN default here.
> > >>>>
> > >>>> @Kellen,
> > >>>>
> > >>>> FYI, here is a list for those platforms which are officially
> > >>>> supported by MKL-DNN.
> > >>>> https://github.com/intel/mkl-dnn#system-requirements
> > >>>>
> > >>>> Most of computation intensive kernels in MKL-DNN are JITed. So
> > >>>> they are supposed to generate code according to the platform
> > >>>> during runtime. For non-JIT code in MKL-DNN, same as other code
> > >>>> in MXNet, it will generate instructions according to the
> > >>>> options/flags of compiler. We can set -DARCH_OPT_FLAGS when
> build
> > >>>> MKL-DNN to avoid optimization for compiling machine. That's
> > >>>> exactly what we are doing
> > >> for MKL-DNN build in MXNet.
> > >>> Even
> > >>>> without MKL-DNN, I noticed there were issues about illegal
> > >>>> instructions
> > >>> of
> > >>>> MXNet when users import the pip package on a lower end machine
> > >>>> which probably only supports SSE.
> > >>>>
> > >>>> @Naveen,
> > >>>>
> > >>>> The LSTM issue has already been identified as a regression from
> > >>>> the
> > >>> recent
> > >>>> version of MKL-DNN. Hopefully it will be fixed soon with a new
> > >>>> update of MKL-DNN.
> > >>>>
> > >>>> MXNet has many submodule dependencies under the 3rd party folder.
> > >>>> Seems
> > >>> we
> > >>>> don't require release versions for most of these dependencies.
> > >>>> The
> > >>> release
> > >>>> period of MKL-DNN and MXNet are not matched very well. I think it
> > >>>> would
> > >>> be
> > >>>> a risk for MXNet release if it hardly depends on the release of a
> > >>>> submodule, no need to say depends on the releases of all submodules.
> > >>>>
> > >>>> -tao
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: Naveen Swamy [mailto:mnnaveen@gmail.com]
> > >>>> Sent: Thursday, November 22, 2018 9:08 AM
> > >>>> To: dev@mxnet.incubator.apache.org
> > >>>> Cc: dev@mxnet.apache.org
> > >>>> Subject: Re: Include MKLDNN into default mxnet pip package
> > >>>>
> > >>>> Hi Alex,
> > >>>>
> > >>>> Thanks for promptly running the numbers on AMD and reporting here.
> > >>>>
> > >>>> Can you please update the AMD numbers here for posterity
> > >>>>
> > >>>
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel
> > >>> +MKL
> > >>> -DNN+-+Performance+Benchmarking
> > >>>> ?
> > >>>>
> > >>>> are there any outstanding issues when MKLDNN is enabled? from my
> > >>>> offline conversation I am briefly aware performance issues with
> > >>>> LSTM, is there an GitHub issue for it?
> > >>>>
> > >>>> MKLDNN is a submodule dependency, are we pulling the latest
> > >>>> commit or releases  ? If not we should move to releases before we
> > >>>> make it a
> > >>> default.
> > >>>> Ideally we should use platform specific distributions (-dev
> > >>>> packages) at least we should rely on well tested releases.
> > >>>>
> > >>>>
> > >>>> Thanks, Naveen
> > >>>>
> > >>>> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
> > >>> <alexzai@amazon.com.invalid
> > >>>>>
> > >>>> wrote:
> > >>>>
> > >>>>> AMD benchmarks have been published. We are seeing a x15.8
> > >>>>> speedup with
> > >>>>> Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With
> > >>>>> a smaller network (Mobilenet - batch size 32) the speedup is
> > >>>>> more significant at x38.7. Let's have a vote to see if the PR to
> > >>>>> have MKLDNN enabled by default
> > >>>>> (https://github.com/apache/incubator-mxnet/pull/12591) can be
> > >>>>> merged before 1.4.0 release.
> > >>>>>
> > >>>>> On 10/19/18, 9:17 AM, "Pedro Larroy"
> > >>>>> <pe...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>    I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen
> > >>>>> 1950X and unit
> > >>>>>    tests are passing.
> > >>>>>
> > >>>>>    Is this build using AVX512?  in /proc/cpuinfo I see only "avx"
> > >>> flag.
> > >>>>>    There's no "avx2" like on recent intel cpus.
> > >>>>>
> > >>>>>    Pedro.
> > >>>>>
> > >>>>>    On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko
> > >>>>> <lu...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Awesome collaborative effort across many contributors and
> > >>>> companies!
> > >>>>>>
> > >>>>>> The boost is impressive and for MXNet users to get this
> > >>>>> boost "out of the
> > >>>>>> box" is a great benefit and makes MXNet an even better choice.
> > >>>>>>
> > >>>>>> Alex - can you clarify whether there are any down sides with
> > >>>>> regards to
> > >>>>>> noon AVX-512 architectures, AMD CPUs, etc? Will it
> > >>>>> gracefully fallback?
> > >>>>>>
> > >>>>>> Hagay
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Oct 19, 2018, 15:46 Sergio Fernández
> > >>>>> <wi...@apache.org>
> > >>>>> wrote:
> > >>>>>>
> > >>>>>>> If there is no downside on platforms not supporting AVX512
> > >>>>> instructions,
> > >>>>>>> then +1
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Wed, Oct 17, 2018, 14:10 Alex Zai <az...@gmail.com>
> > >> wrote:
> > >>>>>>>
> > >>>>>>>> Hey all,
> > >>>>>>>> We have been working hard these past few months to
> > >>>>> integrate
> > >>>> and
> > >>>>>>> stabilize
> > >>>>>>>> Intel’s MKLDNN deep learning CPU accelerator into Mxnet
> > >>>>> and have made
> > >>>>>>>> incredible progress. On CPUs with AVX512 instructions
> > >>>>> (such as
> > >>>>> c5.18x)
> > >>>>>> we
> > >>>>>>>> have seen performance increase up to 12x and on other
> > >>>>> platforms (Macs,
> > >>>>>>>> AVX2) we seen a speedup of 1.5+. Full list of benchmarks
> > >>>>> can be found
> > >>>>>>> here
> > >>>>>>>> (
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=9
> > >>> 5650
> > >>> 764
> > >>>>>>>> and https://github.com/apache/incubator-mxnet/pull/12591
> > >> ).
> > >>>>>>>>
> > >>>>>>>> Currently, using this accelerator requires the developer
> > >>>>> to either pip
> > >>>>>>>> install the mxnet-mkl version of mxnet or to build it
> > >>>>> themselves from
> > >>>>>>>> source. Given that we should try to provide the best
> > >>>>> performance "out
> > >>>>>> of
> > >>>>>>>> the box” with mxnet we should include this in the
> > >>>>> default
> > >>>> build.
> > >>>>> The
> > >>>>>>> mkldnn
> > >>>>>>>> library is included with in the pip package build so it
> > >>>>> does
> > >>>> not
> > >>>>>> require
> > >>>>>>> an
> > >>>>>>>> external dependency.
> > >>>>>>>>
> > >>>>>>>> There were concerns that MKLDNN could cause regressions
> > >>>>> on certain
> > >>>>>>>> platforms (as it did with the tensorflow version a while
> > >>>>> back); but we
> > >>>>>>>> added a env flag (MXNET_MKLDNN_ENABLED) that allows
> > >>>>> users to turn of
> > >>>>>> this
> > >>>>>>>> feature during runtime. Please bring up any other
> > >>>>> concerns you may have
> > >>>>>>> and
> > >>>>>>>> your thoughts on including this accelerator in the
> > >>>>> default
> > >>>> build.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Alex
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >

Re: LSTM regression (was RE: Include MKLDNN into default mxnet pip package)

Posted by "Zai, Alexander" <al...@amazon.com.INVALID>.

Ran benchmark and it addresses issue. Thanks.

On 11/28/18, 6:02 PM, "Zhao, Patric" <pa...@intel.com> wrote:

    MKL-DNN v0.17.1 is released https://github.com/intel/mkl-dnn/tree/v0.17.1
    
    I have submitted the PR to pin this release version.
    
    Thanks,
    
    --Patric
    
    > -----Original Message-----
    > From: Zhao, Patric [mailto:patric.zhao@intel.com]
    > Sent: Wednesday, November 28, 2018 8:07 PM
    > To: dev@mxnet.incubator.apache.org
    > Subject: LSTM regression (was RE: Include MKLDNN into default mxnet pip
    > package)
    > 
    > Hi Anirudh,
    > 
    > The LSTM performance bug is fixed by MKL-DNN and PR  in here
    > (https://github.com/apache/incubator-mxnet/pull/13417).
    > 
    > I am still working on MKL-DNN team to get a patch release for MXNet 1.4 in
    > 1 or 2 days.
    > 
    > Will update the status soon.
    > 
    > Thanks everyone.
    > 
    > --Patric
    > 
    > > -----Original Message-----
    > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
    > > Sent: Tuesday, November 27, 2018 6:16 AM
    > > To: dev@mxnet.incubator.apache.org
    > > Subject: Re: Include MKLDNN into default mxnet pip package
    > >
    > > Hi Tao,
    > >
    > > I agree with Steffen that we can start with a stable release for
    > > MKLDNN for 1.4.0. For your suggestion on using 0.17, can you provide
    > > info on what versioning mechanism MKLDNN uses. Once a MKLDNN
    > release
    > > is out and there are some regressions found like the LSTM regression,
    > > would it be possible to do a patch release for it or maintain a release
    > branch for it ?
    > >
    > > Anirudh
    > >
    > > On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A <ta...@intel.com> wrote:
    > >
    > > > Hi Steffen,
    > > >
    > > > I think all the commits on MKL-DNN master branch are well tested for
    > > > MKL-DNN development team. If we really want to have a release commit
    > > > in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
    > > >
    > > > Thank you,
    > > > Tao
    > > >
    > > > Sent from my iPhone
    > > >
    > > > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel
    > > > > <st...@gmail.com>
    > > > wrote:
    > > > >
    > > > > +1 to make MKL-DNN default.
    > > > > I'm tracking
    > > > > https://github.com/apache/incubator-mxnet/issues/13369
    > > > > as open issue to be addressed for 1.4.0 I do agree that we should
    > > > > move to a model to include released
    > > > dependencies
    > > > > instead of just taking bleeding edge snapshots.
    > > > > However, speed of development is important as well.
    > > > > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
    > > > development
    > > > > team provide us with a well tested tag/commit id to include in
    > > > > 1.4.0 release?
    > > > > Steffen
    > > > >
    > > > >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <ta...@intel.com>
    > > wrote:
    > > > >>
    > > > >> Thanks for the information, Kellen and Naveen.
    > > > >>
    > > > >> Better than onnx-tensorrt, MKL-DNN has already provided
    > > > >> versioning and release tags. My concern is that as MKL-DNN is
    > > > >> still under intensive development, if it has a new feature or bug
    > > > >> fix on its master branch,
    > > > do we
    > > > >> really want to wait for next release to get it supported in MXNet?
    > > > >>
    > > > >> Take the LSTM regression as an example, probably MKL-DNN will
    > > > >> give a fix or improvement on its master branch soon, do we need
    > > > >> to wait for 0.18 release to get it fixed for mxnet user? AFAIK,
    > > > >> tensorflow is also using normal commit id, not release, as the
    > > > >> dependency for MKL-
    > > DNN.
    > > > >>
    > > > >> Regarding the LSTM regression, we are using internal JIRA tickets
    > > > >> rather than github issues to track the defects of MKL-DNN. But I
    > > > >> agree with
    > > > you,
    > > > >> we need update the progress of it in Alex's issue.
    > > > >>
    > > > >> Thanks,
    > > > >> -tao
    > > > >>
    > > > >> -----Original Message-----
    > > > >> From: kellen sunderland [mailto:kellen.sunderland@gmail.com]
    > > > >> Sent: Thursday, November 22, 2018 10:55 AM
    > > > >> To: dev@mxnet.incubator.apache.org
    > > > >> Subject: Re: Include MKLDNN into default mxnet pip package
    > > > >>
    > > > >> Agree with your point about other repos also not being based on
    > > > versioning
    > > > >> Tao.  I would point out that I've given some that I've worked
    > > > >> with
    > > > similar
    > > > >> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
    > > > >>
    > > > >>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy
    > > <mn...@gmail.com>
    > > > wrote:
    > > > >>>
    > > > >>> Tao,
    > > > >>>
    > > > >>> You are right there are many submodules in 3rd party. We have to
    > > > >>> start somewhere and I believe this one is a good candidate to
    > > > >>> start
    > > with.
    > > > >>> This is not to cater to release of MXNet or to tie them with the
    > > > >>> releases of the submodules but instead to pick only stable
    > > > >>> releases and not to pick up bleeding edge commits from the tip
    > > > >>> of the master, this gives us confidence in the submodule that
    > > > >>> MXNet users are depending on that especially if we make MKLDNN
    > the default.
    > > > >>>
    > > > >>> Good to know it is known already as a regression.Alex has
    > > > >>> created this issue
    > > > >>> https://github.com/apache/incubator-mxnet/issues/13369,
    > > > >>> please add details and link the corresponding issue in MKLDNN(I
    > > > >>> couldn't
    > > > find).
    > > > >>>
    > > > >>> -Naveen
    > > > >>>
    > > > >>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <ta...@intel.com>
    > > wrote:
    > > > >>>>
    > > > >>>> Here are my answers for the questions from Kellen and Naveen
    > > > >>>> about MKL-DNN. It doesn't mean that I'm supportive for making
    > > > >>>> MKL-DNN default here.
    > > > >>>>
    > > > >>>> @Kellen,
    > > > >>>>
    > > > >>>> FYI, here is a list for those platforms which are officially
    > > > >>>> supported by MKL-DNN.
    > > > >>>> https://github.com/intel/mkl-dnn#system-requirements
    > > > >>>>
    > > > >>>> Most of computation intensive kernels in MKL-DNN are JITed. So
    > > > >>>> they are supposed to generate code according to the platform
    > > > >>>> during runtime. For non-JIT code in MKL-DNN, same as other code
    > > > >>>> in MXNet, it will generate instructions according to the
    > > > >>>> options/flags of compiler. We can set -DARCH_OPT_FLAGS when
    > > build
    > > > >>>> MKL-DNN to avoid optimization for compiling machine. That's
    > > > >>>> exactly what we are doing
    > > > >> for MKL-DNN build in MXNet.
    > > > >>> Even
    > > > >>>> without MKL-DNN, I noticed there were issues about illegal
    > > > >>>> instructions
    > > > >>> of
    > > > >>>> MXNet when users import the pip package on a lower end machine
    > > > >>>> which probably only supports SSE.
    > > > >>>>
    > > > >>>> @Naveen,
    > > > >>>>
    > > > >>>> The LSTM issue has already been identified as a regression from
    > > > >>>> the
    > > > >>> recent
    > > > >>>> version of MKL-DNN. Hopefully it will be fixed soon with a new
    > > > >>>> update of MKL-DNN.
    > > > >>>>
    > > > >>>> MXNet has many submodule dependencies under the 3rd party
    > folder.
    > > > >>>> Seems
    > > > >>> we
    > > > >>>> don't require release versions for most of these dependencies.
    > > > >>>> The
    > > > >>> release
    > > > >>>> period of MKL-DNN and MXNet are not matched very well. I think
    > > > >>>> it would
    > > > >>> be
    > > > >>>> a risk for MXNet release if it hardly depends on the release of
    > > > >>>> a submodule, no need to say depends on the releases of all
    > submodules.
    > > > >>>>
    > > > >>>> -tao
    > > > >>>>
    > > > >>>> -----Original Message-----
    > > > >>>> From: Naveen Swamy [mailto:mnnaveen@gmail.com]
    > > > >>>> Sent: Thursday, November 22, 2018 9:08 AM
    > > > >>>> To: dev@mxnet.incubator.apache.org
    > > > >>>> Cc: dev@mxnet.apache.org
    > > > >>>> Subject: Re: Include MKLDNN into default mxnet pip package
    > > > >>>>
    > > > >>>> Hi Alex,
    > > > >>>>
    > > > >>>> Thanks for promptly running the numbers on AMD and reporting
    > here.
    > > > >>>>
    > > > >>>> Can you please update the AMD numbers here for posterity
    > > > >>>>
    > > > >>>
    > > https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel
    > > > >>> +MKL
    > > > >>> -DNN+-+Performance+Benchmarking
    > > > >>>> ?
    > > > >>>>
    > > > >>>> are there any outstanding issues when MKLDNN is enabled? from
    > > > >>>> my offline conversation I am briefly aware performance issues
    > > > >>>> with LSTM, is there an GitHub issue for it?
    > > > >>>>
    > > > >>>> MKLDNN is a submodule dependency, are we pulling the latest
    > > > >>>> commit or releases  ? If not we should move to releases before
    > > > >>>> we make it a
    > > > >>> default.
    > > > >>>> Ideally we should use platform specific distributions (-dev
    > > > >>>> packages) at least we should rely on well tested releases.
    > > > >>>>
    > > > >>>>
    > > > >>>> Thanks, Naveen
    > > > >>>>
    > > > >>>> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
    > > > >>> <alexzai@amazon.com.invalid
    > > > >>>>>
    > > > >>>> wrote:
    > > > >>>>
    > > > >>>>> AMD benchmarks have been published. We are seeing a x15.8
    > > > >>>>> speedup with
    > > > >>>>> Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine.
    > > > >>>>> With a smaller network (Mobilenet - batch size 32) the speedup
    > > > >>>>> is more significant at x38.7. Let's have a vote to see if the
    > > > >>>>> PR to have MKLDNN enabled by default
    > > > >>>>> (https://github.com/apache/incubator-mxnet/pull/12591) can be
    > > > >>>>> merged before 1.4.0 release.
    > > > >>>>>
    > > > >>>>> On 10/19/18, 9:17 AM, "Pedro Larroy"
    > > > >>>>> <pe...@gmail.com>
    > > > >>>>> wrote:
    > > > >>>>>
    > > > >>>>>    I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD
    > > > >>>>> Ryzen 1950X and unit
    > > > >>>>>    tests are passing.
    > > > >>>>>
    > > > >>>>>    Is this build using AVX512?  in /proc/cpuinfo I see only "avx"
    > > > >>> flag.
    > > > >>>>>    There's no "avx2" like on recent intel cpus.
    > > > >>>>>
    > > > >>>>>    Pedro.
    > > > >>>>>
    > > > >>>>>    On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko
    > > > >>>>> <lu...@gmail.com>
    > > > >>>>> wrote:
    > > > >>>>>
    > > > >>>>>> Awesome collaborative effort across many contributors and
    > > > >>>> companies!
    > > > >>>>>>
    > > > >>>>>> The boost is impressive and for MXNet users to get this
    > > > >>>>> boost "out of the
    > > > >>>>>> box" is a great benefit and makes MXNet an even better choice.
    > > > >>>>>>
    > > > >>>>>> Alex - can you clarify whether there are any down sides with
    > > > >>>>> regards to
    > > > >>>>>> noon AVX-512 architectures, AMD CPUs, etc? Will it
    > > > >>>>> gracefully fallback?
    > > > >>>>>>
    > > > >>>>>> Hagay
    > > > >>>>>>
    > > > >>>>>>
    > > > >>>>>> On Fri, Oct 19, 2018, 15:46 Sergio Fernández
    > > > >>>>> <wi...@apache.org>
    > > > >>>>> wrote:
    > > > >>>>>>
    > > > >>>>>>> If there is no downside on platforms not supporting AVX512
    > > > >>>>> instructions,
    > > > >>>>>>> then +1
    > > > >>>>>>>
    > > > >>>>>>>
    > > > >>>>>>> On Wed, Oct 17, 2018, 14:10 Alex Zai <az...@gmail.com>
    > > > >> wrote:
    > > > >>>>>>>
    > > > >>>>>>>> Hey all,
    > > > >>>>>>>> We have been working hard these past few months to
    > > > >>>>> integrate
    > > > >>>> and
    > > > >>>>>>> stabilize
    > > > >>>>>>>> Intel’s MKLDNN deep learning CPU accelerator into Mxnet
    > > > >>>>> and have made
    > > > >>>>>>>> incredible progress. On CPUs with AVX512 instructions
    > > > >>>>> (such as
    > > > >>>>> c5.18x)
    > > > >>>>>> we
    > > > >>>>>>>> have seen performance increase up to 12x and on other
    > > > >>>>> platforms (Macs,
    > > > >>>>>>>> AVX2) we seen a speedup of 1.5+. Full list of benchmarks
    > > > >>>>> can be found
    > > > >>>>>>> here
    > > > >>>>>>>> (
    > > > >>>>>>>>
    > > > >>>>>>>
    > > > >>>>>>
    > > > >>>>>
    > > > >>>>
    > > > >>>
    > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=9
    > > > >>> 5650
    > > > >>> 764
    > > > >>>>>>>> and https://github.com/apache/incubator-mxnet/pull/12591
    > > > >> ).
    > > > >>>>>>>>
    > > > >>>>>>>> Currently, using this accelerator requires the developer
    > > > >>>>> to either pip
    > > > >>>>>>>> install the mxnet-mkl version of mxnet or to build it
    > > > >>>>> themselves from
    > > > >>>>>>>> source. Given that we should try to provide the best
    > > > >>>>> performance "out
    > > > >>>>>> of
    > > > >>>>>>>> the box” with mxnet we should include this in the
    > > > >>>>> default
    > > > >>>> build.
    > > > >>>>> The
    > > > >>>>>>> mkldnn
    > > > >>>>>>>> library is included with in the pip package build so it
    > > > >>>>> does
    > > > >>>> not
    > > > >>>>>> require
    > > > >>>>>>> an
    > > > >>>>>>>> external dependency.
    > > > >>>>>>>>
    > > > >>>>>>>> There were concerns that MKLDNN could cause regressions
    > > > >>>>> on certain
    > > > >>>>>>>> platforms (as it did with the tensorflow version a while
    > > > >>>>> back); but we
    > > > >>>>>>>> added a env flag (MXNET_MKLDNN_ENABLED) that allows
    > > > >>>>> users to turn of
    > > > >>>>>> this
    > > > >>>>>>>> feature during runtime. Please bring up any other
    > > > >>>>> concerns you may have
    > > > >>>>>>> and
    > > > >>>>>>>> your thoughts on including this accelerator in the
    > > > >>>>> default
    > > > >>>> build.
    > > > >>>>>>>>
    > > > >>>>>>>> Best,
    > > > >>>>>>>> Alex
    > > > >>>>>>>>
    > > > >>>>>>>
    > > > >>>>>>
    > > > >>>>>
    > > > >>>>>
    > > > >>>>>
    > > > >>>>
    > > > >>>
    > > > >>
    > > >

RE: LSTM regression (was RE: Include MKLDNN into default mxnet pip package)

Posted by "Zhao, Patric" <pa...@intel.com>.

MKL-DNN v0.17.1 is released https://github.com/intel/mkl-dnn/tree/v0.17.1

I have submitted the PR to pin this release version.

Thanks,

--Patric

> -----Original Message-----
> From: Zhao, Patric [mailto:patric.zhao@intel.com]
> Sent: Wednesday, November 28, 2018 8:07 PM
> To: dev@mxnet.incubator.apache.org
> Subject: LSTM regression (was RE: Include MKLDNN into default mxnet pip
> package)
> 
> Hi Anirudh,
> 
> The LSTM performance bug is fixed by MKL-DNN and PR  in here
> (https://github.com/apache/incubator-mxnet/pull/13417).
> 
> I am still working on MKL-DNN team to get a patch release for MXNet 1.4 in
> 1 or 2 days.
> 
> Will update the status soon.
> 
> Thanks everyone.
> 
> --Patric
> 
> > -----Original Message-----
> > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > Sent: Tuesday, November 27, 2018 6:16 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: Include MKLDNN into default mxnet pip package
> >
> > Hi Tao,
> >
> > I agree with Steffen that we can start with a stable release for
> > MKLDNN for 1.4.0. For your suggestion on using 0.17, can you provide
> > info on what versioning mechanism MKLDNN uses. Once a MKLDNN
> release
> > is out and there are some regressions found like the LSTM regression,
> > would it be possible to do a patch release for it or maintain a release
> branch for it ?
> >
> > Anirudh
> >
> > On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A <ta...@intel.com> wrote:
> >
> > > Hi Steffen,
> > >
> > > I think all the commits on MKL-DNN master branch are well tested for
> > > MKL-DNN development team. If we really want to have a release commit
> > > in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
> > >
> > > Thank you,
> > > Tao
> > >
> > > Sent from my iPhone
> > >
> > > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel
> > > > <st...@gmail.com>
> > > wrote:
> > > >
> > > > +1 to make MKL-DNN default.
> > > > I'm tracking
> > > > https://github.com/apache/incubator-mxnet/issues/13369
> > > > as open issue to be addressed for 1.4.0 I do agree that we should
> > > > move to a model to include released
> > > dependencies
> > > > instead of just taking bleeding edge snapshots.
> > > > However, speed of development is important as well.
> > > > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> > > development
> > > > team provide us with a well tested tag/commit id to include in
> > > > 1.4.0 release?
> > > > Steffen
> > > >
> > > >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <ta...@intel.com>
> > wrote:
> > > >>
> > > >> Thanks for the information, Kellen and Naveen.
> > > >>
> > > >> Better than onnx-tensorrt, MKL-DNN has already provided
> > > >> versioning and release tags. My concern is that as MKL-DNN is
> > > >> still under intensive development, if it has a new feature or bug
> > > >> fix on its master branch,
> > > do we
> > > >> really want to wait for next release to get it supported in MXNet?
> > > >>
> > > >> Take the LSTM regression as an example, probably MKL-DNN will
> > > >> give a fix or improvement on its master branch soon, do we need
> > > >> to wait for 0.18 release to get it fixed for mxnet user? AFAIK,
> > > >> tensorflow is also using normal commit id, not release, as the
> > > >> dependency for MKL-
> > DNN.
> > > >>
> > > >> Regarding the LSTM regression, we are using internal JIRA tickets
> > > >> rather than github issues to track the defects of MKL-DNN. But I
> > > >> agree with
> > > you,
> > > >> we need update the progress of it in Alex's issue.
> > > >>
> > > >> Thanks,
> > > >> -tao
> > > >>
> > > >> -----Original Message-----
> > > >> From: kellen sunderland [mailto:kellen.sunderland@gmail.com]
> > > >> Sent: Thursday, November 22, 2018 10:55 AM
> > > >> To: dev@mxnet.incubator.apache.org
> > > >> Subject: Re: Include MKLDNN into default mxnet pip package
> > > >>
> > > >> Agree with your point about other repos also not being based on
> > > versioning
> > > >> Tao.  I would point out that I've given some that I've worked
> > > >> with
> > > similar
> > > >> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
> > > >>
> > > >>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy
> > <mn...@gmail.com>
> > > wrote:
> > > >>>
> > > >>> Tao,
> > > >>>
> > > >>> You are right there are many submodules in 3rd party. We have to
> > > >>> start somewhere and I believe this one is a good candidate to
> > > >>> start
> > with.
> > > >>> This is not to cater to release of MXNet or to tie them with the
> > > >>> releases of the submodules but instead to pick only stable
> > > >>> releases and not to pick up bleeding edge commits from the tip
> > > >>> of the master, this gives us confidence in the submodule that
> > > >>> MXNet users are depending on that especially if we make MKLDNN
> the default.
> > > >>>
> > > >>> Good to know it is known already as a regression.Alex has
> > > >>> created this issue
> > > >>> https://github.com/apache/incubator-mxnet/issues/13369,
> > > >>> please add details and link the corresponding issue in MKLDNN(I
> > > >>> couldn't
> > > find).
> > > >>>
> > > >>> -Naveen
> > > >>>
> > > >>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <ta...@intel.com>
> > wrote:
> > > >>>>
> > > >>>> Here are my answers for the questions from Kellen and Naveen
> > > >>>> about MKL-DNN. It doesn't mean that I'm supportive for making
> > > >>>> MKL-DNN default here.
> > > >>>>
> > > >>>> @Kellen,
> > > >>>>
> > > >>>> FYI, here is a list for those platforms which are officially
> > > >>>> supported by MKL-DNN.
> > > >>>> https://github.com/intel/mkl-dnn#system-requirements
> > > >>>>
> > > >>>> Most of computation intensive kernels in MKL-DNN are JITed. So
> > > >>>> they are supposed to generate code according to the platform
> > > >>>> during runtime. For non-JIT code in MKL-DNN, same as other code
> > > >>>> in MXNet, it will generate instructions according to the
> > > >>>> options/flags of compiler. We can set -DARCH_OPT_FLAGS when
> > build
> > > >>>> MKL-DNN to avoid optimization for compiling machine. That's
> > > >>>> exactly what we are doing
> > > >> for MKL-DNN build in MXNet.
> > > >>> Even
> > > >>>> without MKL-DNN, I noticed there were issues about illegal
> > > >>>> instructions
> > > >>> of
> > > >>>> MXNet when users import the pip package on a lower end machine
> > > >>>> which probably only supports SSE.
> > > >>>>
> > > >>>> @Naveen,
> > > >>>>
> > > >>>> The LSTM issue has already been identified as a regression from
> > > >>>> the
> > > >>> recent
> > > >>>> version of MKL-DNN. Hopefully it will be fixed soon with a new
> > > >>>> update of MKL-DNN.
> > > >>>>
> > > >>>> MXNet has many submodule dependencies under the 3rd party
> folder.
> > > >>>> Seems
> > > >>> we
> > > >>>> don't require release versions for most of these dependencies.
> > > >>>> The
> > > >>> release
> > > >>>> period of MKL-DNN and MXNet are not matched very well. I think
> > > >>>> it would
> > > >>> be
> > > >>>> a risk for MXNet release if it hardly depends on the release of
> > > >>>> a submodule, no need to say depends on the releases of all
> submodules.
> > > >>>>
> > > >>>> -tao
> > > >>>>
> > > >>>> -----Original Message-----
> > > >>>> From: Naveen Swamy [mailto:mnnaveen@gmail.com]
> > > >>>> Sent: Thursday, November 22, 2018 9:08 AM
> > > >>>> To: dev@mxnet.incubator.apache.org
> > > >>>> Cc: dev@mxnet.apache.org
> > > >>>> Subject: Re: Include MKLDNN into default mxnet pip package
> > > >>>>
> > > >>>> Hi Alex,
> > > >>>>
> > > >>>> Thanks for promptly running the numbers on AMD and reporting
> here.
> > > >>>>
> > > >>>> Can you please update the AMD numbers here for posterity
> > > >>>>
> > > >>>
> > https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel
> > > >>> +MKL
> > > >>> -DNN+-+Performance+Benchmarking
> > > >>>> ?
> > > >>>>
> > > >>>> are there any outstanding issues when MKLDNN is enabled? from
> > > >>>> my offline conversation I am briefly aware performance issues
> > > >>>> with LSTM, is there an GitHub issue for it?
> > > >>>>
> > > >>>> MKLDNN is a submodule dependency, are we pulling the latest
> > > >>>> commit or releases  ? If not we should move to releases before
> > > >>>> we make it a
> > > >>> default.
> > > >>>> Ideally we should use platform specific distributions (-dev
> > > >>>> packages) at least we should rely on well tested releases.
> > > >>>>
> > > >>>>
> > > >>>> Thanks, Naveen
> > > >>>>
> > > >>>> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
> > > >>> <alexzai@amazon.com.invalid
> > > >>>>>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> AMD benchmarks have been published. We are seeing a x15.8
> > > >>>>> speedup with
> > > >>>>> Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine.
> > > >>>>> With a smaller network (Mobilenet - batch size 32) the speedup
> > > >>>>> is more significant at x38.7. Let's have a vote to see if the
> > > >>>>> PR to have MKLDNN enabled by default
> > > >>>>> (https://github.com/apache/incubator-mxnet/pull/12591) can be
> > > >>>>> merged before 1.4.0 release.
> > > >>>>>
> > > >>>>> On 10/19/18, 9:17 AM, "Pedro Larroy"
> > > >>>>> <pe...@gmail.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>    I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD
> > > >>>>> Ryzen 1950X and unit
> > > >>>>>    tests are passing.
> > > >>>>>
> > > >>>>>    Is this build using AVX512?  in /proc/cpuinfo I see only "avx"
> > > >>> flag.
> > > >>>>>    There's no "avx2" like on recent intel cpus.
> > > >>>>>
> > > >>>>>    Pedro.
> > > >>>>>
> > > >>>>>    On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko
> > > >>>>> <lu...@gmail.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Awesome collaborative effort across many contributors and
> > > >>>> companies!
> > > >>>>>>
> > > >>>>>> The boost is impressive and for MXNet users to get this
> > > >>>>> boost "out of the
> > > >>>>>> box" is a great benefit and makes MXNet an even better choice.
> > > >>>>>>
> > > >>>>>> Alex - can you clarify whether there are any down sides with
> > > >>>>> regards to
> > > >>>>>> noon AVX-512 architectures, AMD CPUs, etc? Will it
> > > >>>>> gracefully fallback?
> > > >>>>>>
> > > >>>>>> Hagay
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Fri, Oct 19, 2018, 15:46 Sergio Fernández
> > > >>>>> <wi...@apache.org>
> > > >>>>> wrote:
> > > >>>>>>
> > > >>>>>>> If there is no downside on platforms not supporting AVX512
> > > >>>>> instructions,
> > > >>>>>>> then +1
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Wed, Oct 17, 2018, 14:10 Alex Zai <az...@gmail.com>
> > > >> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hey all,
> > > >>>>>>>> We have been working hard these past few months to
> > > >>>>> integrate
> > > >>>> and
> > > >>>>>>> stabilize
> > > >>>>>>>> Intel’s MKLDNN deep learning CPU accelerator into Mxnet
> > > >>>>> and have made
> > > >>>>>>>> incredible progress. On CPUs with AVX512 instructions
> > > >>>>> (such as
> > > >>>>> c5.18x)
> > > >>>>>> we
> > > >>>>>>>> have seen performance increase up to 12x and on other
> > > >>>>> platforms (Macs,
> > > >>>>>>>> AVX2) we seen a speedup of 1.5+. Full list of benchmarks
> > > >>>>> can be found
> > > >>>>>>> here
> > > >>>>>>>> (
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=9
> > > >>> 5650
> > > >>> 764
> > > >>>>>>>> and https://github.com/apache/incubator-mxnet/pull/12591
> > > >> ).
> > > >>>>>>>>
> > > >>>>>>>> Currently, using this accelerator requires the developer
> > > >>>>> to either pip
> > > >>>>>>>> install the mxnet-mkl version of mxnet or to build it
> > > >>>>> themselves from
> > > >>>>>>>> source. Given that we should try to provide the best
> > > >>>>> performance "out
> > > >>>>>> of
> > > >>>>>>>> the box” with mxnet we should include this in the
> > > >>>>> default
> > > >>>> build.
> > > >>>>> The
> > > >>>>>>> mkldnn
> > > >>>>>>>> library is included with in the pip package build so it
> > > >>>>> does
> > > >>>> not
> > > >>>>>> require
> > > >>>>>>> an
> > > >>>>>>>> external dependency.
> > > >>>>>>>>
> > > >>>>>>>> There were concerns that MKLDNN could cause regressions
> > > >>>>> on certain
> > > >>>>>>>> platforms (as it did with the tensorflow version a while
> > > >>>>> back); but we
> > > >>>>>>>> added a env flag (MXNET_MKLDNN_ENABLED) that allows
> > > >>>>> users to turn of
> > > >>>>>> this
> > > >>>>>>>> feature during runtime. Please bring up any other
> > > >>>>> concerns you may have
> > > >>>>>>> and
> > > >>>>>>>> your thoughts on including this accelerator in the
> > > >>>>> default
> > > >>>> build.
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>> Alex
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >