Posted to dev@mxnet.apache.org by Anton Chernov <me...@gmail.com> on 2018/11/05 18:02:47 UTC

[Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Dear MXNet community,

I will be the release manager for the upcoming 1.3.1 patch release. Naveen
will be co-managing the release and providing help from the committers' side.

The following dates have been set:

Code Freeze: 31st October 2018
Release published: 13th November 2018

Release notes have been drafted here [1].


* Known issues

Update MKL-DNN dependency
https://github.com/apache/incubator-mxnet/pull/12953

This PR has not been merged even into master yet. It requires additional
discussion before it can be merged.

distributed kvstore bug in MXNet
https://github.com/apache/incubator-mxnet/issues/12713

> When distributed kvstore is used, by default gluon.Trainer doesn't work
with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be more
specific, the trainer updates once per GPU, the LRScheduler object is
shared across GPUs and get a wrong update count.

This needs to be fixed. [6]
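
To make the quoted description concrete, here is a toy model of the
miscount. It is only a sketch of one way to read the issue, not MXNet code:
StepScheduler and lr_after are hypothetical names, and the assumption is
that the shared update counter advances once per GPU instead of once per
training iteration.

```python
# Toy model of the update-count problem described above; not MXNet code.
class StepScheduler:
    """Hypothetical scheduler: halves the learning rate every `step` updates."""

    def __init__(self, base_lr, step):
        self.base_lr = base_lr
        self.step = step

    def __call__(self, num_update):
        return self.base_lr * (0.5 ** (num_update // self.step))


def lr_after(iterations, num_gpus, scheduler, count_per_gpu):
    """Return the learning rate the scheduler reports after `iterations` steps.

    With count_per_gpu=True the shared counter advances once per GPU in
    every iteration, which is the assumed faulty behaviour.
    """
    num_update = 0
    for _ in range(iterations):
        num_update += num_gpus if count_per_gpu else 1
    return scheduler(num_update)


scheduler = StepScheduler(base_lr=0.1, step=100)
expected = lr_after(100, num_gpus=4, scheduler=scheduler, count_per_gpu=False)
observed = lr_after(100, num_gpus=4, scheduler=scheduler, count_per_gpu=True)
print(expected, observed)  # the per-GPU count decays the rate much faster
```

In this model a worker with 4 GPUs reaches the 400th "update" after only
100 iterations, so the learning rate is halved four times instead of once.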


* Changes

The following changes will be ported to the release branch, per [2]:

Infer dtype in SymbolBlock import from input symbol [3]
https://github.com/apache/incubator-mxnet/pull/12412

[MXNET-953] Fix oob memory read
https://github.com/apache/incubator-mxnet/pull/12631

[MXNET-969] Fix buffer overflow in RNNOp
https://github.com/apache/incubator-mxnet/pull/12603

[MXNET-922] Fix memleak in profiler
https://github.com/apache/incubator-mxnet/pull/12499

Implement mkldnn convolution fusion and quantization (MXNet Graph
Optimization and Quantization based on subgraph and MKL-DNN proposal [4])
https://github.com/apache/incubator-mxnet/pull/12530

The following items (test cases) should already be part of 1.3.0:

[MXNET-486] Create CPP test for concat MKLDNN operator
https://github.com/apache/incubator-mxnet/pull/11371

[MXNET-489] MKLDNN Pool test
https://github.com/apache/incubator-mxnet/pull/11608

[MXNET-484] MKLDNN C++ test for LRN operator
https://github.com/apache/incubator-mxnet/pull/11831

[MXNET-546] Add unit test for MKLDNNSum
https://github.com/apache/incubator-mxnet/pull/11272

[MXNET-498] Test MKLDNN backward operators
https://github.com/apache/incubator-mxnet/pull/11232

[MXNET-500] Test cases improvement for MKLDNN on Gluon
https://github.com/apache/incubator-mxnet/pull/10921

Set correct update on kvstore flag in dist_device_sync mode (as part of
fixing [5])
https://github.com/apache/incubator-mxnet/pull/12786

upgrade mshadow version
https://github.com/apache/incubator-mxnet/pull/12692
But another PR will be used instead:
update mshadow
https://github.com/apache/incubator-mxnet/pull/12674

CudnnFind() usage improvements
https://github.com/apache/incubator-mxnet/pull/12804
A critical cuDNN fix that reduces GPU memory consumption and addresses a
memory leak issue. This is an important fix to include in 1.3.1.


From discussion about gluon toolkits:

disable opencv threading for forked process
https://github.com/apache/incubator-mxnet/pull/12025

Fix lazy record io when used with dataloader and multi_worker > 0
https://github.com/apache/incubator-mxnet/pull/12554

fix potential floating number overflow, enable float16
https://github.com/apache/incubator-mxnet/pull/12118



* Resolved issues

MxNet 1.2.1–module get_outputs()
https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882

As far as I can see from the comments, the issue has been resolved and no
action needs to be taken for this release. [7] is mentioned in this
regard, but I don't see any action points there either.


With Naveen's help, I will start porting the mentioned PRs to the 1.3.x
branch.


Best regards,
Anton

[1] https://cwiki.apache.org/confluence/x/eZGzBQ
[2]
https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
[3] https://github.com/apache/incubator-mxnet/issues/11849
[4]
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
[5] https://github.com/apache/incubator-mxnet/issues/12713
[6]
https://github.com/apache/incubator-mxnet/issues/12713#issuecomment-435773777
[7] https://github.com/apache/incubator-mxnet/pull/11005

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

Unfortunately, merging the following PR

Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13121

broke the `dist-kvstore tests CPU` test stage:

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/82/pipeline

A revert PR has been opened:

Revert "Set correct update on kvstore flag in dist_device_sync mode
(v1.3.x) (#13121)"
https://github.com/apache/incubator-mxnet/pull/13228

The test has already passed, so the PR is good to go. The initial fix will
not be considered for the release and will get a mention in the known issues
section.

Added a version bump to the release branch:

news, readme update for v1.3.1 release
https://github.com/apache/incubator-mxnet/pull/13225

Since patch releases are now done on branches, the master branch needs a
version update. The following PR introduces the change:

Bumped minor version to 1.4.0 as 1.3.1 will be continued in the v1.3x branch
https://github.com/apache/incubator-mxnet/pull/13231
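
As a small illustration of the versioning scheme behind this bump (patch
releases continue on the release branch, master moves on to the next minor
version), here is a sketch following the semver.org convention. The helper
name bump is hypothetical and is not part of any MXNet tooling.

```python
def bump(version, part):
    """Return `version` ("MAJOR.MINOR.PATCH") with the given part incremented.

    Hypothetical helper that follows the semver.org convention: lower-order
    fields are reset to zero when a higher-order field is incremented.
    """
    major, minor, patch = (int(field) for field in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


release_branch_version = bump("1.3.0", "patch")  # bug fixes only
master_version = bump("1.3.0", "minor")          # new features
print(release_branch_version, master_version)
```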


The confluence page 'Apache MXNet (incubating) 1.3.1 Release Notes' has
been updated:
https://cwiki.apache.org/confluence/x/eZGzBQ


Best
Anton

On Sat, Nov 10, 2018 at 11:59, Anton Chernov <me...@gmail.com> wrote:

> Due to various problems we had to postpone the tagging and vote for the
> release till Monday, the 12th of November 2018.
>
> The following change has been updated and is waiting to be merged:
>
> Disable flaky test test_operator.test_dropout (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13200
>
> Indeed, the macOS tests timed out on the branch as well. The proposed
> change thus contains only the build:
>
> [MXNET-908] Enable minimal OSX Travis build (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13179
>
>
> Best
> Anton

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

Due to various problems we had to postpone the tagging and vote for the
release till Monday, the 12th of November 2018.

The following change has been updated and is waiting to be merged:

Disable flaky test test_operator.test_dropout (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13200

Indeed, the macOS tests timed out on the branch as well. The proposed
change thus contains only the build:

[MXNET-908] Enable minimal OSX Travis build (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13179


Best
Anton

On Fri, Nov 9, 2018 at 13:11, Anton Chernov <me...@gmail.com> wrote:

> I created the following PR to disable the test:
>
> Disable flaky test test_operator.test_dropout (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13200
>
> The second failure, I suppose, is related to:
>
> distributed kvstore bug in MXNet
> https://github.com/apache/incubator-mxnet/issues/12713
>
> which was partially fixed by
>
> Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13121
>
> But another part of the issue is still open and does not have a fix yet:
>
> "When distributed kvstore is used, by default gluon.Trainer doesn't work
> with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be more
> specific, the trainer updates once per GPU, the LRScheduler object is
> shared across GPUs and get a wrong update count."
>
>
> Best
> Anton

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

I created the following PR to disable the test:

Disable flaky test test_operator.test_dropout (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13200
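
For readers unfamiliar with the practice, a minimal sketch of how a flaky
test is typically disabled follows. It uses the standard library's unittest
as a stand-in (the MXNet tests in the traceback run under nose) and does
not reproduce the actual change in the PR; the class name, test bodies and
skip reason are hypothetical.

```python
import unittest


class OperatorTests(unittest.TestCase):
    """Hypothetical stand-in for a test module with one flaky test."""

    @unittest.skip("flaky on the release branch; tracked in a known issue")
    def test_dropout(self):
        # Never executed while the skip decorator is in place.
        self.fail("flaky assertion")

    def test_other(self):
        self.assertEqual(1 + 1, 2)


def run_suite():
    """Run the tests and return the collected unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OperatorTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_suite()
print(len(result.skipped), len(result.failures), result.wasSuccessful())
```

The skipped test is still reported with its reason, so the disabled
coverage stays visible in the test output rather than silently vanishing.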

The second failure, I suppose, is related to:

distributed kvstore bug in MXNet
https://github.com/apache/incubator-mxnet/issues/12713

which was partially fixed by

Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13121

But another part of the issue is still open and does not have a fix yet:

"When distributed kvstore is used, by default gluon.Trainer doesn't work
with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be more
specific, the trainer updates once per GPU, the LRScheduler object is
shared across GPUs and get a wrong update count."


Best
Anton


On Fri, Nov 9, 2018 at 11:48, Anton Chernov <me...@gmail.com> wrote:

> In case the macOS tests time out as well, we can disable them and
> keep at least the build stage, as in:
>
> Disable travis tests
> https://github.com/apache/incubator-mxnet/pull/13137
>
> Best
> Anton
>
> On Fri, Nov 9, 2018 at 11:17, Anton Chernov <me...@gmail.com> wrote:
>
>>
>> Hi Naveen,
>>
>> I believe that the timeout is not an issue for the branch. And I see
>> great benefit in having tests for MACOS on the release branch. The travis
>> build is not blocking anyway, so I don't see any risk in adding it.
>>
>> * test_dropout
>>
>> Currently, there is a problem with test_dropout that fails consistently
>> on the branch:
>>
>>
>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/97/pipeline
>>
>> Error reported:
>>
>> ======================================================================
>> FAIL: test_operator.test_dropout
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197, in runTest
>>     self.test(*self.arg)
>>   File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\common.py", line 173, in test_new
>>     orig_test(*args, **kwargs)
>>   File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 5853, in test_dropout
>>     check_dropout_ratio(0.0, shape)
>>   File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 5797, in check_dropout_ratio
>>     assert exe.outputs[0].asnumpy().min() == min_value
>> AssertionError:
>> -------------------- >> begin captured logging << --------------------
>> common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=428273587 to reproduce.
>> --------------------- >> end captured logging << ---------------------
>>
>> The test is enabled on master:
>>
>> Re-enables test_operator.test_dropout
>> https://github.com/apache/incubator-mxnet/pull/12717
>>
>> And there are no failures for it [1].
>>
>> * KVStore tests
>>
>> Unfortunately, KVStore tests fail as well.
>>
>>
>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/96/pipeline
>>
>> Error reported:
>>
>> AssertionError
>> test_gluon_trainer_type()
>>     assert trainer._update_on_kvstore is update_on_kv\
>>   File "dist_sync_kvstore.py", line 388, in test_gluon_trainer_type
>>
>> If nobody has a fix for these issues, I will disable the tests and add
>> information to the known issues section.
>>
>> Best
>> Anton
>>
>> [1] http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
>>
>> On Thu, Nov 8, 2018 at 21:44, Naveen Swamy <mn...@gmail.com> wrote:
>>
>>> Anton, I don't think we need to add the Mac OS tests for the 1.3.1 branch,
>>> since Travis CI is timing out and creates blockers; they also did not
>>> exist for v1.3.0.
>>>
>>>
>>> On Thu, Nov 8, 2018 at 10:04 AM Anton Chernov <me...@gmail.com>
>>> wrote:
>>>
>>> > A PR to fix the tests:
>>> >
>>> > Remove test for non existing index copy operator (v1.3.x)
>>> > https://github.com/apache/incubator-mxnet/pull/13180
>>> >
>>> >
>>> > Best
>>> > Anton
>>> >
>>> > On Thu, Nov 8, 2018 at 10:05, Anton Chernov <me...@gmail.com> wrote:
>>> >
>>> > > An addition has been made to include MacOS tests for the v1.3.x branch:
>>> > >
>>> > > [MXNET-908] Enable minimal OSX Travis build (v1.3.x)
>>> > > https://github.com/apache/incubator-mxnet/pull/13179
>>> > >
>>> > > It includes the following PRs for master:
>>> > >
>>> > > [MXNET-908] Enable minimal OSX Travis build
>>> > > https://github.com/apache/incubator-mxnet/pull/12462
>>> > >
>>> > > [MXNET-908] Enable python tests in Travis
>>> > > https://github.com/apache/incubator-mxnet/pull/12550
>>> > >
>>> > > [MXNET-968] Fix MacOS python tests
>>> > > https://github.com/apache/incubator-mxnet/pull/12590
>>> > >
>>> > >
>>> > > Best
>>> > > Anton
>>> > >
>>> > >
>>> > > On Thu, Nov 8, 2018 at 9:38, Anton Chernov <me...@gmail.com> wrote:
>>> > >
>>> > >> Thank you everyone for your support and suggestions. All proposed PRs
>>> > >> have been merged. We will tag the release candidate and start the vote
>>> > >> on Friday, the 9th of November 2018.
>>> > >>
>>> > >> Unfortunately, after the merges the tests started to fail:
>>> > >>
>>> > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.3.x/
>>> > >>
>>> > >> I will look into the failures, but any help is, as usual, very much
>>> > >> appreciated.
>>> > >>
>>> > >> The nightly tests are fine:
>>> > >> http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTests/job/v1.3.x/
>>> > >>
>>> > >>
>>> > >> Best
>>> > >> Anton
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Wed, Nov 7, 2018 at 17:19, Anton Chernov <me...@gmail.com> wrote:
>>> > >>
>>> > >>> Yes, you are right about the versions wording, thanks for the
>>> > >>> clarification.
>>> > >>>
>>> > >>> A performance improvement can be considered a bugfix as well. I see no
>>> > >>> big risks in including the PRs by Haibin and Lin in the patch release.
>>> > >>>
>>> > >>> @Haibin, if you can reopen the PRs, they should be good to go for the
>>> > >>> release, considering the importance of the improvements.
>>> > >>>
>>> > >>> I propose the following bugfixes for the release as well (already
>>> > >>> created corresponding PRs):
>>> > >>>
>>> > >>> Fixed __setattr__ method of _MXClassPropertyMetaClass (v1.3.x)
>>> > >>> https://github.com/apache/incubator-mxnet/pull/13157
>>> > >>>
>>> > >>> fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x)
>>> > >>> https://github.com/apache/incubator-mxnet/pull/13158
>>> > >>>
>>> > >>> We will be starting to merge the PRs shortly. If there are no more
>>> > >>> proposals for backporting, I would consider the list as set.
>>> > >>>
>>> > >>> Best
>>> > >>> Anton
>>> > >>>
>>> > >>> On Wed, Nov 7, 2018 at 17:01, Sheng Zha <sz...@gmail.com> wrote:
>>> > >>>
>>> > >>>> Hi Anton,
>>> > >>>>
>>> > >>>> I hear your concern about a simultaneous 1.4.0 release and it
>>> > >>>> certainly is a valid one.
>>> > >>>>
>>> > >>>> Regarding the release, let’s agree on the language first. According to
>>> > >>>> semver.org, a 1.3.1 release is considered a patch release, which is for
>>> > >>>> backward compatible bug fixes, while a 1.4.0 release is considered a
>>> > >>>> minor release, which is for backward compatible new features. A major
>>> > >>>> release would mean 2.0.
>>> > >>>>
>>> > >>>> The three PRs suggested by Haibin and Lin are all introducing new
>>> > >>>> features. If they go into a patch release, it would require an
>>> > >>>> exception accepted by the community. Also, if another violation
>>> > >>>> happens, it could be grounds for declining a release during votes.
>>> > >>>>
>>> > >>>> -sz
>>> > >>>>
>>> > >>>> > On Nov 7, 2018, at 2:25 AM, Anton Chernov <me...@gmail.com> wrote:
>>> > >>>> >
>>> > >>>> > [MXNET-1179] Enforce deterministic algorithms in convolution layers
>>> > >>>>
>>> > >>>
>>> >
>>>
>>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

In case the macOS tests time out as well, we can disable them and
keep at least the build stage, as in:

Disable travis tests
https://github.com/apache/incubator-mxnet/pull/13137

Best
Anton

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

Hi Naveen,

I believe that the timeout is not an issue for the branch, and I see great
benefit in having macOS tests on the release branch. The Travis build is
not blocking anyway, so I don't see any risk in adding it.

* test_dropout

Currently, there is a problem with test_dropout that fails consistently on
the branch:

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/97/pipeline

Error reported:

======================================================================
FAIL: test_operator.test_dropout
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\common.py", line 173, in test_new
    orig_test(*args, **kwargs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 5853, in test_dropout
    check_dropout_ratio(0.0, shape)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 5797, in check_dropout_ratio
    assert exe.outputs[0].asnumpy().min() == min_value
AssertionError:
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=428273587 to reproduce.
--------------------- >> end captured logging << ---------------------

The test is enabled on master:

Re-enables test_operator.test_dropout
https://github.com/apache/incubator-mxnet/pull/12717

And there are no failures for it [1].
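
The captured log above suggests a way to reproduce the failure locally:
re-run the test with the reported seed. A minimal sketch, assuming the test
is run with nose (as the traceback indicates) from the repository root. The
helper names build_repro and run_repro and the test path are illustrative
and not taken from the CI configuration:

```python
import os
import subprocess
import sys


def build_repro(seed, test_id):
    """Return (command, environment) for re-running one test with a fixed seed."""
    env = dict(os.environ)
    # MXNET_TEST_SEED is the variable named in the captured log above.
    env["MXNET_TEST_SEED"] = str(seed)
    # Assumption: nose is installed and can be invoked as a module.
    cmd = [sys.executable, "-m", "nose", "-v", test_id]
    return cmd, env


def run_repro(seed, test_id):
    """Run the test in a child process and return the runner's exit code."""
    cmd, env = build_repro(seed, test_id)
    return subprocess.call(cmd, env=env)
```

For example, run_repro(428273587,
"tests/python/unittest/test_operator.py:test_dropout") would re-run the
failing test with the seed from the log; the path is inferred from the
traceback and may differ for the GPU test suite.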

* KVStore tests

Unfortunately, KVStore tests fail as well.

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/96/pipeline

Error reported:

AssertionError
test_gluon_trainer_type()
    assert trainer._update_on_kvstore is update_on_kv\
  File "dist_sync_kvstore.py", line 388, in test_gluon_trainer_type

If nobody has a fix for these issues, I will disable the tests and add
information to the known issues section.

Best
Anton

[1] http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Naveen Swamy <mn...@gmail.com>.

Anton, I don't think we need to add the macOS tests to the 1.3.1 branch,
since Travis CI is timing out and creates blockers; it also did not exist
for v1.3.0.


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

A PR to fix the tests:

Remove test for non existing index copy operator (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13180


Best
Anton

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

macOS tests have been added for the v1.3.x branch:

[MXNET-908] Enable minimal OSX Travis build (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13179

It includes the following PRs from master:

[MXNET-908] Enable minimal OSX Travis build
https://github.com/apache/incubator-mxnet/pull/12462

[MXNET-908] Enable python tests in Travis
https://github.com/apache/incubator-mxnet/pull/12550

[MXNET-968] Fix MacOS python tests
https://github.com/apache/incubator-mxnet/pull/12590


Best
Anton


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

Thank you, everyone, for your support and suggestions. All proposed PRs have
been merged. We will tag the release candidate and start the vote on
Friday, the 9th of November 2018.

Unfortunately, after the merges the tests started to fail:

http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.3.x/

I will look into the failures, but any help, as usual, is much appreciated.

The nightly tests are fine:
http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTests/job/v1.3.x/


Best
Anton




Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

Yes, you are right about the version wording; thanks for the clarification.

A performance improvement can be considered a bugfix as well. I see no big
risks in including the PRs by Haibin and Lin in the patch release.

@Haibin, if you can reopen the PRs, they should be good to go for the
release, considering the importance of the improvements.

I propose the following bugfixes for the release as well (the corresponding
PRs have already been created):

Fixed __setattr__ method of _MXClassPropertyMetaClass (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13157

fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13158

We will start merging the PRs shortly. If there are no more proposals for
backporting, I would consider the list as set.

Best
Anton

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Sheng Zha <sz...@gmail.com>.

Hi Anton,

I hear your concern about a simultaneous 1.4.0 release and it certainly is a valid one.

Regarding the release, let’s agree on the language first. According to semver.org, a 1.3.1 release is considered a patch release, which is for backward-compatible bug fixes, while a 1.4.0 release is considered a minor release, which is for backward-compatible new features. A major release would mean 2.0.
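
The semver.org distinction described above can be sketched in a few lines. This is an illustration only: parse_version and release_kind are hypothetical helpers, not part of MXNet or its release tooling, and they assume plain MAJOR.MINOR.PATCH strings without pre-release suffixes.

```python
def parse_version(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def release_kind(previous, new):
    """Classify the step from `previous` to `new` as major, minor, or patch."""
    old = parse_version(previous)
    cur = parse_version(new)
    if cur <= old:
        raise ValueError("new version must be greater than the previous one")
    if cur[0] != old[0]:
        return "major"  # incompatible API changes
    if cur[1] != old[1]:
        return "minor"  # backward-compatible new features
    return "patch"      # backward-compatible bug fixes
```

Under these assumptions, going from 1.3.0 to 1.3.1 is classified as a patch release, 1.3.0 to 1.4.0 as a minor release, and 1.3.0 to 2.0.0 as a major release.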

The three PRs suggested by Haibin and Lin all introduce new features. If they go into a patch release, it would require an exception accepted by the community. Also, any other violation could be grounds for declining the release during the vote.

-sz

> On Nov 7, 2018, at 2:25 AM, Anton Chernov <me...@gmail.com> wrote:
> 
> [MXNET-1179] Enforce deterministic algorithms in convolution layers

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

@Sheng: Sorry, never mind. It was already suggested by Lin.

The following backport PRs have been created:

allow foreach on input with 0 length (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13151

[MXNET-1179] Enforce deterministic algorithms in convolution layers (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13152

Document the newly added env variable (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13156

add/update infer_range docs (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13153

fix broken Python IO API docs (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13154

fix broken links (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13155
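
For readers who want to try the deterministic convolution change from the
list above: this thread does not name the environment variable that the
documentation PR refers to. The sketch below assumes it is
MXNET_ENFORCE_DETERMINISM, the name used in the MXNet environment variable
documentation; treat that name, and the helper with_determinism, as
assumptions to verify against the release notes.

```python
import os


def with_determinism(env=None):
    """Return a copy of `env` (default: os.environ) requesting deterministic algorithms."""
    result = dict(os.environ if env is None else env)
    # Assumed variable name; it is not spelled out in this thread.
    result["MXNET_ENFORCE_DETERMINISM"] = "1"
    return result
```

The returned mapping could then be passed as env= to subprocess.run when
launching a training script, so that the setting applies only to that run.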


Best
Anton


On Wed, Nov 7, 2018 at 10:49, Anton Chernov <me...@gmail.com> wrote:

> Hi Sheng,
>
> thanks for your suggestions. Personally, I would not rush a new major
> release, as this breaks the pace and creates unnecessary pressure, in my
> opinion.
>
> If the changes suggested by Haibin are really important then I think we
> can consider them for the minor release, even if they are not strictly
> speaking *bugfixes*. Do you think that might be an option?
>
> And did I understand correctly, you are suggesting:
>
> [MXNET-1179] Enforce deterministic algorithms in convolution layers
> https://github.com/apache/incubator-mxnet/pull/12992
>
> for the 1.3.1 release?
>
> Best
> Anton
>
>
> On Wed, Nov 7, 2018 at 0:59, Sheng Zha <sz...@gmail.com> wrote:
>
>> Similar to the two PRs that Haibin suggested, 12992 introduces a new
>> interface for controlling determinism, which is better suited for a minor
>> release.
>>
>> I think other than the lack of a release manager to drive the 1.4.0 release,
>> there’s no reason we cannot do two releases (1.4.0 & 1.3.1) at the same
>> time. I’m willing to help with the 1.4.0 release to make these new features
>> available one month sooner, if there’s no other concern.
>>
>> -sz
>>
>> > On Nov 6, 2018, at 3:30 PM, Lin Yuan <ap...@gmail.com> wrote:
>> >
>> > Hi Anton,
>> >
>> > Thanks for helping with the release.
>> > The following PRs are needed by customers who want to use deterministic
>> > cuDNN convolution algorithms:
>> >
>> > https://github.com/apache/incubator-mxnet/pull/12992
>> > https://github.com/apache/incubator-mxnet/pull/13049
>> >
>> > Thanks!
>> >
>> > Lin
>> >
>> >
>> > On Tue, Nov 6, 2018 at 1:51 PM Aaron Markham <aaron.s.markham@gmail.com>
>> > wrote:
>> >
>> >> Hi Anton,
>> >> I have the following suggestions for fixes to include in 1.3.1. These
>> >> each have updates to files that will impact docs generation for the
>> >> 1.3.x version of the website's Python API docs:
>> >>
>> >> https://github.com/apache/incubator-mxnet/pull/12879
>> >> https://github.com/apache/incubator-mxnet/pull/12871
>> >> https://github.com/apache/incubator-mxnet/pull/12856
>> >>
>> >> Thanks,
>> >> Aaron
>> >>
>> >>> On Tue, Nov 6, 2018 at 1:29 PM Lai Wei <ro...@gmail.com> wrote:
>> >>>
>> >>> Hi Anton,
>> >>>
>> >>> Thanks for driving this, I would like to include the following fix in
>> >>> 1.3.1:
>> >>> Allow infer shape partial on foreach operator:
>> >>> https://github.com/apache/incubator-mxnet/pull/12471
>> >>>
>> >>> Keras-MXNet needs this functionality to infer shape partially
>> >>> on foreach operator. (Used in RNN operators)
>> >>>
>> >>> Thanks a lot!
>> >>>
>> >>>
>> >>> Best Regards
>> >>> Lai Wei
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Nov 6, 2018 at 10:44 AM Haibin Lin <ha...@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Hi Naveen and Anton,
>> >>>>
>> >>>> Thanks for pointing that out. You are right that these are not
>> critical
>> >>>> fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.
>> >>>>
>> >>>> Best,
>> >>>> Haibin
>> >>>>
>> >>>> On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy <mn...@gmail.com>
>> >> wrote:
>> >>>>
>> >>>>> Please note that this is a patch release(1.3.1) to address critical
>> >>>> bugs!,
>> >>>>> For everything else please wait for 1.4.0 which is planned very
>> >> shortly
>> >>>>> after 1.3.1
>> >>>>>
>> >>>>>> On Nov 6, 2018, at 7:17 AM, Anton Chernov <me...@gmail.com>
>> >>> wrote:
>> >>>>>>
>> >>>>>> The following PR's have been created so far:
>> >>>>>>
>> >>>>>> Infer dtype in SymbolBlock import from input symbol (v1.3.x)
>> >>>>>> https://github.com/apache/incubator-mxnet/pull/13117
>> >>>>>>
>> >>>>>> [MXNET-953] Fix oob memory read (v1.3.x)
>> >>>>>> https://github.com/apache/incubator-mxnet/pull/13118
>> >>>>>>
>> >>>>>> [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
>> >>>>>> https://github.com/apache/incubator-mxnet/pull/13119
>> >>>>>>
>> >>>>>> [MXNET-922] Fix memleak in profiler (v1.3.x)
>> >>>>>> https://github.com/apache/incubator-mxnet/pull/13120
>> >>>>>>
>> >>>>>> Set correct update on kvstore flag in dist_device_sync mode
>> >> (v1.3.x)
>> >>>>>> https://github.com/apache/incubator-mxnet/pull/13121
>> >>>>>>
>> >>>>>> update mshadow (v1.3.x)
>> >>>>>> https://github.com/apache/incubator-mxnet/pull/13122
>> >>>>>>
>> >>>>>> CudnnFind() usage improvements (v1.3.x)
>> >>>>>> https://github.com/apache/incubator-mxnet/pull/13123
>> >>>>>>
>> >>>>>> Fix lazy record io when used with dataloader and multi_worker > 0
>> >>>>> (v1.3.x)
>> >>>>>> https://github.com/apache/incubator-mxnet/pull/13124
>> >>>>>>
>> >>>>>>
>> >>>>>> As stated previously I would be rather opposed to have following
>> >> PR's
>> >>>> it
>> >>>>> in
>> >>>>>> the patch release:
>> >>>>>>
>> >>>>>> Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
>> >>>>>> https://github.com/apache/incubator-mxnet/pull/13129
>> >>>>>>
>> >>>>>> sample_like operators (#13034) v1.3.x
>> >>>>>> https://github.com/apache/incubator-mxnet/pull/13130
>> >>>>>>
>> >>>>>>
>> >>>>>> Best
>> >>>>>> Anton
>> >>>>>>
>> >>>>>> On Tue, 6 Nov 2018 at 16:06, Anton Chernov <me...@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Hi Haibin,
>> >>>>>>>
>> >>>>>>> I have a few comments regarding the proposed performance
>> >> improvement
>> >>>>>>> changes.
>> >>>>>>>
>> >>>>>>> CUDNN support for LSTM with projection & clipping
>> >>>>>>> https://github.com/apache/incubator-mxnet/pull/13056
>> >>>>>>>
>> >>>>>>> There is no doubt that this change brings value, but I don't see
>> >> it
>> >>>> as a
>> >>>>>>> critical bug fix. I would rather leave it for the next major
>> >>> release.
>> >>>>>>>
>> >>>>>>> sample_like operators
>> >>>>>>> https://github.com/apache/incubator-mxnet/pull/13034
>> >>>>>>>
>> >>>>>>> Even if it's related to performance, this is an addition of
>> >>>>> functionality
>> >>>>>>> and I would also push this to be in the next major release only.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Best
>> >>>>>>> Anton
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Tue, 6 Nov 2018 at 15:55, Anton Chernov <me...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> Hi Patric,
>> >>>>>>>>
>> >>>>>>>> This change was listed in the 'PR candidates suggested for
>> >>>>> consideration
>> >>>>>>>> for v1.3.1 patch release' section [1].
>> >>>>>>>>
>> >>>>>>>> You are right, I also think that this is not a critical hotfix
>> >>> change
>> >>>>>>>> that should be included into the 1.3.1 patch release.
>> >>>>>>>>
>> >>>>>>>> Thus I'm not making any further efforts to bring it in.
>> >>>>>>>>
>> >>>>>>>> Best
>> >>>>>>>> Anton
>> >>>>>>>>
>> >>>>>>>> [1] https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Tue, 6 Nov 2018 at 1:14, Zhao, Patric <patric.zhao@intel.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi Anton,
>> >>>>>>>>>
>> >>>>>>>>> Thanks for looking into the MKL-DNN PR.
>> >>>>>>>>>
>> >>>>>>>>> As my understanding of cwiki (
>> >>>>>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
>> >>>>>>>>> ),
>> >>>>>>>>> these features will go into 1.4 rather than patch release of
>> >>> 1.3.1.
>> >>>>>>>>>
>> >>>>>>>>> Feel free to correct me :)
>> >>>>>>>>>
>> >>>>>>>>> Thanks,
>> >>>>>>>>>
>> >>>>>>>>> --Patric
>> >>>>>>>>>
>> >>>>>>>>>> -----Original Message-----
>> >>>>>>>>>> From: Anton Chernov [mailto:mechernov@gmail.com]
>> >>>>>>>>>> Sent: Tuesday, November 6, 2018 3:11 AM
>> >>>>>>>>>> To: dev@mxnet.apache.org
>> >>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating)
>> >> 1.3.1
>> >>>>> patch
>> >>>>>>>>>> release
>> >>>>>>>>>>
>> >>>>>>>>>> It seems that there is a problem porting following changes to
>> >> the
>> >>>>>>>>> v1.3.x
>> >>>>>>>>>> release branch:
>> >>>>>>>>>>
>> >>>>>>>>>> Implement mkldnn convolution fusion and quantization
>> >>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
>> >>>>>>>>>>
>> >>>>>>>>>> MKL-DNN Quantization Examples and README
>> >>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12808
>> >>>>>>>>>>
>> >>>>>>>>>> The bases are different.
>> >>>>>>>>>>
>> >>>>>>>>>> I would need help from authors of these changes to make a
>> >>> backport
>> >>>>> PR.
>> >>>>>>>>>>
>> >>>>>>>>>> @ZhennanQin, @xinyu-intel would you be able to assist me and
>> >>> create
>> >>>>> the
>> >>>>>>>>>> corresponding PR's?
>> >>>>>>>>>>
>> >>>>>>>>>> Without proper history and domain knowledge I would not be able
>> >>> to
>> >>>>>>>>> create
>> >>>>>>>>>> them by my own in reasonable amount of time, I'm afraid.
>> >>>>>>>>>>
>> >>>>>>>>>> Best regards,
>> >>>>>>>>>> Anton
>> >>>>>>>>>>
>> >>>>>>>>>> On Mon, 5 Nov 2018 at 19:45, Anton Chernov <mechernov@gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> As part of:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Implement mkldnn convolution fusion and quantization
>> >>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
>> >>>>>>>>>>>
>> >>>>>>>>>>> I propose to add the examples and documentation PR as well:
>> >>>>>>>>>>>
>> >>>>>>>>>>> MKL-DNN Quantization Examples and README
>> >>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12808
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best regards,
>> >>>>>>>>>>> Anton
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Mon, 5 Nov 2018 at 19:02, Anton Chernov <mechernov@gmail.com> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Dear MXNet community,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I will be the release manager for the upcoming 1.3.1 patch
>> >>>> release.
>> >>>>>>>>>>>> Naveen will be co-managing the release and providing help
>> >> from
>> >>>> the
>> >>>>>>>>>>>> committers side.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> The following dates have been set:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Code Freeze: 31st October 2018
>> >>>>>>>>>>>> Release published: 13th November 2018
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Release notes have been drafted here [1].
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> * Known issues
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Update MKL-DNN dependency
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12953
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> This PR hasn't been merged even to master yet. Requires
>> >>>> additional
>> >>>>>>>>>>>> discussion and merge.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> distributed kvstore bug in MXNet
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/issues/12713
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> When distributed kvstore is used, by default gluon.Trainer
>> >>>> doesn't
>> >>>>>>>>>>>>> work
>> >>>>>>>>>>>> with mx.optimizer.LRScheduler if a worker has more than 1
>> >> GPU.
>> >>> To
>> >>>>> be
>> >>>>>>>>>>>> more specific, the trainer updates once per GPU, the
>> >>> LRScheduler
>> >>>>>>>>>>>> object is shared across GPUs and get a wrong update count.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> This needs to be fixed. [6]
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> * Changes
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> The following changes will be ported to the release branch,
>> >> per
>> >>>>> [2]:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Infer dtype in SymbolBlock import from input symbol [3]
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12412
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [MXNET-953] Fix oob memory read
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12631
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [MXNET-969] Fix buffer overflow in RNNOp
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12603
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [MXNET-922] Fix memleak in profiler
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12499
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Implement mkldnn convolution fusion and quantization (MXNet
>> >>> Graph
>> >>>>>>>>>>>> Optimization and Quantization based on subgraph and MKL-DNN
>> >>>>>>>>>> proposal
>> >>>>>>>>>>>> [4])
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Following items (test cases) should be already part of 1.3.0:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [MXNET-486] Create CPP test for concat MKLDNN operator
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11371
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [MXNET-489] MKLDNN Pool test
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11608
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [MXNET-484] MKLDNN C++ test for LRN operator
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11831
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [MXNET-546] Add unit test for MKLDNNSum
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11272
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [MXNET-498] Test MKLDNN backward operators
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11232
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [MXNET-500] Test cases improvement for MKLDNN on Gluon
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/10921
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Set correct update on kvstore flag in dist_device_sync mode
>> >> (as
>> >>>>> part
>> >>>>>>>>>>>> of fixing [5])
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12786
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> upgrade mshadow version
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12692
>> >>>>>>>>>>>> But another PR will be used instead:
>> >>>>>>>>>>>> update mshadow
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12674
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> CudnnFind() usage improvements
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12804
>> >>>>>>>>>>>> A critical CUDNN fix that reduces GPU memory consumption and
>> >>>>>>>>>>>> addresses this memory leak issue. This is an important fix to
>> >>>>>>>>> include
>> >>>>>>>>>>>> in 1.3.1
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> From discussion about gluon toolkits:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> disable opencv threading for forked process
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12025
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Fix lazy record io when used with dataloader and multi_worker > 0
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12554
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> fix potential floating number overflow, enable float16
>> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12118
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> * Resolved issues
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> MxNet 1.2.1–module get_outputs()
>> >>>>>>>>>>>>
>> >> https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> As far as I can see from the comments the issue has been
>> >>>> resolved,
>> >>>>>>>>> no
>> >>>>>>>>>>>> actions need to be taken for this release. [7] is mentioned
>> >> in
>> >>>> this
>> >>>>>>>>>>>> regards, but I don't see any action points here either.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I will start with help of Naveen port the mentioned PR's to
>> >> the
>> >>>>>>>>> 1.3.x
>> >>>>>>>>>>>> branch.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Best regards,
>> >>>>>>>>>>>> Anton
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/eZGzBQ
>> >>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
>> >>>>>>>>>>>> [3] https://github.com/apache/incubator-mxnet/issues/11849
>> >>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
>> >>>>>>>>>>>> [5] https://github.com/apache/incubator-mxnet/issues/12713
>> >>>>>>>>>>>> [6] https://github.com/apache/incubator-mxnet/issues/12713#issuecomment-435773777
>> >>>>>>>>>>>> [7] https://github.com/apache/incubator-mxnet/pull/11005
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>>
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

Hi Sheng,

thanks for your suggestions. Personally, I would not rush a new major
release, as in my opinion this breaks the pace and creates unnecessary
pressure.

If the changes suggested by Haibin are really important then I think we can
consider them for the minor release, even if they are not strictly speaking
*bugfixes*. Do you think that might be an option?

And did I understand correctly that you are suggesting:

[MXNET-1179] Enforce deterministic algorithms in convolution layers
https://github.com/apache/incubator-mxnet/pull/12992

for the 1.3.1 release?

Best
Anton


On Wed, 7 Nov 2018 at 0:59, Sheng Zha <sz...@gmail.com> wrote:

> Similar to the two PRs that Haibin suggested, 12992 introduces new
> interface for controlling determinism, which is better suited for minor
> release.
>
> I think other than lack of release manager to drive 1.4.0 release, there’s
> no reason we cannot do two releases (1.4.0 & 1.3.1) at the same time. I’m
> willing to help with the 1.4.0 release to make these new features available
> one month sooner, if there’s no other concern.
>
> -sz
>
> > On Nov 6, 2018, at 3:30 PM, Lin Yuan <ap...@gmail.com> wrote:
> >
> > Hi Anton,
> >
> > Thanks for helping the release.
> > The following PRs are needed by customers who want to use deterministic
> > CUDNN convolution algorithms:
> >
> > https://github.com/apache/incubator-mxnet/pull/12992
> > https://github.com/apache/incubator-mxnet/pull/13049
> >
> > Thanks！
> >
> > Lin
> >
> >
> > On Tue, Nov 6, 2018 at 1:51 PM Aaron Markham <aa...@gmail.com>
> > wrote:
> >
> >> Hi Anton,
> >> I have the following suggestions for fixes to include in 1.3.1. These
> each
> >> have updates to files that will impact docs generation for the 1.3.x
> >> version of the website's Python API docs:
> >>
> >> https://github.com/apache/incubator-mxnet/pull/12879
> >> https://github.com/apache/incubator-mxnet/pull/12871
> >> https://github.com/apache/incubator-mxnet/pull/12856
> >>
> >> Thanks,
> >> Aaron
> >>
> >>> On Tue, Nov 6, 2018 at 1:29 PM Lai Wei <ro...@gmail.com> wrote:
> >>>
> >>> Hi Anton,
> >>>
> >>> Thanks for driving this, I would like to include the following fix in
> >>> 1.3.1:
> >>> Allow infer shape partial on foreach operator:
> >>> https://github.com/apache/incubator-mxnet/pull/12471
> >>>
> >>> Keras-MXNet needs this functionality to infer shape partially
> >>> on foreach operator. (Used in RNN operators)
> >>>
> >>> Thanks a lot!
> >>>
> >>>
> >>> Best Regards
> >>> Lai Wei
> >>>
> >>>
> >>>
> >>> On Tue, Nov 6, 2018 at 10:44 AM Haibin Lin <ha...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Naveen and Anton,
> >>>>
> >>>> Thanks for pointing that out. You are right that these are not
> critical
> >>>> fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.
> >>>>
> >>>> Best,
> >>>> Haibin
> >>>>
> >>>> On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy <mn...@gmail.com>
> >> wrote:
> >>>>
> >>>>> Please note that this is a patch release(1.3.1) to address critical
> >>>> bugs!,
> >>>>> For everything else please wait for 1.4.0 which is planned very
> >> shortly
> >>>>> after 1.3.1
> >>>>>
> >>>>>> On Nov 6, 2018, at 7:17 AM, Anton Chernov <me...@gmail.com>
> >>> wrote:
> >>>>>>
> >>>>>> The following PR's have been created so far:
> >>>>>>
> >>>>>> Infer dtype in SymbolBlock import from input symbol (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13117
> >>>>>>
> >>>>>> [MXNET-953] Fix oob memory read (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13118
> >>>>>>
> >>>>>> [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13119
> >>>>>>
> >>>>>> [MXNET-922] Fix memleak in profiler (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13120
> >>>>>>
> >>>>>> Set correct update on kvstore flag in dist_device_sync mode
> >> (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13121
> >>>>>>
> >>>>>> update mshadow (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13122
> >>>>>>
> >>>>>> CudnnFind() usage improvements (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13123
> >>>>>>
> >>>>>> Fix lazy record io when used with dataloader and multi_worker > 0
> >>>>> (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13124
> >>>>>>
> >>>>>>
> >>>>>> As stated previously I would be rather opposed to have following
> >> PR's
> >>>> it
> >>>>> in
> >>>>>> the patch release:
> >>>>>>
> >>>>>> Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13129
> >>>>>>
> >>>>>> sample_like operators (#13034) v1.3.x
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13130
> >>>>>>
> >>>>>>
> >>>>>> Best
> >>>>>> Anton
> >>>>>>
> >>>>>> On Tue, 6 Nov 2018 at 16:06, Anton Chernov <me...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Haibin,
> >>>>>>>
> >>>>>>> I have a few comments regarding the proposed performance
> >> improvement
> >>>>>>> changes.
> >>>>>>>
> >>>>>>> CUDNN support for LSTM with projection & clipping
> >>>>>>> https://github.com/apache/incubator-mxnet/pull/13056
> >>>>>>>
> >>>>>>> There is no doubt that this change brings value, but I don't see
> >> it
> >>>> as a
> >>>>>>> critical bug fix. I would rather leave it for the next major
> >>> release.
> >>>>>>>
> >>>>>>> sample_like operators
> >>>>>>> https://github.com/apache/incubator-mxnet/pull/13034
> >>>>>>>
> >>>>>>> Even if it's related to performance, this is an addition of
> >>>>> functionality
> >>>>>>> and I would also push this to be in the next major release only.
> >>>>>>>
> >>>>>>>
> >>>>>>> Best
> >>>>>>> Anton
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, 6 Nov 2018 at 15:55, Anton Chernov <me...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hi Patric,
> >>>>>>>>
> >>>>>>>> This change was listed in the 'PR candidates suggested for
> >>>>> consideration
> >>>>>>>> for v1.3.1 patch release' section [1].
> >>>>>>>>
> >>>>>>>> You are right, I also think that this is not a critical hotfix
> >>> change
> >>>>>>>> that should be included into the 1.3.1 patch release.
> >>>>>>>>
> >>>>>>>> Thus I'm not making any further efforts to bring it in.
> >>>>>>>>
> >>>>>>>> Best
> >>>>>>>> Anton
> >>>>>>>>
> >>>>>>>> [1] https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, 6 Nov 2018 at 1:14, Zhao, Patric <patric.zhao@intel.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Anton,
> >>>>>>>>>
> >>>>>>>>> Thanks for looking into the MKL-DNN PR.
> >>>>>>>>>
> >>>>>>>>> As my understanding of cwiki (
> >>>>>>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
> >>>>>>>>> ),
> >>>>>>>>> these features will go into 1.4 rather than patch release of
> >>> 1.3.1.
> >>>>>>>>>
> >>>>>>>>> Feel free to correct me :)
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> --Patric
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Anton Chernov [mailto:mechernov@gmail.com]
> >>>>>>>>>> Sent: Tuesday, November 6, 2018 3:11 AM
> >>>>>>>>>> To: dev@mxnet.apache.org
> >>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating)
> >> 1.3.1
> >>>>> patch
> >>>>>>>>>> release
> >>>>>>>>>>
> >>>>>>>>>> It seems that there is a problem porting following changes to
> >> the
> >>>>>>>>> v1.3.x
> >>>>>>>>>> release branch:
> >>>>>>>>>>
> >>>>>>>>>> Implement mkldnn convolution fusion and quantization
> >>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
> >>>>>>>>>>
> >>>>>>>>>> MKL-DNN Quantization Examples and README
> >>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12808
> >>>>>>>>>>
> >>>>>>>>>> The bases are different.
> >>>>>>>>>>
> >>>>>>>>>> I would need help from authors of these changes to make a
> >>> backport
> >>>>> PR.
> >>>>>>>>>>
> >>>>>>>>>> @ZhennanQin, @xinyu-intel would you be able to assist me and
> >>> create
> >>>>> the
> >>>>>>>>>> corresponding PR's?
> >>>>>>>>>>
> >>>>>>>>>> Without proper history and domain knowledge I would not be able
> >>> to
> >>>>>>>>> create
> >>>>>>>>>> them by my own in reasonable amount of time, I'm afraid.
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Anton
> >>>>>>>>>>
> >>>>>>>>>> On Mon, 5 Nov 2018 at 19:45, Anton Chernov <mechernov@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> As part of:
> >>>>>>>>>>>
> >>>>>>>>>>> Implement mkldnn convolution fusion and quantization
> >>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
> >>>>>>>>>>>
> >>>>>>>>>>> I propose to add the examples and documentation PR as well:
> >>>>>>>>>>>
> >>>>>>>>>>> MKL-DNN Quantization Examples and README
> >>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12808
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Best regards,
> >>>>>>>>>>> Anton
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, 5 Nov 2018 at 19:02, Anton Chernov <mechernov@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Dear MXNet community,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I will be the release manager for the upcoming 1.3.1 patch
> >>>> release.
> >>>>>>>>>>>> Naveen will be co-managing the release and providing help
> >> from
> >>>> the
> >>>>>>>>>>>> committers side.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The following dates have been set:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Code Freeze: 31st October 2018
> >>>>>>>>>>>> Release published: 13th November 2018
> >>>>>>>>>>>>
> >>>>>>>>>>>> Release notes have been drafted here [1].
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> * Known issues
> >>>>>>>>>>>>
> >>>>>>>>>>>> Update MKL-DNN dependency
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12953
> >>>>>>>>>>>>
> >>>>>>>>>>>> This PR hasn't been merged even to master yet. Requires
> >>>> additional
> >>>>>>>>>>>> discussion and merge.
> >>>>>>>>>>>>
> >>>>>>>>>>>> distributed kvstore bug in MXNet
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/issues/12713
> >>>>>>>>>>>>
> >>>>>>>>>>>>> When distributed kvstore is used, by default gluon.Trainer
> >>>> doesn't
> >>>>>>>>>>>>> work
> >>>>>>>>>>>> with mx.optimizer.LRScheduler if a worker has more than 1
> >> GPU.
> >>> To
> >>>>> be
> >>>>>>>>>>>> more specific, the trainer updates once per GPU, the
> >>> LRScheduler
> >>>>>>>>>>>> object is shared across GPUs and get a wrong update count.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This needs to be fixed. [6]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> * Changes
> >>>>>>>>>>>>
> >>>>>>>>>>>> The following changes will be ported to the release branch,
> >> per
> >>>>> [2]:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Infer dtype in SymbolBlock import from input symbol [3]
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12412
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-953] Fix oob memory read
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12631
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-969] Fix buffer overflow in RNNOp
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12603
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-922] Fix memleak in profiler
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12499
> >>>>>>>>>>>>
> >>>>>>>>>>>> Implement mkldnn convolution fusion and quantization (MXNet
> >>> Graph
> >>>>>>>>>>>> Optimization and Quantization based on subgraph and MKL-DNN
> >>>>>>>>>> proposal
> >>>>>>>>>>>> [4])
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
> >>>>>>>>>>>>
> >>>>>>>>>>>> Following items (test cases) should be already part of 1.3.0:
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-486] Create CPP test for concat MKLDNN operator
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11371
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-489] MKLDNN Pool test
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11608
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-484] MKLDNN C++ test for LRN operator
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11831
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-546] Add unit test for MKLDNNSum
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11272
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-498] Test MKLDNN backward operators
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11232
> >>>>>>>>>>>>
> >>>>>>>>>>>> [MXNET-500] Test cases improvement for MKLDNN on Gluon
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/10921
> >>>>>>>>>>>>
> >>>>>>>>>>>> Set correct update on kvstore flag in dist_device_sync mode
> >> (as
> >>>>> part
> >>>>>>>>>>>> of fixing [5])
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12786
> >>>>>>>>>>>>
> >>>>>>>>>>>> upgrade mshadow version
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12692
> >>>>>>>>>>>> But another PR will be used instead:
> >>>>>>>>>>>> update mshadow
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12674
> >>>>>>>>>>>>
> >>>>>>>>>>>> CudnnFind() usage improvements
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12804
> >>>>>>>>>>>> A critical CUDNN fix that reduces GPU memory consumption and
> >>>>>>>>>>>> addresses this memory leak issue. This is an important fix to
> >>>>>>>>> include
> >>>>>>>>>>>> in 1.3.1
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> From discussion about gluon toolkits:
> >>>>>>>>>>>>
> >>>>>>>>>>>> disable opencv threading for forked process
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12025
> >>>>>>>>>>>>
> >>>>>>>>>>>> Fix lazy record io when used with dataloader and multi_worker > 0
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12554
> >>>>>>>>>>>>
> >>>>>>>>>>>> fix potential floating number overflow, enable float16
> >>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12118
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> * Resolved issues
> >>>>>>>>>>>>
> >>>>>>>>>>>> MxNet 1.2.1–module get_outputs()
> >>>>>>>>>>>>
> >> https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882
> >>>>>>>>>>>>
> >>>>>>>>>>>> As far as I can see from the comments the issue has been
> >>>> resolved,
> >>>>>>>>> no
> >>>>>>>>>>>> actions need to be taken for this release. [7] is mentioned
> >> in
> >>>> this
> >>>>>>>>>>>> regards, but I don't see any action points here either.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> I will start with help of Naveen port the mentioned PR's to
> >> the
> >>>>>>>>> 1.3.x
> >>>>>>>>>>>> branch.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best regards,
> >>>>>>>>>>>> Anton
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/eZGzBQ
> >>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
> >>>>>>>>>>>> [3] https://github.com/apache/incubator-mxnet/issues/11849
> >>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
> >>>>>>>>>>>> [5] https://github.com/apache/incubator-mxnet/issues/12713
> >>>>>>>>>>>> [6] https://github.com/apache/incubator-mxnet/issues/12713#issuecomment-435773777
> >>>>>>>>>>>> [7] https://github.com/apache/incubator-mxnet/pull/11005
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>>
> >>
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Sheng Zha <sz...@gmail.com>.

Similar to the two PRs that Haibin suggested, PR 12992 introduces a new interface for controlling determinism, which is better suited to a minor release.

I think that, other than the lack of a release manager to drive the 1.4.0 release, there’s no reason we cannot do two releases (1.4.0 & 1.3.1) at the same time. I’m willing to help with the 1.4.0 release to make these new features available one month sooner, if there are no other concerns.

-sz

> On Nov 6, 2018, at 3:30 PM, Lin Yuan <ap...@gmail.com> wrote:
> 
> Hi Anton,
> 
> Thanks for helping the release.
> The following PRs are needed by customers who want to use deterministic
> CUDNN convolution algorithms:
> 
> https://github.com/apache/incubator-mxnet/pull/12992
> https://github.com/apache/incubator-mxnet/pull/13049
> 
> Thanks！
> 
> Lin
> 
> 
> On Tue, Nov 6, 2018 at 1:51 PM Aaron Markham <aa...@gmail.com>
> wrote:
> 
>> Hi Anton,
>> I have the following suggestions for fixes to include in 1.3.1. These each
>> have updates to files that will impact docs generation for the 1.3.x
>> version of the website's Python API docs:
>> 
>> https://github.com/apache/incubator-mxnet/pull/12879
>> https://github.com/apache/incubator-mxnet/pull/12871
>> https://github.com/apache/incubator-mxnet/pull/12856
>> 
>> Thanks,
>> Aaron
>> 
>>> On Tue, Nov 6, 2018 at 1:29 PM Lai Wei <ro...@gmail.com> wrote:
>>> 
>>> Hi Anton,
>>> 
>>> Thanks for driving this, I would like to include the following fix in
>>> 1.3.1:
>>> Allow infer shape partial on foreach operator:
>>> https://github.com/apache/incubator-mxnet/pull/12471
>>> 
>>> Keras-MXNet needs this functionality to infer shape partially
>>> on foreach operator. (Used in RNN operators)
>>> 
>>> Thanks a lot!
>>> 
>>> 
>>> Best Regards
>>> Lai Wei
>>> 
>>> 
>>> 
>>> On Tue, Nov 6, 2018 at 10:44 AM Haibin Lin <ha...@gmail.com>
>>> wrote:
>>> 
>>>> Hi Naveen and Anton,
>>>> 
>>>> Thanks for pointing that out. You are right that these are not critical
>>>> fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.
>>>> 
>>>> Best,
>>>> Haibin
>>>> 
>>>> On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy <mn...@gmail.com>
>> wrote:
>>>> 
>>>>> Please note that this is a patch release(1.3.1) to address critical
>>>> bugs!,
>>>>> For everything else please wait for 1.4.0 which is planned very
>> shortly
>>>>> after 1.3.1
>>>>> 
>>>>>> On Nov 6, 2018, at 7:17 AM, Anton Chernov <me...@gmail.com>
>>> wrote:
>>>>>> 
>>>>>> The following PR's have been created so far:
>>>>>> 
>>>>>> Infer dtype in SymbolBlock import from input symbol (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13117
>>>>>> 
>>>>>> [MXNET-953] Fix oob memory read (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13118
>>>>>> 
>>>>>> [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13119
>>>>>> 
>>>>>> [MXNET-922] Fix memleak in profiler (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13120
>>>>>> 
>>>>>> Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13121
>>>>>> 
>>>>>> update mshadow (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13122
>>>>>> 
>>>>>> CudnnFind() usage improvements (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13123
>>>>>> 
>>>>>> Fix lazy record io when used with dataloader and multi_worker > 0 (v1.3.x)
>>>>>> https://github.com/apache/incubator-mxnet/pull/13124
>>>>>> 
>>>>>> 
>>>>>> As stated previously, I would be rather opposed to having the following PRs
>>>>>> in the patch release:
>>>>>> 
>>>>>> Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
>>>>>> https://github.com/apache/incubator-mxnet/pull/13129
>>>>>> 
>>>>>> sample_like operators (#13034) v1.3.x
>>>>>> https://github.com/apache/incubator-mxnet/pull/13130
>>>>>> 
>>>>>> 
>>>>>> Best
>>>>>> Anton
>>>>>> 
>>>>>> On Tue, Nov 6, 2018 at 16:06, Anton Chernov <me...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Haibin,
>>>>>>> 
>>>>>>> I have a few comments regarding the proposed performance improvement
>>>>>>> changes.
>>>>>>> 
>>>>>>> CUDNN support for LSTM with projection & clipping
>>>>>>> https://github.com/apache/incubator-mxnet/pull/13056
>>>>>>> 
>>>>>>> There is no doubt that this change brings value, but I don't see it as a
>>>>>>> critical bug fix. I would rather leave it for the next major release.
>>>>>>> 
>>>>>>> sample_like operators
>>>>>>> https://github.com/apache/incubator-mxnet/pull/13034
>>>>>>> 
>>>>>>> Even if it's related to performance, this is an addition of functionality,
>>>>>>> and I would also push this to be in the next major release only.
>>>>>>> 
>>>>>>> 
>>>>>>> Best
>>>>>>> Anton
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Nov 6, 2018 at 15:55, Anton Chernov <me...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Patric,
>>>>>>>> 
>>>>>>>> This change was listed in the 'PR candidates suggested for consideration
>>>>>>>> for v1.3.1 patch release' section [1].
>>>>>>>> 
>>>>>>>> You are right, I also think that this is not a critical hotfix change
>>>>>>>> that should be included in the 1.3.1 patch release.
>>>>>>>> 
>>>>>>>> Thus I'm not making any further efforts to bring it in.
>>>>>>>> 
>>>>>>>> Best
>>>>>>>> Anton
>>>>>>>> 
>>>>>>>> [1] https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Nov 6, 2018 at 1:14, Zhao, Patric <patric.zhao@intel.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi Anton,
>>>>>>>>> 
>>>>>>>>> Thanks for looking into the MKL-DNN PR.
>>>>>>>>> 
>>>>>>>>> As I understand the cwiki page
>>>>>>>>> (https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release),
>>>>>>>>> these features will go into 1.4 rather than the 1.3.1 patch release.
>>>>>>>>> 
>>>>>>>>> Feel free to correct me :)
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> --Patric
>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Anton Chernov [mailto:mechernov@gmail.com]
>>>>>>>>>> Sent: Tuesday, November 6, 2018 3:11 AM
>>>>>>>>>> To: dev@mxnet.apache.org
>>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release
>>>>>>>>>> 
>>>>>>>>>> It seems that there is a problem porting the following changes to the
>>>>>>>>>> v1.3.x release branch:
>>>>>>>>>> 
>>>>>>>>>> Implement mkldnn convolution fusion and quantization
>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
>>>>>>>>>> 
>>>>>>>>>> MKL-DNN Quantization Examples and README
>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12808
>>>>>>>>>> 
>>>>>>>>>> The bases are different.
>>>>>>>>>> 
>>>>>>>>>> I would need help from the authors of these changes to make a backport PR.
>>>>>>>>>> 
>>>>>>>>>> @ZhennanQin, @xinyu-intel would you be able to assist me and create the
>>>>>>>>>> corresponding PRs?
>>>>>>>>>> 
>>>>>>>>>> Without proper history and domain knowledge, I would not be able to create
>>>>>>>>>> them on my own in a reasonable amount of time, I'm afraid.
>>>>>>>>>> 
>>>>>>>>>> Best regards,
>>>>>>>>>> Anton
>>>>>>>>>> 
>>>>>>>>>> On Mon, Nov 5, 2018 at 19:45, Anton Chernov <mechernov@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> As part of:
>>>>>>>>>>> 
>>>>>>>>>>> Implement mkldnn convolution fusion and quantization
>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
>>>>>>>>>>> 
>>>>>>>>>>> I propose to add the examples and documentation PR as well:
>>>>>>>>>>> 
>>>>>>>>>>> MKL-DNN Quantization Examples and README
>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12808
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Anton
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Nov 5, 2018 at 19:02, Anton Chernov <mechernov@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Dear MXNet community,
>>>>>>>>>>>> 
>>>>>>>>>>>> I will be the release manager for the upcoming 1.3.1 patch release.
>>>>>>>>>>>> Naveen will be co-managing the release and providing help from the
>>>>>>>>>>>> committers side.
>>>>>>>>>>>> 
>>>>>>>>>>>> The following dates have been set:
>>>>>>>>>>>> 
>>>>>>>>>>>> Code Freeze: 31st October 2018
>>>>>>>>>>>> Release published: 13th November 2018
>>>>>>>>>>>> 
>>>>>>>>>>>> Release notes have been drafted here [1].
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> * Known issues
>>>>>>>>>>>> 
>>>>>>>>>>>> Update MKL-DNN dependency
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12953
>>>>>>>>>>>> 
>>>>>>>>>>>> This PR hasn't been merged even to master yet. Requires additional
>>>>>>>>>>>> discussion and merge.
>>>>>>>>>>>> 
>>>>>>>>>>>> distributed kvstore bug in MXNet
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/issues/12713
>>>>>>>>>>>> 
>>>>>>>>>>>>> When distributed kvstore is used, by default gluon.Trainer doesn't
>>>>>>>>>>>>> work with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To
>>>>>>>>>>>>> be more specific, the trainer updates once per GPU, the LRScheduler
>>>>>>>>>>>>> object is shared across GPUs and gets a wrong update count.
>>>>>>>>>>>> 
>>>>>>>>>>>> This needs to be fixed. [6]
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> * Changes
>>>>>>>>>>>> 
>>>>>>>>>>>> The following changes will be ported to the release branch, per [2]:
>>>>>>>>>>>> 
>>>>>>>>>>>> Infer dtype in SymbolBlock import from input symbol [3]
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12412
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-953] Fix oob memory read
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12631
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-969] Fix buffer overflow in RNNOp
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12603
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-922] Fix memleak in profiler
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12499
>>>>>>>>>>>> 
>>>>>>>>>>>> Implement mkldnn convolution fusion and quantization (MXNet Graph
>>>>>>>>>>>> Optimization and Quantization based on subgraph and MKL-DNN proposal [4])
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
>>>>>>>>>>>> 
>>>>>>>>>>>> Following items (test cases) should be already part of 1.3.0:
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-486] Create CPP test for concat MKLDNN operator
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11371
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-489] MKLDNN Pool test
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11608
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-484] MKLDNN C++ test for LRN operator
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11831
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-546] Add unit test for MKLDNNSum
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11272
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-498] Test MKLDNN backward operators
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/11232
>>>>>>>>>>>> 
>>>>>>>>>>>> [MXNET-500] Test cases improvement for MKLDNN on Gluon
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/10921
>>>>>>>>>>>> 
>>>>>>>>>>>> Set correct update on kvstore flag in dist_device_sync mode (as part
>>>>>>>>>>>> of fixing [5])
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12786
>>>>>>>>>>>> 
>>>>>>>>>>>> upgrade mshadow version
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12692
>>>>>>>>>>>> But another PR will be used instead:
>>>>>>>>>>>> update mshadow
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12674
>>>>>>>>>>>> 
>>>>>>>>>>>> CudnnFind() usage improvements
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12804
>>>>>>>>>>>> A critical CUDNN fix that reduces GPU memory consumption and
>>>>>>>>>>>> addresses this memory leak issue. This is an important fix to include
>>>>>>>>>>>> in 1.3.1
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> From discussion about gluon toolkits:
>>>>>>>>>>>> 
>>>>>>>>>>>> disable opencv threading for forked process
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12025
>>>>>>>>>>>> 
>>>>>>>>>>>> Fix lazy record io when used with dataloader and multi_worker > 0
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12554
>>>>>>>>>>>> 
>>>>>>>>>>>> fix potential floating number overflow, enable float16
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/12118
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> * Resolved issues
>>>>>>>>>>>> 
>>>>>>>>>>>> MxNet 1.2.1–module get_outputs()
>>>>>>>>>>>> https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882
>>>>>>>>>>>> 
>>>>>>>>>>>> As far as I can see from the comments, the issue has been resolved and
>>>>>>>>>>>> no actions need to be taken for this release. [7] is mentioned in this
>>>>>>>>>>>> regard, but I don't see any action points here either.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> With Naveen's help, I will start porting the mentioned PRs to the 1.3.x
>>>>>>>>>>>> branch.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Anton
>>>>>>>>>>>> 
>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/eZGzBQ
>>>>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
>>>>>>>>>>>> [3] https://github.com/apache/incubator-mxnet/issues/11849
>>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
>>>>>>>>>>>> [5] https://github.com/apache/incubator-mxnet/issues/12713
>>>>>>>>>>>> [6] https://github.com/apache/incubator-mxnet/issues/12713#issuecomment-435773777
>>>>>>>>>>>> [7] https://github.com/apache/incubator-mxnet/pull/11005
>>>>>>>>>>>> 
>>>>>>>>>>>> 

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Lin Yuan <ap...@gmail.com>.

Hi Anton,

Thanks for helping with the release.
The following PRs are needed by customers who want to use deterministic
CUDNN convolution algorithms:

https://github.com/apache/incubator-mxnet/pull/12992
https://github.com/apache/incubator-mxnet/pull/13049

Thanks!

Lin


On Tue, Nov 6, 2018 at 1:51 PM Aaron Markham <aa...@gmail.com> wrote:

> Hi Anton,
> I have the following suggestions for fixes to include in 1.3.1. These each
> have updates to files that will impact docs generation for the 1.3.x
> version of the website's Python API docs:
>
> https://github.com/apache/incubator-mxnet/pull/12879
> https://github.com/apache/incubator-mxnet/pull/12871
> https://github.com/apache/incubator-mxnet/pull/12856
>
> Thanks,
> Aaron

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Aaron Markham <aa...@gmail.com>.

Hi Anton,
I have the following suggestions for fixes to include in 1.3.1. These each
have updates to files that will impact docs generation for the 1.3.x
version of the website's Python API docs:

https://github.com/apache/incubator-mxnet/pull/12879
https://github.com/apache/incubator-mxnet/pull/12871
https://github.com/apache/incubator-mxnet/pull/12856

Thanks,
Aaron

On Tue, Nov 6, 2018 at 1:29 PM Lai Wei <ro...@gmail.com> wrote:

> Hi Anton,
>
> Thanks for driving this, I would like to include the following fix in
> 1.3.1:
> Allow infer shape partial on foreach operator:
> https://github.com/apache/incubator-mxnet/pull/12471
>
> Keras-MXNet needs this functionality to infer shape partially
> on foreach operator. (Used in RNN operators)
>
> Thanks a lot!
>
>
> Best Regards
> Lai Wei
>
>
>
> On Tue, Nov 6, 2018 at 10:44 AM Haibin Lin <ha...@gmail.com> wrote:
>
> > Hi Naveen and Anton,
> >
> > Thanks for pointing that out. You are right that these are not critical
> > fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.
> >
> > Best,
> > Haibin
> >
> > On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy <mn...@gmail.com> wrote:
> >
> > > Please note that this is a patch release (1.3.1) to address critical bugs!
> > > For everything else, please wait for 1.4.0, which is planned very shortly
> > > after 1.3.1.
> > >
> > > > On Nov 6, 2018, at 7:17 AM, Anton Chernov <me...@gmail.com> wrote:
> > > >
> > > > The following PRs have been created so far:
> > > >
> > > > Infer dtype in SymbolBlock import from input symbol (v1.3.x)
> > > > https://github.com/apache/incubator-mxnet/pull/13117
> > > >
> > > > [MXNET-953] Fix oob memory read (v1.3.x)
> > > > https://github.com/apache/incubator-mxnet/pull/13118
> > > >
> > > > [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
> > > > https://github.com/apache/incubator-mxnet/pull/13119
> > > >
> > > > [MXNET-922] Fix memleak in profiler (v1.3.x)
> > > > https://github.com/apache/incubator-mxnet/pull/13120
> > > >
> > > > Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
> > > > https://github.com/apache/incubator-mxnet/pull/13121
> > > >
> > > > update mshadow (v1.3.x)
> > > > https://github.com/apache/incubator-mxnet/pull/13122
> > > >
> > > > CudnnFind() usage improvements (v1.3.x)
> > > > https://github.com/apache/incubator-mxnet/pull/13123
> > > >
> > > > Fix lazy record io when used with dataloader and multi_worker > 0 (v1.3.x)
> > > > https://github.com/apache/incubator-mxnet/pull/13124
> > > >
> > > >
> > > > As stated previously, I would be rather opposed to having the following PRs
> > > > in the patch release:
> > > >
> > > > Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
> > > > https://github.com/apache/incubator-mxnet/pull/13129
> > > >
> > > > sample_like operators (#13034) v1.3.x
> > > > https://github.com/apache/incubator-mxnet/pull/13130
> > > >
> > > >
> > > > Best
> > > > Anton
> > > >
> > > > On Tue, Nov 6, 2018 at 16:06, Anton Chernov <me...@gmail.com> wrote:
> > > >
> > > >> Hi Haibin,
> > > >>
> > > >> I have a few comments regarding the proposed performance improvement
> > > >> changes.
> > > >>
> > > >> CUDNN support for LSTM with projection & clipping
> > > >> https://github.com/apache/incubator-mxnet/pull/13056
> > > >>
> > > >> There is no doubt that this change brings value, but I don't see it as a
> > > >> critical bug fix. I would rather leave it for the next major release.
> > > >>
> > > >> sample_like operators
> > > >> https://github.com/apache/incubator-mxnet/pull/13034
> > > >>
> > > >> Even if it's related to performance, this is an addition of functionality,
> > > >> and I would also push this to be in the next major release only.
> > > >>
> > > >>
> > > >> Best
> > > >> Anton
> > > >>
> > > >>
> > > >> On Tue, Nov 6, 2018 at 15:55, Anton Chernov <me...@gmail.com> wrote:
> > > >>
> > > >>> Hi Patric,
> > > >>>
> > > >>> This change was listed in the 'PR candidates suggested for consideration
> > > >>> for v1.3.1 patch release' section [1].
> > > >>>
> > > >>> You are right, I also think that this is not a critical hotfix change
> > > >>> that should be included in the 1.3.1 patch release.
> > > >>>
> > > >>> Thus I'm not making any further efforts to bring it in.
> > > >>>
> > > >>> Best
> > > >>> Anton
> > > >>>
> > > >>> [1] https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
> > > >>>
> > > >>>
> > > >>> On Tue, Nov 6, 2018 at 1:14, Zhao, Patric <pa...@intel.com> wrote:
> > > >>>
> > > >>>> Hi Anton,
> > > >>>>
> > > >>>> Thanks for looking into the MKL-DNN PR.
> > > >>>>
> > > >>>> As I understand the cwiki page
> > > >>>> (https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release),
> > > >>>> these features will go into 1.4 rather than the 1.3.1 patch release.
> > > >>>>
> > > >>>> Feel free to correct me :)
> > > >>>>
> > > >>>> Thanks,
> > > >>>>
> > > >>>> --Patric
> > > >>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: Anton Chernov [mailto:mechernov@gmail.com]
> > > >>>>> Sent: Tuesday, November 6, 2018 3:11 AM
> > > >>>>> To: dev@mxnet.apache.org
> > > >>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release
> > > >>>>>
> > > >>>>> It seems that there is a problem porting the following changes to the
> > > >>>>> v1.3.x release branch:
> > > >>>>>
> > > >>>>> Implement mkldnn convolution fusion and quantization
> > > >>>>> https://github.com/apache/incubator-mxnet/pull/12530
> > > >>>>>
> > > >>>>> MKL-DNN Quantization Examples and README
> > > >>>>> https://github.com/apache/incubator-mxnet/pull/12808
> > > >>>>>
> > > >>>>> The bases are different.
> > > >>>>>
> > > >>>>> I would need help from the authors of these changes to make a backport PR.
> > > >>>>>
> > > >>>>> @ZhennanQin, @xinyu-intel would you be able to assist me and create the
> > > >>>>> corresponding PRs?
> > > >>>>>
> > > >>>>> Without proper history and domain knowledge, I would not be able to create
> > > >>>>> them on my own in a reasonable amount of time, I'm afraid.
> > > >>>>>
> > > >>>>> Best regards,
> > > >>>>> Anton
> > > >>>>>
> > > >>>>> On Mon, Nov 5, 2018 at 19:45, Anton Chernov <mechernov@gmail.com> wrote:
> > > >>>>>
> > > >>>>>>
> > > >>>>>> As part of:
> > > >>>>>>
> > > >>>>>> Implement mkldnn convolution fusion and quantization
> > > >>>>>> https://github.com/apache/incubator-mxnet/pull/12530
> > > >>>>>>
> > > >>>>>> I propose to add the examples and documentation PR as well:
> > > >>>>>>
> > > >>>>>> MKL-DNN Quantization Examples and README
> > > >>>>>> https://github.com/apache/incubator-mxnet/pull/12808
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Best regards,
> > > >>>>>> Anton
> > > >>>>>>
> > > >>>>>> пн, 5 нояб. 2018 г. в 19:02, Anton Chernov <mechernov@gmail.com
> >:
> > > >>>>>>
> > > >>>>>>> Dear MXNet community,
> > > >>>>>>>
> > > >>>>>>> I will be the release manager for the upcoming 1.3.1 patch
> > release.
> > > >>>>>>> Naveen will be co-managing the release and providing help from
> > the
> > > >>>>>>> committers side.
> > > >>>>>>>
> > > >>>>>>> The following dates have been set:
> > > >>>>>>>
> > > >>>>>>> Code Freeze: 31st October 2018
> > > >>>>>>> Release published: 13th November 2018
> > > >>>>>>>
> > > >>>>>>> Release notes have been drafted here [1].
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> * Known issues
> > > >>>>>>>
> > > >>>>>>> Update MKL-DNN dependency
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12953
> > > >>>>>>>
> > > >>>>>>> This PR hasn't been merged even to master yet. Requires
> > additional
> > > >>>>>>> discussion and merge.
> > > >>>>>>>
> > > >>>>>>> distributed kvstore bug in MXNet
> > > >>>>>>> https://github.com/apache/incubator-mxnet/issues/12713
> > > >>>>>>>
> > > >>>>>>>> When distributed kvstore is used, by default gluon.Trainer
> > doesn't
> > > >>>>>>>> work
> > > >>>>>>> with mx.optimizer.LRScheduler if a worker has more than 1 GPU.
> To
> > > be
> > > >>>>>>> more specific, the trainer updates once per GPU, the
> LRScheduler
> > > >>>>>>> object is shared across GPUs and get a wrong update count.
> > > >>>>>>>
> > > >>>>>>> This needs to be fixed. [6]
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> * Changes
> > > >>>>>>>
> > > >>>>>>> The following changes will be ported to the release branch, per
> > > [2]:
> > > >>>>>>>
> > > >>>>>>> Infer dtype in SymbolBlock import from input symbol [3]
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12412
> > > >>>>>>>
> > > >>>>>>> [MXNET-953] Fix oob memory read
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12631
> > > >>>>>>>
> > > >>>>>>> [MXNET-969] Fix buffer overflow in RNNOp
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12603
> > > >>>>>>>
> > > >>>>>>> [MXNET-922] Fix memleak in profiler
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12499
> > > >>>>>>>
> > > >>>>>>> Implement mkldnn convolution fusion and quantization (MXNet
> Graph
> > > >>>>>>> Optimization and Quantization based on subgraph and MKL-DNN
> > > >>>>> proposal
> > > >>>>>>> [4])
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
> > > >>>>>>>
> > > >>>>>>> Following items (test cases) should be already part of 1.3.0:
> > > >>>>>>>
> > > >>>>>>> [MXNET-486] Create CPP test for concat MKLDNN operator
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/11371
> > > >>>>>>>
> > > >>>>>>> [MXNET-489] MKLDNN Pool test
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/11608
> > > >>>>>>>
> > > >>>>>>> [MXNET-484] MKLDNN C++ test for LRN operator
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/11831
> > > >>>>>>>
> > > >>>>>>> [MXNET-546] Add unit test for MKLDNNSum
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/11272
> > > >>>>>>>
> > > >>>>>>> [MXNET-498] Test MKLDNN backward operators
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/11232
> > > >>>>>>>
> > > >>>>>>> [MXNET-500] Test cases improvement for MKLDNN on Gluon
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/10921
> > > >>>>>>>
> > > >>>>>>> Set correct update on kvstore flag in dist_device_sync mode (as
> > > part
> > > >>>>>>> of fixing [5])
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12786
> > > >>>>>>>
> > > >>>>>>> upgrade mshadow version
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12692
> > > >>>>>>> But another PR will be used instead:
> > > >>>>>>> update mshadow
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12674
> > > >>>>>>>
> > > >>>>>>> CudnnFind() usage improvements
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12804
> > > >>>>>>> A critical CUDNN fix that reduces GPU memory consumption and
> > > >>>>>>> addresses this memory leak issue. This is an important fix to
> > > >>>> include
> > > >>>>>>> in 1.3.1
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> From discussion about gluon toolkits:
> > > >>>>>>>
> > > >>>>>>> disable opencv threading for forked process
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12025
> > > >>>>>>>
> > > >>>>>>> Fix lazy record io when used with dataloader and multi_worker
> > 0
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12554
> > > >>>>>>>
> > > >>>>>>> fix potential floating number overflow, enable float16
> > > >>>>>>> https://github.com/apache/incubator-mxnet/pull/12118
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> * Resolved issues
> > > >>>>>>>
> > > >>>>>>> MxNet 1.2.1–module get_outputs()
> > > >>>>>>> https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882
> > > >>>>>>>
> > > >>>>>>> As far as I can see from the comments the issue has been
> > resolved,
> > > >>>> no
> > > >>>>>>> actions need to be taken for this release. [7] is mentioned in
> > this
> > > >>>>>>> regards, but I don't see any action points here either.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> I will start with help of Naveen port the mentioned PR's to the
> > > >>>> 1.3.x
> > > >>>>>>> branch.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Best regards,
> > > >>>>>>> Anton
> > > >>>>>>>
> > > >>>>>>> [1] https://cwiki.apache.org/confluence/x/eZGzBQ
> > > >>>>>>> [2]
> > > >>>>>>>
> > > >>>>
> > https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+f
> > > >>>>>>> or+next+MXNet+Release [3]
> > > >>>>>>> https://github.com/apache/incubator-mxnet/issues/11849
> > > >>>>>>> [4]
> > > >>>>>>>
> > > >>>>>
> > > https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimiz
> > > >>>>>>> ation+and+Quantization+based+on+subgraph+and+MKL-DNN
> > > >>>>>>> [5] https://github.com/apache/incubator-mxnet/issues/12713
> > > >>>>>>> [6]
> > > >>>>>>> https://github.com/apache/incubator-
> > > >>>>> mxnet/issues/12713#issuecomment-4
> > > >>>>>>> 35773777 [7]
> > https://github.com/apache/incubator-mxnet/pull/11005
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>>
> > >
> >
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Lai Wei <ro...@gmail.com>.

Hi Anton,

Thanks for driving this. I would like to include the following fix in
1.3.1:
Allow infer shape partial on foreach operator:
https://github.com/apache/incubator-mxnet/pull/12471

Keras-MXNet needs this functionality to infer shapes partially
on the foreach operator (used in RNN operators).

Thanks a lot!


Best Regards
Lai Wei



On Tue, Nov 6, 2018 at 10:44 AM Haibin Lin <ha...@gmail.com> wrote:

> Hi Naveen and Anton,
>
> Thanks for pointing that out. You are right that these are not critical
> fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.
>
> Best,
> Haibin
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Haibin Lin <ha...@gmail.com>.

Hi Naveen and Anton,

Thanks for pointing that out. You are right that these are not critical
fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.

Best,
Haibin

On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy <mn...@gmail.com> wrote:

> Please note that this is a patch release(1.3.1) to address critical bugs!,
> For everything else please wait for 1.4.0 which is planned very shortly
> after 1.3.1
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Naveen Swamy <mn...@gmail.com>.

Please note that this is a patch release (1.3.1) to address critical bugs! For everything else, please wait for 1.4.0, which is planned very shortly after 1.3.1.

> On Nov 6, 2018, at 7:17 AM, Anton Chernov <me...@gmail.com> wrote:
> 
> The following PRs have been created so far:
> 
> Infer dtype in SymbolBlock import from input symbol (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13117
> 
> [MXNET-953] Fix oob memory read (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13118
> 
> [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13119
> 
> [MXNET-922] Fix memleak in profiler (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13120
> 
> Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13121
> 
> update mshadow (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13122
> 
> CudnnFind() usage improvements (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13123
> 
> Fix lazy record io when used with dataloader and multi_worker > 0 (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13124
> 
> 
> As stated previously, I would be rather opposed to having the following PRs
> in the patch release:
> 
> Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
> https://github.com/apache/incubator-mxnet/pull/13129
> 
> sample_like operators (#13034) v1.3.x
> https://github.com/apache/incubator-mxnet/pull/13130
> 
> 
> Best
> Anton
> 
> On Tue, 6 Nov 2018 at 16:06, Anton Chernov <me...@gmail.com> wrote:
> 
>> Hi Haibin,
>> 
>> I have a few comments regarding the proposed performance improvement
>> changes.
>> 
>> CUDNN support for LSTM with projection & clipping
>> https://github.com/apache/incubator-mxnet/pull/13056
>> 
>> There is no doubt that this change brings value, but I don't see it as a
>> critical bug fix. I would rather leave it for the next major release.
>> 
>> sample_like operators
>> https://github.com/apache/incubator-mxnet/pull/13034
>> 
>> Even if it's related to performance, this is an addition of functionality
>> and I would also push this to be in the next major release only.
>> 
>> 
>> Best
>> Anton
>> 
>> 
>> On Tue, 6 Nov 2018 at 15:55, Anton Chernov <me...@gmail.com> wrote:
>> 
>>> Hi Patric,
>>> 
>>> This change was listed in the 'PR candidates suggested for consideration
>>> for v1.3.1 patch release' section [1].
>>> 
>>> You are right, I also think that this is not a critical hotfix change
>>> that should be included into the 1.3.1 patch release.
>>> 
>>> Thus I'm not making any further efforts to bring it in.
>>> 
>>> Best
>>> Anton
>>> 
>>> [1]
>>> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
>>> 
>>> 
>>> вт, 6 нояб. 2018 г. в 1:14, Zhao, Patric <pa...@intel.com>:
>>> 
>>>> Hi Anton,
>>>> 
>>>> Thanks for looking into the MKL-DNN PR.
>>>> 
>>>> As my understanding of cwiki (
>>>> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
>>>> ),
>>>> these features will go into 1.4 rather than patch release of 1.3.1.
>>>> 
>>>> Feel free to correct me :)
>>>> 
>>>> Thanks,
>>>> 
>>>> --Patric
>>>> 
>>>>> -----Original Message-----
>>>>> From: Anton Chernov [mailto:mechernov@gmail.com]
>>>>> Sent: Tuesday, November 6, 2018 3:11 AM
>>>>> To: dev@mxnet.apache.org
>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch
>>>>> release
>>>>> 
>>>>> It seems that there is a problem porting following changes to the
>>>> v1.3.x
>>>>> release branch:
>>>>> 
>>>>> Implement mkldnn convolution fusion and quantization
>>>>> https://github.com/apache/incubator-mxnet/pull/12530
>>>>> 
>>>>> MKL-DNN Quantization Examples and README
>>>>> https://github.com/apache/incubator-mxnet/pull/12808
>>>>> 
>>>>> The bases are different.
>>>>> 
>>>>> I would need help from authors of these changes to make a backport PR.
>>>>> 
>>>>> @ZhennanQin, @xinyu-intel would you be able to assist me and create the
>>>>> corresponding PR's?
>>>>> 
>>>>> Without proper history and domain knowledge I would not be able to
>>>> create
>>>>> them by my own in reasonable amount of time, I'm afraid.
>>>>> 
>>>>> Best regards,
>>>>> Anton
>>>>> 
>>>>> пн, 5 нояб. 2018 г. в 19:45, Anton Chernov <me...@gmail.com>:
>>>>> 
>>>>>> 
>>>>>> As part of:
>>>>>> 
>>>>>> Implement mkldnn convolution fusion and quantization
>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
>>>>>> 
>>>>>> I propose to add the examples and documentation PR as well:
>>>>>> 
>>>>>> MKL-DNN Quantization Examples and README
>>>>>> https://github.com/apache/incubator-mxnet/pull/12808
>>>>>> 
>>>>>> 
>>>>>> Best regards,
>>>>>> Anton
>>>>>> 
>>>>>> пн, 5 нояб. 2018 г. в 19:02, Anton Chernov <me...@gmail.com>:
>>>>>> 
>>>>>>> Dear MXNet community,
>>>>>>> 
>>>>>>> I will be the release manager for the upcoming 1.3.1 patch release.
>>>>>>> Naveen will be co-managing the release and providing help from the
>>>>>>> committers side.
>>>>>>> 
>>>>>>> The following dates have been set:
>>>>>>> 
>>>>>>> Code Freeze: 31st October 2018
>>>>>>> Release published: 13th November 2018
>>>>>>> 
>>>>>>> Release notes have been drafted here [1].
>>>>>>> 
>>>>>>> 
>>>>>>> * Known issues
>>>>>>> 
>>>>>>> Update MKL-DNN dependency
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12953
>>>>>>> 
>>>>>>> This PR hasn't been merged even to master yet. Requires additional
>>>>>>> discussion and merge.
>>>>>>> 
>>>>>>> distributed kvstore bug in MXNet
>>>>>>> https://github.com/apache/incubator-mxnet/issues/12713
>>>>>>> 
>>>>>>>> When distributed kvstore is used, by default gluon.Trainer doesn't
>>>>>>>> work
>>>>>>> with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be
>>>>>>> more specific, the trainer updates once per GPU, the LRScheduler
>>>>>>> object is shared across GPUs and get a wrong update count.
>>>>>>> 
>>>>>>> This needs to be fixed. [6]
>>>>>>> 
>>>>>>> 
>>>>>>> * Changes
>>>>>>> 
>>>>>>> The following changes will be ported to the release branch, per [2]:
>>>>>>> 
>>>>>>> Infer dtype in SymbolBlock import from input symbol [3]
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12412
>>>>>>> 
>>>>>>> [MXNET-953] Fix oob memory read
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12631
>>>>>>> 
>>>>>>> [MXNET-969] Fix buffer overflow in RNNOp
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12603
>>>>>>> 
>>>>>>> [MXNET-922] Fix memleak in profiler
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12499
>>>>>>> 
>>>>>>> Implement mkldnn convolution fusion and quantization (MXNet Graph
>>>>>>> Optimization and Quantization based on subgraph and MKL-DNN
>>>>> proposal
>>>>>>> [4])
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12530
>>>>>>> 
>>>>>>> Following items (test cases) should be already part of 1.3.0:
>>>>>>> 
>>>>>>> [MXNET-486] Create CPP test for concat MKLDNN operator
>>>>>>> https://github.com/apache/incubator-mxnet/pull/11371
>>>>>>> 
>>>>>>> [MXNET-489] MKLDNN Pool test
>>>>>>> https://github.com/apache/incubator-mxnet/pull/11608
>>>>>>> 
>>>>>>> [MXNET-484] MKLDNN C++ test for LRN operator
>>>>>>> https://github.com/apache/incubator-mxnet/pull/11831
>>>>>>> 
>>>>>>> [MXNET-546] Add unit test for MKLDNNSum
>>>>>>> https://github.com/apache/incubator-mxnet/pull/11272
>>>>>>> 
>>>>>>> [MXNET-498] Test MKLDNN backward operators
>>>>>>> https://github.com/apache/incubator-mxnet/pull/11232
>>>>>>> 
>>>>>>> [MXNET-500] Test cases improvement for MKLDNN on Gluon
>>>>>>> https://github.com/apache/incubator-mxnet/pull/10921
>>>>>>> 
>>>>>>> Set correct update on kvstore flag in dist_device_sync mode (as part
>>>>>>> of fixing [5])
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12786
>>>>>>> 
>>>>>>> upgrade mshadow version
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12692
>>>>>>> But another PR will be used instead:
>>>>>>> update mshadow
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12674
>>>>>>> 
>>>>>>> CudnnFind() usage improvements
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12804
>>>>>>> A critical CUDNN fix that reduces GPU memory consumption and
>>>>>>> addresses this memory leak issue. This is an important fix to
>>>> include
>>>>>>> in 1.3.1
>>>>>>> 
>>>>>>> 
>>>>>>> From discussion about gluon toolkits:
>>>>>>> 
>>>>>>> disable opencv threading for forked process
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12025
>>>>>>> 
>>>>>>> Fix lazy record io when used with dataloader and multi_worker > 0
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12554
>>>>>>> 
>>>>>>> fix potential floating number overflow, enable float16
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12118
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> * Resolved issues
>>>>>>> 
>>>>>>> MxNet 1.2.1–module get_outputs()
>>>>>>> https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882
>>>>>>> 
>>>>>>> As far as I can see from the comments the issue has been resolved,
>>>> no
>>>>>>> actions need to be taken for this release. [7] is mentioned in this
>>>>>>> regards, but I don't see any action points here either.
>>>>>>> 
>>>>>>> 
>>>>>>> I will start with help of Naveen port the mentioned PR's to the
>>>> 1.3.x
>>>>>>> branch.
>>>>>>> 
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> Anton
>>>>>>> 
>>>>>>> [1] https://cwiki.apache.org/confluence/x/eZGzBQ
>>>>>>> [2]
>>>>>>> 
>>>> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+f
>>>>>>> or+next+MXNet+Release [3]
>>>>>>> https://github.com/apache/incubator-mxnet/issues/11849
>>>>>>> [4]
>>>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimiz
>>>>>>> ation+and+Quantization+based+on+subgraph+and+MKL-DNN
>>>>>>> [5] https://github.com/apache/incubator-mxnet/issues/12713
>>>>>>> [6]
>>>>>>> https://github.com/apache/incubator-
>>>>> mxnet/issues/12713#issuecomment-4
>>>>>>> 35773777 [7] https://github.com/apache/incubator-mxnet/pull/11005
>>>>>>> 
>>>>>>> 
>>>> 
>>>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

The following PRs have been created so far:

Infer dtype in SymbolBlock import from input symbol (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13117

[MXNET-953] Fix oob memory read (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13118

[MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13119

[MXNET-922] Fix memleak in profiler (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13120

Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13121

update mshadow (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13122

CudnnFind() usage improvements (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13123

Fix lazy record io when used with dataloader and multi_worker > 0 (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13124


As stated previously, I would be rather opposed to having the following PRs in
the patch release:

Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
https://github.com/apache/incubator-mxnet/pull/13129

sample_like operators (#13034) v1.3.x
https://github.com/apache/incubator-mxnet/pull/13130


Best
Anton


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

Hi Haibin,

I have a few comments regarding the proposed performance improvement
changes.

CUDNN support for LSTM with projection & clipping
https://github.com/apache/incubator-mxnet/pull/13056

There is no doubt that this change brings value, but I don't see it as a
critical bug fix. I would rather leave it for the next major release.

sample_like operators
https://github.com/apache/incubator-mxnet/pull/13034

Even though it is related to performance, this adds new functionality, so I
would also defer it to the next major release.


Best
Anton



Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

Hi Patric,

This change was listed in the 'PR candidates suggested for consideration
for v1.3.1 patch release' section [1].

You are right: I also think that this is not a critical hotfix that needs
to be included in the 1.3.1 patch release.

I will therefore make no further effort to bring it in.

Best
Anton

[1]
https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates



RE: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by "Zhao, Patric" <pa...@intel.com>.

Hi Anton,

Thanks for looking into the MKL-DNN PR.

As I understand the cwiki (https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release),
these features will go into 1.4 rather than the 1.3.1 patch release.

Feel free to correct me :)

Thanks,

--Patric


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Haibin Lin <ha...@gmail.com>.

Hi Anton,

Thanks for driving the patch release. Besides the MKL improvements, I
suggest we also include the two changes below, which bring *performance
improvements* to NLP tasks:

CUDNN support for LSTM with projection & clipping:
- https://github.com/apache/incubator-mxnet/pull/13056
- It is used in state-of-the-art language models such as BIG-LSTM [1] and
ELMo (NAACL 2018 best paper) [2]

sample_like operators:
- https://github.com/apache/incubator-mxnet/pull/13034
- Many models require candidate sampling (e.g. word2vec [3], fastText [4])
for training. The sample_like operators allow drawing random samples
without explicit shape information, so the candidate sampling blocks can now
be hybridized and accelerated considerably.
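
To illustrate the point about shape information, here is a minimal
pure-Python sketch. It is not MXNet code, and the helper names
(uniform_with_shape, uniform_like, shape_of) are hypothetical stand-ins
chosen for this example. The first sampler needs a concrete shape up front;
the second takes its shape from a reference array at call time, which is the
property that should let a sampling block be written without knowing shapes
in advance.

```python
import random


def uniform_with_shape(shape, low=0.0, high=1.0, rng=random):
    """Draw a nested list of uniform samples; the caller must know `shape`."""
    if len(shape) == 0:
        return rng.uniform(low, high)
    return [uniform_with_shape(shape[1:], low, high, rng)
            for _ in range(shape[0])]


def uniform_like(data, low=0.0, high=1.0, rng=random):
    """Draw uniform samples shaped like `data`; no explicit shape argument."""
    if isinstance(data, list):
        return [uniform_like(item, low, high, rng) for item in data]
    return rng.uniform(low, high)


def shape_of(data):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]
    return tuple(shape)


# A 3x4 "batch" whose shape the sampling code never has to spell out.
batch = [[0.0] * 4 for _ in range(3)]
samples = uniform_like(batch)
print(shape_of(samples))
```

The design choice being illustrated is only that the "like" variant derives
its output shape from an input, so the same code works for any batch size.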

If there are no concerns, I will open two PRs against the 1.3.x branch for
the above two changes. Thanks!

Best,
Haibin

[1] https://arxiv.org/pdf/1602.02410.pdf
[2] https://arxiv.org/pdf/1802.05365.pdf
[3] https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
[4] https://arxiv.org/pdf/1607.01759.pdf


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

There seems to be a problem porting the following changes to the v1.3.x
release branch:

Implement mkldnn convolution fusion and quantization
https://github.com/apache/incubator-mxnet/pull/12530

MKL-DNN Quantization Examples and README
https://github.com/apache/incubator-mxnet/pull/12808

The bases are different.

I would need help from the authors of these changes to create the backport
PRs.

@ZhennanQin, @xinyu-intel would you be able to assist me and create the
corresponding PRs?

Without the history and domain knowledge, I'm afraid I would not be able to
create them on my own in a reasonable amount of time.

Best regards,
Anton


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

Posted by Anton Chernov <me...@gmail.com>.

As part of:

Implement mkldnn convolution fusion and quantization
https://github.com/apache/incubator-mxnet/pull/12530

I propose to add the examples and documentation PR as well:

MKL-DNN Quantization Examples and README
https://github.com/apache/incubator-mxnet/pull/12808


Best regards,
Anton
