Posted to dev@mxnet.apache.org by Steffen Rochel <st...@gmail.com> on 2018/11/14 05:04:16 UTC

[Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Dear MXNet community,
the agreed plan was to establish the code freeze for the 1.4.0 release today.
As the 1.3.1 patch release is still ongoing, I suggest postponing the code
freeze to Friday, 16th November 2018.

Sergey Kolychev has agreed to act as co-release manager for all tasks which
require committer privileges. If anybody is interested in volunteering as
release manager, now is the time to speak up. Otherwise I will manage the
release.

Regards,
Steffen

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Hagay Lupesko <lu...@gmail.com>.
+1 to wait until the Java API work is ready, since it is a major feature of
the release, but its performance should be at least on par with Python.

Also, I consider the MKL-DNN feature to be another major feature of the
release. The performance boost on CPU is significant [1]; as an example,
ResNet50-v1 is 15.9x faster on C5.18xlarge.
I spoke with Alex Zai and Manu Seth, who are working on MKL-DNN issues and
test coverage, and they feel they can get fixes for all remaining open issues
in by this Friday. I propose we also wait for that work to be ready and
included in 1.4.0.

Cheers,
Hagay

[1]
https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking


On Mon, Nov 19, 2018 at 10:57 AM Steffen Rochel <st...@gmail.com>
wrote:

> On Friday the contributors working on the Java API discovered a potential
> performance problem with inference using the Java API vs. Python.
> Investigation is ongoing.
> As the Java API is one of the main features for the upcoming release, I
> suggest postponing the code freeze towards the end of this week.
>
> Please provide feedback and concerns about the change in dates for the code
> freeze and the 1.4.0 release. I will provide updates on progress resolving
> the potential performance problem.
>
> Patric - do you think it is possible to resolve the remaining issues on
> MKL-DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
>
> Regards,
> Steffen
>
> On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <me...@gmail.com> wrote:
>
> > I'd like to remind everyone that 'code freeze' would mean cutting a
> > v1.4.x release branch and all following fixes would need to be backported.
> > Development on master can be continued as usual.
> >
> > Best
> > Anton
> >

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by kellen sunderland <ke...@gmail.com>.
I believe these PRs are ready to merge, but so far I don't have any approvals.
I would appreciate it if someone could do a quick review:

https://github.com/apache/incubator-mxnet/pull/13311
and
https://github.com/apache/incubator-mxnet/pull/13310

-Kellen

On Thu, Nov 29, 2018 at 12:43 PM Steffen Rochel <st...@gmail.com>
wrote:

> Kellen - please merge your PR before the v1.4.x branch is created, or
> integrate it afterwards.
> Steffen
>
> On Tue, Nov 20, 2018 at 7:01 PM kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
> > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
> > regression in master which causes incorrect feature vectors to be output
> > when using the TensorRT feature. (Thanks to Nathalie for helping me track
> > down the root cause of the issue.) I'm currently blocked on a CI issue I
> > haven't seen before, but hope to have it resolved by EOW.
> >
> > One call-out I would make is that we currently don't support the Turing
> > architecture (sm_75). I've been slowly trying to add support, but I don't
> > think I'd have capacity to get this done by EOW. Does anyone feel strongly
> > that we need this in the 1.4 release? From my perspective this will
> > already be a strong release without it.
> >
> > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <st...@gmail.com>
> > wrote:
> >
> > > Thanks Patric, let's aim to get the PRs merged this week.
> > >
> > > Call for contributions from the community: Right now we have 10 PRs
> > > awaiting merge
> > > <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
> > > and we have 61 open PRs awaiting review
> > > <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>.
> > > I would appreciate it if you all could help review the open PRs and the
> > > committers could drive the merges before the code freeze for 1.4.0.
> > >
> > > The contributors on the Java API are making progress, but not all
> > > performance issues are resolved. With some luck it should be possible to
> > > code freeze towards the end of this week.
> > >
> > > Are there other critical features/bugs/PRs you think need to be included
> > > in 1.4.0? If so, please communicate as soon as possible.
> > >
> > > Regards,
> > > Steffen
> > >
> > > On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <pa...@intel.com>
> > > wrote:
> > >
> > > > Thanks, Steffen. I think there is NO open issue blocking MKLDNN from
> > > > going GA now.
> > > >
> > > > BTW, several quantization-related PRs (#13297, #13260) are under
> > > > review and I think they can be merged this week.
> > > >
> > > > Thanks,
> > > >
> > > > --Patric
> > > >
> > > >

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Steffen Rochel <st...@gmail.com>.
Kellen - please merge your PR before the v1.4.x branch is created, or
integrate it afterwards.
Steffen


Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Pedro Larroy <pe...@gmail.com>.
I see. There's also an OpenMP primitive to change this. I see a way to
fix this issue with a bit of refactoring.

Thanks.

Pedro.
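
A minimal sketch of one possible shape for such a refactoring, offered as an
illustration only and not as the change that was made in MXNet: read the
OpenMP-related setting from the environment once on the main thread, before
any worker thread exists, and pass the result around as a plain value so that
nothing has to call setenv later. RuntimeConfig, ConfigFromValue,
SnapshotEnvironment and ForForkedWorker are hypothetical names, and the
single-thread default for a forked worker is an assumption.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Hypothetical value type: settings that would otherwise be re-read from,
// or written back to, the process environment.
struct RuntimeConfig {
  int omp_threads;  // desired OpenMP thread count, always >= 1
};

// Pure parsing step, kept separate so it can be tested without touching the
// environment. Missing or invalid input falls back to one thread. A list
// such as "4,2" is treated as invalid in this sketch.
RuntimeConfig ConfigFromValue(const char* omp_num_threads) {
  RuntimeConfig config{1};
  if (omp_num_threads == nullptr) return config;
  char* end = nullptr;
  const long parsed = std::strtol(omp_num_threads, &end, 10);
  if (end != omp_num_threads && *end == '\0' && parsed >= 1 &&
      parsed <= INT_MAX) {
    config.omp_threads = static_cast<int>(parsed);
  }
  return config;
}

// Intended to be called once on the main thread before worker threads start:
// std::getenv is only safe while no other thread modifies the environment.
RuntimeConfig SnapshotEnvironment() {
  return ConfigFromValue(std::getenv("OMP_NUM_THREADS"));
}

// Derive the settings for a forked worker as a new value instead of calling
// setenv in the child. One thread is an assumed, conservative choice.
RuntimeConfig ForForkedWorker(const RuntimeConfig& parent) {
  assert(parent.omp_threads >= 1);
  RuntimeConfig child = parent;
  child.omp_threads = 1;
  return child;
}
```

A forked worker would then be configured from ForForkedWorker(parent) rather
than from environment variables rewritten inside a fork handler.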
On Thu, Nov 29, 2018 at 6:24 PM Chris Olivier <cj...@gmail.com> wrote:
>
> I don’t think that does anything at all, as stated in my other email.
> Someone can look into the OpenMP code to be sure, but my suspicion is that
> the environment variable is only read on startup, and at any rate it is
> better to set it through the API at runtime.
>
> On Thu, Nov 29, 2018 at 8:11 AM Pedro Larroy <pe...@gmail.com>
> wrote:
>
> > To be precise, what would be the consequences of not having these env
> > variables set in the engine threads related to OMP?
> > Given your experience with OpenMP I hope you can help us answer these
> > questions.
> >
> > Hopefully we can get the same effect (if any) of these setenvs using
> > some openmp call or a pragma. Definitely we shouldn't be mutating the
> > environment from a different thread from what I understand, which is
> > the likely cause of the random crashes some users are experiencing.
> >
> > Pedro
> > On Thu, Nov 29, 2018 at 5:00 PM Pedro Larroy
> > <pe...@gmail.com> wrote:
> > >
> > > Chris.  The problem is with setenv, not with getenv. We don't want to
> > > remove any getenv call, just these misplaced setenvs:
> > >
> > > https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L61
> > >
> > > Please check the code above carefully and give us your feedback. Based
> > > on your email I think we don't yet have a common understanding of the
> > > root cause of this issue.
> > >
> > > Pedro.
> > > On Thu, Nov 29, 2018 at 4:02 PM Chris Olivier <cj...@gmail.com>
> > wrote:
> > > >
> > > > - getenv should be thread safe as long as nothing is calling
> > > > putenv/setenv in another thread (the environment doesn’t change) as
> > > > stated here:
> > > >
> > > > http://www.cplusplus.com/reference/cstdlib/getenv/
> > > >
> > > > it’s a simple library call, so to be sure either way, one can check the
> > > > actual source and see (in case some particular implementation is acting
> > > > in a particularly thread-unsafe manner). This should be vetted before
> > > > making any high-impact decisions such as trying to go remove every
> > > > getenv call in the whole system.
> > > >
> > > > - locking after fork is possibly due to libgomp not supporting forking
> > > > such that after a fork, a call is made to release the blocked omp
> > > > threads and the main thread waits for the omp threads to finish, but
> > > > the omp threads belong to the pre-forked process and thus never
> > > > execute, causing that forked process to freeze.  This behavior has
> > > > been witnessed before.
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <
> > pedro.larroy.lists@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi all.
> > > > >
> > > > > There are two important issues / fixes on my radar that should go
> > > > > into the next release:
> > > > >
> > > > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > > > > There is a bug in shape inference on CPU when not using MKL. Also,
> > > > > we are running activation on CPU via MKL when we compile
> > > > > CUDNN+MKLDNN. I'm finishing a fix for these issues in the above PR.
> > > > >
> > > > > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > > > > We are seeing crashes due to unsafe setenv in multithreaded code.
> > > > > Setenv / getenv from multiple threads is not safe and is causing
> > > > > segfaults. This piece of code (the handlers in pthread_atfork)
> > > > > already caused a very difficult-to-diagnose hang in a previous
> > > > > release, where a fork inside cudnn would deadlock the engine.
> > > > >
> > > > > I would remove setenv from 2) as a mitigation, but we would need to
> > > > > check for regressions as we could be creating additional threads
> > > > > inside the engine.
> > > > >
> > > > > I would suggest that we address these two major issues before the
> > > > > next release.
> > > > >
> > > > > Pedro
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <
> > steffenrochel@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Dear MXNet community,
> > > > > >
> > > > > > I will be the release manager for the upcoming Apache MXNet 1.4.0
> > > > > > release. Sergey Kolychev will be co-managing the release and
> > > > > > providing help from the committers' side.
> > > > > > A release candidate will be cut on November 29, 2018 and voting
> > > > > > will start December 7, 2018. Release notes have been drafted here
> > > > > > [1]. If you have any additional features in progress and would
> > > > > > like to include them in this release, please ensure they have been
> > > > > > merged by November 27, 2018. The release schedule is available
> > > > > > here [2].
> > > > > >
> > > > > > Feel free to add any other comments/suggestions. Please help to
> > > > > > review and merge outstanding PRs and resolve issues impacting the
> > > > > > quality of the 1.4.0 release.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Steffen
> > > > > >
> > > > > > [1]
> > > > > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > > > > >
> > > > > > [2]
> > > > > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > > > > > kellen.sunderland@gmail.com> wrote:
> > > > > >
> > > > > > > Spoke too soon[1], looks like others have been adding Turing
> > > > > > > support as well (thanks to those helping with this).  I believe
> > > > > > > there's still a few changes we'd have to make to claim support
> > > > > > > though (mshadow CMake changes, PyPI package creation tweaks).
> > > > > > >
> > > > > > > 1:
> > > > > > > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > > > > > >
> >

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Chris Olivier <cj...@gmail.com>.
I don’t think that does anything at all, as stated in my other email.
Someone can look into the OpenMP code to be sure, but my suspicion is that
the environment variable is only read on startup, and at any rate it is
better to set it through the API at runtime.
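
As a rough illustration of the runtime-API route, here is a minimal sketch
that assumes a C++ build which may or may not have OpenMP enabled.
SetOpenMPThreads is a hypothetical helper and not MXNet code;
omp_set_num_threads and omp_get_max_threads are standard OpenMP runtime
calls.

```cpp
#include <cassert>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: request n OpenMP threads through the runtime API
// instead of rewriting OMP_NUM_THREADS with setenv. Returns the thread limit
// the runtime reports afterwards; a build without OpenMP reports 1.
int SetOpenMPThreads(int n) {
  assert(n >= 1);
#ifdef _OPENMP
  // Applies to parallel regions that the calling thread starts afterwards.
  omp_set_num_threads(n);
  return omp_get_max_threads();
#else
  (void)n;  // no OpenMP runtime available in this build
  return 1;
#endif
}
```

Because the call does not touch the process environment, it avoids the
setenv/getenv race discussed in this thread, although each thread that opens
parallel regions may need to make the call itself.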

> > > > > > >>
> > > > > > >> The contributors on the Java API are making progress, but not
> all
> > > > > > >> performance issues are resolved. With some luck it should be
> > > > possible to
> > > > > > >> code freeze towards end of this week.
> > > > > > >>
> > > > > > >> Are there other critical features/bugs/PR you think need to be
> > > > included
> > > > > > in
> > > > > > >> 1.4.0? If so, please communicate as soon as possible.
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Steffen
> > > > > > >>
> > > > > > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <
> patric.zhao@intel.com
> > > > >
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Thanks, Steffen. I think there is NO open issue to block the
> > > > MKLDNN to
> > > > > > >> GA
> > > > > > >> > now.
> > > > > > >> >
> > > > > > >> > BTW, several quantization related PRs (#13297,#13260) are
> under
> > > > the
> > > > > > >> review
> > > > > > >> > and I think it can be merged in this week.
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> >
> > > > > > >> > --Patric
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > > -----Original Message-----
> > > > > > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> > > > > > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
> > > > > > >> > > To: dev@mxnet.incubator.apache.org
> > > > > > >> > > Subject: Re: [Announce] Upcoming Apache MXNet
> (incubating) 1.4.0
> > > > > > >> release
> > > > > > >> > >
> > > > > > >> > > On Friday the contributors working on Java API discovered
> a
> > > > > > potential
> > > > > > >> > > performance problem with inference using Java API vs.
> Python.
> > > > > > >> > Investigation
> > > > > > >> > > is ongoing.
> > > > > > >> > > As the Java API is one of the main features for the
> upcoming
> > > > > > release,
> > > > > > >> I
> > > > > > >> > > suggest to post-pone the code freeze towards end of this
> week.
> > > > > > >> > >
> > > > > > >> > > Please provide feedback and concern about the change in
> dates
> > > > for
> > > > > > code
> > > > > > >> > > freeze and 1.4.0 release. I will provide updates on
> progress
> > > > > > resolving
> > > > > > >> > the
> > > > > > >> > > potential performance problem.
> > > > > > >> > >
> > > > > > >> > > Patrick - do you think it is possible to resolve the
> remaining
> > > > > > issues
> > > > > > >> on
> > > > > > >> > MKL-
> > > > > > >> > > DNN this week, so we can consider GA for MKL-DNN with
> 1.4.0?
> > > > > > >> > >
> > > > > > >> > > Regards,
> > > > > > >> > > Steffen
> > > > > > >> > >
> > > > > > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <
> > > > mechernov@gmail.com>
> > > > > > >> > > wrote:
> > > > > > >> > >
> > > > > > >> > > > I'd like to remind everyone that 'code freeze' would
> mean
> > > > cutting
> > > > > > a
> > > > > > >> > > > v1.4.x release branch and all following fixes would
> need to be
> > > > > > >> > backported.
> > > > > > >> > > > Development on master can be continued as usual.
> > > > > > >> > > >
> > > > > > >> > > > Best
> > > > > > >> > > > Anton
> > > > > > >> > > >
> > > > > > >> > > > ср, 14 нояб. 2018 г. в 6:04, Steffen Rochel <
> > > > > > >> steffenrochel@gmail.com>:
> > > > > > >> > > >
> > > > > > >> > > > > Dear MXNet community,
> > > > > > >> > > > > the agreed plan was to establish code freeze for 1.4.0
> > > > release
> > > > > > >> > > > > today. As the 1.3.1 patch release is still ongoing I
> > > > suggest to
> > > > > > >> > > > > post-pone the code freeze to Friday 16th November
> 2018.
> > > > > > >> > > > >
> > > > > > >> > > > > Sergey Kolychev has agreed to act as co-release
> manager for
> > > > all
> > > > > > >> > > > > tasks
> > > > > > >> > > > which
> > > > > > >> > > > > require committer privileges. If anybody is
> interested to
> > > > > > >> volunteer
> > > > > > >> > > > > as release manager - now is the time to speak up.
> Otherwise
> > > > I
> > > > > > will
> > > > > > >> > > > > manage
> > > > > > >> > > > the
> > > > > > >> > > > > release.
> > > > > > >> > > > >
> > > > > > >> > > > > Regards,
> > > > > > >> > > > > Steffen
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > >
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Chris Olivier <cj...@gmail.com>.
By the way, have you traced a problem to these calls?

I am a bit skeptical that this is problematic here for the following reason:

At the time of atfork(), the new process doesn’t have any other threads
that could be calling getenv(). Any globals from the parent process are
owned by that process and are copy-on-write in the new process. This means
that a getenv() in the parent process is not affected by a putenv() in
the newly forked process and, as I said, the newly forked process tends to
be single-threaded at that point.
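
As a minimal sketch of the copy-on-write point above (POSIX only; the
variable name DEMO_VAR and the function name are made up for illustration
and do not come from MXNet), the following forks a child that calls
setenv() and then checks that the parent still sees its own value:

```cpp
#include <stdlib.h>     // setenv, getenv (POSIX)
#include <string>
#include <sys/types.h>
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork, _exit

// Returns true when a setenv() performed in a forked child is NOT visible
// in the parent. DEMO_VAR is a hypothetical name used only for this demo.
bool child_setenv_is_invisible_to_parent() {
  setenv("DEMO_VAR", "parent", 1);

  pid_t pid = fork();
  if (pid < 0) {
    return false;  // fork failed, nothing was demonstrated
  }
  if (pid == 0) {
    // Child: this changes only the child's private copy of the environment.
    setenv("DEMO_VAR", "child", 1);
    _exit(0);
  }

  // Parent: wait for the child, then read the variable again.
  int status = 0;
  waitpid(pid, &status, 0);
  const char* value = getenv("DEMO_VAR");
  return value != nullptr && std::string(value) == "parent";
}
```

Calling child_setenv_is_invisible_to_parent() from main() should return
true on a POSIX system. This only shows isolation between processes; it
says nothing about threads inside one process racing on setenv() and
getenv().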



On Thu, Nov 29, 2018 at 8:11 AM Pedro Larroy <pe...@gmail.com>
wrote:

> To be precise, what would be the consequences of not having these env
> variables set in the engine threads related to OMP?
> Given your experience with OpenMP I hope you can help us answer these
> questions.
>
> Hopefully we can get the same effect (if any) of these setenvs using
> some openmp call or a pragma. Definitely we shouldn't be mutating the
> environment from a different thread from what I understand, which is
> the likely cause of the random crashes some users are experiencing.
>
> Pedro
> On Thu, Nov 29, 2018 at 5:00 PM Pedro Larroy
> <pe...@gmail.com> wrote:
> >
> > Chris.  The problem is with setenv, not with getenv. We don't want to
> > remove any getenv call, just these misplaced setenvs:
> >
> >
> >
> https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L61
> >
> > Please check the code above carefully and give us your feedback. Based
> > on your email I think we don't yet have a common understanding of the
> > root cause of this issue.
> >
> > Pedro.
> > On Thu, Nov 29, 2018 at 4:02 PM Chris Olivier <cj...@gmail.com>
> wrote:
> > >
> > > - getenv should be thread safe as long as nothing is calling
> putenv/setenv
> > > in another thread (the environment doesn’t change) as stated here:
> > >
> > > http://www.cplusplus.com/reference/cstdlib/getenv/
> > >
> > > it’s a simple library call, so to be sure either way, one can check the
> > > actual source and see (in case some particular implementation is
> acting in
> > > a particularly thread-unsafe manner). This should be vetted before
> making
> > > any high-impact decisions such as trying to go remove every getenv
> call in
> > > the whole system.
> > >
> > > - locking after fork is possibly due to libgomp not supporting forking
> such
> > > that after a fork, a call is made to release the blocked omp threads
> and
> > > the main thread waits for the omp threads to finish, but the omp
> threads
> > > belong to the pre-forked process and thus never execute, causing that
> > > forked process to freeze.  This behavior has been witnessed before.
> > >
> > >
> > >
> > >
> > > On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <
> pedro.larroy.lists@gmail.com>
> > > wrote:
> > >
> > > > Hi all.
> > > >
> > > > There are two important issues / fixes that should go in the next
> > > > release in my radar:
> > > >
> > > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > > > There is a bug in shape inference on CPU when not using MKL, also we
> > > > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > > > I'm finishing a fix for these issues in the above PR.
> > > >
> > > > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > > > We are seeing crashes due to unsafe setenv in multithreaded code.
> > > > Setenv / getenv from multiple threads is not safe and is causing
> > > > segfaults. This piece of code (the handlers in pthread_atfork)
> already
> > > > caused a very difficult to diagnose hang in a previous release, where
> > > > a fork inside cudnn would deadlock the engine.
> > > >
> > > > I would remove setenv from 2) as a mitigation, but we would need to
> > > > check for regressions as we could be creating additional threads
> > > > inside the engine.
> > > >
> > > > I would suggest that we address these two major issues before the
> next
> > > > release.
> > > >
> > > > Pedro
> > > >
> > > >
> > > >
> > > > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <
> steffenrochel@gmail.com>
> > > > wrote:
> > > > >
> > > > > Dear MXNet community,
> > > > >
> > > > > I will be the release manager for the upcoming Apache MXNet 1.4.0
> > > > release.
> > > > > Sergey Kolychev will be co-managing the release and providing help
> from
> > > > the
> > > > > committers side.
> > > > > A release candidate will be cut on November 29, 2018 and voting
> will
> > > > start
> > > > > December 7, 2018. Release notes have been drafted here [1]. If you
> have
> > > > any
> > > > > additional features in progress and would like to include it in
> this
> > > > > release, please assure they have been merged by November 27, 2018.
> > > > Release
> > > > > schedule is available here [2].
> > > > >
> > > > > Feel free to add any other comments/suggestions. Please help to
> review
> > > > and
> > > > > merge outstanding PR's and resolve issues impacting the quality of
> the
> > > > > 1.4.0 release.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Steffen
> > > > >
> > > > > [1]
> > > > >
> > > >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > > > >
> > > > > [2]
> > > >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > > > > kellen.sunderland@gmail.com> wrote:
> > > > >
> > > > > > Spoke too soon[1], looks like others have been adding Turing
> support as
> > > > > > well (thanks to those helping with this).  I believe there's
> still a
> > > > few
> > > > > > changes we'd have to make to claim support though (mshadow CMake
> > > > changes,
> > > > > > PyPi package creation tweaks).
> > > > > >
> > > > > > 1:
> > > > > >
> > > > > >
> > > >
> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > > > > > kellen.sunderland@gmail.com> wrote:
> > > > > >
> > > > > > > Hey Steffen, I'd like to be able to merge this PR for version
> 1.4:
> > > > > > > https://github.com/apache/incubator-mxnet/pull/13310 . It
> fixes a
> > > > > > > regression in master which causes incorrect feature vectors to
> be
> > > > output
> > > > > > > when using the TensorRT feature.  (Thanks to Nathalie for
> helping me
> > > > > > track
> > > > > > > down the root cause of the issue).   I'm currently blocked on
> a CI
> > > > issue
> > > > > > I
> > > > > > > haven't seen before, but hope to have it resolved by EOW.
> > > > > > >
> > > > > > > One call-out I would make is that we currently don't support
> Turing
> > > > > > > architecture (sm_75).  I've been slowly trying to add support,
> but I
> > > > > > don't
> > > > > > > think I'd have capacity to do this done by EOW.  Does anyone
> feel
> > > > > > strongly
> > > > > > > we need this in the 1.4 release?  From my perspective this will
> > > > already
> > > > > > be
> > > > > > > a strong release without it.
> > > > > > >
> > > > > > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <
> > > > steffenrochel@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Thanks Patrick, lets target to get the PR's merged this week.
> > > > > > >>
> > > > > > >> Call for contributions from the community: Right now we have
> 10 PR
> > > > > > >> awaiting
> > > > > > >> merge
> > > > > > >> <
> > > > > > >>
> > > > > >
> > > >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
> > > > > > >> >
> > > > > > >> and
> > > > > > >> we have 61 open PR awaiting review.
> > > > > > >> <
> > > > > > >>
> > > > > >
> > > >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
> > > > > > >> >
> > > > > > >> I would appreciate if you all can help to review the open PR
> and the
> > > > > > >> committers can drive the merge before code freeze for 1.4.0.
> > > > > > >>
> > > > > > >> The contributors on the Java API are making progress, but not
> all
> > > > > > >> performance issues are resolved. With some luck it should be
> > > > possible to
> > > > > > >> code freeze towards end of this week.
> > > > > > >>
> > > > > > >> Are there other critical features/bugs/PR you think need to be
> > > > included
> > > > > > in
> > > > > > >> 1.4.0? If so, please communicate as soon as possible.
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Steffen
> > > > > > >>
> > > > > > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <
> patric.zhao@intel.com
> > > > >
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Thanks, Steffen. I think there is NO open issue to block the
> > > > MKLDNN to
> > > > > > >> GA
> > > > > > >> > now.
> > > > > > >> >
> > > > > > >> > BTW, several quantization related PRs (#13297,#13260) are
> under
> > > > the
> > > > > > >> review
> > > > > > >> > and I think it can be merged in this week.
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> >
> > > > > > >> > --Patric
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > > -----Original Message-----
> > > > > > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> > > > > > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
> > > > > > >> > > To: dev@mxnet.incubator.apache.org
> > > > > > >> > > Subject: Re: [Announce] Upcoming Apache MXNet
> (incubating) 1.4.0
> > > > > > >> release
> > > > > > >> > >
> > > > > > >> > > On Friday the contributors working on Java API discovered
> a
> > > > > > potential
> > > > > > >> > > performance problem with inference using Java API vs.
> Python.
> > > > > > >> > Investigation
> > > > > > >> > > is ongoing.
> > > > > > >> > > As the Java API is one of the main features for the
> upcoming
> > > > > > release,
> > > > > > >> I
> > > > > > >> > > suggest to post-pone the code freeze towards end of this
> week.
> > > > > > >> > >
> > > > > > >> > > Please provide feedback and concern about the change in
> dates
> > > > for
> > > > > > code
> > > > > > >> > > freeze and 1.4.0 release. I will provide updates on
> progress
> > > > > > resolving
> > > > > > >> > the
> > > > > > >> > > potential performance problem.
> > > > > > >> > >
> > > > > > >> > > Patrick - do you think it is possible to resolve the
> remaining
> > > > > > issues
> > > > > > >> on
> > > > > > >> > MKL-
> > > > > > >> > > DNN this week, so we can consider GA for MKL-DNN with
> 1.4.0?
> > > > > > >> > >
> > > > > > >> > > Regards,
> > > > > > >> > > Steffen
> > > > > > >> > >
> > > > > > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <
> > > > mechernov@gmail.com>
> > > > > > >> > > wrote:
> > > > > > >> > >
> > > > > > >> > > > I'd like to remind everyone that 'code freeze' would
> mean
> > > > cutting
> > > > > > a
> > > > > > >> > > > v1.4.x release branch and all following fixes would
> need to be
> > > > > > >> > backported.
> > > > > > >> > > > Development on master can be continued as usual.
> > > > > > >> > > >
> > > > > > >> > > > Best
> > > > > > >> > > > Anton
> > > > > > >> > > >
> > > > > > >> > > > ср, 14 нояб. 2018 г. в 6:04, Steffen Rochel <
> > > > > > >> steffenrochel@gmail.com>:
> > > > > > >> > > >
> > > > > > >> > > > > Dear MXNet community,
> > > > > > >> > > > > the agreed plan was to establish code freeze for 1.4.0
> > > > release
> > > > > > >> > > > > today. As the 1.3.1 patch release is still ongoing I
> > > > suggest to
> > > > > > >> > > > > post-pone the code freeze to Friday 16th November
> 2018.
> > > > > > >> > > > >
> > > > > > >> > > > > Sergey Kolychev has agreed to act as co-release
> manager for
> > > > all
> > > > > > >> > > > > tasks
> > > > > > >> > > > which
> > > > > > >> > > > > require committer privileges. If anybody is
> interested to
> > > > > > >> volunteer
> > > > > > >> > > > > as release manager - now is the time to speak up.
> Otherwise
> > > > I
> > > > > > will
> > > > > > >> > > > > manage
> > > > > > >> > > > the
> > > > > > >> > > > > release.
> > > > > > >> > > > >
> > > > > > >> > > > > Regards,
> > > > > > >> > > > > Steffen
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > >
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Pedro Larroy <pe...@gmail.com>.
To be precise, what would be the consequences of not having these env
variables set in the engine threads related to OMP?
Given your experience with OpenMP I hope you can help us answer these questions.

Hopefully we can get the same effect (if any) as these setenv calls by
using an OpenMP call or a pragma. From what I understand, we definitely
shouldn't be mutating the environment from a different thread, which is
the likely cause of the random crashes some users are experiencing.
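
One way this could look, as a sketch only: it assumes the setenv calls are
meant to keep the forked child down to a single OpenMP thread, which this
thread does not confirm. The names install_fork_handler,
forced_omp_threads and child_is_forced_single_threaded are hypothetical.
omp_set_num_threads() is a standard OpenMP routine; whether calling it
from an atfork handler is safe with a given OpenMP runtime would still
need to be checked.

```cpp
#include <atomic>
#include <pthread.h>    // pthread_atfork
#include <sys/types.h>
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork, _exit
#ifdef _OPENMP
#include <omp.h>        // omp_set_num_threads
#endif

namespace {
// 0 means "no thread count forced"; set to 1 in the child after fork.
std::atomic<int> g_forced_omp_threads{0};

// Runs in the child right after fork(). It does not touch the process
// environment, so there is no setenv() for other code to race with.
void child_after_fork() {
  g_forced_omp_threads.store(1);
#ifdef _OPENMP
  omp_set_num_threads(1);  // assumption: replaces setenv of an OMP variable
#endif
}
}  // namespace

// Registers the child handler; returns 0 on success like pthread_atfork.
int install_fork_handler() {
  return pthread_atfork(nullptr, nullptr, child_after_fork);
}

// Thread count forced in this process (0 if none was forced).
int forced_omp_threads() { return g_forced_omp_threads.load(); }

// Forks once and reports whether the child saw the forced single thread.
bool child_is_forced_single_threaded() {
  pid_t pid = fork();
  if (pid < 0) return false;
  if (pid == 0) _exit(forced_omp_threads() == 1 ? 0 : 1);
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return false;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The design choice here is to keep the setting in process-local state (an
atomic plus the OpenMP runtime's own setting) instead of the shared
environment block, so the parent is left untouched and nothing needs to
call setenv() at all.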

Pedro
On Thu, Nov 29, 2018 at 5:00 PM Pedro Larroy
<pe...@gmail.com> wrote:
>
> Chris.  The problem is with setenv, not with getenv. We don't want to
> remove any getenv call, just these misplaced setenvs:
>
>
> https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L61
>
> Please check the code above carefully and give us your feedback. Based
> on your email I think we don't yet have a common understanding of the
> root cause of this issue.
>
> Pedro.
> On Thu, Nov 29, 2018 at 4:02 PM Chris Olivier <cj...@gmail.com> wrote:
> >
> > - getenv should be thread safe as long as nothing is calling putenv/setenv
> > in another thread (the environment doesn’t change) as stated here:
> >
> > http://www.cplusplus.com/reference/cstdlib/getenv/
> >
> > it’s a simple library call, so to be sure either way, one can check the
> > actual source and see (in case some particular implementation is acting in
> > a particularly thread-unsafe manner). This should be vetted before making
> > any high-impact decisions such as trying to go remove every getenv call in
> > the whole system.
> >
> > - locking after fork is possibly due to libgomp not supporting forking such
> > that after a fork, a call is made to release the blocked omp threads and
> > the main thread waits for the omp threads to finish, but the omp threads
> > belong to the pre-forked process and thus never execute, causing that
> > forked process to freeze.  This behavior has been witnessed before.
> >
> >
> >
> >
> > On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pe...@gmail.com>
> > wrote:
> >
> > > Hi all.
> > >
> > > There are two important issues / fixes that should go in the next
> > > release in my radar:
> > >
> > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > > There is a bug in shape inference on CPU when not using MKL, also we
> > > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > > I'm finishing a fix for these issues in the above PR.
> > >
> > > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > > We are seeing crashes due to unsafe setenv in multithreaded code.
> > > Setenv / getenv from multiple threads is not safe and is causing
> > > segfaults. This piece of code (the handlers in pthread_atfork) already
> > > caused a very difficult to diagnose hang in a previous release, where
> > > a fork inside cudnn would deadlock the engine.
> > >
> > > I would remove setenv from 2) as a mitigation, but we would need to
> > > check for regressions as we could be creating additional threads
> > > inside the engine.
> > >
> > > I would suggest that we address these two major issues before the next
> > > release.
> > >
> > > Pedro
> > >
> > >
> > >
> > > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <st...@gmail.com>
> > > wrote:
> > > >
> > > > Dear MXNet community,
> > > >
> > > > I will be the release manager for the upcoming Apache MXNet 1.4.0
> > > release.
> > > > Sergey Kolychev will be co-managing the release and providing help from
> > > the
> > > > committers side.
> > > > A release candidate will be cut on November 29, 2018 and voting will
> > > start
> > > > December 7, 2018. Release notes have been drafted here [1]. If you have
> > > any
> > > > additional features in progress and would like to include it in this
> > > > release, please assure they have been merged by November 27, 2018.
> > > Release
> > > > schedule is available here [2].
> > > >
> > > > Feel free to add any other comments/suggestions. Please help to review
> > > and
> > > > merge outstanding PR's and resolve issues impacting the quality of the
> > > > 1.4.0 release.
> > > >
> > > > Regards,
> > > >
> > > > Steffen
> > > >
> > > > [1]
> > > >
> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > > >
> > > > [2]
> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > > > kellen.sunderland@gmail.com> wrote:
> > > >
> > > > > Spoke too soon[1], looks like others have been adding Turing support as
> > > > > well (thanks to those helping with this).  I believe there's still a
> > > few
> > > > > changes we'd have to make to claim support though (mshadow CMake
> > > changes,
> > > > > PyPi package creation tweaks).
> > > > >
> > > > > 1:
> > > > >
> > > > >
> > > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > > > >
> > > > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > > > > kellen.sunderland@gmail.com> wrote:
> > > > >
> > > > > > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
> > > > > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
> > > > > > regression in master which causes incorrect feature vectors to be
> > > output
> > > > > > when using the TensorRT feature.  (Thanks to Nathalie for helping me
> > > > > track
> > > > > > down the root cause of the issue).   I'm currently blocked on a CI
> > > issue
> > > > > I
> > > > > > haven't seen before, but hope to have it resolved by EOW.
> > > > > >
> > > > > > One call-out I would make is that we currently don't support Turing
> > > > > > architecture (sm_75).  I've been slowly trying to add support, but I
> > > > > don't
> > > > > > think I'd have capacity to do this done by EOW.  Does anyone feel
> > > > > strongly
> > > > > > we need this in the 1.4 release?  From my perspective this will
> > > already
> > > > > be
> > > > > > a strong release without it.
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <
> > > steffenrochel@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Thanks Patrick, lets target to get the PR's merged this week.
> > > > > >>
> > > > > >> Call for contributions from the community: Right now we have 10 PR
> > > > > >> awaiting
> > > > > >> merge
> > > > > >> <
> > > > > >>
> > > > >
> > > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
> > > > > >> >
> > > > > >> and
> > > > > >> we have 61 open PR awaiting review.
> > > > > >> <
> > > > > >>
> > > > >
> > > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
> > > > > >> >
> > > > > >> I would appreciate if you all can help to review the open PR and the
> > > > > >> committers can drive the merge before code freeze for 1.4.0.
> > > > > >>
> > > > > >> The contributors on the Java API are making progress, but not all
> > > > > >> performance issues are resolved. With some luck it should be
> > > possible to
> > > > > >> code freeze towards end of this week.
> > > > > >>
> > > > > >> Are there other critical features/bugs/PR you think need to be
> > > included
> > > > > in
> > > > > >> 1.4.0? If so, please communicate as soon as possible.
> > > > > >>
> > > > > >> Regards,
> > > > > >> Steffen
> > > > > >>
> > > > > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <patric.zhao@intel.com
> > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Thanks, Steffen. I think there is NO open issue to block the
> > > MKLDNN to
> > > > > >> GA
> > > > > >> > now.
> > > > > >> >
> > > > > >> > BTW, several quantization related PRs (#13297,#13260) are under
> > > the
> > > > > >> review
> > > > > >> > and I think it can be merged in this week.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> >
> > > > > >> > --Patric
> > > > > >> >
> > > > > >> >
> > > > > >> > > -----Original Message-----
> > > > > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> > > > > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
> > > > > >> > > To: dev@mxnet.incubator.apache.org
> > > > > >> > > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0
> > > > > >> release
> > > > > >> > >
> > > > > >> > > On Friday the contributors working on Java API discovered a
> > > > > potential
> > > > > >> > > performance problem with inference using Java API vs. Python.
> > > > > >> > Investigation
> > > > > >> > > is ongoing.
> > > > > >> > > As the Java API is one of the main features for the upcoming
> > > > > release,
> > > > > >> I
> > > > > >> > > suggest to post-pone the code freeze towards end of this week.
> > > > > >> > >
> > > > > >> > > Please provide feedback and concern about the change in dates
> > > for
> > > > > code
> > > > > >> > > freeze and 1.4.0 release. I will provide updates on progress
> > > > > resolving
> > > > > >> > the
> > > > > >> > > potential performance problem.
> > > > > >> > >
> > > > > >> > > Patrick - do you think it is possible to resolve the remaining
> > > > > issues
> > > > > >> on
> > > > > >> > MKL-
> > > > > >> > > DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
> > > > > >> > >
> > > > > >> > > Regards,
> > > > > >> > > Steffen
> > > > > >> > >
> > > > > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <
> > > mechernov@gmail.com>
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > I'd like to remind everyone that 'code freeze' would mean
> > > cutting
> > > > > a
> > > > > >> > > > v1.4.x release branch and all following fixes would need to be
> > > > > >> > backported.
> > > > > >> > > > Development on master can be continued as usual.
> > > > > >> > > >
> > > > > >> > > > Best
> > > > > >> > > > Anton
> > > > > >> > > >
> > > > > >> > > > ср, 14 нояб. 2018 г. в 6:04, Steffen Rochel <
> > > > > >> steffenrochel@gmail.com>:
> > > > > >> > > >
> > > > > >> > > > > Dear MXNet community,
> > > > > >> > > > > the agreed plan was to establish code freeze for 1.4.0
> > > release
> > > > > >> > > > > today. As the 1.3.1 patch release is still ongoing I
> > > suggest to
> > > > > >> > > > > post-pone the code freeze to Friday 16th November 2018.
> > > > > >> > > > >
> > > > > >> > > > > Sergey Kolychev has agreed to act as co-release manager for
> > > all
> > > > > >> > > > > tasks
> > > > > >> > > > which
> > > > > >> > > > > require committer privileges. If anybody is interested to
> > > > > >> volunteer
> > > > > >> > > > > as release manager - now is the time to speak up. Otherwise
> > > I
> > > > > will
> > > > > >> > > > > manage
> > > > > >> > > > the
> > > > > >> > > > > release.
> > > > > >> > > > >
> > > > > >> > > > > Regards,
> > > > > >> > > > > Steffen
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > >

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Chris Olivier <cj...@gmail.com>.
I see. Yeah, those can probably be removed. I haven’t checked the source,
but I would be surprised if omp even looked at the environment variable
after initial startup since looking up environment variables is a slow
linear search each time.
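
To illustrate the read-once pattern described above, here is a small
sketch (the variable name DEMO_NUM_THREADS and the function name are
hypothetical, and this is not taken from any OpenMP runtime). As far as I
know, the OpenMP specification also states that changes to its
environment variables after program start are ignored, which would be
consistent with this pattern.

```cpp
#include <stdlib.h>  // getenv, strtol

// Reads DEMO_NUM_THREADS (hypothetical name) exactly once, on first call.
// Later changes to the environment are not seen by this function.
int cached_thread_count() {
  static const int value = [] {
    const char* raw = getenv("DEMO_NUM_THREADS");
    if (raw == nullptr) return 1;  // default when the variable is unset
    char* end = nullptr;
    long parsed = strtol(raw, &end, 10);
    // Fall back to 1 for non-numeric or non-positive values.
    return (end != raw && parsed > 0) ? static_cast<int>(parsed) : 1;
  }();
  return value;
}
```

If a library caches its settings like this, a setenv() issued after
startup (for example in a fork handler) has no effect on it, which would
make such a call unnecessary as well as unsafe.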

On Thu, Nov 29, 2018 at 8:09 AM Pedro Larroy <pe...@gmail.com>
wrote:

> Chris.  The problem is with setenv, not with getenv. We don't want to
> remove any getenv call, just these misplaced setenvs:
>
>
> https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L61
>
> Please check the code above carefully and give us your feedback. Based
> on your email I think we don't yet have a common understanding of the
> root cause of this issue.
>
> Pedro.
> On Thu, Nov 29, 2018 at 4:02 PM Chris Olivier <cj...@gmail.com>
> wrote:
> >
> > - getenv should be thread safe as long as nothing is calling
> putenv/setenv
> > in another thread (the environment doesn’t change) as stated here:
> >
> > http://www.cplusplus.com/reference/cstdlib/getenv/
> >
> > it’s a simple library call, so to be sure either way, one can check the
> > actual source and see (in case some particular implementation is acting
> in
> > a particularly thread-unsafe manner). This should be vetted before making
> > any high-impact decisions such as trying to go remove every getenv call
> in
> > the whole system.
> >
> > - locking after fork is possibly due to libgomp not supporting forking
> such
> > that after a fork, a call is made to release the blocked omp threads and
> > the main thread waits for the omp threads to finish, but the omp threads
> > belong to the pre-forked process and thus never execute, causing that
> > forked process to freeze.  This behavior has been witnessed before.
> >
> >
> >
> >
> > On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <
> pedro.larroy.lists@gmail.com>
> > wrote:
> >
> > > Hi all.
> > >
> > > There are two important issues / fixes that should go in the next
> > > release in my radar:
> > >
> > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > > There is a bug in shape inference on CPU when not using MKL, also we
> > > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > > I'm finishing a fix for these issues in the above PR.
> > >
> > > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > > We are seeing crashes due to unsafe setenv in multithreaded code.
> > > Setenv / getenv from multiple threads is not safe and is causing
> > > segfaults. This piece of code (the handlers in pthread_atfork) already
> > > caused a very difficult to diagnose hang in a previous release, where
> > > a fork inside cudnn would deadlock the engine.
> > >
> > > I would remove setenv from 2) as a mitigation, but we would need to
> > > check for regressions as we could be creating additional threads
> > > inside the engine.
> > >
> > > I would suggest that we address these two major issues before the next
> > > release.
> > >
> > > Pedro
> > >
> > >
> > >
> > > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <
> steffenrochel@gmail.com>
> > > wrote:
> > > >
> > > > Dear MXNet community,
> > > >
> > > > I will be the release manager for the upcoming Apache MXNet 1.4.0
> > > release.
> > > > Sergey Kolychev will be co-managing the release and providing help
> from
> > > the
> > > > committers side.
> > > > A release candidate will be cut on November 29, 2018 and voting will
> > > start
> > > > December 7, 2018. Release notes have been drafted here [1]. If you
> have
> > > any
> > > > additional features in progress and would like to include it in this
> > > > release, please assure they have been merged by November 27, 2018.
> > > Release
> > > > schedule is available here [2].
> > > >
> > > > Feel free to add any other comments/suggestions. Please help to
> review
> > > and
> > > > merge outstanding PR's and resolve issues impacting the quality of
> the
> > > > 1.4.0 release.
> > > >
> > > > Regards,
> > > >
> > > > Steffen
> > > >
> > > > [1]
> > > >
> > >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > > >
> > > > [2]
> > >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > > > kellen.sunderland@gmail.com> wrote:
> > > >
> > > > > Spoke too soon[1], looks like others have been adding Turing
> support as
> > > > > well (thanks to those helping with this).  I believe there's still
> a
> > > few
> > > > > changes we'd have to make to claim support though (mshadow CMake
> > > changes,
> > > > > PyPi package creation tweaks).
> > > > >
> > > > > 1:
> > > > >
> > > > >
> > >
> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > > > >
> > > > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > > > > kellen.sunderland@gmail.com> wrote:
> > > > >
> > > > > > Hey Steffen, I'd like to be able to merge this PR for version
> 1.4:
> > > > > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes
> a
> > > > > > regression in master which causes incorrect feature vectors to be
> > > output
> > > > > > when using the TensorRT feature.  (Thanks to Nathalie for
> helping me
> > > > > track
> > > > > > down the root cause of the issue).   I'm currently blocked on a
> CI
> > > issue
> > > > > I
> > > > > > haven't seen before, but hope to have it resolved by EOW.
> > > > > >
> > > > > > One call-out I would make is that we currently don't support
> Turing
> > > > > > architecture (sm_75).  I've been slowly trying to add support,
> but I
> > > > > don't
> > > > > > think I'd have capacity to do this done by EOW.  Does anyone feel
> > > > > strongly
> > > > > > we need this in the 1.4 release?  From my perspective this will
> > > already
> > > > > be
> > > > > > a strong release without it.
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <
> > > steffenrochel@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Thanks Patrick, lets target to get the PR's merged this week.
> > > > > >>
> > > > > >> Call for contributions from the community: Right now we have 10
> PR
> > > > > >> awaiting
> > > > > >> merge
> > > > > >> <
> > > > > >>
> > > > >
> > >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
> > > > > >> >
> > > > > >> and
> > > > > >> we have 61 open PR awaiting review.
> > > > > >> <
> > > > > >>
> > > > >
> > >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
> > > > > >> >
> > > > > >> I would appreciate if you all can help to review the open PR
> and the
> > > > > >> committers can drive the merge before code freeze for 1.4.0.
> > > > > >>
> > > > > >> The contributors on the Java API are making progress, but not
> all
> > > > > >> performance issues are resolved. With some luck it should be
> > > possible to
> > > > > >> code freeze towards end of this week.
> > > > > >>
> > > > > >> Are there other critical features/bugs/PR you think need to be
> > > included
> > > > > in
> > > > > >> 1.4.0? If so, please communicate as soon as possible.
> > > > > >>
> > > > > >> Regards,
> > > > > >> Steffen
> > > > > >>
> > > > > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <
> patric.zhao@intel.com
> > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Thanks, Steffen. I think there is NO open issue to block the
> > > MKLDNN to
> > > > > >> GA
> > > > > >> > now.
> > > > > >> >
> > > > > >> > BTW, several quantization related PRs (#13297,#13260) are
> under
> > > the
> > > > > >> review
> > > > > >> > and I think it can be merged in this week.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> >
> > > > > >> > --Patric
> > > > > >> >
> > > > > >> >
> > > > > >> > > -----Original Message-----
> > > > > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> > > > > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
> > > > > >> > > To: dev@mxnet.incubator.apache.org
> > > > > >> > > Subject: Re: [Announce] Upcoming Apache MXNet (incubating)
> 1.4.0
> > > > > >> release
> > > > > >> > >
> > > > > >> > > On Friday the contributors working on Java API discovered a
> > > > > potential
> > > > > >> > > performance problem with inference using Java API vs.
> Python.
> > > > > >> > Investigation
> > > > > >> > > is ongoing.
> > > > > >> > > As the Java API is one of the main features for the upcoming
> > > > > release,
> > > > > >> I
> > > > > >> > > suggest to post-pone the code freeze towards end of this
> week.
> > > > > >> > >
> > > > > >> > > Please provide feedback and concern about the change in
> dates
> > > for
> > > > > code
> > > > > >> > > freeze and 1.4.0 release. I will provide updates on progress
> > > > > resolving
> > > > > >> > the
> > > > > >> > > potential performance problem.
> > > > > >> > >
> > > > > >> > > Patrick - do you think it is possible to resolve the
> remaining
> > > > > issues
> > > > > >> on
> > > > > >> > MKL-
> > > > > >> > > DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
> > > > > >> > >
> > > > > >> > > Regards,
> > > > > >> > > Steffen
> > > > > >> > >
> > > > > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <
> > > mechernov@gmail.com>
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > I'd like to remind everyone that 'code freeze' would mean
> > > cutting
> > > > > a
> > > > > >> > > > v1.4.x release branch and all following fixes would need
> to be
> > > > > >> > backported.
> > > > > >> > > > Development on master can be continued as usual.
> > > > > >> > > >
> > > > > >> > > > Best
> > > > > >> > > > Anton
> > > > > >> > > >
> > > > > > >> > > > On Wed, Nov 14, 2018 at 6:04 AM Steffen Rochel <
> > > > > > >> steffenrochel@gmail.com> wrote:
> > > > > >> > > >
> > > > > >> > > > > Dear MXNet community,
> > > > > >> > > > > the agreed plan was to establish code freeze for 1.4.0
> > > release
> > > > > >> > > > > today. As the 1.3.1 patch release is still ongoing I
> > > suggest to
> > > > > >> > > > > post-pone the code freeze to Friday 16th November 2018.
> > > > > >> > > > >
> > > > > >> > > > > Sergey Kolychev has agreed to act as co-release manager
> for
> > > all
> > > > > >> > > > > tasks
> > > > > >> > > > which
> > > > > >> > > > > require committer privileges. If anybody is interested
> to
> > > > > >> volunteer
> > > > > >> > > > > as release manager - now is the time to speak up.
> Otherwise
> > > I
> > > > > will
> > > > > >> > > > > manage
> > > > > >> > > > the
> > > > > >> > > > > release.
> > > > > >> > > > >
> > > > > >> > > > > Regards,
> > > > > >> > > > > Steffen
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > >
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Pedro Larroy <pe...@gmail.com>.
Chris.  The problem is with setenv, not with getenv. We don't want to
remove any getenv call, just these misplaced setenvs:


https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L61

Please check the code above carefully and give us your feedback. Based
on your email I think we don't yet have a common understanding of the
root cause of this issue.
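
To make the distinction concrete, here is a minimal sketch (in Python, and
not the code in initialize.cc) of keeping the reads while dropping the
writes: settings are read from the environment once, before any worker
thread exists, and the workers only ever see a read-only snapshot. The
variable name DEMO_WORKER_NTHREADS and the helpers snapshot_env and
run_workers are made up for illustration.

```python
import os
import threading
from types import MappingProxyType


def snapshot_env(names, environ=os.environ):
    """Copy the selected variables once, before any worker thread starts."""
    found = {name: environ[name] for name in names if name in environ}
    return MappingProxyType(found)  # read-only view; item assignment raises TypeError


def worker(config, results, index):
    # Workers read only the snapshot; nothing here writes to the environment.
    results[index] = int(config.get("DEMO_WORKER_NTHREADS", "1"))


def run_workers(config, count=4):
    """Start `count` threads that all read the same immutable snapshot."""
    results = [None] * count
    threads = [
        threading.Thread(target=worker, args=(config, results, i))
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    cfg = snapshot_env(["DEMO_WORKER_NTHREADS"])
    print(run_workers(cfg))
```

With this shape, a value that a child process needs can be passed as an
argument instead of being written back into the environment while other
threads may be reading it.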

Pedro.
On Thu, Nov 29, 2018 at 4:02 PM Chris Olivier <cj...@gmail.com> wrote:
>
> - getenv should be thread safe as long as nothing is calling putenv/setenv
> in another thread (the environment doesn’t change) as stated here:
>
> http://www.cplusplus.com/reference/cstdlib/getenv/
>
> it’s a simple library call, so to be sure either way, one can check the
> actual source and see (in case some particular implementation is acting in
> a particularly thread-unsafe manner). This should be vetted before making
> any high-impact decisions such as trying to go remove every getenv call in
> the whole system.
>
> - locking after fork is possibly due to libgomp not supporting forking such
> that after a fork, a call is made to release the blocked omp threads and
> the main thread waits for the omp threads to finish, but the omp threads
> belong to the pre-forked process and thus never execute, causing that
> forked process to freeze.  This behavior has been witnessed before.
>
>
>
>
> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pe...@gmail.com>
> wrote:
>
> > Hi all.
> >
> > There are two important issues / fixes that should go in the next
> > release in my radar:
> >
> > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > There is a bug in shape inference on CPU when not using MKL, also we
> > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > I'm finishing a fix for these issues in the above PR.
> >
> > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > We are seeing crashes due to unsafe setenv in multithreaded code.
> > Setenv / getenv from multiple threads is not safe and is causing
> > segfaults. This piece of code (the handlers in pthread_atfork) already
> > caused a very difficult to diagnose hang in a previous release, where
> > a fork inside cudnn would deadlock the engine.
> >
> > I would remove setenv from 2) as a mitigation, but we would need to
> > check for regressions as we could be creating additional threads
> > inside the engine.
> >
> > I would suggest that we address these two major issues before the next
> > release.
> >
> > Pedro
> >
> >
> >
> > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <st...@gmail.com>
> > wrote:
> > >
> > > Dear MXNet community,
> > >
> > > I will be the release manager for the upcoming Apache MXNet 1.4.0
> > release.
> > > Sergey Kolychev will be co-managing the release and providing help from
> > the
> > > committers side.
> > > A release candidate will be cut on November 29, 2018 and voting will
> > start
> > > December 7, 2018. Release notes have been drafted here [1]. If you have
> > any
> > > additional features in progress and would like to include it in this
> > > release, please assure they have been merged by November 27, 2018.
> > Release
> > > schedule is available here [2].
> > >
> > > Feel free to add any other comments/suggestions. Please help to review
> > and
> > > merge outstanding PR's and resolve issues impacting the quality of the
> > > 1.4.0 release.
> > >
> > > Regards,
> > >
> > > Steffen
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > >
> > > [2]
> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > >
> > >
> > >
> > >
> > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > > kellen.sunderland@gmail.com> wrote:
> > >
> > > > Spoke too soon[1], looks like others have been adding Turing support as
> > > > well (thanks to those helping with this).  I believe there's still a
> > few
> > > > changes we'd have to make to claim support though (mshadow CMake
> > changes,
> > > > PyPi package creation tweaks).
> > > >
> > > > 1:
> > > >
> > > >
> > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > > >
> > > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > > > kellen.sunderland@gmail.com> wrote:
> > > >
> > > > > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
> > > > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
> > > > > regression in master which causes incorrect feature vectors to be
> > output
> > > > > when using the TensorRT feature.  (Thanks to Nathalie for helping me
> > > > track
> > > > > down the root cause of the issue).   I'm currently blocked on a CI
> > issue
> > > > I
> > > > > haven't seen before, but hope to have it resolved by EOW.
> > > > >
> > > > > One call-out I would make is that we currently don't support Turing
> > > > > architecture (sm_75).  I've been slowly trying to add support, but I
> > > > don't
> > > > > think I'd have capacity to do this done by EOW.  Does anyone feel
> > > > strongly
> > > > > we need this in the 1.4 release?  From my perspective this will
> > already
> > > > be
> > > > > a strong release without it.
> > > > >
> > > > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <
> > steffenrochel@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Thanks Patrick, lets target to get the PR's merged this week.
> > > > >>
> > > > >> Call for contributions from the community: Right now we have 10 PR
> > > > >> awaiting
> > > > >> merge
> > > > >> <
> > > > >>
> > > >
> > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
> > > > >> >
> > > > >> and
> > > > >> we have 61 open PR awaiting review.
> > > > >> <
> > > > >>
> > > >
> > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
> > > > >> >
> > > > >> I would appreciate if you all can help to review the open PR and the
> > > > >> committers can drive the merge before code freeze for 1.4.0.
> > > > >>
> > > > >> The contributors on the Java API are making progress, but not all
> > > > >> performance issues are resolved. With some luck it should be
> > possible to
> > > > >> code freeze towards end of this week.
> > > > >>
> > > > >> Are there other critical features/bugs/PR you think need to be
> > included
> > > > in
> > > > >> 1.4.0? If so, please communicate as soon as possible.
> > > > >>
> > > > >> Regards,
> > > > >> Steffen
> > > > >>
> > > > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <patric.zhao@intel.com
> > >
> > > > >> wrote:
> > > > >>
> > > > >> > Thanks, Steffen. I think there is NO open issue to block the
> > MKLDNN to
> > > > >> GA
> > > > >> > now.
> > > > >> >
> > > > >> > BTW, several quantization related PRs (#13297,#13260) are under
> > the
> > > > >> review
> > > > >> > and I think it can be merged in this week.
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > --Patric
> > > > >> >
> > > > >> >
> > > > >> > > -----Original Message-----
> > > > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> > > > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
> > > > >> > > To: dev@mxnet.incubator.apache.org
> > > > >> > > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0
> > > > >> release
> > > > >> > >
> > > > >> > > On Friday the contributors working on Java API discovered a
> > > > potential
> > > > >> > > performance problem with inference using Java API vs. Python.
> > > > >> > Investigation
> > > > >> > > is ongoing.
> > > > >> > > As the Java API is one of the main features for the upcoming
> > > > release,
> > > > >> I
> > > > >> > > suggest to post-pone the code freeze towards end of this week.
> > > > >> > >
> > > > >> > > Please provide feedback and concern about the change in dates
> > for
> > > > code
> > > > >> > > freeze and 1.4.0 release. I will provide updates on progress
> > > > resolving
> > > > >> > the
> > > > >> > > potential performance problem.
> > > > >> > >
> > > > >> > > Patrick - do you think it is possible to resolve the remaining
> > > > issues
> > > > >> on
> > > > >> > MKL-
> > > > >> > > DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
> > > > >> > >
> > > > >> > > Regards,
> > > > >> > > Steffen
> > > > >> > >
> > > > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <
> > mechernov@gmail.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > I'd like to remind everyone that 'code freeze' would mean
> > cutting
> > > > a
> > > > >> > > > v1.4.x release branch and all following fixes would need to be
> > > > >> > backported.
> > > > >> > > > Development on master can be continued as usual.
> > > > >> > > >
> > > > >> > > > Best
> > > > >> > > > Anton
> > > > >> > > >
> > > > > >> > > > On Wed, Nov 14, 2018 at 6:04 AM Steffen Rochel <
> > > > > >> steffenrochel@gmail.com> wrote:
> > > > >> > > >
> > > > >> > > > > Dear MXNet community,
> > > > >> > > > > the agreed plan was to establish code freeze for 1.4.0
> > release
> > > > >> > > > > today. As the 1.3.1 patch release is still ongoing I
> > suggest to
> > > > >> > > > > post-pone the code freeze to Friday 16th November 2018.
> > > > >> > > > >
> > > > >> > > > > Sergey Kolychev has agreed to act as co-release manager for
> > all
> > > > >> > > > > tasks
> > > > >> > > > which
> > > > >> > > > > require committer privileges. If anybody is interested to
> > > > >> volunteer
> > > > >> > > > > as release manager - now is the time to speak up. Otherwise
> > I
> > > > will
> > > > >> > > > > manage
> > > > >> > > > the
> > > > >> > > > > release.
> > > > >> > > > >
> > > > >> > > > > Regards,
> > > > >> > > > > Steffen
> > > > >> > > > >
> > > > >> > > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> >

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Chris Olivier <cj...@gmail.com>.
- getenv should be thread safe as long as nothing is calling putenv/setenv
in another thread (the environment doesn’t change) as stated here:

http://www.cplusplus.com/reference/cstdlib/getenv/

It’s a simple library call, so to be sure either way, one can check the
actual source (in case some particular implementation is acting in
a particularly thread-unsafe manner). This should be vetted before making
any high-impact decisions such as trying to remove every getenv call in
the whole system.

- The hang after fork is possibly due to libgomp not supporting forking:
after a fork, a call is made to release the blocked omp threads and
the main thread waits for the omp threads to finish, but the omp threads
belong to the pre-fork process and thus never execute, causing the
forked process to freeze.  This behavior has been witnessed before.
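
The second point rests on a general property of fork: only the thread
that calls fork() exists in the child. A minimal sketch of that property
follows (Python, POSIX only; it uses neither libgomp nor MXNet, and the
function name is hypothetical). Recent Python versions may emit a warning
when forking from a multithreaded process.

```python
import os
import threading


def thread_counts_across_fork():
    """Return (threads alive in parent, threads alive in child) around a fork."""
    stop = threading.Event()
    helper = threading.Thread(target=stop.wait)  # stands in for a pool worker
    helper.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the helper thread was not duplicated by fork().
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)

    # Parent: collect the child's answer, then clean up.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)

    stop.set()
    helper.join()
    return parent_count, child_count


if __name__ == "__main__":
    print(thread_counts_across_fork())
```

Any code in the child that waits for the helper thread to make progress
would block forever, which matches the shape of the hang described above.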




On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pe...@gmail.com>
wrote:

> Hi all.
>
> There are two important issues / fixes that should go in the next
> release in my radar:
>
> 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> There is a bug in shape inference on CPU when not using MKL, also we
> are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> I'm finishing a fix for these issues in the above PR.
>
> 2) https://github.com/apache/incubator-mxnet/issues/13438
> We are seeing crashes due to unsafe setenv in multithreaded code.
> Setenv / getenv from multiple threads is not safe and is causing
> segfaults. This piece of code (the handlers in pthread_atfork) already
> caused a very difficult to diagnose hang in a previous release, where
> a fork inside cudnn would deadlock the engine.
>
> I would remove setenv from 2) as a mitigation, but we would need to
> check for regressions as we could be creating additional threads
> inside the engine.
>
> I would suggest that we address these two major issues before the next
> release.
>
> Pedro
>
>
>
> On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <st...@gmail.com>
> wrote:
> >
> > Dear MXNet community,
> >
> > I will be the release manager for the upcoming Apache MXNet 1.4.0
> release.
> > Sergey Kolychev will be co-managing the release and providing help from
> the
> > committers side.
> > A release candidate will be cut on November 29, 2018 and voting will
> start
> > December 7, 2018. Release notes have been drafted here [1]. If you have
> any
> > additional features in progress and would like to include it in this
> > release, please assure they have been merged by November 27, 2018.
> Release
> > schedule is available here [2].
> >
> > Feel free to add any other comments/suggestions. Please help to review
> and
> > merge outstanding PR's and resolve issues impacting the quality of the
> > 1.4.0 release.
> >
> > Regards,
> >
> > Steffen
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> >
> > [2]
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> >
> >
> >
> >
> > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > kellen.sunderland@gmail.com> wrote:
> >
> > > Spoke too soon[1], looks like others have been adding Turing support as
> > > well (thanks to those helping with this).  I believe there's still a
> few
> > > changes we'd have to make to claim support though (mshadow CMake
> changes,
> > > PyPi package creation tweaks).
> > >
> > > 1:
> > >
> > >
> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > >
> > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > > kellen.sunderland@gmail.com> wrote:
> > >
> > > > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
> > > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
> > > > regression in master which causes incorrect feature vectors to be
> output
> > > > when using the TensorRT feature.  (Thanks to Nathalie for helping me
> > > track
> > > > down the root cause of the issue).   I'm currently blocked on a CI
> issue
> > > I
> > > > haven't seen before, but hope to have it resolved by EOW.
> > > >
> > > > One call-out I would make is that we currently don't support Turing
> > > > architecture (sm_75).  I've been slowly trying to add support, but I
> > > don't
> > > > think I'd have capacity to do this done by EOW.  Does anyone feel
> > > strongly
> > > > we need this in the 1.4 release?  From my perspective this will
> already
> > > be
> > > > a strong release without it.
> > > >
> > > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <
> steffenrochel@gmail.com>
> > > > wrote:
> > > >
> > > >> Thanks Patrick, lets target to get the PR's merged this week.
> > > >>
> > > >> Call for contributions from the community: Right now we have 10 PR
> > > >> awaiting
> > > >> merge
> > > >> <
> > > >>
> > >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
> > > >> >
> > > >> and
> > > >> we have 61 open PR awaiting review.
> > > >> <
> > > >>
> > >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
> > > >> >
> > > >> I would appreciate if you all can help to review the open PR and the
> > > >> committers can drive the merge before code freeze for 1.4.0.
> > > >>
> > > >> The contributors on the Java API are making progress, but not all
> > > >> performance issues are resolved. With some luck it should be
> possible to
> > > >> code freeze towards end of this week.
> > > >>
> > > >> Are there other critical features/bugs/PR you think need to be
> included
> > > in
> > > >> 1.4.0? If so, please communicate as soon as possible.
> > > >>
> > > >> Regards,
> > > >> Steffen
> > > >>
> > > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <patric.zhao@intel.com
> >
> > > >> wrote:
> > > >>
> > > >> > Thanks, Steffen. I think there is NO open issue to block the
> MKLDNN to
> > > >> GA
> > > >> > now.
> > > >> >
> > > >> > BTW, several quantization related PRs (#13297,#13260) are under
> the
> > > >> review
> > > >> > and I think it can be merged in this week.
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > --Patric
> > > >> >
> > > >> >
> > > >> > > -----Original Message-----
> > > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> > > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
> > > >> > > To: dev@mxnet.incubator.apache.org
> > > >> > > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0
> > > >> release
> > > >> > >
> > > >> > > On Friday the contributors working on Java API discovered a
> > > potential
> > > >> > > performance problem with inference using Java API vs. Python.
> > > >> > Investigation
> > > >> > > is ongoing.
> > > >> > > As the Java API is one of the main features for the upcoming
> > > release,
> > > >> I
> > > >> > > suggest to post-pone the code freeze towards end of this week.
> > > >> > >
> > > >> > > Please provide feedback and concern about the change in dates
> for
> > > code
> > > >> > > freeze and 1.4.0 release. I will provide updates on progress
> > > resolving
> > > >> > the
> > > >> > > potential performance problem.
> > > >> > >
> > > >> > > Patrick - do you think it is possible to resolve the remaining
> > > issues
> > > >> on
> > > >> > MKL-
> > > >> > > DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
> > > >> > >
> > > >> > > Regards,
> > > >> > > Steffen
> > > >> > >
> > > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <
> mechernov@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > I'd like to remind everyone that 'code freeze' would mean
> cutting
> > > a
> > > >> > > > v1.4.x release branch and all following fixes would need to be
> > > >> > backported.
> > > >> > > > Development on master can be continued as usual.
> > > >> > > >
> > > >> > > > Best
> > > >> > > > Anton
> > > >> > > >
> > > > >> > > > On Wed, Nov 14, 2018 at 6:04 AM Steffen Rochel <
> > > > >> steffenrochel@gmail.com> wrote:
> > > >> > > >
> > > >> > > > > Dear MXNet community,
> > > >> > > > > the agreed plan was to establish code freeze for 1.4.0
> release
> > > >> > > > > today. As the 1.3.1 patch release is still ongoing I
> suggest to
> > > >> > > > > post-pone the code freeze to Friday 16th November 2018.
> > > >> > > > >
> > > >> > > > > Sergey Kolychev has agreed to act as co-release manager for
> all
> > > >> > > > > tasks
> > > >> > > > which
> > > >> > > > > require committer privileges. If anybody is interested to
> > > >> volunteer
> > > >> > > > > as release manager - now is the time to speak up. Otherwise
> I
> > > will
> > > >> > > > > manage
> > > >> > > > the
> > > >> > > > > release.
> > > >> > > > >
> > > >> > > > > Regards,
> > > >> > > > > Steffen
> > > >> > > > >
> > > >> > > >
> > > >> >
> > > >>
> > > >
> > >
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Steffen Rochel <st...@gmail.com>.
Hi Jake - please go ahead and open a PR against the v1.4.x branch.
Steffen

On Thu, Dec 6, 2018 at 3:21 PM Jake Lee <gs...@gmail.com> wrote:

> Hi Steffen,
>
> It would be nice to add this PR to 1.4 release.
> https://github.com/apache/incubator-mxnet/pull/13550
>
> It fixes the ImageDetIter issue in MXNet 1.3:
> https://github.com/apache/incubator-mxnet/issues/13037.
>
> Thanks,
>
> Jake Lee
>
> On Thu, Nov 29, 2018 at 7:27 PM Lin Yuan <ap...@gmail.com> wrote:
>
> > Hi Steffen,
> >
> > Can we add the following PR to 1.4.0 release:
> >
> > https://github.com/apache/incubator-mxnet/pull/13452
> >
> > It's just a Python API returning header path so it should not cause any
> > regression issues. But it is required for Horovod to integrate MXNet.
> It's
> > better to have this in a minor release than patch release.
> >
> > Thanks,
> >
> > Lin
> >
> > On Thu, Nov 29, 2018 at 6:46 PM Steffen Rochel <st...@gmail.com>
> > wrote:
> >
> > > Hi Zhi - thanks for the improvement, which we should consider for
> 1.4.0.
> > > However, I don't see any tests with the PR and think it is too risky to
> > add
> > > changes without tests. I will add your PR to the tracking list, but
> would
> > > like to ask you to add functional tests before completing the PR to
> > master
> > > and v1.4.x branch.
> > >
> > > Steffen
> > >
> > > On Thu, Nov 29, 2018 at 5:01 PM Joshua Z. Zhang <ch...@gmail.com>
> > > wrote:
> > >
> > > > > Hi, I would like to bring a critical performance and stability patch
> > > > > for the existing Gluon DataLoader into 1.4.0:
> > > > > https://github.com/apache/incubator-mxnet/pull/13447
> > > >
> > > > This PR is finished, waiting for CI to pass.
> > > >
> > > > Steffen, could you help me add that to the tracked list?
> > > >
> > > > Best,
> > > > Zhi
> > > >
> > > > > On Nov 29, 2018, at 4:25 PM, Naveen Swamy <mn...@gmail.com>
> > wrote:
> > > > >
> > > > > the tests are randomly failing in different stages
> > > > >
> > > >
> > >
> >
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13105/
> > > > > This PR has failed 8 times so far
> > > > >
> > > > > On Thu, Nov 29, 2018 at 3:43 PM Steffen Rochel <
> > > steffenrochel@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Pedro - ok. Please add PR to v1.4.x branch after merge to master
> and
> > > > please
> > > > >> update tracking page
> > > > >> <
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> > > > >>>
> > > > >> .
> > > > >> Steffen
> > > > >>
> > > > >> On Thu, Nov 29, 2018 at 3:00 PM Pedro Larroy <
> > > > pedro.larroy.lists@gmail.com
> > > > >>>
> > > > >> wrote:
> > > > >>
> > > > >>> PR is ready from my side and passes the tests, unless somebody
> > raises
> > > > >>> any concerns it's good to go.
> > > > >>> On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <
> > > > steffenrochel@gmail.com>
> > > > >>> wrote:
> > > > >>>>
> > > > >>>> Pedro - added  to 1.4.0 tracking list
> > > > >>>> <
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> > > > >>>>
> > > > >>>>
> > > > > >>>> Do you already have an ETA?
> > > > >>>> Steffen
> > > > >>>>
> > > > >>>> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <
> > > > >>> pedro.larroy.lists@gmail.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> Hi all.
> > > > >>>>>
> > > > >>>>> There are two important issues / fixes that should go in the
> next
> > > > >>>>> release in my radar:
> > > > >>>>>
> > > > >>>>> 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > > > >>>>> There is a bug in shape inference on CPU when not using MKL,
> also
> > > we
> > > > >>>>> are running activation on CPU via MKL when we compile
> > CUDNN+MKLDNN.
> > > > >>>>> I'm finishing a fix for these issues in the above PR.
> > > > >>>>>
> > > > >>>>> 2) https://github.com/apache/incubator-mxnet/issues/13438
> > > > >>>>> We are seeing crashes due to unsafe setenv in multithreaded
> code.
> > > > >>>>> Setenv / getenv from multiple threads is not safe and is
> causing
> > > > >>>>> segfaults. This piece of code (the handlers in pthread_atfork)
> > > > >> already
> > > > >>>>> caused a very difficult to diagnose hang in a previous release,
> > > where
> > > > >>>>> a fork inside cudnn would deadlock the engine.
> > > > >>>>>
> > > > >>>>> I would remove setenv from 2) as a mitigation, but we would
> need
> > to
> > > > >>>>> check for regressions as we could be creating additional
> threads
> > > > >>>>> inside the engine.
> > > > >>>>>
> > > > >>>>> I would suggest that we address these two major issues before
> the
> > > > >> next
> > > > >>>>> release.
> > > > >>>>>
> > > > >>>>> Pedro
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <
> > > > >>> steffenrochel@gmail.com>
> > > > >>>>> wrote:
> > > > >>>>>>
> > > > >>>>>> Dear MXNet community,
> > > > >>>>>>
> > > > >>>>>> I will be the release manager for the upcoming Apache MXNet
> > 1.4.0
> > > > >>>>> release.
> > > > >>>>>> Sergey Kolychev will be co-managing the release and providing
> > help
> > > > >>> from
> > > > >>>>> the
> > > > >>>>>> committers side.
> > > > >>>>>> A release candidate will be cut on November 29, 2018 and
> voting
> > > > >> will
> > > > >>>>> start
> > > > >>>>>> December 7, 2018. Release notes have been drafted here [1]. If
> > you
> > > > >>> have
> > > > >>>>> any
> > > > >>>>>> additional features in progress and would like to include it
> in
> > > > >> this
> > > > >>>>>> release, please assure they have been merged by November 27,
> > 2018.
> > > > >>>>> Release
> > > > >>>>>> schedule is available here [2].
> > > > >>>>>>
> > > > >>>>>> Feel free to add any other comments/suggestions. Please help
> to
> > > > >>> review
> > > > >>>>> and
> > > > >>>>>> merge outstanding PR's and resolve issues impacting the
> quality
> > of
> > > > >>> the
> > > > >>>>>> 1.4.0 release.
> > > > >>>>>>
> > > > >>>>>> Regards,
> > > > >>>>>>
> > > > >>>>>> Steffen
> > > > >>>>>>
> > > > >>>>>> [1]
> > > > >>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > > > >>>>>>
> > > > >>>>>> [2]
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > > > >>>>>> kellen.sunderland@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>>> Spoke too soon[1], looks like others have been adding Turing
> > > > >>> support as
> > > > >>>>>>> well (thanks to those helping with this).  I believe there's
> > > > >> still
> > > > >>> a
> > > > >>>>> few
> > > > >>>>>>> changes we'd have to make to claim support though (mshadow
> > CMake
> > > > >>>>> changes,
> > > > >>>>>>> PyPi package creation tweaks).
> > > > >>>>>>>
> > > > >>>>>>> 1:
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > > > >>>>>>>
> > > > >>>>>>> On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > > > >>>>>>> kellen.sunderland@gmail.com> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hey Steffen, I'd like to be able to merge this PR for
> version
> > > > >>> 1.4:
> > > > >>>>>>>> https://github.com/apache/incubator-mxnet/pull/13310 . It
> > > > >> fixes
> > > > >>> a
> > > > >>>>>>>> regression in master which causes incorrect feature vectors
> to
> > > > >> be
> > > > >>>>> output
> > > > >>>>>>>> when using the TensorRT feature.  (Thanks to Nathalie for
> > > > >>> helping me
> > > > >>>>>>> track
> > > > >>>>>>>> down the root cause of the issue).   I'm currently blocked
> on
> > a
> > > > >>> CI
> > > > >>>>> issue
> > > > >>>>>>> I
> > > > >>>>>>>> haven't seen before, but hope to have it resolved by EOW.
> > > > >>>>>>>>
> > > > >>>>>>>> One call-out I would make is that we currently don't support
> > > > >>> Turing
> > > > >>>>>>>> architecture (sm_75).  I've been slowly trying to add
> support,
> > > > >>> but I
> > > > >>>>>>> don't
> > > > >>>>>>>> think I'd have capacity to do this done by EOW.  Does anyone
> > > > >> feel
> > > > >>>>>>> strongly
> > > > >>>>>>>> we need this in the 1.4 release?  From my perspective this
> > will
> > > > >>>>> already
> > > > >>>>>>> be
> > > > >>>>>>>> a strong release without it.
> > > > >>>>>>>>
> > > > >>>>>>>> On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <
> > > > >>>>> steffenrochel@gmail.com>
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> Thanks Patrick, lets target to get the PR's merged this
> week.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Call for contributions from the community: Right now we
> have
> > > > >> 10
> > > > >>> PR
> > > > >>>>>>>>> awaiting
> > > > >>>>>>>>> merge
> > > > >>>>>>>>> <
> > > > >>>>>>>>>
> > > > >>>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
> > > > >>>>>>>>>>
> > > > >>>>>>>>> and
> > > > >>>>>>>>> we have 61 open PR awaiting review.
> > > > >>>>>>>>> <
> > > > >>>>>>>>>
> > > > >>>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
> > > > >>>>>>>>>>
> > > > >>>>>>>>> I would appreciate if you all can help to review the open
> PR
> > > > >>> and the
> > > > >>>>>>>>> committers can drive the merge before code freeze for
> 1.4.0.
> > > > >>>>>>>>>
> > > > >>>>>>>>> The contributors on the Java API are making progress, but
> not
> > > > >>> all
> > > > >>>>>>>>> performance issues are resolved. With some luck it should
> be
> > > > >>>>> possible to
> > > > >>>>>>>>> code freeze towards end of this week.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Are there other critical features/bugs/PR you think need to
> > be
> > > > >>>>> included
> > > > >>>>>>> in
> > > > >>>>>>>>> 1.4.0? If so, please communicate as soon as possible.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Regards,
> > > > >>>>>>>>> Steffen
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <
> > > > >>> patric.zhao@intel.com
> > > > >>>>>>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Thanks, Steffen. I think there is NO open issue to block
> the
> > > > >>>>> MKLDNN to
> > > > >>>>>>>>> GA
> > > > >>>>>>>>>> now.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> BTW, several quantization related PRs (#13297,#13260) are
> > > > >>> under
> > > > >>>>> the
> > > > >>>>>>>>> review
> > > > >>>>>>>>>> and I think it can be merged in this week.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Thanks,
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> --Patric
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> -----Original Message-----
> > > > >>>>>>>>>>> From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> > > > >>>>>>>>>>> Sent: Tuesday, November 20, 2018 2:57 AM
> > > > >>>>>>>>>>> To: dev@mxnet.incubator.apache.org
> > > > >>>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet
> (incubating)
> > > > >>> 1.4.0
> > > > >>>>>>>>> release
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Friday the contributors working on Java API discovered
> > > > >> a
> > > > >>>>>>> potential
> > > > >>>>>>>>>>> performance problem with inference using Java API vs.
> > > > >>> Python.
> > > > >>>>>>>>>> Investigation
> > > > >>>>>>>>>>> is ongoing.
> > > > >>>>>>>>>>> As the Java API is one of the main features for the
> > > > >> upcoming
> > > > >>>>>>> release,
> > > > >>>>>>>>> I
> > > > >>>>>>>>>>> suggest to post-pone the code freeze towards end of this
> > > > >>> week.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Please provide feedback and concern about the change in
> > > > >>> dates
> > > > >>>>> for
> > > > >>>>>>> code
> > > > >>>>>>>>>>> freeze and 1.4.0 release. I will provide updates on
> > > > >> progress
> > > > >>>>>>> resolving
> > > > >>>>>>>>>> the
> > > > >>>>>>>>>>> potential performance problem.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Patrick - do you think it is possible to resolve the
> > > > >>> remaining
> > > > >>>>>>> issues
> > > > >>>>>>>>> on
> > > > >>>>>>>>>> MKL-
> > > > >>>>>>>>>>> DNN this week, so we can consider GA for MKL-DNN with
> > > > >> 1.4.0?
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Regards,
> > > > >>>>>>>>>>> Steffen
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <
> > > > >>>>> mechernov@gmail.com>
> > > > >>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> I'd like to remind everyone that 'code freeze' would
> > > > >> mean
> > > > >>>>> cutting
> > > > >>>>>>> a
> > > > >>>>>>>>>>>> v1.4.x release branch and all following fixes would need
> > > > >>> to be
> > > > >>>>>>>>>> backported.
> > > > >>>>>>>>>>>> Development on master can be continued as usual.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Best
> > > > >>>>>>>>>>>> Anton
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> ср, 14 нояб. 2018 г. в 6:04, Steffen Rochel <
> > > > >>>>>>>>> steffenrochel@gmail.com>:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Dear MXNet community,
> > > > >>>>>>>>>>>>> the agreed plan was to establish code freeze for 1.4.0
> > > > >>>>> release
> > > > >>>>>>>>>>>>> today. As the 1.3.1 patch release is still ongoing I
> > > > >>>>> suggest to
> > > > >>>>>>>>>>>>> post-pone the code freeze to Friday 16th November
> > > > >> 2018.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Sergey Kolychev has agreed to act as co-release
> > > > >> manager
> > > > >>> for
> > > > >>>>> all
> > > > >>>>>>>>>>>>> tasks
> > > > >>>>>>>>>>>> which
> > > > >>>>>>>>>>>>> require committer privileges. If anybody is interested
> > > > >>> to
> > > > >>>>>>>>> volunteer
> > > > >>>>>>>>>>>>> as release manager - now is the time to speak up.
> > > > >>> Otherwise
> > > > >>>>> I
> > > > >>>>>>> will
> > > > >>>>>>>>>>>>> manage
> > > > >>>>>>>>>>>> the
> > > > >>>>>>>>>>>>> release.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Regards,
> > > > >>>>>>>>>>>>> Steffen
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> >
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Jake Lee <gs...@gmail.com>.
Hi Steffen,

It would be nice to add this PR to the 1.4 release:
https://github.com/apache/incubator-mxnet/pull/13550

It fixes the ImageDetIter issue in MXNet 1.3:
https://github.com/apache/incubator-mxnet/issues/13037

Thanks,

Jake Lee

On Thu, Nov 29, 2018 at 7:27 PM Lin Yuan <ap...@gmail.com> wrote:

> Hi Steffen,
>
> Can we add the following PR to 1.4.0 release:
>
> https://github.com/apache/incubator-mxnet/pull/13452
>
> It's just a Python API returning header path so it should not cause any
> regression issues. But it is required for Horovod to integrate MXNet. It's
> better to have this in a minor release than patch release.
>
> Thanks,
>
> Lin
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Lin Yuan <ap...@gmail.com>.
Hi Steffen,

Can we add the following PR to the 1.4.0 release?

https://github.com/apache/incubator-mxnet/pull/13452

It only adds a Python API that returns the header path, so it should not
cause any regressions. However, it is required for Horovod to integrate with
MXNet, and it is better to ship this in a minor release than in a patch release.

Thanks,

Lin

On Thu, Nov 29, 2018 at 6:46 PM Steffen Rochel <st...@gmail.com>
wrote:

> Hi Zhi - thanks for the improvement, which we should consider for 1.4.0.
> However, I don't see any tests with the PR and think it is too risky to add
> changes without tests. I will add your PR to the tracking list, but would
> like to ask you to add functional tests before completing the PR to master
> and v1.4.x branch.
>
> Steffen
>
> On Thu, Nov 29, 2018 at 5:01 PM Joshua Z. Zhang <ch...@gmail.com>
> wrote:
>
> > Hi, I would like to bring a critical performance and stability patch of
> > existing gluon dataloader to 1.4.0:
> > https://github.com/apache/incubator-mxnet/pull/13447 <
> > https://github.com/apache/incubator-mxnet/pull/13447>.
> >
> > This PR is finished, waiting for CI to pass.
> >
> > Steffen, could you help me add that to the tracked list?
> >
> > Best,
> > Zhi
> >
> > > On Nov 29, 2018, at 4:25 PM, Naveen Swamy <mn...@gmail.com> wrote:
> > >
> > > the tests are randomly failing in different stages
> > >
> >
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13105/
> > > This PR has failed 8 times so far
> > >
> > > On Thu, Nov 29, 2018 at 3:43 PM Steffen Rochel <
> steffenrochel@gmail.com>
> > > wrote:
> > >
> > >> Pedro - ok. Please add PR to v1.4.x branch after merge to master and
> > please
> > >> update tracking page
> > >> <
> > >>
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> > >>>
> > >> .
> > >> Steffen
> > >>
> > >> On Thu, Nov 29, 2018 at 3:00 PM Pedro Larroy <
> > pedro.larroy.lists@gmail.com
> > >>>
> > >> wrote:
> > >>
> > >>> PR is ready from my side and passes the tests, unless somebody raises
> > >>> any concerns it's good to go.
> > >>> On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <
> > steffenrochel@gmail.com>
> > >>> wrote:
> > >>>>
> > >>>> Pedro - added  to 1.4.0 tracking list
> > >>>> <
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> > >>>>
> > >>>>
> > >>>> Do you have already ETA?
> > >>>> Steffen
> > >>>>
> > >>>> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <
> > >>> pedro.larroy.lists@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi all.
> > >>>>>
> > >>>>> There are two important issues / fixes that should go in the next
> > >>>>> release in my radar:
> > >>>>>
> > >>>>> 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > >>>>> There is a bug in shape inference on CPU when not using MKL, also
> we
> > >>>>> are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > >>>>> I'm finishing a fix for these issues in the above PR.
> > >>>>>
> > >>>>> 2) https://github.com/apache/incubator-mxnet/issues/13438
> > >>>>> We are seeing crashes due to unsafe setenv in multithreaded code.
> > >>>>> Setenv / getenv from multiple threads is not safe and is causing
> > >>>>> segfaults. This piece of code (the handlers in pthread_atfork)
> > >> already
> > >>>>> caused a very difficult to diagnose hang in a previous release,
> where
> > >>>>> a fork inside cudnn would deadlock the engine.
> > >>>>>
> > >>>>> I would remove setenv from 2) as a mitigation, but we would need to
> > >>>>> check for regressions as we could be creating additional threads
> > >>>>> inside the engine.
> > >>>>>
> > >>>>> I would suggest that we address these two major issues before the
> > >> next
> > >>>>> release.
> > >>>>>
> > >>>>> Pedro
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <
> > >>> steffenrochel@gmail.com>
> > >>>>> wrote:
> > >>>>>>
> > >>>>>> Dear MXNet community,
> > >>>>>>
> > >>>>>> I will be the release manager for the upcoming Apache MXNet 1.4.0
> > >>>>> release.
> > >>>>>> Sergey Kolychev will be co-managing the release and providing help
> > >>> from
> > >>>>> the
> > >>>>>> committers side.
> > >>>>>> A release candidate will be cut on November 29, 2018 and voting
> > >> will
> > >>>>> start
> > >>>>>> December 7, 2018. Release notes have been drafted here [1]. If you
> > >>> have
> > >>>>> any
> > >>>>>> additional features in progress and would like to include it in
> > >> this
> > >>>>>> release, please assure they have been merged by November 27, 2018.
> > >>>>> Release
> > >>>>>> schedule is available here [2].
> > >>>>>>
> > >>>>>> Feel free to add any other comments/suggestions. Please help to
> > >>> review
> > >>>>> and
> > >>>>>> merge outstanding PR's and resolve issues impacting the quality of
> > >>> the
> > >>>>>> 1.4.0 release.
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>>
> > >>>>>> Steffen
> > >>>>>>
> > >>>>>> [1]
> > >>>>>>
> > >>>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > >>>>>>
> > >>>>>> [2]
> > >>>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > >>>>>> kellen.sunderland@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> Spoke too soon[1], looks like others have been adding Turing
> > >>> support as
> > >>>>>>> well (thanks to those helping with this).  I believe there's
> > >> still
> > >>> a
> > >>>>> few
> > >>>>>>> changes we'd have to make to claim support though (mshadow CMake
> > >>>>> changes,
> > >>>>>>> PyPi package creation tweaks).
> > >>>>>>>
> > >>>>>>> 1:
> > >>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>
> > >>
> >
> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > >>>>>>>
> > >>>>>>> On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > >>>>>>> kellen.sunderland@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> Hey Steffen, I'd like to be able to merge this PR for version
> > >>> 1.4:
> > >>>>>>>> https://github.com/apache/incubator-mxnet/pull/13310 . It
> > >> fixes
> > >>> a
> > >>>>>>>> regression in master which causes incorrect feature vectors to
> > >> be
> > >>>>> output
> > >>>>>>>> when using the TensorRT feature.  (Thanks to Nathalie for
> > >>> helping me
> > >>>>>>> track
> > >>>>>>>> down the root cause of the issue).   I'm currently blocked on a
> > >>> CI
> > >>>>> issue
> > >>>>>>> I
> > >>>>>>>> haven't seen before, but hope to have it resolved by EOW.
> > >>>>>>>>
> > >>>>>>>> One call-out I would make is that we currently don't support
> > >>> Turing
> > >>>>>>>> architecture (sm_75).  I've been slowly trying to add support,
> > >>> but I
> > >>>>>>> don't
> > >>>>>>>> think I'd have capacity to do this done by EOW.  Does anyone
> > >> feel
> > >>>>>>> strongly
> > >>>>>>>> we need this in the 1.4 release?  From my perspective this will
> > >>>>> already
> > >>>>>>> be
> > >>>>>>>> a strong release without it.
> > >>>>>>>>
> > >>>>>>>> On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <steffenrochel@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> Thanks Patrick, let's target getting the PRs merged this week.
> > >>>>>>>>>
> > >>>>>>>>> Call for contributions from the community: Right now we have 10 PRs
> > >>>>>>>>> awaiting merge
> > >>>>>>>>> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
> > >>>>>>>>> and we have 61 open PRs awaiting review
> > >>>>>>>>> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>.
> > >>>>>>>>> I would appreciate it if you all could help review the open PRs and if
> > >>>>>>>>> the committers could drive the merges before code freeze for 1.4.0.
> > >>>>>>>>>
> > >>>>>>>>> The contributors on the Java API are making progress, but not all
> > >>>>>>>>> performance issues are resolved. With some luck it should be possible to
> > >>>>>>>>> code freeze towards end of this week.
> > >>>>>>>>>
> > >>>>>>>>> Are there other critical features/bugs/PRs you think need to be included in
> > >>>>>>>>> 1.4.0? If so, please communicate as soon as possible.
> > >>>>>>>>>
> > >>>>>>>>> Regards,
> > >>>>>>>>> Steffen
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <patric.zhao@intel.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Thanks, Steffen. I think there is NO open issue blocking MKLDNN from
> > >>>>>>>>>> going GA now.
> > >>>>>>>>>>
> > >>>>>>>>>> BTW, several quantization-related PRs (#13297, #13260) are under
> > >>>>>>>>>> review and I think they can be merged this week.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>>
> > >>>>>>>>>> --Patric
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>> From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> > >>>>>>>>>>> Sent: Tuesday, November 20, 2018 2:57 AM
> > >>>>>>>>>>> To: dev@mxnet.incubator.apache.org
> > >>>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Friday the contributors working on Java API discovered a potential
> > >>>>>>>>>>> performance problem with inference using Java API vs. Python.
> > >>>>>>>>>>> Investigation is ongoing.
> > >>>>>>>>>>> As the Java API is one of the main features for the upcoming release, I
> > >>>>>>>>>>> suggest postponing the code freeze towards end of this week.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Please provide feedback and concerns about the change in dates for code
> > >>>>>>>>>>> freeze and 1.4.0 release. I will provide updates on progress resolving
> > >>>>>>>>>>> the potential performance problem.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Patrick - do you think it is possible to resolve the remaining issues on
> > >>>>>>>>>>> MKL-DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Regards,
> > >>>>>>>>>>> Steffen
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <mechernov@gmail.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I'd like to remind everyone that 'code freeze' would mean cutting a
> > >>>>>>>>>>>> v1.4.x release branch and all following fixes would need to be
> > >>>>>>>>>>>> backported.
> > >>>>>>>>>>>> Development on master can be continued as usual.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best
> > >>>>>>>>>>>> Anton
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Nov 14, 2018 at 6:04, Steffen Rochel <steffenrochel@gmail.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Dear MXNet community,
> > >>>>>>>>>>>>> the agreed plan was to establish code freeze for 1.4.0 release
> > >>>>>>>>>>>>> today. As the 1.3.1 patch release is still ongoing I suggest to
> > >>>>>>>>>>>>> post-pone the code freeze to Friday 16th November 2018.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Sergey Kolychev has agreed to act as co-release manager for all
> > >>>>>>>>>>>>> tasks which require committer privileges. If anybody is interested to
> > >>>>>>>>>>>>> volunteer as release manager - now is the time to speak up. Otherwise I
> > >>>>>>>>>>>>> will manage the release.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>> Steffen
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>
> > >>
> >
> >
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Steffen Rochel <st...@gmail.com>.
Hi Zhi - thanks for the improvement, which we should consider for 1.4.0.
However, I don't see any tests with the PR and think it is too risky to add
changes without tests. I will add your PR to the tracking list, but would
like to ask you to add functional tests before the PR is merged to master
and the v1.4.x branch.

Steffen

On Thu, Nov 29, 2018 at 5:01 PM Joshua Z. Zhang <ch...@gmail.com>
wrote:

> Hi, I would like to bring a critical performance and stability patch of
> existing gluon dataloader to 1.4.0:
> https://github.com/apache/incubator-mxnet/pull/13447
>
> This PR is finished, waiting for CI to pass.
>
> Steffen, could you help me add that to the tracked list?
>
> Best,
> Zhi
>
> > On Nov 29, 2018, at 4:25 PM, Naveen Swamy <mn...@gmail.com> wrote:
> >
> > the tests are randomly failing in different stages
> > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13105/
> > This PR has failed 8 times so far
> >
> > On Thu, Nov 29, 2018 at 3:43 PM Steffen Rochel <st...@gmail.com>
> > wrote:
> >
> >> Pedro - ok. Please add PR to v1.4.x branch after merge to master and please
> >> update tracking page
> >> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>.
> >> Steffen
> >>
> >> On Thu, Nov 29, 2018 at 3:00 PM Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
> >>
> >>> PR is ready from my side and passes the tests, unless somebody raises
> >>> any concerns it's good to go.
> >>>
> >>> On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <steffenrochel@gmail.com> wrote:
> >>>>
> >>>> Pedro - added to 1.4.0 tracking list
> >>>> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>
> >>>>
> >>>> Do you already have an ETA?
> >>>> Steffen
> >>>>
> >>>> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
> >>>>
> >>>>> Hi all.
> >>>>>
> >>>>> There are two important issues / fixes on my radar that should go in
> >>>>> the next release:
> >>>>>
> >>>>> 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> >>>>> There is a bug in shape inference on CPU when not using MKL, also we
> >>>>> are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> >>>>> I'm finishing a fix for these issues in the above PR.
> >>>>>
> >>>>> 2) https://github.com/apache/incubator-mxnet/issues/13438
> >>>>> We are seeing crashes due to unsafe setenv in multithreaded code.
> >>>>> Setenv / getenv from multiple threads is not safe and is causing
> >>>>> segfaults. This piece of code (the handlers in pthread_atfork) already
> >>>>> caused a very difficult to diagnose hang in a previous release, where
> >>>>> a fork inside cudnn would deadlock the engine.
> >>>>>
> >>>>> I would remove setenv from 2) as a mitigation, but we would need to
> >>>>> check for regressions as we could be creating additional threads
> >>>>> inside the engine.
> >>>>>
> >>>>> I would suggest that we address these two major issues before the next
> >>>>> release.
> >>>>>
> >>>>> Pedro
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <steffenrochel@gmail.com> wrote:
> >>>>>>
> >>>>>> Dear MXNet community,
> >>>>>>
> >>>>>> I will be the release manager for the upcoming Apache MXNet 1.4.0 release.
> >>>>>> Sergey Kolychev will be co-managing the release and providing help from the
> >>>>>> committers side.
> >>>>>> A release candidate will be cut on November 29, 2018 and voting will start
> >>>>>> December 7, 2018. Release notes have been drafted here [1]. If you have any
> >>>>>> additional features in progress and would like to include them in this
> >>>>>> release, please ensure they have been merged by November 27, 2018. Release
> >>>>>> schedule is available here [2].
> >>>>>>
> >>>>>> Feel free to add any other comments/suggestions. Please help to review and
> >>>>>> merge outstanding PRs and resolve issues impacting the quality of the
> >>>>>> 1.4.0 release.
> >>>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Steffen
> >>>>>>
> >>>>>> [1] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> >>>>>>
> >>>>>> [2] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> >>>>>>
> >>>>>> On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> >>>>>> kellen.sunderland@gmail.com> wrote:
> >>>>>>
> >>>>>>> Spoke too soon[1], looks like others have been adding Turing support as
> >>>>>>> well (thanks to those helping with this). I believe there's still a few
> >>>>>>> changes we'd have to make to claim support though (mshadow CMake changes,
> >>>>>>> PyPi package creation tweaks).
> >>>>>>>
> >>>>>>> 1: https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> >>>>>>>
>
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Marco de Abreu <ma...@googlemail.com.INVALID>.
Hi everyone,

would you mind prepending [1.4.x] to the titles of your PRs so we can see
cherry-picks at a glance? That would let me better classify the load we
have on our CI (release branches put a higher load on CI than master due to
cache mismatches).

Best regards,
Marco


Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Marco de Abreu <ma...@googlemail.com.INVALID>.
Hi Naveen,

yeah sorry, that's DockerHub acting up again (unfortunately this happens
every now and then). Basically, docker pull starts multiple download threads,
and it seems that sometimes a single web server request sits in the queue
forever, which then slows down the docker pull (for the cache retrieval).

Chance will be assisting with CI issues this week and I explained my proposed
solution to him: basically, wrap the 'docker pull' in a timeout combined with
a retry with backoff. Anton proposed that, if the retry still fails after a
few attempts, we fall back to the local cache and cache regeneration to avoid
failing the job. That would solve the problem you're encountering. We would
basically wrap [1] in the timeout-retry mechanism.

Best regards,
Marco

[1]:
https://github.com/apache/incubator-mxnet/blob/master/ci/docker_cache.py#L107
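
For illustration only, the timeout-plus-retry idea described above could look
roughly like the Python sketch below. It is not the code in
ci/docker_cache.py: the function name pull_with_retry, its parameters and the
default values are assumptions made for this example, and the fallback to the
local cache is left to the caller.

```python
import subprocess
import time


def pull_with_retry(image, attempts=3, timeout_s=600, base_delay_s=5.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run `docker pull <image>` with a per-attempt timeout and backoff.

    Hypothetical helper: returns True as soon as one pull succeeds and
    False if every attempt fails or times out. `run` and `sleep` are
    injectable so the retry logic can be exercised without Docker.
    """
    for attempt in range(attempts):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if the pull hangs longer than timeout_s seconds.
            run(["docker", "pull", image], check=True, timeout=timeout_s)
            return True
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            if attempt < attempts - 1:
                # Exponential backoff: base, 2*base, 4*base, ...
                sleep(base_delay_s * 2 ** attempt)
    return False
```

A caller would treat a False return value as the signal to fall back to the
local cache and regenerate it, along the lines Anton suggested, instead of
failing the job.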

On Fri, Nov 30, 2018 at 2:01 AM Joshua Z. Zhang <ch...@gmail.com>
wrote:

> Hi, I would like to bring a critical performance and stability patch of
> existing gluon dataloader to 1.4.0:
> https://github.com/apache/incubator-mxnet/pull/13447
>
> This PR is finished, waiting for CI to pass.
>
> Steffen, could you help me add that to the tracked list?
>
> Best,
> Zhi
>
> > On Nov 29, 2018, at 4:25 PM, Naveen Swamy <mn...@gmail.com> wrote:
> >
> > the tests are randomly failing in different stages
> > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13105/
> > This PR has failed 8 times so far
> >
> > On Thu, Nov 29, 2018 at 3:43 PM Steffen Rochel <st...@gmail.com>
> > wrote:
> >
> >> Pedro - ok. Please add PR to v1.4.x branch after merge to master and please
> >> update tracking page
> >> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>.
> >> Steffen
> >>
> >> On Thu, Nov 29, 2018 at 3:00 PM Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
> >>
> >>> PR is ready from my side and passes the tests, unless somebody raises
> >>> any concerns it's good to go.
> >>>
> >>> On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <steffenrochel@gmail.com> wrote:
> >>>>
> >>>> Pedro - added to 1.4.0 tracking list
> >>>> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>
> >>>>
> >>>> Do you already have an ETA?
> >>>> Steffen
> >>>>
> >>>> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
> >>>>
> >>>>> Hi all.
> >>>>>
> >>>>> There are two important issues / fixes on my radar that should go in
> >>>>> the next release:
> >>>>>
> >>>>> 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> >>>>> There is a bug in shape inference on CPU when not using MKL, also we
> >>>>> are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> >>>>> I'm finishing a fix for these issues in the above PR.
> >>>>>
> >>>>> 2) https://github.com/apache/incubator-mxnet/issues/13438
> >>>>> We are seeing crashes due to unsafe setenv in multithreaded code.
> >>>>> Setenv / getenv from multiple threads is not safe and is causing
> >>>>> segfaults. This piece of code (the handlers in pthread_atfork) already
> >>>>> caused a very difficult to diagnose hang in a previous release, where
> >>>>> a fork inside cudnn would deadlock the engine.
> >>>>>
> >>>>> I would remove setenv from 2) as a mitigation, but we would need to
> >>>>> check for regressions as we could be creating additional threads
> >>>>> inside the engine.
> >>>>>
> >>>>> I would suggest that we address these two major issues before the next
> >>>>> release.
> >>>>>
> >>>>> Pedro
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <
> >>> steffenrochel@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Dear MXNet community,
> >>>>>>
> >>>>>> I will be the release manager for the upcoming Apache MXNet 1.4.0
> >>>>> release.
> >>>>>> Sergey Kolychev will be co-managing the release and providing help
> >>> from
> >>>>> the
> >>>>>> committers side.
> >>>>>> A release candidate will be cut on November 29, 2018 and voting
> >> will
> >>>>> start
> >>>>>> December 7, 2018. Release notes have been drafted here [1]. If you
> >>> have
> >>>>> any
> >>>>>> additional features in progress and would like to include it in
> >> this
> >>>>>> release, please assure they have been merged by November 27, 2018.
> >>>>> Release
> >>>>>> schedule is available here [2].
> >>>>>>
> >>>>>> Feel free to add any other comments/suggestions. Please help to
> >>> review
> >>>>> and
> >>>>>> merge outstanding PR's and resolve issues impacting the quality of
> >>> the
> >>>>>> 1.4.0 release.
> >>>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Steffen
> >>>>>>
> >>>>>> [1]
> >>>>>>
> >>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> >>>>>>
> >>>>>> [2]
> >>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> >>>>>> kellen.sunderland@gmail.com> wrote:
> >>>>>>
> >>>>>>> Spoke too soon[1], looks like others have been adding Turing
> >>> support as
> >>>>>>> well (thanks to those helping with this).  I believe there's
> >> still
> >>> a
> >>>>> few
> >>>>>>> changes we'd have to make to claim support though (mshadow CMake
> >>>>> changes,
> >>>>>>> PyPi package creation tweaks).
> >>>>>>>
> >>>>>>> 1:
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> >>>>>>>
> >>>>>>> On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> >>>>>>> kellen.sunderland@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hey Steffen, I'd like to be able to merge this PR for version
> >>> 1.4:
> >>>>>>>> https://github.com/apache/incubator-mxnet/pull/13310 . It
> >> fixes
> >>> a
> >>>>>>>> regression in master which causes incorrect feature vectors to
> >> be
> >>>>> output
> >>>>>>>> when using the TensorRT feature.  (Thanks to Nathalie for
> >>> helping me
> >>>>>>> track
> >>>>>>>> down the root cause of the issue).   I'm currently blocked on a
> >>> CI
> >>>>> issue
> >>>>>>> I
> >>>>>>>> haven't seen before, but hope to have it resolved by EOW.
> >>>>>>>>
> >>>>>>>> One call-out I would make is that we currently don't support
> >>> Turing
> >>>>>>>> architecture (sm_75).  I've been slowly trying to add support,
> >>> but I
> >>>>>>> don't
> >>>>>>>> think I'd have capacity to do this done by EOW.  Does anyone
> >> feel
> >>>>>>> strongly
> >>>>>>>> we need this in the 1.4 release?  From my perspective this will
> >>>>> already
> >>>>>>> be
> >>>>>>>> a strong release without it.
> >>>>>>>>
> >>>>>>>> On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <
> >>>>> steffenrochel@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Thanks Patrick, lets target to get the PR's merged this week.
> >>>>>>>>>
> >>>>>>>>> Call for contributions from the community: Right now we have
> >> 10
> >>> PR
> >>>>>>>>> awaiting
> >>>>>>>>> merge
> >>>>>>>>> <
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
> >>>>>>>>>>
> >>>>>>>>> and
> >>>>>>>>> we have 61 open PR awaiting review.
> >>>>>>>>> <
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
> >>>>>>>>>>
> >>>>>>>>> I would appreciate if you all can help to review the open PR
> >>> and the
> >>>>>>>>> committers can drive the merge before code freeze for 1.4.0.
> >>>>>>>>>
> >>>>>>>>> The contributors on the Java API are making progress, but not
> >>> all
> >>>>>>>>> performance issues are resolved. With some luck it should be
> >>>>> possible to
> >>>>>>>>> code freeze towards end of this week.
> >>>>>>>>>
> >>>>>>>>> Are there other critical features/bugs/PR you think need to be
> >>>>> included
> >>>>>>> in
> >>>>>>>>> 1.4.0? If so, please communicate as soon as possible.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Steffen
> >>>>>>>>>
> >>>>>>>>> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <
> >>> patric.zhao@intel.com
> >>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks, Steffen. I think there is NO open issue to block the
> >>>>> MKLDNN to
> >>>>>>>>> GA
> >>>>>>>>>> now.
> >>>>>>>>>>
> >>>>>>>>>> BTW, several quantization related PRs (#13297,#13260) are
> >>> under
> >>>>> the
> >>>>>>>>> review
> >>>>>>>>>> and I think it can be merged in this week.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>>
> >>>>>>>>>> --Patric
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> >>>>>>>>>>> Sent: Tuesday, November 20, 2018 2:57 AM
> >>>>>>>>>>> To: dev@mxnet.incubator.apache.org
> >>>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating)
> >>> 1.4.0
> >>>>>>>>> release
> >>>>>>>>>>>
> >>>>>>>>>>> On Friday the contributors working on Java API discovered
> >> a
> >>>>>>> potential
> >>>>>>>>>>> performance problem with inference using Java API vs.
> >>> Python.
> >>>>>>>>>> Investigation
> >>>>>>>>>>> is ongoing.
> >>>>>>>>>>> As the Java API is one of the main features for the
> >> upcoming
> >>>>>>> release,
> >>>>>>>>> I
> >>>>>>>>>>> suggest to post-pone the code freeze towards end of this
> >>> week.
> >>>>>>>>>>>
> >>>>>>>>>>> Please provide feedback and concern about the change in
> >>> dates
> >>>>> for
> >>>>>>> code
> >>>>>>>>>>> freeze and 1.4.0 release. I will provide updates on
> >> progress
> >>>>>>> resolving
> >>>>>>>>>> the
> >>>>>>>>>>> potential performance problem.
> >>>>>>>>>>>
> >>>>>>>>>>> Patrick - do you think it is possible to resolve the
> >>> remaining
> >>>>>>> issues
> >>>>>>>>> on
> >>>>>>>>>> MKL-
> >>>>>>>>>>> DNN this week, so we can consider GA for MKL-DNN with
> >> 1.4.0?
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Steffen
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <
> >>>>> mechernov@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I'd like to remind everyone that 'code freeze' would
> >> mean
> >>>>> cutting
> >>>>>>> a
> >>>>>>>>>>>> v1.4.x release branch and all following fixes would need
> >>> to be
> >>>>>>>>>> backported.
> >>>>>>>>>>>> Development on master can be continued as usual.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best
> >>>>>>>>>>>> Anton
> >>>>>>>>>>>>
> >>>>>>>>>>>> ср, 14 нояб. 2018 г. в 6:04, Steffen Rochel <
> >>>>>>>>> steffenrochel@gmail.com>:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Dear MXNet community,
> >>>>>>>>>>>>> the agreed plan was to establish code freeze for 1.4.0
> >>>>> release
> >>>>>>>>>>>>> today. As the 1.3.1 patch release is still ongoing I
> >>>>> suggest to
> >>>>>>>>>>>>> post-pone the code freeze to Friday 16th November
> >> 2018.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sergey Kolychev has agreed to act as co-release
> >> manager
> >>> for
> >>>>> all
> >>>>>>>>>>>>> tasks
> >>>>>>>>>>>> which
> >>>>>>>>>>>>> require committer privileges. If anybody is interested
> >>> to
> >>>>>>>>> volunteer
> >>>>>>>>>>>>> as release manager - now is the time to speak up.
> >>> Otherwise
> >>>>> I
> >>>>>>> will
> >>>>>>>>>>>>> manage
> >>>>>>>>>>>> the
> >>>>>>>>>>>>> release.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Steffen
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
>
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by "Joshua Z. Zhang" <ch...@gmail.com>.
Hi, I would like to bring a critical performance and stability patch for the existing Gluon DataLoader into 1.4.0: https://github.com/apache/incubator-mxnet/pull/13447

The PR is finished and is waiting for CI to pass.

Steffen, could you help me add that to the tracked list?

Best,
Zhi



Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Naveen Swamy <mn...@gmail.com>.
The tests are failing randomly in different stages:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13105/
This PR has failed 8 times so far.


Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Steffen Rochel <st...@gmail.com>.
Pedro - ok. Please add the PR to the v1.4.x branch after it is merged to
master, and please update the tracking page
<https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>.
Steffen
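
For readers unfamiliar with the backport step requested above, a minimal sketch follows. It assumes the fix landed on master as a single commit, that the remote is named `origin`, and that the backport is proposed from a topic branch. The helper name `backport_commands`, the topic-branch naming scheme, and the placeholder commit hash are illustrative only and are not taken from the thread or from MXNet's tooling.

```python
import shlex
from typing import List


def backport_commands(commit_sha: str,
                      release_branch: str = "v1.4.x",
                      remote: str = "origin") -> List[List[str]]:
    """Return the git commands for backporting one merged commit.

    Assumption: the change was merged to master as a single commit.
    """
    # Hypothetical naming scheme for the topic branch.
    topic_branch = f"backport-{commit_sha[:8]}-{release_branch}"
    return [
        ["git", "fetch", remote],
        # Start the topic branch from the tip of the release branch.
        ["git", "checkout", "-b", topic_branch, f"{remote}/{release_branch}"],
        # -x records the original commit hash in the new commit message.
        ["git", "cherry-pick", "-x", commit_sha],
        ["git", "push", remote, topic_branch],
    ]


if __name__ == "__main__":
    # Placeholder hash; substitute the real commit from master.
    for cmd in backport_commands("0123456789abcdef"):
        print(shlex.join(cmd))
```

The printed commands can be run by hand; the resulting pull request would then be opened against `v1.4.x` rather than master.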

On Thu, Nov 29, 2018 at 3:00 PM Pedro Larroy <pe...@gmail.com>
wrote:

> PR is ready from my side and passes the tests, unless somebody raises
> any concerns it's good to go.
> On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <st...@gmail.com>
> wrote:
> >
> > Pedro - added to 1.4.0 tracking list
> > <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>
> >
> > Do you already have an ETA?
> > Steffen
> >
> > On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
> > wrote:
> >
> > > Hi all.
> > >
> > > There are two important issues / fixes on my radar that should go
> > > into the next release:
> > >
> > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > > There is a bug in shape inference on CPU when not using MKL, also we
> > > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > > I'm finishing a fix for these issues in the above PR.
> > >
> > > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > > We are seeing crashes due to unsafe setenv in multithreaded code.
> > > Setenv / getenv from multiple threads is not safe and is causing
> > > segfaults. This piece of code (the handlers in pthread_atfork) already
> > > caused a very difficult to diagnose hang in a previous release, where
> > > a fork inside cudnn would deadlock the engine.
> > >
> > > I would remove setenv from 2) as a mitigation, but we would need to
> > > check for regressions as we could be creating additional threads
> > > inside the engine.
> > >
> > > I would suggest that we address these two major issues before the next
> > > release.
> > >
> > > Pedro
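
The hazard described in item 2 lives in the C library: `setenv` may reallocate the environment while another thread is inside `getenv`. A general way to avoid it, sketched here in Python purely as an illustration of the pattern, is to resolve environment-derived settings once before any worker thread starts and to hand the workers a read-only snapshot. This is not the fix proposed in the linked issue; the variable name `EXAMPLE_NUM_THREADS` and the function names are hypothetical.

```python
import os
import threading
from types import MappingProxyType
from typing import List, Mapping


def snapshot_settings(names: List[str]) -> Mapping[str, str]:
    """Read the named variables once, before any worker thread exists."""
    values = {name: os.environ[name] for name in names if name in os.environ}
    # Read-only view: workers cannot mutate the shared snapshot.
    return MappingProxyType(values)


def worker(settings: Mapping[str, str], results: list, index: int) -> None:
    # Workers consult the snapshot and never touch the process environment.
    results[index] = settings.get("EXAMPLE_NUM_THREADS", "1")


def run(num_workers: int = 4) -> list:
    settings = snapshot_settings(["EXAMPLE_NUM_THREADS"])
    results = [None] * num_workers
    threads = [threading.Thread(target=worker, args=(settings, results, i))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run())
```

The design choice is that all environment access happens on one thread at a well-defined point, so no code path needs to call `setenv` or `getenv` concurrently.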
> > >
> > >
> > >
> > > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <steffenrochel@gmail.com>
> > > wrote:
> > > >
> > > > Dear MXNet community,
> > > >
> > > > I will be the release manager for the upcoming Apache MXNet 1.4.0
> > > > release. Sergey Kolychev will be co-managing the release and
> > > > providing help from the committers' side.
> > > > A release candidate will be cut on November 29, 2018 and voting will
> > > > start December 7, 2018. Release notes have been drafted here [1]. If
> > > > you have any additional features in progress and would like to
> > > > include them in this release, please ensure they have been merged by
> > > > November 27, 2018. Release schedule is available here [2].
> > > >
> > > > Feel free to add any other comments/suggestions. Please help to
> > > > review and merge outstanding PRs and resolve issues impacting the
> > > > quality of the 1.4.0 release.
> > > >
> > > > Regards,
> > > >
> > > > Steffen
> > > >
> > > > [1]
> > > >
> > >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > > >
> > > > [2]
> > >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > > > kellen.sunderland@gmail.com> wrote:
> > > >
> > > > > Spoke too soon[1], looks like others have been adding Turing
> support as
> > > > > well (thanks to those helping with this).  I believe there's still
> a
> > > few
> > > > > changes we'd have to make to claim support though (mshadow CMake
> > > changes,
> > > > > PyPi package creation tweaks).
> > > > >
> > > > > 1:
> > > > >
> > > > >
> > >
> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > > > >
> > > > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > > > > kellen.sunderland@gmail.com> wrote:
> > > > >
> > > > > > Hey Steffen, I'd like to be able to merge this PR for version
> 1.4:
> > > > > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes
> a
> > > > > > regression in master which causes incorrect feature vectors to be
> > > output
> > > > > > when using the TensorRT feature.  (Thanks to Nathalie for
> helping me
> > > > > track
> > > > > > down the root cause of the issue).   I'm currently blocked on a
> CI
> > > issue
> > > > > I
> > > > > > haven't seen before, but hope to have it resolved by EOW.
> > > > > >
> > > > > > One call-out I would make is that we currently don't support
> Turing
> > > > > > architecture (sm_75).  I've been slowly trying to add support,
> but I
> > > > > don't
> > > > > > > think I'd have capacity to get this done by EOW.  Does anyone feel
> > > > > strongly
> > > > > > we need this in the 1.4 release?  From my perspective this will
> > > already
> > > > > be
> > > > > > a strong release without it.
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <
> > > steffenrochel@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Thanks Patrick, lets target to get the PR's merged this week.
> > > > > >>
> > > > > >> Call for contributions from the community: Right now we have 10
> PR
> > > > > >> awaiting
> > > > > >> merge
> > > > > >> <
> > > > > >>
> > > > >
> > >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
> > > > > >> >
> > > > > >> and
> > > > > >> we have 61 open PR awaiting review.
> > > > > >> <
> > > > > >>
> > > > >
> > >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
> > > > > >> >
> > > > > >> I would appreciate if you all can help to review the open PR
> and the
> > > > > >> committers can drive the merge before code freeze for 1.4.0.
> > > > > >>
> > > > > >> The contributors on the Java API are making progress, but not
> all
> > > > > >> performance issues are resolved. With some luck it should be
> > > possible to
> > > > > >> code freeze towards end of this week.
> > > > > >>
> > > > > >> Are there other critical features/bugs/PR you think need to be
> > > included
> > > > > in
> > > > > >> 1.4.0? If so, please communicate as soon as possible.
> > > > > >>
> > > > > >> Regards,
> > > > > >> Steffen
> > > > > >>
> > > > > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <
> patric.zhao@intel.com
> > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Thanks, Steffen. I think there is NO open issue to block the
> > > MKLDNN to
> > > > > >> GA
> > > > > >> > now.
> > > > > >> >
> > > > > >> > BTW, several quantization related PRs (#13297,#13260) are
> under
> > > the
> > > > > >> review
> > > > > >> > and I think it can be merged in this week.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> >
> > > > > >> > --Patric
> > > > > >> >
> > > > > >> >
> > > > > >> > > -----Original Message-----
> > > > > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> > > > > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
> > > > > >> > > To: dev@mxnet.incubator.apache.org
> > > > > >> > > Subject: Re: [Announce] Upcoming Apache MXNet (incubating)
> 1.4.0
> > > > > >> release
> > > > > >> > >
> > > > > >> > > On Friday the contributors working on Java API discovered a
> > > > > potential
> > > > > >> > > performance problem with inference using Java API vs.
> Python.
> > > > > >> > Investigation
> > > > > >> > > is ongoing.
> > > > > >> > > As the Java API is one of the main features for the upcoming
> > > > > release,
> > > > > >> I
> > > > > >> > > suggest to post-pone the code freeze towards end of this
> week.
> > > > > >> > >
> > > > > >> > > Please provide feedback and concern about the change in
> dates
> > > for
> > > > > code
> > > > > >> > > freeze and 1.4.0 release. I will provide updates on progress
> > > > > resolving
> > > > > >> > the
> > > > > >> > > potential performance problem.
> > > > > >> > >
> > > > > >> > > Patrick - do you think it is possible to resolve the
> remaining
> > > > > issues
> > > > > >> on
> > > > > >> > MKL-
> > > > > >> > > DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
> > > > > >> > >
> > > > > >> > > Regards,
> > > > > >> > > Steffen
> > > > > >> > >
> > > > > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <
> > > mechernov@gmail.com>
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > I'd like to remind everyone that 'code freeze' would mean
> > > cutting
> > > > > a
> > > > > >> > > > v1.4.x release branch and all following fixes would need
> to be
> > > > > >> > backported.
> > > > > >> > > > Development on master can be continued as usual.
> > > > > >> > > >
> > > > > >> > > > Best
> > > > > >> > > > Anton
> > > > > >> > > >
> > > > > >> > > > On Wed, Nov 14, 2018 at 6:04, Steffen Rochel <
> > > > > >> steffenrochel@gmail.com>:
> > > > > >> > > >
> > > > > >> > > > > Dear MXNet community,
> > > > > >> > > > > the agreed plan was to establish code freeze for 1.4.0
> > > release
> > > > > >> > > > > today. As the 1.3.1 patch release is still ongoing I
> > > suggest to
> > > > > >> > > > > post-pone the code freeze to Friday 16th November 2018.
> > > > > >> > > > >
> > > > > >> > > > > Sergey Kolychev has agreed to act as co-release manager
> for
> > > all
> > > > > >> > > > > tasks
> > > > > >> > > > which
> > > > > >> > > > > require committer privileges. If anybody is interested
> to
> > > > > >> volunteer
> > > > > >> > > > > as release manager - now is the time to speak up.
> Otherwise
> > > I
> > > > > will
> > > > > >> > > > > manage
> > > > > >> > > > the
> > > > > >> > > > > release.
> > > > > >> > > > >
> > > > > >> > > > > Regards,
> > > > > >> > > > > Steffen
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > >
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Steffen Rochel <st...@gmail.com>.
Qing - ok. Please merge it to the v1.4.x branch once it has been merged to master.
Steffen

On Thu, Nov 29, 2018 at 3:17 PM Qing Lan <la...@live.com> wrote:

> Hi all,
> I have a critical bug-fix PR
> https://github.com/apache/incubator-mxnet/pull/13330 that essentially fixes
> the problems with supporting inference with different shapes in Scala/Java
> (introduced in v1.1). I would like to request a cherry-pick of this one into
> 1.4.
>
> Thanks,
> Qing
>
> On 11/29/18, 3:00 PM, "Pedro Larroy" <pe...@gmail.com>
> wrote:
>
>     The PR is ready from my side and passes the tests. Unless somebody raises
>     any concerns, it's good to go.
>     On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <
> steffenrochel@gmail.com> wrote:
>     >
>     > Pedro - added  to 1.4.0 tracking list
>     > <
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> >
>     >
>     > Do you already have an ETA?
>     > Steffen
>     >
>     > On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <
> pedro.larroy.lists@gmail.com>
>     > wrote:
>     >
>     > > Hi all.
>     > >
>     > > There are two important issues / fixes that should go in the next
>     > > release on my radar:
>     > >
>     > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
>     > > There is a bug in shape inference on CPU when not using MKL, also
> we
>     > > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
>     > > I'm finishing a fix for these issues in the above PR.
>     > >
>     > > 2) https://github.com/apache/incubator-mxnet/issues/13438
>     > > We are seeing crashes due to unsafe setenv in multithreaded code.
>     > > Setenv / getenv from multiple threads is not safe and is causing
>     > > segfaults. This piece of code (the handlers in pthread_atfork)
> already
>     > > caused a very difficult to diagnose hang in a previous release,
> where
>     > > a fork inside cudnn would deadlock the engine.
>     > >
>     > > I would remove setenv from 2) as a mitigation, but we would need to
>     > > check for regressions as we could be creating additional threads
>     > > inside the engine.
>     > >
>     > > I would suggest that we address these two major issues before the
> next
>     > > release.
>     > >
>     > > Pedro
>     > >
>     > >
>     > >
>     > > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <
> steffenrochel@gmail.com>
>     > > wrote:
>     > > >
>     > > > Dear MXNet community,
>     > > >
>     > > > I will be the release manager for the upcoming Apache MXNet 1.4.0
>     > > release.
>     > > > Sergey Kolychev will be co-managing the release and providing
> help from
>     > > the
>     > > > committers side.
>     > > > A release candidate will be cut on November 29, 2018 and voting
> will
>     > > start
>     > > > December 7, 2018. Release notes have been drafted here [1]. If
> you have
>     > > any
>     > > > additional features in progress and would like to include it in
> this
>     > > > release, please assure they have been merged by November 27,
> 2018.
>     > > Release
>     > > > schedule is available here [2].
>     > > >
>     > > > Feel free to add any other comments/suggestions. Please help to
> review
>     > > and
>     > > > merge outstanding PR's and resolve issues impacting the quality
> of the
>     > > > 1.4.0 release.
>     > > >
>     > > > Regards,
>     > > >
>     > > > Steffen
>     > > >
>     > > > [1]
>     > > >
>     > >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
>     > > >
>     > > > [2]
>     > >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
>     > > >
>     > > >
>     > > >
>     > > >
>     > > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
>     > > > kellen.sunderland@gmail.com> wrote:
>     > > >
>     > > > > Spoke too soon[1], looks like others have been adding Turing
> support as
>     > > > > well (thanks to those helping with this).  I believe there's
> still a
>     > > few
>     > > > > changes we'd have to make to claim support though (mshadow
> CMake
>     > > changes,
>     > > > > PyPi package creation tweaks).
>     > > > >
>     > > > > 1:
>     > > > >
>     > > > >
>     > >
> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
>     > > > >
>     > > > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
>     > > > > kellen.sunderland@gmail.com> wrote:
>     > > > >
>     > > > > > Hey Steffen, I'd like to be able to merge this PR for
> version 1.4:
>     > > > > > https://github.com/apache/incubator-mxnet/pull/13310 . It
> fixes a
>     > > > > > regression in master which causes incorrect feature vectors
> to be
>     > > output
>     > > > > > when using the TensorRT feature.  (Thanks to Nathalie for
> helping me
>     > > > > track
>     > > > > > down the root cause of the issue).   I'm currently blocked
> on a CI
>     > > issue
>     > > > > I
>     > > > > > haven't seen before, but hope to have it resolved by EOW.
>     > > > > >
>     > > > > > One call-out I would make is that we currently don't support
> Turing
>     > > > > > architecture (sm_75).  I've been slowly trying to add
> support, but I
>     > > > > don't
>     > > > > > think I'd have capacity to get this done by EOW.  Does anyone
> feel
>     > > > > strongly
>     > > > > > we need this in the 1.4 release?  From my perspective this
> will
>     > > already
>     > > > > be
>     > > > > > a strong release without it.
>     > > > > >
>     > > > > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <
>     > > steffenrochel@gmail.com>
>     > > > > > wrote:
>     > > > > >
>     > > > > >> Thanks Patrick, lets target to get the PR's merged this
> week.
>     > > > > >>
>     > > > > >> Call for contributions from the community: Right now we
> have 10 PR
>     > > > > >> awaiting
>     > > > > >> merge
>     > > > > >> <
>     > > > > >>
>     > > > >
>     > >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
>     > > > > >> >
>     > > > > >> and
>     > > > > >> we have 61 open PR awaiting review.
>     > > > > >> <
>     > > > > >>
>     > > > >
>     > >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
>     > > > > >> >
>     > > > > >> I would appreciate if you all can help to review the open
> PR and the
>     > > > > >> committers can drive the merge before code freeze for 1.4.0.
>     > > > > >>
>     > > > > >> The contributors on the Java API are making progress, but
> not all
>     > > > > >> performance issues are resolved. With some luck it should be
>     > > possible to
>     > > > > >> code freeze towards end of this week.
>     > > > > >>
>     > > > > >> Are there other critical features/bugs/PR you think need to
> be
>     > > included
>     > > > > in
>     > > > > >> 1.4.0? If so, please communicate as soon as possible.
>     > > > > >>
>     > > > > >> Regards,
>     > > > > >> Steffen
>     > > > > >>
>     > > > > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <
> patric.zhao@intel.com
>     > > >
>     > > > > >> wrote:
>     > > > > >>
>     > > > > >> > Thanks, Steffen. I think there is NO open issue to block
> the
>     > > MKLDNN to
>     > > > > >> GA
>     > > > > >> > now.
>     > > > > >> >
>     > > > > >> > BTW, several quantization related PRs (#13297,#13260) are
> under
>     > > the
>     > > > > >> review
>     > > > > >> > and I think it can be merged in this week.
>     > > > > >> >
>     > > > > >> > Thanks,
>     > > > > >> >
>     > > > > >> > --Patric
>     > > > > >> >
>     > > > > >> >
>     > > > > >> > > -----Original Message-----
>     > > > > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
>     > > > > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
>     > > > > >> > > To: dev@mxnet.incubator.apache.org
>     > > > > >> > > Subject: Re: [Announce] Upcoming Apache MXNet
> (incubating) 1.4.0
>     > > > > >> release
>     > > > > >> > >
>     > > > > >> > > On Friday the contributors working on Java API
> discovered a
>     > > > > potential
>     > > > > >> > > performance problem with inference using Java API vs.
> Python.
>     > > > > >> > Investigation
>     > > > > >> > > is ongoing.
>     > > > > >> > > As the Java API is one of the main features for the
> upcoming
>     > > > > release,
>     > > > > >> I
>     > > > > >> > > suggest to post-pone the code freeze towards end of
> this week.
>     > > > > >> > >
>     > > > > >> > > Please provide feedback and concern about the change in
> dates
>     > > for
>     > > > > code
>     > > > > >> > > freeze and 1.4.0 release. I will provide updates on
> progress
>     > > > > resolving
>     > > > > >> > the
>     > > > > >> > > potential performance problem.
>     > > > > >> > >
>     > > > > >> > > Patrick - do you think it is possible to resolve the
> remaining
>     > > > > issues
>     > > > > >> on
>     > > > > >> > MKL-
>     > > > > >> > > DNN this week, so we can consider GA for MKL-DNN with
> 1.4.0?
>     > > > > >> > >
>     > > > > >> > > Regards,
>     > > > > >> > > Steffen
>     > > > > >> > >
>     > > > > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <
>     > > mechernov@gmail.com>
>     > > > > >> > > wrote:
>     > > > > >> > >
>     > > > > >> > > > I'd like to remind everyone that 'code freeze' would
> mean
>     > > cutting
>     > > > > a
>     > > > > >> > > > v1.4.x release branch and all following fixes would
> need to be
>     > > > > >> > backported.
>     > > > > >> > > > Development on master can be continued as usual.
>     > > > > >> > > >
>     > > > > >> > > > Best
>     > > > > >> > > > Anton
>     > > > > >> > > >
>     > > > > >> > > > On Wed, Nov 14, 2018 at 6:04, Steffen Rochel <
>     > > > > >> steffenrochel@gmail.com>:
>     > > > > >> > > >
>     > > > > >> > > > > Dear MXNet community,
>     > > > > >> > > > > the agreed plan was to establish code freeze for
> 1.4.0
>     > > release
>     > > > > >> > > > > today. As the 1.3.1 patch release is still ongoing I
>     > > suggest to
>     > > > > >> > > > > post-pone the code freeze to Friday 16th November
> 2018.
>     > > > > >> > > > >
>     > > > > >> > > > > Sergey Kolychev has agreed to act as co-release
> manager for
>     > > all
>     > > > > >> > > > > tasks
>     > > > > >> > > > which
>     > > > > >> > > > > require committer privileges. If anybody is
> interested to
>     > > > > >> volunteer
>     > > > > >> > > > > as release manager - now is the time to speak up.
> Otherwise
>     > > I
>     > > > > will
>     > > > > >> > > > > manage
>     > > > > >> > > > the
>     > > > > >> > > > > release.
>     > > > > >> > > > >
>     > > > > >> > > > > Regards,
>     > > > > >> > > > > Steffen
>     > > > > >> > > > >
>     > > > > >> > > >
>     > > > > >> >
>     > > > > >>
>     > > > > >
>     > > > >
>     > >
>
>
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Qing Lan <la...@live.com>.
Hi all,
I have a critical bug-fix PR https://github.com/apache/incubator-mxnet/pull/13330 that essentially fixes the problems with supporting inference with different shapes in Scala/Java (introduced in v1.1). I would like to request a cherry-pick of this one into 1.4.

Thanks,
Qing

On 11/29/18, 3:00 PM, "Pedro Larroy" <pe...@gmail.com> wrote:

    The PR is ready from my side and passes the tests. Unless somebody raises
    any concerns, it's good to go.
    On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <st...@gmail.com> wrote:
    >
    > Pedro - added  to 1.4.0 tracking list
    > <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>
    >
    > Do you already have an ETA?
    > Steffen
    >
    > On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pe...@gmail.com>
    > wrote:
    >
    > > Hi all.
    > >
    > > There are two important issues / fixes that should go in the next
    > > release on my radar:
    > >
    > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
    > > There is a bug in shape inference on CPU when not using MKL, also we
    > > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
    > > I'm finishing a fix for these issues in the above PR.
    > >
    > > 2) https://github.com/apache/incubator-mxnet/issues/13438
    > > We are seeing crashes due to unsafe setenv in multithreaded code.
    > > Setenv / getenv from multiple threads is not safe and is causing
    > > segfaults. This piece of code (the handlers in pthread_atfork) already
    > > caused a very difficult to diagnose hang in a previous release, where
    > > a fork inside cudnn would deadlock the engine.
    > >
    > > I would remove setenv from 2) as a mitigation, but we would need to
    > > check for regressions as we could be creating additional threads
    > > inside the engine.
    > >
    > > I would suggest that we address these two major issues before the next
    > > release.
    > >
    > > Pedro
    > >
    > >
    > >
    > > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <st...@gmail.com>
    > > wrote:
    > > >
    > > > Dear MXNet community,
    > > >
    > > > I will be the release manager for the upcoming Apache MXNet 1.4.0
    > > release.
    > > > Sergey Kolychev will be co-managing the release and providing help from
    > > the
    > > > committers side.
    > > > A release candidate will be cut on November 29, 2018 and voting will
    > > start
    > > > December 7, 2018. Release notes have been drafted here [1]. If you have
    > > any
    > > > additional features in progress and would like to include it in this
    > > > release, please assure they have been merged by November 27, 2018.
    > > Release
    > > > schedule is available here [2].
    > > >
    > > > Feel free to add any other comments/suggestions. Please help to review
    > > and
    > > > merge outstanding PR's and resolve issues impacting the quality of the
    > > > 1.4.0 release.
    > > >
    > > > Regards,
    > > >
    > > > Steffen
    > > >
    > > > [1]
    > > >
    > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
    > > >
    > > > [2]
    > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
    > > >
    > > >
    > > >
    > > >
    > > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
    > > > kellen.sunderland@gmail.com> wrote:
    > > >
    > > > > Spoke too soon[1], looks like others have been adding Turing support as
    > > > > well (thanks to those helping with this).  I believe there's still a
    > > few
    > > > > changes we'd have to make to claim support though (mshadow CMake
    > > changes,
    > > > > PyPi package creation tweaks).
    > > > >
    > > > > 1:
    > > > >
    > > > >
    > > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
    > > > >
    > > > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
    > > > > kellen.sunderland@gmail.com> wrote:
    > > > >
    > > > > > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
    > > > > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
    > > > > > regression in master which causes incorrect feature vectors to be
    > > output
    > > > > > when using the TensorRT feature.  (Thanks to Nathalie for helping me
    > > > > track
    > > > > > down the root cause of the issue).   I'm currently blocked on a CI
    > > issue
    > > > > I
    > > > > > haven't seen before, but hope to have it resolved by EOW.
    > > > > >
    > > > > > One call-out I would make is that we currently don't support Turing
    > > > > > architecture (sm_75).  I've been slowly trying to add support, but I
    > > > > don't
    > > > > > think I'd have capacity to get this done by EOW.  Does anyone feel
    > > > > strongly
    > > > > > we need this in the 1.4 release?  From my perspective this will
    > > already
    > > > > be
    > > > > > a strong release without it.
    > > > > >
    > > > > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <
    > > steffenrochel@gmail.com>
    > > > > > wrote:
    > > > > >
    > > > > >> Thanks Patrick, lets target to get the PR's merged this week.
    > > > > >>
    > > > > >> Call for contributions from the community: Right now we have 10 PR
    > > > > >> awaiting
    > > > > >> merge
    > > > > >> <
    > > > > >>
    > > > >
    > > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
    > > > > >> >
    > > > > >> and
    > > > > >> we have 61 open PR awaiting review.
    > > > > >> <
    > > > > >>
    > > > >
    > > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
    > > > > >> >
    > > > > >> I would appreciate if you all can help to review the open PR and the
    > > > > >> committers can drive the merge before code freeze for 1.4.0.
    > > > > >>
    > > > > >> The contributors on the Java API are making progress, but not all
    > > > > >> performance issues are resolved. With some luck it should be
    > > possible to
    > > > > >> code freeze towards end of this week.
    > > > > >>
    > > > > >> Are there other critical features/bugs/PR you think need to be
    > > included
    > > > > in
    > > > > >> 1.4.0? If so, please communicate as soon as possible.
    > > > > >>
    > > > > >> Regards,
    > > > > >> Steffen
    > > > > >>
    > > > > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <patric.zhao@intel.com
    > > >
    > > > > >> wrote:
    > > > > >>
    > > > > >> > Thanks, Steffen. I think there is NO open issue to block the
    > > MKLDNN to
    > > > > >> GA
    > > > > >> > now.
    > > > > >> >
    > > > > >> > BTW, several quantization related PRs (#13297,#13260) are under
    > > the
    > > > > >> review
    > > > > >> > and I think it can be merged in this week.
    > > > > >> >
    > > > > >> > Thanks,
    > > > > >> >
    > > > > >> > --Patric
    > > > > >> >
    > > > > >> >
    > > > > >> > > -----Original Message-----
    > > > > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
    > > > > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
    > > > > >> > > To: dev@mxnet.incubator.apache.org
    > > > > >> > > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0
    > > > > >> release
    > > > > >> > >
    > > > > >> > > On Friday the contributors working on Java API discovered a
    > > > > potential
    > > > > >> > > performance problem with inference using Java API vs. Python.
    > > > > >> > Investigation
    > > > > >> > > is ongoing.
    > > > > >> > > As the Java API is one of the main features for the upcoming
    > > > > release,
    > > > > >> I
    > > > > >> > > suggest to post-pone the code freeze towards end of this week.
    > > > > >> > >
    > > > > >> > > Please provide feedback and concern about the change in dates
    > > for
    > > > > code
    > > > > >> > > freeze and 1.4.0 release. I will provide updates on progress
    > > > > resolving
    > > > > >> > the
    > > > > >> > > potential performance problem.
    > > > > >> > >
    > > > > >> > > Patrick - do you think it is possible to resolve the remaining
    > > > > issues
    > > > > >> on
    > > > > >> > MKL-
    > > > > >> > > DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
    > > > > >> > >
    > > > > >> > > Regards,
    > > > > >> > > Steffen
    > > > > >> > >
    > > > > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <
    > > mechernov@gmail.com>
    > > > > >> > > wrote:
    > > > > >> > >
    > > > > >> > > > I'd like to remind everyone that 'code freeze' would mean
    > > cutting
    > > > > a
    > > > > >> > > > v1.4.x release branch and all following fixes would need to be
    > > > > >> > backported.
    > > > > >> > > > Development on master can be continued as usual.
    > > > > >> > > >
    > > > > >> > > > Best
    > > > > >> > > > Anton
    > > > > >> > > >
    > > > > >> > > > On Wed, Nov 14, 2018 at 6:04, Steffen Rochel <
    > > > > >> steffenrochel@gmail.com>:
    > > > > >> > > >
    > > > > >> > > > > Dear MXNet community,
    > > > > >> > > > > the agreed plan was to establish code freeze for 1.4.0
    > > release
    > > > > >> > > > > today. As the 1.3.1 patch release is still ongoing I
    > > suggest to
    > > > > >> > > > > post-pone the code freeze to Friday 16th November 2018.
    > > > > >> > > > >
    > > > > >> > > > > Sergey Kolychev has agreed to act as co-release manager for
    > > all
    > > > > >> > > > > tasks
    > > > > >> > > > which
    > > > > >> > > > > require committer privileges. If anybody is interested to
    > > > > >> volunteer
    > > > > >> > > > > as release manager - now is the time to speak up. Otherwise
    > > I
    > > > > will
    > > > > >> > > > > manage
    > > > > >> > > > the
    > > > > >> > > > > release.
    > > > > >> > > > >
    > > > > >> > > > > Regards,
    > > > > >> > > > > Steffen
    > > > > >> > > > >
    > > > > >> > > >
    > > > > >> >
    > > > > >>
    > > > > >
    > > > >
    > >
    


Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Pedro Larroy <pe...@gmail.com>.
The PR is ready from my side and passes the tests. Unless somebody raises
any concerns, it's good to go.
On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <st...@gmail.com> wrote:
>
> Pedro - added  to 1.4.0 tracking list
> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>
>
> Do you already have an ETA?
> Steffen
>
> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pe...@gmail.com>
> wrote:
>
> > Hi all.
> >
> > There are two important issues / fixes that should go in the next
> > release on my radar:
> >
> > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > There is a bug in shape inference on CPU when not using MKL, also we
> > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > I'm finishing a fix for these issues in the above PR.
> >
> > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > We are seeing crashes due to unsafe setenv in multithreaded code.
> > Setenv / getenv from multiple threads is not safe and is causing
> > segfaults. This piece of code (the handlers in pthread_atfork) already
> > caused a very difficult to diagnose hang in a previous release, where
> > a fork inside cudnn would deadlock the engine.
> >
> > I would remove setenv from 2) as a mitigation, but we would need to
> > check for regressions as we could be creating additional threads
> > inside the engine.
> >
> > I would suggest that we address these two major issues before the next
> > release.
> >
> > Pedro

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Steffen Rochel <st...@gmail.com>.
Pedro - added to the 1.4.0 tracking list
<https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>

Do you already have an ETA?
Steffen

On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pe...@gmail.com>
wrote:

> Hi all.
>
> There are two important issues / fixes on my radar that should go in the
> next release:
>
> 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> There is a bug in shape inference on CPU when not using MKL. Also, we
> are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> I'm finishing a fix for these issues in the above PR.
>
> 2) https://github.com/apache/incubator-mxnet/issues/13438
> We are seeing crashes due to unsafe setenv in multithreaded code.
> Setenv / getenv from multiple threads is not safe and is causing
> segfaults. This piece of code (the handlers in pthread_atfork) already
> caused a very difficult to diagnose hang in a previous release, where
> a fork inside cudnn would deadlock the engine.
>
> I would remove setenv from 2) as a mitigation, but we would need to
> check for regressions as we could be creating additional threads
> inside the engine.
>
> I would suggest that we address these two major issues before the next
> release.
>
> Pedro

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Pedro Larroy <pe...@gmail.com>.
Hi all.

There are two important issues / fixes on my radar that should go in the
next release:

1) https://github.com/apache/incubator-mxnet/pull/13409/files
There is a bug in shape inference on CPU when not using MKL. Also, we
are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
I'm finishing a fix for these issues in the above PR.

2) https://github.com/apache/incubator-mxnet/issues/13438
We are seeing crashes due to unsafe setenv in multithreaded code.
Calling setenv / getenv from multiple threads is not safe and is causing
segfaults. This piece of code (the handlers in pthread_atfork) already
caused a very difficult-to-diagnose hang in a previous release, where
a fork inside cudnn would deadlock the engine.

I would remove setenv from 2) as a mitigation, but we would need to
check for regressions as we could be creating additional threads
inside the engine.
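
For anyone less familiar with the problem in 2): a common way out is to
read the environment once, before any worker threads exist, and hand the
threads an immutable copy instead of letting them call getenv / setenv
themselves. Below is a minimal sketch of that pattern, written in Python
only for brevity (the engine code in question is C++). It is not the fix
from the issue above; EXAMPLE_NUM_THREADS, snapshot_env and run_workers
are hypothetical names made up for illustration.

```python
import os
import threading
from types import MappingProxyType


def snapshot_env(names, defaults):
    """Read the named environment variables once, on the calling thread.

    Returns a read-only mapping, so later code cannot mutate it by accident.
    """
    return MappingProxyType(
        {name: os.environ.get(name, defaults[name]) for name in names}
    )


def worker(config, results, index):
    # Workers only read the immutable snapshot; they never touch os.environ.
    results[index] = int(config["EXAMPLE_NUM_THREADS"])


def run_workers(config, count):
    """Start `count` threads that all read from the same snapshot."""
    results = [None] * count
    threads = [
        threading.Thread(target=worker, args=(config, results, i))
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Hypothetical variable name, used only for this illustration.
    cfg = snapshot_env(["EXAMPLE_NUM_THREADS"], {"EXAMPLE_NUM_THREADS": "1"})
    print(run_workers(cfg, 4))
```

Later changes to the environment do not reach the workers, which is the
point: nothing reads or writes the process environment once threads are
running.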

I would suggest that we address these two major issues before the next release.

Pedro



On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <st...@gmail.com> wrote:
>
> Dear MXNet community,
>
> I will be the release manager for the upcoming Apache MXNet 1.4.0 release.
> Sergey Kolychev will be co-managing the release and providing help from the
> committers side.
> A release candidate will be cut on November 29, 2018 and voting will start
> December 7, 2018. Release notes have been drafted here [1]. If you have any
> additional features in progress and would like to include it in this
> release, please assure they have been merged by November 27, 2018. Release
> schedule is available here [2].
>
> Feel free to add any other comments/suggestions. Please help to review and
> merge outstanding PR's and resolve issues impacting the quality of the
> 1.4.0 release.
>
> Regards,
>
> Steffen
>
> [1]
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
>
> [2] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
>
>
>
>
> On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > Spoke too soon[1], looks like others have been adding Turing support as
> > well (thanks to those helping with this).  I believe there's still a few
> > changes we'd have to make to claim support though (mshadow CMake changes,
> > PyPi package creation tweaks).
> >
> > 1:
> >
> > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> >
> > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > kellen.sunderland@gmail.com> wrote:
> >
> > > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
> > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
> > > regression in master which causes incorrect feature vectors to be output
> > > when using the TensorRT feature.  (Thanks to Nathalie for helping me
> > track
> > > down the root cause of the issue).   I'm currently blocked on a CI issue
> > I
> > > haven't seen before, but hope to have it resolved by EOW.
> > >
> > > One call-out I would make is that we currently don't support Turing
> > > architecture (sm_75).  I've been slowly trying to add support, but I
> > don't
> > > think I'd have capacity to do this done by EOW.  Does anyone feel
> > strongly
> > > we need this in the 1.4 release?  From my perspective this will already
> > be
> > > a strong release without it.
> > >
> > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <st...@gmail.com>
> > > wrote:
> > >
> > >> Thanks Patrick, lets target to get the PR's merged this week.
> > >>
> > >> Call for contributions from the community: Right now we have 10 PR
> > >> awaiting
> > >> merge
> > >> <
> > >>
> > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
> > >> >
> > >> and
> > >> we have 61 open PR awaiting review.
> > >> <
> > >>
> > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
> > >> >
> > >> I would appreciate if you all can help to review the open PR and the
> > >> committers can drive the merge before code freeze for 1.4.0.
> > >>
> > >> The contributors on the Java API are making progress, but not all
> > >> performance issues are resolved. With some luck it should be possible to
> > >> code freeze towards end of this week.
> > >>
> > >> Are there other critical features/bugs/PR you think need to be included
> > in
> > >> 1.4.0? If so, please communicate as soon as possible.
> > >>
> > >> Regards,
> > >> Steffen
> > >>
> > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <pa...@intel.com>
> > >> wrote:
> > >>
> > >> > Thanks, Steffen. I think there is NO open issue to block the MKLDNN to
> > >> GA
> > >> > now.
> > >> >
> > >> > BTW, several quantization related PRs (#13297,#13260) are under the
> > >> review
> > >> > and I think it can be merged in this week.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > --Patric
> > >> >
> > >> >
> > >> > > -----Original Message-----
> > >> > > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
> > >> > > To: dev@mxnet.incubator.apache.org
> > >> > > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0
> > >> release
> > >> > >
> > >> > > On Friday the contributors working on Java API discovered a
> > potential
> > >> > > performance problem with inference using Java API vs. Python.
> > >> > Investigation
> > >> > > is ongoing.
> > >> > > As the Java API is one of the main features for the upcoming
> > release,
> > >> I
> > >> > > suggest to post-pone the code freeze towards end of this week.
> > >> > >
> > >> > > Please provide feedback and concern about the change in dates for
> > code
> > >> > > freeze and 1.4.0 release. I will provide updates on progress
> > resolving
> > >> > the
> > >> > > potential performance problem.
> > >> > >
> > >> > > Patrick - do you think it is possible to resolve the remaining
> > issues
> > >> on
> > >> > MKL-
> > >> > > DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
> > >> > >
> > >> > > Regards,
> > >> > > Steffen
> > >> > >
> > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <me...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > I'd like to remind everyone that 'code freeze' would mean cutting
> > a
> > >> > > > v1.4.x release branch and all following fixes would need to be
> > >> > backported.
> > >> > > > Development on master can be continued as usual.
> > >> > > >
> > >> > > > Best
> > >> > > > Anton
> > >> > > >
> > >> > > > ср, 14 нояб. 2018 г. в 6:04, Steffen Rochel <
> > >> steffenrochel@gmail.com>:
> > >> > > >
> > >> > > > > Dear MXNet community,
> > >> > > > > the agreed plan was to establish code freeze for 1.4.0 release
> > >> > > > > today. As the 1.3.1 patch release is still ongoing I suggest to
> > >> > > > > post-pone the code freeze to Friday 16th November 2018.
> > >> > > > >
> > >> > > > > Sergey Kolychev has agreed to act as co-release manager for all
> > >> > > > > tasks
> > >> > > > which
> > >> > > > > require committer privileges. If anybody is interested to
> > >> volunteer
> > >> > > > > as release manager - now is the time to speak up. Otherwise I
> > will
> > >> > > > > manage
> > >> > > > the
> > >> > > > > release.
> > >> > > > >
> > >> > > > > Regards,
> > >> > > > > Steffen
> > >> > > > >
> > >> > > >
> > >> >
> > >>
> > >
> >

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Steffen Rochel <st...@gmail.com>.
All - Sergey has created the v1.4.x branch and I opened the first PR:
https://github.com/apache/incubator-mxnet/pull/13469

Please add critical - and only critical - bug fixes to the v1.4.x branch and
add me as approver.

Regards,
Steffen

On Thu, Nov 29, 2018 at 2:17 PM Lin Yuan <ap...@gmail.com> wrote:

> https://github.com/apache/incubator-mxnet/pull/13452 is needed in 1.4.0 to
> support the Horovod integration project.
>
> Thanks!
>
> Lin
>
>
> On Thu, Nov 29, 2018 at 1:40 PM Davydenko, Denis <
> dzianis.davydzenka@gmail.com> wrote:
>
> > I suggest including this issue among those tracked for the release:
> > https://github.com/apache/incubator-mxnet/issues/12255. It has proven to
> > be a problem with MXNet start-up time, and it will cause even more problems
> > down the line with Elastic Training and EIA, where MXNet is a commodity
> > rather than a statically running process. It also already causes noticeable
> > issues with MMS (MXNet Model Server [1]). MMS users have noticed significant
> > lag in MMS start-up time, especially on beefy instances like C5.18xl with
> > 72 vCPUs. MMS spins up multiple MXNet instances during its start-up to
> > ensure full utilization of CPU or GPU resources on the host. By default it
> > spins up as many MXNet instances as there are cores (either CPU or GPU
> > cores), so the bigger the host, the more MXNet instances are spun up. And
> > the more MXNet instances are spun up, the longer each instance takes to
> > start. For example, on C5.4xl users reported waiting as long as 2
> > minutes to have just 8 MXNet instances spun up with MXNet 1.3. The same
> > effort with MXNet 1.1 takes less than 0.5 sec.
> >
> > This is quite a significant regression in MXNet when it comes to start-up
> > experience. I suggest considering this a blocker for 1.4.
> >
> > [1] https://github.com/awslabs/mxnet-model-server
> >
> > On 11/29/18, 12:51 PM, "Steffen Rochel" <st...@gmail.com> wrote:
> >
> >     added to 1.4.0 tracking list
> >     <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>.
> >     Steffen
> >
> >     On Thu, Nov 29, 2018 at 9:32 AM Zheng, Da <dzzhen@amazon.com.invalid> wrote:
> >
> >     > Hello Steffen,
> >     >
> >     > Can this bug be fixed in 1.4.0 release? It's a significant performance
> >     > regression on sparse matrix multiplication.
> >     > https://github.com/apache/incubator-mxnet/issues/13449
> >     >
> >     > Thanks,
> >     > Da

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Lin Yuan <ap...@gmail.com>.
https://github.com/apache/incubator-mxnet/pull/13452 is needed in 1.4.0 to
support the Horovod integration project.

Thanks!

Lin


On Thu, Nov 29, 2018 at 1:40 PM Davydenko, Denis <
dzianis.davydzenka@gmail.com> wrote:

> I suggest adding this issue to the ones tracked for the release:
> https://github.com/apache/incubator-mxnet/issues/12255. It has proven to
> be a problem for MXNet startup time, and it will cause even more problems
> down the line with Elastic Training and EIA, where MXNet is a commodity
> rather than a statically running process. It also already causes
> noticeable issues with MMS (MXNet Model Server [1]): MMS users have
> noticed significant lag in MMS startup time, especially on beefy
> instances like C5.18xl with 72 vCPUs.
>
> MMS spins up multiple MXNet instances during startup to ensure full
> utilization of CPU or GPU resources on the host. By default it spins up
> as many MXNet instances as there are cores (either CPU or GPU cores), so
> the bigger the host, the more MXNet instances are spun up. And the more
> MXNet instances are spun up, the longer each instance takes to start.
> For example, on C5.4xl users reported waiting as long as 2 minutes to
> have just 8 MXNet instances spun up with MXNet 1.3. The same startup
> with MXNet 1.1 takes less than 0.5 sec.
>
> This is quite a significant regression in the MXNet startup experience.
> I suggest treating it as a blocker for 1.4.
>
> [1] https://github.com/awslabs/mxnet-model-server
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Steffen Rochel <st...@gmail.com>.
Denis - added.

On Thu, Nov 29, 2018 at 1:40 PM Davydenko, Denis <
dzianis.davydzenka@gmail.com> wrote:

> I suggest adding this issue to the ones tracked for the release:
> https://github.com/apache/incubator-mxnet/issues/12255. It has proven to
> be a problem for MXNet startup time, and it will cause even more problems
> down the line with Elastic Training and EIA, where MXNet is a commodity
> rather than a statically running process. It also already causes
> noticeable issues with MMS (MXNet Model Server [1]): MMS users have
> noticed significant lag in MMS startup time, especially on beefy
> instances like C5.18xl with 72 vCPUs.
>
> MMS spins up multiple MXNet instances during startup to ensure full
> utilization of CPU or GPU resources on the host. By default it spins up
> as many MXNet instances as there are cores (either CPU or GPU cores), so
> the bigger the host, the more MXNet instances are spun up. And the more
> MXNet instances are spun up, the longer each instance takes to start.
> For example, on C5.4xl users reported waiting as long as 2 minutes to
> have just 8 MXNet instances spun up with MXNet 1.3. The same startup
> with MXNet 1.1 takes less than 0.5 sec.
>
> This is quite a significant regression in the MXNet startup experience.
> I suggest treating it as a blocker for 1.4.
>
> [1] https://github.com/awslabs/mxnet-model-server
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by "Davydenko, Denis" <dz...@gmail.com>.
I suggest adding this issue to the ones tracked for the release: https://github.com/apache/incubator-mxnet/issues/12255. It has proven to be a problem for MXNet startup time, and it will cause even more problems down the line with Elastic Training and EIA, where MXNet is a commodity rather than a statically running process. It also already causes noticeable issues with MMS (MXNet Model Server [1]): MMS users have noticed significant lag in MMS startup time, especially on beefy instances like C5.18xl with 72 vCPUs.

MMS spins up multiple MXNet instances during startup to ensure full utilization of CPU or GPU resources on the host. By default it spins up as many MXNet instances as there are cores (either CPU or GPU cores), so the bigger the host, the more MXNet instances are spun up. And the more MXNet instances are spun up, the longer each instance takes to start. For example, on C5.4xl users reported waiting as long as 2 minutes to have just 8 MXNet instances spun up with MXNet 1.3. The same startup with MXNet 1.1 takes less than 0.5 sec.

This is quite a significant regression in the MXNet startup experience. I suggest treating it as a blocker for 1.4.

[1] https://github.com/awslabs/mxnet-model-server 
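
A minimal sketch of how the startup lag described above could be measured, assuming that launching N Python processes at once, each running `import mxnet`, roughly approximates what MMS does when it spins up its instances. The helper names and the import command are illustrative assumptions; they are not part of MMS or MXNet.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def time_one_start(command):
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(
        command,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def time_concurrent_starts(num_instances, command):
    """Launch num_instances copies of command at once.

    Returns (total_seconds, per_instance_seconds).
    """
    start = time.perf_counter()
    # One thread per child process so that all children start together.
    with ThreadPoolExecutor(max_workers=num_instances) as pool:
        per_instance = list(pool.map(time_one_start, [command] * num_instances))
    total = time.perf_counter() - start
    return total, per_instance


# Hypothetical usage (assumes mxnet is installed in the current interpreter):
#   command = [sys.executable, "-c", "import mxnet"]
#   for n in (1, 2, 4, 8):
#       total, per_instance = time_concurrent_starts(n, command)
#       print(n, round(total, 2), round(max(per_instance), 2))
```

If the regression behaves as reported, the slowest per-instance time should grow with the number of instances on 1.3 and stay roughly flat on 1.1; running the same loop against both versions on the same host would be one way to check that.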

On 11/29/18, 12:51 PM, "Steffen Rochel" <st...@gmail.com> wrote:

    added to 1.4.0 tracking list
    <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>
    .
    Steffen
    
    On Thu, Nov 29, 2018 at 9:32 AM Zheng, Da <dz...@amazon.com.invalid> wrote:
    
    > Hello Steffen,
    >
    > Can this bug be fixed in 1.4.0 release? It's a significant performance
    > regression on sparse matrix multiplication.
    > https://github.com/apache/incubator-mxnet/issues/13449
    >
    > Thanks,
    > Da
    >
    > On 11/26/18, 6:42 AM, "Steffen Rochel" <st...@gmail.com> wrote:
    >
    >     Dear MXNet community,
    >
    >     I will be the release manager for the upcoming Apache MXNet 1.4.0
    >     release. Sergey Kolychev will be co-managing the release and
    >     providing help from the committers side.
    >     A release candidate will be cut on November 29, 2018 and voting will
    >     start December 7, 2018. Release notes have been drafted here [1]. If
    >     you have any additional features in progress and would like to
    >     include it in this release, please assure they have been merged by
    >     November 27, 2018. Release schedule is available here [2].
    >
    >     Feel free to add any other comments/suggestions. Please help to
    >     review and merge outstanding PR's and resolve issues impacting the
    >     quality of the 1.4.0 release.
    >
    >     Regards,
    >
    >     Steffen
    >
    >     [1]
    >     https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
    >
    >     [2]
    >     https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
    >
    >
    >
    >
    >     On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
    >     kellen.sunderland@gmail.com> wrote:
    >
    >     > Spoke too soon [1], looks like others have been adding Turing
    >     > support as well (thanks to those helping with this).  I believe
    >     > there's still a few changes we'd have to make to claim support
    >     > though (mshadow CMake changes, PyPi package creation tweaks).
    >     >
    >     > 1:
    >     > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
    >     >
    >     > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
    >     > kellen.sunderland@gmail.com> wrote:
    >     >
    >     > > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
    >     > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
    >     > > regression in master which causes incorrect feature vectors to be
    >     > > output when using the TensorRT feature.  (Thanks to Nathalie for
    >     > > helping me track down the root cause of the issue).  I'm currently
    >     > > blocked on a CI issue I haven't seen before, but hope to have it
    >     > > resolved by EOW.
    >     > >
    >     > > One call-out I would make is that we currently don't support Turing
    >     > > architecture (sm_75).  I've been slowly trying to add support, but I
    >     > > don't think I'd have capacity to get this done by EOW.  Does anyone
    >     > > feel strongly we need this in the 1.4 release?  From my perspective
    >     > > this will already be a strong release without it.
    >     > >
    >     > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <
    > steffenrochel@gmail.com>
    >     > > wrote:
    >     > >
    >     > >> Thanks Patrick, let's target to get the PR's merged this week.
    >     > >>
    >     > >> Call for contributions from the community: Right now we have 10 PR
    >     > >> awaiting merge
    >     > >> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
    >     > >> and we have 61 open PR awaiting review
    >     > >> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>.
    >     > >> I would appreciate if you all can help to review the open PR and
    >     > >> the committers can drive the merge before code freeze for 1.4.0.
    >     > >>
    >     > >> The contributors on the Java API are making progress, but not all
    >     > >> performance issues are resolved. With some luck it should be
    >     > >> possible to code freeze towards end of this week.
    >     > >>
    >     > >> Are there other critical features/bugs/PR you think need to be
    >     > >> included in 1.4.0? If so, please communicate as soon as possible.
    >     > >>
    >     > >> Regards,
    >     > >> Steffen
    >     > >>
    >     > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <
    > patric.zhao@intel.com>
    >     > >> wrote:
    >     > >>
    >     > >> > Thanks, Steffen. I think there is NO open issue to block the
    >     > >> > MKLDNN to GA now.
    >     > >> >
    >     > >> > BTW, several quantization related PRs (#13297, #13260) are under
    >     > >> > review and I think they can be merged this week.
    >     > >> >
    >     > >> > Thanks,
    >     > >> >
    >     > >> > --Patric
    >     > >> >
    >     > >> >
    



Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Steffen Rochel <st...@gmail.com>.
Added to the 1.4.0 tracking list
<https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>.
Steffen

On Thu, Nov 29, 2018 at 9:32 AM Zheng, Da <dz...@amazon.com.invalid> wrote:

> Hello Steffen,
>
> Can this bug be fixed in 1.4.0 release? It's a significant performance
> regression on sparse matrix multiplication.
> https://github.com/apache/incubator-mxnet/issues/13449
>
> Thanks,
> Da
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by "Zheng, Da" <dz...@amazon.com.INVALID>.
Hello Steffen,

Can this bug be fixed in the 1.4.0 release? It is a significant performance regression in sparse matrix multiplication:
https://github.com/apache/incubator-mxnet/issues/13449

Thanks,
Da

On 11/26/18, 6:42 AM, "Steffen Rochel" <st...@gmail.com> wrote:

    Dear MXNet community,
    
    I will be the release manager for the upcoming Apache MXNet 1.4.0 release.
    Sergey Kolychev will be co-managing the release and providing help from the
    committers side.
    A release candidate will be cut on November 29, 2018 and voting will start
    December 7, 2018. Release notes have been drafted here [1]. If you have any
    additional features in progress and would like to include it in this
    release, please assure they have been merged by November 27, 2018. Release
    schedule is available here [2].
    
    Feel free to add any other comments/suggestions. Please help to review and
    merge outstanding PR's and resolve issues impacting the quality of the
    1.4.0 release.
    
    Regards,
    
    Steffen
    
    [1]
    https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
    
    [2] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
    
    
    
    


Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Steffen Rochel <st...@gmail.com>.
Dear MXNet community,

I will be the release manager for the upcoming Apache MXNet 1.4.0 release.
Sergey Kolychev will co-manage the release and provide help from the
committers' side.
A release candidate will be cut on November 29, 2018, and voting will start
on December 7, 2018. Release notes have been drafted here [1]. If you have any
additional features in progress and would like to include them in this
release, please ensure they have been merged by November 27, 2018. The release
schedule is available here [2].

Feel free to add any other comments or suggestions. Please help review and
merge outstanding PRs and resolve issues affecting the quality of the
1.4.0 release.

Regards,

Steffen

[1] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes

[2] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status




On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Spoke too soon[1], looks like others have been adding Turing support as
> well (thanks to those helping with this).  I believe there's still a few
> changes we'd have to make to claim support though (mshadow CMake changes,
> PyPi package creation tweaks).
>
> 1:
>
> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by kellen sunderland <ke...@gmail.com>.
Spoke too soon [1]: it looks like others have been adding Turing support as
well (thanks to those helping with this). I believe there are still a few
changes we'd have to make before we can claim support, though (mshadow CMake
changes, PyPI package creation tweaks).

[1]
https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08

On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Hey Steffen, I'd like to be able to merge this PR for version 1.4:
> https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
> regression in master which causes incorrect feature vectors to be output
> when using the TensorRT feature.  (Thanks to Nathalie for helping me track
> down the root cause of the issue).   I'm currently blocked on a CI issue I
> haven't seen before, but hope to have it resolved by EOW.
>
> One call-out I would make is that we currently don't support Turing
> architecture (sm_75).  I've been slowly trying to add support, but I don't
> think I'd have capacity to do this done by EOW.  Does anyone feel strongly
> we need this in the 1.4 release?  From my perspective this will already be
> a strong release without it.
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by kellen sunderland <ke...@gmail.com>.
Hey Steffen, I'd like to be able to merge this PR for version 1.4:
https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
regression in master that causes incorrect feature vectors to be output
when using the TensorRT feature. (Thanks to Nathalie for helping me track
down the root cause of the issue.) I'm currently blocked on a CI issue I
haven't seen before, but I hope to have it resolved by EOW.

One call-out I would make is that we currently don't support the Turing
architecture (sm_75). I've been slowly trying to add support, but I don't
think I'd have the capacity to get this done by EOW. Does anyone feel strongly
that we need this in the 1.4 release? From my perspective this will already be
a strong release without it.

On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <st...@gmail.com>
wrote:

> Thanks Patrick, lets target to get the PR's merged this week.
>
> Call for contributions from the community: Right now we have 10 PR awaiting
> merge
> <
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+
> >
> and
> we have 61 open PR awaiting review.
> <
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review
> >
> I would appreciate if you all can help to review the open PR and the
> committers can drive the merge before code freeze for 1.4.0.
>
> The contributors on the Java API are making progress, but not all
> performance issues are resolved. With some luck it should be possible to
> code freeze towards end of this week.
>
> Are there other critical features/bugs/PR you think need to be included in
> 1.4.0? If so, please communicate as soon as possible.
>
> Regards,
> Steffen
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Steffen Rochel <st...@gmail.com>.
Thanks Patrick, lets target to get the PR's merged this week.

Call for contributions from the community: right now we have 10 PRs awaiting
merge
<https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
and 61 open PRs awaiting review
<https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>.
I would appreciate it if you could all help review the open PRs so that the
committers can drive the merges before code freeze for 1.4.0.

The contributors on the Java API are making progress, but not all
performance issues are resolved yet. With some luck, it should be possible to
code freeze towards the end of this week.

Are there other critical features/bugs/PRs you think need to be included in
1.4.0? If so, please communicate as soon as possible.

Regards,
Steffen

On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <pa...@intel.com> wrote:

> Thanks, Steffen. I think there is NO open issue to block the MKLDNN to GA
> now.
>
> BTW, several quantization related PRs (#13297,#13260) are under the review
> and I think it can be merged in this week.
>
> Thanks,
>
> --Patric
>
>
> > -----Original Message-----
> > From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> > Sent: Tuesday, November 20, 2018 2:57 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
> >
> > On Friday the contributors working on Java API discovered a potential
> > performance problem with inference using Java API vs. Python.
> > Investigation is ongoing.
> > As the Java API is one of the main features for the upcoming release, I
> > suggest to post-pone the code freeze towards end of this week.
> >
> > Please provide feedback and concern about the change in dates for code
> > freeze and 1.4.0 release. I will provide updates on progress resolving
> > the potential performance problem.
> >
> > Patrick - do you think it is possible to resolve the remaining issues on
> > MKL-DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
> >
> > Regards,
> > Steffen
> >
> > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <me...@gmail.com>
> > wrote:
> >
> > > I'd like to remind everyone that 'code freeze' would mean cutting a
> > > v1.4.x release branch and all following fixes would need to be
> > > backported. Development on master can be continued as usual.
> > >
> > > Best
> > > Anton
> > >
> > > On Wed, 14 Nov 2018 at 6:04, Steffen Rochel <st...@gmail.com> wrote:
> > >
> > > > Dear MXNet community,
> > > > the agreed plan was to establish code freeze for 1.4.0 release
> > > > today. As the 1.3.1 patch release is still ongoing I suggest to
> > > > post-pone the code freeze to Friday 16th November 2018.
> > > >
> > > > Sergey Kolychev has agreed to act as co-release manager for all
> > > > tasks which require committer privileges. If anybody is interested
> > > > to volunteer as release manager - now is the time to speak up.
> > > > Otherwise I will manage the release.
> > > >
> > > > Regards,
> > > > Steffen
> > > >
> > >
>

RE: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by "Zhao, Patric" <pa...@intel.com>.
Thanks, Steffen. I think there is NO open issue blocking MKL-DNN from going GA now.

BTW, several quantization-related PRs (#13297, #13260) are under review, and I think they can be merged this week.

Thanks,

--Patric


> -----Original Message-----
> From: Steffen Rochel [mailto:steffenrochel@gmail.com]
> Sent: Tuesday, November 20, 2018 2:57 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
> 
> On Friday the contributors working on Java API discovered a potential
> performance problem with inference using Java API vs. Python. Investigation
> is ongoing.
> As the Java API is one of the main features for the upcoming release, I
> suggest to post-pone the code freeze towards end of this week.
> 
> Please provide feedback and concern about the change in dates for code
> freeze and 1.4.0 release. I will provide updates on progress resolving the
> potential performance problem.
> 
> Patrick - do you think it is possible to resolve the remaining issues on
> MKL-DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
> 
> Regards,
> Steffen
> 
> On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <me...@gmail.com>
> wrote:
> 
> > I'd like to remind everyone that 'code freeze' would mean cutting a
> > v1.4.x release branch and all following fixes would need to be backported.
> > Development on master can be continued as usual.
> >
> > Best
> > Anton
> >
> > On Wed, 14 Nov 2018 at 6:04, Steffen Rochel <st...@gmail.com> wrote:
> >
> > > Dear MXNet community,
> > > the agreed plan was to establish code freeze for 1.4.0 release
> > > today. As the 1.3.1 patch release is still ongoing I suggest to
> > > post-pone the code freeze to Friday 16th November 2018.
> > >
> > > Sergey Kolychev has agreed to act as co-release manager for all
> > > tasks which require committer privileges. If anybody is interested
> > > to volunteer as release manager - now is the time to speak up.
> > > Otherwise I will manage the release.
> > >
> > > Regards,
> > > Steffen
> > >
> >

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Steffen Rochel <st...@gmail.com>.
On Friday the contributors working on Java API discovered a potential
performance problem with inference using Java API vs. Python. Investigation
is ongoing.
As the Java API is one of the main features for the upcoming release, I
suggest postponing the code freeze towards the end of this week.

Please share any feedback or concerns about the change in dates for code
freeze and the 1.4.0 release. I will provide updates on progress in resolving
the potential performance problem.

Patric - do you think it is possible to resolve the remaining issues on
MKL-DNN this week, so we can consider GA for MKL-DNN with 1.4.0?

Regards,
Steffen

On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <me...@gmail.com> wrote:

> I'd like to remind everyone that 'code freeze' would mean cutting a v1.4.x
> release branch and all following fixes would need to be backported.
> Development on master can be continued as usual.
>
> Best
> Anton
>
> On Wed, 14 Nov 2018 at 6:04, Steffen Rochel <st...@gmail.com> wrote:
>
> > Dear MXNet community,
> > the agreed plan was to establish code freeze for 1.4.0 release today. As
> > the 1.3.1 patch release is still ongoing I suggest to post-pone the code
> > freeze to Friday 16th November 2018.
> >
> > Sergey Kolychev has agreed to act as co-release manager for all tasks
> > which require committer privileges. If anybody is interested to volunteer
> > as release manager - now is the time to speak up. Otherwise I will manage
> > the release.
> >
> > Regards,
> > Steffen
> >
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

Posted by Anton Chernov <me...@gmail.com>.
I'd like to remind everyone that 'code freeze' means cutting a v1.4.x
release branch, after which all further fixes need to be backported to it.
Development on master can continue as usual.

Best
Anton

On Wed, 14 Nov 2018 at 6:04, Steffen Rochel <st...@gmail.com> wrote:

> Dear MXNet community,
> the agreed plan was to establish code freeze for 1.4.0 release today. As
> the 1.3.1 patch release is still ongoing I suggest to post-pone the code
> freeze to Friday 16th November 2018.
>
> Sergey Kolychev has agreed to act as co-release manager for all tasks which
> require committer privileges. If anybody is interested to volunteer as
> release manager - now is the time to speak up. Otherwise I will manage the
> release.
>
> Regards,
> Steffen
>