Posted to dev@mxnet.apache.org by Roshani Nagmote <ro...@gmail.com> on 2018/09/01 02:59:33 UTC

[VOTE] Release MXNet version 1.3.0.RC0

Hi all,

I would like to propose a vote to release Apache MXNet (incubating) version
1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00 PM
PDT, Wednesday, Sept 5th.

Link to release notes:
https://github.com/apache/incubator-mxnet/releases

Link to release candidate 1.3.0.rc0:
https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0

View this page, click on "Build from Source", and use the source code
obtained from 1.3.0.rc0 tag:
https://mxnet.incubator.apache.org/install/index.html

Please remember to TEST first before voting accordingly:

+1 = approve
+0 = no opinion
-1 = disapprove (provide reason)

Thanks,
Roshani
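
For readers tallying a vote like this one, here is a minimal sketch of the
ASF majority-approval rule for package releases: at least three binding +1
votes, and more +1 than -1 votes. The function name `tally_release_vote` and
the sample votes are hypothetical illustrations, not the actual tally of this
thread, and the sketch assumes non-binding votes have already been filtered
out.

```python
from collections import Counter


def tally_release_vote(binding_votes):
    """Return (passed, counts) for a list of binding votes.

    Each vote is one of the strings '+1', '+0', '-1'.
    Hypothetical helper: assumes non-binding votes were removed beforehand.
    """
    counts = Counter(binding_votes)
    plus, minus = counts["+1"], counts["-1"]
    # Majority approval: a quorum of three +1 votes and more +1 than -1.
    passed = plus >= 3 and plus > minus
    return passed, counts


if __name__ == "__main__":
    # Made-up sample votes, for illustration only.
    passed, counts = tally_release_vote(["+1", "+1", "+1", "-1", "+0"])
    print(passed, dict(counts))
```

Under this rule a single -1 does not block a release on its own; it only
counts against the +1 total.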

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Sheng Zha <sz...@gmail.com>.
Hi Hagay,

You asked, "It can be fixed and included in the release alongside the rest
of the release content, right?"

Yes, it can, after it has received appropriate approval and been merged to
master, and at the cost of restarting the vote.

However, personally, I do not think there's enough justification for this
patch to stop the release, given that:
1. this is not a regression, so 1.3 is in no worse shape than any prior
release in the area that this patch addresses.
2. the attempt to put in this patch does not respect the code freeze time
that the community agreed on.
3. we are not stopping this release for any of the other 139 open bug reports
[1] and you did not provide an argument that fixing this bug is more
important than fixing any of those 139 bugs.

Finally, your first claiming that the fix "is ready to be cherry picked
into the release branch" when it's not, and then moving on to "I think it
is waiting for your review", makes me uncomfortable. If you'd
like to imply that I'm blocking the merge of that patch, I'm not. As you
may not realize, I have other work to do as many committers do. Given your
status as an engineering lead at Amazon, you can probably get immediate
help if you ask the committers on your team.

[1]
https://github.com/apache/incubator-mxnet/issues?page=2&q=is%3Aissue+is%3Aopen+label%3ABug

On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lu...@gmail.com> wrote:

> Hi Sheng,
>
> Addressing your questions:
>
> - "why this specific bug is more important than all the other known bugs,
> that this becomes a release blocker"
> I do not consider it to be more or less important than other fixes. It can
> be fixed and included in the release alongside the rest of the release
> content, right?
> From the description of the issue it seems important since it is blocking
> users from loading models that were previously trained and saved. There is
> nothing stopping the community from including this fix into 1.3.0,
> alongside the rest of the features and fixes.
>
> - "The bug exists since SymbolBlock was introduced a year ago and has
> survived at least three releases, so this is not a regression."
> I do not think I said it is a regression. However, the fact a bug existed
> before, does not mean it is OK to release it rather than fix it.
>
> - "Timeline-wise, this bug was reported on 7/21, but was not reported as
> release-blocker in the release discussion thread until 8/31 [1]. Neither
> its reporting as release-blocker nor its fix made it for the 8/3 code
> freeze."
> You are right, would have been better to have this identified and fixed
> earlier and included before code freeze.
>
> - "The PR is still not ready yet as it doesn't have approval."
> I think it is waiting for your review.
>
> - "it would be great if you could provide some additional reasoning besides
> "X mentions the issue" or "fix was done by X""
> I have. Repeating what I wrote in my previous email for clarity: Basic
> functionality broken: loading a model (albeit one that was saved as
> non FP32)
>
> So, yes - this issue seems to have been out there for a while, somehow went
> under the radar... but I think the key question is whether this blocks a
> basic functionality in MXNet. I believe so, hence my -1 vote.
>
> Hagay
>
> On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <sz...@gmail.com> wrote:
>
> > Hi Hagay and Sandeep,
> >
> > Could you help us understand why this specific bug is more important than
> > all the other known bugs, that this becomes a release blocker?
> >
> > Some facts to consider:
> > - The bug exists since SymbolBlock was introduced a year ago and has
> > survived at least three releases, so this is not a regression.
> > - Timeline-wise, this bug was reported on 7/21, but was not reported as
> > release-blocker in the release discussion thread until 8/31 [1]. Neither
> > its reporting as release-blocker nor its fix made it for the 8/3 code
> > freeze.
> > - The PR is still not ready yet as it doesn't have approval.
> >
> > Hagay, it would be great if you could provide some additional reasoning
> > besides "X mentions the issue" or "fix was done by X". Thanks.
> >
> > -sz
> >
> > [1]
> > https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> >
> > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lu...@gmail.com> wrote:
> >
> > > Sandeep mentions the issue of an error when a user tries to load model
> > > params trained/saved as FP16.
> > > https://github.com/apache/incubator-mxnet/issues/11849
> > > The fix was done by Sandeep:
> > > https://github.com/apache/incubator-mxnet/pull/12412 and is ready to be
> > > cherry picked into the release branch.
> > >
> > > This seems like a release blocker to me:
> > > - Basic functionality broken: loading a model (albeit one that was
> > > saved as non FP32)
> > > - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
> > >
> > > -1 (non binding)
> > >
> > > Hagay
> > >
> > >
> > >
> > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > sandeep.krishna98@gmail.com> wrote:
> > >
> > > > "- 0"
> > > >
> > > > I believe the bug #11849
> > > > <https://github.com/apache/incubator-mxnet/issues/11849>, unable to
> > > > import non-fp32 models into Gluon, fixed in this PR #12412
> > > > <https://github.com/apache/incubator-mxnet/pull/12412>, is important
> > > > for the users. I would rather pick this fix in this release than plan a
> > > > minor release later.
> > > >
> > > > Best,
> > > > Sandeep
> > > >
> > > >
> > > >
> > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <chohyu01@cs.washington.edu>
> > > > wrote:
> > > >
> > > > > Actually, the command "git clone --recursive
> > > > > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works fine
> > > > > now, never mind.
> > > > >
> > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <chohyu01@cs.washington.edu>
> > > > > wrote:
> > > > >
> > > > > > Unfortunately, MXNet was depending on a branch of TVM that is now
> > > > > > deleted. We will have to merge #12448
> > > > > > <https://github.com/apache/incubator-mxnet/pull/12448> before the
> > > > > > release.
> > > > > >
> > > > > > Background: See dmlc/tvm#1394
> > > > > > <https://github.com/dmlc/tvm/issues/1394>.
> > > > > >
> > > > > > Philip.
> > > > > >
> > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <carinmeier@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Checked out the tag, built and tested the Clojure package. +1
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Sandeep Krishnamurthy
> > > >
> > >
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Roshani Nagmote <ro...@gmail.com>.
Thanks everyone for testing and voting for the release. I am working with
Sheng to finalize and post the release. Announcement will follow soon.

Regards,
Roshani

On Mon, Sep 10, 2018 at 7:03 AM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Tracked down the issue referred to above and it's not a bug.   I'll update
> the ticket.
>
> Changing to +1.
>
> On Mon, Sep 10, 2018 at 3:00 PM kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > -0.1
> >
> > There's one test failure I've run into (details below).  Following Indhu's
> > logic I don't think this should block the release, as it doesn't relate to
> > a feature introduced in this version.
> >
> > I'm trying to use the cpp-package examples as reference code for how to
> > run MXNet models from a native context. I'd like to run them with ASAN as
> > a sanity check for memory leaks and pointer errors.  I was continually
> > running into segfaults and crashes w/ and w/o ASAN.  A little googling
> > shows me that this issue has already been reported, and is related to
> > running tests on CPU, not to any changes I made:
> > https://github.com/apache/incubator-mxnet/issues/9814  Having what are
> > effectively our reference examples crash is not a good practice IMO.
> >
> > I also share some concerns around the fp16 failures.  I know developers
> > who are currently porting their models to Gluon who use fp16.  They'll be
> > disappointed with the error.
> >
> > In general though, release looks good.  Big thanks to Sheng and Roshani
> > for putting it together (and sorry for the late testing).
> >
> > -Kellen
> >
> >
> > On Fri, Sep 7, 2018 at 4:31 AM Anirudh <an...@gmail.com> wrote:
> >
> >> -1 Considering that using fp16 with gluon is much easier than the
> >> alternative where you need access to the model code, this fix is really
> >> useful. I understand the pain of doing an mxnet release and appreciate
> >> Roshani and Sheng's efforts, but this seems like something we should fix.
> >>
> >> On Thu, Sep 6, 2018, 4:57 PM Haibin Lin <ha...@gmail.com> wrote:
> >>
> >> > +1 built from source and passes dist_sync_kvstore test on Ubuntu.
> >> >
> >> > Best,
> >> > Haibin
> >> >
> >> > On Thu, Sep 6, 2018 at 1:32 PM Indhu <in...@gmail.com> wrote:
> >> >
> >> > > +1
> >> > >
> >> > > The release candidate looks good. I'm able to build and run basic
> >> > > models.
> >> > >
> >> > > On the FP16 issue:
> >> > >
> >> > > Like others have pointed out, releases are expensive in terms of time
> >> > > and effort. There needs to be a high and more objective bar on what
> >> > > qualifies as a release blocker to make sure we are not setting a
> >> > > precedent for a lot of release blockers in future.
> >> > >
> >> > > I think a release blocker is justified only if there is a serious bug
> >> > > discovered in one of the features included in the release or if there
> >> > > is a regression. Given FP16 support is not a new feature claimed in
> >> > > this release and this is not a regression in this release candidate,
> >> > > I'm inclined to release this candidate and include the FP16 fix in a
> >> > > subsequent release.
> >> > >
> >> > > Thanks,
> >> > > Indu
> >> > >
> >> > > On Wed, Sep 5, 2018 at 10:21 AM Aaron Markham <aaron.s.markham@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > 0 (non-binding) If we have a problem that blocks users, and a solution
> >> > > > in hand... then we should fix it, but not at the expense of starting
> >> > > > the release cycle again just for one fix. Users can cherry pick or
> >> > > > build from master if they want the fix right away, right? I'd change
> >> > > > my mind to -1 if this wasn't the case, with good reason, and if the
> >> > > > user impact was critical to adoption or risks abandonment.
> >> > > >
> >> > > >
> >> > > > On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <roshaninagmote2@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > I believe everyone here is working hard to make MXNet a better
> >> > > > > framework for users. It's completely okay to have different
> >> > > > > opinions, we can decide together if this issue is a blocker or not
> >> > > > > after voting time is over.
> >> > > > >
> >> > > > > As I mentioned before, voting will end at 7 pm today. So there is
> >> > > > > still time to test the release. If there are any other issues anyone
> >> > > > > finds, I will be happy to start the process again and work on RC1.
> >> > > > > For now, I want to encourage everyone to utilize this time and
> >> > > > > vote. :)
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Roshani
> >> > > > >
> >> > > > > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> >> > > > > sandeep.krishna98@gmail.com> wrote:
> >> > > > >
> >> > > > > >    1. As an Apache MXNet community member, I raised the concern of
> >> > > > > >    broken functionality for the user. I explained and provided the
> >> > > > > >    data points on the issue, the workaround and why I think it is
> >> > > > > >    important. If after all this, you think my vote is biased by my
> >> > > > > >    employer just because a user I quoted is from Amazon, this is
> >> > > > > >    more concerning to me on my voting abilities.
> >> > > > > >    2. My -1 in no way undermines the huge amount of effort that
> >> > > > > >    goes on behind the scenes for a release to happen. Great respect
> >> > > > > >    and recognition for everyone involved in all the releases of
> >> > > > > >    MXNet in the past and this one. I voted on my judgement of what
> >> > > > > >    may be good for the users of MXNet.
> >> > > > > >    3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes.
> >> > > > > >    Feel free to decide and progress on the release as we already
> >> > > > > >    have >3 +1 in this thread.
> >> > > > > >
> >> > > > > >
> >> > > > > > Best,
> >> > > > > >
> >> > > > > > Sandeep
> >> > > > > >
> >> > > > > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <cjolivier01@gmail.com>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > btw, there are no vetoes on package releases:
> >> > > > > > >
> >> > > > > > > VOTES ON PACKAGE RELEASES
> >> > > > > > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
> >> > > > > > >
> >> > > > > > > Votes on whether a package is ready to be released use majority
> >> > > > > > > approval
> >> > > > > > > <https://www.apache.org/foundation/glossary.html#MajorityApproval>
> >> > > > > > > -- i.e. at least three PMC members must vote affirmatively for
> >> > > > > > > release, and there must be more positive than negative votes.
> >> > > > > > > Releases may not be vetoed. Generally the community will cancel
> >> > > > > > > the release vote if anyone identifies serious problems, but in
> >> > > > > > > most cases the ultimate decision lies with the individual serving
> >> > > > > > > as release manager. The specifics of the process may vary from
> >> > > > > > > project to project, but the 'minimum quorum of three +1 votes'
> >> > > > > > > rule is universal.
> >> > > > > > >
> >> > > > > > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <szha.pvg@gmail.com>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Thanks for sharing your opinions, Thomas. Your recognition and
> >> > > > > > > > respect of people's efforts on preparing the release candidate
> >> > > > > > > > are certainly appreciated.
> >> > > > > > > >
> >> > > > > > > > Now that the vote is set to fail thanks to the veto, there will
> >> > > > > > > > be plenty of opportunities to include those bug fixes, including
> >> > > > > > > > the one Zhi mentioned [1], which was already merged in the master
> >> > > > > > > > and yet chose not to block this release with [2]. I will be happy
> >> > > > > > > > to work with Roshani to prepare another release candidate once
> >> > > > > > > > ready.
> >> > > > > > > >
> >> > > > > > > > -sz
> >> > > > > > > >
> >> > > > > > > > [1]
> >> > > > > > > > https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> >> > > > > > > > [2]
> >> > > > > > > > https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> >> > > > > > > >
> >> > > > > > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <thomas.delteil1@gmail.com>
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > -0
> >> > > > > > > > > (non-binding)
> >> > > > > > > > >
> >> > > > > > > > > If I may add some nuance plus a personal data point as one of the
> >> > > > > > > > > users commenting in the bug report in question:
> >> > > > > > > > >
> >> > > > > > > > > - Performance vs. Basic functionality => I don't think high
> >> > > > > > > > > performance use-cases and basic functionality are two obviously
> >> > > > > > > > > opposed concepts and see no contradiction in Hagay's and Sandeep's
> >> > > > > > > > > statements.
> >> > > > > > > > > Float16 support is a feature of MXNet that provides more than
> >> > > > > > > > > twice the performance of Float32 on supported platforms, hence
> >> > > > > > > > > the high performance use-case. The bug is that the basic
> >> > > > > > > > > functionality of reloading a saved float16 model is currently
> >> > > > > > > > > broken.
> >> > > > > > > > >
> >> > > > > > > > > - This bug vs Other bugs => Contrary to the vast majority of the
> >> > > > > > > > > 140 open bugs that are mentioned above, I would put to Sandeep's
> >> > > > > > > > > credit that this one bug has a PR open that provides a fix for
> >> > > > > > > > > it. This would make it a better candidate to get included in this
> >> > > > > > > > > release than a bug that has no fix ready for it.
> >> > > > > > > > >
> >> > > > > > > > > - Personal datapoint: I recently did some experimentation with
> >> > > > > > > > > float16 [1] and actually coincidentally just published a video on
> >> > > > > > > > > optimizing performance for Gluon. Float16 conversion is one of
> >> > > > > > > > > the most, if not the most effective way to get performance out of
> >> > > > > > > > > MXNet [2]. I believe there is a lot of value in publicizing its
> >> > > > > > > > > use more and hence making sure at least the basic support for
> >> > > > > > > > > normal use-cases is present.
> >> > > > > > > > >
> >> > > > > > > > > Of course this needs to be balanced with the overhead of
> >> > > > > > > > > preparing a new release candidate once the fix is reviewed and
> >> > > > > > > > > merged, which seems to be a lengthy and complex process in its
> >> > > > > > > > > own right, and the delay with providing the other features
> >> > > > > > > > > present in 1.3 for users that are not running off the nightly
> >> > > > > > > > > builds.
> >> > > > > > > > >
> >> > > > > > > > > All the best,
> >> > > > > > > > >
> >> > > > > > > > > Thomas
> >> > > > > > > > >
> >> > > > > > > > > [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> >> > > > > > > > > [2]
> >> > > > > > > > > https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> >> > > > > > > > >
> >> > > > > > > > > On Tue, Sep 4, 2018 at 5:11 PM Sheng Zha <szha.pvg@gmail.com>
> >> > > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Sandeep,
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks for explaining your veto. We have open bugs that impacted
> >> > > > > > > > > > a lot more than just 3 customers, just by referring to the number
> >> > > > > > > > > > of commenters on the issue [1].
> >> > > > > > > > > >
> >> > > > > > > > > > You said that this is for "high performance use cases", which
> >> > > > > > > > > > contradicts Hagay's assessment that this is "basic functionality
> >> > > > > > > > > > broken". Given that this is for advanced use cases of using
> >> > > > > > > > > > half-precision training, why is it so much more important than
> >> > > > > > > > > > any other open bug reports, that for this specific bug fix, we
> >> > > > > > > > > > have to delay the access of regular users to the new MXNet 1.3
> >> > > > > > > > > > release by at least another week?
> >> > > > > > > > > >
> >> > > > > > > > > > Honestly, I'm concerned that your vote is biased by Amazon
> >> > > > > > > > > > involvement, given that you quoted Amazon Rekognition.
> >> > > > > > > > > >
> >> > > > > > > > > > -sz
> >> > > > > > > > > >
> >> > > > > > > > > > [1]
> >> > > > > > > > > > https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> >> > > > > > > > > >
> >> > > > > > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> >> > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > My initial vote of “-0” was due to lack of info from a user
> >> > > > > > > > > > > who had said he overcame this issue for an FP16 model.
> >> > > > > > > > > > >
> >> > > > > > > > > > > However, the suggested workaround [1] for the issue is not
> >> > > > > > > > > > > straightforward and generally usable for all users. Also, the
> >> > > > > > > > > > > issue is not simple and isolated enough to be listed in the
> >> > > > > > > > > > > Release Notes as a known issue with a workaround.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Changing my vote to: "-1 (binding)" owing to the user impact [3]
> >> > > > > > > > > > >
> >> > > > > > > > > > > @Sheng:
> >> > > > > > > > > > >
> >> > > > > > > > > > > 1. Agreed, the bug has existed for a long time. However, FP16
> >> > > > > > > > > > > and such optimizations were added later on, followed by users
> >> > > > > > > > > > > [2] using this feature for high performance use cases. It is
> >> > > > > > > > > > > not OK to measure the severity of the bug based on its past
> >> > > > > > > > > > > existence; rather, we can see who is impacted now and whether
> >> > > > > > > > > > > it is a small subset with a simple workaround or a large
> >> > > > > > > > > > > user-impacting issue.
> >> > > > > > > > > > >
> >> > > > > > > > > > > 2. Agreed, the bug was reported on 7/21. However, I became aware
> >> > > > > > > > > > > of this issue on 08/29 and submitted the fix on 08/30. Also, I
> >> > > > > > > > > > > did bring this to the notice of the community, you and the 1.3
> >> > > > > > > > > > > release manager (Roshani) on the RC0 proposal thread. Also, I
> >> > > > > > > > > > > would focus on the issue and user impact rather than on who
> >> > > > > > > > > > > identified and who is fixing the issue.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Based on my discussion with 2 users, I think it is an important
> >> > > > > > > > > > > feature for them to see in Apache MXNet v1.3.0.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Best,
> >> > > > > > > > > > >
> >> > > > > > > > > > > Sandeep
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > [1] Workaround used by the user.
> >> > > > > > > > > > >
> >> > > > > > > > > > > import mxnet as mx
> >> > > > > > > > > > >
> >> > > > > > > > > > > ctx = mx.cpu()  # assumed context; ctx was not defined in the original snippet
> >> > > > > > > > > > >
> >> > > > > > > > > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
> >> > > > > > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> >> > > > > > > > > > >
> >> > > > > > > > > > > # Cast each parameter to the dtype stored in the saved file before loading.
> >> > > > > > > > > > > for k, v in params_fp16.items():
> >> > > > > > > > > > >     new_key = k.split(':')[1]  # drop the 'arg:' / 'aux:' prefix
> >> > > > > > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> >> > > > > > > > > > >
> >> > > > > > > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
> >> > > > > > > > > > >
> >> > > > > > > > > > > [2] Amazon Rekognition
> >> > > > > > > > > > >
> >> > > > > > > > > > > [3] User story: Train a model -> Cast it to FP16 -> Save the
> >> > > > > > > > > > > model -> Load back the model does not work. They have to cast
> >> > > > > > > > > > > every parameter with a workaround mentioned above [1].
> >> > > > > > > > > > > > > [1]
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
> >> > > > > > > lupesko@gmail.com
> >> > > > > > > > >
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Sandeep mentions the issue of an error when
> user
> >> > > tries
> >> > > > to
> >> > > > > > > load
> >> > > > > > > > > > model
> >> > > > > > > > > > > > > params
> >> > > > > > > > > > > > > > trained/saved as FP16.
> >> > > > > > > > > > > > > >
> >> > > https://github.com/apache/incubator-mxnet/issues/11849
> >> > > > > > > > > > > > > > The fix was done by Sandeep:
> >> > > > > > > > > > > > > >
> >> > https://github.com/apache/incubator-mxnet/pull/12412
> >> > > > and
> >> > > > > > is
> >> > > > > > > > > ready
> >> > > > > > > > > > to
> >> > > > > > > > > > > > be
> >> > > > > > > > > > > > > > cherry picked into the release branch.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > This seems like a release blocker to me:
> >> > > > > > > > > > > > > > - Basic functionality broken: loading a model
> >> > (albeit
> >> > > > one
> >> > > > > > > that
> >> > > > > > > > > > > was
> >> > > > > > > > > > > > > > saved as non FP32)
> >> > > > > > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
> >> > > > > > ThomasDelteil@
> >> > > > > > > )
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > -1 (non binding)
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hagay
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep
> >> > > krishnamurthy <
> >> > > > > > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > "- 0"
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > I believe the bug #11849
> >> > > > > > > > > > > > > > > <
> >> > > > > https://github.com/apache/incubator-mxnet/issues/11849
> >> > > > > > >,
> >> > > > > > > > > unable
> >> > > > > > > > > > > to
> >> > > > > > > > > > > > > > import
> >> > > > > > > > > > > > > > > non-fp32 models into Gluon, fixed in this PR
> >> > #12412
> >> > > > > > > > > > > > > > > <
> >> > > > https://github.com/apache/incubator-mxnet/pull/12412>
> >> > > > > > is
> >> > > > > > > > > > > important
> >> > > > > > > > > > > > > for
> >> > > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > users. I would rather pick this fix in this
> >> > release
> >> > > > > than
> >> > > > > > > > plan a
> >> > > > > > > > > > > minor
> >> > > > > > > > > > > > > > > release later.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Best,
> >> > > > > > > > > > > > > > > Sandeep
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> >> > > > > > > > > > > > chohyu01@cs.washington.edu>
> >> > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Actually, the command "git clone
> --recursive
> >> > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet
> >> -b
> >> > > > > > 1.3.0.rc0"
> >> > > > > > > > > works
> >> > > > > > > > > > > fine
> >> > > > > > > > > > > > > > now,
> >> > > > > > > > > > > > > > > > never mind.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho
> <
> >> > > > > > > > > > > > > chohyu01@cs.washington.edu>
> >> > > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Unfortunately, MXNet was depending on a
> >> > branch
> >> > > of
> >> > > > > TVM
> >> > > > > > > > that
> >> > > > > > > > > is
> >> > > > > > > > > > > now
> >> > > > > > > > > > > > > > > > deleted.
> >> > > > > > > > > > > > > > > > > We will have to merge #12448
> >> > > > > > > > > > > > > > > > > <
> >> > > > > > https://github.com/apache/incubator-mxnet/pull/12448>
> >> > > > > > > > > > before
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > release.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Background: See dmlc/tvm#1394 <
> >> > > > > > > > > > > > > > https://github.com/dmlc/tvm/issues/1394
> >> > > > > > > > > > > > > > > >.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Philip.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin
> >> Meier <
> >> > > > > > > > > > > carinmeier@gmail.com
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >> Checked out the tag, built and tested
> the
> >> > > > Clojure
> >> > > > > > > > package.
> >> > > > > > > > > > +1
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM
> Roshani
> >> > > > Nagmote <
> >> > > > > > > > > > > > > > > > >> roshaninagmote2@gmail.com>
> >> > > > > > > > > > > > > > > > >> wrote:
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >> > Hi all,
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > I would like to propose a vote to
> >> release
> >> > > > Apache
> >> > > > > > > MXNet
> >> > > > > > > > > > > > > > (incubating)
> >> > > > > > > > > > > > > > > > >> version
> >> > > > > > > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now
> >> (Friday,
> >> > > Aug
> >> > > > > > 31st)
> >> > > > > > > > and
> >> > > > > > > > > > end
> >> > > > > > > > > > > at
> >> > > > > > > > > > > > > > 7:00
> >> > > > > > > > > > > > > > > PM
> >> > > > > > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > Link to release notes:
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > https://github.com/apache/incubator-mxnet/releases
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> >> > > > > > > > > > > > > > > > >> > *
> >> > > > > > > > > > > > >
> >> > > > > > >
> >> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> >> > > > > > > > > > > > > > > > >> > <
> >> > > > > > > > > > > > >
> >> > > > > > >
> >> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> >> > > > > > > > > > > > > > > >0*
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > View this page, click on "Build from
> >> > > Source",
> >> > > > > and
> >> > > > > > > use
> >> > > > > > > > > the
> >> > > > > > > > > > > > source
> >> > > > > > > > > > > > > > > code
> >> > > > > > > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > https://mxnet.incubator.apache.org/install/index.html
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > Please remember to TEST first before
> >> > voting
> >> > > > > > > > accordingly:
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > +1 = approve
> >> > > > > > > > > > > > > > > > >> > +0 = no opinion
> >> > > > > > > > > > > > > > > > >> > -1 = disapprove (provide reason)
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >> > Thanks,
> >> > > > > > > > > > > > > > > > >> > Roshani
> >> > > > > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > --
> >> > > > > > > > > > > > > > > Sandeep Krishnamurthy
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > --
> >> > > > > > > > > > > Sandeep Krishnamurthy
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > > Sandeep Krishnamurthy
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by kellen sunderland <ke...@gmail.com>.
Tracked down the issue referred to above and it's not a bug. I'll update
the ticket.

Changing to +1.

On Mon, Sep 10, 2018 at 3:00 PM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> -0.1
>
> There's one test failure I've run into (details below).  Following Indhu's
> logic I don't think this should block the release as it's not related to a
> feature introduced in this release.
>
> I'm trying to use the cpp-package examples as reference code for how to
> run MXNet models from a native context. I'd like to run them with ASAN as a
> sanity check for memory leaks and pointer errors.  I was continually
> running into segfaults and crashes w/ and w/o ASAN.  A little googling
> shows me that this issue has already been reported, and is related to
> running tests on CPU, not to any changes I made:
> https://github.com/apache/incubator-mxnet/issues/9814  Having what are
> effectively our reference examples crash is not good practice IMO.
>
> I also share some concerns around the fp16 failures.  I know developers
> who are currently porting their models to Gluon who use fp16.  They'll be
> disappointed with the error.
>
> In general though, release looks good.  Big thanks to Sheng and Roshani
> for putting it together (and sorry for the late testing).
>
> -Kellen
>
>
> On Fri, Sep 7, 2018 at 4:31 AM Anirudh <an...@gmail.com> wrote:
>
>> -1 Considering that using fp16 with gluon is much easier than the
>> alternative where you need access to the model code, this fix is really
>> useful. I understand the pain of doing an MXNet release and appreciate
>> Roshani and Sheng's efforts, but this seems like something we should fix.
>>
>> On Thu, Sep 6, 2018, 4:57 PM Haibin Lin <ha...@gmail.com> wrote:
>>
>> > +1 built from source and passes dist_sync_kvstore test on Ubuntu.
>> >
>> > Best,
>> > Haibin
>> >
>> > On Thu, Sep 6, 2018 at 1:32 PM Indhu <in...@gmail.com> wrote:
>> >
>> > > +1
>> > >
>> > > The release candidate looks good. I'm able to build and run basic
>> models.
>> > >
>> > > On the FP16 issue:
>> > >
>> > > Like others have pointed out, releases are expensive in terms of time
>> and
>> > > effort. There needs to be a high and more objective bar on what
>> qualifies
>> > > as a release blocker to make sure we are not setting a precedent for a
>> lot
>> > > of release blockers in future.
>> > >
>> > > I think a release blocker is justified only if there is a serious bug
>> > > discovered in one of the features included in the release or if there
>> is
>> > a
>> > > regression. Given FP16 support is not a new feature claimed in this
>> > > release and this is not a regression in this release candidate, I'm
>> > > inclined to release this candidate and include the FP16 fix in a
>> > subsequent
>> > > release.
>> > >
>> > > Thanks,
>> > > Indu
>> > >
>> > > On Wed, Sep 5, 2018 at 10:21 AM Aaron Markham <
>> aaron.s.markham@gmail.com
>> > >
>> > > wrote:
>> > >
>> > > > 0 (non-binding) If we have a problem that blocks users, and a
>> solution
>> > in
>> > > > hand... then we should fix it, but not at the expense of starting
>> the
>> > > > release cycle again just for one fix. Users can cherry pick or build
>> > from
>> > > > master if they want the fix right away, right? I'd change my mind
>> to -1
>> > > if
>> > > > this wasn't the case, with good reason, and if the user impact was
>> > > critical
>> > > > to adoption or risks abandonment.
>> > > >
>> > > >
>> > > > On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <
>> > > roshaninagmote2@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > I believe everyone here is working hard to make MXNet a better
>> > > framework
>> > > > > for users. It's completely okay to have different opinions, we can
>> > > decide
>> > > > > together if this issue is a blocker or not after voting time is
>> over.
>> > > > >
>> > > > > As I mentioned before, voting will end at 7 pm today. So there is
>> > still
>> > > > > time to test the release. If there are any other issues anyone
>> > finds, I
>> > > > > will be happy to start the process again and work on RC1. For
>> now, I
>> > > want
>> > > > > to encourage everyone to utilize this time and vote. :)
>> > > > >
>> > > > > Thanks,
>> > > > > Roshani
>> > > > >
>> > > > > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
>> > > > > sandeep.krishna98@gmail.com> wrote:
>> > > > >
>> > > > > >    1. As an Apache MXNet community member, I raised the concern
>> of
>> > > > broken
>> > > > > >    functionality for the user. I explained and provided the data
>> > > points
>> > > > > on
>> > > > > > the
>> > > > > >    issue, workaround and why I think it is important. If after
>> all
>> > > > this,
>> > > > > > you
>> > > > > >    think my vote is biased by my employer just because a user I
>> > > quoted
>> > > > is
>> > > > > > from
>> > > > > >    Amazon, this is more concerning to me on my voting abilities.
>> > > > > >    2. My -1 no where undermines the huge amount of effort that
>> goes
>> > > > > behind
>> > > > > >    the scene for a release to happen. Great respect and
>> recognition
>> > > for
>> > > > > >    everyone involved in all the releases of MXNet in the past
>> and
>> > > > this. I
>> > > > > >    voted on my judgement of what may be good for the users of
>> > MXNet.
>> > > > > >    3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel free
>> to
>> > > > decide
>> > > > > >    and progress on the release as we already have >3 +1 in this
>> > > thread.
>> > > > > >
>> > > > > >
>> > > > > > Best,
>> > > > > >
>> > > > > > Sandeep
>> > > > > >
>> > > > > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <
>> > cjolivier01@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > btw, there are no vetoes on package releases:
>> > > > > > >
>> > > > > > > VOTES ON PACKAGE RELEASES
>> > > > > > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
>> > > > > > >
>> > > > > > > Votes on whether a package is ready to be released use
>> majority
>> > > > > approval
>> > > > > > > <
>> > https://www.apache.org/foundation/glossary.html#MajorityApproval>
>> > > > --
>> > > > > > i.e.
>> > > > > > > at least three PMC members must vote affirmatively for
>> release,
>> > and
>> > > > > there
>> > > > > > > must be more positive than negative votes. Releases may not be
>> > > vetoed.
>> > > > > > > Generally
>> > > > > > > the community will cancel the release vote if anyone
>> identifies
>> > > > serious
>> > > > > > > problems, but in most cases the ultimate decision, lies with
>> the
>> > > > > > individual
>> > > > > > > serving as release manager. The specifics of the process may
>> vary
>> > > > from
>> > > > > > > project to project, but the 'minimum quorum of three +1 votes'
>> > rule
>> > > > is
>> > > > > > > universal.
>> > > > > > >
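The majority-approval rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: votes are passed in as plain counts of binding (PMC) +1 and -1 votes, +0/-0 and non-binding votes are ignored, and the function name `release_vote_passes` is hypothetical rather than part of any Apache tooling.

```python
def release_vote_passes(binding_plus_ones: int, binding_minus_ones: int) -> bool:
    """Return True if a release vote passes under majority approval.

    Assumption: only binding (PMC) votes are counted; +0 / -0 and
    non-binding votes are left out of both totals.
    """
    if binding_plus_ones < 0 or binding_minus_ones < 0:
        raise ValueError("vote counts must be non-negative")
    # Minimum quorum: at least three affirmative binding votes.
    has_quorum = binding_plus_ones >= 3
    # Majority: strictly more positive than negative votes.
    has_majority = binding_plus_ones > binding_minus_ones
    return has_quorum and has_majority


if __name__ == "__main__":
    print(release_vote_passes(3, 2))  # quorum met, majority met
    print(release_vote_passes(2, 0))  # quorum not met
    print(release_vote_passes(4, 4))  # tie, so no majority
```

Under this reading a single -1 cannot block a release on its own; it only matters through the totals, and the release manager still decides whether to cancel the vote.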
>> > > > > > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <sz...@gmail.com>
>> > > wrote:
>> > > > > > >
>> > > > > > > > Thanks for sharing your opinions, Thomas. Your recognition
>> and
>> > > > > respect
>> > > > > > of
>> > > > > > > > people's efforts on preparing the release candidate are
>> > certainly
>> > > > > > > > appreciated.
>> > > > > > > >
>> > > > > > > > Now that the vote is set to fail thanks to the veto, there
>> will
>> > > be
>> > > > > > plenty
>> > > > > > > > of opportunities to include those bug fixes, including the
>> one
>> > > Zhi
>> > > > > > > > mentioned [1], which was already merged in the master and
>> yet
>> > > chose
>> > > > > not
>> > > > > > > to
>> > > > > > > > block this release with [2]. I will be happy to work with
>> > Roshani
>> > > > to
>> > > > > > > > prepare another release candidate once ready.
>> > > > > > > >
>> > > > > > > > -sz
>> > > > > > > >
>> > > > > > > > [1]
>> https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
>> > > > > > > > [2]
>> https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
>> > > > > > > >
>> > > > > > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <
>> > > > > > thomas.delteil1@gmail.com
>> > > > > > > >
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > -0
>> > > > > > > > > (non-binding)
>> > > > > > > > >
>> > > > > > > > > If I may add some nuancing plus a personal data point as
>> one
>> > of
>> > > > the
>> > > > > > > users
>> > > > > > > > > commenting in the bug report in question:
>> > > > > > > > >
>> > > > > > > > > - Performance vs. Basic functionality => I don't think
>> high
>> > > > > > performance
>> > > > > > > > > use-cases and basic functionality are two obviously
>> opposed
>> > > > > concepts
>> > > > > > > and
>> > > > > > > > > see no contradiction in Hagay's and Sandeep's statements.
>> > > > > > > > > Float16 support is a feature of MXNet that provides more
>> than
>> > > twice
>> > > > > the
>> > > > > > > > > performance of Float32 on supported platforms, hence the
>> high
>> > > > > > > performance
>> > > > > > > > > use-case. The bug is that the basic functionality of
>> > reloading
>> > > a
>> > > > > > saved
>> > > > > > > > > float16 model is currently broken.
>> > > > > > > > >
>> > > > > > > > > - This bug vs Other bugs => Contrary to the vast majority of
>> the
>> > > 140
>> > > > > > open
>> > > > > > > > bugs
>> > > > > > > > > that are mentioned above, I would put to Sandeep's credit
>> > that
>> > > > this
>> > > > > > one
>> > > > > > > > bug
>> > > > > > > > > has a PR open that provides a fix for it. This would make
>> it
>> > a
>> > > > > better
>> > > > > > > > > candidate to get included in this release than a bug that
>> has
>> > > no
>> > > > > fix
>> > > > > > > > ready
>> > > > > > > > > for it.
>> > > > > > > > >
>> > > > > > > > > - Personal datapoint: I recently did some experimentation
>> > with
>> > > > > > float16
>> > > > > > > > [1]
>> > > > > > > > > and actually coincidentally just published a video on
>> > > optimizing
>> > > > > > > > > performance for Gluon. Float16 conversion is one of the
>> most,
>> > > if
>> > > > > not
>> > > > > > > the
>> > > > > > > > > most effective way to get performance out of MXNet [2]. I
>> > > believe
>> > > > > > there
>> > > > > > > > is
>> > > > > > > > > a lot of value in publicizing more its use and hence
>> making
>> > > sure
>> > > > at
>> > > > > > > least
>> > > > > > > > > the basic support for normal use-cases is present.
>> > > > > > > > >
>> > > > > > > > > Of course this needs to be balanced with the overhead of
>> > > > preparing
>> > > > > a
>> > > > > > > new
>> > > > > > > > > release candidate once the fix is reviewed and merged,
>> > which
>> > > > > seems
>> > > > > > to
>> > > > > > > > be
>> > > > > > > > > a lengthy and complex process in its own right, and the
>> delay
>> > > > with
>> > > > > > > > > providing the other features present in 1.3 for users that
>> > are
>> > > > not
>> > > > > > > > running
>> > > > > > > > > off the nightly builds.
>> > > > > > > > >
>> > > > > > > > > All the best,
>> > > > > > > > >
>> > > > > > > > > Thomas
>> > > > > > > > >
>> > > > > > > > > [1]
>> > > https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
>> > > > > > > > > [2]
>> https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
>> > > > > > > > >
>> > > > > > > > > Le mar. 4 sept. 2018 à 17:11, Sheng Zha <
>> szha.pvg@gmail.com>
>> > a
>> > > > > > écrit :
>> > > > > > > > >
>> > > > > > > > > > Sandeep,
>> > > > > > > > > >
>> > > > > > > > > > Thanks for explaining your veto. We have open bugs that
>> > > > impacted
>> > > > > a
>> > > > > > > lot
>> > > > > > > > > more
>> > > > > > > > > > than just 3 customers, just by referring to the number
>> of
>> > > > > > commenters
>> > > > > > > on
>> > > > > > > > > the
>> > > > > > > > > > issue [1].
>> > > > > > > > > >
>> > > > > > > > > > You said that this is for "high performance use cases",
>> > which
>> > > > > > > > contradicts
>> > > > > > > > > > with Hagay's assessment that this is "basic functionality
>> > > > broken".
>> > > > > > > Given
>> > > > > > > > > that
>> > > > > > > > > > this is for advanced use cases of using half-precision
>> > > > training,
>> > > > > > why
>> > > > > > > is
>> > > > > > > > > it
>> > > > > > > > > > so much more important than any other open bug reports,
>> > that
>> > > > for
>> > > > > > this
>> > > > > > > > > > specific bug fix, we have to delay the access of regular
>> > > users
>> > > > to
>> > > > > > the
>> > > > > > > > new
>> > > > > > > > > > MXNet 1.3 release by at least another week?
>> > > > > > > > > >
>> > > > > > > > > > Honestly, I'm concerned that your vote is biased by
>> Amazon
>> > > > > > > involvement,
>> > > > > > > > > > given that you quoted Amazon Rekognition.
>> > > > > > > > > >
>> > > > > > > > > > -sz
>> > > > > > > > > >
>> > > > > > > > > > [1]
>> https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
>> > > > > > > > > >
>> > > > > > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
>> > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > My initial vote of “-0” was due to lack of info from a
>> > user
>> > > > who
>> > > > > > had
>> > > > > > > > > said,
>> > > > > > > > > > > he overcame this issue for FP16 model.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > However, suggested workaround [1] for the issue is not
>> > > > straight
>> > > > > > > > forward
>> > > > > > > > > > and
>> > > > > > > > > > > generally usable for all users. Also, issue is not
>> simple
>> > > and
>> > > > > > > > isolated
>> > > > > > > > > to
>> > > > > > > > > > > be listed in the Release Notes as known issue with a
>> > > > > workaround.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Changing my vote to: "-1 (binding)" owing to the user
>> > > impact
>> > > > > [3]
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > @Sheng:
>> > > > > > > > > > >
>> > > > > > > > > > > 1. Agreed, the bug has existed for a long time. However, FP16
>> and
>> > > such
>> > > > > > > > > > optimizations
>> > > > > > > > > > > were added later on. Followed by users [2] using this
>> > > feature
>> > > > > for
>> > > > > > > > high
>> > > > > > > > > > > performance use cases. It is not ok to measure
>> severity
>> > of
>> > > > the
>> > > > > > bug
>> > > > > > > > > based
>> > > > > > > > > > on
>> > > > > > > > > > > its past existence, rather we can see who is impacted
>> now
>> > > and
>> > > > > is
>> > > > > > > it a
>> > > > > > > > > > small
>> > > > > > > > > > > subset with a simple workaround or large user
>> impacting
>> > > > issue.
>> > > > > > > > > > >
>> > > > > > > > > > > 2. Agreed bug was reported 7/21. However, I became
>> aware
>> > of
>> > > > > this
>> > > > > > > > issue
>> > > > > > > > > on
>> > > > > > > > > > > 08/29 and submitted the fix on 08/30. Also, I did
>> bring
>> > > this
>> > > > to
>> > > > > > the
>> > > > > > > > > > notice
>> > > > > > > > > > > of community, you and 1.3 release manager (Roshani) on
>> > the
>> > > > RC0
>> > > > > > > > proposal
>> > > > > > > > > > > thread. Also, I would focus on the issue and user
>> impact
>> > > than
>> > > > > who
>> > > > > > > > > > > identified and who is fixing the issue.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Based on my discussion with 2 users, I think it is an
>> > > > important
>> > > > > > > > feature
>> > > > > > > > > > for
>> > > > > > > > > > > them to see in Apache MXNet v1.3.0.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Best,
>> > > > > > > > > > >
>> > > > > > > > > > > Sandeep
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > [1] Workaround used by the user.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > import mxnet as mx
>> > > > > > > > > > >
>> > > > > > > > > > > # Import the FP16 symbol without loading its parameters yet.
>> > > > > > > > > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
>> > > > > > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
>> > > > > > > > > > >
>> > > > > > > > > > > # Cast each parameter to the dtype it was saved with; the part of
>> > > > > > > > > > > # each saved key before ':' is dropped to get the parameter name.
>> > > > > > > > > > > for k, v in params_fp16.items():
>> > > > > > > > > > >     new_key = k.split(':')[1]
>> > > > > > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
>> > > > > > > > > > >
>> > > > > > > > > > > # ctx is the target context and must be defined by the caller.
>> > > > > > > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > [2] Amazon Rekognition
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > [3] User story: Train a model -> Cast it to FP16 ->
>> Save
>> > > the
>> > > > > > model
>> > > > > > > ->
>> > > > > > > > > > Load
>> > > > > > > > > > > back the model does not work. They have to cast every
>> > > > parameter
>> > > > > > > with
>> > > > > > > > a
>> > > > > > > > > > > workaround mentioned above [1].
>> > > > > > > > > > >
>> > > > > > > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <
>> > > > > lupesko@gmail.com>
>> > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi Sheng,
>> > > > > > > > > > > >
>> > > > > > > > > > > > Addressing your questions:
>> > > > > > > > > > > >
>> > > > > > > > > > > > - "why this specific bug is more important than all
>> the
>> > > > other
>> > > > > > > known
>> > > > > > > > > > bugs,
>> > > > > > > > > > > > that this becomes a release blocker"
>> > > > > > > > > > > > I do not consider it to be more or less important
>> than
>> > > > other
>> > > > > > > fixes.
>> > > > > > > > > It
>> > > > > > > > > > > can
>> > > > > > > > > > > > be fixed and included in the release alongside the
>> rest
>> > > of
>> > > > > the
>> > > > > > > > > release
>> > > > > > > > > > > > content, right?
>> > > > > > > > > > > > From the description of the issue it seems important
>> > > since
>> > > > it
>> > > > > > is
>> > > > > > > > > > blocking
>> > > > > > > > > > > > users from loading models that were previously
>> trained
>> > > and
>> > > > > > saved.
>> > > > > > > > > There
>> > > > > > > > > > > is
>> > > > > > > > > > > > nothing stopping the community from including this
>> fix
>> > > into
>> > > > > > > 1.3.0,
>> > > > > > > > > > > > alongside the rest of the features and fixes.
>> > > > > > > > > > > >
>> > > > > > > > > > > > - "The bug exists since SymbolBlock was introduced a
>> > year
>> > > > ago
>> > > > > > and
>> > > > > > > > has
>> > > > > > > > > > > > survived at least three releases, so this is not a
>> > > > > regression."
>> > > > > > > > > > > > I do not think I said it is a regression. However,
>> the
>> > > > fact a
>> > > > > > bug
>> > > > > > > > > > existed
>> > > > > > > > > > > > before, does not mean it is OK to release it rather
>> > than
>> > > > fix
>> > > > > > it.
>> > > > > > > > > > > >
>> > > > > > > > > > > > - "Timeline-wise, this bug was reported on 7/21, but
>> > was
>> > > > not
>> > > > > > > > reported
>> > > > > > > > > > as
>> > > > > > > > > > > > release-blocker in the release discussion thread
>> until
>> > > 8/31
>> > > > > > [1].
>> > > > > > > > > > Neither
>> > > > > > > > > > > > its reporting as release-blocker nor its fix made it
>> > for
>> > > > the
>> > > > > > 8/3
>> > > > > > > > code
>> > > > > > > > > > > > freeze."
>> > > > > > > > > > > > You are right, would have been better to have this
>> > > > identified
>> > > > > > and
>> > > > > > > > > fixed
>> > > > > > > > > > > > earlier and included before code freeze.
>> > > > > > > > > > > >
>> > > > > > > > > > > > - "The PR is still not ready yet as it doesn't have
>> > > > > approval."
>> > > > > > > > > > > > I think it is waiting for your review.
>> > > > > > > > > > > >
>> > > > > > > > > > > > - "it would be great if you could provide some
>> > additional
>> > > > > > > reasoning
>> > > > > > > > > > > besides
>> > > > > > > > > > > > "X mentions the issue" or "fix was done by X""
>> > > > > > > > > > > > I have. Repeating what I wrote in my previous email
>> for
>> > > > > > clarity:
>> > > > > > > > > Basic
>> > > > > > > > > > > > functionality broken: loading a model (albeit one
>> that
>> > > > > was
>> > > > > > > > saved
>> > > > > > > > > > as
>> > > > > > > > > > > > non FP32)
>> > > > > > > > > > > >
>> > > > > > > > > > > > So, yes - this issue seems to have been out there
>> for a
>> > > > > while,
>> > > > > > > > > somehow
>> > > > > > > > > > > went
>> > > > > > > > > > > > under the radar... but I think the key question is
>> > > whether
>> > > > > this
>> > > > > > > > > blocks
>> > > > > > > > > > a
>> > > > > > > > > > > > basic functionality in MXNet. I believe so, hence
>> my -1
>> > > > vote.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Hagay
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <
>> > > > szha.pvg@gmail.com
>> > > > > >
>> > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Hi Hagay and Sandeep,
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Could you help us understand why this specific
>> bug is
>> > > > more
>> > > > > > > > > important
>> > > > > > > > > > > than
>> > > > > > > > > > > > > all the other known bugs, that this becomes a
>> release
>> > > > > > blocker?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Some facts to consider:
>> > > > > > > > > > > > > - The bug exists since SymbolBlock was introduced
>> a
>> > > year
>> > > > > ago
>> > > > > > > and
>> > > > > > > > > has
>> > > > > > > > > > > > > survived at least three releases, so this is not a
>> > > > > > regression.
>> > > > > > > > > > > > > - Timeline-wise, this bug was reported on 7/21,
>> but
>> > was
>> > > > not
>> > > > > > > > > reported
>> > > > > > > > > > as
>> > > > > > > > > > > > > release-blocker in the release discussion thread
>> > until
>> > > > 8/31
>> > > > > > > [1].
>> > > > > > > > > > > Neither
>> > > > > > > > > > > > > its reporting as release-blocker nor its fix made
>> it
>> > > for
>> > > > > the
>> > > > > > > 8/3
>> > > > > > > > > code
>> > > > > > > > > > > > > freeze.
>> > > > > > > > > > > > > - The PR is still not ready yet as it doesn't have
>> > > > > approval.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Hagay, it would be great if you could provide some
>> > > > > additional
>> > > > > > > > > > reasoning
>> > > > > > > > > > > > > besides "X mentions the issue" or "fix was done by
>> > X".
>> > > > > > Thanks.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > -sz
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > [1]
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
>> > > > > > > lupesko@gmail.com
>> > > > > > > > >
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Sandeep mentions the issue of an error when user
>> > > tries
>> > > > to
>> > > > > > > load
>> > > > > > > > > > model
>> > > > > > > > > > > > > params
>> > > > > > > > > > > > > > trained/saved as FP16.
>> > > > > > > > > > > > > >
>> > > https://github.com/apache/incubator-mxnet/issues/11849
>> > > > > > > > > > > > > > The fix was done by Sandeep:
>> > > > > > > > > > > > > >
>> > https://github.com/apache/incubator-mxnet/pull/12412
>> > > > and
>> > > > > > is
>> > > > > > > > > ready
>> > > > > > > > > > to
>> > > > > > > > > > > > be
>> > > > > > > > > > > > > > cherry picked into the release branch.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > This seems like a release blocker to me:
>> > > > > > > > > > > > > > - Basic functionality broken: loading a model (albeit one that was
>> > > > > > > > > > > > > > saved as non FP32)
>> > > > > > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
>> > > > > > ThomasDelteil@
>> > > > > > > )
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > -1 (non binding)
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hagay
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep
>> > > krishnamurthy <
>> > > > > > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > "- 0"
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I believe the bug #11849
>> > > > > > > > > > > > > > > <
>> > > > > https://github.com/apache/incubator-mxnet/issues/11849
>> > > > > > >,
>> > > > > > > > > unable
>> > > > > > > > > > > to
>> > > > > > > > > > > > > > import
>> > > > > > > > > > > > > > > non-fp32 models into Gluon, fixed in this PR
>> > #12412
>> > > > > > > > > > > > > > > <
>> > > > https://github.com/apache/incubator-mxnet/pull/12412>
>> > > > > > is
>> > > > > > > > > > > important
>> > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > users. I would rather pick this fix in this
>> > release
>> > > > > than
>> > > > > > > > plan a
>> > > > > > > > > > > minor
>> > > > > > > > > > > > > > > release later.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > > > Sandeep
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
>> > > > > > > > > > > > chohyu01@cs.washington.edu>
>> > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Actually, the command "git clone --recursive
>> > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet
>> -b
>> > > > > > 1.3.0.rc0"
>> > > > > > > > > works
>> > > > > > > > > > > fine
>> > > > > > > > > > > > > > now,
>> > > > > > > > > > > > > > > > never mind.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
>> > > > > > > > > > > > > chohyu01@cs.washington.edu>
>> > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Unfortunately, MXNet was depending on a
>> > branch
>> > > of
>> > > > > TVM
>> > > > > > > > that
>> > > > > > > > > is
>> > > > > > > > > > > now
>> > > > > > > > > > > > > > > > deleted.
>> > > > > > > > > > > > > > > > > We will have to merge #12448
>> > > > > > > > > > > > > > > > > <
>> > > > > > https://github.com/apache/incubator-mxnet/pull/12448>
>> > > > > > > > > > before
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > release.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Background: See dmlc/tvm#1394 <
>> > > > > > > > > > > > > > https://github.com/dmlc/tvm/issues/1394
>> > > > > > > > > > > > > > > >.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Philip.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin
>> Meier <
>> > > > > > > > > > > carinmeier@gmail.com
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >> Checked out the tag, built and tested the
>> > > > Clojure
>> > > > > > > > package.
>> > > > > > > > > > +1
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani
>> > > > Nagmote <
>> > > > > > > > > > > > > > > > >> roshaninagmote2@gmail.com>
>> > > > > > > > > > > > > > > > >> wrote:
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >> > Hi all,
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > I would like to propose a vote to
>> release
>> > > > Apache
>> > > > > > > MXNet
>> > > > > > > > > > > > > > (incubating)
>> > > > > > > > > > > > > > > > >> version
>> > > > > > > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now
>> (Friday,
>> > > Aug
>> > > > > > 31st)
>> > > > > > > > and
>> > > > > > > > > > end
>> > > > > > > > > > > at
>> > > > > > > > > > > > > > 7:00
>> > > > > > > > > > > > > > > PM
>> > > > > > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > Link to release notes:
>> > > > > > > > > > > > > > > > >> >
>> > > > > > https://github.com/apache/incubator-mxnet/releases
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
>> > > > > > > > > > > > > > > > >> > *
>> > > > > > > > > > > > >
>> > > > > > >
>> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
>> > > > > > > > > > > > > > > > >> > <
>> > > > > > > > > > > > >
>> > > > > > >
>> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
>> > > > > > > > > > > > > > > >0*
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > View this page, click on "Build from
>> > > Source",
>> > > > > and
>> > > > > > > use
>> > > > > > > > > the
>> > > > > > > > > > > > source
>> > > > > > > > > > > > > > > code
>> > > > > > > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > https://mxnet.incubator.apache.org/install/index.html
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > Please remember to TEST first before
>> > voting
>> > > > > > > > accordingly:
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > +1 = approve
>> > > > > > > > > > > > > > > > >> > +0 = no opinion
>> > > > > > > > > > > > > > > > >> > -1 = disapprove (provide reason)
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >> > Thanks,
>> > > > > > > > > > > > > > > > >> > Roshani
>> > > > > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > > > > >>
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > --
>> > > > > > > > > > > > > > > Sandeep Krishnamurthy
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > --
>> > > > > > > > > > > Sandeep Krishnamurthy
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Sandeep Krishnamurthy
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by kellen sunderland <ke...@gmail.com>.
-0.1

There's one test failure I've run into (details below).  Following Indhu's
logic, I don't think this should block the release, as it's not related to a
feature introduced in this version.

I'm trying to use the cpp-package examples as reference code for how to run
MXNet models from a native context. I'd like to run them with ASAN as a
sanity check for memory leaks and pointer errors.  I was continually
running into segfaults and crashes w/ and w/o ASAN.  A little googling
shows me that this issue has already been reported, and is related to
running tests on CPU, not to any changes I made:
https://github.com/apache/incubator-mxnet/issues/9814  Having what are
effectively our reference examples crash is not good practice IMO.

I also share some concerns around the fp16 failures.  I know developers who
are currently porting their models to Gluon who use fp16.  They'll be
disappointed with the error.

In general though, release looks good.  Big thanks to Sheng and Roshani for
putting it together (and sorry for the late testing).

-Kellen


On Fri, Sep 7, 2018 at 4:31 AM Anirudh <an...@gmail.com> wrote:

> -1 Considering that using fp16 with gluon is much easier than the
> alternative where you need access to the model code, this fix is really
> useful. I understand the pain of doing an MXNet release and appreciate Roshani's
> and Sheng's efforts, but this seems like something we should fix.
>
> On Thu, Sep 6, 2018, 4:57 PM Haibin Lin <ha...@gmail.com> wrote:
>
> > +1 built from source and passes dist_sync_kvstore test on Ubuntu.
> >
> > Best,
> > Haibin
> >
> > On Thu, Sep 6, 2018 at 1:32 PM Indhu <in...@gmail.com> wrote:
> >
> > > +1
> > >
> > > The release candidate looks good. I'm able to build and run basic
> models.
> > >
> > > On the FP16 issue:
> > >
> > > Like others have pointed out, releases are expensive in terms of time and
> > > effort. There needs to be a high and more objective bar on what qualifies
> > > as a release blocker to make sure we are not setting a precedent for a lot
> > > of release blockers in future.
> > >
> > > I think a release blocker is justified only if there is a serious bug
> > > discovered in one of the features included in the release or if there is a
> > > regression. Given FP16 support is not a new feature claimed in this
> > > release and this is not a regression in this release candidate, I'm
> > > inclined to release this candidate and include the FP16 fix in a subsequent
> > > release.
> > >
> > > Thanks,
> > > Indu
> > >
> > > On Wed, Sep 5, 2018 at 10:21 AM Aaron Markham <
> aaron.s.markham@gmail.com
> > >
> > > wrote:
> > >
> > > > 0 (non-binding) If we have a problem that blocks users, and a
> solution
> > in
> > > > hand... then we should fix it, but not at the expense of starting the
> > > > release cycle again just for one fix. Users can cherry pick or build
> > from
> > > > master if they want the fix right away, right? I'd change my mind to
> -1
> > > if
> > > > this wasn't the case, with good reason, and if the user impact was
> > > critical
> > > > to adoption or risks abandonment.
> > > >
> > > >
> > > > On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <
> > > roshaninagmote2@gmail.com>
> > > > wrote:
> > > >
> > > > > I believe everyone here is working hard to make MXNet a better
> > > framework
> > > > > for users. It's completely okay to have different opinions, we can
> > > decide
> > > > > together if this issue is a blocker or not after voting time is
> over.
> > > > >
> > > > > As I mentioned before, voting will end at 7 pm today. So there is
> > still
> > > > > time to test the release. If there are any other issues anyone
> > finds, I
> > > > > will be happy to start the process again and work on RC1. For now,
> I
> > > want
> > > > > to encourage everyone to utilize this time and vote. :)
> > > > >
> > > > > Thanks,
> > > > > Roshani
> > > > >
> > > > > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> > > > > sandeep.krishna98@gmail.com> wrote:
> > > > >
> > > > > > >    1. As an Apache MXNet community member, I raised the concern of broken
> > > > > > >    functionality for the user. I explained and provided the data points on
> > > > > > >    the issue, the workaround, and why I think it is important. If after all
> > > > > > >    this, you think my vote is biased by my employer just because a user I
> > > > > > >    quoted is from Amazon, that is more concerning to me, as it calls my
> > > > > > >    ability to vote into question.
> > > > > > >    2. My -1 nowhere undermines the huge amount of effort that goes on behind
> > > > > > >    the scenes for a release to happen. Great respect and recognition for
> > > > > > >    everyone involved in all the releases of MXNet, past and present. I
> > > > > > >    voted on my judgement of what may be good for the users of MXNet.
> > > > > > >    3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel free to
> > > > > > >    decide and progress on the release, as we already have >3 +1 votes in this
> > > > > > >    thread.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Sandeep
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <
> > cjolivier01@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > btw, there are no vetoes on package releases:
> > > > > > >
> > > > > > > VOTES ON PACKAGE RELEASES
> > > > > > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
> > > > > > >
> > > > > > > Votes on whether a package is ready to be released use majority
> > > > > approval
> > > > > > > <
> > https://www.apache.org/foundation/glossary.html#MajorityApproval>
> > > > --
> > > > > > i.e.
> > > > > > > at least three PMC members must vote affirmatively for release,
> > and
> > > > > there
> > > > > > > > must be more positive than negative votes. Releases may not be
> > > vetoed.
> > > > > > > Generally
> > > > > > > the community will cancel the release vote if anyone identifies
> > > > serious
> > > > > > > problems, but in most cases the ultimate decision, lies with
> the
> > > > > > individual
> > > > > > > serving as release manager. The specifics of the process may
> vary
> > > > from
> > > > > > > project to project, but the 'minimum quorum of three +1 votes'
> > rule
> > > > is
> > > > > > > universal.
> > > > > > >
> > > > > > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <sz...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > Thanks for sharing your opinions, Thomas. Your recognition
> and
> > > > > respect
> > > > > > of
> > > > > > > > people's efforts on preparing the release candidate are
> > certainly
> > > > > > > > appreciated.
> > > > > > > >
> > > > > > > > Now that the vote is set to fail thanks to the veto, there
> will
> > > be
> > > > > > plenty
> > > > > > > > of opportunities to include those bug fixes, including the
> one
> > > Zhi
> > > > > > > > mentioned [1], which was already merged in the master and yet
> > > chose
> > > > > not
> > > > > > > to
> > > > > > > > block this release with [2]. I will be happy to work with
> > Roshani
> > > > to
> > > > > > > > prepare another release candidate once ready.
> > > > > > > >
> > > > > > > > -sz
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> > > > > > > > [2]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> > > > > > > >
> > > > > > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <
> > > > > > thomas.delteil1@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > -0
> > > > > > > > > (non-binding)
> > > > > > > > >
> > > > > > > > > If I may add some nuancing plus a personal data point as
> one
> > of
> > > > the
> > > > > > > users
> > > > > > > > > commenting in the bug report in question:
> > > > > > > > >
> > > > > > > > > > - Performance vs. basic functionality => I don't think high-performance
> > > > > > > > > > use cases and basic functionality are two obviously opposed concepts, and I
> > > > > > > > > > see no contradiction in Hagay's and Sandeep's statements.
> > > > > > > > > > Float16 support is a feature of MXNet that provides more than twice the
> > > > > > > > > > performance of Float32 on supported platforms, hence the high-performance
> > > > > > > > > > use case. The bug is that the basic functionality of reloading a saved
> > > > > > > > > > float16 model is currently broken.
> > > > > > > > >
> > > > > > > > > > - This bug vs. other bugs => Contrary to the vast majority of the 140 open
> > > > > > > > > > bugs that are mentioned above, I would put to Sandeep's credit that this
> > > > > > > > > > one bug has a PR open that provides a fix for it. This would make it a
> > > > > > > > > > better candidate to get included in this release than a bug that has no
> > > > > > > > > > fix ready for it.
> > > > > > > > >
> > > > > > > > > > - Personal data point: I recently did some experimentation with float16 [1]
> > > > > > > > > > and, coincidentally, just published a video on optimizing performance for
> > > > > > > > > > Gluon. Float16 conversion is one of the most effective ways, if not the
> > > > > > > > > > most effective way, to get performance out of MXNet [2]. I believe there
> > > > > > > > > > is a lot of value in publicizing its use more and hence making sure at
> > > > > > > > > > least the basic support for normal use cases is present.
> > > > > > > > >
> > > > > > > > > > Of course, this needs to be balanced with the overhead of preparing a new
> > > > > > > > > > release candidate once the fix is reviewed and merged, which seems to be
> > > > > > > > > > a lengthy and complex process in its own right, and the delay in
> > > > > > > > > > providing the other features present in 1.3 to users who are not running
> > > > > > > > > > off the nightly builds.
> > > > > > > > >
> > > > > > > > > All the best,
> > > > > > > > >
> > > > > > > > > Thomas
> > > > > > > > >
> > > > > > > > > [1]
> > > https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> > > > > > > > >
> > > > > > > > > Le mar. 4 sept. 2018 à 17:11, Sheng Zha <
> szha.pvg@gmail.com>
> > a
> > > > > > écrit :
> > > > > > > > >
> > > > > > > > > > Sandeep,
> > > > > > > > > >
> > > > > > > > > > Thanks for explaining your veto. We have open bugs that
> > > > impacted
> > > > > a
> > > > > > > lot
> > > > > > > > > more
> > > > > > > > > > than just 3 customers, just by referring to the number of
> > > > > > commenters
> > > > > > > on
> > > > > > > > > the
> > > > > > > > > > issue [1].
> > > > > > > > > >
> > > > > > > > > > You said that this is for "high performance use cases",
> > which
> > > > > > > > contradicts
> > > > > > > > > > > with Hagay's assessment that this is "basic functionality
> > > > broken".
> > > > > > > Given
> > > > > > > > > that
> > > > > > > > > > this is for advanced use cases of using half-precision
> > > > training,
> > > > > > why
> > > > > > > is
> > > > > > > > > it
> > > > > > > > > > so much more important than any other open bug reports,
> > that
> > > > for
> > > > > > this
> > > > > > > > > > specific bug fix, we have to delay the access of regular
> > > users
> > > > to
> > > > > > the
> > > > > > > > new
> > > > > > > > > > MXNet 1.3 release by at least another week?
> > > > > > > > > >
> > > > > > > > > > Honestly, I'm concerned that your vote is biased by
> Amazon
> > > > > > > involvement,
> > > > > > > > > > given that you quoted Amazon Rekognition.
> > > > > > > > > >
> > > > > > > > > > -sz
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > > My initial vote of “-0” was due to a lack of info from a user who had
> > > > > > > > > > > > said he overcame this issue for an FP16 model.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > However, the suggested workaround [1] for the issue is not straightforward
> > > > > > > > > > > > or generally usable for all users. Also, the issue is not simple and
> > > > > > > > > > > > isolated enough to be listed in the Release Notes as a known issue with a
> > > > > > > > > > > > workaround.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Changing my vote to: "-1 (binding)" owing to the user impact [3]
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > @Sheng:
> > > > > > > > > > >
> > > > > > > > > > > > 1. Agreed, the bug has existed for a long time. However, FP16 and similar
> > > > > > > > > > > > optimizations were added later on, followed by users [2] using this
> > > > > > > > > > > > feature for high-performance use cases. It is not OK to measure the
> > > > > > > > > > > > severity of the bug based on its past existence; rather, we should look
> > > > > > > > > > > > at who is impacted now and whether it is a small subset with a simple
> > > > > > > > > > > > workaround or an issue impacting a large number of users.
> > > > > > > > > > > >
> > > > > > > > > > > > 2. Agreed, the bug was reported on 7/21. However, I became aware of this
> > > > > > > > > > > > issue on 08/29 and submitted the fix on 08/30. Also, I did bring this to
> > > > > > > > > > > > the notice of the community, you, and the 1.3 release manager (Roshani) on
> > > > > > > > > > > > the RC0 proposal thread. Also, I would focus on the issue and user impact
> > > > > > > > > > > > rather than on who identified and who is fixing the issue.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Based on my discussion with 2 users, I think it is an important feature
> > > > > > > > > > > > for them to see in Apache MXNet v1.3.0.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > >
> > > > > > > > > > > Sandeep
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > [1] Workaround used by the user.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > import mxnet as mx
> > > > > > > > > > > >
> > > > > > > > > > > > # Assumption: ctx was not defined in the original snippet; set it to
> > > > > > > > > > > > # the device you want to load onto (e.g. mx.gpu(0) where available).
> > > > > > > > > > > > ctx = mx.cpu()
> > > > > > > > > > > >
> > > > > > > > > > > > net_fp16 = mx.gluon.SymbolBlock.imports(
> > > > > > > > > > > >     'resnet34_fp16-symbol.json', ['data'])
> > > > > > > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > > > > > > > > > > >
> > > > > > > > > > > > # Cast each parameter to the dtype it was saved with.
> > > > > > > > > > > > for k, v in params_fp16.items():
> > > > > > > > > > > >     new_key = k.split(':')[1]  # keep the name after the ':' prefix
> > > > > > > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > > > > > > > > > > >
> > > > > > > > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > [2] Amazon Rekognition
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > [3] User story: Train a model -> Cast it to FP16 -> Save the model ->
> > > > > > > > > > > > Load back the model does not work. They have to cast every parameter
> > > > > > > > > > > > with the workaround mentioned above [1].
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <
> > > > > lupesko@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Sheng,
> > > > > > > > > > > >
> > > > > > > > > > > > Addressing your questions:
> > > > > > > > > > > >
> > > > > > > > > > > > - "why this specific bug is more important than all
> the
> > > > other
> > > > > > > known
> > > > > > > > > > bugs,
> > > > > > > > > > > > that this becomes a release blocker"
> > > > > > > > > > > > I do not consider it to be more or less important
> than
> > > > other
> > > > > > > fixes.
> > > > > > > > > It
> > > > > > > > > > > can
> > > > > > > > > > > > be fixed and included in the release alongside the
> rest
> > > of
> > > > > the
> > > > > > > > > release
> > > > > > > > > > > > content, right?
> > > > > > > > > > > > From the description of the issue it seems important
> > > since
> > > > it
> > > > > > is
> > > > > > > > > > blocking
> > > > > > > > > > > > users from loading models that were previously
> trained
> > > and
> > > > > > saved.
> > > > > > > > > There
> > > > > > > > > > > is
> > > > > > > > > > > > nothing stopping the community from including this
> fix
> > > into
> > > > > > > 1.3.0,
> > > > > > > > > > > > alongside the rest of the features and fixes.
> > > > > > > > > > > >
> > > > > > > > > > > > - "The bug exists since SymbolBlock was introduced a
> > year
> > > > ago
> > > > > > and
> > > > > > > > has
> > > > > > > > > > > > survived at least three releases, so this is not a
> > > > > regression."
> > > > > > > > > > > > I do not think I said it is a regression. However,
> the
> > > > fact a
> > > > > > bug
> > > > > > > > > > existed
> > > > > > > > > > > > before, does not mean it is OK to release it rather
> > than
> > > > fix
> > > > > > it.
> > > > > > > > > > > >
> > > > > > > > > > > > - "Timeline-wise, this bug was reported on 7/21, but
> > was
> > > > not
> > > > > > > > reported
> > > > > > > > > > as
> > > > > > > > > > > > release-blocker in the release discussion thread
> until
> > > 8/31
> > > > > > [1].
> > > > > > > > > > Neither
> > > > > > > > > > > > its reporting as release-blocker nor its fix made it
> > for
> > > > the
> > > > > > 8/3
> > > > > > > > code
> > > > > > > > > > > > freeze."
> > > > > > > > > > > > You are right, would have been better to have this
> > > > identified
> > > > > > and
> > > > > > > > > fixed
> > > > > > > > > > > > earlier and included before code freeze.
> > > > > > > > > > > >
> > > > > > > > > > > > - "The PR is still not ready yet as it doesn't have
> > > > > approval."
> > > > > > > > > > > > I think it is waiting for your review.
> > > > > > > > > > > >
> > > > > > > > > > > > - "it would be great if you could provide some
> > additional
> > > > > > > reasoning
> > > > > > > > > > > besides
> > > > > > > > > > > > "X mentions the issue" or "fix was done by X""
> > > > > > > > > > > > > I have. Repeating what I wrote in my previous email for clarity: Basic
> > > > > > > > > > > > > functionality broken: loading a model (albeit one that was saved as
> > > > > > > > > > > > > non FP32)
> > > > > > > > > > > >
> > > > > > > > > > > > So, yes - this issue seems to have been out there
> for a
> > > > > while,
> > > > > > > > > somehow
> > > > > > > > > > > went
> > > > > > > > > > > > under the radar... but I think the key question is
> > > whether
> > > > > this
> > > > > > > > > blocks
> > > > > > > > > > a
> > > > > > > > > > > > basic functionality in MXNet. I believe so, hence my
> -1
> > > > vote.
> > > > > > > > > > > >
> > > > > > > > > > > > Hagay
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <
> > > > szha.pvg@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Hagay and Sandeep,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Could you help us understand why this specific bug
> is
> > > > more
> > > > > > > > > important
> > > > > > > > > > > than
> > > > > > > > > > > > > all the other known bugs, that this becomes a
> release
> > > > > > blocker?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Some facts to consider:
> > > > > > > > > > > > > - The bug exists since SymbolBlock was introduced a
> > > year
> > > > > ago
> > > > > > > and
> > > > > > > > > has
> > > > > > > > > > > > > survived at least three releases, so this is not a
> > > > > > regression.
> > > > > > > > > > > > > - Timeline-wise, this bug was reported on 7/21, but
> > was
> > > > not
> > > > > > > > > reported
> > > > > > > > > > as
> > > > > > > > > > > > > release-blocker in the release discussion thread
> > until
> > > > 8/31
> > > > > > > [1].
> > > > > > > > > > > Neither
> > > > > > > > > > > > > its reporting as release-blocker nor its fix made
> it
> > > for
> > > > > the
> > > > > > > 8/3
> > > > > > > > > code
> > > > > > > > > > > > > freeze.
> > > > > > > > > > > > > - The PR is still not ready yet as it doesn't have
> > > > > approval.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hagay, it would be great if you could provide some
> > > > > additional
> > > > > > > > > > reasoning
> > > > > > > > > > > > > besides "X mentions the issue" or "fix was done by
> > X".
> > > > > > Thanks.
> > > > > > > > > > > > >
> > > > > > > > > > > > > -sz
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
> > > > > > > lupesko@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Sandeep mentions the issue of an error when user
> > > tries
> > > > to
> > > > > > > load
> > > > > > > > > > model
> > > > > > > > > > > > > params
> > > > > > > > > > > > > > trained/saved as FP16.
> > > > > > > > > > > > > >
> > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > > > > > > > > The fix was done by Sandeep:
> > > > > > > > > > > > > >
> > https://github.com/apache/incubator-mxnet/pull/12412
> > > > and
> > > > > > is
> > > > > > > > > ready
> > > > > > > > > > to
> > > > > > > > > > > > be
> > > > > > > > > > > > > > cherry picked into the release branch.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This seems like a release blocker to me:
> > > > > > > > > > > > > > - Basic functionality broken: loading a model
> > (albeit
> > > > one
> > > > > > > that
> > > > > > > > > that
> > > > > > > > > > > was
> > > > > > > > > > > > > > saved as non FP32)
> > > > > > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
> > > > > > ThomasDelteil@
> > > > > > > )
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -1 (non binding)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hagay
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep
> > > krishnamurthy <
> > > > > > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > "- 0"
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I believe the bug #11849
> > > > > > > > > > > > > > > <
> > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > >,
> > > > > > > > > unable
> > > > > > > > > > > to
> > > > > > > > > > > > > > import
> > > > > > > > > > > > > > > non-fp32 models into Gluon, fixed in this PR
> > #12412
> > > > > > > > > > > > > > > <
> > > > https://github.com/apache/incubator-mxnet/pull/12412>
> > > > > > is
> > > > > > > > > > > important
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > users. I would rather pick this fix in this
> > release
> > > > > than
> > > > > > > > plan a
> > > > > > > > > > > minor
> > > > > > > > > > > > > > > release later.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Sandeep
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet -b
> > > > > > 1.3.0.rc0"
> > > > > > > > > works
> > > > > > > > > > > fine
> > > > > > > > > > > > > > now,
> > > > > > > > > > > > > > > > never mind.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Unfortunately, MXNet was depending on a
> > branch
> > > of
> > > > > TVM
> > > > > > > > that
> > > > > > > > > is
> > > > > > > > > > > now
> > > > > > > > > > > > > > > > deleted.
> > > > > > > > > > > > > > > > > We will have to merge #12448
> > > > > > > > > > > > > > > > > <
> > > > > > https://github.com/apache/incubator-mxnet/pull/12448>
> > > > > > > > > > before
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > release.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > > > > > > > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > > > > > > > > > > >.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Philip.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier
> <
> > > > > > > > > > > carinmeier@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >> Checked out the tag, built and tested the
> > > > Clojure
> > > > > > > > package.
> > > > > > > > > > +1
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani
> > > > Nagmote <
> > > > > > > > > > > > > > > > >> roshaninagmote2@gmail.com>
> > > > > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> > Hi all,
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > I would like to propose a vote to
> release
> > > > Apache
> > > > > > > MXNet
> > > > > > > > > > > > > > (incubating)
> > > > > > > > > > > > > > > > >> version
> > > > > > > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now
> (Friday,
> > > Aug
> > > > > > 31st)
> > > > > > > > and
> > > > > > > > > > end
> > > > > > > > > > > at
> > > > > > > > > > > > > > 7:00
> > > > > > > > > > > > > > > PM
> > > > > > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > Link to release notes:
> > > > > > > > > > > > > > > > >> >
> > > > > > https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > > > > > > > > >> > *
> > > > > > > > > > > > >
> > > > > > >
> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > > > > > > > > > >> > <
> > > > > > > > > > > > >
> > > > > > >
> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > > > > > > > > >0*
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > View this page, click on "Build from
> > > Source",
> > > > > and
> > > > > > > use
> > > > > > > > > the
> > > > > > > > > > > > source
> > > > > > > > > > > > > > > code
> > > > > > > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > > > > > > > > > >> >
> > > > > > > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > Please remember to TEST first before
> > voting
> > > > > > > > accordingly:
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > +1 = approve
> > > > > > > > > > > > > > > > >> > +0 = no opinion
> > > > > > > > > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > > > > > > >> > Roshani
> > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sandeep Krishnamurthy
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Anirudh <an...@gmail.com>.
-1. Considering that using FP16 with Gluon is much easier than the
alternative, where you need access to the model code, this fix is really
useful. I understand the pain of doing an MXNet release and appreciate Roshani's
and Sheng's efforts, but this seems like something we should fix.
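
For context on how a -1 like this enters the result: the Apache rule quoted
further down in this thread says a release vote needs at least three
affirmative PMC votes and more positive than negative votes, and that a -1 is
not a veto. Below is a minimal sketch of that check, assuming only binding
(PMC) votes are counted. The function name release_vote_passes and the sample
tallies are hypothetical and are not the actual counts from this vote.

```python
def release_vote_passes(binding_plus_ones: int, binding_minus_ones: int) -> bool:
    """Majority approval as quoted in this thread.

    Assumption: only binding (PMC) votes are passed in. The rule needs at
    least three +1 votes and strictly more +1 votes than -1 votes.
    """
    return binding_plus_ones >= 3 and binding_plus_ones > binding_minus_ones


# Hypothetical tallies, for illustration only.
print(release_vote_passes(3, 1))  # quorum met, +1 outnumber -1
print(release_vote_passes(3, 3))  # quorum met, but +1 do not outnumber -1
print(release_vote_passes(2, 0))  # no -1 at all, but quorum of three not met
```

Even when the check passes, the quoted text leaves the final call with the
release manager, so a passing tally does not by itself force a release.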

On Thu, Sep 6, 2018, 4:57 PM Haibin Lin <ha...@gmail.com> wrote:

> +1 built from source and passes dist_sync_kvstore test on Ubuntu.
>
> Best,
> Haibin
>
> On Thu, Sep 6, 2018 at 1:32 PM Indhu <in...@gmail.com> wrote:
>
> > +1
> >
> > The release candidate looks good. I'm able to build and run basic models.
> >
> > On the FP16 issue:
> >
> > Like others have pointed out, releases are expensive in terms of time and
> > effort. There needs to be a high and more objective bar on what qualifies
> > as a release blocker to make sure we are not setting a precedent for a lot
> > of release blockers in future.
> >
> > I think a release blocker is justified only if there is a serious bug
> > discovered in one of the features included in the release or if there is a
> > regression. Given FP16 support is not a new feature claimed in this
> > release and this is not a regression in this release candidate, I'm
> > inclined to release this candidate and include the FP16 fix in a subsequent
> > release.
> >
> > Thanks,
> > Indu
> >
> > On Wed, Sep 5, 2018 at 10:21 AM Aaron Markham <aaron.s.markham@gmail.com
> >
> > wrote:
> >
> > > 0 (non-binding) If we have a problem that blocks users, and a solution in
> > > hand... then we should fix it, but not at the expense of starting the
> > > release cycle again just for one fix. Users can cherry pick or build from
> > > master if they want the fix right away, right? I'd change my mind to -1 if
> > > this wasn't the case, with good reason, and if the user impact was critical
> > > to adoption or risks abandonment.
> > >
> > >
> > > On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <
> > roshaninagmote2@gmail.com>
> > > wrote:
> > >
> > > > I believe everyone here is working hard to make MXNet a better framework
> > > > for users. It's completely okay to have different opinions; we can decide
> > > > together if this issue is a blocker or not after voting time is over.
> > > >
> > > > As I mentioned before, voting will end at 7 pm today. So there is still
> > > > time to test the release. If there are any other issues anyone finds, I
> > > > will be happy to start the process again and work on RC1. For now, I want
> > > > to encourage everyone to utilize this time and vote. :)
> > > >
> > > > Thanks,
> > > > Roshani
> > > >
> > > > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> > > > sandeep.krishna98@gmail.com> wrote:
> > > >
> > > > >    1. As an Apache MXNet community member, I raised the concern of broken
> > > > >    functionality for the user. I explained and provided the data points on
> > > > >    the issue, the workaround, and why I think it is important. If after all
> > > > >    this, you think my vote is biased by my employer just because a user I
> > > > >    quoted is from Amazon, this is more concerning to me regarding my voting
> > > > >    abilities.
> > > > >    2. My -1 nowhere undermines the huge amount of effort that goes on behind
> > > > >    the scenes for a release to happen. Great respect and recognition for
> > > > >    everyone involved in all the releases of MXNet in the past and this one.
> > > > >    I voted on my judgement of what may be good for the users of MXNet.
> > > > >    3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel free to
> > > > >    decide and progress on the release as we already have >3 +1 in this
> > > > >    thread.
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Sandeep
> > > > >
> > > > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <
> cjolivier01@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > btw, there are no vetoes on package releases:
> > > > > >
> > > > > > VOTES ON PACKAGE RELEASES
> > > > > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
> > > > > >
> > > > > > Votes on whether a package is ready to be released use majority approval
> > > > > > <https://www.apache.org/foundation/glossary.html#MajorityApproval> -- i.e.
> > > > > > at least three PMC members must vote affirmatively for release, and there
> > > > > > must be more positive than negative votes. Releases may not be vetoed.
> > > > > > Generally the community will cancel the release vote if anyone identifies
> > > > > > serious problems, but in most cases the ultimate decision lies with the
> > > > > > individual serving as release manager. The specifics of the process may
> > > > > > vary from project to project, but the 'minimum quorum of three +1 votes'
> > > > > > rule is universal.
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <sz...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > Thanks for sharing your opinions, Thomas. Your recognition of and respect
> > > > > > > for people's efforts in preparing the release candidate are certainly
> > > > > > > appreciated.
> > > > > > >
> > > > > > > Now that the vote is set to fail thanks to the veto, there will be plenty
> > > > > > > of opportunities to include those bug fixes, including the one Zhi
> > > > > > > mentioned [1], which was already merged into master, although Zhi chose
> > > > > > > not to block this release over it [2]. I will be happy to work with
> > > > > > > Roshani to prepare another release candidate once ready.
> > > > > > >
> > > > > > > -sz
> > > > > > >
> > > > > > > [1]
> > > > > > > https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> > > > > > > [2]
> > > > > > > https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> > > > > > >
> > > > > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <
> > > > > thomas.delteil1@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > -0
> > > > > > > > (non-binding)
> > > > > > > >
> > > > > > > > If I may add some nuance plus a personal data point as one of the users
> > > > > > > > commenting in the bug report in question:
> > > > > > > >
> > > > > > > > - Performance vs. basic functionality => I don't think high-performance
> > > > > > > > use cases and basic functionality are two obviously opposed concepts, and
> > > > > > > > I see no contradiction in Hagay's and Sandeep's statements.
> > > > > > > > Float16 support is a feature of MXNet that provides more than twice the
> > > > > > > > performance of Float32 on supported platforms, hence the high-performance
> > > > > > > > use case. The bug is that the basic functionality of reloading a saved
> > > > > > > > float16 model is currently broken.
> > > > > > > >
> > > > > > > > - This bug vs. other bugs => Contrary to the vast majority of the 140 open
> > > > > > > > bugs that are mentioned above, I would put to Sandeep's credit that this
> > > > > > > > one bug has a PR open that provides a fix for it. This would make it a
> > > > > > > > better candidate to get included in this release than a bug that has no
> > > > > > > > fix ready for it.
> > > > > > > >
> > > > > > > > - Personal data point: I recently did some experimentation with float16
> > > > > > > > [1] and, coincidentally, just published a video on optimizing performance
> > > > > > > > for Gluon. Float16 conversion is one of the most effective ways, if not
> > > > > > > > the most effective way, to get performance out of MXNet [2]. I believe
> > > > > > > > there is a lot of value in publicizing its use more and hence making sure
> > > > > > > > at least the basic support for normal use cases is present.
> > > > > > > >
> > > > > > > > Of course this needs to be balanced against the overhead of preparing a
> > > > > > > > new release candidate once the fix is reviewed and merged, which seems to
> > > > > > > > be a lengthy and complex process in its own right, and the delay in
> > > > > > > > providing the other features present in 1.3 to users that are not running
> > > > > > > > off the nightly builds.
> > > > > > > >
> > > > > > > > All the best,
> > > > > > > >
> > > > > > > > Thomas
> > > > > > > >
> > > > > > > > [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> > > > > > > > [2]
> > > > > > > > https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> > > > > > > >
> > > > > > > > Le mar. 4 sept. 2018 à 17:11, Sheng Zha <sz...@gmail.com>
> a
> > > > > écrit :
> > > > > > > >
> > > > > > > > > Sandeep,
> > > > > > > > >
> > > > > > > > > Thanks for explaining your veto. We have open bugs that have impacted a
> > > > > > > > > lot more than just 3 customers, judging by the number of commenters on
> > > > > > > > > the issues [1].
> > > > > > > > >
> > > > > > > > > You said that this is for "high performance use cases", which contradicts
> > > > > > > > > Hagay's assessment that this is "basic functionality broken". Given that
> > > > > > > > > this is for advanced use cases of half-precision training, why is it so
> > > > > > > > > much more important than any other open bug report that, for this
> > > > > > > > > specific bug fix, we have to delay the access of regular users to the new
> > > > > > > > > MXNet 1.3 release by at least another week?
> > > > > > > > >
> > > > > > > > > Honestly, I'm concerned that your vote is biased by Amazon involvement,
> > > > > > > > > given that you quoted Amazon Rekognition.
> > > > > > > > >
> > > > > > > > > -sz
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > > https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> > > > > > > > >
> > > > > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > My initial vote of “-0” was due to lack of info from a user who had said
> > > > > > > > > > he overcame this issue for an FP16 model.
> > > > > > > > > >
> > > > > > > > > > However, the suggested workaround [1] for the issue is not
> > > > > > > > > > straightforward or generally usable for all users. Also, the issue is not
> > > > > > > > > > simple and isolated enough to be listed in the Release Notes as a known
> > > > > > > > > > issue with a workaround.
> > > > > > > > > >
> > > > > > > > > > Changing my vote to: "-1 (binding)" owing to the user impact [3]
> > > > > > > > > >
> > > > > > > > > > @Sheng:
> > > > > > > > > >
> > > > > > > > > > 1. Agreed, the bug has existed for a long time. However, FP16 and such
> > > > > > > > > > optimizations were added later on, followed by users [2] using this
> > > > > > > > > > feature for high performance use cases. It is not OK to measure the
> > > > > > > > > > severity of the bug based on its past existence; rather, we should look
> > > > > > > > > > at who is impacted now and whether it affects a small subset with a
> > > > > > > > > > simple workaround or is a large user-impacting issue.
> > > > > > > > > >
> > > > > > > > > > 2. Agreed, the bug was reported on 7/21. However, I became aware of this
> > > > > > > > > > issue on 08/29 and submitted the fix on 08/30. Also, I did bring this to
> > > > > > > > > > the notice of the community, you, and the 1.3 release manager (Roshani)
> > > > > > > > > > on the RC0 proposal thread. Also, I would focus on the issue and user
> > > > > > > > > > impact rather than on who identified and who is fixing the issue.
> > > > > > > > > >
> > > > > > > > > > Based on my discussion with 2 users, I think it is an important feature
> > > > > > > > > > for them to see in Apache MXNet v1.3.0.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > >
> > > > > > > > > > Sandeep
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [1] Workaround used by the user.
> > > > > > > > > >
> > > > > > > > > > import mxnet as mx
> > > > > > > > > >
> > > > > > > > > > ctx = mx.cpu()  # assumed; ctx is not defined in the original snippet
> > > > > > > > > >
> > > > > > > > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
> > > > > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > > > > > > > > >
> > > > > > > > > > # Cast each parameter to the dtype stored in the file before loading.
> > > > > > > > > > # Keys are expected to carry a prefix before ':' (e.g. 'arg:<name>').
> > > > > > > > > > for k, v in params_fp16.items():
> > > > > > > > > >     new_key = k.split(':')[1]
> > > > > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > > > > > > > > >
> > > > > > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [2] Amazon Rekognition
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [3] User story: Train a model -> Cast it to FP16 -> Save the model ->
> > > > > > > > > > Load the model back. The last step does not work; users have to cast
> > > > > > > > > > every parameter with the workaround mentioned above [1].
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <
> > > > lupesko@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Sheng,
> > > > > > > > > > >
> > > > > > > > > > > Addressing your questions:
> > > > > > > > > > >
> > > > > > > > > > > - "why this specific bug is more important than all the other known
> > > > > > > > > > > bugs, that this becomes a release blocker"
> > > > > > > > > > > I do not consider it to be more or less important than other fixes. It
> > > > > > > > > > > can be fixed and included in the release alongside the rest of the
> > > > > > > > > > > release content, right?
> > > > > > > > > > > From the description of the issue it seems important since it is
> > > > > > > > > > > blocking users from loading models that were previously trained and
> > > > > > > > > > > saved. There is nothing stopping the community from including this fix
> > > > > > > > > > > in 1.3.0, alongside the rest of the features and fixes.
> > > > > > > > > > >
> > > > > > > > > > > - "The bug exists since SymbolBlock was introduced a year ago and has
> > > > > > > > > > > survived at least three releases, so this is not a regression."
> > > > > > > > > > > I do not think I said it is a regression. However, the fact that a bug
> > > > > > > > > > > existed before does not mean it is OK to release it rather than fix it.
> > > > > > > > > > >
> > > > > > > > > > > - "Timeline-wise, this bug was reported on 7/21, but was not reported as
> > > > > > > > > > > release-blocker in the release discussion thread until 8/31 [1]. Neither
> > > > > > > > > > > its reporting as release-blocker nor its fix made it for the 8/3 code
> > > > > > > > > > > freeze."
> > > > > > > > > > > You are right, it would have been better to have this identified and
> > > > > > > > > > > fixed earlier and included before code freeze.
> > > > > > > > > > >
> > > > > > > > > > > - "The PR is still not ready yet as it doesn't have approval."
> > > > > > > > > > > I think it is waiting for your review.
> > > > > > > > > > >
> > > > > > > > > > > - "it would be great if you could provide some additional reasoning
> > > > > > > > > > > besides "X mentions the issue" or "fix was done by X""
> > > > > > > > > > > I have. Repeating what I wrote in my previous email for clarity: Basic
> > > > > > > > > > > functionality broken: loading a model (albeit one that was saved as
> > > > > > > > > > > non FP32)
> > > > > > > > > > >
> > > > > > > > > > > So, yes - this issue seems to have been out there for a while and somehow
> > > > > > > > > > > went under the radar... but I think the key question is whether this
> > > > > > > > > > > blocks a basic functionality in MXNet. I believe so, hence my -1 vote.
> > > > > > > > > > >
> > > > > > > > > > > Hagay
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <
> > > szha.pvg@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Hagay and Sandeep,
> > > > > > > > > > > >
> > > > > > > > > > > > Could you help us understand why this specific bug is
> > > more
> > > > > > > > important
> > > > > > > > > > than
> > > > > > > > > > > > all the other known bugs, that this becomes a release
> > > > > blocker?
> > > > > > > > > > > >
> > > > > > > > > > > > Some facts to consider:
> > > > > > > > > > > > - The bug exists since SymbolBlock was introduced a
> > year
> > > > ago
> > > > > > and
> > > > > > > > has
> > > > > > > > > > > > survived at least three releases, so this is not a
> > > > > regression.
> > > > > > > > > > > > - Timeline-wise, this bug was reported on 7/21, but
> was
> > > not
> > > > > > > > reported
> > > > > > > > > as
> > > > > > > > > > > > release-blocker in the release discussion thread
> until
> > > 8/31
> > > > > > [1].
> > > > > > > > > > Neither
> > > > > > > > > > > > its reporting as release-blocker nor its fix made it
> > for
> > > > the
> > > > > > 8/3
> > > > > > > > code
> > > > > > > > > > > > freeze.
> > > > > > > > > > > > - The PR is still not ready yet as it doesn't have
> > > > approval.
> > > > > > > > > > > >
> > > > > > > > > > > > Hagay, it would be great if you could provide some
> > > > additional
> > > > > > > > > reasoning
> > > > > > > > > > > > besides "X mentions the issue" or "fix was done by
> X".
> > > > > Thanks.
> > > > > > > > > > > >
> > > > > > > > > > > > -sz
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
> > > > > > lupesko@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Sandeep mentions the issue of an error when user
> > tries
> > > to
> > > > > > load
> > > > > > > > > model
> > > > > > > > > > > > params
> > > > > > > > > > > > > trained/saved as FP16.
> > > > > > > > > > > > >
> > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > > > > > > > The fix was done by Sandeep:
> > > > > > > > > > > > >
> https://github.com/apache/incubator-mxnet/pull/12412
> > > and
> > > > > is
> > > > > > > > ready
> > > > > > > > > to
> > > > > > > > > > > be
> > > > > > > > > > > > > cherry picked into the release branch.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This seems like a release blocker to me:
> > > > > > > > > > > > > - Basic functionality broken: loading a model (albeit one that was
> > > > > > > > > > > > > saved as non FP32)
> > > > > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
> > > > > ThomasDelteil@
> > > > > > )
> > > > > > > > > > > > >
> > > > > > > > > > > > > -1 (non binding)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hagay
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep
> > krishnamurthy <
> > > > > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > "- 0"
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I believe the bug #11849
> > > > > > > > > > > > > > <
> > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > >,
> > > > > > > > unable
> > > > > > > > > > to
> > > > > > > > > > > > > import
> > > > > > > > > > > > > > non-fp32 models into Gluon, fixed in this PR
> #12412
> > > > > > > > > > > > > > <
> > > https://github.com/apache/incubator-mxnet/pull/12412>
> > > > > is
> > > > > > > > > > important
> > > > > > > > > > > > for
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > users. I would rather pick this fix in this
> release
> > > > than
> > > > > > > plan a
> > > > > > > > > > minor
> > > > > > > > > > > > > > release later.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Sandeep
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet -b
> > > > > 1.3.0.rc0"
> > > > > > > > works
> > > > > > > > > > fine
> > > > > > > > > > > > > now,
> > > > > > > > > > > > > > > never mind.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Unfortunately, MXNet was depending on a branch of TVM that is now
> > > > > > > > > > > > > > > > deleted.
> > > > > > > > > > > > > > > > We will have to merge #12448
> > > > > > > > > > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12448> before the
> > > > > > > > > > > > > > > > release.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Background: See dmlc/tvm#1394
> > > > > > > > > > > > > > > > <https://github.com/dmlc/tvm/issues/1394>.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Philip.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> > > > > > > > > > carinmeier@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >> Checked out the tag, built and tested the
> > > Clojure
> > > > > > > package.
> > > > > > > > > +1
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani
> > > Nagmote <
> > > > > > > > > > > > > > > >> roshaninagmote2@gmail.com>
> > > > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> > Hi all,
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > I would like to propose a vote to release
> > > Apache
> > > > > > MXNet
> > > > > > > > > > > > > (incubating)
> > > > > > > > > > > > > > > >> version
> > > > > > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now (Friday,
> > Aug
> > > > > 31st)
> > > > > > > and
> > > > > > > > > end
> > > > > > > > > > at
> > > > > > > > > > > > > 7:00
> > > > > > > > > > > > > > PM
> > > > > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > Link to release notes:
> > > > > > > > > > > > > > > >> >
> > > > > https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > > > > > > > >> > *
> > > > > > > > > > > >
> > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > > > > > > > > >> > <
> > > > > > > > > > > >
> > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > > > > > > > >0*
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > View this page, click on "Build from
> > Source",
> > > > and
> > > > > > use
> > > > > > > > the
> > > > > > > > > > > source
> > > > > > > > > > > > > > code
> > > > > > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > > > > > > > > >> >
> > > > > > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > Please remember to TEST first before
> voting
> > > > > > > accordingly:
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > +1 = approve
> > > > > > > > > > > > > > > >> > +0 = no opinion
> > > > > > > > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > > > > > >> > Roshani
> > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sandeep Krishnamurthy
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Haibin Lin <ha...@gmail.com>.
+1 built from source and passes dist_sync_kvstore test on Ubuntu.

Best,
Haibin

On Thu, Sep 6, 2018 at 1:32 PM Indhu <in...@gmail.com> wrote:

> +1
>
> The release candidate looks good. I'm able to build and run basic models.
>
> On the FP16 issue:
>
> As others have pointed out, releases are expensive in terms of time and
> effort. There needs to be a high and more objective bar for what qualifies
> as a release blocker, to make sure we are not setting a precedent for a lot
> of release blockers in the future.
>
> I think a release blocker is justified only if there is a serious bug
> discovered in one of the features included in the release or if there is a
> regression. Given that FP16 support is not a new feature claimed in this
> release and this is not a regression in this release candidate, I'm
> inclined to release this candidate and include the FP16 fix in a subsequent
> release.
>
> Thanks,
> Indu
>
> On Wed, Sep 5, 2018 at 10:21 AM Aaron Markham <aa...@gmail.com>
> wrote:
>
> > 0 (non-binding) If we have a problem that blocks users, and a solution in
> > hand... then we should fix it, but not at the expense of starting the
> > release cycle again just for one fix. Users can cherry pick or build from
> > master if they want the fix right away, right? I'd change my mind to -1 if
> > this wasn't the case, with good reason, and if the user impact was critical
> > to adoption or risks abandonment.
> >
> >
> > On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <roshaninagmote2@gmail.com>
> > wrote:
> >
> > > I believe everyone here is working hard to make MXNet a better framework
> > > for users. It's completely okay to have different opinions, we can decide
> > > together if this issue is a blocker or not after voting time is over.
> > >
> > > As I mentioned before, voting will end at 7 pm today. So there is still
> > > time to test the release. If there are any other issues anyone finds, I
> > > will be happy to start the process again and work on RC1. For now, I want
> > > to encourage everyone to utilize this time and vote. :)
> > >
> > > Thanks,
> > > Roshani
> > >
> > > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> > > sandeep.krishna98@gmail.com> wrote:
> > >
> > > >    1. As an Apache MXNet community member, I raised the concern of broken
> > > >    functionality for the user. I explained and provided the data points on
> > > >    the issue, the workaround, and why I think it is important. If after all
> > > >    this, you think my vote is biased by my employer just because a user I
> > > >    quoted is from Amazon, this is more concerning to me regarding my
> > > >    voting abilities.
> > > >    2. My -1 in no way undermines the huge amount of effort that goes on
> > > >    behind the scenes for a release to happen. Great respect and recognition
> > > >    for everyone involved in all the releases of MXNet in the past and this
> > > >    one. I voted on my judgement of what may be good for the users of MXNet.
> > > >    3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel free
> > > >    to decide and progress on the release, as we already have >3 +1 in this
> > > >    thread.
> > > >
> > > >
> > > > Best,
> > > >
> > > > Sandeep
> > > >
> > > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <cj...@gmail.com>
> > > > wrote:
> > > >
> > > > > btw, there are no vetoes on package releases:
> > > > >
> > > > > VOTES ON PACKAGE RELEASES
> > > > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
> > > > >
> > > > > Votes on whether a package is ready to be released use majority approval
> > > > > <https://www.apache.org/foundation/glossary.html#MajorityApproval> -- i.e.
> > > > > at least three PMC members must vote affirmatively for release, and there
> > > > > must be more positive than negative votes. Releases may not be vetoed.
> > > > > Generally the community will cancel the release vote if anyone identifies
> > > > > serious problems, but in most cases the ultimate decision lies with the
> > > > > individual serving as release manager. The specifics of the process may
> > > > > vary from project to project, but the 'minimum quorum of three +1 votes'
> > > > > rule is universal.
> > > > >
> > > > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <sz...@gmail.com>
> wrote:
> > > > >
> > > > > > Thanks for sharing your opinions, Thomas. Your recognition and respect of
> > > > > > people's efforts on preparing the release candidate are certainly
> > > > > > appreciated.
> > > > > >
> > > > > > Now that the vote is set to fail thanks to the veto, there will be plenty
> > > > > > of opportunities to include those bug fixes, including the one Zhi
> > > > > > mentioned [1], which was already merged in the master and yet chose not to
> > > > > > block this release with [2]. I will be happy to work with Roshani to
> > > > > > prepare another release candidate once ready.
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> > > > > > [2]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <
> > > > thomas.delteil1@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > -0
> > > > > > > (non-binding)
> > > > > > >
> > > > > > > If I may add some nuance plus a personal data point as one of the users
> > > > > > > commenting in the bug report in question:
> > > > > > >
> > > > > > > - Performance vs. Basic functionality => I don't think high performance
> > > > > > > use-cases and basic functionality are two obviously opposed concepts and
> > > > > > > see no contradiction in Hagay's and Sandeep's statements.
> > > > > > > Float16 support is a feature of MXNet that provides more than twice the
> > > > > > > performance of Float32 on supported platforms, hence the high performance
> > > > > > > use-case. The bug is that the basic functionality of reloading a saved
> > > > > > > float16 model is currently broken.
> > > > > > >
> > > > > > > - This bug vs Other bugs => Contrary to the vast majority of the 140 open
> > > > > > > bugs that are mentioned above, I would put to Sandeep's credit that this
> > > > > > > one bug has a PR open that provides a fix for it. This would make it a
> > > > > > > better candidate to get included in this release than a bug that has no
> > > > > > > fix ready for it.
> > > > > > >
> > > > > > > - Personal datapoint: I recently did some experimentation with float16 [1]
> > > > > > > and actually coincidentally just published a video on optimizing
> > > > > > > performance for Gluon. Float16 conversion is one of the most, if not the
> > > > > > > most effective way to get performance out of MXNet [2]. I believe there is
> > > > > > > a lot of value in publicizing its use more and hence making sure at least
> > > > > > > the basic support for normal use-cases is present.
> > > > > > >
> > > > > > > Of course this needs to be balanced with the overhead of preparing a new
> > > > > > > release candidate once the fix is reviewed and merged, which seems to be
> > > > > > > a lengthy and complex process in its own right, and the delay with
> > > > > > > providing the other features present in 1.3 for users that are not running
> > > > > > > off the nightly builds.
> > > > > > >
> > > > > > > All the best,
> > > > > > >
> > > > > > > Thomas
> > > > > > >
> > > > > > > [1]
> https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> > > > > > > [2]
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> > > > > > >
> > > > > > > Le mar. 4 sept. 2018 à 17:11, Sheng Zha <sz...@gmail.com> a
> > > > écrit :
> > > > > > >
> > > > > > > > Sandeep,
> > > > > > > >
> > > > > > > > Thanks for explaining your veto. We have open bugs that impacted a lot
> > > > > > > > more than just 3 customers, just by referring to the number of commenters
> > > > > > > > on the issue [1].
> > > > > > > >
> > > > > > > > You said that this is for "high performance use cases", which contradicts
> > > > > > > > Hagay's assessment that this is "basic functionality broken". Given that
> > > > > > > > this is for advanced use cases of using half-precision training, why is it
> > > > > > > > so much more important than any other open bug reports, that for this
> > > > > > > > specific bug fix, we have to delay the access of regular users to the new
> > > > > > > > MXNet 1.3 release by at least another week?
> > > > > > > >
> > > > > > > > Honestly, I'm concerned that your vote is biased by Amazon involvement,
> > > > > > > > given that you quoted Amazon Rekognition.
> > > > > > > >
> > > > > > > > -sz
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> > > > > > > >
> > > > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > My initial vote of “-0” was due to lack of info from a user who had said
> > > > > > > > > he overcame this issue for an FP16 model.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > However, the suggested workaround [1] for the issue is not straightforward
> > > > > > > > > and generally usable for all users. Also, the issue is not simple and
> > > > > > > > > isolated enough to be listed in the Release Notes as a known issue with a
> > > > > > > > > workaround.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Changing my vote to: "-1 (binding)" owing to the user impact [3]
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > @Sheng:
> > > > > > > > >
> > > > > > > > > 1. Agreed, the bug has existed for a long time. However, FP16 and such
> > > > > > > > > optimizations were added later on, followed by users [2] using this
> > > > > > > > > feature for high performance use cases. It is not OK to measure the
> > > > > > > > > severity of the bug based on its past existence; rather, we can see who is
> > > > > > > > > impacted now, and whether it is a small subset with a simple workaround or
> > > > > > > > > a large, user-impacting issue.
> > > > > > > > >
> > > > > > > > > 2. Agreed, the bug was reported on 7/21. However, I became aware of this
> > > > > > > > > issue on 08/29 and submitted the fix on 08/30. Also, I did bring this to
> > > > > > > > > the notice of the community, you, and the 1.3 release manager (Roshani) on
> > > > > > > > > the RC0 proposal thread. Also, I would focus on the issue and user impact
> > > > > > > > > rather than on who identified and who is fixing the issue.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Based on my discussion with 2 users, I think it is an important feature
> > > > > > > > > for them to see in Apache MXNet v1.3.0.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > Sandeep
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1] Workaround used by the user.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
> > > > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > > > > > > > >
> > > > > > > > > for k, v in params_fp16.items():
> > > > > > > > >     new_key = k.split(':')[1]
> > > > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > > > > > > > >
> > > > > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [2] Amazon Rekognition
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [3] User story: Train a model -> Cast it to FP16 -> Save the model -> Load
> > > > > > > > > back the model does not work. They have to cast every parameter with the
> > > > > > > > > workaround mentioned above [1].
> > > > > > > > >
> > > > > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <
> > > lupesko@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Sheng,
> > > > > > > > > >
> > > > > > > > > > Addressing your questions:
> > > > > > > > > >
> > > > > > > > > > - "why this specific bug is more important than all the other known bugs,
> > > > > > > > > > that this becomes a release blocker"
> > > > > > > > > > I do not consider it to be more or less important than other fixes. It can
> > > > > > > > > > be fixed and included in the release alongside the rest of the release
> > > > > > > > > > content, right?
> > > > > > > > > > From the description of the issue it seems important since it is blocking
> > > > > > > > > > users from loading models that were previously trained and saved. There is
> > > > > > > > > > nothing stopping the community from including this fix into 1.3.0,
> > > > > > > > > > alongside the rest of the features and fixes.
> > > > > > > > > >
> > > > > > > > > > - "The bug exists since SymbolBlock was introduced a year ago and has
> > > > > > > > > > survived at least three releases, so this is not a regression."
> > > > > > > > > > I do not think I said it is a regression. However, the fact that a bug
> > > > > > > > > > existed before does not mean it is OK to release it rather than fix it.
> > > > > > > > > >
> > > > > > > > > > - "Timeline-wise, this bug was reported on 7/21, but was not reported as
> > > > > > > > > > release-blocker in the release discussion thread until 8/31 [1]. Neither
> > > > > > > > > > its reporting as release-blocker nor its fix made it for the 8/3 code
> > > > > > > > > > freeze."
> > > > > > > > > > You are right, it would have been better to have this identified and fixed
> > > > > > > > > > earlier and included before code freeze.
> > > > > > > > > >
> > > > > > > > > > - "The PR is still not ready yet as it doesn't have approval."
> > > > > > > > > > I think it is waiting for your review.
> > > > > > > > > >
> > > > > > > > > > - "it would be great if you could provide some additional reasoning besides
> > > > > > > > > > "X mentions the issue" or "fix was done by X""
> > > > > > > > > > I have. Repeating what I wrote in my previous email for clarity: Basic
> > > > > > > > > > functionality broken: loading a model (albeit one that was saved as
> > > > > > > > > > non FP32)
> > > > > > > > > >
> > > > > > > > > > So, yes - this issue seems to have been out there for a while, somehow went
> > > > > > > > > > under the radar... but I think the key question is whether this blocks a
> > > > > > > > > > basic functionality in MXNet. I believe so, hence my -1 vote.
> > > > > > > > > >
> > > > > > > > > > Hagay
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <
> > szha.pvg@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Hagay and Sandeep,
> > > > > > > > > > >
> > > > > > > > > > > Could you help us understand why this specific bug is more important than
> > > > > > > > > > > all the other known bugs, that this becomes a release blocker?
> > > > > > > > > > >
> > > > > > > > > > > Some facts to consider:
> > > > > > > > > > > - The bug exists since SymbolBlock was introduced a year ago and has
> > > > > > > > > > > survived at least three releases, so this is not a regression.
> > > > > > > > > > > - Timeline-wise, this bug was reported on 7/21, but was not reported as
> > > > > > > > > > > release-blocker in the release discussion thread until 8/31 [1]. Neither
> > > > > > > > > > > its reporting as release-blocker nor its fix made it for the 8/3 code
> > > > > > > > > > > freeze.
> > > > > > > > > > > - The PR is still not ready yet as it doesn't have approval.
> > > > > > > > > > >
> > > > > > > > > > > Hagay, it would be great if you could provide some additional reasoning
> > > > > > > > > > > besides "X mentions the issue" or "fix was done by X". Thanks.
> > > > > > > > > > >
> > > > > > > > > > > -sz
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
> > > > > lupesko@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Sandeep mentions the issue of an error when a user tries to load model
> > > > > > > > > > > > params trained/saved as FP16.
> > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > > > > > > The fix was done by Sandeep:
> > > > > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12412 and is ready to be
> > > > > > > > > > > > cherry picked into the release branch.
> > > > > > > > > > > >
> > > > > > > > > > > > This seems like a release blocker to me:
> > > > > > > > > > > > - Basic functionality broken: loading a model (albeit one that was saved
> > > > > > > > > > > > as non FP32)
> > > > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
> > > > > > > > > > > >
> > > > > > > > > > > > -1 (non binding)
> > > > > > > > > > > >
> > > > > > > > > > > > Hagay
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep
> krishnamurthy <
> > > > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > "- 0"
> > > > > > > > > > > > >
> > > > > > > > > > > > > I believe the bug #11849
> > > > > > > > > > > > > <
> > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > >,
> > > > > > > unable
> > > > > > > > > to
> > > > > > > > > > > > import
> > > > > > > > > > > > > non-fp32 models into Gluon, fixed in this PR #12412
> > > > > > > > > > > > > <
> > https://github.com/apache/incubator-mxnet/pull/12412>
> > > > is
> > > > > > > > > important
> > > > > > > > > > > for
> > > > > > > > > > > > > the
> > > > > > > > > > > > > users. I would rather pick this fix in this release
> > > than
> > > > > > plan a
> > > > > > > > > minor
> > > > > > > > > > > > > release later.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Sandeep
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet -b
> > > > 1.3.0.rc0"
> > > > > > > works
> > > > > > > > > fine
> > > > > > > > > > > > now,
> > > > > > > > > > > > > > never mind.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Unfortunately, MXNet was depending on a branch
> of
> > > TVM
> > > > > > that
> > > > > > > is
> > > > > > > > > now
> > > > > > > > > > > > > > deleted.
> > > > > > > > > > > > > > > We will have to merge #12448
> > > > > > > > > > > > > > > <
> > > > https://github.com/apache/incubator-mxnet/pull/12448>
> > > > > > > > before
> > > > > > > > > > the
> > > > > > > > > > > > > > release.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > > > > > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > > > > > > > > >.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Philip.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> > > > > > > > > carinmeier@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> Checked out the tag, built and tested the
> > Clojure
> > > > > > package.
> > > > > > > > +1
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani
> > Nagmote <
> > > > > > > > > > > > > > >> roshaninagmote2@gmail.com>
> > > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> > Hi all,
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > I would like to propose a vote to release
> > Apache
> > > > > MXNet
> > > > > > > > > > > > (incubating)
> > > > > > > > > > > > > > >> version
> > > > > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now (Friday,
> Aug
> > > > 31st)
> > > > > > and
> > > > > > > > end
> > > > > > > > > at
> > > > > > > > > > > > 7:00
> > > > > > > > > > > > > PM
> > > > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Link to release notes:
> > > > > > > > > > > > > > >> >
> > > > https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > > > > > > >> > *
> > > > > > > > > > >
> > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > > > > > > > >> > <
> > > > > > > > > > >
> > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > > > > > > >0*
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > View this page, click on "Build from
> Source",
> > > and
> > > > > use
> > > > > > > the
> > > > > > > > > > source
> > > > > > > > > > > > > code
> > > > > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > > > > > > > >> >
> > > > > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Please remember to TEST first before voting
> > > > > > accordingly:
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > +1 = approve
> > > > > > > > > > > > > > >> > +0 = no opinion
> > > > > > > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > > > > >> > Roshani
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Sandeep Krishnamurthy
> > > >
> > >
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Indhu <in...@gmail.com>.
+1

The release candidate looks good. I'm able to build and run basic models.

On the FP16 issue:

As others have pointed out, releases are expensive in terms of time and
effort. There needs to be a high and more objective bar for what qualifies
as a release blocker, to make sure we are not setting a precedent for a lot
of release blockers in the future.

I think a release blocker is justified only if there is a serious bug
discovered in one of the features included in the release or if there is a
regression. Given that FP16 support is not a new feature claimed in this
release and this is not a regression in this release candidate, I'm
inclined to release this candidate and include the FP16 fix in a subsequent
release.

Thanks,
Indu

On Wed, Sep 5, 2018 at 10:21 AM Aaron Markham <aa...@gmail.com>
wrote:

> 0 (non-binding) If we have a problem that blocks users, and a solution in
> hand... then we should fix it, but not at the expense of starting the
> release cycle again just for one fix. Users can cherry pick or build from
> master if they want the fix right away, right? I'd change my mind to -1 if
> this wasn't the case, with good reason, and if the user impact was critical
> to adoption or risks abandonment.
>
>
> On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <ro...@gmail.com>
> wrote:
>
> > I believe everyone here is working hard to make MXNet a better framework
> > for users. It's completely okay to have different opinions, we can decide
> > together if this issue is a blocker or not after voting time is over.
> >
> > As I mentioned before, voting will end at 7 pm today. So there is still
> > time to test the release. If there are any other issues anyone finds, I
> > will be happy to start the process again and work on RC1. For now, I want
> > to encourage everyone to utilize this time and vote. :)
> >
> > Thanks,
> > Roshani
> >
> > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> > sandeep.krishna98@gmail.com> wrote:
> >
> > >    1. As an Apache MXNet community member, I raised the concern of broken
> > >    functionality for the user. I explained and provided the data points on
> > >    the issue, the workaround, and why I think it is important. If after all
> > >    this, you think my vote is biased by my employer just because a user I
> > >    quoted is from Amazon, this is more concerning to me regarding my
> > >    voting abilities.
> > >    2. My -1 in no way undermines the huge amount of effort that goes on
> > >    behind the scenes for a release to happen. Great respect and recognition
> > >    for everyone involved in all the releases of MXNet in the past and this
> > >    one. I voted on my judgement of what may be good for the users of MXNet.
> > >    3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel free
> > >    to decide and progress on the release, as we already have >3 +1 in this
> > >    thread.
> > >
> > >
> > > Best,
> > >
> > > Sandeep
> > >
> > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <cj...@gmail.com>
> > > wrote:
> > >
> > > > btw, there are no vetoes on package releases:
> > > >
> > > > VOTES ON PACKAGE RELEASES
> > > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
> > > >
> > > > Votes on whether a package is ready to be released use majority
> > > > approval
> > > > <https://www.apache.org/foundation/glossary.html#MajorityApproval> --
> > > > i.e. at least three PMC members must vote affirmatively for release,
> > > > and there must be more positive than negative votes. Releases may not
> > > > be vetoed. Generally the community will cancel the release vote if
> > > > anyone identifies serious problems, but in most cases the ultimate
> > > > decision lies with the individual serving as release manager. The
> > > > specifics of the process may vary from project to project, but the
> > > > 'minimum quorum of three +1 votes' rule is universal.
> > > >
> > > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <sz...@gmail.com> wrote:
> > > >
> > > > > Thanks for sharing your opinions, Thomas. Your recognition and
> > > > > respect of people's efforts on preparing the release candidate are
> > > > > certainly appreciated.
> > > > >
> > > > > Now that the vote is set to fail thanks to the veto, there will be
> > > > > plenty of opportunities to include those bug fixes, including the
> > > > > one Zhi mentioned [1], which was already merged into master and yet
> > > > > was not used to block this release [2]. I will be happy to work
> > > > > with Roshani to prepare another release candidate once ready.
> > > > >
> > > > > -sz
> > > > >
> > > > > [1]
> > > > > https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> > > > > [2]
> > > > > https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> > > > >
> > > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <thomas.delteil1@gmail.com> wrote:
> > > > >
> > > > > > -0 (non-binding)
> > > > > >
> > > > > > If I may add some nuance plus a personal data point as one of the
> > > > > > users commenting on the bug report in question:
> > > > > >
> > > > > > - Performance vs. basic functionality => I don't think
> > > > > > high-performance use cases and basic functionality are obviously
> > > > > > opposed concepts, and I see no contradiction between Hagay's and
> > > > > > Sandeep's statements. Float16 support is a feature of MXNet that
> > > > > > provides more than twice the performance of Float32 on supported
> > > > > > platforms, hence the high-performance use case. The bug is that
> > > > > > the basic functionality of reloading a saved float16 model is
> > > > > > currently broken.
> > > > > >
> > > > > > - This bug vs. other bugs => Contrary to the vast majority of the
> > > > > > 140 open bugs mentioned above, I would put to Sandeep's credit
> > > > > > that this one bug has a PR open that provides a fix for it. This
> > > > > > would make it a better candidate for inclusion in this release
> > > > > > than a bug that has no fix ready.
> > > > > >
> > > > > > - Personal data point: I recently did some experimentation with
> > > > > > float16 [1] and, coincidentally, just published a video on
> > > > > > optimizing performance for Gluon. Float16 conversion is one of
> > > > > > the most effective ways, if not the most effective way, to get
> > > > > > performance out of MXNet [2]. I believe there is a lot of value
> > > > > > in publicizing its use more, and hence in making sure that at
> > > > > > least the basic support for normal use cases is present.
> > > > > >
> > > > > > Of course, this needs to be balanced against the overhead of
> > > > > > preparing a new release candidate once the fix is reviewed and
> > > > > > merged, which seems to be a lengthy and complex process in its
> > > > > > own right, and against the delay in providing the other features
> > > > > > present in 1.3 to users who are not running off the nightly
> > > > > > builds.
> > > > > >
> > > > > > All the best,
> > > > > >
> > > > > > Thomas
> > > > > >
> > > > > > [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> > > > > > [2]
> > > > > > https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 5:11 PM Sheng Zha <sz...@gmail.com> wrote:
> > > > > >
> > > > > > > Sandeep,
> > > > > > >
> > > > > > > Thanks for explaining your veto. We have open bugs that have
> > > > > > > impacted a lot more than just 3 customers, judging by the
> > > > > > > number of commenters on the issues [1].
> > > > > > >
> > > > > > > You said that this is for "high performance use cases", which
> > > > > > > contradicts Hagay's assessment that this is "basic
> > > > > > > functionality broken". Given that this is for advanced use
> > > > > > > cases of half-precision training, why is it so much more
> > > > > > > important than any other open bug report that, for this
> > > > > > > specific bug fix, we have to delay regular users' access to the
> > > > > > > new MXNet 1.3 release by at least another week?
> > > > > > >
> > > > > > > Honestly, I'm concerned that your vote is biased by Amazon
> > > > > > > involvement, given that you quoted Amazon Rekognition.
> > > > > > >
> > > > > > > -sz
> > > > > > >
> > > > > > > [1]
> > > > > > > https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> > > > > > >
> > > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > >
> > > > > > > > My initial vote of “-0” was due to a lack of information from
> > > > > > > > a user who had said he overcame this issue for an FP16 model.
> > > > > > > >
> > > > > > > > However, the suggested workaround [1] for the issue is not
> > > > > > > > straightforward or generally usable for all users. Also, the
> > > > > > > > issue is not simple and isolated enough to be listed in the
> > > > > > > > Release Notes as a known issue with a workaround.
> > > > > > > >
> > > > > > > > Changing my vote to: "-1 (binding)" owing to the user impact
> > > > > > > > [3]
> > > > > > > >
> > > > > > > > @Sheng:
> > > > > > > >
> > > > > > > > 1. Agreed, the bug has existed for a long time. However, FP16
> > > > > > > > and similar optimizations were added later on, followed by
> > > > > > > > users [2] adopting this feature for high-performance use
> > > > > > > > cases. It is not OK to measure the severity of the bug by how
> > > > > > > > long it has existed; rather, we should look at who is
> > > > > > > > impacted now, and whether it is a small subset with a simple
> > > > > > > > workaround or an issue with large user impact.
> > > > > > > >
> > > > > > > > 2. Agreed, the bug was reported on 7/21. However, I became
> > > > > > > > aware of this issue on 08/29 and submitted the fix on 08/30.
> > > > > > > > I also brought this to the notice of the community, you, and
> > > > > > > > the 1.3 release manager (Roshani) on the RC0 proposal thread.
> > > > > > > > Also, I would focus on the issue and its user impact rather
> > > > > > > > than on who identified it and who is fixing it.
> > > > > > > >
> > > > > > > > Based on my discussion with 2 users, I think it is an
> > > > > > > > important feature for them to see in Apache MXNet v1.3.0.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Sandeep
> > > > > > > >
> > > > > > > >
> > > > > > > > [1] Workaround used by the user.
> > > > > > > >
> > > > > > > >
> > > > > > > > import mxnet as mx
> > > > > > > >
> > > > > > > > # ctx: the target device context, e.g. mx.gpu(0); it must be
> > > > > > > > # defined by the caller.
> > > > > > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
> > > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > > > > > > >
> > > > > > > > # Cast each parameter of the block to the dtype stored in the file.
> > > > > > > > for k, v in params_fp16.items():
> > > > > > > >     new_key = k.split(':')[1]  # drop the 'arg:' / 'aux:' prefix
> > > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > > > > > > >
> > > > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
> > > > > > > >
> > > > > > > >
> > > > > > > > [2] Amazon Rekognition
> > > > > > > >
> > > > > > > >
> > > > > > > > [3] User story: Train a model -> Cast it to FP16 -> Save the
> > > > > > > > model -> Load the model back does not work. Users have to
> > > > > > > > cast every parameter using the workaround mentioned above
> > > > > > > > [1].
> > > > > > > >
> > > > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lupesko@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Sheng,
> > > > > > > > >
> > > > > > > > > Addressing your questions:
> > > > > > > > >
> > > > > > > > > - "why this specific bug is more important than all the
> > > > > > > > > other known bugs, that this becomes a release blocker"
> > > > > > > > > I do not consider it to be more or less important than
> > > > > > > > > other fixes. It can be fixed and included in the release
> > > > > > > > > alongside the rest of the release content, right?
> > > > > > > > > From the description of the issue it seems important, since
> > > > > > > > > it is blocking users from loading models that were
> > > > > > > > > previously trained and saved. There is nothing stopping the
> > > > > > > > > community from including this fix in 1.3.0, alongside the
> > > > > > > > > rest of the features and fixes.
> > > > > > > > >
> > > > > > > > > - "The bug exists since SymbolBlock was introduced a year
> > > > > > > > > ago and has survived at least three releases, so this is
> > > > > > > > > not a regression."
> > > > > > > > > I do not think I said it is a regression. However, the fact
> > > > > > > > > that a bug existed before does not mean it is OK to release
> > > > > > > > > with it rather than fix it.
> > > > > > > > >
> > > > > > > > > - "Timeline-wise, this bug was reported on 7/21, but was
> > > > > > > > > not reported as release-blocker in the release discussion
> > > > > > > > > thread until 8/31 [1]. Neither its reporting as
> > > > > > > > > release-blocker nor its fix made it for the 8/3 code
> > > > > > > > > freeze."
> > > > > > > > > You are right, it would have been better to have this
> > > > > > > > > identified and fixed earlier and included before the code
> > > > > > > > > freeze.
> > > > > > > > >
> > > > > > > > > - "The PR is still not ready yet as it doesn't have
> > > > > > > > > approval."
> > > > > > > > > I think it is waiting for your review.
> > > > > > > > >
> > > > > > > > > - "it would be great if you could provide some additional
> > > > > > > > > reasoning besides "X mentions the issue" or "fix was done
> > > > > > > > > by X""
> > > > > > > > > I have. Repeating what I wrote in my previous email for
> > > > > > > > > clarity: Basic functionality broken: loading a model
> > > > > > > > > (albeit one that was saved as non-FP32)
> > > > > > > > >
> > > > > > > > > So, yes - this issue seems to have been out there for a
> > > > > > > > > while and somehow went under the radar... but I think the
> > > > > > > > > key question is whether this blocks a basic functionality
> > > > > > > > > in MXNet. I believe so, hence my -1 vote.
> > > > > > > > >
> > > > > > > > > Hagay
> > > > > > > > >
> > > > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <szha.pvg@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Hagay and Sandeep,
> > > > > > > > > >
> > > > > > > > > > Could you help us understand why this specific bug is
> > > > > > > > > > more important than all the other known bugs, that this
> > > > > > > > > > becomes a release blocker?
> > > > > > > > > >
> > > > > > > > > > Some facts to consider:
> > > > > > > > > > - The bug exists since SymbolBlock was introduced a year
> > > > > > > > > > ago and has survived at least three releases, so this is
> > > > > > > > > > not a regression.
> > > > > > > > > > - Timeline-wise, this bug was reported on 7/21, but was
> > > > > > > > > > not reported as release-blocker in the release discussion
> > > > > > > > > > thread until 8/31 [1]. Neither its reporting as
> > > > > > > > > > release-blocker nor its fix made it for the 8/3 code
> > > > > > > > > > freeze.
> > > > > > > > > > - The PR is still not ready yet as it doesn't have
> > > > > > > > > > approval.
> > > > > > > > > >
> > > > > > > > > > Hagay, it would be great if you could provide some
> > > > > > > > > > additional reasoning besides "X mentions the issue" or
> > > > > > > > > > "fix was done by X". Thanks.
> > > > > > > > > >
> > > > > > > > > > -sz
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > > https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lupesko@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Sandeep mentions the issue of an error when a user
> > > > > > > > > > > tries to load model params trained/saved as FP16.
> > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > > > > > The fix was done by Sandeep:
> > > > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12412
> > > > > > > > > > > and is ready to be cherry picked into the release
> > > > > > > > > > > branch.
> > > > > > > > > > >
> > > > > > > > > > > This seems like a release blocker to me:
> > > > > > > > > > > - Basic functionality broken: loading a model (albeit
> > > > > > > > > > > one that was saved as non-FP32)
> > > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
> > > > > > > > > > > ThomasDelteil@)
> > > > > > > > > > >
> > > > > > > > > > > -1 (non binding)
> > > > > > > > > > >
> > > > > > > > > > > Hagay
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > "- 0"
> > > > > > > > > > > >
> > > > > > > > > > > > I believe the fix for bug #11849
> > > > > > > > > > > > <https://github.com/apache/incubator-mxnet/issues/11849>
> > > > > > > > > > > > (unable to import non-fp32 models into Gluon),
> > > > > > > > > > > > provided in PR #12412
> > > > > > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12412>,
> > > > > > > > > > > > is important for users. I would rather pick this fix
> > > > > > > > > > > > into this release than plan a minor release later.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Sandeep
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <chohyu01@cs.washington.edu> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > > > > > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0"
> > > > > > > > > > > > > works fine now, never mind.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <chohyu01@cs.washington.edu> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Unfortunately, MXNet was depending on a branch of
> > > > > > > > > > > > > > TVM that is now deleted. We will have to merge
> > > > > > > > > > > > > > #12448
> > > > > > > > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12448>
> > > > > > > > > > > > > > before the release.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Background: See dmlc/tvm#1394
> > > > > > > > > > > > > > <https://github.com/dmlc/tvm/issues/1394>.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Philip.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <carinmeier@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> Checked out the tag, built and tested the Clojure
> > > > > > > > > > > > > >> package. +1
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <roshaninagmote2@gmail.com> wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> > Hi all,
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > I would like to propose a vote to release Apache
> > > > > > > > > > > > > >> > MXNet (incubating) version 1.3.0.RC0. Voting will
> > > > > > > > > > > > > >> > start now (Friday, Aug 31st) and end at 7:00 PM
> > > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Link to release notes:
> > > > > > > > > > > > > >> > https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > > > > > >> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > View this page, click on "Build from Source", and
> > > > > > > > > > > > > >> > use the source code obtained from 1.3.0.rc0 tag:
> > > > > > > > > > > > > >> > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Please remember to TEST first before voting
> > > > > > > > > > > > > >> > accordingly:
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > +1 = approve
> > > > > > > > > > > > > >> > +0 = no opinion
> > > > > > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > > > >> > Roshani
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Sandeep Krishnamurthy
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sandeep Krishnamurthy
> > >
> >
>
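
A note on the workaround quoted above [1]: its `k.split(':')[1]` step only makes sense if every key in the saved .params file carries a type prefix. The sketch below assumes the 'arg:' / 'aux:' key convention of MXNet symbolic checkpoints; the parameter names and dtype strings are made-up stand-ins rather than values from the ResNet-34 files, and `strip_param_prefix` is a hypothetical helper, not an MXNet API. It runs without MXNet installed.

```python
def strip_param_prefix(key: str) -> str:
    """Return the bare parameter name for a key such as 'arg:conv0_weight'.

    Unlike k.split(':')[1], this tolerates keys that have no prefix.
    """
    _prefix, sep, name = key.partition(':')
    return name if sep else key


# Stand-in for what mx.nd.load(...) would return: keys mapped to arrays.
# Plain dtype strings are used here so the sketch needs no MXNet install.
saved_params = {
    'arg:conv0_weight': 'float16',
    'aux:bn0_moving_mean': 'float32',
}

# Bare name -> dtype, i.e. the lookup the workaround performs before
# calling .cast(dtype) on each Gluon parameter.
dtype_by_name = {strip_param_prefix(k): v for k, v in saved_params.items()}
print(dtype_by_name)
```

In the real workaround the values are NDArrays and `v.dtype` supplies the target dtype for each cast.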

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Roshani Nagmote <ro...@gmail.com>.
Thanks Kellen and Naveen for pointing it out.
We now have three committer +1 votes, which is enough to move forward with
the release. But it would be great if more people could test the release.

I am extending the voting deadline to 7 pm today. Please test and vote.

Thanks,
Roshani
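
For reference, the majority-approval rule quoted in this thread (at least three binding +1 votes, and more positive than negative votes) can be sketched as a small check. This is only an illustration of the quoted wording: the function name is hypothetical, and it assumes that only binding votes are counted on both sides. Who holds a binding vote is a project matter and is not modeled here.

```python
def release_vote_passes(binding_plus_ones: int, binding_minus_ones: int) -> bool:
    """Majority approval as quoted from the Apache voting page.

    Passes when there are at least three binding +1 votes and more
    binding +1 votes than binding -1 votes. A single -1 is not a veto.
    """
    return binding_plus_ones >= 3 and binding_plus_ones > binding_minus_ones


print(release_vote_passes(3, 1))  # three +1 against one -1
print(release_vote_passes(2, 0))  # below the three-vote quorum
```

Even when the check passes, the quoted text leaves the final call to the release manager if serious problems are identified.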

On Thu, Sep 6, 2018 at 5:46 AM Naveen Swamy <mn...@gmail.com> wrote:

> +1
>
>
> Roshani/Sheng,
>
> Thanks for putting this release together. I was only able to test the
> release now. As Kellen indicated, this release does not have enough
> committer votes, so I suggest you extend the timeline.
>
> I downloaded the source code from
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.3.0.rc0/.
>
> I verified the signature of the release and built the Scala package from
> this source. I was able to run the Scala unit tests and integration tests
> successfully.
>
> Also, IMO, although Sandeep thought the fix would be good to include in
> the release, I would not consider the issue a release blocker, since it
> has a workaround; you can add it to the release notes as a link to the
> GitHub issue with the workaround.
>
> Other notes (consider adding to retrospective):
>
> On running gpg --verify, I received a message that the signature from
> Sheng Zha is good, along with a warning ("gpg: WARNING: This key is not
> certified with a trusted signature!"). On researching, I found this is
> fine [1], and the fingerprint matches Sheng's key here:
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/KEYS.
>
> Next time, please send a link to the source and signatures on the Apache
> dist server.
>
> I am currently working with Qing to create and test a Maven package for
> Scala; please wait and add that to the announcement email.
>
> Next time, please give a day or two after the RC is cut so we can create
> packages for the various language bindings (Scala, Clojure, R); this is
> currently manual. That way, the packages that users actually use get
> tested during the RC phase.
>
> During the release, I suggest the release manager communicate regularly
> (daily) on dev@ until an announcement is made, so everyone is aware of the
> status and can plan their work to accommodate building packages, testing
> the RC, etc.
>
> [1]
> http://www.apache.org/dev/release-signing.html#valid-untrusted-vs-invalid-trusted
>
>
> Thanks, Naveen
>
>
>
> On Wed, Sep 5, 2018 at 10:20 AM, Aaron Markham <aa...@gmail.com>
> wrote:
>
> > 0 (non-binding) If we have a problem that blocks users, and a solution in
> > hand... then we should fix it, but not at the expense of starting the
> > release cycle again just for one fix. Users can cherry pick or build from
> > master if they want the fix right away, right? I'd change my mind to -1
> > if this wasn't the case, with good reason, and if the user impact was
> > critical to adoption or risks abandonment.
> >
> >
> > On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <roshaninagmote2@gmail.com>
> > wrote:
> >
> > > I believe everyone here is working hard to make MXNet a better framework
> > > for users. It's completely okay to have different opinions, we can decide
> > > together if this issue is a blocker or not after voting time is over.
> > >
> > > As I mentioned before, voting will end at 7 pm today. So there is still
> > > time to test the release. If there are any other issues anyone finds, I
> > > will be happy to start the process again and work on RC1. For now, I want
> > > to encourage everyone to utilize this time and vote. :)
> > >
> > > Thanks,
> > > Roshani
> > >
> > > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> > > sandeep.krishna98@gmail.com> wrote:
> > >
> > > >    1. As an Apache MXNet community member, I raised the concern of
> > > >    broken functionality for the user. I explained and provided the data
> > > >    points on the issue, the workaround, and why I think it is important.
> > > >    If, after all this, you think my vote is biased by my employer just
> > > >    because a user I quoted is from Amazon, that is more concerning to me
> > > >    with regard to my voting abilities.
> > > >    2. My -1 in no way undermines the huge amount of effort that goes on
> > > >    behind the scenes for a release to happen. Great respect and
> > > >    recognition for everyone involved in all the releases of MXNet, past
> > > >    and present. I voted on my judgement of what may be good for the
> > > >    users of MXNet.
> > > >    3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel
> > > >    free to decide and progress on the release, as we already have >3 +1
> > > >    votes in this thread.
> > > >
> > > >
> > > > Best,
> > > >
> > > > Sandeep
> > > >
> > > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <cj...@gmail.com>
> > > > wrote:
> > > >
> > > > > btw, there are no vetoes on package releases:
> > > > >
> > > > > VOTES ON PACKAGE RELEASES
> > > > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
> > > > >
> > > > > Votes on whether a package is ready to be released use majority
> > > > > approval
> > > > > <https://www.apache.org/foundation/glossary.html#MajorityApproval> --
> > > > > i.e. at least three PMC members must vote affirmatively for release,
> > > > > and there must be more positive than negative votes. Releases may not
> > > > > be vetoed. Generally the community will cancel the release vote if
> > > > > anyone identifies serious problems, but in most cases the ultimate
> > > > > decision lies with the individual serving as release manager. The
> > > > > specifics of the process may vary from project to project, but the
> > > > > 'minimum quorum of three +1 votes' rule is universal.
> > > > >
> > > > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <sz...@gmail.com> wrote:
> > > > >
> > > > > > Thanks for sharing your opinions, Thomas. Your recognition and
> > > > > > respect of people's efforts on preparing the release candidate are
> > > > > > certainly appreciated.
> > > > > >
> > > > > > Now that the vote is set to fail thanks to the veto, there will be
> > > > > > plenty of opportunities to include those bug fixes, including the
> > > > > > one Zhi mentioned [1], which was already merged into master and yet
> > > > > > was not used to block this release [2]. I will be happy to work
> > > > > > with Roshani to prepare another release candidate once ready.
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > > [1]
> > > > > > https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> > > > > > [2]
> > > > > > https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <thomas.delteil1@gmail.com> wrote:
> > > > > >
> > > > > > > -0 (non-binding)
> > > > > > >
> > > > > > > If I may add some nuance plus a personal data point as one of
> > > > > > > the users commenting on the bug report in question:
> > > > > > >
> > > > > > > - Performance vs. basic functionality => I don't think
> > > > > > > high-performance use cases and basic functionality are
> > > > > > > obviously opposed concepts, and I see no contradiction between
> > > > > > > Hagay's and Sandeep's statements. Float16 support is a feature
> > > > > > > of MXNet that provides more than twice the performance of
> > > > > > > Float32 on supported platforms, hence the high-performance use
> > > > > > > case. The bug is that the basic functionality of reloading a
> > > > > > > saved float16 model is currently broken.
> > > > > > >
> > > > > > > - This bug vs. other bugs => Contrary to the vast majority of
> > > > > > > the 140 open bugs mentioned above, I would put to Sandeep's
> > > > > > > credit that this one bug has a PR open that provides a fix for
> > > > > > > it. This would make it a better candidate for inclusion in this
> > > > > > > release than a bug that has no fix ready.
> > > > > > >
> > > > > > > - Personal data point: I recently did some experimentation with
> > > > > > > float16 [1] and, coincidentally, just published a video on
> > > > > > > optimizing performance for Gluon. Float16 conversion is one of
> > > > > > > the most effective ways, if not the most effective way, to get
> > > > > > > performance out of MXNet [2]. I believe there is a lot of value
> > > > > > > in publicizing its use more, and hence in making sure that at
> > > > > > > least the basic support for normal use cases is present.
> > > > > > >
> > > > > > > Of course, this needs to be balanced against the overhead of
> > > > > > > preparing a new release candidate once the fix is reviewed and
> > > > > > > merged, which seems to be a lengthy and complex process in its
> > > > > > > own right, and against the delay in providing the other
> > > > > > > features present in 1.3 to users who are not running off the
> > > > > > > nightly builds.
> > > > > > >
> > > > > > > All the best,
> > > > > > >
> > > > > > > Thomas
> > > > > > >
> > > > > > > [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> > > > > > > [2]
> > > > > > > https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> > > > > > >
> > > > > > > On Tue, Sep 4, 2018 at 5:11 PM Sheng Zha <sz...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Sandeep,
> > > > > > > >
> > > > > > > > Thanks for explaining your veto. We have open bugs that have
> > > > > > > > impacted a lot more than just 3 customers, judging by the
> > > > > > > > number of commenters on the issues [1].
> > > > > > > >
> > > > > > > > You said that this is for "high performance use cases", which
> > > > > > > > contradicts Hagay's assessment that this is "basic
> > > > > > > > functionality broken". Given that this is for advanced use
> > > > > > > > cases of half-precision training, why is it so much more
> > > > > > > > important than any other open bug report that, for this
> > > > > > > > specific bug fix, we have to delay regular users' access to
> > > > > > > > the new MXNet 1.3 release by at least another week?
> > > > > > > >
> > > > > > > > Honestly, I'm concerned that your vote is biased by Amazon
> > > > > > > > involvement, given that you quoted Amazon Rekognition.
> > > > > > > >
> > > > > > > > -sz
> > > > > > > >
> > > > > > > > [1]
> > > > > > > > https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> > > > > > > >
> > > > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > My initial vote of “-0” was due to a lack of information
> > > > > > > > > from a user who had said he overcame this issue for an FP16
> > > > > > > > > model.
> > > > > > > > >
> > > > > > > > > However, the suggested workaround [1] for the issue is not
> > > > > > > > > straightforward or generally usable for all users. Also,
> > > > > > > > > the issue is not simple and isolated enough to be listed in
> > > > > > > > > the Release Notes as a known issue with a workaround.
> > > > > > > > >
> > > > > > > > > Changing my vote to: "-1 (binding)" owing to the user
> > > > > > > > > impact [3]
> > > > > > > > >
> > > > > > > > > @Sheng:
> > > > > > > > >
> > > > > > > > > 1. Agreed, the bug has existed for a long time. However,
> > > > > > > > > FP16 and similar optimizations were added later on,
> > > > > > > > > followed by users [2] adopting this feature for
> > > > > > > > > high-performance use cases. It is not OK to measure the
> > > > > > > > > severity of the bug by how long it has existed; rather, we
> > > > > > > > > should look at who is impacted now, and whether it is a
> > > > > > > > > small subset with a simple workaround or an issue with
> > > > > > > > > large user impact.
> > > > > > > > >
> > > > > > > > > 2. Agreed, the bug was reported on 7/21. However, I became
> > > > > > > > > aware of this issue on 08/29 and submitted the fix on
> > > > > > > > > 08/30. I also brought this to the notice of the community,
> > > > > > > > > you, and the 1.3 release manager (Roshani) on the RC0
> > > > > > > > > proposal thread. Also, I would focus on the issue and its
> > > > > > > > > user impact rather than on who identified it and who is
> > > > > > > > > fixing it.
> > > > > > > > >
> > > > > > > > > Based on my discussion with 2 users, I think it is an
> > > > > > > > > important feature for them to see in Apache MXNet v1.3.0.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > Sandeep
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [1] Workaround used by the user.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > import mxnet as mx
> > > > > > > > >
> > > > > > > > > ctx = mx.cpu()  # assumed context; the original snippet left ctx undefined
> > > > > > > > >
> > > > > > > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
> > > > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > > > > > > > >
> > > > > > > > > # Cast each parameter to the dtype it was saved with before loading.
> > > > > > > > > for k, v in params_fp16.items():
> > > > > > > > >     new_key = k.split(':')[1]  # drop the prefix before ':' in the saved key
> > > > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > > > > > > > >
> > > > > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [2] Amazon Rekognition
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > [3] User story: Train a model -> Cast it to FP16 -> Save the model
> > > > > > > > > -> Load back the model does not work. They have to cast every
> > > > > > > > > parameter with a workaround mentioned above [1].
> > > > > > > > >
> > > > > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lupesko@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Sheng,
> > > > > > > > > >
> > > > > > > > > > Addressing your questions:
> > > > > > > > > >
> > > > > > > > > > - "why this specific bug is more important than all the other
> > > > > > > > > > known bugs, that this becomes a release blocker"
> > > > > > > > > > I do not consider it to be more or less important than other
> > > > > > > > > > fixes. It can be fixed and included in the release alongside the
> > > > > > > > > > rest of the release content, right?
> > > > > > > > > > From the description of the issue it seems important since it is
> > > > > > > > > > blocking users from loading models that were previously trained
> > > > > > > > > > and saved. There is nothing stopping the community from
> > > > > > > > > > including this fix into 1.3.0, alongside the rest of the
> > > > > > > > > > features and fixes.
> > > > > > > > > >
> > > > > > > > > > - "The bug exists since SymbolBlock was introduced a year ago and
> > > > > > > > > > has survived at least three releases, so this is not a
> > > > > > > > > > regression."
> > > > > > > > > > I do not think I said it is a regression. However, the fact that
> > > > > > > > > > a bug existed before does not mean it is OK to release it rather
> > > > > > > > > > than fix it.
> > > > > > > > > >
> > > > > > > > > > - "Timeline-wise, this bug was reported on 7/21, but was not
> > > > > > > > > > reported as release-blocker in the release discussion thread
> > > > > > > > > > until 8/31 [1]. Neither its reporting as release-blocker nor its
> > > > > > > > > > fix made it for the 8/3 code freeze."
> > > > > > > > > > You are right, it would have been better to have this identified
> > > > > > > > > > and fixed earlier and included before code freeze.
> > > > > > > > > >
> > > > > > > > > > - "The PR is still not ready yet as it doesn't have approval."
> > > > > > > > > > I think it is waiting for your review.
> > > > > > > > > >
> > > > > > > > > > - "it would be great if you could provide some additional
> > > > > > > > > > reasoning besides "X mentions the issue" or "fix was done by X""
> > > > > > > > > > I have. Repeating what I wrote in my previous email for clarity:
> > > > > > > > > > Basic functionality broken: loading a model (albeit one that was
> > > > > > > > > > saved as non FP32)
> > > > > > > > > >
> > > > > > > > > > So, yes - this issue seems to have been out there for a while
> > > > > > > > > > and somehow went under the radar... but I think the key question
> > > > > > > > > > is whether this blocks a basic functionality in MXNet. I believe
> > > > > > > > > > so, hence my -1 vote.
> > > > > > > > > >
> > > > > > > > > > Hagay
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <szha.pvg@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Hagay and Sandeep,
> > > > > > > > > > >
> > > > > > > > > > > Could you help us understand why this specific bug is more
> > > > > > > > > > > important than all the other known bugs, that this becomes a
> > > > > > > > > > > release blocker?
> > > > > > > > > > >
> > > > > > > > > > > Some facts to consider:
> > > > > > > > > > > - The bug exists since SymbolBlock was introduced a year ago and
> > > > > > > > > > > has survived at least three releases, so this is not a
> > > > > > > > > > > regression.
> > > > > > > > > > > - Timeline-wise, this bug was reported on 7/21, but was not
> > > > > > > > > > > reported as release-blocker in the release discussion thread
> > > > > > > > > > > until 8/31 [1]. Neither its reporting as release-blocker nor its
> > > > > > > > > > > fix made it for the 8/3 code freeze.
> > > > > > > > > > > - The PR is still not ready yet as it doesn't have approval.
> > > > > > > > > > >
> > > > > > > > > > > Hagay, it would be great if you could provide some additional
> > > > > > > > > > > reasoning besides "X mentions the issue" or "fix was done by X".
> > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > > -sz
> > > > > > > > > > >
> > > > > > > > > > > [1] https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lupesko@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Sandeep mentions the issue of an error when user tries to load
> > > > > > > > > > > > model params trained/saved as FP16.
> > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > > > > > > The fix was done by Sandeep:
> > > > > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12412 and is
> > > > > > > > > > > > ready to be cherry picked into the release branch.
> > > > > > > > > > > >
> > > > > > > > > > > > This seems like a release blocker to me:
> > > > > > > > > > > > - Basic functionality broken: loading a model (albeit one that
> > > > > > > > > > > > was saved as non FP32)
> > > > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
> > > > > > > > > > > >
> > > > > > > > > > > > -1 (non binding)
> > > > > > > > > > > >
> > > > > > > > > > > > Hagay
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep
> krishnamurthy <
> > > > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > "- 0"
> > > > > > > > > > > > >
> > > > > > > > > > > > > I believe the bug #11849
> > > > > > > > > > > > > <https://github.com/apache/incubator-mxnet/issues/11849>,
> > > > > > > > > > > > > unable to import non-fp32 models into Gluon, fixed in this PR
> > > > > > > > > > > > > #12412 <https://github.com/apache/incubator-mxnet/pull/12412>
> > > > > > > > > > > > > is important for the users. I would rather pick this fix in
> > > > > > > > > > > > > this release than plan a minor release later.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Sandeep
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet -b
> > > > 1.3.0.rc0"
> > > > > > > works
> > > > > > > > > fine
> > > > > > > > > > > > now,
> > > > > > > > > > > > > > never mind.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Unfortunately, MXNet was depending on a branch
> of
> > > TVM
> > > > > > that
> > > > > > > is
> > > > > > > > > now
> > > > > > > > > > > > > > deleted.
> > > > > > > > > > > > > > > We will have to merge #12448
> > > > > > > > > > > > > > > <
> > > > https://github.com/apache/incubator-mxnet/pull/12448>
> > > > > > > > before
> > > > > > > > > > the
> > > > > > > > > > > > > > release.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > > > > > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > > > > > > > > >.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Philip.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> > > > > > > > > carinmeier@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> Checked out the tag, built and tested the
> > Clojure
> > > > > > package.
> > > > > > > > +1
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani
> > Nagmote <
> > > > > > > > > > > > > > >> roshaninagmote2@gmail.com>
> > > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> > Hi all,
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > I would like to propose a vote to release
> > Apache
> > > > > MXNet
> > > > > > > > > > > > (incubating)
> > > > > > > > > > > > > > >> version
> > > > > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now (Friday,
> Aug
> > > > 31st)
> > > > > > and
> > > > > > > > end
> > > > > > > > > at
> > > > > > > > > > > > 7:00
> > > > > > > > > > > > > PM
> > > > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Link to release notes:
> > > > > > > > > > > > > > >> >
> > > > https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > > > > > > >> > *
> > > > > > > > > > >
> > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > > > > > > > >> > <
> > > > > > > > > > >
> > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > > > > > > >0*
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > View this page, click on "Build from
> Source",
> > > and
> > > > > use
> > > > > > > the
> > > > > > > > > > source
> > > > > > > > > > > > > code
> > > > > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > > > > > > > >> >
> > > > > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Please remember to TEST first before voting
> > > > > > accordingly:
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > +1 = approve
> > > > > > > > > > > > > > >> > +0 = no opinion
> > > > > > > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > > > > >> > Roshani
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Sandeep Krishnamurthy
> > > >
> > >
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Naveen Swamy <mn...@gmail.com>.
+1


Roshani/Sheng,

Thanks for putting this release together. I was only able to test the
release now. As Kellen indicated, this release does not have enough
committer votes, so I suggest you extend the timeline.
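
For reference, the package-release rule quoted further down this thread
(at least three affirmative PMC votes, and more positive than negative
votes) can be sketched roughly as below. This is only an illustration: the
function name and the (value, binding) vote format are made up, and it
assumes that only binding votes count toward the outcome.

```python
def release_vote_passes(votes):
    """Apply the majority-approval rule to a list of (value, binding) votes.

    `value` is +1, 0, or -1; `binding` is True for PMC member votes.
    Assumption: only binding votes count toward the outcome.
    """
    binding = [value for value, is_binding in votes if is_binding]
    plus = sum(1 for v in binding if v > 0)
    minus = sum(1 for v in binding if v < 0)
    # At least three binding +1 votes, and more +1 than -1 votes.
    return plus >= 3 and plus > minus


# Hypothetical tally for illustration, not the actual votes in this thread.
example = [(+1, True), (+1, True), (+1, True), (-1, True), (-1, False)]
print(release_vote_passes(example))
```

Under this rule a -1 lowers the margin but does not act as a veto.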

I downloaded the source code from
https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.3.0.rc0/.

I verified the signature of the release and built the Scala package from
this source. The Scala unit tests and integration tests ran successfully.

Also, IMO, the issue that Sandeep thinks should be fixed in this release is
not a release blocker, since it has a workaround. You can add it to the
release notes as a link to the GitHub issue with the workaround.

Other notes (consider adding to retrospective):

On running gpg --verify, I received a message that the signature from Sheng
Zha is good, along with a warning ("gpg: WARNING: This key is not certified
with a trusted signature!"). On researching, I found this is fine [1], and
the fingerprint matches Sheng's key here:
https://dist.apache.org/repos/dist/dev/incubator/mxnet/KEYS.

Next time, please send a link to the source and signatures on the Apache
dist server.

I am currently working with Qing to create and test a Maven package for
Scala. Please wait for that and add it to the announcement email.

Next time, please allow a day or two after the RC is cut so we can create
packages for the various language bindings (Scala, Clojure, R), which is
currently a manual process. That way the packages that users actually use
get tested during the RC phase.

During the release, I suggest the release manager communicate regularly
(daily) on dev@ until an announcement is made, so everyone is aware of the
status and can plan their work to accommodate building packages, testing
the RC, etc.

[1] http://www.apache.org/dev/release-signing.html#valid-untrusted-vs-invalid-trusted
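
For the fingerprint comparison mentioned above, here is a small sketch.
gpg prints fingerprints in space-separated groups, while a KEYS file may
show them with different spacing or case, so it helps to normalize both
before comparing. The function names and the sample fingerprint are
placeholders for illustration, not Sheng's actual key.

```python
def normalize_fingerprint(text):
    """Drop whitespace and upper-case a hex key fingerprint."""
    return "".join(text.split()).upper()


def fingerprints_match(from_gpg, from_keys_file):
    """Compare two fingerprints regardless of spacing or case."""
    a = normalize_fingerprint(from_gpg)
    b = normalize_fingerprint(from_keys_file)
    # A full OpenPGP v4 fingerprint is 40 hex digits; reject anything else
    # so that a truncated key ID is never accepted as a match.
    return len(a) == 40 and a == b


# Placeholder fingerprint for illustration only.
shown_by_gpg = "0123 4567 89AB CDEF 0123  4567 89AB CDEF 0123 4567"
listed_in_keys = "0123456789abcdef0123456789abcdef01234567"
print(fingerprints_match(shown_by_gpg, listed_in_keys))
```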


Thanks, Naveen



On Wed, Sep 5, 2018 at 10:20 AM, Aaron Markham <aa...@gmail.com>
wrote:

> 0 (non-binding) If we have a problem that blocks users, and a solution in
> hand... then we should fix it, but not at the expense of starting the
> release cycle again just for one fix. Users can cherry pick or build from
> master if they want the fix right away, right? I'd change my mind to -1 if
> this wasn't the case, with good reason, and if the user impact was critical
> to adoption or risks abandonment.
>
>
> On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <ro...@gmail.com>
> wrote:
>
> > I believe everyone here is working hard to make MXNet a better framework
> > for users. It's completely okay to have different opinions, we can decide
> > together if this issue is a blocker or not after voting time is over.
> >
> > As I mentioned before, voting will end at 7 pm today. So there is still
> > time to test the release. If there are any other issues anyone finds, I
> > will be happy to start the process again and work on RC1. For now, I want
> > to encourage everyone to utilize this time and vote. :)
> >
> > Thanks,
> > Roshani
> >
> > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> > sandeep.krishna98@gmail.com> wrote:
> >
> > >    1. As an Apache MXNet community member, I raised the concern of
> > >    broken functionality for the user. I explained and provided the data
> > >    points on the issue, the workaround, and why I think it is important.
> > >    If, after all this, you think my vote is biased by my employer just
> > >    because a user I quoted is from Amazon, this is more concerning to me
> > >    on my voting abilities.
> > >    2. My -1 in no way undermines the huge amount of effort that goes on
> > >    behind the scenes for a release to happen. Great respect and
> > >    recognition for everyone involved in all the releases of MXNet, past
> > >    and present. I voted on my judgement of what may be good for the
> > >    users of MXNet.
> > >    3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel
> > >    free to decide and progress on the release, as we already have >3 +1
> > >    in this thread.
> > >
> > >
> > > Best,
> > >
> > > Sandeep
> > >
> > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <cj...@gmail.com>
> > > wrote:
> > >
> > > > btw, there are no vetoes on package releases:
> > > >
> > > > VOTES ON PACKAGE RELEASES
> > > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
> > > >
> > > > Votes on whether a package is ready to be released use majority
> > > > approval
> > > > <https://www.apache.org/foundation/glossary.html#MajorityApproval> --
> > > > i.e. at least three PMC members must vote affirmatively for release,
> > > > and there must be more positive than negative votes. Releases may not
> > > > be vetoed. Generally the community will cancel the release vote if
> > > > anyone identifies serious problems, but in most cases the ultimate
> > > > decision lies with the individual serving as release manager. The
> > > > specifics of the process may vary from project to project, but the
> > > > 'minimum quorum of three +1 votes' rule is universal.
> > > >
> > > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <sz...@gmail.com> wrote:
> > > >
> > > > > Thanks for sharing your opinions, Thomas. Your recognition and
> > > > > respect of people's efforts on preparing the release candidate are
> > > > > certainly appreciated.
> > > > >
> > > > > Now that the vote is set to fail thanks to the veto, there will be
> > > > > plenty of opportunities to include those bug fixes, including the one
> > > > > Zhi mentioned [1], which was already merged in the master and yet
> > > > > chose not to block this release with [2]. I will be happy to work
> > > > > with Roshani to prepare another release candidate once ready.
> > > > >
> > > > > -sz
> > > > >
> > > > > [1] https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> > > > > [2] https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> > > > >
> > > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <thomas.delteil1@gmail.com> wrote:
> > > > >
> > > > > > -0
> > > > > > (non-binding)
> > > > > >
> > > > > > If I may add some nuance plus a personal data point as one of the
> > > > > > users commenting in the bug report in question:
> > > > > >
> > > > > > - Performance vs. Basic functionality => I don't think high
> > > > > > performance use-cases and basic functionality are two obviously
> > > > > > opposed concepts and see no contradiction in Hagay's and Sandeep's
> > > > > > statements. Float16 support is a feature of MXNet that provides more
> > > > > > than twice the performance of Float32 on supported platforms, hence
> > > > > > the high performance use-case. The bug is that the basic
> > > > > > functionality of reloading a saved float16 model is currently
> > > > > > broken.
> > > > > >
> > > > > > - This bug vs Other bugs => Contrary to the vast majority of the 140
> > > > > > open bugs that are mentioned above, I would put to Sandeep's credit
> > > > > > that this one bug has a PR open that provides a fix for it. This
> > > > > > would make it a better candidate to get included in this release
> > > > > > than a bug that has no fix ready for it.
> > > > > >
> > > > > > - Personal datapoint: I recently did some experimentation with
> > > > > > float16 [1] and actually coincidentally just published a video on
> > > > > > optimizing performance for Gluon. Float16 conversion is one of the
> > > > > > most, if not the most effective way to get performance out of MXNet
> > > > > > [2]. I believe there is a lot of value in publicizing its use more,
> > > > > > and hence in making sure at least the basic support for normal
> > > > > > use-cases is present.
> > > > > >
> > > > > > Of course this needs to be balanced with the overhead of preparing a
> > > > > > new release candidate once the fix is reviewed and merged, which
> > > > > > seems to be a lengthy and complex process in its own right, and the
> > > > > > delay with providing the other features present in 1.3 for users
> > > > > > that are not running off the nightly builds.
> > > > > >
> > > > > > All the best,
> > > > > >
> > > > > > Thomas
> > > > > >
> > > > > > [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> > > > > > [2] https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 5:11 PM Sheng Zha <sz...@gmail.com> wrote:
> > > > > >
> > > > > > > Sandeep,
> > > > > > >
> > > > > > > Thanks for explaining your veto. We have open bugs that impacted a
> > > > > > > lot more than just 3 customers, just by referring to the number of
> > > > > > > commenters on the issue [1].
> > > > > > >
> > > > > > > You said that this is for "high performance use cases", which
> > > > > > > contradicts Hagay's assessment that this is "basic functionality
> > > > > > > broken". Given that this is for advanced use cases of using
> > > > > > > half-precision training, why is it so much more important than any
> > > > > > > other open bug reports, that for this specific bug fix, we have to
> > > > > > > delay the access of regular users to the new MXNet 1.3 release by
> > > > > > > at least another week?
> > > > > > >
> > > > > > > Honestly, I'm concerned that your vote is biased by Amazon
> > > > > > > involvement, given that you quoted Amazon Rekognition.
> > > > > > >
> > > > > > > -sz
> > > > > > >
> > > > > > > [1] https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> > > > > > >
> > > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > >
> > > > > > > > My initial vote of “-0” was due to lack of info from a user who
> > > > > > > > had said he overcame this issue for an FP16 model.
> > > > > > > >
> > > > > > > > However, the suggested workaround [1] for the issue is not
> > > > > > > > straightforward and generally usable for all users. Also, the
> > > > > > > > issue is not simple and isolated enough to be listed in the
> > > > > > > > Release Notes as a known issue with a workaround.
> > > > > > > >
> > > > > > > > Changing my vote to: "-1 (binding)" owing to the user impact [3]
> > > > > > > >
> > > > > > > > @Sheng:
> > > > > > > >
> > > > > > > > 1. Agreed, the bug has existed for a long time. However, FP16 and
> > > > > > > > similar optimizations were added later on, followed by users [2]
> > > > > > > > using this feature for high performance use cases. It is not OK to
> > > > > > > > measure the severity of the bug based on its past existence;
> > > > > > > > rather, we should look at who is impacted now, and whether it is a
> > > > > > > > small subset with a simple workaround or a large user-impacting
> > > > > > > > issue.
> > > > > > > >
> > > > > > > > 2. Agreed, the bug was reported on 7/21. However, I became aware of
> > > > > > > > this issue on 08/29 and submitted the fix on 08/30. Also, I did
> > > > > > > > bring this to the notice of the community, you, and the 1.3 release
> > > > > > > > manager (Roshani) on the RC0 proposal thread. Also, I would focus
> > > > > > > > on the issue and user impact rather than on who identified and who
> > > > > > > > is fixing the issue.
> > > > > > > >
> > > > > > > > Based on my discussion with 2 users, I think it is an important
> > > > > > > > feature for them to see in Apache MXNet v1.3.0.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Sandeep
> > > > > > > >
> > > > > > > >
> > > > > > > > [1] Workaround used by the user.
> > > > > > > >
> > > > > > > >
> > > > > > > > import mxnet as mx
> > > > > > > >
> > > > > > > > ctx = mx.cpu()  # assumed context; the original snippet left ctx undefined
> > > > > > > >
> > > > > > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
> > > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > > > > > > >
> > > > > > > > # Cast each parameter to the dtype it was saved with before loading.
> > > > > > > > for k, v in params_fp16.items():
> > > > > > > >     new_key = k.split(':')[1]  # drop the prefix before ':' in the saved key
> > > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > > > > > > >
> > > > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
> > > > > > > >
> > > > > > > >
> > > > > > > > [2] Amazon Rekognition
> > > > > > > >
> > > > > > > >
> > > > > > > > [3] User story: Train a model -> Cast it to FP16 -> Save the model
> > > > > > > > -> Load back the model does not work. They have to cast every
> > > > > > > > parameter with a workaround mentioned above [1].
> > > > > > > >
> > > > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lupesko@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Sheng,
> > > > > > > > >
> > > > > > > > > Addressing your questions:
> > > > > > > > >
> > > > > > > > > - "why this specific bug is more important than all the other
> > > > > > > > > known bugs, that this becomes a release blocker"
> > > > > > > > > I do not consider it to be more or less important than other
> > > > > > > > > fixes. It can be fixed and included in the release alongside the
> > > > > > > > > rest of the release content, right?
> > > > > > > > > From the description of the issue it seems important since it is
> > > > > > > > > blocking users from loading models that were previously trained
> > > > > > > > > and saved. There is nothing stopping the community from including
> > > > > > > > > this fix into 1.3.0, alongside the rest of the features and
> > > > > > > > > fixes.
> > > > > > > > >
> > > > > > > > > - "The bug exists since SymbolBlock was introduced a year ago and
> > > > > > > > > has survived at least three releases, so this is not a
> > > > > > > > > regression."
> > > > > > > > > I do not think I said it is a regression. However, the fact that
> > > > > > > > > a bug existed before does not mean it is OK to release it rather
> > > > > > > > > than fix it.
> > > > > > > > >
> > > > > > > > > - "Timeline-wise, this bug was reported on 7/21, but was not
> > > > > > > > > reported as release-blocker in the release discussion thread
> > > > > > > > > until 8/31 [1]. Neither its reporting as release-blocker nor its
> > > > > > > > > fix made it for the 8/3 code freeze."
> > > > > > > > > You are right, it would have been better to have this identified
> > > > > > > > > and fixed earlier and included before code freeze.
> > > > > > > > >
> > > > > > > > > - "The PR is still not ready yet as it doesn't have approval."
> > > > > > > > > I think it is waiting for your review.
> > > > > > > > >
> > > > > > > > > - "it would be great if you could provide some additional
> > > > > > > > > reasoning besides "X mentions the issue" or "fix was done by X""
> > > > > > > > > I have. Repeating what I wrote in my previous email for clarity:
> > > > > > > > > Basic functionality broken: loading a model (albeit one that was
> > > > > > > > > saved as non FP32)
> > > > > > > > >
> > > > > > > > > So, yes - this issue seems to have been out there for a while and
> > > > > > > > > somehow went under the radar... but I think the key question is
> > > > > > > > > whether this blocks a basic functionality in MXNet. I believe so,
> > > > > > > > > hence my -1 vote.
> > > > > > > > >
> > > > > > > > > Hagay
> > > > > > > > >
> > > > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <
> szha.pvg@gmail.com
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Hagay and Sandeep,
> > > > > > > > > >
> > > > > > > > > > Could you help us understand why this specific bug is
> more
> > > > > > important
> > > > > > > > than
> > > > > > > > > > all the other known bugs, that this becomes a release
> > > blocker?
> > > > > > > > > >
> > > > > > > > > > Some facts to consider:
> > > > > > > > > > - The bug exists since SymbolBlock was introduced a year
> > ago
> > > > and
> > > > > > has
> > > > > > > > > > survived at least three releases, so this is not a
> > > regression.
> > > > > > > > > > - Timeline-wise, this bug was reported on 7/21, but was
> not
> > > > > > reported
> > > > > > > as
> > > > > > > > > > release-blocker in the release discussion thread until
> 8/31
> > > > [1].
> > > > > > > > Neither
> > > > > > > > > > its reporting as release-blocker nor its fix made it for
> > the
> > > > 8/3
> > > > > > code
> > > > > > > > > > freeze.
> > > > > > > > > > - The PR is still not ready yet as it doesn't have
> > approval.
> > > > > > > > > >
> > > > > > > > > > Hagay, it would be great if you could provide some
> > additional
> > > > > > > reasoning
> > > > > > > > > > besides "X mentions the issue" or "fix was done by X".
> > > Thanks.
> > > > > > > > > >
> > > > > > > > > > -sz
> > > > > > > > > >
> > > > > > > > > > [1] https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
> > > > lupesko@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Sandeep mentions the issue of an error when user tries
> to
> > > > load
> > > > > > > model
> > > > > > > > > > params
> > > > > > > > > > > trained/saved as FP16.
> > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > > > > > The fix was done by Sandeep:
> > > > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12412
> and
> > > is
> > > > > > ready
> > > > > > > to
> > > > > > > > > be
> > > > > > > > > > > cherry picked into the release branch.
> > > > > > > > > > >
> > > > > > > > > > > This seems like a release blocker to me:
> > > > > > > > > > > - Basic functionality broken: loading a model (albeit
> one
> > > > that
> > > > > > that
> > > > > > > > was
> > > > > > > > > > > saved as non FP32)
> > > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
> > > ThomasDelteil@
> > > > )
> > > > > > > > > > >
> > > > > > > > > > > -1 (non binding)
> > > > > > > > > > >
> > > > > > > > > > > Hagay
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > "- 0"
> > > > > > > > > > > >
> > > > > > > > > > > > I believe the bug #11849
> > > > > > > > > > > > <https://github.com/apache/incubator-mxnet/issues/11849>, unable to
> > > > > > > > > > > > import non-fp32 models into Gluon, fixed in this PR #12412
> > > > > > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12412> is important for
> > > > > > > > > > > > the users. I would rather pick this fix in this release than plan a
> > > > > > > > > > > > minor release later.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Sandeep
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > > > > > https://github.com/apache/incubator-mxnet -b
> > > 1.3.0.rc0"
> > > > > > works
> > > > > > > > fine
> > > > > > > > > > > now,
> > > > > > > > > > > > > never mind.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Unfortunately, MXNet was depending on a branch of
> > TVM
> > > > > that
> > > > > > is
> > > > > > > > now
> > > > > > > > > > > > > deleted.
> > > > > > > > > > > > > > We will have to merge #12448
> > > > > > > > > > > > > > <
> > > https://github.com/apache/incubator-mxnet/pull/12448>
> > > > > > > before
> > > > > > > > > the
> > > > > > > > > > > > > release.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > > > > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > > > > > > > >.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Philip.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> > > > > > > > carinmeier@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> Checked out the tag, built and tested the
> Clojure
> > > > > package.
> > > > > > > +1
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani
> Nagmote <
> > > > > > > > > > > > > >> roshaninagmote2@gmail.com>
> > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> > Hi all,
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > I would like to propose a vote to release
> Apache
> > > > MXNet
> > > > > > > > > > > (incubating)
> > > > > > > > > > > > > >> version
> > > > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug
> > > 31st)
> > > > > and
> > > > > > > end
> > > > > > > > at
> > > > > > > > > > > 7:00
> > > > > > > > > > > > PM
> > > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Link to release notes:
> > > > > > > > > > > > > >> >
> > > https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > > > > > >> > *
> > > > > > > > > >
> > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > > > > > > >> > <
> > > > > > > > > >
> > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > > > > > >0*
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > View this page, click on "Build from Source",
> > and
> > > > use
> > > > > > the
> > > > > > > > > source
> > > > > > > > > > > > code
> > > > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > > > > > > >> >
> > > > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Please remember to TEST first before voting
> > > > > accordingly:
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > +1 = approve
> > > > > > > > > > > > > >> > +0 = no opinion
> > > > > > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > > > >> > Roshani
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Sandeep Krishnamurthy
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sandeep Krishnamurthy
> > >
> >
>
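The +1 / +0 / -1 convention in the vote email, together with the ASF
majority-approval rule cited later in this thread (at least three
affirmative PMC votes and more positive than negative votes), could be
sketched roughly as below. This is only an illustration: the function name,
the (value, binding) tuple layout, and the choice to count only binding
votes toward both conditions are assumptions, not something stated in the
thread.

```python
def release_vote_passes(votes):
    """Return True if a release vote passes under majority approval.

    `votes` is an iterable of (value, binding) pairs, where value is
    +1, 0 or -1 and binding is True for PMC members (assumed layout).
    """
    # Assumption: only binding votes count toward either condition.
    binding = [value for value, is_binding in votes if is_binding]
    plus = sum(1 for value in binding if value > 0)
    minus = sum(1 for value in binding if value < 0)
    # At least three binding +1 votes, and more +1 than -1.
    # A -1 is counted like any other vote; it is not a veto.
    return plus >= 3 and plus > minus


# Hypothetical tally: three binding +1, one binding -1, two non-binding votes.
example = [(+1, True), (+1, True), (+1, True), (-1, True), (-1, False), (0, False)]
print(release_vote_passes(example))
```

Under these assumptions a single -1 lowers the margin but cannot block the
release on its own.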

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Aaron Markham <aa...@gmail.com>.
0 (non-binding) If we have a problem that blocks users, and a solution in
hand... then we should fix it, but not at the expense of starting the
release cycle again just for one fix. Users can cherry pick or build from
master if they want the fix right away, right? I'd change my vote to -1 if
that weren't the case, with good reason, and if the user impact were critical
to adoption or risked abandonment.


On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <ro...@gmail.com>
wrote:

> I believe everyone here is working hard to make MXNet a better framework
> for users. It's completely okay to have different opinions, we can decide
> together if this issue is a blocker or not after voting time is over.
>
> As I mentioned before, voting will end at 7 pm today. So there is still
> time to test the release. If there are any other issues anyone finds, I
> will be happy to start the process again and work on RC1. For now, I want
> to encourage everyone to utilize this time and vote. :)
>
> Thanks,
> Roshani
>
> On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> sandeep.krishna98@gmail.com> wrote:
>
> >    1. As an Apache MXNet community member, I raised the concern of broken
> >    functionality for the user. I explained and provided the data points on
> >    the issue, the workaround, and why I think it is important. If, after all
> >    this, you think my vote is biased by my employer just because a user I
> >    quoted is from Amazon, that is more concerning to me with regard to my
> >    voting abilities.
> >    2. My -1 in no way undermines the huge amount of effort that goes on
> >    behind the scenes for a release to happen. Great respect and recognition
> >    for everyone involved in all the releases of MXNet, past and present. I
> >    voted on my judgement of what may be good for the users of MXNet.
> >    3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel free
> >    to decide and progress on the release, as we already have >3 +1 in this
> >    thread.
> >
> >
> > Best,
> >
> > Sandeep
> >
> > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <cj...@gmail.com>
> > wrote:
> >
> > > btw, there are no vetoes on package releases:
> > >
> > > VOTES ON PACKAGE RELEASES
> > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
> > >
> > > Votes on whether a package is ready to be released use majority approval
> > > <https://www.apache.org/foundation/glossary.html#MajorityApproval> -- i.e.
> > > at least three PMC members must vote affirmatively for release, and there
> > > must be more positive than negative votes. Releases may not be vetoed.
> > > Generally the community will cancel the release vote if anyone identifies
> > > serious problems, but in most cases the ultimate decision lies with the
> > > individual serving as release manager. The specifics of the process may
> > > vary from project to project, but the 'minimum quorum of three +1 votes'
> > > rule is universal.
> > >
> > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <sz...@gmail.com> wrote:
> > >
> > > > Thanks for sharing your opinions, Thomas. Your recognition and respect of
> > > > people's efforts on preparing the release candidate are certainly
> > > > appreciated.
> > > >
> > > > Now that the vote is set to fail thanks to the veto, there will be plenty
> > > > of opportunities to include those bug fixes, including the one Zhi
> > > > mentioned [1], which was already merged in the master and yet chose not to
> > > > block this release with [2]. I will be happy to work with Roshani to
> > > > prepare another release candidate once ready.
> > > >
> > > > -sz
> > > >
> > > > [1] https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> > > > [2] https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> > > >
> > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <
> > thomas.delteil1@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > -0
> > > > > (non-binding)
> > > > >
> > > > > If I may add some nuance plus a personal data point as one of the users
> > > > > commenting in the bug report in question:
> > > > >
> > > > > - Performance vs. basic functionality => I don't think high performance
> > > > > use-cases and basic functionality are two obviously opposed concepts, and
> > > > > I see no contradiction in Hagay's and Sandeep's statements.
> > > > > Float16 support is a feature of MXNet that provides more than twice the
> > > > > performance of Float32 on supported platforms, hence the high performance
> > > > > use-case. The bug is that the basic functionality of reloading a saved
> > > > > float16 model is currently broken.
> > > > >
> > > > > - This bug vs. other bugs => Contrary to the vast majority of the 140 open
> > > > > bugs that are mentioned above, I would put to Sandeep's credit that this
> > > > > one bug has a PR open that provides a fix for it. This would make it a
> > > > > better candidate to get included in this release than a bug that has no
> > > > > fix ready for it.
> > > > >
> > > > > - Personal data point: I recently did some experimentation with float16 [1]
> > > > > and actually coincidentally just published a video on optimizing
> > > > > performance for Gluon. Float16 conversion is one of the most, if not the
> > > > > most, effective ways to get performance out of MXNet [2]. I believe there
> > > > > is a lot of value in publicizing its use more and hence making sure at
> > > > > least the basic support for normal use-cases is present.
> > > > >
> > > > > Of course this needs to be balanced with the overhead of preparing a new
> > > > > release candidate once the fix is reviewed and merged, which seems to be
> > > > > a lengthy and complex process in its own right, and the delay with
> > > > > providing the other features present in 1.3 for users that are not running
> > > > > off the nightly builds.
> > > > >
> > > > > All the best,
> > > > >
> > > > > Thomas
> > > > >
> > > > > [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> > > > > [2] https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> > > > >
> > > > > Le mar. 4 sept. 2018 à 17:11, Sheng Zha <sz...@gmail.com> a
> > écrit :
> > > > >
> > > > > > Sandeep,
> > > > > >
> > > > > > Thanks for explaining your veto. We have open bugs that impacted a lot
> > > > > > more than just 3 customers, judging by the number of commenters on the
> > > > > > issues [1].
> > > > > >
> > > > > > You said that this is for "high performance use cases", which contradicts
> > > > > > Hagay's assessment that this is "basic functionality broken". Given that
> > > > > > this is for advanced use cases of half-precision training, why is it
> > > > > > so much more important than any other open bug report that, for this
> > > > > > specific bug fix, we have to delay the access of regular users to the new
> > > > > > MXNet 1.3 release by at least another week?
> > > > > >
> > > > > > Honestly, I'm concerned that your vote is biased by Amazon involvement,
> > > > > > given that you quoted Amazon Rekognition.
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > > [1] https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > >
> > > > > > > My initial vote of “-0” was due to lack of info from a user who had said
> > > > > > > he overcame this issue for an FP16 model.
> > > > > > >
> > > > > > > However, the suggested workaround [1] for the issue is not straightforward
> > > > > > > and not generally usable for all users. Also, the issue is not simple and
> > > > > > > isolated enough to be listed in the Release Notes as a known issue with a
> > > > > > > workaround.
> > > > > > >
> > > > > > > Changing my vote to: "-1 (binding)" owing to the user impact [3]
> > > > > > >
> > > > > > > @Sheng:
> > > > > > >
> > > > > > > 1. Agreed, the bug has existed for a long time. However, FP16 and similar
> > > > > > > optimizations were added later on, followed by users [2] using this
> > > > > > > feature for high performance use cases. It is not OK to measure the
> > > > > > > severity of the bug based on its past existence; rather, we can see who is
> > > > > > > impacted now, and whether it is a small subset with a simple workaround or
> > > > > > > an issue impacting many users.
> > > > > > >
> > > > > > > 2. Agreed, the bug was reported on 7/21. However, I became aware of this
> > > > > > > issue on 08/29 and submitted the fix on 08/30. Also, I did bring this to
> > > > > > > the notice of the community, you, and the 1.3 release manager (Roshani) on
> > > > > > > the RC0 proposal thread. Also, I would focus on the issue and user impact
> > > > > > > rather than on who identified and who is fixing the issue.
> > > > > > >
> > > > > > > Based on my discussion with 2 users, I think it is an important feature
> > > > > > > for them to see in Apache MXNet v1.3.0.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Sandeep
> > > > > > >
> > > > > > >
> > > > > > > [1] Workaround used by the user.
> > > > > > >
> > > > > > >
> > > > > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
> > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > > > > > >
> > > > > > > for k, v in params_fp16.items():
> > > > > > >     new_key = k.split(':')[1]
> > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > > > > > >
> > > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
> > > > > > >
> > > > > > > [2] Amazon Rekognition
> > > > > > >
> > > > > > >
> > > > > > > [3] User story: Train a model -> Cast it to FP16 -> Save the model ->
> > > > > > > Load back the model does not work. They have to cast every parameter with
> > > > > > > the workaround mentioned above [1].
> > > > > > >
> > > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <
> lupesko@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Sheng,
> > > > > > > >
> > > > > > > > Addressing your questions:
> > > > > > > >
> > > > > > > > - "why this specific bug is more important than all the other known bugs,
> > > > > > > > that this becomes a release blocker"
> > > > > > > > I do not consider it to be more or less important than other fixes. It can
> > > > > > > > be fixed and included in the release alongside the rest of the release
> > > > > > > > content, right?
> > > > > > > > From the description of the issue it seems important since it is blocking
> > > > > > > > users from loading models that were previously trained and saved. There is
> > > > > > > > nothing stopping the community from including this fix into 1.3.0,
> > > > > > > > alongside the rest of the features and fixes.
> > > > > > > >
> > > > > > > > - "The bug exists since SymbolBlock was introduced a year ago and has
> > > > > > > > survived at least three releases, so this is not a regression."
> > > > > > > > I do not think I said it is a regression. However, the fact that a bug
> > > > > > > > existed before does not mean it is OK to release it rather than fix it.
> > > > > > > >
> > > > > > > > - "Timeline-wise, this bug was reported on 7/21, but was not reported as
> > > > > > > > release-blocker in the release discussion thread until 8/31 [1]. Neither
> > > > > > > > its reporting as release-blocker nor its fix made it for the 8/3 code
> > > > > > > > freeze."
> > > > > > > > You are right, it would have been better to have this identified and fixed
> > > > > > > > earlier and included before code freeze.
> > > > > > > >
> > > > > > > > - "The PR is still not ready yet as it doesn't have approval."
> > > > > > > > I think it is waiting for your review.
> > > > > > > >
> > > > > > > > - "it would be great if you could provide some additional reasoning besides
> > > > > > > > "X mentions the issue" or "fix was done by X""
> > > > > > > > I have. Repeating what I wrote in my previous email for clarity: Basic
> > > > > > > > functionality broken: loading a model (albeit one that was saved as
> > > > > > > > non FP32)
> > > > > > > >
> > > > > > > > So, yes - this issue seems to have been out there for a while, somehow went
> > > > > > > > under the radar... but I think the key question is whether this blocks a
> > > > > > > > basic functionality in MXNet. I believe so, hence my -1 vote.
> > > > > > > >
> > > > > > > > Hagay
> > > > > > > >
> > > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <szha.pvg@gmail.com
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Hagay and Sandeep,
> > > > > > > > >
> > > > > > > > > Could you help us understand why this specific bug is more
> > > > > important
> > > > > > > than
> > > > > > > > > all the other known bugs, that this becomes a release
> > blocker?
> > > > > > > > >
> > > > > > > > > Some facts to consider:
> > > > > > > > > - The bug exists since SymbolBlock was introduced a year
> ago
> > > and
> > > > > has
> > > > > > > > > survived at least three releases, so this is not a
> > regression.
> > > > > > > > > - Timeline-wise, this bug was reported on 7/21, but was not
> > > > > reported
> > > > > > as
> > > > > > > > > release-blocker in the release discussion thread until 8/31
> > > [1].
> > > > > > > Neither
> > > > > > > > > its reporting as release-blocker nor its fix made it for
> the
> > > 8/3
> > > > > code
> > > > > > > > > freeze.
> > > > > > > > > - The PR is still not ready yet as it doesn't have
> approval.
> > > > > > > > >
> > > > > > > > > Hagay, it would be great if you could provide some
> additional
> > > > > > reasoning
> > > > > > > > > besides "X mentions the issue" or "fix was done by X".
> > Thanks.
> > > > > > > > >
> > > > > > > > > -sz
> > > > > > > > >
> > > > > > > > > [1] https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > > > > >
> > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
> > > lupesko@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Sandeep mentions the issue of an error when user tries to
> > > load
> > > > > > model
> > > > > > > > > params
> > > > > > > > > > trained/saved as FP16.
> > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > > > > The fix was done by Sandeep:
> > > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12412 and
> > is
> > > > > ready
> > > > > > to
> > > > > > > > be
> > > > > > > > > > cherry picked into the release branch.
> > > > > > > > > >
> > > > > > > > > > This seems like a release blocker to me:
> > > > > > > > > > - Basic functionality broken: loading a model (albeit one
> > > that
> > > > > that
> > > > > > > was
> > > > > > > > > > saved as non FP32)
> > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
> > ThomasDelteil@
> > > )
> > > > > > > > > >
> > > > > > > > > > -1 (non binding)
> > > > > > > > > >
> > > > > > > > > > Hagay
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > "- 0"
> > > > > > > > > > >
> > > > > > > > > > > I believe the bug #11849
> > > > > > > > > > > <
> https://github.com/apache/incubator-mxnet/issues/11849
> > >,
> > > > > unable
> > > > > > > to
> > > > > > > > > > import
> > > > > > > > > > > non-fp32 models into Gluon, fixed in this PR #12412
> > > > > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12412>
> > is
> > > > > > > important
> > > > > > > > > for
> > > > > > > > > > > the
> > > > > > > > > > > users. I would rather pick this fix in this release
> than
> > > > plan a
> > > > > > > minor
> > > > > > > > > > > release later.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Sandeep
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > > > > https://github.com/apache/incubator-mxnet -b
> > 1.3.0.rc0"
> > > > > works
> > > > > > > fine
> > > > > > > > > > now,
> > > > > > > > > > > > never mind.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Unfortunately, MXNet was depending on a branch of
> TVM
> > > > that
> > > > > is
> > > > > > > now
> > > > > > > > > > > > deleted.
> > > > > > > > > > > > > We will have to merge #12448
> > > > > > > > > > > > > <
> > https://github.com/apache/incubator-mxnet/pull/12448>
> > > > > > before
> > > > > > > > the
> > > > > > > > > > > > release.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > > > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > > > > > > >.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Philip.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> > > > > > > carinmeier@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >> Checked out the tag, built and tested the Clojure
> > > > package.
> > > > > > +1
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > > > > > > > > > > >> roshaninagmote2@gmail.com>
> > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> > Hi all,
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > I would like to propose a vote to release Apache
> > > MXNet
> > > > > > > > > > (incubating)
> > > > > > > > > > > > >> version
> > > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug
> > 31st)
> > > > and
> > > > > > end
> > > > > > > at
> > > > > > > > > > 7:00
> > > > > > > > > > > PM
> > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Link to release notes:
> > > > > > > > > > > > >> >
> > https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > > > > >> > *
> > > > > > > > >
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > > > > > >> > <
> > > > > > > > >
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > > > > >0*
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > View this page, click on "Build from Source",
> and
> > > use
> > > > > the
> > > > > > > > source
> > > > > > > > > > > code
> > > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > > > > > >> >
> > > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Please remember to TEST first before voting
> > > > accordingly:
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > +1 = approve
> > > > > > > > > > > > >> > +0 = no opinion
> > > > > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > > >> > Roshani
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Sandeep Krishnamurthy
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Sandeep Krishnamurthy
> >
>
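The FP16 workaround quoted above as [1] can be written out as one
self-contained function. This is a sketch under assumptions rather than the
fix from PR #12412: the helper names strip_prefix and
load_non_fp32_symbol_block are made up here, ctx (undefined in the quoted
snippet) is assumed to default to mx.cpu(), and mxnet is imported lazily so
the key handling can be read on its own.

```python
def strip_prefix(key):
    """Drop a leading 'arg:' / 'aux:' style prefix from a saved parameter name."""
    return key.split(':', 1)[1] if ':' in key else key


def load_non_fp32_symbol_block(symbol_file, params_file,
                               input_names=('data',), ctx=None):
    """Import a symbol, cast each parameter to its saved dtype, then load."""
    import mxnet as mx  # imported here so strip_prefix works without MXNet

    ctx = mx.cpu() if ctx is None else ctx  # assumption: CPU by default
    net = mx.gluon.SymbolBlock.imports(symbol_file, list(input_names))
    # Assumed to be a dict mapping saved names to NDArrays.
    saved = mx.nd.load(params_file)
    params = net.collect_params()
    for key, value in saved.items():
        # Match each parameter's dtype to what was saved (e.g. float16),
        # as the quoted workaround does, before loading the values.
        params[strip_prefix(key)].cast(value.dtype)
    params.load(params_file, ctx)
    return net
```

With the file names from the quoted snippet this would be called as
load_non_fp32_symbol_block('resnet34_fp16-symbol.json',
'resnet34_fp16-0000.params').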

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Roshani Nagmote <ro...@gmail.com>.
I believe everyone here is working hard to make MXNet a better framework
for users. It's completely okay to have different opinions; we can decide
together whether this issue is a blocker after voting time is over.

As I mentioned before, voting will end at 7 pm today. So there is still
time to test the release. If there are any other issues anyone finds, I
will be happy to start the process again and work on RC1. For now, I want
to encourage everyone to utilize this time and vote. :)

Thanks,
Roshani

On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
sandeep.krishna98@gmail.com> wrote:

>    1. As an Apache MXNet community member, I raised the concern of broken
>    functionality for the user. I explained and provided the data points on
>    the issue, the workaround, and why I think it is important. If, after all
>    this, you think my vote is biased by my employer just because a user I
>    quoted is from Amazon, that is more concerning to me with regard to my
>    voting abilities.
>    2. My -1 in no way undermines the huge amount of effort that goes on
>    behind the scenes for a release to happen. Great respect and recognition
>    for everyone involved in all the releases of MXNet, past and present. I
>    voted on my judgement of what may be good for the users of MXNet.
>    3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel free
>    to decide and progress on the release, as we already have >3 +1 in this
>    thread.
>
>
> Best,
>
> Sandeep
>
> On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <cj...@gmail.com>
> wrote:
>
> > btw, there are no vetoes on package releases:
> >
> > VOTES ON PACKAGE RELEASES
> > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
> >
> > Votes on whether a package is ready to be released use majority approval
> > <https://www.apache.org/foundation/glossary.html#MajorityApproval> -- i.e.
> > at least three PMC members must vote affirmatively for release, and there
> > must be more positive than negative votes. Releases may not be vetoed.
> > Generally the community will cancel the release vote if anyone identifies
> > serious problems, but in most cases the ultimate decision lies with the
> > individual serving as release manager. The specifics of the process may
> > vary from project to project, but the 'minimum quorum of three +1 votes'
> > rule is universal.
> >
> > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <sz...@gmail.com> wrote:
> >
> > > Thanks for sharing your opinions, Thomas. Your recognition and respect of
> > > people's efforts on preparing the release candidate are certainly
> > > appreciated.
> > >
> > > Now that the vote is set to fail thanks to the veto, there will be plenty
> > > of opportunities to include those bug fixes, including the one Zhi
> > > mentioned [1], which was already merged in the master and yet chose not to
> > > block this release with [2]. I will be happy to work with Roshani to
> > > prepare another release candidate once ready.
> > >
> > > -sz
> > >
> > > [1] https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> > > [2] https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> > >
> > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <
> thomas.delteil1@gmail.com
> > >
> > > wrote:
> > >
> > > > -0
> > > > (non-binding)
> > > >
> > > > If I may add some nuancing plus a personal data point as one of the
> > users
> > > > commenting in the bug report in question:
> > > >
> > > > - Performance vs. Basic functionality => I don't think high performance
> > > > use-cases and basic functionality are two obviously opposed concepts, and
> > > > I see no contradiction in Hagay's and Sandeep's statements.
> > > > Float16 support is a feature of MXNet that provides more than twice the
> > > > performance of Float32 on supported platforms, hence the high performance
> > > > use-case. The bug is that the basic functionality of reloading a saved
> > > > float16 model is currently broken.
> > > >
> > > > - This bug vs Other bugs => Contrary to the vast majority of the 140 open
> > > > bugs that are mentioned above, I would put to Sandeep's credit that this
> > > > one bug has a PR open that provides a fix for it. This would make it a
> > > > better candidate to get included in this release than a bug that has no
> > > > fix ready for it.
> > > >
> > > > - Personal datapoint: I recently did some experimentation with
> float16
> > > [1]
> > > > and actually coincidentally just published a video on optimizing
> > > > performance for Gluon. Float16 conversion is one of the most, if not
> > the
> > > > most effective way to get performance out of MXNet [2]. I believe
> there
> > > is
> > > > a lot of value in publicizing more its use and hence making sure at
> > least
> > > > the basic support for normal use-cases is present.
> > > >
> > > > Of course this needs to be balanced with the overhead of preparing a new
> > > > release candidate once the fix is reviewed and merged, which seems to be
> > > > a lengthy and complex process in its own right, and the delay in
> > > > providing the other features present in 1.3 to users that are not
> > > > running off the nightly builds.
> > > >
> > > > All the best,
> > > >
> > > > Thomas
> > > >
> > > > [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> > > > [2]
> > > >
> > > >
> > >
> >
> https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> > > >
> > > > Le mar. 4 sept. 2018 à 17:11, Sheng Zha <sz...@gmail.com> a
> écrit :
> > > >
> > > > > Sandeep,
> > > > >
> > > > > Thanks for explaining your veto. We have open bugs that impacted a
> > lot
> > > > more
> > > > > than just 3 customers, just by referring to the number of
> commenters
> > on
> > > > the
> > > > > issue [1].
> > > > >
> > > > > You said that this is for "high performance use cases", which contradicts
> > > > > Hagay's assessment that this is "basic functionality broken". Given that
> > > > > this is for advanced use cases of using half-precision training, why is it
> > > > > so much more important than any other open bug reports, that for this
> > > > > specific bug fix, we have to delay the access of regular users to the new
> > > > > MXNet 1.3 release by at least another week?
> > > > >
> > > > > Honestly, I'm concerned that your vote is biased by Amazon
> > involvement,
> > > > > given that you quoted Amazon Rekognition.
> > > > >
> > > > > -sz
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> > > > >
> > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > > > > sandeep.krishna98@gmail.com> wrote:
> > > > >
> > > > > > My initial vote of “-0” was due to lack of info from a user who had said
> > > > > > he overcame this issue for an FP16 model.
> > > > > >
> > > > > >
> > > > > > However, the suggested workaround [1] for the issue is not straightforward
> > > > > > or generally usable for all users. Also, the issue is not simple and
> > > > > > isolated enough to be listed in the Release Notes as a known issue with a
> > > > > > workaround.
> > > > > >
> > > > > >
> > > > > > Changing my vote to: "-1 (binding)" owing to the user impact [3]
> > > > > >
> > > > > >
> > > > > >
> > > > > > @Sheng:
> > > > > >
> > > > > > 1. Agreed, the bug has existed for a long time. However, FP16 and similar
> > > > > > optimizations were added later on, followed by users [2] using this
> > > > > > feature for high performance use cases. It is not OK to measure the
> > > > > > severity of the bug based on its past existence; rather, we should look at
> > > > > > who is impacted now, and whether it is a small subset with a simple
> > > > > > workaround or an issue impacting a large number of users.
> > > > > >
> > > > > > 2. Agreed, the bug was reported on 7/21. However, I became aware of this
> > > > > > issue on 08/29 and submitted the fix on 08/30. Also, I did bring this to
> > > > > > the notice of the community, you and the 1.3 release manager (Roshani) on
> > > > > > the RC0 proposal thread. Also, I would focus on the issue and the user
> > > > > > impact rather than on who identified and who is fixing the issue.
> > > > > >
> > > > > >
> > > > > > Based on my discussion with 2 users, I think it is an important feature
> > > > > > for them to see in Apache MXNet v1.3.0.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Sandeep
> > > > > >
> > > > > >
> > > > > > [1] Workaround used by the user.
> > > > > >
> > > > > >
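> > > > > > import mxnet as mx
> > > > > >
> > > > > > # ctx was not defined in the original snippet; mx.cpu() is a placeholder
> > > > > > # (FP16 models are typically run on a GPU, e.g. mx.gpu(0)).
> > > > > > ctx = mx.cpu()
> > > > > >
> > > > > > # Build the network from the saved symbol file.
> > > > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json',
> > > > > >                                         ['data'])
> > > > > >
> > > > > > # Load the raw arrays to find the dtype each parameter was saved with.
> > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > > > > >
> > > > > > # Keys are expected to carry a prefix (e.g. 'arg:name'); strip it and
> > > > > > # cast each parameter to the saved dtype.
> > > > > > for k, v in params_fp16.items():
> > > > > >     new_key = k.split(':')[1]
> > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > > > > >
> > > > > > # Load the saved values once the dtypes match.
> > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)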
> > > > > >
> > > > > >
> > > > > > [2] Amazon Rekognition
> > > > > >
> > > > > >
> > > > > > [3] User story: Train a model -> Cast it to FP16 -> Save the model ->
> > > > > > Load back the model does not work. They have to cast every parameter
> > > > > > with a workaround mentioned above [1].
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lu...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Hi Sheng,
> > > > > > >
> > > > > > > Addressing your questions:
> > > > > > >
> > > > > > > - "why this specific bug is more important than all the other
> > known
> > > > > bugs,
> > > > > > > that this becomes a release blocker"
> > > > > > > I do not consider it to be more or less important than other
> > fixes.
> > > > It
> > > > > > can
> > > > > > > be fixed and included in the release alongside the rest of the
> > > > release
> > > > > > > content, right?
> > > > > > > From the description of the issue it seems important since it
> is
> > > > > blocking
> > > > > > > users from loading models that were previously trained and
> saved.
> > > > There
> > > > > > is
> > > > > > > nothing stopping the community from including this fix into
> > 1.3.0,
> > > > > > > alongside the rest of the features and fixes.
> > > > > > >
> > > > > > > - "The bug exists since SymbolBlock was introduced a year ago
> and
> > > has
> > > > > > > survived at least three releases, so this is not a regression."
> > > > > > > I do not think I said it is a regression. However, the fact a
> bug
> > > > > existed
> > > > > > > before, does not mean it is OK to release it rather than fix
> it.
> > > > > > >
> > > > > > > - "Timeline-wise, this bug was reported on 7/21, but was not
> > > reported
> > > > > as
> > > > > > > release-blocker in the release discussion thread until 8/31
> [1].
> > > > > Neither
> > > > > > > its reporting as release-blocker nor its fix made it for the
> 8/3
> > > code
> > > > > > > freeze."
> > > > > > > You are right, would have been better to have this identified
> and
> > > > fixed
> > > > > > > earlier and included before code freeze.
> > > > > > >
> > > > > > > - "The PR is still not ready yet as it doesn't have approval."
> > > > > > > I think it is waiting for your review.
> > > > > > >
> > > > > > > - "it would be great if you could provide some additional
> > reasoning
> > > > > > besides
> > > > > > > "X mentions the issue" or "fix was done by X""
> > > > > > > I have. Repeating what I wrote in my previous email for clarity: Basic
> > > > > > > functionality broken: loading a model (albeit one that was saved as
> > > > > > > non FP32)
> > > > > > >
> > > > > > > So, yes - this issue seems to have been out there for a while,
> > > > somehow
> > > > > > went
> > > > > > > under the radar... but I think the key question is whether this
> > > > blocks
> > > > > a
> > > > > > > basic functionality in MXNet. I believe so, hence my -1 vote.
> > > > > > >
> > > > > > > Hagay
> > > > > > >
> > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <sz...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > Hi Hagay and Sandeep,
> > > > > > > >
> > > > > > > > Could you help us understand why this specific bug is more
> > > > important
> > > > > > than
> > > > > > > > all the other known bugs, that this becomes a release
> blocker?
> > > > > > > >
> > > > > > > > Some facts to consider:
> > > > > > > > - The bug exists since SymbolBlock was introduced a year ago
> > and
> > > > has
> > > > > > > > survived at least three releases, so this is not a
> regression.
> > > > > > > > - Timeline-wise, this bug was reported on 7/21, but was not
> > > > reported
> > > > > as
> > > > > > > > release-blocker in the release discussion thread until 8/31
> > [1].
> > > > > > Neither
> > > > > > > > its reporting as release-blocker nor its fix made it for the
> > 8/3
> > > > code
> > > > > > > > freeze.
> > > > > > > > - The PR is still not ready yet as it doesn't have approval.
> > > > > > > >
> > > > > > > > Hagay, it would be great if you could provide some additional
> > > > > reasoning
> > > > > > > > besides "X mentions the issue" or "fix was done by X".
> Thanks.
> > > > > > > >
> > > > > > > > -sz
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > > > >
> > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
> > lupesko@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Sandeep mentions the issue of an error when user tries to
> > load
> > > > > model
> > > > > > > > params
> > > > > > > > > trained/saved as FP16.
> > > > > > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > > > The fix was done by Sandeep:
> > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12412 and
> is
> > > > ready
> > > > > to
> > > > > > > be
> > > > > > > > > cherry picked into the release branch.
> > > > > > > > >
> > > > > > > > > This seems like a release blocker to me:
> > > > > > > > > - Basic functionality broken: loading a model (albeit one that was
> > > > > > > > > saved as non FP32)
> > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
> ThomasDelteil@
> > )
> > > > > > > > >
> > > > > > > > > -1 (non binding)
> > > > > > > > >
> > > > > > > > > Hagay
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > "- 0"
> > > > > > > > > >
> > > > > > > > > > I believe the bug #11849
> > > > > > > > > > <https://github.com/apache/incubator-mxnet/issues/11849
> >,
> > > > unable
> > > > > > to
> > > > > > > > > import
> > > > > > > > > > non-fp32 models into Gluon, fixed in this PR #12412
> > > > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12412>
> is
> > > > > > important
> > > > > > > > for
> > > > > > > > > > the
> > > > > > > > > > users. I would rather pick this fix in this release than
> > > plan a
> > > > > > minor
> > > > > > > > > > release later.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Sandeep
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > > > https://github.com/apache/incubator-mxnet -b
> 1.3.0.rc0"
> > > > works
> > > > > > fine
> > > > > > > > > now,
> > > > > > > > > > > never mind.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Unfortunately, MXNet was depending on a branch of TVM
> > > that
> > > > is
> > > > > > now
> > > > > > > > > > > deleted.
> > > > > > > > > > > > We will have to merge #12448
> > > > > > > > > > > > <
> https://github.com/apache/incubator-mxnet/pull/12448>
> > > > > before
> > > > > > > the
> > > > > > > > > > > release.
> > > > > > > > > > > >
> > > > > > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > > > > > >.
> > > > > > > > > > > >
> > > > > > > > > > > > Philip.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> > > > > > carinmeier@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >> Checked out the tag, built and tested the Clojure
> > > package.
> > > > > +1
> > > > > > > > > > > >>
> > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > > > > > > > > > >> roshaninagmote2@gmail.com>
> > > > > > > > > > > >> wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >> > Hi all,
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > I would like to propose a vote to release Apache
> > MXNet
> > > > > > > > > (incubating)
> > > > > > > > > > > >> version
> > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug
> 31st)
> > > and
> > > > > end
> > > > > > at
> > > > > > > > > 7:00
> > > > > > > > > > PM
> > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Link to release notes:
> > > > > > > > > > > >> >
> https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > > > >> > *
> > > > > > > >
> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > > > > >> > <
> > > > > > > >
> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > > > >0*
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > View this page, click on "Build from Source", and
> > use
> > > > the
> > > > > > > source
> > > > > > > > > > code
> > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > > > > >> >
> > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Please remember to TEST first before voting
> > > accordingly:
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > +1 = approve
> > > > > > > > > > > >> > +0 = no opinion
> > > > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > >> > Roshani
> > > > > > > > > > > >> >
> > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sandeep Krishnamurthy
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> Sandeep Krishnamurthy
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by sandeep krishnamurthy <sa...@gmail.com>.
   1. As an Apache MXNet community member, I raised the concern of broken
   functionality for the user. I explained the issue and the workaround,
   provided the data points, and said why I think it is important. If, after
   all this, you think my vote is biased by my employer just because a user I
   quoted is from Amazon, that is more concerning to me, as it calls my
   voting abilities into question.
   2. My -1 in no way undermines the huge amount of effort that goes on behind
   the scenes for a release to happen. I have great respect and recognition
   for everyone involved in all the past releases of MXNet and in this one. I
   voted on my judgement of what may be good for the users of MXNet.
   3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel free to
   decide and progress on the release, as we already have >3 +1 votes in this
   thread.
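
To make the majority-approval rule quoted further down concrete (at least
three affirmative PMC votes, and more positive than negative votes), here is
a minimal sketch in Python. It assumes that only binding (PMC) votes are
counted on both sides of the comparison, which is an assumption and not
something stated in this thread. The Vote class and the release_vote_passes
function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True for PMC members (assumption: only these are counted)


def release_vote_passes(votes: Iterable[Vote]) -> bool:
    """Majority approval: at least three binding +1 votes and more
    binding +1 votes than binding -1 votes. A -1 is not a veto."""
    votes = list(votes)
    plus = sum(1 for v in votes if v.binding and v.value > 0)
    minus = sum(1 for v in votes if v.binding and v.value < 0)
    return plus >= 3 and plus > minus


if __name__ == "__main__":
    # Hypothetical tally: three binding +1, one binding -1, one non-binding -1.
    example = [
        Vote("pmc_a", +1, True),
        Vote("pmc_b", +1, True),
        Vote("pmc_c", +1, True),
        Vote("pmc_d", -1, True),
        Vote("contributor_e", -1, False),
    ]
    print(release_vote_passes(example))
```

Under this reading, a single -1 does not block a release that already has
three or more binding +1 votes; the release manager still decides whether to
cancel the vote.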


Best,

Sandeep

On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <cj...@gmail.com> wrote:

> btw, there are no vetoes on package releases:
>
> VOTES ON PACKAGE RELEASES
> <https://www.apache.org/foundation/voting.html#ReleaseVotes>
>
> Votes on whether a package is ready to be released use majority approval
> <https://www.apache.org/foundation/glossary.html#MajorityApproval> -- i.e.
> at least three PMC members must vote affirmatively for release, and there
> must be more positive than negative votes. Releases may not be vetoed.
> Generally the community will cancel the release vote if anyone identifies
> serious problems, but in most cases the ultimate decision lies with the
> individual serving as release manager. The specifics of the process may vary
> from project to project, but the 'minimum quorum of three +1 votes' rule is
> universal.
>
> On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <sz...@gmail.com> wrote:
>
> > Thanks for sharing your opinions, Thomas. Your recognition and respect of
> > people's efforts on preparing the release candidate are certainly
> > appreciated.
> >
> > Now that the vote is set to fail thanks to the veto, there will be plenty
> > of opportunities to include those bug fixes, including the one Zhi
> > mentioned [1], which was already merged in the master and yet chose not to
> > block this release with [2]. I will be happy to work with Roshani to
> > prepare another release candidate once ready.
> >
> > -sz
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> > [2]
> >
> >
> https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> >
> > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <thomas.delteil1@gmail.com
> >
> > wrote:
> >
> > > -0
> > > (non-binding)
> > >
> > > If I may add some nuancing plus a personal data point as one of the
> users
> > > commenting in the bug report in question:
> > >
> > > - Performance vs. Basic functionality => I don't think high performance
> > > use-cases and basic functionality are two obviously opposed concepts, and
> > > I see no contradiction in Hagay's and Sandeep's statements.
> > > Float16 support is a feature of MXNet that provides more than twice the
> > > performance of Float32 on supported platforms, hence the high performance
> > > use-case. The bug is that the basic functionality of reloading a saved
> > > float16 model is currently broken.
> > >
> > > - This bug vs Other bugs => Contrary to the vast majority of the 140 open
> > > bugs that are mentioned above, I would put to Sandeep's credit that this
> > > one bug has a PR open that provides a fix for it. This would make it a
> > > better candidate to get included in this release than a bug that has no
> > > fix ready for it.
> > >
> > > - Personal datapoint: I recently did some experimentation with float16
> > [1]
> > > and actually coincidentally just published a video on optimizing
> > > performance for Gluon. Float16 conversion is one of the most, if not
> the
> > > most effective way to get performance out of MXNet [2]. I believe there
> > is
> > > a lot of value in publicizing more its use and hence making sure at
> least
> > > the basic support for normal use-cases is present.
> > >
> > > Of course this needs to be balanced with the overhead of preparing a new
> > > release candidate once the fix is reviewed and merged, which seems to be
> > > a lengthy and complex process in its own right, and the delay in
> > > providing the other features present in 1.3 to users that are not
> > > running off the nightly builds.
> > >
> > > All the best,
> > >
> > > Thomas
> > >
> > > [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> > > [2]
> > >
> > >
> >
> https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> > >
> > > Le mar. 4 sept. 2018 à 17:11, Sheng Zha <sz...@gmail.com> a écrit :
> > >
> > > > Sandeep,
> > > >
> > > > Thanks for explaining your veto. We have open bugs that impacted a
> lot
> > > more
> > > > than just 3 customers, just by referring to the number of commenters
> on
> > > the
> > > > issue [1].
> > > >
> > > > You said that this is for "high performance use cases", which contradicts
> > > > Hagay's assessment that this is "basic functionality broken". Given that
> > > > this is for advanced use cases of using half-precision training, why is it
> > > > so much more important than any other open bug reports, that for this
> > > > specific bug fix, we have to delay the access of regular users to the new
> > > > MXNet 1.3 release by at least another week?
> > > >
> > > > Honestly, I'm concerned that your vote is biased by Amazon
> involvement,
> > > > given that you quoted Amazon Rekognition.
> > > >
> > > > -sz
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> > > >
> > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > > > sandeep.krishna98@gmail.com> wrote:
> > > >
> > > > > My initial vote of “-0” was due to lack of info from a user who had said
> > > > > he overcame this issue for an FP16 model.
> > > > >
> > > > >
> > > > > However, the suggested workaround [1] for the issue is not straightforward
> > > > > or generally usable for all users. Also, the issue is not simple and
> > > > > isolated enough to be listed in the Release Notes as a known issue with a
> > > > > workaround.
> > > > >
> > > > >
> > > > > Changing my vote to: "-1 (binding)" owing to the user impact [3]
> > > > >
> > > > >
> > > > >
> > > > > @Sheng:
> > > > >
> > > > > 1. Agreed, the bug has existed for a long time. However, FP16 and similar
> > > > > optimizations were added later on, followed by users [2] using this
> > > > > feature for high performance use cases. It is not OK to measure the
> > > > > severity of the bug based on its past existence; rather, we should look at
> > > > > who is impacted now, and whether it is a small subset with a simple
> > > > > workaround or an issue impacting a large number of users.
> > > > >
> > > > > 2. Agreed, the bug was reported on 7/21. However, I became aware of this
> > > > > issue on 08/29 and submitted the fix on 08/30. Also, I did bring this to
> > > > > the notice of the community, you and the 1.3 release manager (Roshani) on
> > > > > the RC0 proposal thread. Also, I would focus on the issue and the user
> > > > > impact rather than on who identified and who is fixing the issue.
> > > > >
> > > > >
> > > > > Based on my discussion with 2 users, I think it is an important feature
> > > > > for them to see in Apache MXNet v1.3.0.
> > > > >
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Sandeep
> > > > >
> > > > >
> > > > > [1] Workaround used by the user.
> > > > >
> > > > >
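> > > > > import mxnet as mx
> > > > >
> > > > > # ctx was not defined in the original snippet; mx.cpu() is a placeholder
> > > > > # (FP16 models are typically run on a GPU, e.g. mx.gpu(0)).
> > > > > ctx = mx.cpu()
> > > > >
> > > > > # Build the network from the saved symbol file.
> > > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json',
> > > > >                                         ['data'])
> > > > >
> > > > > # Load the raw arrays to find the dtype each parameter was saved with.
> > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > > > >
> > > > > # Keys are expected to carry a prefix (e.g. 'arg:name'); strip it and
> > > > > # cast each parameter to the saved dtype.
> > > > > for k, v in params_fp16.items():
> > > > >     new_key = k.split(':')[1]
> > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > > > >
> > > > > # Load the saved values once the dtypes match.
> > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)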
> > > > >
> > > > >
> > > > > [2] Amazon Rekognition
> > > > >
> > > > >
> > > > > [3] User story: Train a model -> Cast it to FP16 -> Save the model ->
> > > > > Load back the model does not work. They have to cast every parameter
> > > > > with a workaround mentioned above [1].
> > > > >
> > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lu...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Sheng,
> > > > > >
> > > > > > Addressing your questions:
> > > > > >
> > > > > > - "why this specific bug is more important than all the other
> known
> > > > bugs,
> > > > > > that this becomes a release blocker"
> > > > > > I do not consider it to be more or less important than other
> fixes.
> > > It
> > > > > can
> > > > > > be fixed and included in the release alongside the rest of the
> > > release
> > > > > > content, right?
> > > > > > From the description of the issue it seems important since it is
> > > > blocking
> > > > > > users from loading models that were previously trained and saved.
> > > There
> > > > > is
> > > > > > nothing stopping the community from including this fix into
> 1.3.0,
> > > > > > alongside the rest of the features and fixes.
> > > > > >
> > > > > > - "The bug exists since SymbolBlock was introduced a year ago and
> > has
> > > > > > survived at least three releases, so this is not a regression."
> > > > > > I do not think I said it is a regression. However, the fact a bug
> > > > existed
> > > > > > before, does not mean it is OK to release it rather than fix it.
> > > > > >
> > > > > > - "Timeline-wise, this bug was reported on 7/21, but was not
> > reported
> > > > as
> > > > > > release-blocker in the release discussion thread until 8/31 [1].
> > > > Neither
> > > > > > its reporting as release-blocker nor its fix made it for the 8/3
> > code
> > > > > > freeze."
> > > > > > You are right, would have been better to have this identified and
> > > fixed
> > > > > > earlier and included before code freeze.
> > > > > >
> > > > > > - "The PR is still not ready yet as it doesn't have approval."
> > > > > > I think it is waiting for your review.
> > > > > >
> > > > > > - "it would be great if you could provide some additional
> reasoning
> > > > > besides
> > > > > > "X mentions the issue" or "fix was done by X""
> > > > > > I have. Repeating what I wrote in my previous email for clarity: Basic
> > > > > > functionality broken: loading a model (albeit one that was saved as
> > > > > > non FP32)
> > > > > >
> > > > > > So, yes - this issue seems to have been out there for a while,
> > > somehow
> > > > > went
> > > > > > under the radar... but I think the key question is whether this
> > > blocks
> > > > a
> > > > > > basic functionality in MXNet. I believe so, hence my -1 vote.
> > > > > >
> > > > > > Hagay
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <sz...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > Hi Hagay and Sandeep,
> > > > > > >
> > > > > > > Could you help us understand why this specific bug is more
> > > important
> > > > > than
> > > > > > > all the other known bugs, that this becomes a release blocker?
> > > > > > >
> > > > > > > Some facts to consider:
> > > > > > > - The bug exists since SymbolBlock was introduced a year ago
> and
> > > has
> > > > > > > survived at least three releases, so this is not a regression.
> > > > > > > - Timeline-wise, this bug was reported on 7/21, but was not
> > > reported
> > > > as
> > > > > > > release-blocker in the release discussion thread until 8/31
> [1].
> > > > > Neither
> > > > > > > its reporting as release-blocker nor its fix made it for the
> 8/3
> > > code
> > > > > > > freeze.
> > > > > > > - The PR is still not ready yet as it doesn't have approval.
> > > > > > >
> > > > > > > Hagay, it would be great if you could provide some additional
> > > > reasoning
> > > > > > > besides "X mentions the issue" or "fix was done by X". Thanks.
> > > > > > >
> > > > > > > -sz
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > > >
> > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
> lupesko@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Sandeep mentions the issue of an error when user tries to
> load
> > > > model
> > > > > > > params
> > > > > > > > trained/saved as FP16.
> > > > > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > > The fix was done by Sandeep:
> > > > > > > > https://github.com/apache/incubator-mxnet/pull/12412 and is
> > > ready
> > > > to
> > > > > > be
> > > > > > > > cherry picked into the release branch.
> > > > > > > >
> > > > > > > > This seems like a release blocker to me:
> > > > > > > > - Basic functionality broken: loading a model (albeit one that was
> > > > > > > > saved as non FP32)
> > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@
> )
> > > > > > > >
> > > > > > > > -1 (non binding)
> > > > > > > >
> > > > > > > > Hagay
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > "- 0"
> > > > > > > > >
> > > > > > > > > I believe the bug #11849
> > > > > > > > > <https://github.com/apache/incubator-mxnet/issues/11849>,
> > > unable
> > > > > to
> > > > > > > > import
> > > > > > > > > non-fp32 models into Gluon, fixed in this PR #12412
> > > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12412> is
> > > > > important
> > > > > > > for
> > > > > > > > > the
> > > > > > > > > users. I would rather pick this fix in this release than
> > plan a
> > > > > minor
> > > > > > > > > release later.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Sandeep
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0"
> > > works
> > > > > fine
> > > > > > > > now,
> > > > > > > > > > never mind.
> > > > > > > > > >
> > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Unfortunately, MXNet was depending on a branch of TVM
> > that
> > > is
> > > > > now
> > > > > > > > > > deleted.
> > > > > > > > > > > We will have to merge #12448
> > > > > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12448>
> > > > before
> > > > > > the
> > > > > > > > > > release.
> > > > > > > > > > >
> > > > > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > > > > >.
> > > > > > > > > > >
> > > > > > > > > > > Philip.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> > > > > carinmeier@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> Checked out the tag, built and tested the Clojure
> > package.
> > > > +1
> > > > > > > > > > >>
> > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > > > > > > > > >> roshaninagmote2@gmail.com>
> > > > > > > > > > >> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > Hi all,
> > > > > > > > > > >> >
> > > > > > > > > > >> > I would like to propose a vote to release Apache
> MXNet
> > > > > > > > (incubating)
> > > > > > > > > > >> version
> > > > > > > > > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st)
> > and
> > > > end
> > > > > at
> > > > > > > > 7:00
> > > > > > > > > PM
> > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > > >> >
> > > > > > > > > > >> > Link to release notes:
> > > > > > > > > > >> > https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > > >> >
> > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > > >> > *
> > > > > > >
> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > > > >> > <
> > > > > > >
> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > > >0*
> > > > > > > > > > >> >
> > > > > > > > > > >> > View this page, click on "Build from Source", and
> use
> > > the
> > > > > > source
> > > > > > > > > code
> > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > > > >> >
> https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > > >> >
> > > > > > > > > > >> > Please remember to TEST first before voting
> > accordingly:
> > > > > > > > > > >> >
> > > > > > > > > > >> > +1 = approve
> > > > > > > > > > >> > +0 = no opinion
> > > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > > >> >
> > > > > > > > > > >> > Thanks,
> > > > > > > > > > >> > Roshani
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sandeep Krishnamurthy
> > > > >
> > > >
> > >
> >
>


-- 
Sandeep Krishnamurthy

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Chris Olivier <cj...@gmail.com>.
btw, there are no vetoes on package releases:

VOTES ON PACKAGE RELEASES
<https://www.apache.org/foundation/voting.html#ReleaseVotes>

Votes on whether a package is ready to be released use majority approval
<https://www.apache.org/foundation/glossary.html#MajorityApproval> -- i.e.
at least three PMC members must vote affirmatively for release, and there
must be more positive than negative votes. Releases may not be vetoed. Generally
the community will cancel the release vote if anyone identifies serious
problems, but in most cases the ultimate decision, lies with the individual
serving as release manager. The specifics of the process may vary from
project to project, but the 'minimum quorum of three +1 votes' rule is
universal.
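
The majority-approval rule quoted above can be sketched as a small tally
check. This is only an illustration, not an official Apache tool: the
Vote type, the release_vote_passes function and the voter names are
hypothetical, and it assumes that only binding (PMC) votes count in both
the positive and the negative tally.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str      # hypothetical voter name, for illustration only
    value: int      # +1, 0 or -1
    binding: bool   # True when cast by a PMC member (assumption of this sketch)


def release_vote_passes(votes: Iterable[Vote]) -> bool:
    """Majority approval: at least three binding +1 votes and more
    binding positive votes than binding negative votes."""
    binding = [v for v in votes if v.binding]
    positive = sum(1 for v in binding if v.value > 0)
    negative = sum(1 for v in binding if v.value < 0)
    # A single -1 does not act as a veto; it is only counted against the +1s.
    return positive >= 3 and positive > negative
```

Under these assumptions, a single binding -1 does not fail a vote that
already has three binding +1s; it only fails when the +1s are fewer than
three or do not outnumber the -1s.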

On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <sz...@gmail.com> wrote:

> Thanks for sharing your opinions, Thomas. Your recognition and respect of
> people's efforts on preparing the release candidate are certainly
> appreciated.
>
> Now that the vote is set to fail thanks to the veto, there will be plenty
> of opportunities to include those bug fixes, including the one Zhi
> mentioned [1], which was already merged into master, though Zhi chose not to
> block this release over it [2]. I will be happy to work with Roshani to
> prepare another release candidate once ready.
>
> -sz
>
> [1]
>
> https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> [2]
>
> https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
>
> On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <th...@gmail.com>
> wrote:
>
> > -0
> > (non-binding)
> >
> > If I may add some nuancing plus a personal data point as one of the users
> > commenting in the bug report in question:
> >
> > - Performance vs. Basic functionality => I don't think high performance
> > use-cases and basic functionality are two obviously opposed concepts and
> > see no contradiction in Hagay's and Sandeep's statements.
> > Float16 support is a feature of MXNet that provides more than twice the
> > performance of Float32 on supported platforms, hence the high performance
> > use-case. The bug is that the basic functionality of reloading a saved
> > float16 model is currently broken.
> >
> > - This bug vs Other bugs => Contrary to the vast majority of the 140 open
> bugs
> > that are mentioned above, I would put to Sandeep's credit that this one
> bug
> > has a PR open that provides a fix for it. This would make it a better
> > candidate to get included in this release than a bug that has no fix
> ready
> > for it.
> >
> > - Personal datapoint: I recently did some experimentation with float16
> [1]
> > and actually coincidentally just published a video on optimizing
> > performance for Gluon. Float16 conversion is one of the most, if not the
> > most effective way to get performance out of MXNet [2]. I believe there
> is
> > a lot of value in publicizing its use more and hence making sure at least
> > the basic support for normal use-cases is present.
> >
> > Of course this needs to be balanced with the overhead of preparing a new
> > release candidate once the fix is reviewed and merged, which seems to
> be
> > a lengthy and complex process in its own right, and the delay with
> > providing the other features present in 1.3 for users that are not
> running
> > off the nightly builds.
> >
> > All the best,
> >
> > Thomas
> >
> > [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> > [2]
> >
> >
> https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> >
> > Le mar. 4 sept. 2018 à 17:11, Sheng Zha <sz...@gmail.com> a écrit :
> >
> > > Sandeep,
> > >
> > > Thanks for explaining your veto. We have open bugs that impacted a lot
> > more
> > > than just 3 customers, just by referring to the number of commenters on
> > the
> > > issue [1].
> > >
> > > You said that this is for "high performance use cases", which
> contradicts
> > > with Hagay's assessment that this is "basic functionality broken". Given
> > that
> > > this is for advanced use cases of using half-precision training, why is
> > it
> > > so much more important than any other open bug reports, that for this
> > > specific bug fix, we have to delay the access of regular users to the
> new
> > > MXNet 1.3 release by at least another week?
> > >
> > > Honestly, I'm concerned that your vote is biased by Amazon involvement,
> > > given that you quoted Amazon Rekognition.
> > >
> > > -sz
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> > >
> > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > > sandeep.krishna98@gmail.com> wrote:
> > >
> > > > My initial vote of “-0” was due to lack of info from a user who had
> > said,
> > > > he overcame this issue for FP16 model.
> > > >
> > > >
> > > > However, the suggested workaround [1] for the issue is not straightforward and
> > > > generally usable for all users. Also, issue is not simple and
> isolated
> > to
> > > > be listed in the Release Notes as known issue with a workaround.
> > > >
> > > >
> > > > Changing my vote to: "-1 (binding)" owing to the user impact [3]
> > > >
> > > >
> > > >
> > > > @Sheng:
> > > >
> > > > 1. Agreed, bug existed from long time. However, FP16 and such
> > > optimizations
> > > > were added later on. Followed by users [2] using this feature for
> high
> > > > performance use cases. It is not ok to measure severity of the bug
> > based
> > > on
> > > > its past existence, rather we can see who is impacted now and is it a
> > > small
> > > > subset with a simple workaround or large user impacting issue.
> > > >
> > > > 2. Agreed bug was reported 7/21. However, I became aware of this
> issue
> > on
> > > > 08/29 and submitted the fix on 08/30. Also, I did bring this to the
> > > notice
> > > > of community, you and 1.3 release manager (Roshani) on the RC0
> proposal
> > > > thread. Also, I would focus on the issue and user impact than who
> > > > identified and who is fixing the issue.
> > > >
> > > >
> > > > Based on my discussion with 2 users, I think it is an important
> feature
> > > for
> > > > them to see in Apache MXNet v1.3.0.
> > > >
> > > >
> > > >
> > > > Best,
> > > >
> > > > Sandeep
> > > >
> > > >
> > > > [1] Workaround used by the user.
> > > >
> > > >
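> > > > import mxnet as mx
> > > >
> > > > ctx = mx.cpu()  # assumption: not defined in the original; use mx.gpu(0) on GPU
> > > >
> > > > # Import the symbol only; loading the FP16 params directly fails (#11849).
> > > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json',
> > > >                                         ['data'])
> > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > > >
> > > > # Cast each parameter to the dtype stored in the file before loading.
> > > > for k, v in params_fp16.items():
> > > >     new_key = k.split(':')[1]  # strip the 'arg:' / 'aux:' prefix
> > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > > >
> > > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)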
> > > >
> > > >
> > > > [2] Amazon Rekognition
> > > >
> > > >
> > > > [3] User story: Train a model -> Cast it to FP16 -> Save the model ->
> > > Load
> > > > back the model does not work. They have to cast every parameter with
> a
> > > > workaround mentioned above [1].
> > > >
> > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lu...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Sheng,
> > > > >
> > > > > Addressing your questions:
> > > > >
> > > > > - "why this specific bug is more important than all the other known
> > > bugs,
> > > > > that this becomes a release blocker"
> > > > > I do not consider it to be more or less important than other fixes.
> > It
> > > > can
> > > > > be fixed and included in the release alongside the rest of the
> > release
> > > > > content, right?
> > > > > From the description of the issue it seems important since it is
> > > blocking
> > > > > users from loading models that were previously trained and saved.
> > There
> > > > is
> > > > > nothing stopping the community from including this fix into 1.3.0,
> > > > > alongside the rest of the features and fixes.
> > > > >
> > > > > - "The bug exists since SymbolBlock was introduced a year ago and
> has
> > > > > survived at least three releases, so this is not a regression."
> > > > > I do not think I said it is a regression. However, the fact a bug
> > > existed
> > > > > before, does not mean it is OK to release it rather than fix it.
> > > > >
> > > > > - "Timeline-wise, this bug was reported on 7/21, but was not
> reported
> > > as
> > > > > release-blocker in the release discussion thread until 8/31 [1].
> > > Neither
> > > > > its reporting as release-blocker nor its fix made it for the 8/3
> code
> > > > > freeze."
> > > > > You are right, would have been better to have this identified and
> > fixed
> > > > > earlier and included before code freeze.
> > > > >
> > > > > - "The PR is still not ready yet as it doesn't have approval."
> > > > > I think it is waiting for your review.
> > > > >
> > > > > - "it would be great if you could provide some additional reasoning
> > > > besides
> > > > > "X mentions the issue" or "fix was done by X""
> > > > > I have. Repeating what I wrote in my previous email for clarity:
> > Basic
> > > > > functionality broken: loading a model (albeit one that was
> saved
> > > as
> > > > > non FP32)
> > > > >
> > > > > So, yes - this issue seems to have been out there for a while,
> > somehow
> > > > went
> > > > > under the radar... but I think the key question is whether this
> > blocks
> > > a
> > > > > basic functionality in MXNet. I believe so, hence my -1 vote.
> > > > >
> > > > > Hagay
> > > > >
> > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <sz...@gmail.com>
> wrote:
> > > > >
> > > > > > Hi Hagay and Sandeep,
> > > > > >
> > > > > > Could you help us understand why this specific bug is more
> > important
> > > > than
> > > > > > all the other known bugs, that this becomes a release blocker?
> > > > > >
> > > > > > Some facts to consider:
> > > > > > - The bug exists since SymbolBlock was introduced a year ago and
> > has
> > > > > > survived at least three releases, so this is not a regression.
> > > > > > - Timeline-wise, this bug was reported on 7/21, but was not
> > reported
> > > as
> > > > > > release-blocker in the release discussion thread until 8/31 [1].
> > > > Neither
> > > > > > its reporting as release-blocker nor its fix made it for the 8/3
> > code
> > > > > > freeze.
> > > > > > - The PR is still not ready yet as it doesn't have approval.
> > > > > >
> > > > > > Hagay, it would be great if you could provide some additional
> > > reasoning
> > > > > > besides "X mentions the issue" or "fix was done by X". Thanks.
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lupesko@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Sandeep mentions the issue of an error when user tries to load
> > > model
> > > > > > params
> > > > > > > trained/saved as FP16.
> > > > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > The fix was done by Sandeep:
> > > > > > > https://github.com/apache/incubator-mxnet/pull/12412 and is
> > ready
> > > to
> > > > > be
> > > > > > > cherry picked into the release branch.
> > > > > > >
> > > > > > > This seems like a release blocker to me:
> > > > > > > - Basic functionality broken: loading a model (albeit one that
> > that
> > > > was
> > > > > > > saved as non FP32)
> > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
> > > > > > >
> > > > > > > -1 (non binding)
> > > > > > >
> > > > > > > Hagay
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > > >
> > > > > > > > "- 0"
> > > > > > > >
> > > > > > > > I believe the bug #11849
> > > > > > > > <https://github.com/apache/incubator-mxnet/issues/11849>,
> > unable
> > > > to
> > > > > > > import
> > > > > > > > non-fp32 models into Gluon, fixed in this PR #12412
> > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12412> is
> > > > important
> > > > > > for
> > > > > > > > the
> > > > > > > > users. I would rather pick this fix in this release than
> plan a
> > > > minor
> > > > > > > > release later.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Sandeep
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > > chohyu01@cs.washington.edu>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0"
> > works
> > > > fine
> > > > > > > now,
> > > > > > > > > never mind.
> > > > > > > > >
> > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > > chohyu01@cs.washington.edu>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Unfortunately, MXNet was depending on a branch of TVM
> that
> > is
> > > > now
> > > > > > > > > deleted.
> > > > > > > > > > We will have to merge #12448
> > > > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12448>
> > > before
> > > > > the
> > > > > > > > > release.
> > > > > > > > > >
> > > > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > > > >.
> > > > > > > > > >
> > > > > > > > > > Philip.
> > > > > > > > > >
> > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> > > > carinmeier@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Checked out the tag, built and tested the Clojure
> package.
> > > +1
> > > > > > > > > >>
> > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > > > > > > > >> roshaninagmote2@gmail.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi all,
> > > > > > > > > >> >
> > > > > > > > > >> > I would like to propose a vote to release Apache MXNet
> > > > > > > (incubating)
> > > > > > > > > >> version
> > > > > > > > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st)
> and
> > > end
> > > > at
> > > > > > > 7:00
> > > > > > > > PM
> > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > >> >
> > > > > > > > > >> > Link to release notes:
> > > > > > > > > >> > https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > >> >
> > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > >> > *
> > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > > >> > <
> > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > >0*
> > > > > > > > > >> >
> > > > > > > > > >> > View this page, click on "Build from Source", and use
> > the
> > > > > source
> > > > > > > > code
> > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > > >> > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > >> >
> > > > > > > > > >> > Please remember to TEST first before voting
> accordingly:
> > > > > > > > > >> >
> > > > > > > > > >> > +1 = approve
> > > > > > > > > >> > +0 = no opinion
> > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > >> >
> > > > > > > > > >> > Thanks,
> > > > > > > > > >> > Roshani
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Sandeep Krishnamurthy
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Sandeep Krishnamurthy
> > > >
> > >
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Naveen Swamy <mn...@gmail.com>.
"Releases may not be vetoed"
http://www.apache.org/legal/release-policy.html#release-approval

I haven't tested the release yet; I'll do so tomorrow.

> On Sep 4, 2018, at 7:13 PM, Sheng Zha <sz...@gmail.com> wrote:
> 
> Thanks for sharing your opinions, Thomas. Your recognition and respect of
> people's efforts on preparing the release candidate are certainly
> appreciated.
> 
> Now that the vote is set to fail thanks to the veto, there will be plenty
> of opportunities to include those bug fixes, including the one Zhi
> mentioned [1], which was already merged into master, though Zhi chose not to
> block this release over it [2]. I will be happy to work with Roshani to
> prepare another release candidate once ready.
> 
> -sz
> 
> [1]
> https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> [2]
> https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> 
> On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <th...@gmail.com>
> wrote:
> 
>> -0
>> (non-binding)
>> 
>> If I may add some nuancing plus a personal data point as one of the users
>> commenting in the bug report in question:
>> 
>> - Performance vs. Basic functionality => I don't think high performance
>> use-cases and basic functionality are two obviously opposed concepts and
>> see no contradiction in Hagay's and Sandeep's statements.
>> Float16 support is a feature of MXNet that provides more than twice the
>> performance of Float32 on supported platforms, hence the high performance
>> use-case. The bug is that the basic functionality of reloading a saved
>> float16 model is currently broken.
>> 
>> - This bug vs Other bugs => Contrary to the vast majority of the 140 open bugs
>> that are mentioned above, I would put to Sandeep's credit that this one bug
>> has a PR open that provides a fix for it. This would make it a better
>> candidate to get included in this release than a bug that has no fix ready
>> for it.
>> 
>> - Personal datapoint: I recently did some experimentation with float16 [1]
>> and actually coincidentally just published a video on optimizing
>> performance for Gluon. Float16 conversion is one of the most, if not the
>> most effective way to get performance out of MXNet [2]. I believe there is
>> a lot of value in publicizing its use more and hence making sure at least
>> the basic support for normal use-cases is present.
>> 
>> Of course this needs to be balanced with the overhead of preparing a new
>> release candidate once the fix is reviewed and merged, which seems to be
>> a lengthy and complex process in its own right, and the delay with
>> providing the other features present in 1.3 for users that are not running
>> off the nightly builds.
>> 
>> All the best,
>> 
>> Thomas
>> 
>> [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
>> [2]
>> 
>> https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
>> 
>>> Le mar. 4 sept. 2018 à 17:11, Sheng Zha <sz...@gmail.com> a écrit :
>>> 
>>> Sandeep,
>>> 
>>> Thanks for explaining your veto. We have open bugs that impacted a lot
>> more
>>> than just 3 customers, just by referring to the number of commenters on
>> the
>>> issue [1].
>>> 
>>> You said that this is for "high performance use cases", which contradicts
>>> with Hagay's assessment that this is "basic functionality broken". Given
>> that
>>> this is for advanced use cases of using half-precision training, why is
>> it
>>> so much more important than any other open bug reports, that for this
>>> specific bug fix, we have to delay the access of regular users to the new
>>> MXNet 1.3 release by at least another week?
>>> 
>>> Honestly, I'm concerned that your vote is biased by Amazon involvement,
>>> given that you quoted Amazon Rekognition.
>>> 
>>> -sz
>>> 
>>> [1]
>>> 
>>> 
>> https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
>>> 
>>> On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
>>> sandeep.krishna98@gmail.com> wrote:
>>> 
>>>> My initial vote of “-0” was due to lack of info from a user who had
>> said,
>>>> he overcame this issue for FP16 model.
>>>> 
>>>> 
>>>> However, the suggested workaround [1] for the issue is not straightforward
>>> and
>>>> generally usable for all users. Also, issue is not simple and isolated
>> to
>>>> be listed in the Release Notes as known issue with a workaround.
>>>> 
>>>> 
>>>> Changing my vote to: "-1 (binding)" owing to the user impact [3]
>>>> 
>>>> 
>>>> 
>>>> @Sheng:
>>>> 
>>>> 1. Agreed, bug existed from long time. However, FP16 and such
>>> optimizations
>>>> were added later on. Followed by users [2] using this feature for high
>>>> performance use cases. It is not ok to measure severity of the bug
>> based
>>> on
>>>> its past existence, rather we can see who is impacted now and is it a
>>> small
>>>> subset with a simple workaround or large user impacting issue.
>>>> 
>>>> 2. Agreed bug was reported 7/21. However, I became aware of this issue
>> on
>>>> 08/29 and submitted the fix on 08/30. Also, I did bring this to the
>>> notice
>>>> of community, you and 1.3 release manager (Roshani) on the RC0 proposal
>>>> thread. Also, I would focus on the issue and user impact than who
>>>> identified and who is fixing the issue.
>>>> 
>>>> 
>>>> Based on my discussion with 2 users, I think it is an important feature
>>> for
>>>> them to see in Apache MXNet v1.3.0.
>>>> 
>>>> 
>>>> 
>>>> Best,
>>>> 
>>>> Sandeep
>>>> 
>>>> 
>>>> [1] Workaround used by the user.
>>>> 
>>>> 
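>>>> import mxnet as mx
>>>>
>>>> ctx = mx.cpu()  # assumption: not defined in the original; use mx.gpu(0) on GPU
>>>>
>>>> # Import the symbol only; loading the FP16 params directly fails (#11849).
>>>> net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json',
>>>>                                         ['data'])
>>>> params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
>>>>
>>>> # Cast each parameter to the dtype stored in the file before loading.
>>>> for k, v in params_fp16.items():
>>>>     new_key = k.split(':')[1]  # strip the 'arg:' / 'aux:' prefix
>>>>     net_fp16.collect_params()[new_key].cast(v.dtype)
>>>>
>>>> net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)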
>>>> 
>>>> 
>>>> [2] Amazon Rekognition
>>>> 
>>>> 
>>>> [3] User story: Train a model -> Cast it to FP16 -> Save the model ->
>>> Load
>>>> back the model does not work. They have to cast every parameter with a
>>>> workaround mentioned above [1].
>>>> 
>>>> On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lu...@gmail.com>
>> wrote:
>>>> 
>>>>> Hi Sheng,
>>>>> 
>>>>> Addressing your questions:
>>>>> 
>>>>> - "why this specific bug is more important than all the other known
>>> bugs,
>>>>> that this becomes a release blocker"
>>>>> I do not consider it to be more or less important than other fixes.
>> It
>>>> can
>>>>> be fixed and included in the release alongside the rest of the
>> release
>>>>> content, right?
>>>>> From the description of the issue it seems important since it is
>>> blocking
>>>>> users from loading models that were previously trained and saved.
>> There
>>>> is
>>>>> nothing stopping the community from including this fix into 1.3.0,
>>>>> alongside the rest of the features and fixes.
>>>>> 
>>>>> - "The bug exists since SymbolBlock was introduced a year ago and has
>>>>> survived at least three releases, so this is not a regression."
>>>>> I do not think I said it is a regression. However, the fact a bug
>>> existed
>>>>> before, does not mean it is OK to release it rather than fix it.
>>>>> 
>>>>> - "Timeline-wise, this bug was reported on 7/21, but was not reported
>>> as
>>>>> release-blocker in the release discussion thread until 8/31 [1].
>>> Neither
>>>>> its reporting as release-blocker nor its fix made it for the 8/3 code
>>>>> freeze."
>>>>> You are right, would have been better to have this identified and
>> fixed
>>>>> earlier and included before code freeze.
>>>>> 
>>>>> - "The PR is still not ready yet as it doesn't have approval."
>>>>> I think it is waiting for your review.
>>>>> 
>>>>> - "it would be great if you could provide some additional reasoning
>>>> besides
>>>>> "X mentions the issue" or "fix was done by X""
>>>>> I have. Repeating what I wrote in my previous email for clarity:
>> Basic
>>>>> functionality broken: loading a model (albeit one that was saved
>>> as
>>>>> non FP32)
>>>>> 
>>>>> So, yes - this issue seems to have been out there for a while,
>> somehow
>>>> went
>>>>> under the radar... but I think the key question is whether this
>> blocks
>>> a
>>>>> basic functionality in MXNet. I believe so, hence my -1 vote.
>>>>> 
>>>>> Hagay
>>>>> 
>>>>>> On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <sz...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Hagay and Sandeep,
>>>>>> 
>>>>>> Could you help us understand why this specific bug is more
>> important
>>>> than
>>>>>> all the other known bugs, that this becomes a release blocker?
>>>>>> 
>>>>>> Some facts to consider:
>>>>>> - The bug exists since SymbolBlock was introduced a year ago and
>> has
>>>>>> survived at least three releases, so this is not a regression.
>>>>>> - Timeline-wise, this bug was reported on 7/21, but was not
>> reported
>>> as
>>>>>> release-blocker in the release discussion thread until 8/31 [1].
>>>> Neither
>>>>>> its reporting as release-blocker nor its fix made it for the 8/3
>> code
>>>>>> freeze.
>>>>>> - The PR is still not ready yet as it doesn't have approval.
>>>>>> 
>>>>>> Hagay, it would be great if you could provide some additional
>>> reasoning
>>>>>> besides "X mentions the issue" or "fix was done by X". Thanks.
>>>>>> 
>>>>>> -sz
>>>>>> 
>>>>>> [1]
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
>>>>>> 
>>>>>> On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lu...@gmail.com>
>>>> wrote:
>>>>>> 
>>>>>>> Sandeep mentions the issue of an error when user tries to load
>>> model
>>>>>> params
>>>>>>> trained/saved as FP16.
>>>>>>> https://github.com/apache/incubator-mxnet/issues/11849
>>>>>>> The fix was done by Sandeep:
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12412 and is
>> ready
>>> to
>>>>> be
>>>>>>> cherry picked into the release branch.
>>>>>>> 
>>>>>>> This seems like a release blocker to me:
>>>>>>> - Basic functionality broken: loading a model (albeit one that
>> that
>>>> was
>>>>>>> saved as non FP32)
>>>>>>> - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
>>>>>>> 
>>>>>>> -1 (non binding)
>>>>>>> 
>>>>>>> Hagay
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
>>>>>>> sandeep.krishna98@gmail.com> wrote:
>>>>>>> 
>>>>>>>> "- 0"
>>>>>>>> 
>>>>>>>> I believe the bug #11849
>>>>>>>> <https://github.com/apache/incubator-mxnet/issues/11849>,
>> unable
>>>> to
>>>>>>> import
>>>>>>>> non-fp32 models into Gluon, fixed in this PR #12412
>>>>>>>> <https://github.com/apache/incubator-mxnet/pull/12412> is
>>>> important
>>>>>> for
>>>>>>>> the
>>>>>>>> users. I would rather pick this fix in this release than plan a
>>>> minor
>>>>>>>> release later.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Sandeep
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
>>>>> chohyu01@cs.washington.edu>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Actually, the command "git clone --recursive
>>>>>>>>> https://github.com/apache/incubator-mxnet -b 1.3.0.rc0"
>> works
>>>> fine
>>>>>>> now,
>>>>>>>>> never mind.
>>>>>>>>> 
>>>>>>>>> On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
>>>>>> chohyu01@cs.washington.edu>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Unfortunately, MXNet was depending on a branch of TVM that
>> is
>>>> now
>>>>>>>>> deleted.
>>>>>>>>>> We will have to merge #12448
>>>>>>>>>> <https://github.com/apache/incubator-mxnet/pull/12448>
>>> before
>>>>> the
>>>>>>>>> release.
>>>>>>>>>> 
>>>>>>>>>> Background: See dmlc/tvm#1394 <
>>>>>>> https://github.com/dmlc/tvm/issues/1394
>>>>>>>>> .
>>>>>>>>>> 
>>>>>>>>>> Philip.
>>>>>>>>>> 
>>>>>>>>>> On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
>>>> carinmeier@gmail.com
>>>>>> 
>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Checked out the tag, built and tested the Clojure package.
>>> +1
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
>>>>>>>>>>> roshaninagmote2@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> I would like to propose a vote to release Apache MXNet
>>>>>>> (incubating)
>>>>>>>>>>> version
>>>>>>>>>>>> 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and
>>> end
>>>> at
>>>>>>> 7:00
>>>>>>>> PM
>>>>>>>>>>>> PDT, Wednesday, Sept 5th.
>>>>>>>>>>>> 
>>>>>>>>>>>> Link to release notes:
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/releases
>>>>>>>>>>>> 
>>>>>>>>>>>> Link to release candidate 1.3.0.rc0:
>>>>>>>>>>>> *
>>>>>> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
>>>>>>>>>>>> <
>>>>>> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
>>>>>>>>> 0*
>>>>>>>>>>>> 
>>>>>>>>>>>> View this page, click on "Build from Source", and use
>> the
>>>>> source
>>>>>>>> code
>>>>>>>>>>>> obtained from 1.3.0.rc0 tag:
>>>>>>>>>>>> https://mxnet.incubator.apache.org/install/index.html
>>>>>>>>>>>> 
>>>>>>>>>>>> Please remember to TEST first before voting accordingly:
>>>>>>>>>>>> 
>>>>>>>>>>>> +1 = approve
>>>>>>>>>>>> +0 = no opinion
>>>>>>>>>>>> -1 = disapprove (provide reason)
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Roshani
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Sandeep Krishnamurthy
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Sandeep Krishnamurthy
>>>> 
>>> 
>> 

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Sheng Zha <sz...@gmail.com>.
Thanks for sharing your opinions, Thomas. Your recognition and respect of
people's efforts on preparing the release candidate are certainly
appreciated.

Now that the vote is set to fail thanks to the veto, there will be plenty
of opportunities to include those bug fixes, including the one Zhi
mentioned [1], which was already merged into master, though Zhi chose not to
block this release over it [2]. I will be happy to work with Roshani to
prepare another release candidate once ready.

-sz

[1]
https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
[2]
https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E

On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <th...@gmail.com>
wrote:

> -0
> (non-binding)
>
> If I may add some nuance, plus a personal data point as one of the users
> commenting on the bug report in question:
>
> - Performance vs. Basic functionality => I don't think high performance
> use-cases and basic functionality are two obviously opposed concepts and
> see no contradiction in Hagay's and Sandeep's statements.
> Float16 support is a feature of MXNet that provides more than twice the
> performance of Float32 on supported platforms, hence the high performance
> use-case. The bug is that the basic functionality of reloading a saved
> float16 model is currently broken.
>
> - This bug vs Other bugs => Contrary to the vast majority of the 140 open bugs
> that are mentioned above, I would put to Sandeep's credit that this one bug
> has a PR open that provides a fix for it. This would make it a better
> candidate to get included in this release than a bug that has no fix ready
> for it.
>
> - Personal datapoint: I recently did some experimentation with float16 [1]
> and actually coincidentally just published a video on optimizing
> performance for Gluon. Float16 conversion is one of the most effective
> ways, if not the most effective way, to get performance out of MXNet [2].
> I believe there is a lot of value in publicizing its use more, and hence in
> making sure at least the basic support for normal use-cases is present.
>
> Of course this needs to be balanced with the overhead of preparing a new
> release candidate once the fix is reviewed and merged, which seems to be
> a lengthy and complex process in its own right, and the delay in
> providing the other features present in 1.3 for users that are not running
> off the nightly builds.
>
> All the best,
>
> Thomas
>
> [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> [2]
>
> https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
>
> Le mar. 4 sept. 2018 à 17:11, Sheng Zha <sz...@gmail.com> a écrit :
>
> > Sandeep,
> >
> > Thanks for explaining your veto. We have open bugs that have impacted far
> > more than just 3 customers, judging by the number of commenters on those
> > issues [1].
> >
> > You said that this is for "high performance use cases", which contradicts
> > Hagay's assessment that this is "basic functionality broken". Given that
> > this is for advanced use cases of half-precision training, why is it so
> > much more important than any other open bug report that, for this
> > specific bug fix, we have to delay regular users' access to the new
> > MXNet 1.3 release by at least another week?
> >
> > Honestly, I'm concerned that your vote is biased by Amazon involvement,
> > given that you quoted Amazon Rekognition.
> >
> > -sz
> >
> > [1]
> > https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> >
> > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > sandeep.krishna98@gmail.com> wrote:
> >
> > > My initial vote of “-0” was due to lack of info from a user who had
> said,
> > > he overcame this issue for FP16 model.
> > >
> > >
> > > However, suggested workaround [1] for the issue is not straight forward
> > and
> > > generally usable for all users. Also, issue is not simple and isolated
> to
> > > be listed in the Release Notes as known issue with a workaround.
> > >
> > >
> > > Changing my vote to: "-1 (binding)" owing to the user impact [3]
> > >
> > >
> > >
> > > @Sheng:
> > >
> > > 1. Agreed, bug existed from long time. However, FP16 and such
> > optimizations
> > > were added later on. Followed by users [2] using this feature for high
> > > performance use cases. It is not ok to measure severity of the bug
> based
> > on
> > > its past existence, rather we can see who is impacted now and is it a
> > small
> > > subset with a simple workaround or large user impacting issue.
> > >
> > > 2. Agreed bug was reported 7/21. However, I became aware of this issue
> on
> > > 08/29 and submitted the fix on 08/30. Also, I did bring this to the
> > notice
> > > of community, you and 1.3 release manager (Roshani) on the RC0 proposal
> > > thread. Also, I would focus on the issue and user impact than who
> > > identified and who is fixing the issue.
> > >
> > >
> > > Based on my discussion with 2 users, I think it is a important feature
> > for
> > > them to see in Apache MXNet v1.3.0.
> > >
> > >
> > >
> > > Best,
> > >
> > > Sandeep
> > >
> > >
> > > [1] Workaround used by the user.
> > >
> > >
> > > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json',
> > > ['data'])
> > >
> > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > >
> > >
> > > for k, v in params_fp16.items():
> > >
> > >     new_key = k.split(':')[1]
> > >
> > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > >
> > >
> > > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
> > >
> > >
> > > [2] Amazon Rekognition
> > >
> > >
> > > [3] User story: Train a model -> Cast it to FP16 -> Save the model ->
> > Load
> > > back the model does not work. They have to cast every parameter with a
> > > workaround mentioned above [1].
> > >
> > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lu...@gmail.com>
> wrote:
> > >
> > > > Hi Sheng,
> > > >
> > > > Addressing your questions:
> > > >
> > > > - "why this specific bug is more important than all the other known
> > bugs,
> > > > that this becomes a release blocker"
> > > > I do not consider it to be more or less important than other fixes.
> It
> > > can
> > > > be fixed and included in the release alongside the rest of the
> release
> > > > content, right?
> > > > From the description of the issue it seems important since it is
> > blocking
> > > > users from loading models that were previously trained and saved.
> There
> > > is
> > > > nothing stopping the community from including this fix into 1.3.0,
> > > > alongside the rest of the features and fixes.
> > > >
> > > > - "The bug exists since SymbolBlock was introduced a year ago and has
> > > > survived at least three releases, so this is not a regression."
> > > > I do not think I said it is a regression. However, the fact a bug
> > existed
> > > > before, does not mean it is OK to release it rather than fix it.
> > > >
> > > > - "Timeline-wise, this bug was reported on 7/21, but was not reported
> > as
> > > > release-blocker in the release discussion thread until 8/31 [1].
> > Neither
> > > > its reporting as release-blocker nor its fix made it for the 8/3 code
> > > > freeze."
> > > > You are right, would have been better to have this identified and
> fixed
> > > > earlier and included before code freeze.
> > > >
> > > > - "The PR is still not ready yet as it doesn't have approval."
> > > > I think it is waiting for your review.
> > > >
> > > > - "it would be great if you could provide some additional reasoning
> > > besides
> > > > "X mentions the issue" or "fix was done by X""
> > > > I have. Repeating what I wrote in my previous email for clarity:
> Basic
> > > > functionality broken: loading a model (albeit one that that was saved
> > as
> > > > non FP32)
> > > >
> > > > So, yes - this issue seems to have been out there for a while,
> somehow
> > > went
> > > > under the radar... but I think the key question is whether this
> blocks
> > a
> > > > basic functionality in MXNet. I believe so, hence my -1 vote.
> > > >
> > > > Hagay
> > > >
> > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <sz...@gmail.com> wrote:
> > > >
> > > > > Hi Hagay and Sandeep,
> > > > >
> > > > > Could you help us understand why this specific bug is more
> important
> > > than
> > > > > all the other known bugs, that this becomes a release blocker?
> > > > >
> > > > > Some facts to consider:
> > > > > - The bug exists since SymbolBlock was introduced a year ago and
> has
> > > > > survived at least three releases, so this is not a regression.
> > > > > - Timeline-wise, this bug was reported on 7/21, but was not
> reported
> > as
> > > > > release-blocker in the release discussion thread until 8/31 [1].
> > > Neither
> > > > > its reporting as release-blocker nor its fix made it for the 8/3
> code
> > > > > freeze.
> > > > > - The PR is still not ready yet as it doesn't have approval.
> > > > >
> > > > > Hagay, it would be great if you could provide some additional
> > reasoning
> > > > > besides "X mentions the issue" or "fix was done by X". Thanks.
> > > > >
> > > > > -sz
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > >
> > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lu...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Sandeep mentions the issue of an error when user tries to load
> > model
> > > > > params
> > > > > > trained/saved as FP16.
> > > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > The fix was done by Sandeep:
> > > > > > https://github.com/apache/incubator-mxnet/pull/12412 and is
> ready
> > to
> > > > be
> > > > > > cherry picked into the release branch.
> > > > > >
> > > > > > This seems like a release blocker to me:
> > > > > > - Basic functionality broken: loading a model (albeit one that
> that
> > > was
> > > > > > saved as non FP32)
> > > > > > - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
> > > > > >
> > > > > > -1 (non binding)
> > > > > >
> > > > > > Hagay
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > > > > sandeep.krishna98@gmail.com> wrote:
> > > > > >
> > > > > > > "- 0"
> > > > > > >
> > > > > > > I believe the bug #11849
> > > > > > > <https://github.com/apache/incubator-mxnet/issues/11849>,
> unable
> > > to
> > > > > > import
> > > > > > > non-fp32 models into Gluon, fixed in this PR #12412
> > > > > > > <https://github.com/apache/incubator-mxnet/pull/12412> is
> > > important
> > > > > for
> > > > > > > the
> > > > > > > users. I would rather pick this fix in this release than plan a
> > > minor
> > > > > > > release later.
> > > > > > >
> > > > > > > Best,
> > > > > > > Sandeep
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > chohyu01@cs.washington.edu>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0"
> works
> > > fine
> > > > > > now,
> > > > > > > > never mind.
> > > > > > > >
> > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > chohyu01@cs.washington.edu>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Unfortunately, MXNet was depending on a branch of TVM that
> is
> > > now
> > > > > > > > deleted.
> > > > > > > > > We will have to merge #12448
> > > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12448>
> > before
> > > > the
> > > > > > > > release.
> > > > > > > > >
> > > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > > >.
> > > > > > > > >
> > > > > > > > > Philip.
> > > > > > > > >
> > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> > > carinmeier@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Checked out the tag, built and tested the Clojure package.
> > +1
> > > > > > > > >>
> > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > > > > > > >> roshaninagmote2@gmail.com>
> > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi all,
> > > > > > > > >> >
> > > > > > > > >> > I would like to propose a vote to release Apache MXNet
> > > > > > (incubating)
> > > > > > > > >> version
> > > > > > > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and
> > end
> > > at
> > > > > > 7:00
> > > > > > > PM
> > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > >> >
> > > > > > > > >> > Link to release notes:
> > > > > > > > >> > https://github.com/apache/incubator-mxnet/releases
> > > > > > > > >> >
> > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > >> > *
> > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > >> > <
> > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > >0*
> > > > > > > > >> >
> > > > > > > > >> > View this page, click on "Build from Source", and use
> the
> > > > source
> > > > > > > code
> > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > >> > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > >> >
> > > > > > > > >> > Please remember to TEST first before voting accordingly:
> > > > > > > > >> >
> > > > > > > > >> > +1 = approve
> > > > > > > > >> > +0 = no opinion
> > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > >> >
> > > > > > > > >> > Thanks,
> > > > > > > > >> > Roshani
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Sandeep Krishnamurthy
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sandeep Krishnamurthy
> > >
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Thomas DELTEIL <th...@gmail.com>.
-0
(non-binding)

If I may add some nuance, plus a personal data point as one of the users
commenting on the bug report in question:

- Performance vs. Basic functionality => I don't think high performance
use-cases and basic functionality are two obviously opposed concepts and
see no contradiction in Hagay's and Sandeep's statements.
Float16 support is a feature of MXNet that provides more than twice the
performance of Float32 on supported platforms, hence the high performance
use-case. The bug is that the basic functionality of reloading a saved
float16 model is currently broken.

- This bug vs Other bugs => Contrary to the vast majority of the 140 open bugs
that are mentioned above, I would put to Sandeep's credit that this one bug
has a PR open that provides a fix for it. This would make it a better
candidate to get included in this release than a bug that has no fix ready
for it.

- Personal datapoint: I recently did some experimentation with float16 [1]
and actually coincidentally just published a video on optimizing
performance for Gluon. Float16 conversion is one of the most effective
ways, if not the most effective way, to get performance out of MXNet [2].
I believe there is a lot of value in publicizing its use more, and hence in
making sure at least the basic support for normal use-cases is present.
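
To make the reload failure from the first point concrete, here is a small
toy sketch in plain Python. It does not use MXNet at all: ToyParameter and
load_checkpoint are made-up names for illustration, and the assumption that
parameters start out as float32 is mine. It only mimics the shape of the
problem (a float32 parameter rejecting float16 checkpoint data) and of the
per-parameter cast that the reported workaround relies on.

```python
# Toy stand-in for the load failure and workaround discussed in this thread.
# ToyParameter and load_checkpoint are hypothetical names, not MXNet APIs.


class ToyParameter:
    """A named parameter that only accepts data of its declared dtype."""

    def __init__(self, name, dtype="float32"):
        self.name = name
        self.dtype = dtype  # assumption: float32 is the default at import time
        self.data = None

    def cast(self, dtype):
        """Change the declared dtype, as the workaround does per parameter."""
        self.dtype = dtype

    def set_data(self, values, dtype):
        """Reject data whose dtype differs from the declared one."""
        if dtype != self.dtype:
            raise TypeError(
                f"{self.name}: declared {self.dtype}, checkpoint has {dtype}"
            )
        self.data = list(values)


def load_checkpoint(params, checkpoint, cast_first=False):
    """Load 'arg:name' / 'aux:name' entries, optionally casting first."""
    for key, (dtype, values) in checkpoint.items():
        name = key.split(":")[1]  # drop the 'arg:' / 'aux:' prefix
        if cast_first:
            params[name].cast(dtype)
        params[name].set_data(values, dtype)


checkpoint = {"arg:fc_weight": ("float16", [0.5, -1.0])}
params = {"fc_weight": ToyParameter("fc_weight")}

# Without the cast, the float32 parameter refuses the float16 data.
try:
    load_checkpoint(params, checkpoint)
    failed = False
except TypeError:
    failed = True

# With the per-parameter cast, the same checkpoint loads.
load_checkpoint(params, checkpoint, cast_first=True)
print(failed, params["fc_weight"].dtype, params["fc_weight"].data)
```

The point of the sketch is only that the cast has to happen for every
parameter before loading, which is why the workaround is hard to recommend
as a general answer for users.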

Of course this needs to be balanced with the overhead of preparing a new
release candidate once the fix is reviewed and merged, which seems to be
a lengthy and complex process in its own right, and the delay in
providing the other features present in 1.3 for users that are not running
off the nightly builds.

All the best,

Thomas

[1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
[2]
https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m

Le mar. 4 sept. 2018 à 17:11, Sheng Zha <sz...@gmail.com> a écrit :

> Sandeep,
>
> Thanks for explaining your veto. We have open bugs that have impacted far more
> than just 3 customers, judging by the number of commenters on those
> issues [1].
>
> You said that this is for "high performance use cases", which contradicts
> Hagay's assessment that this is "basic functionality broken". Given that
> this is for advanced use cases of half-precision training, why is it so
> much more important than any other open bug report that, for this
> specific bug fix, we have to delay regular users' access to the new
> MXNet 1.3 release by at least another week?
>
> Honestly, I'm concerned that your vote is biased by Amazon involvement,
> given that you quoted Amazon Rekognition.
>
> -sz
>
> [1]
>
> https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
>
> On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> sandeep.krishna98@gmail.com> wrote:
>
> > My initial vote of “-0” was due to lack of info from a user who had said,
> > he overcame this issue for FP16 model.
> >
> >
> > However, suggested workaround [1] for the issue is not straight forward
> and
> > generally usable for all users. Also, issue is not simple and isolated to
> > be listed in the Release Notes as known issue with a workaround.
> >
> >
> > Changing my vote to: "-1 (binding)" owing to the user impact [3]
> >
> >
> >
> > @Sheng:
> >
> > 1. Agreed, bug existed from long time. However, FP16 and such
> optimizations
> > were added later on. Followed by users [2] using this feature for high
> > performance use cases. It is not ok to measure severity of the bug based
> on
> > its past existence, rather we can see who is impacted now and is it a
> small
> > subset with a simple workaround or large user impacting issue.
> >
> > 2. Agreed bug was reported 7/21. However, I became aware of this issue on
> > 08/29 and submitted the fix on 08/30. Also, I did bring this to the
> notice
> > of community, you and 1.3 release manager (Roshani) on the RC0 proposal
> > thread. Also, I would focus on the issue and user impact than who
> > identified and who is fixing the issue.
> >
> >
> > Based on my discussion with 2 users, I think it is a important feature
> for
> > them to see in Apache MXNet v1.3.0.
> >
> >
> >
> > Best,
> >
> > Sandeep
> >
> >
> > [1] Workaround used by the user.
> >
> >
> > net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json',
> > ['data'])
> >
> > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> >
> >
> > for k, v in params_fp16.items():
> >
> >     new_key = k.split(':')[1]
> >
> >     net_fp16.collect_params()[new_key].cast(v.dtype)
> >
> >
> > net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
> >
> >
> > [2] Amazon Rekognition
> >
> >
> > [3] User story: Train a model -> Cast it to FP16 -> Save the model ->
> Load
> > back the model does not work. They have to cast every parameter with a
> > workaround mentioned above [1].
> >
> > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lu...@gmail.com> wrote:
> >
> > > Hi Sheng,
> > >
> > > Addressing your questions:
> > >
> > > - "why this specific bug is more important than all the other known
> bugs,
> > > that this becomes a release blocker"
> > > I do not consider it to be more or less important than other fixes. It
> > can
> > > be fixed and included in the release alongside the rest of the release
> > > content, right?
> > > From the description of the issue it seems important since it is
> blocking
> > > users from loading models that were previously trained and saved. There
> > is
> > > nothing stopping the community from including this fix into 1.3.0,
> > > alongside the rest of the features and fixes.
> > >
> > > - "The bug exists since SymbolBlock was introduced a year ago and has
> > > survived at least three releases, so this is not a regression."
> > > I do not think I said it is a regression. However, the fact a bug
> existed
> > > before, does not mean it is OK to release it rather than fix it.
> > >
> > > - "Timeline-wise, this bug was reported on 7/21, but was not reported
> as
> > > release-blocker in the release discussion thread until 8/31 [1].
> Neither
> > > its reporting as release-blocker nor its fix made it for the 8/3 code
> > > freeze."
> > > You are right, would have been better to have this identified and fixed
> > > earlier and included before code freeze.
> > >
> > > - "The PR is still not ready yet as it doesn't have approval."
> > > I think it is waiting for your review.
> > >
> > > - "it would be great if you could provide some additional reasoning
> > besides
> > > "X mentions the issue" or "fix was done by X""
> > > I have. Repeating what I wrote in my previous email for clarity: Basic
> > > functionality broken: loading a model (albeit one that that was saved
> as
> > > non FP32)
> > >
> > > So, yes - this issue seems to have been out there for a while, somehow
> > went
> > > under the radar... but I think the key question is whether this blocks
> a
> > > basic functionality in MXNet. I believe so, hence my -1 vote.
> > >
> > > Hagay
> > >
> > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <sz...@gmail.com> wrote:
> > >
> > > > Hi Hagay and Sandeep,
> > > >
> > > > Could you help us understand why this specific bug is more important
> > than
> > > > all the other known bugs, that this becomes a release blocker?
> > > >
> > > > Some facts to consider:
> > > > - The bug exists since SymbolBlock was introduced a year ago and has
> > > > survived at least three releases, so this is not a regression.
> > > > - Timeline-wise, this bug was reported on 7/21, but was not reported
> as
> > > > release-blocker in the release discussion thread until 8/31 [1].
> > Neither
> > > > its reporting as release-blocker nor its fix made it for the 8/3 code
> > > > freeze.
> > > > - The PR is still not ready yet as it doesn't have approval.
> > > >
> > > > Hagay, it would be great if you could provide some additional
> reasoning
> > > > besides "X mentions the issue" or "fix was done by X". Thanks.
> > > >
> > > > -sz
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > >
> > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lu...@gmail.com>
> > wrote:
> > > >
> > > > > Sandeep mentions the issue of an error when user tries to load
> model
> > > > params
> > > > > trained/saved as FP16.
> > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > The fix was done by Sandeep:
> > > > > https://github.com/apache/incubator-mxnet/pull/12412 and is ready
> to
> > > be
> > > > > cherry picked into the release branch.
> > > > >
> > > > > This seems like a release blocker to me:
> > > > > - Basic functionality broken: loading a model (albeit one that that
> > was
> > > > > saved as non FP32)
> > > > > - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
> > > > >
> > > > > -1 (non binding)
> > > > >
> > > > > Hagay
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > > > sandeep.krishna98@gmail.com> wrote:
> > > > >
> > > > > > "- 0"
> > > > > >
> > > > > > I believe the bug #11849
> > > > > > <https://github.com/apache/incubator-mxnet/issues/11849>, unable
> > to
> > > > > import
> > > > > > non-fp32 models into Gluon, fixed in this PR #12412
> > > > > > <https://github.com/apache/incubator-mxnet/pull/12412> is
> > important
> > > > for
> > > > > > the
> > > > > > users. I would rather pick this fix in this release than plan a
> > minor
> > > > > > release later.
> > > > > >
> > > > > > Best,
> > > > > > Sandeep
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > chohyu01@cs.washington.edu>
> > > > > > wrote:
> > > > > >
> > > > > > > Actually, the command "git clone --recursive
> > > > > > > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works
> > fine
> > > > > now,
> > > > > > > never mind.
> > > > > > >
> > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > chohyu01@cs.washington.edu>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Unfortunately, MXNet was depending on a branch of TVM that is
> > now
> > > > > > > deleted.
> > > > > > > > We will have to merge #12448
> > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12448>
> before
> > > the
> > > > > > > release.
> > > > > > > >
> > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > >.
> > > > > > > >
> > > > > > > > Philip.
> > > > > > > >
> > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> > carinmeier@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > >> Checked out the tag, built and tested the Clojure package.
> +1
> > > > > > > >>
> > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > > > > > >> roshaninagmote2@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Hi all,
> > > > > > > >> >
> > > > > > > >> > I would like to propose a vote to release Apache MXNet
> > > > > (incubating)
> > > > > > > >> version
> > > > > > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and
> end
> > at
> > > > > 7:00
> > > > > > PM
> > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > >> >
> > > > > > > >> > Link to release notes:
> > > > > > > >> > https://github.com/apache/incubator-mxnet/releases
> > > > > > > >> >
> > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > >> > *
> > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > >> > <
> > > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > >0*
> > > > > > > >> >
> > > > > > > >> > View this page, click on "Build from Source", and use the
> > > source
> > > > > > code
> > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > >> > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > >> >
> > > > > > > >> > Please remember to TEST first before voting accordingly:
> > > > > > > >> >
> > > > > > > >> > +1 = approve
> > > > > > > >> > +0 = no opinion
> > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> > Roshani
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sandeep Krishnamurthy
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Sandeep Krishnamurthy
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Sheng Zha <sz...@gmail.com>.
Sandeep,

Thanks for explaining your veto. We have open bugs that have impacted far more
than just 3 customers, judging by the number of commenters on those
issues [1].

You said that this is for "high performance use cases", which contradicts
Hagay's assessment that this is "basic functionality broken". Given that
this is for advanced use cases of half-precision training, why is it so
much more important than any other open bug report that, for this
specific bug fix, we have to delay regular users' access to the new
MXNet 1.3 release by at least another week?

Honestly, I'm concerned that your vote is biased by Amazon involvement,
given that you quoted Amazon Rekognition.

-sz

[1]
https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc

On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
sandeep.krishna98@gmail.com> wrote:

> My initial vote of “-0” was due to a lack of information from a user who had
> said he overcame this issue for an FP16 model.
>
>
> However, the suggested workaround [1] for the issue is not straightforward
> or generally usable for all users. Also, the issue is not simple and isolated
> enough to be listed in the Release Notes as a known issue with a workaround.
>
>
> Changing my vote to: "-1 (binding)" owing to the user impact [3]
>
>
>
> @Sheng:
>
> 1. Agreed, the bug has existed for a long time. However, FP16 and similar
> optimizations were added later on, followed by users [2] adopting this
> feature for high performance use cases. It is not OK to measure the severity
> of a bug by how long it has existed; rather, we should look at who is
> impacted now, and whether it is a small subset with a simple workaround or
> an issue affecting a large number of users.
>
> 2. Agreed, the bug was reported on 7/21. However, I became aware of this
> issue on 08/29 and submitted the fix on 08/30. Also, I did bring this to the
> notice of the community, you, and the 1.3 release manager (Roshani) on the
> RC0 proposal thread. Also, I would focus on the issue and user impact rather
> than on who identified and who is fixing the issue.
>
>
> Based on my discussion with 2 users, I think it is an important feature for
> them to see in Apache MXNet v1.3.0.
>
>
>
> Best,
>
> Sandeep
>
>
> [1] Workaround used by the user.
>
>
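> import mxnet as mx
>
> ctx = mx.cpu()  # assumption: the original snippet left ctx undefined
>
> # Import the symbol only, then read the saved FP16 parameters.
> net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
> params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
>
> # Cast each Gluon parameter to the dtype stored in the checkpoint.
> for k, v in params_fp16.items():
>     new_key = k.split(':')[1]  # drop the 'arg:' / 'aux:' prefix
>     net_fp16.collect_params()[new_key].cast(v.dtype)
>
> net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)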
>
>
> [2] Amazon Rekognition
>
>
> [3] User story: Train a model -> Cast it to FP16 -> Save the model -> Load
> the model back. The last step does not work; users have to cast every
> parameter using the workaround mentioned above [1].
>
> On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lu...@gmail.com> wrote:
>
> > Hi Sheng,
> >
> > Addressing your questions:
> >
> > - "why this specific bug is more important than all the other known bugs,
> > that this becomes a release blocker"
> > I do not consider it to be more or less important than other fixes. It can
> > be fixed and included in the release alongside the rest of the release
> > content, right?
> > From the description of the issue it seems important since it is blocking
> > users from loading models that were previously trained and saved. There is
> > nothing stopping the community from including this fix into 1.3.0,
> > alongside the rest of the features and fixes.
> >
> > - "The bug exists since SymbolBlock was introduced a year ago and has
> > survived at least three releases, so this is not a regression."
> > I do not think I said it is a regression. However, the fact a bug existed
> > before, does not mean it is OK to release it rather than fix it.
> >
> > - "Timeline-wise, this bug was reported on 7/21, but was not reported as
> > release-blocker in the release discussion thread until 8/31 [1]. Neither
> > its reporting as release-blocker nor its fix made it for the 8/3 code
> > freeze."
> > You are right, would have been better to have this identified and fixed
> > earlier and included before code freeze.
> >
> > - "The PR is still not ready yet as it doesn't have approval."
> > I think it is waiting for your review.
> >
> > - "it would be great if you could provide some additional reasoning besides
> > "X mentions the issue" or "fix was done by X""
> > I have. Repeating what I wrote in my previous email for clarity: Basic
> > functionality broken: loading a model (albeit one that was saved as
> > non FP32)
> >
> > So, yes - this issue seems to have been out there for a while, somehow went
> > under the radar... but I think the key question is whether this blocks a
> > basic functionality in MXNet. I believe so, hence my -1 vote.
> >
> > Hagay
> >
> > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <sz...@gmail.com> wrote:
> >
> > > Hi Hagay and Sandeep,
> > >
> > > Could you help us understand why this specific bug is more important
> than
> > > all the other known bugs, that this becomes a release blocker?
> > >
> > > Some facts to consider:
> > > - The bug exists since SymbolBlock was introduced a year ago and has
> > > survived at least three releases, so this is not a regression.
> > > - Timeline-wise, this bug was reported on 7/21, but was not reported as
> > > release-blocker in the release discussion thread until 8/31 [1].
> Neither
> > > its reporting as release-blocker nor its fix made it for the 8/3 code
> > > freeze.
> > > - The PR is still not ready yet as it doesn't have approval.
> > >
> > > Hagay, it would be great if you could provide some additional reasoning
> > > besides "X mentions the issue" or "fix was done by X". Thanks.
> > >
> > > -sz
> > >
> > > [1]
> > > https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > >
> > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lu...@gmail.com>
> wrote:
> > >
> > > > Sandeep mentions the issue of an error when user tries to load model
> > > params
> > > > trained/saved as FP16.
> > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > The fix was done by Sandeep:
> > > > https://github.com/apache/incubator-mxnet/pull/12412 and is ready to
> > be
> > > > cherry picked into the release branch.
> > > >
> > > > This seems like a release blocker to me:
> > > > - Basic functionality broken: loading a model (albeit one that that
> was
> > > > saved as non FP32)
> > > > - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
> > > >
> > > > -1 (non binding)
> > > >
> > > > Hagay
> > > >
> > > >
> > > >
> > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > > sandeep.krishna98@gmail.com> wrote:
> > > >
> > > > > "- 0"
> > > > >
> > > > > I believe the bug #11849
> > > > > <https://github.com/apache/incubator-mxnet/issues/11849>, unable
> to
> > > > import
> > > > > non-fp32 models into Gluon, fixed in this PR #12412
> > > > > <https://github.com/apache/incubator-mxnet/pull/12412> is
> important
> > > for
> > > > > the
> > > > > users. I would rather pick this fix in this release than plan a
> minor
> > > > > release later.
> > > > >
> > > > > Best,
> > > > > Sandeep
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > chohyu01@cs.washington.edu>
> > > > > wrote:
> > > > >
> > > > > > Actually, the command "git clone --recursive
> > > > > > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works
> fine
> > > > now,
> > > > > > never mind.
> > > > > >
> > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > chohyu01@cs.washington.edu>
> > > > > > wrote:
> > > > > >
> > > > > > > Unfortunately, MXNet was depending on a branch of TVM that is
> now
> > > > > > deleted.
> > > > > > > We will have to merge #12448
> > > > > > > <https://github.com/apache/incubator-mxnet/pull/12448> before
> > the
> > > > > > release.
> > > > > > >
> > > > > > > Background: See dmlc/tvm#1394 <
> > > > https://github.com/dmlc/tvm/issues/1394
> > > > > >.
> > > > > > >
> > > > > > > Philip.
> > > > > > >
> > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> carinmeier@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > >> Checked out the tag, built and tested the Clojure package. +1
> > > > > > >>
> > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > > > > >> roshaninagmote2@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi all,
> > > > > > >> >
> > > > > > >> > I would like to propose a vote to release Apache MXNet
> > > > (incubating)
> > > > > > >> version
> > > > > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end
> at
> > > > 7:00
> > > > > PM
> > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > >> >
> > > > > > >> > Link to release notes:
> > > > > > >> > https://github.com/apache/incubator-mxnet/releases
> > > > > > >> >
> > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > >> > *
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > >> > <
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > >0*
> > > > > > >> >
> > > > > > >> > View this page, click on "Build from Source", and use the
> > source
> > > > > code
> > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > >> > https://mxnet.incubator.apache.org/install/index.html
> > > > > > >> >
> > > > > > >> > Please remember to TEST first before voting accordingly:
> > > > > > >> >
> > > > > > >> > +1 = approve
> > > > > > >> > +0 = no opinion
> > > > > > >> > -1 = disapprove (provide reason)
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> > Roshani
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sandeep Krishnamurthy
> > > > >
> > > >
> > >
> >
>
>
> --
> Sandeep Krishnamurthy
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by sandeep krishnamurthy <sa...@gmail.com>.
My initial vote of “-0” was due to a lack of information from a user who
had said he overcame this issue for an FP16 model.


However, the suggested workaround [1] for the issue is neither
straightforward nor generally usable for all users. Also, the issue is not
simple and isolated enough to be listed in the Release Notes as a known
issue with a workaround.


Changing my vote to: "-1 (binding)" owing to the user impact [3]



@Sheng:

1. Agreed, the bug has existed for a long time. However, FP16 and similar
optimizations were added later on, followed by users [2] adopting this
feature for high-performance use cases. It is not OK to measure the
severity of the bug by how long it has existed; rather, we should look at
who is impacted now, and whether it affects a small subset of users with a
simple workaround or is an issue with large user impact.

2. Agreed, the bug was reported on 7/21. However, I became aware of this
issue on 08/29 and submitted the fix on 08/30. I also brought this to the
notice of the community, you, and the 1.3 release manager (Roshani) on the
RC0 proposal thread. I would focus on the issue and its user impact rather
than on who identified it and who is fixing it.


Based on my discussion with 2 users, I think it is an important feature for
them to see in Apache MXNet v1.3.0.



Best,

Sandeep


[1] Workaround used by the user.


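import mxnet as mx

ctx = mx.cpu()  # assumption: the original snippet did not define ctx

net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
params_fp16 = mx.nd.load('resnet34_fp16-0000.params')

# Cast each parameter to the dtype it was saved with before loading.
for k, v in params_fp16.items():
    new_key = k.split(':')[1]  # drop the part of the key before ':'
    net_fp16.collect_params()[new_key].cast(v.dtype)

net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)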


[2] Amazon Rekognition


[3] User story: Train a model -> Cast it to FP16 -> Save the model -> Load
the model back. The last step does not work; users have to cast every
parameter using the workaround mentioned above [1].
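
To make the key handling in workaround [1] concrete, here is a minimal
sketch that runs without MXNet. It assumes the saved keys carry an "arg:" or
"aux:" prefix, which is what the k.split(':')[1] call in [1] implies. The
dict of dtype names stands in for the NDArrays that mx.nd.load would return,
and param_name and casts_needed are hypothetical helpers, not MXNet APIs.

```python
# Stand-in for the dict returned by mx.nd.load(): saved key -> dtype name.
# Real values would be NDArrays; plain strings keep the sketch dependency-free.
saved = {
    "arg:conv0_weight": "float16",
    "arg:fc1_bias": "float16",
    "aux:bn0_moving_mean": "float32",
}


def param_name(saved_key):
    """Drop the 'arg:' / 'aux:' prefix, like k.split(':')[1] in workaround [1].

    Unlike the bare split, a key without ':' is returned unchanged
    instead of raising IndexError.
    """
    _prefix, sep, name = saved_key.partition(":")
    return name if sep else saved_key


def casts_needed(saved_dtypes, default_dtype="float32"):
    """Return {parameter name: dtype} for entries that differ from the default."""
    return {
        param_name(key): dtype
        for key, dtype in saved_dtypes.items()
        if dtype != default_dtype
    }


print(casts_needed(saved))
```

The workaround in [1] casts every parameter unconditionally; the filter here
only shows which parameters would actually change dtype under the assumed
float32 default.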

On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lu...@gmail.com> wrote:

> Hi Sheng,
>
> Addressing your questions:
>
> - "why this specific bug is more important than all the other known bugs,
> that this becomes a release blocker"
> I do not consider it to be more or less important than other fixes. It can
> be fixed and included in the release alongside the rest of the release
> content, right?
> From the description of the issue it seems important since it is blocking
> users from loading models that were previously trained and saved. There is
> nothing stopping the community from including this fix into 1.3.0,
> alongside the rest of the features and fixes.
>
> - "The bug exists since SymbolBlock was introduced a year ago and has
> survived at least three releases, so this is not a regression."
> I do not think I said it is a regression. However, the fact a bug existed
> before, does not mean it is OK to release it rather than fix it.
>
> - "Timeline-wise, this bug was reported on 7/21, but was not reported as
> release-blocker in the release discussion thread until 8/31 [1]. Neither
> its reporting as release-blocker nor its fix made it for the 8/3 code
> freeze."
> You are right, would have been better to have this identified and fixed
> earlier and included before code freeze.
>
> - "The PR is still not ready yet as it doesn't have approval."
> I think it is waiting for your review.
>
> - "it would be great if you could provide some additional reasoning besides
> "X mentions the issue" or "fix was done by X""
> I have. Repeating what I wrote in my previous email for clarity: Basic
> functionality broken: loading a model (albeit one that that was saved as
> non FP32)
>
> So, yes - this issue seems to have been out there for a while, somehow went
> under the radar... but I think the key question is whether this blocks a
> basic functionality in MXNet. I believe so, hence my -1 vote.
>
> Hagay
>
> On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <sz...@gmail.com> wrote:
>
> > Hi Hagay and Sandeep,
> >
> > Could you help us understand why this specific bug is more important than
> > all the other known bugs, that this becomes a release blocker?
> >
> > Some facts to consider:
> > - The bug exists since SymbolBlock was introduced a year ago and has
> > survived at least three releases, so this is not a regression.
> > - Timeline-wise, this bug was reported on 7/21, but was not reported as
> > release-blocker in the release discussion thread until 8/31 [1]. Neither
> > its reporting as release-blocker nor its fix made it for the 8/3 code
> > freeze.
> > - The PR is still not ready yet as it doesn't have approval.
> >
> > Hagay, it would be great if you could provide some additional reasoning
> > besides "X mentions the issue" or "fix was done by X". Thanks.
> >
> > -sz
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> >
> > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lu...@gmail.com> wrote:
> >
> > > Sandeep mentions the issue of an error when user tries to load model
> > params
> > > trained/saved as FP16.
> > > https://github.com/apache/incubator-mxnet/issues/11849
> > > The fix was done by Sandeep:
> > > https://github.com/apache/incubator-mxnet/pull/12412 and is ready to
> be
> > > cherry picked into the release branch.
> > >
> > > This seems like a release blocker to me:
> > > - Basic functionality broken: loading a model (albeit one that that was
> > > saved as non FP32)
> > > - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
> > >
> > > -1 (non binding)
> > >
> > > Hagay
> > >
> > >
> > >
> > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > sandeep.krishna98@gmail.com> wrote:
> > >
> > > > "- 0"
> > > >
> > > > I believe the bug #11849
> > > > <https://github.com/apache/incubator-mxnet/issues/11849>, unable to
> > > import
> > > > non-fp32 models into Gluon, fixed in this PR #12412
> > > > <https://github.com/apache/incubator-mxnet/pull/12412> is important
> > for
> > > > the
> > > > users. I would rather pick this fix in this release than plan a minor
> > > > release later.
> > > >
> > > > Best,
> > > > Sandeep
> > > >
> > > >
> > > >
> > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> chohyu01@cs.washington.edu>
> > > > wrote:
> > > >
> > > > > Actually, the command "git clone --recursive
> > > > > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works fine
> > > now,
> > > > > never mind.
> > > > >
> > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > chohyu01@cs.washington.edu>
> > > > > wrote:
> > > > >
> > > > > > Unfortunately, MXNet was depending on a branch of TVM that is now
> > > > > deleted.
> > > > > > We will have to merge #12448
> > > > > > <https://github.com/apache/incubator-mxnet/pull/12448> before
> the
> > > > > release.
> > > > > >
> > > > > > Background: See dmlc/tvm#1394 <
> > > https://github.com/dmlc/tvm/issues/1394
> > > > >.
> > > > > >
> > > > > > Philip.
> > > > > >
> > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <carinmeier@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > >> Checked out the tag, built and tested the Clojure package. +1
> > > > > >>
> > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > > > >> roshaninagmote2@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi all,
> > > > > >> >
> > > > > >> > I would like to propose a vote to release Apache MXNet
> > > (incubating)
> > > > > >> version
> > > > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at
> > > 7:00
> > > > PM
> > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > >> >
> > > > > >> > Link to release notes:
> > > > > >> > https://github.com/apache/incubator-mxnet/releases
> > > > > >> >
> > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > >> > *
> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > >> > <
> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > >0*
> > > > > >> >
> > > > > >> > View this page, click on "Build from Source", and use the
> source
> > > > code
> > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > >> > https://mxnet.incubator.apache.org/install/index.html
> > > > > >> >
> > > > > >> > Please remember to TEST first before voting accordingly:
> > > > > >> >
> > > > > >> > +1 = approve
> > > > > >> > +0 = no opinion
> > > > > >> > -1 = disapprove (provide reason)
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > Roshani
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Sandeep Krishnamurthy
> > > >
> > >
> >
>


-- 
Sandeep Krishnamurthy

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Hagay Lupesko <lu...@gmail.com>.
Hi Sheng,

Addressing your questions:

- "why this specific bug is more important than all the other known bugs,
that this becomes a release blocker"
I do not consider it to be more or less important than other fixes. It can
be fixed and included in the release alongside the rest of the release
content, right?
From the description of the issue it seems important since it is blocking
users from loading models that were previously trained and saved. There is
nothing stopping the community from including this fix into 1.3.0,
alongside the rest of the features and fixes.

- "The bug exists since SymbolBlock was introduced a year ago and has
survived at least three releases, so this is not a regression."
I do not think I said it is a regression. However, the fact that a bug
existed before does not mean it is OK to release with it rather than fix it.

- "Timeline-wise, this bug was reported on 7/21, but was not reported as
release-blocker in the release discussion thread until 8/31 [1]. Neither
its reporting as release-blocker nor its fix made it for the 8/3 code
freeze."
You are right, it would have been better to have this identified and fixed
earlier and included before code freeze.

- "The PR is still not ready yet as it doesn't have approval."
I think it is waiting for your review.

- "it would be great if you could provide some additional reasoning besides
"X mentions the issue" or "fix was done by X""
I have. Repeating what I wrote in my previous email for clarity: Basic
functionality broken: loading a model (albeit one that was saved as
non-FP32).

So, yes - this issue seems to have been out there for a while and somehow
went under the radar... but I think the key question is whether this blocks
basic functionality in MXNet. I believe so, hence my -1 vote.

Hagay

On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <sz...@gmail.com> wrote:

> Hi Hagay and Sandeep,
>
> Could you help us understand why this specific bug is more important than
> all the other known bugs, that this becomes a release blocker?
>
> Some facts to consider:
> - The bug exists since SymbolBlock was introduced a year ago and has
> survived at least three releases, so this is not a regression.
> - Timeline-wise, this bug was reported on 7/21, but was not reported as
> release-blocker in the release discussion thread until 8/31 [1]. Neither
> its reporting as release-blocker nor its fix made it for the 8/3 code
> freeze.
> - The PR is still not ready yet as it doesn't have approval.
>
> Hagay, it would be great if you could provide some additional reasoning
> besides "X mentions the issue" or "fix was done by X". Thanks.
>
> -sz
>
> [1]
>
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
>
> On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lu...@gmail.com> wrote:
>
> > Sandeep mentions the issue of an error when user tries to load model
> params
> > trained/saved as FP16.
> > https://github.com/apache/incubator-mxnet/issues/11849
> > The fix was done by Sandeep:
> > https://github.com/apache/incubator-mxnet/pull/12412 and is ready to be
> > cherry picked into the release branch.
> >
> > This seems like a release blocker to me:
> > - Basic functionality broken: loading a model (albeit one that that was
> > saved as non FP32)
> > - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
> >
> > -1 (non binding)
> >
> > Hagay
> >
> >
> >
> > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > sandeep.krishna98@gmail.com> wrote:
> >
> > > "- 0"
> > >
> > > I believe the bug #11849
> > > <https://github.com/apache/incubator-mxnet/issues/11849>, unable to
> > import
> > > non-fp32 models into Gluon, fixed in this PR #12412
> > > <https://github.com/apache/incubator-mxnet/pull/12412> is important
> for
> > > the
> > > users. I would rather pick this fix in this release than plan a minor
> > > release later.
> > >
> > > Best,
> > > Sandeep
> > >
> > >
> > >
> > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <ch...@cs.washington.edu>
> > > wrote:
> > >
> > > > Actually, the command "git clone --recursive
> > > > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works fine
> > now,
> > > > never mind.
> > > >
> > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> chohyu01@cs.washington.edu>
> > > > wrote:
> > > >
> > > > > Unfortunately, MXNet was depending on a branch of TVM that is now
> > > > deleted.
> > > > > We will have to merge #12448
> > > > > <https://github.com/apache/incubator-mxnet/pull/12448> before the
> > > > release.
> > > > >
> > > > > Background: See dmlc/tvm#1394 <
> > https://github.com/dmlc/tvm/issues/1394
> > > >.
> > > > >
> > > > > Philip.
> > > > >
> > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <ca...@gmail.com>
> > > wrote:
> > > > >
> > > > >> Checked out the tag, built and tested the Clojure package. +1
> > > > >>
> > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > > >> roshaninagmote2@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Hi all,
> > > > >> >
> > > > >> > I would like to propose a vote to release Apache MXNet
> > (incubating)
> > > > >> version
> > > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at
> > 7:00
> > > PM
> > > > >> > PDT, Wednesday, Sept 5th.
> > > > >> >
> > > > >> > Link to release notes:
> > > > >> > https://github.com/apache/incubator-mxnet/releases
> > > > >> >
> > > > >> > Link to release candidate 1.3.0.rc0:
> > > > >> > *
> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > >> > <
> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > >0*
> > > > >> >
> > > > >> > View this page, click on "Build from Source", and use the source
> > > code
> > > > >> > obtained from 1.3.0.rc0 tag:
> > > > >> > https://mxnet.incubator.apache.org/install/index.html
> > > > >> >
> > > > >> > Please remember to TEST first before voting accordingly:
> > > > >> >
> > > > >> > +1 = approve
> > > > >> > +0 = no opinion
> > > > >> > -1 = disapprove (provide reason)
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Roshani
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sandeep Krishnamurthy
> > >
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Sheng Zha <sz...@gmail.com>.
Hi Hagay and Sandeep,

Could you help us understand why this specific bug is more important than
all the other known bugs, that this becomes a release blocker?

Some facts to consider:
- The bug exists since SymbolBlock was introduced a year ago and has
survived at least three releases, so this is not a regression.
- Timeline-wise, this bug was reported on 7/21, but was not reported as
release-blocker in the release discussion thread until 8/31 [1]. Neither
its reporting as release-blocker nor its fix made it for the 8/3 code
freeze.
- The PR is still not ready yet as it doesn't have approval.

Hagay, it would be great if you could provide some additional reasoning
besides "X mentions the issue" or "fix was done by X". Thanks.

-sz

[1]
https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E

On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lu...@gmail.com> wrote:

> Sandeep mentions the issue of an error when user tries to load model params
> trained/saved as FP16.
> https://github.com/apache/incubator-mxnet/issues/11849
> The fix was done by Sandeep:
> https://github.com/apache/incubator-mxnet/pull/12412 and is ready to be
> cherry picked into the release branch.
>
> This seems like a release blocker to me:
> - Basic functionality broken: loading a model (albeit one that that was
> saved as non FP32)
> - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
>
> -1 (non binding)
>
> Hagay
>
>
>
> On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> sandeep.krishna98@gmail.com> wrote:
>
> > "- 0"
> >
> > I believe the bug #11849
> > <https://github.com/apache/incubator-mxnet/issues/11849>, unable to
> import
> > non-fp32 models into Gluon, fixed in this PR #12412
> > <https://github.com/apache/incubator-mxnet/pull/12412> is important for
> > the
> > users. I would rather pick this fix in this release than plan a minor
> > release later.
> >
> > Best,
> > Sandeep
> >
> >
> >
> > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <ch...@cs.washington.edu>
> > wrote:
> >
> > > Actually, the command "git clone --recursive
> > > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works fine
> now,
> > > never mind.
> > >
> > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <ch...@cs.washington.edu>
> > > wrote:
> > >
> > > > Unfortunately, MXNet was depending on a branch of TVM that is now
> > > deleted.
> > > > We will have to merge #12448
> > > > <https://github.com/apache/incubator-mxnet/pull/12448> before the
> > > release.
> > > >
> > > > Background: See dmlc/tvm#1394 <
> https://github.com/dmlc/tvm/issues/1394
> > >.
> > > >
> > > > Philip.
> > > >
> > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <ca...@gmail.com>
> > wrote:
> > > >
> > > >> Checked out the tag, built and tested the Clojure package. +1
> > > >>
> > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > >> roshaninagmote2@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Hi all,
> > > >> >
> > > >> > I would like to propose a vote to release Apache MXNet
> (incubating)
> > > >> version
> > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at
> 7:00
> > PM
> > > >> > PDT, Wednesday, Sept 5th.
> > > >> >
> > > >> > Link to release notes:
> > > >> > https://github.com/apache/incubator-mxnet/releases
> > > >> >
> > > >> > Link to release candidate 1.3.0.rc0:
> > > >> > *https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > >> > <https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > >0*
> > > >> >
> > > >> > View this page, click on "Build from Source", and use the source
> > code
> > > >> > obtained from 1.3.0.rc0 tag:
> > > >> > https://mxnet.incubator.apache.org/install/index.html
> > > >> >
> > > >> > Please remember to TEST first before voting accordingly:
> > > >> >
> > > >> > +1 = approve
> > > >> > +0 = no opinion
> > > >> > -1 = disapprove (provide reason)
> > > >> >
> > > >> > Thanks,
> > > >> > Roshani
> > > >> >
> > > >>
> > > >
> > >
> >
> >
> > --
> > Sandeep Krishnamurthy
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Hagay Lupesko <lu...@gmail.com>.
Sandeep mentions the issue of an error when a user tries to load model
params trained/saved as FP16.
https://github.com/apache/incubator-mxnet/issues/11849
The fix was done by Sandeep:
https://github.com/apache/incubator-mxnet/pull/12412 and is ready to be
cherry picked into the release branch.

This seems like a release blocker to me:
- Basic functionality broken: loading a model (albeit one that was
saved as non-FP32)
- Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)

-1 (non binding)

Hagay



On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
sandeep.krishna98@gmail.com> wrote:

> "- 0"
>
> I believe the bug #11849
> <https://github.com/apache/incubator-mxnet/issues/11849>, unable to import
> non-fp32 models into Gluon, fixed in this PR #12412
> <https://github.com/apache/incubator-mxnet/pull/12412> is important for
> the
> users. I would rather pick this fix in this release than plan a minor
> release later.
>
> Best,
> Sandeep
>
>
>
> On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <ch...@cs.washington.edu>
> wrote:
>
> > Actually, the command "git clone --recursive
> > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works fine now,
> > never mind.
> >
> > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <ch...@cs.washington.edu>
> > wrote:
> >
> > > Unfortunately, MXNet was depending on a branch of TVM that is now
> > deleted.
> > > We will have to merge #12448
> > > <https://github.com/apache/incubator-mxnet/pull/12448> before the
> > release.
> > >
> > > Background: See dmlc/tvm#1394 <https://github.com/dmlc/tvm/issues/1394
> >.
> > >
> > > Philip.
> > >
> > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <ca...@gmail.com>
> wrote:
> > >
> > >> Checked out the tag, built and tested the Clojure package. +1
> > >>
> > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > >> roshaninagmote2@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > I would like to propose a vote to release Apache MXNet (incubating)
> > >> version
> > >> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00
> PM
> > >> > PDT, Wednesday, Sept 5th.
> > >> >
> > >> > Link to release notes:
> > >> > https://github.com/apache/incubator-mxnet/releases
> > >> >
> > >> > Link to release candidate 1.3.0.rc0:
> > >> > *https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > >> > <https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> >0*
> > >> >
> > >> > View this page, click on "Build from Source", and use the source
> code
> > >> > obtained from 1.3.0.rc0 tag:
> > >> > https://mxnet.incubator.apache.org/install/index.html
> > >> >
> > >> > Please remember to TEST first before voting accordingly:
> > >> >
> > >> > +1 = approve
> > >> > +0 = no opinion
> > >> > -1 = disapprove (provide reason)
> > >> >
> > >> > Thanks,
> > >> > Roshani
> > >> >
> > >>
> > >
> >
>
>
> --
> Sandeep Krishnamurthy
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by sandeep krishnamurthy <sa...@gmail.com>.
"- 0"

I believe the fix for bug #11849
<https://github.com/apache/incubator-mxnet/issues/11849> (unable to import
non-FP32 models into Gluon), in PR #12412
<https://github.com/apache/incubator-mxnet/pull/12412>, is important for
users. I would rather pick this fix into this release than plan a minor
release later.

Best,
Sandeep



On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <ch...@cs.washington.edu>
wrote:

> Actually, the command "git clone --recursive
> https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works fine now,
> never mind.
>
> On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <ch...@cs.washington.edu>
> wrote:
>
> > Unfortunately, MXNet was depending on a branch of TVM that is now
> deleted.
> > We will have to merge #12448
> > <https://github.com/apache/incubator-mxnet/pull/12448> before the
> release.
> >
> > Background: See dmlc/tvm#1394 <https://github.com/dmlc/tvm/issues/1394>.
> >
> > Philip.
> >
> > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <ca...@gmail.com> wrote:
> >
> >> Checked out the tag, built and tested the Clojure package. +1
> >>
> >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> >> roshaninagmote2@gmail.com>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I would like to propose a vote to release Apache MXNet (incubating)
> >> version
> >> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00 PM
> >> > PDT, Wednesday, Sept 5th.
> >> >
> >> > Link to release notes:
> >> > https://github.com/apache/incubator-mxnet/releases
> >> >
> >> > Link to release candidate 1.3.0.rc0:
> >> > *https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> >> > <https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0>0*
> >> >
> >> > View this page, click on "Build from Source", and use the source code
> >> > obtained from 1.3.0.rc0 tag:
> >> > https://mxnet.incubator.apache.org/install/index.html
> >> >
> >> > Please remember to TEST first before voting accordingly:
> >> >
> >> > +1 = approve
> >> > +0 = no opinion
> >> > -1 = disapprove (provide reason)
> >> >
> >> > Thanks,
> >> > Roshani
> >> >
> >>
> >
>


-- 
Sandeep Krishnamurthy

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Philip Cho <ch...@cs.washington.edu>.
Actually, the command "git clone --recursive
https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works fine now,
never mind.

On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <ch...@cs.washington.edu>
wrote:

> Unfortunately, MXNet was depending on a branch of TVM that is now deleted.
> We will have to merge #12448
> <https://github.com/apache/incubator-mxnet/pull/12448> before the release.
>
> Background: See dmlc/tvm#1394 <https://github.com/dmlc/tvm/issues/1394>.
>
> Philip.
>
> On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <ca...@gmail.com> wrote:
>
>> Checked out the tag, built and tested the Clojure package. +1
>>
>> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
>> roshaninagmote2@gmail.com>
>> wrote:
>>
>> > Hi all,
>> >
>> > I would like to propose a vote to release Apache MXNet (incubating)
>> version
>> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00 PM
>> > PDT, Wednesday, Sept 5th.
>> >
>> > Link to release notes:
>> > https://github.com/apache/incubator-mxnet/releases
>> >
>> > Link to release candidate 1.3.0.rc0:
>> > *https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
>> > <https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0>0*
>> >
>> > View this page, click on "Build from Source", and use the source code
>> > obtained from 1.3.0.rc0 tag:
>> > https://mxnet.incubator.apache.org/install/index.html
>> >
>> > Please remember to TEST first before voting accordingly:
>> >
>> > +1 = approve
>> > +0 = no opinion
>> > -1 = disapprove (provide reason)
>> >
>> > Thanks,
>> > Roshani
>> >
>>
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Philip Cho <ch...@cs.washington.edu>.
Unfortunately, MXNet was depending on a branch of TVM that is now deleted.
We will have to merge #12448
<https://github.com/apache/incubator-mxnet/pull/12448> before the release.

Background: See dmlc/tvm#1394 <https://github.com/dmlc/tvm/issues/1394>.

Philip.

On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <ca...@gmail.com> wrote:

> Checked out the tag, built and tested the Clojure package. +1
>
> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> roshaninagmote2@gmail.com>
> wrote:
>
> > Hi all,
> >
> > I would like to propose a vote to release Apache MXNet (incubating)
> version
> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00 PM
> > PDT, Wednesday, Sept 5th.
> >
> > Link to release notes:
> > https://github.com/apache/incubator-mxnet/releases
> >
> > Link to release candidate 1.3.0.rc0:
> > *https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > <https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0>0*
> >
> > View this page, click on "Build from Source", and use the source code
> > obtained from 1.3.0.rc0 tag:
> > https://mxnet.incubator.apache.org/install/index.html
> >
> > Please remember to TEST first before voting accordingly:
> >
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> >
> > Thanks,
> > Roshani
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Carin Meier <ca...@gmail.com>.
Checked out the tag, built and tested the Clojure package. +1

On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <ro...@gmail.com>
wrote:

> Hi all,
>
> I would like to propose a vote to release Apache MXNet (incubating) version
> 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00 PM
> PDT, Wednesday, Sept 5th.
>
> Link to release notes:
> https://github.com/apache/incubator-mxnet/releases
>
> Link to release candidate 1.3.0.rc0:
> *https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> <https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0>0*
>
> View this page, click on "Build from Source", and use the source code
> obtained from 1.3.0.rc0 tag:
> https://mxnet.incubator.apache.org/install/index.html
>
> Please remember to TEST first before voting accordingly:
>
> +1 = approve
> +0 = no opinion
> -1 = disapprove (provide reason)
>
> Thanks,
> Roshani
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Steffen Rochel <st...@gmail.com>.
Thanks Sheng. With [1] I was able to build and test 1.3.0.rc0. The explanation
of the version string makes sense. Changing my vote to +1.
Roshani - please note that master still has version 1.3.0, which makes it
difficult to differentiate from rc0. I hope the master version will be updated
soon as part of the release process.

Thanks,
Steffen

On Sun, Sep 2, 2018 at 11:22 AM Joshua Z. Zhang <ch...@gmail.com>
wrote:

>
> Sheng, thanks for the clarification. That makes sense to me. I will change
> my vote to +1.
>
>
> >
> > On Sep 2, 2018 at 9:43 AM,  <Sheng Zha (mailto:zhasheng@apache.org)>
> wrote:
> >
> >
> >
> >  Hi Steffen and Zhi,
> >
> > That's because those are not the artifacts being voted on. I just
> uploaded the actual release artifact to [1]. Unfortunately, even the
> lengthy release process doc [2] didn't capture this step...
> >
> > Steffen,
> >
> > In case you don't already know, regarding the version string, since we
> cannot change the code after the vote passes, the version never says it's a
> release candidate, only the file name does. None of the previous releases
> follow the convention you suggested. Please adjust your expectation and
> vote again. Feel free to download previous releases and verify:
> >
> > % tar -zxf apache-mxnet-src-1.2.1.rc1-incubating.tar.gz -O
> apache-mxnet-src-1.2.1.rc1-incubating/python/mxnet/libinfo.py | grep
> '__version__'
> > __version__ = "1.2.1"
> >
> > Zhi,
> >
> > We are not accepting new patches after the announced cutoff time. If you
> think this patch is optional and you see no other issue with this release,
> consider changing your vote. If you think the patch is critical, feel free
> to sustain your -1 vote until the end of this voting cycle.
> >
> > [1]
> https://github.com/apache/incubator-mxnet/releases/download/1.3.0.rc0/apache-mxnet-src-1.3.0.rc0-incubating.tar.gz
>
> > [2]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=73630468
>
> >
> > -sz
> >
> > On 2018/09/01 22:08:48, "Joshua Z. Zhang"  <ch...@gmail.com>
> wrote:
> > >  -1. Please include all 3rd party dependencies, GitHub won’t
> automatically do that.
> > >
> > >  BTW, Per user request in forum, I found this PR(
> https://github.com/apache/incubator-mxnet/pull/12118  <
> https://github.com/apache/incubator-mxnet/pull/12118>) is not included in
> 1.3 rc0, I recommend to cherry-pick into release to avoid potential
> problems.
> > >
> > >  Best,
> > >  Zhi
> > >   >  On Sep 1, 2018, at 2:27 PM, Steffen Rochel  <
> steffenrochel@gmail.com>  wrote:
> > >   >
> > >   >  -1
> > >   >
> > >   >  https://github.com/apache/incubator-mxnet/archive/1.3.0.rc0.zip
> and
> > >   >
> https://github.com/apache/incubator-mxnet/archive/1.3.0.rc0.tar.gz do
> not
> > >   >  contain 3rdparty packages, resulting in make failure:
> > >   >  tar zxf incubator-mxnet-1.3.0.rc0.tar.gz
> > >   >  cd incubator-mxnet-1.3.0.rc0/
> > >   >  make USE_OPENCV=1 USE_BLAS=openblas
> > >   >  Makefile:74:
> > >   >
> /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/mshadow/make/
> > >   >  mshadow.mk: No such file or directory
> > >   >  Makefile:75:
> > >   >
> /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/dmlc-core/make/
> > >   >  dmlc.mk: No such file or directory
> > >   >  Makefile:176: "USE_LAPACK disabled because libraries were not
> found"
> > >   >  Makefile:284: WARNING: Significant performance increases can be
> achieved by
> > >   >  installing and enabling gperftools or jemalloc development
> packages
> > >   >  Makefile:355:
> > >   >
> /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/ps-lite/make/
> > >   >  ps.mk: No such file or directory
> > >   >  make: *** No rule to make target
> > >   >
> '/home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/ps-lite/make/
> > >   >  ps.mk'. Stop.
> > >   >
> > >   >  ~/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty$ ls -al *
> > >   >  cub:
> > >   >  total 8
> > >   >  drwxr-xr-x 2 steffen steffen 4096 Aug 29 10:07 .
> > >   >  drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..
> > >   >
> > >   >  dlpack:
> > >   >  total 8
> > >   >  drwxr-xr-x 2 steffen steffen 4096 Aug 29 10:07 .
> > >   >  drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..
> > >   >
> > >   >  dmlc-core:
> > >   >  total 8
> > >   >  drwxr-xr-x 2 steffen steffen 4096 Aug 29 10:07 .
> > >   >  drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..
> > >   >
> > >   >  Environment:
> > >   >  uname -a
> > >   >  Linux steffen 4.15.0-33-generic #36-Ubuntu SMP Wed Aug 15
> 16:00:05 UTC 2018
> > >   >  x86_64 x86_64 x86_64 GNU/L
> > >   >
> > >   >  Build from git succeeded:
> > >   >  git clone --recursive https://github.com/apache/incubator-mxnet
> --branch
> > >   >  1.3.0.rc0
> > >   >  cd incubator-mxnet/
> > >   >  git checkout 1.3.0.rc0
> > >   >  make USE_OPENCV=1 USE_BLAS=openblas
> > >   >  cd python/
> > >   >  sudo pip install -e .
> > >   >
> > >   >>>>  import mxnet as mx
> > >   >>>>  print(mx.__version__)
> > >   >  1.3.0
> > >   >
> > >   >  I was expecting version to be 1.3.0.rc0
> > >   >
> > >   >  Steffen
> > >   >
> > >   >
> > >   >
> > >   >  On Sat, Sep 1, 2018 at 3:22 AM Pigeon Lucky  <
> lucky90322@gmail.com>  wrote:
> > >   >
> > >   >>  +1
> > >   >>
> > >   >>  On Sat, 1 Sep 2018, 10:59 Roshani Nagmote,  <
> roshaninagmote2@gmail.com>
> > >   >>  wrote:
> > >   >>
> > >   >>>  Hi all,
> > >   >>>
> > >   >>>  I would like to propose a vote to release Apache MXNet
> (incubating)
> > >   >>  version
> > >   >>>  1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at
> 7:00 PM
> > >   >>>  PDT, Wednesday, Sept 5th.
> > >   >>>
> > >   >>>  Link to release notes:
> > >   >>>  https://github.com/apache/incubator-mxnet/releases
> > >   >>>
> > >   >>>  Link to release candidate 1.3.0.rc0:
> > >   >>>  https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > >   >>>
> > >   >>>  View this page, click on "Build from Source", and use the
> source code
> > >   >>>  obtained from 1.3.0.rc0 tag:
> > >   >>>  https://mxnet.incubator.apache.org/install/index.html
> > >   >>>
> > >   >>>  Please remember to TEST first before voting accordingly:
> > >   >>>
> > >   >>>  +1 = approve
> > >   >>>  +0 = no opinion
> > >   >>>  -1 = disapprove (provide reason)
> > >   >>>
> > >   >>>  Thanks,
> > >   >>>  Roshani
> > >   >>>
> > >   >>
> > >
> > >
> >

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by "Joshua Z. Zhang" <ch...@gmail.com>.
Sheng, thanks for the clarification. That makes sense to me. I will change my vote to +1.
  
  
>   
> On Sep 2, 2018 at 9:43 AM,  <Sheng Zha (mailto:zhasheng@apache.org)>  wrote:
>   
>   
>   
>  Hi Steffen and Zhi,  
>
> That's because those are not the artifacts being voted on. I just uploaded the actual release artifact to [1]. Unfortunately, even the lengthy release process doc [2] didn't capture this step...  
>
> Steffen,  
>
> In case you don't already know, regarding the version string, since we cannot change the code after the vote passes, the version never says it's a release candidate, only the file name does. None of the previous releases follow the convention you suggested. Please adjust your expectation and vote again. Feel free to download previous releases and verify:  
>
> % tar -zxf apache-mxnet-src-1.2.1.rc1-incubating.tar.gz -O apache-mxnet-src-1.2.1.rc1-incubating/python/mxnet/libinfo.py | grep '__version__'  
> __version__ = "1.2.1"  
>
> Zhi,  
>
> We are not accepting new patches after the announced cutoff time. If you think this patch is optional and you see no other issue with this release, consider changing your vote. If you think the patch is critical, feel free to sustain your -1 vote until the end of this voting cycle.  
>
> [1] https://github.com/apache/incubator-mxnet/releases/download/1.3.0.rc0/apache-mxnet-src-1.3.0.rc0-incubating.tar.gz  
> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=73630468  
>
> -sz  
>
> On 2018/09/01 22:08:48, "Joshua Z. Zhang"  <ch...@gmail.com>  wrote:  
> >  -1. Please include all 3rd party dependencies, GitHub won’t automatically do that.  
> >   
> >  BTW, Per user request in forum, I found this PR(https://github.com/apache/incubator-mxnet/pull/12118  <https://github.com/apache/incubator-mxnet/pull/12118>) is not included in 1.3 rc0, I recommend to cherry-pick into release to avoid potential problems.  
> >   
> >  Best,  
> >  Zhi  
> >   >  On Sep 1, 2018, at 2:27 PM, Steffen Rochel  <st...@gmail.com>  wrote:  
> >   >   
> >   >  -1  
> >   >   
> >   >  https://github.com/apache/incubator-mxnet/archive/1.3.0.rc0.zip and  
> >   >  https://github.com/apache/incubator-mxnet/archive/1.3.0.rc0.tar.gz do not  
> >   >  contain 3rdparty packages, resulting in make failure:  
> >   >  tar zxf incubator-mxnet-1.3.0.rc0.tar.gz  
> >   >  cd incubator-mxnet-1.3.0.rc0/  
> >   >  make USE_OPENCV=1 USE_BLAS=openblas  
> >   >  Makefile:74:  
> >   >  /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/mshadow/make/  
> >   >  mshadow.mk: No such file or directory  
> >   >  Makefile:75:  
> >   >  /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/dmlc-core/make/  
> >   >  dmlc.mk: No such file or directory  
> >   >  Makefile:176: "USE_LAPACK disabled because libraries were not found"  
> >   >  Makefile:284: WARNING: Significant performance increases can be achieved by  
> >   >  installing and enabling gperftools or jemalloc development packages  
> >   >  Makefile:355:  
> >   >  /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/ps-lite/make/  
> >   >  ps.mk: No such file or directory  
> >   >  make: *** No rule to make target  
> >   >  '/home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/ps-lite/make/  
> >   >  ps.mk'. Stop.  
> >   >   
> >   >  ~/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty$ ls -al *  
> >   >  cub:  
> >   >  total 8  
> >   >  drwxr-xr-x 2 steffen steffen 4096 Aug 29 10:07 .  
> >   >  drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..  
> >   >   
> >   >  dlpack:  
> >   >  total 8  
> >   >  drwxr-xr-x 2 steffen steffen 4096 Aug 29 10:07 .  
> >   >  drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..  
> >   >   
> >   >  dmlc-core:  
> >   >  total 8  
> >   >  drwxr-xr-x 2 steffen steffen 4096 Aug 29 10:07 .  
> >   >  drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..  
> >   >   
> >   >  Environment:  
> >   >  uname -a  
> >   >  Linux steffen 4.15.0-33-generic #36-Ubuntu SMP Wed Aug 15 16:00:05 UTC 2018  
> >   >  x86_64 x86_64 x86_64 GNU/L  
> >   >   
> >   >  Build from git succeeded:  
> >   >  git clone --recursive https://github.com/apache/incubator-mxnet --branch  
> >   >  1.3.0.rc0  
> >   >  cd incubator-mxnet/  
> >   >  git checkout 1.3.0.rc0  
> >   >  make USE_OPENCV=1 USE_BLAS=openblas  
> >   >  cd python/  
> >   >  sudo pip install -e .  
> >   >   
> >   >>>>  import mxnet as mx  
> >   >>>>  print(mx.__version__)  
> >   >  1.3.0  
> >   >   
> >   >  I was expecting version to be 1.3.0.rc0  
> >   >   
> >   >  Steffen  
> >   >   
> >   >   
> >   >   
> >   >  On Sat, Sep 1, 2018 at 3:22 AM Pigeon Lucky  <lu...@gmail.com>  wrote:  
> >   >   
> >   >>  +1  
> >   >>   
> >   >>  On Sat, 1 Sep 2018, 10:59 Roshani Nagmote,  <ro...@gmail.com>   
> >   >>  wrote:  
> >   >>   
> >   >>>  Hi all,  
> >   >>>   
> >   >>>  I would like to propose a vote to release Apache MXNet (incubating)  
> >   >>  version  
> >   >>>  1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00 PM  
> >   >>>  PDT, Wednesday, Sept 5th.  
> >   >>>   
> >   >>>  Link to release notes:  
> >   >>>  https://github.com/apache/incubator-mxnet/releases  
> >   >>>   
> >   >>>  Link to release candidate 1.3.0.rc0:  
> >   >>>  https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> >   >>>   
> >   >>>  View this page, click on "Build from Source", and use the source code  
> >   >>>  obtained from 1.3.0.rc0 tag:  
> >   >>>  https://mxnet.incubator.apache.org/install/index.html  
> >   >>>   
> >   >>>  Please remember to TEST first before voting accordingly:  
> >   >>>   
> >   >>>  +1 = approve  
> >   >>>  +0 = no opinion  
> >   >>>  -1 = disapprove (provide reason)  
> >   >>>   
> >   >>>  Thanks,  
> >   >>>  Roshani  
> >   >>>   
> >   >>   
> >   
> >   
>              

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Sheng Zha <zh...@apache.org>.
Hi Steffen and Zhi,

That's because those are not the artifacts being voted on. I just uploaded the actual release artifact to [1]. Unfortunately, even the lengthy release process doc [2] didn't capture this step...

Steffen,

In case you don't already know: since we cannot change the code after the vote passes, the version string never says it's a release candidate; only the file name does. None of the previous releases follows the convention you suggested. Please adjust your expectations and vote again. Feel free to download previous releases and verify:

% tar -zxf apache-mxnet-src-1.2.1.rc1-incubating.tar.gz -O apache-mxnet-src-1.2.1.rc1-incubating/python/mxnet/libinfo.py | grep '__version__'
__version__ = "1.2.1"
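
For anyone who prefers to script the same check, here is a minimal sketch in Python. It is not part of the release tooling; the function name read_version_from_tarball and its parameters are hypothetical, and it assumes the archive contains a file with a plain `__version__ = "..."` assignment, as libinfo.py does in the command above.

```python
import re
import tarfile

# Matches a line such as: __version__ = "1.2.1"
_VERSION_RE = re.compile(r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def read_version_from_tarball(tar_path, member_name):
    """Return the __version__ string found in one member of a source tarball.

    Hypothetical helper for illustration; raises KeyError if the member is
    missing and ValueError if no __version__ assignment is found.
    """
    with tarfile.open(tar_path, "r:*") as archive:
        extracted = archive.extractfile(member_name)
        if extracted is None:
            raise ValueError(f"{member_name} is not a regular file")
        text = extracted.read().decode("utf-8")
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no __version__ assignment in {member_name}")
    return match.group(1)
```

For the 1.2.1.rc1 archive above, the member name would be apache-mxnet-src-1.2.1.rc1-incubating/python/mxnet/libinfo.py. The point of the check is that the returned string carries no "rc" suffix even though the file name does.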

Zhi,

We are not accepting new patches after the announced cutoff time. If you think this patch is optional and you see no other issue with this release, consider changing your vote. If you think the patch is critical, feel free to sustain your -1 vote until the end of this voting cycle.

[1] https://github.com/apache/incubator-mxnet/releases/download/1.3.0.rc0/apache-mxnet-src-1.3.0.rc0-incubating.tar.gz
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=73630468

-sz

On 2018/09/01 22:08:48, "Joshua Z. Zhang" <ch...@gmail.com> wrote: 
> -1. Please include all 3rd party dependencies, GitHub won’t automatically do that. 
> 
> BTW, Per user request in forum, I found this PR(https://github.com/apache/incubator-mxnet/pull/12118 <https://github.com/apache/incubator-mxnet/pull/12118>) is not included in 1.3 rc0, I recommend to cherry-pick into release to avoid potential problems. 
> 
> Best,
> Zhi
> > On Sep 1, 2018, at 2:27 PM, Steffen Rochel <st...@gmail.com> wrote:
> > 
> > -1
> > 
> > https://github.com/apache/incubator-mxnet/archive/1.3.0.rc0.zip and
> > https://github.com/apache/incubator-mxnet/archive/1.3.0.rc0.tar.gz do not
> > contain 3rdparty packages, resulting in make failure:
> > tar zxf incubator-mxnet-1.3.0.rc0.tar.gz
> > cd incubator-mxnet-1.3.0.rc0/
> > make USE_OPENCV=1 USE_BLAS=openblas
> > Makefile:74:
> > /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/mshadow/make/
> > mshadow.mk: No such file or directory
> > Makefile:75:
> > /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/dmlc-core/make/
> > dmlc.mk: No such file or directory
> > Makefile:176: "USE_LAPACK disabled because libraries were not found"
> > Makefile:284: WARNING: Significant performance increases can be achieved by
> > installing and enabling gperftools or jemalloc development packages
> > Makefile:355:
> > /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/ps-lite/make/
> > ps.mk: No such file or directory
> > make: *** No rule to make target
> > '/home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/ps-lite/make/
> > ps.mk'.  Stop.
> > 
> > ~/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty$ ls -al *
> > cub:
> > total 8
> > drwxr-xr-x  2 steffen steffen 4096 Aug 29 10:07 .
> > drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..
> > 
> > dlpack:
> > total 8
> > drwxr-xr-x  2 steffen steffen 4096 Aug 29 10:07 .
> > drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..
> > 
> > dmlc-core:
> > total 8
> > drwxr-xr-x  2 steffen steffen 4096 Aug 29 10:07 .
> > drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..
> > 
> > Environment:
> > uname -a
> > Linux steffen 4.15.0-33-generic #36-Ubuntu SMP Wed Aug 15 16:00:05 UTC 2018
> > x86_64 x86_64 x86_64 GNU/L
> > 
> > Build from git succeeded:
> > git clone --recursive https://github.com/apache/incubator-mxnet --branch
> > 1.3.0.rc0
> > cd incubator-mxnet/
> > git checkout 1.3.0.rc0
> > make USE_OPENCV=1 USE_BLAS=openblas
> > cd python/
> > sudo pip install -e .
> > 
> >>>> import mxnet as mx
> >>>> print(mx.__version__)
> > 1.3.0
> > 
> > I was expecting version to be 1.3.0.rc0
> > 
> > Steffen
> > 
> > 
> > 
> > On Sat, Sep 1, 2018 at 3:22 AM Pigeon Lucky <lu...@gmail.com> wrote:
> > 
> >> +1
> >> 
> >> On Sat, 1 Sep 2018, 10:59 Roshani Nagmote, <ro...@gmail.com>
> >> wrote:
> >> 
> >>> Hi all,
> >>> 
> >>> I would like to propose a vote to release Apache MXNet (incubating)
> >> version
> >>> 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00 PM
> >>> PDT, Wednesday, Sept 5th.
> >>> 
> >>> Link to release notes:
> >>> https://github.com/apache/incubator-mxnet/releases
> >>> 
> >>> Link to release candidate 1.3.0.rc0:
> >>> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> >>> 
> >>> View this page, click on "Build from Source", and use the source code
> >>> obtained from 1.3.0.rc0 tag:
> >>> https://mxnet.incubator.apache.org/install/index.html
> >>> 
> >>> Please remember to TEST first before voting accordingly:
> >>> 
> >>> +1 = approve
> >>> +0 = no opinion
> >>> -1 = disapprove (provide reason)
> >>> 
> >>> Thanks,
> >>> Roshani
> >>> 
> >> 
> 
> 

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by "Joshua Z. Zhang" <ch...@gmail.com>.
-1. Please include all third-party dependencies; GitHub won't do that automatically.

BTW, following a user request in the forum, I found that this PR (https://github.com/apache/incubator-mxnet/pull/12118) is not included in 1.3 rc0. I recommend cherry-picking it into the release to avoid potential problems.

Best,
Zhi
> On Sep 1, 2018, at 2:27 PM, Steffen Rochel <st...@gmail.com> wrote:
> 
> -1
> 
> https://github.com/apache/incubator-mxnet/archive/1.3.0.rc0.zip and
> https://github.com/apache/incubator-mxnet/archive/1.3.0.rc0.tar.gz do not
> contain 3rdparty packages, resulting in make failure:
> tar zxf incubator-mxnet-1.3.0.rc0.tar.gz
> cd incubator-mxnet-1.3.0.rc0/
> make USE_OPENCV=1 USE_BLAS=openblas
> Makefile:74:
> /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/mshadow/make/
> mshadow.mk: No such file or directory
> Makefile:75:
> /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/dmlc-core/make/
> dmlc.mk: No such file or directory
> Makefile:176: "USE_LAPACK disabled because libraries were not found"
> Makefile:284: WARNING: Significant performance increases can be achieved by
> installing and enabling gperftools or jemalloc development packages
> Makefile:355:
> /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/ps-lite/make/
> ps.mk: No such file or directory
> make: *** No rule to make target
> '/home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/ps-lite/make/
> ps.mk'.  Stop.
> 
> ~/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty$ ls -al *
> cub:
> total 8
> drwxr-xr-x  2 steffen steffen 4096 Aug 29 10:07 .
> drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..
> 
> dlpack:
> total 8
> drwxr-xr-x  2 steffen steffen 4096 Aug 29 10:07 .
> drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..
> 
> dmlc-core:
> total 8
> drwxr-xr-x  2 steffen steffen 4096 Aug 29 10:07 .
> drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..
> 
> Environment:
> uname -a
> Linux steffen 4.15.0-33-generic #36-Ubuntu SMP Wed Aug 15 16:00:05 UTC 2018
> x86_64 x86_64 x86_64 GNU/L
> 
> Build from git succeeded:
> git clone --recursive https://github.com/apache/incubator-mxnet --branch
> 1.3.0.rc0
> cd incubator-mxnet/
> git checkout 1.3.0.rc0
> make USE_OPENCV=1 USE_BLAS=openblas
> cd python/
> sudo pip install -e .
> 
>>>> import mxnet as mx
>>>> print(mx.__version__)
> 1.3.0
> 
> I was expecting version to be 1.3.0.rc0
> 
> Steffen
> 
> 
> 
> On Sat, Sep 1, 2018 at 3:22 AM Pigeon Lucky <lu...@gmail.com> wrote:
> 
>> +1
>> 
>> On Sat, 1 Sep 2018, 10:59 Roshani Nagmote, <ro...@gmail.com>
>> wrote:
>> 
>>> Hi all,
>>> 
>>> I would like to propose a vote to release Apache MXNet (incubating)
>> version
>>> 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00 PM
>>> PDT, Wednesday, Sept 5th.
>>> 
>>> Link to release notes:
>>> https://github.com/apache/incubator-mxnet/releases
>>> 
>>> Link to release candidate 1.3.0.rc0:
>>> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
>>> 
>>> View this page, click on "Build from Source", and use the source code
>>> obtained from 1.3.0.rc0 tag:
>>> https://mxnet.incubator.apache.org/install/index.html
>>> 
>>> Please remember to TEST first before voting accordingly:
>>> 
>>> +1 = approve
>>> +0 = no opinion
>>> -1 = disapprove (provide reason)
>>> 
>>> Thanks,
>>> Roshani
>>> 
>> 


Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Steffen Rochel <st...@gmail.com>.
-1

https://github.com/apache/incubator-mxnet/archive/1.3.0.rc0.zip and
https://github.com/apache/incubator-mxnet/archive/1.3.0.rc0.tar.gz do not
contain 3rdparty packages, resulting in make failure:

tar zxf incubator-mxnet-1.3.0.rc0.tar.gz
cd incubator-mxnet-1.3.0.rc0/
make USE_OPENCV=1 USE_BLAS=openblas
Makefile:74: /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/mshadow/make/mshadow.mk: No such file or directory
Makefile:75: /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/dmlc-core/make/dmlc.mk: No such file or directory
Makefile:176: "USE_LAPACK disabled because libraries were not found"
Makefile:284: WARNING: Significant performance increases can be achieved by installing and enabling gperftools or jemalloc development packages
Makefile:355: /home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/ps-lite/make/ps.mk: No such file or directory
make: *** No rule to make target '/home/steffen/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty/ps-lite/make/ps.mk'.  Stop.

~/Downloads/incubator-mxnet-1.3.0.rc0/3rdparty$ ls -al *
cub:
total 8
drwxr-xr-x  2 steffen steffen 4096 Aug 29 10:07 .
drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..

dlpack:
total 8
drwxr-xr-x  2 steffen steffen 4096 Aug 29 10:07 .
drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..

dmlc-core:
total 8
drwxr-xr-x  2 steffen steffen 4096 Aug 29 10:07 .
drwxr-xr-x 12 steffen steffen 4096 Aug 29 10:07 ..
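
The empty directories listed above can also be detected programmatically before running make. The following is only a sketch: the helper name find_empty_subdirs is hypothetical and it merely reports which immediate subdirectories of a given folder (for example the extracted 3rdparty/ directory) contain no entries.

```python
import os


def find_empty_subdirs(root):
    """Return the sorted names of immediate subdirectories of root that are empty.

    Hypothetical helper for illustration: an empty 3rdparty/<name> directory
    suggests the submodule contents were not bundled into the archive.
    """
    empty = []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        # Only directories with no entries at all are reported.
        if os.path.isdir(path) and not os.listdir(path):
            empty.append(entry)
    return empty
```

Calling find_empty_subdirs("incubator-mxnet-1.3.0.rc0/3rdparty") on the extracted archive would be expected to list directories such as cub, dlpack and dmlc-core, matching the ls output above.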

Environment:
uname -a
Linux steffen 4.15.0-33-generic #36-Ubuntu SMP Wed Aug 15 16:00:05 UTC 2018 x86_64 x86_64 x86_64 GNU/L

Build from git succeeded:
git clone --recursive https://github.com/apache/incubator-mxnet --branch 1.3.0.rc0
cd incubator-mxnet/
git checkout 1.3.0.rc0
make USE_OPENCV=1 USE_BLAS=openblas
cd python/
sudo pip install -e .

>>> import mxnet as mx
>>> print(mx.__version__)
1.3.0

I was expecting the version to be 1.3.0.rc0.

Steffen



On Sat, Sep 1, 2018 at 3:22 AM Pigeon Lucky <lu...@gmail.com> wrote:

> +1
>
> On Sat, 1 Sep 2018, 10:59 Roshani Nagmote, <ro...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > I would like to propose a vote to release Apache MXNet (incubating)
> version
> > 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00 PM
> > PDT, Wednesday, Sept 5th.
> >
> > Link to release notes:
> > https://github.com/apache/incubator-mxnet/releases
> >
> > Link to release candidate 1.3.0.rc0:
> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> >
> > View this page, click on "Build from Source", and use the source code
> > obtained from 1.3.0.rc0 tag:
> > https://mxnet.incubator.apache.org/install/index.html
> >
> > Please remember to TEST first before voting accordingly:
> >
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> >
> > Thanks,
> > Roshani
> >
>

Re: [VOTE] Release MXNet version 1.3.0.RC0

Posted by Pigeon Lucky <lu...@gmail.com>.
+1

On Sat, 1 Sep 2018, 10:59 Roshani Nagmote, <ro...@gmail.com>
wrote:

> Hi all,
>
> I would like to propose a vote to release Apache MXNet (incubating) version
> 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00 PM
> PDT, Wednesday, Sept 5th.
>
> Link to release notes:
> https://github.com/apache/incubator-mxnet/releases
>
> Link to release candidate 1.3.0.rc0:
> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
>
> View this page, click on "Build from Source", and use the source code
> obtained from 1.3.0.rc0 tag:
> https://mxnet.incubator.apache.org/install/index.html
>
> Please remember to TEST first before voting accordingly:
>
> +1 = approve
> +0 = no opinion
> -1 = disapprove (provide reason)
>
> Thanks,
> Roshani
>