You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mxnet.apache.org by Gautam <ga...@gmail.com> on 2017/10/02 20:19:51 UTC

Re: Apache MXNet build failures are mostly valid - verify before merge

Thanks All.

 I've created the JIRA to mark the protected branch for master
https://issues.apache.org/jira/browse/INCUBATOR-205.
We also need to add all the committers to be code owner as discussed in the
slack, I've opened a PR for it
https://github.com/apache/incubator-mxnet/pull/8128.

Good point Joern, I'll follow up on that.

Regards,
Gautam

On Fri, Sep 29, 2017 at 2:20 AM, Joern Kottmann <ko...@gmail.com> wrote:

> It also makes sense to block too old PRs from merging, because the
> test results are outdated and the build might fail after it gets
> merged.
>
> Jörn
>
> On Thu, Sep 28, 2017 at 9:14 PM, Zha, Sheng <zh...@amazon.com> wrote:
> > +1 on protected branch.
> >
> > Best regards,
> > -sz
> >
> > On 9/28/17, 11:48 AM, "Kumar, Gautam" <ga...@amazon.com> wrote:
> >
> >     Hi Guys,
> >
> >      Let’s focus on specific issue here.
> >
> >     Marking the master branch protected which involves “Only merge if
> checks has passed, and yes it will run the complete build”.
> >
> >     We can’t afford to degrade the quality and keep debugging the build
> failure forever. If it’s slow down the development at the cost of quality I
> will vote for the quality.
> >     We can work on improving the infrastructure to improve the overall
> speed.  If you have any specific concerns on availability of Jenkins please
> point out.
> >
> >     -Gautam
> >
> >
> >     On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com> wrote:
> >
> >         -1000 on that. :)
> >
> >         On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <
> mnnaveen@gmail.com> wrote:
> >
> >         > PR->Sanity test/Linux build/test->reviewer/committer approves
> the
> >         > change->Comment "Build Now" (Or trigger on at least one
> approval from a
> >         > committer other than author)->*Full build-*>*passes
> build*->Enable Merge
> >         >
> >         > Let us take this particular topic to a separate thread or
> discuss offline
> >         > if further clarification is needed.
> >         >
> >         > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <
> cjolivier01@gmail.com>
> >         > wrote:
> >         >
> >         > > I understand the proposal.  How to trigger a build in that
> case?
> >         > >
> >         > >
> >         > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <
> madan.jampani@gmail.com>
> >         > > wrote:
> >         > >
> >         > > > Chris,
> >         > > > I don't think Naveen is suggesting that a merge happen
> without full
> >         > > > verification i.e. all tests across all platforms pass.
> >         > > > If a PR has some back and forth and results in multiple
> revisions
> >         > (which
> >         > > is
> >         > > > arguably more common than a random unit test failing), we
> simply delay
> >         > > full
> >         > > > verification until the owner/reviewer have settled on a
> mutually
> >         > > acceptable
> >         > > > state.
> >         > > >
> >         > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <
> cjolivier01@gmail.com
> >         > >
> >         > > > wrote:
> >         > > >
> >         > > > > -1 for running only partial tests.  Most failing unit
> tests that get
> >         > > > > through fail only for certain platforms/configurations.
> I personally
> >         > > > > prefer to be assured the build and test is good before
> merge.  Most
> >         > PR
> >         > > > > merges aren't in a huge hurry.
> >         > > > >
> >         > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <
> mnnaveen@gmail.com>
> >         > > > wrote:
> >         > > > >
> >         > > > > > +1 to make it protected. Here is what I am thinking
> for PR builds
> >         > > > > > on a PR run Sanity Tests + build/test one
> platform->committer
> >         > reviews
> >         > > > the
> >         > > > > > code and issues "Build Now", a full build is
> run->Github checks
> >         > that
> >         > > > the
> >         > > > > > full build checks succeed before it can be merged.
> >         > > > > >
> >         > > > > > I agree with Madan that PR should be approved by one
> another
> >         > > committer.
> >         > > > > >
> >         > > > > >
> >         > > > > >
> >         > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
> >         > > > madan.jampani@gmail.com>
> >         > > > > > wrote:
> >         > > > > >
> >         > > > > > > +1
> >         > > > > > >
> >         > > > > > > At a minimum I'd like to see the following two
> happen:
> >         > > > > > > - Option to merge is disabled until all required
> checks pass.
> >         > > > > > > - Code is reviewed and given +1 by at least one
> other committer
> >         > (no
> >         > > > > self
> >         > > > > > > review).
> >         > > > > > >
> >         > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <
> gautamnitc@gmail.com>
> >         > > > wrote:
> >         > > > > > >
> >         > > > > > > > Hi Chris,
> >         > > > > > > >
> >         > > > > > > >   Here <https://help.github.com/
> articles/about-protected-
> >         > > branches/
> >         > > > >
> >         > > > > is
> >         > > > > > > > user
> >         > > > > > > > document on semantics of protected branch.
> >         > > > > > > > In short when a branch is protected following
> applies to that
> >         > > > branch.
> >         > > > > > > >
> >         > > > > > > >    - Can't be force pushed
> >         > > > > > > >    - Can't be deleted
> >         > > > > > > >    - Can't have changes merged into it until
> required status
> >         > > checks
> >         > > > > > > >    <https://help.github.com/
> articles/about-required-
> >         > > status-checks>
> >         > > > > > pass
> >         > > > > > > >    - Can't have changes merged into it until
> required reviews
> >         > are
> >         > > > > > > approved
> >         > > > > > > >    <https://help.github.com/
> articles/approving-a-pull-
> >         > > > > > > > request-with-required-reviews>
> >         > > > > > > >    - Can't be edited or have files uploaded to it
> from the web
> >         > > > > > > >    - Can't have changes merged into it until
> changes to files
> >         > > that
> >         > > > > > > > have a designated
> >         > > > > > > >    code owner <https://help.github.com/
> >         > > articles/about-codeowners/>
> >         > > > > > have
> >         > > > > > > >    been approved by that owner
> >         > > > > > > >
> >         > > > > > > >  I am sure many of us might not want to have all
> these but we
> >         > can
> >         > > > > > debate
> >         > > > > > > on
> >         > > > > > > > it. My main motive was to "*Can't have changes
> merged into it
> >         > > until
> >         > > > > > > > required status checks pass*"
> >         > > > > > > >
> >         > > > > > > >
> >         > > > > > > > -Gautam
> >         > > > > > > >
> >         > > > > > > >
> >         > > > > > > >
> >         > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
> >         > > > > cjolivier01@gmail.com
> >         > > > > > >
> >         > > > > > > > wrote:
> >         > > > > > > >
> >         > > > > > > > > What does that mean? "Protected"? Protected from
> what?
> >         > > > > > > > >
> >         > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
> >         > gautamnitc@gmail.com>
> >         > > > > > wrote:
> >         > > > > > > > >
> >         > > > > > > > > > Hi Chris,
> >         > > > > > > > > >
> >         > > > > > > > > >    I mean make "master branch protected" of
> MXNet.
> >         > > > > > > > > >
> >         > > > > > > > > > -Gautam
> >         > > > > > > > > >
> >         > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris
> Olivier <
> >         > > > > > > cjolivier01@gmail.com
> >         > > > > > > > >
> >         > > > > > > > > > wrote:
> >         > > > > > > > > >
> >         > > > > > > > > > > What does this mean? "Mx-net branch
> protected"?
> >         > > > > > > > > > >
> >         > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi
> OZAWA <
> >         > > > > > > > > ozawa.tsuyoshi@gmail.com
> >         > > > > > > > > > >
> >         > > > > > > > > > > wrote:
> >         > > > > > > > > > >
> >         > > > > > > > > > > > +1,
> >         > > > > > > > > > > >
> >         > > > > > > > > > > > While I'm checking the recent build
> failures, and I
> >         > think
> >         > > > the
> >         > > > > > > > > decision
> >         > > > > > > > > > > > of making the mx-net branch protected is
> necessary for
> >         > > > stable
> >         > > > > > > > > > > > building.
> >         > > > > > > > > > > > Thanks Kumar for resuming important
> discussion.
> >         > > > > > > > > > > >
> >         > > > > > > > > > > > Best regards
> >         > > > > > > > > > > > - Tsuyoshi
> >         > > > > > > > > > > >
> >         > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar,
> Gautam <
> >         > > > > > > gauta@amazon.com>
> >         > > > > > > > > > > wrote:
> >         > > > > > > > > > > > > Reviving the discussion.
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > > At this point of time we have couple of
> stable builds
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > >
> >         > https://builds.apache.org/view/Incubator%20Projects/job/
> >         > > > > > > > > > > incubator-mxnet/job/master/448/
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > >
> >         > https://builds.apache.org/view/Incubator%20Projects/job/
> >         > > > > > > > > > > incubator-mxnet/job/master/449/
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > > Should we have a quick discussion or
> polling on
> >         > making
> >         > > > the
> >         > > > > > > mx-net
> >         > > > > > > > > > > branch
> >         > > > > > > > > > > > protected? If you still think we shouldn’t
> make it
> >         > > > protected
> >         > > > > > > please
> >         > > > > > > > > > > provide
> >         > > > > > > > > > > > a reason to support your claim.
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > > Few of us have concern over Jenkin’s
> stability. If I
> >         > > look
> >         > > > > two
> >         > > > > > > > weeks
> >         > > > > > > > > > > > back, after upgrading Linux slave to g2.8x
> and new
> >         > > windows
> >         > > > > AMI,
> >         > > > > > > we
> >         > > > > > > > > have
> >         > > > > > > > > > > not
> >         > > > > > > > > > > > seen any case where instance died due to
> high memory
> >         > > usage
> >         > > > or
> >         > > > > > any
> >         > > > > > > > > > process
> >         > > > > > > > > > > > got killed due to high cpu usage or any
> other issue
> >         > with
> >         > > > > > windows
> >         > > > > > > > > > slaves.
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > > Going forward we are also planning that
> if we add any
> >         > > new
> >         > > > > > slave
> >         > > > > > > > we
> >         > > > > > > > > > will
> >         > > > > > > > > > > > not enable the main load immediately, but
> rather will
> >         > do
> >         > > > > ‘test
> >         > > > > > > > build’
> >         > > > > > > > > > to
> >         > > > > > > > > > > > make sure that new slaves are not causing
> any
> >         > > > infrastructure
> >         > > > > > > issue
> >         > > > > > > > > and
> >         > > > > > > > > > > > capable to perform as good as existing
> slaves.
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > > -Gautam
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
> >         > > lupesko@gmail.com
> >         > > > >
> >         > > > > > > wrote:
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >     @madan looking into some failures –
> you’re right…
> >         > > > > there’s
> >         > > > > > > > > > multiple
> >         > > > > > > > > > > > issues going on, some of them
> intermittent, and we want
> >         > > to
> >         > > > be
> >         > > > > > > able
> >         > > > > > > > to
> >         > > > > > > > > > > merge
> >         > > > > > > > > > > > fixes in.
> >         > > > > > > > > > > > >     Agreed that we can wait with setting
> up protected
> >         > > > mode
> >         > > > > > > until
> >         > > > > > > > > > build
> >         > > > > > > > > > > > stabilizes.
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
> >         > > > > > > madan.jampani@gmail.com>
> >         > > > > > > > > > > wrote:
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >         @hagay: we agree on the end
> state. I'm not
> >         > too
> >         > > > > > > particular
> >         > > > > > > > > > about
> >         > > > > > > > > > > > how we get
> >         > > > > > > > > > > > >         there. If you think enabling it
> now and fixes
> >         > > > > > > regression
> >         > > > > > > > > > later
> >         > > > > > > > > > > > is doable,
> >         > > > > > > > > > > > >         I'm fine with. I see a bit of a
> chicken and
> >         > egg
> >         > > > > > > problem.
> >         > > > > > > > We
> >         > > > > > > > > > > need
> >         > > > > > > > > > > > to get
> >         > > > > > > > > > > > >         some fixes in even when the
> status checks are
> >         > > > > > failing.
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25
> AM, Lupesko,
> >         > > Hagay
> >         > > > <
> >         > > > > > > > > > > > lupesko@gmail.com> wrote:
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >         > @madan – re: getting to a
> stable CI first:
> >         > > > > > > > > > > > >         > I’m concerned that by not
> enabling
> >         > protected
> >         > > > > branch
> >         > > > > > > > mode
> >         > > > > > > > > > > ASAP,
> >         > > > > > > > > > > > we’re just
> >         > > > > > > > > > > > >         > taking in more regressions,
> which makes a
> >         > > > stable
> >         > > > > > > build
> >         > > > > > > > a
> >         > > > > > > > > > > > moving target for
> >         > > > > > > > > > > > >         > us…
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha,
> Sheng" <
> >         > > > > > zhasheng@amazon.com
> >         > > > > > > >
> >         > > > > > > > > > wrote:
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >     Just one thing: please
> don’t disable
> >         > more
> >         > > > > tests
> >         > > > > > > or
> >         > > > > > > > > just
> >         > > > > > > > > > > > raise the
> >         > > > > > > > > > > > >         > tolerance thresholds.
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >     Best regards,
> >         > > > > > > > > > > > >         >     -sz
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM,
> "Madan Jampani" <
> >         > > > > > > > > > > > madan.jampani@gmail.com> wrote:
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >         +1
> >         > > > > > > > > > > > >         >         Before we can turn
> protected mode I
> >         > > > feel
> >         > > > > we
> >         > > > > > > > > should
> >         > > > > > > > > > > > first get to a
> >         > > > > > > > > > > > >         > stable CI
> >         > > > > > > > > > > > >         >         pipeline.
> >         > > > > > > > > > > > >         >         Sandeep is chasing
> down known
> >         > > breaking
> >         > > > > > > issues.
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >         On Thu, Aug 31, 2017
> at 10:27 AM,
> >         > > Hagay
> >         > > > > > > > Lupesko <
> >         > > > > > > > > > > > lupesko@gmail.com>
> >         > > > > > > > > > > > >         > wrote:
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >         > Build stability is a
> major issue,
> >         > > > > builds
> >         > > > > > > have
> >         > > > > > > > > > been
> >         > > > > > > > > > > > failing left
> >         > > > > > > > > > > > >         > and right
> >         > > > > > > > > > > > >         >         > over the last week.
> Some of it is
> >         > > due
> >         > > > > to
> >         > > > > > > > > Jenkins
> >         > > > > > > > > > > > slave issues,
> >         > > > > > > > > > > > >         > but some are
> >         > > > > > > > > > > > >         >         > real regressions.
> >         > > > > > > > > > > > >         >         > We need to be more
> strict in the
> >         > > code
> >         > > > > > we're
> >         > > > > > > > > > > > committing.
> >         > > > > > > > > > > > >         >         >
> >         > > > > > > > > > > > >         >         > I propose we
> configure our master
> >         > > to
> >         > > > > be a
> >         > > > > > > > > > protected
> >         > > > > > > > > > > > branch (
> >         > > > > > > > > > > > >         >         >
> >         > > > > > > > > > > > https://help.github.com/
> articles/about-protected-
> >         > > branches/
> >         > > > ).
> >         > > > > > > > > > > > >         >         >
> >         > > > > > > > > > > > >         >         > Thoughts?
> >         > > > > > > > > > > > >         >         >
> >         > > > > > > > > > > > >         >         > On 2017-08-28 22:41,
> sandeep
> >         > > > > > krishnamurthy
> >         > > > > > > <
> >         > > > > > > > > > > > s...@gmail.com>
> >         > > > > > > > > > > > >         > wrote:
> >         > > > > > > > > > > > >         >         > > Hello Committers
> and
> >         > > Contributors,>
> >         > > > > > > > > > > > >         >         > >
> >         > > > > > > > > > > > >         >         > > Due to unstable
> build
> >         > pipelines,
> >         > > > from
> >         > > > > > > past
> >         > > > > > > > 1
> >         > > > > > > > > > > week,
> >         > > > > > > > > > > > PRs are
> >         > > > > > > > > > > > >         > being merged>
> >         > > > > > > > > > > > >         >         > > after CR ignoring
> PR build
> >         > > status.
> >         > > > > > Build
> >         > > > > > > > > > pipeline
> >         > > > > > > > > > > > is much more
> >         > > > > > > > > > > > >         > stable
> >         > > > > > > > > > > > >         >         > than>
> >         > > > > > > > > > > > >         >         > > last week and most
> of the build
> >         > > > > > failures
> >         > > > > > > > you
> >         > > > > > > > > > see
> >         > > > > > > > > > > > from now on,
> >         > > > > > > > > > > > >         > are likely
> >         > > > > > > > > > > > >         >         > to>
> >         > > > > > > > > > > > >         >         > > be a valid failure
> and hence,
> >         > it
> >         > > is
> >         > > > > > > > > recommended
> >         > > > > > > > > > > to
> >         > > > > > > > > > > > wait for PR
> >         > > > > > > > > > > > >         > builds,
> >         > > > > > > > > > > > >         >         > see>
> >         > > > > > > > > > > > >         >         > > the root cause of
> any build
> >         > > > failures
> >         > > > > > > before
> >         > > > > > > > > > > > proceeding with
> >         > > > > > > > > > > > >         > merges.>
> >         > > > > > > > > > > > >         >         > >
> >         > > > > > > > > > > > >         >         > > At this point of
> time, there
> >         > are
> >         > > 2
> >         > > > > > > > > intermittent
> >         > > > > > > > > > > > issue yet to
> >         > > > > > > > > > > > >         > be fixed ->
> >         > > > > > > > > > > > >         >         > > * Network error
> leading to
> >         > GitHub
> >         > > > > > > requests
> >         > > > > > > > > > > > throwing 404>
> >         > > > > > > > > > > > >         >         > > * A conflict in
> artifacts
> >         > > generated
> >         > > > > > > between
> >         > > > > > > > > > > > branches/PR -
> >         > > > > > > > > > > > >         > Cause unknown
> >         > > > > > > > > > > > >         >         > yet.>
> >         > > > > > > > > > > > >         >         > > These issues will
> be fixed
> >         > soon.>
> >         > > > > > > > > > > > >         >         > >
> >         > > > > > > > > > > > >         >         > >
> >         > > > > > > > > > > > >         >         > > -- >
> >         > > > > > > > > > > > >         >         > > Sandeep
> Krishnamurthy>
> >         > > > > > > > > > > > >         >         > >
> >         > > > > > > > > > > > >         >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > >
> >         > > > > > > > > > > >
> >         > > > > > > > > > > >
> >         > > > > > > > > > > > --
> >         > > > > > > > > > > > - Tsuyoshi
> >         > > > > > > > > > > >
> >         > > > > > > > > > >
> >         > > > > > > > > >
> >         > > > > > > > > >
> >         > > > > > > > > >
> >         > > > > > > > > > --
> >         > > > > > > > > > Best Regards,
> >         > > > > > > > > > Gautam Kumar
> >         > > > > > > > > >
> >         > > > > > > > >
> >         > > > > > > >
> >         > > > > > > >
> >         > > > > > > >
> >         > > > > > > > --
> >         > > > > > > > Best Regards,
> >         > > > > > > > Gautam Kumar
> >         > > > > > > >
> >         > > > > > >
> >         > > > > >
> >         > > > >
> >         > > >
> >         > >
> >         >
> >
> >
> >
> >
>



-- 
Best Regards,
Gautam Kumar

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by "Lupesko, Hagay" <lu...@gmail.com>.
Yay!

On 10/12/17, 11:27, "Gautam" <ga...@gmail.com> wrote:

    Hi All,
    
      The master branch is protected now. Please review your PR for merge.
    Thanks to Infra team for following up on this.
    
    Regards,
    Gautam
    
    On Mon, Oct 2, 2017 at 1:19 PM, Gautam <ga...@gmail.com> wrote:
    
    > Thanks All.
    >
    >  I've created the JIRA to mark the protected branch for master
    > https://issues.apache.org/jira/browse/INCUBATOR-205.
    > We also need to add all the committers to be code owner as discussed in
    > the slack, I've opened a PR for it https://github.com/apache/
    > incubator-mxnet/pull/8128.
    >
    > Good point Joern, I'll follow up on that.
    >
    > Regards,
    > Gautam
    >
    > On Fri, Sep 29, 2017 at 2:20 AM, Joern Kottmann <ko...@gmail.com>
    > wrote:
    >
    >> It also makes sense to block too old PRs from merging, because the
    >> test results are outdated and the build might fail after it gets
    >> merged.
    >>
    >> Jörn
    >>
    >> On Thu, Sep 28, 2017 at 9:14 PM, Zha, Sheng <zh...@amazon.com> wrote:
    >> > +1 on protected branch.
    >> >
    >> > Best regards,
    >> > -sz
    >> >
    >> > On 9/28/17, 11:48 AM, "Kumar, Gautam" <ga...@amazon.com> wrote:
    >> >
    >> >     Hi Guys,
    >> >
    >> >      Let’s focus on specific issue here.
    >> >
    >> >     Marking the master branch protected which involves “Only merge if
    >> checks has passed, and yes it will run the complete build”.
    >> >
    >> >     We can’t afford to degrade the quality and keep debugging the build
    >> failure forever. If it’s slow down the development at the cost of quality I
    >> will vote for the quality.
    >> >     We can work on improving the infrastructure to improve the overall
    >> speed.  If you have any specific concerns on availability of Jenkins please
    >> point out.
    >> >
    >> >     -Gautam
    >> >
    >> >
    >> >     On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com>
    >> wrote:
    >> >
    >> >         -1000 on that. :)
    >> >
    >> >         On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <
    >> mnnaveen@gmail.com> wrote:
    >> >
    >> >         > PR->Sanity test/Linux build/test->reviewer/committer approves
    >> the
    >> >         > change->Comment "Build Now" (Or trigger on at least one
    >> approval from a
    >> >         > committer other than author)->*Full build-*>*passes
    >> build*->Enable Merge
    >> >         >
    >> >         > Let us take this particular topic to a separate thread or
    >> discuss offline
    >> >         > if further clarification is needed.
    >> >         >
    >> >         > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <
    >> cjolivier01@gmail.com>
    >> >         > wrote:
    >> >         >
    >> >         > > I understand the proposal.  How to trigger a build in that
    >> case?
    >> >         > >
    >> >         > >
    >> >         > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <
    >> madan.jampani@gmail.com>
    >> >         > > wrote:
    >> >         > >
    >> >         > > > Chris,
    >> >         > > > I don't think Naveen is suggesting that a merge happen
    >> without full
    >> >         > > > verification i.e. all tests across all platforms pass.
    >> >         > > > If a PR has some back and forth and results in multiple
    >> revisions
    >> >         > (which
    >> >         > > is
    >> >         > > > arguably more common than a random unit test failing), we
    >> simply delay
    >> >         > > full
    >> >         > > > verification until the owner/reviewer have settled on a
    >> mutually
    >> >         > > acceptable
    >> >         > > > state.
    >> >         > > >
    >> >         > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <
    >> cjolivier01@gmail.com
    >> >         > >
    >> >         > > > wrote:
    >> >         > > >
    >> >         > > > > -1 for running only partial tests.  Most failing unit
    >> tests that get
    >> >         > > > > through fail only for certain
    >> platforms/configurations.  I personally
    >> >         > > > > prefer to be assured the build and test is good before
    >> merge.  Most
    >> >         > PR
    >> >         > > > > merges aren't in a huge hurry.
    >> >         > > > >
    >> >         > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <
    >> mnnaveen@gmail.com>
    >> >         > > > wrote:
    >> >         > > > >
    >> >         > > > > > +1 to make it protected. Here is what I am thinking
    >> for PR builds
    >> >         > > > > > on a PR run Sanity Tests + build/test one
    >> platform->committer
    >> >         > reviews
    >> >         > > > the
    >> >         > > > > > code and issues "Build Now", a full build is
    >> run->Github checks
    >> >         > that
    >> >         > > > the
    >> >         > > > > > full build checks succeed before it can be merged.
    >> >         > > > > >
    >> >         > > > > > I agree with Madan that PR should be approved by one
    >> another
    >> >         > > committer.
    >> >         > > > > >
    >> >         > > > > >
    >> >         > > > > >
    >> >         > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
    >> >         > > > madan.jampani@gmail.com>
    >> >         > > > > > wrote:
    >> >         > > > > >
    >> >         > > > > > > +1
    >> >         > > > > > >
    >> >         > > > > > > At a minimum I'd like to see the following two
    >> happen:
    >> >         > > > > > > - Option to merge is disabled until all required
    >> checks pass.
    >> >         > > > > > > - Code is reviewed and given +1 by at least one
    >> other committer
    >> >         > (no
    >> >         > > > > self
    >> >         > > > > > > review).
    >> >         > > > > > >
    >> >         > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <
    >> gautamnitc@gmail.com>
    >> >         > > > wrote:
    >> >         > > > > > >
    >> >         > > > > > > > Hi Chris,
    >> >         > > > > > > >
    >> >         > > > > > > >   Here <https://help.github.com/artic
    >> les/about-protected-
    >> >         > > branches/
    >> >         > > > >
    >> >         > > > > is
    >> >         > > > > > > > user
    >> >         > > > > > > > document on semantics of protected branch.
    >> >         > > > > > > > In short when a branch is protected following
    >> applies to that
    >> >         > > > branch.
    >> >         > > > > > > >
    >> >         > > > > > > >    - Can't be force pushed
    >> >         > > > > > > >    - Can't be deleted
    >> >         > > > > > > >    - Can't have changes merged into it until
    >> required status
    >> >         > > checks
    >> >         > > > > > > >    <https://help.github.com/artic
    >> les/about-required-
    >> >         > > status-checks>
    >> >         > > > > > pass
    >> >         > > > > > > >    - Can't have changes merged into it until
    >> required reviews
    >> >         > are
    >> >         > > > > > > approved
    >> >         > > > > > > >    <https://help.github.com/artic
    >> les/approving-a-pull-
    >> >         > > > > > > > request-with-required-reviews>
    >> >         > > > > > > >    - Can't be edited or have files uploaded to it
    >> from the web
    >> >         > > > > > > >    - Can't have changes merged into it until
    >> changes to files
    >> >         > > that
    >> >         > > > > > > > have a designated
    >> >         > > > > > > >    code owner <https://help.github.com/
    >> >         > > articles/about-codeowners/>
    >> >         > > > > > have
    >> >         > > > > > > >    been approved by that owner
    >> >         > > > > > > >
    >> >         > > > > > > >  I am sure many of us might not want to have all
    >> these but we
    >> >         > can
    >> >         > > > > > debate
    >> >         > > > > > > on
    >> >         > > > > > > > it. My main motive was to "*Can't have changes
    >> merged into it
    >> >         > > until
    >> >         > > > > > > > required status checks pass*"
    >> >         > > > > > > >
    >> >         > > > > > > >
    >> >         > > > > > > > -Gautam
    >> >         > > > > > > >
    >> >         > > > > > > >
    >> >         > > > > > > >
    >> >         > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
    >> >         > > > > cjolivier01@gmail.com
    >> >         > > > > > >
    >> >         > > > > > > > wrote:
    >> >         > > > > > > >
    >> >         > > > > > > > > What does that mean? "Protected"? Protected
    >> from what?
    >> >         > > > > > > > >
    >> >         > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
    >> >         > gautamnitc@gmail.com>
    >> >         > > > > > wrote:
    >> >         > > > > > > > >
    >> >         > > > > > > > > > Hi Chris,
    >> >         > > > > > > > > >
    >> >         > > > > > > > > >    I mean make "master branch protected" of
    >> MXNet.
    >> >         > > > > > > > > >
    >> >         > > > > > > > > > -Gautam
    >> >         > > > > > > > > >
    >> >         > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris
    >> Olivier <
    >> >         > > > > > > cjolivier01@gmail.com
    >> >         > > > > > > > >
    >> >         > > > > > > > > > wrote:
    >> >         > > > > > > > > >
    >> >         > > > > > > > > > > What does this mean? "Mx-net branch
    >> protected"?
    >> >         > > > > > > > > > >
    >> >         > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi
    >> OZAWA <
    >> >         > > > > > > > > ozawa.tsuyoshi@gmail.com
    >> >         > > > > > > > > > >
    >> >         > > > > > > > > > > wrote:
    >> >         > > > > > > > > > >
    >> >         > > > > > > > > > > > +1,
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > > > While I'm checking the recent build
    >> failures, and I
    >> >         > think
    >> >         > > > the
    >> >         > > > > > > > > decision
    >> >         > > > > > > > > > > > of making the mx-net branch protected is
    >> necessary for
    >> >         > > > stable
    >> >         > > > > > > > > > > > building.
    >> >         > > > > > > > > > > > Thanks Kumar for resuming important
    >> discussion.
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > > > Best regards
    >> >         > > > > > > > > > > > - Tsuyoshi
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar,
    >> Gautam <
    >> >         > > > > > > gauta@amazon.com>
    >> >         > > > > > > > > > > wrote:
    >> >         > > > > > > > > > > > > Reviving the discussion.
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > > At this point of time we have couple of
    >> stable builds
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > >
    >> >         > https://builds.apache.org/view/Incubator%20Projects/job/
    >> >         > > > > > > > > > > incubator-mxnet/job/master/448/
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > >
    >> >         > https://builds.apache.org/view/Incubator%20Projects/job/
    >> >         > > > > > > > > > > incubator-mxnet/job/master/449/
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > > Should we have a quick discussion or
    >> polling on
    >> >         > making
    >> >         > > > the
    >> >         > > > > > > mx-net
    >> >         > > > > > > > > > > branch
    >> >         > > > > > > > > > > > protected? If you still think we
    >> shouldn’t make it
    >> >         > > > protected
    >> >         > > > > > > please
    >> >         > > > > > > > > > > provide
    >> >         > > > > > > > > > > > a reason to support your claim.
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > > Few of us have concern over Jenkin’s
    >> stability. If I
    >> >         > > look
    >> >         > > > > two
    >> >         > > > > > > > weeks
    >> >         > > > > > > > > > > > back, after upgrading Linux slave to
    >> g2.8x and new
    >> >         > > windows
    >> >         > > > > AMI,
    >> >         > > > > > > we
    >> >         > > > > > > > > have
    >> >         > > > > > > > > > > not
    >> >         > > > > > > > > > > > seen any case where instance died due to
    >> high memory
    >> >         > > usage
    >> >         > > > or
    >> >         > > > > > any
    >> >         > > > > > > > > > process
    >> >         > > > > > > > > > > > got killed due to high cpu usage or any
    >> other issue
    >> >         > with
    >> >         > > > > > windows
    >> >         > > > > > > > > > slaves.
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > > Going forward we are also planning that
    >> if we add any
    >> >         > > new
    >> >         > > > > > slave
    >> >         > > > > > > > we
    >> >         > > > > > > > > > will
    >> >         > > > > > > > > > > > not enable the main load immediately, but
    >> rather will
    >> >         > do
    >> >         > > > > ‘test
    >> >         > > > > > > > build’
    >> >         > > > > > > > > > to
    >> >         > > > > > > > > > > > make sure that new slaves are not causing
    >> any
    >> >         > > > infrastructure
    >> >         > > > > > > issue
    >> >         > > > > > > > > and
    >> >         > > > > > > > > > > > capable to perform as good as existing
    >> slaves.
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > > -Gautam
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
    >> >         > > lupesko@gmail.com
    >> >         > > > >
    >> >         > > > > > > wrote:
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >     @madan looking into some failures –
    >> you’re right…
    >> >         > > > > there’s
    >> >         > > > > > > > > > multiple
    >> >         > > > > > > > > > > > issues going on, some of them
    >> intermittent, and we want
    >> >         > > to
    >> >         > > > be
    >> >         > > > > > > able
    >> >         > > > > > > > to
    >> >         > > > > > > > > > > merge
    >> >         > > > > > > > > > > > fixes in.
    >> >         > > > > > > > > > > > >     Agreed that we can wait with
    >> setting up protected
    >> >         > > > mode
    >> >         > > > > > > until
    >> >         > > > > > > > > > build
    >> >         > > > > > > > > > > > stabilizes.
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
    >> >         > > > > > > madan.jampani@gmail.com>
    >> >         > > > > > > > > > > wrote:
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >         @hagay: we agree on the end
    >> state. I'm not
    >> >         > too
    >> >         > > > > > > particular
    >> >         > > > > > > > > > about
    >> >         > > > > > > > > > > > how we get
    >> >         > > > > > > > > > > > >         there. If you think enabling it
    >> now and fixes
    >> >         > > > > > > regression
    >> >         > > > > > > > > > later
    >> >         > > > > > > > > > > > is doable,
    >> >         > > > > > > > > > > > >         I'm fine with. I see a bit of a
    >> chicken and
    >> >         > egg
    >> >         > > > > > > problem.
    >> >         > > > > > > > We
    >> >         > > > > > > > > > > need
    >> >         > > > > > > > > > > > to get
    >> >         > > > > > > > > > > > >         some fixes in even when the
    >> status checks are
    >> >         > > > > > failing.
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25
    >> AM, Lupesko,
    >> >         > > Hagay
    >> >         > > > <
    >> >         > > > > > > > > > > > lupesko@gmail.com> wrote:
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >         > @madan – re: getting to a
    >> stable CI first:
    >> >         > > > > > > > > > > > >         > I’m concerned that by not
    >> enabling
    >> >         > protected
    >> >         > > > > branch
    >> >         > > > > > > > mode
    >> >         > > > > > > > > > > ASAP,
    >> >         > > > > > > > > > > > we’re just
    >> >         > > > > > > > > > > > >         > taking in more regressions,
    >> which makes a
    >> >         > > > stable
    >> >         > > > > > > build
    >> >         > > > > > > > a
    >> >         > > > > > > > > > > > moving target for
    >> >         > > > > > > > > > > > >         > us…
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha,
    >> Sheng" <
    >> >         > > > > > zhasheng@amazon.com
    >> >         > > > > > > >
    >> >         > > > > > > > > > wrote:
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >     Just one thing: please
    >> don’t disable
    >> >         > more
    >> >         > > > > tests
    >> >         > > > > > > or
    >> >         > > > > > > > > just
    >> >         > > > > > > > > > > > raise the
    >> >         > > > > > > > > > > > >         > tolerance thresholds.
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >     Best regards,
    >> >         > > > > > > > > > > > >         >     -sz
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM,
    >> "Madan Jampani" <
    >> >         > > > > > > > > > > > madan.jampani@gmail.com> wrote:
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >         +1
    >> >         > > > > > > > > > > > >         >         Before we can turn
    >> protected mode I
    >> >         > > > feel
    >> >         > > > > we
    >> >         > > > > > > > > should
    >> >         > > > > > > > > > > > first get to a
    >> >         > > > > > > > > > > > >         > stable CI
    >> >         > > > > > > > > > > > >         >         pipeline.
    >> >         > > > > > > > > > > > >         >         Sandeep is chasing
    >> down known
    >> >         > > breaking
    >> >         > > > > > > issues.
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >         On Thu, Aug 31, 2017
    >> at 10:27 AM,
    >> >         > > Hagay
    >> >         > > > > > > > Lupesko <
    >> >         > > > > > > > > > > > lupesko@gmail.com>
    >> >         > > > > > > > > > > > >         > wrote:
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >         > Build stability is
    >> a major issue,
    >> >         > > > > builds
    >> >         > > > > > > have
    >> >         > > > > > > > > > been
    >> >         > > > > > > > > > > > failing left
    >> >         > > > > > > > > > > > >         > and right
    >> >         > > > > > > > > > > > >         >         > over the last week.
    >> Some of it is
    >> >         > > due
    >> >         > > > > to
    >> >         > > > > > > > > Jenkins
    >> >         > > > > > > > > > > > slave issues,
    >> >         > > > > > > > > > > > >         > but some are
    >> >         > > > > > > > > > > > >         >         > real regressions.
    >> >         > > > > > > > > > > > >         >         > We need to be more
    >> strict in the
    >> >         > > code
    >> >         > > > > > we're
    >> >         > > > > > > > > > > > committing.
    >> >         > > > > > > > > > > > >         >         >
    >> >         > > > > > > > > > > > >         >         > I propose we
    >> configure our master
    >> >         > > to
    >> >         > > > > be a
    >> >         > > > > > > > > > protected
    >> >         > > > > > > > > > > > branch (
    >> >         > > > > > > > > > > > >         >         >
    >> >         > > > > > > > > > > > https://help.github.com/articl
    >> es/about-protected-
    >> >         > > branches/
    >> >         > > > ).
    >> >         > > > > > > > > > > > >         >         >
    >> >         > > > > > > > > > > > >         >         > Thoughts?
    >> >         > > > > > > > > > > > >         >         >
    >> >         > > > > > > > > > > > >         >         > On 2017-08-28
    >> 22:41, sandeep
    >> >         > > > > > krishnamurthy
    >> >         > > > > > > <
    >> >         > > > > > > > > > > > s...@gmail.com>
    >> >         > > > > > > > > > > > >         > wrote:
    >> >         > > > > > > > > > > > >         >         > > Hello Committers
    >> and
    >> >         > > Contributors,>
    >> >         > > > > > > > > > > > >         >         > >
    >> >         > > > > > > > > > > > >         >         > > Due to unstable
    >> build
    >> >         > pipelines,
    >> >         > > > from
    >> >         > > > > > > past
    >> >         > > > > > > > 1
    >> >         > > > > > > > > > > week,
    >> >         > > > > > > > > > > > PRs are
    >> >         > > > > > > > > > > > >         > being merged>
    >> >         > > > > > > > > > > > >         >         > > after CR ignoring
    >> PR build
    >> >         > > status.
    >> >         > > > > > Build
    >> >         > > > > > > > > > pipeline
    >> >         > > > > > > > > > > > is much more
    >> >         > > > > > > > > > > > >         > stable
    >> >         > > > > > > > > > > > >         >         > than>
    >> >         > > > > > > > > > > > >         >         > > last week and
    >> most of the build
    >> >         > > > > > failures
    >> >         > > > > > > > you
    >> >         > > > > > > > > > see
    >> >         > > > > > > > > > > > from now on,
    >> >         > > > > > > > > > > > >         > are likely
    >> >         > > > > > > > > > > > >         >         > to>
    >> >         > > > > > > > > > > > >         >         > > be a valid
    >> failure and hence,
    >> >         > it
    >> >         > > is
    >> >         > > > > > > > > recommended
    >> >         > > > > > > > > > > to
    >> >         > > > > > > > > > > > wait for PR
    >> >         > > > > > > > > > > > >         > builds,
    >> >         > > > > > > > > > > > >         >         > see>
    >> >         > > > > > > > > > > > >         >         > > the root cause of
    >> any build
    >> >         > > > failures
    >> >         > > > > > > before
    >> >         > > > > > > > > > > > proceeding with
    >> >         > > > > > > > > > > > >         > merges.>
    >> >         > > > > > > > > > > > >         >         > >
    >> >         > > > > > > > > > > > >         >         > > At this point of
    >> time, there
    >> >         > are
    >> >         > > 2
    >> >         > > > > > > > > intermittent
    >> >         > > > > > > > > > > > issue yet to
    >> >         > > > > > > > > > > > >         > be fixed ->
    >> >         > > > > > > > > > > > >         >         > > * Network error
    >> leading to
    >> >         > GitHub
    >> >         > > > > > > requests
    >> >         > > > > > > > > > > > throwing 404>
    >> >         > > > > > > > > > > > >         >         > > * A conflict in
    >> artifacts
    >> >         > > generated
    >> >         > > > > > > between
    >> >         > > > > > > > > > > > branches/PR -
    >> >         > > > > > > > > > > > >         > Cause unknown
    >> >         > > > > > > > > > > > >         >         > yet.>
    >> >         > > > > > > > > > > > >         >         > > These issues will
    >> be fixed
    >> >         > soon.>
    >> >         > > > > > > > > > > > >         >         > >
    >> >         > > > > > > > > > > > >         >         > >
    >> >         > > > > > > > > > > > >         >         > > -- >
    >> >         > > > > > > > > > > > >         >         > > Sandeep
    >> Krishnamurthy>
    >> >         > > > > > > > > > > > >         >         > >
    >> >         > > > > > > > > > > > >         >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > > > --
    >> >         > > > > > > > > > > > - Tsuyoshi
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > >
    >> >         > > > > > > > > >
    >> >         > > > > > > > > >
    >> >         > > > > > > > > >
    >> >         > > > > > > > > > --
    >> >         > > > > > > > > > Best Regards,
    >> >         > > > > > > > > > Gautam Kumar
    >> >         > > > > > > > > >
    >> >         > > > > > > > >
    >> >         > > > > > > >
    >> >         > > > > > > >
    >> >         > > > > > > >
    >> >         > > > > > > > --
    >> >         > > > > > > > Best Regards,
    >> >         > > > > > > > Gautam Kumar
    >> >         > > > > > > >
    >> >         > > > > > >
    >> >         > > > > >
    >> >         > > > >
    >> >         > > >
    >> >         > >
    >> >         >
    >> >
    >> >
    >> >
    >> >
    >>
    >
    >
    >
    > --
    > Best Regards,
    > Gautam Kumar
    >
    >
    >
    >
    >
    
    
    -- 
    Best Regards,
    Gautam Kumar
    



Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Gautam <ga...@gmail.com>.
Hi All,

  The master branch is protected now. Please review your PR for merge.
Thanks to Infra team for following up on this.

Regards,
Gautam

On Mon, Oct 2, 2017 at 1:19 PM, Gautam <ga...@gmail.com> wrote:

> Thanks All.
>
>  I've created the JIRA to mark the protected branch for master
> https://issues.apache.org/jira/browse/INCUBATOR-205.
> We also need to add all the committers to be code owner as discussed in
> the slack, I've opened a PR for it https://github.com/apache/
> incubator-mxnet/pull/8128.
>
> Good point Joern, I'll follow up on that.
>
> Regards,
> Gautam
>
> On Fri, Sep 29, 2017 at 2:20 AM, Joern Kottmann <ko...@gmail.com>
> wrote:
>
>> It also makes sense to block too old PRs from merging, because the
>> test results are outdated and the build might fail after it gets
>> merged.
>>
>> Jörn
>>
>> On Thu, Sep 28, 2017 at 9:14 PM, Zha, Sheng <zh...@amazon.com> wrote:
>> > +1 on protected branch.
>> >
>> > Best regards,
>> > -sz
>> >
>> > On 9/28/17, 11:48 AM, "Kumar, Gautam" <ga...@amazon.com> wrote:
>> >
>> >     Hi Guys,
>> >
>> >      Let’s focus on specific issue here.
>> >
>> >     Marking the master branch protected which involves “Only merge if
>> checks has passed, and yes it will run the complete build”.
>> >
>> >     We can’t afford to degrade the quality and keep debugging the build
>> failure forever. If it’s slow down the development at the cost of quality I
>> will vote for the quality.
>> >     We can work on improving the infrastructure to improve the overall
>> speed.  If you have any specific concerns on availability of Jenkins please
>> point out.
>> >
>> >     -Gautam
>> >
>> >
>> >     On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com>
>> wrote:
>> >
>> >         -1000 on that. :)
>> >
>> >         On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <
>> mnnaveen@gmail.com> wrote:
>> >
>> >         > PR->Sanity test/Linux build/test->reviewer/committer approves
>> the
>> >         > change->Comment "Build Now" (Or trigger on at least one
>> approval from a
>> >         > committer other than author)->*Full build-*>*passes
>> build*->Enable Merge
>> >         >
>> >         > Let us take this particular topic to a separate thread or
>> discuss offline
>> >         > if further clarification is needed.
>> >         >
>> >         > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <
>> cjolivier01@gmail.com>
>> >         > wrote:
>> >         >
>> >         > > I understand the proposal.  How to trigger a build in that
>> case?
>> >         > >
>> >         > >
>> >         > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <
>> madan.jampani@gmail.com>
>> >         > > wrote:
>> >         > >
>> >         > > > Chris,
>> >         > > > I don't think Naveen is suggesting that a merge happen
>> without full
>> >         > > > verification i.e. all tests across all platforms pass.
>> >         > > > If a PR has some back and forth and results in multiple
>> revisions
>> >         > (which
>> >         > > is
>> >         > > > arguably more common than a random unit test failing), we
>> simply delay
>> >         > > full
>> >         > > > verification until the owner/reviewer have settled on a
>> mutually
>> >         > > acceptable
>> >         > > > state.
>> >         > > >
>> >         > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <
>> cjolivier01@gmail.com
>> >         > >
>> >         > > > wrote:
>> >         > > >
>> >         > > > > -1 for running only partial tests.  Most failing unit
>> tests that get
>> >         > > > > through fail only for certain
>> platforms/configurations.  I personally
>> >         > > > > prefer to be assured the build and test is good before
>> merge.  Most
>> >         > PR
>> >         > > > > merges aren't in a huge hurry.
>> >         > > > >
>> >         > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <
>> mnnaveen@gmail.com>
>> >         > > > wrote:
>> >         > > > >
>> >         > > > > > +1 to make it protected. Here is what I am thinking
>> for PR builds
>> >         > > > > > on a PR run Sanity Tests + build/test one
>> platform->committer
>> >         > reviews
>> >         > > > the
>> >         > > > > > code and issues "Build Now", a full build is
>> run->Github checks
>> >         > that
>> >         > > > the
>> >         > > > > > full build checks succeed before it can be merged.
>> >         > > > > >
>> >         > > > > > I agree with Madan that PR should be approved by one
>> another
>> >         > > committer.
>> >         > > > > >
>> >         > > > > >
>> >         > > > > >
>> >         > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
>> >         > > > madan.jampani@gmail.com>
>> >         > > > > > wrote:
>> >         > > > > >
>> >         > > > > > > +1
>> >         > > > > > >
>> >         > > > > > > At a minimum I'd like to see the following two
>> happen:
>> >         > > > > > > - Option to merge is disabled until all required
>> checks pass.
>> >         > > > > > > - Code is reviewed and given +1 by at least one
>> other committer
>> >         > (no
>> >         > > > > self
>> >         > > > > > > review).
>> >         > > > > > >
>> >         > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <
>> gautamnitc@gmail.com>
>> >         > > > wrote:
>> >         > > > > > >
>> >         > > > > > > > Hi Chris,
>> >         > > > > > > >
>> >         > > > > > > >   Here <https://help.github.com/artic
>> les/about-protected-
>> >         > > branches/
>> >         > > > >
>> >         > > > > is
>> >         > > > > > > > user
>> >         > > > > > > > document on semantics of protected branch.
>> >         > > > > > > > In short when a branch is protected following
>> applies to that
>> >         > > > branch.
>> >         > > > > > > >
>> >         > > > > > > >    - Can't be force pushed
>> >         > > > > > > >    - Can't be deleted
>> >         > > > > > > >    - Can't have changes merged into it until
>> required status
>> >         > > checks
>> >         > > > > > > >    <https://help.github.com/artic
>> les/about-required-
>> >         > > status-checks>
>> >         > > > > > pass
>> >         > > > > > > >    - Can't have changes merged into it until
>> required reviews
>> >         > are
>> >         > > > > > > approved
>> >         > > > > > > >    <https://help.github.com/artic
>> les/approving-a-pull-
>> >         > > > > > > > request-with-required-reviews>
>> >         > > > > > > >    - Can't be edited or have files uploaded to it
>> from the web
>> >         > > > > > > >    - Can't have changes merged into it until
>> changes to files
>> >         > > that
>> >         > > > > > > > have a designated
>> >         > > > > > > >    code owner <https://help.github.com/
>> >         > > articles/about-codeowners/>
>> >         > > > > > have
>> >         > > > > > > >    been approved by that owner
>> >         > > > > > > >
>> >         > > > > > > >  I am sure many of us might not want to have all
>> these but we
>> >         > can
>> >         > > > > > debate
>> >         > > > > > > on
>> >         > > > > > > > it. My main motive was to "*Can't have changes
>> merged into it
>> >         > > until
>> >         > > > > > > > required status checks pass*"
>> >         > > > > > > >
>> >         > > > > > > >
>> >         > > > > > > > -Gautam
>> >         > > > > > > >
>> >         > > > > > > >
>> >         > > > > > > >
>> >         > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
>> >         > > > > cjolivier01@gmail.com
>> >         > > > > > >
>> >         > > > > > > > wrote:
>> >         > > > > > > >
>> >         > > > > > > > > What does that mean? "Protected"? Protected
>> from what?
>> >         > > > > > > > >
>> >         > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
>> >         > gautamnitc@gmail.com>
>> >         > > > > > wrote:
>> >         > > > > > > > >
>> >         > > > > > > > > > Hi Chris,
>> >         > > > > > > > > >
>> >         > > > > > > > > >    I mean make "master branch protected" of
>> MXNet.
>> >         > > > > > > > > >
>> >         > > > > > > > > > -Gautam
>> >         > > > > > > > > >
>> >         > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris
>> Olivier <
>> >         > > > > > > cjolivier01@gmail.com
>> >         > > > > > > > >
>> >         > > > > > > > > > wrote:
>> >         > > > > > > > > >
>> >         > > > > > > > > > > What does this mean? "Mx-net branch
>> protected"?
>> >         > > > > > > > > > >
>> >         > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi
>> OZAWA <
>> >         > > > > > > > > ozawa.tsuyoshi@gmail.com
>> >         > > > > > > > > > >
>> >         > > > > > > > > > > wrote:
>> >         > > > > > > > > > >
>> >         > > > > > > > > > > > +1,
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > > > While I'm checking the recent build
>> failures, and I
>> >         > think
>> >         > > > the
>> >         > > > > > > > > decision
>> >         > > > > > > > > > > > of making the mx-net branch protected is
>> necessary for
>> >         > > > stable
>> >         > > > > > > > > > > > building.
>> >         > > > > > > > > > > > Thanks Kumar for resuming important
>> discussion.
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > > > Best regards
>> >         > > > > > > > > > > > - Tsuyoshi
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar,
>> Gautam <
>> >         > > > > > > gauta@amazon.com>
>> >         > > > > > > > > > > wrote:
>> >         > > > > > > > > > > > > Reviving the discussion.
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > > At this point of time we have couple of
>> stable builds
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > >
>> >         > https://builds.apache.org/view/Incubator%20Projects/job/
>> >         > > > > > > > > > > incubator-mxnet/job/master/448/
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > >
>> >         > https://builds.apache.org/view/Incubator%20Projects/job/
>> >         > > > > > > > > > > incubator-mxnet/job/master/449/
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > > Should we have a quick discussion or
>> polling on
>> >         > making
>> >         > > > the
>> >         > > > > > > mx-net
>> >         > > > > > > > > > > branch
>> >         > > > > > > > > > > > protected? If you still think we
>> shouldn’t make it
>> >         > > > protected
>> >         > > > > > > please
>> >         > > > > > > > > > > provide
>> >         > > > > > > > > > > > a reason to support your claim.
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > > Few of us have concern over Jenkin’s
>> stability. If I
>> >         > > look
>> >         > > > > two
>> >         > > > > > > > weeks
>> >         > > > > > > > > > > > back, after upgrading Linux slave to
>> g2.8x and new
>> >         > > windows
>> >         > > > > AMI,
>> >         > > > > > > we
>> >         > > > > > > > > have
>> >         > > > > > > > > > > not
>> >         > > > > > > > > > > > seen any case where instance died due to
>> high memory
>> >         > > usage
>> >         > > > or
>> >         > > > > > any
>> >         > > > > > > > > > process
>> >         > > > > > > > > > > > got killed due to high cpu usage or any
>> other issue
>> >         > with
>> >         > > > > > windows
>> >         > > > > > > > > > slaves.
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > > Going forward we are also planning that
>> if we add any
>> >         > > new
>> >         > > > > > slave
>> >         > > > > > > > we
>> >         > > > > > > > > > will
>> >         > > > > > > > > > > > not enable the main load immediately, but
>> rather will
>> >         > do
>> >         > > > > ‘test
>> >         > > > > > > > build’
>> >         > > > > > > > > > to
>> >         > > > > > > > > > > > make sure that new slaves are not causing
>> any
>> >         > > > infrastructure
>> >         > > > > > > issue
>> >         > > > > > > > > and
>> >         > > > > > > > > > > > capable to perform as good as existing
>> slaves.
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > > -Gautam
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
>> >         > > lupesko@gmail.com
>> >         > > > >
>> >         > > > > > > wrote:
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >     @madan looking into some failures –
>> you’re right…
>> >         > > > > there’s
>> >         > > > > > > > > > multiple
>> >         > > > > > > > > > > > issues going on, some of them
>> intermittent, and we want
>> >         > > to
>> >         > > > be
>> >         > > > > > > able
>> >         > > > > > > > to
>> >         > > > > > > > > > > merge
>> >         > > > > > > > > > > > fixes in.
>> >         > > > > > > > > > > > >     Agreed that we can wait with
>> setting up protected
>> >         > > > mode
>> >         > > > > > > until
>> >         > > > > > > > > > build
>> >         > > > > > > > > > > > stabilizes.
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
>> >         > > > > > > madan.jampani@gmail.com>
>> >         > > > > > > > > > > wrote:
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >         @hagay: we agree on the end
>> state. I'm not
>> >         > too
>> >         > > > > > > particular
>> >         > > > > > > > > > about
>> >         > > > > > > > > > > > how we get
>> >         > > > > > > > > > > > >         there. If you think enabling it
>> now and fixes
>> >         > > > > > > regression
>> >         > > > > > > > > > later
>> >         > > > > > > > > > > > is doable,
>> >         > > > > > > > > > > > >         I'm fine with. I see a bit of a
>> chicken and
>> >         > egg
>> >         > > > > > > problem.
>> >         > > > > > > > We
>> >         > > > > > > > > > > need
>> >         > > > > > > > > > > > to get
>> >         > > > > > > > > > > > >         some fixes in even when the
>> status checks are
>> >         > > > > > failing.
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25
>> AM, Lupesko,
>> >         > > Hagay
>> >         > > > <
>> >         > > > > > > > > > > > lupesko@gmail.com> wrote:
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >         > @madan – re: getting to a
>> stable CI first:
>> >         > > > > > > > > > > > >         > I’m concerned that by not
>> enabling
>> >         > protected
>> >         > > > > branch
>> >         > > > > > > > mode
>> >         > > > > > > > > > > ASAP,
>> >         > > > > > > > > > > > we’re just
>> >         > > > > > > > > > > > >         > taking in more regressions,
>> which makes a
>> >         > > > stable
>> >         > > > > > > build
>> >         > > > > > > > a
>> >         > > > > > > > > > > > moving target for
>> >         > > > > > > > > > > > >         > us…
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha,
>> Sheng" <
>> >         > > > > > zhasheng@amazon.com
>> >         > > > > > > >
>> >         > > > > > > > > > wrote:
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >     Just one thing: please
>> don’t disable
>> >         > more
>> >         > > > > tests
>> >         > > > > > > or
>> >         > > > > > > > > just
>> >         > > > > > > > > > > > raise the
>> >         > > > > > > > > > > > >         > tolerance thresholds.
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >     Best regards,
>> >         > > > > > > > > > > > >         >     -sz
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM,
>> "Madan Jampani" <
>> >         > > > > > > > > > > > madan.jampani@gmail.com> wrote:
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >         +1
>> >         > > > > > > > > > > > >         >         Before we can turn
>> protected mode I
>> >         > > > feel
>> >         > > > > we
>> >         > > > > > > > > should
>> >         > > > > > > > > > > > first get to a
>> >         > > > > > > > > > > > >         > stable CI
>> >         > > > > > > > > > > > >         >         pipeline.
>> >         > > > > > > > > > > > >         >         Sandeep is chasing
>> down known
>> >         > > breaking
>> >         > > > > > > issues.
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >         On Thu, Aug 31, 2017
>> at 10:27 AM,
>> >         > > Hagay
>> >         > > > > > > > Lupesko <
>> >         > > > > > > > > > > > lupesko@gmail.com>
>> >         > > > > > > > > > > > >         > wrote:
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >         > Build stability is
>> a major issue,
>> >         > > > > builds
>> >         > > > > > > have
>> >         > > > > > > > > > been
>> >         > > > > > > > > > > > failing left
>> >         > > > > > > > > > > > >         > and right
>> >         > > > > > > > > > > > >         >         > over the last week.
>> Some of it is
>> >         > > due
>> >         > > > > to
>> >         > > > > > > > > Jenkins
>> >         > > > > > > > > > > > slave issues,
>> >         > > > > > > > > > > > >         > but some are
>> >         > > > > > > > > > > > >         >         > real regressions.
>> >         > > > > > > > > > > > >         >         > We need to be more
>> strict in the
>> >         > > code
>> >         > > > > > we're
>> >         > > > > > > > > > > > committing.
>> >         > > > > > > > > > > > >         >         >
>> >         > > > > > > > > > > > >         >         > I propose we
>> configure our master
>> >         > > to
>> >         > > > > be a
>> >         > > > > > > > > > protected
>> >         > > > > > > > > > > > branch (
>> >         > > > > > > > > > > > >         >         >
>> >         > > > > > > > > > > > https://help.github.com/articl
>> es/about-protected-
>> >         > > branches/
>> >         > > > ).
>> >         > > > > > > > > > > > >         >         >
>> >         > > > > > > > > > > > >         >         > Thoughts?
>> >         > > > > > > > > > > > >         >         >
>> >         > > > > > > > > > > > >         >         > On 2017-08-28
>> 22:41, sandeep
>> >         > > > > > krishnamurthy
>> >         > > > > > > <
>> >         > > > > > > > > > > > s...@gmail.com>
>> >         > > > > > > > > > > > >         > wrote:
>> >         > > > > > > > > > > > >         >         > > Hello Committers
>> and
>> >         > > Contributors,>
>> >         > > > > > > > > > > > >         >         > >
>> >         > > > > > > > > > > > >         >         > > Due to unstable
>> build
>> >         > pipelines,
>> >         > > > from
>> >         > > > > > > past
>> >         > > > > > > > 1
>> >         > > > > > > > > > > week,
>> >         > > > > > > > > > > > PRs are
>> >         > > > > > > > > > > > >         > being merged>
>> >         > > > > > > > > > > > >         >         > > after CR ignoring
>> PR build
>> >         > > status.
>> >         > > > > > Build
>> >         > > > > > > > > > pipeline
>> >         > > > > > > > > > > > is much more
>> >         > > > > > > > > > > > >         > stable
>> >         > > > > > > > > > > > >         >         > than>
>> >         > > > > > > > > > > > >         >         > > last week and
>> most of the build
>> >         > > > > > failures
>> >         > > > > > > > you
>> >         > > > > > > > > > see
>> >         > > > > > > > > > > > from now on,
>> >         > > > > > > > > > > > >         > are likely
>> >         > > > > > > > > > > > >         >         > to>
>> >         > > > > > > > > > > > >         >         > > be a valid
>> failure and hence,
>> >         > it
>> >         > > is
>> >         > > > > > > > > recommended
>> >         > > > > > > > > > > to
>> >         > > > > > > > > > > > wait for PR
>> >         > > > > > > > > > > > >         > builds,
>> >         > > > > > > > > > > > >         >         > see>
>> >         > > > > > > > > > > > >         >         > > the root cause of
>> any build
>> >         > > > failures
>> >         > > > > > > before
>> >         > > > > > > > > > > > proceeding with
>> >         > > > > > > > > > > > >         > merges.>
>> >         > > > > > > > > > > > >         >         > >
>> >         > > > > > > > > > > > >         >         > > At this point of
>> time, there
>> >         > are
>> >         > > 2
>> >         > > > > > > > > intermittent
>> >         > > > > > > > > > > > issue yet to
>> >         > > > > > > > > > > > >         > be fixed ->
>> >         > > > > > > > > > > > >         >         > > * Network error
>> leading to
>> >         > GitHub
>> >         > > > > > > requests
>> >         > > > > > > > > > > > throwing 404>
>> >         > > > > > > > > > > > >         >         > > * A conflict in
>> artifacts
>> >         > > generated
>> >         > > > > > > between
>> >         > > > > > > > > > > > branches/PR -
>> >         > > > > > > > > > > > >         > Cause unknown
>> >         > > > > > > > > > > > >         >         > yet.>
>> >         > > > > > > > > > > > >         >         > > These issues will
>> be fixed
>> >         > soon.>
>> >         > > > > > > > > > > > >         >         > >
>> >         > > > > > > > > > > > >         >         > >
>> >         > > > > > > > > > > > >         >         > > -- >
>> >         > > > > > > > > > > > >         >         > > Sandeep
>> Krishnamurthy>
>> >         > > > > > > > > > > > >         >         > >
>> >         > > > > > > > > > > > >         >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > > > --
>> >         > > > > > > > > > > > - Tsuyoshi
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > >
>> >         > > > > > > > > >
>> >         > > > > > > > > >
>> >         > > > > > > > > >
>> >         > > > > > > > > > --
>> >         > > > > > > > > > Best Regards,
>> >         > > > > > > > > > Gautam Kumar
>> >         > > > > > > > > >
>> >         > > > > > > > >
>> >         > > > > > > >
>> >         > > > > > > >
>> >         > > > > > > >
>> >         > > > > > > > --
>> >         > > > > > > > Best Regards,
>> >         > > > > > > > Gautam Kumar
>> >         > > > > > > >
>> >         > > > > > >
>> >         > > > > >
>> >         > > > >
>> >         > > >
>> >         > >
>> >         >
>> >
>> >
>> >
>> >
>>
>
>
>
> --
> Best Regards,
> Gautam Kumar
>
>
>
>
>


-- 
Best Regards,
Gautam Kumar

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Steffen Rochel <st...@gmail.com>.
Hi Hen - we started to document the process -
https://cwiki.apache.org/confluence/display/MXNET/Development+Process
Contributions are welcome!
Steffen

On Sun, Oct 8, 2017 at 3:08 AM, Henri Yandell <ba...@apache.org> wrote:

> A late followup on this.
>
> Is the "How a committer develops on MXNet" documented anywhere?
> Staging/master etc? The more complex the development process, the harder it
> is for a newcomer to get involved on the project. I couldn't find it on the
> website/github.
>
> (I'd also note that the 'how to modify the website/documentation' also
> needs to be documented)
>
> I'd suggest that the how-to-dev doc also explain why it's bad for master to
> not build. One could argue that, outside of when a release is being made,
> master is not important to our users. Yes it should build, but an accident
> should only affect those who have put themselves on the bleeding edge.
>
> Hen
>
> On Mon, Oct 2, 2017 at 1:19 PM, Gautam <ga...@gmail.com> wrote:
>
> > Thanks All.
> >
> >  I've created the JIRA to mark the protected branch for master
> > https://issues.apache.org/jira/browse/INCUBATOR-205.
> > We also need to add all the committers to be code owner as discussed in
> the
> > slack, I've opened a PR for it
> > https://github.com/apache/incubator-mxnet/pull/8128.
> >
> > Good point Joern, I'll follow up on that.
> >
> > Regards,
> > Gautam
> >
> > On Fri, Sep 29, 2017 at 2:20 AM, Joern Kottmann <ko...@gmail.com>
> > wrote:
> >
> > > It also makes sense to block too old PRs from merging, because the
> > > test results are outdated and the build might fail after it gets
> > > merged.
> > >
> > > Jörn
> > >
> > > On Thu, Sep 28, 2017 at 9:14 PM, Zha, Sheng <zh...@amazon.com>
> wrote:
> > > > +1 on protected branch.
> > > >
> > > > Best regards,
> > > > -sz
> > > >
> > > > On 9/28/17, 11:48 AM, "Kumar, Gautam" <ga...@amazon.com> wrote:
> > > >
> > > >     Hi Guys,
> > > >
> > > >      Let’s focus on specific issue here.
> > > >
> > > >     Marking the master branch protected which involves “Only merge if
> > > checks has passed, and yes it will run the complete build”.
> > > >
> > > >     We can’t afford to degrade the quality and keep debugging the
> build
> > > failure forever. If it’s slow down the development at the cost of
> > quality I
> > > will vote for the quality.
> > > >     We can work on improving the infrastructure to improve the
> overall
> > > speed.  If you have any specific concerns on availability of Jenkins
> > please
> > > point out.
> > > >
> > > >     -Gautam
> > > >
> > > >
> > > >     On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com>
> > wrote:
> > > >
> > > >         -1000 on that. :)
> > > >
> > > >         On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <
> > > mnnaveen@gmail.com> wrote:
> > > >
> > > >         > PR->Sanity test/Linux build/test->reviewer/committer
> approves
> > > the
> > > >         > change->Comment "Build Now" (Or trigger on at least one
> > > approval from a
> > > >         > committer other than author)->*Full build-*>*passes
> > > build*->Enable Merge
> > > >         >
> > > >         > Let us take this particular topic to a separate thread or
> > > discuss offline
> > > >         > if further clarification is needed.
> > > >         >
> > > >         > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <
> > > cjolivier01@gmail.com>
> > > >         > wrote:
> > > >         >
> > > >         > > I understand the proposal.  How to trigger a build in
> that
> > > case?
> > > >         > >
> > > >         > >
> > > >         > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <
> > > madan.jampani@gmail.com>
> > > >         > > wrote:
> > > >         > >
> > > >         > > > Chris,
> > > >         > > > I don't think Naveen is suggesting that a merge happen
> > > without full
> > > >         > > > verification i.e. all tests across all platforms pass.
> > > >         > > > If a PR has some back and forth and results in multiple
> > > revisions
> > > >         > (which
> > > >         > > is
> > > >         > > > arguably more common than a random unit test failing),
> we
> > > simply delay
> > > >         > > full
> > > >         > > > verification until the owner/reviewer have settled on a
> > > mutually
> > > >         > > acceptable
> > > >         > > > state.
> > > >         > > >
> > > >         > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <
> > > cjolivier01@gmail.com
> > > >         > >
> > > >         > > > wrote:
> > > >         > > >
> > > >         > > > > -1 for running only partial tests.  Most failing unit
> > > tests that get
> > > >         > > > > through fail only for certain
> platforms/configurations.
> > > I personally
> > > >         > > > > prefer to be assured the build and test is good
> before
> > > merge.  Most
> > > >         > PR
> > > >         > > > > merges aren't in a huge hurry.
> > > >         > > > >
> > > >         > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <
> > > mnnaveen@gmail.com>
> > > >         > > > wrote:
> > > >         > > > >
> > > >         > > > > > +1 to make it protected. Here is what I am thinking
> > > for PR builds
> > > >         > > > > > on a PR run Sanity Tests + build/test one
> > > platform->committer
> > > >         > reviews
> > > >         > > > the
> > > >         > > > > > code and issues "Build Now", a full build is
> > > run->Github checks
> > > >         > that
> > > >         > > > the
> > > >         > > > > > full build checks succeed before it can be merged.
> > > >         > > > > >
> > > >         > > > > > I agree with Madan that PR should be approved by
> one
> > > another
> > > >         > > committer.
> > > >         > > > > >
> > > >         > > > > >
> > > >         > > > > >
> > > >         > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
> > > >         > > > madan.jampani@gmail.com>
> > > >         > > > > > wrote:
> > > >         > > > > >
> > > >         > > > > > > +1
> > > >         > > > > > >
> > > >         > > > > > > At a minimum I'd like to see the following two
> > > happen:
> > > >         > > > > > > - Option to merge is disabled until all required
> > > checks pass.
> > > >         > > > > > > - Code is reviewed and given +1 by at least one
> > > other committer
> > > >         > (no
> > > >         > > > > self
> > > >         > > > > > > review).
> > > >         > > > > > >
> > > >         > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <
> > > gautamnitc@gmail.com>
> > > >         > > > wrote:
> > > >         > > > > > >
> > > >         > > > > > > > Hi Chris,
> > > >         > > > > > > >
> > > >         > > > > > > >   Here <https://help.github.com/
> > > articles/about-protected-
> > > >         > > branches/
> > > >         > > > >
> > > >         > > > > is
> > > >         > > > > > > > user
> > > >         > > > > > > > document on semantics of protected branch.
> > > >         > > > > > > > In short when a branch is protected following
> > > applies to that
> > > >         > > > branch.
> > > >         > > > > > > >
> > > >         > > > > > > >    - Can't be force pushed
> > > >         > > > > > > >    - Can't be deleted
> > > >         > > > > > > >    - Can't have changes merged into it until
> > > required status
> > > >         > > checks
> > > >         > > > > > > >    <https://help.github.com/
> > > articles/about-required-
> > > >         > > status-checks>
> > > >         > > > > > pass
> > > >         > > > > > > >    - Can't have changes merged into it until
> > > required reviews
> > > >         > are
> > > >         > > > > > > approved
> > > >         > > > > > > >    <https://help.github.com/
> > > articles/approving-a-pull-
> > > >         > > > > > > > request-with-required-reviews>
> > > >         > > > > > > >    - Can't be edited or have files uploaded to
> it
> > > from the web
> > > >         > > > > > > >    - Can't have changes merged into it until
> > > changes to files
> > > >         > > that
> > > >         > > > > > > > have a designated
> > > >         > > > > > > >    code owner <https://help.github.com/
> > > >         > > articles/about-codeowners/>
> > > >         > > > > > have
> > > >         > > > > > > >    been approved by that owner
> > > >         > > > > > > >
> > > >         > > > > > > >  I am sure many of us might not want to have
> all
> > > these but we
> > > >         > can
> > > >         > > > > > debate
> > > >         > > > > > > on
> > > >         > > > > > > > it. My main motive was to "*Can't have changes
> > > merged into it
> > > >         > > until
> > > >         > > > > > > > required status checks pass*"
> > > >         > > > > > > >
> > > >         > > > > > > >
> > > >         > > > > > > > -Gautam
> > > >         > > > > > > >
> > > >         > > > > > > >
> > > >         > > > > > > >
> > > >         > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris
> Olivier <
> > > >         > > > > cjolivier01@gmail.com
> > > >         > > > > > >
> > > >         > > > > > > > wrote:
> > > >         > > > > > > >
> > > >         > > > > > > > > What does that mean? "Protected"? Protected
> > from
> > > what?
> > > >         > > > > > > > >
> > > >         > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
> > > >         > gautamnitc@gmail.com>
> > > >         > > > > > wrote:
> > > >         > > > > > > > >
> > > >         > > > > > > > > > Hi Chris,
> > > >         > > > > > > > > >
> > > >         > > > > > > > > >    I mean make "master branch protected" of
> > > MXNet.
> > > >         > > > > > > > > >
> > > >         > > > > > > > > > -Gautam
> > > >         > > > > > > > > >
> > > >         > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris
> > > Olivier <
> > > >         > > > > > > cjolivier01@gmail.com
> > > >         > > > > > > > >
> > > >         > > > > > > > > > wrote:
> > > >         > > > > > > > > >
> > > >         > > > > > > > > > > What does this mean? "Mx-net branch
> > > protected"?
> > > >         > > > > > > > > > >
> > > >         > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi
> > > OZAWA <
> > > >         > > > > > > > > ozawa.tsuyoshi@gmail.com
> > > >         > > > > > > > > > >
> > > >         > > > > > > > > > > wrote:
> > > >         > > > > > > > > > >
> > > >         > > > > > > > > > > > +1,
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > > > While I'm checking the recent build
> > > failures, and I
> > > >         > think
> > > >         > > > the
> > > >         > > > > > > > > decision
> > > >         > > > > > > > > > > > of making the mx-net branch protected
> is
> > > necessary for
> > > >         > > > stable
> > > >         > > > > > > > > > > > building.
> > > >         > > > > > > > > > > > Thanks Kumar for resuming important
> > > discussion.
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > > > Best regards
> > > >         > > > > > > > > > > > - Tsuyoshi
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM,
> Kumar,
> > > Gautam <
> > > >         > > > > > > gauta@amazon.com>
> > > >         > > > > > > > > > > wrote:
> > > >         > > > > > > > > > > > > Reviving the discussion.
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > > At this point of time we have couple
> of
> > > stable builds
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > >
> > > >         > https://builds.apache.org/view/Incubator%20Projects/job/
> > > >         > > > > > > > > > > incubator-mxnet/job/master/448/
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > >
> > > >         > https://builds.apache.org/view/Incubator%20Projects/job/
> > > >         > > > > > > > > > > incubator-mxnet/job/master/449/
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > > Should we have a quick discussion or
> > > polling on
> > > >         > making
> > > >         > > > the
> > > >         > > > > > > mx-net
> > > >         > > > > > > > > > > branch
> > > >         > > > > > > > > > > > protected? If you still think we
> > shouldn’t
> > > make it
> > > >         > > > protected
> > > >         > > > > > > please
> > > >         > > > > > > > > > > provide
> > > >         > > > > > > > > > > > a reason to support your claim.
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > > Few of us have concern over Jenkin’s
> > > stability. If I
> > > >         > > look
> > > >         > > > > two
> > > >         > > > > > > > weeks
> > > >         > > > > > > > > > > > back, after upgrading Linux slave to
> > g2.8x
> > > and new
> > > >         > > windows
> > > >         > > > > AMI,
> > > >         > > > > > > we
> > > >         > > > > > > > > have
> > > >         > > > > > > > > > > not
> > > >         > > > > > > > > > > > seen any case where instance died due
> to
> > > high memory
> > > >         > > usage
> > > >         > > > or
> > > >         > > > > > any
> > > >         > > > > > > > > > process
> > > >         > > > > > > > > > > > got killed due to high cpu usage or any
> > > other issue
> > > >         > with
> > > >         > > > > > windows
> > > >         > > > > > > > > > slaves.
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > > Going forward we are also planning
> that
> > > if we add any
> > > >         > > new
> > > >         > > > > > slave
> > > >         > > > > > > > we
> > > >         > > > > > > > > > will
> > > >         > > > > > > > > > > > not enable the main load immediately,
> but
> > > rather will
> > > >         > do
> > > >         > > > > ‘test
> > > >         > > > > > > > build’
> > > >         > > > > > > > > > to
> > > >         > > > > > > > > > > > make sure that new slaves are not
> causing
> > > any
> > > >         > > > infrastructure
> > > >         > > > > > > issue
> > > >         > > > > > > > > and
> > > >         > > > > > > > > > > > capable to perform as good as existing
> > > slaves.
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > > -Gautam
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko,
> Hagay" <
> > > >         > > lupesko@gmail.com
> > > >         > > > >
> > > >         > > > > > > wrote:
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >     @madan looking into some
> failures –
> > > you’re right…
> > > >         > > > > there’s
> > > >         > > > > > > > > > multiple
> > > >         > > > > > > > > > > > issues going on, some of them
> > > intermittent, and we want
> > > >         > > to
> > > >         > > > be
> > > >         > > > > > > able
> > > >         > > > > > > > to
> > > >         > > > > > > > > > > merge
> > > >         > > > > > > > > > > > fixes in.
> > > >         > > > > > > > > > > > >     Agreed that we can wait with
> > setting
> > > up protected
> > > >         > > > mode
> > > >         > > > > > > until
> > > >         > > > > > > > > > build
> > > >         > > > > > > > > > > > stabilizes.
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan
> Jampani"
> > <
> > > >         > > > > > > madan.jampani@gmail.com>
> > > >         > > > > > > > > > > wrote:
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >         @hagay: we agree on the end
> > > state. I'm not
> > > >         > too
> > > >         > > > > > > particular
> > > >         > > > > > > > > > about
> > > >         > > > > > > > > > > > how we get
> > > >         > > > > > > > > > > > >         there. If you think enabling
> it
> > > now and fixes
> > > >         > > > > > > regression
> > > >         > > > > > > > > > later
> > > >         > > > > > > > > > > > is doable,
> > > >         > > > > > > > > > > > >         I'm fine with. I see a bit
> of a
> > > chicken and
> > > >         > egg
> > > >         > > > > > > problem.
> > > >         > > > > > > > We
> > > >         > > > > > > > > > > need
> > > >         > > > > > > > > > > > to get
> > > >         > > > > > > > > > > > >         some fixes in even when the
> > > status checks are
> > > >         > > > > > failing.
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25
> > > AM, Lupesko,
> > > >         > > Hagay
> > > >         > > > <
> > > >         > > > > > > > > > > > lupesko@gmail.com> wrote:
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >         > @madan – re: getting to a
> > > stable CI first:
> > > >         > > > > > > > > > > > >         > I’m concerned that by not
> > > enabling
> > > >         > protected
> > > >         > > > > branch
> > > >         > > > > > > > mode
> > > >         > > > > > > > > > > ASAP,
> > > >         > > > > > > > > > > > we’re just
> > > >         > > > > > > > > > > > >         > taking in more regressions,
> > > which makes a
> > > >         > > > stable
> > > >         > > > > > > build
> > > >         > > > > > > > a
> > > >         > > > > > > > > > > > moving target for
> > > >         > > > > > > > > > > > >         > us…
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha,
> > > Sheng" <
> > > >         > > > > > zhasheng@amazon.com
> > > >         > > > > > > >
> > > >         > > > > > > > > > wrote:
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >     Just one thing: please
> > > don’t disable
> > > >         > more
> > > >         > > > > tests
> > > >         > > > > > > or
> > > >         > > > > > > > > just
> > > >         > > > > > > > > > > > raise the
> > > >         > > > > > > > > > > > >         > tolerance thresholds.
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >     Best regards,
> > > >         > > > > > > > > > > > >         >     -sz
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM,
> > > "Madan Jampani" <
> > > >         > > > > > > > > > > > madan.jampani@gmail.com> wrote:
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >         +1
> > > >         > > > > > > > > > > > >         >         Before we can turn
> > > protected mode I
> > > >         > > > feel
> > > >         > > > > we
> > > >         > > > > > > > > should
> > > >         > > > > > > > > > > > first get to a
> > > >         > > > > > > > > > > > >         > stable CI
> > > >         > > > > > > > > > > > >         >         pipeline.
> > > >         > > > > > > > > > > > >         >         Sandeep is chasing
> > > down known
> > > >         > > breaking
> > > >         > > > > > > issues.
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >         On Thu, Aug 31,
> 2017
> > > at 10:27 AM,
> > > >         > > Hagay
> > > >         > > > > > > > Lupesko <
> > > >         > > > > > > > > > > > lupesko@gmail.com>
> > > >         > > > > > > > > > > > >         > wrote:
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >         > Build stability
> is
> > a
> > > major issue,
> > > >         > > > > builds
> > > >         > > > > > > have
> > > >         > > > > > > > > > been
> > > >         > > > > > > > > > > > failing left
> > > >         > > > > > > > > > > > >         > and right
> > > >         > > > > > > > > > > > >         >         > over the last
> week.
> > > Some of it is
> > > >         > > due
> > > >         > > > > to
> > > >         > > > > > > > > Jenkins
> > > >         > > > > > > > > > > > slave issues,
> > > >         > > > > > > > > > > > >         > but some are
> > > >         > > > > > > > > > > > >         >         > real regressions.
> > > >         > > > > > > > > > > > >         >         > We need to be
> more
> > > strict in the
> > > >         > > code
> > > >         > > > > > we're
> > > >         > > > > > > > > > > > committing.
> > > >         > > > > > > > > > > > >         >         >
> > > >         > > > > > > > > > > > >         >         > I propose we
> > > configure our master
> > > >         > > to
> > > >         > > > > be a
> > > >         > > > > > > > > > protected
> > > >         > > > > > > > > > > > branch (
> > > >         > > > > > > > > > > > >         >         >
> > > >         > > > > > > > > > > > https://help.github.com/
> > > articles/about-protected-
> > > >         > > branches/
> > > >         > > > ).
> > > >         > > > > > > > > > > > >         >         >
> > > >         > > > > > > > > > > > >         >         > Thoughts?
> > > >         > > > > > > > > > > > >         >         >
> > > >         > > > > > > > > > > > >         >         > On 2017-08-28
> > 22:41,
> > > sandeep
> > > >         > > > > > krishnamurthy
> > > >         > > > > > > <
> > > >         > > > > > > > > > > > s...@gmail.com>
> > > >         > > > > > > > > > > > >         > wrote:
> > > >         > > > > > > > > > > > >         >         > > Hello
> Committers
> > > and
> > > >         > > Contributors,>
> > > >         > > > > > > > > > > > >         >         > >
> > > >         > > > > > > > > > > > >         >         > > Due to unstable
> > > build
> > > >         > pipelines,
> > > >         > > > from
> > > >         > > > > > > past
> > > >         > > > > > > > 1
> > > >         > > > > > > > > > > week,
> > > >         > > > > > > > > > > > PRs are
> > > >         > > > > > > > > > > > >         > being merged>
> > > >         > > > > > > > > > > > >         >         > > after CR
> ignoring
> > > PR build
> > > >         > > status.
> > > >         > > > > > Build
> > > >         > > > > > > > > > pipeline
> > > >         > > > > > > > > > > > is much more
> > > >         > > > > > > > > > > > >         > stable
> > > >         > > > > > > > > > > > >         >         > than>
> > > >         > > > > > > > > > > > >         >         > > last week and
> > most
> > > of the build
> > > >         > > > > > failures
> > > >         > > > > > > > you
> > > >         > > > > > > > > > see
> > > >         > > > > > > > > > > > from now on,
> > > >         > > > > > > > > > > > >         > are likely
> > > >         > > > > > > > > > > > >         >         > to>
> > > >         > > > > > > > > > > > >         >         > > be a valid
> > failure
> > > and hence,
> > > >         > it
> > > >         > > is
> > > >         > > > > > > > > recommended
> > > >         > > > > > > > > > > to
> > > >         > > > > > > > > > > > wait for PR
> > > >         > > > > > > > > > > > >         > builds,
> > > >         > > > > > > > > > > > >         >         > see>
> > > >         > > > > > > > > > > > >         >         > > the root cause
> of
> > > any build
> > > >         > > > failures
> > > >         > > > > > > before
> > > >         > > > > > > > > > > > proceeding with
> > > >         > > > > > > > > > > > >         > merges.>
> > > >         > > > > > > > > > > > >         >         > >
> > > >         > > > > > > > > > > > >         >         > > At this point
> of
> > > time, there
> > > >         > are
> > > >         > > 2
> > > >         > > > > > > > > intermittent
> > > >         > > > > > > > > > > > issue yet to
> > > >         > > > > > > > > > > > >         > be fixed ->
> > > >         > > > > > > > > > > > >         >         > > * Network error
> > > leading to
> > > >         > GitHub
> > > >         > > > > > > requests
> > > >         > > > > > > > > > > > throwing 404>
> > > >         > > > > > > > > > > > >         >         > > * A conflict in
> > > artifacts
> > > >         > > generated
> > > >         > > > > > > between
> > > >         > > > > > > > > > > > branches/PR -
> > > >         > > > > > > > > > > > >         > Cause unknown
> > > >         > > > > > > > > > > > >         >         > yet.>
> > > >         > > > > > > > > > > > >         >         > > These issues
> will
> > > be fixed
> > > >         > soon.>
> > > >         > > > > > > > > > > > >         >         > >
> > > >         > > > > > > > > > > > >         >         > >
> > > >         > > > > > > > > > > > >         >         > > -- >
> > > >         > > > > > > > > > > > >         >         > > Sandeep
> > > Krishnamurthy>
> > > >         > > > > > > > > > > > >         >         > >
> > > >         > > > > > > > > > > > >         >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > > > --
> > > >         > > > > > > > > > > > - Tsuyoshi
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > >
> > > >         > > > > > > > > >
> > > >         > > > > > > > > >
> > > >         > > > > > > > > >
> > > >         > > > > > > > > > --
> > > >         > > > > > > > > > Best Regards,
> > > >         > > > > > > > > > Gautam Kumar
> > > >         > > > > > > > > >
> > > >         > > > > > > > >
> > > >         > > > > > > >
> > > >         > > > > > > >
> > > >         > > > > > > >
> > > >         > > > > > > > --
> > > >         > > > > > > > Best Regards,
> > > >         > > > > > > > Gautam Kumar
> > > >         > > > > > > >
> > > >         > > > > > >
> > > >         > > > > >
> > > >         > > > >
> > > >         > > >
> > > >         > >
> > > >         >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Gautam Kumar
> >
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Henri Yandell <ba...@apache.org>.
A late followup on this.

Is the "How a committer develops on MXNet" documented anywhere?
Staging/master etc? The more complex the development process, the harder it
is for a newcomer to get involved on the project. I couldn't find it on the
website/github.

(I'd also note that the 'how to modify the website/documentation' also
needs to be documented)

I'd suggest that the how-to-dev doc also explain why it's bad for master to
not build. One could argue that, outside of when a release is being made,
master is not important to our users. Yes it should build, but an accident
should only affect those who have put themselves on the bleeding edge.

Hen

On Mon, Oct 2, 2017 at 1:19 PM, Gautam <ga...@gmail.com> wrote:

> Thanks All.
>
>  I've created the JIRA to mark the protected branch for master
> https://issues.apache.org/jira/browse/INCUBATOR-205.
> We also need to add all the committers to be code owner as discussed in the
> slack, I've opened a PR for it
> https://github.com/apache/incubator-mxnet/pull/8128.
>
> Good point Joern, I'll follow up on that.
>
> Regards,
> Gautam
>
> On Fri, Sep 29, 2017 at 2:20 AM, Joern Kottmann <ko...@gmail.com>
> wrote:
>
> > It also makes sense to block too old PRs from merging, because the
> > test results are outdated and the build might fail after it gets
> > merged.
> >
> > Jörn
> >
> > On Thu, Sep 28, 2017 at 9:14 PM, Zha, Sheng <zh...@amazon.com> wrote:
> > > +1 on protected branch.
> > >
> > > Best regards,
> > > -sz
> > >
> > > On 9/28/17, 11:48 AM, "Kumar, Gautam" <ga...@amazon.com> wrote:
> > >
> > >     Hi Guys,
> > >
> > >      Let’s focus on specific issue here.
> > >
> > >     Marking the master branch protected which involves “Only merge if
> > checks has passed, and yes it will run the complete build”.
> > >
> > >     We can’t afford to degrade the quality and keep debugging the build
> > failure forever. If it’s slow down the development at the cost of
> quality I
> > will vote for the quality.
> > >     We can work on improving the infrastructure to improve the overall
> > speed.  If you have any specific concerns on availability of Jenkins
> please
> > point out.
> > >
> > >     -Gautam
> > >
> > >
> > >     On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com>
> wrote:
> > >
> > >         -1000 on that. :)
> > >
> > >         On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <
> > mnnaveen@gmail.com> wrote:
> > >
> > >         > PR->Sanity test/Linux build/test->reviewer/committer approves
> > the
> > >         > change->Comment "Build Now" (Or trigger on at least one
> > approval from a
> > >         > committer other than author)->*Full build-*>*passes
> > build*->Enable Merge
> > >         >
> > >         > Let us take this particular topic to a separate thread or
> > discuss offline
> > >         > if further clarification is needed.
> > >         >
> > >         > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <
> > cjolivier01@gmail.com>
> > >         > wrote:
> > >         >
> > >         > > I understand the proposal.  How to trigger a build in that
> > case?
> > >         > >
> > >         > >
> > >         > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <
> > madan.jampani@gmail.com>
> > >         > > wrote:
> > >         > >
> > >         > > > Chris,
> > >         > > > I don't think Naveen is suggesting that a merge happen
> > without full
> > >         > > > verification i.e. all tests across all platforms pass.
> > >         > > > If a PR has some back and forth and results in multiple
> > revisions
> > >         > (which
> > >         > > is
> > >         > > > arguably more common than a random unit test failing), we
> > simply delay
> > >         > > full
> > >         > > > verification until the owner/reviewer have settled on a
> > mutually
> > >         > > acceptable
> > >         > > > state.
> > >         > > >
> > >         > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <
> > cjolivier01@gmail.com
> > >         > >
> > >         > > > wrote:
> > >         > > >
> > >         > > > > -1 for running only partial tests.  Most failing unit
> > tests that get
> > >         > > > > through fail only for certain platforms/configurations.
> > I personally
> > >         > > > > prefer to be assured the build and test is good before
> > merge.  Most
> > >         > PR
> > >         > > > > merges aren't in a huge hurry.
> > >         > > > >
> > >         > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <
> > mnnaveen@gmail.com>
> > >         > > > wrote:
> > >         > > > >
> > >         > > > > > +1 to make it protected. Here is what I am thinking
> > for PR builds
> > >         > > > > > on a PR run Sanity Tests + build/test one
> > platform->committer
> > >         > reviews
> > >         > > > the
> > >         > > > > > code and issues "Build Now", a full build is
> > run->Github checks
> > >         > that
> > >         > > > the
> > >         > > > > > full build checks succeed before it can be merged.
> > >         > > > > >
> > >         > > > > > I agree with Madan that PR should be approved by one
> > another
> > >         > > committer.
> > >         > > > > >
> > >         > > > > >
> > >         > > > > >
> > >         > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
> > >         > > > madan.jampani@gmail.com>
> > >         > > > > > wrote:
> > >         > > > > >
> > >         > > > > > > +1
> > >         > > > > > >
> > >         > > > > > > At a minimum I'd like to see the following two
> > happen:
> > >         > > > > > > - Option to merge is disabled until all required
> > checks pass.
> > >         > > > > > > - Code is reviewed and given +1 by at least one
> > other committer
> > >         > (no
> > >         > > > > self
> > >         > > > > > > review).
> > >         > > > > > >
> > >         > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <
> > gautamnitc@gmail.com>
> > >         > > > wrote:
> > >         > > > > > >
> > >         > > > > > > > Hi Chris,
> > >         > > > > > > >
> > >         > > > > > > >   Here <https://help.github.com/
> > articles/about-protected-
> > >         > > branches/
> > >         > > > >
> > >         > > > > is
> > >         > > > > > > > user
> > >         > > > > > > > document on semantics of protected branch.
> > >         > > > > > > > In short when a branch is protected following
> > applies to that
> > >         > > > branch.
> > >         > > > > > > >
> > >         > > > > > > >    - Can't be force pushed
> > >         > > > > > > >    - Can't be deleted
> > >         > > > > > > >    - Can't have changes merged into it until
> > required status
> > >         > > checks
> > >         > > > > > > >    <https://help.github.com/
> > articles/about-required-
> > >         > > status-checks>
> > >         > > > > > pass
> > >         > > > > > > >    - Can't have changes merged into it until
> > required reviews
> > >         > are
> > >         > > > > > > approved
> > >         > > > > > > >    <https://help.github.com/
> > articles/approving-a-pull-
> > >         > > > > > > > request-with-required-reviews>
> > >         > > > > > > >    - Can't be edited or have files uploaded to it
> > from the web
> > >         > > > > > > >    - Can't have changes merged into it until
> > changes to files
> > >         > > that
> > >         > > > > > > > have a designated
> > >         > > > > > > >    code owner <https://help.github.com/
> > >         > > articles/about-codeowners/>
> > >         > > > > > have
> > >         > > > > > > >    been approved by that owner
> > >         > > > > > > >
> > >         > > > > > > >  I am sure many of us might not want to have all
> > these but we
> > >         > can
> > >         > > > > > debate
> > >         > > > > > > on
> > >         > > > > > > > it. My main motive was to "*Can't have changes
> > merged into it
> > >         > > until
> > >         > > > > > > > required status checks pass*"
> > >         > > > > > > >
> > >         > > > > > > >
> > >         > > > > > > > -Gautam
> > >         > > > > > > >
> > >         > > > > > > >
> > >         > > > > > > >
> > >         > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
> > >         > > > > cjolivier01@gmail.com
> > >         > > > > > >
> > >         > > > > > > > wrote:
> > >         > > > > > > >
> > >         > > > > > > > > What does that mean? "Protected"? Protected
> from
> > what?
> > >         > > > > > > > >
> > >         > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
> > >         > gautamnitc@gmail.com>
> > >         > > > > > wrote:
> > >         > > > > > > > >
> > >         > > > > > > > > > Hi Chris,
> > >         > > > > > > > > >
> > >         > > > > > > > > >    I mean make "master branch protected" of
> > MXNet.
> > >         > > > > > > > > >
> > >         > > > > > > > > > -Gautam
> > >         > > > > > > > > >
> > >         > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris
> > Olivier <
> > >         > > > > > > cjolivier01@gmail.com
> > >         > > > > > > > >
> > >         > > > > > > > > > wrote:
> > >         > > > > > > > > >
> > >         > > > > > > > > > > What does this mean? "Mx-net branch
> > protected"?
> > >         > > > > > > > > > >
> > >         > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi
> > OZAWA <
> > >         > > > > > > > > ozawa.tsuyoshi@gmail.com
> > >         > > > > > > > > > >
> > >         > > > > > > > > > > wrote:
> > >         > > > > > > > > > >
> > >         > > > > > > > > > > > +1,
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > > > While I'm checking the recent build
> > failures, and I
> > >         > think
> > >         > > > the
> > >         > > > > > > > > decision
> > >         > > > > > > > > > > > of making the mx-net branch protected is
> > necessary for
> > >         > > > stable
> > >         > > > > > > > > > > > building.
> > >         > > > > > > > > > > > Thanks Kumar for resuming important
> > discussion.
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > > > Best regards
> > >         > > > > > > > > > > > - Tsuyoshi
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar,
> > Gautam <
> > >         > > > > > > gauta@amazon.com>
> > >         > > > > > > > > > > wrote:
> > >         > > > > > > > > > > > > Reviving the discussion.
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > > At this point of time we have couple of
> > stable builds
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > >
> > >         > https://builds.apache.org/view/Incubator%20Projects/job/
> > >         > > > > > > > > > > incubator-mxnet/job/master/448/
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > >
> > >         > https://builds.apache.org/view/Incubator%20Projects/job/
> > >         > > > > > > > > > > incubator-mxnet/job/master/449/
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > > Should we have a quick discussion or
> > polling on
> > >         > making
> > >         > > > the
> > >         > > > > > > mx-net
> > >         > > > > > > > > > > branch
> > >         > > > > > > > > > > > protected? If you still think we
> shouldn’t
> > make it
> > >         > > > protected
> > >         > > > > > > please
> > >         > > > > > > > > > > provide
> > >         > > > > > > > > > > > a reason to support your claim.
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > > Few of us have concern over Jenkin’s
> > stability. If I
> > >         > > look
> > >         > > > > two
> > >         > > > > > > > weeks
> > >         > > > > > > > > > > > back, after upgrading Linux slave to
> g2.8x
> > and new
> > >         > > windows
> > >         > > > > AMI,
> > >         > > > > > > we
> > >         > > > > > > > > have
> > >         > > > > > > > > > > not
> > >         > > > > > > > > > > > seen any case where instance died due to
> > high memory
> > >         > > usage
> > >         > > > or
> > >         > > > > > any
> > >         > > > > > > > > > process
> > >         > > > > > > > > > > > got killed due to high cpu usage or any
> > other issue
> > >         > with
> > >         > > > > > windows
> > >         > > > > > > > > > slaves.
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > > Going forward we are also planning that
> > if we add any
> > >         > > new
> > >         > > > > > slave
> > >         > > > > > > > we
> > >         > > > > > > > > > will
> > >         > > > > > > > > > > > not enable the main load immediately, but
> > rather will
> > >         > do
> > >         > > > > ‘test
> > >         > > > > > > > build’
> > >         > > > > > > > > > to
> > >         > > > > > > > > > > > make sure that new slaves are not causing
> > any
> > >         > > > infrastructure
> > >         > > > > > > issue
> > >         > > > > > > > > and
> > >         > > > > > > > > > > > capable to perform as good as existing
> > slaves.
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > > -Gautam
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
> > >         > > lupesko@gmail.com
> > >         > > > >
> > >         > > > > > > wrote:
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >     @madan looking into some failures –
> > you’re right…
> > >         > > > > there’s
> > >         > > > > > > > > > multiple
> > >         > > > > > > > > > > > issues going on, some of them
> > intermittent, and we want
> > >         > > to
> > >         > > > be
> > >         > > > > > > able
> > >         > > > > > > > to
> > >         > > > > > > > > > > merge
> > >         > > > > > > > > > > > fixes in.
> > >         > > > > > > > > > > > >     Agreed that we can wait with
> setting
> > up protected
> > >         > > > mode
> > >         > > > > > > until
> > >         > > > > > > > > > build
> > >         > > > > > > > > > > > stabilizes.
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani"
> <
> > >         > > > > > > madan.jampani@gmail.com>
> > >         > > > > > > > > > > wrote:
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >         @hagay: we agree on the end
> > state. I'm not
> > >         > too
> > >         > > > > > > particular
> > >         > > > > > > > > > about
> > >         > > > > > > > > > > > how we get
> > >         > > > > > > > > > > > >         there. If you think enabling it
> > now and fixes
> > >         > > > > > > regression
> > >         > > > > > > > > > later
> > >         > > > > > > > > > > > is doable,
> > >         > > > > > > > > > > > >         I'm fine with. I see a bit of a
> > chicken and
> > >         > egg
> > >         > > > > > > problem.
> > >         > > > > > > > We
> > >         > > > > > > > > > > need
> > >         > > > > > > > > > > > to get
> > >         > > > > > > > > > > > >         some fixes in even when the
> > status checks are
> > >         > > > > > failing.
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25
> > AM, Lupesko,
> > >         > > Hagay
> > >         > > > <
> > >         > > > > > > > > > > > lupesko@gmail.com> wrote:
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >         > @madan – re: getting to a
> > stable CI first:
> > >         > > > > > > > > > > > >         > I’m concerned that by not
> > enabling
> > >         > protected
> > >         > > > > branch
> > >         > > > > > > > mode
> > >         > > > > > > > > > > ASAP,
> > >         > > > > > > > > > > > we’re just
> > >         > > > > > > > > > > > >         > taking in more regressions,
> > which makes a
> > >         > > > stable
> > >         > > > > > > build
> > >         > > > > > > > a
> > >         > > > > > > > > > > > moving target for
> > >         > > > > > > > > > > > >         > us…
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha,
> > Sheng" <
> > >         > > > > > zhasheng@amazon.com
> > >         > > > > > > >
> > >         > > > > > > > > > wrote:
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >     Just one thing: please
> > don’t disable
> > >         > more
> > >         > > > > tests
> > >         > > > > > > or
> > >         > > > > > > > > just
> > >         > > > > > > > > > > > raise the
> > >         > > > > > > > > > > > >         > tolerance thresholds.
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >     Best regards,
> > >         > > > > > > > > > > > >         >     -sz
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM,
> > "Madan Jampani" <
> > >         > > > > > > > > > > > madan.jampani@gmail.com> wrote:
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >         +1
> > >         > > > > > > > > > > > >         >         Before we can turn
> > protected mode I
> > >         > > > feel
> > >         > > > > we
> > >         > > > > > > > > should
> > >         > > > > > > > > > > > first get to a
> > >         > > > > > > > > > > > >         > stable CI
> > >         > > > > > > > > > > > >         >         pipeline.
> > >         > > > > > > > > > > > >         >         Sandeep is chasing
> > down known
> > >         > > breaking
> > >         > > > > > > issues.
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >         On Thu, Aug 31, 2017
> > at 10:27 AM,
> > >         > > Hagay
> > >         > > > > > > > Lupesko <
> > >         > > > > > > > > > > > lupesko@gmail.com>
> > >         > > > > > > > > > > > >         > wrote:
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >         > Build stability is
> a
> > major issue,
> > >         > > > > builds
> > >         > > > > > > have
> > >         > > > > > > > > > been
> > >         > > > > > > > > > > > failing left
> > >         > > > > > > > > > > > >         > and right
> > >         > > > > > > > > > > > >         >         > over the last week.
> > Some of it is
> > >         > > due
> > >         > > > > to
> > >         > > > > > > > > Jenkins
> > >         > > > > > > > > > > > slave issues,
> > >         > > > > > > > > > > > >         > but some are
> > >         > > > > > > > > > > > >         >         > real regressions.
> > >         > > > > > > > > > > > >         >         > We need to be more
> > strict in the
> > >         > > code
> > >         > > > > > we're
> > >         > > > > > > > > > > > committing.
> > >         > > > > > > > > > > > >         >         >
> > >         > > > > > > > > > > > >         >         > I propose we
> > configure our master
> > >         > > to
> > >         > > > > be a
> > >         > > > > > > > > > protected
> > >         > > > > > > > > > > > branch (
> > >         > > > > > > > > > > > >         >         >
> > >         > > > > > > > > > > > https://help.github.com/
> > articles/about-protected-
> > >         > > branches/
> > >         > > > ).
> > >         > > > > > > > > > > > >         >         >
> > >         > > > > > > > > > > > >         >         > Thoughts?
> > >         > > > > > > > > > > > >         >         >
> > >         > > > > > > > > > > > >         >         > On 2017-08-28
> 22:41,
> > sandeep
> > >         > > > > > krishnamurthy
> > >         > > > > > > <
> > >         > > > > > > > > > > > s...@gmail.com>
> > >         > > > > > > > > > > > >         > wrote:
> > >         > > > > > > > > > > > >         >         > > Hello Committers
> > and
> > >         > > Contributors,>
> > >         > > > > > > > > > > > >         >         > >
> > >         > > > > > > > > > > > >         >         > > Due to unstable
> > build
> > >         > pipelines,
> > >         > > > from
> > >         > > > > > > past
> > >         > > > > > > > 1
> > >         > > > > > > > > > > week,
> > >         > > > > > > > > > > > PRs are
> > >         > > > > > > > > > > > >         > being merged>
> > >         > > > > > > > > > > > >         >         > > after CR ignoring
> > PR build
> > >         > > status.
> > >         > > > > > Build
> > >         > > > > > > > > > pipeline
> > >         > > > > > > > > > > > is much more
> > >         > > > > > > > > > > > >         > stable
> > >         > > > > > > > > > > > >         >         > than>
> > >         > > > > > > > > > > > >         >         > > last week and
> most
> > of the build
> > >         > > > > > failures
> > >         > > > > > > > you
> > >         > > > > > > > > > see
> > >         > > > > > > > > > > > from now on,
> > >         > > > > > > > > > > > >         > are likely
> > >         > > > > > > > > > > > >         >         > to>
> > >         > > > > > > > > > > > >         >         > > be a valid
> failure
> > and hence,
> > >         > it
> > >         > > is
> > >         > > > > > > > > recommended
> > >         > > > > > > > > > > to
> > >         > > > > > > > > > > > wait for PR
> > >         > > > > > > > > > > > >         > builds,
> > >         > > > > > > > > > > > >         >         > see>
> > >         > > > > > > > > > > > >         >         > > the root cause of
> > any build
> > >         > > > failures
> > >         > > > > > > before
> > >         > > > > > > > > > > > proceeding with
> > >         > > > > > > > > > > > >         > merges.>
> > >         > > > > > > > > > > > >         >         > >
> > >         > > > > > > > > > > > >         >         > > At this point of
> > time, there
> > >         > are
> > >         > > 2
> > >         > > > > > > > > intermittent
> > >         > > > > > > > > > > > issue yet to
> > >         > > > > > > > > > > > >         > be fixed ->
> > >         > > > > > > > > > > > >         >         > > * Network error
> > leading to
> > >         > GitHub
> > >         > > > > > > requests
> > >         > > > > > > > > > > > throwing 404>
> > >         > > > > > > > > > > > >         >         > > * A conflict in
> > artifacts
> > >         > > generated
> > >         > > > > > > between
> > >         > > > > > > > > > > > branches/PR -
> > >         > > > > > > > > > > > >         > Cause unknown
> > >         > > > > > > > > > > > >         >         > yet.>
> > >         > > > > > > > > > > > >         >         > > These issues will
> > be fixed
> > >         > soon.>
> > >         > > > > > > > > > > > >         >         > >
> > >         > > > > > > > > > > > >         >         > >
> > >         > > > > > > > > > > > >         >         > > -- >
> > >         > > > > > > > > > > > >         >         > > Sandeep
> > Krishnamurthy>
> > >         > > > > > > > > > > > >         >         > >
> > >         > > > > > > > > > > > >         >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > > > --
> > >         > > > > > > > > > > > - Tsuyoshi
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > >
> > >         > > > > > > > > >
> > >         > > > > > > > > >
> > >         > > > > > > > > >
> > >         > > > > > > > > > --
> > >         > > > > > > > > > Best Regards,
> > >         > > > > > > > > > Gautam Kumar
> > >         > > > > > > > > >
> > >         > > > > > > > >
> > >         > > > > > > >
> > >         > > > > > > >
> > >         > > > > > > >
> > >         > > > > > > > --
> > >         > > > > > > > Best Regards,
> > >         > > > > > > > Gautam Kumar
> > >         > > > > > > >
> > >         > > > > > >
> > >         > > > > >
> > >         > > > >
> > >         > > >
> > >         > >
> > >         >
> > >
> > >
> > >
> > >
> >
>
>
>
> --
> Best Regards,
> Gautam Kumar
>