You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mxnet.apache.org by "Lupesko, Hagay" <lu...@gmail.com> on 2017/09/01 00:27:12 UTC

Re: Apache MXNet build failures are mostly valid - verify before merge

@madan looking into some failures – you’re right… there’s multiple issues going on, some of them intermittent, and we want to be able to merge fixes in.
Agreed that we can wait with setting up protected mode until build stabilizes.

On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com> wrote:

    @hagay: we agree on the end state. I'm not too particular about how we get
    there. If you think enabling it now and fixes regression later is doable,
    I'm fine with. I see a bit of a chicken and egg problem. We need to get
    some fixes in even when the status checks are failing.
    
    On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <lu...@gmail.com> wrote:
    
    > @madan – re: getting to a stable CI first:
    > I’m concerned that by not enabling protected branch mode ASAP, we’re just
    > taking in more regressions, which makes a stable build a moving target for
    > us…
    >
    > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com> wrote:
    >
    >     Just one thing: please don’t disable more tests or just raise the
    > tolerance thresholds.
    >
    >     Best regards,
    >     -sz
    >
    >     On 8/31/17, 10:45 AM, "Madan Jampani" <ma...@gmail.com> wrote:
    >
    >         +1
    >         Before we can turn protected mode I feel we should first get to a
    > stable CI
    >         pipeline.
    >         Sandeep is chasing down known breaking issues.
    >
    >
    >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay Lupesko <lu...@gmail.com>
    > wrote:
    >
    >         > Build stability is a major issue, builds have been failing left
    > and right
    >         > over the last week. Some of it is due to Jenkins slave issues,
    > but some are
    >         > real regressions.
    >         > We need to be more strict in the code we're committing.
    >         >
    >         > I propose we configure our master to be a protected branch (
    >         > https://help.github.com/articles/about-protected-branches/).
    >         >
    >         > Thoughts?
    >         >
    >         > On 2017-08-28 22:41, sandeep krishnamurthy <s....@gmail.com>
    > wrote:
    >         > > Hello Committers and Contributors,>
    >         > >
    >         > > Due to unstable build pipelines, from past 1 week, PRs are
    > being merged>
    >         > > after CR ignoring PR build status. Build pipeline is much more
    > stable
    >         > than>
    >         > > last week and most of the build failures you see from now on,
    > are likely
    >         > to>
    >         > > be a valid failure and hence, it is recommended to wait for PR
    > builds,
    >         > see>
    >         > > the root cause of any build failures before proceeding with
    > merges.>
    >         > >
    >         > > At this point of time, there are 2 intermittent issue yet to
    > be fixed ->
    >         > > * Network error leading to GitHub requests throwing 404>
    >         > > * A conflict in artifacts generated between branches/PR -
    > Cause unknown
    >         > yet.>
    >         > > These issues will be fixed soon.>
    >         > >
    >         > >
    >         > > -- >
    >         > > Sandeep Krishnamurthy>
    >         > >
    >         >
    >
    >
    >
    >
    >
    >
    



Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Tsuyoshi Ozawa <oz...@apache.org>.
> At a minimum I'd like to see the following two happen:
> - Option to merge is disabled until all required checks pass.
> - Code is reviewed and given +1 by at least one other committer (no self review).

I like this policy. In fact, Apache Hadoop community works with the
policy too. Thanks for your good summary, Madan.

>  And as I explained, we wouldn't need to run extensive tests on every PR, just nightly on staging.

Or, we have another option since we have limited test resources: a
privileged person's commenting on something like "ready to test" on
github PR to launch Jenkins CI like Apache Spark. I think we need more
additional infra setup for doing this, though.
https://github.com/apache/spark/pull/19181#issuecomment-328809979

On Fri, Sep 29, 2017 at 1:37 AM, Madan Jampani <ma...@gmail.com> wrote:
> +1
>
> At a minimum I'd like to see the following two happen:
> - Option to merge is disabled until all required checks pass.
> - Code is reviewed and given +1 by at least one other committer (no self
> review).
>
> On Wed, Sep 27, 2017 at 11:15 PM, Gautam <ga...@gmail.com> wrote:
>
>> Hi Chris,
>>
>>   Here <https://help.github.com/articles/about-protected-branches/> is
>> user
>> document on semantics of protected branch.
>> In short when a branch is protected following applies to that branch.
>>
>>    - Can't be force pushed
>>    - Can't be deleted
>>    - Can't have changes merged into it until required status checks
>>    <https://help.github.com/articles/about-required-status-checks> pass
>>    - Can't have changes merged into it until required reviews are approved
>>    <https://help.github.com/articles/approving-a-pull-
>> request-with-required-reviews>
>>    - Can't be edited or have files uploaded to it from the web
>>    - Can't have changes merged into it until changes to files that
>> have a designated
>>    code owner <https://help.github.com/articles/about-codeowners/> have
>>    been approved by that owner
>>
>>  I am sure many of us might not want to have all these but we can debate on
>> it. My main motive was to "*Can't have changes merged into it until
>> required status checks pass*"
>>
>>
>> -Gautam
>>
>>
>>
>> On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <cj...@gmail.com>
>> wrote:
>>
>> > What does that mean? "Protected"? Protected from what?
>> >
>> > On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com> wrote:
>> >
>> > > Hi Chris,
>> > >
>> > >    I mean make "master branch protected" of  MXNet.
>> > >
>> > > -Gautam
>> > >
>> > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <cjolivier01@gmail.com
>> >
>> > > wrote:
>> > >
>> > > > What does this mean? "Mx-net branch protected"?
>> > > >
>> > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
>> > ozawa.tsuyoshi@gmail.com
>> > > >
>> > > > wrote:
>> > > >
>> > > > > +1,
>> > > > >
>> > > > > While I'm checking the recent build failures, and I think the
>> > decision
>> > > > > of making the mx-net branch protected is necessary for stable
>> > > > > building.
>> > > > > Thanks Kumar for resuming important discussion.
>> > > > >
>> > > > > Best regards
>> > > > > - Tsuyoshi
>> > > > >
>> > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <ga...@amazon.com>
>> > > > wrote:
>> > > > > > Reviving the discussion.
>> > > > > >
>> > > > > > At this point of time we have couple of stable builds
>> > > > > >
>> > > > > https://builds.apache.org/view/Incubator%20Projects/job/
>> > > > incubator-mxnet/job/master/448/
>> > > > > >
>> > > > > https://builds.apache.org/view/Incubator%20Projects/job/
>> > > > incubator-mxnet/job/master/449/
>> > > > > >
>> > > > > > Should we have a quick discussion or polling on making the mx-net
>> > > > branch
>> > > > > protected? If you still think we shouldn’t make it protected please
>> > > > provide
>> > > > > a reason to support your claim.
>> > > > > >
>> > > > > > Few of us have concern over Jenkin’s stability. If I look two
>> weeks
>> > > > > back, after upgrading Linux slave to g2.8x and new windows AMI, we
>> > have
>> > > > not
>> > > > > seen any case where instance died due to high memory usage or any
>> > > process
>> > > > > got killed due to high cpu usage or any other issue with windows
>> > > slaves.
>> > > > > >
>> > > > > > Going forward we are also planning that if we add any new slave
>> we
>> > > will
>> > > > > not enable the main load immediately, but rather will do ‘test
>> build’
>> > > to
>> > > > > make sure that new slaves are not causing any infrastructure issue
>> > and
>> > > > > capable to perform as good as existing slaves.
>> > > > > >
>> > > > > > -Gautam
>> > > > > >
>> > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com> wrote:
>> > > > > >
>> > > > > >     @madan looking into some failures – you’re right… there’s
>> > > multiple
>> > > > > issues going on, some of them intermittent, and we want to be able
>> to
>> > > > merge
>> > > > > fixes in.
>> > > > > >     Agreed that we can wait with setting up protected mode until
>> > > build
>> > > > > stabilizes.
>> > > > > >
>> > > > > >     On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com>
>> > > > wrote:
>> > > > > >
>> > > > > >         @hagay: we agree on the end state. I'm not too particular
>> > > about
>> > > > > how we get
>> > > > > >         there. If you think enabling it now and fixes regression
>> > > later
>> > > > > is doable,
>> > > > > >         I'm fine with. I see a bit of a chicken and egg problem.
>> We
>> > > > need
>> > > > > to get
>> > > > > >         some fixes in even when the status checks are failing.
>> > > > > >
>> > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
>> > > > > lupesko@gmail.com> wrote:
>> > > > > >
>> > > > > >         > @madan – re: getting to a stable CI first:
>> > > > > >         > I’m concerned that by not enabling protected branch
>> mode
>> > > > ASAP,
>> > > > > we’re just
>> > > > > >         > taking in more regressions, which makes a stable build
>> a
>> > > > > moving target for
>> > > > > >         > us…
>> > > > > >         >
>> > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com>
>> > > wrote:
>> > > > > >         >
>> > > > > >         >     Just one thing: please don’t disable more tests or
>> > just
>> > > > > raise the
>> > > > > >         > tolerance thresholds.
>> > > > > >         >
>> > > > > >         >     Best regards,
>> > > > > >         >     -sz
>> > > > > >         >
>> > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
>> > > > > madan.jampani@gmail.com> wrote:
>> > > > > >         >
>> > > > > >         >         +1
>> > > > > >         >         Before we can turn protected mode I feel we
>> > should
>> > > > > first get to a
>> > > > > >         > stable CI
>> > > > > >         >         pipeline.
>> > > > > >         >         Sandeep is chasing down known breaking issues.
>> > > > > >         >
>> > > > > >         >
>> > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay
>> Lupesko <
>> > > > > lupesko@gmail.com>
>> > > > > >         > wrote:
>> > > > > >         >
>> > > > > >         >         > Build stability is a major issue, builds have
>> > > been
>> > > > > failing left
>> > > > > >         > and right
>> > > > > >         >         > over the last week. Some of it is due to
>> > Jenkins
>> > > > > slave issues,
>> > > > > >         > but some are
>> > > > > >         >         > real regressions.
>> > > > > >         >         > We need to be more strict in the code we're
>> > > > > committing.
>> > > > > >         >         >
>> > > > > >         >         > I propose we configure our master to be a
>> > > protected
>> > > > > branch (
>> > > > > >         >         >
>> > > > > https://help.github.com/articles/about-protected-branches/).
>> > > > > >         >         >
>> > > > > >         >         > Thoughts?
>> > > > > >         >         >
>> > > > > >         >         > On 2017-08-28 22:41, sandeep krishnamurthy <
>> > > > > s...@gmail.com>
>> > > > > >         > wrote:
>> > > > > >         >         > > Hello Committers and Contributors,>
>> > > > > >         >         > >
>> > > > > >         >         > > Due to unstable build pipelines, from past
>> 1
>> > > > week,
>> > > > > PRs are
>> > > > > >         > being merged>
>> > > > > >         >         > > after CR ignoring PR build status. Build
>> > > pipeline
>> > > > > is much more
>> > > > > >         > stable
>> > > > > >         >         > than>
>> > > > > >         >         > > last week and most of the build failures
>> you
>> > > see
>> > > > > from now on,
>> > > > > >         > are likely
>> > > > > >         >         > to>
>> > > > > >         >         > > be a valid failure and hence, it is
>> > recommended
>> > > > to
>> > > > > wait for PR
>> > > > > >         > builds,
>> > > > > >         >         > see>
>> > > > > >         >         > > the root cause of any build failures before
>> > > > > proceeding with
>> > > > > >         > merges.>
>> > > > > >         >         > >
>> > > > > >         >         > > At this point of time, there are 2
>> > intermittent
>> > > > > issue yet to
>> > > > > >         > be fixed ->
>> > > > > >         >         > > * Network error leading to GitHub requests
>> > > > > throwing 404>
>> > > > > >         >         > > * A conflict in artifacts generated between
>> > > > > branches/PR -
>> > > > > >         > Cause unknown
>> > > > > >         >         > yet.>
>> > > > > >         >         > > These issues will be fixed soon.>
>> > > > > >         >         > >
>> > > > > >         >         > >
>> > > > > >         >         > > -- >
>> > > > > >         >         > > Sandeep Krishnamurthy>
>> > > > > >         >         > >
>> > > > > >         >         >
>> > > > > >         >
>> > > > > >         >
>> > > > > >         >
>> > > > > >         >
>> > > > > >         >
>> > > > > >         >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > - Tsuyoshi
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Best Regards,
>> > > Gautam Kumar
>> > >
>> >
>>
>>
>>
>> --
>> Best Regards,
>> Gautam Kumar
>>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by "Lupesko, Hagay" <lu...@gmail.com>.
Yay!

On 10/12/17, 11:27, "Gautam" <ga...@gmail.com> wrote:

    Hi All,
    
      The master branch is protected now. Please review your PR for merge.
    Thanks to Infra team for following up on this.
    
    Regards,
    Gautam
    
    On Mon, Oct 2, 2017 at 1:19 PM, Gautam <ga...@gmail.com> wrote:
    
    > Thanks All.
    >
    >  I've created the JIRA to mark the protected branch for master
    > https://issues.apache.org/jira/browse/INCUBATOR-205.
    > We also need to add all the committers to be code owner as discussed in
    > the slack, I've opened a PR for it https://github.com/apache/
    > incubator-mxnet/pull/8128.
    >
    > Good point Joern, I'll follow up on that.
    >
    > Regards,
    > Gautam
    >
    > On Fri, Sep 29, 2017 at 2:20 AM, Joern Kottmann <ko...@gmail.com>
    > wrote:
    >
    >> It also makes sense to block too old PRs from merging, because the
    >> test results are outdated and the build might fail after it gets
    >> merged.
    >>
    >> Jörn
    >>
    >> On Thu, Sep 28, 2017 at 9:14 PM, Zha, Sheng <zh...@amazon.com> wrote:
    >> > +1 on protected branch.
    >> >
    >> > Best regards,
    >> > -sz
    >> >
    >> > On 9/28/17, 11:48 AM, "Kumar, Gautam" <ga...@amazon.com> wrote:
    >> >
    >> >     Hi Guys,
    >> >
    >> >      Let’s focus on specific issue here.
    >> >
    >> >     Marking the master branch protected which involves “Only merge if
    >> checks has passed, and yes it will run the complete build”.
    >> >
    >> >     We can’t afford to degrade the quality and keep debugging the build
    >> failure forever. If it’s slow down the development at the cost of quality I
    >> will vote for the quality.
    >> >     We can work on improving the infrastructure to improve the overall
    >> speed.  If you have any specific concerns on availability of Jenkins please
    >> point out.
    >> >
    >> >     -Gautam
    >> >
    >> >
    >> >     On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com>
    >> wrote:
    >> >
    >> >         -1000 on that. :)
    >> >
    >> >         On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <
    >> mnnaveen@gmail.com> wrote:
    >> >
    >> >         > PR->Sanity test/Linux build/test->reviewer/committer approves
    >> the
    >> >         > change->Comment "Build Now" (Or trigger on at least one
    >> approval from a
    >> >         > committer other than author)->*Full build-*>*passes
    >> build*->Enable Merge
    >> >         >
    >> >         > Let us take this particular topic to a separate thread or
    >> discuss offline
    >> >         > if further clarification is needed.
    >> >         >
    >> >         > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <
    >> cjolivier01@gmail.com>
    >> >         > wrote:
    >> >         >
    >> >         > > I understand the proposal.  How to trigger a build in that
    >> case?
    >> >         > >
    >> >         > >
    >> >         > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <
    >> madan.jampani@gmail.com>
    >> >         > > wrote:
    >> >         > >
    >> >         > > > Chris,
    >> >         > > > I don't think Naveen is suggesting that a merge happen
    >> without full
    >> >         > > > verification i.e. all tests across all platforms pass.
    >> >         > > > If a PR has some back and forth and results in multiple
    >> revisions
    >> >         > (which
    >> >         > > is
    >> >         > > > arguably more common than a random unit test failing), we
    >> simply delay
    >> >         > > full
    >> >         > > > verification until the owner/reviewer have settled on a
    >> mutually
    >> >         > > acceptable
    >> >         > > > state.
    >> >         > > >
    >> >         > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <
    >> cjolivier01@gmail.com
    >> >         > >
    >> >         > > > wrote:
    >> >         > > >
    >> >         > > > > -1 for running only partial tests.  Most failing unit
    >> tests that get
    >> >         > > > > through fail only for certain
    >> platforms/configurations.  I personally
    >> >         > > > > prefer to be assured the build and test is good before
    >> merge.  Most
    >> >         > PR
    >> >         > > > > merges aren't in a huge hurry.
    >> >         > > > >
    >> >         > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <
    >> mnnaveen@gmail.com>
    >> >         > > > wrote:
    >> >         > > > >
    >> >         > > > > > +1 to make it protected. Here is what I am thinking
    >> for PR builds
    >> >         > > > > > on a PR run Sanity Tests + build/test one
    >> platform->committer
    >> >         > reviews
    >> >         > > > the
    >> >         > > > > > code and issues "Build Now", a full build is
    >> run->Github checks
    >> >         > that
    >> >         > > > the
    >> >         > > > > > full build checks succeed before it can be merged.
    >> >         > > > > >
    >> >         > > > > > I agree with Madan that PR should be approved by one
    >> another
    >> >         > > committer.
    >> >         > > > > >
    >> >         > > > > >
    >> >         > > > > >
    >> >         > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
    >> >         > > > madan.jampani@gmail.com>
    >> >         > > > > > wrote:
    >> >         > > > > >
    >> >         > > > > > > +1
    >> >         > > > > > >
    >> >         > > > > > > At a minimum I'd like to see the following two
    >> happen:
    >> >         > > > > > > - Option to merge is disabled until all required
    >> checks pass.
    >> >         > > > > > > - Code is reviewed and given +1 by at least one
    >> other committer
    >> >         > (no
    >> >         > > > > self
    >> >         > > > > > > review).
    >> >         > > > > > >
    >> >         > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <
    >> gautamnitc@gmail.com>
    >> >         > > > wrote:
    >> >         > > > > > >
    >> >         > > > > > > > Hi Chris,
    >> >         > > > > > > >
    >> >         > > > > > > >   Here <https://help.github.com/artic
    >> les/about-protected-
    >> >         > > branches/
    >> >         > > > >
    >> >         > > > > is
    >> >         > > > > > > > user
    >> >         > > > > > > > document on semantics of protected branch.
    >> >         > > > > > > > In short when a branch is protected following
    >> applies to that
    >> >         > > > branch.
    >> >         > > > > > > >
    >> >         > > > > > > >    - Can't be force pushed
    >> >         > > > > > > >    - Can't be deleted
    >> >         > > > > > > >    - Can't have changes merged into it until
    >> required status
    >> >         > > checks
    >> >         > > > > > > >    <https://help.github.com/artic
    >> les/about-required-
    >> >         > > status-checks>
    >> >         > > > > > pass
    >> >         > > > > > > >    - Can't have changes merged into it until
    >> required reviews
    >> >         > are
    >> >         > > > > > > approved
    >> >         > > > > > > >    <https://help.github.com/artic
    >> les/approving-a-pull-
    >> >         > > > > > > > request-with-required-reviews>
    >> >         > > > > > > >    - Can't be edited or have files uploaded to it
    >> from the web
    >> >         > > > > > > >    - Can't have changes merged into it until
    >> changes to files
    >> >         > > that
    >> >         > > > > > > > have a designated
    >> >         > > > > > > >    code owner <https://help.github.com/
    >> >         > > articles/about-codeowners/>
    >> >         > > > > > have
    >> >         > > > > > > >    been approved by that owner
    >> >         > > > > > > >
    >> >         > > > > > > >  I am sure many of us might not want to have all
    >> these but we
    >> >         > can
    >> >         > > > > > debate
    >> >         > > > > > > on
    >> >         > > > > > > > it. My main motive was to "*Can't have changes
    >> merged into it
    >> >         > > until
    >> >         > > > > > > > required status checks pass*"
    >> >         > > > > > > >
    >> >         > > > > > > >
    >> >         > > > > > > > -Gautam
    >> >         > > > > > > >
    >> >         > > > > > > >
    >> >         > > > > > > >
    >> >         > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
    >> >         > > > > cjolivier01@gmail.com
    >> >         > > > > > >
    >> >         > > > > > > > wrote:
    >> >         > > > > > > >
    >> >         > > > > > > > > What does that mean? "Protected"? Protected
    >> from what?
    >> >         > > > > > > > >
    >> >         > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
    >> >         > gautamnitc@gmail.com>
    >> >         > > > > > wrote:
    >> >         > > > > > > > >
    >> >         > > > > > > > > > Hi Chris,
    >> >         > > > > > > > > >
    >> >         > > > > > > > > >    I mean make "master branch protected" of
    >> MXNet.
    >> >         > > > > > > > > >
    >> >         > > > > > > > > > -Gautam
    >> >         > > > > > > > > >
    >> >         > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris
    >> Olivier <
    >> >         > > > > > > cjolivier01@gmail.com
    >> >         > > > > > > > >
    >> >         > > > > > > > > > wrote:
    >> >         > > > > > > > > >
    >> >         > > > > > > > > > > What does this mean? "Mx-net branch
    >> protected"?
    >> >         > > > > > > > > > >
    >> >         > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi
    >> OZAWA <
    >> >         > > > > > > > > ozawa.tsuyoshi@gmail.com
    >> >         > > > > > > > > > >
    >> >         > > > > > > > > > > wrote:
    >> >         > > > > > > > > > >
    >> >         > > > > > > > > > > > +1,
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > > > While I'm checking the recent build
    >> failures, and I
    >> >         > think
    >> >         > > > the
    >> >         > > > > > > > > decision
    >> >         > > > > > > > > > > > of making the mx-net branch protected is
    >> necessary for
    >> >         > > > stable
    >> >         > > > > > > > > > > > building.
    >> >         > > > > > > > > > > > Thanks Kumar for resuming important
    >> discussion.
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > > > Best regards
    >> >         > > > > > > > > > > > - Tsuyoshi
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar,
    >> Gautam <
    >> >         > > > > > > gauta@amazon.com>
    >> >         > > > > > > > > > > wrote:
    >> >         > > > > > > > > > > > > Reviving the discussion.
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > > At this point of time we have couple of
    >> stable builds
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > >
    >> >         > https://builds.apache.org/view/Incubator%20Projects/job/
    >> >         > > > > > > > > > > incubator-mxnet/job/master/448/
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > >
    >> >         > https://builds.apache.org/view/Incubator%20Projects/job/
    >> >         > > > > > > > > > > incubator-mxnet/job/master/449/
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > > Should we have a quick discussion or
    >> polling on
    >> >         > making
    >> >         > > > the
    >> >         > > > > > > mx-net
    >> >         > > > > > > > > > > branch
    >> >         > > > > > > > > > > > protected? If you still think we
    >> shouldn’t make it
    >> >         > > > protected
    >> >         > > > > > > please
    >> >         > > > > > > > > > > provide
    >> >         > > > > > > > > > > > a reason to support your claim.
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > > Few of us have concern over Jenkin’s
    >> stability. If I
    >> >         > > look
    >> >         > > > > two
    >> >         > > > > > > > weeks
    >> >         > > > > > > > > > > > back, after upgrading Linux slave to
    >> g2.8x and new
    >> >         > > windows
    >> >         > > > > AMI,
    >> >         > > > > > > we
    >> >         > > > > > > > > have
    >> >         > > > > > > > > > > not
    >> >         > > > > > > > > > > > seen any case where instance died due to
    >> high memory
    >> >         > > usage
    >> >         > > > or
    >> >         > > > > > any
    >> >         > > > > > > > > > process
    >> >         > > > > > > > > > > > got killed due to high cpu usage or any
    >> other issue
    >> >         > with
    >> >         > > > > > windows
    >> >         > > > > > > > > > slaves.
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > > Going forward we are also planning that
    >> if we add any
    >> >         > > new
    >> >         > > > > > slave
    >> >         > > > > > > > we
    >> >         > > > > > > > > > will
    >> >         > > > > > > > > > > > not enable the main load immediately, but
    >> rather will
    >> >         > do
    >> >         > > > > ‘test
    >> >         > > > > > > > build’
    >> >         > > > > > > > > > to
    >> >         > > > > > > > > > > > make sure that new slaves are not causing
    >> any
    >> >         > > > infrastructure
    >> >         > > > > > > issue
    >> >         > > > > > > > > and
    >> >         > > > > > > > > > > > capable to perform as good as existing
    >> slaves.
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > > -Gautam
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
    >> >         > > lupesko@gmail.com
    >> >         > > > >
    >> >         > > > > > > wrote:
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >     @madan looking into some failures –
    >> you’re right…
    >> >         > > > > there’s
    >> >         > > > > > > > > > multiple
    >> >         > > > > > > > > > > > issues going on, some of them
    >> intermittent, and we want
    >> >         > > to
    >> >         > > > be
    >> >         > > > > > > able
    >> >         > > > > > > > to
    >> >         > > > > > > > > > > merge
    >> >         > > > > > > > > > > > fixes in.
    >> >         > > > > > > > > > > > >     Agreed that we can wait with
    >> setting up protected
    >> >         > > > mode
    >> >         > > > > > > until
    >> >         > > > > > > > > > build
    >> >         > > > > > > > > > > > stabilizes.
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
    >> >         > > > > > > madan.jampani@gmail.com>
    >> >         > > > > > > > > > > wrote:
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >         @hagay: we agree on the end
    >> state. I'm not
    >> >         > too
    >> >         > > > > > > particular
    >> >         > > > > > > > > > about
    >> >         > > > > > > > > > > > how we get
    >> >         > > > > > > > > > > > >         there. If you think enabling it
    >> now and fixes
    >> >         > > > > > > regression
    >> >         > > > > > > > > > later
    >> >         > > > > > > > > > > > is doable,
    >> >         > > > > > > > > > > > >         I'm fine with. I see a bit of a
    >> chicken and
    >> >         > egg
    >> >         > > > > > > problem.
    >> >         > > > > > > > We
    >> >         > > > > > > > > > > need
    >> >         > > > > > > > > > > > to get
    >> >         > > > > > > > > > > > >         some fixes in even when the
    >> status checks are
    >> >         > > > > > failing.
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25
    >> AM, Lupesko,
    >> >         > > Hagay
    >> >         > > > <
    >> >         > > > > > > > > > > > lupesko@gmail.com> wrote:
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >         > @madan – re: getting to a
    >> stable CI first:
    >> >         > > > > > > > > > > > >         > I’m concerned that by not
    >> enabling
    >> >         > protected
    >> >         > > > > branch
    >> >         > > > > > > > mode
    >> >         > > > > > > > > > > ASAP,
    >> >         > > > > > > > > > > > we’re just
    >> >         > > > > > > > > > > > >         > taking in more regressions,
    >> which makes a
    >> >         > > > stable
    >> >         > > > > > > build
    >> >         > > > > > > > a
    >> >         > > > > > > > > > > > moving target for
    >> >         > > > > > > > > > > > >         > us…
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha,
    >> Sheng" <
    >> >         > > > > > zhasheng@amazon.com
    >> >         > > > > > > >
    >> >         > > > > > > > > > wrote:
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >     Just one thing: please
    >> don’t disable
    >> >         > more
    >> >         > > > > tests
    >> >         > > > > > > or
    >> >         > > > > > > > > just
    >> >         > > > > > > > > > > > raise the
    >> >         > > > > > > > > > > > >         > tolerance thresholds.
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >     Best regards,
    >> >         > > > > > > > > > > > >         >     -sz
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM,
    >> "Madan Jampani" <
    >> >         > > > > > > > > > > > madan.jampani@gmail.com> wrote:
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >         +1
    >> >         > > > > > > > > > > > >         >         Before we can turn
    >> protected mode I
    >> >         > > > feel
    >> >         > > > > we
    >> >         > > > > > > > > should
    >> >         > > > > > > > > > > > first get to a
    >> >         > > > > > > > > > > > >         > stable CI
    >> >         > > > > > > > > > > > >         >         pipeline.
    >> >         > > > > > > > > > > > >         >         Sandeep is chasing
    >> down known
    >> >         > > breaking
    >> >         > > > > > > issues.
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >         On Thu, Aug 31, 2017
    >> at 10:27 AM,
    >> >         > > Hagay
    >> >         > > > > > > > Lupesko <
    >> >         > > > > > > > > > > > lupesko@gmail.com>
    >> >         > > > > > > > > > > > >         > wrote:
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >         > Build stability is
    >> a major issue,
    >> >         > > > > builds
    >> >         > > > > > > have
    >> >         > > > > > > > > > been
    >> >         > > > > > > > > > > > failing left
    >> >         > > > > > > > > > > > >         > and right
    >> >         > > > > > > > > > > > >         >         > over the last week.
    >> Some of it is
    >> >         > > due
    >> >         > > > > to
    >> >         > > > > > > > > Jenkins
    >> >         > > > > > > > > > > > slave issues,
    >> >         > > > > > > > > > > > >         > but some are
    >> >         > > > > > > > > > > > >         >         > real regressions.
    >> >         > > > > > > > > > > > >         >         > We need to be more
    >> strict in the
    >> >         > > code
    >> >         > > > > > we're
    >> >         > > > > > > > > > > > committing.
    >> >         > > > > > > > > > > > >         >         >
    >> >         > > > > > > > > > > > >         >         > I propose we
    >> configure our master
    >> >         > > to
    >> >         > > > > be a
    >> >         > > > > > > > > > protected
    >> >         > > > > > > > > > > > branch (
    >> >         > > > > > > > > > > > >         >         >
    >> >         > > > > > > > > > > > https://help.github.com/articl
    >> es/about-protected-
    >> >         > > branches/
    >> >         > > > ).
    >> >         > > > > > > > > > > > >         >         >
    >> >         > > > > > > > > > > > >         >         > Thoughts?
    >> >         > > > > > > > > > > > >         >         >
    >> >         > > > > > > > > > > > >         >         > On 2017-08-28
    >> 22:41, sandeep
    >> >         > > > > > krishnamurthy
    >> >         > > > > > > <
    >> >         > > > > > > > > > > > s...@gmail.com>
    >> >         > > > > > > > > > > > >         > wrote:
    >> >         > > > > > > > > > > > >         >         > > Hello Committers
    >> and
    >> >         > > Contributors,>
    >> >         > > > > > > > > > > > >         >         > >
    >> >         > > > > > > > > > > > >         >         > > Due to unstable
    >> build
    >> >         > pipelines,
    >> >         > > > from
    >> >         > > > > > > past
    >> >         > > > > > > > 1
    >> >         > > > > > > > > > > week,
    >> >         > > > > > > > > > > > PRs are
    >> >         > > > > > > > > > > > >         > being merged>
    >> >         > > > > > > > > > > > >         >         > > after CR ignoring
    >> PR build
    >> >         > > status.
    >> >         > > > > > Build
    >> >         > > > > > > > > > pipeline
    >> >         > > > > > > > > > > > is much more
    >> >         > > > > > > > > > > > >         > stable
    >> >         > > > > > > > > > > > >         >         > than>
    >> >         > > > > > > > > > > > >         >         > > last week and
    >> most of the build
    >> >         > > > > > failures
    >> >         > > > > > > > you
    >> >         > > > > > > > > > see
    >> >         > > > > > > > > > > > from now on,
    >> >         > > > > > > > > > > > >         > are likely
    >> >         > > > > > > > > > > > >         >         > to>
    >> >         > > > > > > > > > > > >         >         > > be a valid
    >> failure and hence,
    >> >         > it
    >> >         > > is
    >> >         > > > > > > > > recommended
    >> >         > > > > > > > > > > to
    >> >         > > > > > > > > > > > wait for PR
    >> >         > > > > > > > > > > > >         > builds,
    >> >         > > > > > > > > > > > >         >         > see>
    >> >         > > > > > > > > > > > >         >         > > the root cause of
    >> any build
    >> >         > > > failures
    >> >         > > > > > > before
    >> >         > > > > > > > > > > > proceeding with
    >> >         > > > > > > > > > > > >         > merges.>
    >> >         > > > > > > > > > > > >         >         > >
    >> >         > > > > > > > > > > > >         >         > > At this point of
    >> time, there
    >> >         > are
    >> >         > > 2
    >> >         > > > > > > > > intermittent
    >> >         > > > > > > > > > > > issue yet to
    >> >         > > > > > > > > > > > >         > be fixed ->
    >> >         > > > > > > > > > > > >         >         > > * Network error
    >> leading to
    >> >         > GitHub
    >> >         > > > > > > requests
    >> >         > > > > > > > > > > > throwing 404>
    >> >         > > > > > > > > > > > >         >         > > * A conflict in
    >> artifacts
    >> >         > > generated
    >> >         > > > > > > between
    >> >         > > > > > > > > > > > branches/PR -
    >> >         > > > > > > > > > > > >         > Cause unknown
    >> >         > > > > > > > > > > > >         >         > yet.>
    >> >         > > > > > > > > > > > >         >         > > These issues will
    >> be fixed
    >> >         > soon.>
    >> >         > > > > > > > > > > > >         >         > >
    >> >         > > > > > > > > > > > >         >         > >
    >> >         > > > > > > > > > > > >         >         > > -- >
    >> >         > > > > > > > > > > > >         >         > > Sandeep
    >> Krishnamurthy>
    >> >         > > > > > > > > > > > >         >         > >
    >> >         > > > > > > > > > > > >         >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >         >
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > > >
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > > > --
    >> >         > > > > > > > > > > > - Tsuyoshi
    >> >         > > > > > > > > > > >
    >> >         > > > > > > > > > >
    >> >         > > > > > > > > >
    >> >         > > > > > > > > >
    >> >         > > > > > > > > >
    >> >         > > > > > > > > > --
    >> >         > > > > > > > > > Best Regards,
    >> >         > > > > > > > > > Gautam Kumar
    >> >         > > > > > > > > >
    >> >         > > > > > > > >
    >> >         > > > > > > >
    >> >         > > > > > > >
    >> >         > > > > > > >
    >> >         > > > > > > > --
    >> >         > > > > > > > Best Regards,
    >> >         > > > > > > > Gautam Kumar
    >> >         > > > > > > >
    >> >         > > > > > >
    >> >         > > > > >
    >> >         > > > >
    >> >         > > >
    >> >         > >
    >> >         >
    >> >
    >> >
    >> >
    >> >
    >>
    >
    >
    >
    > --
    > Best Regards,
    > Gautam Kumar
    >
    >
    >
    >
    >
    
    
    -- 
    Best Regards,
    Gautam Kumar
    



Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Gautam <ga...@gmail.com>.
Hi All,

  The master branch is protected now. Please review your PR for merge.
Thanks to Infra team for following up on this.

Regards,
Gautam

On Mon, Oct 2, 2017 at 1:19 PM, Gautam <ga...@gmail.com> wrote:

> Thanks All.
>
>  I've created the JIRA to mark the protected branch for master
> https://issues.apache.org/jira/browse/INCUBATOR-205.
> We also need to add all the committers to be code owner as discussed in
> the slack, I've opened a PR for it https://github.com/apache/
> incubator-mxnet/pull/8128.
>
> Good point Joern, I'll follow up on that.
>
> Regards,
> Gautam
>
> On Fri, Sep 29, 2017 at 2:20 AM, Joern Kottmann <ko...@gmail.com>
> wrote:
>
>> It also makes sense to block too old PRs from merging, because the
>> test results are outdated and the build might fail after it gets
>> merged.
>>
>> Jörn
>>
>> On Thu, Sep 28, 2017 at 9:14 PM, Zha, Sheng <zh...@amazon.com> wrote:
>> > +1 on protected branch.
>> >
>> > Best regards,
>> > -sz
>> >
>> > On 9/28/17, 11:48 AM, "Kumar, Gautam" <ga...@amazon.com> wrote:
>> >
>> >     Hi Guys,
>> >
>> >      Let’s focus on specific issue here.
>> >
>> >     Marking the master branch protected which involves “Only merge if
>> checks has passed, and yes it will run the complete build”.
>> >
>> >     We can’t afford to degrade the quality and keep debugging the build
>> failure forever. If it’s slow down the development at the cost of quality I
>> will vote for the quality.
>> >     We can work on improving the infrastructure to improve the overall
>> speed.  If you have any specific concerns on availability of Jenkins please
>> point out.
>> >
>> >     -Gautam
>> >
>> >
>> >     On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com>
>> wrote:
>> >
>> >         -1000 on that. :)
>> >
>> >         On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <
>> mnnaveen@gmail.com> wrote:
>> >
>> >         > PR->Sanity test/Linux build/test->reviewer/committer approves
>> the
>> >         > change->Comment "Build Now" (Or trigger on at least one
>> approval from a
>> >         > committer other than author)->*Full build-*>*passes
>> build*->Enable Merge
>> >         >
>> >         > Let us take this particular topic to a separate thread or
>> discuss offline
>> >         > if further clarification is needed.
>> >         >
>> >         > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <
>> cjolivier01@gmail.com>
>> >         > wrote:
>> >         >
>> >         > > I understand the proposal.  How to trigger a build in that
>> case?
>> >         > >
>> >         > >
>> >         > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <
>> madan.jampani@gmail.com>
>> >         > > wrote:
>> >         > >
>> >         > > > Chris,
>> >         > > > I don't think Naveen is suggesting that a merge happen
>> without full
>> >         > > > verification i.e. all tests across all platforms pass.
>> >         > > > If a PR has some back and forth and results in multiple
>> revisions
>> >         > (which
>> >         > > is
>> >         > > > arguably more common than a random unit test failing), we
>> simply delay
>> >         > > full
>> >         > > > verification until the owner/reviewer have settled on a
>> mutually
>> >         > > acceptable
>> >         > > > state.
>> >         > > >
>> >         > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <
>> cjolivier01@gmail.com
>> >         > >
>> >         > > > wrote:
>> >         > > >
>> >         > > > > -1 for running only partial tests.  Most failing unit
>> tests that get
>> >         > > > > through fail only for certain
>> platforms/configurations.  I personally
>> >         > > > > prefer to be assured the build and test is good before
>> merge.  Most
>> >         > PR
>> >         > > > > merges aren't in a huge hurry.
>> >         > > > >
>> >         > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <
>> mnnaveen@gmail.com>
>> >         > > > wrote:
>> >         > > > >
>> >         > > > > > +1 to make it protected. Here is what I am thinking
>> for PR builds
>> >         > > > > > on a PR run Sanity Tests + build/test one
>> platform->committer
>> >         > reviews
>> >         > > > the
>> >         > > > > > code and issues "Build Now", a full build is
>> run->Github checks
>> >         > that
>> >         > > > the
>> >         > > > > > full build checks succeed before it can be merged.
>> >         > > > > >
>> >         > > > > > I agree with Madan that PR should be approved by one
>> another
>> >         > > committer.
>> >         > > > > >
>> >         > > > > >
>> >         > > > > >
>> >         > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
>> >         > > > madan.jampani@gmail.com>
>> >         > > > > > wrote:
>> >         > > > > >
>> >         > > > > > > +1
>> >         > > > > > >
>> >         > > > > > > At a minimum I'd like to see the following two
>> happen:
>> >         > > > > > > - Option to merge is disabled until all required
>> checks pass.
>> >         > > > > > > - Code is reviewed and given +1 by at least one
>> other committer
>> >         > (no
>> >         > > > > self
>> >         > > > > > > review).
>> >         > > > > > >
>> >         > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <
>> gautamnitc@gmail.com>
>> >         > > > wrote:
>> >         > > > > > >
>> >         > > > > > > > Hi Chris,
>> >         > > > > > > >
>> >         > > > > > > >   Here <https://help.github.com/artic
>> les/about-protected-
>> >         > > branches/
>> >         > > > >
>> >         > > > > is
>> >         > > > > > > > user
>> >         > > > > > > > document on semantics of protected branch.
>> >         > > > > > > > In short when a branch is protected following
>> applies to that
>> >         > > > branch.
>> >         > > > > > > >
>> >         > > > > > > >    - Can't be force pushed
>> >         > > > > > > >    - Can't be deleted
>> >         > > > > > > >    - Can't have changes merged into it until
>> required status
>> >         > > checks
>> >         > > > > > > >    <https://help.github.com/artic
>> les/about-required-
>> >         > > status-checks>
>> >         > > > > > pass
>> >         > > > > > > >    - Can't have changes merged into it until
>> required reviews
>> >         > are
>> >         > > > > > > approved
>> >         > > > > > > >    <https://help.github.com/artic
>> les/approving-a-pull-
>> >         > > > > > > > request-with-required-reviews>
>> >         > > > > > > >    - Can't be edited or have files uploaded to it
>> from the web
>> >         > > > > > > >    - Can't have changes merged into it until
>> changes to files
>> >         > > that
>> >         > > > > > > > have a designated
>> >         > > > > > > >    code owner <https://help.github.com/
>> >         > > articles/about-codeowners/>
>> >         > > > > > have
>> >         > > > > > > >    been approved by that owner
>> >         > > > > > > >
>> >         > > > > > > >  I am sure many of us might not want to have all
>> these but we
>> >         > can
>> >         > > > > > debate
>> >         > > > > > > on
>> >         > > > > > > > it. My main motive was to "*Can't have changes
>> merged into it
>> >         > > until
>> >         > > > > > > > required status checks pass*"
>> >         > > > > > > >
>> >         > > > > > > >
>> >         > > > > > > > -Gautam
>> >         > > > > > > >
>> >         > > > > > > >
>> >         > > > > > > >
>> >         > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
>> >         > > > > cjolivier01@gmail.com
>> >         > > > > > >
>> >         > > > > > > > wrote:
>> >         > > > > > > >
>> >         > > > > > > > > What does that mean? "Protected"? Protected
>> from what?
>> >         > > > > > > > >
>> >         > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
>> >         > gautamnitc@gmail.com>
>> >         > > > > > wrote:
>> >         > > > > > > > >
>> >         > > > > > > > > > Hi Chris,
>> >         > > > > > > > > >
>> >         > > > > > > > > >    I mean make "master branch protected" of
>> MXNet.
>> >         > > > > > > > > >
>> >         > > > > > > > > > -Gautam
>> >         > > > > > > > > >
>> >         > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris
>> Olivier <
>> >         > > > > > > cjolivier01@gmail.com
>> >         > > > > > > > >
>> >         > > > > > > > > > wrote:
>> >         > > > > > > > > >
>> >         > > > > > > > > > > What does this mean? "Mx-net branch
>> protected"?
>> >         > > > > > > > > > >
>> >         > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi
>> OZAWA <
>> >         > > > > > > > > ozawa.tsuyoshi@gmail.com
>> >         > > > > > > > > > >
>> >         > > > > > > > > > > wrote:
>> >         > > > > > > > > > >
>> >         > > > > > > > > > > > +1,
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > > > While I'm checking the recent build
>> failures, and I
>> >         > think
>> >         > > > the
>> >         > > > > > > > > decision
>> >         > > > > > > > > > > > of making the mx-net branch protected is
>> necessary for
>> >         > > > stable
>> >         > > > > > > > > > > > building.
>> >         > > > > > > > > > > > Thanks Kumar for resuming important
>> discussion.
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > > > Best regards
>> >         > > > > > > > > > > > - Tsuyoshi
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar,
>> Gautam <
>> >         > > > > > > gauta@amazon.com>
>> >         > > > > > > > > > > wrote:
>> >         > > > > > > > > > > > > Reviving the discussion.
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > > At this point of time we have couple of
>> stable builds
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > >
>> >         > https://builds.apache.org/view/Incubator%20Projects/job/
>> >         > > > > > > > > > > incubator-mxnet/job/master/448/
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > >
>> >         > https://builds.apache.org/view/Incubator%20Projects/job/
>> >         > > > > > > > > > > incubator-mxnet/job/master/449/
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > > Should we have a quick discussion or
>> polling on
>> >         > making
>> >         > > > the
>> >         > > > > > > mx-net
>> >         > > > > > > > > > > branch
>> >         > > > > > > > > > > > protected? If you still think we
>> shouldn’t make it
>> >         > > > protected
>> >         > > > > > > please
>> >         > > > > > > > > > > provide
>> >         > > > > > > > > > > > a reason to support your claim.
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > > Few of us have concern over Jenkin’s
>> stability. If I
>> >         > > look
>> >         > > > > two
>> >         > > > > > > > weeks
>> >         > > > > > > > > > > > back, after upgrading Linux slave to
>> g2.8x and new
>> >         > > windows
>> >         > > > > AMI,
>> >         > > > > > > we
>> >         > > > > > > > > have
>> >         > > > > > > > > > > not
>> >         > > > > > > > > > > > seen any case where instance died due to
>> high memory
>> >         > > usage
>> >         > > > or
>> >         > > > > > any
>> >         > > > > > > > > > process
>> >         > > > > > > > > > > > got killed due to high cpu usage or any
>> other issue
>> >         > with
>> >         > > > > > windows
>> >         > > > > > > > > > slaves.
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > > Going forward we are also planning that
>> if we add any
>> >         > > new
>> >         > > > > > slave
>> >         > > > > > > > we
>> >         > > > > > > > > > will
>> >         > > > > > > > > > > > not enable the main load immediately, but
>> rather will
>> >         > do
>> >         > > > > ‘test
>> >         > > > > > > > build’
>> >         > > > > > > > > > to
>> >         > > > > > > > > > > > make sure that new slaves are not causing
>> any
>> >         > > > infrastructure
>> >         > > > > > > issue
>> >         > > > > > > > > and
>> >         > > > > > > > > > > > capable to perform as good as existing
>> slaves.
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > > -Gautam
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
>> >         > > lupesko@gmail.com
>> >         > > > >
>> >         > > > > > > wrote:
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >     @madan looking into some failures –
>> you’re right…
>> >         > > > > there’s
>> >         > > > > > > > > > multiple
>> >         > > > > > > > > > > > issues going on, some of them
>> intermittent, and we want
>> >         > > to
>> >         > > > be
>> >         > > > > > > able
>> >         > > > > > > > to
>> >         > > > > > > > > > > merge
>> >         > > > > > > > > > > > fixes in.
>> >         > > > > > > > > > > > >     Agreed that we can wait with
>> setting up protected
>> >         > > > mode
>> >         > > > > > > until
>> >         > > > > > > > > > build
>> >         > > > > > > > > > > > stabilizes.
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
>> >         > > > > > > madan.jampani@gmail.com>
>> >         > > > > > > > > > > wrote:
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >         @hagay: we agree on the end
>> state. I'm not
>> >         > too
>> >         > > > > > > particular
>> >         > > > > > > > > > about
>> >         > > > > > > > > > > > how we get
>> >         > > > > > > > > > > > >         there. If you think enabling it
>> now and fixes
>> >         > > > > > > regression
>> >         > > > > > > > > > later
>> >         > > > > > > > > > > > is doable,
>> >         > > > > > > > > > > > >         I'm fine with. I see a bit of a
>> chicken and
>> >         > egg
>> >         > > > > > > problem.
>> >         > > > > > > > We
>> >         > > > > > > > > > > need
>> >         > > > > > > > > > > > to get
>> >         > > > > > > > > > > > >         some fixes in even when the
>> status checks are
>> >         > > > > > failing.
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25
>> AM, Lupesko,
>> >         > > Hagay
>> >         > > > <
>> >         > > > > > > > > > > > lupesko@gmail.com> wrote:
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >         > @madan – re: getting to a
>> stable CI first:
>> >         > > > > > > > > > > > >         > I’m concerned that by not
>> enabling
>> >         > protected
>> >         > > > > branch
>> >         > > > > > > > mode
>> >         > > > > > > > > > > ASAP,
>> >         > > > > > > > > > > > we’re just
>> >         > > > > > > > > > > > >         > taking in more regressions,
>> which makes a
>> >         > > > stable
>> >         > > > > > > build
>> >         > > > > > > > a
>> >         > > > > > > > > > > > moving target for
>> >         > > > > > > > > > > > >         > us…
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha,
>> Sheng" <
>> >         > > > > > zhasheng@amazon.com
>> >         > > > > > > >
>> >         > > > > > > > > > wrote:
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >     Just one thing: please
>> don’t disable
>> >         > more
>> >         > > > > tests
>> >         > > > > > > or
>> >         > > > > > > > > just
>> >         > > > > > > > > > > > raise the
>> >         > > > > > > > > > > > >         > tolerance thresholds.
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >     Best regards,
>> >         > > > > > > > > > > > >         >     -sz
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM,
>> "Madan Jampani" <
>> >         > > > > > > > > > > > madan.jampani@gmail.com> wrote:
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >         +1
>> >         > > > > > > > > > > > >         >         Before we can turn
>> protected mode I
>> >         > > > feel
>> >         > > > > we
>> >         > > > > > > > > should
>> >         > > > > > > > > > > > first get to a
>> >         > > > > > > > > > > > >         > stable CI
>> >         > > > > > > > > > > > >         >         pipeline.
>> >         > > > > > > > > > > > >         >         Sandeep is chasing
>> down known
>> >         > > breaking
>> >         > > > > > > issues.
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >         On Thu, Aug 31, 2017
>> at 10:27 AM,
>> >         > > Hagay
>> >         > > > > > > > Lupesko <
>> >         > > > > > > > > > > > lupesko@gmail.com>
>> >         > > > > > > > > > > > >         > wrote:
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >         > Build stability is
>> a major issue,
>> >         > > > > builds
>> >         > > > > > > have
>> >         > > > > > > > > > been
>> >         > > > > > > > > > > > failing left
>> >         > > > > > > > > > > > >         > and right
>> >         > > > > > > > > > > > >         >         > over the last week.
>> Some of it is
>> >         > > due
>> >         > > > > to
>> >         > > > > > > > > Jenkins
>> >         > > > > > > > > > > > slave issues,
>> >         > > > > > > > > > > > >         > but some are
>> >         > > > > > > > > > > > >         >         > real regressions.
>> >         > > > > > > > > > > > >         >         > We need to be more
>> strict in the
>> >         > > code
>> >         > > > > > we're
>> >         > > > > > > > > > > > committing.
>> >         > > > > > > > > > > > >         >         >
>> >         > > > > > > > > > > > >         >         > I propose we
>> configure our master
>> >         > > to
>> >         > > > > be a
>> >         > > > > > > > > > protected
>> >         > > > > > > > > > > > branch (
>> >         > > > > > > > > > > > >         >         >
>> >         > > > > > > > > > > > https://help.github.com/articl
>> es/about-protected-
>> >         > > branches/
>> >         > > > ).
>> >         > > > > > > > > > > > >         >         >
>> >         > > > > > > > > > > > >         >         > Thoughts?
>> >         > > > > > > > > > > > >         >         >
>> >         > > > > > > > > > > > >         >         > On 2017-08-28
>> 22:41, sandeep
>> >         > > > > > krishnamurthy
>> >         > > > > > > <
>> >         > > > > > > > > > > > s...@gmail.com>
>> >         > > > > > > > > > > > >         > wrote:
>> >         > > > > > > > > > > > >         >         > > Hello Committers
>> and
>> >         > > Contributors,>
>> >         > > > > > > > > > > > >         >         > >
>> >         > > > > > > > > > > > >         >         > > Due to unstable
>> build
>> >         > pipelines,
>> >         > > > from
>> >         > > > > > > past
>> >         > > > > > > > 1
>> >         > > > > > > > > > > week,
>> >         > > > > > > > > > > > PRs are
>> >         > > > > > > > > > > > >         > being merged>
>> >         > > > > > > > > > > > >         >         > > after CR ignoring
>> PR build
>> >         > > status.
>> >         > > > > > Build
>> >         > > > > > > > > > pipeline
>> >         > > > > > > > > > > > is much more
>> >         > > > > > > > > > > > >         > stable
>> >         > > > > > > > > > > > >         >         > than>
>> >         > > > > > > > > > > > >         >         > > last week and
>> most of the build
>> >         > > > > > failures
>> >         > > > > > > > you
>> >         > > > > > > > > > see
>> >         > > > > > > > > > > > from now on,
>> >         > > > > > > > > > > > >         > are likely
>> >         > > > > > > > > > > > >         >         > to>
>> >         > > > > > > > > > > > >         >         > > be a valid
>> failure and hence,
>> >         > it
>> >         > > is
>> >         > > > > > > > > recommended
>> >         > > > > > > > > > > to
>> >         > > > > > > > > > > > wait for PR
>> >         > > > > > > > > > > > >         > builds,
>> >         > > > > > > > > > > > >         >         > see>
>> >         > > > > > > > > > > > >         >         > > the root cause of
>> any build
>> >         > > > failures
>> >         > > > > > > before
>> >         > > > > > > > > > > > proceeding with
>> >         > > > > > > > > > > > >         > merges.>
>> >         > > > > > > > > > > > >         >         > >
>> >         > > > > > > > > > > > >         >         > > At this point of
>> time, there
>> >         > are
>> >         > > 2
>> >         > > > > > > > > intermittent
>> >         > > > > > > > > > > > issue yet to
>> >         > > > > > > > > > > > >         > be fixed ->
>> >         > > > > > > > > > > > >         >         > > * Network error
>> leading to
>> >         > GitHub
>> >         > > > > > > requests
>> >         > > > > > > > > > > > throwing 404>
>> >         > > > > > > > > > > > >         >         > > * A conflict in
>> artifacts
>> >         > > generated
>> >         > > > > > > between
>> >         > > > > > > > > > > > branches/PR -
>> >         > > > > > > > > > > > >         > Cause unknown
>> >         > > > > > > > > > > > >         >         > yet.>
>> >         > > > > > > > > > > > >         >         > > These issues will
>> be fixed
>> >         > soon.>
>> >         > > > > > > > > > > > >         >         > >
>> >         > > > > > > > > > > > >         >         > >
>> >         > > > > > > > > > > > >         >         > > -- >
>> >         > > > > > > > > > > > >         >         > > Sandeep
>> Krishnamurthy>
>> >         > > > > > > > > > > > >         >         > >
>> >         > > > > > > > > > > > >         >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >         >
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > > >
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > > > --
>> >         > > > > > > > > > > > - Tsuyoshi
>> >         > > > > > > > > > > >
>> >         > > > > > > > > > >
>> >         > > > > > > > > >
>> >         > > > > > > > > >
>> >         > > > > > > > > >
>> >         > > > > > > > > > --
>> >         > > > > > > > > > Best Regards,
>> >         > > > > > > > > > Gautam Kumar
>> >         > > > > > > > > >
>> >         > > > > > > > >
>> >         > > > > > > >
>> >         > > > > > > >
>> >         > > > > > > >
>> >         > > > > > > > --
>> >         > > > > > > > Best Regards,
>> >         > > > > > > > Gautam Kumar
>> >         > > > > > > >
>> >         > > > > > >
>> >         > > > > >
>> >         > > > >
>> >         > > >
>> >         > >
>> >         >
>> >
>> >
>> >
>> >
>>
>
>
>
> --
> Best Regards,
> Gautam Kumar
>
>
>
>
>


-- 
Best Regards,
Gautam Kumar

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Steffen Rochel <st...@gmail.com>.
Hi Hen - we started to document the process -
https://cwiki.apache.org/confluence/display/MXNET/Development+Process
Contributions are welcome!
Steffen

On Sun, Oct 8, 2017 at 3:08 AM, Henri Yandell <ba...@apache.org> wrote:

> A late followup on this.
>
> Is the "How a committer develops on MXNet" documented anywhere?
> Staging/master etc? The more complex the development process, the harder it
> is for a newcomer to get involved on the project. I couldn't find it on the
> website/github.
>
> (I'd also note that the 'how to modify the website/documentation' also
> needs to be documented)
>
> I'd suggest that the how-to-dev doc also explain why it's bad for master to
> not build. One could argue that, outside of when a release is being made,
> master is not important to our users. Yes it should build, but an accident
> should only affect those who have put themselves on the bleeding edge.
>
> Hen
>
> On Mon, Oct 2, 2017 at 1:19 PM, Gautam <ga...@gmail.com> wrote:
>
> > Thanks All.
> >
> >  I've created the JIRA to mark the protected branch for master
> > https://issues.apache.org/jira/browse/INCUBATOR-205.
> > We also need to add all the committers to be code owner as discussed in
> the
> > slack, I've opened a PR for it
> > https://github.com/apache/incubator-mxnet/pull/8128.
> >
> > Good point Joern, I'll follow up on that.
> >
> > Regards,
> > Gautam
> >
> > On Fri, Sep 29, 2017 at 2:20 AM, Joern Kottmann <ko...@gmail.com>
> > wrote:
> >
> > > It also makes sense to block too old PRs from merging, because the
> > > test results are outdated and the build might fail after it gets
> > > merged.
> > >
> > > Jörn
> > >
> > > On Thu, Sep 28, 2017 at 9:14 PM, Zha, Sheng <zh...@amazon.com>
> wrote:
> > > > +1 on protected branch.
> > > >
> > > > Best regards,
> > > > -sz
> > > >
> > > > On 9/28/17, 11:48 AM, "Kumar, Gautam" <ga...@amazon.com> wrote:
> > > >
> > > >     Hi Guys,
> > > >
> > > >      Let’s focus on specific issue here.
> > > >
> > > >     Marking the master branch protected which involves “Only merge if
> > > checks has passed, and yes it will run the complete build”.
> > > >
> > > >     We can’t afford to degrade the quality and keep debugging the
> build
> > > failure forever. If it’s slow down the development at the cost of
> > quality I
> > > will vote for the quality.
> > > >     We can work on improving the infrastructure to improve the
> overall
> > > speed.  If you have any specific concerns on availability of Jenkins
> > please
> > > point out.
> > > >
> > > >     -Gautam
> > > >
> > > >
> > > >     On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com>
> > wrote:
> > > >
> > > >         -1000 on that. :)
> > > >
> > > >         On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <
> > > mnnaveen@gmail.com> wrote:
> > > >
> > > >         > PR->Sanity test/Linux build/test->reviewer/committer
> approves
> > > the
> > > >         > change->Comment "Build Now" (Or trigger on at least one
> > > approval from a
> > > >         > committer other than author)->*Full build-*>*passes
> > > build*->Enable Merge
> > > >         >
> > > >         > Let us take this particular topic to a separate thread or
> > > discuss offline
> > > >         > if further clarification is needed.
> > > >         >
> > > >         > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <
> > > cjolivier01@gmail.com>
> > > >         > wrote:
> > > >         >
> > > >         > > I understand the proposal.  How to trigger a build in
> that
> > > case?
> > > >         > >
> > > >         > >
> > > >         > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <
> > > madan.jampani@gmail.com>
> > > >         > > wrote:
> > > >         > >
> > > >         > > > Chris,
> > > >         > > > I don't think Naveen is suggesting that a merge happen
> > > without full
> > > >         > > > verification i.e. all tests across all platforms pass.
> > > >         > > > If a PR has some back and forth and results in multiple
> > > revisions
> > > >         > (which
> > > >         > > is
> > > >         > > > arguably more common than a random unit test failing),
> we
> > > simply delay
> > > >         > > full
> > > >         > > > verification until the owner/reviewer have settled on a
> > > mutually
> > > >         > > acceptable
> > > >         > > > state.
> > > >         > > >
> > > >         > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <
> > > cjolivier01@gmail.com
> > > >         > >
> > > >         > > > wrote:
> > > >         > > >
> > > >         > > > > -1 for running only partial tests.  Most failing unit
> > > tests that get
> > > >         > > > > through fail only for certain
> platforms/configurations.
> > > I personally
> > > >         > > > > prefer to be assured the build and test is good
> before
> > > merge.  Most
> > > >         > PR
> > > >         > > > > merges aren't in a huge hurry.
> > > >         > > > >
> > > >         > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <
> > > mnnaveen@gmail.com>
> > > >         > > > wrote:
> > > >         > > > >
> > > >         > > > > > +1 to make it protected. Here is what I am thinking
> > > for PR builds
> > > >         > > > > > on a PR run Sanity Tests + build/test one
> > > platform->committer
> > > >         > reviews
> > > >         > > > the
> > > >         > > > > > code and issues "Build Now", a full build is
> > > run->Github checks
> > > >         > that
> > > >         > > > the
> > > >         > > > > > full build checks succeed before it can be merged.
> > > >         > > > > >
> > > >         > > > > > I agree with Madan that PR should be approved by
> one
> > > another
> > > >         > > committer.
> > > >         > > > > >
> > > >         > > > > >
> > > >         > > > > >
> > > >         > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
> > > >         > > > madan.jampani@gmail.com>
> > > >         > > > > > wrote:
> > > >         > > > > >
> > > >         > > > > > > +1
> > > >         > > > > > >
> > > >         > > > > > > At a minimum I'd like to see the following two
> > > happen:
> > > >         > > > > > > - Option to merge is disabled until all required
> > > checks pass.
> > > >         > > > > > > - Code is reviewed and given +1 by at least one
> > > other committer
> > > >         > (no
> > > >         > > > > self
> > > >         > > > > > > review).
> > > >         > > > > > >
> > > >         > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <
> > > gautamnitc@gmail.com>
> > > >         > > > wrote:
> > > >         > > > > > >
> > > >         > > > > > > > Hi Chris,
> > > >         > > > > > > >
> > > >         > > > > > > >   Here <https://help.github.com/
> > > articles/about-protected-
> > > >         > > branches/
> > > >         > > > >
> > > >         > > > > is
> > > >         > > > > > > > user
> > > >         > > > > > > > document on semantics of protected branch.
> > > >         > > > > > > > In short when a branch is protected following
> > > applies to that
> > > >         > > > branch.
> > > >         > > > > > > >
> > > >         > > > > > > >    - Can't be force pushed
> > > >         > > > > > > >    - Can't be deleted
> > > >         > > > > > > >    - Can't have changes merged into it until
> > > required status
> > > >         > > checks
> > > >         > > > > > > >    <https://help.github.com/
> > > articles/about-required-
> > > >         > > status-checks>
> > > >         > > > > > pass
> > > >         > > > > > > >    - Can't have changes merged into it until
> > > required reviews
> > > >         > are
> > > >         > > > > > > approved
> > > >         > > > > > > >    <https://help.github.com/
> > > articles/approving-a-pull-
> > > >         > > > > > > > request-with-required-reviews>
> > > >         > > > > > > >    - Can't be edited or have files uploaded to
> it
> > > from the web
> > > >         > > > > > > >    - Can't have changes merged into it until
> > > changes to files
> > > >         > > that
> > > >         > > > > > > > have a designated
> > > >         > > > > > > >    code owner <https://help.github.com/
> > > >         > > articles/about-codeowners/>
> > > >         > > > > > have
> > > >         > > > > > > >    been approved by that owner
> > > >         > > > > > > >
> > > >         > > > > > > >  I am sure many of us might not want to have
> all
> > > these but we
> > > >         > can
> > > >         > > > > > debate
> > > >         > > > > > > on
> > > >         > > > > > > > it. My main motive was to "*Can't have changes
> > > merged into it
> > > >         > > until
> > > >         > > > > > > > required status checks pass*"
> > > >         > > > > > > >
> > > >         > > > > > > >
> > > >         > > > > > > > -Gautam
> > > >         > > > > > > >
> > > >         > > > > > > >
> > > >         > > > > > > >
> > > >         > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris
> Olivier <
> > > >         > > > > cjolivier01@gmail.com
> > > >         > > > > > >
> > > >         > > > > > > > wrote:
> > > >         > > > > > > >
> > > >         > > > > > > > > What does that mean? "Protected"? Protected
> > from
> > > what?
> > > >         > > > > > > > >
> > > >         > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
> > > >         > gautamnitc@gmail.com>
> > > >         > > > > > wrote:
> > > >         > > > > > > > >
> > > >         > > > > > > > > > Hi Chris,
> > > >         > > > > > > > > >
> > > >         > > > > > > > > >    I mean make "master branch protected" of
> > > MXNet.
> > > >         > > > > > > > > >
> > > >         > > > > > > > > > -Gautam
> > > >         > > > > > > > > >
> > > >         > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris
> > > Olivier <
> > > >         > > > > > > cjolivier01@gmail.com
> > > >         > > > > > > > >
> > > >         > > > > > > > > > wrote:
> > > >         > > > > > > > > >
> > > >         > > > > > > > > > > What does this mean? "Mx-net branch
> > > protected"?
> > > >         > > > > > > > > > >
> > > >         > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi
> > > OZAWA <
> > > >         > > > > > > > > ozawa.tsuyoshi@gmail.com
> > > >         > > > > > > > > > >
> > > >         > > > > > > > > > > wrote:
> > > >         > > > > > > > > > >
> > > >         > > > > > > > > > > > +1,
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > > > While I'm checking the recent build
> > > failures, and I
> > > >         > think
> > > >         > > > the
> > > >         > > > > > > > > decision
> > > >         > > > > > > > > > > > of making the mx-net branch protected
> is
> > > necessary for
> > > >         > > > stable
> > > >         > > > > > > > > > > > building.
> > > >         > > > > > > > > > > > Thanks Kumar for resuming important
> > > discussion.
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > > > Best regards
> > > >         > > > > > > > > > > > - Tsuyoshi
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM,
> Kumar,
> > > Gautam <
> > > >         > > > > > > gauta@amazon.com>
> > > >         > > > > > > > > > > wrote:
> > > >         > > > > > > > > > > > > Reviving the discussion.
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > > At this point of time we have couple
> of
> > > stable builds
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > >
> > > >         > https://builds.apache.org/view/Incubator%20Projects/job/
> > > >         > > > > > > > > > > incubator-mxnet/job/master/448/
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > >
> > > >         > https://builds.apache.org/view/Incubator%20Projects/job/
> > > >         > > > > > > > > > > incubator-mxnet/job/master/449/
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > > Should we have a quick discussion or
> > > polling on
> > > >         > making
> > > >         > > > the
> > > >         > > > > > > mx-net
> > > >         > > > > > > > > > > branch
> > > >         > > > > > > > > > > > protected? If you still think we
> > shouldn’t
> > > make it
> > > >         > > > protected
> > > >         > > > > > > please
> > > >         > > > > > > > > > > provide
> > > >         > > > > > > > > > > > a reason to support your claim.
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > > Few of us have concern over Jenkin’s
> > > stability. If I
> > > >         > > look
> > > >         > > > > two
> > > >         > > > > > > > weeks
> > > >         > > > > > > > > > > > back, after upgrading Linux slave to
> > g2.8x
> > > and new
> > > >         > > windows
> > > >         > > > > AMI,
> > > >         > > > > > > we
> > > >         > > > > > > > > have
> > > >         > > > > > > > > > > not
> > > >         > > > > > > > > > > > seen any case where instance died due
> to
> > > high memory
> > > >         > > usage
> > > >         > > > or
> > > >         > > > > > any
> > > >         > > > > > > > > > process
> > > >         > > > > > > > > > > > got killed due to high cpu usage or any
> > > other issue
> > > >         > with
> > > >         > > > > > windows
> > > >         > > > > > > > > > slaves.
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > > Going forward we are also planning
> that
> > > if we add any
> > > >         > > new
> > > >         > > > > > slave
> > > >         > > > > > > > we
> > > >         > > > > > > > > > will
> > > >         > > > > > > > > > > > not enable the main load immediately,
> but
> > > rather will
> > > >         > do
> > > >         > > > > ‘test
> > > >         > > > > > > > build’
> > > >         > > > > > > > > > to
> > > >         > > > > > > > > > > > make sure that new slaves are not
> causing
> > > any
> > > >         > > > infrastructure
> > > >         > > > > > > issue
> > > >         > > > > > > > > and
> > > >         > > > > > > > > > > > capable to perform as good as existing
> > > slaves.
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > > -Gautam
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko,
> Hagay" <
> > > >         > > lupesko@gmail.com
> > > >         > > > >
> > > >         > > > > > > wrote:
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >     @madan looking into some
> failures –
> > > you’re right…
> > > >         > > > > there’s
> > > >         > > > > > > > > > multiple
> > > >         > > > > > > > > > > > issues going on, some of them
> > > intermittent, and we want
> > > >         > > to
> > > >         > > > be
> > > >         > > > > > > able
> > > >         > > > > > > > to
> > > >         > > > > > > > > > > merge
> > > >         > > > > > > > > > > > fixes in.
> > > >         > > > > > > > > > > > >     Agreed that we can wait with
> > setting
> > > up protected
> > > >         > > > mode
> > > >         > > > > > > until
> > > >         > > > > > > > > > build
> > > >         > > > > > > > > > > > stabilizes.
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan
> Jampani"
> > <
> > > >         > > > > > > madan.jampani@gmail.com>
> > > >         > > > > > > > > > > wrote:
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >         @hagay: we agree on the end
> > > state. I'm not
> > > >         > too
> > > >         > > > > > > particular
> > > >         > > > > > > > > > about
> > > >         > > > > > > > > > > > how we get
> > > >         > > > > > > > > > > > >         there. If you think enabling
> it
> > > now and fixes
> > > >         > > > > > > regression
> > > >         > > > > > > > > > later
> > > >         > > > > > > > > > > > is doable,
> > > >         > > > > > > > > > > > >         I'm fine with. I see a bit
> of a
> > > chicken and
> > > >         > egg
> > > >         > > > > > > problem.
> > > >         > > > > > > > We
> > > >         > > > > > > > > > > need
> > > >         > > > > > > > > > > > to get
> > > >         > > > > > > > > > > > >         some fixes in even when the
> > > status checks are
> > > >         > > > > > failing.
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25
> > > AM, Lupesko,
> > > >         > > Hagay
> > > >         > > > <
> > > >         > > > > > > > > > > > lupesko@gmail.com> wrote:
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >         > @madan – re: getting to a
> > > stable CI first:
> > > >         > > > > > > > > > > > >         > I’m concerned that by not
> > > enabling
> > > >         > protected
> > > >         > > > > branch
> > > >         > > > > > > > mode
> > > >         > > > > > > > > > > ASAP,
> > > >         > > > > > > > > > > > we’re just
> > > >         > > > > > > > > > > > >         > taking in more regressions,
> > > which makes a
> > > >         > > > stable
> > > >         > > > > > > build
> > > >         > > > > > > > a
> > > >         > > > > > > > > > > > moving target for
> > > >         > > > > > > > > > > > >         > us…
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha,
> > > Sheng" <
> > > >         > > > > > zhasheng@amazon.com
> > > >         > > > > > > >
> > > >         > > > > > > > > > wrote:
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >     Just one thing: please
> > > don’t disable
> > > >         > more
> > > >         > > > > tests
> > > >         > > > > > > or
> > > >         > > > > > > > > just
> > > >         > > > > > > > > > > > raise the
> > > >         > > > > > > > > > > > >         > tolerance thresholds.
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >     Best regards,
> > > >         > > > > > > > > > > > >         >     -sz
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM,
> > > "Madan Jampani" <
> > > >         > > > > > > > > > > > madan.jampani@gmail.com> wrote:
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >         +1
> > > >         > > > > > > > > > > > >         >         Before we can turn
> > > protected mode I
> > > >         > > > feel
> > > >         > > > > we
> > > >         > > > > > > > > should
> > > >         > > > > > > > > > > > first get to a
> > > >         > > > > > > > > > > > >         > stable CI
> > > >         > > > > > > > > > > > >         >         pipeline.
> > > >         > > > > > > > > > > > >         >         Sandeep is chasing
> > > down known
> > > >         > > breaking
> > > >         > > > > > > issues.
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >         On Thu, Aug 31,
> 2017
> > > at 10:27 AM,
> > > >         > > Hagay
> > > >         > > > > > > > Lupesko <
> > > >         > > > > > > > > > > > lupesko@gmail.com>
> > > >         > > > > > > > > > > > >         > wrote:
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >         > Build stability
> is
> > a
> > > major issue,
> > > >         > > > > builds
> > > >         > > > > > > have
> > > >         > > > > > > > > > been
> > > >         > > > > > > > > > > > failing left
> > > >         > > > > > > > > > > > >         > and right
> > > >         > > > > > > > > > > > >         >         > over the last
> week.
> > > Some of it is
> > > >         > > due
> > > >         > > > > to
> > > >         > > > > > > > > Jenkins
> > > >         > > > > > > > > > > > slave issues,
> > > >         > > > > > > > > > > > >         > but some are
> > > >         > > > > > > > > > > > >         >         > real regressions.
> > > >         > > > > > > > > > > > >         >         > We need to be
> more
> > > strict in the
> > > >         > > code
> > > >         > > > > > we're
> > > >         > > > > > > > > > > > committing.
> > > >         > > > > > > > > > > > >         >         >
> > > >         > > > > > > > > > > > >         >         > I propose we
> > > configure our master
> > > >         > > to
> > > >         > > > > be a
> > > >         > > > > > > > > > protected
> > > >         > > > > > > > > > > > branch (
> > > >         > > > > > > > > > > > >         >         >
> > > >         > > > > > > > > > > > https://help.github.com/
> > > articles/about-protected-
> > > >         > > branches/
> > > >         > > > ).
> > > >         > > > > > > > > > > > >         >         >
> > > >         > > > > > > > > > > > >         >         > Thoughts?
> > > >         > > > > > > > > > > > >         >         >
> > > >         > > > > > > > > > > > >         >         > On 2017-08-28
> > 22:41,
> > > sandeep
> > > >         > > > > > krishnamurthy
> > > >         > > > > > > <
> > > >         > > > > > > > > > > > s...@gmail.com>
> > > >         > > > > > > > > > > > >         > wrote:
> > > >         > > > > > > > > > > > >         >         > > Hello
> Committers
> > > and
> > > >         > > Contributors,>
> > > >         > > > > > > > > > > > >         >         > >
> > > >         > > > > > > > > > > > >         >         > > Due to unstable
> > > build
> > > >         > pipelines,
> > > >         > > > from
> > > >         > > > > > > past
> > > >         > > > > > > > 1
> > > >         > > > > > > > > > > week,
> > > >         > > > > > > > > > > > PRs are
> > > >         > > > > > > > > > > > >         > being merged>
> > > >         > > > > > > > > > > > >         >         > > after CR
> ignoring
> > > PR build
> > > >         > > status.
> > > >         > > > > > Build
> > > >         > > > > > > > > > pipeline
> > > >         > > > > > > > > > > > is much more
> > > >         > > > > > > > > > > > >         > stable
> > > >         > > > > > > > > > > > >         >         > than>
> > > >         > > > > > > > > > > > >         >         > > last week and
> > most
> > > of the build
> > > >         > > > > > failures
> > > >         > > > > > > > you
> > > >         > > > > > > > > > see
> > > >         > > > > > > > > > > > from now on,
> > > >         > > > > > > > > > > > >         > are likely
> > > >         > > > > > > > > > > > >         >         > to>
> > > >         > > > > > > > > > > > >         >         > > be a valid
> > failure
> > > and hence,
> > > >         > it
> > > >         > > is
> > > >         > > > > > > > > recommended
> > > >         > > > > > > > > > > to
> > > >         > > > > > > > > > > > wait for PR
> > > >         > > > > > > > > > > > >         > builds,
> > > >         > > > > > > > > > > > >         >         > see>
> > > >         > > > > > > > > > > > >         >         > > the root cause
> of
> > > any build
> > > >         > > > failures
> > > >         > > > > > > before
> > > >         > > > > > > > > > > > proceeding with
> > > >         > > > > > > > > > > > >         > merges.>
> > > >         > > > > > > > > > > > >         >         > >
> > > >         > > > > > > > > > > > >         >         > > At this point
> of
> > > time, there
> > > >         > are
> > > >         > > 2
> > > >         > > > > > > > > intermittent
> > > >         > > > > > > > > > > > issue yet to
> > > >         > > > > > > > > > > > >         > be fixed ->
> > > >         > > > > > > > > > > > >         >         > > * Network error
> > > leading to
> > > >         > GitHub
> > > >         > > > > > > requests
> > > >         > > > > > > > > > > > throwing 404>
> > > >         > > > > > > > > > > > >         >         > > * A conflict in
> > > artifacts
> > > >         > > generated
> > > >         > > > > > > between
> > > >         > > > > > > > > > > > branches/PR -
> > > >         > > > > > > > > > > > >         > Cause unknown
> > > >         > > > > > > > > > > > >         >         > yet.>
> > > >         > > > > > > > > > > > >         >         > > These issues
> will
> > > be fixed
> > > >         > soon.>
> > > >         > > > > > > > > > > > >         >         > >
> > > >         > > > > > > > > > > > >         >         > >
> > > >         > > > > > > > > > > > >         >         > > -- >
> > > >         > > > > > > > > > > > >         >         > > Sandeep
> > > Krishnamurthy>
> > > >         > > > > > > > > > > > >         >         > >
> > > >         > > > > > > > > > > > >         >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >         >
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > > >
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > > > --
> > > >         > > > > > > > > > > > - Tsuyoshi
> > > >         > > > > > > > > > > >
> > > >         > > > > > > > > > >
> > > >         > > > > > > > > >
> > > >         > > > > > > > > >
> > > >         > > > > > > > > >
> > > >         > > > > > > > > > --
> > > >         > > > > > > > > > Best Regards,
> > > >         > > > > > > > > > Gautam Kumar
> > > >         > > > > > > > > >
> > > >         > > > > > > > >
> > > >         > > > > > > >
> > > >         > > > > > > >
> > > >         > > > > > > >
> > > >         > > > > > > > --
> > > >         > > > > > > > Best Regards,
> > > >         > > > > > > > Gautam Kumar
> > > >         > > > > > > >
> > > >         > > > > > >
> > > >         > > > > >
> > > >         > > > >
> > > >         > > >
> > > >         > >
> > > >         >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Gautam Kumar
> >
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Henri Yandell <ba...@apache.org>.
A late followup on this.

Is the "How a committer develops on MXNet" documented anywhere?
Staging/master etc? The more complex the development process, the harder it
is for a newcomer to get involved on the project. I couldn't find it on the
website/github.

(I'd also note that the 'how to modify the website/documentation' also
needs to be documented)

I'd suggest that the how-to-dev doc also explain why it's bad for master to
not build. One could argue that, outside of when a release is being made,
master is not important to our users. Yes it should build, but an accident
should only affect those who have put themselves on the bleeding edge.

Hen

On Mon, Oct 2, 2017 at 1:19 PM, Gautam <ga...@gmail.com> wrote:

> Thanks All.
>
>  I've created the JIRA to mark the protected branch for master
> https://issues.apache.org/jira/browse/INCUBATOR-205.
> We also need to add all the committers to be code owner as discussed in the
> slack, I've opened a PR for it
> https://github.com/apache/incubator-mxnet/pull/8128.
>
> Good point Joern, I'll follow up on that.
>
> Regards,
> Gautam
>
> On Fri, Sep 29, 2017 at 2:20 AM, Joern Kottmann <ko...@gmail.com>
> wrote:
>
> > It also makes sense to block too old PRs from merging, because the
> > test results are outdated and the build might fail after it gets
> > merged.
> >
> > Jörn
> >
> > On Thu, Sep 28, 2017 at 9:14 PM, Zha, Sheng <zh...@amazon.com> wrote:
> > > +1 on protected branch.
> > >
> > > Best regards,
> > > -sz
> > >
> > > On 9/28/17, 11:48 AM, "Kumar, Gautam" <ga...@amazon.com> wrote:
> > >
> > >     Hi Guys,
> > >
> > >      Let’s focus on specific issue here.
> > >
> > >     Marking the master branch protected which involves “Only merge if
> > checks has passed, and yes it will run the complete build”.
> > >
> > >     We can’t afford to degrade the quality and keep debugging the build
> > failure forever. If it’s slow down the development at the cost of
> quality I
> > will vote for the quality.
> > >     We can work on improving the infrastructure to improve the overall
> > speed.  If you have any specific concerns on availability of Jenkins
> please
> > point out.
> > >
> > >     -Gautam
> > >
> > >
> > >     On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com>
> wrote:
> > >
> > >         -1000 on that. :)
> > >
> > >         On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <
> > mnnaveen@gmail.com> wrote:
> > >
> > >         > PR->Sanity test/Linux build/test->reviewer/committer approves
> > the
> > >         > change->Comment "Build Now" (Or trigger on at least one
> > approval from a
> > >         > committer other than author)->*Full build-*>*passes
> > build*->Enable Merge
> > >         >
> > >         > Let us take this particular topic to a separate thread or
> > discuss offline
> > >         > if further clarification is needed.
> > >         >
> > >         > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <
> > cjolivier01@gmail.com>
> > >         > wrote:
> > >         >
> > >         > > I understand the proposal.  How to trigger a build in that
> > case?
> > >         > >
> > >         > >
> > >         > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <
> > madan.jampani@gmail.com>
> > >         > > wrote:
> > >         > >
> > >         > > > Chris,
> > >         > > > I don't think Naveen is suggesting that a merge happen
> > without full
> > >         > > > verification i.e. all tests across all platforms pass.
> > >         > > > If a PR has some back and forth and results in multiple
> > revisions
> > >         > (which
> > >         > > is
> > >         > > > arguably more common than a random unit test failing), we
> > simply delay
> > >         > > full
> > >         > > > verification until the owner/reviewer have settled on a
> > mutually
> > >         > > acceptable
> > >         > > > state.
> > >         > > >
> > >         > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <
> > cjolivier01@gmail.com
> > >         > >
> > >         > > > wrote:
> > >         > > >
> > >         > > > > -1 for running only partial tests.  Most failing unit
> > tests that get
> > >         > > > > through fail only for certain platforms/configurations.
> > I personally
> > >         > > > > prefer to be assured the build and test is good before
> > merge.  Most
> > >         > PR
> > >         > > > > merges aren't in a huge hurry.
> > >         > > > >
> > >         > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <
> > mnnaveen@gmail.com>
> > >         > > > wrote:
> > >         > > > >
> > >         > > > > > +1 to make it protected. Here is what I am thinking
> > for PR builds
> > >         > > > > > on a PR run Sanity Tests + build/test one
> > platform->committer
> > >         > reviews
> > >         > > > the
> > >         > > > > > code and issues "Build Now", a full build is
> > run->Github checks
> > >         > that
> > >         > > > the
> > >         > > > > > full build checks succeed before it can be merged.
> > >         > > > > >
> > >         > > > > > I agree with Madan that PR should be approved by one
> > another
> > >         > > committer.
> > >         > > > > >
> > >         > > > > >
> > >         > > > > >
> > >         > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
> > >         > > > madan.jampani@gmail.com>
> > >         > > > > > wrote:
> > >         > > > > >
> > >         > > > > > > +1
> > >         > > > > > >
> > >         > > > > > > At a minimum I'd like to see the following two
> > happen:
> > >         > > > > > > - Option to merge is disabled until all required
> > checks pass.
> > >         > > > > > > - Code is reviewed and given +1 by at least one
> > other committer
> > >         > (no
> > >         > > > > self
> > >         > > > > > > review).
> > >         > > > > > >
> > >         > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <
> > gautamnitc@gmail.com>
> > >         > > > wrote:
> > >         > > > > > >
> > >         > > > > > > > Hi Chris,
> > >         > > > > > > >
> > >         > > > > > > >   Here <https://help.github.com/
> > articles/about-protected-
> > >         > > branches/
> > >         > > > >
> > >         > > > > is
> > >         > > > > > > > user
> > >         > > > > > > > document on semantics of protected branch.
> > >         > > > > > > > In short when a branch is protected following
> > applies to that
> > >         > > > branch.
> > >         > > > > > > >
> > >         > > > > > > >    - Can't be force pushed
> > >         > > > > > > >    - Can't be deleted
> > >         > > > > > > >    - Can't have changes merged into it until
> > required status
> > >         > > checks
> > >         > > > > > > >    <https://help.github.com/
> > articles/about-required-
> > >         > > status-checks>
> > >         > > > > > pass
> > >         > > > > > > >    - Can't have changes merged into it until
> > required reviews
> > >         > are
> > >         > > > > > > approved
> > >         > > > > > > >    <https://help.github.com/
> > articles/approving-a-pull-
> > >         > > > > > > > request-with-required-reviews>
> > >         > > > > > > >    - Can't be edited or have files uploaded to it
> > from the web
> > >         > > > > > > >    - Can't have changes merged into it until
> > changes to files
> > >         > > that
> > >         > > > > > > > have a designated
> > >         > > > > > > >    code owner <https://help.github.com/
> > >         > > articles/about-codeowners/>
> > >         > > > > > have
> > >         > > > > > > >    been approved by that owner
> > >         > > > > > > >
> > >         > > > > > > >  I am sure many of us might not want to have all
> > these but we
> > >         > can
> > >         > > > > > debate
> > >         > > > > > > on
> > >         > > > > > > > it. My main motive was to "*Can't have changes
> > merged into it
> > >         > > until
> > >         > > > > > > > required status checks pass*"
> > >         > > > > > > >
> > >         > > > > > > >
> > >         > > > > > > > -Gautam
> > >         > > > > > > >
> > >         > > > > > > >
> > >         > > > > > > >
> > >         > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
> > >         > > > > cjolivier01@gmail.com
> > >         > > > > > >
> > >         > > > > > > > wrote:
> > >         > > > > > > >
> > >         > > > > > > > > What does that mean? "Protected"? Protected
> from
> > what?
> > >         > > > > > > > >
> > >         > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
> > >         > gautamnitc@gmail.com>
> > >         > > > > > wrote:
> > >         > > > > > > > >
> > >         > > > > > > > > > Hi Chris,
> > >         > > > > > > > > >
> > >         > > > > > > > > >    I mean make "master branch protected" of
> > MXNet.
> > >         > > > > > > > > >
> > >         > > > > > > > > > -Gautam
> > >         > > > > > > > > >
> > >         > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris
> > Olivier <
> > >         > > > > > > cjolivier01@gmail.com
> > >         > > > > > > > >
> > >         > > > > > > > > > wrote:
> > >         > > > > > > > > >
> > >         > > > > > > > > > > What does this mean? "Mx-net branch
> > protected"?
> > >         > > > > > > > > > >
> > >         > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi
> > OZAWA <
> > >         > > > > > > > > ozawa.tsuyoshi@gmail.com
> > >         > > > > > > > > > >
> > >         > > > > > > > > > > wrote:
> > >         > > > > > > > > > >
> > >         > > > > > > > > > > > +1,
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > > > While I'm checking the recent build
> > failures, and I
> > >         > think
> > >         > > > the
> > >         > > > > > > > > decision
> > >         > > > > > > > > > > > of making the mx-net branch protected is
> > necessary for
> > >         > > > stable
> > >         > > > > > > > > > > > building.
> > >         > > > > > > > > > > > Thanks Kumar for resuming important
> > discussion.
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > > > Best regards
> > >         > > > > > > > > > > > - Tsuyoshi
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar,
> > Gautam <
> > >         > > > > > > gauta@amazon.com>
> > >         > > > > > > > > > > wrote:
> > >         > > > > > > > > > > > > Reviving the discussion.
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > > At this point of time we have couple of
> > stable builds
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > >
> > >         > https://builds.apache.org/view/Incubator%20Projects/job/
> > >         > > > > > > > > > > incubator-mxnet/job/master/448/
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > >
> > >         > https://builds.apache.org/view/Incubator%20Projects/job/
> > >         > > > > > > > > > > incubator-mxnet/job/master/449/
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > > Should we have a quick discussion or
> > polling on
> > >         > making
> > >         > > > the
> > >         > > > > > > mx-net
> > >         > > > > > > > > > > branch
> > >         > > > > > > > > > > > protected? If you still think we
> shouldn’t
> > make it
> > >         > > > protected
> > >         > > > > > > please
> > >         > > > > > > > > > > provide
> > >         > > > > > > > > > > > a reason to support your claim.
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > > Few of us have concern over Jenkin’s
> > stability. If I
> > >         > > look
> > >         > > > > two
> > >         > > > > > > > weeks
> > >         > > > > > > > > > > > back, after upgrading Linux slave to
> g2.8x
> > and new
> > >         > > windows
> > >         > > > > AMI,
> > >         > > > > > > we
> > >         > > > > > > > > have
> > >         > > > > > > > > > > not
> > >         > > > > > > > > > > > seen any case where instance died due to
> > high memory
> > >         > > usage
> > >         > > > or
> > >         > > > > > any
> > >         > > > > > > > > > process
> > >         > > > > > > > > > > > got killed due to high cpu usage or any
> > other issue
> > >         > with
> > >         > > > > > windows
> > >         > > > > > > > > > slaves.
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > > Going forward we are also planning that
> > if we add any
> > >         > > new
> > >         > > > > > slave
> > >         > > > > > > > we
> > >         > > > > > > > > > will
> > >         > > > > > > > > > > > not enable the main load immediately, but
> > rather will
> > >         > do
> > >         > > > > ‘test
> > >         > > > > > > > build’
> > >         > > > > > > > > > to
> > >         > > > > > > > > > > > make sure that new slaves are not causing
> > any
> > >         > > > infrastructure
> > >         > > > > > > issue
> > >         > > > > > > > > and
> > >         > > > > > > > > > > > capable to perform as good as existing
> > slaves.
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > > -Gautam
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
> > >         > > lupesko@gmail.com
> > >         > > > >
> > >         > > > > > > wrote:
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >     @madan looking into some failures –
> > you’re right…
> > >         > > > > there’s
> > >         > > > > > > > > > multiple
> > >         > > > > > > > > > > > issues going on, some of them
> > intermittent, and we want
> > >         > > to
> > >         > > > be
> > >         > > > > > > able
> > >         > > > > > > > to
> > >         > > > > > > > > > > merge
> > >         > > > > > > > > > > > fixes in.
> > >         > > > > > > > > > > > >     Agreed that we can wait with
> setting
> > up protected
> > >         > > > mode
> > >         > > > > > > until
> > >         > > > > > > > > > build
> > >         > > > > > > > > > > > stabilizes.
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani"
> <
> > >         > > > > > > madan.jampani@gmail.com>
> > >         > > > > > > > > > > wrote:
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >         @hagay: we agree on the end
> > state. I'm not
> > >         > too
> > >         > > > > > > particular
> > >         > > > > > > > > > about
> > >         > > > > > > > > > > > how we get
> > >         > > > > > > > > > > > >         there. If you think enabling it
> > now and fixes
> > >         > > > > > > regression
> > >         > > > > > > > > > later
> > >         > > > > > > > > > > > is doable,
> > >         > > > > > > > > > > > >         I'm fine with. I see a bit of a
> > chicken and
> > >         > egg
> > >         > > > > > > problem.
> > >         > > > > > > > We
> > >         > > > > > > > > > > need
> > >         > > > > > > > > > > > to get
> > >         > > > > > > > > > > > >         some fixes in even when the
> > status checks are
> > >         > > > > > failing.
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25
> > AM, Lupesko,
> > >         > > Hagay
> > >         > > > <
> > >         > > > > > > > > > > > lupesko@gmail.com> wrote:
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >         > @madan – re: getting to a
> > stable CI first:
> > >         > > > > > > > > > > > >         > I’m concerned that by not
> > enabling
> > >         > protected
> > >         > > > > branch
> > >         > > > > > > > mode
> > >         > > > > > > > > > > ASAP,
> > >         > > > > > > > > > > > we’re just
> > >         > > > > > > > > > > > >         > taking in more regressions,
> > which makes a
> > >         > > > stable
> > >         > > > > > > build
> > >         > > > > > > > a
> > >         > > > > > > > > > > > moving target for
> > >         > > > > > > > > > > > >         > us…
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha,
> > Sheng" <
> > >         > > > > > zhasheng@amazon.com
> > >         > > > > > > >
> > >         > > > > > > > > > wrote:
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >     Just one thing: please
> > don’t disable
> > >         > more
> > >         > > > > tests
> > >         > > > > > > or
> > >         > > > > > > > > just
> > >         > > > > > > > > > > > raise the
> > >         > > > > > > > > > > > >         > tolerance thresholds.
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >     Best regards,
> > >         > > > > > > > > > > > >         >     -sz
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM,
> > "Madan Jampani" <
> > >         > > > > > > > > > > > madan.jampani@gmail.com> wrote:
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >         +1
> > >         > > > > > > > > > > > >         >         Before we can turn
> > protected mode I
> > >         > > > feel
> > >         > > > > we
> > >         > > > > > > > > should
> > >         > > > > > > > > > > > first get to a
> > >         > > > > > > > > > > > >         > stable CI
> > >         > > > > > > > > > > > >         >         pipeline.
> > >         > > > > > > > > > > > >         >         Sandeep is chasing
> > down known
> > >         > > breaking
> > >         > > > > > > issues.
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >         On Thu, Aug 31, 2017
> > at 10:27 AM,
> > >         > > Hagay
> > >         > > > > > > > Lupesko <
> > >         > > > > > > > > > > > lupesko@gmail.com>
> > >         > > > > > > > > > > > >         > wrote:
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >         > Build stability is
> a
> > major issue,
> > >         > > > > builds
> > >         > > > > > > have
> > >         > > > > > > > > > been
> > >         > > > > > > > > > > > failing left
> > >         > > > > > > > > > > > >         > and right
> > >         > > > > > > > > > > > >         >         > over the last week.
> > Some of it is
> > >         > > due
> > >         > > > > to
> > >         > > > > > > > > Jenkins
> > >         > > > > > > > > > > > slave issues,
> > >         > > > > > > > > > > > >         > but some are
> > >         > > > > > > > > > > > >         >         > real regressions.
> > >         > > > > > > > > > > > >         >         > We need to be more
> > strict in the
> > >         > > code
> > >         > > > > > we're
> > >         > > > > > > > > > > > committing.
> > >         > > > > > > > > > > > >         >         >
> > >         > > > > > > > > > > > >         >         > I propose we
> > configure our master
> > >         > > to
> > >         > > > > be a
> > >         > > > > > > > > > protected
> > >         > > > > > > > > > > > branch (
> > >         > > > > > > > > > > > >         >         >
> > >         > > > > > > > > > > > https://help.github.com/
> > articles/about-protected-
> > >         > > branches/
> > >         > > > ).
> > >         > > > > > > > > > > > >         >         >
> > >         > > > > > > > > > > > >         >         > Thoughts?
> > >         > > > > > > > > > > > >         >         >
> > >         > > > > > > > > > > > >         >         > On 2017-08-28
> 22:41,
> > sandeep
> > >         > > > > > krishnamurthy
> > >         > > > > > > <
> > >         > > > > > > > > > > > s...@gmail.com>
> > >         > > > > > > > > > > > >         > wrote:
> > >         > > > > > > > > > > > >         >         > > Hello Committers
> > and
> > >         > > Contributors,>
> > >         > > > > > > > > > > > >         >         > >
> > >         > > > > > > > > > > > >         >         > > Due to unstable
> > build
> > >         > pipelines,
> > >         > > > from
> > >         > > > > > > past
> > >         > > > > > > > 1
> > >         > > > > > > > > > > week,
> > >         > > > > > > > > > > > PRs are
> > >         > > > > > > > > > > > >         > being merged>
> > >         > > > > > > > > > > > >         >         > > after CR ignoring
> > PR build
> > >         > > status.
> > >         > > > > > Build
> > >         > > > > > > > > > pipeline
> > >         > > > > > > > > > > > is much more
> > >         > > > > > > > > > > > >         > stable
> > >         > > > > > > > > > > > >         >         > than>
> > >         > > > > > > > > > > > >         >         > > last week and
> most
> > of the build
> > >         > > > > > failures
> > >         > > > > > > > you
> > >         > > > > > > > > > see
> > >         > > > > > > > > > > > from now on,
> > >         > > > > > > > > > > > >         > are likely
> > >         > > > > > > > > > > > >         >         > to>
> > >         > > > > > > > > > > > >         >         > > be a valid
> failure
> > and hence,
> > >         > it
> > >         > > is
> > >         > > > > > > > > recommended
> > >         > > > > > > > > > > to
> > >         > > > > > > > > > > > wait for PR
> > >         > > > > > > > > > > > >         > builds,
> > >         > > > > > > > > > > > >         >         > see>
> > >         > > > > > > > > > > > >         >         > > the root cause of
> > any build
> > >         > > > failures
> > >         > > > > > > before
> > >         > > > > > > > > > > > proceeding with
> > >         > > > > > > > > > > > >         > merges.>
> > >         > > > > > > > > > > > >         >         > >
> > >         > > > > > > > > > > > >         >         > > At this point of
> > time, there
> > >         > are
> > >         > > 2
> > >         > > > > > > > > intermittent
> > >         > > > > > > > > > > > issue yet to
> > >         > > > > > > > > > > > >         > be fixed ->
> > >         > > > > > > > > > > > >         >         > > * Network error
> > leading to
> > >         > GitHub
> > >         > > > > > > requests
> > >         > > > > > > > > > > > throwing 404>
> > >         > > > > > > > > > > > >         >         > > * A conflict in
> > artifacts
> > >         > > generated
> > >         > > > > > > between
> > >         > > > > > > > > > > > branches/PR -
> > >         > > > > > > > > > > > >         > Cause unknown
> > >         > > > > > > > > > > > >         >         > yet.>
> > >         > > > > > > > > > > > >         >         > > These issues will
> > be fixed
> > >         > soon.>
> > >         > > > > > > > > > > > >         >         > >
> > >         > > > > > > > > > > > >         >         > >
> > >         > > > > > > > > > > > >         >         > > -- >
> > >         > > > > > > > > > > > >         >         > > Sandeep
> > Krishnamurthy>
> > >         > > > > > > > > > > > >         >         > >
> > >         > > > > > > > > > > > >         >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >         >
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > > >
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > > > --
> > >         > > > > > > > > > > > - Tsuyoshi
> > >         > > > > > > > > > > >
> > >         > > > > > > > > > >
> > >         > > > > > > > > >
> > >         > > > > > > > > >
> > >         > > > > > > > > >
> > >         > > > > > > > > > --
> > >         > > > > > > > > > Best Regards,
> > >         > > > > > > > > > Gautam Kumar
> > >         > > > > > > > > >
> > >         > > > > > > > >
> > >         > > > > > > >
> > >         > > > > > > >
> > >         > > > > > > >
> > >         > > > > > > > --
> > >         > > > > > > > Best Regards,
> > >         > > > > > > > Gautam Kumar
> > >         > > > > > > >
> > >         > > > > > >
> > >         > > > > >
> > >         > > > >
> > >         > > >
> > >         > >
> > >         >
> > >
> > >
> > >
> > >
> >
>
>
>
> --
> Best Regards,
> Gautam Kumar
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Gautam <ga...@gmail.com>.
Thanks All.

 I've created the JIRA to mark the protected branch for master
https://issues.apache.org/jira/browse/INCUBATOR-205.
We also need to add all the committers to be code owner as discussed in the
slack, I've opened a PR for it
https://github.com/apache/incubator-mxnet/pull/8128.

Good point Joern, I'll follow up on that.

Regards,
Gautam

On Fri, Sep 29, 2017 at 2:20 AM, Joern Kottmann <ko...@gmail.com> wrote:

> It also makes sense to block too old PRs from merging, because the
> test results are outdated and the build might fail after it gets
> merged.
>
> Jörn
>
> On Thu, Sep 28, 2017 at 9:14 PM, Zha, Sheng <zh...@amazon.com> wrote:
> > +1 on protected branch.
> >
> > Best regards,
> > -sz
> >
> > On 9/28/17, 11:48 AM, "Kumar, Gautam" <ga...@amazon.com> wrote:
> >
> >     Hi Guys,
> >
> >      Let’s focus on specific issue here.
> >
> >     Marking the master branch protected which involves “Only merge if
> checks has passed, and yes it will run the complete build”.
> >
> >     We can’t afford to degrade the quality and keep debugging the build
> failure forever. If it’s slow down the development at the cost of quality I
> will vote for the quality.
> >     We can work on improving the infrastructure to improve the overall
> speed.  If you have any specific concerns on availability of Jenkins please
> point out.
> >
> >     -Gautam
> >
> >
> >     On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com> wrote:
> >
> >         -1000 on that. :)
> >
> >         On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <
> mnnaveen@gmail.com> wrote:
> >
> >         > PR->Sanity test/Linux build/test->reviewer/committer approves
> the
> >         > change->Comment "Build Now" (Or trigger on at least one
> approval from a
> >         > committer other than author)->*Full build-*>*passes
> build*->Enable Merge
> >         >
> >         > Let us take this particular topic to a separate thread or
> discuss offline
> >         > if further clarification is needed.
> >         >
> >         > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <
> cjolivier01@gmail.com>
> >         > wrote:
> >         >
> >         > > I understand the proposal.  How to trigger a build in that
> case?
> >         > >
> >         > >
> >         > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <
> madan.jampani@gmail.com>
> >         > > wrote:
> >         > >
> >         > > > Chris,
> >         > > > I don't think Naveen is suggesting that a merge happen
> without full
> >         > > > verification i.e. all tests across all platforms pass.
> >         > > > If a PR has some back and forth and results in multiple
> revisions
> >         > (which
> >         > > is
> >         > > > arguably more common than a random unit test failing), we
> simply delay
> >         > > full
> >         > > > verification until the owner/reviewer have settled on a
> mutually
> >         > > acceptable
> >         > > > state.
> >         > > >
> >         > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <
> cjolivier01@gmail.com
> >         > >
> >         > > > wrote:
> >         > > >
> >         > > > > -1 for running only partial tests.  Most failing unit
> tests that get
> >         > > > > through fail only for certain platforms/configurations.
> I personally
> >         > > > > prefer to be assured the build and test is good before
> merge.  Most
> >         > PR
> >         > > > > merges aren't in a huge hurry.
> >         > > > >
> >         > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <
> mnnaveen@gmail.com>
> >         > > > wrote:
> >         > > > >
> >         > > > > > +1 to make it protected. Here is what I am thinking
> for PR builds
> >         > > > > > on a PR run Sanity Tests + build/test one
> platform->committer
> >         > reviews
> >         > > > the
> >         > > > > > code and issues "Build Now", a full build is
> run->Github checks
> >         > that
> >         > > > the
> >         > > > > > full build checks succeed before it can be merged.
> >         > > > > >
> >         > > > > > I agree with Madan that PR should be approved by one
> another
> >         > > committer.
> >         > > > > >
> >         > > > > >
> >         > > > > >
> >         > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
> >         > > > madan.jampani@gmail.com>
> >         > > > > > wrote:
> >         > > > > >
> >         > > > > > > +1
> >         > > > > > >
> >         > > > > > > At a minimum I'd like to see the following two
> happen:
> >         > > > > > > - Option to merge is disabled until all required
> checks pass.
> >         > > > > > > - Code is reviewed and given +1 by at least one
> other committer
> >         > (no
> >         > > > > self
> >         > > > > > > review).
> >         > > > > > >
> >         > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <
> gautamnitc@gmail.com>
> >         > > > wrote:
> >         > > > > > >
> >         > > > > > > > Hi Chris,
> >         > > > > > > >
> >         > > > > > > >   Here <https://help.github.com/
> articles/about-protected-
> >         > > branches/
> >         > > > >
> >         > > > > is
> >         > > > > > > > user
> >         > > > > > > > document on semantics of protected branch.
> >         > > > > > > > In short when a branch is protected following
> applies to that
> >         > > > branch.
> >         > > > > > > >
> >         > > > > > > >    - Can't be force pushed
> >         > > > > > > >    - Can't be deleted
> >         > > > > > > >    - Can't have changes merged into it until
> required status
> >         > > checks
> >         > > > > > > >    <https://help.github.com/
> articles/about-required-
> >         > > status-checks>
> >         > > > > > pass
> >         > > > > > > >    - Can't have changes merged into it until
> required reviews
> >         > are
> >         > > > > > > approved
> >         > > > > > > >    <https://help.github.com/
> articles/approving-a-pull-
> >         > > > > > > > request-with-required-reviews>
> >         > > > > > > >    - Can't be edited or have files uploaded to it
> from the web
> >         > > > > > > >    - Can't have changes merged into it until
> changes to files
> >         > > that
> >         > > > > > > > have a designated
> >         > > > > > > >    code owner <https://help.github.com/
> >         > > articles/about-codeowners/>
> >         > > > > > have
> >         > > > > > > >    been approved by that owner
> >         > > > > > > >
> >         > > > > > > >  I am sure many of us might not want to have all
> these but we
> >         > can
> >         > > > > > debate
> >         > > > > > > on
> >         > > > > > > > it. My main motive was to "*Can't have changes
> merged into it
> >         > > until
> >         > > > > > > > required status checks pass*"
> >         > > > > > > >
> >         > > > > > > >
> >         > > > > > > > -Gautam
> >         > > > > > > >
> >         > > > > > > >
> >         > > > > > > >
> >         > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
> >         > > > > cjolivier01@gmail.com
> >         > > > > > >
> >         > > > > > > > wrote:
> >         > > > > > > >
> >         > > > > > > > > What does that mean? "Protected"? Protected from
> what?
> >         > > > > > > > >
> >         > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
> >         > gautamnitc@gmail.com>
> >         > > > > > wrote:
> >         > > > > > > > >
> >         > > > > > > > > > Hi Chris,
> >         > > > > > > > > >
> >         > > > > > > > > >    I mean make "master branch protected" of
> MXNet.
> >         > > > > > > > > >
> >         > > > > > > > > > -Gautam
> >         > > > > > > > > >
> >         > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris
> Olivier <
> >         > > > > > > cjolivier01@gmail.com
> >         > > > > > > > >
> >         > > > > > > > > > wrote:
> >         > > > > > > > > >
> >         > > > > > > > > > > What does this mean? "Mx-net branch
> protected"?
> >         > > > > > > > > > >
> >         > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi
> OZAWA <
> >         > > > > > > > > ozawa.tsuyoshi@gmail.com
> >         > > > > > > > > > >
> >         > > > > > > > > > > wrote:
> >         > > > > > > > > > >
> >         > > > > > > > > > > > +1,
> >         > > > > > > > > > > >
> >         > > > > > > > > > > > While I'm checking the recent build
> failures, and I
> >         > think
> >         > > > the
> >         > > > > > > > > decision
> >         > > > > > > > > > > > of making the mx-net branch protected is
> necessary for
> >         > > > stable
> >         > > > > > > > > > > > building.
> >         > > > > > > > > > > > Thanks Kumar for resuming important
> discussion.
> >         > > > > > > > > > > >
> >         > > > > > > > > > > > Best regards
> >         > > > > > > > > > > > - Tsuyoshi
> >         > > > > > > > > > > >
> >         > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar,
> Gautam <
> >         > > > > > > gauta@amazon.com>
> >         > > > > > > > > > > wrote:
> >         > > > > > > > > > > > > Reviving the discussion.
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > > At this point of time we have couple of
> stable builds
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > >
> >         > https://builds.apache.org/view/Incubator%20Projects/job/
> >         > > > > > > > > > > incubator-mxnet/job/master/448/
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > >
> >         > https://builds.apache.org/view/Incubator%20Projects/job/
> >         > > > > > > > > > > incubator-mxnet/job/master/449/
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > > Should we have a quick discussion or
> polling on
> >         > making
> >         > > > the
> >         > > > > > > mx-net
> >         > > > > > > > > > > branch
> >         > > > > > > > > > > > protected? If you still think we shouldn’t
> make it
> >         > > > protected
> >         > > > > > > please
> >         > > > > > > > > > > provide
> >         > > > > > > > > > > > a reason to support your claim.
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > > Few of us have concern over Jenkin’s
> stability. If I
> >         > > look
> >         > > > > two
> >         > > > > > > > weeks
> >         > > > > > > > > > > > back, after upgrading Linux slave to g2.8x
> and new
> >         > > windows
> >         > > > > AMI,
> >         > > > > > > we
> >         > > > > > > > > have
> >         > > > > > > > > > > not
> >         > > > > > > > > > > > seen any case where instance died due to
> high memory
> >         > > usage
> >         > > > or
> >         > > > > > any
> >         > > > > > > > > > process
> >         > > > > > > > > > > > got killed due to high cpu usage or any
> other issue
> >         > with
> >         > > > > > windows
> >         > > > > > > > > > slaves.
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > > Going forward we are also planning that
> if we add any
> >         > > new
> >         > > > > > slave
> >         > > > > > > > we
> >         > > > > > > > > > will
> >         > > > > > > > > > > > not enable the main load immediately, but
> rather will
> >         > do
> >         > > > > ‘test
> >         > > > > > > > build’
> >         > > > > > > > > > to
> >         > > > > > > > > > > > make sure that new slaves are not causing
> any
> >         > > > infrastructure
> >         > > > > > > issue
> >         > > > > > > > > and
> >         > > > > > > > > > > > capable to perform as good as existing
> slaves.
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > > -Gautam
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
> >         > > lupesko@gmail.com
> >         > > > >
> >         > > > > > > wrote:
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >     @madan looking into some failures –
> you’re right…
> >         > > > > there’s
> >         > > > > > > > > > multiple
> >         > > > > > > > > > > > issues going on, some of them
> intermittent, and we want
> >         > > to
> >         > > > be
> >         > > > > > > able
> >         > > > > > > > to
> >         > > > > > > > > > > merge
> >         > > > > > > > > > > > fixes in.
> >         > > > > > > > > > > > >     Agreed that we can wait with setting
> up protected
> >         > > > mode
> >         > > > > > > until
> >         > > > > > > > > > build
> >         > > > > > > > > > > > stabilizes.
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
> >         > > > > > > madan.jampani@gmail.com>
> >         > > > > > > > > > > wrote:
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >         @hagay: we agree on the end
> state. I'm not
> >         > too
> >         > > > > > > particular
> >         > > > > > > > > > about
> >         > > > > > > > > > > > how we get
> >         > > > > > > > > > > > >         there. If you think enabling it
> now and fixes
> >         > > > > > > regression
> >         > > > > > > > > > later
> >         > > > > > > > > > > > is doable,
> >         > > > > > > > > > > > >         I'm fine with. I see a bit of a
> chicken and
> >         > egg
> >         > > > > > > problem.
> >         > > > > > > > We
> >         > > > > > > > > > > need
> >         > > > > > > > > > > > to get
> >         > > > > > > > > > > > >         some fixes in even when the
> status checks are
> >         > > > > > failing.
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25
> AM, Lupesko,
> >         > > Hagay
> >         > > > <
> >         > > > > > > > > > > > lupesko@gmail.com> wrote:
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >         > @madan – re: getting to a
> stable CI first:
> >         > > > > > > > > > > > >         > I’m concerned that by not
> enabling
> >         > protected
> >         > > > > branch
> >         > > > > > > > mode
> >         > > > > > > > > > > ASAP,
> >         > > > > > > > > > > > we’re just
> >         > > > > > > > > > > > >         > taking in more regressions,
> which makes a
> >         > > > stable
> >         > > > > > > build
> >         > > > > > > > a
> >         > > > > > > > > > > > moving target for
> >         > > > > > > > > > > > >         > us…
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha,
> Sheng" <
> >         > > > > > zhasheng@amazon.com
> >         > > > > > > >
> >         > > > > > > > > > wrote:
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >     Just one thing: please
> don’t disable
> >         > more
> >         > > > > tests
> >         > > > > > > or
> >         > > > > > > > > just
> >         > > > > > > > > > > > raise the
> >         > > > > > > > > > > > >         > tolerance thresholds.
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >     Best regards,
> >         > > > > > > > > > > > >         >     -sz
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM,
> "Madan Jampani" <
> >         > > > > > > > > > > > madan.jampani@gmail.com> wrote:
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >         +1
> >         > > > > > > > > > > > >         >         Before we can turn
> protected mode I
> >         > > > feel
> >         > > > > we
> >         > > > > > > > > should
> >         > > > > > > > > > > > first get to a
> >         > > > > > > > > > > > >         > stable CI
> >         > > > > > > > > > > > >         >         pipeline.
> >         > > > > > > > > > > > >         >         Sandeep is chasing
> down known
> >         > > breaking
> >         > > > > > > issues.
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >         On Thu, Aug 31, 2017
> at 10:27 AM,
> >         > > Hagay
> >         > > > > > > > Lupesko <
> >         > > > > > > > > > > > lupesko@gmail.com>
> >         > > > > > > > > > > > >         > wrote:
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >         > Build stability is a
> major issue,
> >         > > > > builds
> >         > > > > > > have
> >         > > > > > > > > > been
> >         > > > > > > > > > > > failing left
> >         > > > > > > > > > > > >         > and right
> >         > > > > > > > > > > > >         >         > over the last week.
> Some of it is
> >         > > due
> >         > > > > to
> >         > > > > > > > > Jenkins
> >         > > > > > > > > > > > slave issues,
> >         > > > > > > > > > > > >         > but some are
> >         > > > > > > > > > > > >         >         > real regressions.
> >         > > > > > > > > > > > >         >         > We need to be more
> strict in the
> >         > > code
> >         > > > > > we're
> >         > > > > > > > > > > > committing.
> >         > > > > > > > > > > > >         >         >
> >         > > > > > > > > > > > >         >         > I propose we
> configure our master
> >         > > to
> >         > > > > be a
> >         > > > > > > > > > protected
> >         > > > > > > > > > > > branch (
> >         > > > > > > > > > > > >         >         >
> >         > > > > > > > > > > > https://help.github.com/
> articles/about-protected-
> >         > > branches/
> >         > > > ).
> >         > > > > > > > > > > > >         >         >
> >         > > > > > > > > > > > >         >         > Thoughts?
> >         > > > > > > > > > > > >         >         >
> >         > > > > > > > > > > > >         >         > On 2017-08-28 22:41,
> sandeep
> >         > > > > > krishnamurthy
> >         > > > > > > <
> >         > > > > > > > > > > > s...@gmail.com>
> >         > > > > > > > > > > > >         > wrote:
> >         > > > > > > > > > > > >         >         > > Hello Committers
> and
> >         > > Contributors,>
> >         > > > > > > > > > > > >         >         > >
> >         > > > > > > > > > > > >         >         > > Due to unstable
> build
> >         > pipelines,
> >         > > > from
> >         > > > > > > past
> >         > > > > > > > 1
> >         > > > > > > > > > > week,
> >         > > > > > > > > > > > PRs are
> >         > > > > > > > > > > > >         > being merged>
> >         > > > > > > > > > > > >         >         > > after CR ignoring
> PR build
> >         > > status.
> >         > > > > > Build
> >         > > > > > > > > > pipeline
> >         > > > > > > > > > > > is much more
> >         > > > > > > > > > > > >         > stable
> >         > > > > > > > > > > > >         >         > than>
> >         > > > > > > > > > > > >         >         > > last week and most
> of the build
> >         > > > > > failures
> >         > > > > > > > you
> >         > > > > > > > > > see
> >         > > > > > > > > > > > from now on,
> >         > > > > > > > > > > > >         > are likely
> >         > > > > > > > > > > > >         >         > to>
> >         > > > > > > > > > > > >         >         > > be a valid failure
> and hence,
> >         > it
> >         > > is
> >         > > > > > > > > recommended
> >         > > > > > > > > > > to
> >         > > > > > > > > > > > wait for PR
> >         > > > > > > > > > > > >         > builds,
> >         > > > > > > > > > > > >         >         > see>
> >         > > > > > > > > > > > >         >         > > the root cause of
> any build
> >         > > > failures
> >         > > > > > > before
> >         > > > > > > > > > > > proceeding with
> >         > > > > > > > > > > > >         > merges.>
> >         > > > > > > > > > > > >         >         > >
> >         > > > > > > > > > > > >         >         > > At this point of
> time, there
> >         > are
> >         > > 2
> >         > > > > > > > > intermittent
> >         > > > > > > > > > > > issue yet to
> >         > > > > > > > > > > > >         > be fixed ->
> >         > > > > > > > > > > > >         >         > > * Network error
> leading to
> >         > GitHub
> >         > > > > > > requests
> >         > > > > > > > > > > > throwing 404>
> >         > > > > > > > > > > > >         >         > > * A conflict in
> artifacts
> >         > > generated
> >         > > > > > > between
> >         > > > > > > > > > > > branches/PR -
> >         > > > > > > > > > > > >         > Cause unknown
> >         > > > > > > > > > > > >         >         > yet.>
> >         > > > > > > > > > > > >         >         > > These issues will
> be fixed
> >         > soon.>
> >         > > > > > > > > > > > >         >         > >
> >         > > > > > > > > > > > >         >         > >
> >         > > > > > > > > > > > >         >         > > -- >
> >         > > > > > > > > > > > >         >         > > Sandeep
> Krishnamurthy>
> >         > > > > > > > > > > > >         >         > >
> >         > > > > > > > > > > > >         >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >         >
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > > >
> >         > > > > > > > > > > >
> >         > > > > > > > > > > >
> >         > > > > > > > > > > >
> >         > > > > > > > > > > > --
> >         > > > > > > > > > > > - Tsuyoshi
> >         > > > > > > > > > > >
> >         > > > > > > > > > >
> >         > > > > > > > > >
> >         > > > > > > > > >
> >         > > > > > > > > >
> >         > > > > > > > > > --
> >         > > > > > > > > > Best Regards,
> >         > > > > > > > > > Gautam Kumar
> >         > > > > > > > > >
> >         > > > > > > > >
> >         > > > > > > >
> >         > > > > > > >
> >         > > > > > > >
> >         > > > > > > > --
> >         > > > > > > > Best Regards,
> >         > > > > > > > Gautam Kumar
> >         > > > > > > >
> >         > > > > > >
> >         > > > > >
> >         > > > >
> >         > > >
> >         > >
> >         >
> >
> >
> >
> >
>



-- 
Best Regards,
Gautam Kumar

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Joern Kottmann <ko...@gmail.com>.
It also makes sense to block too old PRs from merging, because the
test results are outdated and the build might fail after it gets
merged.

Jörn

On Thu, Sep 28, 2017 at 9:14 PM, Zha, Sheng <zh...@amazon.com> wrote:
> +1 on protected branch.
>
> Best regards,
> -sz
>
> On 9/28/17, 11:48 AM, "Kumar, Gautam" <ga...@amazon.com> wrote:
>
>     Hi Guys,
>
>      Let’s focus on specific issue here.
>
>     Marking the master branch protected which involves “Only merge if checks has passed, and yes it will run the complete build”.
>
>     We can’t afford to degrade the quality and keep debugging the build failure forever. If it’s slow down the development at the cost of quality I will vote for the quality.
>     We can work on improving the infrastructure to improve the overall speed.  If you have any specific concerns on availability of Jenkins please point out.
>
>     -Gautam
>
>
>     On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com> wrote:
>
>         -1000 on that. :)
>
>         On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <mn...@gmail.com> wrote:
>
>         > PR->Sanity test/Linux build/test->reviewer/committer approves the
>         > change->Comment "Build Now" (Or trigger on at least one approval from a
>         > committer other than author)->*Full build-*>*passes build*->Enable Merge
>         >
>         > Let us take this particular topic to a separate thread or discuss offline
>         > if further clarification is needed.
>         >
>         > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <cj...@gmail.com>
>         > wrote:
>         >
>         > > I understand the proposal.  How to trigger a build in that case?
>         > >
>         > >
>         > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <ma...@gmail.com>
>         > > wrote:
>         > >
>         > > > Chris,
>         > > > I don't think Naveen is suggesting that a merge happen without full
>         > > > verification i.e. all tests across all platforms pass.
>         > > > If a PR has some back and forth and results in multiple revisions
>         > (which
>         > > is
>         > > > arguably more common than a random unit test failing), we simply delay
>         > > full
>         > > > verification until the owner/reviewer have settled on a mutually
>         > > acceptable
>         > > > state.
>         > > >
>         > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <cjolivier01@gmail.com
>         > >
>         > > > wrote:
>         > > >
>         > > > > -1 for running only partial tests.  Most failing unit tests that get
>         > > > > through fail only for certain platforms/configurations.  I personally
>         > > > > prefer to be assured the build and test is good before merge.  Most
>         > PR
>         > > > > merges aren't in a huge hurry.
>         > > > >
>         > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <mn...@gmail.com>
>         > > > wrote:
>         > > > >
>         > > > > > +1 to make it protected. Here is what I am thinking for PR builds
>         > > > > > on a PR run Sanity Tests + build/test one platform->committer
>         > reviews
>         > > > the
>         > > > > > code and issues "Build Now", a full build is run->Github checks
>         > that
>         > > > the
>         > > > > > full build checks succeed before it can be merged.
>         > > > > >
>         > > > > > I agree with Madan that PR should be approved by one another
>         > > committer.
>         > > > > >
>         > > > > >
>         > > > > >
>         > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
>         > > > madan.jampani@gmail.com>
>         > > > > > wrote:
>         > > > > >
>         > > > > > > +1
>         > > > > > >
>         > > > > > > At a minimum I'd like to see the following two happen:
>         > > > > > > - Option to merge is disabled until all required checks pass.
>         > > > > > > - Code is reviewed and given +1 by at least one other committer
>         > (no
>         > > > > self
>         > > > > > > review).
>         > > > > > >
>         > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <ga...@gmail.com>
>         > > > wrote:
>         > > > > > >
>         > > > > > > > Hi Chris,
>         > > > > > > >
>         > > > > > > >   Here <https://help.github.com/articles/about-protected-
>         > > branches/
>         > > > >
>         > > > > is
>         > > > > > > > user
>         > > > > > > > document on semantics of protected branch.
>         > > > > > > > In short when a branch is protected following applies to that
>         > > > branch.
>         > > > > > > >
>         > > > > > > >    - Can't be force pushed
>         > > > > > > >    - Can't be deleted
>         > > > > > > >    - Can't have changes merged into it until required status
>         > > checks
>         > > > > > > >    <https://help.github.com/articles/about-required-
>         > > status-checks>
>         > > > > > pass
>         > > > > > > >    - Can't have changes merged into it until required reviews
>         > are
>         > > > > > > approved
>         > > > > > > >    <https://help.github.com/articles/approving-a-pull-
>         > > > > > > > request-with-required-reviews>
>         > > > > > > >    - Can't be edited or have files uploaded to it from the web
>         > > > > > > >    - Can't have changes merged into it until changes to files
>         > > that
>         > > > > > > > have a designated
>         > > > > > > >    code owner <https://help.github.com/
>         > > articles/about-codeowners/>
>         > > > > > have
>         > > > > > > >    been approved by that owner
>         > > > > > > >
>         > > > > > > >  I am sure many of us might not want to have all these but we
>         > can
>         > > > > > debate
>         > > > > > > on
>         > > > > > > > it. My main motive was to "*Can't have changes merged into it
>         > > until
>         > > > > > > > required status checks pass*"
>         > > > > > > >
>         > > > > > > >
>         > > > > > > > -Gautam
>         > > > > > > >
>         > > > > > > >
>         > > > > > > >
>         > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
>         > > > > cjolivier01@gmail.com
>         > > > > > >
>         > > > > > > > wrote:
>         > > > > > > >
>         > > > > > > > > What does that mean? "Protected"? Protected from what?
>         > > > > > > > >
>         > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
>         > gautamnitc@gmail.com>
>         > > > > > wrote:
>         > > > > > > > >
>         > > > > > > > > > Hi Chris,
>         > > > > > > > > >
>         > > > > > > > > >    I mean make "master branch protected" of  MXNet.
>         > > > > > > > > >
>         > > > > > > > > > -Gautam
>         > > > > > > > > >
>         > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <
>         > > > > > > cjolivier01@gmail.com
>         > > > > > > > >
>         > > > > > > > > > wrote:
>         > > > > > > > > >
>         > > > > > > > > > > What does this mean? "Mx-net branch protected"?
>         > > > > > > > > > >
>         > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
>         > > > > > > > > ozawa.tsuyoshi@gmail.com
>         > > > > > > > > > >
>         > > > > > > > > > > wrote:
>         > > > > > > > > > >
>         > > > > > > > > > > > +1,
>         > > > > > > > > > > >
>         > > > > > > > > > > > While I'm checking the recent build failures, and I
>         > think
>         > > > the
>         > > > > > > > > decision
>         > > > > > > > > > > > of making the mx-net branch protected is necessary for
>         > > > stable
>         > > > > > > > > > > > building.
>         > > > > > > > > > > > Thanks Kumar for resuming important discussion.
>         > > > > > > > > > > >
>         > > > > > > > > > > > Best regards
>         > > > > > > > > > > > - Tsuyoshi
>         > > > > > > > > > > >
>         > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <
>         > > > > > > gauta@amazon.com>
>         > > > > > > > > > > wrote:
>         > > > > > > > > > > > > Reviving the discussion.
>         > > > > > > > > > > > >
>         > > > > > > > > > > > > At this point of time we have couple of stable builds
>         > > > > > > > > > > > >
>         > > > > > > > > > > >
>         > https://builds.apache.org/view/Incubator%20Projects/job/
>         > > > > > > > > > > incubator-mxnet/job/master/448/
>         > > > > > > > > > > > >
>         > > > > > > > > > > >
>         > https://builds.apache.org/view/Incubator%20Projects/job/
>         > > > > > > > > > > incubator-mxnet/job/master/449/
>         > > > > > > > > > > > >
>         > > > > > > > > > > > > Should we have a quick discussion or polling on
>         > making
>         > > > the
>         > > > > > > mx-net
>         > > > > > > > > > > branch
>         > > > > > > > > > > > protected? If you still think we shouldn’t make it
>         > > > protected
>         > > > > > > please
>         > > > > > > > > > > provide
>         > > > > > > > > > > > a reason to support your claim.
>         > > > > > > > > > > > >
>         > > > > > > > > > > > > Few of us have concern over Jenkin’s stability. If I
>         > > look
>         > > > > two
>         > > > > > > > weeks
>         > > > > > > > > > > > back, after upgrading Linux slave to g2.8x and new
>         > > windows
>         > > > > AMI,
>         > > > > > > we
>         > > > > > > > > have
>         > > > > > > > > > > not
>         > > > > > > > > > > > seen any case where instance died due to high memory
>         > > usage
>         > > > or
>         > > > > > any
>         > > > > > > > > > process
>         > > > > > > > > > > > got killed due to high cpu usage or any other issue
>         > with
>         > > > > > windows
>         > > > > > > > > > slaves.
>         > > > > > > > > > > > >
>         > > > > > > > > > > > > Going forward we are also planning that if we add any
>         > > new
>         > > > > > slave
>         > > > > > > > we
>         > > > > > > > > > will
>         > > > > > > > > > > > not enable the main load immediately, but rather will
>         > do
>         > > > > ‘test
>         > > > > > > > build’
>         > > > > > > > > > to
>         > > > > > > > > > > > make sure that new slaves are not causing any
>         > > > infrastructure
>         > > > > > > issue
>         > > > > > > > > and
>         > > > > > > > > > > > capable to perform as good as existing slaves.
>         > > > > > > > > > > > >
>         > > > > > > > > > > > > -Gautam
>         > > > > > > > > > > > >
>         > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
>         > > lupesko@gmail.com
>         > > > >
>         > > > > > > wrote:
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >     @madan looking into some failures – you’re right…
>         > > > > there’s
>         > > > > > > > > > multiple
>         > > > > > > > > > > > issues going on, some of them intermittent, and we want
>         > > to
>         > > > be
>         > > > > > > able
>         > > > > > > > to
>         > > > > > > > > > > merge
>         > > > > > > > > > > > fixes in.
>         > > > > > > > > > > > >     Agreed that we can wait with setting up protected
>         > > > mode
>         > > > > > > until
>         > > > > > > > > > build
>         > > > > > > > > > > > stabilizes.
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
>         > > > > > > madan.jampani@gmail.com>
>         > > > > > > > > > > wrote:
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >         @hagay: we agree on the end state. I'm not
>         > too
>         > > > > > > particular
>         > > > > > > > > > about
>         > > > > > > > > > > > how we get
>         > > > > > > > > > > > >         there. If you think enabling it now and fixes
>         > > > > > > regression
>         > > > > > > > > > later
>         > > > > > > > > > > > is doable,
>         > > > > > > > > > > > >         I'm fine with. I see a bit of a chicken and
>         > egg
>         > > > > > > problem.
>         > > > > > > > We
>         > > > > > > > > > > need
>         > > > > > > > > > > > to get
>         > > > > > > > > > > > >         some fixes in even when the status checks are
>         > > > > > failing.
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko,
>         > > Hagay
>         > > > <
>         > > > > > > > > > > > lupesko@gmail.com> wrote:
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >         > @madan – re: getting to a stable CI first:
>         > > > > > > > > > > > >         > I’m concerned that by not enabling
>         > protected
>         > > > > branch
>         > > > > > > > mode
>         > > > > > > > > > > ASAP,
>         > > > > > > > > > > > we’re just
>         > > > > > > > > > > > >         > taking in more regressions, which makes a
>         > > > stable
>         > > > > > > build
>         > > > > > > > a
>         > > > > > > > > > > > moving target for
>         > > > > > > > > > > > >         > us…
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <
>         > > > > > zhasheng@amazon.com
>         > > > > > > >
>         > > > > > > > > > wrote:
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >     Just one thing: please don’t disable
>         > more
>         > > > > tests
>         > > > > > > or
>         > > > > > > > > just
>         > > > > > > > > > > > raise the
>         > > > > > > > > > > > >         > tolerance thresholds.
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >     Best regards,
>         > > > > > > > > > > > >         >     -sz
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
>         > > > > > > > > > > > madan.jampani@gmail.com> wrote:
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >         +1
>         > > > > > > > > > > > >         >         Before we can turn protected mode I
>         > > > feel
>         > > > > we
>         > > > > > > > > should
>         > > > > > > > > > > > first get to a
>         > > > > > > > > > > > >         > stable CI
>         > > > > > > > > > > > >         >         pipeline.
>         > > > > > > > > > > > >         >         Sandeep is chasing down known
>         > > breaking
>         > > > > > > issues.
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM,
>         > > Hagay
>         > > > > > > > Lupesko <
>         > > > > > > > > > > > lupesko@gmail.com>
>         > > > > > > > > > > > >         > wrote:
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >         > Build stability is a major issue,
>         > > > > builds
>         > > > > > > have
>         > > > > > > > > > been
>         > > > > > > > > > > > failing left
>         > > > > > > > > > > > >         > and right
>         > > > > > > > > > > > >         >         > over the last week. Some of it is
>         > > due
>         > > > > to
>         > > > > > > > > Jenkins
>         > > > > > > > > > > > slave issues,
>         > > > > > > > > > > > >         > but some are
>         > > > > > > > > > > > >         >         > real regressions.
>         > > > > > > > > > > > >         >         > We need to be more strict in the
>         > > code
>         > > > > > we're
>         > > > > > > > > > > > committing.
>         > > > > > > > > > > > >         >         >
>         > > > > > > > > > > > >         >         > I propose we configure our master
>         > > to
>         > > > > be a
>         > > > > > > > > > protected
>         > > > > > > > > > > > branch (
>         > > > > > > > > > > > >         >         >
>         > > > > > > > > > > > https://help.github.com/articles/about-protected-
>         > > branches/
>         > > > ).
>         > > > > > > > > > > > >         >         >
>         > > > > > > > > > > > >         >         > Thoughts?
>         > > > > > > > > > > > >         >         >
>         > > > > > > > > > > > >         >         > On 2017-08-28 22:41, sandeep
>         > > > > > krishnamurthy
>         > > > > > > <
>         > > > > > > > > > > > s...@gmail.com>
>         > > > > > > > > > > > >         > wrote:
>         > > > > > > > > > > > >         >         > > Hello Committers and
>         > > Contributors,>
>         > > > > > > > > > > > >         >         > >
>         > > > > > > > > > > > >         >         > > Due to unstable build
>         > pipelines,
>         > > > from
>         > > > > > > past
>         > > > > > > > 1
>         > > > > > > > > > > week,
>         > > > > > > > > > > > PRs are
>         > > > > > > > > > > > >         > being merged>
>         > > > > > > > > > > > >         >         > > after CR ignoring PR build
>         > > status.
>         > > > > > Build
>         > > > > > > > > > pipeline
>         > > > > > > > > > > > is much more
>         > > > > > > > > > > > >         > stable
>         > > > > > > > > > > > >         >         > than>
>         > > > > > > > > > > > >         >         > > last week and most of the build
>         > > > > > failures
>         > > > > > > > you
>         > > > > > > > > > see
>         > > > > > > > > > > > from now on,
>         > > > > > > > > > > > >         > are likely
>         > > > > > > > > > > > >         >         > to>
>         > > > > > > > > > > > >         >         > > be a valid failure and hence,
>         > it
>         > > is
>         > > > > > > > > recommended
>         > > > > > > > > > > to
>         > > > > > > > > > > > wait for PR
>         > > > > > > > > > > > >         > builds,
>         > > > > > > > > > > > >         >         > see>
>         > > > > > > > > > > > >         >         > > the root cause of any build
>         > > > failures
>         > > > > > > before
>         > > > > > > > > > > > proceeding with
>         > > > > > > > > > > > >         > merges.>
>         > > > > > > > > > > > >         >         > >
>         > > > > > > > > > > > >         >         > > At this point of time, there
>         > are
>         > > 2
>         > > > > > > > > intermittent
>         > > > > > > > > > > > issue yet to
>         > > > > > > > > > > > >         > be fixed ->
>         > > > > > > > > > > > >         >         > > * Network error leading to
>         > GitHub
>         > > > > > > requests
>         > > > > > > > > > > > throwing 404>
>         > > > > > > > > > > > >         >         > > * A conflict in artifacts
>         > > generated
>         > > > > > > between
>         > > > > > > > > > > > branches/PR -
>         > > > > > > > > > > > >         > Cause unknown
>         > > > > > > > > > > > >         >         > yet.>
>         > > > > > > > > > > > >         >         > > These issues will be fixed
>         > soon.>
>         > > > > > > > > > > > >         >         > >
>         > > > > > > > > > > > >         >         > >
>         > > > > > > > > > > > >         >         > > -- >
>         > > > > > > > > > > > >         >         > > Sandeep Krishnamurthy>
>         > > > > > > > > > > > >         >         > >
>         > > > > > > > > > > > >         >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >         >
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >
>         > > > > > > > > > > > >
>         > > > > > > > > > > >
>         > > > > > > > > > > >
>         > > > > > > > > > > >
>         > > > > > > > > > > > --
>         > > > > > > > > > > > - Tsuyoshi
>         > > > > > > > > > > >
>         > > > > > > > > > >
>         > > > > > > > > >
>         > > > > > > > > >
>         > > > > > > > > >
>         > > > > > > > > > --
>         > > > > > > > > > Best Regards,
>         > > > > > > > > > Gautam Kumar
>         > > > > > > > > >
>         > > > > > > > >
>         > > > > > > >
>         > > > > > > >
>         > > > > > > >
>         > > > > > > > --
>         > > > > > > > Best Regards,
>         > > > > > > > Gautam Kumar
>         > > > > > > >
>         > > > > > >
>         > > > > >
>         > > > >
>         > > >
>         > >
>         >
>
>
>
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by "Zha, Sheng" <zh...@amazon.com>.
+1 on protected branch.

Best regards,
-sz

On 9/28/17, 11:48 AM, "Kumar, Gautam" <ga...@amazon.com> wrote:

    Hi Guys, 
    
     Let’s focus on specific issue here. 
    
    Marking the master branch protected which involves “Only merge if checks has passed, and yes it will run the complete build”. 
    
    We can’t afford to degrade the quality and keep debugging the build failure forever. If it’s slow down the development at the cost of quality I will vote for the quality. 
    We can work on improving the infrastructure to improve the overall speed.  If you have any specific concerns on availability of Jenkins please point out. 
    
    -Gautam 
    
    
    On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com> wrote:
    
        -1000 on that. :)
        
        On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <mn...@gmail.com> wrote:
        
        > PR->Sanity test/Linux build/test->reviewer/committer approves the
        > change->Comment "Build Now" (Or trigger on at least one approval from a
        > committer other than author)->*Full build-*>*passes build*->Enable Merge
        >
        > Let us take this particular topic to a separate thread or discuss offline
        > if further clarification is needed.
        >
        > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <cj...@gmail.com>
        > wrote:
        >
        > > I understand the proposal.  How to trigger a build in that case?
        > >
        > >
        > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <ma...@gmail.com>
        > > wrote:
        > >
        > > > Chris,
        > > > I don't think Naveen is suggesting that a merge happen without full
        > > > verification i.e. all tests across all platforms pass.
        > > > If a PR has some back and forth and results in multiple revisions
        > (which
        > > is
        > > > arguably more common than a random unit test failing), we simply delay
        > > full
        > > > verification until the owner/reviewer have settled on a mutually
        > > acceptable
        > > > state.
        > > >
        > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <cjolivier01@gmail.com
        > >
        > > > wrote:
        > > >
        > > > > -1 for running only partial tests.  Most failing unit tests that get
        > > > > through fail only for certain platforms/configurations.  I personally
        > > > > prefer to be assured the build and test is good before merge.  Most
        > PR
        > > > > merges aren't in a huge hurry.
        > > > >
        > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <mn...@gmail.com>
        > > > wrote:
        > > > >
        > > > > > +1 to make it protected. Here is what I am thinking for PR builds
        > > > > > on a PR run Sanity Tests + build/test one platform->committer
        > reviews
        > > > the
        > > > > > code and issues "Build Now", a full build is run->Github checks
        > that
        > > > the
        > > > > > full build checks succeed before it can be merged.
        > > > > >
        > > > > > I agree with Madan that PR should be approved by one another
        > > committer.
        > > > > >
        > > > > >
        > > > > >
        > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
        > > > madan.jampani@gmail.com>
        > > > > > wrote:
        > > > > >
        > > > > > > +1
        > > > > > >
        > > > > > > At a minimum I'd like to see the following two happen:
        > > > > > > - Option to merge is disabled until all required checks pass.
        > > > > > > - Code is reviewed and given +1 by at least one other committer
        > (no
        > > > > self
        > > > > > > review).
        > > > > > >
        > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <ga...@gmail.com>
        > > > wrote:
        > > > > > >
        > > > > > > > Hi Chris,
        > > > > > > >
        > > > > > > >   Here <https://help.github.com/articles/about-protected-
        > > branches/
        > > > >
        > > > > is
        > > > > > > > user
        > > > > > > > document on semantics of protected branch.
        > > > > > > > In short when a branch is protected following applies to that
        > > > branch.
        > > > > > > >
        > > > > > > >    - Can't be force pushed
        > > > > > > >    - Can't be deleted
        > > > > > > >    - Can't have changes merged into it until required status
        > > checks
        > > > > > > >    <https://help.github.com/articles/about-required-
        > > status-checks>
        > > > > > pass
        > > > > > > >    - Can't have changes merged into it until required reviews
        > are
        > > > > > > approved
        > > > > > > >    <https://help.github.com/articles/approving-a-pull-
        > > > > > > > request-with-required-reviews>
        > > > > > > >    - Can't be edited or have files uploaded to it from the web
        > > > > > > >    - Can't have changes merged into it until changes to files
        > > that
        > > > > > > > have a designated
        > > > > > > >    code owner <https://help.github.com/
        > > articles/about-codeowners/>
        > > > > > have
        > > > > > > >    been approved by that owner
        > > > > > > >
        > > > > > > >  I am sure many of us might not want to have all these but we
        > can
        > > > > > debate
        > > > > > > on
        > > > > > > > it. My main motive was to "*Can't have changes merged into it
        > > until
        > > > > > > > required status checks pass*"
        > > > > > > >
        > > > > > > >
        > > > > > > > -Gautam
        > > > > > > >
        > > > > > > >
        > > > > > > >
        > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
        > > > > cjolivier01@gmail.com
        > > > > > >
        > > > > > > > wrote:
        > > > > > > >
        > > > > > > > > What does that mean? "Protected"? Protected from what?
        > > > > > > > >
        > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
        > gautamnitc@gmail.com>
        > > > > > wrote:
        > > > > > > > >
        > > > > > > > > > Hi Chris,
        > > > > > > > > >
        > > > > > > > > >    I mean make "master branch protected" of  MXNet.
        > > > > > > > > >
        > > > > > > > > > -Gautam
        > > > > > > > > >
        > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <
        > > > > > > cjolivier01@gmail.com
        > > > > > > > >
        > > > > > > > > > wrote:
        > > > > > > > > >
        > > > > > > > > > > What does this mean? "Mx-net branch protected"?
        > > > > > > > > > >
        > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
        > > > > > > > > ozawa.tsuyoshi@gmail.com
        > > > > > > > > > >
        > > > > > > > > > > wrote:
        > > > > > > > > > >
        > > > > > > > > > > > +1,
        > > > > > > > > > > >
        > > > > > > > > > > > While I'm checking the recent build failures, and I
        > think
        > > > the
        > > > > > > > > decision
        > > > > > > > > > > > of making the mx-net branch protected is necessary for
        > > > stable
        > > > > > > > > > > > building.
        > > > > > > > > > > > Thanks Kumar for resuming important discussion.
        > > > > > > > > > > >
        > > > > > > > > > > > Best regards
        > > > > > > > > > > > - Tsuyoshi
        > > > > > > > > > > >
        > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <
        > > > > > > gauta@amazon.com>
        > > > > > > > > > > wrote:
        > > > > > > > > > > > > Reviving the discussion.
        > > > > > > > > > > > >
        > > > > > > > > > > > > At this point of time we have couple of stable builds
        > > > > > > > > > > > >
        > > > > > > > > > > >
        > https://builds.apache.org/view/Incubator%20Projects/job/
        > > > > > > > > > > incubator-mxnet/job/master/448/
        > > > > > > > > > > > >
        > > > > > > > > > > >
        > https://builds.apache.org/view/Incubator%20Projects/job/
        > > > > > > > > > > incubator-mxnet/job/master/449/
        > > > > > > > > > > > >
        > > > > > > > > > > > > Should we have a quick discussion or polling on
        > making
        > > > the
        > > > > > > mx-net
        > > > > > > > > > > branch
        > > > > > > > > > > > protected? If you still think we shouldn’t make it
        > > > protected
        > > > > > > please
        > > > > > > > > > > provide
        > > > > > > > > > > > a reason to support your claim.
        > > > > > > > > > > > >
        > > > > > > > > > > > > Few of us have concern over Jenkin’s stability. If I
        > > look
        > > > > two
        > > > > > > > weeks
        > > > > > > > > > > > back, after upgrading Linux slave to g2.8x and new
        > > windows
        > > > > AMI,
        > > > > > > we
        > > > > > > > > have
        > > > > > > > > > > not
        > > > > > > > > > > > seen any case where instance died due to high memory
        > > usage
        > > > or
        > > > > > any
        > > > > > > > > > process
        > > > > > > > > > > > got killed due to high cpu usage or any other issue
        > with
        > > > > > windows
        > > > > > > > > > slaves.
        > > > > > > > > > > > >
        > > > > > > > > > > > > Going forward we are also planning that if we add any
        > > new
        > > > > > slave
        > > > > > > > we
        > > > > > > > > > will
        > > > > > > > > > > > not enable the main load immediately, but rather will
        > do
        > > > > ‘test
        > > > > > > > build’
        > > > > > > > > > to
        > > > > > > > > > > > make sure that new slaves are not causing any
        > > > infrastructure
        > > > > > > issue
        > > > > > > > > and
        > > > > > > > > > > > capable to perform as good as existing slaves.
        > > > > > > > > > > > >
        > > > > > > > > > > > > -Gautam
        > > > > > > > > > > > >
        > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
        > > lupesko@gmail.com
        > > > >
        > > > > > > wrote:
        > > > > > > > > > > > >
        > > > > > > > > > > > >     @madan looking into some failures – you’re right…
        > > > > there’s
        > > > > > > > > > multiple
        > > > > > > > > > > > issues going on, some of them intermittent, and we want
        > > to
        > > > be
        > > > > > > able
        > > > > > > > to
        > > > > > > > > > > merge
        > > > > > > > > > > > fixes in.
        > > > > > > > > > > > >     Agreed that we can wait with setting up protected
        > > > mode
        > > > > > > until
        > > > > > > > > > build
        > > > > > > > > > > > stabilizes.
        > > > > > > > > > > > >
        > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
        > > > > > > madan.jampani@gmail.com>
        > > > > > > > > > > wrote:
        > > > > > > > > > > > >
        > > > > > > > > > > > >         @hagay: we agree on the end state. I'm not
        > too
        > > > > > > particular
        > > > > > > > > > about
        > > > > > > > > > > > how we get
        > > > > > > > > > > > >         there. If you think enabling it now and fixes
        > > > > > > regression
        > > > > > > > > > later
        > > > > > > > > > > > is doable,
        > > > > > > > > > > > >         I'm fine with. I see a bit of a chicken and
        > egg
        > > > > > > problem.
        > > > > > > > We
        > > > > > > > > > > need
        > > > > > > > > > > > to get
        > > > > > > > > > > > >         some fixes in even when the status checks are
        > > > > > failing.
        > > > > > > > > > > > >
        > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko,
        > > Hagay
        > > > <
        > > > > > > > > > > > lupesko@gmail.com> wrote:
        > > > > > > > > > > > >
        > > > > > > > > > > > >         > @madan – re: getting to a stable CI first:
        > > > > > > > > > > > >         > I’m concerned that by not enabling
        > protected
        > > > > branch
        > > > > > > > mode
        > > > > > > > > > > ASAP,
        > > > > > > > > > > > we’re just
        > > > > > > > > > > > >         > taking in more regressions, which makes a
        > > > stable
        > > > > > > build
        > > > > > > > a
        > > > > > > > > > > > moving target for
        > > > > > > > > > > > >         > us…
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <
        > > > > > zhasheng@amazon.com
        > > > > > > >
        > > > > > > > > > wrote:
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         >     Just one thing: please don’t disable
        > more
        > > > > tests
        > > > > > > or
        > > > > > > > > just
        > > > > > > > > > > > raise the
        > > > > > > > > > > > >         > tolerance thresholds.
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         >     Best regards,
        > > > > > > > > > > > >         >     -sz
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
        > > > > > > > > > > > madan.jampani@gmail.com> wrote:
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         >         +1
        > > > > > > > > > > > >         >         Before we can turn protected mode I
        > > > feel
        > > > > we
        > > > > > > > > should
        > > > > > > > > > > > first get to a
        > > > > > > > > > > > >         > stable CI
        > > > > > > > > > > > >         >         pipeline.
        > > > > > > > > > > > >         >         Sandeep is chasing down known
        > > breaking
        > > > > > > issues.
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM,
        > > Hagay
        > > > > > > > Lupesko <
        > > > > > > > > > > > lupesko@gmail.com>
        > > > > > > > > > > > >         > wrote:
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         >         > Build stability is a major issue,
        > > > > builds
        > > > > > > have
        > > > > > > > > > been
        > > > > > > > > > > > failing left
        > > > > > > > > > > > >         > and right
        > > > > > > > > > > > >         >         > over the last week. Some of it is
        > > due
        > > > > to
        > > > > > > > > Jenkins
        > > > > > > > > > > > slave issues,
        > > > > > > > > > > > >         > but some are
        > > > > > > > > > > > >         >         > real regressions.
        > > > > > > > > > > > >         >         > We need to be more strict in the
        > > code
        > > > > > we're
        > > > > > > > > > > > committing.
        > > > > > > > > > > > >         >         >
        > > > > > > > > > > > >         >         > I propose we configure our master
        > > to
        > > > > be a
        > > > > > > > > > protected
        > > > > > > > > > > > branch (
        > > > > > > > > > > > >         >         >
        > > > > > > > > > > > https://help.github.com/articles/about-protected-
        > > branches/
        > > > ).
        > > > > > > > > > > > >         >         >
        > > > > > > > > > > > >         >         > Thoughts?
        > > > > > > > > > > > >         >         >
        > > > > > > > > > > > >         >         > On 2017-08-28 22:41, sandeep
        > > > > > krishnamurthy
        > > > > > > <
        > > > > > > > > > > > s...@gmail.com>
        > > > > > > > > > > > >         > wrote:
        > > > > > > > > > > > >         >         > > Hello Committers and
        > > Contributors,>
        > > > > > > > > > > > >         >         > >
        > > > > > > > > > > > >         >         > > Due to unstable build
        > pipelines,
        > > > from
        > > > > > > past
        > > > > > > > 1
        > > > > > > > > > > week,
        > > > > > > > > > > > PRs are
        > > > > > > > > > > > >         > being merged>
        > > > > > > > > > > > >         >         > > after CR ignoring PR build
        > > status.
        > > > > > Build
        > > > > > > > > > pipeline
        > > > > > > > > > > > is much more
        > > > > > > > > > > > >         > stable
        > > > > > > > > > > > >         >         > than>
        > > > > > > > > > > > >         >         > > last week and most of the build
        > > > > > failures
        > > > > > > > you
        > > > > > > > > > see
        > > > > > > > > > > > from now on,
        > > > > > > > > > > > >         > are likely
        > > > > > > > > > > > >         >         > to>
        > > > > > > > > > > > >         >         > > be a valid failure and hence,
        > it
        > > is
        > > > > > > > > recommended
        > > > > > > > > > > to
        > > > > > > > > > > > wait for PR
        > > > > > > > > > > > >         > builds,
        > > > > > > > > > > > >         >         > see>
        > > > > > > > > > > > >         >         > > the root cause of any build
        > > > failures
        > > > > > > before
        > > > > > > > > > > > proceeding with
        > > > > > > > > > > > >         > merges.>
        > > > > > > > > > > > >         >         > >
        > > > > > > > > > > > >         >         > > At this point of time, there
        > are
        > > 2
        > > > > > > > > intermittent
        > > > > > > > > > > > issue yet to
        > > > > > > > > > > > >         > be fixed ->
        > > > > > > > > > > > >         >         > > * Network error leading to
        > GitHub
        > > > > > > requests
        > > > > > > > > > > > throwing 404>
        > > > > > > > > > > > >         >         > > * A conflict in artifacts
        > > generated
        > > > > > > between
        > > > > > > > > > > > branches/PR -
        > > > > > > > > > > > >         > Cause unknown
        > > > > > > > > > > > >         >         > yet.>
        > > > > > > > > > > > >         >         > > These issues will be fixed
        > soon.>
        > > > > > > > > > > > >         >         > >
        > > > > > > > > > > > >         >         > >
        > > > > > > > > > > > >         >         > > -- >
        > > > > > > > > > > > >         >         > > Sandeep Krishnamurthy>
        > > > > > > > > > > > >         >         > >
        > > > > > > > > > > > >         >         >
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >         >
        > > > > > > > > > > > >
        > > > > > > > > > > > >
        > > > > > > > > > > > >
        > > > > > > > > > > > >
        > > > > > > > > > > > >
        > > > > > > > > > > >
        > > > > > > > > > > >
        > > > > > > > > > > >
        > > > > > > > > > > > --
        > > > > > > > > > > > - Tsuyoshi
        > > > > > > > > > > >
        > > > > > > > > > >
        > > > > > > > > >
        > > > > > > > > >
        > > > > > > > > >
        > > > > > > > > > --
        > > > > > > > > > Best Regards,
        > > > > > > > > > Gautam Kumar
        > > > > > > > > >
        > > > > > > > >
        > > > > > > >
        > > > > > > >
        > > > > > > >
        > > > > > > > --
        > > > > > > > Best Regards,
        > > > > > > > Gautam Kumar
        > > > > > > >
        > > > > > >
        > > > > >
        > > > >
        > > >
        > >
        >
        
    
    


Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by "Kumar, Gautam" <ga...@amazon.com>.
Hi Guys, 

 Let’s focus on specific issue here. 

Marking the master branch protected which involves “Only merge if checks has passed, and yes it will run the complete build”. 

We can’t afford to degrade the quality and keep debugging the build failure forever. If it’s slow down the development at the cost of quality I will vote for the quality. 
We can work on improving the infrastructure to improve the overall speed.  If you have any specific concerns on availability of Jenkins please point out. 

-Gautam 


On 9/28/17, 11:38 AM, "Chris Olivier" <cj...@gmail.com> wrote:

    -1000 on that. :)
    
    On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <mn...@gmail.com> wrote:
    
    > PR->Sanity test/Linux build/test->reviewer/committer approves the
    > change->Comment "Build Now" (Or trigger on at least one approval from a
    > committer other than author)->*Full build-*>*passes build*->Enable Merge
    >
    > Let us take this particular topic to a separate thread or discuss offline
    > if further clarification is needed.
    >
    > On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <cj...@gmail.com>
    > wrote:
    >
    > > I understand the proposal.  How to trigger a build in that case?
    > >
    > >
    > > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <ma...@gmail.com>
    > > wrote:
    > >
    > > > Chris,
    > > > I don't think Naveen is suggesting that a merge happen without full
    > > > verification i.e. all tests across all platforms pass.
    > > > If a PR has some back and forth and results in multiple revisions
    > (which
    > > is
    > > > arguably more common than a random unit test failing), we simply delay
    > > full
    > > > verification until the owner/reviewer have settled on a mutually
    > > acceptable
    > > > state.
    > > >
    > > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <cjolivier01@gmail.com
    > >
    > > > wrote:
    > > >
    > > > > -1 for running only partial tests.  Most failing unit tests that get
    > > > > through fail only for certain platforms/configurations.  I personally
    > > > > prefer to be assured the build and test is good before merge.  Most
    > PR
    > > > > merges aren't in a huge hurry.
    > > > >
    > > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <mn...@gmail.com>
    > > > wrote:
    > > > >
    > > > > > +1 to make it protected. Here is what I am thinking for PR builds
    > > > > > on a PR run Sanity Tests + build/test one platform->committer
    > reviews
    > > > the
    > > > > > code and issues "Build Now", a full build is run->Github checks
    > that
    > > > the
    > > > > > full build checks succeed before it can be merged.
    > > > > >
    > > > > > I agree with Madan that PR should be approved by one another
    > > committer.
    > > > > >
    > > > > >
    > > > > >
    > > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
    > > > madan.jampani@gmail.com>
    > > > > > wrote:
    > > > > >
    > > > > > > +1
    > > > > > >
    > > > > > > At a minimum I'd like to see the following two happen:
    > > > > > > - Option to merge is disabled until all required checks pass.
    > > > > > > - Code is reviewed and given +1 by at least one other committer
    > (no
    > > > > self
    > > > > > > review).
    > > > > > >
    > > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <ga...@gmail.com>
    > > > wrote:
    > > > > > >
    > > > > > > > Hi Chris,
    > > > > > > >
    > > > > > > >   Here <https://help.github.com/articles/about-protected-
    > > branches/
    > > > >
    > > > > is
    > > > > > > > user
    > > > > > > > document on semantics of protected branch.
    > > > > > > > In short when a branch is protected following applies to that
    > > > branch.
    > > > > > > >
    > > > > > > >    - Can't be force pushed
    > > > > > > >    - Can't be deleted
    > > > > > > >    - Can't have changes merged into it until required status
    > > checks
    > > > > > > >    <https://help.github.com/articles/about-required-
    > > status-checks>
    > > > > > pass
    > > > > > > >    - Can't have changes merged into it until required reviews
    > are
    > > > > > > approved
    > > > > > > >    <https://help.github.com/articles/approving-a-pull-
    > > > > > > > request-with-required-reviews>
    > > > > > > >    - Can't be edited or have files uploaded to it from the web
    > > > > > > >    - Can't have changes merged into it until changes to files
    > > that
    > > > > > > > have a designated
    > > > > > > >    code owner <https://help.github.com/
    > > articles/about-codeowners/>
    > > > > > have
    > > > > > > >    been approved by that owner
    > > > > > > >
    > > > > > > >  I am sure many of us might not want to have all these but we
    > can
    > > > > > debate
    > > > > > > on
    > > > > > > > it. My main motive was to "*Can't have changes merged into it
    > > until
    > > > > > > > required status checks pass*"
    > > > > > > >
    > > > > > > >
    > > > > > > > -Gautam
    > > > > > > >
    > > > > > > >
    > > > > > > >
    > > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
    > > > > cjolivier01@gmail.com
    > > > > > >
    > > > > > > > wrote:
    > > > > > > >
    > > > > > > > > What does that mean? "Protected"? Protected from what?
    > > > > > > > >
    > > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
    > gautamnitc@gmail.com>
    > > > > > wrote:
    > > > > > > > >
    > > > > > > > > > Hi Chris,
    > > > > > > > > >
    > > > > > > > > >    I mean make "master branch protected" of  MXNet.
    > > > > > > > > >
    > > > > > > > > > -Gautam
    > > > > > > > > >
    > > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <
    > > > > > > cjolivier01@gmail.com
    > > > > > > > >
    > > > > > > > > > wrote:
    > > > > > > > > >
    > > > > > > > > > > What does this mean? "Mx-net branch protected"?
    > > > > > > > > > >
    > > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
    > > > > > > > > ozawa.tsuyoshi@gmail.com
    > > > > > > > > > >
    > > > > > > > > > > wrote:
    > > > > > > > > > >
    > > > > > > > > > > > +1,
    > > > > > > > > > > >
    > > > > > > > > > > > While I'm checking the recent build failures, and I
    > think
    > > > the
    > > > > > > > > decision
    > > > > > > > > > > > of making the mx-net branch protected is necessary for
    > > > stable
    > > > > > > > > > > > building.
    > > > > > > > > > > > Thanks Kumar for resuming important discussion.
    > > > > > > > > > > >
    > > > > > > > > > > > Best regards
    > > > > > > > > > > > - Tsuyoshi
    > > > > > > > > > > >
    > > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <
    > > > > > > gauta@amazon.com>
    > > > > > > > > > > wrote:
    > > > > > > > > > > > > Reviving the discussion.
    > > > > > > > > > > > >
    > > > > > > > > > > > > At this point of time we have couple of stable builds
    > > > > > > > > > > > >
    > > > > > > > > > > >
    > https://builds.apache.org/view/Incubator%20Projects/job/
    > > > > > > > > > > incubator-mxnet/job/master/448/
    > > > > > > > > > > > >
    > > > > > > > > > > >
    > https://builds.apache.org/view/Incubator%20Projects/job/
    > > > > > > > > > > incubator-mxnet/job/master/449/
    > > > > > > > > > > > >
    > > > > > > > > > > > > Should we have a quick discussion or polling on
    > making
    > > > the
    > > > > > > mx-net
    > > > > > > > > > > branch
    > > > > > > > > > > > protected? If you still think we shouldn’t make it
    > > > protected
    > > > > > > please
    > > > > > > > > > > provide
    > > > > > > > > > > > a reason to support your claim.
    > > > > > > > > > > > >
    > > > > > > > > > > > > Few of us have concern over Jenkin’s stability. If I
    > > look
    > > > > two
    > > > > > > > weeks
    > > > > > > > > > > > back, after upgrading Linux slave to g2.8x and new
    > > windows
    > > > > AMI,
    > > > > > > we
    > > > > > > > > have
    > > > > > > > > > > not
    > > > > > > > > > > > seen any case where instance died due to high memory
    > > usage
    > > > or
    > > > > > any
    > > > > > > > > > process
    > > > > > > > > > > > got killed due to high cpu usage or any other issue
    > with
    > > > > > windows
    > > > > > > > > > slaves.
    > > > > > > > > > > > >
    > > > > > > > > > > > > Going forward we are also planning that if we add any
    > > new
    > > > > > slave
    > > > > > > > we
    > > > > > > > > > will
    > > > > > > > > > > > not enable the main load immediately, but rather will
    > do
    > > > > ‘test
    > > > > > > > build’
    > > > > > > > > > to
    > > > > > > > > > > > make sure that new slaves are not causing any
    > > > infrastructure
    > > > > > > issue
    > > > > > > > > and
    > > > > > > > > > > > capable to perform as good as existing slaves.
    > > > > > > > > > > > >
    > > > > > > > > > > > > -Gautam
    > > > > > > > > > > > >
    > > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
    > > lupesko@gmail.com
    > > > >
    > > > > > > wrote:
    > > > > > > > > > > > >
    > > > > > > > > > > > >     @madan looking into some failures – you’re right…
    > > > > there’s
    > > > > > > > > > multiple
    > > > > > > > > > > > issues going on, some of them intermittent, and we want
    > > to
    > > > be
    > > > > > > able
    > > > > > > > to
    > > > > > > > > > > merge
    > > > > > > > > > > > fixes in.
    > > > > > > > > > > > >     Agreed that we can wait with setting up protected
    > > > mode
    > > > > > > until
    > > > > > > > > > build
    > > > > > > > > > > > stabilizes.
    > > > > > > > > > > > >
    > > > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
    > > > > > > madan.jampani@gmail.com>
    > > > > > > > > > > wrote:
    > > > > > > > > > > > >
    > > > > > > > > > > > >         @hagay: we agree on the end state. I'm not
    > too
    > > > > > > particular
    > > > > > > > > > about
    > > > > > > > > > > > how we get
    > > > > > > > > > > > >         there. If you think enabling it now and fixes
    > > > > > > regression
    > > > > > > > > > later
    > > > > > > > > > > > is doable,
    > > > > > > > > > > > >         I'm fine with. I see a bit of a chicken and
    > egg
    > > > > > > problem.
    > > > > > > > We
    > > > > > > > > > > need
    > > > > > > > > > > > to get
    > > > > > > > > > > > >         some fixes in even when the status checks are
    > > > > > failing.
    > > > > > > > > > > > >
    > > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko,
    > > Hagay
    > > > <
    > > > > > > > > > > > lupesko@gmail.com> wrote:
    > > > > > > > > > > > >
    > > > > > > > > > > > >         > @madan – re: getting to a stable CI first:
    > > > > > > > > > > > >         > I’m concerned that by not enabling
    > protected
    > > > > branch
    > > > > > > > mode
    > > > > > > > > > > ASAP,
    > > > > > > > > > > > we’re just
    > > > > > > > > > > > >         > taking in more regressions, which makes a
    > > > stable
    > > > > > > build
    > > > > > > > a
    > > > > > > > > > > > moving target for
    > > > > > > > > > > > >         > us…
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <
    > > > > > zhasheng@amazon.com
    > > > > > > >
    > > > > > > > > > wrote:
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         >     Just one thing: please don’t disable
    > more
    > > > > tests
    > > > > > > or
    > > > > > > > > just
    > > > > > > > > > > > raise the
    > > > > > > > > > > > >         > tolerance thresholds.
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         >     Best regards,
    > > > > > > > > > > > >         >     -sz
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
    > > > > > > > > > > > madan.jampani@gmail.com> wrote:
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         >         +1
    > > > > > > > > > > > >         >         Before we can turn protected mode I
    > > > feel
    > > > > we
    > > > > > > > > should
    > > > > > > > > > > > first get to a
    > > > > > > > > > > > >         > stable CI
    > > > > > > > > > > > >         >         pipeline.
    > > > > > > > > > > > >         >         Sandeep is chasing down known
    > > breaking
    > > > > > > issues.
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM,
    > > Hagay
    > > > > > > > Lupesko <
    > > > > > > > > > > > lupesko@gmail.com>
    > > > > > > > > > > > >         > wrote:
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         >         > Build stability is a major issue,
    > > > > builds
    > > > > > > have
    > > > > > > > > > been
    > > > > > > > > > > > failing left
    > > > > > > > > > > > >         > and right
    > > > > > > > > > > > >         >         > over the last week. Some of it is
    > > due
    > > > > to
    > > > > > > > > Jenkins
    > > > > > > > > > > > slave issues,
    > > > > > > > > > > > >         > but some are
    > > > > > > > > > > > >         >         > real regressions.
    > > > > > > > > > > > >         >         > We need to be more strict in the
    > > code
    > > > > > we're
    > > > > > > > > > > > committing.
    > > > > > > > > > > > >         >         >
    > > > > > > > > > > > >         >         > I propose we configure our master
    > > to
    > > > > be a
    > > > > > > > > > protected
    > > > > > > > > > > > branch (
    > > > > > > > > > > > >         >         >
    > > > > > > > > > > > https://help.github.com/articles/about-protected-
    > > branches/
    > > > ).
    > > > > > > > > > > > >         >         >
    > > > > > > > > > > > >         >         > Thoughts?
    > > > > > > > > > > > >         >         >
    > > > > > > > > > > > >         >         > On 2017-08-28 22:41, sandeep
    > > > > > krishnamurthy
    > > > > > > <
    > > > > > > > > > > > s...@gmail.com>
    > > > > > > > > > > > >         > wrote:
    > > > > > > > > > > > >         >         > > Hello Committers and
    > > Contributors,>
    > > > > > > > > > > > >         >         > >
    > > > > > > > > > > > >         >         > > Due to unstable build
    > pipelines,
    > > > from
    > > > > > > past
    > > > > > > > 1
    > > > > > > > > > > week,
    > > > > > > > > > > > PRs are
    > > > > > > > > > > > >         > being merged>
    > > > > > > > > > > > >         >         > > after CR ignoring PR build
    > > status.
    > > > > > Build
    > > > > > > > > > pipeline
    > > > > > > > > > > > is much more
    > > > > > > > > > > > >         > stable
    > > > > > > > > > > > >         >         > than>
    > > > > > > > > > > > >         >         > > last week and most of the build
    > > > > > failures
    > > > > > > > you
    > > > > > > > > > see
    > > > > > > > > > > > from now on,
    > > > > > > > > > > > >         > are likely
    > > > > > > > > > > > >         >         > to>
    > > > > > > > > > > > >         >         > > be a valid failure and hence,
    > it
    > > is
    > > > > > > > > recommended
    > > > > > > > > > > to
    > > > > > > > > > > > wait for PR
    > > > > > > > > > > > >         > builds,
    > > > > > > > > > > > >         >         > see>
    > > > > > > > > > > > >         >         > > the root cause of any build
    > > > failures
    > > > > > > before
    > > > > > > > > > > > proceeding with
    > > > > > > > > > > > >         > merges.>
    > > > > > > > > > > > >         >         > >
    > > > > > > > > > > > >         >         > > At this point of time, there
    > are
    > > 2
    > > > > > > > > intermittent
    > > > > > > > > > > > issue yet to
    > > > > > > > > > > > >         > be fixed ->
    > > > > > > > > > > > >         >         > > * Network error leading to
    > GitHub
    > > > > > > requests
    > > > > > > > > > > > throwing 404>
    > > > > > > > > > > > >         >         > > * A conflict in artifacts
    > > generated
    > > > > > > between
    > > > > > > > > > > > branches/PR -
    > > > > > > > > > > > >         > Cause unknown
    > > > > > > > > > > > >         >         > yet.>
    > > > > > > > > > > > >         >         > > These issues will be fixed
    > soon.>
    > > > > > > > > > > > >         >         > >
    > > > > > > > > > > > >         >         > >
    > > > > > > > > > > > >         >         > > -- >
    > > > > > > > > > > > >         >         > > Sandeep Krishnamurthy>
    > > > > > > > > > > > >         >         > >
    > > > > > > > > > > > >         >         >
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >         >
    > > > > > > > > > > > >
    > > > > > > > > > > > >
    > > > > > > > > > > > >
    > > > > > > > > > > > >
    > > > > > > > > > > > >
    > > > > > > > > > > >
    > > > > > > > > > > >
    > > > > > > > > > > >
    > > > > > > > > > > > --
    > > > > > > > > > > > - Tsuyoshi
    > > > > > > > > > > >
    > > > > > > > > > >
    > > > > > > > > >
    > > > > > > > > >
    > > > > > > > > >
    > > > > > > > > > --
    > > > > > > > > > Best Regards,
    > > > > > > > > > Gautam Kumar
    > > > > > > > > >
    > > > > > > > >
    > > > > > > >
    > > > > > > >
    > > > > > > >
    > > > > > > > --
    > > > > > > > Best Regards,
    > > > > > > > Gautam Kumar
    > > > > > > >
    > > > > > >
    > > > > >
    > > > >
    > > >
    > >
    >
    


Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Chris Olivier <cj...@gmail.com>.
-1000 on that. :)

On Thu, Sep 28, 2017 at 11:33 AM Naveen Swamy <mn...@gmail.com> wrote:

> PR->Sanity test/Linux build/test->reviewer/committer approves the
> change->Comment "Build Now" (Or trigger on at least one approval from a
> committer other than author)->*Full build-*>*passes build*->Enable Merge
>
> Let us take this particular topic to a separate thread or discuss offline
> if further clarification is needed.
>
> On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <cj...@gmail.com>
> wrote:
>
> > I understand the proposal.  How to trigger a build in that case?
> >
> >
> > On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <ma...@gmail.com>
> > wrote:
> >
> > > Chris,
> > > I don't think Naveen is suggesting that a merge happen without full
> > > verification i.e. all tests across all platforms pass.
> > > If a PR has some back and forth and results in multiple revisions
> (which
> > is
> > > arguably more common than a random unit test failing), we simply delay
> > full
> > > verification until the owner/reviewer have settled on a mutually
> > acceptable
> > > state.
> > >
> > > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <cjolivier01@gmail.com
> >
> > > wrote:
> > >
> > > > -1 for running only partial tests.  Most failing unit tests that get
> > > > through fail only for certain platforms/configurations.  I personally
> > > > prefer to be assured the build and test is good before merge.  Most
> PR
> > > > merges aren't in a huge hurry.
> > > >
> > > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <mn...@gmail.com>
> > > wrote:
> > > >
> > > > > +1 to make it protected. Here is what I am thinking for PR builds
> > > > > on a PR run Sanity Tests + build/test one platform->committer
> reviews
> > > the
> > > > > code and issues "Build Now", a full build is run->Github checks
> that
> > > the
> > > > > full build checks succeed before it can be merged.
> > > > >
> > > > > I agree with Madan that PR should be approved by one another
> > committer.
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
> > > madan.jampani@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > At a minimum I'd like to see the following two happen:
> > > > > > - Option to merge is disabled until all required checks pass.
> > > > > > - Code is reviewed and given +1 by at least one other committer
> (no
> > > > self
> > > > > > review).
> > > > > >
> > > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <ga...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi Chris,
> > > > > > >
> > > > > > >   Here <https://help.github.com/articles/about-protected-
> > branches/
> > > >
> > > > is
> > > > > > > user
> > > > > > > document on semantics of protected branch.
> > > > > > > In short when a branch is protected following applies to that
> > > branch.
> > > > > > >
> > > > > > >    - Can't be force pushed
> > > > > > >    - Can't be deleted
> > > > > > >    - Can't have changes merged into it until required status
> > checks
> > > > > > >    <https://help.github.com/articles/about-required-
> > status-checks>
> > > > > pass
> > > > > > >    - Can't have changes merged into it until required reviews
> are
> > > > > > approved
> > > > > > >    <https://help.github.com/articles/approving-a-pull-
> > > > > > > request-with-required-reviews>
> > > > > > >    - Can't be edited or have files uploaded to it from the web
> > > > > > >    - Can't have changes merged into it until changes to files
> > that
> > > > > > > have a designated
> > > > > > >    code owner <https://help.github.com/
> > articles/about-codeowners/>
> > > > > have
> > > > > > >    been approved by that owner
> > > > > > >
> > > > > > >  I am sure many of us might not want to have all these but we
> can
> > > > > debate
> > > > > > on
> > > > > > > it. My main motive was to "*Can't have changes merged into it
> > until
> > > > > > > required status checks pass*"
> > > > > > >
> > > > > > >
> > > > > > > -Gautam
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
> > > > cjolivier01@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > What does that mean? "Protected"? Protected from what?
> > > > > > > >
> > > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <
> gautamnitc@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Chris,
> > > > > > > > >
> > > > > > > > >    I mean make "master branch protected" of  MXNet.
> > > > > > > > >
> > > > > > > > > -Gautam
> > > > > > > > >
> > > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <
> > > > > > cjolivier01@gmail.com
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > What does this mean? "Mx-net branch protected"?
> > > > > > > > > >
> > > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
> > > > > > > > ozawa.tsuyoshi@gmail.com
> > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1,
> > > > > > > > > > >
> > > > > > > > > > > While I'm checking the recent build failures, and I
> think
> > > the
> > > > > > > > decision
> > > > > > > > > > > of making the mx-net branch protected is necessary for
> > > stable
> > > > > > > > > > > building.
> > > > > > > > > > > Thanks Kumar for resuming important discussion.
> > > > > > > > > > >
> > > > > > > > > > > Best regards
> > > > > > > > > > > - Tsuyoshi
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <
> > > > > > gauta@amazon.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > > Reviving the discussion.
> > > > > > > > > > > >
> > > > > > > > > > > > At this point of time we have couple of stable builds
> > > > > > > > > > > >
> > > > > > > > > > >
> https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > > > > > > incubator-mxnet/job/master/448/
> > > > > > > > > > > >
> > > > > > > > > > >
> https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > > > > > > incubator-mxnet/job/master/449/
> > > > > > > > > > > >
> > > > > > > > > > > > Should we have a quick discussion or polling on
> making
> > > the
> > > > > > mx-net
> > > > > > > > > > branch
> > > > > > > > > > > protected? If you still think we shouldn’t make it
> > > protected
> > > > > > please
> > > > > > > > > > provide
> > > > > > > > > > > a reason to support your claim.
> > > > > > > > > > > >
> > > > > > > > > > > > Few of us have concern over Jenkin’s stability. If I
> > look
> > > > two
> > > > > > > weeks
> > > > > > > > > > > back, after upgrading Linux slave to g2.8x and new
> > windows
> > > > AMI,
> > > > > > we
> > > > > > > > have
> > > > > > > > > > not
> > > > > > > > > > > seen any case where instance died due to high memory
> > usage
> > > or
> > > > > any
> > > > > > > > > process
> > > > > > > > > > > got killed due to high cpu usage or any other issue
> with
> > > > > windows
> > > > > > > > > slaves.
> > > > > > > > > > > >
> > > > > > > > > > > > Going forward we are also planning that if we add any
> > new
> > > > > slave
> > > > > > > we
> > > > > > > > > will
> > > > > > > > > > > not enable the main load immediately, but rather will
> do
> > > > ‘test
> > > > > > > build’
> > > > > > > > > to
> > > > > > > > > > > make sure that new slaves are not causing any
> > > infrastructure
> > > > > > issue
> > > > > > > > and
> > > > > > > > > > > capable to perform as good as existing slaves.
> > > > > > > > > > > >
> > > > > > > > > > > > -Gautam
> > > > > > > > > > > >
> > > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
> > lupesko@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >     @madan looking into some failures – you’re right…
> > > > there’s
> > > > > > > > > multiple
> > > > > > > > > > > issues going on, some of them intermittent, and we want
> > to
> > > be
> > > > > > able
> > > > > > > to
> > > > > > > > > > merge
> > > > > > > > > > > fixes in.
> > > > > > > > > > > >     Agreed that we can wait with setting up protected
> > > mode
> > > > > > until
> > > > > > > > > build
> > > > > > > > > > > stabilizes.
> > > > > > > > > > > >
> > > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
> > > > > > madan.jampani@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >         @hagay: we agree on the end state. I'm not
> too
> > > > > > particular
> > > > > > > > > about
> > > > > > > > > > > how we get
> > > > > > > > > > > >         there. If you think enabling it now and fixes
> > > > > > regression
> > > > > > > > > later
> > > > > > > > > > > is doable,
> > > > > > > > > > > >         I'm fine with. I see a bit of a chicken and
> egg
> > > > > > problem.
> > > > > > > We
> > > > > > > > > > need
> > > > > > > > > > > to get
> > > > > > > > > > > >         some fixes in even when the status checks are
> > > > > failing.
> > > > > > > > > > > >
> > > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko,
> > Hagay
> > > <
> > > > > > > > > > > lupesko@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >         > @madan – re: getting to a stable CI first:
> > > > > > > > > > > >         > I’m concerned that by not enabling
> protected
> > > > branch
> > > > > > > mode
> > > > > > > > > > ASAP,
> > > > > > > > > > > we’re just
> > > > > > > > > > > >         > taking in more regressions, which makes a
> > > stable
> > > > > > build
> > > > > > > a
> > > > > > > > > > > moving target for
> > > > > > > > > > > >         > us…
> > > > > > > > > > > >         >
> > > > > > > > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <
> > > > > zhasheng@amazon.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >     Just one thing: please don’t disable
> more
> > > > tests
> > > > > > or
> > > > > > > > just
> > > > > > > > > > > raise the
> > > > > > > > > > > >         > tolerance thresholds.
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >     Best regards,
> > > > > > > > > > > >         >     -sz
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > > > > > > > > > madan.jampani@gmail.com> wrote:
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >         +1
> > > > > > > > > > > >         >         Before we can turn protected mode I
> > > feel
> > > > we
> > > > > > > > should
> > > > > > > > > > > first get to a
> > > > > > > > > > > >         > stable CI
> > > > > > > > > > > >         >         pipeline.
> > > > > > > > > > > >         >         Sandeep is chasing down known
> > breaking
> > > > > > issues.
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM,
> > Hagay
> > > > > > > Lupesko <
> > > > > > > > > > > lupesko@gmail.com>
> > > > > > > > > > > >         > wrote:
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >         > Build stability is a major issue,
> > > > builds
> > > > > > have
> > > > > > > > > been
> > > > > > > > > > > failing left
> > > > > > > > > > > >         > and right
> > > > > > > > > > > >         >         > over the last week. Some of it is
> > due
> > > > to
> > > > > > > > Jenkins
> > > > > > > > > > > slave issues,
> > > > > > > > > > > >         > but some are
> > > > > > > > > > > >         >         > real regressions.
> > > > > > > > > > > >         >         > We need to be more strict in the
> > code
> > > > > we're
> > > > > > > > > > > committing.
> > > > > > > > > > > >         >         >
> > > > > > > > > > > >         >         > I propose we configure our master
> > to
> > > > be a
> > > > > > > > > protected
> > > > > > > > > > > branch (
> > > > > > > > > > > >         >         >
> > > > > > > > > > > https://help.github.com/articles/about-protected-
> > branches/
> > > ).
> > > > > > > > > > > >         >         >
> > > > > > > > > > > >         >         > Thoughts?
> > > > > > > > > > > >         >         >
> > > > > > > > > > > >         >         > On 2017-08-28 22:41, sandeep
> > > > > krishnamurthy
> > > > > > <
> > > > > > > > > > > s...@gmail.com>
> > > > > > > > > > > >         > wrote:
> > > > > > > > > > > >         >         > > Hello Committers and
> > Contributors,>
> > > > > > > > > > > >         >         > >
> > > > > > > > > > > >         >         > > Due to unstable build
> pipelines,
> > > from
> > > > > > past
> > > > > > > 1
> > > > > > > > > > week,
> > > > > > > > > > > PRs are
> > > > > > > > > > > >         > being merged>
> > > > > > > > > > > >         >         > > after CR ignoring PR build
> > status.
> > > > > Build
> > > > > > > > > pipeline
> > > > > > > > > > > is much more
> > > > > > > > > > > >         > stable
> > > > > > > > > > > >         >         > than>
> > > > > > > > > > > >         >         > > last week and most of the build
> > > > > failures
> > > > > > > you
> > > > > > > > > see
> > > > > > > > > > > from now on,
> > > > > > > > > > > >         > are likely
> > > > > > > > > > > >         >         > to>
> > > > > > > > > > > >         >         > > be a valid failure and hence,
> it
> > is
> > > > > > > > recommended
> > > > > > > > > > to
> > > > > > > > > > > wait for PR
> > > > > > > > > > > >         > builds,
> > > > > > > > > > > >         >         > see>
> > > > > > > > > > > >         >         > > the root cause of any build
> > > failures
> > > > > > before
> > > > > > > > > > > proceeding with
> > > > > > > > > > > >         > merges.>
> > > > > > > > > > > >         >         > >
> > > > > > > > > > > >         >         > > At this point of time, there
> are
> > 2
> > > > > > > > intermittent
> > > > > > > > > > > issue yet to
> > > > > > > > > > > >         > be fixed ->
> > > > > > > > > > > >         >         > > * Network error leading to
> GitHub
> > > > > > requests
> > > > > > > > > > > throwing 404>
> > > > > > > > > > > >         >         > > * A conflict in artifacts
> > generated
> > > > > > between
> > > > > > > > > > > branches/PR -
> > > > > > > > > > > >         > Cause unknown
> > > > > > > > > > > >         >         > yet.>
> > > > > > > > > > > >         >         > > These issues will be fixed
> soon.>
> > > > > > > > > > > >         >         > >
> > > > > > > > > > > >         >         > >
> > > > > > > > > > > >         >         > > -- >
> > > > > > > > > > > >         >         > > Sandeep Krishnamurthy>
> > > > > > > > > > > >         >         > >
> > > > > > > > > > > >         >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >         >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > - Tsuyoshi
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards,
> > > > > > > > > Gautam Kumar
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best Regards,
> > > > > > > Gautam Kumar
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Naveen Swamy <mn...@gmail.com>.
PR->Sanity test/Linux build/test->reviewer/committer approves the
change->Comment "Build Now" (Or trigger on at least one approval from a
committer other than author)->*Full build-*>*passes build*->Enable Merge

Let us take this particular topic to a separate thread or discuss offline
if further clarification is needed.

On Thu, Sep 28, 2017 at 11:24 AM, Chris Olivier <cj...@gmail.com>
wrote:

> I understand the proposal.  How to trigger a build in that case?
>
>
> On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <ma...@gmail.com>
> wrote:
>
> > Chris,
> > I don't think Naveen is suggesting that a merge happen without full
> > verification i.e. all tests across all platforms pass.
> > If a PR has some back and forth and results in multiple revisions (which
> is
> > arguably more common than a random unit test failing), we simply delay
> full
> > verification until the owner/reviewer have settled on a mutually
> acceptable
> > state.
> >
> > On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <cj...@gmail.com>
> > wrote:
> >
> > > -1 for running only partial tests.  Most failing unit tests that get
> > > through fail only for certain platforms/configurations.  I personally
> > > prefer to be assured the build and test is good before merge.  Most PR
> > > merges aren't in a huge hurry.
> > >
> > > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <mn...@gmail.com>
> > wrote:
> > >
> > > > +1 to make it protected. Here is what I am thinking for PR builds
> > > > on a PR run Sanity Tests + build/test one platform->committer reviews
> > the
> > > > code and issues "Build Now", a full build is run->Github checks that
> > the
> > > > full build checks succeed before it can be merged.
> > > >
> > > > I agree with Madan that PR should be approved by one another
> committer.
> > > >
> > > >
> > > >
> > > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
> > madan.jampani@gmail.com>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > At a minimum I'd like to see the following two happen:
> > > > > - Option to merge is disabled until all required checks pass.
> > > > > - Code is reviewed and given +1 by at least one other committer (no
> > > self
> > > > > review).
> > > > >
> > > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <ga...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi Chris,
> > > > > >
> > > > > >   Here <https://help.github.com/articles/about-protected-
> branches/
> > >
> > > is
> > > > > > user
> > > > > > document on semantics of protected branch.
> > > > > > In short when a branch is protected following applies to that
> > branch.
> > > > > >
> > > > > >    - Can't be force pushed
> > > > > >    - Can't be deleted
> > > > > >    - Can't have changes merged into it until required status
> checks
> > > > > >    <https://help.github.com/articles/about-required-
> status-checks>
> > > > pass
> > > > > >    - Can't have changes merged into it until required reviews are
> > > > > approved
> > > > > >    <https://help.github.com/articles/approving-a-pull-
> > > > > > request-with-required-reviews>
> > > > > >    - Can't be edited or have files uploaded to it from the web
> > > > > >    - Can't have changes merged into it until changes to files
> that
> > > > > > have a designated
> > > > > >    code owner <https://help.github.com/
> articles/about-codeowners/>
> > > > have
> > > > > >    been approved by that owner
> > > > > >
> > > > > >  I am sure many of us might not want to have all these but we can
> > > > debate
> > > > > on
> > > > > > it. My main motive was to "*Can't have changes merged into it
> until
> > > > > > required status checks pass*"
> > > > > >
> > > > > >
> > > > > > -Gautam
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
> > > cjolivier01@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > What does that mean? "Protected"? Protected from what?
> > > > > > >
> > > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Chris,
> > > > > > > >
> > > > > > > >    I mean make "master branch protected" of  MXNet.
> > > > > > > >
> > > > > > > > -Gautam
> > > > > > > >
> > > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <
> > > > > cjolivier01@gmail.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > What does this mean? "Mx-net branch protected"?
> > > > > > > > >
> > > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
> > > > > > > ozawa.tsuyoshi@gmail.com
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1,
> > > > > > > > > >
> > > > > > > > > > While I'm checking the recent build failures, and I think
> > the
> > > > > > > decision
> > > > > > > > > > of making the mx-net branch protected is necessary for
> > stable
> > > > > > > > > > building.
> > > > > > > > > > Thanks Kumar for resuming important discussion.
> > > > > > > > > >
> > > > > > > > > > Best regards
> > > > > > > > > > - Tsuyoshi
> > > > > > > > > >
> > > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <
> > > > > gauta@amazon.com>
> > > > > > > > > wrote:
> > > > > > > > > > > Reviving the discussion.
> > > > > > > > > > >
> > > > > > > > > > > At this point of time we have couple of stable builds
> > > > > > > > > > >
> > > > > > > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > > > > > incubator-mxnet/job/master/448/
> > > > > > > > > > >
> > > > > > > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > > > > > incubator-mxnet/job/master/449/
> > > > > > > > > > >
> > > > > > > > > > > Should we have a quick discussion or polling on making
> > the
> > > > > mx-net
> > > > > > > > > branch
> > > > > > > > > > protected? If you still think we shouldn’t make it
> > protected
> > > > > please
> > > > > > > > > provide
> > > > > > > > > > a reason to support your claim.
> > > > > > > > > > >
> > > > > > > > > > > Few of us have concern over Jenkin’s stability. If I
> look
> > > two
> > > > > > weeks
> > > > > > > > > > back, after upgrading Linux slave to g2.8x and new
> windows
> > > AMI,
> > > > > we
> > > > > > > have
> > > > > > > > > not
> > > > > > > > > > seen any case where instance died due to high memory
> usage
> > or
> > > > any
> > > > > > > > process
> > > > > > > > > > got killed due to high cpu usage or any other issue with
> > > > windows
> > > > > > > > slaves.
> > > > > > > > > > >
> > > > > > > > > > > Going forward we are also planning that if we add any
> new
> > > > slave
> > > > > > we
> > > > > > > > will
> > > > > > > > > > not enable the main load immediately, but rather will do
> > > ‘test
> > > > > > build’
> > > > > > > > to
> > > > > > > > > > make sure that new slaves are not causing any
> > infrastructure
> > > > > issue
> > > > > > > and
> > > > > > > > > > capable to perform as good as existing slaves.
> > > > > > > > > > >
> > > > > > > > > > > -Gautam
> > > > > > > > > > >
> > > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <
> lupesko@gmail.com
> > >
> > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >     @madan looking into some failures – you’re right…
> > > there’s
> > > > > > > > multiple
> > > > > > > > > > issues going on, some of them intermittent, and we want
> to
> > be
> > > > > able
> > > > > > to
> > > > > > > > > merge
> > > > > > > > > > fixes in.
> > > > > > > > > > >     Agreed that we can wait with setting up protected
> > mode
> > > > > until
> > > > > > > > build
> > > > > > > > > > stabilizes.
> > > > > > > > > > >
> > > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
> > > > > madan.jampani@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >         @hagay: we agree on the end state. I'm not too
> > > > > particular
> > > > > > > > about
> > > > > > > > > > how we get
> > > > > > > > > > >         there. If you think enabling it now and fixes
> > > > > regression
> > > > > > > > later
> > > > > > > > > > is doable,
> > > > > > > > > > >         I'm fine with. I see a bit of a chicken and egg
> > > > > problem.
> > > > > > We
> > > > > > > > > need
> > > > > > > > > > to get
> > > > > > > > > > >         some fixes in even when the status checks are
> > > > failing.
> > > > > > > > > > >
> > > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko,
> Hagay
> > <
> > > > > > > > > > lupesko@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > >         > @madan – re: getting to a stable CI first:
> > > > > > > > > > >         > I’m concerned that by not enabling protected
> > > branch
> > > > > > mode
> > > > > > > > > ASAP,
> > > > > > > > > > we’re just
> > > > > > > > > > >         > taking in more regressions, which makes a
> > stable
> > > > > build
> > > > > > a
> > > > > > > > > > moving target for
> > > > > > > > > > >         > us…
> > > > > > > > > > >         >
> > > > > > > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <
> > > > zhasheng@amazon.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >         >
> > > > > > > > > > >         >     Just one thing: please don’t disable more
> > > tests
> > > > > or
> > > > > > > just
> > > > > > > > > > raise the
> > > > > > > > > > >         > tolerance thresholds.
> > > > > > > > > > >         >
> > > > > > > > > > >         >     Best regards,
> > > > > > > > > > >         >     -sz
> > > > > > > > > > >         >
> > > > > > > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > > > > > > > > madan.jampani@gmail.com> wrote:
> > > > > > > > > > >         >
> > > > > > > > > > >         >         +1
> > > > > > > > > > >         >         Before we can turn protected mode I
> > feel
> > > we
> > > > > > > should
> > > > > > > > > > first get to a
> > > > > > > > > > >         > stable CI
> > > > > > > > > > >         >         pipeline.
> > > > > > > > > > >         >         Sandeep is chasing down known
> breaking
> > > > > issues.
> > > > > > > > > > >         >
> > > > > > > > > > >         >
> > > > > > > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM,
> Hagay
> > > > > > Lupesko <
> > > > > > > > > > lupesko@gmail.com>
> > > > > > > > > > >         > wrote:
> > > > > > > > > > >         >
> > > > > > > > > > >         >         > Build stability is a major issue,
> > > builds
> > > > > have
> > > > > > > > been
> > > > > > > > > > failing left
> > > > > > > > > > >         > and right
> > > > > > > > > > >         >         > over the last week. Some of it is
> due
> > > to
> > > > > > > Jenkins
> > > > > > > > > > slave issues,
> > > > > > > > > > >         > but some are
> > > > > > > > > > >         >         > real regressions.
> > > > > > > > > > >         >         > We need to be more strict in the
> code
> > > > we're
> > > > > > > > > > committing.
> > > > > > > > > > >         >         >
> > > > > > > > > > >         >         > I propose we configure our master
> to
> > > be a
> > > > > > > > protected
> > > > > > > > > > branch (
> > > > > > > > > > >         >         >
> > > > > > > > > > https://help.github.com/articles/about-protected-
> branches/
> > ).
> > > > > > > > > > >         >         >
> > > > > > > > > > >         >         > Thoughts?
> > > > > > > > > > >         >         >
> > > > > > > > > > >         >         > On 2017-08-28 22:41, sandeep
> > > > krishnamurthy
> > > > > <
> > > > > > > > > > s...@gmail.com>
> > > > > > > > > > >         > wrote:
> > > > > > > > > > >         >         > > Hello Committers and
> Contributors,>
> > > > > > > > > > >         >         > >
> > > > > > > > > > >         >         > > Due to unstable build pipelines,
> > from
> > > > > past
> > > > > > 1
> > > > > > > > > week,
> > > > > > > > > > PRs are
> > > > > > > > > > >         > being merged>
> > > > > > > > > > >         >         > > after CR ignoring PR build
> status.
> > > > Build
> > > > > > > > pipeline
> > > > > > > > > > is much more
> > > > > > > > > > >         > stable
> > > > > > > > > > >         >         > than>
> > > > > > > > > > >         >         > > last week and most of the build
> > > > failures
> > > > > > you
> > > > > > > > see
> > > > > > > > > > from now on,
> > > > > > > > > > >         > are likely
> > > > > > > > > > >         >         > to>
> > > > > > > > > > >         >         > > be a valid failure and hence, it
> is
> > > > > > > recommended
> > > > > > > > > to
> > > > > > > > > > wait for PR
> > > > > > > > > > >         > builds,
> > > > > > > > > > >         >         > see>
> > > > > > > > > > >         >         > > the root cause of any build
> > failures
> > > > > before
> > > > > > > > > > proceeding with
> > > > > > > > > > >         > merges.>
> > > > > > > > > > >         >         > >
> > > > > > > > > > >         >         > > At this point of time, there are
> 2
> > > > > > > intermittent
> > > > > > > > > > issue yet to
> > > > > > > > > > >         > be fixed ->
> > > > > > > > > > >         >         > > * Network error leading to GitHub
> > > > > requests
> > > > > > > > > > throwing 404>
> > > > > > > > > > >         >         > > * A conflict in artifacts
> generated
> > > > > between
> > > > > > > > > > branches/PR -
> > > > > > > > > > >         > Cause unknown
> > > > > > > > > > >         >         > yet.>
> > > > > > > > > > >         >         > > These issues will be fixed soon.>
> > > > > > > > > > >         >         > >
> > > > > > > > > > >         >         > >
> > > > > > > > > > >         >         > > -- >
> > > > > > > > > > >         >         > > Sandeep Krishnamurthy>
> > > > > > > > > > >         >         > >
> > > > > > > > > > >         >         >
> > > > > > > > > > >         >
> > > > > > > > > > >         >
> > > > > > > > > > >         >
> > > > > > > > > > >         >
> > > > > > > > > > >         >
> > > > > > > > > > >         >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > - Tsuyoshi
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards,
> > > > > > > > Gautam Kumar
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards,
> > > > > > Gautam Kumar
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Chris Olivier <cj...@gmail.com>.
I understand the proposal.  How to trigger a build in that case?


On Thu, Sep 28, 2017 at 10:54 AM Madan Jampani <ma...@gmail.com>
wrote:

> Chris,
> I don't think Naveen is suggesting that a merge happen without full
> verification i.e. all tests across all platforms pass.
> If a PR has some back and forth and results in multiple revisions (which is
> arguably more common than a random unit test failing), we simply delay full
> verification until the owner/reviewer have settled on a mutually acceptable
> state.
>
> On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <cj...@gmail.com>
> wrote:
>
> > -1 for running only partial tests.  Most failing unit tests that get
> > through fail only for certain platforms/configurations.  I personally
> > prefer to be assured the build and test is good before merge.  Most PR
> > merges aren't in a huge hurry.
> >
> > On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <mn...@gmail.com>
> wrote:
> >
> > > +1 to make it protected. Here is what I am thinking for PR builds
> > > on a PR run Sanity Tests + build/test one platform->committer reviews
> the
> > > code and issues "Build Now", a full build is run->Github checks that
> the
> > > full build checks succeed before it can be merged.
> > >
> > > I agree with Madan that PR should be approved by one another committer.
> > >
> > >
> > >
> > > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <
> madan.jampani@gmail.com>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > At a minimum I'd like to see the following two happen:
> > > > - Option to merge is disabled until all required checks pass.
> > > > - Code is reviewed and given +1 by at least one other committer (no
> > self
> > > > review).
> > > >
> > > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <ga...@gmail.com>
> wrote:
> > > >
> > > > > Hi Chris,
> > > > >
> > > > >   Here <https://help.github.com/articles/about-protected-branches/
> >
> > is
> > > > > user
> > > > > document on semantics of protected branch.
> > > > > In short when a branch is protected following applies to that
> branch.
> > > > >
> > > > >    - Can't be force pushed
> > > > >    - Can't be deleted
> > > > >    - Can't have changes merged into it until required status checks
> > > > >    <https://help.github.com/articles/about-required-status-checks>
> > > pass
> > > > >    - Can't have changes merged into it until required reviews are
> > > > approved
> > > > >    <https://help.github.com/articles/approving-a-pull-
> > > > > request-with-required-reviews>
> > > > >    - Can't be edited or have files uploaded to it from the web
> > > > >    - Can't have changes merged into it until changes to files that
> > > > > have a designated
> > > > >    code owner <https://help.github.com/articles/about-codeowners/>
> > > have
> > > > >    been approved by that owner
> > > > >
> > > > >  I am sure many of us might not want to have all these but we can
> > > debate
> > > > on
> > > > > it. My main motive was to "*Can't have changes merged into it until
> > > > > required status checks pass*"
> > > > >
> > > > >
> > > > > -Gautam
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
> > cjolivier01@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > What does that mean? "Protected"? Protected from what?
> > > > > >
> > > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi Chris,
> > > > > > >
> > > > > > >    I mean make "master branch protected" of  MXNet.
> > > > > > >
> > > > > > > -Gautam
> > > > > > >
> > > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <
> > > > cjolivier01@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > What does this mean? "Mx-net branch protected"?
> > > > > > > >
> > > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
> > > > > > ozawa.tsuyoshi@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1,
> > > > > > > > >
> > > > > > > > > While I'm checking the recent build failures, and I think
> the
> > > > > > decision
> > > > > > > > > of making the mx-net branch protected is necessary for
> stable
> > > > > > > > > building.
> > > > > > > > > Thanks Kumar for resuming important discussion.
> > > > > > > > >
> > > > > > > > > Best regards
> > > > > > > > > - Tsuyoshi
> > > > > > > > >
> > > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <
> > > > gauta@amazon.com>
> > > > > > > > wrote:
> > > > > > > > > > Reviving the discussion.
> > > > > > > > > >
> > > > > > > > > > At this point of time we have couple of stable builds
> > > > > > > > > >
> > > > > > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > > > > incubator-mxnet/job/master/448/
> > > > > > > > > >
> > > > > > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > > > > incubator-mxnet/job/master/449/
> > > > > > > > > >
> > > > > > > > > > Should we have a quick discussion or polling on making
> the
> > > > mx-net
> > > > > > > > branch
> > > > > > > > > protected? If you still think we shouldn’t make it
> protected
> > > > please
> > > > > > > > provide
> > > > > > > > > a reason to support your claim.
> > > > > > > > > >
> > > > > > > > > > Few of us have concern over Jenkin’s stability. If I look
> > two
> > > > > weeks
> > > > > > > > > back, after upgrading Linux slave to g2.8x and new windows
> > AMI,
> > > > we
> > > > > > have
> > > > > > > > not
> > > > > > > > > seen any case where instance died due to high memory usage
> or
> > > any
> > > > > > > process
> > > > > > > > > got killed due to high cpu usage or any other issue with
> > > windows
> > > > > > > slaves.
> > > > > > > > > >
> > > > > > > > > > Going forward we are also planning that if we add any new
> > > slave
> > > > > we
> > > > > > > will
> > > > > > > > > not enable the main load immediately, but rather will do
> > ‘test
> > > > > build’
> > > > > > > to
> > > > > > > > > make sure that new slaves are not causing any
> infrastructure
> > > > issue
> > > > > > and
> > > > > > > > > capable to perform as good as existing slaves.
> > > > > > > > > >
> > > > > > > > > > -Gautam
> > > > > > > > > >
> > > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lupesko@gmail.com
> >
> > > > wrote:
> > > > > > > > > >
> > > > > > > > > >     @madan looking into some failures – you’re right…
> > there’s
> > > > > > > multiple
> > > > > > > > > issues going on, some of them intermittent, and we want to
> be
> > > > able
> > > > > to
> > > > > > > > merge
> > > > > > > > > fixes in.
> > > > > > > > > >     Agreed that we can wait with setting up protected
> mode
> > > > until
> > > > > > > build
> > > > > > > > > stabilizes.
> > > > > > > > > >
> > > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
> > > > madan.jampani@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >         @hagay: we agree on the end state. I'm not too
> > > > particular
> > > > > > > about
> > > > > > > > > how we get
> > > > > > > > > >         there. If you think enabling it now and fixes
> > > > regression
> > > > > > > later
> > > > > > > > > is doable,
> > > > > > > > > >         I'm fine with. I see a bit of a chicken and egg
> > > > problem.
> > > > > We
> > > > > > > > need
> > > > > > > > > to get
> > > > > > > > > >         some fixes in even when the status checks are
> > > failing.
> > > > > > > > > >
> > > > > > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay
> <
> > > > > > > > > lupesko@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > >         > @madan – re: getting to a stable CI first:
> > > > > > > > > >         > I’m concerned that by not enabling protected
> > branch
> > > > > mode
> > > > > > > > ASAP,
> > > > > > > > > we’re just
> > > > > > > > > >         > taking in more regressions, which makes a
> stable
> > > > build
> > > > > a
> > > > > > > > > moving target for
> > > > > > > > > >         > us…
> > > > > > > > > >         >
> > > > > > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <
> > > zhasheng@amazon.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > >         >
> > > > > > > > > >         >     Just one thing: please don’t disable more
> > tests
> > > > or
> > > > > > just
> > > > > > > > > raise the
> > > > > > > > > >         > tolerance thresholds.
> > > > > > > > > >         >
> > > > > > > > > >         >     Best regards,
> > > > > > > > > >         >     -sz
> > > > > > > > > >         >
> > > > > > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > > > > > > > madan.jampani@gmail.com> wrote:
> > > > > > > > > >         >
> > > > > > > > > >         >         +1
> > > > > > > > > >         >         Before we can turn protected mode I
> feel
> > we
> > > > > > should
> > > > > > > > > first get to a
> > > > > > > > > >         > stable CI
> > > > > > > > > >         >         pipeline.
> > > > > > > > > >         >         Sandeep is chasing down known breaking
> > > > issues.
> > > > > > > > > >         >
> > > > > > > > > >         >
> > > > > > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay
> > > > > Lupesko <
> > > > > > > > > lupesko@gmail.com>
> > > > > > > > > >         > wrote:
> > > > > > > > > >         >
> > > > > > > > > >         >         > Build stability is a major issue,
> > builds
> > > > have
> > > > > > > been
> > > > > > > > > failing left
> > > > > > > > > >         > and right
> > > > > > > > > >         >         > over the last week. Some of it is due
> > to
> > > > > > Jenkins
> > > > > > > > > slave issues,
> > > > > > > > > >         > but some are
> > > > > > > > > >         >         > real regressions.
> > > > > > > > > >         >         > We need to be more strict in the code
> > > we're
> > > > > > > > > committing.
> > > > > > > > > >         >         >
> > > > > > > > > >         >         > I propose we configure our master to
> > be a
> > > > > > > protected
> > > > > > > > > branch (
> > > > > > > > > >         >         >
> > > > > > > > > https://help.github.com/articles/about-protected-branches/
> ).
> > > > > > > > > >         >         >
> > > > > > > > > >         >         > Thoughts?
> > > > > > > > > >         >         >
> > > > > > > > > >         >         > On 2017-08-28 22:41, sandeep
> > > krishnamurthy
> > > > <
> > > > > > > > > s...@gmail.com>
> > > > > > > > > >         > wrote:
> > > > > > > > > >         >         > > Hello Committers and Contributors,>
> > > > > > > > > >         >         > >
> > > > > > > > > >         >         > > Due to unstable build pipelines,
> from
> > > > past
> > > > > 1
> > > > > > > > week,
> > > > > > > > > PRs are
> > > > > > > > > >         > being merged>
> > > > > > > > > >         >         > > after CR ignoring PR build status.
> > > Build
> > > > > > > pipeline
> > > > > > > > > is much more
> > > > > > > > > >         > stable
> > > > > > > > > >         >         > than>
> > > > > > > > > >         >         > > last week and most of the build
> > > failures
> > > > > you
> > > > > > > see
> > > > > > > > > from now on,
> > > > > > > > > >         > are likely
> > > > > > > > > >         >         > to>
> > > > > > > > > >         >         > > be a valid failure and hence, it is
> > > > > > recommended
> > > > > > > > to
> > > > > > > > > wait for PR
> > > > > > > > > >         > builds,
> > > > > > > > > >         >         > see>
> > > > > > > > > >         >         > > the root cause of any build
> failures
> > > > before
> > > > > > > > > proceeding with
> > > > > > > > > >         > merges.>
> > > > > > > > > >         >         > >
> > > > > > > > > >         >         > > At this point of time, there are 2
> > > > > > intermittent
> > > > > > > > > issue yet to
> > > > > > > > > >         > be fixed ->
> > > > > > > > > >         >         > > * Network error leading to GitHub
> > > > requests
> > > > > > > > > throwing 404>
> > > > > > > > > >         >         > > * A conflict in artifacts generated
> > > > between
> > > > > > > > > branches/PR -
> > > > > > > > > >         > Cause unknown
> > > > > > > > > >         >         > yet.>
> > > > > > > > > >         >         > > These issues will be fixed soon.>
> > > > > > > > > >         >         > >
> > > > > > > > > >         >         > >
> > > > > > > > > >         >         > > -- >
> > > > > > > > > >         >         > > Sandeep Krishnamurthy>
> > > > > > > > > >         >         > >
> > > > > > > > > >         >         >
> > > > > > > > > >         >
> > > > > > > > > >         >
> > > > > > > > > >         >
> > > > > > > > > >         >
> > > > > > > > > >         >
> > > > > > > > > >         >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > - Tsuyoshi
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best Regards,
> > > > > > > Gautam Kumar
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards,
> > > > > Gautam Kumar
> > > > >
> > > >
> > >
> >
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Madan Jampani <ma...@gmail.com>.
Chris,
I don't think Naveen is suggesting that a merge happen without full
verification i.e. all tests across all platforms pass.
If a PR has some back and forth and results in multiple revisions (which is
arguably more common than a random unit test failing), we simply delay full
verification until the owner/reviewer have settled on a mutually acceptable
state.

On Thu, Sep 28, 2017 at 10:25 AM, Chris Olivier <cj...@gmail.com>
wrote:

> -1 for running only partial tests.  Most failing unit tests that get
> through fail only for certain platforms/configurations.  I personally
> prefer to be assured the build and test is good before merge.  Most PR
> merges aren't in a huge hurry.
>
> On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <mn...@gmail.com> wrote:
>
> > +1 to make it protected. Here is what I am thinking for PR builds
> > on a PR run Sanity Tests + build/test one platform->committer reviews the
> > code and issues "Build Now", a full build is run->Github checks that the
> > full build checks succeed before it can be merged.
> >
> > I agree with Madan that PR should be approved by one another committer.
> >
> >
> >
> > On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <ma...@gmail.com>
> > wrote:
> >
> > > +1
> > >
> > > At a minimum I'd like to see the following two happen:
> > > - Option to merge is disabled until all required checks pass.
> > > - Code is reviewed and given +1 by at least one other committer (no
> self
> > > review).
> > >
> > > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <ga...@gmail.com> wrote:
> > >
> > > > Hi Chris,
> > > >
> > > >   Here <https://help.github.com/articles/about-protected-branches/>
> is
> > > > user
> > > > document on semantics of protected branch.
> > > > In short when a branch is protected following applies to that branch.
> > > >
> > > >    - Can't be force pushed
> > > >    - Can't be deleted
> > > >    - Can't have changes merged into it until required status checks
> > > >    <https://help.github.com/articles/about-required-status-checks>
> > pass
> > > >    - Can't have changes merged into it until required reviews are
> > > approved
> > > >    <https://help.github.com/articles/approving-a-pull-
> > > > request-with-required-reviews>
> > > >    - Can't be edited or have files uploaded to it from the web
> > > >    - Can't have changes merged into it until changes to files that
> > > > have a designated
> > > >    code owner <https://help.github.com/articles/about-codeowners/>
> > have
> > > >    been approved by that owner
> > > >
> > > >  I am sure many of us might not want to have all these but we can
> > debate
> > > on
> > > > it. My main motive was to "*Can't have changes merged into it until
> > > > required status checks pass*"
> > > >
> > > >
> > > > -Gautam
> > > >
> > > >
> > > >
> > > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <
> cjolivier01@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > What does that mean? "Protected"? Protected from what?
> > > > >
> > > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi Chris,
> > > > > >
> > > > > >    I mean make "master branch protected" of  MXNet.
> > > > > >
> > > > > > -Gautam
> > > > > >
> > > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <
> > > cjolivier01@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > What does this mean? "Mx-net branch protected"?
> > > > > > >
> > > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
> > > > > ozawa.tsuyoshi@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1,
> > > > > > > >
> > > > > > > > While I'm checking the recent build failures, and I think the
> > > > > decision
> > > > > > > > of making the mx-net branch protected is necessary for stable
> > > > > > > > building.
> > > > > > > > Thanks Kumar for resuming important discussion.
> > > > > > > >
> > > > > > > > Best regards
> > > > > > > > - Tsuyoshi
> > > > > > > >
> > > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <
> > > gauta@amazon.com>
> > > > > > > wrote:
> > > > > > > > > Reviving the discussion.
> > > > > > > > >
> > > > > > > > > At this point of time we have couple of stable builds
> > > > > > > > >
> > > > > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > > > incubator-mxnet/job/master/448/
> > > > > > > > >
> > > > > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > > > incubator-mxnet/job/master/449/
> > > > > > > > >
> > > > > > > > > Should we have a quick discussion or polling on making the
> > > mx-net
> > > > > > > branch
> > > > > > > > protected? If you still think we shouldn’t make it protected
> > > please
> > > > > > > provide
> > > > > > > > a reason to support your claim.
> > > > > > > > >
> > > > > > > > > Few of us have concern over Jenkin’s stability. If I look
> two
> > > > weeks
> > > > > > > > back, after upgrading Linux slave to g2.8x and new windows
> AMI,
> > > we
> > > > > have
> > > > > > > not
> > > > > > > > seen any case where instance died due to high memory usage or
> > any
> > > > > > process
> > > > > > > > got killed due to high cpu usage or any other issue with
> > windows
> > > > > > slaves.
> > > > > > > > >
> > > > > > > > > Going forward we are also planning that if we add any new
> > slave
> > > > we
> > > > > > will
> > > > > > > > not enable the main load immediately, but rather will do
> ‘test
> > > > build’
> > > > > > to
> > > > > > > > make sure that new slaves are not causing any infrastructure
> > > issue
> > > > > and
> > > > > > > > capable to perform as good as existing slaves.
> > > > > > > > >
> > > > > > > > > -Gautam
> > > > > > > > >
> > > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com>
> > > wrote:
> > > > > > > > >
> > > > > > > > >     @madan looking into some failures – you’re right…
> there’s
> > > > > > multiple
> > > > > > > > issues going on, some of them intermittent, and we want to be
> > > able
> > > > to
> > > > > > > merge
> > > > > > > > fixes in.
> > > > > > > > >     Agreed that we can wait with setting up protected mode
> > > until
> > > > > > build
> > > > > > > > stabilizes.
> > > > > > > > >
> > > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
> > > madan.jampani@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >         @hagay: we agree on the end state. I'm not too
> > > particular
> > > > > > about
> > > > > > > > how we get
> > > > > > > > >         there. If you think enabling it now and fixes
> > > regression
> > > > > > later
> > > > > > > > is doable,
> > > > > > > > >         I'm fine with. I see a bit of a chicken and egg
> > > problem.
> > > > We
> > > > > > > need
> > > > > > > > to get
> > > > > > > > >         some fixes in even when the status checks are
> > failing.
> > > > > > > > >
> > > > > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> > > > > > > > lupesko@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > >         > @madan – re: getting to a stable CI first:
> > > > > > > > >         > I’m concerned that by not enabling protected
> branch
> > > > mode
> > > > > > > ASAP,
> > > > > > > > we’re just
> > > > > > > > >         > taking in more regressions, which makes a stable
> > > build
> > > > a
> > > > > > > > moving target for
> > > > > > > > >         > us…
> > > > > > > > >         >
> > > > > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <
> > zhasheng@amazon.com
> > > >
> > > > > > wrote:
> > > > > > > > >         >
> > > > > > > > >         >     Just one thing: please don’t disable more
> tests
> > > or
> > > > > just
> > > > > > > > raise the
> > > > > > > > >         > tolerance thresholds.
> > > > > > > > >         >
> > > > > > > > >         >     Best regards,
> > > > > > > > >         >     -sz
> > > > > > > > >         >
> > > > > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > > > > > > madan.jampani@gmail.com> wrote:
> > > > > > > > >         >
> > > > > > > > >         >         +1
> > > > > > > > >         >         Before we can turn protected mode I feel
> we
> > > > > should
> > > > > > > > first get to a
> > > > > > > > >         > stable CI
> > > > > > > > >         >         pipeline.
> > > > > > > > >         >         Sandeep is chasing down known breaking
> > > issues.
> > > > > > > > >         >
> > > > > > > > >         >
> > > > > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay
> > > > Lupesko <
> > > > > > > > lupesko@gmail.com>
> > > > > > > > >         > wrote:
> > > > > > > > >         >
> > > > > > > > >         >         > Build stability is a major issue,
> builds
> > > have
> > > > > > been
> > > > > > > > failing left
> > > > > > > > >         > and right
> > > > > > > > >         >         > over the last week. Some of it is due
> to
> > > > > Jenkins
> > > > > > > > slave issues,
> > > > > > > > >         > but some are
> > > > > > > > >         >         > real regressions.
> > > > > > > > >         >         > We need to be more strict in the code
> > we're
> > > > > > > > committing.
> > > > > > > > >         >         >
> > > > > > > > >         >         > I propose we configure our master to
> be a
> > > > > > protected
> > > > > > > > branch (
> > > > > > > > >         >         >
> > > > > > > > https://help.github.com/articles/about-protected-branches/).
> > > > > > > > >         >         >
> > > > > > > > >         >         > Thoughts?
> > > > > > > > >         >         >
> > > > > > > > >         >         > On 2017-08-28 22:41, sandeep
> > krishnamurthy
> > > <
> > > > > > > > s...@gmail.com>
> > > > > > > > >         > wrote:
> > > > > > > > >         >         > > Hello Committers and Contributors,>
> > > > > > > > >         >         > >
> > > > > > > > >         >         > > Due to unstable build pipelines, from
> > > past
> > > > 1
> > > > > > > week,
> > > > > > > > PRs are
> > > > > > > > >         > being merged>
> > > > > > > > >         >         > > after CR ignoring PR build status.
> > Build
> > > > > > pipeline
> > > > > > > > is much more
> > > > > > > > >         > stable
> > > > > > > > >         >         > than>
> > > > > > > > >         >         > > last week and most of the build
> > failures
> > > > you
> > > > > > see
> > > > > > > > from now on,
> > > > > > > > >         > are likely
> > > > > > > > >         >         > to>
> > > > > > > > >         >         > > be a valid failure and hence, it is
> > > > > recommended
> > > > > > > to
> > > > > > > > wait for PR
> > > > > > > > >         > builds,
> > > > > > > > >         >         > see>
> > > > > > > > >         >         > > the root cause of any build failures
> > > before
> > > > > > > > proceeding with
> > > > > > > > >         > merges.>
> > > > > > > > >         >         > >
> > > > > > > > >         >         > > At this point of time, there are 2
> > > > > intermittent
> > > > > > > > issue yet to
> > > > > > > > >         > be fixed ->
> > > > > > > > >         >         > > * Network error leading to GitHub
> > > requests
> > > > > > > > throwing 404>
> > > > > > > > >         >         > > * A conflict in artifacts generated
> > > between
> > > > > > > > branches/PR -
> > > > > > > > >         > Cause unknown
> > > > > > > > >         >         > yet.>
> > > > > > > > >         >         > > These issues will be fixed soon.>
> > > > > > > > >         >         > >
> > > > > > > > >         >         > >
> > > > > > > > >         >         > > -- >
> > > > > > > > >         >         > > Sandeep Krishnamurthy>
> > > > > > > > >         >         > >
> > > > > > > > >         >         >
> > > > > > > > >         >
> > > > > > > > >         >
> > > > > > > > >         >
> > > > > > > > >         >
> > > > > > > > >         >
> > > > > > > > >         >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > - Tsuyoshi
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards,
> > > > > > Gautam Kumar
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards,
> > > > Gautam Kumar
> > > >
> > >
> >
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Chris Olivier <cj...@gmail.com>.
-1 for running only partial tests.  Most failing unit tests that get
through fail only for certain platforms/configurations.  I personally
prefer to be assured the build and test is good before merge.  Most PR
merges aren't in a huge hurry.

On Thu, Sep 28, 2017 at 9:54 AM, Naveen Swamy <mn...@gmail.com> wrote:

> +1 to make it protected. Here is what I am thinking for PR builds
> on a PR run Sanity Tests + build/test one platform->committer reviews the
> code and issues "Build Now", a full build is run->Github checks that the
> full build checks succeed before it can be merged.
>
> I agree with Madan that PR should be approved by one another committer.
>
>
>
> On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <ma...@gmail.com>
> wrote:
>
> > +1
> >
> > At a minimum I'd like to see the following two happen:
> > - Option to merge is disabled until all required checks pass.
> > - Code is reviewed and given +1 by at least one other committer (no self
> > review).
> >
> > On Wed, Sep 27, 2017 at 11:15 PM, Gautam <ga...@gmail.com> wrote:
> >
> > > Hi Chris,
> > >
> > >   Here <https://help.github.com/articles/about-protected-branches/> is
> > > user
> > > document on semantics of protected branch.
> > > In short when a branch is protected following applies to that branch.
> > >
> > >    - Can't be force pushed
> > >    - Can't be deleted
> > >    - Can't have changes merged into it until required status checks
> > >    <https://help.github.com/articles/about-required-status-checks>
> pass
> > >    - Can't have changes merged into it until required reviews are
> > approved
> > >    <https://help.github.com/articles/approving-a-pull-
> > > request-with-required-reviews>
> > >    - Can't be edited or have files uploaded to it from the web
> > >    - Can't have changes merged into it until changes to files that
> > > have a designated
> > >    code owner <https://help.github.com/articles/about-codeowners/>
> have
> > >    been approved by that owner
> > >
> > >  I am sure many of us might not want to have all these but we can
> debate
> > on
> > > it. My main motive was to "*Can't have changes merged into it until
> > > required status checks pass*"
> > >
> > >
> > > -Gautam
> > >
> > >
> > >
> > > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <cjolivier01@gmail.com
> >
> > > wrote:
> > >
> > > > What does that mean? "Protected"? Protected from what?
> > > >
> > > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com>
> wrote:
> > > >
> > > > > Hi Chris,
> > > > >
> > > > >    I mean make "master branch protected" of  MXNet.
> > > > >
> > > > > -Gautam
> > > > >
> > > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <
> > cjolivier01@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > What does this mean? "Mx-net branch protected"?
> > > > > >
> > > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
> > > > ozawa.tsuyoshi@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > +1,
> > > > > > >
> > > > > > > While I'm checking the recent build failures, and I think the
> > > > decision
> > > > > > > of making the mx-net branch protected is necessary for stable
> > > > > > > building.
> > > > > > > Thanks Kumar for resuming important discussion.
> > > > > > >
> > > > > > > Best regards
> > > > > > > - Tsuyoshi
> > > > > > >
> > > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <
> > gauta@amazon.com>
> > > > > > wrote:
> > > > > > > > Reviving the discussion.
> > > > > > > >
> > > > > > > > At this point of time we have couple of stable builds
> > > > > > > >
> > > > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > > incubator-mxnet/job/master/448/
> > > > > > > >
> > > > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > > incubator-mxnet/job/master/449/
> > > > > > > >
> > > > > > > > Should we have a quick discussion or polling on making the
> > mx-net
> > > > > > branch
> > > > > > > protected? If you still think we shouldn’t make it protected
> > please
> > > > > > provide
> > > > > > > a reason to support your claim.
> > > > > > > >
> > > > > > > > Few of us have concern over Jenkin’s stability. If I look two
> > > weeks
> > > > > > > back, after upgrading Linux slave to g2.8x and new windows AMI,
> > we
> > > > have
> > > > > > not
> > > > > > > seen any case where instance died due to high memory usage or
> any
> > > > > process
> > > > > > > got killed due to high cpu usage or any other issue with
> windows
> > > > > slaves.
> > > > > > > >
> > > > > > > > Going forward we are also planning that if we add any new
> slave
> > > we
> > > > > will
> > > > > > > not enable the main load immediately, but rather will do ‘test
> > > build’
> > > > > to
> > > > > > > make sure that new slaves are not causing any infrastructure
> > issue
> > > > and
> > > > > > > capable to perform as good as existing slaves.
> > > > > > > >
> > > > > > > > -Gautam
> > > > > > > >
> > > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com>
> > wrote:
> > > > > > > >
> > > > > > > >     @madan looking into some failures – you’re right… there’s
> > > > > multiple
> > > > > > > issues going on, some of them intermittent, and we want to be
> > able
> > > to
> > > > > > merge
> > > > > > > fixes in.
> > > > > > > >     Agreed that we can wait with setting up protected mode
> > until
> > > > > build
> > > > > > > stabilizes.
> > > > > > > >
> > > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
> > madan.jampani@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > >         @hagay: we agree on the end state. I'm not too
> > particular
> > > > > about
> > > > > > > how we get
> > > > > > > >         there. If you think enabling it now and fixes
> > regression
> > > > > later
> > > > > > > is doable,
> > > > > > > >         I'm fine with. I see a bit of a chicken and egg
> > problem.
> > > We
> > > > > > need
> > > > > > > to get
> > > > > > > >         some fixes in even when the status checks are
> failing.
> > > > > > > >
> > > > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> > > > > > > lupesko@gmail.com> wrote:
> > > > > > > >
> > > > > > > >         > @madan – re: getting to a stable CI first:
> > > > > > > >         > I’m concerned that by not enabling protected branch
> > > mode
> > > > > > ASAP,
> > > > > > > we’re just
> > > > > > > >         > taking in more regressions, which makes a stable
> > build
> > > a
> > > > > > > moving target for
> > > > > > > >         > us…
> > > > > > > >         >
> > > > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <
> zhasheng@amazon.com
> > >
> > > > > wrote:
> > > > > > > >         >
> > > > > > > >         >     Just one thing: please don’t disable more tests
> > or
> > > > just
> > > > > > > raise the
> > > > > > > >         > tolerance thresholds.
> > > > > > > >         >
> > > > > > > >         >     Best regards,
> > > > > > > >         >     -sz
> > > > > > > >         >
> > > > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > > > > > madan.jampani@gmail.com> wrote:
> > > > > > > >         >
> > > > > > > >         >         +1
> > > > > > > >         >         Before we can turn protected mode I feel we
> > > > should
> > > > > > > first get to a
> > > > > > > >         > stable CI
> > > > > > > >         >         pipeline.
> > > > > > > >         >         Sandeep is chasing down known breaking
> > issues.
> > > > > > > >         >
> > > > > > > >         >
> > > > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay
> > > Lupesko <
> > > > > > > lupesko@gmail.com>
> > > > > > > >         > wrote:
> > > > > > > >         >
> > > > > > > >         >         > Build stability is a major issue, builds
> > have
> > > > > been
> > > > > > > failing left
> > > > > > > >         > and right
> > > > > > > >         >         > over the last week. Some of it is due to
> > > > Jenkins
> > > > > > > slave issues,
> > > > > > > >         > but some are
> > > > > > > >         >         > real regressions.
> > > > > > > >         >         > We need to be more strict in the code
> we're
> > > > > > > committing.
> > > > > > > >         >         >
> > > > > > > >         >         > I propose we configure our master to be a
> > > > > protected
> > > > > > > branch (
> > > > > > > >         >         >
> > > > > > > https://help.github.com/articles/about-protected-branches/).
> > > > > > > >         >         >
> > > > > > > >         >         > Thoughts?
> > > > > > > >         >         >
> > > > > > > >         >         > On 2017-08-28 22:41, sandeep
> krishnamurthy
> > <
> > > > > > > s...@gmail.com>
> > > > > > > >         > wrote:
> > > > > > > >         >         > > Hello Committers and Contributors,>
> > > > > > > >         >         > >
> > > > > > > >         >         > > Due to unstable build pipelines, from
> > past
> > > 1
> > > > > > week,
> > > > > > > PRs are
> > > > > > > >         > being merged>
> > > > > > > >         >         > > after CR ignoring PR build status.
> Build
> > > > > pipeline
> > > > > > > is much more
> > > > > > > >         > stable
> > > > > > > >         >         > than>
> > > > > > > >         >         > > last week and most of the build
> failures
> > > you
> > > > > see
> > > > > > > from now on,
> > > > > > > >         > are likely
> > > > > > > >         >         > to>
> > > > > > > >         >         > > be a valid failure and hence, it is
> > > > recommended
> > > > > > to
> > > > > > > wait for PR
> > > > > > > >         > builds,
> > > > > > > >         >         > see>
> > > > > > > >         >         > > the root cause of any build failures
> > before
> > > > > > > proceeding with
> > > > > > > >         > merges.>
> > > > > > > >         >         > >
> > > > > > > >         >         > > At this point of time, there are 2
> > > > intermittent
> > > > > > > issue yet to
> > > > > > > >         > be fixed ->
> > > > > > > >         >         > > * Network error leading to GitHub
> > requests
> > > > > > > throwing 404>
> > > > > > > >         >         > > * A conflict in artifacts generated
> > between
> > > > > > > branches/PR -
> > > > > > > >         > Cause unknown
> > > > > > > >         >         > yet.>
> > > > > > > >         >         > > These issues will be fixed soon.>
> > > > > > > >         >         > >
> > > > > > > >         >         > >
> > > > > > > >         >         > > -- >
> > > > > > > >         >         > > Sandeep Krishnamurthy>
> > > > > > > >         >         > >
> > > > > > > >         >         >
> > > > > > > >         >
> > > > > > > >         >
> > > > > > > >         >
> > > > > > > >         >
> > > > > > > >         >
> > > > > > > >         >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > - Tsuyoshi
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards,
> > > > > Gautam Kumar
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best Regards,
> > > Gautam Kumar
> > >
> >
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Naveen Swamy <mn...@gmail.com>.
+1 to make it protected. Here is what I am thinking for PR builds
on a PR run Sanity Tests + build/test one platform->committer reviews the
code and issues "Build Now", a full build is run->Github checks that the
full build checks succeed before it can be merged.

I agree with Madan that PR should be approved by one another committer.



On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <ma...@gmail.com>
wrote:

> +1
>
> At a minimum I'd like to see the following two happen:
> - Option to merge is disabled until all required checks pass.
> - Code is reviewed and given +1 by at least one other committer (no self
> review).
>
> On Wed, Sep 27, 2017 at 11:15 PM, Gautam <ga...@gmail.com> wrote:
>
> > Hi Chris,
> >
> >   Here <https://help.github.com/articles/about-protected-branches/> is
> > user
> > document on semantics of protected branch.
> > In short when a branch is protected following applies to that branch.
> >
> >    - Can't be force pushed
> >    - Can't be deleted
> >    - Can't have changes merged into it until required status checks
> >    <https://help.github.com/articles/about-required-status-checks> pass
> >    - Can't have changes merged into it until required reviews are
> approved
> >    <https://help.github.com/articles/approving-a-pull-
> > request-with-required-reviews>
> >    - Can't be edited or have files uploaded to it from the web
> >    - Can't have changes merged into it until changes to files that
> > have a designated
> >    code owner <https://help.github.com/articles/about-codeowners/> have
> >    been approved by that owner
> >
> >  I am sure many of us might not want to have all these but we can debate
> on
> > it. My main motive was to "*Can't have changes merged into it until
> > required status checks pass*"
> >
> >
> > -Gautam
> >
> >
> >
> > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <cj...@gmail.com>
> > wrote:
> >
> > > What does that mean? "Protected"? Protected from what?
> > >
> > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com> wrote:
> > >
> > > > Hi Chris,
> > > >
> > > >    I mean make "master branch protected" of  MXNet.
> > > >
> > > > -Gautam
> > > >
> > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <
> cjolivier01@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > What does this mean? "Mx-net branch protected"?
> > > > >
> > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
> > > ozawa.tsuyoshi@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > +1,
> > > > > >
> > > > > > While I'm checking the recent build failures, and I think the
> > > decision
> > > > > > of making the mx-net branch protected is necessary for stable
> > > > > > building.
> > > > > > Thanks Kumar for resuming important discussion.
> > > > > >
> > > > > > Best regards
> > > > > > - Tsuyoshi
> > > > > >
> > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <
> gauta@amazon.com>
> > > > > wrote:
> > > > > > > Reviving the discussion.
> > > > > > >
> > > > > > > At this point of time we have couple of stable builds
> > > > > > >
> > > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > incubator-mxnet/job/master/448/
> > > > > > >
> > > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > incubator-mxnet/job/master/449/
> > > > > > >
> > > > > > > Should we have a quick discussion or polling on making the
> mx-net
> > > > > branch
> > > > > > protected? If you still think we shouldn’t make it protected
> please
> > > > > provide
> > > > > > a reason to support your claim.
> > > > > > >
> > > > > > > Few of us have concern over Jenkin’s stability. If I look two
> > weeks
> > > > > > back, after upgrading Linux slave to g2.8x and new windows AMI,
> we
> > > have
> > > > > not
> > > > > > seen any case where instance died due to high memory usage or any
> > > > process
> > > > > > got killed due to high cpu usage or any other issue with windows
> > > > slaves.
> > > > > > >
> > > > > > > Going forward we are also planning that if we add any new slave
> > we
> > > > will
> > > > > > not enable the main load immediately, but rather will do ‘test
> > build’
> > > > to
> > > > > > make sure that new slaves are not causing any infrastructure
> issue
> > > and
> > > > > > capable to perform as good as existing slaves.
> > > > > > >
> > > > > > > -Gautam
> > > > > > >
> > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com>
> wrote:
> > > > > > >
> > > > > > >     @madan looking into some failures – you’re right… there’s
> > > > multiple
> > > > > > issues going on, some of them intermittent, and we want to be
> able
> > to
> > > > > merge
> > > > > > fixes in.
> > > > > > >     Agreed that we can wait with setting up protected mode
> until
> > > > build
> > > > > > stabilizes.
> > > > > > >
> > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
> madan.jampani@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > >         @hagay: we agree on the end state. I'm not too
> particular
> > > > about
> > > > > > how we get
> > > > > > >         there. If you think enabling it now and fixes
> regression
> > > > later
> > > > > > is doable,
> > > > > > >         I'm fine with. I see a bit of a chicken and egg
> problem.
> > We
> > > > > need
> > > > > > to get
> > > > > > >         some fixes in even when the status checks are failing.
> > > > > > >
> > > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> > > > > > lupesko@gmail.com> wrote:
> > > > > > >
> > > > > > >         > @madan – re: getting to a stable CI first:
> > > > > > >         > I’m concerned that by not enabling protected branch
> > mode
> > > > > ASAP,
> > > > > > we’re just
> > > > > > >         > taking in more regressions, which makes a stable
> build
> > a
> > > > > > moving target for
> > > > > > >         > us…
> > > > > > >         >
> > > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <zhasheng@amazon.com
> >
> > > > wrote:
> > > > > > >         >
> > > > > > >         >     Just one thing: please don’t disable more tests
> or
> > > just
> > > > > > raise the
> > > > > > >         > tolerance thresholds.
> > > > > > >         >
> > > > > > >         >     Best regards,
> > > > > > >         >     -sz
> > > > > > >         >
> > > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > > > > madan.jampani@gmail.com> wrote:
> > > > > > >         >
> > > > > > >         >         +1
> > > > > > >         >         Before we can turn protected mode I feel we
> > > should
> > > > > > first get to a
> > > > > > >         > stable CI
> > > > > > >         >         pipeline.
> > > > > > >         >         Sandeep is chasing down known breaking
> issues.
> > > > > > >         >
> > > > > > >         >
> > > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay
> > Lupesko <
> > > > > > lupesko@gmail.com>
> > > > > > >         > wrote:
> > > > > > >         >
> > > > > > >         >         > Build stability is a major issue, builds
> have
> > > > been
> > > > > > failing left
> > > > > > >         > and right
> > > > > > >         >         > over the last week. Some of it is due to
> > > Jenkins
> > > > > > slave issues,
> > > > > > >         > but some are
> > > > > > >         >         > real regressions.
> > > > > > >         >         > We need to be more strict in the code we're
> > > > > > committing.
> > > > > > >         >         >
> > > > > > >         >         > I propose we configure our master to be a
> > > > protected
> > > > > > branch (
> > > > > > >         >         >
> > > > > > https://help.github.com/articles/about-protected-branches/).
> > > > > > >         >         >
> > > > > > >         >         > Thoughts?
> > > > > > >         >         >
> > > > > > >         >         > On 2017-08-28 22:41, sandeep krishnamurthy
> <
> > > > > > s...@gmail.com>
> > > > > > >         > wrote:
> > > > > > >         >         > > Hello Committers and Contributors,>
> > > > > > >         >         > >
> > > > > > >         >         > > Due to unstable build pipelines, from
> past
> > 1
> > > > > week,
> > > > > > PRs are
> > > > > > >         > being merged>
> > > > > > >         >         > > after CR ignoring PR build status. Build
> > > > pipeline
> > > > > > is much more
> > > > > > >         > stable
> > > > > > >         >         > than>
> > > > > > >         >         > > last week and most of the build failures
> > you
> > > > see
> > > > > > from now on,
> > > > > > >         > are likely
> > > > > > >         >         > to>
> > > > > > >         >         > > be a valid failure and hence, it is
> > > recommended
> > > > > to
> > > > > > wait for PR
> > > > > > >         > builds,
> > > > > > >         >         > see>
> > > > > > >         >         > > the root cause of any build failures
> before
> > > > > > proceeding with
> > > > > > >         > merges.>
> > > > > > >         >         > >
> > > > > > >         >         > > At this point of time, there are 2
> > > intermittent
> > > > > > issue yet to
> > > > > > >         > be fixed ->
> > > > > > >         >         > > * Network error leading to GitHub
> requests
> > > > > > throwing 404>
> > > > > > >         >         > > * A conflict in artifacts generated
> between
> > > > > > branches/PR -
> > > > > > >         > Cause unknown
> > > > > > >         >         > yet.>
> > > > > > >         >         > > These issues will be fixed soon.>
> > > > > > >         >         > >
> > > > > > >         >         > >
> > > > > > >         >         > > -- >
> > > > > > >         >         > > Sandeep Krishnamurthy>
> > > > > > >         >         > >
> > > > > > >         >         >
> > > > > > >         >
> > > > > > >         >
> > > > > > >         >
> > > > > > >         >
> > > > > > >         >
> > > > > > >         >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > - Tsuyoshi
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards,
> > > > Gautam Kumar
> > > >
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Gautam Kumar
> >
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Chris Olivier <cj...@gmail.com>.
+1

On Thu, Sep 28, 2017 at 9:37 AM, Madan Jampani <ma...@gmail.com>
wrote:

> +1
>
> At a minimum I'd like to see the following two happen:
> - Option to merge is disabled until all required checks pass.
> - Code is reviewed and given +1 by at least one other committer (no self
> review).
>
> On Wed, Sep 27, 2017 at 11:15 PM, Gautam <ga...@gmail.com> wrote:
>
> > Hi Chris,
> >
> >   Here <https://help.github.com/articles/about-protected-branches/> is
> > user
> > document on semantics of protected branch.
> > In short when a branch is protected following applies to that branch.
> >
> >    - Can't be force pushed
> >    - Can't be deleted
> >    - Can't have changes merged into it until required status checks
> >    <https://help.github.com/articles/about-required-status-checks> pass
> >    - Can't have changes merged into it until required reviews are
> approved
> >    <https://help.github.com/articles/approving-a-pull-
> > request-with-required-reviews>
> >    - Can't be edited or have files uploaded to it from the web
> >    - Can't have changes merged into it until changes to files that
> > have a designated
> >    code owner <https://help.github.com/articles/about-codeowners/> have
> >    been approved by that owner
> >
> >  I am sure many of us might not want to have all these but we can debate
> on
> > it. My main motive was to "*Can't have changes merged into it until
> > required status checks pass*"
> >
> >
> > -Gautam
> >
> >
> >
> > On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <cj...@gmail.com>
> > wrote:
> >
> > > What does that mean? "Protected"? Protected from what?
> > >
> > > On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com> wrote:
> > >
> > > > Hi Chris,
> > > >
> > > >    I mean make "master branch protected" of  MXNet.
> > > >
> > > > -Gautam
> > > >
> > > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <
> cjolivier01@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > What does this mean? "Mx-net branch protected"?
> > > > >
> > > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
> > > ozawa.tsuyoshi@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > +1,
> > > > > >
> > > > > > While I'm checking the recent build failures, and I think the
> > > decision
> > > > > > of making the mx-net branch protected is necessary for stable
> > > > > > building.
> > > > > > Thanks Kumar for resuming important discussion.
> > > > > >
> > > > > > Best regards
> > > > > > - Tsuyoshi
> > > > > >
> > > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <
> gauta@amazon.com>
> > > > > wrote:
> > > > > > > Reviving the discussion.
> > > > > > >
> > > > > > > At this point of time we have couple of stable builds
> > > > > > >
> > > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > incubator-mxnet/job/master/448/
> > > > > > >
> > > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > > incubator-mxnet/job/master/449/
> > > > > > >
> > > > > > > Should we have a quick discussion or polling on making the
> mx-net
> > > > > branch
> > > > > > protected? If you still think we shouldn’t make it protected
> please
> > > > > provide
> > > > > > a reason to support your claim.
> > > > > > >
> > > > > > > Few of us have concern over Jenkin’s stability. If I look two
> > weeks
> > > > > > back, after upgrading Linux slave to g2.8x and new windows AMI,
> we
> > > have
> > > > > not
> > > > > > seen any case where instance died due to high memory usage or any
> > > > process
> > > > > > got killed due to high cpu usage or any other issue with windows
> > > > slaves.
> > > > > > >
> > > > > > > Going forward we are also planning that if we add any new slave
> > we
> > > > will
> > > > > > not enable the main load immediately, but rather will do ‘test
> > build’
> > > > to
> > > > > > make sure that new slaves are not causing any infrastructure
> issue
> > > and
> > > > > > capable to perform as good as existing slaves.
> > > > > > >
> > > > > > > -Gautam
> > > > > > >
> > > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com>
> wrote:
> > > > > > >
> > > > > > >     @madan looking into some failures – you’re right… there’s
> > > > multiple
> > > > > > issues going on, some of them intermittent, and we want to be
> able
> > to
> > > > > merge
> > > > > > fixes in.
> > > > > > >     Agreed that we can wait with setting up protected mode
> until
> > > > build
> > > > > > stabilizes.
> > > > > > >
> > > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
> madan.jampani@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > >         @hagay: we agree on the end state. I'm not too
> particular
> > > > about
> > > > > > how we get
> > > > > > >         there. If you think enabling it now and fixes
> regression
> > > > later
> > > > > > is doable,
> > > > > > >         I'm fine with. I see a bit of a chicken and egg
> problem.
> > We
> > > > > need
> > > > > > to get
> > > > > > >         some fixes in even when the status checks are failing.
> > > > > > >
> > > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> > > > > > lupesko@gmail.com> wrote:
> > > > > > >
> > > > > > >         > @madan – re: getting to a stable CI first:
> > > > > > >         > I’m concerned that by not enabling protected branch
> > mode
> > > > > ASAP,
> > > > > > we’re just
> > > > > > >         > taking in more regressions, which makes a stable
> build
> > a
> > > > > > moving target for
> > > > > > >         > us…
> > > > > > >         >
> > > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <zhasheng@amazon.com
> >
> > > > wrote:
> > > > > > >         >
> > > > > > >         >     Just one thing: please don’t disable more tests
> or
> > > just
> > > > > > raise the
> > > > > > >         > tolerance thresholds.
> > > > > > >         >
> > > > > > >         >     Best regards,
> > > > > > >         >     -sz
> > > > > > >         >
> > > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > > > > madan.jampani@gmail.com> wrote:
> > > > > > >         >
> > > > > > >         >         +1
> > > > > > >         >         Before we can turn protected mode I feel we
> > > should
> > > > > > first get to a
> > > > > > >         > stable CI
> > > > > > >         >         pipeline.
> > > > > > >         >         Sandeep is chasing down known breaking
> issues.
> > > > > > >         >
> > > > > > >         >
> > > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay
> > Lupesko <
> > > > > > lupesko@gmail.com>
> > > > > > >         > wrote:
> > > > > > >         >
> > > > > > >         >         > Build stability is a major issue, builds
> have
> > > > been
> > > > > > failing left
> > > > > > >         > and right
> > > > > > >         >         > over the last week. Some of it is due to
> > > Jenkins
> > > > > > slave issues,
> > > > > > >         > but some are
> > > > > > >         >         > real regressions.
> > > > > > >         >         > We need to be more strict in the code we're
> > > > > > committing.
> > > > > > >         >         >
> > > > > > >         >         > I propose we configure our master to be a
> > > > protected
> > > > > > branch (
> > > > > > >         >         >
> > > > > > https://help.github.com/articles/about-protected-branches/).
> > > > > > >         >         >
> > > > > > >         >         > Thoughts?
> > > > > > >         >         >
> > > > > > >         >         > On 2017-08-28 22:41, sandeep krishnamurthy
> <
> > > > > > s...@gmail.com>
> > > > > > >         > wrote:
> > > > > > >         >         > > Hello Committers and Contributors,>
> > > > > > >         >         > >
> > > > > > >         >         > > Due to unstable build pipelines, from
> past
> > 1
> > > > > week,
> > > > > > PRs are
> > > > > > >         > being merged>
> > > > > > >         >         > > after CR ignoring PR build status. Build
> > > > pipeline
> > > > > > is much more
> > > > > > >         > stable
> > > > > > >         >         > than>
> > > > > > >         >         > > last week and most of the build failures
> > you
> > > > see
> > > > > > from now on,
> > > > > > >         > are likely
> > > > > > >         >         > to>
> > > > > > >         >         > > be a valid failure and hence, it is
> > > recommended
> > > > > to
> > > > > > wait for PR
> > > > > > >         > builds,
> > > > > > >         >         > see>
> > > > > > >         >         > > the root cause of any build failures
> before
> > > > > > proceeding with
> > > > > > >         > merges.>
> > > > > > >         >         > >
> > > > > > >         >         > > At this point of time, there are 2
> > > intermittent
> > > > > > issue yet to
> > > > > > >         > be fixed ->
> > > > > > >         >         > > * Network error leading to GitHub
> requests
> > > > > > throwing 404>
> > > > > > >         >         > > * A conflict in artifacts generated
> between
> > > > > > branches/PR -
> > > > > > >         > Cause unknown
> > > > > > >         >         > yet.>
> > > > > > >         >         > > These issues will be fixed soon.>
> > > > > > >         >         > >
> > > > > > >         >         > >
> > > > > > >         >         > > -- >
> > > > > > >         >         > > Sandeep Krishnamurthy>
> > > > > > >         >         > >
> > > > > > >         >         >
> > > > > > >         >
> > > > > > >         >
> > > > > > >         >
> > > > > > >         >
> > > > > > >         >
> > > > > > >         >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > - Tsuyoshi
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards,
> > > > Gautam Kumar
> > > >
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Gautam Kumar
> >
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Madan Jampani <ma...@gmail.com>.
+1

At a minimum I'd like to see the following two happen:
- Option to merge is disabled until all required checks pass.
- Code is reviewed and given +1 by at least one other committer (no self
review).

On Wed, Sep 27, 2017 at 11:15 PM, Gautam <ga...@gmail.com> wrote:

> Hi Chris,
>
>   Here <https://help.github.com/articles/about-protected-branches/> is
> user
> document on semantics of protected branch.
> In short when a branch is protected following applies to that branch.
>
>    - Can't be force pushed
>    - Can't be deleted
>    - Can't have changes merged into it until required status checks
>    <https://help.github.com/articles/about-required-status-checks> pass
>    - Can't have changes merged into it until required reviews are approved
>    <https://help.github.com/articles/approving-a-pull-
> request-with-required-reviews>
>    - Can't be edited or have files uploaded to it from the web
>    - Can't have changes merged into it until changes to files that
> have a designated
>    code owner <https://help.github.com/articles/about-codeowners/> have
>    been approved by that owner
>
>  I am sure many of us might not want to have all these but we can debate on
> it. My main motive was to "*Can't have changes merged into it until
> required status checks pass*"
>
>
> -Gautam
>
>
>
> On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <cj...@gmail.com>
> wrote:
>
> > What does that mean? "Protected"? Protected from what?
> >
> > On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com> wrote:
> >
> > > Hi Chris,
> > >
> > >    I mean make "master branch protected" of  MXNet.
> > >
> > > -Gautam
> > >
> > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <cjolivier01@gmail.com
> >
> > > wrote:
> > >
> > > > What does this mean? "Mx-net branch protected"?
> > > >
> > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
> > ozawa.tsuyoshi@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > +1,
> > > > >
> > > > > While I'm checking the recent build failures, and I think the
> > decision
> > > > > of making the mx-net branch protected is necessary for stable
> > > > > building.
> > > > > Thanks Kumar for resuming important discussion.
> > > > >
> > > > > Best regards
> > > > > - Tsuyoshi
> > > > >
> > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <ga...@amazon.com>
> > > > wrote:
> > > > > > Reviving the discussion.
> > > > > >
> > > > > > At this point of time we have couple of stable builds
> > > > > >
> > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > incubator-mxnet/job/master/448/
> > > > > >
> > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > incubator-mxnet/job/master/449/
> > > > > >
> > > > > > Should we have a quick discussion or polling on making the mx-net
> > > > branch
> > > > > protected? If you still think we shouldn’t make it protected please
> > > > provide
> > > > > a reason to support your claim.
> > > > > >
> > > > > > Few of us have concern over Jenkin’s stability. If I look two
> weeks
> > > > > back, after upgrading Linux slave to g2.8x and new windows AMI, we
> > have
> > > > not
> > > > > seen any case where instance died due to high memory usage or any
> > > process
> > > > > got killed due to high cpu usage or any other issue with windows
> > > slaves.
> > > > > >
> > > > > > Going forward we are also planning that if we add any new slave
> we
> > > will
> > > > > not enable the main load immediately, but rather will do ‘test
> build’
> > > to
> > > > > make sure that new slaves are not causing any infrastructure issue
> > and
> > > > > capable to perform as good as existing slaves.
> > > > > >
> > > > > > -Gautam
> > > > > >
> > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com> wrote:
> > > > > >
> > > > > >     @madan looking into some failures – you’re right… there’s
> > > multiple
> > > > > issues going on, some of them intermittent, and we want to be able
> to
> > > > merge
> > > > > fixes in.
> > > > > >     Agreed that we can wait with setting up protected mode until
> > > build
> > > > > stabilizes.
> > > > > >
> > > > > >     On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > >         @hagay: we agree on the end state. I'm not too particular
> > > about
> > > > > how we get
> > > > > >         there. If you think enabling it now and fixes regression
> > > later
> > > > > is doable,
> > > > > >         I'm fine with. I see a bit of a chicken and egg problem.
> We
> > > > need
> > > > > to get
> > > > > >         some fixes in even when the status checks are failing.
> > > > > >
> > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> > > > > lupesko@gmail.com> wrote:
> > > > > >
> > > > > >         > @madan – re: getting to a stable CI first:
> > > > > >         > I’m concerned that by not enabling protected branch
> mode
> > > > ASAP,
> > > > > we’re just
> > > > > >         > taking in more regressions, which makes a stable build
> a
> > > > > moving target for
> > > > > >         > us…
> > > > > >         >
> > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com>
> > > wrote:
> > > > > >         >
> > > > > >         >     Just one thing: please don’t disable more tests or
> > just
> > > > > raise the
> > > > > >         > tolerance thresholds.
> > > > > >         >
> > > > > >         >     Best regards,
> > > > > >         >     -sz
> > > > > >         >
> > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > > > madan.jampani@gmail.com> wrote:
> > > > > >         >
> > > > > >         >         +1
> > > > > >         >         Before we can turn protected mode I feel we
> > should
> > > > > first get to a
> > > > > >         > stable CI
> > > > > >         >         pipeline.
> > > > > >         >         Sandeep is chasing down known breaking issues.
> > > > > >         >
> > > > > >         >
> > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay
> Lupesko <
> > > > > lupesko@gmail.com>
> > > > > >         > wrote:
> > > > > >         >
> > > > > >         >         > Build stability is a major issue, builds have
> > > been
> > > > > failing left
> > > > > >         > and right
> > > > > >         >         > over the last week. Some of it is due to
> > Jenkins
> > > > > slave issues,
> > > > > >         > but some are
> > > > > >         >         > real regressions.
> > > > > >         >         > We need to be more strict in the code we're
> > > > > committing.
> > > > > >         >         >
> > > > > >         >         > I propose we configure our master to be a
> > > protected
> > > > > branch (
> > > > > >         >         >
> > > > > https://help.github.com/articles/about-protected-branches/).
> > > > > >         >         >
> > > > > >         >         > Thoughts?
> > > > > >         >         >
> > > > > >         >         > On 2017-08-28 22:41, sandeep krishnamurthy <
> > > > > s...@gmail.com>
> > > > > >         > wrote:
> > > > > >         >         > > Hello Committers and Contributors,>
> > > > > >         >         > >
> > > > > >         >         > > Due to unstable build pipelines, from past
> 1
> > > > week,
> > > > > PRs are
> > > > > >         > being merged>
> > > > > >         >         > > after CR ignoring PR build status. Build
> > > pipeline
> > > > > is much more
> > > > > >         > stable
> > > > > >         >         > than>
> > > > > >         >         > > last week and most of the build failures
> you
> > > see
> > > > > from now on,
> > > > > >         > are likely
> > > > > >         >         > to>
> > > > > >         >         > > be a valid failure and hence, it is
> > recommended
> > > > to
> > > > > wait for PR
> > > > > >         > builds,
> > > > > >         >         > see>
> > > > > >         >         > > the root cause of any build failures before
> > > > > proceeding with
> > > > > >         > merges.>
> > > > > >         >         > >
> > > > > >         >         > > At this point of time, there are 2
> > intermittent
> > > > > issue yet to
> > > > > >         > be fixed ->
> > > > > >         >         > > * Network error leading to GitHub requests
> > > > > throwing 404>
> > > > > >         >         > > * A conflict in artifacts generated between
> > > > > branches/PR -
> > > > > >         > Cause unknown
> > > > > >         >         > yet.>
> > > > > >         >         > > These issues will be fixed soon.>
> > > > > >         >         > >
> > > > > >         >         > >
> > > > > >         >         > > -- >
> > > > > >         >         > > Sandeep Krishnamurthy>
> > > > > >         >         > >
> > > > > >         >         >
> > > > > >         >
> > > > > >         >
> > > > > >         >
> > > > > >         >
> > > > > >         >
> > > > > >         >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > - Tsuyoshi
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best Regards,
> > > Gautam Kumar
> > >
> >
>
>
>
> --
> Best Regards,
> Gautam Kumar
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Pedro Larroy <pe...@gmail.com>.
Given the cost of running all the test for all the build flavors and
architectures I would propose the following:



   - Have a staging branch were PRs are merged by committers which runs all
   the integration tests with the appropriate frequency (say nightly).
   - Have automated fast forwards from the staging branch done
   automatically by Jenkins when the latest head passes all the tests for all
   platforms.



With this we would always have a stable master branch which is well tested,
while being able to adjust the tradeoff between correctness and quick
feedback for PRs.

Another improvement would be to split the feedback in stages, one would be
multi-platform / multi-flavor build which should be around 20 minutes, and
then two or more stages of quick tests and extensive tests. And as I
explained, we wouldn't need to run extensive tests on every PR, just
nightly on staging.

What do you think?

Pedro.

On Thu, Sep 28, 2017 at 2:02 PM, Joern Kottmann <ko...@gmail.com> wrote:

> At Apache OpenNLP we just established among committers that you check
> that the status indicator is green before you merge,
> and if it wasn't the case then we would ask the committer to take
> responsibility and repair things. Works very well our build is never
> broken.
>
> We also strongly prefer if each PR gets reviewed by another committer.
>
> Overall this works quite well. We don't use any of the protections
> against merging, it is important that you can trust each of the
> committers not to break things, if there are problems it is better to
> resolve them with talking to each other, rather than enforcing green
> status checks.
>
> Jörn
>
> On Thu, Sep 28, 2017 at 8:21 AM, Chris Olivier <cj...@gmail.com>
> wrote:
> > +1 on that
> >
> > On Wed, Sep 27, 2017 at 11:15 PM Gautam <ga...@gmail.com> wrote:
> >
> >> Hi Chris,
> >>
> >>   Here <https://help.github.com/articles/about-protected-branches/> is
> >> user
> >> document on semantics of protected branch.
> >> In short when a branch is protected following applies to that branch.
> >>
> >>    - Can't be force pushed
> >>    - Can't be deleted
> >>    - Can't have changes merged into it until required status checks
> >>    <https://help.github.com/articles/about-required-status-checks> pass
> >>    - Can't have changes merged into it until required reviews are
> approved
> >>    <
> >> https://help.github.com/articles/approving-a-pull-
> request-with-required-reviews
> >> >
> >>    - Can't be edited or have files uploaded to it from the web
> >>    - Can't have changes merged into it until changes to files that
> >> have a designated
> >>    code owner <https://help.github.com/articles/about-codeowners/> have
> >>    been approved by that owner
> >>
> >>  I am sure many of us might not want to have all these but we can
> debate on
> >> it. My main motive was to "*Can't have changes merged into it until
> >> required status checks pass*"
> >>
> >>
> >> -Gautam
> >>
> >>
> >>
> >> On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <cj...@gmail.com>
> >> wrote:
> >>
> >> > What does that mean? "Protected"? Protected from what?
> >> >
> >> > On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com> wrote:
> >> >
> >> > > Hi Chris,
> >> > >
> >> > >    I mean make "master branch protected" of  MXNet.
> >> > >
> >> > > -Gautam
> >> > >
> >> > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <
> cjolivier01@gmail.com
> >> >
> >> > > wrote:
> >> > >
> >> > > > What does this mean? "Mx-net branch protected"?
> >> > > >
> >> > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
> >> > ozawa.tsuyoshi@gmail.com
> >> > > >
> >> > > > wrote:
> >> > > >
> >> > > > > +1,
> >> > > > >
> >> > > > > While I'm checking the recent build failures, and I think the
> >> > decision
> >> > > > > of making the mx-net branch protected is necessary for stable
> >> > > > > building.
> >> > > > > Thanks Kumar for resuming important discussion.
> >> > > > >
> >> > > > > Best regards
> >> > > > > - Tsuyoshi
> >> > > > >
> >> > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <
> gauta@amazon.com>
> >> > > > wrote:
> >> > > > > > Reviving the discussion.
> >> > > > > >
> >> > > > > > At this point of time we have couple of stable builds
> >> > > > > >
> >> > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> >> > > > incubator-mxnet/job/master/448/
> >> > > > > >
> >> > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> >> > > > incubator-mxnet/job/master/449/
> >> > > > > >
> >> > > > > > Should we have a quick discussion or polling on making the
> mx-net
> >> > > > branch
> >> > > > > protected? If you still think we shouldn’t make it protected
> please
> >> > > > provide
> >> > > > > a reason to support your claim.
> >> > > > > >
> >> > > > > > Few of us have concern over Jenkin’s stability. If I look two
> >> weeks
> >> > > > > back, after upgrading Linux slave to g2.8x and new windows AMI,
> we
> >> > have
> >> > > > not
> >> > > > > seen any case where instance died due to high memory usage or
> any
> >> > > process
> >> > > > > got killed due to high cpu usage or any other issue with windows
> >> > > slaves.
> >> > > > > >
> >> > > > > > Going forward we are also planning that if we add any new
> slave
> >> we
> >> > > will
> >> > > > > not enable the main load immediately, but rather will do ‘test
> >> build’
> >> > > to
> >> > > > > make sure that new slaves are not causing any infrastructure
> issue
> >> > and
> >> > > > > capable to perform as good as existing slaves.
> >> > > > > >
> >> > > > > > -Gautam
> >> > > > > >
> >> > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com>
> wrote:
> >> > > > > >
> >> > > > > >     @madan looking into some failures – you’re right… there’s
> >> > > multiple
> >> > > > > issues going on, some of them intermittent, and we want to be
> able
> >> to
> >> > > > merge
> >> > > > > fixes in.
> >> > > > > >     Agreed that we can wait with setting up protected mode
> until
> >> > > build
> >> > > > > stabilizes.
> >> > > > > >
> >> > > > > >     On 8/31/17, 11:41, "Madan Jampani" <
> madan.jampani@gmail.com>
> >> > > > wrote:
> >> > > > > >
> >> > > > > >         @hagay: we agree on the end state. I'm not too
> particular
> >> > > about
> >> > > > > how we get
> >> > > > > >         there. If you think enabling it now and fixes
> regression
> >> > > later
> >> > > > > is doable,
> >> > > > > >         I'm fine with. I see a bit of a chicken and egg
> problem.
> >> We
> >> > > > need
> >> > > > > to get
> >> > > > > >         some fixes in even when the status checks are failing.
> >> > > > > >
> >> > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> >> > > > > lupesko@gmail.com> wrote:
> >> > > > > >
> >> > > > > >         > @madan – re: getting to a stable CI first:
> >> > > > > >         > I’m concerned that by not enabling protected branch
> >> mode
> >> > > > ASAP,
> >> > > > > we’re just
> >> > > > > >         > taking in more regressions, which makes a stable
> build
> >> a
> >> > > > > moving target for
> >> > > > > >         > us…
> >> > > > > >         >
> >> > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <
> zhasheng@amazon.com>
> >> > > wrote:
> >> > > > > >         >
> >> > > > > >         >     Just one thing: please don’t disable more tests
> or
> >> > just
> >> > > > > raise the
> >> > > > > >         > tolerance thresholds.
> >> > > > > >         >
> >> > > > > >         >     Best regards,
> >> > > > > >         >     -sz
> >> > > > > >         >
> >> > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> >> > > > > madan.jampani@gmail.com> wrote:
> >> > > > > >         >
> >> > > > > >         >         +1
> >> > > > > >         >         Before we can turn protected mode I feel we
> >> > should
> >> > > > > first get to a
> >> > > > > >         > stable CI
> >> > > > > >         >         pipeline.
> >> > > > > >         >         Sandeep is chasing down known breaking
> issues.
> >> > > > > >         >
> >> > > > > >         >
> >> > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay
> >> Lupesko <
> >> > > > > lupesko@gmail.com>
> >> > > > > >         > wrote:
> >> > > > > >         >
> >> > > > > >         >         > Build stability is a major issue, builds
> have
> >> > > been
> >> > > > > failing left
> >> > > > > >         > and right
> >> > > > > >         >         > over the last week. Some of it is due to
> >> > Jenkins
> >> > > > > slave issues,
> >> > > > > >         > but some are
> >> > > > > >         >         > real regressions.
> >> > > > > >         >         > We need to be more strict in the code
> we're
> >> > > > > committing.
> >> > > > > >         >         >
> >> > > > > >         >         > I propose we configure our master to be a
> >> > > protected
> >> > > > > branch (
> >> > > > > >         >         >
> >> > > > > https://help.github.com/articles/about-protected-branches/).
> >> > > > > >         >         >
> >> > > > > >         >         > Thoughts?
> >> > > > > >         >         >
> >> > > > > >         >         > On 2017-08-28 22:41, sandeep
> krishnamurthy <
> >> > > > > s...@gmail.com>
> >> > > > > >         > wrote:
> >> > > > > >         >         > > Hello Committers and Contributors,>
> >> > > > > >         >         > >
> >> > > > > >         >         > > Due to unstable build pipelines, from
> past
> >> 1
> >> > > > week,
> >> > > > > PRs are
> >> > > > > >         > being merged>
> >> > > > > >         >         > > after CR ignoring PR build status. Build
> >> > > pipeline
> >> > > > > is much more
> >> > > > > >         > stable
> >> > > > > >         >         > than>
> >> > > > > >         >         > > last week and most of the build failures
> >> you
> >> > > see
> >> > > > > from now on,
> >> > > > > >         > are likely
> >> > > > > >         >         > to>
> >> > > > > >         >         > > be a valid failure and hence, it is
> >> > recommended
> >> > > > to
> >> > > > > wait for PR
> >> > > > > >         > builds,
> >> > > > > >         >         > see>
> >> > > > > >         >         > > the root cause of any build failures
> before
> >> > > > > proceeding with
> >> > > > > >         > merges.>
> >> > > > > >         >         > >
> >> > > > > >         >         > > At this point of time, there are 2
> >> > intermittent
> >> > > > > issue yet to
> >> > > > > >         > be fixed ->
> >> > > > > >         >         > > * Network error leading to GitHub
> requests
> >> > > > > throwing 404>
> >> > > > > >         >         > > * A conflict in artifacts generated
> between
> >> > > > > branches/PR -
> >> > > > > >         > Cause unknown
> >> > > > > >         >         > yet.>
> >> > > > > >         >         > > These issues will be fixed soon.>
> >> > > > > >         >         > >
> >> > > > > >         >         > >
> >> > > > > >         >         > > -- >
> >> > > > > >         >         > > Sandeep Krishnamurthy>
> >> > > > > >         >         > >
> >> > > > > >         >         >
> >> > > > > >         >
> >> > > > > >         >
> >> > > > > >         >
> >> > > > > >         >
> >> > > > > >         >
> >> > > > > >         >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > - Tsuyoshi
> >> > > > >
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Best Regards,
> >> > > Gautam Kumar
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Best Regards,
> >> Gautam Kumar
> >>
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Joern Kottmann <ko...@gmail.com>.
At Apache OpenNLP we just established among committers that you check
that the status indicator is green before you merge,
and if it wasn't the case then we would ask the committer to take
responsibility and repair things. Works very well our build is never
broken.

We also strongly prefer if each PR gets reviewed by another committer.

Overall this works quite well. We don't use any of the protections
against merging, it is important that you can trust each of the
committers not to break things, if there are problems it is better to
resolve them with talking to each other, rather than enforcing green
status checks.

Jörn

On Thu, Sep 28, 2017 at 8:21 AM, Chris Olivier <cj...@gmail.com> wrote:
> +1 on that
>
> On Wed, Sep 27, 2017 at 11:15 PM Gautam <ga...@gmail.com> wrote:
>
>> Hi Chris,
>>
>>   Here <https://help.github.com/articles/about-protected-branches/> is
>> user
>> document on semantics of protected branch.
>> In short when a branch is protected following applies to that branch.
>>
>>    - Can't be force pushed
>>    - Can't be deleted
>>    - Can't have changes merged into it until required status checks
>>    <https://help.github.com/articles/about-required-status-checks> pass
>>    - Can't have changes merged into it until required reviews are approved
>>    <
>> https://help.github.com/articles/approving-a-pull-request-with-required-reviews
>> >
>>    - Can't be edited or have files uploaded to it from the web
>>    - Can't have changes merged into it until changes to files that
>> have a designated
>>    code owner <https://help.github.com/articles/about-codeowners/> have
>>    been approved by that owner
>>
>>  I am sure many of us might not want to have all these but we can debate on
>> it. My main motive was to "*Can't have changes merged into it until
>> required status checks pass*"
>>
>>
>> -Gautam
>>
>>
>>
>> On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <cj...@gmail.com>
>> wrote:
>>
>> > What does that mean? "Protected"? Protected from what?
>> >
>> > On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com> wrote:
>> >
>> > > Hi Chris,
>> > >
>> > >    I mean make "master branch protected" of  MXNet.
>> > >
>> > > -Gautam
>> > >
>> > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <cjolivier01@gmail.com
>> >
>> > > wrote:
>> > >
>> > > > What does this mean? "Mx-net branch protected"?
>> > > >
>> > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
>> > ozawa.tsuyoshi@gmail.com
>> > > >
>> > > > wrote:
>> > > >
>> > > > > +1,
>> > > > >
>> > > > > While I'm checking the recent build failures, and I think the
>> > decision
>> > > > > of making the mx-net branch protected is necessary for stable
>> > > > > building.
>> > > > > Thanks Kumar for resuming important discussion.
>> > > > >
>> > > > > Best regards
>> > > > > - Tsuyoshi
>> > > > >
>> > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <ga...@amazon.com>
>> > > > wrote:
>> > > > > > Reviving the discussion.
>> > > > > >
>> > > > > > At this point of time we have couple of stable builds
>> > > > > >
>> > > > > https://builds.apache.org/view/Incubator%20Projects/job/
>> > > > incubator-mxnet/job/master/448/
>> > > > > >
>> > > > > https://builds.apache.org/view/Incubator%20Projects/job/
>> > > > incubator-mxnet/job/master/449/
>> > > > > >
>> > > > > > Should we have a quick discussion or polling on making the mx-net
>> > > > branch
>> > > > > protected? If you still think we shouldn’t make it protected please
>> > > > provide
>> > > > > a reason to support your claim.
>> > > > > >
>> > > > > > Few of us have concern over Jenkin’s stability. If I look two
>> weeks
>> > > > > back, after upgrading Linux slave to g2.8x and new windows AMI, we
>> > have
>> > > > not
>> > > > > seen any case where instance died due to high memory usage or any
>> > > process
>> > > > > got killed due to high cpu usage or any other issue with windows
>> > > slaves.
>> > > > > >
>> > > > > > Going forward we are also planning that if we add any new slave
>> we
>> > > will
>> > > > > not enable the main load immediately, but rather will do ‘test
>> build’
>> > > to
>> > > > > make sure that new slaves are not causing any infrastructure issue
>> > and
>> > > > > capable to perform as good as existing slaves.
>> > > > > >
>> > > > > > -Gautam
>> > > > > >
>> > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com> wrote:
>> > > > > >
>> > > > > >     @madan looking into some failures – you’re right… there’s
>> > > multiple
>> > > > > issues going on, some of them intermittent, and we want to be able
>> to
>> > > > merge
>> > > > > fixes in.
>> > > > > >     Agreed that we can wait with setting up protected mode until
>> > > build
>> > > > > stabilizes.
>> > > > > >
>> > > > > >     On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com>
>> > > > wrote:
>> > > > > >
>> > > > > >         @hagay: we agree on the end state. I'm not too particular
>> > > about
>> > > > > how we get
>> > > > > >         there. If you think enabling it now and fixes regression
>> > > later
>> > > > > is doable,
>> > > > > >         I'm fine with. I see a bit of a chicken and egg problem.
>> We
>> > > > need
>> > > > > to get
>> > > > > >         some fixes in even when the status checks are failing.
>> > > > > >
>> > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
>> > > > > lupesko@gmail.com> wrote:
>> > > > > >
>> > > > > >         > @madan – re: getting to a stable CI first:
>> > > > > >         > I’m concerned that by not enabling protected branch
>> mode
>> > > > ASAP,
>> > > > > we’re just
>> > > > > >         > taking in more regressions, which makes a stable build
>> a
>> > > > > moving target for
>> > > > > >         > us…
>> > > > > >         >
>> > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com>
>> > > wrote:
>> > > > > >         >
>> > > > > >         >     Just one thing: please don’t disable more tests or
>> > just
>> > > > > raise the
>> > > > > >         > tolerance thresholds.
>> > > > > >         >
>> > > > > >         >     Best regards,
>> > > > > >         >     -sz
>> > > > > >         >
>> > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
>> > > > > madan.jampani@gmail.com> wrote:
>> > > > > >         >
>> > > > > >         >         +1
>> > > > > >         >         Before we can turn protected mode I feel we
>> > should
>> > > > > first get to a
>> > > > > >         > stable CI
>> > > > > >         >         pipeline.
>> > > > > >         >         Sandeep is chasing down known breaking issues.
>> > > > > >         >
>> > > > > >         >
>> > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay
>> Lupesko <
>> > > > > lupesko@gmail.com>
>> > > > > >         > wrote:
>> > > > > >         >
>> > > > > >         >         > Build stability is a major issue, builds have
>> > > been
>> > > > > failing left
>> > > > > >         > and right
>> > > > > >         >         > over the last week. Some of it is due to
>> > Jenkins
>> > > > > slave issues,
>> > > > > >         > but some are
>> > > > > >         >         > real regressions.
>> > > > > >         >         > We need to be more strict in the code we're
>> > > > > committing.
>> > > > > >         >         >
>> > > > > >         >         > I propose we configure our master to be a
>> > > protected
>> > > > > branch (
>> > > > > >         >         >
>> > > > > https://help.github.com/articles/about-protected-branches/).
>> > > > > >         >         >
>> > > > > >         >         > Thoughts?
>> > > > > >         >         >
>> > > > > >         >         > On 2017-08-28 22:41, sandeep krishnamurthy <
>> > > > > s...@gmail.com>
>> > > > > >         > wrote:
>> > > > > >         >         > > Hello Committers and Contributors,>
>> > > > > >         >         > >
>> > > > > >         >         > > Due to unstable build pipelines, from past
>> 1
>> > > > week,
>> > > > > PRs are
>> > > > > >         > being merged>
>> > > > > >         >         > > after CR ignoring PR build status. Build
>> > > pipeline
>> > > > > is much more
>> > > > > >         > stable
>> > > > > >         >         > than>
>> > > > > >         >         > > last week and most of the build failures
>> you
>> > > see
>> > > > > from now on,
>> > > > > >         > are likely
>> > > > > >         >         > to>
>> > > > > >         >         > > be a valid failure and hence, it is
>> > recommended
>> > > > to
>> > > > > wait for PR
>> > > > > >         > builds,
>> > > > > >         >         > see>
>> > > > > >         >         > > the root cause of any build failures before
>> > > > > proceeding with
>> > > > > >         > merges.>
>> > > > > >         >         > >
>> > > > > >         >         > > At this point of time, there are 2
>> > intermittent
>> > > > > issue yet to
>> > > > > >         > be fixed ->
>> > > > > >         >         > > * Network error leading to GitHub requests
>> > > > > throwing 404>
>> > > > > >         >         > > * A conflict in artifacts generated between
>> > > > > branches/PR -
>> > > > > >         > Cause unknown
>> > > > > >         >         > yet.>
>> > > > > >         >         > > These issues will be fixed soon.>
>> > > > > >         >         > >
>> > > > > >         >         > >
>> > > > > >         >         > > -- >
>> > > > > >         >         > > Sandeep Krishnamurthy>
>> > > > > >         >         > >
>> > > > > >         >         >
>> > > > > >         >
>> > > > > >         >
>> > > > > >         >
>> > > > > >         >
>> > > > > >         >
>> > > > > >         >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > - Tsuyoshi
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Best Regards,
>> > > Gautam Kumar
>> > >
>> >
>>
>>
>>
>> --
>> Best Regards,
>> Gautam Kumar
>>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Chris Olivier <cj...@gmail.com>.
+1 on that

On Wed, Sep 27, 2017 at 11:15 PM Gautam <ga...@gmail.com> wrote:

> Hi Chris,
>
>   Here <https://help.github.com/articles/about-protected-branches/> is
> user
> document on semantics of protected branch.
> In short when a branch is protected following applies to that branch.
>
>    - Can't be force pushed
>    - Can't be deleted
>    - Can't have changes merged into it until required status checks
>    <https://help.github.com/articles/about-required-status-checks> pass
>    - Can't have changes merged into it until required reviews are approved
>    <
> https://help.github.com/articles/approving-a-pull-request-with-required-reviews
> >
>    - Can't be edited or have files uploaded to it from the web
>    - Can't have changes merged into it until changes to files that
> have a designated
>    code owner <https://help.github.com/articles/about-codeowners/> have
>    been approved by that owner
>
>  I am sure many of us might not want to have all these but we can debate on
> it. My main motive was to "*Can't have changes merged into it until
> required status checks pass*"
>
>
> -Gautam
>
>
>
> On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <cj...@gmail.com>
> wrote:
>
> > What does that mean? "Protected"? Protected from what?
> >
> > On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com> wrote:
> >
> > > Hi Chris,
> > >
> > >    I mean make "master branch protected" of  MXNet.
> > >
> > > -Gautam
> > >
> > > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <cjolivier01@gmail.com
> >
> > > wrote:
> > >
> > > > What does this mean? "Mx-net branch protected"?
> > > >
> > > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
> > ozawa.tsuyoshi@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > +1,
> > > > >
> > > > > While I'm checking the recent build failures, and I think the
> > decision
> > > > > of making the mx-net branch protected is necessary for stable
> > > > > building.
> > > > > Thanks Kumar for resuming important discussion.
> > > > >
> > > > > Best regards
> > > > > - Tsuyoshi
> > > > >
> > > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <ga...@amazon.com>
> > > > wrote:
> > > > > > Reviving the discussion.
> > > > > >
> > > > > > At this point of time we have couple of stable builds
> > > > > >
> > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > incubator-mxnet/job/master/448/
> > > > > >
> > > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > > incubator-mxnet/job/master/449/
> > > > > >
> > > > > > Should we have a quick discussion or polling on making the mx-net
> > > > branch
> > > > > protected? If you still think we shouldn’t make it protected please
> > > > provide
> > > > > a reason to support your claim.
> > > > > >
> > > > > > Few of us have concern over Jenkin’s stability. If I look two
> weeks
> > > > > back, after upgrading Linux slave to g2.8x and new windows AMI, we
> > have
> > > > not
> > > > > seen any case where instance died due to high memory usage or any
> > > process
> > > > > got killed due to high cpu usage or any other issue with windows
> > > slaves.
> > > > > >
> > > > > > Going forward we are also planning that if we add any new slave
> we
> > > will
> > > > > not enable the main load immediately, but rather will do ‘test
> build’
> > > to
> > > > > make sure that new slaves are not causing any infrastructure issue
> > and
> > > > > capable to perform as good as existing slaves.
> > > > > >
> > > > > > -Gautam
> > > > > >
> > > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com> wrote:
> > > > > >
> > > > > >     @madan looking into some failures – you’re right… there’s
> > > multiple
> > > > > issues going on, some of them intermittent, and we want to be able
> to
> > > > merge
> > > > > fixes in.
> > > > > >     Agreed that we can wait with setting up protected mode until
> > > build
> > > > > stabilizes.
> > > > > >
> > > > > >     On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > >         @hagay: we agree on the end state. I'm not too particular
> > > about
> > > > > how we get
> > > > > >         there. If you think enabling it now and fixes regression
> > > later
> > > > > is doable,
> > > > > >         I'm fine with. I see a bit of a chicken and egg problem.
> We
> > > > need
> > > > > to get
> > > > > >         some fixes in even when the status checks are failing.
> > > > > >
> > > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> > > > > lupesko@gmail.com> wrote:
> > > > > >
> > > > > >         > @madan – re: getting to a stable CI first:
> > > > > >         > I’m concerned that by not enabling protected branch
> mode
> > > > ASAP,
> > > > > we’re just
> > > > > >         > taking in more regressions, which makes a stable build
> a
> > > > > moving target for
> > > > > >         > us…
> > > > > >         >
> > > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com>
> > > wrote:
> > > > > >         >
> > > > > >         >     Just one thing: please don’t disable more tests or
> > just
> > > > > raise the
> > > > > >         > tolerance thresholds.
> > > > > >         >
> > > > > >         >     Best regards,
> > > > > >         >     -sz
> > > > > >         >
> > > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > > > madan.jampani@gmail.com> wrote:
> > > > > >         >
> > > > > >         >         +1
> > > > > >         >         Before we can turn protected mode I feel we
> > should
> > > > > first get to a
> > > > > >         > stable CI
> > > > > >         >         pipeline.
> > > > > >         >         Sandeep is chasing down known breaking issues.
> > > > > >         >
> > > > > >         >
> > > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay
> Lupesko <
> > > > > lupesko@gmail.com>
> > > > > >         > wrote:
> > > > > >         >
> > > > > >         >         > Build stability is a major issue, builds have
> > > been
> > > > > failing left
> > > > > >         > and right
> > > > > >         >         > over the last week. Some of it is due to
> > Jenkins
> > > > > slave issues,
> > > > > >         > but some are
> > > > > >         >         > real regressions.
> > > > > >         >         > We need to be more strict in the code we're
> > > > > committing.
> > > > > >         >         >
> > > > > >         >         > I propose we configure our master to be a
> > > protected
> > > > > branch (
> > > > > >         >         >
> > > > > https://help.github.com/articles/about-protected-branches/).
> > > > > >         >         >
> > > > > >         >         > Thoughts?
> > > > > >         >         >
> > > > > >         >         > On 2017-08-28 22:41, sandeep krishnamurthy <
> > > > > s...@gmail.com>
> > > > > >         > wrote:
> > > > > >         >         > > Hello Committers and Contributors,>
> > > > > >         >         > >
> > > > > >         >         > > Due to unstable build pipelines, from past
> 1
> > > > week,
> > > > > PRs are
> > > > > >         > being merged>
> > > > > >         >         > > after CR ignoring PR build status. Build
> > > pipeline
> > > > > is much more
> > > > > >         > stable
> > > > > >         >         > than>
> > > > > >         >         > > last week and most of the build failures
> you
> > > see
> > > > > from now on,
> > > > > >         > are likely
> > > > > >         >         > to>
> > > > > >         >         > > be a valid failure and hence, it is
> > recommended
> > > > to
> > > > > wait for PR
> > > > > >         > builds,
> > > > > >         >         > see>
> > > > > >         >         > > the root cause of any build failures before
> > > > > proceeding with
> > > > > >         > merges.>
> > > > > >         >         > >
> > > > > >         >         > > At this point of time, there are 2
> > intermittent
> > > > > issue yet to
> > > > > >         > be fixed ->
> > > > > >         >         > > * Network error leading to GitHub requests
> > > > > throwing 404>
> > > > > >         >         > > * A conflict in artifacts generated between
> > > > > branches/PR -
> > > > > >         > Cause unknown
> > > > > >         >         > yet.>
> > > > > >         >         > > These issues will be fixed soon.>
> > > > > >         >         > >
> > > > > >         >         > >
> > > > > >         >         > > -- >
> > > > > >         >         > > Sandeep Krishnamurthy>
> > > > > >         >         > >
> > > > > >         >         >
> > > > > >         >
> > > > > >         >
> > > > > >         >
> > > > > >         >
> > > > > >         >
> > > > > >         >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > - Tsuyoshi
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best Regards,
> > > Gautam Kumar
> > >
> >
>
>
>
> --
> Best Regards,
> Gautam Kumar
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Gautam <ga...@gmail.com>.
Hi Chris,

  Here <https://help.github.com/articles/about-protected-branches/> is user
document on semantics of protected branch.
In short when a branch is protected following applies to that branch.

   - Can't be force pushed
   - Can't be deleted
   - Can't have changes merged into it until required status checks
   <https://help.github.com/articles/about-required-status-checks> pass
   - Can't have changes merged into it until required reviews are approved
   <https://help.github.com/articles/approving-a-pull-request-with-required-reviews>
   - Can't be edited or have files uploaded to it from the web
   - Can't have changes merged into it until changes to files that
have a designated
   code owner <https://help.github.com/articles/about-codeowners/> have
   been approved by that owner

 I am sure many of us might not want to have all these but we can debate on
it. My main motive was to "*Can't have changes merged into it until
required status checks pass*"


-Gautam



On Wed, Sep 27, 2017 at 11:09 PM, Chris Olivier <cj...@gmail.com>
wrote:

> What does that mean? "Protected"? Protected from what?
>
> On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com> wrote:
>
> > Hi Chris,
> >
> >    I mean make "master branch protected" of  MXNet.
> >
> > -Gautam
> >
> > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <cj...@gmail.com>
> > wrote:
> >
> > > What does this mean? "Mx-net branch protected"?
> > >
> > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
> ozawa.tsuyoshi@gmail.com
> > >
> > > wrote:
> > >
> > > > +1,
> > > >
> > > > While I'm checking the recent build failures, and I think the
> decision
> > > > of making the mx-net branch protected is necessary for stable
> > > > building.
> > > > Thanks Kumar for resuming important discussion.
> > > >
> > > > Best regards
> > > > - Tsuyoshi
> > > >
> > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <ga...@amazon.com>
> > > wrote:
> > > > > Reviving the discussion.
> > > > >
> > > > > At this point of time we have couple of stable builds
> > > > >
> > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > incubator-mxnet/job/master/448/
> > > > >
> > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > incubator-mxnet/job/master/449/
> > > > >
> > > > > Should we have a quick discussion or polling on making the mx-net
> > > branch
> > > > protected? If you still think we shouldn’t make it protected please
> > > provide
> > > > a reason to support your claim.
> > > > >
> > > > > Few of us have concern over Jenkin’s stability. If I look two weeks
> > > > back, after upgrading Linux slave to g2.8x and new windows AMI, we
> have
> > > not
> > > > seen any case where instance died due to high memory usage or any
> > process
> > > > got killed due to high cpu usage or any other issue with windows
> > slaves.
> > > > >
> > > > > Going forward we are also planning that if we add any new slave we
> > will
> > > > not enable the main load immediately, but rather will do ‘test build’
> > to
> > > > make sure that new slaves are not causing any infrastructure issue
> and
> > > > capable to perform as good as existing slaves.
> > > > >
> > > > > -Gautam
> > > > >
> > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com> wrote:
> > > > >
> > > > >     @madan looking into some failures – you’re right… there’s
> > multiple
> > > > issues going on, some of them intermittent, and we want to be able to
> > > merge
> > > > fixes in.
> > > > >     Agreed that we can wait with setting up protected mode until
> > build
> > > > stabilizes.
> > > > >
> > > > >     On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com>
> > > wrote:
> > > > >
> > > > >         @hagay: we agree on the end state. I'm not too particular
> > about
> > > > how we get
> > > > >         there. If you think enabling it now and fixes regression
> > later
> > > > is doable,
> > > > >         I'm fine with. I see a bit of a chicken and egg problem. We
> > > need
> > > > to get
> > > > >         some fixes in even when the status checks are failing.
> > > > >
> > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> > > > lupesko@gmail.com> wrote:
> > > > >
> > > > >         > @madan – re: getting to a stable CI first:
> > > > >         > I’m concerned that by not enabling protected branch mode
> > > ASAP,
> > > > we’re just
> > > > >         > taking in more regressions, which makes a stable build a
> > > > moving target for
> > > > >         > us…
> > > > >         >
> > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com>
> > wrote:
> > > > >         >
> > > > >         >     Just one thing: please don’t disable more tests or
> just
> > > > raise the
> > > > >         > tolerance thresholds.
> > > > >         >
> > > > >         >     Best regards,
> > > > >         >     -sz
> > > > >         >
> > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > > madan.jampani@gmail.com> wrote:
> > > > >         >
> > > > >         >         +1
> > > > >         >         Before we can turn protected mode I feel we
> should
> > > > first get to a
> > > > >         > stable CI
> > > > >         >         pipeline.
> > > > >         >         Sandeep is chasing down known breaking issues.
> > > > >         >
> > > > >         >
> > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay Lupesko <
> > > > lupesko@gmail.com>
> > > > >         > wrote:
> > > > >         >
> > > > >         >         > Build stability is a major issue, builds have
> > been
> > > > failing left
> > > > >         > and right
> > > > >         >         > over the last week. Some of it is due to
> Jenkins
> > > > slave issues,
> > > > >         > but some are
> > > > >         >         > real regressions.
> > > > >         >         > We need to be more strict in the code we're
> > > > committing.
> > > > >         >         >
> > > > >         >         > I propose we configure our master to be a
> > protected
> > > > branch (
> > > > >         >         >
> > > > https://help.github.com/articles/about-protected-branches/).
> > > > >         >         >
> > > > >         >         > Thoughts?
> > > > >         >         >
> > > > >         >         > On 2017-08-28 22:41, sandeep krishnamurthy <
> > > > s...@gmail.com>
> > > > >         > wrote:
> > > > >         >         > > Hello Committers and Contributors,>
> > > > >         >         > >
> > > > >         >         > > Due to unstable build pipelines, from past 1
> > > week,
> > > > PRs are
> > > > >         > being merged>
> > > > >         >         > > after CR ignoring PR build status. Build
> > pipeline
> > > > is much more
> > > > >         > stable
> > > > >         >         > than>
> > > > >         >         > > last week and most of the build failures you
> > see
> > > > from now on,
> > > > >         > are likely
> > > > >         >         > to>
> > > > >         >         > > be a valid failure and hence, it is
> recommended
> > > to
> > > > wait for PR
> > > > >         > builds,
> > > > >         >         > see>
> > > > >         >         > > the root cause of any build failures before
> > > > proceeding with
> > > > >         > merges.>
> > > > >         >         > >
> > > > >         >         > > At this point of time, there are 2
> intermittent
> > > > issue yet to
> > > > >         > be fixed ->
> > > > >         >         > > * Network error leading to GitHub requests
> > > > throwing 404>
> > > > >         >         > > * A conflict in artifacts generated between
> > > > branches/PR -
> > > > >         > Cause unknown
> > > > >         >         > yet.>
> > > > >         >         > > These issues will be fixed soon.>
> > > > >         >         > >
> > > > >         >         > >
> > > > >         >         > > -- >
> > > > >         >         > > Sandeep Krishnamurthy>
> > > > >         >         > >
> > > > >         >         >
> > > > >         >
> > > > >         >
> > > > >         >
> > > > >         >
> > > > >         >
> > > > >         >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > - Tsuyoshi
> > > >
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Gautam Kumar
> >
>



-- 
Best Regards,
Gautam Kumar

Re: Apache MXNet build failures are mostly valid - verify beforemerge

Posted by Chris Olivier <cj...@gmail.com>.
Thank you for the clarification.

Agree force pushed are bad. I don't think anyone is doing this, though.

I am curious though, and this may be an unpopular question, but is there
precedent to tempoaraily disabling write access to people who merge in code
that hasn't passed CI tests and thus break CI (is there a way to tell)?
Certainly for 10% failures, one could argue that it "slipped by", but im
speaking more about blind merges that break everything 100% of the time.
How do other Apache projects deal with this?

On Wed, Sep 27, 2017 at 11:13 PM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Looking at the email thread it means marking the master branch as
> protected in GitHub (a functionality they offer):
> https://help.github.com/articles/about-protected-branches/
>
> Popular open source projects should at a minimum have force push disabled
> on their master branch to prevent broken downstream git references.  The
> other protections offered here are also most likely beneficial.
>
> -Kellen
>
> From: Chris Olivier
> Sent: Thursday, September 28, 2017 8:10 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Apache MXNet build failures are mostly valid - verify
> beforemerge
>
> What does that mean? "Protected"? Protected from what?
>
> On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com> wrote:
>
> > Hi Chris,
> >
> >    I mean make "master branch protected" of  MXNet.
> >
> > -Gautam
> >
> > On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <cj...@gmail.com>
> > wrote:
> >
> > > What does this mean? "Mx-net branch protected"?
> > >
> > > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <
> ozawa.tsuyoshi@gmail.com
> > >
> > > wrote:
> > >
> > > > +1,
> > > >
> > > > While I'm checking the recent build failures, and I think the
> decision
> > > > of making the mx-net branch protected is necessary for stable
> > > > building.
> > > > Thanks Kumar for resuming important discussion.
> > > >
> > > > Best regards
> > > > - Tsuyoshi
> > > >
> > > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <ga...@amazon.com>
> > > wrote:
> > > > > Reviving the discussion.
> > > > >
> > > > > At this point of time we have couple of stable builds
> > > > >
> > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > incubator-mxnet/job/master/448/
> > > > >
> > > > https://builds.apache.org/view/Incubator%20Projects/job/
> > > incubator-mxnet/job/master/449/
> > > > >
> > > > > Should we have a quick discussion or polling on making the mx-net
> > > branch
> > > > protected? If you still think we shouldn’t make it protected please
> > > provide
> > > > a reason to support your claim.
> > > > >
> > > > > Few of us have concern over Jenkin’s stability. If I look two weeks
> > > > back, after upgrading Linux slave to g2.8x and new windows AMI, we
> have
> > > not
> > > > seen any case where instance died due to high memory usage or any
> > process
> > > > got killed due to high cpu usage or any other issue with windows
> > slaves.
> > > > >
> > > > > Going forward we are also planning that if we add any new slave we
> > will
> > > > not enable the main load immediately, but rather will do ‘test build’
> > to
> > > > make sure that new slaves are not causing any infrastructure issue
> and
> > > > capable to perform as good as existing slaves.
> > > > >
> > > > > -Gautam
> > > > >
> > > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com> wrote:
> > > > >
> > > > >     @madan looking into some failures – you’re right… there’s
> > multiple
> > > > issues going on, some of them intermittent, and we want to be able to
> > > merge
> > > > fixes in.
> > > > >     Agreed that we can wait with setting up protected mode until
> > build
> > > > stabilizes.
> > > > >
> > > > >     On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com>
> > > wrote:
> > > > >
> > > > >         @hagay: we agree on the end state. I'm not too particular
> > about
> > > > how we get
> > > > >         there. If you think enabling it now and fixes regression
> > later
> > > > is doable,
> > > > >         I'm fine with. I see a bit of a chicken and egg problem. We
> > > need
> > > > to get
> > > > >         some fixes in even when the status checks are failing.
> > > > >
> > > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> > > > lupesko@gmail.com> wrote:
> > > > >
> > > > >         > @madan – re: getting to a stable CI first:
> > > > >         > I’m concerned that by not enabling protected branch mode
> > > ASAP,
> > > > we’re just
> > > > >         > taking in more regressions, which makes a stable build a
> > > > moving target for
> > > > >         > us…
> > > > >         >
> > > > >         > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com>
> > wrote:
> > > > >         >
> > > > >         >     Just one thing: please don’t disable more tests or
> just
> > > > raise the
> > > > >         > tolerance thresholds.
> > > > >         >
> > > > >         >     Best regards,
> > > > >         >     -sz
> > > > >         >
> > > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > > madan.jampani@gmail.com> wrote:
> > > > >         >
> > > > >         >         +1
> > > > >         >         Before we can turn protected mode I feel we
> should
> > > > first get to a
> > > > >         > stable CI
> > > > >         >         pipeline.
> > > > >         >         Sandeep is chasing down known breaking issues.
> > > > >         >
> > > > >         >
> > > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay Lupesko <
> > > > lupesko@gmail.com>
> > > > >         > wrote:
> > > > >         >
> > > > >         >         > Build stability is a major issue, builds have
> > been
> > > > failing left
> > > > >         > and right
> > > > >         >         > over the last week. Some of it is due to
> Jenkins
> > > > slave issues,
> > > > >         > but some are
> > > > >         >         > real regressions.
> > > > >         >         > We need to be more strict in the code we're
> > > > committing.
> > > > >         >         >
> > > > >         >         > I propose we configure our master to be a
> > protected
> > > > branch (
> > > > >         >         >
> > > > https://help.github.com/articles/about-protected-branches/).
> > > > >         >         >
> > > > >         >         > Thoughts?
> > > > >         >         >
> > > > >         >         > On 2017-08-28 22:41, sandeep krishnamurthy <
> > > > s...@gmail.com>
> > > > >         > wrote:
> > > > >         >         > > Hello Committers and Contributors,>
> > > > >         >         > >
> > > > >         >         > > Due to unstable build pipelines, from past 1
> > > week,
> > > > PRs are
> > > > >         > being merged>
> > > > >         >         > > after CR ignoring PR build status. Build
> > pipeline
> > > > is much more
> > > > >         > stable
> > > > >         >         > than>
> > > > >         >         > > last week and most of the build failures you
> > see
> > > > from now on,
> > > > >         > are likely
> > > > >         >         > to>
> > > > >         >         > > be a valid failure and hence, it is
> recommended
> > > to
> > > > wait for PR
> > > > >         > builds,
> > > > >         >         > see>
> > > > >         >         > > the root cause of any build failures before
> > > > proceeding with
> > > > >         > merges.>
> > > > >         >         > >
> > > > >         >         > > At this point of time, there are 2
> intermittent
> > > > issue yet to
> > > > >         > be fixed ->
> > > > >         >         > > * Network error leading to GitHub requests
> > > > throwing 404>
> > > > >         >         > > * A conflict in artifacts generated between
> > > > branches/PR -
> > > > >         > Cause unknown
> > > > >         >         > yet.>
> > > > >         >         > > These issues will be fixed soon.>
> > > > >         >         > >
> > > > >         >         > >
> > > > >         >         > > -- >
> > > > >         >         > > Sandeep Krishnamurthy>
> > > > >         >         > >
> > > > >         >         >
> > > > >         >
> > > > >         >
> > > > >         >
> > > > >         >
> > > > >         >
> > > > >         >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > - Tsuyoshi
> > > >
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Gautam Kumar
> >
>
>

RE: Apache MXNet build failures are mostly valid - verify beforemerge

Posted by kellen sunderland <ke...@gmail.com>.
Looking at the email thread it means marking the master branch as protected in GitHub (a functionality they offer): https://help.github.com/articles/about-protected-branches/

Popular open source projects should at a minimum have force push disabled on their master branch to prevent broken downstream git references.  The other protections offered here are also most likely beneficial.  

-Kellen

From: Chris Olivier
Sent: Thursday, September 28, 2017 8:10 AM
To: dev@mxnet.incubator.apache.org
Subject: Re: Apache MXNet build failures are mostly valid - verify beforemerge

What does that mean? "Protected"? Protected from what?

On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com> wrote:

> Hi Chris,
>
>    I mean make "master branch protected" of  MXNet.
>
> -Gautam
>
> On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <cj...@gmail.com>
> wrote:
>
> > What does this mean? "Mx-net branch protected"?
> >
> > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <ozawa.tsuyoshi@gmail.com
> >
> > wrote:
> >
> > > +1,
> > >
> > > While I'm checking the recent build failures, and I think the decision
> > > of making the mx-net branch protected is necessary for stable
> > > building.
> > > Thanks Kumar for resuming important discussion.
> > >
> > > Best regards
> > > - Tsuyoshi
> > >
> > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <ga...@amazon.com>
> > wrote:
> > > > Reviving the discussion.
> > > >
> > > > At this point of time we have couple of stable builds
> > > >
> > > https://builds.apache.org/view/Incubator%20Projects/job/
> > incubator-mxnet/job/master/448/
> > > >
> > > https://builds.apache.org/view/Incubator%20Projects/job/
> > incubator-mxnet/job/master/449/
> > > >
> > > > Should we have a quick discussion or polling on making the mx-net
> > branch
> > > protected? If you still think we shouldn’t make it protected please
> > provide
> > > a reason to support your claim.
> > > >
> > > > Few of us have concern over Jenkin’s stability. If I look two weeks
> > > back, after upgrading Linux slave to g2.8x and new windows AMI, we have
> > not
> > > seen any case where instance died due to high memory usage or any
> process
> > > got killed due to high cpu usage or any other issue with windows
> slaves.
> > > >
> > > > Going forward we are also planning that if we add any new slave we
> will
> > > not enable the main load immediately, but rather will do ‘test build’
> to
> > > make sure that new slaves are not causing any infrastructure issue and
> > > capable to perform as good as existing slaves.
> > > >
> > > > -Gautam
> > > >
> > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com> wrote:
> > > >
> > > >     @madan looking into some failures – you’re right… there’s
> multiple
> > > issues going on, some of them intermittent, and we want to be able to
> > merge
> > > fixes in.
> > > >     Agreed that we can wait with setting up protected mode until
> build
> > > stabilizes.
> > > >
> > > >     On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com>
> > wrote:
> > > >
> > > >         @hagay: we agree on the end state. I'm not too particular
> about
> > > how we get
> > > >         there. If you think enabling it now and fixes regression
> later
> > > is doable,
> > > >         I'm fine with. I see a bit of a chicken and egg problem. We
> > need
> > > to get
> > > >         some fixes in even when the status checks are failing.
> > > >
> > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> > > lupesko@gmail.com> wrote:
> > > >
> > > >         > @madan – re: getting to a stable CI first:
> > > >         > I’m concerned that by not enabling protected branch mode
> > ASAP,
> > > we’re just
> > > >         > taking in more regressions, which makes a stable build a
> > > moving target for
> > > >         > us…
> > > >         >
> > > >         > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com>
> wrote:
> > > >         >
> > > >         >     Just one thing: please don’t disable more tests or just
> > > raise the
> > > >         > tolerance thresholds.
> > > >         >
> > > >         >     Best regards,
> > > >         >     -sz
> > > >         >
> > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > madan.jampani@gmail.com> wrote:
> > > >         >
> > > >         >         +1
> > > >         >         Before we can turn protected mode I feel we should
> > > first get to a
> > > >         > stable CI
> > > >         >         pipeline.
> > > >         >         Sandeep is chasing down known breaking issues.
> > > >         >
> > > >         >
> > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay Lupesko <
> > > lupesko@gmail.com>
> > > >         > wrote:
> > > >         >
> > > >         >         > Build stability is a major issue, builds have
> been
> > > failing left
> > > >         > and right
> > > >         >         > over the last week. Some of it is due to Jenkins
> > > slave issues,
> > > >         > but some are
> > > >         >         > real regressions.
> > > >         >         > We need to be more strict in the code we're
> > > committing.
> > > >         >         >
> > > >         >         > I propose we configure our master to be a
> protected
> > > branch (
> > > >         >         >
> > > https://help.github.com/articles/about-protected-branches/).
> > > >         >         >
> > > >         >         > Thoughts?
> > > >         >         >
> > > >         >         > On 2017-08-28 22:41, sandeep krishnamurthy <
> > > s...@gmail.com>
> > > >         > wrote:
> > > >         >         > > Hello Committers and Contributors,>
> > > >         >         > >
> > > >         >         > > Due to unstable build pipelines, from past 1
> > week,
> > > PRs are
> > > >         > being merged>
> > > >         >         > > after CR ignoring PR build status. Build
> pipeline
> > > is much more
> > > >         > stable
> > > >         >         > than>
> > > >         >         > > last week and most of the build failures you
> see
> > > from now on,
> > > >         > are likely
> > > >         >         > to>
> > > >         >         > > be a valid failure and hence, it is recommended
> > to
> > > wait for PR
> > > >         > builds,
> > > >         >         > see>
> > > >         >         > > the root cause of any build failures before
> > > proceeding with
> > > >         > merges.>
> > > >         >         > >
> > > >         >         > > At this point of time, there are 2 intermittent
> > > issue yet to
> > > >         > be fixed ->
> > > >         >         > > * Network error leading to GitHub requests
> > > throwing 404>
> > > >         >         > > * A conflict in artifacts generated between
> > > branches/PR -
> > > >         > Cause unknown
> > > >         >         > yet.>
> > > >         >         > > These issues will be fixed soon.>
> > > >         >         > >
> > > >         >         > >
> > > >         >         > > -- >
> > > >         >         > > Sandeep Krishnamurthy>
> > > >         >         > >
> > > >         >         >
> > > >         >
> > > >         >
> > > >         >
> > > >         >
> > > >         >
> > > >         >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > > --
> > > - Tsuyoshi
> > >
> >
>
>
>
> --
> Best Regards,
> Gautam Kumar
>


Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Chris Olivier <cj...@gmail.com>.
What does that mean? "Protected"? Protected from what?

On Wed, Sep 27, 2017 at 11:08 PM Gautam <ga...@gmail.com> wrote:

> Hi Chris,
>
>    I mean make "master branch protected" of  MXNet.
>
> -Gautam
>
> On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <cj...@gmail.com>
> wrote:
>
> > What does this mean? "Mx-net branch protected"?
> >
> > On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <ozawa.tsuyoshi@gmail.com
> >
> > wrote:
> >
> > > +1,
> > >
> > > While I'm checking the recent build failures, and I think the decision
> > > of making the mx-net branch protected is necessary for stable
> > > building.
> > > Thanks Kumar for resuming important discussion.
> > >
> > > Best regards
> > > - Tsuyoshi
> > >
> > > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <ga...@amazon.com>
> > wrote:
> > > > Reviving the discussion.
> > > >
> > > > At this point of time we have couple of stable builds
> > > >
> > > https://builds.apache.org/view/Incubator%20Projects/job/
> > incubator-mxnet/job/master/448/
> > > >
> > > https://builds.apache.org/view/Incubator%20Projects/job/
> > incubator-mxnet/job/master/449/
> > > >
> > > > Should we have a quick discussion or polling on making the mx-net
> > branch
> > > protected? If you still think we shouldn’t make it protected please
> > provide
> > > a reason to support your claim.
> > > >
> > > > Few of us have concern over Jenkin’s stability. If I look two weeks
> > > back, after upgrading Linux slave to g2.8x and new windows AMI, we have
> > not
> > > seen any case where instance died due to high memory usage or any
> process
> > > got killed due to high cpu usage or any other issue with windows
> slaves.
> > > >
> > > > Going forward we are also planning that if we add any new slave we
> will
> > > not enable the main load immediately, but rather will do ‘test build’
> to
> > > make sure that new slaves are not causing any infrastructure issue and
> > > capable to perform as good as existing slaves.
> > > >
> > > > -Gautam
> > > >
> > > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com> wrote:
> > > >
> > > >     @madan looking into some failures – you’re right… there’s
> multiple
> > > issues going on, some of them intermittent, and we want to be able to
> > merge
> > > fixes in.
> > > >     Agreed that we can wait with setting up protected mode until
> build
> > > stabilizes.
> > > >
> > > >     On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com>
> > wrote:
> > > >
> > > >         @hagay: we agree on the end state. I'm not too particular
> about
> > > how we get
> > > >         there. If you think enabling it now and fixes regression
> later
> > > is doable,
> > > >         I'm fine with. I see a bit of a chicken and egg problem. We
> > need
> > > to get
> > > >         some fixes in even when the status checks are failing.
> > > >
> > > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> > > lupesko@gmail.com> wrote:
> > > >
> > > >         > @madan – re: getting to a stable CI first:
> > > >         > I’m concerned that by not enabling protected branch mode
> > ASAP,
> > > we’re just
> > > >         > taking in more regressions, which makes a stable build a
> > > moving target for
> > > >         > us…
> > > >         >
> > > >         > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com>
> wrote:
> > > >         >
> > > >         >     Just one thing: please don’t disable more tests or just
> > > raise the
> > > >         > tolerance thresholds.
> > > >         >
> > > >         >     Best regards,
> > > >         >     -sz
> > > >         >
> > > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > > madan.jampani@gmail.com> wrote:
> > > >         >
> > > >         >         +1
> > > >         >         Before we can turn protected mode I feel we should
> > > first get to a
> > > >         > stable CI
> > > >         >         pipeline.
> > > >         >         Sandeep is chasing down known breaking issues.
> > > >         >
> > > >         >
> > > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay Lupesko <
> > > lupesko@gmail.com>
> > > >         > wrote:
> > > >         >
> > > >         >         > Build stability is a major issue, builds have
> been
> > > failing left
> > > >         > and right
> > > >         >         > over the last week. Some of it is due to Jenkins
> > > slave issues,
> > > >         > but some are
> > > >         >         > real regressions.
> > > >         >         > We need to be more strict in the code we're
> > > committing.
> > > >         >         >
> > > >         >         > I propose we configure our master to be a
> protected
> > > branch (
> > > >         >         >
> > > https://help.github.com/articles/about-protected-branches/).
> > > >         >         >
> > > >         >         > Thoughts?
> > > >         >         >
> > > >         >         > On 2017-08-28 22:41, sandeep krishnamurthy <
> > > s...@gmail.com>
> > > >         > wrote:
> > > >         >         > > Hello Committers and Contributors,>
> > > >         >         > >
> > > >         >         > > Due to unstable build pipelines, from past 1
> > week,
> > > PRs are
> > > >         > being merged>
> > > >         >         > > after CR ignoring PR build status. Build
> pipeline
> > > is much more
> > > >         > stable
> > > >         >         > than>
> > > >         >         > > last week and most of the build failures you
> see
> > > from now on,
> > > >         > are likely
> > > >         >         > to>
> > > >         >         > > be a valid failure and hence, it is recommended
> > to
> > > wait for PR
> > > >         > builds,
> > > >         >         > see>
> > > >         >         > > the root cause of any build failures before
> > > proceeding with
> > > >         > merges.>
> > > >         >         > >
> > > >         >         > > At this point of time, there are 2 intermittent
> > > issue yet to
> > > >         > be fixed ->
> > > >         >         > > * Network error leading to GitHub requests
> > > throwing 404>
> > > >         >         > > * A conflict in artifacts generated between
> > > branches/PR -
> > > >         > Cause unknown
> > > >         >         > yet.>
> > > >         >         > > These issues will be fixed soon.>
> > > >         >         > >
> > > >         >         > >
> > > >         >         > > -- >
> > > >         >         > > Sandeep Krishnamurthy>
> > > >         >         > >
> > > >         >         >
> > > >         >
> > > >         >
> > > >         >
> > > >         >
> > > >         >
> > > >         >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > > --
> > > - Tsuyoshi
> > >
> >
>
>
>
> --
> Best Regards,
> Gautam Kumar
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Gautam <ga...@gmail.com>.
Hi Chris,

   I mean make "master branch protected" of  MXNet.

-Gautam

On Wed, Sep 27, 2017 at 11:04 PM, Chris Olivier <cj...@gmail.com>
wrote:

> What does this mean? "Mx-net branch protected"?
>
> On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <oz...@gmail.com>
> wrote:
>
> > +1,
> >
> > While I'm checking the recent build failures, and I think the decision
> > of making the mx-net branch protected is necessary for stable
> > building.
> > Thanks Kumar for resuming important discussion.
> >
> > Best regards
> > - Tsuyoshi
> >
> > On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <ga...@amazon.com>
> wrote:
> > > Reviving the discussion.
> > >
> > > At this point of time we have couple of stable builds
> > >
> > https://builds.apache.org/view/Incubator%20Projects/job/
> incubator-mxnet/job/master/448/
> > >
> > https://builds.apache.org/view/Incubator%20Projects/job/
> incubator-mxnet/job/master/449/
> > >
> > > Should we have a quick discussion or polling on making the mx-net
> branch
> > protected? If you still think we shouldn’t make it protected please
> provide
> > a reason to support your claim.
> > >
> > > Few of us have concern over Jenkin’s stability. If I look two weeks
> > back, after upgrading Linux slave to g2.8x and new windows AMI, we have
> not
> > seen any case where instance died due to high memory usage or any process
> > got killed due to high cpu usage or any other issue with windows slaves.
> > >
> > > Going forward we are also planning that if we add any new slave we will
> > not enable the main load immediately, but rather will do ‘test build’ to
> > make sure that new slaves are not causing any infrastructure issue and
> > capable to perform as good as existing slaves.
> > >
> > > -Gautam
> > >
> > > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com> wrote:
> > >
> > >     @madan looking into some failures – you’re right… there’s multiple
> > issues going on, some of them intermittent, and we want to be able to
> merge
> > fixes in.
> > >     Agreed that we can wait with setting up protected mode until build
> > stabilizes.
> > >
> > >     On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com>
> wrote:
> > >
> > >         @hagay: we agree on the end state. I'm not too particular about
> > how we get
> > >         there. If you think enabling it now and fixes regression later
> > is doable,
> > >         I'm fine with. I see a bit of a chicken and egg problem. We
> need
> > to get
> > >         some fixes in even when the status checks are failing.
> > >
> > >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> > lupesko@gmail.com> wrote:
> > >
> > >         > @madan – re: getting to a stable CI first:
> > >         > I’m concerned that by not enabling protected branch mode
> ASAP,
> > we’re just
> > >         > taking in more regressions, which makes a stable build a
> > moving target for
> > >         > us…
> > >         >
> > >         > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com> wrote:
> > >         >
> > >         >     Just one thing: please don’t disable more tests or just
> > raise the
> > >         > tolerance thresholds.
> > >         >
> > >         >     Best regards,
> > >         >     -sz
> > >         >
> > >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> > madan.jampani@gmail.com> wrote:
> > >         >
> > >         >         +1
> > >         >         Before we can turn protected mode I feel we should
> > first get to a
> > >         > stable CI
> > >         >         pipeline.
> > >         >         Sandeep is chasing down known breaking issues.
> > >         >
> > >         >
> > >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay Lupesko <
> > lupesko@gmail.com>
> > >         > wrote:
> > >         >
> > >         >         > Build stability is a major issue, builds have been
> > failing left
> > >         > and right
> > >         >         > over the last week. Some of it is due to Jenkins
> > slave issues,
> > >         > but some are
> > >         >         > real regressions.
> > >         >         > We need to be more strict in the code we're
> > committing.
> > >         >         >
> > >         >         > I propose we configure our master to be a protected
> > branch (
> > >         >         >
> > https://help.github.com/articles/about-protected-branches/).
> > >         >         >
> > >         >         > Thoughts?
> > >         >         >
> > >         >         > On 2017-08-28 22:41, sandeep krishnamurthy <
> > s...@gmail.com>
> > >         > wrote:
> > >         >         > > Hello Committers and Contributors,>
> > >         >         > >
> > >         >         > > Due to unstable build pipelines, from past 1
> week,
> > PRs are
> > >         > being merged>
> > >         >         > > after CR ignoring PR build status. Build pipeline
> > is much more
> > >         > stable
> > >         >         > than>
> > >         >         > > last week and most of the build failures you see
> > from now on,
> > >         > are likely
> > >         >         > to>
> > >         >         > > be a valid failure and hence, it is recommended
> to
> > wait for PR
> > >         > builds,
> > >         >         > see>
> > >         >         > > the root cause of any build failures before
> > proceeding with
> > >         > merges.>
> > >         >         > >
> > >         >         > > At this point of time, there are 2 intermittent
> > issue yet to
> > >         > be fixed ->
> > >         >         > > * Network error leading to GitHub requests
> > throwing 404>
> > >         >         > > * A conflict in artifacts generated between
> > branches/PR -
> > >         > Cause unknown
> > >         >         > yet.>
> > >         >         > > These issues will be fixed soon.>
> > >         >         > >
> > >         >         > >
> > >         >         > > -- >
> > >         >         > > Sandeep Krishnamurthy>
> > >         >         > >
> > >         >         >
> > >         >
> > >         >
> > >         >
> > >         >
> > >         >
> > >         >
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> > --
> > - Tsuyoshi
> >
>



-- 
Best Regards,
Gautam Kumar

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Chris Olivier <cj...@gmail.com>.
What does this mean? "Mx-net branch protected"?

On Wed, Sep 27, 2017 at 9:59 PM Tsuyoshi OZAWA <oz...@gmail.com>
wrote:

> +1,
>
> While I'm checking the recent build failures, and I think the decision
> of making the mx-net branch protected is necessary for stable
> building.
> Thanks Kumar for resuming important discussion.
>
> Best regards
> - Tsuyoshi
>
> On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <ga...@amazon.com> wrote:
> > Reviving the discussion.
> >
> > At this point of time we have couple of stable builds
> >
> https://builds.apache.org/view/Incubator%20Projects/job/incubator-mxnet/job/master/448/
> >
> https://builds.apache.org/view/Incubator%20Projects/job/incubator-mxnet/job/master/449/
> >
> > Should we have a quick discussion or polling on making the mx-net branch
> protected? If you still think we shouldn’t make it protected please provide
> a reason to support your claim.
> >
> > Few of us have concern over Jenkin’s stability. If I look two weeks
> back, after upgrading Linux slave to g2.8x and new windows AMI, we have not
> seen any case where instance died due to high memory usage or any process
> got killed due to high cpu usage or any other issue with windows slaves.
> >
> > Going forward we are also planning that if we add any new slave we will
> not enable the main load immediately, but rather will do ‘test build’ to
> make sure that new slaves are not causing any infrastructure issue and
> capable to perform as good as existing slaves.
> >
> > -Gautam
> >
> > On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com> wrote:
> >
> >     @madan looking into some failures – you’re right… there’s multiple
> issues going on, some of them intermittent, and we want to be able to merge
> fixes in.
> >     Agreed that we can wait with setting up protected mode until build
> stabilizes.
> >
> >     On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com> wrote:
> >
> >         @hagay: we agree on the end state. I'm not too particular about
> how we get
> >         there. If you think enabling it now and fixes regression later
> is doable,
> >         I'm fine with. I see a bit of a chicken and egg problem. We need
> to get
> >         some fixes in even when the status checks are failing.
> >
> >         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <
> lupesko@gmail.com> wrote:
> >
> >         > @madan – re: getting to a stable CI first:
> >         > I’m concerned that by not enabling protected branch mode ASAP,
> we’re just
> >         > taking in more regressions, which makes a stable build a
> moving target for
> >         > us…
> >         >
> >         > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com> wrote:
> >         >
> >         >     Just one thing: please don’t disable more tests or just
> raise the
> >         > tolerance thresholds.
> >         >
> >         >     Best regards,
> >         >     -sz
> >         >
> >         >     On 8/31/17, 10:45 AM, "Madan Jampani" <
> madan.jampani@gmail.com> wrote:
> >         >
> >         >         +1
> >         >         Before we can turn protected mode I feel we should
> first get to a
> >         > stable CI
> >         >         pipeline.
> >         >         Sandeep is chasing down known breaking issues.
> >         >
> >         >
> >         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay Lupesko <
> lupesko@gmail.com>
> >         > wrote:
> >         >
> >         >         > Build stability is a major issue, builds have been
> failing left
> >         > and right
> >         >         > over the last week. Some of it is due to Jenkins
> slave issues,
> >         > but some are
> >         >         > real regressions.
> >         >         > We need to be more strict in the code we're
> committing.
> >         >         >
> >         >         > I propose we configure our master to be a protected
> branch (
> >         >         >
> https://help.github.com/articles/about-protected-branches/).
> >         >         >
> >         >         > Thoughts?
> >         >         >
> >         >         > On 2017-08-28 22:41, sandeep krishnamurthy <
> s...@gmail.com>
> >         > wrote:
> >         >         > > Hello Committers and Contributors,>
> >         >         > >
> >         >         > > Due to unstable build pipelines, from past 1 week,
> PRs are
> >         > being merged>
> >         >         > > after CR ignoring PR build status. Build pipeline
> is much more
> >         > stable
> >         >         > than>
> >         >         > > last week and most of the build failures you see
> from now on,
> >         > are likely
> >         >         > to>
> >         >         > > be a valid failure and hence, it is recommended to
> wait for PR
> >         > builds,
> >         >         > see>
> >         >         > > the root cause of any build failures before
> proceeding with
> >         > merges.>
> >         >         > >
> >         >         > > At this point of time, there are 2 intermittent
> issue yet to
> >         > be fixed ->
> >         >         > > * Network error leading to GitHub requests
> throwing 404>
> >         >         > > * A conflict in artifacts generated between
> branches/PR -
> >         > Cause unknown
> >         >         > yet.>
> >         >         > > These issues will be fixed soon.>
> >         >         > >
> >         >         > >
> >         >         > > -- >
> >         >         > > Sandeep Krishnamurthy>
> >         >         > >
> >         >         >
> >         >
> >         >
> >         >
> >         >
> >         >
> >         >
> >
> >
> >
> >
> >
>
>
>
> --
> - Tsuyoshi
>

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by Tsuyoshi OZAWA <oz...@gmail.com>.
+1,

While I'm checking the recent build failures, and I think the decision
of making the mx-net branch protected is necessary for stable
building.
Thanks Kumar for resuming important discussion.

Best regards
- Tsuyoshi

On Thu, Sep 28, 2017 at 12:56 PM, Kumar, Gautam <ga...@amazon.com> wrote:
> Reviving the discussion.
>
> At this point of time we have couple of stable builds
> https://builds.apache.org/view/Incubator%20Projects/job/incubator-mxnet/job/master/448/
> https://builds.apache.org/view/Incubator%20Projects/job/incubator-mxnet/job/master/449/
>
> Should we have a quick discussion or polling on making the mx-net branch protected? If you still think we shouldn’t make it protected please provide a reason to support your claim.
>
> Few of us have concern over Jenkin’s stability. If I look two weeks back, after upgrading Linux slave to g2.8x and new windows AMI, we have not seen any case where instance died due to high memory usage or any process got killed due to high cpu usage or any other issue with windows slaves.
>
> Going forward we are also planning that if we add any new slave we will not enable the main load immediately, but rather will do ‘test build’ to make sure that new slaves are not causing any infrastructure issue and capable to perform as good as existing slaves.
>
> -Gautam
>
> On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com> wrote:
>
>     @madan looking into some failures – you’re right… there’s multiple issues going on, some of them intermittent, and we want to be able to merge fixes in.
>     Agreed that we can wait with setting up protected mode until build stabilizes.
>
>     On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com> wrote:
>
>         @hagay: we agree on the end state. I'm not too particular about how we get
>         there. If you think enabling it now and fixes regression later is doable,
>         I'm fine with. I see a bit of a chicken and egg problem. We need to get
>         some fixes in even when the status checks are failing.
>
>         On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <lu...@gmail.com> wrote:
>
>         > @madan – re: getting to a stable CI first:
>         > I’m concerned that by not enabling protected branch mode ASAP, we’re just
>         > taking in more regressions, which makes a stable build a moving target for
>         > us…
>         >
>         > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com> wrote:
>         >
>         >     Just one thing: please don’t disable more tests or just raise the
>         > tolerance thresholds.
>         >
>         >     Best regards,
>         >     -sz
>         >
>         >     On 8/31/17, 10:45 AM, "Madan Jampani" <ma...@gmail.com> wrote:
>         >
>         >         +1
>         >         Before we can turn protected mode I feel we should first get to a
>         > stable CI
>         >         pipeline.
>         >         Sandeep is chasing down known breaking issues.
>         >
>         >
>         >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay Lupesko <lu...@gmail.com>
>         > wrote:
>         >
>         >         > Build stability is a major issue, builds have been failing left
>         > and right
>         >         > over the last week. Some of it is due to Jenkins slave issues,
>         > but some are
>         >         > real regressions.
>         >         > We need to be more strict in the code we're committing.
>         >         >
>         >         > I propose we configure our master to be a protected branch (
>         >         > https://help.github.com/articles/about-protected-branches/).
>         >         >
>         >         > Thoughts?
>         >         >
>         >         > On 2017-08-28 22:41, sandeep krishnamurthy <s....@gmail.com>
>         > wrote:
>         >         > > Hello Committers and Contributors,>
>         >         > >
>         >         > > Due to unstable build pipelines, from past 1 week, PRs are
>         > being merged>
>         >         > > after CR ignoring PR build status. Build pipeline is much more
>         > stable
>         >         > than>
>         >         > > last week and most of the build failures you see from now on,
>         > are likely
>         >         > to>
>         >         > > be a valid failure and hence, it is recommended to wait for PR
>         > builds,
>         >         > see>
>         >         > > the root cause of any build failures before proceeding with
>         > merges.>
>         >         > >
>         >         > > At this point of time, there are 2 intermittent issue yet to
>         > be fixed ->
>         >         > > * Network error leading to GitHub requests throwing 404>
>         >         > > * A conflict in artifacts generated between branches/PR -
>         > Cause unknown
>         >         > yet.>
>         >         > > These issues will be fixed soon.>
>         >         > >
>         >         > >
>         >         > > -- >
>         >         > > Sandeep Krishnamurthy>
>         >         > >
>         >         >
>         >
>         >
>         >
>         >
>         >
>         >
>
>
>
>
>



-- 
- Tsuyoshi

Re: Apache MXNet build failures are mostly valid - verify before merge

Posted by "Kumar, Gautam" <ga...@amazon.com>.
Reviving the discussion. 

At this point of time we have couple of stable builds 
https://builds.apache.org/view/Incubator%20Projects/job/incubator-mxnet/job/master/448/
https://builds.apache.org/view/Incubator%20Projects/job/incubator-mxnet/job/master/449/

Should we have a quick discussion or polling on making the mx-net branch protected? If you still think we shouldn’t make it protected please provide a reason to support your claim. 

Few of us have concern over Jenkin’s stability. If I look two weeks back, after upgrading Linux slave to g2.8x and new windows AMI, we have not seen any case where instance died due to high memory usage or any process got killed due to high cpu usage or any other issue with windows slaves. 

Going forward we are also planning that if we add any new slave we will not enable the main load immediately, but rather will do ‘test build’ to make sure that new slaves are not causing any infrastructure issue and capable to perform as good as existing slaves. 

-Gautam 

On 8/31/17, 5:27 PM, "Lupesko, Hagay" <lu...@gmail.com> wrote:

    @madan looking into some failures – you’re right… there’s multiple issues going on, some of them intermittent, and we want to be able to merge fixes in.
    Agreed that we can wait with setting up protected mode until build stabilizes.
    
    On 8/31/17, 11:41, "Madan Jampani" <ma...@gmail.com> wrote:
    
        @hagay: we agree on the end state. I'm not too particular about how we get
        there. If you think enabling it now and fixes regression later is doable,
        I'm fine with. I see a bit of a chicken and egg problem. We need to get
        some fixes in even when the status checks are failing.
        
        On Thu, Aug 31, 2017 at 11:25 AM, Lupesko, Hagay <lu...@gmail.com> wrote:
        
        > @madan – re: getting to a stable CI first:
        > I’m concerned that by not enabling protected branch mode ASAP, we’re just
        > taking in more regressions, which makes a stable build a moving target for
        > us…
        >
        > On 8/31/17, 10:49, "Zha, Sheng" <zh...@amazon.com> wrote:
        >
        >     Just one thing: please don’t disable more tests or just raise the
        > tolerance thresholds.
        >
        >     Best regards,
        >     -sz
        >
        >     On 8/31/17, 10:45 AM, "Madan Jampani" <ma...@gmail.com> wrote:
        >
        >         +1
        >         Before we can turn protected mode I feel we should first get to a
        > stable CI
        >         pipeline.
        >         Sandeep is chasing down known breaking issues.
        >
        >
        >         On Thu, Aug 31, 2017 at 10:27 AM, Hagay Lupesko <lu...@gmail.com>
        > wrote:
        >
        >         > Build stability is a major issue, builds have been failing left
        > and right
        >         > over the last week. Some of it is due to Jenkins slave issues,
        > but some are
        >         > real regressions.
        >         > We need to be more strict in the code we're committing.
        >         >
        >         > I propose we configure our master to be a protected branch (
        >         > https://help.github.com/articles/about-protected-branches/).
        >         >
        >         > Thoughts?
        >         >
        >         > On 2017-08-28 22:41, sandeep krishnamurthy <s....@gmail.com>
        > wrote:
        >         > > Hello Committers and Contributors,>
        >         > >
        >         > > Due to unstable build pipelines, from past 1 week, PRs are
        > being merged>
        >         > > after CR ignoring PR build status. Build pipeline is much more
        > stable
        >         > than>
        >         > > last week and most of the build failures you see from now on,
        > are likely
        >         > to>
        >         > > be a valid failure and hence, it is recommended to wait for PR
        > builds,
        >         > see>
        >         > > the root cause of any build failures before proceeding with
        > merges.>
        >         > >
        >         > > At this point of time, there are 2 intermittent issue yet to
        > be fixed ->
        >         > > * Network error leading to GitHub requests throwing 404>
        >         > > * A conflict in artifacts generated between branches/PR -
        > Cause unknown
        >         > yet.>
        >         > > These issues will be fixed soon.>
        >         > >
        >         > >
        >         > > -- >
        >         > > Sandeep Krishnamurthy>
        >         > >
        >         >
        >
        >
        >
        >
        >
        >