Posted to dev@beam.apache.org by Mikhail Gryzykhin <mi...@google.com> on 2018/06/04 16:08:03 UTC

Re: Proposal: keeping post-commit tests green

Hello everyone,

I have addressed comments on the proposal doc and updated it accordingly. I
have also added a section on the metrics that we want to track for pre-commit
tests and the contents for the dashboard.

Please take a second look at the document.

Highlights:
* Sections that I feel require more discussion are marked with *[More
opinions wanted]*
** I've kept the original comments open for this iteration. Please close them
if you feel they are resolved, or elaborate more on the topic.*
* Added information on metrics to track
* Moved “Split test jobs into automatically and manually triggered” to
“Other ideas to consider”
* Prioritized automated JIRA ticket creation over manual
* Prioritized roll-back first policy
* Added process for enforcing proposed policies.

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Tue, May 22, 2018 at 10:11 AM Scott Wegner <sw...@google.com> wrote:

> Thanks for the thoughtful proposal Mikhail. I've left some comments in the
> doc.
>
> I encourage others to take a look: the proposal adds some strong policies
> about dealing with post-commit failures (rollback policy, locking master).
> Currently our post-commits are frequently red, and we're missing out on a
> valuable quality signal. I'm in favor of such policies to help get the test
> signals back to a healthy state.
>
> On Mon, May 21, 2018 at 2:48 PM Mikhail Gryzykhin <mi...@google.com>
> wrote:
>
>> Hi Everyone,
>>
>> I've updated design doc according to comments.
>>
>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>
>> In general, ideas proposed seem to be appreciated. Still, some of
>> sections require more discussion.
>>
>> Changes highlight:
>> * Added roll-back first policy to best practices. This includes process
>> on how to handle roll-back.
>> * Marked topics that I'd like to have more input on. [cyan color]
>>
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>>
>> On Fri, May 18, 2018 at 10:56 AM Andrew Pilloud <ap...@google.com>
>> wrote:
>>
>>> Blocking commits to master on test flaps seems critical here. The test
>>> flaps won't get the attention they deserve as long as people are just
>>> spamming their PRs with 'Run Java Precommit' until they turn green. I'm
>>> guilty of this behavior and I know it masks new flaky tests.
>>>
>>> I added a comment to your doc about detecting flaky tests. This can
>>> easily be done by rerunning the postcommits during times when Jenkins would
>>> otherwise be idle. You'll easily get a few dozen runs every weekend, you
>>> just need a process to triage all the flakes and ensure there are bugs. I
>>> worked on a project that did this along with blocking master on any post
>>> commit failure. It was painful for the first few weeks, but things got
>>> significantly better once most of the bugs were fixed.
>>>
>>> Andrew
>>>
>>> On Fri, May 18, 2018 at 10:39 AM Kenneth Knowles <kl...@google.com> wrote:
>>>
>>>> Love it. I would pull out from the doc also the key point: make the
>>>> postcommit status constantly visible to everyone.
>>>>
>>>> Kenn
>>>>
>>>> On Fri, May 18, 2018 at 10:17 AM Mikhail Gryzykhin <mi...@google.com>
>>>> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'm Mikhail and started working on Google Dataflow several months ago.
>>>>> I'm really excited to work with Beam opensource community.
>>>>>
>>>>> I have a proposal to improve contributor experience by keeping
>>>>> post-commit tests green.
>>>>>
>>>>> I'm looking to get community consensus and approval about the process
>>>>> for keeping post-commit tests green and addressing post-commit test
>>>>> failures.
>>>>>
>>>>> Find full list of ideas brought in for discussion in this document:
>>>>>
>>>>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>>>>
>>>>> Key points are:
>>>>> 1. Add explicit tracking of failures via JIRA
>>>>> 2. No-Commit policy when post-commit tests are red
>>>>>
>>>>> --Mikhail
>>>>>
>>>>>

Re: Proposal: keeping post-commit tests green

Posted by Mikhail Gryzykhin <mi...@google.com>.
Hi everyone,

I've summarized things discussed in this design doc into a Beam site page:
https://beam.apache.org/contribute/postcommits-policies/

Regards,
--Mikhail

Have feedback <http://go/migryz-feedback>?


On Thu, Jun 14, 2018 at 9:13 AM Mikhail Gryzykhin <mi...@google.com> wrote:

> It is a one-time action. Afterwards, we treat flakes as failing tests that
> require investigation and fixes.
>
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
>
> On Wed, Jun 13, 2018 at 5:06 PM Ahmet Altay <al...@google.com> wrote:
>
>>
>>
>> On Wed, Jun 13, 2018 at 3:52 PM, Mikhail Gryzykhin <mi...@google.com>
>> wrote:
>>
>>> Hi Ahmet,
>>>
>>> I've checked on tests status and most of other tests are green 98% of
>>> the time. So I feel that we do not need any explicit actions for those
>>> tests.
>>>
>>
>> Is it going to be a one-time action to fix existing flaky tests? Or is it
>> about a process to detect flaky tests in general? If it is the former, Java
>> only makes sense to me.
>>
>>
>>>
>>> However java tests seem to have most of the problems. So I moved it to
>>> requirements explicitly.
>>>
>>> I do not bring in fixing failing tests as those should not require any
>>> specific process.
>>>
>>> --Mikhail
>>>
>>> Have feedback <http://go/migryz-feedback>?
>>>
>>>
>>>
>>

Re: Proposal: keeping post-commit tests green

Posted by Mikhail Gryzykhin <mi...@google.com>.
It is a one-time action. Afterwards, we treat flakes as failing tests that
require investigation and fixes.

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Wed, Jun 13, 2018 at 5:06 PM Ahmet Altay <al...@google.com> wrote:

>
>
> On Wed, Jun 13, 2018 at 3:52 PM, Mikhail Gryzykhin <mi...@google.com>
> wrote:
>
>> Hi Ahmet,
>>
>> I've checked on tests status and most of other tests are green 98% of the
>> time. So I feel that we do not need any explicit actions for those tests.
>>
>
> Is it going to be a one-time action to fix existing flaky tests? Or is it
> about a process to detect flaky tests in general? If it is the former, Java
> only makes sense to me.
>
>
>>
>> However java tests seem to have most of the problems. So I moved it to
>> requirements explicitly.
>>
>> I do not bring in fixing failing tests as those should not require any
>> specific process.
>>
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>>
>>
>

Re: Proposal: keeping post-commit tests green

Posted by Ahmet Altay <al...@google.com>.
On Wed, Jun 13, 2018 at 3:52 PM, Mikhail Gryzykhin <mi...@google.com>
wrote:

> Hi Ahmet,
>
> I've checked on tests status and most of other tests are green 98% of the
> time. So I feel that we do not need any explicit actions for those tests.
>

Is it going to be a one-time action to fix existing flaky tests? Or is it
about a process to detect flaky tests in general? If it is the former, Java
only makes sense to me.


>
> However java tests seem to have most of the problems. So I moved it to
> requirements explicitly.
>
> I do not bring in fixing failing tests as those should not require any
> specific process.
>
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
>
>

Re: Proposal: keeping post-commit tests green

Posted by Mikhail Gryzykhin <mi...@google.com>.
Hi Ahmet,

I've checked the test status, and most of the other tests are green 98% of the
time, so I feel that we do not need any explicit actions for those tests.

However, Java tests seem to have most of the problems, so I moved them to the
requirements explicitly.

I am not including fixes for failing tests, as those should not require any
specific process.

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Wed, Jun 13, 2018 at 3:49 PM Ahmet Altay <al...@google.com> wrote:

>
>
> On Wed, Jun 13, 2018 at 3:45 PM, Mikhail Gryzykhin <mi...@google.com>
> wrote:
>
>> Hello everybody,
>>
>> Thanks everyone. I didn't receive any more feedback on the design
>> proposal document [1] and I believe we've reached consensus. I've added
>> implementation tasks in JIRA (BEAM-4559 [2])  and will start coding soon.
>> As a recap, the high-level plan is:
>>
>>
>>    - Split existing post-commit tests jobs to automatically and manually
>>    triggered
>>    - Add tracking by JIRA bugs for failing test job
>>    - Create document describing post-commit failures handling policies
>>    - Add tests status badge to PR template
>>    - Create dashboard for post-commit tests
>>    - Detect and fix flaky java tests (if any)
>>
>>
> Why is this limited to flaky java tests?
>
>
>>
>> [1]
>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>> [2] https://issues.apache.org/jira/browse/BEAM-4559
>>
>> --Mikhail
>>
>>
>>
>

Re: Proposal: keeping post-commit tests green

Posted by Ahmet Altay <al...@google.com>.
On Wed, Jun 13, 2018 at 3:45 PM, Mikhail Gryzykhin <mi...@google.com>
wrote:

> Hello everybody,
>
> Thanks everyone. I didn't receive any more feedback on the design proposal
> document [1] and I believe we've reached consensus. I've added
> implementation tasks in JIRA (BEAM-4559 [2])  and will start coding soon.
> As a recap, the high-level plan is:
>
>
>    - Split existing post-commit tests jobs to automatically and manually
>    triggered
>    - Add tracking by JIRA bugs for failing test job
>    - Create document describing post-commit failures handling policies
>    - Add tests status badge to PR template
>    - Create dashboard for post-commit tests
>    - Detect and fix flaky java tests (if any)
>
>
Why is this limited to flaky java tests?


>
> [1] https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
> [2] https://issues.apache.org/jira/browse/BEAM-4559
>
> --Mikhail
>
>
>

Re: Proposal: keeping post-commit tests green

Posted by Mikhail Gryzykhin <mi...@google.com>.
Hello everybody,

Thanks everyone. I didn't receive any more feedback on the design proposal
document [1] and I believe we've reached consensus. I've added
implementation tasks in JIRA (BEAM-4559 [2]) and will start coding soon.
As a recap, the high-level plan is:


   - Split existing post-commit test jobs into automatically and manually
   triggered ones
   - Add tracking of failing test jobs via JIRA bugs
   - Create a document describing post-commit failure handling policies
   - Add a test status badge to the PR template
   - Create a dashboard for post-commit tests
   - Detect and fix flaky Java tests (if any)


[1]
https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
[2] https://issues.apache.org/jira/browse/BEAM-4559

--Mikhail


On Wed, Jun 6, 2018 at 1:12 PM Mikhail Gryzykhin <mi...@google.com> wrote:

> Hello everyone,
>
> Most of the comments on my last draft addressed technical details of
> automation implementation of specific processes proposed. No major process
> changes were suggested.
>
> If you have not yet, please review this document.
>
> Highlights from last change:
> * Bumped splitting test jobs after Kenneth's comment.
> * No-commit in case of too many open JIRA tickets (metric was there,
> action was missing)
> * No-commit in case of too old JIRA ticket (metric was there, action was
> missing)
> * Closed comments that are addressed in document.
>
> This document already has two LGTMs from Scott Wegner and Thomas Weise.
> If no major comments will come, I'll treat this document as complete and
> start working on implementing work items defined in this document.
>
> Thank you,
> --Mikhail
>
>
> On Tue, Jun 5, 2018 at 7:38 PM Thomas Weise <th...@apache.org> wrote:
>
>> Thanks for taking this initiative. As the number of contributors grows,
>> so does the cost of broken builds. I'm also in favor of locking master
>> merges until related issues are fixed (short term pain for long term
>> gain). It would penalize a few for the benefit of many.
>>
>> On that note, recently we also had a fair share of pre-commit build
>> issues, with a few making their way to master. These include instances
>> unrelated to build tooling, such as compile error or packaging. I don't
>> think we should run PR merges over the red light and suggest it is
>> necessary to step up the gatekeeper responsibility committers have.
>>
>> Thanks,
>> Thomas
>>
>>
>> On Tue, Jun 5, 2018 at 10:56 AM, Scott Wegner <sw...@google.com> wrote:
>>
>>> I've taken another pass over the doc, and it looks good to me. Thanks
>>> for driving this effort!
>>>
>>> On Mon, Jun 4, 2018 at 9:08 AM Mikhail Gryzykhin <mi...@google.com>
>>> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I have addressed comments on the proposal doc and updated it
>>>> accordingly. I have also added section on metrics that we want to track for
>>>> pre-commit tests and contents for dashboard.
>>>>
>>>> Please, take a second look at the document.
>>>>
>>>> Highlights:
>>>> * Sections that I feel require more discussion are marked with *[More
>>>> opinions wanted]*
>>>> ** I've kept original comments open for this iteration. Please, close
>>>> them if you feel those resolved, or elaborate more on the topic.*
>>>> * Added information on metrics to track
>>>> * Moved “Split test jobs into automatically and manually triggered” to
>>>> “Other ideas to consider”
>>>> * Prioritized automated JIRA ticket creation over manual
>>>> * Prioritized roll-back first policy
>>>> * Added process for enforcing proposed policies.
>>>>
>>>> --Mikhail
>>>>
>>>> Have feedback <http://go/migryz-feedback>?
>>>>
>>>>
>>>> On Tue, May 22, 2018 at 10:11 AM Scott Wegner <sw...@google.com>
>>>> wrote:
>>>>
>>>>> Thanks for the thoughtful proposal Mikhail. I've left some comments in
>>>>> the doc.
>>>>>
>>>>> I encourage others to take a look: the proposal adds some strong
>>>>> policies about dealing with post-commit failures (rollback policy, locking
>>>>> master). Currently our post-commits are frequently red, and we're missing
>>>>> out on a valuable quality signal. I'm in favor of such policies to help get
>>>>> the test signals back to a healthy state.
>>>>>
>>>>> On Mon, May 21, 2018 at 2:48 PM Mikhail Gryzykhin <mi...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> I've updated design doc according to comments.
>>>>>>
>>>>>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>>>>>
>>>>>> In general, ideas proposed seem to be appreciated. Still, some of
>>>>>> sections require more discussion.
>>>>>>
>>>>>> Changes highlight:
>>>>>> * Added roll-back first policy to best practices. This includes
>>>>>> process on how to handle roll-back.
>>>>>> * Marked topics that I'd like to have more input on. [cyan color]
>>>>>>
>>>>>> --Mikhail
>>>>>>
>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>
>>>>>>
>>>>>> On Fri, May 18, 2018 at 10:56 AM Andrew Pilloud <ap...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Blocking commits to master on test flaps seems critical here. The
>>>>>>> test flaps won't get the attention they deserve as long as people are just
>>>>>>> spamming their PRs with 'Run Java Precommit' until they turn green. I'm
>>>>>>> guilty of this behavior and I know it masks new flaky tests.
>>>>>>>
>>>>>>> I added a comment to your doc about detecting flaky tests. This can
>>>>>>> easily be done by rerunning the postcommits during times when Jenkins would
>>>>>>> otherwise be idle. You'll easily get a few dozen runs every weekend, you
>>>>>>> just need a process to triage all the flakes and ensure there are bugs. I
>>>>>>> worked on a project that did this along with blocking master on any post
>>>>>>> commit failure. It was painful for the first few weeks, but things got
>>>>>>> significantly better once most of the bugs were fixed.
>>>>>>>
>>>>>>> Andrew
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 10:39 AM Kenneth Knowles <kl...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Love it. I would pull out from the doc also the key point: make the
>>>>>>>> postcommit status constantly visible to everyone.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 10:17 AM Mikhail Gryzykhin <
>>>>>>>> migryz@google.com> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> I'm Mikhail and started working on Google Dataflow several months
>>>>>>>>> ago. I'm really excited to work with Beam opensource community.
>>>>>>>>>
>>>>>>>>> I have a proposal to improve contributor experience by keeping
>>>>>>>>> post-commit tests green.
>>>>>>>>>
>>>>>>>>> I'm looking to get community consensus and approval about the
>>>>>>>>> process for keeping post-commit tests green and addressing post-commit test
>>>>>>>>> failures.
>>>>>>>>>
>>>>>>>>> Find full list of ideas brought in for discussion in this document:
>>>>>>>>>
>>>>>>>>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>>>>>>>>
>>>>>>>>> Key points are:
>>>>>>>>> 1. Add explicit tracking of failures via JIRA
>>>>>>>>> 2. No-Commit policy when post-commit tests are red
>>>>>>>>>
>>>>>>>>> --Mikhail
>>>>>>>>>
>>>>>>>>>
>>

Re: Proposal: keeping post-commit tests green

Posted by Mikhail Gryzykhin <mi...@google.com>.
Hello everyone,

Most of the comments on my last draft addressed technical details of how to
automate specific proposed processes. No major process changes were
suggested.

If you have not done so yet, please review this document.

Highlights from last change:
* Bumped splitting test jobs after Kenneth's comment.
* No-commit in case of too many open JIRA tickets (metric was there, action
was missing)
* No-commit in case of too old JIRA ticket (metric was there, action was
missing)
* Closed comments that are addressed in document.

This document already has two LGTMs from Scott Wegner and Thomas Weise.
If no major comments come in, I'll treat this document as complete and
start working on implementing the work items defined in it.

Thank you,
--Mikhail


On Tue, Jun 5, 2018 at 7:38 PM Thomas Weise <th...@apache.org> wrote:

> Thanks for taking this initiative. As the number of contributors grows, so
> does the cost of broken builds. I'm also in favor of locking master merges
> until related issues are fixed (short term pain for long term gain). It
> would penalize a few for the benefit of many.
>
> On that note, recently we also had a fair share of pre-commit build
> issues, with a few making their way to master. These include instances
> unrelated to build tooling, such as compile error or packaging. I don't
> think we should run PR merges over the red light and suggest it is
> necessary to step up the gatekeeper responsibility committers have.
>
> Thanks,
> Thomas
>
>
> On Tue, Jun 5, 2018 at 10:56 AM, Scott Wegner <sw...@google.com> wrote:
>
>> I've taken another pass over the doc, and it looks good to me. Thanks for
>> driving this effort!
>>
>> On Mon, Jun 4, 2018 at 9:08 AM Mikhail Gryzykhin <mi...@google.com>
>> wrote:
>>
>>> Hello everyone,
>>>
>>> I have addressed comments on the proposal doc and updated it
>>> accordingly. I have also added section on metrics that we want to track for
>>> pre-commit tests and contents for dashboard.
>>>
>>> Please, take a second look at the document.
>>>
>>> Highlights:
>>> * Sections that I feel require more discussion are marked with *[More
>>> opinions wanted]*
>>> ** I've kept original comments open for this iteration. Please, close
>>> them if you feel those resolved, or elaborate more on the topic.*
>>> * Added information on metrics to track
>>> * Moved “Split test jobs into automatically and manually triggered” to
>>> “Other ideas to consider”
>>> * Prioritized automated JIRA ticket creation over manual
>>> * Prioritized roll-back first policy
>>> * Added process for enforcing proposed policies.
>>>
>>> --Mikhail
>>>
>>> Have feedback <http://go/migryz-feedback>?
>>>
>>>
>>> On Tue, May 22, 2018 at 10:11 AM Scott Wegner <sw...@google.com>
>>> wrote:
>>>
>>>> Thanks for the thoughtful proposal Mikhail. I've left some comments in
>>>> the doc.
>>>>
>>>> I encourage others to take a look: the proposal adds some strong
>>>> policies about dealing with post-commit failures (rollback policy, locking
>>>> master). Currently our post-commits are frequently red, and we're missing
>>>> out on a valuable quality signal. I'm in favor of such policies to help get
>>>> the test signals back to a healthy state.
>>>>
>>>> On Mon, May 21, 2018 at 2:48 PM Mikhail Gryzykhin <mi...@google.com>
>>>> wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> I've updated design doc according to comments.
>>>>>
>>>>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>>>>
>>>>> In general, ideas proposed seem to be appreciated. Still, some of
>>>>> sections require more discussion.
>>>>>
>>>>> Changes highlight:
>>>>> * Added roll-back first policy to best practices. This includes
>>>>> process on how to handle roll-back.
>>>>> * Marked topics that I'd like to have more input on. [cyan color]
>>>>>
>>>>> --Mikhail
>>>>>
>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>
>>>>>
>>>>> On Fri, May 18, 2018 at 10:56 AM Andrew Pilloud <ap...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Blocking commits to master on test flaps seems critical here. The
>>>>>> test flaps won't get the attention they deserve as long as people are just
>>>>>> spamming their PRs with 'Run Java Precommit' until they turn green. I'm
>>>>>> guilty of this behavior and I know it masks new flaky tests.
>>>>>>
>>>>>> I added a comment to your doc about detecting flaky tests. This can
>>>>>> easily be done by rerunning the postcommits during times when Jenkins would
>>>>>> otherwise be idle. You'll easily get a few dozen runs every weekend, you
>>>>>> just need a process to triage all the flakes and ensure there are bugs. I
>>>>>> worked on a project that did this along with blocking master on any post
>>>>>> commit failure. It was painful for the first few weeks, but things got
>>>>>> significantly better once most of the bugs were fixed.
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Fri, May 18, 2018 at 10:39 AM Kenneth Knowles <kl...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Love it. I would pull out from the doc also the key point: make the
>>>>>>> postcommit status constantly visible to everyone.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 10:17 AM Mikhail Gryzykhin <
>>>>>>> migryz@google.com> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> I'm Mikhail and started working on Google Dataflow several months
>>>>>>>> ago. I'm really excited to work with Beam opensource community.
>>>>>>>>
>>>>>>>> I have a proposal to improve contributor experience by keeping
>>>>>>>> post-commit tests green.
>>>>>>>>
>>>>>>>> I'm looking to get community consensus and approval about the
>>>>>>>> process for keeping post-commit tests green and addressing post-commit test
>>>>>>>> failures.
>>>>>>>>
>>>>>>>> Find full list of ideas brought in for discussion in this document:
>>>>>>>>
>>>>>>>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>>>>>>>
>>>>>>>> Key points are:
>>>>>>>> 1. Add explicit tracking of failures via JIRA
>>>>>>>> 2. No-Commit policy when post-commit tests are red
>>>>>>>>
>>>>>>>> --Mikhail
>>>>>>>>
>>>>>>>>
>

Re: Proposal: keeping post-commit tests green

Posted by Thomas Weise <th...@apache.org>.
Thanks for taking this initiative. As the number of contributors grows, so
does the cost of broken builds. I'm also in favor of locking master merges
until related issues are fixed (short term pain for long term gain). It
would penalize a few for the benefit of many.

On that note, recently we also had a fair share of pre-commit build issues,
with a few making their way to master. These include instances unrelated to
build tooling, such as compile error or packaging. I don't think we should
run PR merges over the red light and suggest it is necessary to step up the
gatekeeper responsibility committers have.

Thanks,
Thomas


On Tue, Jun 5, 2018 at 10:56 AM, Scott Wegner <sw...@google.com> wrote:

> I've taken another pass over the doc, and it looks good to me. Thanks for
> driving this effort!
>
> On Mon, Jun 4, 2018 at 9:08 AM Mikhail Gryzykhin <mi...@google.com>
> wrote:
>
>> Hello everyone,
>>
>> I have addressed comments on the proposal doc and updated it accordingly.
>> I have also added section on metrics that we want to track for pre-commit
>> tests and contents for dashboard.
>>
>> Please, take a second look at the document.
>>
>> Highlights:
>> * Sections that I feel require more discussion are marked with *[More
>> opinions wanted]*
>> ** I've kept original comments open for this iteration. Please, close
>> them if you feel those resolved, or elaborate more on the topic.*
>> * Added information on metrics to track
>> * Moved “Split test jobs into automatically and manually triggered” to
>> “Other ideas to consider”
>> * Prioritized automated JIRA ticket creation over manual
>> * Prioritized roll-back first policy
>> * Added process for enforcing proposed policies.
>>
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>>
>> On Tue, May 22, 2018 at 10:11 AM Scott Wegner <sw...@google.com> wrote:
>>
>>> Thanks for the thoughtful proposal Mikhail. I've left some comments in
>>> the doc.
>>>
>>> I encourage others to take a look: the proposal adds some strong
>>> policies about dealing with post-commit failures (rollback policy, locking
>>> master). Currently our post-commits are frequently red, and we're missing
>>> out on a valuable quality signal. I'm in favor of such policies to help get
>>> the test signals back to a healthy state.
>>>
>>> On Mon, May 21, 2018 at 2:48 PM Mikhail Gryzykhin <mi...@google.com>
>>> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> I've updated design doc according to comments.
>>>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>>>
>>>> In general, ideas proposed seem to be appreciated. Still, some of
>>>> sections require more discussion.
>>>>
>>>> Changes highlight:
>>>> * Added roll-back first policy to best practices. This includes process
>>>> on how to handle roll-back.
>>>> * Marked topics that I'd like to have more input on. [cyan color]
>>>>
>>>> --Mikhail
>>>>
>>>> Have feedback <http://go/migryz-feedback>?
>>>>
>>>>
>>>> On Fri, May 18, 2018 at 10:56 AM Andrew Pilloud <ap...@google.com>
>>>> wrote:
>>>>
>>>>> Blocking commits to master on test flaps seems critical here. The test
>>>>> flaps won't get the attention they deserve as long as people are just
>>>>> spamming their PRs with 'Run Java Precommit' until they turn green. I'm
>>>>> guilty of this behavior and I know it masks new flaky tests.
>>>>>
>>>>> I added a comment to your doc about detecting flaky tests. This can
>>>>> easily be done by rerunning the postcommits during times when Jenkins would
>>>>> otherwise be idle. You'll easily get a few dozen runs every weekend, you
>>>>> just need a process to triage all the flakes and ensure there are bugs. I
>>>>> worked on a project that did this along with blocking master on any post
>>>>> commit failure. It was painful for the first few weeks, but things got
>>>>> significantly better once most of the bugs were fixed.
>>>>>
>>>>> Andrew
>>>>>
>>>>> On Fri, May 18, 2018 at 10:39 AM Kenneth Knowles <kl...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Love it. I would pull out from the doc also the key point: make the
>>>>>> postcommit status constantly visible to everyone.
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Fri, May 18, 2018 at 10:17 AM Mikhail Gryzykhin <mi...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I'm Mikhail and started working on Google Dataflow several months
>>>>>>> ago. I'm really excited to work with Beam opensource community.
>>>>>>>
>>>>>>> I have a proposal to improve contributor experience by keeping
>>>>>>> post-commit tests green.
>>>>>>>
>>>>>>> I'm looking to get community consensus and approval about the
>>>>>>> process for keeping post-commit tests green and addressing post-commit test
>>>>>>> failures.
>>>>>>>
>>>>>>> Find full list of ideas brought in for discussion in this document:
>>>>>>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>>>>>>
>>>>>>> Key points are:
>>>>>>> 1. Add explicit tracking of failures via JIRA
>>>>>>> 2. No-Commit policy when post-commit tests are red
>>>>>>>
>>>>>>> --Mikhail
>>>>>>>
>>>>>>>

Re: Proposal: keeping post-commit tests green

Posted by Scott Wegner <sw...@google.com>.
I've taken another pass over the doc, and it looks good to me. Thanks for
driving this effort!

On Mon, Jun 4, 2018 at 9:08 AM Mikhail Gryzykhin <mi...@google.com> wrote:

> Hello everyone,
>
> I have addressed comments on the proposal doc and updated it accordingly.
> I have also added section on metrics that we want to track for pre-commit
> tests and contents for dashboard.
>
> Please, take a second look at the document.
>
> Highlights:
> * Sections that I feel require more discussion are marked with *[More
> opinions wanted]*
> ** I've kept original comments open for this iteration. Please, close them
> if you feel those resolved, or elaborate more on the topic.*
> * Added information on metrics to track
> * Moved “Split test jobs into automatically and manually triggered” to
> “Other ideas to consider”
> * Prioritized automated JIRA ticket creation over manual
> * Prioritized roll-back first policy
> * Added process for enforcing proposed policies.
>
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
>
> On Tue, May 22, 2018 at 10:11 AM Scott Wegner <sw...@google.com> wrote:
>
>> Thanks for the thoughtful proposal Mikhail. I've left some comments in
>> the doc.
>>
>> I encourage others to take a look: the proposal adds some strong policies
>> about dealing with post-commit failures (rollback policy, locking master).
>> Currently our post-commits are frequently red, and we're missing out on a
>> valuable quality signal. I'm in favor of such policies to help get the test
>> signals back to a healthy state.
>>
>> On Mon, May 21, 2018 at 2:48 PM Mikhail Gryzykhin <mi...@google.com>
>> wrote:
>>
>>> Hi Everyone,
>>>
>>> I've updated design doc according to comments.
>>>
>>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>>
>>> In general, ideas proposed seem to be appreciated. Still, some of
>>> sections require more discussion.
>>>
>>> Changes highlight:
>>> * Added roll-back first policy to best practices. This includes process
>>> on how to handle roll-back.
>>> * Marked topics that I'd like to have more input on. [cyan color]
>>>
>>> --Mikhail
>>>
>>> Have feedback <http://go/migryz-feedback>?
>>>
>>>
>>> On Fri, May 18, 2018 at 10:56 AM Andrew Pilloud <ap...@google.com>
>>> wrote:
>>>
>>>> Blocking commits to master on test flaps seems critical here. The test
>>>> flaps won't get the attention they deserve as long as people are just
>>>> spamming their PRs with 'Run Java Precommit' until they turn green. I'm
>>>> guilty of this behavior and I know it masks new flaky tests.
>>>>
>>>> I added a comment to your doc about detecting flaky tests. This can
>>>> easily be done by rerunning the postcommits during times when Jenkins would
>>>> otherwise be idle. You'll easily get a few dozen runs every weekend, you
>>>> just need a process to triage all the flakes and ensure there are bugs. I
>>>> worked on a project that did this along with blocking master on any post
>>>> commit failure. It was painful for the first few weeks, but things got
>>>> significantly better once most of the bugs were fixed.
>>>>
>>>> Andrew
>>>>
>>>> On Fri, May 18, 2018 at 10:39 AM Kenneth Knowles <kl...@google.com>
>>>> wrote:
>>>>
>>>>> Love it. I would pull out from the doc also the key point: make the
>>>>> postcommit status constantly visible to everyone.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Fri, May 18, 2018 at 10:17 AM Mikhail Gryzykhin <mi...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I'm Mikhail and started working on Google Dataflow several months
>>>>>> ago. I'm really excited to work with Beam opensource community.
>>>>>>
>>>>>> I have a proposal to improve contributor experience by keeping
>>>>>> post-commit tests green.
>>>>>>
>>>>>> I'm looking to get community consensus and approval about the process
>>>>>> for keeping post-commit tests green and addressing post-commit test
>>>>>> failures.
>>>>>>
>>>>>> Find full list of ideas brought in for discussion in this document:
>>>>>>
>>>>>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>>>>>
>>>>>> Key points are:
>>>>>> 1. Add explicit tracking of failures via JIRA
>>>>>> 2. No-Commit policy when post-commit tests are red
>>>>>>
>>>>>> --Mikhail
>>>>>>
>>>>>>