Posted to dev@beam.apache.org by Mikhail Gryzykhin <mi...@google.com> on 2018/08/10 20:24:33 UTC

Test failures list

Hi everyone,

I'm following up on the effort to keep post-commit tests green (see the
Beam post-commit policies
<https://beam.apache.org/contribute/postcommits-policies/>).

This week I assembled a list of the most problematic flaky or failing tests
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>.
Unfortunately, I'm relatively new to the project and lack triaging guides,
so most of the tickets contain only basic information.

*I would like to ask the community for help in the following areas:*
1. If you know how to triage tests or where a triage guide lives, please
share the knowledge. You can post links here, or add pages to the Confluence
wiki <https://cwiki.apache.org/confluence/display/BEAM/> and share the link
here.
2. Please check the Jira test-failures list
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>
and pick up any tests that you might know how to fix. Tickets that have no
owner are not being worked on. I'm trying out easy mitigations for some of
the failures (i.e., increasing timeouts), but those should not be treated
as fixes.
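
For reference, the test-failures link above is a URL-encoded JQL query. A
minimal Python sketch, using only the standard library, that rebuilds the
same URL from the decoded query and recovers the query from a URL. The names
JQL, BASE, build_filter_url, and extract_jql are illustrative and are not
part of any Beam or Jira tooling:

```python
from urllib.parse import parse_qs, quote, urlsplit

# JQL decoded from the test-failures link in this message.
JQL = (
    'project = BEAM AND status in (Open, "In Progress", Reopened) '
    "AND resolution = Unresolved AND component = test-failures "
    "ORDER BY priority DESC, updated DESC"
)

# Issue-navigator base URL, taken from the same link.
BASE = "https://issues.apache.org/jira/issues/"


def build_filter_url(jql: str) -> str:
    """Return the issue-navigator URL for a JQL string."""
    # safe="()" leaves parentheses unescaped, matching the link above.
    return f"{BASE}?jql={quote(jql, safe='()')}"


def extract_jql(url: str) -> str:
    """Recover the JQL string from an issue-navigator URL."""
    return parse_qs(urlsplit(url).query)["jql"][0]


if __name__ == "__main__":
    print(build_filter_url(JQL))
```

Keeping the query as readable text makes it easier to adjust, for example to
narrow it by priority, before sharing a link on the list.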

*Current status:*
Each item marked critical in the failures list tends to fail jobs in
~5-10% of runs.
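
To give a sense of how per-test flake rates compound, here is a rough
sketch. It assumes flaky tests fail independently of one another, which this
thread does not establish, and the count of five flakes used below is
hypothetical:

```python
def job_pass_rate(flake_rates):
    """Probability that a job passes when each flaky test fails independently.

    flake_rates: iterable of per-test failure probabilities in [0, 1].
    """
    rate = 1.0
    for p in flake_rates:
        rate *= 1.0 - p  # the job passes only if this test also passes
    return rate


if __name__ == "__main__":
    # Hypothetical: five flaky tests, at the low and high end of ~5-10%.
    print(job_pass_rate([0.05] * 5))
    print(job_pass_rate([0.10] * 5))
```

Under that assumption, five flakes at 5% each would leave about 77% of runs
green, and five at 10% each about 59%.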

I contacted Anton Kedin directly, and he is currently working on fixes for
a couple of the most problematic flakes. Anton, thank you for picking those
up.

Please update the owner and status of a ticket when you start working on a
test failure; this will save time for others who might otherwise start
looking into the same failure.

Thank you,
--Mikhail

Have feedback <http://go/migryz-feedback>?

Re: Test failures list

Posted by Mikhail Gryzykhin <mi...@google.com>.
Lukasz, Maximilian,

Thank you for the feedback.

Quick summary:
For now, I'll send updates on tests once or twice a week. Each update will
include a list of failures and assignees. We can tweak the content as we go.

Meanwhile, I'll work on a proper dashboard.

Regards,
--Mikhail

Have feedback <http://go/migryz-feedback>?


On Thu, Aug 16, 2018 at 1:59 AM Maximilian Michels <mx...@apache.org> wrote:

> Thank you Mikhail for looking into test failures and compiling the list!
>
> > I cannot access this link. Is it publicly accessible?
>
> Works for me but it takes a while to show results.
>
> > One general question: maybe it's a good idea to assign change
> > authors/code owners to the issues? Or just reach them in jira
> > comments?
>
> While the authors should have a sense of ownership over the code, I
> think it is enough for them to answer questions to the Assignee. They
> shouldn't have to be owning the JIRA issue. This also increases
> knowledge sharing.
>
> > I believe such update sent daily or bi-daily can increase visibility
> > for known failures, simplify search for people who can fix tests,
> > and add nice tracking status.
>
> Flaky tests should be fixed ASAP because they hinder development. +1 for
> daily/bidaily notifications.
>
> Cheers,
> Max
>
> On 16.08.18 10:46, Łukasz Gajowy wrote:
> > Thank you for working on improving the situation with test failures!
> >
> > One general question: maybe it's a good idea to assign change
> > authors/code owners to the issues? Or just reach them in jira comments?
> > They know the code and they may be more likely to know solutions to
> > failing tests or provide useful information (when swamped in other
> > things). WDYT?
> >
> > On Tue, 14 Aug 2018 at 20:05, Mikhail Gryzykhin <migryz@google.com
> > <ma...@google.com>> wrote:
> >
> >     Hi everyone,
> >
> >     We have increased amount of test jobs failures recently.
> >
> >     In terms of numbers (based on my memory and http://35.226.225.164/):
> >     Java precommits went down from ~55% to ~30% of succeeded jobs.
> >     Java postcommits went down from ~60 to ~40 of succeeded jobs.
> >
> >
> > I cannot access this link. Is it publicly accessible?
> >
> >
> >     I'm currently triaging post-commit failures and wonder if it will be
> >     useful to send regular updates on found issues and implemented fixes?
> >
> >     What can be present in update:
> >     * Tests greenness based on http://35.226.225.164/ (work on better
> >     dashboard is in progress)
> >     * List of Jira tickets with triaged failures with no owners
> >     * List of Jira tickets in progress and who's working on fixes
> >     * List of Jira tickets with fixes shipped
> >
> >
> >     Each point can also have short description of failure reason.
> >
> >
> > I think such report should be very brief and informative. IMO the report
> > should contain the failures (as short summaries and a link to a JIRA
> > ticket). Whoever's working on an issue should assign him/herself to the
> > ticket and mark it as "IN PROGRESS" so there's no collisions between
> > contributors fixing the tests. I don't see the need for listing the in
> > progress issues (jira already shows that). List of fixed issues may show
> > the progress, but I'd rather see a blank report with an empty failing
> > tests list. :)
> >
> > In fact, I think the list, you showed in the previous message
> > <https://issues.apache.org/jira/browse/BEAM-5122?jql=project%20%3D%20BEAM%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%29%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>
> > will suffice.
> >
> >
> >
> >     I believe such update sent daily or bi-daily can increase visibility
> >     for known failures, simplify search for people who can fix tests,
> >     and add nice tracking status.
> >
> >
> > Aren't weekly reports enough? It may be hard to change a lot in a day
> > (two days).
> >
> >
> >
> >     What do you think?
> >
> >     Regards,
> >     --Mikhail
> >
> >     Have feedback <http://go/migryz-feedback>?
> >
> >
> >     On Fri, Aug 10, 2018 at 1:24 PM Mikhail Gryzykhin <migryz@google.com
> >     <ma...@google.com>> wrote:
> >
> >         Hi everyone,
> >
> >         I'm following up on tackling post-commit tests greenness. (See
> >         beam post-commit policies
> >         <https://beam.apache.org/contribute/postcommits-policies/>)
> >
> >         During this week, I've assembled a list of most problematic
> >         flaky or failing tests
> >         <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%29%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>.
> >         Unfortunately, I'm relatively new to the project and lack
> >         triaging guides, so most of tickets contain only basic
> >         information.
> >
> >         _I want to ask community help in following areas:_
> >         1. If you know how to triage tests or the location of triage
> >         guide, please share the knowledge. You can post links here, or
> >         add pages to Confluence wiki
> >         <https://cwiki.apache.org/confluence/display/BEAM/> and share
> >         link here.
> >         2. Please, check on the Jira test-failures
> >         <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%29%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC> list
> >         and pick up tests that you might know how to fix and help with
> >         fixing those. Tickets that do not have owner now are not being
> >         worked on. I'm trying out easy mitigations for some of the
> >         failures (ie increasing timeouts), but those should not be
> >         treated as fixes.
> >
> >         _Current status:_
> >         Items that are marked critical in the failures list tend to fail
> >         jobs in ~5-10% runs each.
> >
> >         I contacted Anton Kedin directly and he works on fixes for
> >         couple of most problematic flakes currently. Anton, thank you
> >         for picking those up.
> >
> >         Please, update owner and status of ticket if you start working
> >         on some test failure, this will save time for others who might
> >         also start looking into the failure.
> >
> >         Thank you,
> >         --Mikhail
> >
> >         Have feedback <http://go/migryz-feedback>?
> >
>

Re: Test failures list

Posted by Etienne Chauchot <ec...@apache.org>.
Thanks, Mikhail, for the list; it gives a very good view of the statuses!
+1 to what Maximilian is saying about ownership and knowledge sharing.
Also +1: flaky tests are priority #1.
Etienne
On Thursday, 16 August 2018 at 10:59 +0200, Maximilian Michels wrote:
> Thank you Mikhail for looking into test failures and compiling the list!
>
> > I cannot access this link. Is it publicly accessible?
>
> Works for me but it takes a while to show results.
>
> > One general question: maybe it's a good idea to assign change
> > authors/code owners to the issues? Or just reach them in jira
> > comments?
>
> While the authors should have a sense of ownership over the code, I
> think it is enough for them to answer questions to the Assignee. They
> shouldn't have to be owning the JIRA issue. This also increases
> knowledge sharing.
>
> > I believe such update sent daily or bi-daily can increase visibility
> > for known failures, simplify search for people who can fix tests,
> > and add nice tracking status.
>
> Flaky tests should be fixed ASAP because they hinder development. +1 for
> daily/bidaily notifications.
>
> Cheers,
> Max
>
> On 16.08.18 10:46, Łukasz Gajowy wrote:
> > Thank you for working on improving the situation with test failures!
> >
> > One general question: maybe it's a good idea to assign change
> > authors/code owners to the issues? Or just reach them in jira comments?
> > They know the code and they may be more likely to know solutions to
> > failing tests or provide useful information (when swamped in other
> > things). WDYT?
> >
> > On Tue, 14 Aug 2018 at 20:05, Mikhail Gryzykhin <mi...@google.com> wrote:
> >
> >     Hi everyone,
> >
> >     We have increased amount of test jobs failures recently.
> >
> >     In terms of numbers (based on my memory and http://35.226.225.164/):
> >     Java precommits went down from ~55% to ~30% of succeeded jobs.
> >     Java postcommits went down from ~60 to ~40 of succeeded jobs.
> >
> > I cannot access this link. Is it publicly accessible?
> >
> >     I'm currently triaging post-commit failures and wonder if it will be
> >     useful to send regular updates on found issues and implemented fixes?
> >
> >     What can be present in update:
> >     * Tests greenness based on http://35.226.225.164/ (work on better
> >     dashboard is in progress)
> >     * List of Jira tickets with triaged failures with no owners
> >     * List of Jira tickets in progress and who's working on fixes
> >     * List of Jira tickets with fixes shipped
> >
> >     Each point can also have short description of failure reason.
> >
> > I think such report should be very brief and informative. IMO the report
> > should contain the failures (as short summaries and a link to a JIRA
> > ticket). Whoever's working on an issue should assign him/herself to the
> > ticket and mark it as "IN PROGRESS" so there's no collisions between
> > contributors fixing the tests. I don't see the need for listing the in
> > progress issues (jira already shows that). List of fixed issues may show
> > the progress, but I'd rather see a blank report with an empty failing
> > tests list. :)
> >
> > In fact, I think the list, you showed in the previous message
> > <https://issues.apache.org/jira/browse/BEAM-5122?jql=project%20%3D%20BEAM%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%29%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>
> > will suffice.
> >
> >     I believe such update sent daily or bi-daily can increase visibility
> >     for known failures, simplify search for people who can fix tests,
> >     and add nice tracking status.
> >
> > Aren't weekly reports enough? It may be hard to change a lot in a day
> > (two days).
> >
> >     What do you think?
> >
> >     Regards,
> >     --Mikhail
> >
> >     Have feedback <http://go/migryz-feedback>?
> >
> >     On Fri, Aug 10, 2018 at 1:24 PM Mikhail Gryzykhin <migryz@google.com
> >     <ma...@google.com>> wrote:
> >
> >         Hi everyone,
> >
> >         I'm following up on tackling post-commit tests greenness. (See
> >         beam post-commit policies
> >         <https://beam.apache.org/contribute/postcommits-policies/>)
> >
> >         During this week, I've assembled a list of most problematic
> >         flaky or failing tests
> >         <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%29%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>.
> >         Unfortunately, I'm relatively new to the project and lack
> >         triaging guides, so most of tickets contain only basic
> >         information.
> >
> >         _I want to ask community help in following areas:_
> >         1. If you know how to triage tests or the location of triage
> >         guide, please share the knowledge. You can post links here, or
> >         add pages to Confluence wiki
> >         <https://cwiki.apache.org/confluence/display/BEAM/> and share
> >         link here.
> >         2. Please, check on the Jira test-failures
> >         <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%29%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC> list
> >         and pick up tests that you might know how to fix and help with
> >         fixing those. Tickets that do not have owner now are not being
> >         worked on. I'm trying out easy mitigations for some of the
> >         failures (ie increasing timeouts), but those should not be
> >         treated as fixes.
> >
> >         _Current status:_
> >         Items that are marked critical in the failures list tend to fail
> >         jobs in ~5-10% runs each.
> >
> >         I contacted Anton Kedin directly and he works on fixes for
> >         couple of most problematic flakes currently. Anton, thank you
> >         for picking those up.
> >
> >         Please, update owner and status of ticket if you start working
> >         on some test failure, this will save time for others who might
> >         also start looking into the failure.
> >
> >         Thank you,
> >         --Mikhail
> >
> >         Have feedback <http://go/migryz-feedback>?

Re: Test failures list

Posted by Maximilian Michels <mx...@apache.org>.
Thank you Mikhail for looking into test failures and compiling the list!

> I cannot access this link. Is it publicly accessible?

Works for me but it takes a while to show results.

> One general question: maybe it's a good idea to assign change
> authors/code owners to the issues? Or just reach them in jira
> comments?

While the authors should have a sense of ownership over the code, I
think it is enough for them to answer questions to the Assignee. They
shouldn't have to be owning the JIRA issue. This also increases
knowledge sharing.

> I believe such update sent daily or bi-daily can increase visibility
> for known failures, simplify search for people who can fix tests,
> and add nice tracking status.

Flaky tests should be fixed ASAP because they hinder development. +1 for
daily/bidaily notifications.

Cheers,
Max

On 16.08.18 10:46, Łukasz Gajowy wrote:
> Thank you for working on improving the situation with test failures! 
> 
> One general question: maybe it's a good idea to assign change
> authors/code owners to the issues? Or just reach them in jira comments?
> They know the code and they may be more likely to know solutions to
> failing tests or provide useful information (when swamped in other
> things). WDYT?
> 
> On Tue, 14 Aug 2018 at 20:05, Mikhail Gryzykhin <migryz@google.com
> <ma...@google.com>> wrote:
> 
>     Hi everyone,
> 
>     We have increased amount of test jobs failures recently.
> 
>     In terms of numbers (based on my memory and http://35.226.225.164/):
>     Java precommits went down from ~55% to ~30% of succeeded jobs.
>     Java postcommits went down from ~60 to ~40 of succeeded jobs.
> 
> 
> I cannot access this link. Is it publicly accessible?
>  
> 
>     I'm currently triaging post-commit failures and wonder if it will be
>     useful to send regular updates on found issues and implemented fixes?
> 
>     What can be present in update:
>     * Tests greenness based on http://35.226.225.164/ (work on better
>     dashboard is in progress)
>     * List of Jira tickets with triaged failures with no owners
>     * List of Jira tickets in progress and who's working on fixes
>     * List of Jira tickets with fixes shipped
>      
> 
>     Each point can also have short description of failure reason.
> 
> 
> I think such report should be very brief and informative. IMO the report
> should contain the failures (as short summaries and a link to a JIRA
> ticket). Whoever's working on an issue should assign him/herself to the
> ticket and mark it as "IN PROGRESS" so there's no collisions between
> contributors fixing the tests. I don't see the need for listing the in
> progress issues (jira already shows that). List of fixed issues may show
> the progress, but I'd rather see a blank report with an empty failing
> tests list. :)
> 
> In fact, I think the list, you showed in the previous message
> <https://issues.apache.org/jira/browse/BEAM-5122?jql=project%20%3D%20BEAM%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%29%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC> will
> suffice. 
>  
> 
> 
>     I believe such update sent daily or bi-daily can increase visibility
>     for known failures, simplify search for people who can fix tests,
>     and add nice tracking status.
> 
> 
> Aren't weekly reports enough? It may be hard to change a lot in a day
> (two days). 
>  
> 
> 
>     What do you think?
> 
>     Regards,
>     --Mikhail
> 
>     Have feedback <http://go/migryz-feedback>? 
> 
> 
>     On Fri, Aug 10, 2018 at 1:24 PM Mikhail Gryzykhin <migryz@google.com
>     <ma...@google.com>> wrote:
> 
>         Hi everyone,
> 
>         I'm following up on tackling post-commit tests greenness. (See
>         beam post-commit policies
>         <https://beam.apache.org/contribute/postcommits-policies/>)
> 
>         During this week, I've assembled a list of most problematic
>         flaky or failing tests
>         <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%29%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>.
>         Unfortunately, I'm relatively new to the project and lack
>         triaging guides, so most of tickets contain only basic information.
> 
>         _I want to ask community help in following areas:_
>         1. If you know how to triage tests or the location of triage
>         guide, please share the knowledge. You can post links here, or
>         add pages to Confluence wiki
>         <https://cwiki.apache.org/confluence/display/BEAM/> and share
>         link here. 
>         2. Please, check on the Jira test-failures 
>         <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%29%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>list
>         and pick up tests that you might know how to fix and help with
>         fixing those. Tickets that do not have owner now are not being
>         worked on. I'm trying out easy mitigations for some of the
>         failures (ie increasing timeouts), but those should not be
>         treated as fixes.
> 
>         _Current status:_
>         Items that are marked critical in the failures list tend to fail
>         jobs in ~5-10% runs each.
> 
>         I contacted Anton Kedin directly and he works on fixes for
>         couple of most problematic flakes currently. Anton, thank you
>         for picking those up.
> 
>         Please, update owner and status of ticket if you start working
>         on some test failure, this will save time for others who might
>         also start looking into the failure.
> 
>         Thank you,
>         --Mikhail
> 
>         Have feedback <http://go/migryz-feedback>? 
> 

Re: Test failures list

Posted by Łukasz Gajowy <lu...@gmail.com>.
Thank you for working on improving the situation with test failures!

One general question: maybe it's a good idea to assign change authors/code
owners to the issues? Or just reach them in jira comments? They know the
code and they may be more likely to know solutions to failing tests or
provide useful information (when swamped in other things). WDYT?

On Tue, 14 Aug 2018 at 20:05, Mikhail Gryzykhin <mi...@google.com> wrote:

> Hi everyone,
>
> We have increased amount of test jobs failures recently.
>
> In terms of numbers (based on my memory and http://35.226.225.164/):
> Java precommits went down from ~55% to ~30% of succeeded jobs.
> Java postcommits went down from ~60 to ~40 of succeeded jobs.
>
>
I cannot access this link. Is it publicly accessible?


> I'm currently triaging post-commit failures and wonder if it will be
> useful to send regular updates on found issues and implemented fixes?
>
> What can be present in update:
> * Tests greenness based on http://35.226.225.164/ (work on better
> dashboard is in progress)
> * List of Jira tickets with triaged failures with no owners
> * List of Jira tickets in progress and who's working on fixes
> * List of Jira tickets with fixes shipped
>
> Each point can also have short description of failure reason.
>

I think such report should be very brief and informative. IMO the report
should contain the failures (as short summaries and a link to a JIRA
ticket). Whoever's working on an issue should assign him/herself to the
ticket and mark it as "IN PROGRESS" so there's no collisions between
contributors fixing the tests. I don't see the need for listing the in
progress issues (jira already shows that). List of fixed issues may show
the progress, but I'd rather see a blank report with an empty failing tests
list. :)

In fact, I think the list, you showed in the previous message
<https://issues.apache.org/jira/browse/BEAM-5122?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>
will suffice.


>
> I believe such update sent daily or bi-daily can increase visibility for
> known failures, simplify search for people who can fix tests, and add nice
> tracking status.
>

Aren't weekly reports enough? It may be hard to change a lot in a day (two
days).


>
> What do you think?
>
> Regards,
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
>
> On Fri, Aug 10, 2018 at 1:24 PM Mikhail Gryzykhin <mi...@google.com>
> wrote:
>
>> Hi everyone,
>>
>> I'm following up on tackling post-commit tests greenness. (See beam
>> post-commit policies
>> <https://beam.apache.org/contribute/postcommits-policies/>)
>>
>> During this week, I've assembled a list of most problematic flaky or
>> failing tests
>> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>.
>> Unfortunately, I'm relatively new to the project and lack triaging guides,
>> so most of tickets contain only basic information.
>>
>> *I want to ask community help in following areas:*
>> 1. If you know how to triage tests or the location of triage guide,
>> please share the knowledge. You can post links here, or add pages to Confluence
>> wiki <https://cwiki.apache.org/confluence/display/BEAM/> and share link
>> here.
>> 2. Please, check on the Jira test-failures
>> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>list
>> and pick up tests that you might know how to fix and help with fixing
>> those. Tickets that do not have owner now are not being worked on. I'm
>> trying out easy mitigations for some of the failures (ie increasing
>> timeouts), but those should not be treated as fixes.
>>
>> *Current status:*
>> Items that are marked critical in the failures list tend to fail jobs in
>> ~5-10% runs each.
>>
>> I contacted Anton Kedin directly and he works on fixes for couple of most
>> problematic flakes currently. Anton, thank you for picking those up.
>>
>> Please, update owner and status of ticket if you start working on some
>> test failure, this will save time for others who might also start looking
>> into the failure.
>>
>> Thank you,
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>

Re: Test failures list

Posted by Mikhail Gryzykhin <mi...@google.com>.
Hi everyone,

We have seen an increased number of test job failures recently.

In terms of numbers (based on my memory and http://35.226.225.164/):
Java precommits went down from ~55% to ~30% of succeeded jobs.
Java postcommits went down from ~60% to ~40% of succeeded jobs.

I'm currently triaging post-commit failures and wonder whether it would be
useful to send regular updates on the issues found and the fixes implemented.

What the update could contain:
* Test greenness based on http://35.226.225.164/ (work on a better dashboard
is in progress)
* List of Jira tickets with triaged failures and no owners
* List of Jira tickets in progress and who is working on the fixes
* List of Jira tickets with fixes shipped

Each point can also have a short description of the failure reason.

I believe such an update, sent daily or every two days, can increase the
visibility of known failures, simplify the search for people who can fix
tests, and add a nice tracking status.

What do you think?

Regards,
--Mikhail

Have feedback <http://go/migryz-feedback>?


On Fri, Aug 10, 2018 at 1:24 PM Mikhail Gryzykhin <mi...@google.com> wrote:

> Hi everyone,
>
> I'm following up on tackling post-commit tests greenness. (See beam
> post-commit policies
> <https://beam.apache.org/contribute/postcommits-policies/>)
>
> During this week, I've assembled a list of most problematic flaky or
> failing tests
> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>.
> Unfortunately, I'm relatively new to the project and lack triaging guides,
> so most of tickets contain only basic information.
>
> *I want to ask community help in following areas:*
> 1. If you know how to triage tests or the location of triage guide, please
> share the knowledge. You can post links here, or add pages to Confluence
> wiki <https://cwiki.apache.org/confluence/display/BEAM/> and share link
> here.
> 2. Please, check on the Jira test-failures
> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20test-failures%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>list
> and pick up tests that you might know how to fix and help with fixing
> those. Tickets that do not have owner now are not being worked on. I'm
> trying out easy mitigations for some of the failures (ie increasing
> timeouts), but those should not be treated as fixes.
>
> *Current status:*
> Items that are marked critical in the failures list tend to fail jobs in
> ~5-10% runs each.
>
> I contacted Anton Kedin directly and he works on fixes for couple of most
> problematic flakes currently. Anton, thank you for picking those up.
>
> Please, update owner and status of ticket if you start working on some
> test failure, this will save time for others who might also start looking
> into the failure.
>
> Thank you,
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>