Posted to dev@hbase.apache.org by Apekshit Sharma <ap...@cloudera.com> on 2016/03/30 01:25:17 UTC

Better management of flakies

Proposal:
Maintain a list of flaky tests which can be ignored in main builds (for
patches) and set up a new job which will periodically run those flaky tests.

Benefits:
- Cleaner main builds. Fewer runs per patch, ideally no re-runs because of
timeouts and other flakiness. Less frustration. Increased developer
productivity.
- Logs from various runs available when anyone tries to fix these tests.

Possible con: We start demoting tests to flaky list which never get fixed.

How?
Internally, we have set up two Jenkins jobs, one for the main build and one
for flaky tests. In the execute shell step, they curl the list of flaky tests
and use surefire plugin flags to ignore/run particular tests.
I volunteer to set them up. However, I am not sure whether this approach will
work, since we use Yetus upstream.
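
A minimal sketch of how that step might build the Maven arguments from the
list, assuming one test class name per line and Surefire 2.19 or later (where
-Dtest accepts '!'-prefixed exclusions). The helper names and the list format
are hypothetical, not what the internal jobs actually use.

```python
# Hypothetical helpers: turn a flaky-test list into Maven Surefire arguments.
# Assumes Surefire 2.19+, where -Dtest accepts '!'-prefixed exclusions.


def parse_flaky_list(text):
    """Return test class names, one per line, skipping blanks and '#' comments."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def surefire_args(flaky_tests, run_only_flaky):
    """Build the -Dtest=... argument for the flaky job or for the main build."""
    if not flaky_tests:
        if run_only_flaky:
            # An empty -Dtest would fall back to running everything.
            raise ValueError("no flaky tests to run")
        return []  # main build: nothing to exclude
    if run_only_flaky:
        pattern = ",".join(flaky_tests)
    else:
        pattern = ",".join("!" + name for name in flaky_tests)
    return ["-Dtest=" + pattern]


if __name__ == "__main__":
    # Example entries only, not real flaky tests.
    flaky = parse_flaky_list("# flaky list\nTestFoo\n\nTestBar\n")
    print(["mvn", "test"] + surefire_args(flaky, run_only_flaky=False))
    print(["mvn", "test"] + surefire_args(flaky, run_only_flaky=True))
```

The main build would use the exclusion form and the flaky job the inclusion
form, so both jobs read the same list.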

- Appy

Re: Better management of flakies

Posted by Stack <st...@duboce.net>.
On Tue, Mar 29, 2016 at 4:46 PM, Matteo Bertozzi <th...@gmail.com>
wrote:

> don't we have already a flakey detector? HBASE-8018
>
> https://github.com/apache/hbase/commit/4dc52261a19ed1055b837b136c97ea36e362ba6e
>
>
I remember that script. It used to work for me, but I had forgotten about it.
St.Ack




> On Tue, Mar 29, 2016 at 4:39 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
> > How about if we make a script for determining flakey status, based on
> > rate of failing in either post-commit or in the pre-existing branch
> > check for precommit. The script could live in dev-support and work by
> > effectively grabbing the test reports from those jobs.
> >
> > We can then run that script in a job and save the results as a build
> > artifact.
> >
> > We can then build the changes in your original email: ignoring the
> > list entries in our normal builds and rechecking just them in a
> > dedicated job.
> >
> > On Tue, Mar 29, 2016 at 6:30 PM, Sean Busbey <bu...@cloudera.com>
> wrote:
> > > Where/How do we maintain the list of tests that are flagged as flakes?
> > >
> > > On Tue, Mar 29, 2016 at 6:25 PM, Apekshit Sharma <ap...@cloudera.com>
> > wrote:
> > >> Proposal:
> > >> Maintain a list of flaky tests which can be ignored in main builds
> (for
> > >> patches) and setup a new job which will periodically run those flaky
> > tests.
> > >>
> > >> Benefits:
> > >> - Cleaner main builds. Less runs per patch,  ideally no re-runs
> because
> > of
> > >> timeouts and other flakiness. Less frustration. Increased developer
> > >> productivity.
> > >> - Logs from various runs available when anyone tries to fix these
> tests.
> > >>
> > >> Possible con: We start demoting tests to flaky list which never get
> > fixed.
> > >>
> > >> How?
> > >> Internally, we have setup up two jenkins jobs, one for main build and
> > one
> > >> for flaky. In execute shell step, they curl to get the list of flaky
> > tests
> > >> and use surefire plugin flags to ignore/run particular tests.
> > >> I volunteer to set them up. However, not sure if this approach will
> work
> > >> since we have yetus upstream.
> > >>
> > >> - Appy
> > >
> > >
> > >
> > > --
> > > busbey
> >
> >
> >
> > --
> > busbey
> >
>

Re: Better management of flakies

Posted by Matteo Bertozzi <th...@gmail.com>.
Don't we already have a flakey detector? HBASE-8018
https://github.com/apache/hbase/commit/4dc52261a19ed1055b837b136c97ea36e362ba6e

On Tue, Mar 29, 2016 at 4:39 PM, Sean Busbey <bu...@cloudera.com> wrote:

> How about if we make a script for determining flakey status, based on
> rate of failing in either post-commit or in the pre-existing branch
> check for precommit. The script could live in dev-support and work by
> effectively grabbing the test reports from those jobs.
>
> We can then run that script in a job and save the results as a build
> artifact.
>
> We can then build the changes in your original email: ignoring the
> list entries in our normal builds and rechecking just them in a
> dedicated job.
>
> On Tue, Mar 29, 2016 at 6:30 PM, Sean Busbey <bu...@cloudera.com> wrote:
> > Where/How do we maintain the list of tests that are flagged as flakes?
> >
> > On Tue, Mar 29, 2016 at 6:25 PM, Apekshit Sharma <ap...@cloudera.com>
> wrote:
> >> Proposal:
> >> Maintain a list of flaky tests which can be ignored in main builds (for
> >> patches) and setup a new job which will periodically run those flaky
> tests.
> >>
> >> Benefits:
> >> - Cleaner main builds. Less runs per patch,  ideally no re-runs because
> of
> >> timeouts and other flakiness. Less frustration. Increased developer
> >> productivity.
> >> - Logs from various runs available when anyone tries to fix these tests.
> >>
> >> Possible con: We start demoting tests to flaky list which never get
> fixed.
> >>
> >> How?
> >> Internally, we have setup up two jenkins jobs, one for main build and
> one
> >> for flaky. In execute shell step, they curl to get the list of flaky
> tests
> >> and use surefire plugin flags to ignore/run particular tests.
> >> I volunteer to set them up. However, not sure if this approach will work
> >> since we have yetus upstream.
> >>
> >> - Appy
> >
> >
> >
> > --
> > busbey
>
>
>
> --
> busbey
>

Re: Better management of flakies

Posted by Sean Busbey <bu...@cloudera.com>.
How about if we make a script for determining flakey status, based on
the failure rate in either the post-commit job or the pre-existing branch
check for precommit. The script could live in dev-support and work by
grabbing the test reports from those jobs.

We can then run that script in a job and save the results as a build artifact.

We can then build the changes in your original email: ignoring the
list entries in our normal builds and rechecking just them in a
dedicated job.
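
A rough sketch of the failure-rate idea, assuming the test reports have
already been fetched and reduced to one dict per run that maps a test name to
True (passed) or False (failed). That input shape and the function names are
assumptions for illustration, not an existing dev-support script.

```python
from collections import defaultdict


def failure_rates(runs):
    """Map each test name to failed_runs / executed_runs.

    runs: list of dicts, one per build, mapping test name -> True if it passed.
    """
    executed = defaultdict(int)
    failed = defaultdict(int)
    for results in runs:
        for name, passed in results.items():
            executed[name] += 1
            if not passed:
                failed[name] += 1
    return {name: failed[name] / executed[name] for name in executed}


def flaky_tests(runs, low=0.0, high=1.0):
    """Return tests whose failure rate lies strictly between low and high.

    With the defaults, a test that always passes or always fails is not flaky;
    an always-failing test is simply broken.
    """
    rates = failure_rates(runs)
    return sorted(name for name, rate in rates.items() if low < rate < high)


if __name__ == "__main__":
    # Made-up results for two builds.
    runs = [
        {"TestA": True, "TestB": False, "TestC": False},
        {"TestA": True, "TestB": True, "TestC": False},
    ]
    print(flaky_tests(runs))
```

The sorted list is what a job could save as a build artifact for the other
jobs to download.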

On Tue, Mar 29, 2016 at 6:30 PM, Sean Busbey <bu...@cloudera.com> wrote:
> Where/How do we maintain the list of tests that are flagged as flakes?
>
> On Tue, Mar 29, 2016 at 6:25 PM, Apekshit Sharma <ap...@cloudera.com> wrote:
>> Proposal:
>> Maintain a list of flaky tests which can be ignored in main builds (for
>> patches) and setup a new job which will periodically run those flaky tests.
>>
>> Benefits:
>> - Cleaner main builds. Less runs per patch,  ideally no re-runs because of
>> timeouts and other flakiness. Less frustration. Increased developer
>> productivity.
>> - Logs from various runs available when anyone tries to fix these tests.
>>
>> Possible con: We start demoting tests to flaky list which never get fixed.
>>
>> How?
>> Internally, we have setup up two jenkins jobs, one for main build and one
>> for flaky. In execute shell step, they curl to get the list of flaky tests
>> and use surefire plugin flags to ignore/run particular tests.
>> I volunteer to set them up. However, not sure if this approach will work
>> since we have yetus upstream.
>>
>> - Appy
>
>
>
> --
> busbey



-- 
busbey

Re: Better management of flakies

Posted by Sean Busbey <bu...@cloudera.com>.
Where/How do we maintain the list of tests that are flagged as flakes?

On Tue, Mar 29, 2016 at 6:25 PM, Apekshit Sharma <ap...@cloudera.com> wrote:
> Proposal:
> Maintain a list of flaky tests which can be ignored in main builds (for
> patches) and setup a new job which will periodically run those flaky tests.
>
> Benefits:
> - Cleaner main builds. Less runs per patch,  ideally no re-runs because of
> timeouts and other flakiness. Less frustration. Increased developer
> productivity.
> - Logs from various runs available when anyone tries to fix these tests.
>
> Possible con: We start demoting tests to flaky list which never get fixed.
>
> How?
> Internally, we have setup up two jenkins jobs, one for main build and one
> for flaky. In execute shell step, they curl to get the list of flaky tests
> and use surefire plugin flags to ignore/run particular tests.
> I volunteer to set them up. However, not sure if this approach will work
> since we have yetus upstream.
>
> - Appy



-- 
busbey

Re: Better management of flakies

Posted by Apekshit Sharma <ap...@cloudera.com>.
https://issues.apache.org/jira/browse/HBASE-15651

On Fri, Apr 1, 2016 at 1:49 PM, Stack <st...@duboce.net> wrote:

> On Fri, Apr 1, 2016 at 10:05 AM, Sean Busbey <bu...@cloudera.com> wrote:
> ....
>
> > Maybe better if we ensure everything needed to run our jobs is scripts
> > in dev-support so that the commiter-only change can just be moving
> > jenkins to execute them.
> >
> >
> I like this idea Sean.
> St.Ack
>
>
>
> > On Fri, Apr 1, 2016 at 12:01 PM, Apekshit Sharma <ap...@cloudera.com>
> > wrote:
> > > So who's setting up the jobs? I don't have perms to do it.
> > >
> > > On Wed, Mar 30, 2016 at 11:24 AM, Apekshit Sharma <ap...@cloudera.com>
> > wrote:
> > >
> > >> So tackling the two parts individually:
> > >> 1) Detection : A good automatic flaky detector will no doubt save
> manual
> > >> effort. The one mentioned by Matteo seems like a good one which we can
> > hook
> > >> up with post-commit job. I see a minor problem though, we'll should
> use
> > >> about 20 runs at least, but the rate of execution of this post-commit
> > >> <https://builds.apache.org/view/All/job/HBase-Trunk_matrix/> job for
> > >> trunk is too low which means we'll get stale information from the
> tool.
> > One
> > >> way around that would be running the post-commit job back-to-back? It
> > can
> > >> than trigger another job in the end which will use the tool to
> recompute
> > >> flakies.
> > >>
> > >> 2) Handling: Ignoring flakies in main build and having separate job
> just
> > >> for flakies. We can run it often enough and get some stats on
> flaky-ness
> > >> rate of each test using the same tool as above.
> > >>
> > >> However, I feel that we should keep it super simple in the start. Just
> > >> have a manual list and a separate job for flakies. It it works out
> > fine, we
> > >> can add the automatic detection and statistics part later.
> > >>
> > >> - Appy
> > >>
> > >
> > >
> > >
> > > --
> > >
> > > Regards
> > >
> > > Apekshit Sharma | Software Engineer, Cloudera | Palo Alto, California |
> > > 650-963-6311
> >
> >
> >
> > --
> > busbey
> >
>



-- 

Regards

Apekshit Sharma | Software Engineer, Cloudera | Palo Alto, California |
650-963-6311

Re: Better management of flakies

Posted by Stack <st...@duboce.net>.
On Fri, Apr 1, 2016 at 10:05 AM, Sean Busbey <bu...@cloudera.com> wrote:
....

> Maybe better if we ensure everything needed to run our jobs is scripts
> in dev-support so that the commiter-only change can just be moving
> jenkins to execute them.
>
>
I like this idea Sean.
St.Ack



> On Fri, Apr 1, 2016 at 12:01 PM, Apekshit Sharma <ap...@cloudera.com>
> wrote:
> > So who's setting up the jobs? I don't have perms to do it.
> >
> > On Wed, Mar 30, 2016 at 11:24 AM, Apekshit Sharma <ap...@cloudera.com>
> wrote:
> >
> >> So tackling the two parts individually:
> >> 1) Detection : A good automatic flaky detector will no doubt save manual
> >> effort. The one mentioned by Matteo seems like a good one which we can
> hook
> >> up with post-commit job. I see a minor problem though, we'll should use
> >> about 20 runs at least, but the rate of execution of this post-commit
> >> <https://builds.apache.org/view/All/job/HBase-Trunk_matrix/> job for
> >> trunk is too low which means we'll get stale information from the tool.
> One
> >> way around that would be running the post-commit job back-to-back? It
> can
> >> than trigger another job in the end which will use the tool to recompute
> >> flakies.
> >>
> >> 2) Handling: Ignoring flakies in main build and having separate job just
> >> for flakies. We can run it often enough and get some stats on flaky-ness
> >> rate of each test using the same tool as above.
> >>
> >> However, I feel that we should keep it super simple in the start. Just
> >> have a manual list and a separate job for flakies. It it works out
> fine, we
> >> can add the automatic detection and statistics part later.
> >>
> >> - Appy
> >>
> >
> >
> >
> > --
> >
> > Regards
> >
> > Apekshit Sharma | Software Engineer, Cloudera | Palo Alto, California |
> > 650-963-6311
>
>
>
> --
> busbey
>

Re: Better management of flakies

Posted by Sean Busbey <bu...@cloudera.com>.
If you create a jira that describes what we need to implement, I'll
find time to put in changes.

Maybe better if we ensure everything needed to run our jobs lives in scripts
under dev-support, so that the committer-only change can just be moving
Jenkins to execute them.

On Fri, Apr 1, 2016 at 12:01 PM, Apekshit Sharma <ap...@cloudera.com> wrote:
> So who's setting up the jobs? I don't have perms to do it.
>
> On Wed, Mar 30, 2016 at 11:24 AM, Apekshit Sharma <ap...@cloudera.com> wrote:
>
>> So tackling the two parts individually:
>> 1) Detection : A good automatic flaky detector will no doubt save manual
>> effort. The one mentioned by Matteo seems like a good one which we can hook
>> up with post-commit job. I see a minor problem though, we'll should use
>> about 20 runs at least, but the rate of execution of this post-commit
>> <https://builds.apache.org/view/All/job/HBase-Trunk_matrix/> job for
>> trunk is too low which means we'll get stale information from the tool. One
>> way around that would be running the post-commit job back-to-back? It can
>> than trigger another job in the end which will use the tool to recompute
>> flakies.
>>
>> 2) Handling: Ignoring flakies in main build and having separate job just
>> for flakies. We can run it often enough and get some stats on flaky-ness
>> rate of each test using the same tool as above.
>>
>> However, I feel that we should keep it super simple in the start. Just
>> have a manual list and a separate job for flakies. It it works out fine, we
>> can add the automatic detection and statistics part later.
>>
>> - Appy
>>
>
>
>
> --
>
> Regards
>
> Apekshit Sharma | Software Engineer, Cloudera | Palo Alto, California |
> 650-963-6311



-- 
busbey

Re: Better management of flakies

Posted by Apekshit Sharma <ap...@cloudera.com>.
So who's setting up the jobs? I don't have perms to do it.

On Wed, Mar 30, 2016 at 11:24 AM, Apekshit Sharma <ap...@cloudera.com> wrote:

> So tackling the two parts individually:
> 1) Detection : A good automatic flaky detector will no doubt save manual
> effort. The one mentioned by Matteo seems like a good one which we can hook
> up with post-commit job. I see a minor problem though, we'll should use
> about 20 runs at least, but the rate of execution of this post-commit
> <https://builds.apache.org/view/All/job/HBase-Trunk_matrix/> job for
> trunk is too low which means we'll get stale information from the tool. One
> way around that would be running the post-commit job back-to-back? It can
> than trigger another job in the end which will use the tool to recompute
> flakies.
>
> 2) Handling: Ignoring flakies in main build and having separate job just
> for flakies. We can run it often enough and get some stats on flaky-ness
> rate of each test using the same tool as above.
>
> However, I feel that we should keep it super simple in the start. Just
> have a manual list and a separate job for flakies. It it works out fine, we
> can add the automatic detection and statistics part later.
>
> - Appy
>



-- 

Regards

Apekshit Sharma | Software Engineer, Cloudera | Palo Alto, California |
650-963-6311

Re: Better management of flakies

Posted by Apekshit Sharma <ap...@cloudera.com>.
So tackling the two parts individually:
1) Detection: A good automatic flaky detector will no doubt save manual
effort. The one mentioned by Matteo seems like a good one which we can hook
up with the post-commit job. I see a minor problem though: we should use
at least about 20 runs, but the rate of execution of this post-commit
<https://builds.apache.org/view/All/job/HBase-Trunk_matrix/> job for trunk
is too low, which means we'll get stale information from the tool. One way
around that would be running the post-commit job back-to-back? It can then
trigger another job at the end which will use the tool to recompute flakies.

2) Handling: Ignoring flakies in main build and having separate job just
for flakies. We can run it often enough and get some stats on flaky-ness
rate of each test using the same tool as above.
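
To make the "about 20 runs" point concrete, here is a small sketch of a
per-test flakiness rate over the most recent runs. The function name, the
default window of 20 and the input format (a list of pass/fail booleans,
oldest first) are assumptions for illustration only.

```python
def flakiness(history, window=20):
    """Failure fraction over the most recent `window` runs of one test.

    history: list of booleans, oldest first, True meaning the run passed.
    Returns None when fewer than `window` runs exist, since a rate computed
    from too few runs would be misleading.
    """
    if len(history) < window:
        return None
    recent = history[-window:]
    return recent.count(False) / window


if __name__ == "__main__":
    # Made-up history: only the last 20 of these 25 runs are counted.
    history = [False] * 5 + [True] * 15 + [False] * 5
    print(flakiness(history))
    print(flakiness([True] * 19))
```

Using only the latest window keeps old failures from dominating the rate,
which is why a slow post-commit job gives stale numbers.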

However, I feel that we should keep it super simple at the start. Just have
a manual list and a separate job for flakies. If it works out fine, we can
add the automatic detection and statistics part later.

- Appy

Re: Better management of flakies

Posted by Ted Yu <yu...@gmail.com>.
bq. We start demoting tests to flaky list which never get fixed.

I think this may happen.
Jenkins jobs for 1.x builds are relatively stable - meaning, we would have
either green Java 7 build or green Java 8 build (if not both).
Broken test(s) would be quickly identified and fixed.

I don't think we need dual builds for 1.x branches.

On Tue, Mar 29, 2016 at 4:25 PM, Apekshit Sharma <ap...@cloudera.com> wrote:

> Proposal:
> Maintain a list of flaky tests which can be ignored in main builds (for
> patches) and setup a new job which will periodically run those flaky tests.
>
> Benefits:
> - Cleaner main builds. Less runs per patch,  ideally no re-runs because of
> timeouts and other flakiness. Less frustration. Increased developer
> productivity.
> - Logs from various runs available when anyone tries to fix these tests.
>
> Possible con: We start demoting tests to flaky list which never get fixed.
>
> How?
> Internally, we have setup up two jenkins jobs, one for main build and one
> for flaky. In execute shell step, they curl to get the list of flaky tests
> and use surefire plugin flags to ignore/run particular tests.
> I volunteer to set them up. However, not sure if this approach will work
> since we have yetus upstream.
>
> - Appy
>