Posted to dev@kafka.apache.org by Dan S <da...@gmail.com> on 2022/11/24 08:38:17 UTC

Ci stability

Hello all,

I've had a PR open for a little over a month (through several feedback
cycles), and I've never seen a fully passing build: tests in completely
different parts of the codebase seem to fail, often with timeouts. A
cursory look at open PRs suggests that mine is not the only one. I was
wondering if there is a place where all the flaky tests are being
tracked, and if it makes sense to fix (or at least temporarily disable)
them so that confidence in new PRs could be increased.

Thanks,

Dan

Re: Ci stability

Posted by Dan S <da...@gmail.com>.

Thanks Colin. I have a draft PR open that I occasionally check on,
disabling the failing tests as I go. I'll update it and see if it passes.

Thanks,

Daniel Scanteianu

On Mon, Dec 5, 2022, 18:02 Colin McCabe <cm...@apache.org> wrote:

> FYI, there was a memory leak that affected some of the tests which was
> fixed recently, so hopefully stability will improve a bit. See KAFKA-14433
> for details.
>
> best,
> Colin
>
> On Thu, Nov 24, 2022, at 12:48, John Roesler wrote:
> > Hi Dan,
> >
> > I’m not sure if there’s a consistently used tag, but I’ve gotten good
> > mileage out of just searching for “flaky” or “flaky test” in Jira.
> >
> > If you’re thinking about filing a ticket for a specific test failure
> > you’ve seen, I’ve also usually been able to find out whether there’s
> > already a ticket by searching for the test class or method name.
> >
> > People seem to typically file tickets with “flaky” in the title and
> > then the test name.
> >
> > Thanks again for your interest in improving the situation!
> > -John
> >
> > On Thu, Nov 24, 2022, at 10:08, Dan S wrote:
> >> Thanks for the reply John! Is there a jira tag or view or something that
> >> can be used to find all the failing tests and maybe even try to fix them
> >> (even if fix just means extending a timeout)?
> >>
> >>
> >>
> >> On Thu, Nov 24, 2022, 16:03 John Roesler <vv...@apache.org> wrote:
> >>
> >>> Hi Dan,
> >>>
> >>> Thanks for pointing this out. Flaky tests are a perennial problem. We
> >>> knock them out every now and then, but eventually more spring up.
> >>>
> >>> I’ve had some luck in the past filing Jira tickets for the failing
> >>> tests as they pop up in my PRs. Another thing that seems to motivate
> >>> people is to open a PR to disable the test in question, as you
> >>> mention. That can be a bit aggressive, though, so it wouldn’t be my
> >>> first suggestion.
> >>>
> >>> I appreciate you bringing this up. I agree that flaky tests pose a
> >>> risk to the project because it makes it harder to know whether a PR
> >>> breaks things or not.
> >>>
> >>> Thanks,
> >>> John
> >>>
> >>> On Thu, Nov 24, 2022, at 02:38, Dan S wrote:
> >>> > Hello all,
> >>> >
> >>> > I've had a pr that has been open for a little over a month (several
> >>> > feedback cycles happened), and I've never seen a fully passing build
> >>> > (tests in completely different parts of the codebase seemed to fail,
> >>> > often timeouts). A cursory look at open PRs seems to indicate that
> >>> > mine is not the only one. I was wondering if there is a place where
> >>> > all the flaky tests are being tracked, and if it makes sense to fix
> >>> > (or at least temporarily disable) them so that confidence in new PRs
> >>> > could be increased.
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Dan
> >>>
>

Re: Ci stability

Posted by Colin McCabe <cm...@apache.org>.

FYI, there was a memory leak that affected some of the tests which was fixed recently, so hopefully stability will improve a bit. See KAFKA-14433 for details.

best,
Colin

On Thu, Nov 24, 2022, at 12:48, John Roesler wrote:
> Hi Dan,
>
> I’m not sure if there’s a consistently used tag, but I’ve gotten good 
> mileage out of just searching for “flaky” or “flaky test” in Jira. 
>
> If you’re thinking about filing a ticket for a specific test failure 
> you’ve seen, I’ve also usually been able to find out whether there’s 
> already a ticket by searching for the test class or method name. 
>
> People seem to typically file tickets with “flaky” in the title and 
> then the test name. 
>
> Thanks again for your interest in improving the situation!
> -John
>
> On Thu, Nov 24, 2022, at 10:08, Dan S wrote:
>> Thanks for the reply John! Is there a jira tag or view or something that
>> can be used to find all the failing tests and maybe even try to fix them
>> (even if fix just means extending a timeout)?
>>
>>
>>
>> On Thu, Nov 24, 2022, 16:03 John Roesler <vv...@apache.org> wrote:
>>
>>> Hi Dan,
>>>
>>> Thanks for pointing this out. Flaky tests are a perennial problem. We
>>> knock them out every now and then, but eventually more spring up.
>>>
>>> I’ve had some luck in the past filing Jira tickets for the failing tests
>>> as they pop up in my PRs. Another thing that seems to motivate people is to
>>> open a PR to disable the test in question, as you mention. That can be a
>>> bit aggressive, though, so it wouldn’t be my first suggestion.
>>>
>>> I appreciate you bringing this up. I agree that flaky tests pose a risk to
>>> the project because it makes it harder to know whether a PR breaks things
>>> or not.
>>>
>>> Thanks,
>>> John
>>>
>>> On Thu, Nov 24, 2022, at 02:38, Dan S wrote:
>>> > Hello all,
>>> >
>>> > I've had a pr that has been open for a little over a month (several
>>> > feedback cycles happened), and I've never seen a fully passing build
>>> > (tests in completely different parts of the codebase seemed to fail,
>>> > often timeouts). A cursory look at open PRs seems to indicate that
>>> > mine is not the only one. I was wondering if there is a place where
>>> > all the flaky tests are being tracked, and if it makes sense to fix
>>> > (or at least temporarily disable) them so that confidence in new PRs
>>> > could be increased.
>>> >
>>> > Thanks,
>>> >
>>> > Dan
>>>

Re: Ci stability

Posted by John Roesler <vv...@apache.org>.

Hi Dan,

I’m not sure if there’s a consistently used tag, but I’ve gotten good mileage out of just searching for “flaky” or “flaky test” in Jira. 

If you’re thinking about filing a ticket for a specific test failure you’ve seen, I’ve also usually been able to find out whether there’s already a ticket by searching for the test class or method name. 

People seem to typically file tickets with “flaky” in the title and then the test name. 
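
A minimal sketch of the search described above, in Python. It assumes the project's Jira is reachable at issues.apache.org/jira under the project key KAFKA and that tickets follow the informal "flaky" naming convention; the helper names are hypothetical, and the JQL shown is only one reasonable way to phrase the query.

```python
from urllib.parse import urlencode

# Assumption: base URL of the Jira issue navigator for Apache projects.
JIRA_SEARCH_URL = "https://issues.apache.org/jira/issues/"


def flaky_test_jql(test_name=None, project="KAFKA"):
    """Build a JQL query for open tickets about flaky tests.

    With no test_name, search for the word "flaky"; otherwise search for
    the given test class or method name (hypothetical helper).
    """
    term = test_name if test_name else "flaky"
    # Escape backslashes and double quotes so the term stays one JQL string.
    term = term.replace("\\", "\\\\").replace('"', '\\"')
    clauses = [
        f"project = {project}",
        "resolution = Unresolved",
        f'text ~ "{term}"',
    ]
    return " AND ".join(clauses) + " ORDER BY updated DESC"


def flaky_test_search_url(test_name=None, project="KAFKA"):
    """Return a browser URL that runs the JQL query above."""
    return JIRA_SEARCH_URL + "?" + urlencode({"jql": flaky_test_jql(test_name, project)})


if __name__ == "__main__":
    # All open tickets mentioning "flaky".
    print(flaky_test_search_url())
    # Tickets mentioning a specific (made-up) test class name.
    print(flaky_test_search_url("SomeIntegrationTest"))
```

Searching by test class or method name first is a quick way to check for an existing ticket before filing a new one.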

Thanks again for your interest in improving the situation!
-John

On Thu, Nov 24, 2022, at 10:08, Dan S wrote:
> Thanks for the reply John! Is there a jira tag or view or something that
> can be used to find all the failing tests and maybe even try to fix them
> (even if fix just means extending a timeout)?
>
>
>
> On Thu, Nov 24, 2022, 16:03 John Roesler <vv...@apache.org> wrote:
>
>> Hi Dan,
>>
>> Thanks for pointing this out. Flaky tests are a perennial problem. We
>> knock them out every now and then, but eventually more spring up.
>>
>> I’ve had some luck in the past filing Jira tickets for the failing tests
>> as they pop up in my PRs. Another thing that seems to motivate people is to
>> open a PR to disable the test in question, as you mention. That can be a
>> bit aggressive, though, so it wouldn’t be my first suggestion.
>>
>> I appreciate you bringing this up. I agree that flaky tests pose a risk to
>> the project because it makes it harder to know whether a PR breaks things
>> or not.
>>
>> Thanks,
>> John
>>
>> On Thu, Nov 24, 2022, at 02:38, Dan S wrote:
>> > Hello all,
>> >
>> > I've had a pr that has been open for a little over a month (several
>> > feedback cycles happened), and I've never seen a fully passing build
>> > (tests in completely different parts of the codebase seemed to fail,
>> > often timeouts). A cursory look at open PRs seems to indicate that
>> > mine is not the only one. I was wondering if there is a place where
>> > all the flaky tests are being tracked, and if it makes sense to fix
>> > (or at least temporarily disable) them so that confidence in new PRs
>> > could be increased.
>> >
>> > Thanks,
>> >
>> > Dan
>>

Re: Ci stability

Posted by Dan S <da...@gmail.com>.

Thanks for the reply, John! Is there a Jira tag or view or something that
can be used to find all the failing tests and maybe even try to fix them
(even if the fix just means extending a timeout)?



On Thu, Nov 24, 2022, 16:03 John Roesler <vv...@apache.org> wrote:

> Hi Dan,
>
> Thanks for pointing this out. Flaky tests are a perennial problem. We
> knock them out every now and then, but eventually more spring up.
>
> I’ve had some luck in the past filing Jira tickets for the failing tests
> as they pop up in my PRs. Another thing that seems to motivate people is to
> open a PR to disable the test in question, as you mention. That can be a
> bit aggressive, though, so it wouldn’t be my first suggestion.
>
> I appreciate you bringing this up. I agree that flaky tests pose a risk to
> the project because it makes it harder to know whether a PR breaks things
> or not.
>
> Thanks,
> John
>
> On Thu, Nov 24, 2022, at 02:38, Dan S wrote:
> > Hello all,
> >
> > I've had a pr that has been open for a little over a month (several
> > feedback cycles happened), and I've never seen a fully passing build
> > (tests in completely different parts of the codebase seemed to fail,
> > often timeouts). A cursory look at open PRs seems to indicate that
> > mine is not the only one. I was wondering if there is a place where
> > all the flaky tests are being tracked, and if it makes sense to fix
> > (or at least temporarily disable) them so that confidence in new PRs
> > could be increased.
> >
> > Thanks,
> >
> > Dan
>

Re: Ci stability

Posted by John Roesler <vv...@apache.org>.

Hi Dan,

Thanks for pointing this out. Flaky tests are a perennial problem. We knock them out every now and then, but eventually more spring up.

I’ve had some luck in the past filing Jira tickets for the failing tests as they pop up in my PRs. Another thing that seems to motivate people is to open a PR to disable the test in question, as you mention. That can be a bit aggressive, though, so it wouldn’t be my first suggestion.

I appreciate you bringing this up. I agree that flaky tests pose a risk to the project because they make it harder to know whether a PR breaks things or not.

Thanks,
John

On Thu, Nov 24, 2022, at 02:38, Dan S wrote:
> Hello all,
>
> I've had a pr that has been open for a little over a month (several
> feedback cycles happened), and I've never seen a fully passing build (tests
> in completely different parts of the codebase seemed to fail, often
> timeouts). A cursory look at open PRs seems to indicate that mine is not
> the only one. I was wondering if there is a place where all the flaky tests
> are being tracked, and if it makes sense to fix (or at least temporarily
> disable) them so that confidence in new PRs could be increased.
>
> Thanks,
>
> Dan