Posted to dev@cassandra.apache.org by sankalp kohli <ko...@gmail.com> on 2016/12/03 06:48:18 UTC

Re: Failed Dtest will block cutting releases

Hi,
    I don't see any update on this thread. We will go ahead and make
Dtests a blocker for cutting releases for anything after 3.10.

Please respond if anyone has an objection to this.

Thanks,
Sankalp



On Mon, Nov 21, 2016 at 11:57 AM, Josh McKenzie <jm...@apache.org>
wrote:

> Caveat: I'm strongly in favor of us blocking a release on a non-green test
> board of either utest or dtest.
>
>
> > put something in prod which is known to be broken in obvious ways
>
> In my experience the majority of fixes are actually shoring up low-quality
> / flaky tests or fixing tests that have been invalidated by a commit but do
> not indicate an underlying bug. Inferring "tests are failing so we know
> we're asking people to put things in prod that are broken in obvious ways"
> is hyperbolic. A more correct statement would be: "Tests are failing so we
> know we're shipping with a test that's failing" which is not helpful.
>
> Our signal-to-noise ratio with tests has been very poor historically; we've
> been trying to address that through aggressive triage and assigning out
> test failures; however, we need far more active and widespread community
> involvement if we want to truly *fix* this problem long-term.
>
> On Mon, Nov 21, 2016 at 2:33 PM, Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
>
> > +1.  Kind of silly to advise people to put something in prod which is
> > known to be broken in obvious ways.
> >
> > On Mon, Nov 21, 2016 at 11:31 AM sankalp kohli <ko...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >     We should not cut a release if Dtests are not passing. I won't
> > > block 3.10 on this since we are just discussing this.
> > >
> > > Please provide feedback on this.
> > >
> > > Thanks,
> > > Sankalp
> > >
> >
>

Re: Failed Dtest will block cutting releases

Posted by Benjamin Roth <be...@jaumo.com>.
Hi Michael,

Thanks for this update. As a newbie, it helped me understand the
organization and processes a little bit better.

I don't know how many CS devs know this, but I love this rule (actually the
whole book):
http://programmer.97things.oreilly.com/wiki/index.php/The_Boy_Scout_Rule

To be honest, I'm personally not the kind of guy who walks through lists
looking for issues that could be picked up and done, but if I encounter
anything (a test, some weird code, a design, whatever) that deserves to be
improved, analyzed, or fixed and I have a little time left, I try to improve
or fix it.

At the moment I am still quite new around here and in the process of
understanding the whole picture of Cassandra's behaviour, code, processes,
and organization. I hope you can forgive me if I don't get the point
perfectly right every time - but I am eager to learn and improve.

Thanks for your patience!

2016-12-04 19:33 GMT+01:00 Michael Shuler <mi...@pbandjelly.org>:

> Thanks for your thoughts on testing Apache Cassandra, I share them.
>
> I just wanted to note that the known_failure() annotations were recently
> removed from cassandra-dtest [0], due to lack of annotation removal when
> bugs fixed, and the internal webapp that we were using to parse has been
> broken for quite some time, with no fix in sight. The webapp was removed
> and we dropped all the known_failure() annotations.
>
> The test-failure JIRA label [1] is what we've been using during test run
> triage. Those tickets assigned to 'DS Test Eng' need figuring out if
> it's a test problem or Cassandra problem. Typically, the Unassigned
> tickets were determined to be possibly a Cassandra issue. If you enjoy
> test analysis and fixing them, please, jump in and analyze/fix them!
>
> [0] https://github.com/riptano/cassandra-dtest/pull/1399
> [1]
> https://issues.apache.org/jira/issues/?jql=project%20%
> 3D%20CASSANDRA%20AND%20labels%20%3D%20test-failure%20AND%
> 20resolution%20%3D%20unresolved
>
> --
> Kind regards,
> Michael Shuler
>
> On 12/04/2016 02:07 AM, Benjamin Roth wrote:
> > Sorry for jumping in so boldly before.
> >
> > TL;DR:
> >
> >    - I didn't mean to delete every flaky test just like that
> >    - To improve quality, each failing test has to be analyzed
> individually
> >    for release
> >
> > More thoughts on that:
> >
> > I had a closer look on some of the tests tagged as flaky and realized
> that
> > the situation here is more complex than I thought before.
> > Of course I didn't mean to delete all the flaky tests just like that.
> Maybe
> > I should rephrase it a bit to "If a (flaky) test can't really prove
> > something, then it is better not to have it". If a test does prove
> > something depends on its intention, its implementation and on how flaky
> it
> > really is and first of all: Why.
> >
> > These dtests are maybe blessing and curse at the same time. On the one
> hand
> > there are things you cannot test with a unit test, so you need them for
> > certain cases. On the other hand, dtest do not only test the desired
> case.
> >
> >    - They test the test environment (ccm, server hickups) and more or
> less
> >    all components of the CS daemon that are somehow involved as well.
> >    - This exposes the test to many more error sources than the bare test
> >    case and that creates of course a lot of "unreliability" in general
> and
> >    causes flaky results.
> >    - It makes it hard to pin down the failures to a certain cause like
> >       - Flaky test implementation
> >       - Flaky bugs in SUT
> >       - Unreliable test environment
> >    - Analyzing every failure is a pain. But a simple "retry and skip
> over"
> >    _may_ mask a real problem.
> >
> > => Difficult situation!
> >
> > From my own projects and non-CS experience I can tell:
> > Flaky tests give me a bad feeling and always leave a certain smell. I've
> > also just skipped them with that reason "Yes, I know it's flaky, I don't
> > really care about it". But it simply does not feel right.
> >
> > A real life example from another project:
> > Some weeks ago I wrote functional tests to test the integration of
> > SeaweedFS as a blob store backend in an image upload process. Test case
> was
> > roughly to upload an image, check if it exists on both old and new image
> > storage, delete it, check it again. The test existed for years. I simply
> > added some assertions to check the existance of the uploaded files on the
> > new storage. Funnyhow, I must have hit some corner case by that and from
> > that moment on, the test was flaky. Simple URL checks started to time out
> > from time to time. That made me really curios. To cut a long story short:
> > After having checked a whole lot of things, it turned out that not the
> test
> > was flaky and also not the shiny new storagy, it was the LVS
> loadbalancer.
> > The loadbalancer dropped connections reproducibly which happened more
> > likely with increasing concurrency. Finally we removed LVS completely and
> > replaced it by DNS-RR + VRRP, which completely solved the problem and the
> > tests ran happily ever after.
> >
> > Usually there is no pure black and white.
> >
> >    - Sometimes testing whole systems reveals problems you'd never
> >    have found without them
> >    - Sometimes they cause false alerts
> >    - Sometimes, skipping them masks real problems
> >    - Sometimes it sucks if a false alert blocks your release
> >
> > If you want to be really safe, you have to analyze every single failure
> and
> > decide of what kind this failure is or could be and if a retry will prove
> > sth or not. At least when you are at a release gate. I think this should
> be
> > worth it.
> >
> > There's a reason for this thread and there's a reason why people ask
> every
> > few days which CS version is production stable. Things have to improve
> over
> > time. This applies to test implementations, test environments, release
> > processes, and so on. One way to do this is to become a little bit
> stricter
> > (and a bit better) with every release. Making all tests pass at least
> once
> > before a release should be a rather low hanging fruit. Reducing the total
> > number of flaky tests or the "flaky-fail-rate" may be another future
> goal.
> >
> > Btw, the fact of the day:
> > I grepped through dtests and found out that roughly 11% of all tests are
> > flagged with "known_failure" and roughly 8% of all tests are flagged with
> > "flaky". Quite impressive.
> >
> >
> > 2016-12-03 15:52 GMT+01:00 Edward Capriolo <ed...@gmail.com>:
> >
> >> I think it is fair to run a flakey test again. If it is determine it
> flaked
> >> out due to a conflict with another test or something ephemeral in a long
> >> process it is not worth blocking a release.
> >>
> >> Just deleting it is probably not a good path.
> >>
> >> I actually enjoy writing fixing, tweeking, tests so pinge offline or
> >> whatever.
> >>
> >> On Saturday, December 3, 2016, Benjamin Roth <be...@jaumo.com>
> >> wrote:
> >>
> >>> Excuse me if I jump into an old thread, but from my experience, I have
> a
> >>> very clear opinion about situations like that as I encountered them
> >> before:
> >>>
> >>> Tests are there to give *certainty*.
> >>> *Would you like to pass a crossing with a green light if you cannot be
> >> sure
> >>> if green really means green?*
> >>> Do you want to rely on tests that are green, red, green, red? What if a
> >> red
> >>> is a real red and you missed it because you simply ignore it because
> it's
> >>> flaky?
> >>>
> >>> IMHO there are only 3 options how to deal with broken/red tests:
> >>> - Fix the underlying issue
> >>> - Fix the test
> >>> - Delete the test
> >>>
> >>> If I cannot trust a test, it is better not to have it at all. Otherwise
> >>> people are staring at red lights and start to drive.
> >>>
> >>> This causes:
> >>> - Uncertainty
> >>> - Loss of trust
> >>> - Confusion
> >>> - More work
> >>> - *Less quality*
> >>>
> >>> Just as an example:
> >>> Few days ago I created a patch. Then I ran the utest and 1 test failed.
> >>> Hmmm, did I break it? I had to check it twice by checking out the
> former
> >>> state, running the tests again just to recognize that it wasn't me who
> >> made
> >>> it fail. That's annoying.
> >>>
> >>> Sorry again, I'm rather new here but what I just read reminded me much
> of
> >>> situations I have been in years ago.
> >>> So: +1, John
> >>>
> >>> 2016-12-03 7:48 GMT+01:00 sankalp kohli <kohlisankalp@gmail.com
> >>> <javascript:;>>:
> >>>
> >>>> Hi,
> >>>>     I dont see any any update on this thread. We will go ahead and
> make
> >>>> Dtest a blocker for cutting releasing for anything after 3.10.
> >>>>
> >>>> Please respond if anyone has an objection to this.
> >>>>
> >>>> Thanks,
> >>>> Sankalp
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Nov 21, 2016 at 11:57 AM, Josh McKenzie <jmckenzie@apache.org
> >>> <javascript:;>>
> >>>> wrote:
> >>>>
> >>>>> Caveat: I'm strongly in favor of us blocking a release on a non-green
> >>>> test
> >>>>> board of either utest or dtest.
> >>>>>
> >>>>>
> >>>>>> put something in prod which is known to be broken in obvious ways
> >>>>>
> >>>>> In my experience the majority of fixes are actually shoring up
> >>>> low-quality
> >>>>> / flaky tests or fixing tests that have been invalidated by a commit
> >>> but
> >>>> do
> >>>>> not indicate an underlying bug. Inferring "tests are failing so we
> >> know
> >>>>> we're asking people to put things in prod that are broken in obvious
> >>>> ways"
> >>>>> is hyperbolic. A more correct statement would be: "Tests are failing
> >> so
> >>>> we
> >>>>> know we're shipping with a test that's failing" which is not helpful.
> >>>>>
> >>>>> Our signal to noise ratio with tests has been very poor historically;
> >>>> we've
> >>>>> been trying to address that through aggressive triage and assigning
> >> out
> >>>>> test failures however we need far more active and widespread
> >> community
> >>>>> involvement if we want to truly *fix* this problem long-term.
> >>>>>
> >>>>> On Mon, Nov 21, 2016 at 2:33 PM, Jonathan Haddad <jon@jonhaddad.com
> >>> <javascript:;>>
> >>>>> wrote:
> >>>>>
> >>>>>> +1.  Kind of silly to put advise people to put something in prod
> >>> which
> >>>> is
> >>>>>> known to be broken in obvious ways
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 11:31 AM sankalp kohli <
> >>> kohlisankalp@gmail.com <javascript:;>
> >>>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>     We should not cut a releases if Dtest are not passing. I
> >> won't
> >>>>> block
> >>>>>>> 3.10 on this since we are just discussing this.
> >>>>>>>
> >>>>>>> Please provide feedback on this.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Sankalp
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Benjamin Roth
> >>> Prokurist
> >>>
> >>> Jaumo GmbH · www.jaumo.com
> >>> Wehrstraße 46 · 73035 Göppingen · Germany
> >>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> >>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
> >>>
> >>
> >>
> >> --
> >> Sorry this was sent from mobile. Will do less grammar and spell check
> than
> >> usual.
> >>
> >
> >
> >
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

Re: Failed Dtest will block cutting releases

Posted by Michael Shuler <mi...@pbandjelly.org>.
Thanks for your thoughts on testing Apache Cassandra; I share them.

I just wanted to note that the known_failure() annotations were recently
removed from cassandra-dtest [0], because annotations were not being removed
when bugs were fixed, and because the internal webapp that we were using to
parse them has been broken for quite some time, with no fix in sight. The
webapp was removed and we dropped all the known_failure() annotations.

The test-failure JIRA label [1] is what we've been using during test run
triage. The tickets assigned to 'DS Test Eng' still need to be figured out:
is it a test problem or a Cassandra problem? Typically, the Unassigned
tickets were determined to be possibly a Cassandra issue. If you enjoy test
analysis and fixing tests, please jump in and analyze/fix them!

[0] https://github.com/riptano/cassandra-dtest/pull/1399
[1]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20labels%20%3D%20test-failure%20AND%20resolution%20%3D%20unresolved
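
For anyone who never saw them: roughly, those annotations were decorators
that tied a flaky or known-broken test to a tracking ticket. The sketch
below is illustrative only - the name, signature, and ticket number are
assumptions, not the actual cassandra-dtest decorator:

    # Illustrative sketch only, not the real cassandra-dtest decorator.
    # It keeps a known-flaky/known-broken test running while attaching the
    # tracking ticket to any failure, so triage stays easy.
    import functools
    import unittest

    def known_failure(jira_url, flaky=False, notes=''):
        def decorator(test_func):
            @functools.wraps(test_func)
            def wrapper(*args, **kwargs):
                try:
                    return test_func(*args, **kwargs)
                except unittest.SkipTest:
                    raise
                except Exception as exc:
                    raise AssertionError(
                        '%s [known%s failure, see %s. %s]'
                        % (exc, ' flaky' if flaky else '', jira_url, notes))
            return wrapper
        return decorator

    # Hypothetical usage:
    # @known_failure('https://issues.apache.org/jira/browse/CASSANDRA-NNNNN',
    #                flaky=True, notes='times out on slow CI nodes')
    # def test_decommission(self): ...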

-- 
Kind regards,
Michael Shuler

On 12/04/2016 02:07 AM, Benjamin Roth wrote:
> Sorry for jumping in so boldly before.
> 
> TL;DR:
> 
>    - I didn't mean to delete every flaky test just like that
>    - To improve quality, each failing test has to be analyzed individually
>    for release
> 
> More thoughts on that:
> 
> I had a closer look on some of the tests tagged as flaky and realized that
> the situation here is more complex than I thought before.
> Of course I didn't mean to delete all the flaky tests just like that. Maybe
> I should rephrase it a bit to "If a (flaky) test can't really prove
> something, then it is better not to have it". If a test does prove
> something depends on its intention, its implementation and on how flaky it
> really is and first of all: Why.
> 
> These dtests are maybe blessing and curse at the same time. On the one hand
> there are things you cannot test with a unit test, so you need them for
> certain cases. On the other hand, dtest do not only test the desired case.
> 
>    - They test the test environment (ccm, server hickups) and more or less
>    all components of the CS daemon that are somehow involved as well.
>    - This exposes the test to many more error sources than the bare test
>    case and that creates of course a lot of "unreliability" in general and
>    causes flaky results.
>    - It makes it hard to pin down the failures to a certain cause like
>       - Flaky test implementation
>       - Flaky bugs in SUT
>       - Unreliable test environment
>    - Analyzing every failure is a pain. But a simple "retry and skip over"
>    _may_ mask a real problem.
> 
> => Difficult situation!
> 
> From my own projects and non-CS experience I can tell:
> Flaky tests give me a bad feeling and always leave a certain smell. I've
> also just skipped them with that reason "Yes, I know it's flaky, I don't
> really care about it". But it simply does not feel right.
> 
> A real life example from another project:
> Some weeks ago I wrote functional tests to test the integration of
> SeaweedFS as a blob store backend in an image upload process. Test case was
> roughly to upload an image, check if it exists on both old and new image
> storage, delete it, check it again. The test existed for years. I simply
> added some assertions to check the existance of the uploaded files on the
> new storage. Funnyhow, I must have hit some corner case by that and from
> that moment on, the test was flaky. Simple URL checks started to time out
> from time to time. That made me really curios. To cut a long story short:
> After having checked a whole lot of things, it turned out that not the test
> was flaky and also not the shiny new storagy, it was the LVS loadbalancer.
> The loadbalancer dropped connections reproducibly which happened more
> likely with increasing concurrency. Finally we removed LVS completely and
> replaced it by DNS-RR + VRRP, which completely solved the problem and the
> tests ran happily ever after.
> 
> Usually there is no pure black and white.
> 
>    - Sometimes testing whole systems reveals problems you'd never
>    have found without them
>    - Sometimes they cause false alerts
>    - Sometimes, skipping them masks real problems
>    - Sometimes it sucks if a false alert blocks your release
> 
> If you want to be really safe, you have to analyze every single failure and
> decide of what kind this failure is or could be and if a retry will prove
> sth or not. At least when you are at a release gate. I think this should be
> worth it.
> 
> There's a reason for this thread and there's a reason why people ask every
> few days which CS version is production stable. Things have to improve over
> time. This applies to test implementations, test environments, release
> processes, and so on. One way to do this is to become a little bit stricter
> (and a bit better) with every release. Making all tests pass at least once
> before a release should be a rather low hanging fruit. Reducing the total
> number of flaky tests or the "flaky-fail-rate" may be another future goal.
> 
> Btw, the fact of the day:
> I grepped through dtests and found out that roughly 11% of all tests are
> flagged with "known_failure" and roughly 8% of all tests are flagged with
> "flaky". Quite impressive.
> 
> 
> 2016-12-03 15:52 GMT+01:00 Edward Capriolo <ed...@gmail.com>:
> 
>> I think it is fair to run a flakey test again. If it is determine it flaked
>> out due to a conflict with another test or something ephemeral in a long
>> process it is not worth blocking a release.
>>
>> Just deleting it is probably not a good path.
>>
>> I actually enjoy writing fixing, tweeking, tests so pinge offline or
>> whatever.
>>
>> On Saturday, December 3, 2016, Benjamin Roth <be...@jaumo.com>
>> wrote:
>>
>>> Excuse me if I jump into an old thread, but from my experience, I have a
>>> very clear opinion about situations like that as I encountered them
>> before:
>>>
>>> Tests are there to give *certainty*.
>>> *Would you like to pass a crossing with a green light if you cannot be
>> sure
>>> if green really means green?*
>>> Do you want to rely on tests that are green, red, green, red? What if a
>> red
>>> is a real red and you missed it because you simply ignore it because it's
>>> flaky?
>>>
>>> IMHO there are only 3 options how to deal with broken/red tests:
>>> - Fix the underlying issue
>>> - Fix the test
>>> - Delete the test
>>>
>>> If I cannot trust a test, it is better not to have it at all. Otherwise
>>> people are staring at red lights and start to drive.
>>>
>>> This causes:
>>> - Uncertainty
>>> - Loss of trust
>>> - Confusion
>>> - More work
>>> - *Less quality*
>>>
>>> Just as an example:
>>> Few days ago I created a patch. Then I ran the utest and 1 test failed.
>>> Hmmm, did I break it? I had to check it twice by checking out the former
>>> state, running the tests again just to recognize that it wasn't me who
>> made
>>> it fail. That's annoying.
>>>
>>> Sorry again, I'm rather new here but what I just read reminded me much of
>>> situations I have been in years ago.
>>> So: +1, John
>>>
>>> 2016-12-03 7:48 GMT+01:00 sankalp kohli <kohlisankalp@gmail.com
>>> <javascript:;>>:
>>>
>>>> Hi,
>>>>     I dont see any any update on this thread. We will go ahead and make
>>>> Dtest a blocker for cutting releasing for anything after 3.10.
>>>>
>>>> Please respond if anyone has an objection to this.
>>>>
>>>> Thanks,
>>>> Sankalp
>>>>
>>>>
>>>>
>>>> On Mon, Nov 21, 2016 at 11:57 AM, Josh McKenzie <jmckenzie@apache.org
>>> <javascript:;>>
>>>> wrote:
>>>>
>>>>> Caveat: I'm strongly in favor of us blocking a release on a non-green
>>>> test
>>>>> board of either utest or dtest.
>>>>>
>>>>>
>>>>>> put something in prod which is known to be broken in obvious ways
>>>>>
>>>>> In my experience the majority of fixes are actually shoring up
>>>> low-quality
>>>>> / flaky tests or fixing tests that have been invalidated by a commit
>>> but
>>>> do
>>>>> not indicate an underlying bug. Inferring "tests are failing so we
>> know
>>>>> we're asking people to put things in prod that are broken in obvious
>>>> ways"
>>>>> is hyperbolic. A more correct statement would be: "Tests are failing
>> so
>>>> we
>>>>> know we're shipping with a test that's failing" which is not helpful.
>>>>>
>>>>> Our signal to noise ratio with tests has been very poor historically;
>>>> we've
>>>>> been trying to address that through aggressive triage and assigning
>> out
>>>>> test failures however we need far more active and widespread
>> community
>>>>> involvement if we want to truly *fix* this problem long-term.
>>>>>
>>>>> On Mon, Nov 21, 2016 at 2:33 PM, Jonathan Haddad <jon@jonhaddad.com
>>> <javascript:;>>
>>>>> wrote:
>>>>>
>>>>>> +1.  Kind of silly to put advise people to put something in prod
>>> which
>>>> is
>>>>>> known to be broken in obvious ways
>>>>>>
>>>>>> On Mon, Nov 21, 2016 at 11:31 AM sankalp kohli <
>>> kohlisankalp@gmail.com <javascript:;>
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>     We should not cut a releases if Dtest are not passing. I
>> won't
>>>>> block
>>>>>>> 3.10 on this since we are just discussing this.
>>>>>>>
>>>>>>> Please provide feedback on this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sankalp
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
> >>> Jaumo GmbH · www.jaumo.com
> >>> Wehrstraße 46 · 73035 Göppingen · Germany
> >>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> >>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check than
>> usual.
>>
> 
> 
> 


Re: Failed Dtest will block cutting releases

Posted by Benjamin Roth <be...@jaumo.com>.
Sorry for jumping in so boldly before.

TL;DR:

   - I didn't mean to delete every flaky test just like that
   - To improve quality, each failing test has to be analyzed individually
   for release

More thoughts on that:

I had a closer look at some of the tests tagged as flaky and realized that
the situation here is more complex than I thought before.
Of course I didn't mean to delete all the flaky tests just like that. Maybe
I should rephrase it a bit: "If a (flaky) test can't really prove
something, then it is better not to have it." Whether a test does prove
something depends on its intention, its implementation, on how flaky it
really is, and first of all: why.

These dtests are maybe a blessing and a curse at the same time. On the one
hand there are things you cannot test with a unit test, so you need them for
certain cases. On the other hand, dtests do not only test the desired case.

   - They test the test environment (ccm, server hiccups) and more or less
   all components of the CS daemon that are somehow involved as well (see
   the rough ccm sketch after this list).
   - This exposes the test to many more error sources than the bare test
   case, which of course creates a lot of "unreliability" in general and
   causes flaky results.
   - It makes it hard to pin down failures to a certain cause, like
      - Flaky test implementation
      - Flaky bugs in the SUT
      - Unreliable test environment
   - Analyzing every failure is a pain. But a simple "retry and skip over"
   _may_ mask a real problem.
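
As an illustration of how much environment a single dtest drags in, here is
a rough sketch that drives ccm from Python via its CLI (the cluster name and
version are just examples, and a real dtest goes through ccmlib rather than
shelling out):

    # Rough illustration only: every step below is environment, not
    # Cassandra logic itself, and any of them can hiccup and make a test
    # look flaky.
    import subprocess

    def ccm(args):
        cmd = ['ccm'] + args.split()
        print('$ ' + ' '.join(cmd))
        subprocess.check_call(cmd)

    ccm('create flaky_repro -v 3.0.10 -n 3 -s')  # fetch C*, start 3 nodes
    try:
        ccm('node1 stop')                        # node lifecycle via ccm
        ccm('node1 start')
    finally:
        ccm('remove flaky_repro')                # tear the cluster down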

=> Difficult situation!

From my own projects and non-CS experience I can tell:
Flaky tests give me a bad feeling and always leave a certain smell. I've
also just skipped them with the reasoning "Yes, I know it's flaky, I don't
really care about it." But it simply does not feel right.

A real-life example from another project:
Some weeks ago I wrote functional tests to test the integration of
SeaweedFS as a blob store backend in an image upload process. The test case
was roughly: upload an image, check that it exists on both the old and the
new image storage, delete it, check again. The test had existed for years. I
simply added some assertions to check the existence of the uploaded files on
the new storage. Funnily enough, I must have hit some corner case with that,
and from that moment on the test was flaky. Simple URL checks started to
time out from time to time. That made me really curious. To cut a long story
short: after checking a whole lot of things, it turned out that neither the
test nor the shiny new storage was flaky - it was the LVS loadbalancer. The
loadbalancer dropped connections reproducibly, which happened more often
with increasing concurrency. Finally we removed LVS completely and replaced
it with DNS-RR + VRRP, which completely solved the problem, and the tests
ran happily ever after.

Usually there is no pure black and white.

   - Sometimes testing whole systems reveals problems you'd never
   have found without them
   - Sometimes they cause false alerts
   - Sometimes, skipping them masks real problems
   - Sometimes it sucks if a false alert blocks your release

If you want to be really safe, you have to analyze every single failure and
decide what kind of failure it is or could be, and whether a retry would
prove something or not. At least when you are at a release gate. I think
this should be worth it.

There's a reason for this thread, and there's a reason why people ask every
few days which CS version is production-stable. Things have to improve over
time. This applies to test implementations, test environments, release
processes, and so on. One way to do this is to become a little bit stricter
(and a bit better) with every release. Making all tests pass at least once
before a release should be rather low-hanging fruit. Reducing the total
number of flaky tests or the "flaky-fail-rate" may be another future goal.

Btw, the fact of the day:
I grepped through dtests and found out that roughly 11% of all tests are
flagged with "known_failure" and roughly 8% of all tests are flagged with
"flaky". Quite impressive.


2016-12-03 15:52 GMT+01:00 Edward Capriolo <ed...@gmail.com>:

> I think it is fair to run a flakey test again. If it is determine it flaked
> out due to a conflict with another test or something ephemeral in a long
> process it is not worth blocking a release.
>
> Just deleting it is probably not a good path.
>
> I actually enjoy writing fixing, tweeking, tests so pinge offline or
> whatever.
>
> On Saturday, December 3, 2016, Benjamin Roth <be...@jaumo.com>
> wrote:
>
> > Excuse me if I jump into an old thread, but from my experience, I have a
> > very clear opinion about situations like that as I encountered them
> before:
> >
> > Tests are there to give *certainty*.
> > *Would you like to pass a crossing with a green light if you cannot be
> sure
> > if green really means green?*
> > Do you want to rely on tests that are green, red, green, red? What if a
> red
> > is a real red and you missed it because you simply ignore it because it's
> > flaky?
> >
> > IMHO there are only 3 options how to deal with broken/red tests:
> > - Fix the underlying issue
> > - Fix the test
> > - Delete the test
> >
> > If I cannot trust a test, it is better not to have it at all. Otherwise
> > people are staring at red lights and start to drive.
> >
> > This causes:
> > - Uncertainty
> > - Loss of trust
> > - Confusion
> > - More work
> > - *Less quality*
> >
> > Just as an example:
> > Few days ago I created a patch. Then I ran the utest and 1 test failed.
> > Hmmm, did I break it? I had to check it twice by checking out the former
> > state, running the tests again just to recognize that it wasn't me who
> made
> > it fail. That's annoying.
> >
> > Sorry again, I'm rather new here but what I just read reminded me much of
> > situations I have been in years ago.
> > So: +1, John
> >
> > 2016-12-03 7:48 GMT+01:00 sankalp kohli <kohlisankalp@gmail.com
> > <javascript:;>>:
> >
> > > Hi,
> > >     I dont see any any update on this thread. We will go ahead and make
> > > Dtest a blocker for cutting releasing for anything after 3.10.
> > >
> > > Please respond if anyone has an objection to this.
> > >
> > > Thanks,
> > > Sankalp
> > >
> > >
> > >
> > > On Mon, Nov 21, 2016 at 11:57 AM, Josh McKenzie <jmckenzie@apache.org
> > <javascript:;>>
> > > wrote:
> > >
> > > > Caveat: I'm strongly in favor of us blocking a release on a non-green
> > > test
> > > > board of either utest or dtest.
> > > >
> > > >
> > > > > put something in prod which is known to be broken in obvious ways
> > > >
> > > > In my experience the majority of fixes are actually shoring up
> > > low-quality
> > > > / flaky tests or fixing tests that have been invalidated by a commit
> > but
> > > do
> > > > not indicate an underlying bug. Inferring "tests are failing so we
> know
> > > > we're asking people to put things in prod that are broken in obvious
> > > ways"
> > > > is hyperbolic. A more correct statement would be: "Tests are failing
> so
> > > we
> > > > know we're shipping with a test that's failing" which is not helpful.
> > > >
> > > > Our signal to noise ratio with tests has been very poor historically;
> > > we've
> > > > been trying to address that through aggressive triage and assigning
> out
> > > > test failures however we need far more active and widespread
> community
> > > > involvement if we want to truly *fix* this problem long-term.
> > > >
> > > > On Mon, Nov 21, 2016 at 2:33 PM, Jonathan Haddad <jon@jonhaddad.com
> > <javascript:;>>
> > > > wrote:
> > > >
> > > > > +1.  Kind of silly to put advise people to put something in prod
> > which
> > > is
> > > > > known to be broken in obvious ways
> > > > >
> > > > > On Mon, Nov 21, 2016 at 11:31 AM sankalp kohli <
> > kohlisankalp@gmail.com <javascript:;>
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >     We should not cut a releases if Dtest are not passing. I
> won't
> > > > block
> > > > > > 3.10 on this since we are just discussing this.
> > > > > >
> > > > > > Please provide feedback on this.
> > > > > >
> > > > > > Thanks,
> > > > > > Sankalp
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Benjamin Roth
> > Prokurist
> >
> > Jaumo GmbH · www.jaumo.com
> > Wehrstraße 46 · 73035 Göppingen · Germany
> > Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> > AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
> >
>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

Re: Failed Dtest will block cutting releases

Posted by Edward Capriolo <ed...@gmail.com>.
I think it is fair to run a flaky test again. If it is determined that it
flaked out due to a conflict with another test or something ephemeral in a
long-running process, it is not worth blocking a release.

Just deleting it is probably not a good path.

I actually enjoy writing, fixing, and tweaking tests, so ping me offline or
whatever.
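
For what it's worth, a minimal sketch of the "retry once before failing"
idea as a plain Python decorator - illustrative only, not an actual
dtest/nose plugin:

    import functools

    def retry_on_flake(retries=1, allowed=(AssertionError,)):
        """Re-run a test up to `retries` extra times before failing it.

        Use sparingly: the first failure is still printed, so a real,
        timing-dependent bug is not papered over silently.
        """
        def decorator(test_func):
            @functools.wraps(test_func)
            def wrapper(*args, **kwargs):
                attempts = retries + 1
                for attempt in range(1, attempts + 1):
                    try:
                        return test_func(*args, **kwargs)
                    except allowed as exc:
                        if attempt == attempts:
                            raise
                        print('flaky failure in %s (%s), retry %d/%d'
                              % (test_func.__name__, exc, attempt, retries))
            return wrapper
        return decorator

    # Hypothetical usage:
    # @retry_on_flake(retries=1)
    # def test_bootstrap_under_load(self): ...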

On Saturday, December 3, 2016, Benjamin Roth <be...@jaumo.com>
wrote:

> Excuse me if I jump into an old thread, but from my experience, I have a
> very clear opinion about situations like that as I encountered them before:
>
> Tests are there to give *certainty*.
> *Would you like to pass a crossing with a green light if you cannot be sure
> if green really means green?*
> Do you want to rely on tests that are green, red, green, red? What if a red
> is a real red and you missed it because you simply ignore it because it's
> flaky?
>
> IMHO there are only 3 options how to deal with broken/red tests:
> - Fix the underlying issue
> - Fix the test
> - Delete the test
>
> If I cannot trust a test, it is better not to have it at all. Otherwise
> people are staring at red lights and start to drive.
>
> This causes:
> - Uncertainty
> - Loss of trust
> - Confusion
> - More work
> - *Less quality*
>
> Just as an example:
> Few days ago I created a patch. Then I ran the utest and 1 test failed.
> Hmmm, did I break it? I had to check it twice by checking out the former
> state, running the tests again just to recognize that it wasn't me who made
> it fail. That's annoying.
>
> Sorry again, I'm rather new here but what I just read reminded me much of
> situations I have been in years ago.
> So: +1, John
>
> 2016-12-03 7:48 GMT+01:00 sankalp kohli <kohlisankalp@gmail.com
> <javascript:;>>:
>
> > Hi,
> >     I dont see any any update on this thread. We will go ahead and make
> > Dtest a blocker for cutting releasing for anything after 3.10.
> >
> > Please respond if anyone has an objection to this.
> >
> > Thanks,
> > Sankalp
> >
> >
> >
> > On Mon, Nov 21, 2016 at 11:57 AM, Josh McKenzie <jmckenzie@apache.org
> <javascript:;>>
> > wrote:
> >
> > > Caveat: I'm strongly in favor of us blocking a release on a non-green
> > test
> > > board of either utest or dtest.
> > >
> > >
> > > > put something in prod which is known to be broken in obvious ways
> > >
> > > In my experience the majority of fixes are actually shoring up
> > low-quality
> > > / flaky tests or fixing tests that have been invalidated by a commit
> but
> > do
> > > not indicate an underlying bug. Inferring "tests are failing so we know
> > > we're asking people to put things in prod that are broken in obvious
> > ways"
> > > is hyperbolic. A more correct statement would be: "Tests are failing so
> > we
> > > know we're shipping with a test that's failing" which is not helpful.
> > >
> > > Our signal to noise ratio with tests has been very poor historically;
> > we've
> > > been trying to address that through aggressive triage and assigning out
> > > test failures however we need far more active and widespread community
> > > involvement if we want to truly *fix* this problem long-term.
> > >
> > > On Mon, Nov 21, 2016 at 2:33 PM, Jonathan Haddad <jon@jonhaddad.com
> <javascript:;>>
> > > wrote:
> > >
> > > > +1.  Kind of silly to put advise people to put something in prod
> which
> > is
> > > > known to be broken in obvious ways
> > > >
> > > > On Mon, Nov 21, 2016 at 11:31 AM sankalp kohli <
> kohlisankalp@gmail.com <javascript:;>
> > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >     We should not cut a releases if Dtest are not passing. I won't
> > > block
> > > > > 3.10 on this since we are just discussing this.
> > > > >
> > > > > Please provide feedback on this.
> > > > >
> > > > > Thanks,
> > > > > Sankalp
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.

Re: Failed Dtest will block cutting releases

Posted by Benjamin Roth <be...@jaumo.com>.
Excuse me for jumping into an old thread, but from my experience I have a
very clear opinion about situations like this, as I have encountered them
before:

Tests are there to give *certainty*.
*Would you like to pass a crossing with a green light if you cannot be sure
if green really means green?*
Do you want to rely on tests that are green, red, green, red? What if a red
is a real red and you missed it because you simply ignored it because it's
flaky?

IMHO there are only 3 options for dealing with broken/red tests:
- Fix the underlying issue
- Fix the test
- Delete the test

If I cannot trust a test, it is better not to have it at all. Otherwise
people stare at red lights and start to drive anyway.

This causes:
- Uncertainty
- Loss of trust
- Confusion
- More work
- *Less quality*

Just as an example:
A few days ago I created a patch. Then I ran the utests and 1 test failed.
Hmmm, did I break it? I had to check twice, by checking out the former
state and running the tests again, just to find out that it wasn't me who
made it fail. That's annoying.

Sorry again, I'm rather new here, but what I just read reminded me a lot of
situations I have been in years ago.
So: +1, John

2016-12-03 7:48 GMT+01:00 sankalp kohli <ko...@gmail.com>:

> Hi,
>     I dont see any any update on this thread. We will go ahead and make
> Dtest a blocker for cutting releasing for anything after 3.10.
>
> Please respond if anyone has an objection to this.
>
> Thanks,
> Sankalp
>
>
>
> On Mon, Nov 21, 2016 at 11:57 AM, Josh McKenzie <jm...@apache.org>
> wrote:
>
> > Caveat: I'm strongly in favor of us blocking a release on a non-green
> test
> > board of either utest or dtest.
> >
> >
> > > put something in prod which is known to be broken in obvious ways
> >
> > In my experience the majority of fixes are actually shoring up
> low-quality
> > / flaky tests or fixing tests that have been invalidated by a commit but
> do
> > not indicate an underlying bug. Inferring "tests are failing so we know
> > we're asking people to put things in prod that are broken in obvious
> ways"
> > is hyperbolic. A more correct statement would be: "Tests are failing so
> we
> > know we're shipping with a test that's failing" which is not helpful.
> >
> > Our signal to noise ratio with tests has been very poor historically;
> we've
> > been trying to address that through aggressive triage and assigning out
> > test failures however we need far more active and widespread community
> > involvement if we want to truly *fix* this problem long-term.
> >
> > On Mon, Nov 21, 2016 at 2:33 PM, Jonathan Haddad <jo...@jonhaddad.com>
> > wrote:
> >
> > > +1.  Kind of silly to put advise people to put something in prod which
> is
> > > known to be broken in obvious ways
> > >
> > > On Mon, Nov 21, 2016 at 11:31 AM sankalp kohli <kohlisankalp@gmail.com
> >
> > > wrote:
> > >
> > > > Hi,
> > > >     We should not cut a releases if Dtest are not passing. I won't
> > block
> > > > 3.10 on this since we are just discussing this.
> > > >
> > > > Please provide feedback on this.
> > > >
> > > > Thanks,
> > > > Sankalp
> > > >
> > >
> >
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer