Posted to dev@subversion.apache.org by Nathan Hartman <ha...@gmail.com> on 2020/01/03 17:49:48 UTC

Re: [#4840] Merge assertion failure in svn_sort__array_insert

On Mon, Dec 30, 2019 at 12:28 PM Julian Foad <ju...@apache.org> wrote:
> I have added thorough random-input testing for svn_rangelist_merge2().

I assume that's the one added in r1872121,
test_mergeinfo_merge_random_non_validated_inputs().

I got a fail on it.

Quoth the function's docstring: "Unlike the tests with valid inputs,
this test expects many assertion failures.  We don't care about those.
All we care about is that it does not crash."

It crashed:

[[[
svn_tests: E200006: Test crashed (run in debugger with '--allow-segfaults')
FAIL:  mergeinfo-test 24: test rangelist merge random non-validated inputs
]]]

I know the code contains unfixed bugs that this test is meant to
expose. Sadly, the printf is commented out, so I don't know what the
inputs were:

+            /*
+            printf("testcase FAIL: %s / %s\n",
+                   mergeinfo_to_string_debug(mx, iterpool),
+                   mergeinfo_to_string_debug(my, iterpool));
+            svn_handle_error(err, stdout, FALSE);
+            */

Nathan

Re: [#4840] Merge assertion failure in svn_sort__array_insert

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian, would you like to answer the question in the second quoted paragraph?

Daniel Shahaf wrote on Wed, Jan 08, 2020 at 14:24:39 +0000:
> Julian Foad wrote on Wed, Jan 08, 2020 at 10:14:42 +0000:
> > What?
> 
> Using a PRNG makes the test code harder to read and to maintain, makes it
> more difficult to diagnose FAILs, and increases the risk of a bug in the test
> code.  What benefits are we getting that offset these risks and costs?
> 
> The way in which the PRNG algorithm is used is outside the PRNG's API promises.
> PRNGs make *no* promises on their outputs when used with a fixed seed.  In
> light of this, what reason do we have to believe that the test achieves good coverage?
> 
> It's not necessary to use a PRNG to test a large/diverse set of inputs.  You
> can achieve that with plain old combinatorics (e.g., define 300 rangelist variables
> and test various combinations of them).  I'd actually expect that to have
> better coverage than a PRNG with a fixed seed.
> 
> By default, I would have expected the tests to be written without a PRNG, and I
> don't understand why the way they are written is better.
> 
> Cheers,
> 
> Daniel

Re: [#4840] Merge assertion failure in svn_sort__array_insert

Posted by Julian Foad <ju...@apache.org>.
Daniel Shahaf wrote:
>>> PRNG makes the test code harder to read and to maintain, makes it
>>> more difficult to diagnose FAILs, and increases the risk of a bug

PRNG use makes for reasonably compact and readable test generation, and
makes failures easy enough to reproduce and diagnose, IMO.

On the contrary, if we use static sets of cases or cases generated by 
simpler combinations of static sets of partial inputs, then it will be 
harder to read and understand whether the coverage is complete enough, 
thus increasing the risk of a bug (omission) in the tests.

- Julian

Re: [#4840] Merge assertion failure in svn_sort__array_insert

Posted by Nathan Hartman <ha...@gmail.com>.
On Fri, Jan 17, 2020 at 10:36 AM Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Julian and I discussed this on IRC today; the conclusion is:
>
> - With the behaviour of the test with seed=0 having been manually
>   reviewed, we're satisfied that the test achieves adequate coverage.
>
> - The properties that the magic numbers were chosen to satisfy will be
>   documented in code comments.
>
> - Julian has further reviewed the code, leading to, e.g., r1872919.

For convenience, a link to an archive:
https://colabti.org/irclogger/irclogger_log/svn-dev?date=2020-01-17

Thanks,
Nathan

Re: [#4840] Merge assertion failure in svn_sort__array_insert

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Thu, 16 Jan 2020 09:18 +00:00:
> I inspected the debug prints whizzing by, and noted that they included 
> each of the different kinds of input cases I expected (e.g. empty 
> ranges, reversed ranges, duplicate ranges, overlapping ranges, etc.) and 
> that the results included examples of the failure modes I expected.
> 
> Based on that observation, I altered the parameters such as the sizes of 
> the generated revision ranges and the repetition counts, until it 
> provided good coverage without excessive repetition. I aimed for "a few" 
> (more than one, fewer than hundreds) occurrences of the same failure mode.
> 
> That doesn't guarantee coverage of every possible edge case combination, 
> of course, but it gives me confidence that it's an effective test. I 
> admit it's an issue that that is not apparent to an observer. However, 
> anyone needing to debug again can enable the debug prints and see it.

Julian and I discussed this on IRC today; the conclusion is:

- With the behaviour of the test with seed=0 having been manually
  reviewed, we're satisfied that the test achieves adequate coverage.

- The properties that the magic numbers were chosen to satisfy will be
  documented in code comments.

- Julian has further reviewed the code, leading to, e.g., r1872919.

Thanks, Julian.

Daniel

Re: [#4840] Merge assertion failure in svn_sort__array_insert

Posted by Julian Foad <ju...@apache.org>.
Nathan Hartman wrote:
> On Wed, Jan 8, 2020 at 9:24 AM Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > Julian Foad wrote on Wed, Jan 08, 2020 at 10:14:42 +0000:
> > > The suggestion was that we should prefer the regression test suite to be
> > > deterministic, [...]
> > But unlike nearly all of the test suite, we don't actually know what the
> > sequence of inputs _is_, 

When developing/debugging, I made the test print each set of inputs, 
sometimes for every test case and sometimes just when a case failed. 
Some of that debug print code is still there, commented out.

> > and we have no reason to believe it provides good
> > coverage of edge cases and rare combinations of inputs. For all we know, the
> > 90000 iterations could all be testing the exact same, most common codepath over
> > and over again.

I inspected the debug prints whizzing by, and noted that they included 
each of the different kinds of input cases I expected (e.g. empty 
ranges, reversed ranges, duplicate ranges, overlapping ranges, etc.) and 
that the results included examples of the failure modes I expected.

Based on that observation, I altered the parameters such as the sizes of 
the generated revision ranges and the repetition counts, until it 
provided good coverage without excessive repetition. I aimed for "a few" 
(more than one, fewer than hundreds) occurrences of the same failure mode.

That doesn't guarantee coverage of every possible edge case combination, 
of course, but it gives me confidence that it's an effective test. I 
admit it's an issue that that is not apparent to an observer. However, 
anyone needing to debug again can enable the debug prints and see it.

> My earlier thinking (though I didn't spell it out) was to move the
> PRNG code to a separate program, use it to generate a self-documenting
> C array of unique test cases [...] If we discover new failing inputs in the
> future, they can be added to the test suite. Then there is no guessing
> about what inputs are being tested.

Capturing and storing generated cases into a fixed explicit list could make 
some tasks slightly easier, perhaps tasks such as reaching the required 
state in a debugger, or extracting a particular case into a separate test.

What cases would you extract?  One example of each currently known 
failure mode (and success mode) is probably useful; that's a start.  I 
already made a separate test case for one failure mode, and it would be 
reasonable to do so for the others.

But what about capturing a sufficient set of input combinations that 
should be tested because they *might* in future result in different 
behaviours?  That's hard.  We can try to guess some cases that look like 
they would be worth testing, which is what we do when writing 
traditional fixed tests.  But the idea of "random" input generation is 
to overcome the tendency to fail to guess in advance all the 
combinations that might reveal a problem.  Pseudo-random inputs are good 
at that.

A good, dedicated random-testing infrastructure would make it easy to 
generate a set of fixed tests based on observing random testing combined 
with, say, code coverage information.  If we had this test 
infrastructure, sure, but if we don't, I'm not seeing why doing that 
extraction and analysis manually would be worth the effort.

The point is, we get a wide coverage of input combinations.  The 
coverage is neither theoretically nor empirically proven to be complete, 
but I demonstrated it is wide enough to be useful.

> > Using a PRNG makes the test code harder to read and to maintain, makes it
> > more difficult to diagnose FAILs, and increases the risk of a bug in the test
> > code. What benefits are we getting that offset these risks and costs?
> > 
> > The way in which the PRNG algorithm is used is outside the PRNG's API promises.
> > PRNGs make *no* promises on their outputs when used with a fixed seed. In
> > light of this, what reason do we have to believe that the test achieves good coverage?

Used for test case generation, the PRNG only needs to generate a (fixed) 
sequence of numbers with no significant correlation to the pattern 
of usage in generating test cases.  We could reasonably call it a 
"useful sequence generator" instead of "PRNG", to remove the stigma 
attached to falling short of strong randomness guarantees.
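
For illustration, a minimal sketch of such a generator (a hypothetical
stand-in, not the actual code in mergeinfo-test.c):

    /* A "useful sequence generator": any fixed step function is enough,
       so long as its output pattern has no obvious correlation with the
       way the test consumes it.  A plain LCG, with the familiar
       Numerical Recipes constants, fits the bill. */
    static unsigned int seq_state = 0;  /* the fixed seed */

    static unsigned int seq_next(void)
    {
      seq_state = seq_state * 1664525u + 1013904223u;
      return seq_state;
    }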

It sounds like you feel there should be a much higher quality of 
randomness.  I addressed coverage above.

> > It's not necessary to use a PRNG to test a large/diverse set of inputs. You
> > can achieve that with plain old combinatorics (e.g., define 300 rangelist variables
> > and test various combinations of them). I'd actually expect that to have
> > better coverage than a PRNG with a fixed seed.

I'd be mildly interested to see the result of that analysis.  In 
practice, however, what we have now is good enough.

> In other words, replace the PRNG and leave all else as-is.
> Generating all permutations when picking 2 items is easy:

I don't mind if someone wants to write the test code to generate cases a 
different way.  I'm not seeing the need, myself.

If there is something simple I can do that would satisfy your concerns, 
I would be happy to do so.  That could be something like setting the 
seed to a time-based value instead of a fixed number, or replacing the 
PRNG implementation with another simple drop-in replacement, or some 
such.  Let me know if there is something of that magnitude that would 
resolve the concerns.  So far, I have not been able to work out whether 
there is such a thing, or what it would be.

- Julian

Re: [#4840] Merge assertion failure in svn_sort__array_insert

Posted by Nathan Hartman <ha...@gmail.com>.
On Wed, Jan 8, 2020 at 9:24 AM Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Julian Foad wrote on Wed, Jan 08, 2020 at 10:14:42 +0000:
> > The suggestion was that we should prefer the regression test suite to be
> > deterministic, running the same fixed set of tests each time.  Repeating by
> > default, not just repeatable on request.

> But unlike nearly all of the test suite, we don't actually know what the
> sequence of inputs _is_, and we have no reason to believe it provides good
> coverage of edge cases and rare combinations of inputs.  For all we know, the
> 90000 iterations could all be testing the exact same, most common codepath over
> and over again.

That occurred to me as well. There's no way to 'uniq' the values when
they're being determined in a transient manner.

My earlier thinking (though I didn't spell it out) was to move the
PRNG code to a separate program, use it to generate a self-documenting
C array of unique test cases, and put that C array in the test suite.
If someone wanted to regenerate the array, they could re-run the
program that generates them. If we discover new failing inputs in the
future, they can be added to the test suite. Then there is no guessing
about what inputs are being tested.

I haven't yet decided whether I still like this idea, because Daniel's
thoughts seem much more sensible:

> It's not necessary to use a PRNG to test a large/diverse set of inputs.  You
> can achieve that with plain old combinatorics (e.g., define 300 rangelist variables
> and test various combinations of them).  I'd actually expect that to have
> better coverage than a PRNG with a fixed seed.

In other words, replace the PRNG and leave all else as-is.

Generating all permutations when picking 2 items is easy:

    int i, j;

    /* Visit every ordered pair (i, j) with i != j. */
    for (i = 0; i < 299; i++) {
        for (j = i + 1; j < 300; j++) {
            test(i, j);
            test(j, i);
        }
    }

This generates all 89,700 permutations (300 nPr 2 = 300!/298! =
300 × 299) and there's no huge file of test cases to generate.

Nathan

Re: [#4840] Merge assertion failure in svn_sort__array_insert

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Wed, Jan 08, 2020 at 10:14:42 +0000:
> Daniel Shahaf wrote:
> > Julian Foad wrote on Tue, Jan 07, 2020 at 09:47:48 +0000:
> > > For now, I propose to make each test use a repeatable sequence, independent
> > > of the other tests.  I think that will be enough; the options I mentioned
> > > can be added if and when there is a demand for them.
> > 
> > I don't understand why you're fixing the seed.
> > 
> > For repeatability, all you need is to output the seed to stdout and to be able
> > to re-run the tests with a particular value of the seed:
> 
> The suggestion was that we should prefer the regression test suite to be
> deterministic, running the same fixed set of tests each time.  Repeating by
> default, not just repeatable on request.
> 
> The other way (non-repeating by default, and repeatable on request) is a
> possible alternative that we could choose.  Indeed, a few (six or so) other
> tests in the suite are currently written that way.  However we don't really
> have the test infrastructure for handling that nicely.  We could add some,
> starting with the pseudocode you posted, but we haven't yet and it would
> probably want a bit more than that to make an overall good experience.

Makes sense.

> > Conversely, the effect of fixing the seed is to make every single test run test
> > exactly the same sequence of API calls on the same sequence of inputs.  That
> > means the test is not merely repeatable, but also deterministic
> 
> Yes, like nearly all of our test suite.

But unlike nearly all of the test suite, we don't actually know what the
sequence of inputs _is_, and we have no reason to believe it provides good
coverage of edge cases and rare combinations of inputs.  For all we know, the
90000 iterations could all be testing the exact same, most common codepath over
and over again.

> > — which begs
> > the question, why do we need a PRNG at all?  What do we gain
> 
> We gain the ability to test a large, diverse set of combinations of inputs.
> Call it a "test case generator" instead of a "PRNG", if that helps.
> 
> > in exchange for the
> > costs and risks associated with using a PRNG in the test suite?
> 
> What?

Using a PRNG makes the test code harder to read and to maintain, makes it
more difficult to diagnose FAILs, and increases the risk of a bug in the test
code.  What benefits are we getting that offset these risks and costs?

The way in which the PRNG algorithm is used is outside the PRNG's API promises.
PRNGs make *no* promises on their outputs when used with a fixed seed.  In
light of this, what reason do we have to believe that the test achieves good coverage?

It's not necessary to use a PRNG to test a large/diverse set of inputs.  You
can achieve that with plain old combinatorics (e.g., define 300 rangelist variables
and test various combinations of them).  I'd actually expect that to have
better coverage than a PRNG with a fixed seed.

By default, I would have expected the tests to be written without a PRNG, and I
don't understand why the way they are written is better.

Cheers,

Daniel

Re: [#4840] Merge assertion failure in svn_sort__array_insert

Posted by Julian Foad <ju...@apache.org>.
Daniel Shahaf wrote:
> Julian Foad wrote on Tue, Jan 07, 2020 at 09:47:48 +0000:
>> For now, I propose to make each test use a repeatable sequence, independent
>> of the other tests.  I think that will be enough; the options I mentioned
>> can be added if and when there is a demand for them.
> 
> I don't understand why you're fixing the seed.
> 
> For repeatability, all you need is to output the seed to stdout and to be able
> to re-run the tests with a particular value of the seed:

The suggestion was that we should prefer the regression test suite to be 
deterministic, running the same fixed set of tests each time.  Repeating 
by default, not just repeatable on request.

The other way (non-repeating by default, and repeatable on request) is a 
possible alternative that we could choose.  Indeed, a few (six or so) 
other tests in the suite are currently written that way.  However we 
don't really have the test infrastructure for handling that nicely.  We 
could add some, starting with the pseudocode you posted, but we haven't 
yet and it would probably want a bit more than that to make an overall 
good experience.

> Conversely, the effect of fixing the seed is to make every single test run test
> exactly the same sequence of API calls on the same sequence of inputs.  That
> means the test is not merely repeatable, but also deterministic 

Yes, like nearly all of our test suite.

> — which begs
> the question, why do we need a PRNG at all?  What do we gain

We gain the ability to test a large, diverse set of combinations of 
inputs.  Call it a "test case generator" instead of a "PRNG", if that 
helps.

> in exchange for the
> costs and risks associated with using a PRNG in the test suite?

What?

- Julian

Re: [#4840] Merge assertion failure in svn_sort__array_insert

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Tue, Jan 07, 2020 at 09:47:48 +0000:
> For now, I propose to make each test use a repeatable sequence, independent
> of the other tests.  I think that will be enough; the options I mentioned
> can be added if and when there is a demand for them.
> 
> How does that sound?

I don't understand why you're fixing the seed.

For repeatability, all you need is to output the seed to stdout and to be able
to re-run the tests with a particular value of the seed:

    if "mergeinfo-test.c seed" in os.environ:
        seed = int(os.getenv("mergeinfo-test.c seed"))
    else:
        seed = random.randint(0, 2**32 - 1)
    print("seed =", seed)
    run_test()

(Yes, I know we're talking about C tests; it's pseudocode.)
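
In C, the same shape might look something like this (a sketch; the
environment variable name is illustrative, not an actual svn test
option):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Take the seed from the environment when set (to re-run a
       failure), otherwise from the clock, and always print it so that
       any failing run can be reproduced. */
    static unsigned int get_seed(void)
    {
      const char *s = getenv("SVN_TEST_SEED");  /* illustrative name */
      unsigned int seed = s ? (unsigned int) strtoul(s, NULL, 10)
                            : (unsigned int) time(NULL);
      printf("seed = %u\n", seed);
      return seed;
    }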

Conversely, the effect of fixing the seed is to make every single test run test
exactly the same sequence of API calls on the same sequence of inputs.  That
means the test is not merely repeatable, but also deterministic — which begs
the question, why do we need a PRNG at all?  What do we gain in exchange for the
costs and risks associated with using a PRNG in the test suite?

Cheers,

Daniel

Re: [#4840] Merge assertion failure in svn_sort__array_insert

Posted by Julian Foad <ju...@apache.org>.
Nathan Hartman wrote:
> Random input tests are not deterministic in the sense that one never
> knows in advance which inputs will be tested.
> 
> While that's great for finding bugs that a deterministic test won't
> surface, I wonder if we should try to keep the regression test suite
> deterministic (as much as feasible) 

I was thinking about this yesterday, as I noticed on some test runs one 
of the tests that I marked as XFAIL can unexpectedly pass (XPASS).  I 
agree there is merit in keeping the default run repeatable.

The (pseudo-)random number generator (RNG) used in the tests is a 
deterministic algorithm, under our control.

The way mergeinfo-test.c is currently written, the existing tests 6 and 
10 each seed the RNG from the current time before using it, while the tests 
22-25 that I recently added don't.  If I run 'mergeinfo-test.c 22', the 
seed starts at zero and the test runs with a deterministic sequence, the 
same on every run.

When I was developing these tests, I was running one of them at a time 
and the sequence was repeatable.  Only when running the whole test 
suite, so that these tests run after tests 6 and 10 (and/or in 
parallel, I suppose), does the sequence become non-repeatable.

To make each test use a repeatable sequence independent of other tests, 
each test should maintain its own RNG state.  That is do-able.
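
A minimal sketch of what that could look like (hypothetical names, not 
the actual test code):

    /* Each test owns its generator state instead of sharing a global,
       so its sequence cannot be perturbed by tests 6 and 10 or by
       parallel runs. */
    typedef struct test_seq_t { unsigned int v; } test_seq_t;

    static unsigned int test_seq_next(test_seq_t *s)
    {
      return s->v = s->v * 1664525u + 1013904223u;  /* any fixed step */
    }

    static void run_test_22(void)     /* illustrative name */
    {
      test_seq_t seq = { 0 };         /* fixed seed, local to the test */
      unsigned int x = test_seq_next(&seq);
      (void) x;  /* ...derive test cases from successive outputs... */
    }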

> and move any random input testing
> to a separate "fuzz testing" program.

We do have an AFL tests subdirectory, but ...

> Meanwhile, I'm *not* suggesting to remove the test from the regression
> test suite. Rather, I'm suggesting to make it deterministic by giving
> it a fixed list of test inputs. ([...])

A good random-testing infrastructure would make it easy to capture sets 
of inputs that caused failures in the past, and inputs that exercise a 
high proportion of code paths, and construct a regression test 
configuration that replays just the selected cases.

However, we don't have such an infrastructure.  It is easier to manage 
if instead we use parameters to make it run repeatably (and reasonably 
quickly) by default and have the option to make it run non-repeatably 
(and optionally for a much longer time) on demand.  Then we can keep the 
test code where it is, alongside all the similar tests, where they can 
easily share test infrastructure.

For now, I propose to make each test use a repeatable sequence, 
independent of the other tests.  I think that will be enough; the 
options I mentioned can be added if and when there is a demand for them.

How does that sound?

- Julian

Re: [#4840] Merge assertion failure in svn_sort__array_insert

Posted by Nathan Hartman <ha...@gmail.com>.
On Mon, Jan 6, 2020 at 12:05 PM Julian Foad <ju...@apache.org> wrote:
>
> Nathan Hartman wrote:
> > I assume that's the one added in r1872121,
> > test_mergeinfo_merge_random_non_validated_inputs().
> >
> > I got a fail on it.
>
> Thanks for the report.  I made a bad assumption in the test code.
>
> http://svn.apache.org/r1872388 should fix it.

Thank you.

I have been thinking about the random input tests... My thoughts:

Random input tests are not deterministic in the sense that one never
knows in advance which inputs will be tested.

While that's great for finding bugs that a deterministic test won't
surface, I wonder if we should try to keep the regression test suite
deterministic (as much as feasible) and move any random input testing
to a separate "fuzz testing" program.

Meanwhile, I'm *not* suggesting to remove the test from the regression
test suite. Rather, I'm suggesting to make it deterministic by giving
it a fixed list of test inputs. (The random inputs generator could be
utilized to generate those inputs, even thousands of them if we'd
like.)

Thoughts?

Nathan

Re: [#4840] Merge assertion failure in svn_sort__array_insert

Posted by Julian Foad <ju...@apache.org>.
Nathan Hartman wrote:
> I assume that's the one added in r1872121,
> test_mergeinfo_merge_random_non_validated_inputs().
> 
> I got a fail on it.

Thanks for the report.  I made a bad assumption in the test code.

http://svn.apache.org/r1872388 should fix it.

- Julian