You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Michael Mior <mm...@apache.org> on 2018/09/10 11:33:27 UTC

Calcite fuzz testing

In a separate thread, the idea of fuzz testing was brought up. I decided
this could be a fun thing to play around with. I managed to get something
simple running with Kelinci (https://github.com/isstac/kelinci) that tests
for crashes in SqlParser using queries from the Quidem tests as the initial
inputs. Tests are running quite slowly, but I left it overnight and haven't
found any crashes. Any suggestions on other things that might be relatively
straightforward to test?

--
Michael Mior
mmior@apache.org

Re: Calcite fuzz testing

Posted by Julian Hyde <jh...@apache.org>.

What is the evidence that Tracehash actually works? In GeoHash there is a notion of proximity, so it is clear that if two locations are within 10 miles then there will be a maximum distance between their hashes. When Tracehash removes part of the stack, is this based on a human expert’s intuition that the middle of the stack is not relevant? Because usually it isn’t, but sometimes it is.

> On Apr 26, 2019, at 12:10 PM, Michael Mior <mm...@apache.org> wrote:
> 
> I could see some might dismiss this as noise, but I really like the
> idea of tracehash and it would be nice to see that catch on. (I think
> it would be interesting if it could be structured something like a
> geohash so truncation would reduce specificity, but it's less obvious
> how to do this here.) Since it takes up minimal space, I would be open
> to considering including it in stack traces.
> 
> --
> Michael Mior
> mmior@apache.org
> 
> Le ven. 26 avr. 2019 à 14:36, Vladimir Sitnikov
> <si...@gmail.com> a écrit :
>> 
>> Let me post a couple of links I've came across today (it comes out of this
>> Twitter thread: https://twitter.com/backendsecret/status/1121290210464034816
>> ):
>> 
>> https://github.com/alexknvl/fuzzball -- it is a machine learning driven
>> fuzzer for Scala which identifies quite a few bugs in Scala compiler.
>> 
>> The beauty of ML is we don't need to somehow declare the grammar, but it
>> can just learn from lots of samples.
>> I've no idea if that would play well for SQL (we need to declare metadata
>> somehow), however it might still work somehow.
>> 
>> Then there's https://github.com/cretz/javan-warty-pig a fuzzer + bytecode
>> agent to trace execution (it remembers the taken paths, so it distinguishes
>> "different" executions.
>> 
>> https://github.com/alexknvl/tracehash -- a library that produces short
>> summaries for exception stacktraces.
>> Those signatures might be a good aid for "stackoverflow-guided-development"
>> (== we might want to print stacktrace signatures by default for Calcite
>> exceptions).
>> 
>> Vladimir

Re: Calcite fuzz testing

Posted by Michael Mior <mm...@apache.org>.

It would be interesting if the Tracehash author had a source of bug
reports identified as duplicates along with stack traces to see how
well it works in practice. At this point, it seems like it's just a
heuristic based on an opinion of what's important.
--
Michael Mior
mmior@apache.org

Le ven. 26 avr. 2019 à 15:10, Michael Mior <mm...@apache.org> a écrit :
>
> I could see some might dismiss this as noise, but I really like the
> idea of tracehash and it would be nice to see that catch on. (I think
> it would be interesting if it could be structured something like a
> geohash so truncation would reduce specificity, but it's less obvious
> how to do this here.) Since it takes up minimal space, I would be open
> to considering including it in stack traces.
>
> --
> Michael Mior
> mmior@apache.org
>
> Le ven. 26 avr. 2019 à 14:36, Vladimir Sitnikov
> <si...@gmail.com> a écrit :
> >
> > Let me post a couple of links I've came across today (it comes out of this
> > Twitter thread: https://twitter.com/backendsecret/status/1121290210464034816
> > ):
> >
> > https://github.com/alexknvl/fuzzball -- it is a machine learning driven
> > fuzzer for Scala which identifies quite a few bugs in Scala compiler.
> >
> > The beauty of ML is we don't need to somehow declare the grammar, but it
> > can just learn from lots of samples.
> > I've no idea if that would play well for SQL (we need to declare metadata
> > somehow), however it might still work somehow.
> >
> > Then there's https://github.com/cretz/javan-warty-pig a fuzzer + bytecode
> > agent to trace execution (it remembers the taken paths, so it distinguishes
> > "different" executions.
> >
> > https://github.com/alexknvl/tracehash -- a library that produces short
> > summaries for exception stacktraces.
> > Those signatures might be a good aid for "stackoverflow-guided-development"
> > (== we might want to print stacktrace signatures by default for Calcite
> > exceptions).
> >
> > Vladimir

Re: Calcite fuzz testing

Posted by Michael Mior <mm...@apache.org>.

I could see some might dismiss this as noise, but I really like the
idea of tracehash and it would be nice to see that catch on. (I think
it would be interesting if it could be structured something like a
geohash so truncation would reduce specificity, but it's less obvious
how to do this here.) Since it takes up minimal space, I would be open
to considering including it in stack traces.

--
Michael Mior
mmior@apache.org

Le ven. 26 avr. 2019 à 14:36, Vladimir Sitnikov
<si...@gmail.com> a écrit :
>
> Let me post a couple of links I've came across today (it comes out of this
> Twitter thread: https://twitter.com/backendsecret/status/1121290210464034816
> ):
>
> https://github.com/alexknvl/fuzzball -- it is a machine learning driven
> fuzzer for Scala which identifies quite a few bugs in Scala compiler.
>
> The beauty of ML is we don't need to somehow declare the grammar, but it
> can just learn from lots of samples.
> I've no idea if that would play well for SQL (we need to declare metadata
> somehow), however it might still work somehow.
>
> Then there's https://github.com/cretz/javan-warty-pig a fuzzer + bytecode
> agent to trace execution (it remembers the taken paths, so it distinguishes
> "different" executions.
>
> https://github.com/alexknvl/tracehash -- a library that produces short
> summaries for exception stacktraces.
> Those signatures might be a good aid for "stackoverflow-guided-development"
> (== we might want to print stacktrace signatures by default for Calcite
> exceptions).
>
> Vladimir

Re: Calcite fuzz testing

Posted by Andrew O <ao...@gmail.com>.

Although not using ML,  mutation testing of existing test queries could
help catch additional parsing / planning issues. By coincidence I can
across the link below which could be conceptually relevant to this topic,
(although the code / implementation may not directly be)

https://in2test.lsi.uniovi.es/sqlmutation/?lang=en

Regards

Andrew

On Fri, 26 Apr 2019, 19:36 Vladimir Sitnikov, <si...@gmail.com>
wrote:

> Let me post a couple of links I've came across today (it comes out of this
> Twitter thread:
> https://twitter.com/backendsecret/status/1121290210464034816
> ):
>
> https://github.com/alexknvl/fuzzball -- it is a machine learning driven
> fuzzer for Scala which identifies quite a few bugs in Scala compiler.
>
> The beauty of ML is we don't need to somehow declare the grammar, but it
> can just learn from lots of samples.
> I've no idea if that would play well for SQL (we need to declare metadata
> somehow), however it might still work somehow.
>
> Then there's https://github.com/cretz/javan-warty-pig a fuzzer + bytecode
> agent to trace execution (it remembers the taken paths, so it distinguishes
> "different" executions.
>
> https://github.com/alexknvl/tracehash -- a library that produces short
> summaries for exception stacktraces.
> Those signatures might be a good aid for "stackoverflow-guided-development"
> (== we might want to print stacktrace signatures by default for Calcite
> exceptions).
>
> Vladimir
>

Re: Calcite fuzz testing

Posted by Vladimir Sitnikov <si...@gmail.com>.

Let me post a couple of links I've came across today (it comes out of this
Twitter thread: https://twitter.com/backendsecret/status/1121290210464034816
):

https://github.com/alexknvl/fuzzball -- it is a machine learning driven
fuzzer for Scala which identifies quite a few bugs in Scala compiler.

The beauty of ML is we don't need to somehow declare the grammar, but it
can just learn from lots of samples.
I've no idea if that would play well for SQL (we need to declare metadata
somehow), however it might still work somehow.

Then there's https://github.com/cretz/javan-warty-pig a fuzzer + bytecode
agent to trace execution (it remembers the taken paths, so it distinguishes
"different" executions.

https://github.com/alexknvl/tracehash -- a library that produces short
summaries for exception stacktraces.
Those signatures might be a good aid for "stackoverflow-guided-development"
(== we might want to print stacktrace signatures by default for Calcite
exceptions).

Vladimir

Re: Calcite fuzz testing

Posted by Michael Mior <mm...@apache.org>.

I was just suggesting that if there were a number of small bugs already
discovered that maybe one JIRA would be enough to cover them. I'm certainly
not suggesting that we have a JIRA that is continually updated. Perhaps
there should just be a separate JIRA for each issue found.

--
Michael Mior
mmior@apache.org


Le mer. 19 sept. 2018 à 10:51, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> a écrit :

> Michael>Looks good to me! Should we maybe create a JIRA that we can point
> to for
> Michael>those interested in fixing some bugs?
>
> Frankly speaking I have no idea how that should work.
> The identified expressions are new every time, so do you mean we should
> have "ever-opened JIRA issue that suggests running RexProgramFuzzyTest"?
>
> How about the following?
>
> 1) RexProgramFuzzyTest runs with predefined seed and iteration count for
> regression purposes. It prints nothing when the test is fine.
>
> 2) Then RexProgramFuzzyTest runs with a random seed till the first failure
> (with a time budget of 5 seconds). It prints the error message to the
> console.
> It might inspire those who stare at build logs. Of course the stacktrace
> should either be hidden or truncated to 3-4 lines to avoid log clutter.
>
> Vladimir
>

Re: Calcite fuzz testing

Posted by Vladimir Sitnikov <si...@gmail.com>.

Michael>Looks good to me! Should we maybe create a JIRA that we can point
to for
Michael>those interested in fixing some bugs?

Frankly speaking I have no idea how that should work.
The identified expressions are new every time, so do you mean we should
have "ever-opened JIRA issue that suggests running RexProgramFuzzyTest"?

How about the following?

1) RexProgramFuzzyTest runs with predefined seed and iteration count for
regression purposes. It prints nothing when the test is fine.

2) Then RexProgramFuzzyTest runs with a random seed till the first failure
(with a time budget of 5 seconds). It prints the error message to the
console.
It might inspire those who stare at build logs. Of course the stacktrace
should either be hidden or truncated to 3-4 lines to avoid log clutter.

Vladimir

Re: Calcite fuzz testing

Posted by Michael Mior <mm...@apache.org>.

Looks good to me! Should we maybe create a JIRA that we can point to for
those interested in fixing some bugs?

--
Michael Mior
mmior@apache.org


Le mer. 12 sept. 2018 à 18:17, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> a écrit :

> Let the fuzzing begin: https://github.com/apache/calcite/pull/830
>
> I have not added it to the CalciteSuite since otherwise it would fail each
> and every build.
>
> On the other hand, it might be a good source of inspiration for newbie
> contributions.
> RexProgrammFuzzyTest#testFuzzy produces lots of failures which are
> more-or-less trivial to fix,
> and which are important to fix as the errors often are related to "wrong
> results from SQL".
>
> Entry point is
>
> https://github.com/apache/calcite/pull/830/files#diff-ffa17851d5a9882f0391445f2fc04468R216
>
> PS. if no-one objects within three days, I'll assume lazy consensus and
> commit it.
>
> Vladimir
>

Re: Calcite fuzz testing

Posted by Vladimir Sitnikov <si...@gmail.com>.

Let the fuzzing begin: https://github.com/apache/calcite/pull/830

I have not added it to the CalciteSuite since otherwise it would fail each
and every build.

On the other hand, it might be a good source of inspiration for newbie
contributions.
RexProgrammFuzzyTest#testFuzzy produces lots of failures which are
more-or-less trivial to fix,
and which are important to fix as the errors often are related to "wrong
results from SQL".

Entry point is
https://github.com/apache/calcite/pull/830/files#diff-ffa17851d5a9882f0391445f2fc04468R216

PS. if no-one objects within three days, I'll assume lazy consensus and
commit it.

Vladimir

Re: Calcite fuzz testing

Posted by Michael Mior <mm...@apache.org>.

My own personal machine could work although I've found in the past it can
be a pain since I'm constantly installing and reconfiguring it for other
tasks and it's more likely to cause other things to break. But it's
certainly a possible option.

--
Michael Mior
mmior@apache.org


Le mer. 12 sept. 2018 à 13:01, Julian Hyde <jh...@apache.org> a écrit :

> My intuition about fuzz testing is that since we are searching an
> exponentially-sized search space in random order we will find 90% of the
> bugs with the first 10% of the effort (or some similar power law). We
> should burn a large amount of CPU on it when we first introduce it, and
> thereafter burn a small amount each nightly test run.
>
> No need to ask INFRA for a VM. Just use your own personal machine
> overnight.
>
> Julian
>
>
> > On Sep 12, 2018, at 6:01 AM, Michael Mior <mm...@apache.org> wrote:
> >
> > True, although 23 days over the lifetime of the project still isn't very
> > much. Definitely better than nothing though. If we take a bit of a hit in
> > CI runtime and catch some bugs, I'm for it :)
> >
> >> It would be great, however we need to have a fuzzer first :)
> >
> > My setup fuzzing the parser with afl seems to be working well, although
> > it's quite slow and never found any crashes so probably not really worth
> it.
> >
> > --
> > Michael Mior
> > mmior@apache.org
> >
> >
> > Le mer. 12 sept. 2018 à 08:40, Vladimir Sitnikov <
> > sitnikov.vladimir@gmail.com> a écrit :
> >
> >>> My only concern is that may betoo short and unlikely to find any bugs
> >>
> >> Remember: each time it starts from a random point.
> >> Apache Jenkins / Calcite-Master has 800+ builds now.
> >> Travis / Calcite has 2300+ builds now.
> >>
> >> Just to clarify: current Travis configuration is 4 jobs (Java 8, 9, 10,
> 11)
> >> They take ~15 minutes to complete.
> >> We can add one more job that would be dedicated to fuzz testing, and we
> >> could configure it for 1..15 minutes.
> >>
> >> 2300 builds * 15 minutes is 23 days worth of fuzzing.
> >>
> >>> It seems like we could also possibly request a VM
> >>> from INFRA to run fuzz testing full time
> >>
> >> It would be great, however we need to have a fuzzer first :)
> >>
> >> Vladimir
> >>
>
>

Re: Calcite fuzz testing

Posted by Julian Hyde <jh...@apache.org>.

My intuition about fuzz testing is that since we are searching an exponentially-sized search space in random order we will find 90% of the bugs with the first 10% of the effort (or some similar power law). We should burn a large amount of CPU on it when we first introduce it, and thereafter burn a small amount each nightly test run.

No need to ask INFRA for a VM. Just use your own personal machine overnight.

Julian


> On Sep 12, 2018, at 6:01 AM, Michael Mior <mm...@apache.org> wrote:
> 
> True, although 23 days over the lifetime of the project still isn't very
> much. Definitely better than nothing though. If we take a bit of a hit in
> CI runtime and catch some bugs, I'm for it :)
> 
>> It would be great, however we need to have a fuzzer first :)
> 
> My setup fuzzing the parser with afl seems to be working well, although
> it's quite slow and never found any crashes so probably not really worth it.
> 
> --
> Michael Mior
> mmior@apache.org
> 
> 
> Le mer. 12 sept. 2018 à 08:40, Vladimir Sitnikov <
> sitnikov.vladimir@gmail.com> a écrit :
> 
>>> My only concern is that may betoo short and unlikely to find any bugs
>> 
>> Remember: each time it starts from a random point.
>> Apache Jenkins / Calcite-Master has 800+ builds now.
>> Travis / Calcite has 2300+ builds now.
>> 
>> Just to clarify: current Travis configuration is 4 jobs (Java 8, 9, 10, 11)
>> They take ~15 minutes to complete.
>> We can add one more job that would be dedicated to fuzz testing, and we
>> could configure it for 1..15 minutes.
>> 
>> 2300 builds * 15 minutes is 23 days worth of fuzzing.
>> 
>>> It seems like we could also possibly request a VM
>>> from INFRA to run fuzz testing full time
>> 
>> It would be great, however we need to have a fuzzer first :)
>> 
>> Vladimir
>>

Re: Calcite fuzz testing

Posted by Michael Mior <mm...@apache.org>.

True, although 23 days over the lifetime of the project still isn't very
much. Definitely better than nothing though. If we take a bit of a hit in
CI runtime and catch some bugs, I'm for it :)

> It would be great, however we need to have a fuzzer first :)

My setup fuzzing the parser with afl seems to be working well, although
it's quite slow and never found any crashes so probably not really worth it.

--
Michael Mior
mmior@apache.org


Le mer. 12 sept. 2018 à 08:40, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> a écrit :

> >My only concern is that may betoo short and unlikely to find any bugs
>
> Remember: each time it starts from a random point.
> Apache Jenkins / Calcite-Master has 800+ builds now.
> Travis / Calcite has 2300+ builds now.
>
> Just to clarify: current Travis configuration is 4 jobs (Java 8, 9, 10, 11)
> They take ~15 minutes to complete.
> We can add one more job that would be dedicated to fuzz testing, and we
> could configure it for 1..15 minutes.
>
> 2300 builds * 15 minutes is 23 days worth of fuzzing.
>
> >It seems like we could also possibly request a VM
> >from INFRA to run fuzz testing full time
>
> It would be great, however we need to have a fuzzer first :)
>
> Vladimir
>

Re: Calcite fuzz testing

Posted by Vladimir Sitnikov <si...@gmail.com>.

>My only concern is that may betoo short and unlikely to find any bugs

Remember: each time it starts from a random point.
Apache Jenkins / Calcite-Master has 800+ builds now.
Travis / Calcite has 2300+ builds now.

Just to clarify: current Travis configuration is 4 jobs (Java 8, 9, 10, 11)
They take ~15 minutes to complete.
We can add one more job that would be dedicated to fuzz testing, and we
could configure it for 1..15 minutes.

2300 builds * 15 minutes is 23 days worth of fuzzing.

>It seems like we could also possibly request a VM
>from INFRA to run fuzz testing full time

It would be great, however we need to have a fuzzer first :)

Vladimir

Re: Calcite fuzz testing

Posted by Michael Mior <mm...@apache.org>.

I certainly wouldn't be opposed to having a little bit of fuzz testing
incorporated into regular tests as a trial. My only concern is that may be
too short and unlikely to find any bugs. But we could always have it run
only on CI in the future. It seems like we could also possibly request a VM
from INFRA to run fuzz testing full time if it ends up being particularly
valuable. https://reference.apache.org/pmc/vms

--
Michael Mior
mmior@apache.org


Le mer. 12 sept. 2018 à 07:46, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> a écrit :

> Michael>I wouldn't expect this to execute as part of the normal test suite
>
> Note: we can include limited fuzzing in the regular test suite.
> For instance: a single Travis job that performs fuzz-only tests for a
> couple of minutes.
> That would not increase test duration, however it could eventually find a
> gem.
>
> We can even include a 5 second fuzzer into the regular suite, and it might
> dig 1000-10'000 cases or so provided we don't start PostgreSQL there.
>
> Michael>Either way, it sounds like this
> Michael> could be a good contribution if it's so adept at finding bugs :)
>
> Well, I even have a JIRA for that, however it takes time to add javadoc :)
>
> Here's one more case where fuzzer would definitely find issues:
> https://github.com/apache/calcite/pull/827
>
>
> Vladimir
>

Re: Calcite fuzz testing

Posted by Vladimir Sitnikov <si...@gmail.com>.

Michael>I wouldn't expect this to execute as part of the normal test suite

Note: we can include limited fuzzing in the regular test suite.
For instance: a single Travis job that performs fuzz-only tests for a
couple of minutes.
That would not increase test duration, however it could eventually find a
gem.

We can even include a 5 second fuzzer into the regular suite, and it might
dig 1000-10'000 cases or so provided we don't start PostgreSQL there.

Michael>Either way, it sounds like this
Michael> could be a good contribution if it's so adept at finding bugs :)

Well, I even have a JIRA for that, however it takes time to add javadoc :)

Here's one more case where fuzzer would definitely find issues:
https://github.com/apache/calcite/pull/827


Vladimir

Re: Calcite fuzz testing

Posted by Michael Mior <mm...@apache.org>.

Thanks for the pointer to sqlsmith. I had seen it in the past but its
existence slipped from my memory. (I was also just looking for an excuse to
play around with afl.) Sounds like it might be helpful to have a test
harness to enable us to easily run sqlsmith (I wouldn't expect this to
execute as part of the normal test suite). I like the second option you
proposed for this.

The third option is closest to what I'm currently doing although the need
for a bridge between Java land and afl definitely slows things down
significantly which makes it much harder to realize the benefits of fuzz
testing. I haven't found a great Java native library for fuzz testing
that's anywhere near as mature and robust as afl.

> it does find multiple bugs in RexSimplify in a minute.

I assume all these bugs have been reported. Either way, it sounds like this
could be a good contribution if it's so adept at finding bugs :)

--
Michael Mior
mmior@apache.org


Le lun. 10 sept. 2018 à 08:09, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> a écrit :

> Michael>Tests are running quite slowly, but I left it overnight and haven't
> found any crashes
>
> There's https://github.com/anse1/sqlsmith which is supposed to be used
> against PostgreSQL.
> However, we could configure Calcite to be a proxy to the PostgreSQL, so
> sqlsmith would use Calcite as if it was PostgreSQL.
>
> Second option is to call sqlsmith to generate some queries, and then
> execute it via Java-based API.
>
> The third option is to implement Java-based (or Kotlin-based) fuzzer. I
> think it might stress more aspects.
> For instance, Calcite has various quoting rules, and sqlsmith just does not
> support that.
>
> Michael>Any suggestions on other things that might be relatively
> straightforward to test?
>
> My RexFuzzer.java is about 200-300 lines of straight-forward Java code that
> produces random RexNodes, and it does find multiple bugs in RexSimplify in
> a minute.
> I think we might want to have a fuzzer for RelBuilder as well.
> It should be more-or-less trivial, and it might find interesting cases.
>
> As a bonus point, we can run the same SQL through regular PostgreSQL and
> compare the outputs.
>
> Vladimir
>

Re: Calcite fuzz testing

Posted by Vladimir Sitnikov <si...@gmail.com>.

Michael>Tests are running quite slowly, but I left it overnight and haven't
found any crashes

There's https://github.com/anse1/sqlsmith which is supposed to be used
against PostgreSQL.
However, we could configure Calcite to be a proxy to the PostgreSQL, so
sqlsmith would use Calcite as if it was PostgreSQL.

Second option is to call sqlsmith to generate some queries, and then
execute it via Java-based API.

The third option is to implement Java-based (or Kotlin-based) fuzzer. I
think it might stress more aspects.
For instance, Calcite has various quoting rules, and sqlsmith just does not
support that.

Michael>Any suggestions on other things that might be relatively
straightforward to test?

My RexFuzzer.java is about 200-300 lines of straight-forward Java code that
produces random RexNodes, and it does find multiple bugs in RexSimplify in
a minute.
I think we might want to have a fuzzer for RelBuilder as well.
It should be more-or-less trivial, and it might find interesting cases.

As a bonus point, we can run the same SQL through regular PostgreSQL and
compare the outputs.

Vladimir