Posted to dev@calcite.apache.org by Vladimir Sitnikov <si...@gmail.com> on 2019/04/26 18:36:29 UTC

Re: Calcite fuzz testing

Let me post a couple of links I came across today (they come from this
Twitter thread: https://twitter.com/backendsecret/status/1121290210464034816
):

https://github.com/alexknvl/fuzzball -- a machine-learning-driven
fuzzer for Scala that has identified quite a few bugs in the Scala compiler.

The beauty of ML is that we don't need to declare the grammar; it
can just learn from lots of samples.
I have no idea whether that would play well for SQL (we would need to
declare metadata somehow), but it might still work.

Then there's https://github.com/cretz/javan-warty-pig -- a fuzzer plus a
bytecode agent that traces execution (it remembers the paths taken, so it
can distinguish "different" executions).
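
To make the "remembers the taken paths" idea concrete, here is a minimal
sketch in Python. It is not how javan-warty-pig is implemented: the toy
target(), the hand-written trace.append() calls (standing in for a bytecode
agent), and the single-byte mutate() are all assumptions made for
illustration. The loop keeps an input only if it drives the target down a
path that has not been seen before.

```python
import random

def target(data: bytes, trace: list) -> None:
    """Toy program under test (hypothetical). Each trace.append() call
    stands in for what a bytecode agent would record automatically."""
    trace.append("enter")
    if len(data) > 0 and data[0] == ord("S"):
        trace.append("S")
        if len(data) > 1 and data[1] == ord("E"):
            trace.append("SE")
            if len(data) > 2 and data[2] == ord("L"):
                trace.append("SEL")

def run(data: bytes) -> tuple:
    """Execute the target and return the path it took as a hashable tuple."""
    trace = []
    target(data, trace)
    return tuple(trace)

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte (input must be non-empty)."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(seed_input: bytes, iterations: int, rng: random.Random):
    """Keep a candidate only when it produces a path not seen before."""
    seen = {run(seed_input)}
    corpus = [seed_input]
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        path = run(candidate)
        if path not in seen:
            seen.add(path)
            corpus.append(candidate)
    return corpus, seen
```

Because inputs that reach a new branch are kept and mutated further, the
fuzzer can climb towards deeper branches one step at a time instead of
having to guess a whole interesting input at once.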

https://github.com/alexknvl/tracehash -- a library that produces short
summaries for exception stack traces.
Those signatures might be a good aid for "stackoverflow-guided-development"
(i.e. we might want to print stack trace signatures by default for Calcite
exceptions).
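
For readers unfamiliar with the idea, a stack trace signature can be
sketched roughly as follows. This is not tracehash's actual algorithm; it
is a simplified stand-in that assumes Java-style trace text, drops messages
and line numbers, and hashes the exception type plus the first few frames.
The signature() helper, the max_frames parameter, and the org.example class
names are all hypothetical.

```python
import hashlib
import re

# Matches Java-style frames such as
# "    at org.example.Planner.optimize(Planner.java:42)"
FRAME = re.compile(r"^\s*at\s+([\w.$<>]+)\(")

def signature(stack_trace: str, max_frames: int = 5) -> str:
    """Return a short hex signature that ignores messages and line numbers."""
    lines = stack_trace.strip().splitlines()
    # Keep only the exception type, not its (often variable) message.
    exception_type = lines[0].split(":", 1)[0].strip()
    frames = []
    for line in lines[1:]:
        match = FRAME.match(line)
        if match:
            frames.append(match.group(1))  # class.method, no file or line
    key = "\n".join([exception_type] + frames[:max_frames])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]

TRACE_A = """java.lang.NullPointerException: value was null
    at org.example.Planner.optimize(Planner.java:42)
    at org.example.Main.main(Main.java:10)"""

# Same call path, different message and line numbers.
TRACE_B = """java.lang.NullPointerException: other message
    at org.example.Planner.optimize(Planner.java:57)
    at org.example.Main.main(Main.java:12)"""

# Different method at the throw site.
TRACE_C = """java.lang.NullPointerException: value was null
    at org.example.Planner.validate(Planner.java:42)
    at org.example.Main.main(Main.java:10)"""
```

Under these assumptions TRACE_A and TRACE_B share a signature, so a user
could search for it, while TRACE_C gets a different one.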

Vladimir

Re: Calcite fuzz testing

Posted by Julian Hyde <jh...@apache.org>.

What is the evidence that Tracehash actually works? In GeoHash there is a notion of proximity, so it is clear that if two locations are within 10 miles then there will be a maximum distance between their hashes. When Tracehash removes part of the stack, is this based on a human expert’s intuition that the middle of the stack is not relevant? Because usually it isn’t, but sometimes it is.

> On Apr 26, 2019, at 12:10 PM, Michael Mior <mm...@apache.org> wrote:
> 
> I could see some might dismiss this as noise, but I really like the
> idea of tracehash and it would be nice to see that catch on. (I think
> it would be interesting if it could be structured something like a
> geohash so truncation would reduce specificity, but it's less obvious
> how to do this here.) Since it takes up minimal space, I would be open
> to considering including it in stack traces.
> 
> --
> Michael Mior
> mmior@apache.org

Re: Calcite fuzz testing

Posted by Michael Mior <mm...@apache.org>.

It would be interesting if the Tracehash author had a source of bug
reports identified as duplicates along with stack traces to see how
well it works in practice. At this point, it seems like it's just a
heuristic based on an opinion of what's important.
--
Michael Mior
mmior@apache.org

Re: Calcite fuzz testing

Posted by Michael Mior <mm...@apache.org>.

I could see some might dismiss this as noise, but I really like the
idea of tracehash and it would be nice to see that catch on. (I think
it would be interesting if it could be structured something like a
geohash so truncation would reduce specificity, but it's less obvious
how to do this here.) Since it takes up minimal space, I would be open
to considering including it in stack traces.
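
One possible reading of the geohash-like idea, sketched in Python under
stated assumptions: give every frame a two-character code and concatenate
the codes starting at the throw site, so that cutting the signature short
is the same as considering fewer frames. The names frame_code,
prefix_signature and shared_depth are hypothetical, and treating the
innermost frames as the "coarse" end is itself an assumption that may not
hold for every bug.

```python
import hashlib

def frame_code(frame: str) -> str:
    """Two hex characters per element; collisions are possible at this width."""
    return hashlib.sha1(frame.encode("utf-8")).hexdigest()[:2]

def prefix_signature(exception_type: str, frames: list) -> str:
    """Concatenate per-element codes, exception type first, then frames
    from the throw site outward. Truncating the result to 2*k characters
    yields the signature of the first k elements."""
    return "".join(frame_code(part) for part in [exception_type, *frames])

def shared_depth(sig_a: str, sig_b: str) -> int:
    """Count how many leading elements two signatures have in common."""
    depth = 0
    for i in range(0, min(len(sig_a), len(sig_b)), 2):
        if sig_a[i:i + 2] != sig_b[i:i + 2]:
            break
        depth += 1
    return depth
```

With this layout, two traces that agree on the exception type and the top
two frames share at least the first six characters, so a longer shared
prefix suggests (but does not prove) a closer match.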

--
Michael Mior
mmior@apache.org

Re: Calcite fuzz testing

Posted by Andrew O <ao...@gmail.com>.

Although it does not use ML, mutation testing of existing test queries could
help catch additional parsing / planning issues. By coincidence I came
across the link below, which could be conceptually relevant to this topic
(although the code / implementation may not be directly applicable).

https://in2test.lsi.uniovi.es/sqlmutation/?lang=en
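
As a rough illustration of what mutating an existing test query could look
like, the sketch below swaps each relational operator for every other one.
It is only one simple mutation operator chosen for illustration, not a
description of the tool linked above; the mutants() helper and the sample
query are hypothetical, and the regex is deliberately naive (it would also
touch operators inside string literals).

```python
import re

RELATIONAL = ["<=", ">=", "<>", "=", "<", ">"]
# Longer operators come first so that "<=" is not matched as "<" then "=".
TOKEN = re.compile(r"<=|>=|<>|=|<|>")

def mutants(sql: str) -> list:
    """Return one mutated query per (operator occurrence, replacement) pair."""
    result = []
    for match in TOKEN.finditer(sql):
        for replacement in RELATIONAL:
            if replacement != match.group(0):
                result.append(sql[:match.start()] + replacement + sql[match.end():])
    return result

QUERY = "SELECT name FROM emp WHERE sal > 100"
```

Each mutant could then be fed to the parser and planner; a crash, or a
mutant that no existing test can tell apart from the original query, would
be worth a closer look.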

Regards

Andrew
