You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Nick Dimiduk <nd...@apache.org> on 2015/11/30 20:08:33 UTC

Using Calcite's parser for similar query analysis?

Hello.

I have a pile of SQL queries. Is it possible to use Calcite to group these
queries into "similar" buckets? Perhaps via some hash function on a parsed
query? I see some databases' query planners provide this kind of
functionality, but mine apparently does not.

Thanks,
Nick

Re: Using Calcite's parser for similar query analysis?

Posted by Nick Dimiduk <nd...@gmail.com>.
Yes, I think you idea of an equivalence set is what I'm looking for.

On Mon, Nov 30, 2015 at 11:22 AM, Julian Hyde <jh...@apache.org> wrote:

> I think there are divergent definitions of “similar”, and it helps if you
> state why you want to detect similar queries.
>
> For example, by one definition, queries are similar if they are the same
> modulo substituting constants. For these queries,  you can share plans if
> you hoist literals[1].
>
> Other queries are similar if they use the same underlying tables. (The
> goal here might be security or compliance.)
>
> Another definition of similarity is “can I satisfy query A using the
> results of query B”? Materialized view rewrite is interested in this.
>
> For some kinds of similarity, I would work on the AST (i.e. SqlNode). For
> other, deeper, kinds of similarity I would recommend working at the algebra
> level (i.e. RelNode).
>
> Then there is equivalence sets vs. similarity. I think you’re talking
> about equivalence sets. Q1 and Q2 are either in the same equivalence set or
> they are not. If they are, then f(Q1) = f(Q2), where f is some function,
> say a hash function or canonization function. Other kinds of similarity
> (e.g. [2]) might give a “distance” between queries, and queries are deemed
> equivalent iff that distance is 0.
>
> Julian
>
> [1] https://issues.apache.org/jira/browse/CALCITE-963
>
> [2] https://en.wikipedia.org/wiki/Cosine_similarity <
> https://en.wikipedia.org/wiki/Cosine_similarity>
>
>
> > On Nov 30, 2015, at 11:08 AM, Nick Dimiduk <nd...@apache.org> wrote:
> >
> > Hello.
> >
> > I have a pile of SQL queries. Is it possible to use Calcite to group
> these
> > queries into "similar" buckets? Perhaps via some hash function on a
> parsed
> > query? I see some databases' query planners provide this kind of
> > functionality, but mine apparently does not.
> >
> > Thanks,
> > Nick
>
>

Re: Using Calcite's parser for similar query analysis?

Posted by Julian Hyde <jh...@apache.org>.
I think there are divergent definitions of “similar”, and it helps if you state why you want to detect similar queries.

For example, by one definition, queries are similar if they are the same modulo substituting constants. For these queries,  you can share plans if you hoist literals[1].

Other queries are similar if they use the same underlying tables. (The goal here might be security or compliance.)

Another definition of similarity is “can I satisfy query A using the results of query B”? Materialized view rewrite is interested in this.

For some kinds of similarity, I would work on the AST (i.e. SqlNode). For other, deeper, kinds of similarity I would recommend working at the algebra level (i.e. RelNode).

Then there is equivalence sets vs. similarity. I think you’re talking about equivalence sets. Q1 and Q2 are either in the same equivalence set or they are not. If they are, then f(Q1) = f(Q2), where f is some function, say a hash function or canonization function. Other kinds of similarity (e.g. [2]) might give a “distance” between queries, and queries are deemed equivalent iff that distance is 0.

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-963

[2] https://en.wikipedia.org/wiki/Cosine_similarity <https://en.wikipedia.org/wiki/Cosine_similarity>


> On Nov 30, 2015, at 11:08 AM, Nick Dimiduk <nd...@apache.org> wrote:
> 
> Hello.
> 
> I have a pile of SQL queries. Is it possible to use Calcite to group these
> queries into "similar" buckets? Perhaps via some hash function on a parsed
> query? I see some databases' query planners provide this kind of
> functionality, but mine apparently does not.
> 
> Thanks,
> Nick