You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Michael Mior <mm...@uwaterloo.ca> on 2017/10/17 16:43:49 UTC

Re: Cosette / Apache Calcite

This took me longer than expected to get around to, but hopefully the below
is helpful:

https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2

I just did some basic (and very hacky) instrumentation of RelOptRulesTest
to dump SQL before and after rules have been applied. The file consists of
the name of the test followed by the original and then the rewritten SQL.

Many of the tests are missing for various reasons, but there's still 189
examples there to play with. Let me know if any particular aspects of the
SQL are problematic. The "before" SQL is handwritten for the tests and the
"after" is ANSI SQL as generated by Calcite from the resulting logical plan.

--
Michael Mior
mmior@apache.org

2017-09-22 1:03 GMT-04:00 Amogh Margoor <am...@qubole.com>:

> >>> There might be applications in materialized views. A query Q can use a
> materialized view V if V covers Q. In other words if >>>Q == R(V) where R
> is some sequence of relational operators. Given Q and V, Cosette could
> perhaps analyze and either >>>return R (success) or return that V does not
> cover Q (failure).
>
> >>This resembles the problem of deciding whether a given relation
> (expressed as a query) is contained in another one. It will >>take some
> work for Cosette to be able to handle this but it definitely sounds
> interesting. Do you have an application in mind? >>One of them might be to
> determine whether previously cached results can be used.
>
> One simple idea to start here is to replace a naive solver we have in
> Calcite for checking if one predicate implies another predicate. We call it
> RexImplicationChecker in Calcite and if we can replace or help it with
> Constraint solver of Cosette which says if a particular implication is a
> tautology then that would help a great deal.
>
>
>
> On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <ak...@cs.washington.edu>
> wrote:
>
> > Hi Julian et al,
> >
> > Thanks for your interest in Cosette. Your suggestions make a lot of
> sense.
> > We have done some initial work and would like to get your feedback on how
> > to integrate the two tools together.
> >
> > > One obvious idea is to use Cosette to audit Calcite’s query
> > transformation rules. Each rule is supposed to preserve semantics but
> > (until Cosette) we had to trust the author of the rule. We could convert
> > the before and after relational expressions to SQL, and then ask Cosette
> > whether those are equivalent. We could enable this check in Calcite’s
> test
> > suite, during which many thousands of rules are fired.
> >
> > Indeed. We have browsed through the Calcite rules and reformulated a few
> > of them using our Cosette language:
> >
> > 1. Conjunctive select (https://github.com/apache/cal
> > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > rules/FilterMergeRule.java) --> https://demo.cosette.cs.washington.edu/
> > (click conjunctive select from the dropdown menu)
> >
> > 2. Join commute (https://github.com/apache/cal
> > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > rules/JoinCommuteRule.java) --> Join commute from the demo website above
> >
> > 3. Join/Project transpose (https://github.com/apache/cal
> > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > rules/JoinProjectTransposeRule.java) --> Join Proj. Trans. from the demo
> > website above
> >
> > As we are not very familiar with the Calcite code base, we have tried our
> > best to guess the intention of each rule based on the documentation,
> please
> > feel free to point out if we made mistakes.
> >
> > As you can see, the Cosette language is pretty much like standard SQL,
> > except for declarations of schemas and relations. You will also notice
> the
> > "??" in some schema declarations (e.g., in rule 1. above) --- they stand
> > for "symbolic" attributes that can represent any attribute. In other
> words,
> > if Cosette can prove that a rule with symbolic attributes is true, then
> it
> > will be true regardless of what the symbolic attributes are instantiated
> > with. Symbolic predicates (e.g., in rule 1.) works similarly, hence
> giving
> > Cosette a mechanism to prove (or disprove) classes of rewrite rules. See
> > our documentation at http://cosette.cs.washington.edu/guide for details.
> >
> > I believe the challenge here is how we can "reverse engineer" the
> > intention of each of the existing rules so that they can be expressed in
> > Cosette. Any suggestions on how to do this? We have a few students
> working
> > on Cosette and can help, but we will probably need help from Calcite to
> > fully understand all of the existing rules. Another possibility is to
> print
> > out the input and output of each rule application during testing, and
> send
> > them to Cosette. If the printout is in a form that resembles SQL we can
> > probably patch it into Cosette.
> >
> > For new rules, can we can ask Calcite authors to express them in Cosette
> > as well, perhaps as part of the documentation? This way we will only need
> > to handle the existing rules.
> >
> > > A few rules might use other information besides the input relational
> > expression, such as predicates that are known to hold or column
> > combinations that are known to be unique. But let’s see what happens.
> >
> > This is something that we are actively working on. Can you point us to
> > specific rules with such properties? One possibility is the join
> > commutativity rule noted above. You will notice that we didn't prove the
> > "general form" of the rule with symbolic attributes, but rather one with
> > concrete schemas. This is because Cosette currently implements the
> unnamed
> > approach to attribute naming (see Section 3.2 in
> > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the general form
> > of the rule is only true if we know that the two input schemas have
> > distinct attributes.
> >
> > > This is a very loose integration of Cosette / Calcite, but we can make
> > closer integrations (e.g. within the same JVM, even at runtime) as we
> > discover synergies. After all, optimization and theorem-proving are
> related
> > endeavors.
> >
> > Agreed. Cosette is implemented using Coq and Racket. We realize that
> those
> > are not the most popular languages for implementing systems :) , so
> Cosette
> > comes with a POST API as well: http://cosette.cs.washington.
> edu/guide#api
> > . It takes in the program text written in Cosette, and returns the answer
> > (or times out). Does this make it easier to run the tool? We are open to
> > implementing other bindings as well.
> >
> > > Another area that would be useful would be to devise test data.
> >
> > How about this: Each SQL implementation has its own interpretation of
> SQL,
> > with Cosette being one of them. Let's implement different SQL semantics
> > using Cosette (say, Calcite's and Postgres'). Then, given a query, ask
> > Cosette to find a counterexample (i.e., an input relation) where the two
> > implementations will return different results when executed on a given
> > query. If such a counterexample exists, then Calcite developers can
> > determine whether this is a "bug" or a "feature". Does this sound similar
> > to what you have in mind?
> >
> > > There might be applications in materialized views. A query Q can use a
> > materialized view V if V covers Q. In other words if Q == R(V) where R is
> > some sequence of relational operators. Given Q and V, Cosette could
> perhaps
> > analyze and either return R (success) or return that V does not cover Q
> > (failure).
> >
> > This resembles the problem of deciding whether a given relation
> (expressed
> > as a query) is contained in another one. It will take some work for
> Cosette
> > to be able to handle this but it definitely sounds interesting. Do you
> have
> > an application in mind? One of them might be to determine whether
> > previously cached results can be used.
> >
> > We definitely see lots of synergies between the two tools. To start with
> > something easy :) , I propose we first discuss how to use the current
> > Cosette implementation to audit existing Calcite rules, and a way to
> > integrate Cosette into development of future Calcite rules as part of
> code
> > review / regression tests. What do you think?
> >
> > Thanks,
> > Alvin (on behalf of the Cosette team)
> >
>

Re: Cosette / Apache Calcite

Posted by Michael Mior <mm...@uwaterloo.ca>.
Thanks Shumo! That sounds like a great start. If you could also post a link
to your fork once it's pushed to the JIRA case below for reference that
would be helpful.

https://issues.apache.org/jira/browse/CALCITE-1977

--
Michael Mior
mmior@apache.org

2018-02-08 19:53 GMT-05:00 Shumo Chu <sh...@gmail.com>:

> I agree! Being able to turn on and off makes a lot of sense. Since
> Cosette's installation is dockerized, running locally would not be a
> problem. Let me pull together my code and do a few additional gluing so
> that you can run them easily. Then we can start from there. I will put
> my code on my fork of Calcite, does that sound good?
>
> Best
> Shumo
>
> On Thu, Feb 8, 2018 at 4:37 PM, Michael Mior <mm...@uwaterloo.ca> wrote:
>
> > Shumo,
> >
> > Thanks for the additional info. I think it would be great to have these
> > tests run as part of the Calcite test suite. One question I have is what
> > the runtime is like for these tests. Especially since using the web API
> > would result in a dependency on a service we (as in the Calcite PMC)
> don't
> > control, I would be inclined to have these tests run separately so they
> can
> > be turned off if needed. Per an earlier comment from Julian (
> > https://issues.apache.org/jira/browse/CALCITE-1977) one option would be
> to
> > allow Calcite to generate a script that can be fed to Cosette.
> >
> > Also, while I'm sure I could reproduce something like the test suite you
> > currently have in time, if you have code you could share for your current
> > test setup with Cosette and Calcite that would likely be helpful in
> moving
> > this along.
> >
> > Cheers,
> > --
> > Michael Mior
> > mmior@uwaterloo.ca
> >
> > 2018-02-08 19:13 GMT-05:00 Shumo Chu <sh...@gmail.com>:
> >
> > > Currently, I hijack the test cases and call RelToSqlConverter before
> and
> > > after calcite's rewrite. When I get the query before (Q1) and after
> > (Q2), I
> > > can then generate Cosette source code, need to add a few things, e.g.
> > > schema declarations. This Cosette source code is then sent to our
> solver
> > to
> > > get the result of equivalence checking.
> > >
> > > Now everything is run on my own computer. But Cosette does provide a
> web
> > > service that is running on a UW CSE server and has a web API to call (
> > > http://cosette.cs.washington.edu/guide#api). So calcite can either
> call
> > > the
> > > web API (the advantage is no need to install any dependencies) or try
> to
> > > run Cosette locally (we have dockerized the installation of Cosette).
> > >
> > > What do you think?
> > >
> > > Best
> > > Shumo
> > >
> > > On Thu, Feb 8, 2018 at 3:25 PM, Michael Mior <mm...@uwaterloo.ca>
> wrote:
> > >
> > > > Shumo and Alvin,
> > > >
> > > > That's great to hear (especially the part about the lack of bugs)!
> That
> > > > paper is next on my stack to read. I'd love to work on getting these
> > > tests
> > > > better integrated with Calcite. It would be really helpful if you
> could
> > > > provide more detail on how you're currently running the tests.
> Ideally
> > if
> > > > you could provide a way for me to easily replicate your test
> > > > infrastructure, that would probably help give me a better idea on how
> > we
> > > > can help with the integration.
> > > >
> > > > Cheers,
> > > > --
> > > > Michael Mior
> > > > mmior@apache.org
> > > >
> > > > 2018-02-08 17:56 GMT-05:00 Shumo Chu <sh...@gmail.com>:
> > > >
> > > > > Hi, Calcite Devs!
> > > > >
> > > > > We wanted to update you on the current status of using Cosette to
> > help
> > > > > check the correctness of the Calcite planner rules [
> > > > > https://issues.apache.org/jira/browse/CALCITE-1977]:
> > > > >
> > > > > So far we have used Cosette to check whether Calcite's
> > transformations
> > > > are
> > > > > correct by using the test cases in RelOptRulesTest.java.
> > > > >
> > > > > For each test case, we use RelToSqlConverter to revert the query
> plan
> > > > > before Calcite's rewrite back to SQL (call it Q1) and similarly for
> > the
> > > > > query plan after Calcite's rewrite (Q2). We then use Cosette to
> check
> > > > > whether Q1 and Q2 are semantically equivalent.
> > > > >
> > > > > As Calcite's test cases cover many SQL features, some of them are
> not
> > > > > supported by Cosette yet. Out of the 39 (give number) test cases
> that
> > > use
> > > > > SQL features supported by Cosette, Cosette is now able to formally
> > > prove
> > > > > that Calcite's rewrite in 33 of them are correct. This includes a
> few
> > > > > fairly complicated ones, like "testPushFilterPassAggThree" (
> > > > > https://github.com/apache/calcite/blob/master/core/src/test
> > > > > /java/org/apache/calcite/test/RelOptRulesTest.java#L408) The good
> > news
> > > > is
> > > > > that we haven't found any bugs so far :)
> > > > >
> > > > > We have also used the test cases to improve Cosette. For example,
> > > Cosette
> > > > > now supports checking equivalence under database integrity
> > constraints
> > > as
> > > > > preconditions, e.g., Q1 == Q2 iff R.A is a key of R. In case you
> are
> > > > > interested you can read about our latest system at
> > > > > https://arxiv.org/abs/1802.02229
> > > > >
> > > > > We would like to improve Cosette's parser so that we can handle the
> > > > > remaining Calcite test cases. Meanwhile, we would also like to
> > discuss
> > > > how
> > > > > to integrate the two tools in a more systematic way, perhaps as
> part
> > of
> > > > the
> > > > > Calcite test infrastructure rather than us manually extracting the
> > test
> > > > > cases from RelOptRulesTest (RelToSqlConverter also doesn't always
> > > succeed
> > > > > in reversion as you may know). Anyone interested in discussing
> more?
> > If
> > > > so
> > > > > how should we proceed?
> > > > >
> > > > > Best
> > > > > Shumo & Alvin
> > > > >
> > > > > On Tue, Oct 17, 2017 at 6:36 PM, Shumo Chu <sh...@gmail.com>
> > > wrote:
> > > > >
> > > > > > I create a link to the image:
> > > > > >
> > > > > > https://homes.cs.washington.edu/~chushumo/img/calcite_
> > break_down.png
> > > > > >
> > > > > > In terms of integration, what I can imagine is to call Cosette
> web
> > > API
> > > > in
> > > > > > parallel when running calcite test cases, like RelOptRulesTest,
> > then
> > > > > > perhaps Cosette result can give developers either extra
> confidence
> > or
> > > > > > counterexample if there is a bug.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Oct 17, 2017 at 5:49 PM, Michael Mior <
> mmior@uwaterloo.ca>
> > > > > wrote:
> > > > > >
> > > > > >> Shumo,
> > > > > >>
> > > > > >> Thanks for the update! Unfortunately I don't believe inline
> images
> > > and
> > > > > >> attachments are not supported on ASF mailing lists so if you
> have
> > > > > another
> > > > > >> way of sharing that, would be interested to see. Glad to hear
> the
> > > test
> > > > > >> cases have been useful to you. Also great to see that Calcite
> > seems
> > > to
> > > > > be
> > > > > >> faring well so far :)
> > > > > >>
> > > > > >> You're right that the conversion of plans to SQL definitely has
> > some
> > > > > >> missing pieces. If there are any that are particularly
> problematic
> > > in
> > > > > your
> > > > > >> with Cosette, please let us know. For any bugs you encounter, it
> > > would
> > > > > be
> > > > > >> great if you could file a JIRA so we can work on fixing these (
> > > > > >> https://issues.apache.org/jira/projects/CALCITE/issues/).
> > > > > >>
> > > > > >> --
> > > > > >> Michael Mior
> > > > > >> mmior@apache.org
> > > > > >>
> > > > > >> 2017-10-17 20:06 GMT-04:00 Shumo Chu <sh...@gmail.com>:
> > > > > >>
> > > > > >> > Hi Michael and Calcite Devs:
> > > > > >> >
> > > > > >> > That's great!
> > > > > >> >
> > > > > >> > We also hijacked the test harness of RelOpRule Test and
> > generated
> > > > SQL
> > > > > >> > using before and after optimization logical plans. (The reason
> > > that
> > > > I
> > > > > >> use
> > > > > >> > query plan rather than the input SQL is that when Calcite
> > > generates
> > > > > SQLs
> > > > > >> > from query plans, it adds the correct table reference of
> > > attributes.
> > > > > >> > Cosette currently cannot handle unqualified attribute
> > references).
> > > > We
> > > > > >> got
> > > > > >> > 232 before and after SQL after some tweaking. You can find the
> > SQL
> > > > > >> queries
> > > > > >> > that we get here: https://github.com/uwdb/Cosett
> > > > > >> e/blob/master/examples/c
> > > > > >> > alcite/calcite_tests.json.
> > > > > >> >
> > > > > >> > We then ran all of them using Cosette. Cosette can currently
> > > support
> > > > > 39
> > > > > >> of
> > > > > >> > 232 queries. Out of these 39 queries, the results are:
> > > > > >> >
> > > > > >> > EQ           9
> > > > > >> > UNKNOWN     27
> > > > > >> > NEQ          3
> > > > > >> >
> > > > > >> > EQ means our Coq backend found a valid proof for the
> equivalence
> > > of
> > > > > the
> > > > > >> > two queries.
> > > > > >> >
> > > > > >> > UNKNOWN means our model checker could not find a
> counterexample
> > to
> > > > > prove
> > > > > >> > that the queries are not equivalent (within a time bound of 3
> > > secs).
> > > > > You
> > > > > >> > can interpret that as "likely equal".
> > > > > >> >
> > > > > >> > NEQ means our model checker found a counterexample to prove
> the
> > > > > queries
> > > > > >> > are not equivalent. These are all the cases that there are
> some
> > > > > >> > preconditions that Calcite understands but we haven't yet put
> > into
> > > > > >> Cosette.
> > > > > >> > For example (testPushSemiJoinPastJoinRuleLeft):
> > > > > >> >
> > > > > >> >  query q1 `SELECT EMP.ENAME
> > > > > >> >                   FROM EMP AS EMP, DEPT AS DEPT, EMP AS EMP0
> > > > > >> >                    WHERE EMP.DEPTNO = DEPT.DEPTNO AND
> > > > > >> >                                  EMP.EMPNO = EMP0.EMPNO`;
> > > > > >> >
> > > > > >> > query q2 `SELECT EMP1.ENAME
> > > > > >> >                  FROM EMP AS EMP1 INNER JOIN DEPT AS DEPT0
> > > > > >> >                     ON EMP1.DEPTNO = DEPT0.DEPTNO INNER JOIN
> EMP
> > > AS
> > > > > EMP2
> > > > > >> >                       ON EMP1.EMPNO = EMP2.EMPNO INNER JOIN
> DEPT
> > > AS
> > > > > >> DEPT1
> > > > > >> >                     ON EMP1.DEPTNO = DEPT1.DEPTNO INNER JOIN
> EMP
> > > AS
> > > > > EMP3
> > > > > >> >                     ON EMP1.EMPNO = EMP3.EMPNO`;
> > > > > >> >
> > > > > >> >
> > > > > >> > These two queries are equivalent under the precondition that
> > EMPNO
> > > > is
> > > > > >> the
> > > > > >> > primary key of EMP, and DEPTNO is the primary key of DEPT, and
> > > that
> > > > is
> > > > > >> > indeed the case!
> > > > > >> >
> > > > > >> > We are extending Cosette to support preconditions such as
> > primary
> > > > keys
> > > > > >> and
> > > > > >> > foreign keys now and we should be able to handle these soon.
> > > > > >> >
> > > > > >> > For the queries that we cannot handle, here is the break down
> of
> > > the
> > > > > >> > issues:
> > > > > >> >
> > > > > >> > [image: Inline image 1]
> > > > > >> >
> > > > > >> > We support a large part of group by queries, GROUPBY here
> means
> > > some
> > > > > >> > grouping features that we don't support, e.g., group by on
> > > arbitrary
> > > > > >> > expressions. Although we might not be able to support all of
> > these
> > > > SQL
> > > > > >> > features, we are definitely working on adding more SQL
> features
> > to
> > > > > >> Cosette.
> > > > > >> >
> > > > > >> > With all above, we can try to integrate Cosette as an extra
> > layer
> > > to
> > > > > >> > ensure developer's confidence. Currently, we are working on
> > SIGMOD
> > > > > >> deadline
> > > > > >> > (Nov. 2) but will devote more time into this afterward.
> > > > > >> >
> > > > > >> > One thing worth mentioning is that we plan to include all
> these
> > > > > calcite
> > > > > >> > examples as the regression test for Cosette so that we can
> make
> > > > > Cosette
> > > > > >> > more powerful and practical. These queries are great
> benchmarks
> > > for
> > > > > >> Cosette
> > > > > >> > and are super valuable!
> > > > > >> >
> > > > > >> > Meanwhile, It would be also great if calcite's logical plan to
> > SQL
> > > > > >> > converter can be improved. It currently doesn't support all
> > > queries
> > > > > and
> > > > > >> in
> > > > > >> > some cases, it actually generates illegal SQL. (happy to
> submit
> > > > > detailed
> > > > > >> > bug report if you think that would be helpful).
> > > > > >> >
> > > > > >> > Best
> > > > > >> > Shumo
> > > > > >> >
> > > > > >> > On Tue, Oct 17, 2017 at 9:43 AM, Michael Mior <
> > mmior@uwaterloo.ca
> > > >
> > > > > >> wrote:
> > > > > >> >
> > > > > >> >> This took me longer than expected to get around to, but
> > hopefully
> > > > the
> > > > > >> >> below
> > > > > >> >> is helpful:
> > > > > >> >>
> > > > > >> >> https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2
> > > > > >> >>
> > > > > >> >> I just did some basic (and very hacky) instrumentation of
> > > > > >> RelOptRulesTest
> > > > > >> >> to dump SQL before and after rules have been applied. The
> file
> > > > > >> consists of
> > > > > >> >> the name of the test followed by the original and then the
> > > > rewritten
> > > > > >> SQL.
> > > > > >> >>
> > > > > >> >> Many of the tests are missing for various reasons, but
> there's
> > > > still
> > > > > >> 189
> > > > > >> >> examples there to play with. Let me know if any particular
> > > aspects
> > > > of
> > > > > >> the
> > > > > >> >> SQL are problematic. The "before" SQL is handwritten for the
> > > tests
> > > > > and
> > > > > >> the
> > > > > >> >> "after" is ANSI SQL as generated by Calcite from the
> resulting
> > > > > logical
> > > > > >> >> plan.
> > > > > >> >>
> > > > > >> >> --
> > > > > >> >> Michael Mior
> > > > > >> >> mmior@apache.org
> > > > > >> >>
> > > > > >> >> 2017-09-22 1:03 GMT-04:00 Amogh Margoor <am...@qubole.com>:
> > > > > >> >>
> > > > > >> >> > >>> There might be applications in materialized views. A
> > query
> > > Q
> > > > > can
> > > > > >> >> use a
> > > > > >> >> > materialized view V if V covers Q. In other words if >>>Q
> ==
> > > R(V)
> > > > > >> where
> > > > > >> >> R
> > > > > >> >> > is some sequence of relational operators. Given Q and V,
> > > Cosette
> > > > > >> could
> > > > > >> >> > perhaps analyze and either >>>return R (success) or return
> > > that V
> > > > > >> does
> > > > > >> >> not
> > > > > >> >> > cover Q (failure).
> > > > > >> >> >
> > > > > >> >> > >>This resembles the problem of deciding whether a given
> > > relation
> > > > > >> >> > (expressed as a query) is contained in another one. It will
> > > > >>take
> > > > > >> some
> > > > > >> >> > work for Cosette to be able to handle this but it
> definitely
> > > > sounds
> > > > > >> >> > interesting. Do you have an application in mind? >>One of
> > them
> > > > > might
> > > > > >> be
> > > > > >> >> to
> > > > > >> >> > determine whether previously cached results can be used.
> > > > > >> >> >
> > > > > >> >> > One simple idea to start here is to replace a naive solver
> we
> > > > have
> > > > > in
> > > > > >> >> > Calcite for checking if one predicate implies another
> > > predicate.
> > > > We
> > > > > >> >> call it
> > > > > >> >> > RexImplicationChecker in Calcite and if we can replace or
> > help
> > > it
> > > > > >> with
> > > > > >> >> > Constraint solver of Cosette which says if a particular
> > > > implication
> > > > > >> is a
> > > > > >> >> > tautology then that would help a great deal.
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <
> > > > > >> >> akcheung@cs.washington.edu>
> > > > > >> >> > wrote:
> > > > > >> >> >
> > > > > >> >> > > Hi Julian et al,
> > > > > >> >> > >
> > > > > >> >> > > Thanks for your interest in Cosette. Your suggestions
> make
> > a
> > > > lot
> > > > > of
> > > > > >> >> > sense.
> > > > > >> >> > > We have done some initial work and would like to get your
> > > > > feedback
> > > > > >> on
> > > > > >> >> how
> > > > > >> >> > > to integrate the two tools together.
> > > > > >> >> > >
> > > > > >> >> > > > One obvious idea is to use Cosette to audit Calcite’s
> > query
> > > > > >> >> > > transformation rules. Each rule is supposed to preserve
> > > > semantics
> > > > > >> but
> > > > > >> >> > > (until Cosette) we had to trust the author of the rule.
> We
> > > > could
> > > > > >> >> convert
> > > > > >> >> > > the before and after relational expressions to SQL, and
> > then
> > > > ask
> > > > > >> >> Cosette
> > > > > >> >> > > whether those are equivalent. We could enable this check
> in
> > > > > >> Calcite’s
> > > > > >> >> > test
> > > > > >> >> > > suite, during which many thousands of rules are fired.
> > > > > >> >> > >
> > > > > >> >> > > Indeed. We have browsed through the Calcite rules and
> > > > > reformulated
> > > > > >> a
> > > > > >> >> few
> > > > > >> >> > > of them using our Cosette language:
> > > > > >> >> > >
> > > > > >> >> > > 1. Conjunctive select (https://github.com/apache/cal
> > > > > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/
> > rel/
> > > > > >> >> > > rules/FilterMergeRule.java) -->
> > > https://demo.cosette.cs.washin
> > > > > >> >> gton.edu/
> > > > > >> >> > > (click conjunctive select from the dropdown menu)
> > > > > >> >> > >
> > > > > >> >> > > 2. Join commute (https://github.com/apache/cal
> > > > > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/
> > rel/
> > > > > >> >> > > rules/JoinCommuteRule.java) --> Join commute from the
> demo
> > > > > website
> > > > > >> >> above
> > > > > >> >> > >
> > > > > >> >> > > 3. Join/Project transpose (https://github.com/apache/cal
> > > > > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/
> > rel/
> > > > > >> >> > > rules/JoinProjectTransposeRule.java) --> Join Proj.
> Trans.
> > > > from
> > > > > >> the
> > > > > >> >> demo
> > > > > >> >> > > website above
> > > > > >> >> > >
> > > > > >> >> > > As we are not very familiar with the Calcite code base,
> we
> > > have
> > > > > >> tried
> > > > > >> >> our
> > > > > >> >> > > best to guess the intention of each rule based on the
> > > > > >> documentation,
> > > > > >> >> > please
> > > > > >> >> > > feel free to point out if we made mistakes.
> > > > > >> >> > >
> > > > > >> >> > > As you can see, the Cosette language is pretty much like
> > > > standard
> > > > > >> SQL,
> > > > > >> >> > > except for declarations of schemas and relations. You
> will
> > > also
> > > > > >> notice
> > > > > >> >> > the
> > > > > >> >> > > "??" in some schema declarations (e.g., in rule 1. above)
> > ---
> > > > > they
> > > > > >> >> stand
> > > > > >> >> > > for "symbolic" attributes that can represent any
> attribute.
> > > In
> > > > > >> other
> > > > > >> >> > words,
> > > > > >> >> > > if Cosette can prove that a rule with symbolic attributes
> > is
> > > > > true,
> > > > > >> >> then
> > > > > >> >> > it
> > > > > >> >> > > will be true regardless of what the symbolic attributes
> are
> > > > > >> >> instantiated
> > > > > >> >> > > with. Symbolic predicates (e.g., in rule 1.) works
> > similarly,
> > > > > hence
> > > > > >> >> > giving
> > > > > >> >> > > Cosette a mechanism to prove (or disprove) classes of
> > rewrite
> > > > > >> rules.
> > > > > >> >> See
> > > > > >> >> > > our documentation at http://cosette.cs.washington.
> > edu/guide
> > > > for
> > > > > >> >> details.
> > > > > >> >> > >
> > > > > >> >> > > I believe the challenge here is how we can "reverse
> > engineer"
> > > > the
> > > > > >> >> > > intention of each of the existing rules so that they can
> be
> > > > > >> expressed
> > > > > >> >> in
> > > > > >> >> > > Cosette. Any suggestions on how to do this? We have a few
> > > > > students
> > > > > >> >> > working
> > > > > >> >> > > on Cosette and can help, but we will probably need help
> > from
> > > > > >> Calcite
> > > > > >> >> to
> > > > > >> >> > > fully understand all of the existing rules. Another
> > > possibility
> > > > > is
> > > > > >> to
> > > > > >> >> > print
> > > > > >> >> > > out the input and output of each rule application during
> > > > testing,
> > > > > >> and
> > > > > >> >> > send
> > > > > >> >> > > them to Cosette. If the printout is in a form that
> > resembles
> > > > SQL
> > > > > we
> > > > > >> >> can
> > > > > >> >> > > probably patch it into Cosette.
> > > > > >> >> > >
> > > > > >> >> > > For new rules, can we can ask Calcite authors to express
> > them
> > > > in
> > > > > >> >> Cosette
> > > > > >> >> > > as well, perhaps as part of the documentation? This way
> we
> > > will
> > > > > >> only
> > > > > >> >> need
> > > > > >> >> > > to handle the existing rules.
> > > > > >> >> > >
> > > > > >> >> > > > A few rules might use other information besides the
> input
> > > > > >> relational
> > > > > >> >> > > expression, such as predicates that are known to hold or
> > > column
> > > > > >> >> > > combinations that are known to be unique. But let’s see
> > what
> > > > > >> happens.
> > > > > >> >> > >
> > > > > >> >> > > This is something that we are actively working on. Can
> you
> > > > point
> > > > > >> us to
> > > > > >> >> > > specific rules with such properties? One possibility is
> the
> > > > join
> > > > > >> >> > > commutativity rule noted above. You will notice that we
> > > didn't
> > > > > >> prove
> > > > > >> >> the
> > > > > >> >> > > "general form" of the rule with symbolic attributes, but
> > > rather
> > > > > one
> > > > > >> >> with
> > > > > >> >> > > concrete schemas. This is because Cosette currently
> > > implements
> > > > > the
> > > > > >> >> > unnamed
> > > > > >> >> > > approach to attribute naming (see Section 3.2 in
> > > > > >> >> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence
> > the
> > > > > >> general
> > > > > >> >> form
> > > > > >> >> > > of the rule is only true if we know that the two input
> > > schemas
> > > > > have
> > > > > >> >> > > distinct attributes.
> > > > > >> >> > >
> > > > > >> >> > > > This is a very loose integration of Cosette / Calcite,
> > but
> > > we
> > > > > can
> > > > > >> >> make
> > > > > >> >> > > closer integrations (e.g. within the same JVM, even at
> > > runtime)
> > > > > as
> > > > > >> we
> > > > > >> >> > > discover synergies. After all, optimization and
> > > theorem-proving
> > > > > are
> > > > > >> >> > related
> > > > > >> >> > > endeavors.
> > > > > >> >> > >
> > > > > >> >> > > Agreed. Cosette is implemented using Coq and Racket. We
> > > realize
> > > > > >> that
> > > > > >> >> > those
> > > > > >> >> > > are not the most popular languages for implementing
> systems
> > > :)
> > > > ,
> > > > > so
> > > > > >> >> > Cosette
> > > > > >> >> > > comes with a POST API as well:
> > http://cosette.cs.washington.
> > > > > >> >> > edu/guide#api
> > > > > >> >> > > . It takes in the program text written in Cosette, and
> > > returns
> > > > > the
> > > > > >> >> answer
> > > > > >> >> > > (or times out). Does this make it easier to run the tool?
> > We
> > > > are
> > > > > >> open
> > > > > >> >> to
> > > > > >> >> > > implementing other bindings as well.
> > > > > >> >> > >
> > > > > >> >> > > > Another area that would be useful would be to devise
> test
> > > > data.
> > > > > >> >> > >
> > > > > >> >> > > How about this: Each SQL implementation has its own
> > > > > interpretation
> > > > > >> of
> > > > > >> >> > SQL,
> > > > > >> >> > > with Cosette being one of them. Let's implement different
> > SQL
> > > > > >> >> semantics
> > > > > >> >> > > using Cosette (say, Calcite's and Postgres'). Then,
> given a
> > > > > query,
> > > > > >> ask
> > > > > >> >> > > Cosette to find a counterexample (i.e., an input
> relation)
> > > > where
> > > > > >> the
> > > > > >> >> two
> > > > > >> >> > > implementations will return different results when
> executed
> > > on
> > > > a
> > > > > >> given
> > > > > >> >> > > query. If such a counterexample exists, then Calcite
> > > developers
> > > > > can
> > > > > >> >> > > determine whether this is a "bug" or a "feature". Does
> this
> > > > sound
> > > > > >> >> similar
> > > > > >> >> > > to what you have in mind?
> > > > > >> >> > >
> > > > > >> >> > > > There might be applications in materialized views. A
> > query
> > > Q
> > > > > can
> > > > > >> >> use a
> > > > > >> >> > > materialized view V if V covers Q. In other words if Q ==
> > > R(V)
> > > > > >> where
> > > > > >> >> R is
> > > > > >> >> > > some sequence of relational operators. Given Q and V,
> > Cosette
> > > > > could
> > > > > >> >> > perhaps
> > > > > >> >> > > analyze and either return R (success) or return that V
> does
> > > not
> > > > > >> cover
> > > > > >> >> Q
> > > > > >> >> > > (failure).
> > > > > >> >> > >
> > > > > >> >> > > This resembles the problem of deciding whether a given
> > > relation
> > > > > >> >> > (expressed
> > > > > >> >> > > as a query) is contained in another one. It will take
> some
> > > work
> > > > > for
> > > > > >> >> > Cosette
> > > > > >> >> > > to be able to handle this but it definitely sounds
> > > interesting.
> > > > > Do
> > > > > >> you
> > > > > >> >> > have
> > > > > >> >> > > an application in mind? One of them might be to determine
> > > > whether
> > > > > >> >> > > previously cached results can be used.
> > > > > >> >> > >
> > > > > >> >> > > We definitely see lots of synergies between the two
> tools.
> > To
> > > > > start
> > > > > >> >> with
> > > > > >> >> > > something easy :) , I propose we first discuss how to use
> > the
> > > > > >> current
> > > > > >> >> > > Cosette implementation to audit existing Calcite rules,
> > and a
> > > > way
> > > > > >> to
> > > > > >> >> > > integrate Cosette into development of future Calcite
> rules
> > as
> > > > > part
> > > > > >> of
> > > > > >> >> > code
> > > > > >> >> > > review / regression tests. What do you think?
> > > > > >> >> > >
> > > > > >> >> > > Thanks,
> > > > > >> >> > > Alvin (on behalf of the Cosette team)
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > --
> > > > > >> > shumochu.com
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > shumochu.com
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > shumochu.com
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > shumochu.com
> > >
> >
>
>
>
> --
> shumochu.com
>

Re: Cosette / Apache Calcite

Posted by Shumo Chu <sh...@gmail.com>.
I agree! Being able to turn on and off makes a lot of sense. Since
Cosette's installation is dockerized, running locally would not be a
problem. Let me pull together my code and do a few additional gluing so
that you can run them easily. Then we can start from there. I will put
my code on my fork of Calcite, does that sound good?

Best
Shumo

On Thu, Feb 8, 2018 at 4:37 PM, Michael Mior <mm...@uwaterloo.ca> wrote:

> Shumo,
>
> Thanks for the additional info. I think it would be great to have these
> tests run as part of the Calcite test suite. One question I have is what
> the runtime is like for these tests. Especially since using the web API
> would result in a dependency on a service we (as in the Calcite PMC) don't
> control, I would be inclined to have these tests run separately so they can
> be turned off if needed. Per an earlier comment from Julian (
> https://issues.apache.org/jira/browse/CALCITE-1977) one option would be to
> allow Calcite to generate a script that can be fed to Cosette.
>
> Also, while I'm sure I could reproduce something like the test suite you
> currently have in time, if you have code you could share for your current
> test setup with Cosette and Calcite that would likely be helpful in moving
> this along.
>
> Cheers,
> --
> Michael Mior
> mmior@uwaterloo.ca
>
> 2018-02-08 19:13 GMT-05:00 Shumo Chu <sh...@gmail.com>:
>
> > Currently, I hijack the test cases and call RelToSqlConverter before and
> > after calcite's rewrite. When I get the query before (Q1) and after
> (Q2), I
> > can then generate Cosette source code, need to add a few things, e.g.
> > schema declarations. This Cosette source code is then sent to our solver
> to
> > get the result of equivalence checking.
> >
> > Now everything is run on my own computer. But Cosette does provide a web
> > service that is running on a UW CSE server and has a web API to call (
> > http://cosette.cs.washington.edu/guide#api). So calcite can either call
> > the
> > web API (the advantage is no need to install any dependencies) or try to
> > run Cosette locally (we have dockerized the installation of Cosette).
> >
> > What do you think?
> >
> > Best
> > Shumo
> >
> > On Thu, Feb 8, 2018 at 3:25 PM, Michael Mior <mm...@uwaterloo.ca> wrote:
> >
> > > Shumo and Alvin,
> > >
> > > That's great to hear (especially the part about the lack of bugs)! That
> > > paper is next on my stack to read. I'd love to work on getting these
> > tests
> > > better integrated with Calcite. It would be really helpful if you could
> > > provide more detail on how you're currently running the tests. Ideally
> if
> > > you could provide a way for me to easily replicate your test
> > > infrastructure, that would probably help give me a better idea on how
> we
> > > can help with the integration.
> > >
> > > Cheers,
> > > --
> > > Michael Mior
> > > mmior@apache.org
> > >
> > > 2018-02-08 17:56 GMT-05:00 Shumo Chu <sh...@gmail.com>:
> > >
> > > > Hi, Calcite Devs!
> > > >
> > > > We wanted to update you on the current status of using Cosette to
> help
> > > > check the correctness of the Calcite planner rules [
> > > > https://issues.apache.org/jira/browse/CALCITE-1977]:
> > > >
> > > > So far we have used Cosette to check whether Calcite's
> transformations
> > > are
> > > > correct by using the test cases in RelOptRulesTest.java.
> > > >
> > > > For each test case, we use RelToSqlConverter to revert the query plan
> > > > before Calcite's rewrite back to SQL (call it Q1) and similarly for
> the
> > > > query plan after Calcite's rewrite (Q2). We then use Cosette to check
> > > > whether Q1 and Q2 are semantically equivalent.
> > > >
> > > > As Calcite's test cases cover many SQL features, some of them are not
> > > > supported by Cosette yet. Out of the 39 (give number) test cases that
> > use
> > > > SQL features supported by Cosette, Cosette is now able to formally
> > prove
> > > > that Calcite's rewrite in 33 of them are correct. This includes a few
> > > > fairly complicated ones, like "testPushFilterPassAggThree" (
> > > > https://github.com/apache/calcite/blob/master/core/src/test
> > > > /java/org/apache/calcite/test/RelOptRulesTest.java#L408) The good
> news
> > > is
> > > > that we haven't found any bugs so far :)
> > > >
> > > > We have also used the test cases to improve Cosette. For example,
> > Cosette
> > > > now supports checking equivalence under database integrity
> constraints
> > as
> > > > preconditions, e.g., Q1 == Q2 iff R.A is a key of R. In case you are
> > > > interested you can read about our latest system at
> > > > https://arxiv.org/abs/1802.02229
> > > >
> > > > We would like to improve Cosette's parser so that we can handle the
> > > > remaining Calcite test cases. Meanwhile, we would also like to
> discuss
> > > how
> > > > to integrate the two tools in a more systematic way, perhaps as part
> of
> > > the
> > > > Calcite test infrastructure rather than us manually extracting the
> test
> > > > cases from RelOptRulesTest (RelToSqlConverter also doesn't always
> > succeed
> > > > in reversion as you may know). Anyone interested in discussing more?
> If
> > > so
> > > > how should we proceed?
> > > >
> > > > Best
> > > > Shumo & Alvin
> > > >
> > > > On Tue, Oct 17, 2017 at 6:36 PM, Shumo Chu <sh...@gmail.com>
> > wrote:
> > > >
> > > > > I create a link to the image:
> > > > >
> > > > > https://homes.cs.washington.edu/~chushumo/img/calcite_
> break_down.png
> > > > >
> > > > > In terms of integration, what I can imagine is to call Cosette web
> > API
> > > in
> > > > > parallel when running calcite test cases, like RelOptRulesTest,
> then
> > > > > perhaps Cosette result can give developers either extra confidence
> or
> > > > > counterexample if there is a bug.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Oct 17, 2017 at 5:49 PM, Michael Mior <mm...@uwaterloo.ca>
> > > > wrote:
> > > > >
> > > > >> Shumo,
> > > > >>
> > > > >> Thanks for the update! Unfortunately I don't believe inline images
> > and
> > > > >> attachments are not supported on ASF mailing lists so if you have
> > > > another
> > > > >> way of sharing that, would be interested to see. Glad to hear the
> > test
> > > > >> cases have been useful to you. Also great to see that Calcite
> seems
> > to
> > > > be
> > > > >> faring well so far :)
> > > > >>
> > > > >> You're right that the conversion of plans to SQL definitely has
> some
> > > > >> missing pieces. If there are any that are particularly problematic
> > in
> > > > your
> > > > >> with Cosette, please let us know. For any bugs you encounter, it
> > would
> > > > be
> > > > >> great if you could file a JIRA so we can work on fixing these (
> > > > >> https://issues.apache.org/jira/projects/CALCITE/issues/).
> > > > >>
> > > > >> --
> > > > >> Michael Mior
> > > > >> mmior@apache.org
> > > > >>
> > > > >> 2017-10-17 20:06 GMT-04:00 Shumo Chu <sh...@gmail.com>:
> > > > >>
> > > > >> > Hi Michael and Calcite Devs:
> > > > >> >
> > > > >> > That's great!
> > > > >> >
> > > > >> > We also hijacked the test harness of RelOpRule Test and
> generated
> > > SQL
> > > > >> > using before and after optimization logical plans. (The reason
> > that
> > > I
> > > > >> use
> > > > >> > query plan rather than the input SQL is that when Calcite
> > generates
> > > > SQLs
> > > > >> > from query plans, it adds the correct table reference of
> > attributes.
> > > > >> > Cosette currently cannot handle unqualified attribute
> references).
> > > We
> > > > >> got
> > > > >> > 232 before and after SQL after some tweaking. You can find the
> SQL
> > > > >> queries
> > > > >> > that we get here: https://github.com/uwdb/Cosett
> > > > >> e/blob/master/examples/c
> > > > >> > alcite/calcite_tests.json.
> > > > >> >
> > > > >> > We then ran all of them using Cosette. Cosette can currently
> > support
> > > > 39
> > > > >> of
> > > > >> > 232 queries. Out of these 39 queries, the results are:
> > > > >> >
> > > > >> > EQ           9
> > > > >> > UNKNOWN     27
> > > > >> > NEQ          3
> > > > >> >
> > > > >> > EQ means our Coq backend found a valid proof for the equivalence
> > of
> > > > the
> > > > >> > two queries.
> > > > >> >
> > > > >> > UNKNOWN means our model checker could not find a counterexample
> to
> > > > prove
> > > > >> > that the queries are not equivalent (within a time bound of 3
> > secs).
> > > > You
> > > > >> > can interpret that as "likely equal".
> > > > >> >
> > > > >> > NEQ means our model checker found a counterexample to prove the
> > > > queries
> > > > >> > are not equivalent. These are all the cases that there are some
> > > > >> > preconditions that Calcite understands but we haven't yet put
> into
> > > > >> Cosette.
> > > > >> > For example (testPushSemiJoinPastJoinRuleLeft):
> > > > >> >
> > > > >> >  query q1 `SELECT EMP.ENAME
> > > > >> >                   FROM EMP AS EMP, DEPT AS DEPT, EMP AS EMP0
> > > > >> >                    WHERE EMP.DEPTNO = DEPT.DEPTNO AND
> > > > >> >                                  EMP.EMPNO = EMP0.EMPNO`;
> > > > >> >
> > > > >> > query q2 `SELECT EMP1.ENAME
> > > > >> >                  FROM EMP AS EMP1 INNER JOIN DEPT AS DEPT0
> > > > >> >                     ON EMP1.DEPTNO = DEPT0.DEPTNO INNER JOIN EMP
> > AS
> > > > EMP2
> > > > >> >                       ON EMP1.EMPNO = EMP2.EMPNO INNER JOIN DEPT
> > AS
> > > > >> DEPT1
> > > > >> >                     ON EMP1.DEPTNO = DEPT1.DEPTNO INNER JOIN EMP
> > AS
> > > > EMP3
> > > > >> >                     ON EMP1.EMPNO = EMP3.EMPNO`;
> > > > >> >
> > > > >> >
> > > > >> > These two queries are equivalent under the precondition that
> EMPNO
> > > is
> > > > >> the
> > > > >> > primary key of EMP, and DEPTNO is the primary key of DEPT, and
> > that
> > > is
> > > > >> > indeed the case!
> > > > >> >
> > > > >> > We are extending Cosette to support preconditions such as
> primary
> > > keys
> > > > >> and
> > > > >> > foreign keys now and we should be able to handle these soon.
> > > > >> >
> > > > >> > For the queries that we cannot handle, here is the break down of
> > the
> > > > >> > issues:
> > > > >> >
> > > > >> > [image: Inline image 1]
> > > > >> >
> > > > >> > We support a large part of group by queries, GROUPBY here means
> > some
> > > > >> > grouping features that we don't support, e.g., group by on
> > arbitrary
> > > > >> > expressions. Although we might not be able to support all of
> these
> > > SQL
> > > > >> > features, we are definitely working on adding more SQL features
> to
> > > > >> Cosette.
> > > > >> >
> > > > >> > With all above, we can try to integrate Cosette as an extra
> layer
> > to
> > > > >> > ensure developer's confidence. Currently, we are working on
> SIGMOD
> > > > >> deadline
> > > > >> > (Nov. 2) but will devote more time into this afterward.
> > > > >> >
> > > > >> > One thing worth mentioning is that we plan to include all these
> > > > calcite
> > > > >> > examples as the regression test for Cosette so that we can make
> > > > Cosette
> > > > >> > more powerful and practical. These queries are great benchmarks
> > for
> > > > >> Cosette
> > > > >> > and are super valuable!
> > > > >> >
> > > > >> > Meanwhile, It would be also great if calcite's logical plan to
> SQL
> > > > >> > converter can be improved. It currently doesn't support all
> > queries
> > > > and
> > > > >> in
> > > > >> > some cases, it actually generates illegal SQL. (happy to submit
> > > > detailed
> > > > >> > bug report if you think that would be helpful).
> > > > >> >
> > > > >> > Best
> > > > >> > Shumo
> > > > >> >
> > > > >> > On Tue, Oct 17, 2017 at 9:43 AM, Michael Mior <
> mmior@uwaterloo.ca
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> >> This took me longer than expected to get around to, but
> hopefully
> > > the
> > > > >> >> below
> > > > >> >> is helpful:
> > > > >> >>
> > > > >> >> https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2
> > > > >> >>
> > > > >> >> I just did some basic (and very hacky) instrumentation of
> > > > >> RelOptRulesTest
> > > > >> >> to dump SQL before and after rules have been applied. The file
> > > > >> consists of
> > > > >> >> the name of the test followed by the original and then the
> > > rewritten
> > > > >> SQL.
> > > > >> >>
> > > > >> >> Many of the tests are missing for various reasons, but there's
> > > still
> > > > >> 189
> > > > >> >> examples there to play with. Let me know if any particular
> > aspects
> > > of
> > > > >> the
> > > > >> >> SQL are problematic. The "before" SQL is handwritten for the
> > tests
> > > > and
> > > > >> the
> > > > >> >> "after" is ANSI SQL as generated by Calcite from the resulting
> > > > logical
> > > > >> >> plan.
> > > > >> >>
> > > > >> >> --
> > > > >> >> Michael Mior
> > > > >> >> mmior@apache.org
> > > > >> >>
> > > > >> >> 2017-09-22 1:03 GMT-04:00 Amogh Margoor <am...@qubole.com>:
> > > > >> >>
> > > > >> >> > >>> There might be applications in materialized views. A
> query
> > Q
> > > > can
> > > > >> >> use a
> > > > >> >> > materialized view V if V covers Q. In other words if >>>Q ==
> > R(V)
> > > > >> where
> > > > >> >> R
> > > > >> >> > is some sequence of relational operators. Given Q and V,
> > Cosette
> > > > >> could
> > > > >> >> > perhaps analyze and either >>>return R (success) or return
> > that V
> > > > >> does
> > > > >> >> not
> > > > >> >> > cover Q (failure).
> > > > >> >> >
> > > > >> >> > >>This resembles the problem of deciding whether a given
> > relation
> > > > >> >> > (expressed as a query) is contained in another one. It will
> > > >>take
> > > > >> some
> > > > >> >> > work for Cosette to be able to handle this but it definitely
> > > sounds
> > > > >> >> > interesting. Do you have an application in mind? >>One of
> them
> > > > might
> > > > >> be
> > > > >> >> to
> > > > >> >> > determine whether previously cached results can be used.
> > > > >> >> >
> > > > >> >> > One simple idea to start here is to replace a naive solver we
> > > have
> > > > in
> > > > >> >> > Calcite for checking if one predicate implies another
> > predicate.
> > > We
> > > > >> >> call it
> > > > >> >> > RexImplicationChecker in Calcite and if we can replace or
> help
> > it
> > > > >> with
> > > > >> >> > Constraint solver of Cosette which says if a particular
> > > implication
> > > > >> is a
> > > > >> >> > tautology then that would help a great deal.
> > > > >> >> >
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <
> > > > >> >> akcheung@cs.washington.edu>
> > > > >> >> > wrote:
> > > > >> >> >
> > > > >> >> > > Hi Julian et al,
> > > > >> >> > >
> > > > >> >> > > Thanks for your interest in Cosette. Your suggestions make
> a
> > > lot
> > > > of
> > > > >> >> > sense.
> > > > >> >> > > We have done some initial work and would like to get your
> > > > feedback
> > > > >> on
> > > > >> >> how
> > > > >> >> > > to integrate the two tools together.
> > > > >> >> > >
> > > > >> >> > > > One obvious idea is to use Cosette to audit Calcite’s
> query
> > > > >> >> > > transformation rules. Each rule is supposed to preserve
> > > semantics
> > > > >> but
> > > > >> >> > > (until Cosette) we had to trust the author of the rule. We
> > > could
> > > > >> >> convert
> > > > >> >> > > the before and after relational expressions to SQL, and
> then
> > > ask
> > > > >> >> Cosette
> > > > >> >> > > whether those are equivalent. We could enable this check in
> > > > >> Calcite’s
> > > > >> >> > test
> > > > >> >> > > suite, during which many thousands of rules are fired.
> > > > >> >> > >
> > > > >> >> > > Indeed. We have browsed through the Calcite rules and
> > > > reformulated
> > > > >> a
> > > > >> >> few
> > > > >> >> > > of them using our Cosette language:
> > > > >> >> > >
> > > > >> >> > > 1. Conjunctive select (https://github.com/apache/cal
> > > > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/
> rel/
> > > > >> >> > > rules/FilterMergeRule.java) -->
> > https://demo.cosette.cs.washin
> > > > >> >> gton.edu/
> > > > >> >> > > (click conjunctive select from the dropdown menu)
> > > > >> >> > >
> > > > >> >> > > 2. Join commute (https://github.com/apache/cal
> > > > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/
> rel/
> > > > >> >> > > rules/JoinCommuteRule.java) --> Join commute from the demo
> > > > website
> > > > >> >> above
> > > > >> >> > >
> > > > >> >> > > 3. Join/Project transpose (https://github.com/apache/cal
> > > > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/
> rel/
> > > > >> >> > > rules/JoinProjectTransposeRule.java) --> Join Proj. Trans.
> > > from
> > > > >> the
> > > > >> >> demo
> > > > >> >> > > website above
> > > > >> >> > >
> > > > >> >> > > As we are not very familiar with the Calcite code base, we
> > have
> > > > >> tried
> > > > >> >> our
> > > > >> >> > > best to guess the intention of each rule based on the
> > > > >> documentation,
> > > > >> >> > please
> > > > >> >> > > feel free to point out if we made mistakes.
> > > > >> >> > >
> > > > >> >> > > As you can see, the Cosette language is pretty much like
> > > standard
> > > > >> SQL,
> > > > >> >> > > except for declarations of schemas and relations. You will
> > also
> > > > >> notice
> > > > >> >> > the
> > > > >> >> > > "??" in some schema declarations (e.g., in rule 1. above)
> ---
> > > > they
> > > > >> >> stand
> > > > >> >> > > for "symbolic" attributes that can represent any attribute.
> > In
> > > > >> other
> > > > >> >> > words,
> > > > >> >> > > if Cosette can prove that a rule with symbolic attributes
> is
> > > > true,
> > > > >> >> then
> > > > >> >> > it
> > > > >> >> > > will be true regardless of what the symbolic attributes are
> > > > >> >> instantiated
> > > > >> >> > > with. Symbolic predicates (e.g., in rule 1.) works
> similarly,
> > > > hence
> > > > >> >> > giving
> > > > >> >> > > Cosette a mechanism to prove (or disprove) classes of
> rewrite
> > > > >> rules.
> > > > >> >> See
> > > > >> >> > > our documentation at http://cosette.cs.washington.
> edu/guide
> > > for
> > > > >> >> details.
> > > > >> >> > >
> > > > >> >> > > I believe the challenge here is how we can "reverse
> engineer"
> > > the
> > > > >> >> > > intention of each of the existing rules so that they can be
> > > > >> expressed
> > > > >> >> in
> > > > >> >> > > Cosette. Any suggestions on how to do this? We have a few
> > > > students
> > > > >> >> > working
> > > > >> >> > > on Cosette and can help, but we will probably need help
> from
> > > > >> Calcite
> > > > >> >> to
> > > > >> >> > > fully understand all of the existing rules. Another
> > possibility
> > > > is
> > > > >> to
> > > > >> >> > print
> > > > >> >> > > out the input and output of each rule application during
> > > testing,
> > > > >> and
> > > > >> >> > send
> > > > >> >> > > them to Cosette. If the printout is in a form that
> resembles
> > > SQL
> > > > we
> > > > >> >> can
> > > > >> >> > > probably patch it into Cosette.
> > > > >> >> > >
> > > > >> >> > > For new rules, can we can ask Calcite authors to express
> them
> > > in
> > > > >> >> Cosette
> > > > >> >> > > as well, perhaps as part of the documentation? This way we
> > will
> > > > >> only
> > > > >> >> need
> > > > >> >> > > to handle the existing rules.
> > > > >> >> > >
> > > > >> >> > > > A few rules might use other information besides the input
> > > > >> relational
> > > > >> >> > > expression, such as predicates that are known to hold or
> > column
> > > > >> >> > > combinations that are known to be unique. But let’s see
> what
> > > > >> happens.
> > > > >> >> > >
> > > > >> >> > > This is something that we are actively working on. Can you
> > > point
> > > > >> us to
> > > > >> >> > > specific rules with such properties? One possibility is the
> > > join
> > > > >> >> > > commutativity rule noted above. You will notice that we
> > didn't
> > > > >> prove
> > > > >> >> the
> > > > >> >> > > "general form" of the rule with symbolic attributes, but
> > rather
> > > > one
> > > > >> >> with
> > > > >> >> > > concrete schemas. This is because Cosette currently
> > implements
> > > > the
> > > > >> >> > unnamed
> > > > >> >> > > approach to attribute naming (see Section 3.2 in
> > > > >> >> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence
> the
> > > > >> general
> > > > >> >> form
> > > > >> >> > > of the rule is only true if we know that the two input
> > schemas
> > > > have
> > > > >> >> > > distinct attributes.
> > > > >> >> > >
> > > > >> >> > > > This is a very loose integration of Cosette / Calcite,
> but
> > we
> > > > can
> > > > >> >> make
> > > > >> >> > > closer integrations (e.g. within the same JVM, even at
> > runtime)
> > > > as
> > > > >> we
> > > > >> >> > > discover synergies. After all, optimization and
> > theorem-proving
> > > > are
> > > > >> >> > related
> > > > >> >> > > endeavors.
> > > > >> >> > >
> > > > >> >> > > Agreed. Cosette is implemented using Coq and Racket. We
> > realize
> > > > >> that
> > > > >> >> > those
> > > > >> >> > > are not the most popular languages for implementing systems
> > :)
> > > ,
> > > > so
> > > > >> >> > Cosette
> > > > >> >> > > comes with a POST API as well:
> http://cosette.cs.washington.
> > > > >> >> > edu/guide#api
> > > > >> >> > > . It takes in the program text written in Cosette, and
> > returns
> > > > the
> > > > >> >> answer
> > > > >> >> > > (or times out). Does this make it easier to run the tool?
> We
> > > are
> > > > >> open
> > > > >> >> to
> > > > >> >> > > implementing other bindings as well.
> > > > >> >> > >
> > > > >> >> > > > Another area that would be useful would be to devise test
> > > data.
> > > > >> >> > >
> > > > >> >> > > How about this: Each SQL implementation has its own
> > > > interpretation
> > > > >> of
> > > > >> >> > SQL,
> > > > >> >> > > with Cosette being one of them. Let's implement different
> SQL
> > > > >> >> semantics
> > > > >> >> > > using Cosette (say, Calcite's and Postgres'). Then, given a
> > > > query,
> > > > >> ask
> > > > >> >> > > Cosette to find a counterexample (i.e., an input relation)
> > > where
> > > > >> the
> > > > >> >> two
> > > > >> >> > > implementations will return different results when executed
> > on
> > > a
> > > > >> given
> > > > >> >> > > query. If such a counterexample exists, then Calcite
> > developers
> > > > can
> > > > >> >> > > determine whether this is a "bug" or a "feature". Does this
> > > sound
> > > > >> >> similar
> > > > >> >> > > to what you have in mind?
> > > > >> >> > >
> > > > >> >> > > > There might be applications in materialized views. A
> query
> > Q
> > > > can
> > > > >> >> use a
> > > > >> >> > > materialized view V if V covers Q. In other words if Q ==
> > R(V)
> > > > >> where
> > > > >> >> R is
> > > > >> >> > > some sequence of relational operators. Given Q and V,
> Cosette
> > > > could
> > > > >> >> > perhaps
> > > > >> >> > > analyze and either return R (success) or return that V does
> > not
> > > > >> cover
> > > > >> >> Q
> > > > >> >> > > (failure).
> > > > >> >> > >
> > > > >> >> > > This resembles the problem of deciding whether a given
> > relation
> > > > >> >> > (expressed
> > > > >> >> > > as a query) is contained in another one. It will take some
> > work
> > > > for
> > > > >> >> > Cosette
> > > > >> >> > > to be able to handle this but it definitely sounds
> > interesting.
> > > > Do
> > > > >> you
> > > > >> >> > have
> > > > >> >> > > an application in mind? One of them might be to determine
> > > whether
> > > > >> >> > > previously cached results can be used.
> > > > >> >> > >
> > > > >> >> > > We definitely see lots of synergies between the two tools.
> To
> > > > start
> > > > >> >> with
> > > > >> >> > > something easy :) , I propose we first discuss how to use
> the
> > > > >> current
> > > > >> >> > > Cosette implementation to audit existing Calcite rules,
> and a
> > > way
> > > > >> to
> > > > >> >> > > integrate Cosette into development of future Calcite rules
> as
> > > > part
> > > > >> of
> > > > >> >> > code
> > > > >> >> > > review / regression tests. What do you think?
> > > > >> >> > >
> > > > >> >> > > Thanks,
> > > > >> >> > > Alvin (on behalf of the Cosette team)
> > > > >> >> > >
> > > > >> >> >
> > > > >> >>
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > shumochu.com
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > shumochu.com
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > shumochu.com
> > > >
> > >
> >
> >
> >
> > --
> > shumochu.com
> >
>



-- 
shumochu.com

Re: Cosette / Apache Calcite

Posted by Michael Mior <mm...@uwaterloo.ca>.
Shumo,

Thanks for the additional info. I think it would be great to have these
tests run as part of the Calcite test suite. One question I have is what
the runtime is like for these tests. Especially since using the web API
would result in a dependency on a service we (as in the Calcite PMC) don't
control, I would be inclined to have these tests run separately so they can
be turned off if needed. Per an earlier comment from Julian (
https://issues.apache.org/jira/browse/CALCITE-1977) one option would be to
allow Calcite to generate a script that can be fed to Cosette.

Also, while I'm sure I could reproduce something like the test suite you
currently have in time, if you have code you could share for your current
test setup with Cosette and Calcite that would likely be helpful in moving
this along.

Cheers,
--
Michael Mior
mmior@uwaterloo.ca

2018-02-08 19:13 GMT-05:00 Shumo Chu <sh...@gmail.com>:

> Currently, I hijack the test cases and call RelToSqlConverter before and
> after calcite's rewrite. When I get the query before (Q1) and after (Q2), I
> can then generate Cosette source code, need to add a few things, e.g.
> schema declarations. This Cosette source code is then sent to our solver to
> get the result of equivalence checking.
>
> Now everything is run on my own computer. But Cosette does provide a web
> service that is running on a UW CSE server and has a web API to call (
> http://cosette.cs.washington.edu/guide#api). So calcite can either call
> the
> web API (the advantage is no need to install any dependencies) or try to
> run Cosette locally (we have dockerized the installation of Cosette).
>
> What do you think?
>
> Best
> Shumo
>
> On Thu, Feb 8, 2018 at 3:25 PM, Michael Mior <mm...@uwaterloo.ca> wrote:
>
> > Shumo and Alvin,
> >
> > That's great to hear (especially the part about the lack of bugs)! That
> > paper is next on my stack to read. I'd love to work on getting these
> tests
> > better integrated with Calcite. It would be really helpful if you could
> > provide more detail on how you're currently running the tests. Ideally if
> > you could provide a way for me to easily replicate your test
> > infrastructure, that would probably help give me a better idea on how we
> > can help with the integration.
> >
> > Cheers,
> > --
> > Michael Mior
> > mmior@apache.org
> >
> > 2018-02-08 17:56 GMT-05:00 Shumo Chu <sh...@gmail.com>:
> >
> > > Hi, Calcite Devs!
> > >
> > > We wanted to update you on the current status of using Cosette to help
> > > check the correctness of the Calcite planner rules [
> > > https://issues.apache.org/jira/browse/CALCITE-1977]:
> > >
> > > So far we have used Cosette to check whether Calcite's transformations
> > are
> > > correct by using the test cases in RelOptRulesTest.java.
> > >
> > > For each test case, we use RelToSqlConverter to revert the query plan
> > > before Calcite's rewrite back to SQL (call it Q1) and similarly for the
> > > query plan after Calcite's rewrite (Q2). We then use Cosette to check
> > > whether Q1 and Q2 are semantically equivalent.
> > >
> > > As Calcite's test cases cover many SQL features, some of them are not
> > > supported by Cosette yet. Out of the 39 (give number) test cases that
> use
> > > SQL features supported by Cosette, Cosette is now able to formally
> prove
> > > that Calcite's rewrite in 33 of them are correct. This includes a few
> > > fairly complicated ones, like "testPushFilterPassAggThree" (
> > > https://github.com/apache/calcite/blob/master/core/src/test
> > > /java/org/apache/calcite/test/RelOptRulesTest.java#L408) The good news
> > is
> > > that we haven't found any bugs so far :)
> > >
> > > We have also used the test cases to improve Cosette. For example,
> Cosette
> > > now supports checking equivalence under database integrity constraints
> as
> > > preconditions, e.g., Q1 == Q2 iff R.A is a key of R. In case you are
> > > interested you can read about our latest system at
> > > https://arxiv.org/abs/1802.02229
> > >
> > > We would like to improve Cosette's parser so that we can handle the
> > > remaining Calcite test cases. Meanwhile, we would also like to discuss
> > how
> > > to integrate the two tools in a more systematic way, perhaps as part of
> > the
> > > Calcite test infrastructure rather than us manually extracting the test
> > > cases from RelOptRulesTest (RelToSqlConverter also doesn't always
> succeed
> > > in reversion as you may know). Anyone interested in discussing more? If
> > so
> > > how should we proceed?
> > >
> > > Best
> > > Shumo & Alvin
> > >
> > > On Tue, Oct 17, 2017 at 6:36 PM, Shumo Chu <sh...@gmail.com>
> wrote:
> > >
> > > > I create a link to the image:
> > > >
> > > > https://homes.cs.washington.edu/~chushumo/img/calcite_break_down.png
> > > >
> > > > In terms of integration, what I can imagine is to call Cosette web
> API
> > in
> > > > parallel when running calcite test cases, like RelOptRulesTest, then
> > > > perhaps Cosette result can give developers either extra confidence or
> > > > counterexample if there is a bug.
> > > >
> > > >
> > > >
> > > > On Tue, Oct 17, 2017 at 5:49 PM, Michael Mior <mm...@uwaterloo.ca>
> > > wrote:
> > > >
> > > >> Shumo,
> > > >>
> > > >> Thanks for the update! Unfortunately I don't believe inline images
> and
> > > >> attachments are not supported on ASF mailing lists so if you have
> > > another
> > > >> way of sharing that, would be interested to see. Glad to hear the
> test
> > > >> cases have been useful to you. Also great to see that Calcite seems
> to
> > > be
> > > >> faring well so far :)
> > > >>
> > > >> You're right that the conversion of plans to SQL definitely has some
> > > >> missing pieces. If there are any that are particularly problematic
> in
> > > your
> > > >> with Cosette, please let us know. For any bugs you encounter, it
> would
> > > be
> > > >> great if you could file a JIRA so we can work on fixing these (
> > > >> https://issues.apache.org/jira/projects/CALCITE/issues/).
> > > >>
> > > >> --
> > > >> Michael Mior
> > > >> mmior@apache.org
> > > >>
> > > >> 2017-10-17 20:06 GMT-04:00 Shumo Chu <sh...@gmail.com>:
> > > >>
> > > >> > Hi Michael and Calcite Devs:
> > > >> >
> > > >> > That's great!
> > > >> >
> > > >> > We also hijacked the test harness of RelOpRule Test and generated
> > SQL
> > > >> > using before and after optimization logical plans. (The reason
> that
> > I
> > > >> use
> > > >> > query plan rather than the input SQL is that when Calcite
> generates
> > > SQLs
> > > >> > from query plans, it adds the correct table reference of
> attributes.
> > > >> > Cosette currently cannot handle unqualified attribute references).
> > We
> > > >> got
> > > >> > 232 before and after SQL after some tweaking. You can find the SQL
> > > >> queries
> > > >> > that we get here: https://github.com/uwdb/Cosett
> > > >> e/blob/master/examples/c
> > > >> > alcite/calcite_tests.json.
> > > >> >
> > > >> > We then ran all of them using Cosette. Cosette can currently
> support
> > > 39
> > > >> of
> > > >> > 232 queries. Out of these 39 queries, the results are:
> > > >> >
> > > >> > EQ           9
> > > >> > UNKNOWN     27
> > > >> > NEQ          3
> > > >> >
> > > >> > EQ means our Coq backend found a valid proof for the equivalence
> of
> > > the
> > > >> > two queries.
> > > >> >
> > > >> > UNKNOWN means our model checker could not find a counterexample to
> > > prove
> > > >> > that the queries are not equivalent (within a time bound of 3
> secs).
> > > You
> > > >> > can interpret that as "likely equal".
> > > >> >
> > > >> > NEQ means our model checker found a counterexample to prove the
> > > queries
> > > >> > are not equivalent. These are all the cases that there are some
> > > >> > preconditions that Calcite understands but we haven't yet put into
> > > >> Cosette.
> > > >> > For example (testPushSemiJoinPastJoinRuleLeft):
> > > >> >
> > > >> >  query q1 `SELECT EMP.ENAME
> > > >> >                   FROM EMP AS EMP, DEPT AS DEPT, EMP AS EMP0
> > > >> >                    WHERE EMP.DEPTNO = DEPT.DEPTNO AND
> > > >> >                                  EMP.EMPNO = EMP0.EMPNO`;
> > > >> >
> > > >> > query q2 `SELECT EMP1.ENAME
> > > >> >                  FROM EMP AS EMP1 INNER JOIN DEPT AS DEPT0
> > > >> >                     ON EMP1.DEPTNO = DEPT0.DEPTNO INNER JOIN EMP
> AS
> > > EMP2
> > > >> >                       ON EMP1.EMPNO = EMP2.EMPNO INNER JOIN DEPT
> AS
> > > >> DEPT1
> > > >> >                     ON EMP1.DEPTNO = DEPT1.DEPTNO INNER JOIN EMP
> AS
> > > EMP3
> > > >> >                     ON EMP1.EMPNO = EMP3.EMPNO`;
> > > >> >
> > > >> >
> > > >> > These two queries are equivalent under the precondition that EMPNO
> > is
> > > >> the
> > > >> > primary key of EMP, and DEPTNO is the primary key of DEPT, and
> that
> > is
> > > >> > indeed the case!
> > > >> >
> > > >> > We are extending Cosette to support preconditions such as primary
> > keys
> > > >> and
> > > >> > foreign keys now and we should be able to handle these soon.
> > > >> >
> > > >> > For the queries that we cannot handle, here is the break down of
> the
> > > >> > issues:
> > > >> >
> > > >> > [image: Inline image 1]
> > > >> >
> > > >> > We support a large part of group by queries, GROUPBY here means
> some
> > > >> > grouping features that we don't support, e.g., group by on
> arbitrary
> > > >> > expressions. Although we might not be able to support all of these
> > SQL
> > > >> > features, we are definitely working on adding more SQL features to
> > > >> Cosette.
> > > >> >
> > > >> > With all above, we can try to integrate Cosette as an extra layer
> to
> > > >> > ensure developer's confidence. Currently, we are working on SIGMOD
> > > >> deadline
> > > >> > (Nov. 2) but will devote more time into this afterward.
> > > >> >
> > > >> > One thing worth mentioning is that we plan to include all these
> > > calcite
> > > >> > examples as the regression test for Cosette so that we can make
> > > Cosette
> > > >> > more powerful and practical. These queries are great benchmarks
> for
> > > >> Cosette
> > > >> > and are super valuable!
> > > >> >
> > > >> > Meanwhile, It would be also great if calcite's logical plan to SQL
> > > >> > converter can be improved. It currently doesn't support all
> queries
> > > and
> > > >> in
> > > >> > some cases, it actually generates illegal SQL. (happy to submit
> > > detailed
> > > >> > bug report if you think that would be helpful).
> > > >> >
> > > >> > Best
> > > >> > Shumo
> > > >> >
> > > >> > On Tue, Oct 17, 2017 at 9:43 AM, Michael Mior <mmior@uwaterloo.ca
> >
> > > >> wrote:
> > > >> >
> > > >> >> This took me longer than expected to get around to, but hopefully
> > the
> > > >> >> below
> > > >> >> is helpful:
> > > >> >>
> > > >> >> https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2
> > > >> >>
> > > >> >> I just did some basic (and very hacky) instrumentation of
> > > >> RelOptRulesTest
> > > >> >> to dump SQL before and after rules have been applied. The file
> > > >> consists of
> > > >> >> the name of the test followed by the original and then the
> > rewritten
> > > >> SQL.
> > > >> >>
> > > >> >> Many of the tests are missing for various reasons, but there's
> > still
> > > >> 189
> > > >> >> examples there to play with. Let me know if any particular
> aspects
> > of
> > > >> the
> > > >> >> SQL are problematic. The "before" SQL is handwritten for the
> tests
> > > and
> > > >> the
> > > >> >> "after" is ANSI SQL as generated by Calcite from the resulting
> > > logical
> > > >> >> plan.
> > > >> >>
> > > >> >> --
> > > >> >> Michael Mior
> > > >> >> mmior@apache.org
> > > >> >>
> > > >> >> 2017-09-22 1:03 GMT-04:00 Amogh Margoor <am...@qubole.com>:
> > > >> >>
> > > >> >> > >>> There might be applications in materialized views. A query
> Q
> > > can
> > > >> >> use a
> > > >> >> > materialized view V if V covers Q. In other words if >>>Q ==
> R(V)
> > > >> where
> > > >> >> R
> > > >> >> > is some sequence of relational operators. Given Q and V,
> Cosette
> > > >> could
> > > >> >> > perhaps analyze and either >>>return R (success) or return
> that V
> > > >> does
> > > >> >> not
> > > >> >> > cover Q (failure).
> > > >> >> >
> > > >> >> > >>This resembles the problem of deciding whether a given
> relation
> > > >> >> > (expressed as a query) is contained in another one. It will
> > >>take
> > > >> some
> > > >> >> > work for Cosette to be able to handle this but it definitely
> > sounds
> > > >> >> > interesting. Do you have an application in mind? >>One of them
> > > might
> > > >> be
> > > >> >> to
> > > >> >> > determine whether previously cached results can be used.
> > > >> >> >
> > > >> >> > One simple idea to start here is to replace a naive solver we
> > have
> > > in
> > > >> >> > Calcite for checking if one predicate implies another
> predicate.
> > We
> > > >> >> call it
> > > >> >> > RexImplicationChecker in Calcite and if we can replace or help
> it
> > > >> with
> > > >> >> > Constraint solver of Cosette which says if a particular
> > implication
> > > >> is a
> > > >> >> > tautology then that would help a great deal.
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> > On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <
> > > >> >> akcheung@cs.washington.edu>
> > > >> >> > wrote:
> > > >> >> >
> > > >> >> > > Hi Julian et al,
> > > >> >> > >
> > > >> >> > > Thanks for your interest in Cosette. Your suggestions make a
> > lot
> > > of
> > > >> >> > sense.
> > > >> >> > > We have done some initial work and would like to get your
> > > feedback
> > > >> on
> > > >> >> how
> > > >> >> > > to integrate the two tools together.
> > > >> >> > >
> > > >> >> > > > One obvious idea is to use Cosette to audit Calcite’s query
> > > >> >> > > transformation rules. Each rule is supposed to preserve
> > semantics
> > > >> but
> > > >> >> > > (until Cosette) we had to trust the author of the rule. We
> > could
> > > >> >> convert
> > > >> >> > > the before and after relational expressions to SQL, and then
> > ask
> > > >> >> Cosette
> > > >> >> > > whether those are equivalent. We could enable this check in
> > > >> Calcite’s
> > > >> >> > test
> > > >> >> > > suite, during which many thousands of rules are fired.
> > > >> >> > >
> > > >> >> > > Indeed. We have browsed through the Calcite rules and
> > > reformulated
> > > >> a
> > > >> >> few
> > > >> >> > > of them using our Cosette language:
> > > >> >> > >
> > > >> >> > > 1. Conjunctive select (https://github.com/apache/cal
> > > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > >> >> > > rules/FilterMergeRule.java) -->
> https://demo.cosette.cs.washin
> > > >> >> gton.edu/
> > > >> >> > > (click conjunctive select from the dropdown menu)
> > > >> >> > >
> > > >> >> > > 2. Join commute (https://github.com/apache/cal
> > > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > >> >> > > rules/JoinCommuteRule.java) --> Join commute from the demo
> > > website
> > > >> >> above
> > > >> >> > >
> > > >> >> > > 3. Join/Project transpose (https://github.com/apache/cal
> > > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > >> >> > > rules/JoinProjectTransposeRule.java) --> Join Proj. Trans.
> > from
> > > >> the
> > > >> >> demo
> > > >> >> > > website above
> > > >> >> > >
> > > >> >> > > As we are not very familiar with the Calcite code base, we
> have
> > > >> tried
> > > >> >> our
> > > >> >> > > best to guess the intention of each rule based on the
> > > >> documentation,
> > > >> >> > please
> > > >> >> > > feel free to point out if we made mistakes.
> > > >> >> > >
> > > >> >> > > As you can see, the Cosette language is pretty much like
> > standard
> > > >> SQL,
> > > >> >> > > except for declarations of schemas and relations. You will
> also
> > > >> notice
> > > >> >> > the
> > > >> >> > > "??" in some schema declarations (e.g., in rule 1. above) ---
> > > they
> > > >> >> stand
> > > >> >> > > for "symbolic" attributes that can represent any attribute.
> In
> > > >> other
> > > >> >> > words,
> > > >> >> > > if Cosette can prove that a rule with symbolic attributes is
> > > true,
> > > >> >> then
> > > >> >> > it
> > > >> >> > > will be true regardless of what the symbolic attributes are
> > > >> >> instantiated
> > > >> >> > > with. Symbolic predicates (e.g., in rule 1.) works similarly,
> > > hence
> > > >> >> > giving
> > > >> >> > > Cosette a mechanism to prove (or disprove) classes of rewrite
> > > >> rules.
> > > >> >> See
> > > >> >> > > our documentation at http://cosette.cs.washington.edu/guide
> > for
> > > >> >> details.
> > > >> >> > >
> > > >> >> > > I believe the challenge here is how we can "reverse engineer"
> > the
> > > >> >> > > intention of each of the existing rules so that they can be
> > > >> expressed
> > > >> >> in
> > > >> >> > > Cosette. Any suggestions on how to do this? We have a few
> > > students
> > > >> >> > working
> > > >> >> > > on Cosette and can help, but we will probably need help from
> > > >> Calcite
> > > >> >> to
> > > >> >> > > fully understand all of the existing rules. Another
> possibility
> > > is
> > > >> to
> > > >> >> > print
> > > >> >> > > out the input and output of each rule application during
> > testing,
> > > >> and
> > > >> >> > send
> > > >> >> > > them to Cosette. If the printout is in a form that resembles
> > SQL
> > > we
> > > >> >> can
> > > >> >> > > probably patch it into Cosette.
> > > >> >> > >
> > > >> >> > > For new rules, can we can ask Calcite authors to express them
> > in
> > > >> >> Cosette
> > > >> >> > > as well, perhaps as part of the documentation? This way we
> will
> > > >> only
> > > >> >> need
> > > >> >> > > to handle the existing rules.
> > > >> >> > >
> > > >> >> > > > A few rules might use other information besides the input
> > > >> relational
> > > >> >> > > expression, such as predicates that are known to hold or
> column
> > > >> >> > > combinations that are known to be unique. But let’s see what
> > > >> happens.
> > > >> >> > >
> > > >> >> > > This is something that we are actively working on. Can you
> > point
> > > >> us to
> > > >> >> > > specific rules with such properties? One possibility is the
> > join
> > > >> >> > > commutativity rule noted above. You will notice that we
> didn't
> > > >> prove
> > > >> >> the
> > > >> >> > > "general form" of the rule with symbolic attributes, but
> rather
> > > one
> > > >> >> with
> > > >> >> > > concrete schemas. This is because Cosette currently
> implements
> > > the
> > > >> >> > unnamed
> > > >> >> > > approach to attribute naming (see Section 3.2 in
> > > >> >> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the
> > > >> general
> > > >> >> form
> > > >> >> > > of the rule is only true if we know that the two input
> schemas
> > > have
> > > >> >> > > distinct attributes.
> > > >> >> > >
> > > >> >> > > > This is a very loose integration of Cosette / Calcite, but
> we
> > > can
> > > >> >> make
> > > >> >> > > closer integrations (e.g. within the same JVM, even at
> runtime)
> > > as
> > > >> we
> > > >> >> > > discover synergies. After all, optimization and
> theorem-proving
> > > are
> > > >> >> > related
> > > >> >> > > endeavors.
> > > >> >> > >
> > > >> >> > > Agreed. Cosette is implemented using Coq and Racket. We
> realize
> > > >> that
> > > >> >> > those
> > > >> >> > > are not the most popular languages for implementing systems
> :)
> > ,
> > > so
> > > >> >> > Cosette
> > > >> >> > > comes with a POST API as well: http://cosette.cs.washington.
> > > >> >> > edu/guide#api
> > > >> >> > > . It takes in the program text written in Cosette, and
> returns
> > > the
> > > >> >> answer
> > > >> >> > > (or times out). Does this make it easier to run the tool? We
> > are
> > > >> open
> > > >> >> to
> > > >> >> > > implementing other bindings as well.
> > > >> >> > >
> > > >> >> > > > Another area that would be useful would be to devise test
> > data.
> > > >> >> > >
> > > >> >> > > How about this: Each SQL implementation has its own
> > > interpretation
> > > >> of
> > > >> >> > SQL,
> > > >> >> > > with Cosette being one of them. Let's implement different SQL
> > > >> >> semantics
> > > >> >> > > using Cosette (say, Calcite's and Postgres'). Then, given a
> > > query,
> > > >> ask
> > > >> >> > > Cosette to find a counterexample (i.e., an input relation)
> > where
> > > >> the
> > > >> >> two
> > > >> >> > > implementations will return different results when executed
> on
> > a
> > > >> given
> > > >> >> > > query. If such a counterexample exists, then Calcite
> developers
> > > can
> > > >> >> > > determine whether this is a "bug" or a "feature". Does this
> > sound
> > > >> >> similar
> > > >> >> > > to what you have in mind?
> > > >> >> > >
> > > >> >> > > > There might be applications in materialized views. A query
> Q
> > > can
> > > >> >> use a
> > > >> >> > > materialized view V if V covers Q. In other words if Q ==
> R(V)
> > > >> where
> > > >> >> R is
> > > >> >> > > some sequence of relational operators. Given Q and V, Cosette
> > > could
> > > >> >> > perhaps
> > > >> >> > > analyze and either return R (success) or return that V does
> not
> > > >> cover
> > > >> >> Q
> > > >> >> > > (failure).
> > > >> >> > >
> > > >> >> > > This resembles the problem of deciding whether a given
> relation
> > > >> >> > (expressed
> > > >> >> > > as a query) is contained in another one. It will take some
> work
> > > for
> > > >> >> > Cosette
> > > >> >> > > to be able to handle this but it definitely sounds
> interesting.
> > > Do
> > > >> you
> > > >> >> > have
> > > >> >> > > an application in mind? One of them might be to determine
> > whether
> > > >> >> > > previously cached results can be used.
> > > >> >> > >
> > > >> >> > > We definitely see lots of synergies between the two tools. To
> > > start
> > > >> >> with
> > > >> >> > > something easy :) , I propose we first discuss how to use the
> > > >> current
> > > >> >> > > Cosette implementation to audit existing Calcite rules, and a
> > way
> > > >> to
> > > >> >> > > integrate Cosette into development of future Calcite rules as
> > > part
> > > >> of
> > > >> >> > code
> > > >> >> > > review / regression tests. What do you think?
> > > >> >> > >
> > > >> >> > > Thanks,
> > > >> >> > > Alvin (on behalf of the Cosette team)
> > > >> >> > >
> > > >> >> >
> > > >> >>
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > shumochu.com
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > shumochu.com
> > > >
> > >
> > >
> > >
> > > --
> > > shumochu.com
> > >
> >
>
>
>
> --
> shumochu.com
>

Re: Cosette / Apache Calcite

Posted by Shumo Chu <sh...@gmail.com>.
Currently, I hijack the test cases and call RelToSqlConverter before and
after calcite's rewrite. When I get the query before (Q1) and after (Q2), I
can then generate Cosette source code, need to add a few things, e.g.
schema declarations. This Cosette source code is then sent to our solver to
get the result of equivalence checking.

Now everything is run on my own computer. But Cosette does provide a web
service that is running on a UW CSE server and has a web API to call (
http://cosette.cs.washington.edu/guide#api). So calcite can either call the
web API (the advantage is no need to install any dependencies) or try to
run Cosette locally (we have dockerized the installation of Cosette).

What do you think?

Best
Shumo

On Thu, Feb 8, 2018 at 3:25 PM, Michael Mior <mm...@uwaterloo.ca> wrote:

> Shumo and Alvin,
>
> That's great to hear (especially the part about the lack of bugs)! That
> paper is next on my stack to read. I'd love to work on getting these tests
> better integrated with Calcite. It would be really helpful if you could
> provide more detail on how you're currently running the tests. Ideally if
> you could provide a way for me to easily replicate your test
> infrastructure, that would probably help give me a better idea on how we
> can help with the integration.
>
> Cheers,
> --
> Michael Mior
> mmior@apache.org
>
> 2018-02-08 17:56 GMT-05:00 Shumo Chu <sh...@gmail.com>:
>
> > Hi, Calcite Devs!
> >
> > We wanted to update you on the current status of using Cosette to help
> > check the correctness of the Calcite planner rules [
> > https://issues.apache.org/jira/browse/CALCITE-1977]:
> >
> > So far we have used Cosette to check whether Calcite's transformations
> are
> > correct by using the test cases in RelOptRulesTest.java.
> >
> > For each test case, we use RelToSqlConverter to revert the query plan
> > before Calcite's rewrite back to SQL (call it Q1) and similarly for the
> > query plan after Calcite's rewrite (Q2). We then use Cosette to check
> > whether Q1 and Q2 are semantically equivalent.
> >
> > As Calcite's test cases cover many SQL features, some of them are not
> > supported by Cosette yet. Out of the 39 (give number) test cases that use
> > SQL features supported by Cosette, Cosette is now able to formally prove
> > that Calcite's rewrite in 33 of them are correct. This includes a few
> > fairly complicated ones, like "testPushFilterPassAggThree" (
> > https://github.com/apache/calcite/blob/master/core/src/test
> > /java/org/apache/calcite/test/RelOptRulesTest.java#L408) The good news
> is
> > that we haven't found any bugs so far :)
> >
> > We have also used the test cases to improve Cosette. For example, Cosette
> > now supports checking equivalence under database integrity constraints as
> > preconditions, e.g., Q1 == Q2 iff R.A is a key of R. In case you are
> > interested you can read about our latest system at
> > https://arxiv.org/abs/1802.02229
> >
> > We would like to improve Cosette's parser so that we can handle the
> > remaining Calcite test cases. Meanwhile, we would also like to discuss
> how
> > to integrate the two tools in a more systematic way, perhaps as part of
> the
> > Calcite test infrastructure rather than us manually extracting the test
> > cases from RelOptRulesTest (RelToSqlConverter also doesn't always succeed
> > in reversion as you may know). Anyone interested in discussing more? If
> so
> > how should we proceed?
> >
> > Best
> > Shumo & Alvin
> >
> > On Tue, Oct 17, 2017 at 6:36 PM, Shumo Chu <sh...@gmail.com> wrote:
> >
> > > I create a link to the image:
> > >
> > > https://homes.cs.washington.edu/~chushumo/img/calcite_break_down.png
> > >
> > > In terms of integration, what I can imagine is to call Cosette web API
> in
> > > parallel when running calcite test cases, like RelOptRulesTest, then
> > > perhaps Cosette result can give developers either extra confidence or
> > > counterexample if there is a bug.
> > >
> > >
> > >
> > > On Tue, Oct 17, 2017 at 5:49 PM, Michael Mior <mm...@uwaterloo.ca>
> > wrote:
> > >
> > >> Shumo,
> > >>
> > >> Thanks for the update! Unfortunately I don't believe inline images and
> > >> attachments are not supported on ASF mailing lists so if you have
> > another
> > >> way of sharing that, would be interested to see. Glad to hear the test
> > >> cases have been useful to you. Also great to see that Calcite seems to
> > be
> > >> faring well so far :)
> > >>
> > >> You're right that the conversion of plans to SQL definitely has some
> > >> missing pieces. If there are any that are particularly problematic in
> > your
> > >> with Cosette, please let us know. For any bugs you encounter, it would
> > be
> > >> great if you could file a JIRA so we can work on fixing these (
> > >> https://issues.apache.org/jira/projects/CALCITE/issues/).
> > >>
> > >> --
> > >> Michael Mior
> > >> mmior@apache.org
> > >>
> > >> 2017-10-17 20:06 GMT-04:00 Shumo Chu <sh...@gmail.com>:
> > >>
> > >> > Hi Michael and Calcite Devs:
> > >> >
> > >> > That's great!
> > >> >
> > >> > We also hijacked the test harness of RelOpRule Test and generated
> SQL
> > >> > using before and after optimization logical plans. (The reason that
> I
> > >> use
> > >> > query plan rather than the input SQL is that when Calcite generates
> > SQLs
> > >> > from query plans, it adds the correct table reference of attributes.
> > >> > Cosette currently cannot handle unqualified attribute references).
> We
> > >> got
> > >> > 232 before and after SQL after some tweaking. You can find the SQL
> > >> queries
> > >> > that we get here: https://github.com/uwdb/Cosett
> > >> e/blob/master/examples/c
> > >> > alcite/calcite_tests.json.
> > >> >
> > >> > We then ran all of them using Cosette. Cosette can currently support
> > 39
> > >> of
> > >> > 232 queries. Out of these 39 queries, the results are:
> > >> >
> > >> > EQ           9
> > >> > UNKNOWN     27
> > >> > NEQ          3
> > >> >
> > >> > EQ means our Coq backend found a valid proof for the equivalence of
> > the
> > >> > two queries.
> > >> >
> > >> > UNKNOWN means our model checker could not find a counterexample to
> > prove
> > >> > that the queries are not equivalent (within a time bound of 3 secs).
> > You
> > >> > can interpret that as "likely equal".
> > >> >
> > >> > NEQ means our model checker found a counterexample to prove the
> > queries
> > >> > are not equivalent. These are all the cases that there are some
> > >> > preconditions that Calcite understands but we haven't yet put into
> > >> Cosette.
> > >> > For example (testPushSemiJoinPastJoinRuleLeft):
> > >> >
> > >> >  query q1 `SELECT EMP.ENAME
> > >> >                   FROM EMP AS EMP, DEPT AS DEPT, EMP AS EMP0
> > >> >                    WHERE EMP.DEPTNO = DEPT.DEPTNO AND
> > >> >                                  EMP.EMPNO = EMP0.EMPNO`;
> > >> >
> > >> > query q2 `SELECT EMP1.ENAME
> > >> >                  FROM EMP AS EMP1 INNER JOIN DEPT AS DEPT0
> > >> >                     ON EMP1.DEPTNO = DEPT0.DEPTNO INNER JOIN EMP AS
> > EMP2
> > >> >                       ON EMP1.EMPNO = EMP2.EMPNO INNER JOIN DEPT AS
> > >> DEPT1
> > >> >                     ON EMP1.DEPTNO = DEPT1.DEPTNO INNER JOIN EMP AS
> > EMP3
> > >> >                     ON EMP1.EMPNO = EMP3.EMPNO`;
> > >> >
> > >> >
> > >> > These two queries are equivalent under the precondition that EMPNO
> is
> > >> the
> > >> > primary key of EMP, and DEPTNO is the primary key of DEPT, and that
> is
> > >> > indeed the case!
> > >> >
> > >> > We are extending Cosette to support preconditions such as primary
> keys
> > >> and
> > >> > foreign keys now and we should be able to handle these soon.
> > >> >
> > >> > For the queries that we cannot handle, here is the break down of the
> > >> > issues:
> > >> >
> > >> > [image: Inline image 1]
> > >> >
> > >> > We support a large part of group by queries, GROUPBY here means some
> > >> > grouping features that we don't support, e.g., group by on arbitrary
> > >> > expressions. Although we might not be able to support all of these
> SQL
> > >> > features, we are definitely working on adding more SQL features to
> > >> Cosette.
> > >> >
> > >> > With all above, we can try to integrate Cosette as an extra layer to
> > >> > ensure developer's confidence. Currently, we are working on SIGMOD
> > >> deadline
> > >> > (Nov. 2) but will devote more time into this afterward.
> > >> >
> > >> > One thing worth mentioning is that we plan to include all these
> > calcite
> > >> > examples as the regression test for Cosette so that we can make
> > Cosette
> > >> > more powerful and practical. These queries are great benchmarks for
> > >> Cosette
> > >> > and are super valuable!
> > >> >
> > >> > Meanwhile, It would be also great if calcite's logical plan to SQL
> > >> > converter can be improved. It currently doesn't support all queries
> > and
> > >> in
> > >> > some cases, it actually generates illegal SQL. (happy to submit
> > detailed
> > >> > bug report if you think that would be helpful).
> > >> >
> > >> > Best
> > >> > Shumo
> > >> >
> > >> > On Tue, Oct 17, 2017 at 9:43 AM, Michael Mior <mm...@uwaterloo.ca>
> > >> wrote:
> > >> >
> > >> >> This took me longer than expected to get around to, but hopefully
> the
> > >> >> below
> > >> >> is helpful:
> > >> >>
> > >> >> https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2
> > >> >>
> > >> >> I just did some basic (and very hacky) instrumentation of
> > >> RelOptRulesTest
> > >> >> to dump SQL before and after rules have been applied. The file
> > >> consists of
> > >> >> the name of the test followed by the original and then the
> rewritten
> > >> SQL.
> > >> >>
> > >> >> Many of the tests are missing for various reasons, but there's
> still
> > >> 189
> > >> >> examples there to play with. Let me know if any particular aspects
> of
> > >> the
> > >> >> SQL are problematic. The "before" SQL is handwritten for the tests
> > and
> > >> the
> > >> >> "after" is ANSI SQL as generated by Calcite from the resulting
> > logical
> > >> >> plan.
> > >> >>
> > >> >> --
> > >> >> Michael Mior
> > >> >> mmior@apache.org
> > >> >>
> > >> >> 2017-09-22 1:03 GMT-04:00 Amogh Margoor <am...@qubole.com>:
> > >> >>
> > >> >> > >>> There might be applications in materialized views. A query Q
> > can
> > >> >> use a
> > >> >> > materialized view V if V covers Q. In other words if >>>Q == R(V)
> > >> where
> > >> >> R
> > >> >> > is some sequence of relational operators. Given Q and V, Cosette
> > >> could
> > >> >> > perhaps analyze and either >>>return R (success) or return that V
> > >> does
> > >> >> not
> > >> >> > cover Q (failure).
> > >> >> >
> > >> >> > >>This resembles the problem of deciding whether a given relation
> > >> >> > (expressed as a query) is contained in another one. It will
> >>take
> > >> some
> > >> >> > work for Cosette to be able to handle this but it definitely
> sounds
> > >> >> > interesting. Do you have an application in mind? >>One of them
> > might
> > >> be
> > >> >> to
> > >> >> > determine whether previously cached results can be used.
> > >> >> >
> > >> >> > One simple idea to start here is to replace a naive solver we
> have
> > in
> > >> >> > Calcite for checking if one predicate implies another predicate.
> We
> > >> >> call it
> > >> >> > RexImplicationChecker in Calcite and if we can replace or help it
> > >> with
> > >> >> > Constraint solver of Cosette which says if a particular
> implication
> > >> is a
> > >> >> > tautology then that would help a great deal.
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <
> > >> >> akcheung@cs.washington.edu>
> > >> >> > wrote:
> > >> >> >
> > >> >> > > Hi Julian et al,
> > >> >> > >
> > >> >> > > Thanks for your interest in Cosette. Your suggestions make a
> lot
> > of
> > >> >> > sense.
> > >> >> > > We have done some initial work and would like to get your
> > feedback
> > >> on
> > >> >> how
> > >> >> > > to integrate the two tools together.
> > >> >> > >
> > >> >> > > > One obvious idea is to use Cosette to audit Calcite’s query
> > >> >> > > transformation rules. Each rule is supposed to preserve
> semantics
> > >> but
> > >> >> > > (until Cosette) we had to trust the author of the rule. We
> could
> > >> >> convert
> > >> >> > > the before and after relational expressions to SQL, and then
> ask
> > >> >> Cosette
> > >> >> > > whether those are equivalent. We could enable this check in
> > >> Calcite’s
> > >> >> > test
> > >> >> > > suite, during which many thousands of rules are fired.
> > >> >> > >
> > >> >> > > Indeed. We have browsed through the Calcite rules and
> > reformulated
> > >> a
> > >> >> few
> > >> >> > > of them using our Cosette language:
> > >> >> > >
> > >> >> > > 1. Conjunctive select (https://github.com/apache/cal
> > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > >> >> > > rules/FilterMergeRule.java) --> https://demo.cosette.cs.washin
> > >> >> gton.edu/
> > >> >> > > (click conjunctive select from the dropdown menu)
> > >> >> > >
> > >> >> > > 2. Join commute (https://github.com/apache/cal
> > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > >> >> > > rules/JoinCommuteRule.java) --> Join commute from the demo
> > website
> > >> >> above
> > >> >> > >
> > >> >> > > 3. Join/Project transpose (https://github.com/apache/cal
> > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > >> >> > > rules/JoinProjectTransposeRule.java) --> Join Proj. Trans.
> from
> > >> the
> > >> >> demo
> > >> >> > > website above
> > >> >> > >
> > >> >> > > As we are not very familiar with the Calcite code base, we have
> > >> tried
> > >> >> our
> > >> >> > > best to guess the intention of each rule based on the
> > >> documentation,
> > >> >> > please
> > >> >> > > feel free to point out if we made mistakes.
> > >> >> > >
> > >> >> > > As you can see, the Cosette language is pretty much like
> standard
> > >> SQL,
> > >> >> > > except for declarations of schemas and relations. You will also
> > >> notice
> > >> >> > the
> > >> >> > > "??" in some schema declarations (e.g., in rule 1. above) ---
> > they
> > >> >> stand
> > >> >> > > for "symbolic" attributes that can represent any attribute. In
> > >> other
> > >> >> > words,
> > >> >> > > if Cosette can prove that a rule with symbolic attributes is
> > true,
> > >> >> then
> > >> >> > it
> > >> >> > > will be true regardless of what the symbolic attributes are
> > >> >> instantiated
> > >> >> > > with. Symbolic predicates (e.g., in rule 1.) works similarly,
> > hence
> > >> >> > giving
> > >> >> > > Cosette a mechanism to prove (or disprove) classes of rewrite
> > >> rules.
> > >> >> See
> > >> >> > > our documentation at http://cosette.cs.washington.edu/guide
> for
> > >> >> details.
> > >> >> > >
> > >> >> > > I believe the challenge here is how we can "reverse engineer"
> the
> > >> >> > > intention of each of the existing rules so that they can be
> > >> expressed
> > >> >> in
> > >> >> > > Cosette. Any suggestions on how to do this? We have a few
> > students
> > >> >> > working
> > >> >> > > on Cosette and can help, but we will probably need help from
> > >> Calcite
> > >> >> to
> > >> >> > > fully understand all of the existing rules. Another possibility
> > is
> > >> to
> > >> >> > print
> > >> >> > > out the input and output of each rule application during
> testing,
> > >> and
> > >> >> > send
> > >> >> > > them to Cosette. If the printout is in a form that resembles
> SQL
> > we
> > >> >> can
> > >> >> > > probably patch it into Cosette.
> > >> >> > >
> > >> >> > > For new rules, can we can ask Calcite authors to express them
> in
> > >> >> Cosette
> > >> >> > > as well, perhaps as part of the documentation? This way we will
> > >> only
> > >> >> need
> > >> >> > > to handle the existing rules.
> > >> >> > >
> > >> >> > > > A few rules might use other information besides the input
> > >> relational
> > >> >> > > expression, such as predicates that are known to hold or column
> > >> >> > > combinations that are known to be unique. But let’s see what
> > >> happens.
> > >> >> > >
> > >> >> > > This is something that we are actively working on. Can you
> point
> > >> us to
> > >> >> > > specific rules with such properties? One possibility is the
> join
> > >> >> > > commutativity rule noted above. You will notice that we didn't
> > >> prove
> > >> >> the
> > >> >> > > "general form" of the rule with symbolic attributes, but rather
> > one
> > >> >> with
> > >> >> > > concrete schemas. This is because Cosette currently implements
> > the
> > >> >> > unnamed
> > >> >> > > approach to attribute naming (see Section 3.2 in
> > >> >> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the
> > >> general
> > >> >> form
> > >> >> > > of the rule is only true if we know that the two input schemas
> > have
> > >> >> > > distinct attributes.
> > >> >> > >
> > >> >> > > > This is a very loose integration of Cosette / Calcite, but we
> > can
> > >> >> make
> > >> >> > > closer integrations (e.g. within the same JVM, even at runtime)
> > as
> > >> we
> > >> >> > > discover synergies. After all, optimization and theorem-proving
> > are
> > >> >> > related
> > >> >> > > endeavors.
> > >> >> > >
> > >> >> > > Agreed. Cosette is implemented using Coq and Racket. We realize
> > >> that
> > >> >> > those
> > >> >> > > are not the most popular languages for implementing systems :)
> ,
> > so
> > >> >> > Cosette
> > >> >> > > comes with a POST API as well: http://cosette.cs.washington.
> > >> >> > edu/guide#api
> > >> >> > > . It takes in the program text written in Cosette, and returns
> > the
> > >> >> answer
> > >> >> > > (or times out). Does this make it easier to run the tool? We
> are
> > >> open
> > >> >> to
> > >> >> > > implementing other bindings as well.
> > >> >> > >
> > >> >> > > > Another area that would be useful would be to devise test
> data.
> > >> >> > >
> > >> >> > > How about this: Each SQL implementation has its own
> > interpretation
> > >> of
> > >> >> > SQL,
> > >> >> > > with Cosette being one of them. Let's implement different SQL
> > >> >> semantics
> > >> >> > > using Cosette (say, Calcite's and Postgres'). Then, given a
> > query,
> > >> ask
> > >> >> > > Cosette to find a counterexample (i.e., an input relation)
> where
> > >> the
> > >> >> two
> > >> >> > > implementations will return different results when executed on
> a
> > >> given
> > >> >> > > query. If such a counterexample exists, then Calcite developers
> > can
> > >> >> > > determine whether this is a "bug" or a "feature". Does this
> sound
> > >> >> similar
> > >> >> > > to what you have in mind?
> > >> >> > >
> > >> >> > > > There might be applications in materialized views. A query Q
> > can
> > >> >> use a
> > >> >> > > materialized view V if V covers Q. In other words if Q == R(V)
> > >> where
> > >> >> R is
> > >> >> > > some sequence of relational operators. Given Q and V, Cosette
> > could
> > >> >> > perhaps
> > >> >> > > analyze and either return R (success) or return that V does not
> > >> cover
> > >> >> Q
> > >> >> > > (failure).
> > >> >> > >
> > >> >> > > This resembles the problem of deciding whether a given relation
> > >> >> > (expressed
> > >> >> > > as a query) is contained in another one. It will take some work
> > for
> > >> >> > Cosette
> > >> >> > > to be able to handle this but it definitely sounds interesting.
> > Do
> > >> you
> > >> >> > have
> > >> >> > > an application in mind? One of them might be to determine
> whether
> > >> >> > > previously cached results can be used.
> > >> >> > >
> > >> >> > > We definitely see lots of synergies between the two tools. To
> > start
> > >> >> with
> > >> >> > > something easy :) , I propose we first discuss how to use the
> > >> current
> > >> >> > > Cosette implementation to audit existing Calcite rules, and a
> way
> > >> to
> > >> >> > > integrate Cosette into development of future Calcite rules as
> > part
> > >> of
> > >> >> > code
> > >> >> > > review / regression tests. What do you think?
> > >> >> > >
> > >> >> > > Thanks,
> > >> >> > > Alvin (on behalf of the Cosette team)
> > >> >> > >
> > >> >> >
> > >> >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > shumochu.com
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > shumochu.com
> > >
> >
> >
> >
> > --
> > shumochu.com
> >
>



-- 
shumochu.com

Re: Cosette / Apache Calcite

Posted by Michael Mior <mm...@uwaterloo.ca>.
Shumo and Alvin,

That's great to hear (especially the part about the lack of bugs)! That
paper is next on my stack to read. I'd love to work on getting these tests
better integrated with Calcite. It would be really helpful if you could
provide more detail on how you're currently running the tests. Ideally if
you could provide a way for me to easily replicate your test
infrastructure, that would probably help give me a better idea on how we
can help with the integration.

Cheers,
--
Michael Mior
mmior@apache.org

2018-02-08 17:56 GMT-05:00 Shumo Chu <sh...@gmail.com>:

> Hi, Calcite Devs!
>
> We wanted to update you on the current status of using Cosette to help
> check the correctness of the Calcite planner rules [
> https://issues.apache.org/jira/browse/CALCITE-1977]:
>
> So far we have used Cosette to check whether Calcite's transformations are
> correct by using the test cases in RelOptRulesTest.java.
>
> For each test case, we use RelToSqlConverter to revert the query plan
> before Calcite's rewrite back to SQL (call it Q1) and similarly for the
> query plan after Calcite's rewrite (Q2). We then use Cosette to check
> whether Q1 and Q2 are semantically equivalent.
>
> As Calcite's test cases cover many SQL features, some of them are not
> supported by Cosette yet. Out of the 39 (give number) test cases that use
> SQL features supported by Cosette, Cosette is now able to formally prove
> that Calcite's rewrite in 33 of them are correct. This includes a few
> fairly complicated ones, like "testPushFilterPassAggThree" (
> https://github.com/apache/calcite/blob/master/core/src/test
> /java/org/apache/calcite/test/RelOptRulesTest.java#L408) The good news is
> that we haven't found any bugs so far :)
>
> We have also used the test cases to improve Cosette. For example, Cosette
> now supports checking equivalence under database integrity constraints as
> preconditions, e.g., Q1 == Q2 iff R.A is a key of R. In case you are
> interested you can read about our latest system at
> https://arxiv.org/abs/1802.02229
>
> We would like to improve Cosette's parser so that we can handle the
> remaining Calcite test cases. Meanwhile, we would also like to discuss how
> to integrate the two tools in a more systematic way, perhaps as part of the
> Calcite test infrastructure rather than us manually extracting the test
> cases from RelOptRulesTest (RelToSqlConverter also doesn't always succeed
> in reversion as you may know). Anyone interested in discussing more? If so
> how should we proceed?
>
> Best
> Shumo & Alvin
>
> On Tue, Oct 17, 2017 at 6:36 PM, Shumo Chu <sh...@gmail.com> wrote:
>
> > I create a link to the image:
> >
> > https://homes.cs.washington.edu/~chushumo/img/calcite_break_down.png
> >
> > In terms of integration, what I can imagine is to call Cosette web API in
> > parallel when running calcite test cases, like RelOptRulesTest, then
> > perhaps Cosette result can give developers either extra confidence or
> > counterexample if there is a bug.
> >
> >
> >
> > On Tue, Oct 17, 2017 at 5:49 PM, Michael Mior <mm...@uwaterloo.ca>
> wrote:
> >
> >> Shumo,
> >>
> >> Thanks for the update! Unfortunately I don't believe inline images and
> >> attachments are not supported on ASF mailing lists so if you have
> another
> >> way of sharing that, would be interested to see. Glad to hear the test
> >> cases have been useful to you. Also great to see that Calcite seems to
> be
> >> faring well so far :)
> >>
> >> You're right that the conversion of plans to SQL definitely has some
> >> missing pieces. If there are any that are particularly problematic in
> your
> >> with Cosette, please let us know. For any bugs you encounter, it would
> be
> >> great if you could file a JIRA so we can work on fixing these (
> >> https://issues.apache.org/jira/projects/CALCITE/issues/).
> >>
> >> --
> >> Michael Mior
> >> mmior@apache.org
> >>
> >> 2017-10-17 20:06 GMT-04:00 Shumo Chu <sh...@gmail.com>:
> >>
> >> > Hi Michael and Calcite Devs:
> >> >
> >> > That's great!
> >> >
> >> > We also hijacked the test harness of RelOpRule Test and generated SQL
> >> > using before and after optimization logical plans. (The reason that I
> >> use
> >> > query plan rather than the input SQL is that when Calcite generates
> SQLs
> >> > from query plans, it adds the correct table reference of attributes.
> >> > Cosette currently cannot handle unqualified attribute references). We
> >> got
> >> > 232 before and after SQL after some tweaking. You can find the SQL
> >> queries
> >> > that we get here: https://github.com/uwdb/Cosett
> >> e/blob/master/examples/c
> >> > alcite/calcite_tests.json.
> >> >
> >> > We then ran all of them using Cosette. Cosette can currently support
> 39
> >> of
> >> > 232 queries. Out of these 39 queries, the results are:
> >> >
> >> > EQ           9
> >> > UNKNOWN     27
> >> > NEQ          3
> >> >
> >> > EQ means our Coq backend found a valid proof for the equivalence of
> the
> >> > two queries.
> >> >
> >> > UNKNOWN means our model checker could not find a counterexample to
> prove
> >> > that the queries are not equivalent (within a time bound of 3 secs).
> You
> >> > can interpret that as "likely equal".
> >> >
> >> > NEQ means our model checker found a counterexample to prove the
> queries
> >> > are not equivalent. These are all the cases that there are some
> >> > preconditions that Calcite understands but we haven't yet put into
> >> Cosette.
> >> > For example (testPushSemiJoinPastJoinRuleLeft):
> >> >
> >> >  query q1 `SELECT EMP.ENAME
> >> >                   FROM EMP AS EMP, DEPT AS DEPT, EMP AS EMP0
> >> >                    WHERE EMP.DEPTNO = DEPT.DEPTNO AND
> >> >                                  EMP.EMPNO = EMP0.EMPNO`;
> >> >
> >> > query q2 `SELECT EMP1.ENAME
> >> >                  FROM EMP AS EMP1 INNER JOIN DEPT AS DEPT0
> >> >                     ON EMP1.DEPTNO = DEPT0.DEPTNO INNER JOIN EMP AS
> EMP2
> >> >                       ON EMP1.EMPNO = EMP2.EMPNO INNER JOIN DEPT AS
> >> DEPT1
> >> >                     ON EMP1.DEPTNO = DEPT1.DEPTNO INNER JOIN EMP AS
> EMP3
> >> >                     ON EMP1.EMPNO = EMP3.EMPNO`;
> >> >
> >> >
> >> > These two queries are equivalent under the precondition that EMPNO is
> >> the
> >> > primary key of EMP, and DEPTNO is the primary key of DEPT, and that is
> >> > indeed the case!
> >> >
> >> > We are extending Cosette to support preconditions such as primary keys
> >> and
> >> > foreign keys now and we should be able to handle these soon.
> >> >
> >> > For the queries that we cannot handle, here is the break down of the
> >> > issues:
> >> >
> >> > [image: Inline image 1]
> >> >
> >> > We support a large part of group by queries, GROUPBY here means some
> >> > grouping features that we don't support, e.g., group by on arbitrary
> >> > expressions. Although we might not be able to support all of these SQL
> >> > features, we are definitely working on adding more SQL features to
> >> Cosette.
> >> >
> >> > With all above, we can try to integrate Cosette as an extra layer to
> >> > ensure developer's confidence. Currently, we are working on SIGMOD
> >> deadline
> >> > (Nov. 2) but will devote more time into this afterward.
> >> >
> >> > One thing worth mentioning is that we plan to include all these
> calcite
> >> > examples as the regression test for Cosette so that we can make
> Cosette
> >> > more powerful and practical. These queries are great benchmarks for
> >> Cosette
> >> > and are super valuable!
> >> >
> >> > Meanwhile, It would be also great if calcite's logical plan to SQL
> >> > converter can be improved. It currently doesn't support all queries
> and
> >> in
> >> > some cases, it actually generates illegal SQL. (happy to submit
> detailed
> >> > bug report if you think that would be helpful).
> >> >
> >> > Best
> >> > Shumo
> >> >
> >> > On Tue, Oct 17, 2017 at 9:43 AM, Michael Mior <mm...@uwaterloo.ca>
> >> wrote:
> >> >
> >> >> This took me longer than expected to get around to, but hopefully the
> >> >> below
> >> >> is helpful:
> >> >>
> >> >> https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2
> >> >>
> >> >> I just did some basic (and very hacky) instrumentation of
> >> RelOptRulesTest
> >> >> to dump SQL before and after rules have been applied. The file
> >> consists of
> >> >> the name of the test followed by the original and then the rewritten
> >> SQL.
> >> >>
> >> >> Many of the tests are missing for various reasons, but there's still
> >> 189
> >> >> examples there to play with. Let me know if any particular aspects of
> >> the
> >> >> SQL are problematic. The "before" SQL is handwritten for the tests
> and
> >> the
> >> >> "after" is ANSI SQL as generated by Calcite from the resulting
> logical
> >> >> plan.
> >> >>
> >> >> --
> >> >> Michael Mior
> >> >> mmior@apache.org
> >> >>
> >> >> 2017-09-22 1:03 GMT-04:00 Amogh Margoor <am...@qubole.com>:
> >> >>
> >> >> > >>> There might be applications in materialized views. A query Q
> can
> >> >> use a
> >> >> > materialized view V if V covers Q. In other words if >>>Q == R(V)
> >> where
> >> >> R
> >> >> > is some sequence of relational operators. Given Q and V, Cosette
> >> could
> >> >> > perhaps analyze and either >>>return R (success) or return that V
> >> does
> >> >> not
> >> >> > cover Q (failure).
> >> >> >
> >> >> > >>This resembles the problem of deciding whether a given relation
> >> >> > (expressed as a query) is contained in another one. It will >>take
> >> some
> >> >> > work for Cosette to be able to handle this but it definitely sounds
> >> >> > interesting. Do you have an application in mind? >>One of them
> might
> >> be
> >> >> to
> >> >> > determine whether previously cached results can be used.
> >> >> >
> >> >> > One simple idea to start here is to replace a naive solver we have
> in
> >> >> > Calcite for checking if one predicate implies another predicate. We
> >> >> call it
> >> >> > RexImplicationChecker in Calcite and if we can replace or help it
> >> with
> >> >> > Constraint solver of Cosette which says if a particular implication
> >> is a
> >> >> > tautology then that would help a great deal.
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <
> >> >> akcheung@cs.washington.edu>
> >> >> > wrote:
> >> >> >
> >> >> > > Hi Julian et al,
> >> >> > >
> >> >> > > Thanks for your interest in Cosette. Your suggestions make a lot
> of
> >> >> > sense.
> >> >> > > We have done some initial work and would like to get your
> feedback
> >> on
> >> >> how
> >> >> > > to integrate the two tools together.
> >> >> > >
> >> >> > > > One obvious idea is to use Cosette to audit Calcite’s query
> >> >> > > transformation rules. Each rule is supposed to preserve semantics
> >> but
> >> >> > > (until Cosette) we had to trust the author of the rule. We could
> >> >> convert
> >> >> > > the before and after relational expressions to SQL, and then ask
> >> >> Cosette
> >> >> > > whether those are equivalent. We could enable this check in
> >> Calcite’s
> >> >> > test
> >> >> > > suite, during which many thousands of rules are fired.
> >> >> > >
> >> >> > > Indeed. We have browsed through the Calcite rules and
> reformulated
> >> a
> >> >> few
> >> >> > > of them using our Cosette language:
> >> >> > >
> >> >> > > 1. Conjunctive select (https://github.com/apache/cal
> >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> >> >> > > rules/FilterMergeRule.java) --> https://demo.cosette.cs.washin
> >> >> gton.edu/
> >> >> > > (click conjunctive select from the dropdown menu)
> >> >> > >
> >> >> > > 2. Join commute (https://github.com/apache/cal
> >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> >> >> > > rules/JoinCommuteRule.java) --> Join commute from the demo
> website
> >> >> above
> >> >> > >
> >> >> > > 3. Join/Project transpose (https://github.com/apache/cal
> >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> >> >> > > rules/JoinProjectTransposeRule.java) --> Join Proj. Trans. from
> >> the
> >> >> demo
> >> >> > > website above
> >> >> > >
> >> >> > > As we are not very familiar with the Calcite code base, we have
> >> tried
> >> >> our
> >> >> > > best to guess the intention of each rule based on the
> >> documentation,
> >> >> > please
> >> >> > > feel free to point out if we made mistakes.
> >> >> > >
> >> >> > > As you can see, the Cosette language is pretty much like standard
> >> SQL,
> >> >> > > except for declarations of schemas and relations. You will also
> >> notice
> >> >> > the
> >> >> > > "??" in some schema declarations (e.g., in rule 1. above) ---
> they
> >> >> stand
> >> >> > > for "symbolic" attributes that can represent any attribute. In
> >> other
> >> >> > words,
> >> >> > > if Cosette can prove that a rule with symbolic attributes is
> true,
> >> >> then
> >> >> > it
> >> >> > > will be true regardless of what the symbolic attributes are
> >> >> instantiated
> >> >> > > with. Symbolic predicates (e.g., in rule 1.) works similarly,
> hence
> >> >> > giving
> >> >> > > Cosette a mechanism to prove (or disprove) classes of rewrite
> >> rules.
> >> >> See
> >> >> > > our documentation at http://cosette.cs.washington.edu/guide for
> >> >> details.
> >> >> > >
> >> >> > > I believe the challenge here is how we can "reverse engineer" the
> >> >> > > intention of each of the existing rules so that they can be
> >> expressed
> >> >> in
> >> >> > > Cosette. Any suggestions on how to do this? We have a few
> students
> >> >> > working
> >> >> > > on Cosette and can help, but we will probably need help from
> >> Calcite
> >> >> to
> >> >> > > fully understand all of the existing rules. Another possibility
> is
> >> to
> >> >> > print
> >> >> > > out the input and output of each rule application during testing,
> >> and
> >> >> > send
> >> >> > > them to Cosette. If the printout is in a form that resembles SQL
> we
> >> >> can
> >> >> > > probably patch it into Cosette.
> >> >> > >
> >> >> > > For new rules, can we can ask Calcite authors to express them in
> >> >> Cosette
> >> >> > > as well, perhaps as part of the documentation? This way we will
> >> only
> >> >> need
> >> >> > > to handle the existing rules.
> >> >> > >
> >> >> > > > A few rules might use other information besides the input
> >> relational
> >> >> > > expression, such as predicates that are known to hold or column
> >> >> > > combinations that are known to be unique. But let’s see what
> >> happens.
> >> >> > >
> >> >> > > This is something that we are actively working on. Can you point
> >> us to
> >> >> > > specific rules with such properties? One possibility is the join
> >> >> > > commutativity rule noted above. You will notice that we didn't
> >> prove
> >> >> the
> >> >> > > "general form" of the rule with symbolic attributes, but rather
> one
> >> >> with
> >> >> > > concrete schemas. This is because Cosette currently implements
> the
> >> >> > unnamed
> >> >> > > approach to attribute naming (see Section 3.2 in
> >> >> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the
> >> general
> >> >> form
> >> >> > > of the rule is only true if we know that the two input schemas
> have
> >> >> > > distinct attributes.
> >> >> > >
> >> >> > > > This is a very loose integration of Cosette / Calcite, but we
> can
> >> >> make
> >> >> > > closer integrations (e.g. within the same JVM, even at runtime)
> as
> >> we
> >> >> > > discover synergies. After all, optimization and theorem-proving
> are
> >> >> > related
> >> >> > > endeavors.
> >> >> > >
> >> >> > > Agreed. Cosette is implemented using Coq and Racket. We realize
> >> that
> >> >> > those
> >> >> > > are not the most popular languages for implementing systems :) ,
> so
> >> >> > Cosette
> >> >> > > comes with a POST API as well: http://cosette.cs.washington.
> >> >> > edu/guide#api
> >> >> > > . It takes in the program text written in Cosette, and returns
> the
> >> >> answer
> >> >> > > (or times out). Does this make it easier to run the tool? We are
> >> open
> >> >> to
> >> >> > > implementing other bindings as well.
> >> >> > >
> >> >> > > > Another area that would be useful would be to devise test data.
> >> >> > >
> >> >> > > How about this: Each SQL implementation has its own
> interpretation
> >> of
> >> >> > SQL,
> >> >> > > with Cosette being one of them. Let's implement different SQL
> >> >> semantics
> >> >> > > using Cosette (say, Calcite's and Postgres'). Then, given a
> query,
> >> ask
> >> >> > > Cosette to find a counterexample (i.e., an input relation) where
> >> the
> >> >> two
> >> >> > > implementations will return different results when executed on a
> >> given
> >> >> > > query. If such a counterexample exists, then Calcite developers
> can
> >> >> > > determine whether this is a "bug" or a "feature". Does this sound
> >> >> similar
> >> >> > > to what you have in mind?
> >> >> > >
> >> >> > > > There might be applications in materialized views. A query Q
> can
> >> >> use a
> >> >> > > materialized view V if V covers Q. In other words if Q == R(V)
> >> where
> >> >> R is
> >> >> > > some sequence of relational operators. Given Q and V, Cosette
> could
> >> >> > perhaps
> >> >> > > analyze and either return R (success) or return that V does not
> >> cover
> >> >> Q
> >> >> > > (failure).
> >> >> > >
> >> >> > > This resembles the problem of deciding whether a given relation
> >> >> > (expressed
> >> >> > > as a query) is contained in another one. It will take some work
> for
> >> >> > Cosette
> >> >> > > to be able to handle this but it definitely sounds interesting.
> Do
> >> you
> >> >> > have
> >> >> > > an application in mind? One of them might be to determine whether
> >> >> > > previously cached results can be used.
> >> >> > >
> >> >> > > We definitely see lots of synergies between the two tools. To
> start
> >> >> with
> >> >> > > something easy :) , I propose we first discuss how to use the
> >> current
> >> >> > > Cosette implementation to audit existing Calcite rules, and a way
> >> to
> >> >> > > integrate Cosette into development of future Calcite rules as
> part
> >> of
> >> >> > code
> >> >> > > review / regression tests. What do you think?
> >> >> > >
> >> >> > > Thanks,
> >> >> > > Alvin (on behalf of the Cosette team)
> >> >> > >
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > shumochu.com
> >> >
> >>
> >
> >
> >
> > --
> > shumochu.com
> >
>
>
>
> --
> shumochu.com
>

Re: Cosette / Apache Calcite

Posted by Shumo Chu <sh...@gmail.com>.
Hi, Calcite Devs!

We wanted to update you on the current status of using Cosette to help
check the correctness of the Calcite planner rules [
https://issues.apache.org/jira/browse/CALCITE-1977]:

So far we have used Cosette to check whether Calcite's transformations are
correct by using the test cases in RelOptRulesTest.java.

For each test case, we use RelToSqlConverter to revert the query plan
before Calcite's rewrite back to SQL (call it Q1) and similarly for the
query plan after Calcite's rewrite (Q2). We then use Cosette to check
whether Q1 and Q2 are semantically equivalent.

As Calcite's test cases cover many SQL features, some of them are not
supported by Cosette yet. Out of the 39 (give number) test cases that use
SQL features supported by Cosette, Cosette is now able to formally prove
that Calcite's rewrite in 33 of them are correct. This includes a few
fairly complicated ones, like "testPushFilterPassAggThree" (
https://github.com/apache/calcite/blob/master/core/src/test
/java/org/apache/calcite/test/RelOptRulesTest.java#L408) The good news is
that we haven't found any bugs so far :)

We have also used the test cases to improve Cosette. For example, Cosette
now supports checking equivalence under database integrity constraints as
preconditions, e.g., Q1 == Q2 iff R.A is a key of R. In case you are
interested you can read about our latest system at
https://arxiv.org/abs/1802.02229

We would like to improve Cosette's parser so that we can handle the
remaining Calcite test cases. Meanwhile, we would also like to discuss how
to integrate the two tools in a more systematic way, perhaps as part of the
Calcite test infrastructure rather than us manually extracting the test
cases from RelOptRulesTest (RelToSqlConverter also doesn't always succeed
in reversion as you may know). Anyone interested in discussing more? If so
how should we proceed?

Best
Shumo & Alvin

On Tue, Oct 17, 2017 at 6:36 PM, Shumo Chu <sh...@gmail.com> wrote:

> I create a link to the image:
>
> https://homes.cs.washington.edu/~chushumo/img/calcite_break_down.png
>
> In terms of integration, what I can imagine is to call Cosette web API in
> parallel when running calcite test cases, like RelOptRulesTest, then
> perhaps Cosette result can give developers either extra confidence or
> counterexample if there is a bug.
>
>
>
> On Tue, Oct 17, 2017 at 5:49 PM, Michael Mior <mm...@uwaterloo.ca> wrote:
>
>> Shumo,
>>
>> Thanks for the update! Unfortunately I don't believe inline images and
>> attachments are not supported on ASF mailing lists so if you have another
>> way of sharing that, would be interested to see. Glad to hear the test
>> cases have been useful to you. Also great to see that Calcite seems to be
>> faring well so far :)
>>
>> You're right that the conversion of plans to SQL definitely has some
>> missing pieces. If there are any that are particularly problematic in your
>> with Cosette, please let us know. For any bugs you encounter, it would be
>> great if you could file a JIRA so we can work on fixing these (
>> https://issues.apache.org/jira/projects/CALCITE/issues/).
>>
>> --
>> Michael Mior
>> mmior@apache.org
>>
>> 2017-10-17 20:06 GMT-04:00 Shumo Chu <sh...@gmail.com>:
>>
>> > Hi Michael and Calcite Devs:
>> >
>> > That's great!
>> >
>> > We also hijacked the test harness of RelOpRule Test and generated SQL
>> > using before and after optimization logical plans. (The reason that I
>> use
>> > query plan rather than the input SQL is that when Calcite generates SQLs
>> > from query plans, it adds the correct table reference of attributes.
>> > Cosette currently cannot handle unqualified attribute references). We
>> got
>> > 232 before and after SQL after some tweaking. You can find the SQL
>> queries
>> > that we get here: https://github.com/uwdb/Cosett
>> e/blob/master/examples/c
>> > alcite/calcite_tests.json.
>> >
>> > We then ran all of them using Cosette. Cosette can currently support 39
>> of
>> > 232 queries. Out of these 39 queries, the results are:
>> >
>> > EQ           9
>> > UNKNOWN     27
>> > NEQ          3
>> >
>> > EQ means our Coq backend found a valid proof for the equivalence of the
>> > two queries.
>> >
>> > UNKNOWN means our model checker could not find a counterexample to prove
>> > that the queries are not equivalent (within a time bound of 3 secs). You
>> > can interpret that as "likely equal".
>> >
>> > NEQ means our model checker found a counterexample to prove the queries
>> > are not equivalent. These are all the cases that there are some
>> > preconditions that Calcite understands but we haven't yet put into
>> Cosette.
>> > For example (testPushSemiJoinPastJoinRuleLeft):
>> >
>> >  query q1 `SELECT EMP.ENAME
>> >                   FROM EMP AS EMP, DEPT AS DEPT, EMP AS EMP0
>> >                    WHERE EMP.DEPTNO = DEPT.DEPTNO AND
>> >                                  EMP.EMPNO = EMP0.EMPNO`;
>> >
>> > query q2 `SELECT EMP1.ENAME
>> >                  FROM EMP AS EMP1 INNER JOIN DEPT AS DEPT0
>> >                     ON EMP1.DEPTNO = DEPT0.DEPTNO INNER JOIN EMP AS EMP2
>> >                       ON EMP1.EMPNO = EMP2.EMPNO INNER JOIN DEPT AS
>> DEPT1
>> >                     ON EMP1.DEPTNO = DEPT1.DEPTNO INNER JOIN EMP AS EMP3
>> >                     ON EMP1.EMPNO = EMP3.EMPNO`;
>> >
>> >
>> > These two queries are equivalent under the precondition that EMPNO is
>> the
>> > primary key of EMP, and DEPTNO is the primary key of DEPT, and that is
>> > indeed the case!
>> >
>> > We are extending Cosette to support preconditions such as primary keys
>> and
>> > foreign keys now and we should be able to handle these soon.
>> >
>> > For the queries that we cannot handle, here is the break down of the
>> > issues:
>> >
>> > [image: Inline image 1]
>> >
>> > We support a large part of group by queries, GROUPBY here means some
>> > grouping features that we don't support, e.g., group by on arbitrary
>> > expressions. Although we might not be able to support all of these SQL
>> > features, we are definitely working on adding more SQL features to
>> Cosette.
>> >
>> > With all above, we can try to integrate Cosette as an extra layer to
>> > ensure developer's confidence. Currently, we are working on SIGMOD
>> deadline
>> > (Nov. 2) but will devote more time into this afterward.
>> >
>> > One thing worth mentioning is that we plan to include all these calcite
>> > examples as the regression test for Cosette so that we can make Cosette
>> > more powerful and practical. These queries are great benchmarks for
>> Cosette
>> > and are super valuable!
>> >
>> > Meanwhile, It would be also great if calcite's logical plan to SQL
>> > converter can be improved. It currently doesn't support all queries and
>> in
>> > some cases, it actually generates illegal SQL. (happy to submit detailed
>> > bug report if you think that would be helpful).
>> >
>> > Best
>> > Shumo
>> >
>> > On Tue, Oct 17, 2017 at 9:43 AM, Michael Mior <mm...@uwaterloo.ca>
>> wrote:
>> >
>> >> This took me longer than expected to get around to, but hopefully the
>> >> below
>> >> is helpful:
>> >>
>> >> https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2
>> >>
>> >> I just did some basic (and very hacky) instrumentation of
>> RelOptRulesTest
>> >> to dump SQL before and after rules have been applied. The file
>> consists of
>> >> the name of the test followed by the original and then the rewritten
>> SQL.
>> >>
>> >> Many of the tests are missing for various reasons, but there's still
>> 189
>> >> examples there to play with. Let me know if any particular aspects of
>> the
>> >> SQL are problematic. The "before" SQL is handwritten for the tests and
>> the
>> >> "after" is ANSI SQL as generated by Calcite from the resulting logical
>> >> plan.
>> >>
>> >> --
>> >> Michael Mior
>> >> mmior@apache.org
>> >>
>> >> 2017-09-22 1:03 GMT-04:00 Amogh Margoor <am...@qubole.com>:
>> >>
>> >> > >>> There might be applications in materialized views. A query Q can
>> >> use a
>> >> > materialized view V if V covers Q. In other words if >>>Q == R(V)
>> where
>> >> R
>> >> > is some sequence of relational operators. Given Q and V, Cosette
>> could
>> >> > perhaps analyze and either >>>return R (success) or return that V
>> does
>> >> not
>> >> > cover Q (failure).
>> >> >
>> >> > >>This resembles the problem of deciding whether a given relation
>> >> > (expressed as a query) is contained in another one. It will >>take
>> some
>> >> > work for Cosette to be able to handle this but it definitely sounds
>> >> > interesting. Do you have an application in mind? >>One of them might
>> be
>> >> to
>> >> > determine whether previously cached results can be used.
>> >> >
>> >> > One simple idea to start here is to replace a naive solver we have in
>> >> > Calcite for checking if one predicate implies another predicate. We
>> >> call it
>> >> > RexImplicationChecker in Calcite and if we can replace or help it
>> with
>> >> > Constraint solver of Cosette which says if a particular implication
>> is a
>> >> > tautology then that would help a great deal.
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <
>> >> akcheung@cs.washington.edu>
>> >> > wrote:
>> >> >
>> >> > > Hi Julian et al,
>> >> > >
>> >> > > Thanks for your interest in Cosette. Your suggestions make a lot of
>> >> > sense.
>> >> > > We have done some initial work and would like to get your feedback
>> on
>> >> how
>> >> > > to integrate the two tools together.
>> >> > >
>> >> > > > One obvious idea is to use Cosette to audit Calcite’s query
>> >> > > transformation rules. Each rule is supposed to preserve semantics
>> but
>> >> > > (until Cosette) we had to trust the author of the rule. We could
>> >> convert
>> >> > > the before and after relational expressions to SQL, and then ask
>> >> Cosette
>> >> > > whether those are equivalent. We could enable this check in
>> Calcite’s
>> >> > test
>> >> > > suite, during which many thousands of rules are fired.
>> >> > >
>> >> > > Indeed. We have browsed through the Calcite rules and reformulated
>> a
>> >> few
>> >> > > of them using our Cosette language:
>> >> > >
>> >> > > 1. Conjunctive select (https://github.com/apache/cal
>> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
>> >> > > rules/FilterMergeRule.java) --> https://demo.cosette.cs.washin
>> >> gton.edu/
>> >> > > (click conjunctive select from the dropdown menu)
>> >> > >
>> >> > > 2. Join commute (https://github.com/apache/cal
>> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
>> >> > > rules/JoinCommuteRule.java) --> Join commute from the demo website
>> >> above
>> >> > >
>> >> > > 3. Join/Project transpose (https://github.com/apache/cal
>> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
>> >> > > rules/JoinProjectTransposeRule.java) --> Join Proj. Trans. from
>> the
>> >> demo
>> >> > > website above
>> >> > >
>> >> > > As we are not very familiar with the Calcite code base, we have
>> tried
>> >> our
>> >> > > best to guess the intention of each rule based on the
>> documentation,
>> >> > please
>> >> > > feel free to point out if we made mistakes.
>> >> > >
>> >> > > As you can see, the Cosette language is pretty much like standard
>> SQL,
>> >> > > except for declarations of schemas and relations. You will also
>> notice
>> >> > the
>> >> > > "??" in some schema declarations (e.g., in rule 1. above) --- they
>> >> stand
>> >> > > for "symbolic" attributes that can represent any attribute. In
>> other
>> >> > words,
>> >> > > if Cosette can prove that a rule with symbolic attributes is true,
>> >> then
>> >> > it
>> >> > > will be true regardless of what the symbolic attributes are
>> >> instantiated
>> >> > > with. Symbolic predicates (e.g., in rule 1.) works similarly, hence
>> >> > giving
>> >> > > Cosette a mechanism to prove (or disprove) classes of rewrite
>> rules.
>> >> See
>> >> > > our documentation at http://cosette.cs.washington.edu/guide for
>> >> details.
>> >> > >
>> >> > > I believe the challenge here is how we can "reverse engineer" the
>> >> > > intention of each of the existing rules so that they can be
>> expressed
>> >> in
>> >> > > Cosette. Any suggestions on how to do this? We have a few students
>> >> > working
>> >> > > on Cosette and can help, but we will probably need help from
>> Calcite
>> >> to
>> >> > > fully understand all of the existing rules. Another possibility is
>> to
>> >> > print
>> >> > > out the input and output of each rule application during testing,
>> and
>> >> > send
>> >> > > them to Cosette. If the printout is in a form that resembles SQL we
>> >> can
>> >> > > probably patch it into Cosette.
>> >> > >
>> >> > > For new rules, can we can ask Calcite authors to express them in
>> >> Cosette
>> >> > > as well, perhaps as part of the documentation? This way we will
>> only
>> >> need
>> >> > > to handle the existing rules.
>> >> > >
>> >> > > > A few rules might use other information besides the input
>> relational
>> >> > > expression, such as predicates that are known to hold or column
>> >> > > combinations that are known to be unique. But let’s see what
>> happens.
>> >> > >
>> >> > > This is something that we are actively working on. Can you point
>> us to
>> >> > > specific rules with such properties? One possibility is the join
>> >> > > commutativity rule noted above. You will notice that we didn't
>> prove
>> >> the
>> >> > > "general form" of the rule with symbolic attributes, but rather one
>> >> with
>> >> > > concrete schemas. This is because Cosette currently implements the
>> >> > unnamed
>> >> > > approach to attribute naming (see Section 3.2 in
>> >> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the
>> general
>> >> form
>> >> > > of the rule is only true if we know that the two input schemas have
>> >> > > distinct attributes.
>> >> > >
>> >> > > > This is a very loose integration of Cosette / Calcite, but we can
>> >> make
>> >> > > closer integrations (e.g. within the same JVM, even at runtime) as
>> we
>> >> > > discover synergies. After all, optimization and theorem-proving are
>> >> > related
>> >> > > endeavors.
>> >> > >
>> >> > > Agreed. Cosette is implemented using Coq and Racket. We realize
>> that
>> >> > those
>> >> > > are not the most popular languages for implementing systems :) , so
>> >> > Cosette
>> >> > > comes with a POST API as well: http://cosette.cs.washington.
>> >> > edu/guide#api
>> >> > > . It takes in the program text written in Cosette, and returns the
>> >> answer
>> >> > > (or times out). Does this make it easier to run the tool? We are
>> open
>> >> to
>> >> > > implementing other bindings as well.
>> >> > >
>> >> > > > Another area that would be useful would be to devise test data.
>> >> > >
>> >> > > How about this: Each SQL implementation has its own interpretation
>> of
>> >> > SQL,
>> >> > > with Cosette being one of them. Let's implement different SQL
>> >> semantics
>> >> > > using Cosette (say, Calcite's and Postgres'). Then, given a query,
>> ask
>> >> > > Cosette to find a counterexample (i.e., an input relation) where
>> the
>> >> two
>> >> > > implementations will return different results when executed on a
>> given
>> >> > > query. If such a counterexample exists, then Calcite developers can
>> >> > > determine whether this is a "bug" or a "feature". Does this sound
>> >> similar
>> >> > > to what you have in mind?
>> >> > >
>> >> > > > There might be applications in materialized views. A query Q can
>> >> use a
>> >> > > materialized view V if V covers Q. In other words if Q == R(V)
>> where
>> >> R is
>> >> > > some sequence of relational operators. Given Q and V, Cosette could
>> >> > perhaps
>> >> > > analyze and either return R (success) or return that V does not
>> cover
>> >> Q
>> >> > > (failure).
>> >> > >
>> >> > > This resembles the problem of deciding whether a given relation
>> >> > (expressed
>> >> > > as a query) is contained in another one. It will take some work for
>> >> > Cosette
>> >> > > to be able to handle this but it definitely sounds interesting. Do
>> you
>> >> > have
>> >> > > an application in mind? One of them might be to determine whether
>> >> > > previously cached results can be used.
>> >> > >
>> >> > > We definitely see lots of synergies between the two tools. To start
>> >> with
>> >> > > something easy :) , I propose we first discuss how to use the
>> current
>> >> > > Cosette implementation to audit existing Calcite rules, and a way
>> to
>> >> > > integrate Cosette into development of future Calcite rules as part
>> of
>> >> > code
>> >> > > review / regression tests. What do you think?
>> >> > >
>> >> > > Thanks,
>> >> > > Alvin (on behalf of the Cosette team)
>> >> > >
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > shumochu.com
>> >
>>
>
>
>
> --
> shumochu.com
>



-- 
shumochu.com

Re: Cosette / Apache Calcite

Posted by Shumo Chu <sh...@gmail.com>.
I create a link to the image:

https://homes.cs.washington.edu/~chushumo/img/calcite_break_down.png

In terms of integration, what I can imagine is to call Cosette web API in
parallel when running calcite test cases, like RelOptRulesTest, then
perhaps Cosette result can give developers either extra confidence or
counterexample if there is a bug.



On Tue, Oct 17, 2017 at 5:49 PM, Michael Mior <mm...@uwaterloo.ca> wrote:

> Shumo,
>
> Thanks for the update! Unfortunately I don't believe inline images and
> attachments are not supported on ASF mailing lists so if you have another
> way of sharing that, would be interested to see. Glad to hear the test
> cases have been useful to you. Also great to see that Calcite seems to be
> faring well so far :)
>
> You're right that the conversion of plans to SQL definitely has some
> missing pieces. If there are any that are particularly problematic in your
> with Cosette, please let us know. For any bugs you encounter, it would be
> great if you could file a JIRA so we can work on fixing these (
> https://issues.apache.org/jira/projects/CALCITE/issues/).
>
> --
> Michael Mior
> mmior@apache.org
>
> 2017-10-17 20:06 GMT-04:00 Shumo Chu <sh...@gmail.com>:
>
> > Hi Michael and Calcite Devs:
> >
> > That's great!
> >
> > We also hijacked the test harness of RelOpRule Test and generated SQL
> > using before and after optimization logical plans. (The reason that I use
> > query plan rather than the input SQL is that when Calcite generates SQLs
> > from query plans, it adds the correct table reference of attributes.
> > Cosette currently cannot handle unqualified attribute references). We got
> > 232 before and after SQL after some tweaking. You can find the SQL
> queries
> > that we get here: https://github.com/uwdb/Cosette/blob/master/examples/c
> > alcite/calcite_tests.json.
> >
> > We then ran all of them using Cosette. Cosette can currently support 39
> of
> > 232 queries. Out of these 39 queries, the results are:
> >
> > EQ           9
> > UNKNOWN     27
> > NEQ          3
> >
> > EQ means our Coq backend found a valid proof for the equivalence of the
> > two queries.
> >
> > UNKNOWN means our model checker could not find a counterexample to prove
> > that the queries are not equivalent (within a time bound of 3 secs). You
> > can interpret that as "likely equal".
> >
> > NEQ means our model checker found a counterexample to prove the queries
> > are not equivalent. These are all the cases that there are some
> > preconditions that Calcite understands but we haven't yet put into
> Cosette.
> > For example (testPushSemiJoinPastJoinRuleLeft):
> >
> >  query q1 `SELECT EMP.ENAME
> >                   FROM EMP AS EMP, DEPT AS DEPT, EMP AS EMP0
> >                    WHERE EMP.DEPTNO = DEPT.DEPTNO AND
> >                                  EMP.EMPNO = EMP0.EMPNO`;
> >
> > query q2 `SELECT EMP1.ENAME
> >                  FROM EMP AS EMP1 INNER JOIN DEPT AS DEPT0
> >                     ON EMP1.DEPTNO = DEPT0.DEPTNO INNER JOIN EMP AS EMP2
> >                       ON EMP1.EMPNO = EMP2.EMPNO INNER JOIN DEPT AS DEPT1
> >                     ON EMP1.DEPTNO = DEPT1.DEPTNO INNER JOIN EMP AS EMP3
> >                     ON EMP1.EMPNO = EMP3.EMPNO`;
> >
> >
> > These two queries are equivalent under the precondition that EMPNO is the
> > primary key of EMP, and DEPTNO is the primary key of DEPT, and that is
> > indeed the case!
> >
> > We are extending Cosette to support preconditions such as primary keys
> and
> > foreign keys now and we should be able to handle these soon.
> >
> > For the queries that we cannot handle, here is the break down of the
> > issues:
> >
> > [image: Inline image 1]
> >
> > We support a large part of group by queries, GROUPBY here means some
> > grouping features that we don't support, e.g., group by on arbitrary
> > expressions. Although we might not be able to support all of these SQL
> > features, we are definitely working on adding more SQL features to
> Cosette.
> >
> > With all above, we can try to integrate Cosette as an extra layer to
> > ensure developer's confidence. Currently, we are working on SIGMOD
> deadline
> > (Nov. 2) but will devote more time into this afterward.
> >
> > One thing worth mentioning is that we plan to include all these calcite
> > examples as the regression test for Cosette so that we can make Cosette
> > more powerful and practical. These queries are great benchmarks for
> Cosette
> > and are super valuable!
> >
> > Meanwhile, It would be also great if calcite's logical plan to SQL
> > converter can be improved. It currently doesn't support all queries and
> in
> > some cases, it actually generates illegal SQL. (happy to submit detailed
> > bug report if you think that would be helpful).
> >
> > Best
> > Shumo
> >
> > On Tue, Oct 17, 2017 at 9:43 AM, Michael Mior <mm...@uwaterloo.ca>
> wrote:
> >
> >> This took me longer than expected to get around to, but hopefully the
> >> below
> >> is helpful:
> >>
> >> https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2
> >>
> >> I just did some basic (and very hacky) instrumentation of
> RelOptRulesTest
> >> to dump SQL before and after rules have been applied. The file consists
> of
> >> the name of the test followed by the original and then the rewritten
> SQL.
> >>
> >> Many of the tests are missing for various reasons, but there's still 189
> >> examples there to play with. Let me know if any particular aspects of
> the
> >> SQL are problematic. The "before" SQL is handwritten for the tests and
> the
> >> "after" is ANSI SQL as generated by Calcite from the resulting logical
> >> plan.
> >>
> >> --
> >> Michael Mior
> >> mmior@apache.org
> >>
> >> 2017-09-22 1:03 GMT-04:00 Amogh Margoor <am...@qubole.com>:
> >>
> >> > >>> There might be applications in materialized views. A query Q can
> >> use a
> >> > materialized view V if V covers Q. In other words if >>>Q == R(V)
> where
> >> R
> >> > is some sequence of relational operators. Given Q and V, Cosette could
> >> > perhaps analyze and either >>>return R (success) or return that V does
> >> not
> >> > cover Q (failure).
> >> >
> >> > >>This resembles the problem of deciding whether a given relation
> >> > (expressed as a query) is contained in another one. It will >>take
> some
> >> > work for Cosette to be able to handle this but it definitely sounds
> >> > interesting. Do you have an application in mind? >>One of them might
> be
> >> to
> >> > determine whether previously cached results can be used.
> >> >
> >> > One simple idea to start here is to replace a naive solver we have in
> >> > Calcite for checking if one predicate implies another predicate. We
> >> call it
> >> > RexImplicationChecker in Calcite and if we can replace or help it with
> >> > Constraint solver of Cosette which says if a particular implication
> is a
> >> > tautology then that would help a great deal.
> >> >
> >> >
> >> >
> >> > On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <
> >> akcheung@cs.washington.edu>
> >> > wrote:
> >> >
> >> > > Hi Julian et al,
> >> > >
> >> > > Thanks for your interest in Cosette. Your suggestions make a lot of
> >> > sense.
> >> > > We have done some initial work and would like to get your feedback
> on
> >> how
> >> > > to integrate the two tools together.
> >> > >
> >> > > > One obvious idea is to use Cosette to audit Calcite’s query
> >> > > transformation rules. Each rule is supposed to preserve semantics
> but
> >> > > (until Cosette) we had to trust the author of the rule. We could
> >> convert
> >> > > the before and after relational expressions to SQL, and then ask
> >> Cosette
> >> > > whether those are equivalent. We could enable this check in
> Calcite’s
> >> > test
> >> > > suite, during which many thousands of rules are fired.
> >> > >
> >> > > Indeed. We have browsed through the Calcite rules and reformulated a
> >> few
> >> > > of them using our Cosette language:
> >> > >
> >> > > 1. Conjunctive select (https://github.com/apache/cal
> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> >> > > rules/FilterMergeRule.java) --> https://demo.cosette.cs.washin
> >> gton.edu/
> >> > > (click conjunctive select from the dropdown menu)
> >> > >
> >> > > 2. Join commute (https://github.com/apache/cal
> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> >> > > rules/JoinCommuteRule.java) --> Join commute from the demo website
> >> above
> >> > >
> >> > > 3. Join/Project transpose (https://github.com/apache/cal
> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> >> > > rules/JoinProjectTransposeRule.java) --> Join Proj. Trans. from the
> >> demo
> >> > > website above
> >> > >
> >> > > As we are not very familiar with the Calcite code base, we have
> tried
> >> our
> >> > > best to guess the intention of each rule based on the documentation,
> >> > please
> >> > > feel free to point out if we made mistakes.
> >> > >
> >> > > As you can see, the Cosette language is pretty much like standard
> SQL,
> >> > > except for declarations of schemas and relations. You will also
> notice
> >> > the
> >> > > "??" in some schema declarations (e.g., in rule 1. above) --- they
> >> stand
> >> > > for "symbolic" attributes that can represent any attribute. In other
> >> > words,
> >> > > if Cosette can prove that a rule with symbolic attributes is true,
> >> then
> >> > it
> >> > > will be true regardless of what the symbolic attributes are
> >> instantiated
> >> > > with. Symbolic predicates (e.g., in rule 1.) works similarly, hence
> >> > giving
> >> > > Cosette a mechanism to prove (or disprove) classes of rewrite rules.
> >> See
> >> > > our documentation at http://cosette.cs.washington.edu/guide for
> >> details.
> >> > >
> >> > > I believe the challenge here is how we can "reverse engineer" the
> >> > > intention of each of the existing rules so that they can be
> expressed
> >> in
> >> > > Cosette. Any suggestions on how to do this? We have a few students
> >> > working
> >> > > on Cosette and can help, but we will probably need help from Calcite
> >> to
> >> > > fully understand all of the existing rules. Another possibility is
> to
> >> > print
> >> > > out the input and output of each rule application during testing,
> and
> >> > send
> >> > > them to Cosette. If the printout is in a form that resembles SQL we
> >> can
> >> > > probably patch it into Cosette.
> >> > >
> >> > > For new rules, can we can ask Calcite authors to express them in
> >> Cosette
> >> > > as well, perhaps as part of the documentation? This way we will only
> >> need
> >> > > to handle the existing rules.
> >> > >
> >> > > > A few rules might use other information besides the input
> relational
> >> > > expression, such as predicates that are known to hold or column
> >> > > combinations that are known to be unique. But let’s see what
> happens.
> >> > >
> >> > > This is something that we are actively working on. Can you point us
> to
> >> > > specific rules with such properties? One possibility is the join
> >> > > commutativity rule noted above. You will notice that we didn't prove
> >> the
> >> > > "general form" of the rule with symbolic attributes, but rather one
> >> with
> >> > > concrete schemas. This is because Cosette currently implements the
> >> > unnamed
> >> > > approach to attribute naming (see Section 3.2 in
> >> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the general
> >> form
> >> > > of the rule is only true if we know that the two input schemas have
> >> > > distinct attributes.
> >> > >
> >> > > > This is a very loose integration of Cosette / Calcite, but we can
> >> make
> >> > > closer integrations (e.g. within the same JVM, even at runtime) as
> we
> >> > > discover synergies. After all, optimization and theorem-proving are
> >> > related
> >> > > endeavors.
> >> > >
> >> > > Agreed. Cosette is implemented using Coq and Racket. We realize that
> >> > those
> >> > > are not the most popular languages for implementing systems :) , so
> >> > Cosette
> >> > > comes with a POST API as well: http://cosette.cs.washington.
> >> > edu/guide#api
> >> > > . It takes in the program text written in Cosette, and returns the
> >> answer
> >> > > (or times out). Does this make it easier to run the tool? We are
> open
> >> to
> >> > > implementing other bindings as well.
> >> > >
> >> > > > Another area that would be useful would be to devise test data.
> >> > >
> >> > > How about this: Each SQL implementation has its own interpretation
> of
> >> > SQL,
> >> > > with Cosette being one of them. Let's implement different SQL
> >> semantics
> >> > > using Cosette (say, Calcite's and Postgres'). Then, given a query,
> ask
> >> > > Cosette to find a counterexample (i.e., an input relation) where the
> >> two
> >> > > implementations will return different results when executed on a
> given
> >> > > query. If such a counterexample exists, then Calcite developers can
> >> > > determine whether this is a "bug" or a "feature". Does this sound
> >> similar
> >> > > to what you have in mind?
> >> > >
> >> > > > There might be applications in materialized views. A query Q can
> >> use a
> >> > > materialized view V if V covers Q. In other words if Q == R(V) where
> >> R is
> >> > > some sequence of relational operators. Given Q and V, Cosette could
> >> > perhaps
> >> > > analyze and either return R (success) or return that V does not
> cover
> >> Q
> >> > > (failure).
> >> > >
> >> > > This resembles the problem of deciding whether a given relation
> >> > (expressed
> >> > > as a query) is contained in another one. It will take some work for
> >> > Cosette
> >> > > to be able to handle this but it definitely sounds interesting. Do
> you
> >> > have
> >> > > an application in mind? One of them might be to determine whether
> >> > > previously cached results can be used.
> >> > >
> >> > > We definitely see lots of synergies between the two tools. To start
> >> with
> >> > > something easy :) , I propose we first discuss how to use the
> current
> >> > > Cosette implementation to audit existing Calcite rules, and a way to
> >> > > integrate Cosette into development of future Calcite rules as part
> of
> >> > code
> >> > > review / regression tests. What do you think?
> >> > >
> >> > > Thanks,
> >> > > Alvin (on behalf of the Cosette team)
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> > shumochu.com
> >
>



-- 
shumochu.com

Re: Cosette / Apache Calcite

Posted by Michael Mior <mm...@uwaterloo.ca>.
Shumo,

Thanks for the update! Unfortunately I don't believe inline images and
attachments are not supported on ASF mailing lists so if you have another
way of sharing that, would be interested to see. Glad to hear the test
cases have been useful to you. Also great to see that Calcite seems to be
faring well so far :)

You're right that the conversion of plans to SQL definitely has some
missing pieces. If there are any that are particularly problematic in your
with Cosette, please let us know. For any bugs you encounter, it would be
great if you could file a JIRA so we can work on fixing these (
https://issues.apache.org/jira/projects/CALCITE/issues/).

--
Michael Mior
mmior@apache.org

2017-10-17 20:06 GMT-04:00 Shumo Chu <sh...@gmail.com>:

> Hi Michael and Calcite Devs:
>
> That's great!
>
> We also hijacked the test harness of RelOpRule Test and generated SQL
> using before and after optimization logical plans. (The reason that I use
> query plan rather than the input SQL is that when Calcite generates SQLs
> from query plans, it adds the correct table reference of attributes.
> Cosette currently cannot handle unqualified attribute references). We got
> 232 before and after SQL after some tweaking. You can find the SQL queries
> that we get here: https://github.com/uwdb/Cosette/blob/master/examples/c
> alcite/calcite_tests.json.
>
> We then ran all of them using Cosette. Cosette can currently support 39 of
> 232 queries. Out of these 39 queries, the results are:
>
> EQ           9
> UNKNOWN     27
> NEQ          3
>
> EQ means our Coq backend found a valid proof for the equivalence of the
> two queries.
>
> UNKNOWN means our model checker could not find a counterexample to prove
> that the queries are not equivalent (within a time bound of 3 secs). You
> can interpret that as "likely equal".
>
> NEQ means our model checker found a counterexample to prove the queries
> are not equivalent. These are all the cases that there are some
> preconditions that Calcite understands but we haven't yet put into Cosette.
> For example (testPushSemiJoinPastJoinRuleLeft):
>
>  query q1 `SELECT EMP.ENAME
>                   FROM EMP AS EMP, DEPT AS DEPT, EMP AS EMP0
>                    WHERE EMP.DEPTNO = DEPT.DEPTNO AND
>                                  EMP.EMPNO = EMP0.EMPNO`;
>
> query q2 `SELECT EMP1.ENAME
>                  FROM EMP AS EMP1 INNER JOIN DEPT AS DEPT0
>                     ON EMP1.DEPTNO = DEPT0.DEPTNO INNER JOIN EMP AS EMP2
>                       ON EMP1.EMPNO = EMP2.EMPNO INNER JOIN DEPT AS DEPT1
>                     ON EMP1.DEPTNO = DEPT1.DEPTNO INNER JOIN EMP AS EMP3
>                     ON EMP1.EMPNO = EMP3.EMPNO`;
>
>
> These two queries are equivalent under the precondition that EMPNO is the
> primary key of EMP, and DEPTNO is the primary key of DEPT, and that is
> indeed the case!
>
> We are extending Cosette to support preconditions such as primary keys and
> foreign keys now and we should be able to handle these soon.
>
> For the queries that we cannot handle, here is the break down of the
> issues:
>
> [image: Inline image 1]
>
> We support a large part of group by queries, GROUPBY here means some
> grouping features that we don't support, e.g., group by on arbitrary
> expressions. Although we might not be able to support all of these SQL
> features, we are definitely working on adding more SQL features to Cosette.
>
> With all above, we can try to integrate Cosette as an extra layer to
> ensure developer's confidence. Currently, we are working on SIGMOD deadline
> (Nov. 2) but will devote more time into this afterward.
>
> One thing worth mentioning is that we plan to include all these calcite
> examples as the regression test for Cosette so that we can make Cosette
> more powerful and practical. These queries are great benchmarks for Cosette
> and are super valuable!
>
> Meanwhile, It would be also great if calcite's logical plan to SQL
> converter can be improved. It currently doesn't support all queries and in
> some cases, it actually generates illegal SQL. (happy to submit detailed
> bug report if you think that would be helpful).
>
> Best
> Shumo
>
> On Tue, Oct 17, 2017 at 9:43 AM, Michael Mior <mm...@uwaterloo.ca> wrote:
>
>> This took me longer than expected to get around to, but hopefully the
>> below
>> is helpful:
>>
>> https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2
>>
>> I just did some basic (and very hacky) instrumentation of RelOptRulesTest
>> to dump SQL before and after rules have been applied. The file consists of
>> the name of the test followed by the original and then the rewritten SQL.
>>
>> Many of the tests are missing for various reasons, but there's still 189
>> examples there to play with. Let me know if any particular aspects of the
>> SQL are problematic. The "before" SQL is handwritten for the tests and the
>> "after" is ANSI SQL as generated by Calcite from the resulting logical
>> plan.
>>
>> --
>> Michael Mior
>> mmior@apache.org
>>
>> 2017-09-22 1:03 GMT-04:00 Amogh Margoor <am...@qubole.com>:
>>
>> > >>> There might be applications in materialized views. A query Q can
>> use a
>> > materialized view V if V covers Q. In other words if >>>Q == R(V) where
>> R
>> > is some sequence of relational operators. Given Q and V, Cosette could
>> > perhaps analyze and either >>>return R (success) or return that V does
>> not
>> > cover Q (failure).
>> >
>> > >>This resembles the problem of deciding whether a given relation
>> > (expressed as a query) is contained in another one. It will >>take some
>> > work for Cosette to be able to handle this but it definitely sounds
>> > interesting. Do you have an application in mind? >>One of them might be
>> to
>> > determine whether previously cached results can be used.
>> >
>> > One simple idea to start here is to replace a naive solver we have in
>> > Calcite for checking if one predicate implies another predicate. We
>> call it
>> > RexImplicationChecker in Calcite and if we can replace or help it with
>> > Constraint solver of Cosette which says if a particular implication is a
>> > tautology then that would help a great deal.
>> >
>> >
>> >
>> > On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <
>> akcheung@cs.washington.edu>
>> > wrote:
>> >
>> > > Hi Julian et al,
>> > >
>> > > Thanks for your interest in Cosette. Your suggestions make a lot of
>> > sense.
>> > > We have done some initial work and would like to get your feedback on
>> how
>> > > to integrate the two tools together.
>> > >
>> > > > One obvious idea is to use Cosette to audit Calcite’s query
>> > > transformation rules. Each rule is supposed to preserve semantics but
>> > > (until Cosette) we had to trust the author of the rule. We could
>> convert
>> > > the before and after relational expressions to SQL, and then ask
>> Cosette
>> > > whether those are equivalent. We could enable this check in Calcite’s
>> > test
>> > > suite, during which many thousands of rules are fired.
>> > >
>> > > Indeed. We have browsed through the Calcite rules and reformulated a
>> few
>> > > of them using our Cosette language:
>> > >
>> > > 1. Conjunctive select (https://github.com/apache/cal
>> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
>> > > rules/FilterMergeRule.java) --> https://demo.cosette.cs.washin
>> gton.edu/
>> > > (click conjunctive select from the dropdown menu)
>> > >
>> > > 2. Join commute (https://github.com/apache/cal
>> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
>> > > rules/JoinCommuteRule.java) --> Join commute from the demo website
>> above
>> > >
>> > > 3. Join/Project transpose (https://github.com/apache/cal
>> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
>> > > rules/JoinProjectTransposeRule.java) --> Join Proj. Trans. from the
>> demo
>> > > website above
>> > >
>> > > As we are not very familiar with the Calcite code base, we have tried
>> our
>> > > best to guess the intention of each rule based on the documentation,
>> > please
>> > > feel free to point out if we made mistakes.
>> > >
>> > > As you can see, the Cosette language is pretty much like standard SQL,
>> > > except for declarations of schemas and relations. You will also notice
>> > the
>> > > "??" in some schema declarations (e.g., in rule 1. above) --- they
>> stand
>> > > for "symbolic" attributes that can represent any attribute. In other
>> > words,
>> > > if Cosette can prove that a rule with symbolic attributes is true,
>> then
>> > it
>> > > will be true regardless of what the symbolic attributes are
>> instantiated
>> > > with. Symbolic predicates (e.g., in rule 1.) works similarly, hence
>> > giving
>> > > Cosette a mechanism to prove (or disprove) classes of rewrite rules.
>> See
>> > > our documentation at http://cosette.cs.washington.edu/guide for
>> details.
>> > >
>> > > I believe the challenge here is how we can "reverse engineer" the
>> > > intention of each of the existing rules so that they can be expressed
>> in
>> > > Cosette. Any suggestions on how to do this? We have a few students
>> > working
>> > > on Cosette and can help, but we will probably need help from Calcite
>> to
>> > > fully understand all of the existing rules. Another possibility is to
>> > print
>> > > out the input and output of each rule application during testing, and
>> > send
>> > > them to Cosette. If the printout is in a form that resembles SQL we
>> can
>> > > probably patch it into Cosette.
>> > >
>> > > For new rules, can we can ask Calcite authors to express them in
>> Cosette
>> > > as well, perhaps as part of the documentation? This way we will only
>> need
>> > > to handle the existing rules.
>> > >
>> > > > A few rules might use other information besides the input relational
>> > > expression, such as predicates that are known to hold or column
>> > > combinations that are known to be unique. But let’s see what happens.
>> > >
>> > > This is something that we are actively working on. Can you point us to
>> > > specific rules with such properties? One possibility is the join
>> > > commutativity rule noted above. You will notice that we didn't prove
>> the
>> > > "general form" of the rule with symbolic attributes, but rather one
>> with
>> > > concrete schemas. This is because Cosette currently implements the
>> > unnamed
>> > > approach to attribute naming (see Section 3.2 in
>> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the general
>> form
>> > > of the rule is only true if we know that the two input schemas have
>> > > distinct attributes.
>> > >
>> > > > This is a very loose integration of Cosette / Calcite, but we can
>> make
>> > > closer integrations (e.g. within the same JVM, even at runtime) as we
>> > > discover synergies. After all, optimization and theorem-proving are
>> > related
>> > > endeavors.
>> > >
>> > > Agreed. Cosette is implemented using Coq and Racket. We realize that
>> > those
>> > > are not the most popular languages for implementing systems :) , so
>> > Cosette
>> > > comes with a POST API as well: http://cosette.cs.washington.
>> > edu/guide#api
>> > > . It takes in the program text written in Cosette, and returns the
>> answer
>> > > (or times out). Does this make it easier to run the tool? We are open
>> to
>> > > implementing other bindings as well.
>> > >
>> > > > Another area that would be useful would be to devise test data.
>> > >
>> > > How about this: Each SQL implementation has its own interpretation of
>> > SQL,
>> > > with Cosette being one of them. Let's implement different SQL
>> semantics
>> > > using Cosette (say, Calcite's and Postgres'). Then, given a query, ask
>> > > Cosette to find a counterexample (i.e., an input relation) where the
>> two
>> > > implementations will return different results when executed on a given
>> > > query. If such a counterexample exists, then Calcite developers can
>> > > determine whether this is a "bug" or a "feature". Does this sound
>> similar
>> > > to what you have in mind?
>> > >
>> > > > There might be applications in materialized views. A query Q can
>> use a
>> > > materialized view V if V covers Q. In other words if Q == R(V) where
>> R is
>> > > some sequence of relational operators. Given Q and V, Cosette could
>> > perhaps
>> > > analyze and either return R (success) or return that V does not cover
>> Q
>> > > (failure).
>> > >
>> > > This resembles the problem of deciding whether a given relation
>> > (expressed
>> > > as a query) is contained in another one. It will take some work for
>> > Cosette
>> > > to be able to handle this but it definitely sounds interesting. Do you
>> > have
>> > > an application in mind? One of them might be to determine whether
>> > > previously cached results can be used.
>> > >
>> > > We definitely see lots of synergies between the two tools. To start
>> with
>> > > something easy :) , I propose we first discuss how to use the current
>> > > Cosette implementation to audit existing Calcite rules, and a way to
>> > > integrate Cosette into development of future Calcite rules as part of
>> > code
>> > > review / regression tests. What do you think?
>> > >
>> > > Thanks,
>> > > Alvin (on behalf of the Cosette team)
>> > >
>> >
>>
>
>
>
> --
> shumochu.com
>

Re: Cosette / Apache Calcite

Posted by Shumo Chu <sh...@gmail.com>.
Hi Michael and Calcite Devs:

That's great!

We also hijacked the test harness of RelOpRule Test and generated SQL using
before and after optimization logical plans. (The reason that I use query
plan rather than the input SQL is that when Calcite generates SQLs from
query plans, it adds the correct table reference of attributes. Cosette
currently cannot handle unqualified attribute references). We got 232
before and after SQL after some tweaking. You can find the SQL queries that
we get here: https://github.com/uwdb/Cosette/blob/master/examples/c
alcite/calcite_tests.json.

We then ran all of them using Cosette. Cosette can currently support 39 of
232 queries. Out of these 39 queries, the results are:

EQ           9
UNKNOWN     27
NEQ          3

EQ means our Coq backend found a valid proof for the equivalence of the two
queries.

UNKNOWN means our model checker could not find a counterexample to prove
that the queries are not equivalent (within a time bound of 3 secs). You
can interpret that as "likely equal".

NEQ means our model checker found a counterexample to prove the queries are
not equivalent. These are all the cases that there are some preconditions
that Calcite understands but we haven't yet put into Cosette. For example
(testPushSemiJoinPastJoinRuleLeft):

 query q1 `SELECT EMP.ENAME
                  FROM EMP AS EMP, DEPT AS DEPT, EMP AS EMP0
                   WHERE EMP.DEPTNO = DEPT.DEPTNO AND
                                 EMP.EMPNO = EMP0.EMPNO`;

query q2 `SELECT EMP1.ENAME
                 FROM EMP AS EMP1 INNER JOIN DEPT AS DEPT0
                    ON EMP1.DEPTNO = DEPT0.DEPTNO INNER JOIN EMP AS EMP2
                    ON EMP1.EMPNO = EMP2.EMPNO INNER JOIN DEPT AS DEPT1
                    ON EMP1.DEPTNO = DEPT1.DEPTNO INNER JOIN EMP AS EMP3
                    ON EMP1.EMPNO = EMP3.EMPNO`;


These two queries are equivalent under the precondition that EMPNO is the
primary key of EMP, and DEPTNO is the primary key of DEPT, and that is
indeed the case!

We are extending Cosette to support preconditions such as primary keys and
foreign keys now and we should be able to handle these soon.

For the queries that we cannot handle, here is the break down of the issues:

[image: Inline image 1]

We support a large part of group by queries, GROUPBY here means some
grouping features that we don't support, e.g., group by on arbitrary
expressions. Although we might not be able to support all of these SQL
features, we are definitely working on adding more SQL features to Cosette.

With all above, we can try to integrate Cosette as an extra layer to ensure
developer's confidence. Currently, we are working on SIGMOD deadline (Nov. 2)
but will devote more time into this afterward.

One thing worth mentioning is that we plan to include all these calcite
examples as the regression test for Cosette so that we can make Cosette
more powerful and practical. These queries are great benchmarks for Cosette
and are super valuable!

Meanwhile, It would be also great if calcite's logical plan to SQL
converter can be improved. It currently doesn't support all queries and in
some cases, it actually generates illegal SQL. (happy to submit detailed
bug report if you think that would be helpful).

Best
Shumo

On Tue, Oct 17, 2017 at 9:43 AM, Michael Mior <mm...@uwaterloo.ca> wrote:

> This took me longer than expected to get around to, but hopefully the below
> is helpful:
>
> https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2
>
> I just did some basic (and very hacky) instrumentation of RelOptRulesTest
> to dump SQL before and after rules have been applied. The file consists of
> the name of the test followed by the original and then the rewritten SQL.
>
> Many of the tests are missing for various reasons, but there's still 189
> examples there to play with. Let me know if any particular aspects of the
> SQL are problematic. The "before" SQL is handwritten for the tests and the
> "after" is ANSI SQL as generated by Calcite from the resulting logical
> plan.
>
> --
> Michael Mior
> mmior@apache.org
>
> 2017-09-22 1:03 GMT-04:00 Amogh Margoor <am...@qubole.com>:
>
> > >>> There might be applications in materialized views. A query Q can use
> a
> > materialized view V if V covers Q. In other words if >>>Q == R(V) where R
> > is some sequence of relational operators. Given Q and V, Cosette could
> > perhaps analyze and either >>>return R (success) or return that V does
> not
> > cover Q (failure).
> >
> > >>This resembles the problem of deciding whether a given relation
> > (expressed as a query) is contained in another one. It will >>take some
> > work for Cosette to be able to handle this but it definitely sounds
> > interesting. Do you have an application in mind? >>One of them might be
> to
> > determine whether previously cached results can be used.
> >
> > One simple idea to start here is to replace a naive solver we have in
> > Calcite for checking if one predicate implies another predicate. We call
> it
> > RexImplicationChecker in Calcite and if we can replace or help it with
> > Constraint solver of Cosette which says if a particular implication is a
> > tautology then that would help a great deal.
> >
> >
> >
> > On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <
> akcheung@cs.washington.edu>
> > wrote:
> >
> > > Hi Julian et al,
> > >
> > > Thanks for your interest in Cosette. Your suggestions make a lot of
> > sense.
> > > We have done some initial work and would like to get your feedback on
> how
> > > to integrate the two tools together.
> > >
> > > > One obvious idea is to use Cosette to audit Calcite’s query
> > > transformation rules. Each rule is supposed to preserve semantics but
> > > (until Cosette) we had to trust the author of the rule. We could
> convert
> > > the before and after relational expressions to SQL, and then ask
> Cosette
> > > whether those are equivalent. We could enable this check in Calcite’s
> > test
> > > suite, during which many thousands of rules are fired.
> > >
> > > Indeed. We have browsed through the Calcite rules and reformulated a
> few
> > > of them using our Cosette language:
> > >
> > > 1. Conjunctive select (https://github.com/apache/cal
> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > rules/FilterMergeRule.java) --> https://demo.cosette.cs.
> washington.edu/
> > > (click conjunctive select from the dropdown menu)
> > >
> > > 2. Join commute (https://github.com/apache/cal
> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > rules/JoinCommuteRule.java) --> Join commute from the demo website
> above
> > >
> > > 3. Join/Project transpose (https://github.com/apache/cal
> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > rules/JoinProjectTransposeRule.java) --> Join Proj. Trans. from the
> demo
> > > website above
> > >
> > > As we are not very familiar with the Calcite code base, we have tried
> our
> > > best to guess the intention of each rule based on the documentation,
> > please
> > > feel free to point out if we made mistakes.
> > >
> > > As you can see, the Cosette language is pretty much like standard SQL,
> > > except for declarations of schemas and relations. You will also notice
> > the
> > > "??" in some schema declarations (e.g., in rule 1. above) --- they
> stand
> > > for "symbolic" attributes that can represent any attribute. In other
> > words,
> > > if Cosette can prove that a rule with symbolic attributes is true, then
> > it
> > > will be true regardless of what the symbolic attributes are
> instantiated
> > > with. Symbolic predicates (e.g., in rule 1.) works similarly, hence
> > giving
> > > Cosette a mechanism to prove (or disprove) classes of rewrite rules.
> See
> > > our documentation at http://cosette.cs.washington.edu/guide for
> details.
> > >
> > > I believe the challenge here is how we can "reverse engineer" the
> > > intention of each of the existing rules so that they can be expressed
> in
> > > Cosette. Any suggestions on how to do this? We have a few students
> > working
> > > on Cosette and can help, but we will probably need help from Calcite to
> > > fully understand all of the existing rules. Another possibility is to
> > print
> > > out the input and output of each rule application during testing, and
> > send
> > > them to Cosette. If the printout is in a form that resembles SQL we can
> > > probably patch it into Cosette.
> > >
> > > For new rules, can we can ask Calcite authors to express them in
> Cosette
> > > as well, perhaps as part of the documentation? This way we will only
> need
> > > to handle the existing rules.
> > >
> > > > A few rules might use other information besides the input relational
> > > expression, such as predicates that are known to hold or column
> > > combinations that are known to be unique. But let’s see what happens.
> > >
> > > This is something that we are actively working on. Can you point us to
> > > specific rules with such properties? One possibility is the join
> > > commutativity rule noted above. You will notice that we didn't prove
> the
> > > "general form" of the rule with symbolic attributes, but rather one
> with
> > > concrete schemas. This is because Cosette currently implements the
> > unnamed
> > > approach to attribute naming (see Section 3.2 in
> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the general
> form
> > > of the rule is only true if we know that the two input schemas have
> > > distinct attributes.
> > >
> > > > This is a very loose integration of Cosette / Calcite, but we can
> make
> > > closer integrations (e.g. within the same JVM, even at runtime) as we
> > > discover synergies. After all, optimization and theorem-proving are
> > related
> > > endeavors.
> > >
> > > Agreed. Cosette is implemented using Coq and Racket. We realize that
> > those
> > > are not the most popular languages for implementing systems :) , so
> > Cosette
> > > comes with a POST API as well: http://cosette.cs.washington.
> > edu/guide#api
> > > . It takes in the program text written in Cosette, and returns the
> answer
> > > (or times out). Does this make it easier to run the tool? We are open
> to
> > > implementing other bindings as well.
> > >
> > > > Another area that would be useful would be to devise test data.
> > >
> > > How about this: Each SQL implementation has its own interpretation of
> > SQL,
> > > with Cosette being one of them. Let's implement different SQL semantics
> > > using Cosette (say, Calcite's and Postgres'). Then, given a query, ask
> > > Cosette to find a counterexample (i.e., an input relation) where the
> two
> > > implementations will return different results when executed on a given
> > > query. If such a counterexample exists, then Calcite developers can
> > > determine whether this is a "bug" or a "feature". Does this sound
> similar
> > > to what you have in mind?
> > >
> > > > There might be applications in materialized views. A query Q can use
> a
> > > materialized view V if V covers Q. In other words if Q == R(V) where R
> is
> > > some sequence of relational operators. Given Q and V, Cosette could
> > perhaps
> > > analyze and either return R (success) or return that V does not cover Q
> > > (failure).
> > >
> > > This resembles the problem of deciding whether a given relation
> > (expressed
> > > as a query) is contained in another one. It will take some work for
> > Cosette
> > > to be able to handle this but it definitely sounds interesting. Do you
> > have
> > > an application in mind? One of them might be to determine whether
> > > previously cached results can be used.
> > >
> > > We definitely see lots of synergies between the two tools. To start
> with
> > > something easy :) , I propose we first discuss how to use the current
> > > Cosette implementation to audit existing Calcite rules, and a way to
> > > integrate Cosette into development of future Calcite rules as part of
> > code
> > > review / regression tests. What do you think?
> > >
> > > Thanks,
> > > Alvin (on behalf of the Cosette team)
> > >
> >
>



-- 
shumochu.com