You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2015/07/06 22:34:04 UTC

[jira] [Updated] (CALCITE-786) Detect if materialized view can be used to rewrite a query in non-trivial cases

     [ https://issues.apache.org/jira/browse/CALCITE-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julian Hyde updated CALCITE-786:
--------------------------------
    Fix Version/s: next

> Detect if materialized view can be used to rewrite a query in non-trivial cases
> -------------------------------------------------------------------------------
>
>                 Key: CALCITE-786
>                 URL: https://issues.apache.org/jira/browse/CALCITE-786
>             Project: Calcite
>          Issue Type: Task
>          Components: core
>    Affects Versions: 1.3.0-incubating, 1.4.0-incubating
>            Reporter: Amogh Margoor
>            Assignee: Julian Hyde
>            Priority: Minor
>             Fix For: next
>
>
> Improvement to detection if MV can be used to rewrite queries in non-trivial cases.
> Pasting the email conversation below that happened over this which briefly discusses the approach taken:
> ---------- Forwarded message ----------
> From: Amogh Margoor <am...@qubole.com>
> Date: Fri, Jun 26, 2015 at 10:11 AM
> Subject: Re: Detect if materialized view can be used to rewrite a query in non-trivial cases
> To: dev@calcite.incubator.apache.org, Rajat Venkatesh <rv...@qubole.com>
> Hi Julian,
> Thanks a lot Julian for your feedback. I have inlined my response below which also includes the commit done.
> Regards,
> Amogh
> On Fri, Jun 26, 2015 at 1:05 AM, Julian Hyde <jh...@apache.org> wrote:
>     This is great work. Certainly consistent with where I am heading.
>     I would not be inclined to use DNF (because of its tendency to inflate
>     certain predicates) but if you are able to get something effective I
>     will happily use it. I think you should package it behind a method --
>     "find out what is left to satisfy p when you have already satisfied q"
>     or something -- and write lots of tests of that method, and it doesn't
>     really matter what algorithm is behind it.
>     Take a look at SubstitutionVisitor.simplfy(RexNode) and how it focuses
>     on finding whether
>       p1 AND p2 AND p3 AND NOT (q1 AND q2 AND q3)
>     is satisfiable.
> >> I saw this method. I will try to use this in improvements to follow.
> >> It didnot seem to solve this currently: (x>10 => x>30)  i.e., find if 
> >>  NOT (NOT(x>10 )  OR  x >30) is satisfiable. 
> >> We have currently packaged it as "if X => Y" (see RexImplicationChecker 
> >> in the commit I shared below), but agree it should be 
> >> more generic like what you suggested above and something we can try to achieve.
>     Later we will want to know not just "can I satisfy query Q using
>     materialization M?" but "can I satisfy part of Q using M, and what is
>     left over?". I can convert most of Q to use an aggregate table over
>     years 2012 .. 2014 and 2015 Jan - May, and then scan the raw data for
>     June 1st onwards, that is a big win.
> >> This certainly should be something we should aim at. 
>     What branch are you working on? Your master branch
>     https://github.com/qubole/incubator-calcite/tree/master seems to be
>     the same as apache/master right now.
> >> We work on https://github.com/qubole/incubator-calcite/tree/qds-1.3 .
> >> This is the commit: https://github.com/qubole/incubator-calcite/pull/1/files?diff=unified
> >> We are in the process of writing UTs for it. We did most of the testing through our client code till now.
> >> We have created new Visitor extending SustitutionVisitor because did not want to mess with the existing code.
> >> More rules need to be added to the new Visitor.  
> >> Will raise a PR once UTs are added and testing is complete.
>   
>     If you can divide this work into pull requests with unit tests, I will
>     happy commit each change as you make progress.
>     By the way, I logged a few jira cases connected to materialized view
>     rewrite today. They were motivated by the phoenix team wanting to use
>     secondary indexes. But they could by applied to any scan-project-sort
>     materialization. See
>     * https://issues.apache.org/jira/browse/CALCITE-771
>     * https://issues.apache.org/jira/browse/CALCITE-772
>     * https://issues.apache.org/jira/browse/CALCITE-773 
> >> Thanks for sharing this info Julian. Will definitely take a look. 
>     Julian
>     On Thu, Jun 25, 2015 at 11:46 AM, Amogh Margoor <am...@qubole.com> wrote:
>     > Hi,
>     > We were working on a problem to detect if materialized view can be used to
>     > rewrite a query in non-trivial cases. Will briefly describe the problem and
>     > approach below and would appreciate feedback on the same.
>     >
>     > Motivation
>     > ---------------
>     > For instance there exists a table "logs" and a partition (materialized
>     > view)  named "log_after_01_Jan" created on it and described by SQL :
>     > "Select * from logs where log_date > '01-01-2015' ".
>     >
>     > Assume that the table "log_after_01_Jan" is much smaller than  table "logs".
>     >
>     >  For user query:
>     > "Select log_body from logs where log_date > '03-03-2015' and
>     > char_length(log_body) < 100",
>     > we should detect that the materialized view "log_after_01_Jan" can be used
>     > and transform the query into:
>     >
>     > "Select log_body from log_after_01_Jan where log_date > '03-03-2015' and
>     > char_length(log_body) < 100"
>     >
>     > Approach
>     > --------------
>     > One of the fundamental problems we would come across here is to check if a
>     > boolean condition X implies (=>) Y. This quickly reduces to the
>     > Satisfiability problem which is NP complete for propositional logic. But
>     > there are many instances like above which can be detected easily. We have
>     > implemented an approach to handle several useful cases for few operators
>     > and types of operands. Will be extending it further for more types of
>     > operations.
>     >
>     > Top Level approach:
>     >
>     > 1. Currently, VolcanoPlanner:useMaterialization tries to rewrite original
>     > query using MV using SubstitutionVisitor. Have extended SubstitutionVisitor
>     > to detect above cases and do the substitution.
>     >
>     > 2. To check if a condition X => Y,
>     >    a. Convert both of them into Disjunctive Normal Form.
>     >        Say X is transformed into  x_1 or x_2 or x_3 ... or x_m and
>     >        Y is transformed into y_1 or y_2 ,... or  y_i, where any x_i and y_i
>     > are conjunctions of atomic predicates.
>     >        For instance condition "(a>10 or b>20) and c <90" will be converted
>     > to DNF: (a>10 and c<90)  or (b>20 and c<90).
>     >
>     >    b. For X=>Y to be a tautology i.e., hold always true, every conjunction
>     > x_i should imply atleast one of the conjunction y_j.
>     >        We wrote some set of simple heuristics to check if a conjunction of
>     > atomic predicates implies another.
>     >       This also involves executing RexNode using RexImplExecutor.
>     >
>     > We have checked in code for this in our fork of
>     > calcite(qubole/incubator-calcite). This is ongoing work and we will be
>     > making many more improvements to it. If this is useful or anybody is
>     > interested in giving feedback then I can share the commit so that we can
>     > discuss about it and take it forward.
>     >
>     > Regards,
>     > Amogh
>     > Member of Technical Staff
>     > Qubole 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)