You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Amogh Margoor <am...@qubole.com> on 2015/06/25 20:46:58 UTC

Detect if materialized view can be used to rewrite a query in non-trivial cases

Hi,
We were working on a problem to detect if materialized view can be used to
rewrite a query in non-trivial cases. Will briefly describe the problem and
approach below and would appreciate feedback on the same.

Motivation
---------------
For instance there exists a table "logs" and a partition (materialized
view)  named "log_after_01_Jan" created on it and described by SQL :
"Select * from logs where log_date > '01-01-2015' ".

Assume that the table "log_after_01_Jan" is much smaller than  table "logs".

 For user query:
"Select log_body from logs where log_date > '03-03-2015' and
char_length(log_body) < 100",
we should detect that the materialized view "log_after_01_Jan" can be used
and transform the query into:

"Select log_body from log_after_01_Jan where log_date > '03-03-2015' and
char_length(log_body) < 100"

Approach
--------------
One of the fundamental problems we would come across here is to check if a
boolean condition X implies (=>) Y. This quickly reduces to the
Satisfiability problem which is NP complete for propositional logic. But
there are many instances like above which can be detected easily. We have
implemented an approach to handle several useful cases for few operators
and types of operands. Will be extending it further for more types of
operations.

Top Level approach:

1. Currently, VolcanoPlanner:useMaterialization tries to rewrite original
query using MV using SubstitutionVisitor. Have extended SubstitutionVisitor
to detect above cases and do the substitution.

2. To check if a condition X => Y,
   a. Convert both of them into Disjunctive Normal Form.
       Say X is transformed into  x_1 or x_2 or x_3 ... or x_m and
       Y is transformed into y_1 or y_2 ,... or  y_i, where any x_i and y_i
are conjunctions of atomic predicates.
       For instance condition "(a>10 or b>20) and c <90" will be converted
to DNF: (a>10 and c<90)  or (b>20 and c<90).

   b. For X=>Y to be a tautology i.e., hold always true, every conjunction
x_i should imply atleast one of the conjunction y_j.
       We wrote some set of simple heuristics to check if a conjunction of
atomic predicates implies another.
      This also involves executing RexNode using RexImplExecutor.

We have checked in code for this in our fork of
calcite(qubole/incubator-calcite). This is ongoing work and we will be
making many more improvements to it. If this is useful or anybody is
interested in giving feedback then I can share the commit so that we can
discuss about it and take it forward.

Regards,
Amogh
Member of Technical Staff
Qubole

Re: Detect if materialized view can be used to rewrite a query in non-trivial cases

Posted by Amogh Margoor <am...@qubole.com>.
Hi Julian,

Small update:
This are the commits done:
>>
https://github.com/qubole/incubator-calcite/commit/19c1a70585cf9691d0e380c7fb85b7deaa86ad3d
>>
https://github.com/qubole/incubator-calcite/commit/35b1ac3b2d1ce44fa6bd9f72ae8e329555276a10

Minor Correction in the earlier response:
I meant (x>30 => x >10) instead of (x >10 => x >30)

Regards,
Amogh


On Fri, Jun 26, 2015 at 10:11 AM, Amogh Margoor <am...@qubole.com> wrote:

> Hi Julian,
>
> Thanks a lot Julian for your feedback. I have inlined my response below
> which also includes the commit done.
>
> Regards,
> Amogh
>
> On Fri, Jun 26, 2015 at 1:05 AM, Julian Hyde <jh...@apache.org> wrote:
>
>> This is great work. Certainly consistent with where I am heading.
>>
>> I would not be inclined to use DNF (because of its tendency to inflate
>> certain predicates) but if you are able to get something effective I
>> will happily use it. I think you should package it behind a method --
>> "find out what is left to satisfy p when you have already satisfied q"
>> or something -- and write lots of tests of that method, and it doesn't
>> really matter what algorithm is behind it.
>>
>> Take a look at SubstitutionVisitor.simplfy(RexNode) and how it focuses
>> on finding whether
>>
>>   p1 AND p2 AND p3 AND NOT (q1 AND q2 AND q3)
>>
>> is satisfiable.
>>
>
> >> I saw this method. I will try to use this in improvements to follow.
> >> It didnot seem to solve this currently: (x>10 => x>30)  i.e., find if
> >>  NOT (NOT(x>10 )  OR  x >30) is satisfiable.
> >> We have currently packaged it as "if X => Y" (see RexImplicationChecker
> >> in the commit I shared below), but agree it should be
> >> more generic like what you suggested above and something we can try to
> achieve.
>
>>
>> Later we will want to know not just "can I satisfy query Q using
>> materialization M?" but "can I satisfy part of Q using M, and what is
>> left over?". I can convert most of Q to use an aggregate table over
>> years 2012 .. 2014 and 2015 Jan - May, and then scan the raw data for
>> June 1st onwards, that is a big win.
>>
>
> >> This certainly should be something we should aim at.
>
>>
>> What branch are you working on? Your master branch
>> https://github.com/qubole/incubator-calcite/tree/master seems to be
>> the same as apache/master right now.
>>
>> >> We work on https://github.com/qubole/incubator-calcite/tree/qds-1.3 .
> >> This is the commit:
> https://github.com/qubole/incubator-calcite/pull/1/files?diff=unified
> >> We are in the process of writing UTs for it. We did most of the testing
> through our client code till now.
> >> We have created new Visitor extending SustitutionVisitor because did
> not want to mess with the existing code.
> >> More rules need to be added to the new Visitor.
> >> Will raise a PR once UTs are added and testing is complete.
>
>
>> If you can divide this work into pull requests with unit tests, I will
>> happy commit each change as you make progress.
>>
>> By the way, I logged a few jira cases connected to materialized view
>> rewrite today. They were motivated by the phoenix team wanting to use
>> secondary indexes. But they could by applied to any scan-project-sort
>> materialization. See
>>
>> * https://issues.apache.org/jira/browse/CALCITE-771
>> * https://issues.apache.org/jira/browse/CALCITE-772
>> * https://issues.apache.org/jira/browse/CALCITE-773
>>
>
> >> Thanks for sharing this info Julian. Will definitely take a look.
>
>>
>> Julian
>>
>> On Thu, Jun 25, 2015 at 11:46 AM, Amogh Margoor <am...@qubole.com>
>> wrote:
>> > Hi,
>> > We were working on a problem to detect if materialized view can be used
>> to
>> > rewrite a query in non-trivial cases. Will briefly describe the problem
>> and
>> > approach below and would appreciate feedback on the same.
>> >
>> > Motivation
>> > ---------------
>> > For instance there exists a table "logs" and a partition (materialized
>> > view)  named "log_after_01_Jan" created on it and described by SQL :
>> > "Select * from logs where log_date > '01-01-2015' ".
>> >
>> > Assume that the table "log_after_01_Jan" is much smaller than  table
>> "logs".
>> >
>> >  For user query:
>> > "Select log_body from logs where log_date > '03-03-2015' and
>> > char_length(log_body) < 100",
>> > we should detect that the materialized view "log_after_01_Jan" can be
>> used
>> > and transform the query into:
>> >
>> > "Select log_body from log_after_01_Jan where log_date > '03-03-2015' and
>> > char_length(log_body) < 100"
>> >
>> > Approach
>> > --------------
>> > One of the fundamental problems we would come across here is to check
>> if a
>> > boolean condition X implies (=>) Y. This quickly reduces to the
>> > Satisfiability problem which is NP complete for propositional logic. But
>> > there are many instances like above which can be detected easily. We
>> have
>> > implemented an approach to handle several useful cases for few operators
>> > and types of operands. Will be extending it further for more types of
>> > operations.
>> >
>> > Top Level approach:
>> >
>> > 1. Currently, VolcanoPlanner:useMaterialization tries to rewrite
>> original
>> > query using MV using SubstitutionVisitor. Have extended
>> SubstitutionVisitor
>> > to detect above cases and do the substitution.
>> >
>> > 2. To check if a condition X => Y,
>> >    a. Convert both of them into Disjunctive Normal Form.
>> >        Say X is transformed into  x_1 or x_2 or x_3 ... or x_m and
>> >        Y is transformed into y_1 or y_2 ,... or  y_i, where any x_i and
>> y_i
>> > are conjunctions of atomic predicates.
>> >        For instance condition "(a>10 or b>20) and c <90" will be
>> converted
>> > to DNF: (a>10 and c<90)  or (b>20 and c<90).
>> >
>> >    b. For X=>Y to be a tautology i.e., hold always true, every
>> conjunction
>> > x_i should imply atleast one of the conjunction y_j.
>> >        We wrote some set of simple heuristics to check if a conjunction
>> of
>> > atomic predicates implies another.
>> >       This also involves executing RexNode using RexImplExecutor.
>> >
>> > We have checked in code for this in our fork of
>> > calcite(qubole/incubator-calcite). This is ongoing work and we will be
>> > making many more improvements to it. If this is useful or anybody is
>> > interested in giving feedback then I can share the commit so that we can
>> > discuss about it and take it forward.
>> >
>> > Regards,
>> > Amogh
>> > Member of Technical Staff
>> > Qubole
>>
>
>

Re: Detect if materialized view can be used to rewrite a query in non-trivial cases

Posted by Amogh Margoor <am...@qubole.com>.
Hi Julian,

Thanks a lot Julian for your feedback. I have inlined my response below
which also includes the commit done.

Regards,
Amogh

On Fri, Jun 26, 2015 at 1:05 AM, Julian Hyde <jh...@apache.org> wrote:

> This is great work. Certainly consistent with where I am heading.
>
> I would not be inclined to use DNF (because of its tendency to inflate
> certain predicates) but if you are able to get something effective I
> will happily use it. I think you should package it behind a method --
> "find out what is left to satisfy p when you have already satisfied q"
> or something -- and write lots of tests of that method, and it doesn't
> really matter what algorithm is behind it.
>
> Take a look at SubstitutionVisitor.simplfy(RexNode) and how it focuses
> on finding whether
>
>   p1 AND p2 AND p3 AND NOT (q1 AND q2 AND q3)
>
> is satisfiable.
>

>> I saw this method. I will try to use this in improvements to follow.
>> It didnot seem to solve this currently: (x>10 => x>30)  i.e., find if
>>  NOT (NOT(x>10 )  OR  x >30) is satisfiable.
>> We have currently packaged it as "if X => Y" (see RexImplicationChecker
>> in the commit I shared below), but agree it should be
>> more generic like what you suggested above and something we can try to
achieve.

>
> Later we will want to know not just "can I satisfy query Q using
> materialization M?" but "can I satisfy part of Q using M, and what is
> left over?". I can convert most of Q to use an aggregate table over
> years 2012 .. 2014 and 2015 Jan - May, and then scan the raw data for
> June 1st onwards, that is a big win.
>

>> This certainly should be something we should aim at.

>
> What branch are you working on? Your master branch
> https://github.com/qubole/incubator-calcite/tree/master seems to be
> the same as apache/master right now.
>
> >> We work on https://github.com/qubole/incubator-calcite/tree/qds-1.3 .
>> This is the commit:
https://github.com/qubole/incubator-calcite/pull/1/files?diff=unified
>> We are in the process of writing UTs for it. We did most of the testing
through our client code till now.
>> We have created new Visitor extending SustitutionVisitor because did not
want to mess with the existing code.
>> More rules need to be added to the new Visitor.
>> Will raise a PR once UTs are added and testing is complete.


> If you can divide this work into pull requests with unit tests, I will
> happy commit each change as you make progress.
>
> By the way, I logged a few jira cases connected to materialized view
> rewrite today. They were motivated by the phoenix team wanting to use
> secondary indexes. But they could by applied to any scan-project-sort
> materialization. See
>
> * https://issues.apache.org/jira/browse/CALCITE-771
> * https://issues.apache.org/jira/browse/CALCITE-772
> * https://issues.apache.org/jira/browse/CALCITE-773
>

>> Thanks for sharing this info Julian. Will definitely take a look.

>
> Julian
>
> On Thu, Jun 25, 2015 at 11:46 AM, Amogh Margoor <am...@qubole.com> wrote:
> > Hi,
> > We were working on a problem to detect if materialized view can be used
> to
> > rewrite a query in non-trivial cases. Will briefly describe the problem
> and
> > approach below and would appreciate feedback on the same.
> >
> > Motivation
> > ---------------
> > For instance there exists a table "logs" and a partition (materialized
> > view)  named "log_after_01_Jan" created on it and described by SQL :
> > "Select * from logs where log_date > '01-01-2015' ".
> >
> > Assume that the table "log_after_01_Jan" is much smaller than  table
> "logs".
> >
> >  For user query:
> > "Select log_body from logs where log_date > '03-03-2015' and
> > char_length(log_body) < 100",
> > we should detect that the materialized view "log_after_01_Jan" can be
> used
> > and transform the query into:
> >
> > "Select log_body from log_after_01_Jan where log_date > '03-03-2015' and
> > char_length(log_body) < 100"
> >
> > Approach
> > --------------
> > One of the fundamental problems we would come across here is to check if
> a
> > boolean condition X implies (=>) Y. This quickly reduces to the
> > Satisfiability problem which is NP complete for propositional logic. But
> > there are many instances like above which can be detected easily. We have
> > implemented an approach to handle several useful cases for few operators
> > and types of operands. Will be extending it further for more types of
> > operations.
> >
> > Top Level approach:
> >
> > 1. Currently, VolcanoPlanner:useMaterialization tries to rewrite original
> > query using MV using SubstitutionVisitor. Have extended
> SubstitutionVisitor
> > to detect above cases and do the substitution.
> >
> > 2. To check if a condition X => Y,
> >    a. Convert both of them into Disjunctive Normal Form.
> >        Say X is transformed into  x_1 or x_2 or x_3 ... or x_m and
> >        Y is transformed into y_1 or y_2 ,... or  y_i, where any x_i and
> y_i
> > are conjunctions of atomic predicates.
> >        For instance condition "(a>10 or b>20) and c <90" will be
> converted
> > to DNF: (a>10 and c<90)  or (b>20 and c<90).
> >
> >    b. For X=>Y to be a tautology i.e., hold always true, every
> conjunction
> > x_i should imply atleast one of the conjunction y_j.
> >        We wrote some set of simple heuristics to check if a conjunction
> of
> > atomic predicates implies another.
> >       This also involves executing RexNode using RexImplExecutor.
> >
> > We have checked in code for this in our fork of
> > calcite(qubole/incubator-calcite). This is ongoing work and we will be
> > making many more improvements to it. If this is useful or anybody is
> > interested in giving feedback then I can share the commit so that we can
> > discuss about it and take it forward.
> >
> > Regards,
> > Amogh
> > Member of Technical Staff
> > Qubole
>

Re: Detect if materialized view can be used to rewrite a query in non-trivial cases

Posted by Amogh Margoor <am...@qubole.com>.
Hi John,

Thanks for the interest. Responses are inlined.

Regards,
Amogh

On Fri, Jun 26, 2015 at 3:13 AM, John Pullokkaran <
jpullokkaran@hortonworks.com> wrote:

> Amogh,
>
>    It would be nice if you can share the commit id.
>

>>
https://github.com/qubole/incubator-calcite/commit/19c1a70585cf9691d0e380c7fb85b7deaa86ad3d
>>
https://github.com/qubole/incubator-calcite/commit/35b1ac3b2d1ce44fa6bd9f72ae8e329555276a10

>
>
> 1. #2.b "This also involves executing RexNode using RexImplExecutor²
> Wondering why expr needs to be evaluated? Is it cover functions involving
> literals.

>>  To find out if x>30 => x>10 we evaluate x>10 by replacing x by 30. But
we check lot of stuff to ensure this doesnot give false negatives.

>
> 2. how is expr ordering issue solved?
> Example: dept != 10 vs 10 != dept
>
>> For the current set of binary operators we are handling this was
straight forward. See  the second commit for it:
>>
https://github.com/qubole/incubator-calcite/commit/35b1ac3b2d1ce44fa6bd9f72ae8e329555276a10

>
> 3. Column aliasing:
> Select sales as a from t1 order by a
> Vs
> Select sales as b from t1  order by b
>

>> We are planning to test around it. Gut feeling is that it should be
handled
>> as we do operations on RelNodes that I believe would have resolved
aliasing.

>
> Thanks
> John
>
> On 6/25/15, 12:35 PM, "Julian Hyde" <jh...@apache.org> wrote:
>
> >This is great work. Certainly consistent with where I am heading.
> >
> >I would not be inclined to use DNF (because of its tendency to inflate
> >certain predicates) but if you are able to get something effective I
> >will happily use it. I think you should package it behind a method --
> >"find out what is left to satisfy p when you have already satisfied q"
> >or something -- and write lots of tests of that method, and it doesn't
> >really matter what algorithm is behind it.
> >
> >Take a look at SubstitutionVisitor.simplfy(RexNode) and how it focuses
> >on finding whether
> >
> >  p1 AND p2 AND p3 AND NOT (q1 AND q2 AND q3)
> >
> >is satisfiable.
> >
> >Later we will want to know not just "can I satisfy query Q using
> >materialization M?" but "can I satisfy part of Q using M, and what is
> >left over?". I can convert most of Q to use an aggregate table over
> >years 2012 .. 2014 and 2015 Jan - May, and then scan the raw data for
> >June 1st onwards, that is a big win.
> >
> >What branch are you working on? Your master branch
> >https://github.com/qubole/incubator-calcite/tree/master seems to be
> >the same as apache/master right now.
> >
> >If you can divide this work into pull requests with unit tests, I will
> >happy commit each change as you make progress.
> >
> >By the way, I logged a few jira cases connected to materialized view
> >rewrite today. They were motivated by the phoenix team wanting to use
> >secondary indexes. But they could by applied to any scan-project-sort
> >materialization. See
> >
> >* https://issues.apache.org/jira/browse/CALCITE-771
> >* https://issues.apache.org/jira/browse/CALCITE-772
> >* https://issues.apache.org/jira/browse/CALCITE-773
> >
> >Julian
> >
> >On Thu, Jun 25, 2015 at 11:46 AM, Amogh Margoor <am...@qubole.com>
> wrote:
> >> Hi,
> >> We were working on a problem to detect if materialized view can be used
> >>to
> >> rewrite a query in non-trivial cases. Will briefly describe the problem
> >>and
> >> approach below and would appreciate feedback on the same.
> >>
> >> Motivation
> >> ---------------
> >> For instance there exists a table "logs" and a partition (materialized
> >> view)  named "log_after_01_Jan" created on it and described by SQL :
> >> "Select * from logs where log_date > '01-01-2015' ".
> >>
> >> Assume that the table "log_after_01_Jan" is much smaller than  table
> >>"logs".
> >>
> >>  For user query:
> >> "Select log_body from logs where log_date > '03-03-2015' and
> >> char_length(log_body) < 100",
> >> we should detect that the materialized view "log_after_01_Jan" can be
> >>used
> >> and transform the query into:
> >>
> >> "Select log_body from log_after_01_Jan where log_date > '03-03-2015' and
> >> char_length(log_body) < 100"
> >>
> >> Approach
> >> --------------
> >> One of the fundamental problems we would come across here is to check
> >>if a
> >> boolean condition X implies (=>) Y. This quickly reduces to the
> >> Satisfiability problem which is NP complete for propositional logic. But
> >> there are many instances like above which can be detected easily. We
> >>have
> >> implemented an approach to handle several useful cases for few operators
> >> and types of operands. Will be extending it further for more types of
> >> operations.
> >>
> >> Top Level approach:
> >>
> >> 1. Currently, VolcanoPlanner:useMaterialization tries to rewrite
> >>original
> >> query using MV using SubstitutionVisitor. Have extended
> >>SubstitutionVisitor
> >> to detect above cases and do the substitution.
> >>
> >> 2. To check if a condition X => Y,
> >>    a. Convert both of them into Disjunctive Normal Form.
> >>        Say X is transformed into  x_1 or x_2 or x_3 ... or x_m and
> >>        Y is transformed into y_1 or y_2 ,... or  y_i, where any x_i and
> >>y_i
> >> are conjunctions of atomic predicates.
> >>        For instance condition "(a>10 or b>20) and c <90" will be
> >>converted
> >> to DNF: (a>10 and c<90)  or (b>20 and c<90).
> >>
> >>    b. For X=>Y to be a tautology i.e., hold always true, every
> >>conjunction
> >> x_i should imply atleast one of the conjunction y_j.
> >>        We wrote some set of simple heuristics to check if a conjunction
> >>of
> >> atomic predicates implies another.
> >>       This also involves executing RexNode using RexImplExecutor.
> >>
> >> We have checked in code for this in our fork of
> >> calcite(qubole/incubator-calcite). This is ongoing work and we will be
> >> making many more improvements to it. If this is useful or anybody is
> >> interested in giving feedback then I can share the commit so that we can
> >> discuss about it and take it forward.
> >>
> >> Regards,
> >> Amogh
> >> Member of Technical Staff
> >> Qubole
> >
>
>

Re: Detect if materialized view can be used to rewrite a query in non-trivial cases

Posted by John Pullokkaran <jp...@hortonworks.com>.
Amogh,

   It would be nice if you can share the commit id.


1. #2.b "This also involves executing RexNode using RexImplExecutor²
Wondering why expr needs to be evaluated? Is it cover functions involving
literals.

2. how is expr ordering issue solved?
Example: dept != 10 vs 10 != dept

3. Column aliasing:
Select sales as a from t1 order by a
Vs
Select sales as b from t1  order by b

Thanks
John

On 6/25/15, 12:35 PM, "Julian Hyde" <jh...@apache.org> wrote:

>This is great work. Certainly consistent with where I am heading.
>
>I would not be inclined to use DNF (because of its tendency to inflate
>certain predicates) but if you are able to get something effective I
>will happily use it. I think you should package it behind a method --
>"find out what is left to satisfy p when you have already satisfied q"
>or something -- and write lots of tests of that method, and it doesn't
>really matter what algorithm is behind it.
>
>Take a look at SubstitutionVisitor.simplfy(RexNode) and how it focuses
>on finding whether
>
>  p1 AND p2 AND p3 AND NOT (q1 AND q2 AND q3)
>
>is satisfiable.
>
>Later we will want to know not just "can I satisfy query Q using
>materialization M?" but "can I satisfy part of Q using M, and what is
>left over?". I can convert most of Q to use an aggregate table over
>years 2012 .. 2014 and 2015 Jan - May, and then scan the raw data for
>June 1st onwards, that is a big win.
>
>What branch are you working on? Your master branch
>https://github.com/qubole/incubator-calcite/tree/master seems to be
>the same as apache/master right now.
>
>If you can divide this work into pull requests with unit tests, I will
>happy commit each change as you make progress.
>
>By the way, I logged a few jira cases connected to materialized view
>rewrite today. They were motivated by the phoenix team wanting to use
>secondary indexes. But they could by applied to any scan-project-sort
>materialization. See
>
>* https://issues.apache.org/jira/browse/CALCITE-771
>* https://issues.apache.org/jira/browse/CALCITE-772
>* https://issues.apache.org/jira/browse/CALCITE-773
>
>Julian
>
>On Thu, Jun 25, 2015 at 11:46 AM, Amogh Margoor <am...@qubole.com> wrote:
>> Hi,
>> We were working on a problem to detect if materialized view can be used
>>to
>> rewrite a query in non-trivial cases. Will briefly describe the problem
>>and
>> approach below and would appreciate feedback on the same.
>>
>> Motivation
>> ---------------
>> For instance there exists a table "logs" and a partition (materialized
>> view)  named "log_after_01_Jan" created on it and described by SQL :
>> "Select * from logs where log_date > '01-01-2015' ".
>>
>> Assume that the table "log_after_01_Jan" is much smaller than  table
>>"logs".
>>
>>  For user query:
>> "Select log_body from logs where log_date > '03-03-2015' and
>> char_length(log_body) < 100",
>> we should detect that the materialized view "log_after_01_Jan" can be
>>used
>> and transform the query into:
>>
>> "Select log_body from log_after_01_Jan where log_date > '03-03-2015' and
>> char_length(log_body) < 100"
>>
>> Approach
>> --------------
>> One of the fundamental problems we would come across here is to check
>>if a
>> boolean condition X implies (=>) Y. This quickly reduces to the
>> Satisfiability problem which is NP complete for propositional logic. But
>> there are many instances like above which can be detected easily. We
>>have
>> implemented an approach to handle several useful cases for few operators
>> and types of operands. Will be extending it further for more types of
>> operations.
>>
>> Top Level approach:
>>
>> 1. Currently, VolcanoPlanner:useMaterialization tries to rewrite
>>original
>> query using MV using SubstitutionVisitor. Have extended
>>SubstitutionVisitor
>> to detect above cases and do the substitution.
>>
>> 2. To check if a condition X => Y,
>>    a. Convert both of them into Disjunctive Normal Form.
>>        Say X is transformed into  x_1 or x_2 or x_3 ... or x_m and
>>        Y is transformed into y_1 or y_2 ,... or  y_i, where any x_i and
>>y_i
>> are conjunctions of atomic predicates.
>>        For instance condition "(a>10 or b>20) and c <90" will be
>>converted
>> to DNF: (a>10 and c<90)  or (b>20 and c<90).
>>
>>    b. For X=>Y to be a tautology i.e., hold always true, every
>>conjunction
>> x_i should imply atleast one of the conjunction y_j.
>>        We wrote some set of simple heuristics to check if a conjunction
>>of
>> atomic predicates implies another.
>>       This also involves executing RexNode using RexImplExecutor.
>>
>> We have checked in code for this in our fork of
>> calcite(qubole/incubator-calcite). This is ongoing work and we will be
>> making many more improvements to it. If this is useful or anybody is
>> interested in giving feedback then I can share the commit so that we can
>> discuss about it and take it forward.
>>
>> Regards,
>> Amogh
>> Member of Technical Staff
>> Qubole
>


Re: Detect if materialized view can be used to rewrite a query in non-trivial cases

Posted by Julian Hyde <jh...@apache.org>.
This is great work. Certainly consistent with where I am heading.

I would not be inclined to use DNF (because of its tendency to inflate
certain predicates) but if you are able to get something effective I
will happily use it. I think you should package it behind a method --
"find out what is left to satisfy p when you have already satisfied q"
or something -- and write lots of tests of that method, and it doesn't
really matter what algorithm is behind it.

Take a look at SubstitutionVisitor.simplfy(RexNode) and how it focuses
on finding whether

  p1 AND p2 AND p3 AND NOT (q1 AND q2 AND q3)

is satisfiable.

Later we will want to know not just "can I satisfy query Q using
materialization M?" but "can I satisfy part of Q using M, and what is
left over?". I can convert most of Q to use an aggregate table over
years 2012 .. 2014 and 2015 Jan - May, and then scan the raw data for
June 1st onwards, that is a big win.

What branch are you working on? Your master branch
https://github.com/qubole/incubator-calcite/tree/master seems to be
the same as apache/master right now.

If you can divide this work into pull requests with unit tests, I will
happy commit each change as you make progress.

By the way, I logged a few jira cases connected to materialized view
rewrite today. They were motivated by the phoenix team wanting to use
secondary indexes. But they could by applied to any scan-project-sort
materialization. See

* https://issues.apache.org/jira/browse/CALCITE-771
* https://issues.apache.org/jira/browse/CALCITE-772
* https://issues.apache.org/jira/browse/CALCITE-773

Julian

On Thu, Jun 25, 2015 at 11:46 AM, Amogh Margoor <am...@qubole.com> wrote:
> Hi,
> We were working on a problem to detect if materialized view can be used to
> rewrite a query in non-trivial cases. Will briefly describe the problem and
> approach below and would appreciate feedback on the same.
>
> Motivation
> ---------------
> For instance there exists a table "logs" and a partition (materialized
> view)  named "log_after_01_Jan" created on it and described by SQL :
> "Select * from logs where log_date > '01-01-2015' ".
>
> Assume that the table "log_after_01_Jan" is much smaller than  table "logs".
>
>  For user query:
> "Select log_body from logs where log_date > '03-03-2015' and
> char_length(log_body) < 100",
> we should detect that the materialized view "log_after_01_Jan" can be used
> and transform the query into:
>
> "Select log_body from log_after_01_Jan where log_date > '03-03-2015' and
> char_length(log_body) < 100"
>
> Approach
> --------------
> One of the fundamental problems we would come across here is to check if a
> boolean condition X implies (=>) Y. This quickly reduces to the
> Satisfiability problem which is NP complete for propositional logic. But
> there are many instances like above which can be detected easily. We have
> implemented an approach to handle several useful cases for few operators
> and types of operands. Will be extending it further for more types of
> operations.
>
> Top Level approach:
>
> 1. Currently, VolcanoPlanner:useMaterialization tries to rewrite original
> query using MV using SubstitutionVisitor. Have extended SubstitutionVisitor
> to detect above cases and do the substitution.
>
> 2. To check if a condition X => Y,
>    a. Convert both of them into Disjunctive Normal Form.
>        Say X is transformed into  x_1 or x_2 or x_3 ... or x_m and
>        Y is transformed into y_1 or y_2 ,... or  y_i, where any x_i and y_i
> are conjunctions of atomic predicates.
>        For instance condition "(a>10 or b>20) and c <90" will be converted
> to DNF: (a>10 and c<90)  or (b>20 and c<90).
>
>    b. For X=>Y to be a tautology i.e., hold always true, every conjunction
> x_i should imply atleast one of the conjunction y_j.
>        We wrote some set of simple heuristics to check if a conjunction of
> atomic predicates implies another.
>       This also involves executing RexNode using RexImplExecutor.
>
> We have checked in code for this in our fork of
> calcite(qubole/incubator-calcite). This is ongoing work and we will be
> making many more improvements to it. If this is useful or anybody is
> interested in giving feedback then I can share the commit so that we can
> discuss about it and take it forward.
>
> Regards,
> Amogh
> Member of Technical Staff
> Qubole