
Posted to dev@calcite.apache.org by Γιώργος Θεοδωράκης <gi...@gmail.com> on 2016/09/17 15:08:27 UTC

Query optimization by using Rules

Hi,

I am trying to create a basic planner that applies rules to simple
queries. So far I have built a planner from the examples (and from the
samza-sql integration I found online) and used HepPlanner to test some
rules. My question is: what form should my test data take? I am currently
using something like JdbcTest.HrSchema(), and I am wondering whether I
should create tables that implement ScannableTable and FilterableTable in
order to implement optimizations.

Thanks,
George

Re: Query optimization by using Rules

Posted by Γιώργος Θεοδωράκης <gi...@gmail.com>.

Thank you for the quick response Julian,

I am mainly interested in logical transformation rules. I am trying to
create an optimized logical plan and transform it into a physical one in my
engine. The engine is a streaming engine that uses ByteBuffers, and I am
wondering whether it is possible to optimize simple queries with the
following procedure:

1) create a Calcite schema for my engine without using all the input
data; the Calcite schema would contain only dummy data amounting to a
small percentage of the real data =>
2) validate the query and apply logical rules to improve the logical plan
=>
3) transform this logical plan into a physical plan in my engine
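
Step 2 of the procedure above can be illustrated with a small,
self-contained Java sketch. It deliberately does not use Calcite's API:
ToyRulePlanner, Node, Rule, RULES and samplePlan are hypothetical names
invented for this illustration, and the two rules are simplified stand-ins
for the idea of a logical transformation rule (moving a filter below a
projection, and merging adjacent filters). The sketch assumes projections
only pass columns through by name, so a filter is always safe to move
below them.

```java
import java.util.Arrays;
import java.util.List;

// Toy rule-based rewriter; NOT Calcite's API. All names are hypothetical.
final class ToyRulePlanner {

    /** Minimal logical plan node with at most one input. */
    static final class Node {
        final String op;     // "Scan", "Filter" or "Project"
        final String detail; // table name, predicate, or column list
        final Node input;    // null for Scan

        Node(String op, String detail, Node input) {
            this.op = op;
            this.detail = detail;
            this.input = input;
        }

        @Override
        public String toString() {
            return input == null
                ? op + "(" + detail + ")"
                : op + "(" + detail + ", " + input + ")";
        }
    }

    /** A rule returns a rewritten node, or the same node if it does not match. */
    interface Rule {
        Node apply(Node n);
    }

    /** Filter(p, Project(c, X)) becomes Project(c, Filter(p, X)). */
    static final Rule FILTER_BELOW_PROJECT = n ->
        n.op.equals("Filter") && n.input != null && n.input.op.equals("Project")
            ? new Node("Project", n.input.detail,
                  new Node("Filter", n.detail, n.input.input))
            : n;

    /** Filter(a, Filter(b, X)) becomes Filter(a AND b, X). */
    static final Rule MERGE_FILTERS = n ->
        n.op.equals("Filter") && n.input != null && n.input.op.equals("Filter")
            ? new Node("Filter", n.detail + " AND " + n.input.detail, n.input.input)
            : n;

    static final List<Rule> RULES = Arrays.asList(FILTER_BELOW_PROJECT, MERGE_FILTERS);

    /** One bottom-up pass: rewrite the input first, then try each rule on this node. */
    static Node rewriteOnce(Node n, List<Rule> rules) {
        if (n == null) {
            return null;
        }
        Node newInput = rewriteOnce(n.input, rules);
        Node current = newInput == n.input ? n : new Node(n.op, n.detail, newInput);
        for (Rule rule : rules) {
            current = rule.apply(current);
        }
        return current;
    }

    /** Repeats passes until the printed plan stops changing. */
    static Node optimize(Node root, List<Rule> rules) {
        Node previous;
        Node current = root;
        do {
            previous = current;
            current = rewriteOnce(current, rules);
        } while (!current.toString().equals(previous.toString()));
        return current;
    }

    /** Example plan: a filter on top of a projection on top of another filter. */
    static Node samplePlan() {
        Node scan = new Node("Scan", "emps", null);
        Node inner = new Node("Filter", "salary > 1000", scan);
        Node project = new Node("Project", "empid, deptno", inner);
        return new Node("Filter", "deptno = 10", project);
    }

    public static void main(String[] args) {
        System.out.println("before: " + samplePlan());
        System.out.println("after:  " + optimize(samplePlan(), RULES));
    }
}
```

In this toy, the rewrite depends only on the shape of the plan and never
reads any rows, so the amount of data behind the scan does not change the
result. A real optimizer may also consult statistics when choosing between
plans, which this sketch ignores.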

As I have seen in some of the examples, the optimizations use metadata.
However, my engine has a simplified set of operators: it has only one
join and a few key operators (aggregate, filter, project, expressions). I
am not very experienced in query optimization, so my question is: if I
use dummy data, will my results be wrong because of inaccurate metadata?
Should I instead create an adapter, like the CSV adapter, for the
ByteBuffers that receive the streaming data (if that is possible)?

Thanks for your time,
George

2016-09-17 19:27 GMT+03:00 Julian Hyde <jh...@apache.org>:

> The form of your test data depends on the kind of rules you are writing.
> If you are aiming to push a lot of operations down to a particular
> engine (e.g. Druid can handle everything except Join) then you should run
> on that engine. If you are interested mainly in logical transformation
> rules then the capabilities of the engine are less important and you could
> run on say the Csv adapter. For convenience & familiarity I use JDBC_SCOTT,
> and if I want a larger data set with a rich model I use JDBC_FOODMART.
>
> Lastly, if you are writing unit tests for correctness of the rules, and
> don’t want to execute queries, create a sub-class of RelOptRulesTest.
>
> Julian

Re: Query optimization by using Rules

Posted by Julian Hyde <jh...@apache.org>.

The form of your test data depends on the kind of rules you are writing. If you are aiming to push a lot of operations down to a particular engine (e.g. Druid can handle everything except Join) then you should run on that engine. If you are interested mainly in logical transformation rules then the capabilities of the engine are less important and you could run on, say, the CSV adapter. For convenience and familiarity I use JDBC_SCOTT, and if I want a larger data set with a rich model I use JDBC_FOODMART.

Lastly, if you are writing unit tests for correctness of the rules, and don’t want to execute queries, create a sub-class of RelOptRulesTest.

Julian
