Posted to dev@calcite.apache.org by 刘东明 <l....@foxmail.com> on 2016/10/25 04:00:02 UTC

Re: Calcite how to push down project/filter/aggregation to TableScan.

Alternatively, I could stop using the RelBuilder in TableScan (that design exists for historical reasons, but it can be changed now).


Instead, I could add a special node (an exchange?) to the original tree and push the RelNodes beneath it. Finally, the RelNode subtree under the special node would be converted into a query for the data source (RDBMS).


Any ideas?


------------------ Original Message ------------------
From: "刘东明";<l....@foxmail.com>;
Sent: Tuesday, October 25, 2016, 11:20 AM
To: "dev"<de...@calcite.apache.org>;

Subject: Calcite how to push down project/filter/aggregation to TableScan.



Hi all,

    I am using Apache Calcite to build a distributed OLAP system whose data source is an RDBMS. I want to push the project/filter/aggregation nodes in the RelNode tree down into MyTableScan, which extends TableScan. MyTableScan uses a RelBuilder to collect the pushed RelNodes, and finally the RelBuilder generates the query that is sent to the source database. At the same time, the project/filter/aggregation nodes in the original RelNode tree should be removed or modified.


    I am using RelOptRule and HepPlanner to implement this. I created a rule that matches certain operands; in onMatch(), I replace the matched subtree with a single TableScan.


   There are three cases:
   1) filter->tableScan: By calling transformTo from my RelOptRule, I can easily push the filter into the TableScan and remove the filter from the original tree.
   2) project->tableScan: I cannot use transformTo, because the areRowTypesEqual check fails.
   3) Pushing and changing aggregation nodes: in the original RelNode tree, COUNT changes to SUM and AVG changes to SUM/COUNT, while the aggregate function itself is pushed into the TableScan.


   For 2) and 3), any ideas on how to implement them?


Thanks.

Re: Calcite how to push down project/filter/aggregation to TableScan.

Posted by Julian Hyde <jh...@apache.org>.
Regarding 2. As you have noticed, each planner rule must preserve the row type. Rather than “pushing down projects”, think of moving them from one engine to another. Let me illustrate with an example. Suppose you have two “engines”: Foo (where the data is stored, and which can execute a limited set of operators) and Bar (which doesn’t store data, but has powerful distributed operators). Your initial plan is

   BarSort
      |
   BarProject
      |
   FooScan

and then you apply a rule, FooPushProjectRule, to get

   BarSort
      |
   FooProject
      |
   FooScan

The row types are still the same, but now more of the processing is happening in the “Foo” engine, and less data is flowing over the wire from Foo to Bar, and therefore the overall cost is lower.

You haven’t pushed the project into the table scan. You’ve pushed the project into the engine that does the table scan.

(A lot of Calcite adapters do this; see for example CassandraProjectRule and MongoProjectRule. The Druid adapter does it a little differently: DruidRules.DruidProjectRule pushes a Project into a DruidQuery, which is a table scan followed by a sequence of operators, and its row type is the type of the last operator.)
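
To make the idea concrete, here is a minimal sketch of the Foo/Bar rewrite in plain Java. It is a toy model, not Calcite code: Node, fooPushProject and rewrite are hypothetical names invented for illustration, and the column names are made up. It only shows that the rule changes which engine runs the Project while the row type stays the same.

```java
import java.util.List;
import java.util.Optional;

// Toy model only: none of these types are Calcite classes.
final class PushProjectSketch {

    /** A plan node: engine ("Foo" or "Bar"), operator name, output columns, optional input. */
    record Node(String engine, String op, List<String> rowType, Node input) {
        String describe() {
            String self = engine + op;
            return input == null ? self : self + " -> " + input.describe();
        }
    }

    /** Hypothetical FooPushProjectRule: BarProject over FooScan becomes FooProject over FooScan. */
    static Optional<Node> fooPushProject(Node n) {
        boolean matches = n.engine().equals("Bar") && n.op().equals("Project")
            && n.input() != null
            && n.input().engine().equals("Foo") && n.input().op().equals("Scan");
        if (!matches) {
            return Optional.empty();
        }
        // Same columns, same input; only the engine that executes the Project changes.
        Node replacement = new Node("Foo", "Project", n.rowType(), n.input());
        if (!replacement.rowType().equals(n.rowType())) {
            throw new IllegalStateException("a rule must preserve the row type");
        }
        return Optional.of(replacement);
    }

    /** Walks the plan from the root down and applies the rule wherever it matches. */
    static Node rewrite(Node n) {
        if (n == null) {
            return null;
        }
        Node current = fooPushProject(n).orElse(n);
        return new Node(current.engine(), current.op(), current.rowType(),
            rewrite(current.input()));
    }

    /** The initial plan from the example: BarSort over BarProject over FooScan. */
    static Node initialPlan() {
        Node scan = new Node("Foo", "Scan", List.of("id", "name", "amount"), null);
        Node project = new Node("Bar", "Project", List.of("id", "amount"), scan);
        return new Node("Bar", "Sort", List.of("id", "amount"), project);
    }

    public static void main(String[] args) {
        Node before = initialPlan();
        Node after = rewrite(before);
        System.out.println(before.describe());
        System.out.println(after.describe());
    }
}
```

Running main prints the plan before and after the rewrite. In a real adapter the same shape would be written as a planner rule, as in the adapter rules mentioned above.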

3. Again, take a look at how the Cassandra and Druid adapters deal with Aggregate and TableScan.

Also see AggregateReduceFunctionsRule, which is a purely logical rewrite that transforms AVG to SUM / COUNT, etc. I recommend that you do the logical rewrite before you start trying to push operators to your engine.
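
As a rough illustration of why the logical rewrite helps, the sketch below works through the arithmetic in plain Java. It is not Calcite code: Partial, pushedAggregate, combineAvg and combineCount are hypothetical names, and the two shards with their values are assumed data. Each source computes SUM and COUNT; the upper layer then adds up the partial sums and partial counts, so COUNT becomes a SUM of counts and AVG becomes SUM / COUNT.

```java
import java.util.List;

// Toy arithmetic only: none of these types are Calcite classes.
final class AvgRewriteSketch {

    /** What one source returns once SUM and COUNT have been pushed to it. */
    record Partial(double sum, long count) {}

    /** The part pushed to the source: SUM(x) and COUNT(x) over that source's rows. */
    static Partial pushedAggregate(List<Double> rows) {
        double sum = 0;
        long count = 0;
        for (double v : rows) {
            sum += v;
            count++;
        }
        return new Partial(sum, count);
    }

    /** Upper layer: the original COUNT becomes a SUM over the partial counts. */
    static long combineCount(List<Partial> partials) {
        long count = 0;
        for (Partial p : partials) {
            count += p.count();
        }
        return count;
    }

    /** Upper layer: AVG becomes SUM(partial sums) / SUM(partial counts). */
    static double combineAvg(List<Partial> partials) {
        double sum = 0;
        for (Partial p : partials) {
            sum += p.sum();
        }
        // With no rows this yields NaN; SQL would return NULL instead.
        return sum / combineCount(partials);
    }

    public static void main(String[] args) {
        // Assumed data: two shards of very different sizes.
        List<Partial> partials = List.of(
            pushedAggregate(List.of(1.0, 2.0, 3.0)),
            pushedAggregate(List.of(10.0)));
        System.out.println("count = " + combineCount(partials));
        System.out.println("avg   = " + combineAvg(partials));
    }
}
```

Note that averaging the per-shard averages (2.0 and 10.0) would give 6.0, whereas the true average of the four values is 4.0. That is why AVG itself is not pushed, only the SUM and COUNT it decomposes into.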

Julian

 