Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2018/01/12 04:11:00 UTC

[jira] [Commented] (CALCITE-468) Introduce semi join reduction optimization in Calcite

    [ https://issues.apache.org/jira/browse/CALCITE-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323516#comment-16323516 ] 

Julian Hyde commented on CALCITE-468:
-------------------------------------

See also CALCITE-1317, which is a form of WinMagic.

> Introduce semi join reduction optimization in Calcite 
> ------------------------------------------------------
>
>                 Key: CALCITE-468
>                 URL: https://issues.apache.org/jira/browse/CALCITE-468
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Mostafa Mokhtar
>            Assignee: Julian Hyde
>              Labels: hive
>         Attachments: BaselineTree.png, SemiJoinReductionGains.png, SemiJoinReductionTreee.png
>
>
> The basic idea is to apply join predicates early in a plan in order to reduce the size of intermediate query results and, thus, the cost of other operations. In other words, the same join predicates are applied two or more times in a query plan
> in order to reduce the communication costs of a distributed system. Obviously, semi-join reducers are only effective if the (redundant) semi-joins are cheap and result in a significant reduction of the size of intermediate
> query results.
> I propose to extend the query optimizer so that semi-join reduction,
> join ordering, etc. are integrated into a single query optimization step.
> Several TPC-DS queries, such as 24, 64 and 80, run very slowly due to the lack of a semi-join reduction optimization in Calcite.
> Rewriting Q64 by hand to simulate semi-join reduction produced a roughly 4x gain in total time.
> {code}
> Query                  Total time       CPU    Intermediate rows (millions)
> Baseline                    1,377   356,900                          23,940
> Semi Join Reduction           343    47,253                              23
> {code}
> Q64 subset:
> {code}
> SELECT
>     count(*)
> FROM
>     store_sales
>         JOIN
>     item ON store_sales.ss_item_sk = item.i_item_sk
>         JOIN
>     store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk
>         JOIN
>     (SELECT
>         cs_item_sk
>     FROM
>         catalog_sales
>             JOIN
>         catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
>     GROUP BY cs_item_sk
>     HAVING
>         sum(cs_ext_list_price) > 2 * sum(cr_refunded_cash + cr_reversed_charge + cr_store_credit)) cs_ui
>         ON store_sales.ss_item_sk = cs_ui.cs_item_sk
> WHERE
>     i_color IN ('maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate')
>     AND i_current_price BETWEEN 35 AND 35 + 10
>     AND i_current_price BETWEEN 35 + 1 AND 35 + 15
> {code}
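The effect described in the issue can be illustrated on a toy dataset. The following is only a minimal sketch, not the rewrite used for the Q64 measurements and not Calcite's implementation: it assumes three tiny tables that borrow the TPC-DS column names, uses Python's standard sqlite3 module, and the helper names (build_db, baseline_counts, reduced_counts) and the 'maroon'/'white' data are made up for illustration. The selective predicate on item is applied to store_sales and store_returns as a semi-join (IN subquery) before the large join, and then applied again in the final join.

```python
import sqlite3


def build_db():
    """Create a toy in-memory database (hypothetical data, TPC-DS-like names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (i_item_sk INTEGER PRIMARY KEY, i_color TEXT);
        CREATE TABLE store_sales (ss_item_sk INTEGER);
        CREATE TABLE store_returns (sr_item_sk INTEGER);
    """)
    # 100 items; only items 0..4 pass the selective color predicate.
    conn.executemany(
        "INSERT INTO item VALUES (?, ?)",
        [(i, "maroon" if i < 5 else "white") for i in range(100)],
    )
    # 20 sales rows and 5 return rows per item.
    conn.executemany(
        "INSERT INTO store_sales VALUES (?)", [(i % 100,) for i in range(2000)]
    )
    conn.executemany(
        "INSERT INTO store_returns VALUES (?)", [(i % 100,) for i in range(500)]
    )
    conn.commit()
    return conn


def scalar(conn, sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


def baseline_counts(conn):
    """Return (intermediate rows, final count) when the fact tables are joined unfiltered."""
    intermediate = scalar(conn, """
        SELECT count(*)
        FROM store_sales
        JOIN store_returns ON ss_item_sk = sr_item_sk
    """)
    result = scalar(conn, """
        SELECT count(*)
        FROM store_sales
        JOIN store_returns ON ss_item_sk = sr_item_sk
        JOIN item ON ss_item_sk = i_item_sk
        WHERE i_color = 'maroon'
    """)
    return intermediate, result


# Semi-join reducers: each fact table is filtered by the qualifying item keys
# before it takes part in the large join.
REDUCERS = """
    WITH hot AS (SELECT i_item_sk FROM item WHERE i_color = 'maroon'),
         ss AS (SELECT ss_item_sk FROM store_sales
                WHERE ss_item_sk IN (SELECT i_item_sk FROM hot)),
         sr AS (SELECT sr_item_sk FROM store_returns
                WHERE sr_item_sk IN (SELECT i_item_sk FROM hot))
"""


def reduced_counts(conn):
    """Return (intermediate rows, final count) with the semi-join reducers applied."""
    intermediate = scalar(conn, REDUCERS + """
        SELECT count(*)
        FROM ss JOIN sr ON ss.ss_item_sk = sr.sr_item_sk
    """)
    # The join predicate on item is applied a second time in the final join.
    result = scalar(conn, REDUCERS + """
        SELECT count(*)
        FROM ss
        JOIN sr ON ss.ss_item_sk = sr.sr_item_sk
        JOIN item ON ss.ss_item_sk = item.i_item_sk
        WHERE i_color = 'maroon'
    """)
    return intermediate, result


if __name__ == "__main__":
    conn = build_db()
    print("baseline (intermediate, result):", baseline_counts(conn))
    print("reduced  (intermediate, result):", reduced_counts(conn))
```

On this toy data both variants return the same final count, while the join of the two fact tables shrinks from 10,000 rows to 500 once the reducers are applied. The counts here are computed by separate queries purely to make the intermediate size visible; a real optimizer would have to weigh the cost of the extra semi-joins against this reduction.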


