Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/09/09 07:07:00 UTC

[jira] [Work logged] (HIVE-26524) Use Calcite to remove sections of a query plan known never produces rows

     [ https://issues.apache.org/jira/browse/HIVE-26524?focusedWorklogId=807279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-807279 ]

ASF GitHub Bot logged work on HIVE-26524:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Sep/22 07:06
            Start Date: 09/Sep/22 07:06
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request, #3588:
URL: https://github.com/apache/hive/pull/3588

   
   ### What changes were proposed in this pull request?
   * Currently Hive represents the empty result operator with `HiveSortLimit(fetch=0)`. Change this to `HiveValues(tuples=[[]])`, as Calcite does.
   * Improve and extend the `PruneEmptyRules` provided by Calcite with Hive-specific functionality.
   * Represent the empty `HiveValues` operator with the AST of the query
   ```
   select null as colName0... null as colNamen limit 0
   ```
   when converting the CBO plan back to AST.
   * Get the schema information from the `HiveValues` row type during the CBO -> AST conversion.
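
To illustrate the shape of the placeholder query described above, here is a minimal sketch that renders it as text from a list of column names. It is an illustration only: the class and method names are hypothetical, and the actual conversion builds AST nodes from the `HiveValues` row type rather than concatenating strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: names are hypothetical and this is not Hive code. */
final class EmptyValuesQuerySketch {

    /** Renders one "null as <name>" projection per column, followed by "limit 0". */
    static String emptyResultQuery(List<String> columnNames) {
        if (columnNames.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        String projections = columnNames.stream()
                .map(name -> "null as " + name)
                .collect(Collectors.joining(", "));
        return "select " + projections + " limit 0";
    }

    public static void main(String[] args) {
        // Column names would come from the row type of the empty operator.
        System.out.println(emptyResultQuery(Arrays.asList("a1", "b1")));
    }
}
```

The `limit 0` at the end keeps the query recognizable to the existing check on the top-level limit value.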
   
   ### Why are the changes needed?
   * Calcite has built-in rules to remove sections of a query plan that are known never to produce any rows. This makes the CBO plan much simpler.
   * In some cases (e.g. `select * from table1 where 1=0`) the whole plan can be removed, and Hive already has an optimization that skips executing queries which cannot return any result. This optimization relies on checking the limit value of the top-level query.
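
The idea of pruning can be sketched with a toy plan model. This is not the Calcite or Hive implementation: all class names below are hypothetical, and the `alwaysFalse` flag stands in for whatever analysis decides that a predicate such as `1=0` can never hold.

```java
/** Toy plan model; class and method names are hypothetical, not Calcite or Hive APIs. */
final class PruneEmptySketch {
    abstract static class Rel {}

    static final class Scan extends Rel {
        final String table;
        Scan(String table) { this.table = table; }
    }

    static final class Filter extends Rel {
        final Rel input;
        final boolean alwaysFalse; // assumption: set when the predicate is known to be false
        Filter(Rel input, boolean alwaysFalse) {
            this.input = input;
            this.alwaysFalse = alwaysFalse;
        }
    }

    static final class Project extends Rel {
        final Rel input;
        Project(Rel input) { this.input = input; }
    }

    /** Stands in for a Values operator with no tuples. */
    static final class EmptyValues extends Rel {}

    /** Bottom-up pruning: an operator over an empty input is itself empty. */
    static Rel prune(Rel rel) {
        if (rel instanceof Filter) {
            Filter f = (Filter) rel;
            if (f.alwaysFalse) return new EmptyValues();
            Rel in = prune(f.input);
            return in instanceof EmptyValues ? in : new Filter(in, false);
        }
        if (rel instanceof Project) {
            Rel in = prune(((Project) rel).input);
            return in instanceof EmptyValues ? in : new Project(in);
        }
        return rel;
    }

    static String describe(Rel rel) {
        if (rel instanceof Filter) return "Filter(" + describe(((Filter) rel).input) + ")";
        if (rel instanceof Project) return "Project(" + describe(((Project) rel).input) + ")";
        if (rel instanceof Scan) return "Scan(" + ((Scan) rel).table + ")";
        return "EmptyValues";
    }

    public static void main(String[] args) {
        Rel inner = new Project(new Filter(new Scan("t1"), false)); // a satisfiable filter
        Rel plan = new Project(new Filter(inner, true));            // outer predicate like 1=0
        System.out.println(describe(prune(plan)));
        System.out.println(describe(prune(inner)));
    }
}
```

When the root of the pruned plan is the empty operator, the whole query is known to return nothing, which is the case the existing top-level optimization is meant to catch.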
   
   ### Does this PR introduce _any_ user-facing change?
   No, except that `explain` output changes.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -Dtest=TestMiniLlapLocalCliDriver -Dqfile=empty_result_outerjoin.q,empty_result.q,empty_result_union.q -pl itests/qtest -Pitests
   ```
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 807279)
    Remaining Estimate: 0h
            Time Spent: 10m

> Use Calcite to remove sections of a query plan known never produces rows
> ------------------------------------------------------------------------
>
>                 Key: HIVE-26524
>                 URL: https://issues.apache.org/jira/browse/HIVE-26524
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Calcite has a set of rules to remove sections of a query plan that are known never to produce any rows. In some cases the whole plan can be removed. Such plans are represented by a single {{Values}} operator with no tuples, e.g.:
> {code:java}
> select y + 1 from (select a1 y, b1 z from t1 where b1 > 10) q WHERE 1=0
> {code}
> {code:java}
> HiveValues(tuples=[[]])
> {code}
> In other cases, when the plan has outer joins or set operators, some branches can be replaced with empty values. Going further, in some cases the join or set operator itself can be removed:
> {code:java}
> select a2, b2 from t2 where 1=0
> union
> select a1, b1 from t1
> {code}
> {code:java}
> HiveAggregate(group=[{0, 1}])
>   HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}
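
The set-operator and outer-join cases from the issue description can be sketched in the same toy style. This is a simplified illustration under stated assumptions, not the rules shipped in Calcite or Hive: operators are plain strings, `Empty` stands for a `Values` operator with no tuples, `Distinct` stands for the aggregate that remains after a distinct union loses all but one branch, and `NullPadRight` is a hypothetical name for a projection that fills the right-side columns of a left outer join with nulls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model; operator names are hypothetical labels, not Calcite or Hive classes. */
final class PruneSetOpSketch {

    static final class Node {
        final String op;
        final List<Node> inputs;

        Node(String op, Node... inputs) {
            this.op = op;
            this.inputs = Arrays.asList(inputs);
        }

        boolean isEmpty() { return op.equals("Empty"); }

        @Override public String toString() {
            return inputs.isEmpty() ? op : op + inputs;
        }
    }

    static Node prune(Node node) {
        // Prune the inputs first so emptiness propagates bottom-up.
        List<Node> in = new ArrayList<>();
        for (Node child : node.inputs) in.add(prune(child));

        if (node.op.equals("UnionDistinct")) {
            List<Node> kept = new ArrayList<>();
            for (Node child : in) if (!child.isEmpty()) kept.add(child);
            if (kept.isEmpty()) return new Node("Empty");
            // One branch left: the union disappears, only duplicate removal remains.
            if (kept.size() == 1) return new Node("Distinct", kept.get(0));
            return new Node("UnionDistinct", kept.toArray(new Node[0]));
        }
        if (node.op.equals("LeftJoin")) {
            if (in.get(0).isEmpty()) return new Node("Empty");
            // Empty right side: every left row survives, padded with nulls.
            if (in.get(1).isEmpty()) return new Node("NullPadRight", in.get(0));
            return new Node("LeftJoin", in.get(0), in.get(1));
        }
        return node.inputs.isEmpty() ? node : new Node(node.op, in.toArray(new Node[0]));
    }

    public static void main(String[] args) {
        Node union = new Node("UnionDistinct", new Node("Empty"), new Node("Scan:t1"));
        Node join = new Node("LeftJoin", new Node("Scan:t1"), new Node("Empty"));
        System.out.println(prune(union));
        System.out.println(prune(join));
    }
}
```

The union case mirrors the plan shown above, where only an aggregate over the scan of t1 remains.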



--
This message was sent by Atlassian Jira
(v8.20.10#820010)