Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/08/27 10:07:00 UTC

[jira] [Work logged] (HIVE-24081) Enable pre-materializing CTEs referenced in scalar subqueries

     [ https://issues.apache.org/jira/browse/HIVE-24081?focusedWorklogId=475219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475219 ]

ASF GitHub Bot logged work on HIVE-24081:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Aug/20 10:06
            Start Date: 27/Aug/20 10:06
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request #1437:
URL: https://github.com/apache/hive/pull/1437


   ### What changes were proposed in this pull request?
   * Do phase 1 parsing of subquery expressions in order to count CTE references in those subqueries
   * Add a config to materialize CTEs with aggregate output only
   
   
   ### Why are the changes needed?
   Improve the performance of complex queries that reference the same fully aggregate CTE more than once.
   
   ### Does this PR introduce _any_ user-facing change?
   Adds a new config into HiveConf: `hive.optimize.cte.materialize.full.aggregate.only`.
   Prior to this patch, if `hive.optimize.cte.materialize.threshold` was higher than -1, all non-subquery CTEs were materialized if they were referenced more times than the threshold. This patch limits materialization to fully aggregate CTEs by default. The original behavior can be restored by setting `hive.optimize.cte.materialize.full.aggregate.only` to false.
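   
The rule described above can be modeled in a few lines. This is a minimal sketch of the behavior as stated in this description, not Hive's actual planner code: the `Cte` class, its fields, and the `should_materialize` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Cte:
    """Hypothetical stand-in for a CTE as seen by the planner."""
    name: str
    reference_count: int   # number of times the query references the CTE
    fully_aggregate: bool  # True if the CTE produces fully aggregated output


def should_materialize(cte: Cte, threshold: int,
                       full_aggregate_only: bool = True) -> bool:
    """Illustrative model of the rule in this PR description (an assumption,
    not the real implementation).

    threshold           ~ hive.optimize.cte.materialize.threshold
    full_aggregate_only ~ hive.optimize.cte.materialize.full.aggregate.only
    """
    # The feature only applies when the threshold is higher than -1.
    if threshold <= -1:
        return False
    # The CTE must be referenced more times than the threshold.
    if cte.reference_count <= threshold:
        return False
    # With the new config left at its default, only fully aggregate CTEs qualify.
    if full_aggregate_only and not cte.fully_aggregate:
        return False
    return True


totals = Cte("totals", reference_count=4, fully_aggregate=True)
filtered = Cte("filtered", reference_count=4, fully_aggregate=False)

print(should_materialize(totals, threshold=3))
print(should_materialize(filtered, threshold=3))
print(should_materialize(filtered, threshold=3, full_aggregate_only=False))
```

Under these assumptions, `filtered` is skipped with the default setting and only becomes eligible again when the new config is set to false, which mirrors the restored original behavior.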
   
   ### How was this patch tested?
   * New q tests were added.
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=cte_mat_6.q -pl itests/qtest -Pitests
   ```
   * Run query14 with `set hive.optimize.cte.materialize.threshold=3;`
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestTezPerfCliDriver -Dqfile=query14.q -pl itests/qtest -Pitests
   ```




Issue Time Tracking
-------------------

            Worklog Id:     (was: 475219)
    Remaining Estimate: 0h
            Time Spent: 10m

> Enable pre-materializing CTEs referenced in scalar subqueries
> -------------------------------------------------------------
>
>                 Key: HIVE-24081
>                 URL: https://issues.apache.org/jira/browse/HIVE-24081
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-11752 introduces materializing CTEs based on the config
> {code}
> hive.optimize.cte.materialize.threshold
> {code}
> The goal of this Jira is to
> * extend the implementation to support materializing CTEs referenced in scalar subqueries
> * add a config to materialize CTEs with aggregate output only


