You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/03/05 14:17:00 UTC
[jira] [Work logged] (HIVE-15757) Allow EXISTS/NOT EXISTS correlated subquery with aggregates

     [ https://issues.apache.org/jira/browse/HIVE-15757?focusedWorklogId=561420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-561420 ]

ASF GitHub Bot logged work on HIVE-15757:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Mar/21 14:16
            Start Date: 05/Mar/21 14:16
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request #2039:
URL: https://github.com/apache/hive/pull/2039


   ### What changes were proposed in this pull request?
   1. When transforming subquery AST to calcite plan check whether the subquery is a full aggregation: has aggregate functions without group by clause
   2. When building subquery expression return `true` literal if subquery is a full aggregation having `EXISTS` operator instead of `RexSubQuery`
   
   ### Why are the changes needed?
   To support queries like:
   ```
   SELECT t1.bigint_col
   FROM alltypestiny t1
   WHERE EXISTS
     (SELECT SUM(smallint_col) AS int_col
      FROM alltypestiny
      WHERE t1.date_string_col = string_col AND t1.timestamp_col = timestamp_col)
   GROUP BY t1.bigint_col
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Current Hive implementation throws exception when compiling queries like the example above. With this patch query compiles and runs.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=subquery_full_aggregate.q -pl itests/qtest -Pitests
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 561420)
    Remaining Estimate: 0h
            Time Spent: 10m

> Allow EXISTS/NOT EXISTS correlated subquery with aggregates
> -----------------------------------------------------------
>
>                 Key: HIVE-15757
>                 URL: https://issues.apache.org/jira/browse/HIVE-15757
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Planning
>            Reporter: Vineet Garg
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: sub-query
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently HIVE doesn't allow correlated EXISTS/NOT EXISTS subqueries which has aggregate without group by e.g.
> {code} select *
> from src b 
> where exists 
>   (select count(*) 
>   from src a 
>   where b.value = a.value  and a.key = b.key and a.value > 'val_9'
>   ) ;
> {code}
> Such queries could be rewritten to replace EXISTS/NOT EXISTS predicate with {{true}} or {{false}} based on aggregate.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)