You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2021/10/11 11:10:00 UTC

[jira] [Commented] (HIVE-25589) SQL: Implement HAVING/QUALIFY predicates for ROW_NUMBER()=1

    [ https://issues.apache.org/jira/browse/HIVE-25589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427066#comment-17427066 ] 

Stamatis Zampetakis commented on HIVE-25589:
--------------------------------------------

The semantics of {{HAVING}} are specified in SQL standard so changing them to support this use-case may create ambiguity or other problems.
I like the capabilities of {{QUALIFY}} but looking into the [documentation|https://docs.snowflake.com/en/sql-reference/constructs/qualify.html] it does more than what we really need. 

It seems that the real requirement is to easily exclude few named columns from the result set. In that case it may be preferable to introduce a more conservative clause that does exactly this.

For instance Big Query uses SELECT * EXCEPT [syntax|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except]. 

{code:sql}
INSERT INTO main_table
SELECT * EXCEPT rnum FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY event_id) as rnum FROM duplicated_table)
WHERE rnum=1;
{code}

> SQL: Implement HAVING/QUALIFY predicates for ROW_NUMBER()=1
> -----------------------------------------------------------
>
>                 Key: HIVE-25589
>                 URL: https://issues.apache.org/jira/browse/HIVE-25589
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO, SQL
>    Affects Versions: 4.0.0
>            Reporter: Gopal Vijayaraghavan
>            Priority: Major
>
> The insert queries which use a row_num()=1 function are inconvenient to write or port from an existing workload, because there is no easy way to ignore a column in this pattern.
> {code}
> INSERT INTO main_table 
> SELECT * from duplicated_table
> QUALIFY ROW_NUMER() OVER (PARTITION BY event_id) = 1;
> {code}
> needs to be rewritten into
> {code}
> INSERT INTO main_table
> select event_id, event_ts, event_attribute, event_metric1, event_metric2, event_metric3, event_metric4, .., event_metric43 from 
> (select *, ROW_NUMBER() OVER (PARTITION BY event_id) as rnum from duplicated_table)
> where rnum=1;
> {code}
> This is a time-consuming and error-prone rewrite (dealing with a messed up order of columns between one source and dest table).
> An alternate rewrite would be to do the same or similar syntax using HAVING. 
> {code}
> INSERT INTO main_table 
> SELECT * from duplicated_table
> HAVING ROW_NUMER() OVER (PARTITION BY event_id) = 1;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)