Posted to issues@hive.apache.org by "Suprith Chandrashekharachar (Jira)" <ji...@apache.org> on 2021/03/31 00:12:00 UTC

[jira] [Commented] (HIVE-24915) Distribute by with sort by clause when used with constant parameter for sort produces wrong result.

    [ https://issues.apache.org/jira/browse/HIVE-24915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311933#comment-17311933 ] 

Suprith Chandrashekharachar commented on HIVE-24915:
----------------------------------------------------

[~kgyrtkirk] Could you please take a look at this one, or assign it to someone who is familiar with the part of the code base this change touches?

> Distribute by with sort by clause when used with constant parameter for sort produces wrong result.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-24915
>                 URL: https://issues.apache.org/jira/browse/HIVE-24915
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.3.4
>            Reporter: Suprith Chandrashekharachar
>            Assignee: Suprith Chandrashekharachar
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A query that combines DISTRIBUTE BY with SORT BY produces wrong results when one of the sort keys is a constant.
> Example:
> {code:sql}
>   SELECT
>     t.time,
>     'a' AS const
>   FROM
>     (SELECT 1591819264 AS time
>     UNION ALL
>     SELECT 1591819265 AS time) t
>   DISTRIBUTE BY const
>   SORT BY const, t.time
> {code}
> Produces:
> |*time*|*const*|
> |NULL|a|
> |NULL|a|
> Instead, it should produce the following (Hive 0.13 returns this result):
> |*time*|*const*|
> |1591819264|a|
> |1591819265|a|
> The wrong sort columns are used when the ReduceSink operator is created here: [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L9066]
> With the constant propagation optimizer enabled, incorrect constant folding of operators then leads to wrong results.
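>
> One possible way to check whether the constant propagation optimizer is involved (a diagnostic sketch, not part of the proposed fix) is to turn the optimizer off for the session and compare the plan and the results of the first query. {{hive.optimize.constant.propagation}} is the standard Hive setting for this optimizer; whether disabling it actually avoids the NULL values on 2.3.4 is an assumption to verify, not something this report confirms.
> {code:sql}
>   -- Assumption: disabling constant propagation changes the outcome; verify on your build.
>   SET hive.optimize.constant.propagation=false;
>
>   -- Inspect the key expressions of the Reduce Output Operator in the plan.
>   EXPLAIN
>   SELECT
>     t.time,
>     'a' AS const
>   FROM
>     (SELECT 1591819264 AS time
>     UNION ALL
>     SELECT 1591819265 AS time) t
>   DISTRIBUTE BY const
>   SORT BY const, t.time;
> {code}
> Running the same SELECT without EXPLAIN, once with the setting at its default and once with it disabled, shows whether the time column still comes back as NULL.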
>  
> Another example of the incorrect behavior:
> {code:sql}
>   SELECT
>     t.time,
>     'a' AS const,
>     t.id
>   FROM
>     (SELECT 1591819264 AS time, 1 AS id
>     UNION ALL
>     SELECT 1591819265 AS time, 2 AS id) t
>   DISTRIBUTE BY t.time
>   SORT BY t.time, const, t.id
> {code}
> Produces:
> |*time*|*const*|*id*|
> |1591819264|a|NULL|
> |1591819265|a|NULL|
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)