You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Igor (Jira)" <ji...@apache.org> on 2019/09/02 09:10:00 UTC

[jira] [Commented] (HIVE-21930) WINDOW COUNT DISTINCT return wrong value with PARTITION BY

    [ https://issues.apache.org/jira/browse/HIVE-21930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920717#comment-16920717 ] 

Igor commented on HIVE-21930:
-----------------------------

Any updates on this?

> WINDOW COUNT DISTINCT return wrong value with PARTITION BY
> ----------------------------------------------------------
>
>                 Key: HIVE-21930
>                 URL: https://issues.apache.org/jira/browse/HIVE-21930
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>    Affects Versions: 3.1.0
>         Environment: Beeline version 3.1.0.3.0.1.0-187 by Apache Hive
>            Reporter: Igor
>            Priority: Major
>              Labels: distinct, window_funcion
>
> count(distinct a) over (partiton by b) return wring result. For example (T is CTE here):
> {code:java}
> select p, day, ts
> , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
> , count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lines
> , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as days
> FROM T{code}
>  WINDOW specification doesn't affect on results: same wrong with and without window.
> count(1) and count(distinct day) return the same result. Count distinct is wrong.
>  
> I've add size(collect_set(day) OVER (PARTITION BY phone)) as days2 and count(distinct return correct result.
> Following query return non-empty result:
> {code:java}
> select A.*, B.days, B. from (
> select p, day, ts
> , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
> , count(1) OVER (PARTITION BY p ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lines
> , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as days
> , size(collect_set(day) OVER (PARTITION BY phone)) as days2
> , dense_rank() over (partition by phone order by day) + dense_rank() over (partition by phone order by day desc) - 1 as days3
> FROM T ) as A 
> join (
> select p, day, ts
> , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
> , count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lines
> , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as days
> FROM T
> ) as B on A.p=B.p and A.line_number=B.line_number
> where A.days!=B.days
> order by A.p, A.line_number
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)