You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2020/10/15 09:25:00 UTC

[jira] [Assigned] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.

     [ https://issues.apache.org/jira/browse/HIVE-24245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor reassigned HIVE-24245:
-----------------------------------

    Assignee: László Bodor

> Vectorized PTF with count and distinct over partition producing incorrect results.
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-24245
>                 URL: https://issues.apache.org/jira/browse/HIVE-24245
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, PTF-Windowing, Vectorization
>    Affects Versions: 3.1.0, 3.1.2
>            Reporter: Chiran Ravani
>            Assignee: László Bodor
>            Priority: Critical
>
> Vectorized PTF for count and distinct over partition is broken. It produces incorrect results.
> Below is the test case.
> {code}
> CREATE TABLE bigd781b_new (
>   id int,
>   txt1 string,
>   txt2 string,
>   cda_date int,
>   cda_job_name varchar(12));
> INSERT INTO bigd781b_new VALUES 
>   (1,'2010005759','7164335675012038',20200528,'load1'),
>   (2,'2010005759','7164335675012038',20200528,'load2');
> {code}
> Running below query produces incorrect results
> {code}
> SELECT
>     txt1,
>     txt2,
>     count(distinct txt1) over(partition by txt1) as n,
>     count(distinct txt2) over(partition by txt2) as m
> FROM bigd781b_new
> {code}
> as below.
> {code}
> +-------------+-------------------+----+----+
> |    txt1     |       txt2        | n  | m  |
> +-------------+-------------------+----+----+
> | 2010005759  | 7164335675012038  | 2  | 2  |
> | 2010005759  | 7164335675012038  | 2  | 2  |
> +-------------+-------------------+----+----+
> {code}
> While the correct output would be
> {code}
> +-------------+-------------------+----+----+
> |    txt1     |       txt2        | n  | m  |
> +-------------+-------------------+----+----+
> | 2010005759  | 7164335675012038  | 1  | 1  |
> | 2010005759  | 7164335675012038  | 1  | 1  |
> +-------------+-------------------+----+----+
> {code}
> The problem does not appear after setting below property
> set hive.vectorized.execution.ptf.enabled=false;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)