Posted to issues@hive.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2020/10/15 09:25:00 UTC
[jira] [Assigned] (HIVE-24245) Vectorized PTF with count and distinct over partition producing incorrect results.
[ https://issues.apache.org/jira/browse/HIVE-24245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor reassigned HIVE-24245:
-----------------------------------
Assignee: László Bodor
> Vectorized PTF with count and distinct over partition producing incorrect results.
> ----------------------------------------------------------------------------------
>
> Key: HIVE-24245
> URL: https://issues.apache.org/jira/browse/HIVE-24245
> Project: Hive
> Issue Type: Bug
> Components: Hive, PTF-Windowing, Vectorization
> Affects Versions: 3.1.0, 3.1.2
> Reporter: Chiran Ravani
> Assignee: László Bodor
> Priority: Critical
>
> Vectorized PTF is broken for count(distinct ...) over (partition by ...): it produces incorrect results.
> The test case is below.
> {code}
> CREATE TABLE bigd781b_new (
> id int,
> txt1 string,
> txt2 string,
> cda_date int,
> cda_job_name varchar(12));
> INSERT INTO bigd781b_new VALUES
> (1,'2010005759','7164335675012038',20200528,'load1'),
> (2,'2010005759','7164335675012038',20200528,'load2');
> {code}
> Running the query below produces incorrect results:
> {code}
> SELECT
> txt1,
> txt2,
> count(distinct txt1) over(partition by txt1) as n,
> count(distinct txt2) over(partition by txt2) as m
> FROM bigd781b_new
> {code}
> The actual output is:
> {code}
> +-------------+-------------------+----+----+
> | txt1 | txt2 | n | m |
> +-------------+-------------------+----+----+
> | 2010005759 | 7164335675012038 | 2 | 2 |
> | 2010005759 | 7164335675012038 | 2 | 2 |
> +-------------+-------------------+----+----+
> {code}
> The correct output would be:
> {code}
> +-------------+-------------------+----+----+
> | txt1 | txt2 | n | m |
> +-------------+-------------------+----+----+
> | 2010005759 | 7164335675012038 | 1 | 1 |
> | 2010005759 | 7164335675012038 | 1 | 1 |
> +-------------+-------------------+----+----+
> {code}
> The problem does not appear after setting the property below:
> set hive.vectorized.execution.ptf.enabled=false;
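> As a minimal sketch of that workaround, assuming it is acceptable to turn off PTF vectorization for the current session only, the property can be set and the query from the test case re-run. The expected counts in the comment are the ones given as the correct output above, not a new measurement.
> {code}
> -- Workaround sketch: disable vectorized PTF for this session only.
> set hive.vectorized.execution.ptf.enabled=false;
> -- Re-run the failing query; per the report, n and m should then both be 1.
> SELECT
> txt1,
> txt2,
> count(distinct txt1) over(partition by txt1) as n,
> count(distinct txt2) over(partition by txt2) as m
> FROM bigd781b_new;
> {code}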
--
This message was sent by Atlassian Jira
(v8.3.4#803005)