Posted to dev@hive.apache.org by "Simhadri Govindappa (Jira)" <ji...@apache.org> on 2022/10/27 10:54:00 UTC

[jira] [Created] (HIVE-26673) Incorrect row count when vectorisation is enabled

Simhadri Govindappa created HIVE-26673:
------------------------------------------

             Summary: Incorrect row count when vectorisation is enabled
                 Key: HIVE-26673
                 URL: https://issues.apache.org/jira/browse/HIVE-26673
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 4.0.0-alpha-2
            Reporter: Simhadri Govindappa


Repro:
{noformat}
SELECT count(*) FROM
  (SELECT T0.plant_no,
          T0.part_chain,
          T0.part_new,
          T0.part_no
   FROM dm_ads_dims_prod.cloudera_test3 T0
   LEFT JOIN
     (SELECT T0.plant_no,
             T0.part_chain
      FROM
        (SELECT T0.plant_no,
                T0.part_chain,
                count(*) AS ct
         FROM dm_ads_dims_prod.cloudera_test3 T0
         WHERE purchase_pos = pos
         GROUP BY T0.plant_no,
                  T0.part_chain) T0
      WHERE ct = 2) T1 ON T0.plant_no = T1.plant_no
   AND T0.part_chain = T1.part_chain
   WHERE T0.purchase_pos = T0.pos
     AND (T1.part_chain IS NULL
          OR (T1.part_chain IS NOT NULL
              AND T0.fd = 1))) s;
{noformat}
Run the query a few times on the repro cluster with the following settings:
{code:java}
set hive.query.results.cache.enabled=false;
set hive.compute.query.using.stats=false;
set hive.auto.convert.join=true;
{code}
and the results differed between runs:
{code:java}
2682424
2682426
2682425{code}
 

Then turn off {{hive.auto.convert.join}}:
{code:java}
set hive.query.results.cache.enabled=false;
set hive.compute.query.using.stats=false;
set hive.auto.convert.join=false;
{code}
and the result was always *2682420*

Comparing the plans with {{hive.auto.convert.join}} enabled vs disabled, the only difference is the join type: map join vs merge join.
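
The plan comparison can be reproduced roughly as sketched below. {{EXPLAIN VECTORIZATION}} is a standard Hive statement, but the {{DETAIL}} level is an assumption about what is useful here and was not necessarily used in the original analysis. Run it once as shown, then again with {{hive.auto.convert.join=false}}, and compare the join operators in the two outputs.
{code:sql}
-- Sketch only: print the plan, including vectorization details, for the repro query.
set hive.query.results.cache.enabled=false;
set hive.compute.query.using.stats=false;
set hive.auto.convert.join=true;  -- rerun with false to get the merge join plan

EXPLAIN VECTORIZATION DETAIL
SELECT count(*) FROM
  (SELECT T0.plant_no, T0.part_chain, T0.part_new, T0.part_no
   FROM dm_ads_dims_prod.cloudera_test3 T0
   LEFT JOIN
     (SELECT T0.plant_no, T0.part_chain
      FROM
        (SELECT T0.plant_no, T0.part_chain, count(*) AS ct
         FROM dm_ads_dims_prod.cloudera_test3 T0
         WHERE purchase_pos = pos
         GROUP BY T0.plant_no, T0.part_chain) T0
      WHERE ct = 2) T1 ON T0.plant_no = T1.plant_no
   AND T0.part_chain = T1.part_chain
   WHERE T0.purchase_pos = T0.pos
     AND (T1.part_chain IS NULL
          OR (T1.part_chain IS NOT NULL
              AND T0.fd = 1))) s;
{code}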

Vectorization also plays a role. With it turned off, the result was correct:
{code:java}
SET hive.vectorized.execution.enabled=false;
{code}
This is only a workaround and has a negative impact on performance, but it should help us narrow down the cause of the issue.
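
A possible next narrowing step, offered as an untested suggestion: keep vectorization and map join conversion on, but disable only the native vectorized map join operators. {{hive.vectorized.execution.mapjoin.native.enabled}} is an existing Hive setting. Whether it changes the result for this query is an assumption that has not been verified on the repro cluster.
{code:sql}
-- Sketch only: isolate the native vectorized map join path.
set hive.query.results.cache.enabled=false;
set hive.compute.query.using.stats=false;
set hive.auto.convert.join=true;
set hive.vectorized.execution.enabled=true;
-- Assumption: if the count becomes stable at 2682420 with this off,
-- the native vector map join operators are the likely place to look.
set hive.vectorized.execution.mapjoin.native.enabled=false;
{code}
If the count still varies between runs with this setting, the problem more likely lies elsewhere in the vectorized pipeline.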



--
This message was sent by Atlassian Jira
(v8.20.10#820010)