You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2022/10/19 15:05:00 UTC

[jira] [Created] (HIVE-26653) Wrong results when (map) joining multiple tables on partition column

Stamatis Zampetakis created HIVE-26653:
------------------------------------------

             Summary: Wrong results when (map) joining multiple tables on partition column
                 Key: HIVE-26653
                 URL: https://issues.apache.org/jira/browse/HIVE-26653
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


The result of the query must have exactly one row matching the date specified in the WHERE clause but the query returns nothing.
{code:sql}
CREATE TABLE table_a (`aid` string ) PARTITIONED BY (`p_dt` string)
row format delimited fields terminated by ',' stored as textfile;

LOAD DATA LOCAL INPATH '../../data/files/_tbla.csv' into TABLE table_a;

CREATE TABLE table_b (`bid` string) PARTITIONED BY (`p_dt` string)
row format delimited fields terminated by ',' stored as textfile;

LOAD DATA LOCAL INPATH '../../data/files/_tblb.csv' into TABLE table_b;

set hive.auto.convert.join=true;
set hive.optimize.semijoin.conversion=false;

SELECT a.p_dt
FROM ((SELECT p_dt
       FROM table_b
       GROUP BY p_dt) a
         JOIN
     (SELECT p_dt
      FROM table_a
      GROUP BY p_dt) b ON a.p_dt = b.p_dt
         JOIN
     (SELECT p_dt
      FROM table_a
      GROUP BY p_dt) c ON a.p_dt = c.p_dt)
WHERE a.p_dt =  translate(cast(to_date(date_sub('2022-08-01', 1)) AS string), '-', '');
{code}
+Expected result+
20220731

+Actual result+
Empty

To reproduce the problem the tables need to have some data. Values in aid and bid columns are not important. For p_dt column use one of the following values 20220731, 20220630.

I will attach some sample data with which the problem can be reproduced. The tables look like below.
||aid|pdt||
|611|20220731|
|239|20220630|
|...|...|

The problem can be reproduced via qtest in current master 
(commit [6b05d64ce8c7161415d97a7896ea50025322e30a|https://github.com/apache/hive/commit/6b05d64ce8c7161415d97a7896ea50025322e30a]) by running the TestMiniLlapLocalCliDriver.

There is specific query plan (will attach shortly) for which the problem shows up so if the plan changes slightly the problem may not appear anymore; this is why we need to set explicitly hive.optimize.semijoin.conversion and hive.auto.convert.join to trigger the problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)