Posted to issues@hive.apache.org by "Zoltan Haindrich (Jira)" <ji...@apache.org> on 2021/05/21 14:54:00 UTC
[jira] [Commented] (HIVE-23809) Data loss occurs when using tez engine to join different bucketing_version tables
[ https://issues.apache.org/jira/browse/HIVE-23809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349307#comment-17349307 ]
Zoltan Haindrich commented on HIVE-23809:
-----------------------------------------
[~zhangqidong] I think by "Hive 4.0.0 gave a solution, but the patch has changed a lot" you were referring to HIVE-21304.
Please note that trying to fix that issue with the above patch will probably fix some cases, but most likely not all.
I think introducing a knob to disable the hashcode algorithm change could provide an easier way out.
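To illustrate why mixing hash algorithms can drop rows in a shuffle join, here is a minimal toy sketch in Python. It is not Hive or Tez code. It assumes the two sides of the join are routed to reducers with different hash functions, which is my reading of the "hashcode algo change" mentioned above. hash_v1 mirrors Java's Integer.hashCode() (the int value itself); hash_v2 is an arbitrary, hypothetical stand-in for "some other hash" and is not the algorithm Hive actually uses. All function names are made up for this example.

```python
# Toy model of a 2-reducer shuffle join. Illustration only, not Hive code.

def hash_v1(key: int) -> int:
    # Java's Integer.hashCode() returns the int value itself.
    return key

def hash_v2(key: int) -> int:
    # Hypothetical stand-in for a different hash algorithm (NOT Hive's).
    return key ^ (key >> 1)

def shuffle(rows, hash_fn, num_reducers):
    # Route each (key, value) row to a reducer by hash(key) % num_reducers.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in rows:
        partitions[hash_fn(key) % num_reducers].append((key, value))
    return partitions

def shuffle_join(left, right, left_hash, right_hash, num_reducers=2):
    # Each reducer can only join the rows that were routed to it.
    left_parts = shuffle(left, left_hash, num_reducers)
    right_parts = shuffle(right, right_hash, num_reducers)
    joined = []
    for left_part, right_part in zip(left_parts, right_parts):
        right_by_key = {}
        for key, value in right_part:
            right_by_key.setdefault(key, []).append(value)
        for key, _ in left_part:
            for value in right_by_key.get(key, []):
                joined.append((key, value))
    return sorted(joined)

# Same (a, b) values as the test case in the issue description.
rows = [(11, 'a'), (22, 'b'), (33, 'c'), (44, 'd'), (5, 'e'), (6, 'f'), (7, 'g')]

consistent = shuffle_join(rows, rows, hash_v1, hash_v1)  # same hash on both sides
mixed = shuffle_join(rows, rows, hash_v1, hash_v2)       # different hash per side

print(len(consistent))  # 7
print(mixed)            # [(5, 'e'), (33, 'c'), (44, 'd')]
```

With the same hash on both sides every key meets its match on one reducer. With different hashes, a key survives only when both functions happen to pick the same reducer, so the result is a subset of the correct one. The rows lost here differ from the ones lost in the report because hash_v2 is only a stand-in.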
> Data loss occurs when using tez engine to join different bucketing_version tables
> ---------------------------------------------------------------------------------
>
> Key: HIVE-23809
> URL: https://issues.apache.org/jira/browse/HIVE-23809
> Project: Hive
> Issue Type: Bug
> Components: Hive, Tez
> Affects Versions: 3.1.0
> Reporter: ZhangQiDong
> Assignee: ZhangQiDong
> Priority: Major
> Labels: hive, tez
> Attachments: HIVE-23809.1.patch
>
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> *Test case:*
> create table table_a (a int, b string,c string);
> create table table_b (a int, b string,c string);
> insert into table_a values (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg');
> insert into table_b values (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg');
> alter table table_a set tblproperties ("bucketing_version"='1');
> alter table table_b set tblproperties ("bucketing_version"='2');
> *Hivesql:*
> *set hive.auto.convert.join=false;*
> *set mapred.reduce.tasks=2;*
> select ta.a as a_a, tb.b as b_b from table_a ta join table_b tb on(ta.a=tb.a);
> set hive.execution.engine=mr;
> +------+------+
> | a_a  | b_b  |
> +------+------+
> | 5    | e    |
> | 6    | f    |
> | 7    | g    |
> | 11   | a    |
> | 22   | b    |
> | 33   | c    |
> | 44   | d    |
> +------+------+
> set hive.execution.engine=tez;
> +------+------+
> | a_a  | b_b  |
> +------+------+
> | 6    | f    |
> | 5    | e    |
> | 11   | a    |
> | 33   | c    |
> +------+------+
--
This message was sent by Atlassian Jira
(v8.3.4#803005)