Posted to issues@hive.apache.org by "Aditya Shah (Jira)" <ji...@apache.org> on 2019/12/12 11:00:00 UTC

[jira] [Comment Edited] (HIVE-22636) Data loss on skewjoin for ACID tables.

    [ https://issues.apache.org/jira/browse/HIVE-22636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994530#comment-16994530 ] 

Aditya Shah edited comment on HIVE-22636 at 12/12/19 10:59 AM:
---------------------------------------------------------------

[~kgyrtkirk] [~sershe] Can you please take a look? I can add a check similar to the one from HIVE-16051 in SkewJoinResolver for full ACID tables too, but if there is a better way, we can do that instead.

Thanks,
 Aditya


was (Author: aditya-shah):
[~kgyrtkirk] [~sershe] can you please take a look. I can add a similar check similar to HIVE-16051 in SkewJoinResolver for full acid too. But if there is a better way, we can do that?

Thanks,
Aditya
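
For illustration only, the kind of check proposed in the comment above could look roughly like the sketch below. It is not Hive's SkewJoinResolver code: the class SkewJoinAcidGuard and its methods are hypothetical names, and the sketch assumes the destination table's properties are available as a plain map. It uses the "transactional" table property from the reproduction, and assumes insert-only (MM) tables are marked by transactional_properties=insert_only.

```java
import java.util.Map;

// Hypothetical helper for illustration; NOT the actual SkewJoinResolver logic.
final class SkewJoinAcidGuard {

    // True when the table is declared with "transactional"="true".
    static boolean isTransactional(Map<String, String> tblProps) {
        return "true".equalsIgnoreCase(tblProps.get("transactional"));
    }

    // Insert-only (MM) tables: transactional with transactional_properties=insert_only.
    static boolean isInsertOnly(Map<String, String> tblProps) {
        return isTransactional(tblProps)
            && "insert_only".equalsIgnoreCase(tblProps.get("transactional_properties"));
    }

    // Full ACID tables: transactional but not insert-only.
    static boolean isFullAcid(Map<String, String> tblProps) {
        return isTransactional(tblProps) && !isInsertOnly(tblProps);
    }

    // Skip the skew join whenever the destination is an MM or a full ACID table.
    static boolean allowSkewJoin(boolean skewJoinEnabled, Map<String, String> destTblProps) {
        return skewJoinEnabled && !isInsertOnly(destTblProps) && !isFullAcid(destTblProps);
    }

    public static void main(String[] args) {
        Map<String, String> fullAcid = Map.of("transactional", "true");
        Map<String, String> plain = Map.of();
        System.out.println(allowSkewJoin(true, fullAcid)); // false: skew join skipped
        System.out.println(allowSkewJoin(true, plain));    // true: skew join allowed
    }
}
```

With a guard of this shape, the skewjoin_acid table from the reproduction would fall back to a regular join, at the cost of losing the skew optimization for that query.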

> Data loss on skewjoin for ACID tables.
> --------------------------------------
>
>                 Key: HIVE-22636
>                 URL: https://issues.apache.org/jira/browse/HIVE-22636
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Aditya Shah
>            Priority: Blocker
>
> I am running a skewjoin and writing the result into a full ACID table, and the results are incorrect. The issue is similar to the one seen for MM tables in HIVE-16051, where the fix was to skip the skewjoin for MM tables.
> Steps to reproduce:
> I used a qtest similar to the one in HIVE-16051:
> {code:java}
> --! qt:dataset:src1
> --! qt:dataset:src
> -- MASK_LINEAGE
> set hive.mapred.mode=nonstrict;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.optimize.skewjoin=true;
> set hive.skewjoin.key=2;
> set hive.optimize.metadataonly=false;
> CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties ("transactional"="true");
> FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE skewjoin_acid SELECT src1.key, src2.value;
> select count(distinct key) from skewjoin_acid;
> drop table skewjoin_acid;
> {code}
> The expected count was 309, but the query returned 173.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)