Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2017/02/28 23:00:46 UTC

[jira] [Commented] (HIVE-16051) MM tables: skewjoin test fails

    [ https://issues.apache.org/jira/browse/HIVE-16051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889054#comment-15889054 ] 

Sergey Shelukhin commented on HIVE-16051:
-----------------------------------------

Looks like one of the files gets written twice because there are multiple stages; the original (non-MM) path creates the second copy as a new file with a _1 prefix, but the MM path just overwrites the first one.
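
To illustrate the suspected difference, here is a minimal sketch in Python. It is not Hive code: the function names, the file name 000000_0, and the way the _1 marker is attached to the name are all assumptions made for illustration. It only shows how two stages writing the same file name can lose rows when the second write overwrites the first instead of landing in a new file.

```python
import os
import tempfile


def commit_keep_both(dest_dir, name, rows):
    """Hypothetical non-MM-style commit: never clobber an existing file.

    If the target name is taken, a numeric marker is added to the name
    (how Hive actually attaches the _1 is an assumption here).
    """
    target = os.path.join(dest_dir, name)
    counter = 1
    while os.path.exists(target):
        target = os.path.join(dest_dir, f"{name}_{counter}")
        counter += 1
    with open(target, "w") as f:
        f.write("\n".join(rows) + "\n")


def commit_overwrite(dest_dir, name, rows):
    """Hypothetical MM-style commit: the second write replaces the first."""
    with open(os.path.join(dest_dir, name), "w") as f:
        f.write("\n".join(rows) + "\n")


def distinct_keys(dest_dir):
    """Collect the distinct keys across every file in the directory."""
    keys = set()
    for fname in os.listdir(dest_dir):
        with open(os.path.join(dest_dir, fname)) as f:
            keys.update(line.strip() for line in f if line.strip())
    return keys


def run(commit):
    """Simulate two stages that both emit a file with the same name."""
    with tempfile.TemporaryDirectory() as d:
        commit(d, "000000_0", ["1", "2"])  # first stage
        commit(d, "000000_0", ["3"])       # second stage, same file name
        return len(os.listdir(d)), len(distinct_keys(d))


non_mm = run(commit_keep_both)  # (number of files, distinct keys)
mm = run(commit_overwrite)
print("non-MM:", non_mm, "MM:", mm)
```

In this toy setup the keep-both variant ends up with two files and all three keys, while the overwrite variant ends up with one file and only the keys from the second stage, which would be consistent with a different count(distinct key) result.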

> MM tables: skewjoin test fails
> ------------------------------
>
>                 Key: HIVE-16051
>                 URL: https://issues.apache.org/jira/browse/HIVE-16051
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> {noformat}
> set hive.optimize.skewjoin = true;
> set hive.skewjoin.key = 2;
> set hive.optimize.metadataonly=false;
> CREATE TABLE dest_j1(key INT, value STRING) STORED AS TEXTFILE tblproperties ("transactional"="true", "transactional_properties"="insert_only");
> FROM src src1 JOIN src src2 ON (src1.key = src2.key)
> INSERT OVERWRITE TABLE dest_j1 SELECT src1.key, src2.value;
> select count(distinct key) from dest_j1;
> {noformat}
> The results differ between the MM and the non-MM table.
> This probably has something to do with how skewjoin handles files. However, looking at the MM/debugging logs, there are no suspicious deletes, and everything looks the same in both cases: all the logging for skewjoin row containers and related pieces is identical between the two runs (except for the numbers/GUIDs; the number of files, paths, etc. are all the same). So it is not clear what is going on. A DFS dump could probably answer this question, but it currently doesn't work for me on q files.


