Posted to dev@hive.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2014/12/05 04:13:12 UTC

[jira] [Commented] (HIVE-9025) join38.q (without map join) produces incorrect result when testing with multiple reducers

    [ https://issues.apache.org/jira/browse/HIVE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234988#comment-14234988 ] 

Xuefu Zhang commented on HIVE-9025:
-----------------------------------

This seems to be caused by HIVE-5771. [~tedxu], could you please take a look?

> join38.q (without map join) produces incorrect result when testing with multiple reducers
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-9025
>                 URL: https://issues.apache.org/jira/browse/HIVE-9025
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Chao
>
> I have this query from a modified version of {{join38.q}}, which does NOT use map join:
> {code}
> FROM src a JOIN tmp b ON (a.key = b.col11)
> SELECT a.value, b.col5, count(1) as count
> where b.col11 = 111
> group by a.value, b.col5;
> {code}
> If I set {{mapred.reduce.tasks}} to 1, the result is correct. But if I set it to a larger number (3, for instance), the result is
> {noformat}
> val_111	105	1
> {noformat}
> which is wrong.
> I think the issue is that, in this case, ConstantPropagationProcFactory overwrites the partition cols of the reduce sink desc with an empty list. Later, in ReduceSinkOperator#computeHashCode, since partitionEval has length 0, a random number is used as the hash code for each separate row. As a result, rows with the same key are distributed to different reducers, which leads to the incorrect result.
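
The effect described in the report can be illustrated with a small standalone sketch. This is not Hive's code: the class {{PartitionSketch}} and the methods {{hashCodeFor}}, {{reducerFor}} and {{reducersUsed}} are hypothetical names made up for illustration, and the hashing is a simplified stand-in for what the report says happens in ReduceSinkOperator#computeHashCode. It assumes only that a row is routed by its hash code modulo the number of reducers, and that an empty partition column list falls back to a random hash per row.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class PartitionSketch {
    // Fixed seed so that repeated runs of the sketch behave the same way.
    private static final Random RANDOM = new Random(42);

    // Hypothetical stand-in for the hash step: hash the partition column
    // values if there are any, otherwise fall back to a random number.
    static int hashCodeFor(List<Object> partitionCols) {
        if (partitionCols.isEmpty()) {
            return RANDOM.nextInt();
        }
        return partitionCols.hashCode();
    }

    // Assumed routing rule: non-negative hash modulo the reducer count.
    static int reducerFor(List<Object> partitionCols, int numReducers) {
        return (hashCodeFor(partitionCols) & Integer.MAX_VALUE) % numReducers;
    }

    // Sends `rows` identical rows through the routing rule and collects
    // the set of reducers that received at least one of them.
    static Set<Integer> reducersUsed(List<Object> partitionCols, int rows, int numReducers) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < rows; i++) {
            used.add(reducerFor(partitionCols, numReducers));
        }
        return used;
    }

    public static void main(String[] args) {
        List<Object> keyed = Arrays.<Object>asList("111");
        List<Object> empty = Collections.emptyList();

        // With the key present, every row with key 111 goes to one reducer.
        System.out.println("keyed, 3 reducers: " + reducersUsed(keyed, 100, 3));
        // With an empty partition column list, the same rows are scattered.
        System.out.println("empty, 3 reducers: " + reducersUsed(empty, 100, 3));
        // With a single reducer the scattering cannot show up.
        System.out.println("empty, 1 reducer:  " + reducersUsed(empty, 100, 1));
    }
}
```

Under these assumptions, rows sharing a key stay on one reducer only while the key is part of the partition columns. Once the list is empty, the rows spread across reducers, so each reducer would join and aggregate only a part of the group. With a single reducer everything lands in the same place regardless, which is consistent with the result being correct when {{mapred.reduce.tasks}} is 1.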



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)