Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2014/03/31 23:07:15 UTC

[jira] [Commented] (TEZ-1003) ConcatenatedMergedKeyValuesInput only groups within each input

    [ https://issues.apache.org/jira/browse/TEZ-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955715#comment-13955715 ] 

Bikas Saha commented on TEZ-1003:
---------------------------------

By definition, the concatenated input simply concatenates its individual members. So we can either close this JIRA as invalid or re-target it to create a new merged input that performs another level of sort-merging on top of the sorted merged inputs.
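
To make the two behaviours concrete, here is a rough sketch in plain Python (not Tez code) using the sample records from the description below. The helper name group_adjacent is made up for illustration, and heapq.merge only stands in for whatever sort-merge a new merged input would perform; it is not a claim about how Tez would implement it.

```python
import heapq
from itertools import chain, groupby
from operator import itemgetter

# Sorted (key, value) records for the two inputs, taken from the issue description.
A = [("a", 1), ("b", 1), ("b", 2)]
B = [("a", 2), ("a", 3), ("b", 3)]


def group_adjacent(records):
    """Hypothetical helper: group consecutive records that share a key."""
    return [(key, [value for _, value in grp])
            for key, grp in groupby(records, key=itemgetter(0))]


# Concatenation: all of A is read, then all of B, so keys repeat across inputs.
concatenated = group_adjacent(chain(A, B))

# Sort-merge: the already-sorted inputs are interleaved by key before grouping.
merged = group_adjacent(heapq.merge(A, B, key=itemgetter(0)))

print(concatenated)
print(merged)
```

The first result repeats keys once per input, which is what the reporter observed; the second has one group per key, which is what a group-by across the union needs.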

> ConcatenatedMergedKeyValuesInput only groups within each input
> --------------------------------------------------------------
>
>                 Key: TEZ-1003
>                 URL: https://issues.apache.org/jira/browse/TEZ-1003
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>
> In PIG-3835, I was trying to use vertex groups for unions. Union followed by store works fine. But when trying to do a group-by,
>  
> {code}
> A = LOAD '/tmp/data' AS (f1:int,f2:int);
> B = LOAD '/tmp/data2' AS (f1:int,f2:int);
> C = UNION onschema A,B;
> D = GROUP C by f1;
> E = FOREACH D GENERATE group, SUM(C.f2);
> store E into '/tmp/tezout' using PigStorage();
> {code}
> ConcatenatedMergedKeyValuesInput on the reduce side grouped records only within each input and not across all inputs.
> i.e., if A had the records
> a 1
> b 1
> b 2
> and B
> a 2
> a 3
> b 3
> The records from ConcatenatedMergedKeyValuesInput of A and B were
> a {1}, b {1,2}, a {2,3}, b {3}, while I was expecting a {1,2,3}, b {1,2,3}.


