Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2013/12/11 18:49:07 UTC
[jira] [Updated] (PIG-3618) Replace broadcast edges with scatter/gather edges in union
[ https://issues.apache.org/jira/browse/PIG-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-3618:
-------------------------------
Attachment: PIG-3618-1.patch
https://reviews.apache.org/r/16165/
> Replace broadcast edges with scatter/gather edges in union
> ----------------------------------------------------------
>
> Key: PIG-3618
> URL: https://issues.apache.org/jira/browse/PIG-3618
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3618-1.patch
>
>
> Previously, I implemented union using OnFileUnorderedKVOutput + broadcast edge. This is a misuse of the broadcast edge: it delivers every record to every downstream task, so union produces duplicate records when parallel is set to more than 1. We should replace this combination with ShuffledMergedInput + scatter/gather edge, using the entire record as the key.
> Ideally, we should implement union using OnFileUnorderedKVOutput + scatter/gather edge with a round-robin partitioner. For now, this is not supported by Tez (TEZ-661).
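
To illustrate the duplication described above, here is a minimal plain-Java simulation of the three routing strategies. It is only a sketch: it does not use any Tez or Pig classes, and the names UnionEdgeSketch, broadcast, scatterByRecordKey, and scatterRoundRobin are hypothetical, chosen for this example. It assumes a broadcast edge hands every record to every downstream task, while a scatter/gather edge hands each record to exactly one task chosen by a partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation only; none of these names are Tez or Pig APIs.
class UnionEdgeSketch {

    // Broadcast: every downstream task receives a full copy of the input.
    static List<List<String>> broadcast(List<String> records, int parallelism) {
        List<List<String>> tasks = new ArrayList<>();
        for (int t = 0; t < parallelism; t++) {
            tasks.add(new ArrayList<>(records));
        }
        return tasks;
    }

    // Scatter/gather with the entire record as the key: the partition is
    // derived from the record's hash, so each record lands on one task.
    static List<List<String>> scatterByRecordKey(List<String> records, int parallelism) {
        List<List<String>> tasks = emptyTasks(parallelism);
        for (String record : records) {
            int partition = Math.floorMod(record.hashCode(), parallelism);
            tasks.get(partition).add(record);
        }
        return tasks;
    }

    // Scatter/gather with a round-robin partitioner: no key is needed,
    // and records are spread evenly across tasks.
    static List<List<String>> scatterRoundRobin(List<String> records, int parallelism) {
        List<List<String>> tasks = emptyTasks(parallelism);
        for (int i = 0; i < records.size(); i++) {
            tasks.get(i % parallelism).add(records.get(i));
        }
        return tasks;
    }

    static List<List<String>> emptyTasks(int parallelism) {
        List<List<String>> tasks = new ArrayList<>();
        for (int t = 0; t < parallelism; t++) {
            tasks.add(new ArrayList<>());
        }
        return tasks;
    }

    // Total number of records seen across all downstream tasks.
    static int totalRecords(List<List<String>> tasks) {
        int total = 0;
        for (List<String> task : tasks) {
            total += task.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> unionInput = Arrays.asList("a", "b", "c", "d");
        int parallelism = 2;
        System.out.println(totalRecords(broadcast(unionInput, parallelism)));          // prints 8
        System.out.println(totalRecords(scatterByRecordKey(unionInput, parallelism))); // prints 4
        System.out.println(totalRecords(scatterRoundRobin(unionInput, parallelism)));  // prints 4
    }
}
```

With four input records and a parallelism of 2, the broadcast variant yields eight records downstream, while both scatter/gather variants yield four. The round-robin variant also avoids hashing the whole record, which is presumably why the description calls it the ideal option.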
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)