Posted to dev@pig.apache.org by "Pi Song (JIRA)" <ji...@apache.org> on 2008/04/16 15:59:21 UTC
[jira] Commented: (PIG-199) New Join types in Pig
[ https://issues.apache.org/jira/browse/PIG-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589560#action_12589560 ]
Pi Song commented on PIG-199:
-----------------------------
Amir,
Just out of curiosity, how do you plan to implement Fragment and Replace Join? Is it something like the following?
For A ⋈ B:
In A: Map (k1, v1) --> { ((a, 1), (k1, v1)), ((a, 2), (k1, v1)), ((a, 3), (k1, v1)), ..., ((a, M), (k1, v1)) } where a = GetPartitionA((k1, v1)) assigns the record to one of N partitions
In B: Map (k1, v1) --> { ((1, b), (k1, v1)), ((2, b), (k1, v1)), ((3, b), (k1, v1)), ..., ((N, b), (k1, v1)) } where b = GetPartitionB((k1, v1)) assigns the record to one of M partitions
And then N * M reduce buckets each do a local join?
If that is the case, the amount of data will be multiplied: every record of A is emitted M times and every record of B is emitted N times. Wouldn't the performance be worse? Is this solely for the inequality join feature?
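To make the question concrete, here is a minimal single-process sketch of the scheme as I understand it. This is only my guess at the design, not anything taken from the Pig code base. The names get_partition_a, get_partition_b, map_a, map_b and cross_partition_join are hypothetical, the values of N and M are arbitrary, and partitioning by key modulo the partition count is just a placeholder.

```python
from collections import defaultdict

N = 3  # assumed number of partitions for relation A
M = 2  # assumed number of partitions for relation B


def get_partition_a(record):
    """Hypothetical stand-in for GetPartitionA: returns a value in 1..N."""
    key, _value = record
    return key % N + 1


def get_partition_b(record):
    """Hypothetical stand-in for GetPartitionB: returns a value in 1..M."""
    key, _value = record
    return key % M + 1


def map_a(record):
    """Emit an A record once per B partition: buckets (a, 1) .. (a, M)."""
    a = get_partition_a(record)
    return [((a, j), record) for j in range(1, M + 1)]


def map_b(record):
    """Emit a B record once per A partition: buckets (1, b) .. (N, b)."""
    b = get_partition_b(record)
    return [((i, b), record) for i in range(1, N + 1)]


def cross_partition_join(rel_a, rel_b, predicate):
    """Simulate the shuffle into N * M buckets, then join locally per bucket.

    Returns the joined pairs and the number of records shuffled.
    """
    buckets = defaultdict(lambda: ([], []))
    for record in rel_a:
        for bucket, value in map_a(record):
            buckets[bucket][0].append(value)
    for record in rel_b:
        for bucket, value in map_b(record):
            buckets[bucket][1].append(value)

    joined = []
    for bucket in sorted(buckets):
        left, right = buckets[bucket]
        # Local join: any predicate works, including an inequality.
        joined.extend((l, r) for l in left for r in right if predicate(l, r))

    shuffled = sum(len(left) + len(right) for left, right in buckets.values())
    return joined, shuffled


if __name__ == "__main__":
    rel_a = [(1, "a1"), (2, "a2"), (5, "a5")]
    rel_b = [(2, "b2"), (3, "b3")]
    pairs, shuffled = cross_partition_join(rel_a, rel_b, lambda l, r: l[0] < r[0])
    print(pairs)
    print("records shuffled:", shuffled)
```

Each (A record, B record) pair meets in exactly one bucket, namely (a, b), so the local joins produce every matching pair once and no duplicates. The cost is that |A| * M + |B| * N records go through the shuffle instead of |A| + |B|, which is the multiplication I am asking about.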
> New Join types in Pig
> ---------------------
>
> Key: PIG-199
> URL: https://issues.apache.org/jira/browse/PIG-199
> Project: Pig
> Issue Type: New Feature
> Reporter: Amir Youssefi
> Assignee: Amir Youssefi
>
> We need to design and implement new Join Types in Pig which can potentially improve the performance for large data-sets. I will start with Map Side Joins/Fragment and Replace.