Posted to dev@pig.apache.org by "Amir Youssefi (JIRA)" <ji...@apache.org> on 2008/04/10 02:24:05 UTC
[jira] Created: (PIG-199) New Join types in Pig
New Join types in Pig
---------------------
Key: PIG-199
URL: https://issues.apache.org/jira/browse/PIG-199
Project: Pig
Issue Type: New Feature
Reporter: Amir Youssefi
We need to design and implement new join types in Pig that can potentially improve performance on large data sets. I will start with Map Side Joins/Fragment and Replace.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-199) New Join types in Pig
Posted by "Pi Song (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589560#action_12589560 ]
Pi Song commented on PIG-199:
-----------------------------
Amir,
Just out of curiosity, how do you plan to implement the Fragment and Replace join? Is it something like the following?
For A ⋈ B:
In A: Map (k1, v1) --> { ((a, 1), (k1, v1)), ((a, 2), (k1, v1)), ((a, 3), (k1, v1)), ..., ((a, M), (k1, v1)) } where a = GetPartitionA((k1, v1)) assigns the record to one of N partitions
In B: Map (k1, v1) --> { ((1, b), (k1, v1)), ((2, b), (k1, v1)), ((3, b), (k1, v1)), ..., ((N, b), (k1, v1)) } where b = GetPartitionB((k1, v1)) assigns the record to one of M partitions
And then N * M reduce buckets each do a local join?
If that is the case, the amount of data will be multiplied: each record of A is emitted M times and each record of B is emitted N times. Wouldn't the performance be worse? Is this solely for the inequality join feature?
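
To make the question concrete, here is a minimal in-memory Python sketch of the scheme I am describing. It is only my reading of the idea, not Pig code and not necessarily what Amir has in mind. The names get_partition, map_a, map_b and grid_join are hypothetical, get_partition is a stand-in for GetPartitionA/GetPartitionB, and partitions are numbered from 0 rather than 1.

```python
import zlib
from collections import defaultdict


def get_partition(record, parts):
    """Hypothetical stand-in for GetPartitionA / GetPartitionB (0-based)."""
    return zlib.crc32(repr(record).encode("utf-8")) % parts


def map_a(record, n, m):
    """Emit one record of A to every bucket (a, j), j = 0..m-1."""
    a = get_partition(record, n)
    return [((a, j), record) for j in range(m)]


def map_b(record, n, m):
    """Emit one record of B to every bucket (i, b), i = 0..n-1."""
    b = get_partition(record, m)
    return [((i, b), record) for i in range(n)]


def grid_join(rel_a, rel_b, n, m, predicate):
    """Join rel_a and rel_b through an n * m grid of reduce buckets.

    Returns the joined pairs and the number of records shuffled.
    """
    buckets = defaultdict(lambda: ([], []))
    shuffled = 0
    for record in rel_a:
        for bucket, value in map_a(record, n, m):
            buckets[bucket][0].append(value)
            shuffled += 1
    for record in rel_b:
        for bucket, value in map_b(record, n, m):
            buckets[bucket][1].append(value)
            shuffled += 1

    # Each bucket does a local nested-loop join. Any pair (x, y) meets in
    # exactly one bucket, (partition of x, partition of y), so the
    # predicate does not have to be an equality.
    joined = []
    for left, right in buckets.values():
        joined.extend((x, y) for x in left for y in right if predicate(x, y))
    return joined, shuffled
```

Calling grid_join(rel_a, rel_b, n, m, lambda x, y: x[0] < y[0]) would run an inequality join. The shuffled count comes out as |A| * M + |B| * N, which is the multiplication of data I am asking about.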
[jira] Assigned: (PIG-199) New Join types in Pig
Posted by "Amir Youssefi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amir Youssefi reassigned PIG-199:
---------------------------------
Assignee: Amir Youssefi