Posted to dev@pig.apache.org by "Amir Youssefi (JIRA)" <ji...@apache.org> on 2008/04/10 02:24:05 UTC

[jira] Created: (PIG-199) New Join types in Pig

New Join types in Pig
---------------------

                 Key: PIG-199
                 URL: https://issues.apache.org/jira/browse/PIG-199
             Project: Pig
          Issue Type: New Feature
            Reporter: Amir Youssefi


We need to design and implement new join types in Pig that can potentially improve performance on large data sets. I will start with map-side joins (Fragment and Replicate).
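
For readers unfamiliar with the idea, a map-side join of the kind mentioned above (commonly called fragment-and-replicate) can be sketched roughly as follows. This is only an illustration and not the design proposed in this issue: it assumes one relation is small enough to fit in memory on every map task, and the names build_lookup and map_side_join are hypothetical.

```python
from collections import defaultdict


def build_lookup(small_relation):
    """Index the small (replicated) relation by join key.

    Assumption: the whole relation fits in memory on each map task.
    """
    table = defaultdict(list)
    for key, value in small_relation:
        table[key].append(value)
    return table


def map_side_join(fragment, lookup):
    """Join one fragment of the large relation against the in-memory copy.

    No shuffle or reduce phase is needed, because every map task
    already holds the full small relation.
    """
    for key, value in fragment:
        for other in lookup.get(key, ()):
            yield (key, value, other)


# Hypothetical data: the large relation arrives as two fragments (input splits).
small = [(2, "s1"), (2, "s2"), (3, "s3")]
fragments = [[(1, "x"), (2, "y")], [(2, "z"), (3, "w")]]

lookup = build_lookup(small)
for fragment in fragments:
    for row in map_side_join(fragment, lookup):
        print(row)
```

Each map task sees only its own fragment of the large relation, so the join cost is one pass over the large input plus the cost of shipping the small relation to every task.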

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-199) New Join types in Pig

Posted by "Pi Song (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589560#action_12589560 ] 

Pi Song commented on PIG-199:
-----------------------------

Amir,

Just out of curiosity, how do you plan to implement the Fragment and Replicate join? Is it something like this?

For A ⋈ B:
In A:  Map (k1, v1) --> { ((a, 1), (k1, v1)), ((a, 2), (k1, v1)), ((a, 3), (k1, v1)), ... , ((a, M), (k1, v1)) }    where a = GetPartitionA((k1, v1)) assigns the record to one of N partitions
In B:  Map (k1, v1) --> { ((1, b), (k1, v1)), ((2, b), (k1, v1)), ((3, b), (k1, v1)), ... , ((N, b), (k1, v1)) }    where b = GetPartitionB((k1, v1)) assigns the record to one of M partitions

And then each of the N * M reduce buckets does a local join?

If that is the case, the amount of shuffled data is multiplied: every record of A is emitted M times and every record of B is emitted N times. Wouldn't the performance be worse? Is this solely for the inequality join feature?
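
The scheme asked about above can be sketched in a few lines to make the data blow-up concrete. This is a minimal illustration under stated assumptions, not an actual Pig implementation: get_partition is a hypothetical stand-in for GetPartitionA / GetPartitionB (a deterministic hash into 1-based bucket ids), and the shuffle is simulated with an in-memory dictionary.

```python
import zlib
from collections import defaultdict


def get_partition(record, buckets):
    """Hypothetical GetPartitionA/GetPartitionB: 1-based bucket id from the key."""
    key, _ = record
    return zlib.crc32(repr(key).encode("utf-8")) % buckets + 1


def map_a(record, n, m):
    """Emit an A record once per column of its row: (a, 1) ... (a, M)."""
    a = get_partition(record, n)
    return [((a, j), record) for j in range(1, m + 1)]


def map_b(record, n, m):
    """Emit a B record once per row of its column: (1, b) ... (N, b)."""
    b = get_partition(record, m)
    return [((i, b), record) for i in range(1, n + 1)]


def grid_join(rel_a, rel_b, n, m, predicate):
    """Simulate the N * M reduce buckets; return (joined pairs, records shuffled)."""
    buckets = defaultdict(lambda: {"A": [], "B": []})
    shuffled = 0
    for rec in rel_a:
        for cell, value in map_a(rec, n, m):
            buckets[cell]["A"].append(value)
            shuffled += 1
    for rec in rel_b:
        for cell, value in map_b(rec, n, m):
            buckets[cell]["B"].append(value)
            shuffled += 1
    joined = []
    for cell in sorted(buckets):
        # Local join inside one reduce bucket; any predicate works,
        # including inequalities.
        for ra in buckets[cell]["A"]:
            for rb in buckets[cell]["B"]:
                if predicate(ra, rb):
                    joined.append((ra, rb))
    return joined, shuffled


# Hypothetical data and an inequality predicate on the keys.
rel_a = [(1, "a1"), (2, "a2"), (3, "a3")]
rel_b = [(2, "b2"), (3, "b3"), (4, "b4")]
pairs, shuffled = grid_join(rel_a, rel_b, n=2, m=3,
                            predicate=lambda ra, rb: ra[0] < rb[0])
print(len(pairs), shuffled)
```

Every (A record, B record) pair meets in exactly one bucket, namely (a, b), so no result is duplicated. The price is that |A| * M + |B| * N records are shuffled instead of |A| + |B|, which is the overhead the question above is pointing at.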


[jira] Assigned: (PIG-199) New Join types in Pig

Posted by "Amir Youssefi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amir Youssefi reassigned PIG-199:
---------------------------------

    Assignee: Amir Youssefi
