Posted to user@pig.apache.org by Renato Marroquín Mogrovejo <re...@gmail.com> on 2011/04/13 06:48:30 UTC

About Pig Joins

Hi there,

I have some questions about how Pig performs joins. The site says there are
three types of specialized joins: replicated, skewed, and merge joins. I
would like to know how these are implemented. For instance, how does the
regular join work? Is there a difference between the repartition join
(org.apache.hadoop.contrib.utils.join) and the regular join implemented in
Pig? Is the merge join what is called a map-side join (where the input is
sorted and only regular scans are done)?
And regarding the Pig join framework[1], are there other specific
implementations, e.g. semi-joins?
Thanks in advance.


Renato M.

Re: About Pig Joins

Posted by Alan Gates <ga...@yahoo-inc.com>.
You can also take a look at the beta of the _Programming Pig_ book,
http://ofps.oreilly.com/titles/9781449302641/
Specialized joins are described in chapter six.  How to do semi-joins is
also described under Cogroup in the same chapter.  Any feedback you have on
the descriptions would be welcomed.

Alan.
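
For readers following along, the idea of doing a semi-join through a cogroup can be modelled in a few lines of plain Python. This is only a sketch of the semantics, not Pig code and not taken from the book: the `cogroup` and `semi_join` helpers and the `users`/`clicks` sample data are hypothetical names made up for illustration. The two comments point at the Pig operators (IsEmpty, FLATTEN) that would play the corresponding roles.

```python
from collections import defaultdict

def cogroup(left, right, key=lambda row: row[0]):
    """Group two relations by key, keeping one bag of rows per input."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[key(row)][0].append(row)
    for row in right:
        groups[key(row)][1].append(row)
    return groups

def semi_join(left, right, key=lambda row: row[0]):
    """Return left rows whose key occurs in right, without right's columns."""
    result = []
    for left_bag, right_bag in cogroup(left, right, key).values():
        if right_bag:                # plays the role of NOT IsEmpty(right)
            result.extend(left_bag)  # plays the role of FLATTEN(left)
    return result

# Hypothetical sample data: (name, age) and (name, url).
users = [("alice", 30), ("bob", 25), ("carol", 41)]
clicks = [("alice", "/home"), ("alice", "/cart"), ("carol", "/home")]
print(semi_join(users, clicks))
```

Unlike a regular join, each left row comes out at most once here, however many matching rows the right side holds.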

On Apr 12, 2011, at 9:55 PM, Dmitriy Ryaboy wrote:

> Renato, slides 20-23 on this presentation describe the implementations of
> the joins:
> http://squarecog.wordpress.com/2009/11/03/apache-pig-apittsburgh-hadoop-user-group/
>
> You can also search the pig wiki for design docs for each join.
>
> D


Re: About Pig Joins

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Renato, slides 20-23 on this presentation describe the implementations of
the joins:
http://squarecog.wordpress.com/2009/11/03/apache-pig-apittsburgh-hadoop-user-group/

You can also search the Pig wiki for the design docs for each join.

D
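
As a rough illustration of the "input is sorted and just regular scans are done" idea from the original question, a generic sort-merge join over two key-sorted lists might look like the sketch below. It is a textbook version written in plain Python, not Pig's actual merge join implementation; the `merge_join` function and the sample lists `a` and `b` are hypothetical.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Join two lists that are already sorted by key, in one forward pass."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1          # left key too small, advance left
        elif lk > rk:
            j += 1          # right key too small, advance right
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and key(left[i]) == lk:
                for r in right[j:j_end]:
                    out.append(left[i] + r)
                i += 1
            j = j_end
    return out

# Hypothetical inputs, both sorted on the first field.
a = [(1, "x"), (2, "y"), (2, "z"), (4, "w")]
b = [(2, "p"), (3, "q"), (4, "r"), (4, "s")]
print(merge_join(a, b))
```

Because neither input is ever re-sorted or shuffled, a join of this shape needs only sequential scans, which is what makes it a candidate for the map side.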
