Posted to user@pig.apache.org by Gagan Juneja <ga...@gmail.com> on 2015/07/14 13:56:16 UTC

Query | Join Internals

Hi Team,

We are using Pig intensively in our various projects. We are doing
optimizations, and for that we wanted to know how join works. That said, we
have already moved to skewed joins for some of our use cases.

In many places the documentation mentions that, in a join, the data for the
second table is streamed. But I was unable to see how this fits into the
MapReduce paradigm.

1. Can anyone please clarify how a join happens in Pig?
2. What is the meaning of "streaming" here? Are we loading the files directly
in the reducers?


Regards,
Gagan
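
A minimal sketch of how a default (reduce-side) join fits the MapReduce model, assuming the behaviour Pig's performance documentation describes for regular joins: mappers read every input and tag each record with its join key and the index of the input it came from; the shuffle sends all records for one key to one reducer, ordered by input; the reducer holds the records of every input except the last in memory for that key and iterates ("streams") over the last input's records, joining each one as it arrives. Under that reading, nothing is loaded directly in the reducers: "streaming" describes how the reducer consumes the last input. The Python below is a hypothetical single-process simulation; `map_phase`, `shuffle`, `reduce_phase` and `reduce_side_join` are illustrative names, not Pig APIs, and Pig's real physical operators are more involved.

```python
from collections import defaultdict
from itertools import groupby, product


def map_phase(inputs, key_fn):
    """Map: tag each record with (join key, input index)."""
    for idx, records in enumerate(inputs):
        for rec in records:
            yield (key_fn(rec), idx), rec


def shuffle(pairs):
    """Stand-in for the sort/shuffle: order by join key, then input index,
    so a reducer sees input 0, then input 1, ..., then the last input."""
    return sorted(pairs, key=lambda kv: kv[0])


def reduce_phase(sorted_pairs, n_inputs):
    """Reduce: buffer every input except the last; stream the last one."""
    last = n_inputs - 1
    for _key, group in groupby(sorted_pairs, key=lambda kv: kv[0][0]):
        buffered = defaultdict(list)
        for (_, idx), rec in group:
            if idx < last:
                buffered[idx].append(rec)  # kept in memory for this key
            else:
                # Streamed: joined against the buffers immediately, never stored.
                # An empty buffer makes product() yield nothing (inner join).
                for combo in product(*(buffered[i] for i in range(last))):
                    yield combo + (rec,)


def reduce_side_join(inputs, key_fn):
    """Inner join of two or more inputs on key_fn (hypothetical helper)."""
    return list(reduce_phase(shuffle(map_phase(inputs, key_fn)), len(inputs)))
```

For example, joining `users = [("u1", "alice"), ("u2", "bob")]` with `clicks = [("u1", "/home"), ("u1", "/cart"), ("u3", "/home")]` on the first field returns `[(("u1", "alice"), ("u1", "/home")), (("u1", "alice"), ("u1", "/cart"))]`. Only the per-key buffers of the earlier inputs must fit in memory, which is why Pig's documentation recommends putting the input with the most tuples per key last.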

Re: Query | Join Internals

Posted by "Jianfeng (Jeff) Zhang" <jz...@hortonworks.com>.
This document should be helpful for you

https://wiki.apache.org/pig/PigSkewedJoinSpec

Best regards,
Jeff Zhang
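
The linked spec covers skewed join, which changes the partitioning step rather than the reducer logic. Per Pig's documentation, it samples the left input to build a histogram of the key space, allocates reducers per key from that histogram, splits a large key's left-input records across several reducers, and streams the right input, which (as we read the spec) is copied to every reducer holding part of that key. The sketch below simulates this in one process. The sampling rule (`sample_every`), the capacity threshold (`max_per_reducer`) and every function name are hypothetical choices for illustration, not Pig's planner or its configuration properties.

```python
import math
from collections import Counter, defaultdict


def find_heavy_keys(left, key_fn, sample_every, max_per_reducer):
    """Sample every n-th left record, estimate per-key counts, and return
    {key: number_of_reducers} for keys too large for one reducer."""
    heavy = {}
    for key, n in Counter(key_fn(rec) for rec in left[::sample_every]).items():
        estimated = n * sample_every
        if estimated > max_per_reducer:
            heavy[key] = math.ceil(estimated / max_per_reducer)
    return heavy


def partition(left, right, key_fn, heavy, num_reducers):
    """Route records to reducers. A heavy key's LEFT records are split
    round-robin over its reducers; its RIGHT records are copied to all of them.
    Placement uses hash(), so it can differ between runs; the join result does not."""
    buckets = defaultdict(lambda: ([], []))  # reducer id -> (left part, right part)
    seen = Counter()                         # round-robin position per key
    for rec in left:
        key = key_fn(rec)
        base = hash(key) % num_reducers
        m = min(heavy.get(key, 1), num_reducers)
        buckets[(base + seen[key] % m) % num_reducers][0].append(rec)
        seen[key] += 1
    for rec in right:
        key = key_fn(rec)
        base = hash(key) % num_reducers
        m = min(heavy.get(key, 1), num_reducers)
        for i in range(m):
            buckets[(base + i) % num_reducers][1].append(rec)
    return buckets


def local_join(left_part, right_part, key_fn):
    """Per-reducer join: hold the left part in memory, stream the right part."""
    index = defaultdict(list)
    for rec in left_part:
        index[key_fn(rec)].append(rec)
    for r in right_part:
        for l in index.get(key_fn(r), []):
            yield (l, r)


def skewed_join(left, right, key_fn, num_reducers=4, sample_every=2, max_per_reducer=2):
    heavy = find_heavy_keys(left, key_fn, sample_every, max_per_reducer)
    buckets = partition(left, right, key_fn, heavy, num_reducers)
    out = []
    for reducer_id in sorted(buckets):
        left_part, right_part = buckets[reducer_id]
        out.extend(local_join(left_part, right_part, key_fn))
    return out
```

Because each left record lands on exactly one of its key's reducers while the matching right records are present on all of them, every matching pair is produced exactly once; the cost is duplicating the right side's records for the heavy keys only.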


Re: Query | Join Internals

Posted by Gagan Juneja <ga...@gmail.com>.
Any help?

Regards,
Gagan

Re: Query | Join Internals

Posted by Divya Gehlot <di...@gmail.com>.
Hi Gagan,

This link may help you
https://bluewatersql.wordpress.com/2013/10/04/3-little-piggys-advanced-pig-join-scenarios/

On 30 July 2015 at 22:04, Alan Gates <al...@gmail.com> wrote:

> Here's the original design doc:
> https://wiki.apache.org/pig/PigSkewedJoinSpec
>
> Alan.
>
> Gagan Juneja <ga...@gmail.com>
> July 29, 2015 at 21:30
> Any help?
>
> Regards,
> Gagan
>

Re: Query | Join Internals

Posted by Alan Gates <al...@gmail.com>.
Here's the original design doc: 
https://wiki.apache.org/pig/PigSkewedJoinSpec

Alan.

> Gagan Juneja <ma...@gmail.com>
> July 29, 2015 at 21:30
> Any help?
>
> Regards,
> Gagan
