Posted to user@pig.apache.org by Kevin Burton <bu...@spinn3r.com> on 2011/08/28 21:11:27 UTC

Do "regular join optimizations" apply to COGROUP?

I'm reading the documentation and it says:

"*Regular Join Optimizations*

Optimization for regular joins ensures that the last table in the join is
not brought into memory but streamed through instead. Optimization reduces
the amount of memory used which means you can avoid spilling the data and
also should be able to scale your query to larger data volumes.

To take advantage of this optimization, make sure that the table with the
largest number of tuples per key is the last table in your query. In some of
our tests we saw 10x performance improvement as the result of this
optimization."


This seems like it would apply to cogroup too… does it?
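
To make the quoted optimization concrete, here is a minimal Python sketch of a reduce-side join for a single key. It is a toy model, not Pig's implementation: the names reduce_join and buffered_count are hypothetical, and it assumes the values for a key reach the reducer tagged with their input index and ordered by that index. Inputs before the last are buffered in memory, while the last input is consumed one tuple at a time.

```python
from itertools import product


def reduce_join(key, tagged_values, num_inputs):
    """Toy reduce-side inner join for one key (hypothetical helper).

    tagged_values yields (input_index, value) pairs ordered by input_index,
    which is an assumption of this sketch.
    """
    # One in-memory list per input, except the last one.
    bags = [[] for _ in range(num_inputs - 1)]
    for index, value in tagged_values:
        if index < num_inputs - 1:
            bags[index].append(value)  # buffered in memory
        else:
            # Last input: never stored, joined against the buffers right away.
            for combo in product(*bags):
                yield (key, *combo, value)


def buffered_count(tuples_per_input):
    """Tuples held in memory for one key: every input except the last."""
    return sum(tuples_per_input[:-1])


values = [(0, "u1"), (0, "u2"), (1, "c1"), (1, "c2"), (1, "c3")]
for row in reduce_join("k", iter(values), num_inputs=2):
    print(row)

# Same two relations, different order in the join.
print(buffered_count([2, 1_000_000]))  # large relation last
print(buffered_count([1_000_000, 2]))  # large relation first
```

Under these assumptions, the number of buffered tuples depends only on the relations that are not last, which is why the documentation suggests putting the relation with the most tuples per key at the end.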

-- 

Founder/CEO Spinn3r.com

Location: *San Francisco, CA*
Skype: *burtonator*

Skype-in: *(415) 871-0687*

Re: Do "regular join optimizations" apply to COGROUP?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Just make sure to use "git diff --no-prefix" to generate your patch when you
upload it to the Jira. I work off the Apache git mirror as well.

D

On Sun, Aug 28, 2011 at 11:49 PM, Kevin Burton <bu...@spinn3r.com> wrote:

> The documentation should probably be updated to reflect this … I guess I
> should probably shut up and submit a patch :-P
>
> …
>
> I do have to admit that, as an outside contributor, having a read-only git is
> really sweet.  I'm too addicted to our internal branching using distributed
> revision control to want to work on an OSS project without it :-P

Re: Do "regular join optimizations" apply to COGROUP?

Posted by Kevin Burton <bu...@spinn3r.com>.
The documentation should probably be updated to reflect this … I guess I
should probably shut up and submit a patch :-P

…

I do have to admit that, as an outside contributor, having a read-only git is
really sweet.  I'm too addicted to our internal branching using distributed
revision control to want to work on an OSS project without it :-P

On Sun, Aug 28, 2011 at 12:27 PM, Daniel Dai <da...@hortonworks.com> wrote:

> Yes, but only if you flatten the rightmost relation immediately after
> the COGROUP. Otherwise, a bag will be created.
>
> Daniel

Re: Do "regular join optimizations" apply to COGROUP?

Posted by Daniel Dai <da...@hortonworks.com>.
Yes, but only if you flatten the rightmost relation immediately after
the COGROUP. Otherwise, a bag will be created.

Daniel
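
The difference described in this reply can be pictured with a small Python model of the reducer for one key. This is only a sketch under stated assumptions, not Pig's code: the function names are hypothetical, and values are assumed to arrive tagged with their input index and ordered by it. The second function corresponds roughly to a script shape such as FOREACH (COGROUP A BY k, B BY k) GENERATE group, A, FLATTEN(B).

```python
def cogroup_one_key(key, tagged_values, num_inputs):
    """Plain COGROUP model: every input, including the last, becomes a bag."""
    bags = [[] for _ in range(num_inputs)]
    for index, value in tagged_values:
        bags[index].append(value)  # the rightmost relation is materialized too
    return (key, *bags)


def cogroup_flatten_last(key, tagged_values, num_inputs):
    """COGROUP followed immediately by a flatten of the rightmost relation.

    Assumes tagged_values is ordered by input index, so the earlier bags are
    complete before the first tuple of the last input arrives.
    """
    bags = [[] for _ in range(num_inputs - 1)]
    for index, value in tagged_values:
        if index < num_inputs - 1:
            bags[index].append(value)
        else:
            # One output row per tuple of the last input; no bag is built for it.
            yield (key, *bags, value)


values = [(0, "a1"), (0, "a2"), (1, "b1"), (1, "b2")]
print(cogroup_one_key("k", iter(values), num_inputs=2))
for row in cogroup_flatten_last("k", iter(values), num_inputs=2):
    print(row)
```

In the first function the rightmost relation has to be held as a bag because the output row contains it whole. In the second, each of its tuples can be emitted as soon as it is read, which matches the case where the streaming optimization can apply.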
