Posted to dev@pig.apache.org by Russell Jurney <ru...@gmail.com> on 2010/06/12 03:42:30 UTC

SIZE() of relation

Would it be possible, and not a ton of work, to make the builtin SIZE() work
on a relation?  Reason being, I frequently do this:

B = GROUP A ALL;
C = FOREACH B GENERATE SIZE(A) AS total;
DUMP C;

And I would rather do this:

DUMP SIZE(A);

Russ
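
For readers less familiar with the idiom, here is a rough Python sketch of what the three-line version above computes. The sample relation and the helper names (group_all, size) are made up for illustration and are not Pig APIs.

```python
# Hypothetical stand-in for relation A: a list of tuples.
A = [("alice", 3), ("bob", 5), ("carol", 8)]


def group_all(relation):
    """Mimic GROUP ... ALL: a single output tuple holding the
    key 'all' and one bag containing every input tuple."""
    return [("all", list(relation))]


def size(bag):
    """Mimic SIZE() on a bag: the number of tuples in it."""
    return len(bag)


B = group_all(A)                        # B = GROUP A ALL;
C = [(size(bag),) for _key, bag in B]   # C = FOREACH B GENERATE SIZE(A) AS total;
print(C)                                # DUMP C;  prints [(3,)]
```

The point of the sketch is that every tuple of A ends up in one bag before anything is counted, which is the step the proposed DUMP SIZE(A) would hide.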

Re: SIZE() of relation

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Might be ok if we artificially limit this to only work with algebraic
functions.
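
To illustrate why limiting this to algebraic functions might help: a function that can be computed in stages lets each partition produce a small partial result before anything reaches the final single reducer. A minimal Python sketch, assuming a hypothetical partition layout and stage names (initial, intermediate, final) chosen only for illustration:

```python
def initial(_record):
    """Per-record stage: each record contributes a count of 1."""
    return 1


def intermediate(partial_counts):
    """Combine stage: merge partial counts within one partition."""
    return sum(partial_counts)


def final(partial_counts):
    """Final stage: merge the one partial count from each partition."""
    return sum(partial_counts)


def algebraic_count(partitions):
    # Each partition is reduced to a single number locally...
    per_partition = [
        intermediate([initial(record) for record in partition])
        for partition in partitions
    ]
    # ...so the last step sees one value per partition, not every record.
    return final(per_partition)


# Hypothetical input split across four partitions (one of them empty).
partitions = [["a", "b", "c"], ["d"], [], ["e", "f"]]
total = algebraic_count(partitions)
print(total)  # 6
```

With this shape, the last step only sums one number per partition rather than receiving every tuple, so the single reducer does far less work.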

On Tue, Jun 15, 2010 at 9:14 AM, Alan Gates <ga...@yahoo-inc.com> wrote:

> There have been several requests for this.  I'm not a fan of it, because it
> makes it too easy to forget that you're forcing a single reducer MR job to
> accomplish this.  But I'm open to persuasion if everyone else disagrees.
>
> Alan.
>
>
> On Jun 11, 2010, at 7:27 PM, Russell Jurney wrote:
>
>> This would be great.  Save us from GROUP ALL/FOREACH, which is awkward.
>>
>> On Fri, Jun 11, 2010 at 7:14 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
>>
>>> It would be cool to just treat relations as bags in the general case. They
>>> kind of are, and kind of are not. Causes lots of user confusion.
>>> There are obvious users-doing-dumb-stuff scenarios that arise though.
>>> I guess the Pig philosophy is that the user is the optimizer, though.. so
>>> maybe it's ok.
>>>
>>> -D
>>>
>>> On Fri, Jun 11, 2010 at 6:42 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>>>
>>>> Would it be possible, and not a ton of work to make the builtin SIZE() work
>>>> on a relation?  Reason being, I frequently do this:
>>>>
>>>> B = GROUP A ALL;
>>>> C = FOREACH B GENERATE SIZE(A) AS total;
>>>> DUMP C;
>>>>
>>>> And I would rather do this:
>>>>
>>>> DUMP SIZE(A);
>>>>
>>>> Russ

Re: SIZE() of relation

Posted by Alan Gates <ga...@yahoo-inc.com>.
There have been several requests for this.  I'm not a fan of it,  
because it makes it too easy to forget that you're forcing a single  
reducer MR job to accomplish this.  But I'm open to persuasion if  
everyone else disagrees.

Alan.

On Jun 11, 2010, at 7:27 PM, Russell Jurney wrote:

> This would be great.  Save us from GROUP ALL/FOREACH, which is awkward.
>
> On Fri, Jun 11, 2010 at 7:14 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
>
>> It would be cool to just treat relations as bags in the general case. They
>> kind of are, and kind of are not. Causes lots of user confusion.
>> There are obvious users-doing-dumb-stuff scenarios that arise though.
>> I guess the Pig philosophy is that the user is the optimizer, though.. so
>> maybe it's ok.
>>
>> -D
>>
>> On Fri, Jun 11, 2010 at 6:42 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>>
>>> Would it be possible, and not a ton of work to make the builtin SIZE() work
>>> on a relation?  Reason being, I frequently do this:
>>>
>>> B = GROUP A ALL;
>>> C = FOREACH B GENERATE SIZE(A) AS total;
>>> DUMP C;
>>>
>>> And I would rather do this:
>>>
>>> DUMP SIZE(A);
>>>
>>> Russ


Re: SIZE() of relation

Posted by Russell Jurney <ru...@gmail.com>.
This would be great.  Save us from GROUP ALL/FOREACH, which is awkward.

On Fri, Jun 11, 2010 at 7:14 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> It would be cool to just treat relations as bags in the general case. They
> kind of are, and kind of are not. Causes lots of user confusion.
> There are obvious users-doing-dumb-stuff scenarios that arise though.
> I guess the Pig philosophy is that the user is the optimizer, though.. so
> maybe it's ok.
>
> -D
>
> On Fri, Jun 11, 2010 at 6:42 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>
> > Would it be possible, and not a ton of work to make the builtin SIZE() work
> > on a relation?  Reason being, I frequently do this:
> >
> > B = GROUP A ALL;
> > C = FOREACH B GENERATE SIZE(A) AS total;
> > DUMP C;
> >
> > And I would rather do this:
> >
> > DUMP SIZE(A);
> >
> > Russ
> >
>

Re: SIZE() of relation

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
It would be cool to just treat relations as bags in the general case. They
kind of are, and kind of are not. Causes lots of user confusion.
There are obvious users-doing-dumb-stuff scenarios that arise though.
I guess the Pig philosophy is that the user is the optimizer, though.. so
maybe it's ok.

-D

On Fri, Jun 11, 2010 at 6:42 PM, Russell Jurney <ru...@gmail.com> wrote:

> Would it be possible, and not a ton of work to make the builtin SIZE() work
> on a relation?  Reason being, I frequently do this:
>
> B = GROUP A ALL;
> C = FOREACH B GENERATE SIZE(A) AS total;
> DUMP C;
>
> And I would rather do this:
>
> DUMP SIZE(A);
>
> Russ
>