Posted to dev@pig.apache.org by Russell Jurney <ru...@gmail.com> on 2010/06/12 03:42:30 UTC
SIZE() of relation
Would it be possible, and not a ton of work, to make the builtin SIZE() work
on a relation? The reason is that I frequently do this:
B = GROUP A ALL;
C = FOREACH B GENERATE SIZE(A) AS total;
DUMP C;
And I would rather do this:
DUMP SIZE(A);
Russ
Re: SIZE() of relation
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Might be ok if we artificially limit this to only work with algebraic
functions.
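To make the "algebraic" restriction concrete, here is a rough sketch in Python (not Pig internals) of why an algebraic function keeps the single-reducer step cheap. The names `initial`, `intermediate`, `final`, `algebraic_count`, and `partitions` are made up for illustration, loosely modeled on the three stages of Pig's Algebraic interface; the three-way split of the relation is an assumption.

```python
def initial(record):
    # Map side: each record contributes a partial count of 1.
    return 1


def intermediate(partials):
    # Combiner: fold the partial counts from one map task into one number.
    return sum(partials)


def final(partials):
    # Single reducer: it receives one small number per map task,
    # not the records themselves.
    return sum(partials)


def algebraic_count(partitions):
    # Each inner list stands in for the records seen by one map task.
    per_task = [intermediate([initial(r) for r in part]) for part in partitions]
    return final(per_task)


# Hypothetical relation A split across three map tasks.
partitions = [["a", "b"], ["c"], ["d", "e", "f"]]
total = algebraic_count(partitions)
print(total)
```

The point of the sketch is that the final stage only sums one partial result per task, so the one-reducer job Alan worries about has very little data to process. A function that needs the whole bag in one place cannot be split up this way.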
On Tue, Jun 15, 2010 at 9:14 AM, Alan Gates <ga...@yahoo-inc.com> wrote:
> There have been several requests for this. I'm not a fan of it, because it
> makes it too easy to forget that you're forcing a single reducer MR job to
> accomplish this. But I'm open to persuasion if everyone else disagrees.
>
> Alan.
Re: SIZE() of relation
Posted by Alan Gates <ga...@yahoo-inc.com>.
There have been several requests for this. I'm not a fan of it,
because it makes it too easy to forget that you're forcing a single
reducer MR job to accomplish this. But I'm open to persuasion if
everyone else disagrees.
Alan.
On Jun 11, 2010, at 7:27 PM, Russell Jurney wrote:
> This would be great. Save us from GROUP ALL/FOREACH, which is
> awkward.
Re: SIZE() of relation
Posted by Russell Jurney <ru...@gmail.com>.
This would be great. Save us from GROUP ALL/FOREACH, which is awkward.
On Fri, Jun 11, 2010 at 7:14 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
> It would be cool to just treat relations as bags in the general case. They
> kind of are, and kind of are not. Causes lots of user confusion.
> There are obvious users-doing-dumb-stuff scenarios that arise though.
> I guess the Pig philosophy is that the user is the optimizer, though.. so
> maybe it's ok.
>
> -D
Re: SIZE() of relation
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
It would be cool to just treat relations as bags in the general case. They
kind of are, and kind of are not, which causes lots of user confusion.
There are obvious users-doing-dumb-stuff scenarios that arise, though.
I guess the Pig philosophy is that the user is the optimizer, so
maybe it's ok.
-D
On Fri, Jun 11, 2010 at 6:42 PM, Russell Jurney <ru...@gmail.com> wrote:
> Would it be possible, and not a ton of work to make the builtin SIZE() work
> on a relation? Reason being, I frequently do this:
>
> B = GROUP A ALL;
> C = FOREACH B GENERATE SIZE(A) AS total;
> DUMP C;
>
> And I would rather do this:
>
> DUMP SIZE(A);
>
> Russ
>