Posted to dev@pig.apache.org by pi song <pi...@gmail.com> on 2008/05/16 01:16:41 UTC

Confused about COGroup semantic

Normally we do COGroup like this:-

X = COGroup A By $0, B By $0 ;

The first column of the output will be a data atom.

But if we do:-

X = COGroup A By $0, B By $0, $1 ;

What is the first column then? I assume the B grouping will be wrapped
in a tuple and treated as an atom. Am I right?

Pi

Re: Confused about COGroup semantic

Posted by pi song <pi...@gmail.com>.
I don't like it either, but it seems we just have to live with one
operator having two semantics.

Does anyone have a better solution?

Goals:-
- Consistent semantics
- Ease of use


On Fri, May 16, 2008 at 2:30 PM, Utkarsh Srivastava <ut...@yahoo-inc.com>
wrote:

> >
> > The first output column will have to be wrapped in Tuple whereas if we group
> > by only one column we don't have to wrap. Is that the right logic?
> >
>
> Yes, that is how it has worked so far.
>
> However, I am not a big fan of this logic since it is confusing at
> times and leads to several special cases in the code. I think it will be
> cleaner to always wrap in a tuple. But that has 2 disadvantages:
>
> i) Will break backward compatibility
> ii) Will lead to non-flat tuples which users won't be able to store
> using default storage functions.
>
> Utkarsh
>
>
>
> > Pi
> >
> > On 5/16/08, Alan Gates <ga...@yahoo-inc.com> wrote:
> > >
> > > There really isn't any meaning to cogrouping with one field on one relation
> > > and two fields on another.  Given our definition of tuple, there will never
> > > be any tuples that match.  I believe Santhosh has changed this to be a
> > > syntax error.
> > >
> > > Alan.
> > >
> > > pi song wrote:
> > >
> > >> Normally we do COGroup like this:-
> > >>
> > >> X = COGroup A By $0, B By $0 ;
> > >>
> > >> The first column of the output will be a data atom.
> > >>
> > >> But if we do:-
> > >>
> > >> X = COGroup A By $0, B By $0, $1 ;
> > >>
> > >> What is the first column then? I assume the B grouping will be wrapped
> > >> in a tuple and treated as an atom. Am I right?
> > >>
> > >> Pi
> > >>
> > >>
> > >>
> > >
>

RE: Confused about COGroup semantic

Posted by Utkarsh Srivastava <ut...@yahoo-inc.com>.
> 
> The first output column will have to be wrapped in Tuple whereas if we group
> by only one column we don't have to wrap. Is that the right logic?
> 

Yes, that is how it has worked so far. 

However, I am not a big fan of this logic since it is confusing at
times and leads to several special cases in the code. I think it will be
cleaner to always wrap in a tuple. But that has 2 disadvantages:

i) Will break backward compatibility
ii) Will lead to non-flat tuples which users won't be able to store
using default storage functions.

Utkarsh
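
To make the two behaviours concrete, here is a minimal Python sketch of the
wrapping rule discussed above. It is only a model, not Pig's implementation:
the function name cogroup, the always_wrap flag and the sample relations are
all hypothetical, and relations are assumed to be plain in-memory lists of
tuples.

```python
from collections import defaultdict

def cogroup(relations, key_columns, always_wrap=False):
    """Toy model of COGroup: maps each group key to one bag per relation.

    relations   -- list of relations, each a list of tuples (assumed shape)
    key_columns -- one list of column indexes per relation
    always_wrap -- hypothetical switch for the "always wrap" alternative
    """
    groups = defaultdict(lambda: [[] for _ in relations])
    for index, (rows, columns) in enumerate(zip(relations, key_columns)):
        for row in rows:
            key = tuple(row[c] for c in columns)
            if len(columns) == 1 and not always_wrap:
                key = key[0]  # behaviour so far: a single key stays an atom
            groups[key][index].append(row)
    return dict(groups)

# Hypothetical sample data
A = [("a", 1), ("b", 2)]
B = [("a", 1), ("a", 9)]

one_key = cogroup([A, B], [[0], [0]])                    # keys are atoms
two_keys = cogroup([A, B], [[0, 1], [0, 1]])             # keys are tuples
wrapped = cogroup([A, B], [[0], [0]], always_wrap=True)  # keys are 1-tuples
```

With always_wrap=True the first output column has the same shape whatever
the number of grouping columns, which is the consistency argument; the cost
is that even a single-column group key becomes a nested tuple.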



> Pi
> 
> On 5/16/08, Alan Gates <ga...@yahoo-inc.com> wrote:
> >
> > There really isn't any meaning to cogrouping with one field on one relation
> > and two fields on another.  Given our definition of tuple, there will never
> > be any tuples that match.  I believe Santhosh has changed this to be a
> > syntax error.
> >
> > Alan.
> >
> > pi song wrote:
> >
> >> Normally we do COGroup like this:-
> >>
> >> X = COGroup A By $0, B By $0 ;
> >>
> >> The first column of the output will be a data atom.
> >>
> >> But if we do:-
> >>
> >> X = COGroup A By $0, B By $0, $1 ;
> >>
> >> What is the first column then? I assume the B grouping will be wrapped
> >> in a tuple and treated as an atom. Am I right?
> >>
> >> Pi
> >>
> >>
> >>
> >

Re: Confused about COGroup semantic

Posted by pi song <pi...@gmail.com>.
Then how about:-
 X = COGroup A By $0,$1,
                    B By $0, $1 ;

The first output column will have to be wrapped in a Tuple, whereas if we group
by only one column we don't have to wrap. Is that the right logic?

Pi

On 5/16/08, Alan Gates <ga...@yahoo-inc.com> wrote:
>
> There really isn't any meaning to cogrouping with one field on one relation
> and two fields on another.  Given our definition of tuple, there will never
> be any tuples that match.  I believe Santhosh has changed this to be a
> syntax error.
>
> Alan.
>
> pi song wrote:
>
>> Normally we do COGroup like this:-
>>
>> X = COGroup A By $0, B By $0 ;
>>
>> The first column of the output will be a data atom.
>>
>> But if we do:-
>>
>> X = COGroup A By $0, B By $0, $1 ;
>>
>> What is the first column then? I assume the B grouping will be wrapped
>> in a tuple and treated as an atom. Am I right?
>>
>> Pi
>>
>>
>>
>

Re: Confused about COGroup semantic

Posted by Alan Gates <ga...@yahoo-inc.com>.
There really isn't any meaning to cogrouping with one field on one 
relation and two fields on another.  Given our definition of tuple, 
there will never be any tuples that match.  I believe Santhosh has 
changed this to be a syntax error.

Alan.
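
The point about mismatched key sizes can be illustrated with a small Python
sketch. It is a model under stated assumptions, not Pig code: every group key
is treated as a tuple, tuples of different sizes never compare equal, and the
name cogroup_wrapped and the sample relations are hypothetical.

```python
from collections import defaultdict

def cogroup_wrapped(relations, key_columns):
    """Toy COGroup where every group key is a tuple of the key columns."""
    groups = defaultdict(lambda: [[] for _ in relations])
    for index, (rows, columns) in enumerate(zip(relations, key_columns)):
        for row in rows:
            groups[tuple(row[c] for c in columns)][index].append(row)
    return dict(groups)

# Hypothetical sample data: both relations hold the same rows
A = [("a", 1), ("b", 2)]
B = [("a", 1), ("b", 2)]

# A is keyed on $0 only, B on $0 and $1
mismatched = cogroup_wrapped([A, B], [[0], [0, 1]])

# Keys that received rows from both relations
shared = [key for key, (bag_a, bag_b) in mismatched.items() if bag_a and bag_b]
```

Even though A and B contain identical rows, no key gets rows from both
sides, so every group has one empty bag. That is why rejecting the statement
up front is a reasonable choice.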

pi song wrote:
> Normally we do COGroup like this:-
>
> X = COGroup A By $0, B By $0 ;
>
> The first column of the output will be a data atom.
>
> But if we do:-
>
> X = COGroup A By $0, B By $0, $1 ;
>
> What is the first column then? I assume the B grouping will be wrapped
> in a tuple and treated as an atom. Am I right?
>
> Pi
>
>