Posted to user@pig.apache.org by Lauren Blau <la...@digitalreasoning.com> on 2012/08/14 11:55:44 UTC

add a field, ordered

I want to match up tuples from two relations. For each key, the two relations
will always have the same number of tuples, and they match by position (the
first tuples in each are a match, the second tuples in each are a match, and so on).

So if I have
relation1 = 5,9,7
relation2 = z,a,d

I want to end up with

relation3 = (5,z),(9,a),(7,d)

I figure I need a way to generate a matching key on the ordered tuples of
each relation and then do a COGROUP. But I'm stuck on generating the key.
Since adding a field is a projection, I assume this has to be done as part of
a FOREACH statement. But I'm not sure how I can maintain the order while adding
a field to each tuple.
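A minimal sketch of the approach described above, written in plain Python rather than Pig: tag every tuple with its 0-based position within its key, then cogroup the two relations on (key, position) and pair the values. The function names are hypothetical, and it assumes each input list is already in the intended order, which is exactly what a Pig relation does not guarantee on its own.

```python
from collections import defaultdict


def add_position(rows):
    """Tag each (key, value) row with its 0-based position within its key.

    Assumes `rows` is already in the intended order (hypothetical input shape).
    """
    counters = defaultdict(int)
    tagged = []
    for key, value in rows:
        tagged.append((key, counters[key], value))
        counters[key] += 1
    return tagged


def zip_by_position(left, right):
    """Cogroup two relations on (key, position) and pair up their values."""
    # Each group holds one list of values per input relation, like a COGROUP.
    groups = defaultdict(lambda: ([], []))
    for key, pos, value in add_position(left):
        groups[(key, pos)][0].append(value)
    for key, pos, value in add_position(right):
        groups[(key, pos)][1].append(value)
    result = []
    for (key, pos), (left_vals, right_vals) in sorted(groups.items()):
        # With equal per-key counts, each side holds exactly one value here.
        for lv in left_vals:
            for rv in right_vals:
                result.append((key, lv, rv))
    return result
```

With a single hypothetical key "k", zip_by_position([("k", 5), ("k", 9), ("k", 7)], [("k", "z"), ("k", "a"), ("k", "d")]) pairs the values as (5,z), (9,a), (7,d). The hard part in Pig is the first function: assigning positions needs a defined order, which is why a built-in numbering operator helps.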

Any ideas?
Thanks,
lauren

Re: add a field, ordered

Posted by Alan Gates <ga...@hortonworks.com>.
Take a look at PIG-2353 (https://issues.apache.org/jira/browse/PIG-2353). I believe that's the JIRA where they're doing the work.

Alan.

On Aug 14, 2012, at 3:38 AM, Lauren Blau wrote:

> Is the source for it available in the development area? I'd be happy to
> help if I can.
> Lauren


Re: add a field, ordered

Posted by Lauren Blau <la...@digitalreasoning.com>.
Is the source for it available in the development area? I'd be happy to
help if I can.
Lauren

On Tue, Aug 14, 2012 at 6:05 AM, Gianmarco De Francisci Morales <
gdfm@apache.org> wrote:

> Hi,
>
> We are finalizing a feature that would solve your problem: something like
> ROW_NUMBER in some SQL dialects. We call it RANK.
> This operator will add a unique, consecutive row number to each tuple in the
> relation.
> Then you will be able to join the two relations on the rank field.
>
> For the time being, however, I think there is no easy way to achieve what
> you want to do.
>
> Cheers,
> --
> Gianmarco

Re: add a field, ordered

Posted by Gianmarco De Francisci Morales <gd...@apache.org>.
Hi,

We are finalizing a feature that would solve your problem: something like
ROW_NUMBER in some SQL dialects. We call it RANK.
This operator will add a unique, consecutive row number to each tuple in the
relation.
Then you will be able to join the two relations on the rank field.
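As a rough illustration of the rank-then-join idea, here is a small Python sketch. It is not Pig's RANK implementation; the names rank and join_on_rank are hypothetical. Each tuple gets a consecutive 1-based row number (as ROW_NUMBER would assign), and the two relations are then equi-joined on that number.

```python
def rank(relation):
    """Prepend a consecutive 1-based row number to each tuple, like ROW_NUMBER."""
    return [(i,) + tuple(row) for i, row in enumerate(relation, start=1)]


def join_on_rank(left, right):
    """Inner equi-join of two relations on their assigned row numbers."""
    # Row numbers are unique, so a dict lookup acts as the join index.
    right_by_rank = {r[0]: r[1:] for r in rank(right)}
    joined = []
    for row in rank(left):
        if row[0] in right_by_rank:
            # Drop the rank field and concatenate the remaining fields.
            joined.append(row[1:] + right_by_rank[row[0]])
    return joined
```

For the example in the question, join_on_rank([(5,), (9,), (7,)], [("z",), ("a",), ("d",)]) yields (5,z), (9,a), (7,d). The per-key case would number tuples within each key and join on (key, number) instead.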

For the time being, however, I think there is no easy way to achieve what
you want to do.

Cheers,
--
Gianmarco


