Posted to user@pig.apache.org by Lauren Blau <la...@digitalreasoning.com> on 2012/08/14 11:55:44 UTC
add a field, ordered
I want to match up tuples from 2 relations. For each key, the 2 relations
will always have the same number of tuples and match by position (the first
tuple in each is a match, then the second tuple in each, etc.).
so if I have
relation1 = 5,9,7
relation2 = z,a,d
I want to end up with
relation3 = (5,z),(9,a),(7,d)
I figure I need a way to generate a matching key on the ordered tuples of
the relations and then do a cogroup. But I'm stuck on generating the key.
Since adding a field is a projection, I assume this has to be done as part of
a foreach loop. But I'm not sure how I can maintain the order while adding
a field to each tuple.
ideas?
Thanks,
lauren
Re: add a field, ordered
Posted by Alan Gates <ga...@hortonworks.com>.
Take a look at https://issues.apache.org/jira/browse/PIG-2353. I believe that's the JIRA where they're doing the work.
Alan.
On Aug 14, 2012, at 3:38 AM, Lauren Blau wrote:
> Is the source for it available in the development area? I'd be happy to
> help if I can.
> Lauren
Re: add a field, ordered
Posted by Lauren Blau <la...@digitalreasoning.com>.
Is the source for it available in the development area? I'd be happy to
help if I can.
Lauren
On Tue, Aug 14, 2012 at 6:05 AM, Gianmarco De Francisci Morales <
gdfm@apache.org> wrote:
> Hi,
>
> We are finalizing a feature that would solve your problems, something like
> ROW_NUMBER in some SQL dialect, we call it RANK.
> This operator will add a unique consecutive row number to each tuple in the
> relationship.
> Then you will be able to join the two relationships on the rank field.
>
> For the moment being, however, I think there is no easy way to achieve what
> you want to do.
>
> Cheers,
> --
> Gianmarco
Re: add a field, ordered
Posted by Gianmarco De Francisci Morales <gd...@apache.org>.
Hi,
We are finalizing a feature that should solve your problem. It is similar to
ROW_NUMBER in some SQL dialects; we call it RANK.
This operator will add a unique, consecutive row number to each tuple in the
relation.
You will then be able to join the two relations on the rank field.
For the time being, however, I think there is no easy way to achieve what
you want to do.
Cheers,
--
Gianmarco
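
The idea described above (give every tuple a consecutive row number, then join the two relations on that number) can be sketched in plain Python. This is only an illustration of the technique using the example data from the original question. It is not Pig Latin, and it says nothing about how RANK itself is implemented. The helper names add_row_number and join_on_row_number are hypothetical.

```python
# Illustrative only: plain Python lists stand in for Pig relations.
# add_row_number and join_on_row_number are hypothetical helper names.

def add_row_number(relation):
    """Prepend a consecutive 1-based row number to each tuple, keeping input order."""
    return [(n,) + tuple(row) for n, row in enumerate(relation, start=1)]


def join_on_row_number(left, right):
    """Inner-join two numbered relations on their first field (the row number)."""
    right_by_number = {row[0]: row[1:] for row in right}
    return [
        row[1:] + right_by_number[row[0]]  # drop the row number from the result
        for row in left
        if row[0] in right_by_number
    ]


relation1 = [(5,), (9,), (7,)]
relation2 = [("z",), ("a",), ("d",)]

relation3 = join_on_row_number(add_row_number(relation1), add_row_number(relation2))
print(relation3)  # [(5, 'z'), (9, 'a'), (7, 'd')]
```

The sketch relies on both inputs already being in the intended order, which is the same assumption the original question makes.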