Posted to user@pig.apache.org by Jay Hacker <ja...@gmail.com> on 2011/04/15 22:45:34 UTC
Looking up two fields in a relation with another relation
I'm trying to replace a couple of fields in a relation with values
looked up in another relation. Here's an example; let's say I have a
relation mapping each integer to its square:
-----map.txt-----
1 1
2 4
3 9
4 16
5 25
Then I have some data, let's call the columns a and b:
-----data.txt-----
1 2
3 4
5 2
I want to replace each number in the data with its square. My basic
approach is to join 'a' with the key, then generate the value; then
join 'b' with the key, and generate that value. Here's my pig script:
m = load 'map.txt' as (k,v);
data = load 'data.txt' as (a,b);
x = join m by k, data by a;
y = foreach x generate v as aa, b;
z = join m by k, y by b;
w = foreach z generate aa, v as bb;
dump w;
This outputs:
(4,4)
(4,4)
(16,16)
The problem is that y's version of v gets replaced with w's version. I
expect it to output:
(1, 4)
(9, 16)
(25, 4)
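For reference, the intended result can be reproduced outside Pig. The following is a minimal sketch in plain Python (not Pig Latin), with the rows copied from map.txt and data.txt above; the names `squares` and `lookup_both` are made up for illustration. It only models what the two joins are meant to compute and says nothing about how Pig executes them.

```python
# Reference model of the intended result (plain Python, not Pig).
# The rows are copied from map.txt and data.txt in the message above.
squares = {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}   # map.txt: k -> v
data = [(1, 2), (3, 4), (5, 2)]               # data.txt: (a, b)


def lookup_both(rows, table):
    """Replace a and b with their looked-up values, like two inner joins."""
    result = []
    for a, b in rows:
        # An inner join drops rows whose key is missing from the table.
        if a in table and b in table:
            result.append((table[a], table[b]))
    return result


w = lookup_both(data, squares)
print(w)  # [(1, 4), (9, 16), (25, 4)]
```

Pig does not guarantee the row order of a join, so a correct run of the script may list the same three tuples in a different order.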
What's weird is I'm pretty sure this used to work in Pig 0.7. If
there's a better way to do this (using maps?), please let me know.
I'm using Pig 0.8 with Cloudera CDH3b4.
Thanks.
Re: Looking up two fields in a relation with another relation
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Hi Daniel,
I did test to see that it was fixed, but the description (as in
the JIRA) did not seem to apply directly to this issue (when I did a
cursory search), hence the query.
Since the columns were getting re-aliased (and after a join in one
case), I was not expecting the initial aliases to apply.
Thanks for clarifying!
Regards,
Mridul
On Saturday 23 April 2011 12:52 AM, Jianyong Dai wrote:
> Hi, Mridul,
> Sorry, I was confused when you said "alias re-use" :). PIG-1705 happens if
> the same column is eventually used twice in a relation. Here in z {m::k,
> m::v, y::aa, y::b}, both m::v and y::aa can be traced back to m.v. I
> did try PIG-1705 and verified that it is the cause. The patch is not
> directly applicable to the 0.8.0 release, since the delta is relative to a
> snapshot taken after the release. Check out from the 0.8 branch or wait for
> 0.8.1 in a few days.
>
> Thanks,
> Daniel
Re: Looking up two fields in a relation with another relation
Posted by Daniel Dai <ji...@yahoo-inc.com>.
Hi, Mridul,
Sorry, I was confused when you said "alias re-use" :). PIG-1705 happens if
the same column is eventually used twice in a relation. Here in z {m::k,
m::v, y::aa, y::b}, both m::v and y::aa can be traced back to m.v. I
did try PIG-1705 and verified that it is the cause. The patch is not
directly applicable to the 0.8.0 release, since the delta is relative to a
snapshot taken after the release. Check out from the 0.8 branch or wait for
0.8.1 in a few days.
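The "same column used twice" condition can be made concrete by tracing each column of z back to the loaded column it came from. The sketch below is a toy illustration in plain Python, not Pig's implementation: the mapping `z_origins` and the helper `shared_origins` are hypothetical names, and the origins are read off the script by hand (y::aa was generated from v in the first join, so it originates from m.v).

```python
# Toy lineage tracker (an illustration only, not Pig's implementation).
from collections import defaultdict

# Each column of z mapped to the loaded column it was derived from.
# y::aa was generated from v in the first join, so its origin is m.v.
z_origins = {
    "m::k": "m.k",
    "m::v": "m.v",
    "y::aa": "m.v",
    "y::b": "data.b",
}


def shared_origins(origins):
    """Return the origins that feed more than one column of the relation."""
    by_origin = defaultdict(list)
    for column, origin in origins.items():
        by_origin[origin].append(column)
    return {o: cols for o, cols in by_origin.items() if len(cols) > 1}


print(shared_origins(z_origins))  # {'m.v': ['m::v', 'y::aa']}
```

Only m.v shows up twice, which matches the two fields that come out identical in the reported output.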
Thanks,
Daniel
On 04/22/2011 12:53 AM, Mridul Muralidharan wrote:
> Alias vs. relation difference.
> The bug is about an alias issue, not a relation one, IIRC.
> Everything comes from a limited number of relations which are loaded
> anyway :-)
>
> - Mridul
Re: Looking up two fields in a relation with another relation
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Alias vs. relation difference.
The bug is about an alias issue, not a relation one, IIRC.
Everything comes from a limited number of relations which are loaded
anyway :-)
- Mridul
On Friday 22 April 2011 06:40 AM, Jianyong Dai wrote:
> m is actually reused. z is joining two relations both stemming from m.
>
> Daniel
Re: Looking up two fields in a relation with another relation
Posted by Daniel Dai <ji...@yahoo-inc.com>.
m is actually reused. z is joining two relations both stemming from m.
Daniel
On 04/19/2011 12:28 AM, Mridul Muralidharan wrote:
> If I am not wrong, PIG-1705 talks about conflicting aliases in a join;
> it would be interesting to see how that affects Jay Hacker's issue, where
> there is no alias re-use from what I saw ...
>
>
> Regards,
> Mridul
Re: Looking up two fields in a relation with another relation
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
If I am not wrong, PIG-1705 talks about conflicting aliases in a join;
it would be interesting to see how that affects Jay Hacker's issue, where
there is no alias re-use from what I saw ...
Regards,
Mridul
On Tuesday 19 April 2011 03:11 AM, Daniel Dai wrote:
> I believe it is PIG-1705.
>
> Daniel
Re: Looking up two fields in a relation with another relation
Posted by Daniel Dai <ji...@yahoo-inc.com>.
I believe it is PIG-1705.
Daniel
On 04/18/2011 12:02 PM, Jay Hacker wrote:
> Thanks. Which Jira issue number is it?
Re: Looking up two fields in a relation with another relation
Posted by Jay Hacker <ja...@gmail.com>.
Thanks. Which Jira issue number is it?
On Fri, Apr 15, 2011 at 9:07 PM, Daniel Dai <ji...@yahoo-inc.com> wrote:
> This is a known bug; it is fixed in the 0.8 svn branch. You can check out from
> http://svn.apache.org/repos/asf/pig/branches/branch-0.8, or wait for 0.8.1,
> coming in a few days.
>
> Daniel
Re: Looking up two fields in a relation with another relation
Posted by Daniel Dai <ji...@yahoo-inc.com>.
This is a known bug; it is fixed in the 0.8 svn branch. You can check out from
http://svn.apache.org/repos/asf/pig/branches/branch-0.8, or wait for
0.8.1, coming in a few days.
Daniel