Posted to user@pig.apache.org by Mat Kelcey <ma...@gmail.com> on 2011/07/11 07:47:01 UTC
trouble with syntax for flatten in a foreach
hi,
i've got a pretty simple transform of data i need to do and i can't for the
life of me work it out.
i feel like i'm missing something trivial...
i want to go from this...
person key value
bob age 25
bob colour red
fred age 30
fred food bagels
to this...
person age colour food
bob 25 red null
fred 30 null bagels
here's the best i can do....
> data = load 'blah' as (uid:chararray, key:chararray, value:chararray);
-- data: {uid: chararray,key: chararray,value: chararray}
(bob,age,25)
(bob,colour,red)
(fred,age,30)
(fred,food,bagels)
> split data into
by_age if key=='age',
by_colour if key=='colour',
by_food if key=='food';
> cogrouped = cogroup by_age by uid, by_colour by uid, by_food by uid;
-- cogrouped: {group: chararray,by_age: {(uid: chararray,key:
chararray,value: chararray)},by_colour: {(uid: chararray,key:
chararray,value: chararray)},by_food: {(uid: chararray,key: chararray,value:
chararray)}}
(bob,{(bob,age,25)},{(bob,colour,red)},{})
(fred,{(fred,age,30)},{},{(fred,food,bagels)})
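The empty bag in bob's food column comes straight from COGROUP. Each group key gets one bag per input relation, and a relation with no matching tuples contributes an empty bag. Below is a small Python model of that output for this data. It is a sketch, not Pig: relations are plain lists of tuples, bags are lists, and the `cogroup` function name is hypothetical.

```python
# Hypothetical Python model of Pig's COGROUP (not Pig itself).
# Each relation is a list of (uid, key, value) tuples, as produced by SPLIT.
by_age = [("bob", "age", "25"), ("fred", "age", "30")]
by_colour = [("bob", "colour", "red")]
by_food = [("fred", "food", "bagels")]


def cogroup(*relations):
    """One output row per distinct uid, holding one bag (list) per input
    relation; a relation with no matching tuples contributes an empty bag."""
    uids = sorted({t[0] for rel in relations for t in rel})
    return [
        (uid, *[[t for t in rel if t[0] == uid] for rel in relations])
        for uid in uids
    ]


for row in cogroup(by_age, by_colour, by_food):
    print(row)
```

The rows are sorted only to make the model's output deterministic. Pig itself does not guarantee group order.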
> flattened = foreach cogrouped generate group as uid, by_age.value as age,
by_colour.value as colour, by_food.value as food;
-- flattened: {uid: chararray,age: {(value: chararray)},colour: {(value:
chararray)},food: {(value: chararray)}}
(bob,{(25)},{(red)},{})
(fred,{(30)},{},{(bagels)})
any attempt to call flatten on the bags, e.g.
> flattened = foreach cogrouped generate group as uid,
flatten(by_food.value) as food;
and i lose the entries that had an empty bag for food (e.g. bob in this case).
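The lost rows are expected. FLATTEN emits one output record per tuple in the bag, crossed with the other generated fields, so an empty bag produces no records at all. The following is a minimal Python model of that cross-product behaviour. The names are hypothetical and this is not Pig's implementation.

```python
from itertools import product

# Hypothetical model: (group, by_food.value) pairs from the cogroup above.
cogrouped = [
    ("bob", []),              # bob has no food entry -> empty bag
    ("fred", [("bagels",)]),
]


def generate_flatten(rows):
    """Model of GENERATE group, FLATTEN(bag): the cross product of the
    scalar field with the bag's tuples, so an empty bag yields no rows."""
    out = []
    for uid, bag in rows:
        for uid_t, value_t in product([(uid,)], bag):
            out.append(uid_t + value_t)   # concatenate, as FLATTEN does
    return out


print(generate_flatten(cogrouped))   # bob has disappeared
```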
i've got a feeling isempty might get me somewhere and
> flattened = foreach cogrouped generate
group as uid,
(IsEmpty(by_food.value) ? 0 : 1);
(bob,0)
(fred,1)
but any attempt to use a real value in there fails; i can't get the syntax
correct.
> flattened = foreach cogrouped generate
group as uid,
(IsEmpty(by_food.value) ? {} : by_food.value);
not sure how to define an empty bag for the left-hand side of the bincond?
i must be missing something fundamental somewhere.
help me Obi-Wan Kenobi, you're my only hope.
cheers,
mat
Re: trouble with syntax for flatten in a foreach
Posted by Mat Kelcey <ma...@gmail.com>.
i take it all back
generate group as uid,
flatten((IsEmpty(fil_height) ? {('')} : fil_height.value)) as height;
does work
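This works because the bincond replaces an empty bag with a bag holding a single placeholder tuple. FLATTEN then has exactly one tuple to emit, so the row survives. Below is a Python model of that idea, using hypothetical names. With `{('')}` the placeholder is an empty string rather than null, so bob's missing value comes out as '' instead of the null in the desired output. Passing `(None,)` models what a null placeholder would give. The sketch makes no claim about which bag-constant literals Pig accepts.

```python
from itertools import product

# Hypothetical model: (group, bag) pairs, as in the thread's example.
cogrouped = [
    ("bob", []),
    ("fred", [("bagels",)]),
]


def generate_flatten_with_default(rows, placeholder=("",)):
    out = []
    for uid, bag in rows:
        # Model of (IsEmpty(bag) ? {('')} : bag): never hand FLATTEN an empty bag.
        bag = bag if bag else [placeholder]
        for uid_t, value_t in product([(uid,)], bag):
            out.append(uid_t + value_t)
    return out


print(generate_flatten_with_default(cogrouped))
print(generate_flatten_with_default(cogrouped, placeholder=(None,)))
```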
thanks for the help
mat
On 11 July 2011 15:44, Mat Kelcey <ma...@gmail.com> wrote:
> Thanks Thejas,
> I was using Pig 0.9 (last night's trunk) and couldn't get the bincond +
> flatten combo to work...
> I'll reproduce it tonight (if i get time) and reply with the exact error messages...
> Cheers,
> Mat
>
> On 11 July 2011 12:21, Thejas Nair <th...@hortonworks.com> wrote:
>
>> The nested-foreach statement is your friend!
>>
>> l = load 'b.pig' as (uid:chararray, key:chararray, value:chararray);
>> g = group l by uid;
>> f = foreach g {
>> fil_age = filter l by key == 'age';
>> fil_colour = filter l by key == 'colour' ;
>> fil_food = filter l by key == 'food';
>>
>> generate group as uid,
>> MAX(fil_age.value) as age,
>> MAX(fil_colour.value) as colour,
>> MAX(fil_food.value) as food;
>> }
>>
>> I have used Jacob's idea of using MAX; i think that's cleaner than
>> flatten + bincond for this use case.
>>
>> The flatten + bincond syntax in your example should work in 0.9, which has
>> some fixes for schema-merging issues.
>>
>> -Thejas
>
Re: trouble with syntax for flatten in a foreach
Posted by Thejas Nair <th...@hortonworks.com>.
The nested-foreach statement is your friend!
l = load 'b.pig' as (uid:chararray, key:chararray, value:chararray);
g = group l by uid;
f = foreach g {
fil_age = filter l by key == 'age';
fil_colour = filter l by key == 'colour';
fil_food = filter l by key == 'food';
generate group as uid,
MAX(fil_age.value) as age,
MAX(fil_colour.value) as colour,
MAX(fil_food.value) as food;
}
I have used Jacob's idea of using MAX; i think that's cleaner than
flatten + bincond for this use case.
The flatten + bincond syntax in your example should work in 0.9, which has
some fixes for schema-merging issues.
-Thejas
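The nested FOREACH amounts to a pivot. It groups by uid, picks out each key's values inside the group, and reduces each (possibly empty) bag to a single value. The Python model below sketches that logic. It is not Pig, and the `pivot` name is hypothetical. It also assumes, as the suggestion relies on, that MAX over an empty bag yields null, which is modelled here as None.

```python
from collections import defaultdict

# The sample input from the original question: (uid, key, value).
rows = [
    ("bob", "age", "25"),
    ("bob", "colour", "red"),
    ("fred", "age", "30"),
    ("fred", "food", "bagels"),
]


def pivot(rows, keys=("age", "colour", "food")):
    """Model of GROUP l BY uid followed by a nested FOREACH that filters
    each key and takes MAX of the matching values."""
    groups = defaultdict(list)          # GROUP l BY uid
    for uid, key, value in rows:
        groups[uid].append((key, value))
    out = []
    for uid in sorted(groups):          # sorted only for deterministic output
        record = [uid]
        for k in keys:
            # nested FILTER l BY key == k, projected to .value
            values = [v for key, v in groups[uid] if key == k]
            # assumed: MAX of an empty bag is null, modelled as None
            record.append(max(values) if values else None)
        out.append(tuple(record))
    return out


for record in pivot(rows):
    print(record)
```

Each person has at most one value per key in this data, so MAX simply returns that value. In this model, duplicate keys would keep the lexicographically largest string.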