Posted to user@pig.apache.org by Xavier Stevens <xs...@mozilla.com> on 2010/12/07 20:25:53 UTC
FOREACH and FLATTEN Syntax
I'm currently running into an issue where I have a bag of tuples like so:
>DUMP foo;
( {(a,b,c,d,e), (1,2,3,4,5)}, ... , {(f,g,h,i,j), (6,7,8,9,10)} )
Each one of the tuples has the same number of fields. So I try to
flatten the structure so I can get just the 2nd, 4th and 5th elements
(f2, f4, f5) of each inner tuple.
>flat_foo = FOREACH foo GENERATE FLATTEN($0) AS (T: tuple(f1:chararray,
f2:chararray, f3:chararray, f4:chararray, f5:chararray));
>DUMP flat_foo;
(a, b, c, d, e)
(1, 2, 3, 4, 5)
...
(f,g,h,i,j)
(6,7,8,9,10)
>subset_foo = FOREACH flat_foo GENERATE T.f2, T.f4, T.f5;
>DUMP subset_foo;
When I do this I end up getting a casting error "ERROR 2997: Unable to
recreate exception from backed error: java.lang.ClassCastException:
java.lang.String cannot be cast to org.apache.pig.data.Tuple".
Anyone know what I am doing wrong here?
Thanks,
-Xavier
Re: FOREACH and FLATTEN Syntax
Posted by Daniel Dai <ji...@yahoo-inc.com>.
When you flatten a bag, you get the items inside each tuple as
top-level fields, not the tuple itself, so there is no tuple T to
dereference. The AS clause in your FOREACH statement is wrong; change
it to:
flat_foo = FOREACH foo GENERATE FLATTEN($0) as (f1, f2, f3, f4, f5);
DUMP flat_foo;
(a, b, c, d, e)
(1, 2, 3, 4, 5)
...
(f,g,h,i,j)
(6,7,8,9,10)
subset_foo = FOREACH flat_foo GENERATE f2, f4, f5;
DUMP subset_foo;
(b,d,e)
(2,4,5)
...
(g,i,j)
(7,9,10)
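
The same fix can also be written with explicit field types and with
positional references. This is only a sketch: it assumes foo has a
single field holding a bag of 5-field tuples (the thread does not show
how foo was loaded), and the alias subset_pos is a hypothetical name
added for illustration.

```pig
-- Assumption: foo has one field, a bag of 5-field tuples.
-- FLATTEN on a bag emits one row per inner tuple, with that tuple's
-- fields at the top level, so the AS clause names the fields directly
-- instead of wrapping them in a tuple.
flat_foo = FOREACH foo GENERATE FLATTEN($0)
           AS (f1:chararray, f2:chararray, f3:chararray,
               f4:chararray, f5:chararray);

-- Select the fields by name ...
subset_foo = FOREACH flat_foo GENERATE f2, f4, f5;

-- ... or by zero-based position (hypothetical alias subset_pos).
subset_pos = FOREACH flat_foo GENERATE $1, $3, $4;

DUMP subset_foo;
```

With the original AS (T: tuple(...)) clause, T.f2 asks Pig to treat the
first flattened field as a tuple, but that field is a plain string,
which is consistent with the ClassCastException reported above.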
Daniel