Posted to user@pig.apache.org by Xavier Stevens <xs...@mozilla.com> on 2010/12/07 20:25:53 UTC

FOREACH and FLATTEN Syntax

I'm currently running into an issue where I have a bag of tuples like so:

>DUMP foo;
 ( {(a,b,c,d,e), (1,2,3,4,5)}, ... , {(f,g,h,i,j), (6,7,8,9,10)} )

Each of the tuples has the same number of fields, so I am trying to
flatten the structure and keep just the 2nd, 4th and 5th elements of
each inner tuple.

>flat_foo = FOREACH foo GENERATE FLATTEN($0) AS (T: tuple(f1:chararray,
f2:chararray, f3:chararray, f4:chararray, f5:chararray));
>DUMP flat_foo;
(a, b, c, d, e)
(1, 2, 3, 4, 5)
...
(f,g,h,i,j)
(6,7,8,9,10)

>subset_foo = FOREACH flat_foo GENERATE T.f2, T.f4, T.f5;
>DUMP subset_foo;

When I do this I end up getting a casting error "ERROR 2997: Unable to
recreate exception from backed error: java.lang.ClassCastException:
java.lang.String cannot be cast to org.apache.pig.data.Tuple".


Anyone know what I am doing wrong here?


Thanks,


-Xavier

Re: FOREACH and FLATTEN Syntax

Posted by Daniel Dai <ji...@yahoo-inc.com>.
When you flatten a bag, you get the fields inside each tuple directly,
not the tuple itself, so there is no tuple T to dereference. The AS
clause in your FOREACH statement is wrong; change it to:
flat_foo = FOREACH foo GENERATE FLATTEN($0) as (f1, f2, f3, f4, f5);

DUMP flat_foo;
(a, b, c, d, e)
(1, 2, 3, 4, 5)
...
(f,g,h,i,j)
(6,7,8,9,10)

subset_foo = FOREACH flat_foo GENERATE f2, f4, f5;
DUMP subset_foo;

(b,d,e)
(2,4,5)
...
(g,i,j)
(7,9,10)
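
As a quick check on the above, here is a minimal sketch, assuming (as in
the question) that foo has a single bag field whose tuples each have five
fields. DESCRIBE shows whether the flattened fields are top-level, and
positional references select the same columns without naming them. The
alias subset_by_pos is a hypothetical name used only for illustration.

```pig
-- Assumption: $0 of foo is a bag of five-field tuples, as in the question.
flat_foo = FOREACH foo GENERATE FLATTEN($0) AS (f1, f2, f3, f4, f5);

-- Print the schema; f1..f5 should be listed as top-level fields,
-- not wrapped in a nested tuple.
DESCRIBE flat_foo;

-- Select the 2nd, 4th and 5th fields by name...
subset_foo = FOREACH flat_foo GENERATE f2, f4, f5;

-- ...or by position ($0 is the first field), which needs no AS clause.
subset_by_pos = FOREACH flat_foo GENERATE $1, $3, $4;
```

Both projections should give the same rows; the positional form can be
handy when the flattened fields have not been given names.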

Daniel
