Posted to user@pig.apache.org by jamal sasha <ja...@gmail.com> on 2012/10/23 17:24:40 UTC
Reading a grouped file
Hi, I have a file in this format:
{(1,123,score) ,(1,124,score)}
{(2,356,score),(2,678,score)}
etc
I am guessing the person who was working on this forgot to flatten it in
the last step.
How do I read and flatten this?
Re: Reading a grouped file
Posted by Dan Arias <da...@bitgravity.com>.
This is a Pig datatype issue. It took me a while to get used to the Pig
types. See http://pig.apache.org/docs/r0.10.0/basic.html#Schemas
The relation you posted consists of two tuples of three values each: two
integers and a bytearray. You could read it as follows:
x = load 'file' as (t1:tuple(v1:int, v2:int, s), t2:tuple(v1:int, v2:int, s));
Once read, you can flatten as necessary.
y = foreach x generate FLATTEN(t1), FLATTEN(t2);
--Dan
Re: Reading a grouped file
Posted by Adam Kawa <ka...@gmail.com>.
What I actually meant to write is:
a = load 'group.txt' as (b1: bag {t1: tuple(id: int, points: int, type: chararray)});
dump a;
because what you want to read is simply a bag of three-field tuples.
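To then flatten the bag, something along these lines should work. This is only a sketch: it reuses the 'group.txt' path and the field names from the load statement above, and the alias names a and flat_rows are arbitrary. It also assumes each line of the file parses cleanly as a single bag; it is not verified here whether the stray space before the comma in the first sample line gets in the way.

```pig
-- Load each line as one bag of (id, points, type) tuples.
a = load 'group.txt' as (b1: bag {t1: tuple(id: int, points: int, type: chararray)});

-- FLATTEN un-nests the bag: one output record per tuple inside it.
flat_rows = foreach a generate FLATTEN(b1);

dump flat_rows;
```

After the FLATTEN, each record carries the three tuple fields at the top level instead of a single bag field.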
Best,
Adam
Re: Reading a grouped file
Posted by Adam Kawa <ka...@gmail.com>.
You may run something like:
a = load 'bag.dat' as (b1: bag {t1: tuple(id: int, points: int, type: chararray)}, b2: bag {t: tuple(id: int, points: int, type: chararray)});
dump a;