Posted to user@pig.apache.org by jamal sasha <ja...@gmail.com> on 2012/10/23 17:24:40 UTC

Reading a grouped file

Hi, I have a file in this format:
{(1,123,score) ,(1,124,score)}
{(2,356,score),(2,678,score)}
etc.

I am guessing the person who worked on this forgot to flatten it in the
last step.
How do I read and flatten this?

Re: Reading a grouped file

Posted by Dan Arias <da...@bitgravity.com>.

This is a Pig data type issue. It took me a while to get used to Pig's
types. See http://pig.apache.org/docs/r0.10.0/basic.html#Schemas

Each row of the relation quoted below consists of two tuples of three
values each: two integers and a bytearray. You could read this as follows:

x = load 'file' as (t1:tuple(v1:int, v2:int, s),
                    t2:tuple(v1:int, v2:int, s));

Once read, you can flatten as necessary.

y = foreach x generate FLATTEN(t1), FLATTEN(t2);

--Dan

On Tue, Oct 23, 2012 at 8:24 AM, jamal sasha <ja...@gmail.com> wrote:

> Hi I have a file in format
> {(1,123,score) ,(1,124,score)}
> {(2,356,score),(2,678,score)}
> etc
>
> I  am guessing the person who was working on this forgot to flatten this in
> last step?
> How do I read and flatten this ?
>

Re: Reading a grouped file

Posted by Adam Kawa <ka...@gmail.com>.

What I meant to write is:

a = load 'group.txt' as (b1: bag {t1: tuple(id: int, points: int, type: chararray)});
dump a;

because what you want to read is simply a bag of three-field tuples.
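
The original question also asked how to flatten the data. A minimal sketch of that step, assuming each line of 'group.txt' holds exactly one bag and that the default PigStorage loader parses it into the declared schema (the alias names a and flat are only illustrative):

```pig
-- Load each line as a single bag of (id, points, type) tuples.
a = LOAD 'group.txt' AS (b1: bag {t1: tuple(id: int, points: int, type: chararray)});

-- FLATTEN on a bag emits one output row per tuple in the bag,
-- so the grouping is undone here.
flat = FOREACH a GENERATE FLATTEN(b1) AS (id, points, type);

-- Check the resulting schema and contents.
DESCRIBE flat;
DUMP flat;
```

If the load step works as assumed, each tuple inside a bag should come out as its own three-field row.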

Best,
Adam

2012/10/23 Adam Kawa <ka...@gmail.com>:
> You may run something like:
>
> a = load 'bag.dat' as (b1: bag {t1: tuple(id: int, points: int, type:
> chararray)}, b2: bag {t: tuple(id: int, points: int, type:
> chararray)});
> dump a;
>
> 2012/10/23 jamal sasha <ja...@gmail.com>:
>> Hi I have a file in format
>> {(1,123,score) ,(1,124,score)}
>> {(2,356,score),(2,678,score)}
>> etc
>>
>> I  am guessing the person who was working on this forgot to flatten this in
>> last step?
>> How do I read and flatten this ?

Re: Reading a grouped file

Posted by Adam Kawa <ka...@gmail.com>.

You may run something like:

a = load 'bag.dat' as (b1: bag {t1: tuple(id: int, points: int, type: chararray)},
                       b2: bag {t: tuple(id: int, points: int, type: chararray)});
dump a;

2012/10/23 jamal sasha <ja...@gmail.com>:
> Hi I have a file in format
> {(1,123,score) ,(1,124,score)}
> {(2,356,score),(2,678,score)}
> etc
>
> I  am guessing the person who was working on this forgot to flatten this in
> last step?
> How do I read and flatten this ?