Posted to user@pig.apache.org by Rahul Channe <dr...@googlemail.com> on 2014/06/02 02:54:14 UTC

How to FLATTEN hive column in Pig with ARRAY data type

Hi All,

I have imported a Hive table with a complex data type (ARRAY<STRING>)
into Pig. The alias in Pig looks like this:

grunt> describe A;
A: {cust_id: int,cust_name: chararray,cust_address: {innertuple: (innerfield: chararray)},cust_email: chararray}

grunt> dump A;

(123,phil abc,{(2200),(benjamin avenue),(philadelphia)},tttt@gmail.com)
(124,diego arty,{(44),(atlanta franklin),(florida)},oooo@gmail.com)

cust_address is the ARRAY field from Hive. I want to FLATTEN
cust_address into separate fields.


Expected output
(2200,benjamin avenue,philadelphia)
(44,atlanta franklin,florida)

Please help.

Regards,
Rahul

Re: How to FLATTEN hive column in Pig with ARRAY data type

Posted by Pradeep Gollakota <pr...@gmail.com>.
Awesome... that's the way I would have done it as well.


On Mon, Jun 2, 2014 at 10:14 AM, Rahul Channe <dr...@googlemail.com>
wrote:

> I tried changing the Hive column data type of cust_address from ARRAY to
> STRUCT, then imported the table into Pig.
>
> Now I am able to separate the fields, as below
>
> grunt> Z = load 'cust_info' using org.apache.hcatalog.pig.HCatLoader();
> grunt> describe Z;
> Z: {cust_id: int,cust_name: chararray,cust_address: (house_no: int,street: chararray,city: chararray)}
>
>
> grunt> Y = foreach Z generate cust_address.house_no as house_no, cust_address.street as street, UPPER(cust_address.city) as city;
> grunt> describe Y;
> Y: {house_no: int,street: chararray,city: chararray}
>
> grunt> dump Y;
> (2200,benjamin franklin,PHILADELPHIA)
> (44,atlanta franklin,FLORIDA)

Re: How to FLATTEN hive column in Pig with ARRAY data type

Posted by Rahul Channe <dr...@googlemail.com>.
I tried changing the Hive column data type of cust_address from ARRAY to
STRUCT, then imported the table into Pig.

Now I am able to separate the fields, as below

grunt> Z = load 'cust_info' using org.apache.hcatalog.pig.HCatLoader();
grunt> describe Z;
Z: {cust_id: int,cust_name: chararray,cust_address: (house_no: int,street: chararray,city: chararray)}


grunt> Y = foreach Z generate cust_address.house_no as house_no, cust_address.street as street, UPPER(cust_address.city) as city;
grunt> describe Y;
Y: {house_no: int,street: chararray,city: chararray}

grunt> dump Y;
(2200,benjamin franklin,PHILADELPHIA)
(44,atlanta franklin,FLORIDA)
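
A possible shorthand once the column is a STRUCT (a Pig tuple): FLATTEN on a tuple promotes its fields to top-level columns, so the projections above could be written as one expression. This is a sketch against the alias Z shown above, not something tried in this thread. The alias name X is made up for illustration, and UPPER is left out because FLATTEN by itself cannot transform a single field.

```pig
-- FLATTEN on a tuple (not a bag) keeps one record per input row
-- and turns each field of the tuple into its own column.
X = FOREACH Z GENERATE FLATTEN(cust_address);
DESCRIBE X;
-- The resulting field names typically carry the source prefix,
-- for example cust_address::house_no.
```

Adding AS (house_no, street, city) after the FLATTEN call should give the fields plain names, since the tuple already declares exactly three fields.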


On Mon, Jun 2, 2014 at 1:09 PM, Rahul Channe <dr...@googlemail.com> wrote:

> grunt> B = foreach A generate BagToTuple(cust_address);
>
> grunt> describe B;
> B: {org.apache.pig.builtin.bagtotuple_cust_address_24: (innerfield: chararray)}
>
> grunt> dump B;
> ((2200,benjamin franklin,philadelphia))
> ((44,atlanta franklin,florida))

Re: How to FLATTEN hive column in Pig with ARRAY data type

Posted by Rahul Channe <dr...@googlemail.com>.
grunt> B = foreach A generate BagToTuple(cust_address);

grunt> describe B;
B: {org.apache.pig.builtin.bagtotuple_cust_address_24: (innerfield: chararray)}

grunt> dump B;
((2200,benjamin franklin,philadelphia))
((44,atlanta franklin,florida))




On Mon, Jun 2, 2014 at 12:59 PM, Pradeep Gollakota <pr...@gmail.com>
wrote:

> If you're using the built-in BagToTuple UDF, then you probably don't need
> the FLATTEN operator.
>
> I suspect that your output looks as follows:
>
> 2200
> benjamin avenue
> philadelphia
> ...
>
> Can you confirm that this is what you're seeing?

Re: How to FLATTEN hive column in Pig with ARRAY data type

Posted by Pradeep Gollakota <pr...@gmail.com>.
If you're using the built-in BagToTuple UDF, then you probably don't need
the FLATTEN operator.

I suspect that your output looks as follows:

2200
benjamin avenue
philadelphia
...

Can you confirm that this is what you're seeing?


On Mon, Jun 2, 2014 at 9:52 AM, Rahul Channe <dr...@googlemail.com> wrote:

> Thank you, Pradeep. It worked to a certain extent, but I am having
> difficulty separating the cust_address fields as $0, $1, and so on.
>
>
> Example -
>
> grunt> describe A;
> A: {cust_id: int,cust_name: chararray,cust_address: {innertuple: (innerfield: chararray)},cust_email: chararray}
>
> grunt> dump A;
>
> (123,phil abc,{(2200),(benjamin avenue),(philadelphia)},tttt@gmail.com)
> (124,diego arty,{(44),(atlanta franklin),(florida)},oooo@gmail.com)
>
> grunt> B = foreach A generate FLATTEN(BagToTuple(cust_address));
> grunt> dump B;
> (2200,benjamin franklin,philadelphia)
> (44,atlanta franklin,florida)
>
> grunt> describe B;
> B: {org.apache.pig.builtin.bagtotuple_cust_address_34::innerfield: chararray}
>
>
>
> I am not able to separate the fields in B as $0, $1, and $2. I tried using
> STRSPLIT, but it didn't work.

Re: How to FLATTEN hive column in Pig with ARRAY data type

Posted by Rahul Channe <dr...@googlemail.com>.
Thank you, Pradeep. It worked to a certain extent, but I am having
difficulty separating the cust_address fields as $0, $1, and so on.


Example -

grunt> describe A;
A: {cust_id: int,cust_name: chararray,cust_address: {innertuple: (innerfield: chararray)},cust_email: chararray}

grunt> dump A;

(123,phil abc,{(2200),(benjamin avenue),(philadelphia)},tttt@gmail.com)
(124,diego arty,{(44),(atlanta franklin),(florida)},oooo@gmail.com)

grunt> B = foreach A generate FLATTEN(BagToTuple(cust_address));
grunt> dump B;
(2200,benjamin franklin,philadelphia)
(44,atlanta franklin,florida)

grunt> describe B;
B: {org.apache.pig.builtin.bagtotuple_cust_address_34::innerfield: chararray}



I am not able to separate the fields in B as $0, $1, and $2. I tried using
STRSPLIT, but it didn't work.
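
One workaround that is sometimes suggested for this situation, offered here as an untested sketch rather than something confirmed in this thread: the schema reported by describe B declares a single field (innerfield) even though each dumped tuple holds three values, which is likely why the later values cannot be addressed. Joining the bag into a delimited string with the built-in BagToString and splitting it again with STRSPLIT returns a tuple with no declared schema, so an AS clause can name three fields. Assumptions: the '|' delimiter never occurs in the address values, every bag holds exactly three elements, and the bag keeps the original array order (Pig does not guarantee bag ordering in general). The alias C is made up for illustration.

```pig
-- Join the bag elements with '|' and split them back into a 3-field tuple.
-- The regex escapes the pipe; the limit 3 caps the number of pieces.
B = FOREACH A GENERATE
      FLATTEN(STRSPLIT(BagToString(cust_address, '|'), '\\|', 3))
      AS (house_no:chararray, street:chararray, city:chararray);

-- The fields can then be addressed by name or by position ($0, $1, $2).
C = FOREACH B GENERATE house_no, $1 AS street, UPPER(city) AS city;
```

All three fields come back as chararray here, so house_no would need a cast such as (int)house_no if a numeric type is wanted.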



On Mon, Jun 2, 2014 at 11:50 AM, Pradeep Gollakota <pr...@gmail.com>
wrote:

> There was a similar question to this one on Stack Overflow a while back.
> The suggestion was to write a custom BagToTuple UDF.
>
>
> http://stackoverflow.com/questions/18544602/how-to-flatten-a-group-into-a-single-tuple-in-pig

Re: How to FLATTEN hive column in Pig with ARRAY data type

Posted by Pradeep Gollakota <pr...@gmail.com>.
There was a similar question to this one on Stack Overflow a while back.
The suggestion was to write a custom BagToTuple UDF.

http://stackoverflow.com/questions/18544602/how-to-flatten-a-group-into-a-single-tuple-in-pig


On Mon, Jun 2, 2014 at 8:46 AM, Pradeep Gollakota <pr...@gmail.com>
wrote:

> Disregard last email.
>
> Sorry... didn't fully understand the question.

Re: How to FLATTEN hive column in Pig with ARRAY data type

Posted by Pradeep Gollakota <pr...@gmail.com>.
Disregard last email.

Sorry... didn't fully understand the question.


On Mon, Jun 2, 2014 at 8:44 AM, Pradeep Gollakota <pr...@gmail.com>
wrote:

> FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;

Re: How to FLATTEN hive column in Pig with ARRAY data type

Posted by Pradeep Gollakota <pr...@gmail.com>.
FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;
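
For context on why this suggestion does not give the expected output: FLATTEN applied to a bag un-nests it into one output record per element, so the address parts end up on separate rows instead of side by side. A sketch against the alias A from the question (the alias name C is made up for illustration, and the order of the output rows is not guaranteed):

```pig
-- FLATTEN on a bag emits one record per tuple in the bag,
-- repeating the other projected fields on every record.
C = FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;
DUMP C;
-- For customer 123 this yields three records, one per address element:
-- (123,phil abc,2200,tttt@gmail.com)
-- (123,phil abc,benjamin avenue,tttt@gmail.com)
-- (123,phil abc,philadelphia,tttt@gmail.com)
```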


On Sun, Jun 1, 2014 at 5:54 PM, Rahul Channe <dr...@googlemail.com> wrote:

> Hi All,
>
> I have imported a Hive table with a complex data type (ARRAY<STRING>)
> into Pig. The alias in Pig looks like this:
>
> grunt> describe A;
> A: {cust_id: int,cust_name: chararray,cust_address: {innertuple: (innerfield: chararray)},cust_email: chararray}
>
> grunt> dump A;
>
> (123,phil abc,{(2200),(benjamin avenue),(philadelphia)},tttt@gmail.com)
> (124,diego arty,{(44),(atlanta franklin),(florida)},oooo@gmail.com)
>
> cust_address is the ARRAY field from Hive. I want to FLATTEN
> cust_address into separate fields.
>
>
> Expected output
> (2200,benjamin avenue,philadelphia)
> (44,atlanta franklin,florida)
>
> please help
>
> Regards,
> Rahul
>