Posted to user@pig.apache.org by kiran chitturi <ch...@gmail.com> on 2013/03/14 04:37:54 UTC

Mapping nested json objects to map data type

Hi!

I am using Pig 0.10 and have a question about mapping nested JSON objects
from HBase.

*For example:*

The command below loads the 'field' column family from HBase.

fields = load 'hbase://documents' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('field:*', '-loadKey true -limit 5')
    as (rowkey, metadata:map[]);

After this command, the metadata field looks like the following (I used
'illustrate fields' to get this):

{fields_j={"tika.Content-Encoding":"ISO-8859-1","distanceToCentroid":0.5761632290266712,"tika.Content-Type":"text/plain; charset=ISO-8859-1","clusterId":118,"tika.parsing":"ok"}}

The map data type has worked as I wanted so far. Now I would like the value
of the 'fields_j' key to be a map as well. I think it is being loaded as a
bytearray by default.

Is there any way to convert it into a map data type? That would make
further processing much easier.

I tried to write a Python UDF, but Jython only supports Python 2.5, and I am
not sure how to convert this string into a dictionary in Python.

Has anyone encountered this type of issue before?

Sorry for the long question; I wanted to explain my problem clearly.

Please let me know your suggestions.

Regards,

-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>
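
For the Python part of the question, a minimal sketch of turning the 'fields_j' string into a dictionary is shown below. It assumes the standard `json` module is available, which was added in Python 2.6 and so is not part of the Jython 2.5 mentioned above; a newer Jython or a bundled JSON library would be needed. The function name `json_to_map` is hypothetical and not taken from any library in this thread.

```python
import json


def json_to_map(raw):
    """Parse a JSON object string into a dict; return None on bad input."""
    if raw is None:
        return None
    # Values read from HBase may arrive as raw bytes; decode before parsing.
    if isinstance(raw, (bytes, bytearray)):
        raw = bytes(raw).decode("utf-8")
    try:
        parsed = json.loads(raw)
    except ValueError:
        # Malformed JSON: return None instead of failing the whole job.
        return None
    # Only a JSON object maps cleanly onto a Pig map.
    return parsed if isinstance(parsed, dict) else None


# Sample value copied from the 'illustrate fields' output above.
sample = (
    '{"tika.Content-Encoding":"ISO-8859-1",'
    '"distanceToCentroid":0.5761632290266712,'
    '"tika.Content-Type":"text/plain; charset=ISO-8859-1",'
    '"clusterId":118,"tika.parsing":"ok"}'
)

fields = json_to_map(sample)
print(fields["clusterId"])
```

To use something like this as a Pig UDF, the function would presumably also need an output schema declaration such as `@outputSchema("m:map[]")` and a `register ... using jython` statement in the script; both are left out here so the sketch runs as plain Python.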

Re: Mapping nested json objects to map data type

Posted by kiran chitturi <ch...@gmail.com>.
Thank you Harsha for your quick reply.

I will test this UDF on my data.




On Wed, Mar 13, 2013 at 11:54 PM, Harsha <ha...@defun.org> wrote:

> Hi Kiran,
>        If you are ok with using java for udfs take a look at this
>
> https://github.com/mozilla-metrics/akela/tree/master/src/main/java/com/mozilla/pig/eval/json
> we Use MapToJson to parse complex json objects from hbase.
> -Harsha

Re: Mapping nested json objects to map data type

Posted by kiran chitturi <ch...@gmail.com>.
Thank you Harsha.

I was able to run my scripts successfully by following the example scripts,
and I now have my JSON object as a map data type.

Thanks again,
Kiran.


On Thu, Mar 14, 2013 at 2:51 AM, Harsha <ha...@defun.org> wrote:

> Hi Kiran,
>       Can you take a look at pig scripts under here
>
>
> https://github.com/mozilla-metrics/telemetry-toolbox/tree/master/src/main/pig
> All of them uses those Json udfs to parse.
> --
> Harsha

Re: Mapping nested json objects to map data type

Posted by Harsha <ha...@defun.org>.
Hi Kiran,

Can you take a look at the Pig scripts here:

https://github.com/mozilla-metrics/telemetry-toolbox/tree/master/src/main/pig
All of them use those JSON UDFs for parsing.
-- 
Harsha


On Wednesday, March 13, 2013 at 11:25 PM, kiran chitturi wrote:

> Hi Harsha,
> 
> I am using the UDF that was in the link
> https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/pig/eval/json/MapToJson.java
> .
> 
> I was able to run it successfully but I had some issues since the output is
> null.


Re: Mapping nested json objects to map data type

Posted by kiran chitturi <ch...@gmail.com>.
Hi Harsha,

I am using the UDF from this link:
https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/pig/eval/json/MapToJson.java

It runs without errors, but the output is null.

Please find my commands below:

----------
fields = load 'hbase://documents' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('field:*', '-loadKey true -limit 5')
    as (rowkey, metadata:map[]);
fields_split = foreach fields generate com.mozilla.pig.eval.json.MapToJson(metadata);
dump fields_split;
----------

The output is 51 empty records. When I ran 'illustrate fields_split', it
gave me the output below.

-----------------------------------------------------------------
| fields | rowkey:bytearray | metadata:map |
-----------------------------------------------------------------
|        | collection100hdfs://LucidN1:50001/input/reuters/reut2-021.sgm-166.txt | {fields_j={"tika.Content-Encoding":"ISO-8859-1","distanceToCentroid":0.5685425678289969,"tika.Content-Type":"text/plain; charset=ISO-8859-1","clusterId":118,"tika.parsing":"ok"}} |
-----------------------------------------------------------------
------------------------------------
| fields_split     | :chararray    |
------------------------------------
|                  |               |
------------------------------------

Am I missing something here? Could you give me a simple working use case of
yours, if you don't mind? All of my records have something in the 'field'
family, so it is quite strange to see empty results.

Please let me know your suggestions.

Thank you,
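
One possible reading of the empty result, based only on what is visible above: 'illustrate' reports the output of MapToJson as a chararray, which suggests the UDF turns a map into JSON text rather than JSON text into a map. The thread does not confirm this as the cause. The sketch below only illustrates the two directions in plain Python with the standard `json` module; `map_to_json` and `json_to_map` are hypothetical names, not the akela API.

```python
import json


def map_to_json(m):
    """Map -> JSON text: the result is a string."""
    return json.dumps(m, sort_keys=True)


def json_to_map(text):
    """JSON text -> map: the result is a dict with addressable keys."""
    return json.loads(text)


# Outer map as loaded from HBase: the value of 'fields_j' is still JSON text.
# The inner value is shortened from the 'illustrate' output above.
metadata = {"fields_j": '{"clusterId":118,"tika.parsing":"ok"}'}

# Parsing the inner value gives a nested map that can be read key by key.
nested = dict(metadata)
nested["fields_j"] = json_to_map(metadata["fields_j"])
print(nested["fields_j"]["clusterId"])

# Serializing goes the other way and yields a string again.
print(map_to_json(nested["fields_j"]))
```

If this reading is right, the conversion needed for 'fields_j' is the second direction, applied to the value stored under that key rather than to the whole metadata map.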


On Wed, Mar 13, 2013 at 11:54 PM, Harsha <ha...@defun.org> wrote:

> Hi Kiran,
>        If you are ok with using java for udfs take a look at this
>
> https://github.com/mozilla-metrics/akela/tree/master/src/main/java/com/mozilla/pig/eval/json
> we Use MapToJson to parse complex json objects from hbase.
> -Harsha

Re: Mapping nested json objects to map data type

Posted by Harsha <ha...@defun.org>.
Hi Kiran,

If you are OK with using Java for UDFs, take a look at this:
https://github.com/mozilla-metrics/akela/tree/master/src/main/java/com/mozilla/pig/eval/json
We use MapToJson to parse complex JSON objects from HBase.
-Harsha


-- 
Harsha


On Wednesday, March 13, 2013 at 8:37 PM, kiran chitturi wrote:

> Hi!
> 
> I am using Pig 0.10 version and I have a question about mapping nested JSON
> objects from Hbase.