Posted to user@pig.apache.org by Thomas Edison <ju...@gmail.com> on 2013/05/22 19:01:52 UTC

Use Pig to parse JSON objects

Hi all,

I have two fields in my Pig input file, say product_id and
description.  Description is a JSON object that describes the
product.

Is there anything in Pig, other than writing a custom UDF, to parse the JSON
object so that I can get something like product_id, product_property,
product_property_value?  product_property and product_property_value are
parsed from the description JSON object.  Also, one product can have multiple
product_property entries.

Thanks.

T.E.

Re: Use Pig to parse JSON objects

Posted by Ryan Compton <co...@gmail.com>.
I've been using Twitter's Elephant Bird and have been very happy with
it so far. Here's an example of parsing nested JSON with it:

json_eb = LOAD '$IN_DIRS'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (json:map[]);

-- parse JSON with Twitter's library
parsed0 = FOREACH json_eb GENERATE
    STRSPLIT(json#'id', ':').$2 AS tweetId:chararray,
    STRSPLIT(json#'actor'#'id', ':').$2 AS userId:chararray,
    json#'postedTime' AS postedTime:chararray,
    json#'twitter_entities'#'urls' AS userPostedLinks:bag{T:(urlTypes:map[])};
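
To make the target shape from the original question concrete, here is a minimal sketch in plain Python (not Pig) of turning one (product_id, description) record into one row per property. The function name explode_properties and the sample property names are hypothetical, chosen only for illustration, and the sketch assumes the description is a JSON object whose top-level keys are the product properties.

```python
import json


def explode_properties(product_id, description):
    """Return one (product_id, property, value) tuple per top-level JSON key.

    Assumption: `description` is a JSON object encoded as a string.
    """
    props = json.loads(description)
    rows = []
    for name, value in props.items():
        # Nested objects/arrays are re-serialised so each row holds a flat value.
        if isinstance(value, (dict, list)):
            value = json.dumps(value, sort_keys=True)
        rows.append((product_id, name, value))
    return rows


if __name__ == "__main__":
    # Hypothetical sample record: a product with two properties.
    for row in explode_properties("p1", '{"color": "red", "weight": 2}'):
        print(row)
```

The same logic could probably be reused from Pig as a streaming script or a Jython UDF, but that wiring is not shown here and would need checking against your Pig version.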

