Posted to user@pig.apache.org by Thomas Edison <ju...@gmail.com> on 2013/05/22 19:01:52 UTC
Use Pig to parse JSON objects
Hi all,
I have two fields in my Pig input file. Let's say product_id and
description. Description is a JSON object that describes the product.
Is there anything in Pig, other than writing a custom UDF, that can parse
the JSON object so that I get something like product_id, product_property,
product_property_value? product_property and product_property_value are
parsed from the description JSON object. Also, one product could have
multiple product_property entries.
Thanks.
T.E.
Re: Use Pig to parse JSON objects
Posted by Ryan Compton <co...@gmail.com>.
I've been using Twitter's Elephant Bird and have been very happy with
it so far. Here's an example of parsing nested JSON with it:
json_eb = LOAD '$IN_DIRS'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (json:map[]);

-- parse json with twitter's library
parsed0 = FOREACH json_eb GENERATE
    STRSPLIT(json#'id', ':').$2 AS tweetId:chararray,
    STRSPLIT(json#'actor'#'id', ':').$2 AS userId:chararray,
    json#'postedTime' AS postedTime:chararray,
    json#'twitter_entities'#'urls' AS userPostedLinks:bag{T:(urlTypes:map[])};
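
For the layout in the original question (a product_id column next to a
description column holding a JSON string), a rough, untested sketch might
look like the following. It assumes the input is tab-delimited, that the
Elephant Bird and json-simple jars are available under the placeholder
names shown, and that the JsonStringToMap UDF from Elephant Bird's
piggybank package is used to turn the string into a map. 'color' is a
hypothetical property name, not something from the original data.

-- jar names are placeholders; other dependency jars may also be needed
REGISTER 'elephant-bird-pig.jar';
REGISTER 'json-simple.jar';

DEFINE JsonStringToMap
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap();

-- assumption: one product per line, fields separated by a tab
products = LOAD '$IN_DIRS' USING PigStorage('\t')
    AS (product_id:chararray, description:chararray);

-- turn the JSON string into a Pig map keyed by property name
with_props = FOREACH products GENERATE
    product_id,
    JsonStringToMap(description) AS props;

-- look up one known property ('color' is hypothetical)
colors = FOREACH with_props GENERATE
    product_id,
    'color' AS product_property,
    props#'color' AS product_property_value;

This only covers properties whose names are known in advance. Turning
every key/value pair of the map into its own row would need an extra
step that is not shown here.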