Posted to user@pig.apache.org by David Parks <da...@yahoo.com> on 2013/05/22 12:26:26 UTC

RE: [Bulk] pig 0.10.0 JsonLoader and nested list

I'm quite new to Pig, so perhaps my input is off base here, but if you input
one such record without defining the schema I believe the JsonLoader will
define the schema for you, no?  If so, just import one such record and
'describe' the variable to see the schema. Well... if it's not that easy,
then I'm not the right one to answer, but maybe it gets you something quick.
David


-----Original Message-----
From: marian.steinbach@gmail.com [mailto:marian.steinbach@gmail.com] On
Behalf Of Marian Steinbach
Sent: Wednesday, May 22, 2013 4:13 PM
To: user@pig.apache.org
Subject: [Bulk] pig 0.10.0 JsonLoader and nested list

I would like to load a JSON file containing records of the following format:

{
   "area": "ABC",
   "date_day": 1,
   "date_hour": 0,
   ...
   "energy": [["17-16", 1], ["18-17", 2]]
}

The "energy" property represents a sparse matrix. It's a list with an
arbitrary number of key-value pairs (minimum 1). The first element (string)
is the matrix unit key, the second element is the value.

I need both key and value in order to summarize values with matching keys in
my Pig job. I understand that it should be possible to import this as a bag.
Correct?
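
For illustration only, the aggregation described above can be sketched in
plain Python (not Pig). The two sample records and the function name
sum_energy_by_key are made up for this sketch, and the fields elided by
"..." in the record format are simply left out:

```python
import json
from collections import defaultdict

# Two hypothetical records in the format described above (illustrative values).
records = [
    '{"area": "ABC", "date_day": 1, "date_hour": 0, "energy": [["17-16", 1], ["18-17", 2]]}',
    '{"area": "ABC", "date_day": 1, "date_hour": 1, "energy": [["17-16", 5]]}',
]


def sum_energy_by_key(json_lines):
    """Sum the value of every [key, value] pair in "energy", grouped by key."""
    totals = defaultdict(int)
    for line in json_lines:
        record = json.loads(line)
        # Each element of "energy" is a two-element list: [matrix unit key, value].
        for key, value in record["energy"]:
            totals[key] += value
    return dict(totals)


totals = sum_energy_by_key(records)
print(totals)
```

In Pig terms this corresponds to flattening the bag and grouping by the key,
which is why both elements of each pair need to survive the load.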

Can anybody tell me what the schema definition passed to the built-in
JsonLoader function should look like?

Thanks in advance!

Marian


Re: [Bulk] pig 0.10.0 JsonLoader and nested list

Posted by Zhu Wayne <zh...@gmail.com>.
It seems that the JsonLoader schema is not well documented. Could someone give
us more examples?


-- 
Wayne Zhu
847-282-0596 (Google Voice)