Posted to user@spark.apache.org by Selvam Raman <se...@gmail.com> on 2017/03/23 21:03:16 UTC

how to read object field within json file

Hi,

{
  "id": "test1",
  "source": {
    "F1": {
      "id": "4970",
      "eId": "F1",
      "description": "test1"
    },
    "F2": {
      "id": "5070",
      "eId": "F2",
      "description": "test2"
    },
    "F3": {
      "id": "5170",
      "eId": "F3",
      "description": "test3"
    },
    "F4": {},
    ...
    "F999": {}
  }
}

I have bzip2-compressed JSON files in the format shown above.
Some JSON rows contain two objects within "source" (like F1 and F2), others
contain five (F1, F2, F3, F4, F5), and so on. So the final schema contains
the combination of all objects that appear in the "source" field.

Every row can contain any number of objects, but only some of them hold
valid records.
How can I retrieve the value of "description" within the "source" field?

source.F1.description returns a result, but how can I get the description
of every object for every row (something like "source.*.description")?

-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"

Re: how to read object field within json file

Posted by Yong Zhang <ja...@hotmail.com>.
I missed the part about passing in a schema to force the "struct" into a Map and then using explode. Good option.


Yong


________________________________
From: Michael Armbrust <mi...@databricks.com>
Sent: Friday, March 24, 2017 3:02 PM
To: Yong Zhang
Cc: Selvam Raman; user
Subject: Re: how to read object field within json file

I'm not sure you can parse this as an Array, but you can hint to the parser that you would like to treat source as a map instead of as a struct.  This is a good strategy when you have dynamic columns in your data.

Here is an example of the schema you can use to parse this JSON and also how to use explode to turn it into separate rows<https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/679071429109042/2840265927289860/latest.html>.  This blog post has more on working with semi-structured data in Spark<https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html>.

On Thu, Mar 23, 2017 at 2:49 PM, Yong Zhang <ja...@hotmail.com> wrote:

That's why your "source" should be defined as an Array[Struct] type (which makes sense in this case, since it has an undetermined length), so you can explode it and get the description easily.

For now, you need to write your own UDF; maybe that can do what you want.

Yong

________________________________
From: Selvam Raman <se...@gmail.com>
Sent: Thursday, March 23, 2017 5:03 PM
To: user
Subject: how to read object field within json file

Hi,

{
  "id": "test1",
  "source": {
    "F1": {
      "id": "4970",
      "eId": "F1",
      "description": "test1"
    },
    "F2": {
      "id": "5070",
      "eId": "F2",
      "description": "test2"
    },
    "F3": {
      "id": "5170",
      "eId": "F3",
      "description": "test3"
    },
    "F4": {},
    ...
    "F999": {}
  }
}

I have bzip2-compressed JSON files in the format shown above.
Some JSON rows contain two objects within "source" (like F1 and F2), others contain five (F1, F2, F3, F4, F5), and so on. So the final schema contains the combination of all objects that appear in the "source" field.

Every row can contain any number of objects, but only some of them hold valid records.
How can I retrieve the value of "description" within the "source" field?

source.F1.description returns a result, but how can I get the description of every object for every row (something like "source.*.description")?

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Re: how to read object field within json file

Posted by Selvam Raman <se...@gmail.com>.
Thank you, Armbrust.

On Fri, Mar 24, 2017 at 7:02 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> I'm not sure you can parse this as an Array, but you can hint to the
> parser that you would like to treat source as a map instead of as a
> struct.  This is a good strategy when you have dynamic columns in your data.
>
> Here is an example of the schema you can use to parse this JSON and also
> how to use explode to turn it into separate rows
> <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/679071429109042/2840265927289860/latest.html>.
> This blog post has more on working with semi-structured data in Spark
> <https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html>
> .
>
> On Thu, Mar 23, 2017 at 2:49 PM, Yong Zhang <ja...@hotmail.com> wrote:
>
>> That's why your "source" should be defined as an Array[Struct] type (which
>> makes sense in this case, since it has an undetermined length), so you can
>> explode it and get the description easily.
>>
>> For now, you need to write your own UDF; maybe that can do what you want.
>>
>> Yong
>>
>> ------------------------------
>> *From:* Selvam Raman <se...@gmail.com>
>> *Sent:* Thursday, March 23, 2017 5:03 PM
>> *To:* user
>> *Subject:* how to read object field within json file
>>
>> Hi,
>>
>> {
>>   "id": "test1",
>>   "source": {
>>     "F1": {
>>       "id": "4970",
>>       "eId": "F1",
>>       "description": "test1"
>>     },
>>     "F2": {
>>       "id": "5070",
>>       "eId": "F2",
>>       "description": "test2"
>>     },
>>     "F3": {
>>       "id": "5170",
>>       "eId": "F3",
>>       "description": "test3"
>>     },
>>     "F4": {},
>>     ...
>>     "F999": {}
>>   }
>> }
>>
>> I have bzip2-compressed JSON files in the format shown above.
>> Some JSON rows contain two objects within "source" (like F1 and F2),
>> others contain five (F1, F2, F3, F4, F5), and so on. So the final schema
>> contains the combination of all objects that appear in the "source" field.
>>
>> Every row can contain any number of objects, but only some of them hold
>> valid records.
>> How can I retrieve the value of "description" within the "source" field?
>>
>> source.F1.description returns a result, but how can I get the description
>> of every object for every row (something like "source.*.description")?
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
>
>


-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"

Re: how to read object field within json file

Posted by Michael Armbrust <mi...@databricks.com>.
I'm not sure you can parse this as an Array, but you can hint to the parser
that you would like to treat source as a map instead of as a struct.  This
is a good strategy when you have dynamic columns in your data.

Here is an example of the schema you can use to parse this JSON and also
how to use explode to turn it into separate rows
<https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/679071429109042/2840265927289860/latest.html>.
This blog post has more on working with semi-structured data in Spark
<https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html>
.
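
For reference, a minimal sketch of that approach (not the notebook linked above; the file path, app name, and column layout are assumptions based on the sample JSON in this thread) might look like this:

// Hint to the JSON reader that "source" is a map from field name (F1..F999)
// to a struct, then explode the map into one row per entry.
// "data.json.bz2" is a placeholder path, not from the original thread.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("json-map-explode").getOrCreate()
import spark.implicits._

val entry = new StructType()
  .add("id", StringType)
  .add("eId", StringType)
  .add("description", StringType)

val schema = new StructType()
  .add("id", StringType)
  .add("source", MapType(StringType, entry))

val df = spark.read.schema(schema).json("data.json.bz2")

// explode on a map yields (key, value) rows, so every F1..F999 entry becomes
// its own row and description is an ordinary column.
df.select($"id", explode($"source"))
  .select($"id", $"key", $"value.description")
  .show()

With the map schema you no longer need to know F1..F999 up front; whatever subset of keys a row happens to contain simply comes through as extra rows after the explode.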

On Thu, Mar 23, 2017 at 2:49 PM, Yong Zhang <ja...@hotmail.com> wrote:

> That's why your "source" should be defined as an Array[Struct] type (which
> makes sense in this case, since it has an undetermined length), so you can
> explode it and get the description easily.
>
> For now, you need to write your own UDF; maybe that can do what you want.
>
> Yong
>
> ------------------------------
> *From:* Selvam Raman <se...@gmail.com>
> *Sent:* Thursday, March 23, 2017 5:03 PM
> *To:* user
> *Subject:* how to read object field within json file
>
> Hi,
>
> {
>   "id": "test1",
>   "source": {
>     "F1": {
>       "id": "4970",
>       "eId": "F1",
>       "description": "test1"
>     },
>     "F2": {
>       "id": "5070",
>       "eId": "F2",
>       "description": "test2"
>     },
>     "F3": {
>       "id": "5170",
>       "eId": "F3",
>       "description": "test3"
>     },
>     "F4": {},
>     ...
>     "F999": {}
>   }
> }
>
> I have bzip2-compressed JSON files in the format shown above.
> Some JSON rows contain two objects within "source" (like F1 and F2),
> others contain five (F1, F2, F3, F4, F5), and so on. So the final schema
> contains the combination of all objects that appear in the "source" field.
>
> Every row can contain any number of objects, but only some of them hold
> valid records.
> How can I retrieve the value of "description" within the "source" field?
>
> source.F1.description returns a result, but how can I get the description
> of every object for every row (something like "source.*.description")?
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>

Re: how to read object field within json file

Posted by Yong Zhang <ja...@hotmail.com>.
That's why your "source" should be defined as an Array[Struct] type (which makes sense in this case, since it has an undetermined length), so you can explode it and get the description easily.

For now, you need to write your own UDF; maybe that can do what you want.
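
For what it's worth, a rough sketch of that UDF route might look like the following. It assumes the default inferred schema (where "source" is a struct whose fields are the F1..F999 sub-structs, each carrying a "description"), and the file path is a placeholder.

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().appName("source-descriptions").getOrCreate()
import spark.implicits._

// With the inferred schema, "source" arrives in the UDF as a Row whose fields
// are the F1..F999 structs (null when absent from a given record).
val df = spark.read.json("data.json.bz2")

// Collect every non-null description, whichever subset of F1..F999 a row has.
// Assumes each non-empty Fx struct exposes a "description" field.
val allDescriptions = udf { (source: Row) =>
  if (source == null) Seq.empty[String]
  else (0 until source.length).flatMap { i =>
    Option(source.getAs[Row](i)).flatMap(f => Option(f.getAs[String]("description")))
  }
}

df.withColumn("descriptions", allDescriptions($"source")).show()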

Yong

________________________________
From: Selvam Raman <se...@gmail.com>
Sent: Thursday, March 23, 2017 5:03 PM
To: user
Subject: how to read object field within json file

Hi,

{
  "id": "test1",
  "source": {
    "F1": {
      "id": "4970",
      "eId": "F1",
      "description": "test1"
    },
    "F2": {
      "id": "5070",
      "eId": "F2",
      "description": "test2"
    },
    "F3": {
      "id": "5170",
      "eId": "F3",
      "description": "test3"
    },
    "F4": {},
    ...
    "F999": {}
  }
}

I have bzip2-compressed JSON files in the format shown above.
Some JSON rows contain two objects within "source" (like F1 and F2), others contain five (F1, F2, F3, F4, F5), and so on. So the final schema contains the combination of all objects that appear in the "source" field.

Every row can contain any number of objects, but only some of them hold valid records.
How can I retrieve the value of "description" within the "source" field?

source.F1.description returns a result, but how can I get the description of every object for every row (something like "source.*.description")?

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"