Posted to user@spark.apache.org by lk_spark <lk...@163.com> on 2016/10/19 03:35:54 UTC

how to extract arraytype data to file

Hi all,
I want to read a JSON file and query it with SQL.
The data structure should be:
bid: string (nullable = true)
code: string (nullable = true)
and the JSON data should look like this:
     {"bid":"MzI4MTI5MzcyNw==","code":"罗甸网警"}
     {"bid":"MzI3MzQ5Nzc2Nw==","code":"西早君"}
but my actual JSON data is:
    {"bizs":[ {"bid":"MzI4MTI5MzcyNw==","code":"罗甸网警"},{"bid":"MzI3MzQ5Nzc2Nw==","code":"西早君"}]}
    {"bizs":[ {"bid":"MzI4MTI5Mzcy00==","code":"罗甸网警"},{"bid":"MzI3MzQ5Nzc201==","code":"西早君"}]}

I loaded it with Spark, and the schema looks like this:
root
 |-- bizs: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- bid: string (nullable = true)
 |    |    |-- code: string (nullable = true)

I can select the columns with: df.select("bizs.id","bizs.name")
but the column values come back as arrays:
+--------------------+--------------------+
|                  id|                code|
+--------------------+--------------------+
|[4938200, 4938201...|[罗甸网警, 室内设计师杨焰红, ...|
|[4938300, 4938301...|[SDCS十全九美, 旅梦长大, ...|
|[4938400, 4938401...|[日重重工液压行走回转, 氧老家,...|
|[4938500, 4938501...|[PABXSLZ, 陈少燕, 笑蜜...|
|[4938600, 4938601...|[税海微云, 西域美农云家店, 福...|
+--------------------+--------------------+

What I want is to read these columns as normal rows, not arrays. How can I do that?

2016-10-19


lk_spark
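Why the selected columns come back as arrays: selecting a field through an array of structs (such as bizs.bid) still yields one output row per input record, and collects that field from every element into an array. The sketch below illustrates that behaviour in plain Python (stdlib json only); it is not Spark code, and `select_nested` is a hypothetical helper. It uses the field names from the printed schema (bid, code) and restores the missing opening quote on "bid" so the sample lines parse.

```python
import json
from typing import Dict, List, Optional

# Sample lines from the question (opening quote on "bid" restored so they parse).
SAMPLE = [
    '{"bizs":[{"bid":"MzI4MTI5MzcyNw==","code":"罗甸网警"},{"bid":"MzI3MzQ5Nzc2Nw==","code":"西早君"}]}',
    '{"bizs":[{"bid":"MzI4MTI5Mzcy00==","code":"罗甸网警"},{"bid":"MzI3MzQ5Nzc201==","code":"西早君"}]}',
]


def select_nested(lines: List[str], *fields: str) -> List[Dict[str, Optional[list]]]:
    """Hypothetical helper mimicking select("bizs.<field>", ...) on an
    array<struct> column: one output row per input record, and each
    selected field becomes the list of that field's values across elements."""
    rows = []
    for line in lines:
        bizs = json.loads(line).get("bizs")
        if bizs is None:
            # A null array gives a null value for every selected field
            rows.append({f: None for f in fields})
            continue
        rows.append({
            # A null struct element contributes a null entry to the list
            f: [None if elem is None else elem.get(f) for elem in bizs]
            for f in fields
        })
    return rows
```

Two input records give two rows whose bid and code values are lists, which is the shape shown in the table above.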

Re: Re: how to extract arraytype data to file

Posted by lk_spark <lk...@163.com>.

Thank you all. explode() is helpful:

df.selectExpr("explode(bizs) as e").select("e.*").show()
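To spell out what the one-liner above does, here is a plain-Python sketch (stdlib json only, not the Spark API) of the same flattening applied to the sample lines from the question. `explode_bizs` and its `outer` flag are hypothetical names; the `outer` branch mirrors Spark's explode_outer, which is only available in Spark 2.2 and later.

```python
import json
from typing import Dict, Iterable, Iterator, Optional

FIELDS = ("bid", "code")  # struct fields from the printed schema

SAMPLE = [
    '{"bizs":[{"bid":"MzI4MTI5MzcyNw==","code":"罗甸网警"},{"bid":"MzI3MzQ5Nzc2Nw==","code":"西早君"}]}',
    '{"bizs":[{"bid":"MzI4MTI5Mzcy00==","code":"罗甸网警"},{"bid":"MzI3MzQ5Nzc201==","code":"西早君"}]}',
]


def explode_bizs(lines: Iterable[str], outer: bool = False) -> Iterator[Dict[str, Optional[str]]]:
    """Plain-Python analogue of
        df.selectExpr("explode(bizs) as e").select("e.*")
    Every array element becomes its own row and the struct fields become
    top-level columns. A null or empty array produces no rows, unless
    outer=True, which emits one all-null row (like explode_outer)."""
    for line in lines:
        bizs = json.loads(line).get("bizs")
        if not bizs:  # None or []
            if outer:
                yield {f: None for f in FIELDS}
            continue
        for elem in bizs:
            # A null element becomes a row of nulls, as e.* would give
            yield {f: (None if elem is None else elem.get(f)) for f in FIELDS}
```

Two input records produce four rows, one per bid/code pair, which is the flat shape asked for in the question. Dropping records with empty arrays is often fine for querying, but it loses them silently, which is why the outer variant exists.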

2016-10-19 

lk_spark 



From: Hyukjin Kwon <gu...@gmail.com>
Sent: 2016-10-19 13:16
Subject: Re: how to extract arraytype data to file
To: "Divya Gehlot"<di...@gmail.com>
Cc: "lk_spark"<lk...@spark.apache.org>

This reminds me of https://github.com/databricks/spark-xml/issues/141#issuecomment-234835577

Maybe using explode() would be helpful.


Thanks!



Re: how to extract arraytype data to file

Posted by Hyukjin Kwon <gu...@gmail.com>.

This reminds me of
https://github.com/databricks/spark-xml/issues/141#issuecomment-234835577

Maybe using explode() would be helpful.

Thanks!

2016-10-19 14:05 GMT+09:00 Divya Gehlot <di...@gmail.com>:

> http://stackoverflow.com/questions/33864389/how-can-i-create-a-spark-dataframe-from-a-nested-array-of-struct-element
>
> Hope this helps
>
>
> Thanks,
> Divya

Re: how to extract arraytype data to file

Posted by Divya Gehlot <di...@gmail.com>.

http://stackoverflow.com/questions/33864389/how-can-i-create-a-spark-dataframe-from-a-nested-array-of-struct-element

Hope this helps


Thanks,
Divya


RE: how to extract arraytype data to file

Posted by "Kappaganthu, Sivaram (ES)" <Si...@ADP.com>.

There is a function called explode() for this.

From: lk_spark [mailto:lk_spark@163.com]
Sent: Wednesday, October 19, 2016 9:06 AM
To: user.spark
Subject: how to extract arraytype data to file

