Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2017/05/17 23:06:11 UTC

How to flatten struct into a dataframe?

Hi,

I have the following schema, and I am trying to put the structure below into
a DataFrame or Dataset such that each field inside the struct is a column
in the DataFrame.
I tried to follow this link
<http://stackoverflow.com/questions/38753898/how-to-flatten-a-struct-in-a-spark-dataframe>
and did the following.

Dataset<Row> df = ds.select(
    functions.from_json(new Column("value").cast("string"), getSchema()).as("payload"));

Dataset<Row> df1 = df.select(df.col("payload.info"));
df1.printSchema();


root
 |-- info: struct (nullable = true)
 |    |-- index: string (nullable = true)
 |    |-- type: string (nullable = true)
 |    |-- id: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- number: integer (nullable = true)


However, I get the following:

+--------------------+
|                info|
+--------------------+
|[,mango,,fruit...|
|[,apple,,fruit...|

I just want the DataFrame in the format below. Any ideas?

index | type | id | name | number

Thanks!

Re: How to flatten struct into a dataframe?

Posted by kant kodali <ka...@gmail.com>.
Bookmarked that blog post! It answers a lot of my questions.

On Wed, May 17, 2017 at 4:25 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> This blog post walks through ways to manipulate complex data:
> <https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html>
>
> To flatten you can run df.selectExpr("payload.info.*")

Re: How to flatten struct into a dataframe?

Posted by Michael Armbrust <mi...@databricks.com>.
This blog post walks through ways to manipulate complex data:
<https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html>

To flatten the struct, you can run df.selectExpr("payload.info.*"), which
expands each field of payload.info into its own top-level column.
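
For anyone finding this thread later, here is a minimal end-to-end sketch of that suggestion, written in Java to match the original post. Several pieces are assumptions for illustration only: the body of getSchema() is reconstructed from the printSchema() output in the question, the class name FlattenStructExample is hypothetical, and the sample JSON rows are made up rather than taken from the poster's data.

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

public class FlattenStructExample {

    // Assumed stand-in for the thread's getSchema(): an "info" struct with the
    // five fields shown in the printSchema() output of the question.
    public static StructType getSchema() {
        StructType info = new StructType()
            .add("index", DataTypes.StringType)
            .add("type", DataTypes.StringType)
            .add("id", DataTypes.StringType)
            .add("name", DataTypes.StringType)
            .add("number", DataTypes.IntegerType);
        return new StructType().add("info", info);
    }

    // Parses the JSON in the "value" column into a "payload" struct, then
    // expands every field of payload.info into its own top-level column.
    public static Dataset<Row> flatten(Dataset<Row> ds) {
        Dataset<Row> df = ds.select(
            from_json(col("value").cast("string"), getSchema()).as("payload"));
        return df.selectExpr("payload.info.*");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("flatten-struct")
            .getOrCreate();

        // Made-up sample rows; fields missing from the JSON come back as null.
        Dataset<Row> ds = spark.createDataset(
            Arrays.asList(
                "{\"info\":{\"type\":\"mango\",\"name\":\"fruit\",\"number\":1}}",
                "{\"info\":{\"type\":\"apple\",\"name\":\"fruit\",\"number\":2}}"),
            Encoders.STRING()).toDF("value");

        Dataset<Row> flat = flatten(ds);
        flat.printSchema();
        flat.show();

        spark.stop();
    }
}
```

The same expansion can also be written with the Column API as df.select(col("payload.info.*")). The selectExpr form is used here only because it is the one given in the reply above.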
