Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2017/12/23 11:50:43 UTC
How to use schema from one of the columns of a dataset to parse
another column and create a flattened dataset using Spark Streaming 2.2.0?
Hi All,
How can I use the value (a schema) of one column of a dataset to parse
another column and create a flattened dataset using Spark Streaming 2.2.0?
I have the following *source data frame*, which I create by reading
messages from Kafka:
col1: string
col2: json string
col1       | col2
---------------------------------------------------------------------------------------
schemaUri1 | "{"name": "foo", "zipcode": 11111}"
schemaUri2 | "{"name": "bar", "zipcode": 11112, "id": 1234}"
schemaUri1 | "{"name": "foobar", "zipcode": 11113}"
schemaUri2 | "{"name": "barfoo", "zipcode": 11114, "id": 1235, "interest": "reading"}"
*My target data frame*
name   | zipcode | id   | interest
----------------------------------
foo    | 11111   | null | null
bar    | 11112   | 1234 | null
foobar | 11113   | null | null
barfoo | 11114   | 1235 | reading
*Assume you have the following function*
// This function returns a StructType that represents a schema for a given
// schemaUri
public StructType getSchema(String schemaUri)
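
To make the intended row-by-row behaviour concrete, here is a minimal plain-Java sketch of the semantics I am after. It does not use Spark at all. Everything in it is an assumption for illustration only: `getSchemaFields` is a hypothetical stand-in for `getSchema` that returns field names instead of a StructType, the field lists for schemaUri1 and schemaUri2 are guessed from the sample rows, and `parseFlatJson` is a toy parser that only handles flat objects with string or integer values. In Spark itself, one option might be `from_json` with a single StructType covering the union of all fields, but that is not what this sketch shows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration of the desired flattening; no Spark involved.
class FlattenSketch {

    // Hypothetical stand-in for getSchema(schemaUri): returns field names only.
    // The field lists are assumptions inferred from the sample rows.
    static List<String> getSchemaFields(String schemaUri) {
        switch (schemaUri) {
            case "schemaUri1": return Arrays.asList("name", "zipcode");
            case "schemaUri2": return Arrays.asList("name", "zipcode", "id", "interest");
            default: throw new IllegalArgumentException("unknown schema: " + schemaUri);
        }
    }

    // Matches "key": "string" or "key": integer inside a flat JSON object.
    private static final Pattern FIELD =
        Pattern.compile("\"(\\w+)\"\\s*:\\s*(?:\"([^\"]*)\"|(-?\\d+))");

    // Toy parser: flat objects only, all values kept as strings.
    static Map<String, String> parseFlatJson(String json) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(json);
        while (m.find()) {
            fields.put(m.group(1), m.group(2) != null ? m.group(2) : m.group(3));
        }
        return fields;
    }

    // Columns of the target data frame, in display order.
    static final List<String> TARGET_COLUMNS =
        Arrays.asList("name", "zipcode", "id", "interest");

    // Builds one flattened row: a value for each target column that the row's
    // schema declares and the JSON contains, null otherwise.
    static List<String> flatten(String schemaUri, String json) {
        List<String> declared = getSchemaFields(schemaUri);
        Map<String, String> parsed = parseFlatJson(json);
        List<String> row = new ArrayList<>();
        for (String column : TARGET_COLUMNS) {
            row.add(declared.contains(column) ? parsed.get(column) : null);
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(flatten("schemaUri1", "{\"name\": \"foo\", \"zipcode\": 11111}"));
        System.out.println(flatten("schemaUri2",
            "{\"name\": \"barfoo\", \"zipcode\": 11114, \"id\": 1235, \"interest\": \"reading\"}"));
    }
}
```

Each call to `flatten` corresponds to one row of the source data frame: col1 selects the schema and col2 is parsed against it, with columns the schema or the message lacks left as null.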