Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2017/12/23 11:50:43 UTC

How to use schema from one of the columns of a dataset to parse another column and create a flattened dataset using Spark Streaming 2.2.0?

Hi All,

How can I use the value (a schema) in one column of a dataset to parse
another column and create a flattened dataset using Spark Streaming 2.2.0?

I have the following *source data frame*, which I create by reading
messages from Kafka:

col1: string
col2: json string


   col1       | col2
---------------------------------------------------------------------------
   schemaUri1 | "{"name": "foo", "zipcode": 11111}"
   schemaUri2 | "{"name": "bar", "zipcode": 11112, "id": 1234}"
   schemaUri1 | "{"name": "foobar", "zipcode": 11113}"
   schemaUri2 | "{"name": "barfoo", "zipcode": 11114, "id": 1235, "interest": "reading"}"

*My target data frame*

name   | zipcode | id   | interest
----------------------------------
foo    | 11111   | null | null
bar    | 11112   | 1234 | null
foobar | 11113   | null | null
barfoo | 11114   | 1235 | reading

*Assume you have the following function*

// This function returns a StructType that represents the schema for a given schemaUri

public StructType getSchema(String schemaUri)
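
One direction that might work, shown only as a rough sketch: from_json takes its schema as a fixed argument rather than a per-row value, so instead of looking up the schema row by row, the schemas for all known URIs could be merged into one superset StructType and col2 parsed with that. Fields missing from a given message then come back as null, which matches the target data frame above. This assumes the set of schema URIs is known up front and that fields sharing a name have compatible types. The getSchema body, the class name FlattenBySchema, and the helpers mergeSchemas and flatten are hypothetical stand-ins for illustration, and the source frame is built in batch mode here instead of from Kafka.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

public class FlattenBySchema {

    // Hypothetical stand-in for getSchema(String); the real one is assumed
    // to resolve the URI against some schema registry.
    public static StructType getSchema(String schemaUri) {
        StructType base = new StructType()
            .add("name", DataTypes.StringType)
            .add("zipcode", DataTypes.IntegerType);
        if ("schemaUri2".equals(schemaUri)) {
            return base
                .add("id", DataTypes.IntegerType)
                .add("interest", DataTypes.StringType);
        }
        return base;
    }

    // Union of the fields of all given schemas, in first-seen order.
    // When two schemas define the same field name, the first definition wins.
    public static StructType mergeSchemas(List<String> schemaUris) {
        Map<String, StructField> merged = new LinkedHashMap<>();
        for (String uri : schemaUris) {
            for (StructField field : getSchema(uri).fields()) {
                merged.putIfAbsent(field.name(), field);
            }
        }
        return new StructType(merged.values().toArray(new StructField[0]));
    }

    // Parse col2 with the superset schema and expand the struct into columns.
    public static Dataset<Row> flatten(Dataset<Row> source, StructType superset) {
        return source
            .select(from_json(col("col2"), superset).as("parsed"))
            .select("parsed.*");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("flatten-by-schema")
            .getOrCreate();

        StructType sourceSchema = new StructType()
            .add("col1", DataTypes.StringType)
            .add("col2", DataTypes.StringType);

        // Batch stand-in for the Kafka-derived source data frame.
        List<Row> rows = Arrays.asList(
            RowFactory.create("schemaUri1", "{\"name\": \"foo\", \"zipcode\": 11111}"),
            RowFactory.create("schemaUri2", "{\"name\": \"bar\", \"zipcode\": 11112, \"id\": 1234}"),
            RowFactory.create("schemaUri1", "{\"name\": \"foobar\", \"zipcode\": 11113}"),
            RowFactory.create("schemaUri2",
                "{\"name\": \"barfoo\", \"zipcode\": 11114, \"id\": 1235, \"interest\": \"reading\"}"));
        Dataset<Row> source = spark.createDataFrame(rows, sourceSchema);

        StructType superset = mergeSchemas(Arrays.asList("schemaUri1", "schemaUri2"));
        flatten(source, superset).show();

        spark.stop();
    }
}
```

The same flatten call should apply unchanged to a streaming Dataset<Row>, since it only uses select and from_json. If new schema URIs can appear while the query is running, this sketch would not pick them up, because the merged schema is computed once before the query starts.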