Posted to user@spark.apache.org by "Chaudhary, Umesh" <Um...@searshc.com> on 2015/07/01 13:45:48 UTC

Json Dataframe formation and Querying

Hi,
I am creating a DataFrame from a JSON file. The schema of the JSON, as accurately shown by dataframe.printSchema(), is:

root
|-- 1-F2: struct (nullable = true)
|    |-- A: string (nullable = true)
|    |-- B: string (nullable = true)
|    |-- C: string (nullable = true)
|-- 10-C4: struct (nullable = true)
|    |-- A: string (nullable = true)
|    |-- D: string (nullable = true)
|    |-- E: string (nullable = true)
|-- 11-B5: struct (nullable = true)
|    |-- A: string (nullable = true)
|    |-- D: string (nullable = true)
|    |-- F: string (nullable = true)
|    |-- G: string (nullable = true)

In the above schema, the struct-type elements {1-F2, 10-C4, 11-B5} are dynamic. This kind of dynamic schema can be easily parsed by any JSON parser (e.g. gson, jackson), and a Map-type structure makes it easy to query back and transform. In Spark 1.4, however, how should I query it using a construct like:

dataframe.select([0]).show()  --> Index based query
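
To make the map-style access I have in mind concrete, here is a minimal sketch in plain Python using the standard json module (not Spark). The sample record and its values are made up to mirror the schema above, and select_by_index is a hypothetical helper name, not an existing API.

```python
import json

# Hypothetical sample record shaped like the printed schema; values are invented.
raw = """
{
  "1-F2":  {"A": "a1", "B": "b1", "C": "c1"},
  "10-C4": {"A": "a2", "D": "d2", "E": "e2"},
  "11-B5": {"A": "a3", "D": "d3", "F": "f3", "G": "g3"}
}
"""

# json.loads returns a dict whose keys keep document order (Python 3.7+).
record = json.loads(raw)


def select_by_index(rec, index):
    """Return (key, value) for the top-level entry at the given position.

    The dynamic key names (1-F2, 10-C4, ...) do not need to be known up front.
    """
    keys = list(rec)   # top-level keys, in document order
    key = keys[index]  # raises IndexError if the position does not exist
    return key, rec[key]


first_key, first_struct = select_by_index(record, 0)
print(first_key, first_struct)
```

This is the kind of lookup by position, without knowing the key name in advance, that I would like to express against the DataFrame.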

I tried saving it as a table and then describing it from the spark-sql REPL, but the REPL is unable to find my table.

What is the preferred way to deal with this type of use case in Spark?

Regards,
Umesh Chaudhary
