Posted to user@spark.apache.org by Xinh Huynh <xi...@gmail.com> on 2016/05/05 18:53:22 UTC

Accessing JSON array in Spark SQL

Hi,

I am having trouble accessing an array element in JSON data with a
dataframe. Here is the schema:

val json1 = """{"f1":"1", "f1a":[{"f2":"2"}]}"""
val rdd1 = sc.parallelize(List(json1))
val df1 = sqlContext.read.json(rdd1)
df1.printSchema()

root
 |-- f1: string (nullable = true)
 |-- f1a: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- f2: string (nullable = true)

I would expect to be able to select the first element of "f1a" this way:
df1.select("f1a[0]").show()

org.apache.spark.sql.AnalysisException: cannot resolve 'f1a[0]' given input
columns f1, f1a;

This is with Spark 1.6.0.

Please help. A follow-up question is: can I access arbitrary levels of
nested JSON array of struct of array of struct?

Thanks,
Xinh

Re: Accessing JSON array in Spark SQL

Posted by Michael Armbrust <mi...@databricks.com>.
Use df.selectExpr to evaluate complex expressions (instead of just column
names).
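For example, a minimal sketch against the df1 from the original message
(Spark 1.6, sqlContext in scope). selectExpr parses its arguments as SQL
expressions, so array indexing and nested field access both resolve; the
Column API variant with getItem/getField is an alternative that works with
plain select():

```scala
// selectExpr treats each string as a full SQL expression, not a column name:
df1.selectExpr("f1a[0]").show()     // first struct element of the array
df1.selectExpr("f1a[0].f2").show()  // drill into the struct's f2 field

// Equivalent using the Column API with an ordinary select():
df1.select(df1("f1a").getItem(0).getField("f2")).show()
```

The same pattern chains to arbitrary depth, which also answers the
follow-up question: for an array of struct of array of struct, an
expression like "f1a[0].inner[1].field" (field names here are
hypothetical) indexes and drills level by level.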
