Posted to user@spark.apache.org by Xuelin Cao <xu...@yahoo.com.INVALID> on 2014/12/08 06:08:19 UTC
Spark SQL: How to get the hierarchical element with SQL?
Hi,
I'm generating a Spark SQL table from an offline JSON file.
The difficulty is that the original JSON file has a hierarchical structure, and as a result, this is what I get:
scala> tb.printSchema
root
 |-- budget: double (nullable = true)
 |-- filterIp: array (nullable = true)
 |    |-- element: string (containsNull = false)
 |-- status: integer (nullable = true)
 |-- third_party: integer (nullable = true)
 |-- userId: integer (nullable = true)
As you may have noticed, the table schema has a hierarchical structure (the "element" field is nested under the "filterIp" field). My question is: how do I access the "element" field with SQL?
Re: Spark SQL: How to get the hierarchical element with SQL?
Posted by Alessandro Panebianco <al...@me.com>.
I have worked with complex hierarchical JSON structures, and Spark seems to fail to query them no matter what syntax is used.
Hope this helps,
Regards,
Alessandro
Re: Spark SQL: How to get the hierarchical element with SQL?
Posted by Raghavendra Pandey <ra...@gmail.com>.
Yeah, the dot notation works. It works even for arrays. But I am not sure
if it can handle complex hierarchies.
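For readers who want to see what dot notation over arrays and deeper hierarchies would mean, here is a rough sketch in plain stdlib Python (not Spark code): a resolver that follows a dotted path through nested records, mapping over lists the way Hive-style dot notation maps over arrays. The record and field names below are made up for illustration.

```python
from typing import Any

def resolve(value: Any, path: str) -> Any:
    """Follow a dotted path through dicts, mapping over lists
    the way Hive-style dot notation maps over arrays."""
    for key in path.split("."):
        if isinstance(value, list):
            # Project the field out of every element of the array.
            value = [element[key] for element in value]
        else:
            value = value[key]
    return value

# A made-up record with an array of structs, one level deeper
# than the schema in this thread.
row = {"user": {"addresses": [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}]}}

print(resolve(row, "user.addresses.ip"))  # ['10.0.0.1', '10.0.0.2']
```

Whether Spark SQL of this vintage resolves arbitrarily deep paths like this is exactly the open question in the thread; the sketch only shows the intended semantics.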
Re: Spark SQL: How to get the hierarchical element with SQL?
Posted by Cheng Lian <li...@gmail.com>.
You may access it via something like SELECT filterIp.element FROM tb,
just like Hive. Or if you're using the Spark SQL DSL, you can use
tb.select("filterIp.element".attr).
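For readers without a Spark shell handy, the effect of SELECT filterIp.element FROM tb on the schema from this thread can be sketched with plain stdlib Python (the row data below is made up for illustration; this is not Spark code, and the reading of ".element" as "the elements of the array" is the interpretation suggested above):

```python
import json

# Two made-up rows matching the schema in this thread:
# filterIp is an array of strings, so projecting its elements
# yields the whole array for each row.
rows = json.loads("""
[
  {"budget": 1.5, "filterIp": ["10.0.0.1", "10.0.0.2"],
   "status": 1, "third_party": 0, "userId": 42},
  {"budget": 0.0, "filterIp": ["192.168.0.9"],
   "status": 0, "third_party": 1, "userId": 43}
]
""")

# Rough equivalent of: SELECT filterIp.element FROM tb
result = [row["filterIp"] for row in rows]
print(result)  # [['10.0.0.1', '10.0.0.2'], ['192.168.0.9']]
```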