
Posted to user@spark.apache.org by Xuelin Cao <xu...@yahoo.com.INVALID> on 2014/12/08 06:08:19 UTC

Spark SQL: How to get the hierarchical element with SQL?

Hi,
    I'm generating a Spark SQL table from an offline JSON file.
    The difficulty is that the original JSON file has a hierarchical structure. As a result, this is what I get:
scala> tb.printSchema
root
 |-- budget: double (nullable = true)
 |-- filterIp: array (nullable = true)
 |    |-- element: string (containsNull = false)
 |-- status: integer (nullable = true)
 |-- third_party: integer (nullable = true)
 |-- userId: integer (nullable = true)
As you may have noticed, the table schema has a hierarchical structure (the "element" field is a sub-field of the "filterIp" field). My question is: how do I access the "element" field with SQL?

Re: Spark SQL: How to get the hierarchical element with SQL?

Posted by Alessandro Panebianco <al...@me.com>.

I went through complex hierarchical JSON structures, and Spark seems to fail at querying them no matter what syntax is used.

Hope this helps,

Regards,

Alessandro


> On Dec 8, 2014, at 6:05 AM, Raghavendra Pandey <ra...@gmail.com> wrote:
> 
> Yeah, the dot notation works. It works even for arrays. But I am not sure if it can handle complex hierarchies. 
> 
> On Mon Dec 08 2014 at 11:55:19 AM Cheng Lian <lian.cs.zju@gmail.com> wrote:
> You may access it via something like SELECT filterIp.element FROM tb, just like Hive. Or if you’re using Spark SQL DSL, you can use tb.select("filterIp.element".attr).
> 

Re: Spark SQL: How to get the hierarchical element with SQL?

Posted by Raghavendra Pandey <ra...@gmail.com>.

Yeah, the dot notation works. It works even for arrays. But I am not sure
if it can handle complex hierarchies.

On Mon Dec 08 2014 at 11:55:19 AM Cheng Lian <li...@gmail.com> wrote:

>  You may access it via something like SELECT filterIp.element FROM tb,
> just like Hive. Or if you’re using Spark SQL DSL, you can use
> tb.select("filterIp.element".attr).
>

Re: Spark SQL: How to get the hierarchical element with SQL?

Posted by Cheng Lian <li...@gmail.com>.

You may access it via something like |SELECT filterIp.element FROM tb|, 
just like Hive. Or if you’re using Spark SQL DSL, you can use 
|tb.select("filterIp.element".attr)|.
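
For reference, here is a minimal sketch of two other ways to reach the strings inside an array column, in case the dot form does not resolve on your version. Note that "element" in the printSchema output is only the label Spark prints for the array's element type. The sketch assumes a Spark 2.x or later build with SparkSession on the classpath (which did not exist when this thread was written), and the object name, helper names, and sample rows are all hypothetical. It uses index access (filterIp[0]) and explode(filterIp), both standard Spark SQL.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; names and sample data are illustrative only.
object FilterIpQueries {

  // Local session for experimentation (assumes Spark SQL is on the classpath).
  def session(): SparkSession =
    SparkSession.builder()
      .appName("filterIp-example")
      .master("local[*]")
      .getOrCreate()

  // Register a small table shaped like the schema in the question.
  def registerSample(spark: SparkSession): Unit = {
    import spark.implicits._
    val tb = Seq(
      (100.0, Seq("10.0.0.1", "10.0.0.2"), 1, 0, 42),
      (250.5, Seq("192.168.1.1"), 0, 1, 43)
    ).toDF("budget", "filterIp", "status", "third_party", "userId")
    tb.createOrReplaceTempView("tb")
  }

  // Index into the array: the first address of each row.
  def firstIps(spark: SparkSession): Seq[(Int, String)] = {
    import spark.implicits._
    spark.sql("SELECT userId, filterIp[0] AS firstIp FROM tb")
      .as[(Int, String)]
      .collect()
      .toSeq
  }

  // Flatten the array: one output row per address.
  def explodedIps(spark: SparkSession): Seq[(Int, String)] = {
    import spark.implicits._
    spark.sql("SELECT userId, explode(filterIp) AS ip FROM tb")
      .as[(Int, String)]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = session()
    registerSample(spark)
    firstIps(spark).foreach(println)
    explodedIps(spark).foreach(println)
    spark.stop()
  }
}
```

Index access suits the case where only one position matters; explode is the usual choice when you need to filter or join on each address individually.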

On 12/8/14 1:08 PM, Xuelin Cao wrote:

>
> Hi,
>
>     I'm generating a Spark SQL table from an offline Json file.
>
>     The difficulty is, in the original json file, there is a 
> hierarchical structure. And, as a result, this is what I get:
>
> scala> tb.printSchema
> root
>  |-- budget: double (nullable = true)
> * |-- filterIp: array (nullable = true)*
> * |    |-- element: string (containsNull = false)*
>  |-- status: integer (nullable = true)
>  |-- third_party: integer (nullable = true)
>  |-- userId: integer (nullable = true)
>
> As you may have noticed, the table schema is with a hierarchical 
> structure ("element" field is a sub-field under the "filterIp" field). 
> Then, my question is, how do I access the "element" field with SQL?
>
>
