Posted to user@spark.apache.org by gtinside <gt...@gmail.com> on 2014/09/02 22:26:15 UTC

flattening a list in spark sql

Hi,

I am using jsonRDD in Spark SQL and am having trouble iterating through an
array inside the JSON object. Please refer to the schema below:

-- Preferences: struct (nullable = true)
 |    |-- destinations: array (nullable = true)
 |-- user: string (nullable = true)

Sample Data:

-- Preferences: struct (nullable = true)
 |    |-- destinations: ("Paris","NYC","LA","EWR")
 |-- user: "test1"

-- Preferences: struct (nullable = true)
 |    |-- destinations: ("Paris","SFO")
 |-- user: "test2"


I need to run a query that displays the number of users per destination,
as follows:

Number of users:10, Destination:Paris
Number of users:20, Destination:NYC
Number of users:30, Destination:SFO

To get this result, I need to flatten the destinations array, but I am
not sure how to do it. Can you please help?

Gaurav




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/flattening-a-list-in-spark-sql-tp13300.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: flattening a list in spark sql

Posted by gtinside <gt...@gmail.com>.
My bad, please ignore this; it works!





Re: flattening a list in spark sql

Posted by gtinside <gt...@gmail.com>.
Hi,

Thanks, it worked; I really appreciate your help. I have also been trying to
chain multiple LATERAL VIEW clauses, but that doesn't seem to work.

Query:
hiveContext.sql("Select t2 from fav LATERAL VIEW explode(TABS) tabs1 as t1 LATERAL VIEW explode(t1) tabs2 as t2")

Exception:
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 't2, tree:
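
For reference, here is a rough plain-Python sketch of what the two chained
LATERAL VIEW clauses are expected to compute. It assumes TABS is an array of
arrays; the thread does not show the schema of fav, so the sample rows and
the helper name explode are hypothetical. It only emulates the semantics and
does not call Spark.

```python
def explode(rows, column, alias):
    """Yield one copy of each row per element of row[column], bound to `alias`."""
    for row in rows:
        # A missing or empty array contributes no output rows.
        for element in row.get(column) or []:
            yield {**row, alias: element}


# Hypothetical data: two rows whose TABS column is an array of arrays.
fav = [
    {"TABS": [["a", "b"], ["c"]]},
    {"TABS": [["d"]]},
]

# First lateral view: explode(TABS) AS t1.
tabs1 = explode(fav, "TABS", "t1")
# Second lateral view: explode(t1) AS t2, applied to the rows produced above.
tabs2 = explode(tabs1, "t1", "t2")

# SELECT t2
result = [row["t2"] for row in tabs2]
print(result)  # ['a', 'b', 'c', 'd']
```

The second explode can only refer to t1 because the first one has already
added it to every row, which is why the order of the clauses matters.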

Regards,
Gaurav








Re: flattening a list in spark sql

Posted by Michael Armbrust <mi...@databricks.com>.
Yes, you can. HiveContext's functionality is a strict superset of
SQLContext's.


On Tue, Sep 2, 2014 at 6:35 PM, gtinside <gt...@gmail.com> wrote:

> Thanks. I am not using HiveContext. I am loading data from Cassandra,
> converting it to JSON, and then querying it through SQLContext. Can
> I use HiveContext to query a jsonRDD?
>
> Gaurav

Re: flattening a list in spark sql

Posted by gtinside <gt...@gmail.com>.
Thanks. I am not using HiveContext. I am loading data from Cassandra,
converting it to JSON, and then querying it through SQLContext. Can
I use HiveContext to query a jsonRDD?

Gaurav





Re: flattening a list in spark sql

Posted by Michael Armbrust <mi...@databricks.com>.
Check out LATERAL VIEW explode:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
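
For the schema in the question, the approach might look like the sketch
below. The HiveQL string is illustrative only: the table name prefs and the
aliases d and dest are assumptions, not names from this thread, and the query
is not executed here. The plain-Python functions just emulate what explode
followed by GROUP BY would compute on the two sample rows, so the result can
be checked without a Spark cluster.

```python
from collections import Counter

# Sample records mirroring the two rows shown in the question.
records = [
    {"user": "test1", "Preferences": {"destinations": ["Paris", "NYC", "LA", "EWR"]}},
    {"user": "test2", "Preferences": {"destinations": ["Paris", "SFO"]}},
]

# HiveQL sketch (not run here). `prefs`, `d` and `dest` are hypothetical names.
QUERY = """
SELECT dest, COUNT(*) AS num_users
FROM prefs
LATERAL VIEW explode(Preferences.destinations) d AS dest
GROUP BY dest
"""


def explode_destinations(rows):
    """Yield one (user, destination) pair per array element."""
    for row in rows:
        # A missing or null array yields no rows for that user.
        for dest in (row.get("Preferences") or {}).get("destinations") or []:
            yield row["user"], dest


def users_per_destination(rows):
    """Count exploded rows per destination, like GROUP BY dest with COUNT(*)."""
    return Counter(dest for _user, dest in explode_destinations(rows))


counts = users_per_destination(records)
for dest, n in sorted(counts.items()):
    print(f"Number of users:{n}, Destination:{dest}")
```

Because the count is taken over exploded rows, a user who lists the same
destination twice is counted twice; counting distinct users instead would
avoid that.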

