Posted to user@spark.apache.org by Sameer Tilak <ss...@live.com> on 2014/07/25 23:25:01 UTC

Spark SQL and Hive tables

Hi All,
I am trying to load data from Hive tables using Spark SQL. I am using spark-shell. Here is what I see:

val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender, demographics.birth_year, demographics.income_group FROM prod p JOIN demographics d ON d.user_id = p.user_id""")

14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
java.lang.RuntimeException: Table Not Found: prod.

I have these tables in Hive; I confirmed this with the show tables command. Can someone please let me know how I can make them accessible here?

RE: Spark SQL and Hive tables

Posted by Sameer Tilak <ss...@live.com>.
Thanks, Michael.
From: michael@databricks.com
Date: Fri, 25 Jul 2014 14:49:00 -0700
Subject: Re: Spark SQL and Hive tables
To: user@spark.apache.org

From the programming guide:
When working with Hive one must construct a HiveContext, which inherits from SQLContext, and adds support for finding tables in the MetaStore and writing queries using HiveQL.


conf/ is a top-level directory in the Spark distribution that you downloaded.
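As a hedged sketch of what that looks like in spark-shell for a Spark 1.0-era build (the table name `src` here is only a placeholder for any table already registered in your metastore):

```scala
// Inside spark-shell, where sc is the pre-created SparkContext.
// HiveContext picks up conf/hive-site.xml to locate the Hive metastore.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext._

// hql() parses HiveQL and resolves table names against the metastore,
// which a plain SQLContext cannot do.
val results = hql("SELECT key, value FROM src LIMIT 10")
results.collect().foreach(println)
```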

On Fri, Jul 25, 2014 at 2:35 PM, Sameer Tilak <ss...@live.com> wrote:





Hi Jerry,
Thanks for your reply. I was following the steps in this programming guide. It does not mention anything about creating a HiveContext or using HQL explicitly.



http://databricks.com/blog/2014/03/26/spark-sql-manipulating-structured-data-using-spark-2.html


Users(userId INT, name STRING, email STRING, age INT, latitude DOUBLE, longitude DOUBLE, subscribed BOOLEAN)
Events(userId INT, action INT)

Given the data stored in these tables, one might want to build a model that will predict which users are good targets for a new campaign, based on users that are similar.

// Data can easily be extracted from existing sources,
// such as Apache Hive.
val trainingDataTable = sql("""
  SELECT e.action,
         u.age,
         u.latitude,
         u.longitude
  FROM Users u
  JOIN Events e
  ON u.userId = e.userId""")




Date: Fri, 25 Jul 2014 17:27:17 -0400
Subject: Re: Spark SQL and Hive tables
From: chilinglam@gmail.com


To: user@spark.apache.org

Hi Sameer,
Maybe this page will help you: https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables



Best Regards,
Jerry


On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <ss...@live.com> wrote:






Hi All,
I am trying to load data from Hive tables using Spark SQL. I am using spark-shell. Here is what I see:

val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender, demographics.birth_year, demographics.income_group FROM prod p JOIN demographics d ON d.user_id = p.user_id""")

14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
java.lang.RuntimeException: Table Not Found: prod.

I have these tables in Hive; I confirmed this with the show tables command. Can someone please let me know how I can make them accessible here?

Re: Spark SQL and Hive tables

Posted by Michael Armbrust <mi...@databricks.com>.
From the programming guide:

When working with Hive one must construct a HiveContext, which inherits
> from SQLContext, and adds support for finding tables in the MetaStore
> and writing queries using HiveQL.


conf/ is a top-level directory in the Spark distribution that you
downloaded.


On Fri, Jul 25, 2014 at 2:35 PM, Sameer Tilak <ss...@live.com> wrote:

> Hi Jerry,
> Thanks for your reply. I was following the steps in this programming
> guide. It does not mention anything about creating HiveContext or HQL
> explicitly.
>
>
>
> http://databricks.com/blog/2014/03/26/spark-sql-manipulating-structured-data-using-spark-2.html
>
>
>    - Users(userId INT, name STRING, email STRING,
>    age INT, latitude DOUBLE, longitude DOUBLE,
>    subscribed BOOLEAN)
>    - Events(userId INT, action INT)
>
> Given the data stored in these tables, one might want to build a model
> that will predict which users are good targets for a new campaign, based on
> users that are similar.
>
> // Data can easily be extracted from existing sources,
> // such as Apache Hive.
> val trainingDataTable = sql("""
>   SELECT e.action,
>          u.age,
>          u.latitude,
>          u.longitude
>   FROM Users u
>   JOIN Events e
>   ON u.userId = e.userId""")
>
>
>
> ------------------------------
> Date: Fri, 25 Jul 2014 17:27:17 -0400
> Subject: Re: Spark SQL and Hive tables
> From: chilinglam@gmail.com
> To: user@spark.apache.org
>
>
> Hi Sameer,
>
> Maybe this page will help you:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
>
> Best Regards,
>
> Jerry
>
>
>
> On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <ss...@live.com> wrote:
>
> Hi All,
> I am trying to load data from Hive tables using Spark SQL. I am using
> spark-shell. Here is what I see:
>
> val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender,
> demographics.birth_year, demographics.income_group  FROM prod p JOIN
> demographics d ON d.user_id = p.user_id""")
>
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> MultiInstanceRelations
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> CaseInsensitiveAttributeReferences
> java.lang.RuntimeException: Table Not Found: prod.
>
> I have these tables in hive. I used show tables command to confirm this.
> Can someone please let me know how do I make them accessible here?
>
>
>

RE: Spark SQL and Hive tables

Posted by Sameer Tilak <ss...@live.com>.
Hi Jerry,
Thanks for your reply. I was following the steps in this programming guide. It does not mention anything about creating a HiveContext or using HQL explicitly.

http://databricks.com/blog/2014/03/26/spark-sql-manipulating-structured-data-using-spark-2.html
Users(userId INT, name STRING, email STRING, age INT, latitude DOUBLE, longitude DOUBLE, subscribed BOOLEAN)
Events(userId INT, action INT)

Given the data stored in these tables, one might want to build a model that will predict which users are good targets for a new campaign, based on users that are similar.

// Data can easily be extracted from existing sources,
// such as Apache Hive.
val trainingDataTable = sql("""
  SELECT e.action,
         u.age,
         u.latitude,
         u.longitude
  FROM Users u
  JOIN Events e
  ON u.userId = e.userId""")


Date: Fri, 25 Jul 2014 17:27:17 -0400
Subject: Re: Spark SQL and Hive tables
From: chilinglam@gmail.com
To: user@spark.apache.org

Hi Sameer,
Maybe this page will help you: https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

Best Regards,
Jerry


On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <ss...@live.com> wrote:




Hi All,
I am trying to load data from Hive tables using Spark SQL. I am using spark-shell. Here is what I see:

val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender, demographics.birth_year, demographics.income_group FROM prod p JOIN demographics d ON d.user_id = p.user_id""")

14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
java.lang.RuntimeException: Table Not Found: prod.

I have these tables in Hive; I confirmed this with the show tables command. Can someone please let me know how I can make them accessible here?

Re: Spark SQL and Hive tables

Posted by Michael Armbrust <mi...@databricks.com>.
>
> [S]ince Hive has a large number of dependencies, it is not included in the
> default Spark assembly. In order to use Hive you must first run ‘SPARK_HIVE=true
> sbt/sbt assembly/assembly’ (or use -Phive for maven). This command builds
> a new assembly jar that includes Hive. Note that this Hive assembly jar
> must also be present on all of the worker nodes, as they will need access
> to the Hive serialization and deserialization libraries (SerDes) in order
> to access data stored in Hive.



On Fri, Jul 25, 2014 at 3:20 PM, Sameer Tilak <ss...@live.com> wrote:

> Hi Jerry,
>
> I am having trouble with this. Maybe something is wrong with my import or
> version, etc.
>
> scala> import org.apache.spark.sql._;
> import org.apache.spark.sql._
>
> scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> <console>:24: error: object hive is not a member of package
> org.apache.spark.sql
>        val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>                                                   ^
> Here is what I see for autocompletion:
>
> scala> org.apache.spark.sql.
> Row             SQLContext      SchemaRDD       SchemaRDDLike   api
> catalyst        columnar        execution       package         parquet
> test
>
>
> ------------------------------
> Date: Fri, 25 Jul 2014 17:48:27 -0400
>
> Subject: Re: Spark SQL and Hive tables
> From: chilinglam@gmail.com
> To: user@spark.apache.org
>
>
> Hi Sameer,
>
> The blog post you referred to is about Spark SQL. I don't think the article
> is meant to guide you through reading data from Hive via Spark SQL. So
> don't worry too much about the blog post.
>
> The programming guide I referred to demonstrates how to read data from Hive
> using Spark SQL. It is a good starting point.
>
> Best Regards,
>
> Jerry
>
>
> On Fri, Jul 25, 2014 at 5:38 PM, Sameer Tilak <ss...@live.com> wrote:
>
> Hi Michael,
> Thanks. I am not creating HiveContext, I am creating SQLContext. I am
> using CDH 5.1. Can you please let me know which conf/ directory you are
> talking about?
>
> ------------------------------
> From: michael@databricks.com
> Date: Fri, 25 Jul 2014 14:34:53 -0700
>
> Subject: Re: Spark SQL and Hive tables
> To: user@spark.apache.org
>
>
> In particular, have you put your hive-site.xml in the conf/ directory?
>  Also, are you creating a HiveContext instead of a SQLContext?
>
>
> On Fri, Jul 25, 2014 at 2:27 PM, Jerry Lam <ch...@gmail.com> wrote:
>
> Hi Sameer,
>
> Maybe this page will help you:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
>
> Best Regards,
>
> Jerry
>
>
>
> On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <ss...@live.com> wrote:
>
> Hi All,
> I am trying to load data from Hive tables using Spark SQL. I am using
> spark-shell. Here is what I see:
>
> val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender,
> demographics.birth_year, demographics.income_group  FROM prod p JOIN
> demographics d ON d.user_id = p.user_id""")
>
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> MultiInstanceRelations
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> CaseInsensitiveAttributeReferences
> java.lang.RuntimeException: Table Not Found: prod.
>
> I have these tables in hive. I used show tables command to confirm this.
> Can someone please let me know how do I make them accessible here?
>
>
>
>
>

RE: Spark SQL and Hive tables

Posted by Sameer Tilak <ss...@live.com>.
Hi Jerry,
I am having trouble with this. Maybe something is wrong with my import or version, etc.

scala> import org.apache.spark.sql._
import org.apache.spark.sql._

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
<console>:24: error: object hive is not a member of package org.apache.spark.sql
       val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
                                                  ^

Here is what I see for autocompletion:

scala> org.apache.spark.sql.
Row             SQLContext      SchemaRDD       SchemaRDDLike   api
catalyst        columnar        execution       package         parquet
test
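A quick way to confirm what this error suggests, sketched here as an assumption rather than a documented procedure: if the assembly on the classpath was built without Hive support, the Hive classes are simply absent, which can be tested at runtime.

```scala
// Hypothetical spark-shell check: Class.forName throws
// ClassNotFoundException when the assembly was built without Hive.
try {
  Class.forName("org.apache.spark.sql.hive.HiveContext")
  println("Hive support is on the classpath")
} catch {
  case _: ClassNotFoundException =>
    println("No Hive classes found; rebuild Spark with -Phive (or SPARK_HIVE=true)")
}
```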

Date: Fri, 25 Jul 2014 17:48:27 -0400
Subject: Re: Spark SQL and Hive tables
From: chilinglam@gmail.com
To: user@spark.apache.org

Hi Sameer,
The blog post you referred to is about Spark SQL. I don't think the article is meant to guide you through reading data from Hive via Spark SQL. So don't worry too much about the blog post.

The programming guide I referred to demonstrates how to read data from Hive using Spark SQL. It is a good starting point.
Best Regards,
Jerry


On Fri, Jul 25, 2014 at 5:38 PM, Sameer Tilak <ss...@live.com> wrote:




Hi Michael,
Thanks. I am not creating a HiveContext; I am creating a SQLContext. I am using CDH 5.1. Can you please let me know which conf/ directory you are talking about?

From: michael@databricks.com

Date: Fri, 25 Jul 2014 14:34:53 -0700
Subject: Re: Spark SQL and Hive tables
To: user@spark.apache.org


In particular, have you put your hive-site.xml in the conf/ directory?  Also, are you creating a HiveContext instead of a SQLContext?

On Fri, Jul 25, 2014 at 2:27 PM, Jerry Lam <ch...@gmail.com> wrote:



Hi Sameer,
Maybe this page will help you: https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables




Best Regards,
Jerry


On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <ss...@live.com> wrote:







Hi All,
I am trying to load data from Hive tables using Spark SQL. I am using spark-shell. Here is what I see:

val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender, demographics.birth_year, demographics.income_group FROM prod p JOIN demographics d ON d.user_id = p.user_id""")

14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
java.lang.RuntimeException: Table Not Found: prod.

I have these tables in Hive; I confirmed this with the show tables command. Can someone please let me know how I can make them accessible here?

RE: Spark SQL and Hive tables

Posted by Sameer Tilak <ss...@live.com>.
Thanks, Jerry.

Date: Fri, 25 Jul 2014 17:48:27 -0400
Subject: Re: Spark SQL and Hive tables
From: chilinglam@gmail.com
To: user@spark.apache.org

Hi Sameer,
The blog post you referred to is about Spark SQL. I don't think the article is meant to guide you through reading data from Hive via Spark SQL. So don't worry too much about the blog post.

The programming guide I referred to demonstrates how to read data from Hive using Spark SQL. It is a good starting point.
Best Regards,
Jerry


On Fri, Jul 25, 2014 at 5:38 PM, Sameer Tilak <ss...@live.com> wrote:




Hi Michael,
Thanks. I am not creating a HiveContext; I am creating a SQLContext. I am using CDH 5.1. Can you please let me know which conf/ directory you are talking about?

From: michael@databricks.com

Date: Fri, 25 Jul 2014 14:34:53 -0700
Subject: Re: Spark SQL and Hive tables
To: user@spark.apache.org


In particular, have you put your hive-site.xml in the conf/ directory?  Also, are you creating a HiveContext instead of a SQLContext?

On Fri, Jul 25, 2014 at 2:27 PM, Jerry Lam <ch...@gmail.com> wrote:



Hi Sameer,
Maybe this page will help you: https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables




Best Regards,
Jerry


On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <ss...@live.com> wrote:







Hi All,
I am trying to load data from Hive tables using Spark SQL. I am using spark-shell. Here is what I see:

val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender, demographics.birth_year, demographics.income_group FROM prod p JOIN demographics d ON d.user_id = p.user_id""")

14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
java.lang.RuntimeException: Table Not Found: prod.

I have these tables in Hive; I confirmed this with the show tables command. Can someone please let me know how I can make them accessible here?

Re: Spark SQL and Hive tables

Posted by Jerry Lam <ch...@gmail.com>.
Hi Sameer,

The blog post you referred to is about Spark SQL. I don't think the article
is meant to guide you through reading data from Hive via Spark SQL. So
don't worry too much about the blog post.

The programming guide I referred to demonstrates how to read data from Hive
using Spark SQL. It is a good starting point.

Best Regards,

Jerry


On Fri, Jul 25, 2014 at 5:38 PM, Sameer Tilak <ss...@live.com> wrote:

> Hi Michael,
> Thanks. I am not creating HiveContext, I am creating SQLContext. I am
> using CDH 5.1. Can you please let me know which conf/ directory you are
> talking about?
>
> ------------------------------
> From: michael@databricks.com
> Date: Fri, 25 Jul 2014 14:34:53 -0700
>
> Subject: Re: Spark SQL and Hive tables
> To: user@spark.apache.org
>
>
> In particular, have you put your hive-site.xml in the conf/ directory?
>  Also, are you creating a HiveContext instead of a SQLContext?
>
>
> On Fri, Jul 25, 2014 at 2:27 PM, Jerry Lam <ch...@gmail.com> wrote:
>
> Hi Sameer,
>
> Maybe this page will help you:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
>
> Best Regards,
>
> Jerry
>
>
>
> On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <ss...@live.com> wrote:
>
> Hi All,
> I am trying to load data from Hive tables using Spark SQL. I am using
> spark-shell. Here is what I see:
>
> val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender,
> demographics.birth_year, demographics.income_group  FROM prod p JOIN
> demographics d ON d.user_id = p.user_id""")
>
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> MultiInstanceRelations
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> CaseInsensitiveAttributeReferences
> java.lang.RuntimeException: Table Not Found: prod.
>
> I have these tables in hive. I used show tables command to confirm this.
> Can someone please let me know how do I make them accessible here?
>
>
>
>

RE: Spark SQL and Hive tables

Posted by Sameer Tilak <ss...@live.com>.
Hi Michael,
Thanks. I am not creating a HiveContext; I am creating a SQLContext. I am using CDH 5.1. Can you please let me know which conf/ directory you are talking about?

From: michael@databricks.com
Date: Fri, 25 Jul 2014 14:34:53 -0700
Subject: Re: Spark SQL and Hive tables
To: user@spark.apache.org

In particular, have you put your hive-site.xml in the conf/ directory?  Also, are you creating a HiveContext instead of a SQLContext?

On Fri, Jul 25, 2014 at 2:27 PM, Jerry Lam <ch...@gmail.com> wrote:


Hi Sameer,
Maybe this page will help you: https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables



Best Regards,
Jerry


On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <ss...@live.com> wrote:






Hi All,
I am trying to load data from Hive tables using Spark SQL. I am using spark-shell. Here is what I see:

val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender, demographics.birth_year, demographics.income_group FROM prod p JOIN demographics d ON d.user_id = p.user_id""")

14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
java.lang.RuntimeException: Table Not Found: prod.

I have these tables in Hive; I confirmed this with the show tables command. Can someone please let me know how I can make them accessible here?

Re: Spark SQL and Hive tables

Posted by Michael Armbrust <mi...@databricks.com>.
In particular, have you put your hive-site.xml in the conf/ directory?
 Also, are you creating a HiveContext instead of a SQLContext?
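As a hedged sketch of the distinction (Spark 1.0-era API; `prod` is the table from the original question): a SQLContext only sees tables registered in the current session, while a HiveContext also resolves names against the metastore described by conf/hive-site.xml.

```scala
// SQLContext: no metastore access; it only sees SchemaRDDs you register
// yourself in the current session.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// HiveContext: extends SQLContext and reads conf/hive-site.xml, so
// HiveQL queries can reference existing Hive tables such as `prod`.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val results = hiveContext.hql("SELECT prod_num FROM prod LIMIT 5")
results.collect().foreach(println)
```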


On Fri, Jul 25, 2014 at 2:27 PM, Jerry Lam <ch...@gmail.com> wrote:

> Hi Sameer,
>
> Maybe this page will help you:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
>
> Best Regards,
>
> Jerry
>
>
>
> On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <ss...@live.com> wrote:
>
>> Hi All,
>> I am trying to load data from Hive tables using Spark SQL. I am using
>> spark-shell. Here is what I see:
>>
>> val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender,
>> demographics.birth_year, demographics.income_group  FROM prod p JOIN
>> demographics d ON d.user_id = p.user_id""")
>>
>> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
>> MultiInstanceRelations
>> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
>> CaseInsensitiveAttributeReferences
>> java.lang.RuntimeException: Table Not Found: prod.
>>
>> I have these tables in hive. I used show tables command to confirm this.
>> Can someone please let me know how do I make them accessible here?
>>
>
>

Re: Spark SQL and Hive tables

Posted by Jerry Lam <ch...@gmail.com>.
Hi Sameer,

Maybe this page will help you:
https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

Best Regards,

Jerry



On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <ss...@live.com> wrote:

> Hi All,
> I am trying to load data from Hive tables using Spark SQL. I am using
> spark-shell. Here is what I see:
>
> val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender,
> demographics.birth_year, demographics.income_group  FROM prod p JOIN
> demographics d ON d.user_id = p.user_id""")
>
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> MultiInstanceRelations
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> CaseInsensitiveAttributeReferences
> java.lang.RuntimeException: Table Not Found: prod.
>
> I have these tables in hive. I used show tables command to confirm this.
> Can someone please let me know how do I make them accessible here?
>