Posted to user@spark.apache.org by nitinkak001 <ni...@gmail.com> on 2015/03/02 21:37:05 UTC

Executing hive query from Spark code

I want to run a Hive query inside Spark and use the RDDs it generates
inside Spark. I read in the documentation:

"Hive support is enabled by adding the -Phive and -Phive-thriftserver flags
to Spark’s build. This command builds a new assembly jar that includes Hive.
Note that this Hive assembly jar must also be present on all of the worker
nodes, as they will need access to the Hive serialization and
deserialization libraries (SerDes) in order to access data stored in Hive."

I just wanted to know what the -Phive and -Phive-thriftserver flags really
do, and whether there is a way to get Hive support without updating the
assembly. Do those flags add a Hive support jar or something?

The reason I am asking is that I will be using the Cloudera distribution of
Spark in the future, and I am not sure how to add Hive support to that
distribution.
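
For concreteness, a minimal sketch of what I am trying to do (assuming a
Hive-enabled build and Spark 1.3's DataFrame API; the table name "src" and
the app name are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveFromSpark {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveFromSpark"))

    // HiveContext needs the Hive classes that -Phive builds into the assembly
    val hiveContext = new HiveContext(sc)

    // sql() runs the query through Hive's metastore and SerDes; in Spark 1.3
    // it returns a DataFrame, and .rdd exposes the result as an RDD[Row]
    val rows = hiveContext.sql("SELECT key, value FROM src").rdd
    rows.take(10).foreach(println)
  }
}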



Re: Executing hive query from Spark code

Posted by Ted Yu <yu...@gmail.com>.
Here is a snippet of the dependency tree for the spark-hive module:

[INFO] org.apache.spark:spark-hive_2.10:jar:1.3.0-SNAPSHOT
...
[INFO] +- org.spark-project.hive:hive-metastore:jar:0.13.1a:compile
[INFO] |  +- org.spark-project.hive:hive-shims:jar:0.13.1a:compile
[INFO] |  |  +- org.spark-project.hive.shims:hive-shims-common:jar:0.13.1a:compile
[INFO] |  |  +- org.spark-project.hive.shims:hive-shims-0.20:jar:0.13.1a:runtime
[INFO] |  |  +- org.spark-project.hive.shims:hive-shims-common-secure:jar:0.13.1a:compile
[INFO] |  |  +- org.spark-project.hive.shims:hive-shims-0.20S:jar:0.13.1a:runtime
[INFO] |  |  \- org.spark-project.hive.shims:hive-shims-0.23:jar:0.13.1a:runtime
...
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  +- org.spark-project.hive:hive-ant:jar:0.13.1a:compile
[INFO] |  |  \- org.apache.velocity:velocity:jar:1.5:compile
[INFO] |  |     \- oro:oro:jar:2.0.8:compile
[INFO] |  +- org.spark-project.hive:hive-common:jar:0.13.1a:compile
...
[INFO] +- org.spark-project.hive:hive-serde:jar:0.13.1a:compile
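
If you want to reproduce this from a Spark source checkout, something along
these lines should work (sql/hive is the spark-hive module's directory in
the source tree; -am also builds the upstream modules it depends on):

mvn -pl sql/hive -am dependency:tree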

bq. is there a way to have the hive support without updating the assembly

I don't think so.
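
If you end up rebuilding, the flags are just Maven profiles: -Phive compiles
the Hive dependencies shown above into the assembly jar, and
-Phive-thriftserver additionally bundles the Hive Thrift/JDBC server. A
typical build command looks something like this (the YARN/Hadoop profile
and version are placeholders for your environment):

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package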

On Mon, Mar 2, 2015 at 12:37 PM, nitinkak001 <ni...@gmail.com> wrote:

> [original message quoted above; snipped]