Posted to user@spark.apache.org by ReeceRobinson <Re...@TheRobinsons.gen.nz> on 2015/07/28 00:56:28 UTC

Do I really need to build Spark for Hive/Thrift Server support?

I'm a bit confused about the documentation in the area of Hive support.

I want to use a remote Hive metastore/HDFS server, and the documentation says
that we need to build Spark from source due to the large number of
dependencies Hive requires.

Specifically, the documentation says:

"Hive has a large number of dependencies, it is not included in the default
Spark assembly....This command builds a new assembly jar that includes
Hive."

So I downloaded the source distribution of Spark 1.4.1 and executed the
following build command:

./make-distribution.sh --name spark-1.4.1-hadoop-2.6-hive --tgz -Pyarn
-Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver  -DskipTests

Inspecting the size of the resulting spark-assembly-1.4.1-hadoop2.6.0.jar, I
see it is only a couple of hundred bytes different from the pre-built one: the
pre-built jar is 162,976,273 bytes and my custom-built jar is 162,976,444
bytes. I don't see any new Hive jar file either.
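
For what it's worth, the only sanity check I can think of (assuming the Hive
classes end up inside the assembly jar itself rather than in a separate jar)
is to count the Hive classes in the assembly and/or try creating a HiveContext
in a spark-shell started from that build:

# path assumes the default dist/ output of make-distribution.sh
jar tf dist/lib/spark-assembly-1.4.1-hadoop2.6.0.jar | grep -c 'org/apache/spark/sql/hive'

// in spark-shell started from the same build
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

My understanding is that an assembly built without -Phive would fail both
checks (no matching classes, and HiveContext not found), so this should at
least show whether the profile took effect.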

Can someone please help me understand what is going on here?

Cheers,
Reece





Re: Do I really need to build Spark for Hive/Thrift Server support?

Posted by roni <ro...@gmail.com>.
Hi All,
Any explanation for this?
As Reece said, I can do operations with Hive, but this line gives an error:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

I have already created a Spark EC2 cluster with the spark-ec2 script. How can
I build it again?

Thanks
_Roni

On Tue, Jul 28, 2015 at 2:46 PM, ReeceRobinson <Re...@therobinsons.gen.nz>
wrote:

> I am building an analytics environment based on Spark and want to use Hive
> in multi-user mode, i.e. not use the embedded Derby database but use Postgres
> and HDFS instead. I am using the included Spark Thrift Server to process
> queries using Spark SQL.
>
> The documentation gives me the impression that I need to create a custom
> build of Spark 1.4.1. However, I don't think this is accurate any more, or
> else it applies to some different context I'm not aware of.
>
> I used the pre-built Spark 1.4.1 distribution today with my hive-site.xml
> for Postgres and HDFS and it worked! I saw the warehouse files turn up in
> HDFS and I saw the metadata inserted into Postgres when I created a test
> table.
>
> I can connect to the Thrift Server using beeline and perform queries on my
> data. I also verified using the Spark UI that the SQL is being processed by
> Spark SQL.
>
> So I guess I'm asking: is the documentation out of date, or am I missing
> something?
>
> Cheers,
> Reece
>

Re: Do I really need to build Spark for Hive/Thrift Server support?

Posted by ReeceRobinson <Re...@TheRobinsons.gen.nz>.
I am building an analytics environment based on Spark and want to use Hive in
multi-user mode, i.e. not use the embedded Derby database but use Postgres
and HDFS instead. I am using the included Spark Thrift Server to process
queries using Spark SQL.

The documentation gives me the impression that I need to create a custom
build of Spark 1.4.1. However, I don't think this is accurate any more, or
else it applies to some different context I'm not aware of.

I used the pre-built Spark 1.4.1 distribution today with my hive-site.xml
for Postgres and HDFS and it worked! I saw the warehouse files turn up in
HDFS and I saw the metadata inserted into Postgres when I created a test
table.
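
For context, the metastore part of my hive-site.xml boils down to pointing the
JDO connection at Postgres and the warehouse directory at HDFS, roughly like
the sketch below (host names, port and credentials are placeholders, not my
real values):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- placeholder host/database; any reachable Postgres instance -->
  <value>jdbc:postgresql://metastore-host:5432/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <!-- placeholder namenode address -->
  <value>hdfs://namenode:8020/user/hive/warehouse</value>
</property>

plus the usual javax.jdo.option.ConnectionUserName/ConnectionPassword entries,
with the Postgres JDBC driver jar on the classpath.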

I can connect to the Thrift Server using beeline and perform queries on my
data. I also verified using the Spark UI that the SQL is being processed by
Spark SQL.
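
In case it helps anyone trying the same setup, the start and connect steps are
just the stock scripts shipped with the distribution (the master URL, host and
port below are placeholders; 10000 is the default Thrift port):

# start the Thrift JDBC/ODBC server; accepts the usual spark-submit options
./sbin/start-thriftserver.sh --master spark://master-host:7077

# connect with the bundled beeline client
./bin/beeline -u jdbc:hive2://localhost:10000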

So I guess I'm asking: is the documentation out of date, or am I missing something?

Cheers,
Reece




---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org