Posted to user@spark.apache.org by Silvio Fiorito <si...@granturing.com> on 2014/06/06 21:08:28 UTC

Spark 1.0 & embedded Hive libraries

Is there a repo somewhere with the code for the Hive dependencies (hive-exec, hive-serde, & hive-metastore) used in SparkSQL? Are they forked with Spark-specific customizations, like Shark, or simply relabeled with a new package name ("org.spark-project.hive")? I couldn't find any repos on GitHub or Apache main.

I want to use some Hive packages beyond the ones burned into the Spark JAR, but I'm having all sorts of "jar-hell" headaches because the Hive JARs in CDH or even HDP are mismatched with the Spark Hive JARs.
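(For context, the usual first step with this kind of conflict is to pin one Hive version and exclude the transitive Hive artifacts that other dependencies drag in. A rough Maven sketch of that idea — the vendor coordinates here are hypothetical placeholders, not real artifacts:

```xml
<!-- Pin the Hive version you actually want on the classpath. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>0.12.0</version>
</dependency>

<!-- Exclude the conflicting Hive jars a vendor client would otherwise
     pull in transitively. "some.vendor:vendor-client" is illustrative. -->
<dependency>
  <groupId>some.vendor</groupId>
  <artifactId>vendor-client</artifactId>
  <version>1.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-metastore</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running `mvn dependency:tree -Dincludes=org.apache.hive` is a quick way to see which path each Hive jar is arriving through. That said, exclusions don't help when the conflicting classes are copied directly inside a fat jar, which is part of the problem here.)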

Thanks,
Silvio

Re: Spark 1.0 & embedded Hive libraries

Posted by Silvio Fiorito <si...@granturing.com>.
Great, thanks for the info and pointer to the repo!

From: Patrick Wendell<ma...@gmail.com>
Sent: Friday, June 6, 2014 5:11 PM
To: user@spark.apache.org<ma...@spark.apache.org>

They are forked and slightly modified for two reasons:

(a) Hive embeds a bunch of other dependencies in their published jars,
which makes it really hard for other projects to depend on them. If
you look at the hive-exec jar, they copy a bunch of other dependencies
directly into this jar. We modified the Hive 0.12 build to produce
jars that do not include other dependencies inside of them.

(b) Hive relies on a version of protobuf that is incompatible with
certain Hadoop versions. We used a shaded version of the protobuf
dependency to avoid this.

The forked copy is here - feel free to take a look:
https://github.com/pwendell/hive/commits/branch-0.12-shaded-protobuf

I'm hoping the upstream Hive project will change their published
artifacts to make them usable as a library for other applications.
Unfortunately as it stands we had to fork our own copy of these to
make it work. I think it's being tracked by this JIRA:

https://issues.apache.org/jira/browse/HIVE-5733

- Patrick

On Fri, Jun 6, 2014 at 12:08 PM, Silvio Fiorito
<si...@granturing.com> wrote:
> Is there a repo somewhere with the code for the Hive dependencies
> (hive-exec, hive-serde, & hive-metastore) used in SparkSQL? Are they forked
> with Spark-specific customizations, like Shark, or simply relabeled with a
> new package name ("org.spark-project.hive")? I couldn't find any repos on
> Github or Apache main.
>
> I'm wanting to use some Hive packages outside of the ones burned into the
> Spark JAR but I'm having all sorts of headaches due to "jar-hell" with the
> Hive JARs in CDH or even HDP mismatched with the Spark Hive JARs.
>
> Thanks,
> Silvio

Re: Spark 1.0 & embedded Hive libraries

Posted by Patrick Wendell <pw...@gmail.com>.
They are forked and slightly modified for two reasons:

(a) Hive embeds a bunch of other dependencies in their published jars,
which makes it really hard for other projects to depend on them. If
you look at the hive-exec jar, they copy a bunch of other dependencies
directly into this jar. We modified the Hive 0.12 build to produce
jars that do not include other dependencies inside of them.

(b) Hive relies on a version of protobuf that is incompatible with
certain Hadoop versions. We used a shaded version of the protobuf
dependency to avoid this.
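(As a rough illustration, this kind of relocation is typically expressed with the Maven Shade plugin, along these lines — the shaded package name below is a made-up example, not necessarily what the fork actually uses:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrite protobuf class references into a private package,
               so Hive's protobuf no longer collides with Hadoop's. -->
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.spark-project.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The plugin rewrites the bytecode references as well as the package names, so the relocated copy and the original can coexist on one classpath.)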

The forked copy is here - feel free to take a look:
https://github.com/pwendell/hive/commits/branch-0.12-shaded-protobuf

I'm hoping the upstream Hive project will change their published
artifacts to make them usable as a library for other applications.
Unfortunately as it stands we had to fork our own copy of these to
make it work. I think it's being tracked by this JIRA:

https://issues.apache.org/jira/browse/HIVE-5733

- Patrick

On Fri, Jun 6, 2014 at 12:08 PM, Silvio Fiorito
<si...@granturing.com> wrote:
> Is there a repo somewhere with the code for the Hive dependencies
> (hive-exec, hive-serde, & hive-metastore) used in SparkSQL? Are they forked
> with Spark-specific customizations, like Shark, or simply relabeled with a
> new package name ("org.spark-project.hive")? I couldn't find any repos on
> Github or Apache main.
>
> I'm wanting to use some Hive packages outside of the ones burned into the
> Spark JAR but I'm having all sorts of headaches due to "jar-hell" with the
> Hive JARs in CDH or even HDP mismatched with the Spark Hive JARs.
>
> Thanks,
> Silvio