Posted to user@hive.apache.org by Stig Rohde Døssing <st...@gmail.com> on 2019/03/08 10:13:54 UTC

Hive Hadoop/HBase dependencies

Hi,

I'm working on migrating Apache Storm to Hadoop 3.2.0, and I'm having some
trouble with the dependency tree pulled in by Hive.

Our direct dependencies on Hive are

org.apache.hive.hcatalog:hive-hcatalog-core:3.1.1
org.apache.hive.hcatalog:hive-webhcat-java-client:3.1.1
org.apache.hive.hcatalog:hive-hcatalog-streaming:3.1.1

Are these artifacts intended for use by other projects, or should I be
using other (shaded?) artifacts to interact with Hive?

The Hadoop manual (
https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/DownstreamDev.html#Build_Artifacts)
lists the artifacts downstream projects should be using. Most of those
artifacts shade Hadoop's dependencies, to avoid causing conflicts in users'
projects. HBase does the same with hbase-shaded-client.

Hive doesn't seem to use these shaded artifacts. Instead it depends on
artifacts like hbase-client or hadoop-hdfs, which conflict with the
shaded artifacts (hbase-shaded-client, hadoop-hdfs-client), since both
shaded and unshaded artifacts contain the same Hadoop classes.
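
One way to confirm this kind of overlap on a given classpath is to ask the
class loader for every location that provides a particular class. The
following is only a minimal sketch: the class name DuplicateClassFinder and
the helper locationsOf are hypothetical names made up for illustration, and
org.apache.hadoop.conf.Configuration is just an example of a class to probe.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Hive, Hadoop, HBase or Storm.
public class DuplicateClassFinder {

    /** Returns every location visible to the loader that provides the given class. */
    static List<URL> locationsOf(String className, ClassLoader loader) {
        // Class files are looked up as resources, e.g. org/apache/Foo.class
        String resource = className.replace('.', '/') + ".class";
        List<URL> found = new ArrayList<>();
        try {
            Enumeration<URL> urls = loader.getResources(resource);
            while (urls.hasMoreElements()) {
                found.add(urls.nextElement());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Example class to probe; pass another fully qualified name as an argument.
        String className = args.length > 0 ? args[0] : "org.apache.hadoop.conf.Configuration";
        List<URL> locations = locationsOf(className, ClassLoader.getSystemClassLoader());
        System.out.println(className + " is provided by " + locations.size() + " location(s):");
        for (URL url : locations) {
            System.out.println("  " + url);
        }
    }
}
```

If this is run with the project's runtime classpath and more than one
location is printed, the class is being supplied by several jars, which is
the situation described above.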

Additionally, hive-hcatalog-streaming pulls in hive-cli, which in turn
pulls in Hadoop 2.7.4 jars. This doesn't seem intentional.

Are there any plans to migrate Hive to the shaded Hadoop/HBase jars for
Hive 4, or would there be objections to doing so? I think it could
help avoid dependency conflicts when projects rely on Hive and Hadoop/HBase
at the same time.