Posted to dev@hive.apache.org by Dave Winterbourne <da...@gmail.com> on 2013/03/25 19:42:03 UTC

Question - why are there instances of org.apache.commons.lang.StringUtils and WordUtils bundled in hive?

I have been working on eliminating duplicate class warnings in my maven
build, and in the end discovered that there are two classes from apache
commons-lang that are bundled with hive-exec:

jar tf hive-0.10.0-bin//lib/hive-exec-0.10.0.jar | grep org/apache/commons/lang/
org/apache/commons/lang/
org/apache/commons/lang/StringUtils.class
org/apache/commons/lang/WordUtils.class

Why are these classes bundled with hive as opposed to just using
commons-lang? If there truly is a need for custom functionality, why not
put it in a different class to avoid this collision?
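
The manual `jar tf | grep` check above can be generalized to a whole classpath. What follows is only a minimal sketch in Python, not anything shipped with Hive or Maven: the function name `find_duplicate_classes` is hypothetical, and it assumes the jars are ordinary zip archives readable from the local file system. It reports every `.class` entry that appears in more than one of the given jars.

```python
import zipfile
from collections import defaultdict


def find_duplicate_classes(jar_paths):
    """Return {class entry: [jars containing it]} for entries found in 2+ jars.

    Hypothetical helper for illustration; jar_paths is a list of paths to
    jar files, which are read as plain zip archives.
    """
    owners = defaultdict(list)
    for jar_path in jar_paths:
        with zipfile.ZipFile(jar_path) as jar:
            # A set guards against a jar that lists the same entry twice.
            for name in sorted(set(jar.namelist())):
                if name.endswith(".class"):
                    owners[name].append(jar_path)
    # Keep only the entries that more than one jar provides.
    return {name: jars for name, jars in owners.items() if len(jars) > 1}
```

Called with the path to hive-exec-0.10.0.jar and the path to a commons-lang jar, the returned dictionary would be expected to contain the StringUtils and WordUtils entries listed above, each mapped to both jars.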

Re: Question - why are there instances of org.apache.commons.lang.StringUtils and WordUtils bundled in hive?

Posted by Owen O'Malley <om...@apache.org>.
You're right. I was thinking there was a hive-ql jar, but there isn't.
(Note that they aren't duplicated in the source tree, just packaged up in
the jar.) I've created https://issues.apache.org/jira/browse/HIVE-4229 to
provide a jar of the ql classes without the upstream classes included.

Note that in the long run, I think we need to simplify the jars, but
that is a bigger issue.

-- Owen


On Mon, Mar 25, 2013 at 3:23 PM, Dave Winterbourne <
dave.winterbourne@gmail.com> wrote:

> We have a custom User Defined Function that extends UDF - I'll admit some
> ignorance, as I inherited this code, but UDF is a class that comes from
> hive-exec, so it doesn't seem true that hive-exec is not intended for
> external usage. That having been said, my original question is why there
> are classes from commons-lang that are simply duplicated in the code base.
> This is bad form at best, and in practice it causes class collisions and
> thus duplicate class warnings.

Re: Question - why are there instances of org.apache.commons.lang.StringUtils and WordUtils bundled in hive?

Posted by Dave Winterbourne <da...@gmail.com>.
We have a custom User Defined Function that extends UDF - I'll admit some
ignorance, as I inherited this code, but UDF is a class that comes from
hive-exec, so it doesn't seem true that hive-exec is not intended for
external usage. That having been said, my original question is why there
are classes from commons-lang that are simply duplicated in the code base.
This is bad form at best, and in practice it causes class collisions and
thus duplicate class warnings.

On Mon, Mar 25, 2013 at 2:48 PM, Owen O'Malley <om...@apache.org> wrote:

> Hive-exec isn't meant for external usage. It is the bundled jar of Hive's
> runtime dependencies that are required for Hive's MapReduce tasks. It
> consists of:
>
> hive-common
> hive-ql
> hive-serde
> hive-shims
> thrift
> commons-lang
> json
> avro
> avro-mapred
> java-ewah
> javolution
> protobuf-java
>
> -- Owen

Re: Question - why are there instances of org.apache.commons.lang.StringUtils and WordUtils bundled in hive?

Posted by Owen O'Malley <om...@apache.org>.
Hive-exec isn't meant for external usage. It is the bundled jar of Hive's
runtime dependencies that are required for Hive's MapReduce tasks. It
consists of:

hive-common
hive-ql
hive-serde
hive-shims
thrift
commons-lang
json
avro
avro-mapred
java-ewah
javolution
protobuf-java

-- Owen


On Mon, Mar 25, 2013 at 11:42 AM, Dave Winterbourne <
dave.winterbourne@gmail.com> wrote:

> I have been working on eliminating duplicate class warnings in my maven
> build, and in the end discovered that there are two classes from apache
> commons-lang that are bundled with hive-exec:
>
> jar tf hive-0.10.0-bin//lib/hive-exec-0.10.0.jar | grep org/apache/commons/lang/
> org/apache/commons/lang/
> org/apache/commons/lang/StringUtils.class
> org/apache/commons/lang/WordUtils.class
>
> Why are these classes bundled with hive as opposed to just using
> commons-lang? If there truly is a need for custom functionality, why not
> put it in a different class to avoid this collision?
>