Posted to user@tez.apache.org by Jan Morlock <ja...@googlemail.com> on 2016/01/28 18:01:35 UTC

Classpath Composition

When using Apache Tez in combination with Hive or Pig, we *sometimes* encounter
error messages similar to the following:

Caused by: java.lang.NoSuchMethodError:
com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)
Caused by: java.lang.NoSuchMethodError:
com.google.common.base.Stopwatch.elapsedTime(Ljava/util/concurrent/TimeUnit;)

The cause of this problem is that multiple Google Guava jars with
different versions exist on the classpath.
The oldest version (11.0.2) comes with several Hadoop components (see
also https://issues.apache.org/jira/browse/HADOOP-10101). Our own code uses
a more up-to-date version.

When using Apache Tez, the order of entries on the classpath seems to be random.
As a result, an old Guava version that lacks the newer methods is sometimes
found first on the classpath, leading to the exceptions listed above.
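
One way to confirm which jar wins in a given task is to ask the JVM where a class was loaded from. The following is a minimal sketch that uses only standard JDK calls; the class name WhichJar and its helper methods are hypothetical names chosen for illustration, and the default class name is simply the Guava class from the first stack trace above.

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns the location a class was loaded from, or a marker for bootstrap classes. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
        if (src == null || src.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return src.getLocation().toString();
    }

    /** Looks the class up by name without initializing it. */
    static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            return locationOf(cls);
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        // Defaults to the Guava class named in the first NoSuchMethodError.
        String name = args.length > 0 ? args[0] : "com.google.common.base.Splitter";
        System.out.println(name + " -> " + locationOf(name));
    }
}
```

Logging the result of locationOf from inside the failing code path (rather than from a separate JVM) should show whether the old or the new Guava jar was picked up for that particular run.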

My question: are there any techniques or Tez options for controlling
the classpath composition, perhaps similar to
mapreduce.job.user.classpath.first?

We are using CDH 5.4.5.

Thank you very much in advance.

Re: Classpath Composition

Posted by Hitesh Shah <hi...@apache.org>.
Assuming you have the guava jar available on all nodes, you can set “tez.cluster.additional.classpath.prefix” to point to it, and this value will be prepended to the classpath of the Tez runtime layers. However, please note that this is not guaranteed to work: the guava jar from your own code may end up preventing the Tez framework from finding the guava APIs it uses.

Please feel free to add your comments on the approach proposed at https://issues.apache.org/jira/browse/TEZ-2164. It is mainly meant to hide Tez’s use of Guava and does not really fix the problems caused by Hadoop, etc. requiring an older guava jar. Also, some of the guava APIs used by Tez are not available in guava-18, hence some changes were needed in Tez too. Hadoop has recently ported some code from guava into its own code base to avoid the compatibility issues caused by users using a newer version of guava (older versions did not work seamlessly). This may also be a point to consider, depending on what patches CDH has bundled into its hadoop jars.

Also, to clarify: the guava jar that ends up in the Tez runtime is the one bundled into the tez tarball (configured as part of tez.lib.uris). The Tez runtime classpath has the form “<configured classpath prefix>:$PWD/*:$PWD/tezlib/*”, where tezlib is the directory into which the tez tarball is uncompressed.
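
To illustrate the ordering described above, here is a small sketch that assembles a classpath string of that form and lists its entries in search order. It is not Tez's actual code: the class TezClasspathSketch, its methods, and the path /opt/libs/guava-18.0.jar are hypothetical and only model the “prefix first, then $PWD/*, then $PWD/tezlib/*” layout.

```java
import java.util.Arrays;
import java.util.List;

public class TezClasspathSketch {
    // Fixed part of the runtime classpath, in the form given in the message above.
    static final String BASE = "$PWD/*:$PWD/tezlib/*";

    /** Prepends the configured prefix (if any) to the fixed part. */
    static String compose(String prefix) {
        if (prefix == null || prefix.trim().isEmpty()) {
            return BASE;
        }
        return prefix.trim() + ":" + BASE;
    }

    /** Splits the composed classpath into its entries, in search order. */
    static List<String> entries(String prefix) {
        return Arrays.asList(compose(prefix).split(":"));
    }

    public static void main(String[] args) {
        // Hypothetical location of a newer guava jar that exists on every node.
        String prefix = "/opt/libs/guava-18.0.jar";
        List<String> order = entries(prefix);
        for (int i = 0; i < order.size(); i++) {
            System.out.println((i + 1) + ". " + order.get(i));
        }
    }
}
```

Entries are searched left to right, so a jar named in the prefix is consulted before anything under $PWD or tezlib. Within a wildcard entry such as $PWD/*, the JVM does not specify the order in which jars are enumerated, which may be one reason the effective order looks random when two Guava versions sit in the same directory.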

thanks
— Hitesh

