Posted to issues@tez.apache.org by "Luis Casillas (JIRA)" <ji...@apache.org> on 2016/06/11 01:42:21 UTC

[jira] [Comment Edited] (TEZ-3299) Tez is incompatible with HADOOP_USE_CLIENT_CLASSLOADER=true

    [ https://issues.apache.org/jira/browse/TEZ-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325625#comment-15325625 ] 

Luis Casillas edited comment on TEZ-3299 at 6/11/16 1:42 AM:
-------------------------------------------------------------

I've found a workaround for this issue: override the system classes list so that it no longer includes org.apache.hadoop.:

{code}
export HADOOP_CLIENT_CLASSLOADER_SYSTEM_CLASSES='java.,javax.accessibility.,javax.activation.,javax.activity.,javax.annotation.,javax.annotation.processing.,javax.crypto.,javax.imageio.,javax.jws.,javax.lang.model.,-javax.management.j2ee.,javax.management.,javax.naming.,javax.net.,javax.print.,javax.rmi.,javax.script.,-javax.security.auth.message.,javax.security.auth.,javax.security.cert.,javax.security.sasl.,javax.sound.,javax.sql.,javax.swing.,javax.tools.,javax.transaction.,-javax.xml.registry.,-javax.xml.rpc.,javax.xml.,org.w3c.dom.,org.xml.sax.,org.apache.commons.logging.,org.apache.log4j.,core-default.xml,hdfs-default.xml,mapred-default.xml,yarn-default.xml'
{code}

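But this presumably creates other risks.  With org.apache.hadoop. no longer treated as a system package, my application could erroneously package Hadoop jars that are incompatible with the installed ones, and the client classloader would then load them ahead of the installed ones.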
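
To illustrate why removing that one entry changes the outcome, here is a minimal sketch of first-match prefix filtering over a system classes list. It is a simplified approximation written for this comment, not Hadoop's actual ApplicationClassLoader code: the class name SystemClassCheck, the method isSystemClass, and the two abbreviated lists are all hypothetical, and the lists contain only a few of the real entries.

```java
import java.util.Arrays;
import java.util.List;

// Simplified sketch (an assumption, NOT Hadoop's implementation) of how a
// system classes list with "-" exclusions can be evaluated.
final class SystemClassCheck {

    // Abbreviated stand-in for the default list logged by the job.
    static final List<String> DEFAULT_SUBSET = Arrays.asList(
            "java.", "-javax.xml.rpc.", "javax.xml.",
            "org.apache.log4j.", "org.apache.hadoop.", "core-default.xml");

    // Same subset with "org.apache.hadoop." removed, as in the workaround.
    static final List<String> WORKAROUND_SUBSET = Arrays.asList(
            "java.", "-javax.xml.rpc.", "javax.xml.",
            "org.apache.log4j.", "core-default.xml");

    // Returns true if the name would be delegated to the parent classloader.
    // Entries are checked in order and the first matching entry decides.
    static boolean isSystemClass(String name, List<String> systemClasses) {
        String canonical = name.replace('/', '.');
        for (String entry : systemClasses) {
            boolean exclude = entry.startsWith("-");
            String pattern = exclude ? entry.substring(1) : entry;
            boolean matches = pattern.endsWith(".")
                    ? canonical.startsWith(pattern)            // package prefix
                    : canonical.equals(pattern)                // exact name
                      || canonical.startsWith(pattern + "$");  // nested class
            if (matches) {
                return !exclude;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String tezSplit =
                "org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat";
        String tezInput = "org.apache.tez.mapreduce.input.MRInput";
        System.out.println("default, Tez split class:    "
                + isSystemClass(tezSplit, DEFAULT_SUBSET));
        System.out.println("default, Tez input class:    "
                + isSystemClass(tezInput, DEFAULT_SUBSET));
        System.out.println("workaround, Tez split class: "
                + isSystemClass(tezSplit, WORKAROUND_SUBSET));
    }
}
```

Under these assumptions, the Tez class in the org.apache.hadoop package is treated as a system class only with the default list, which matches the behaviour described in the issue. The same reasoning shows the risk: with the workaround list, any org.apache.hadoop class bundled in the application jar would no longer be delegated to the parent.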



> Tez is incompatible with HADOOP_USE_CLIENT_CLASSLOADER=true
> -----------------------------------------------------------
>
>                 Key: TEZ-3299
>                 URL: https://issues.apache.org/jira/browse/TEZ-3299
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.8.3
>         Environment: Elastic MapReduce 4.7.0
>            Reporter: Luis Casillas
>
> The ticket HADOOP-10893 introduced a new environment variable, HADOOP_USE_CLIENT_CLASSLOADER, that makes the hadoop jar command put the client application's own bundled jars (in the jar file's lib/ directory) ahead of those bundled by the Hadoop installation.
> Tez 0.8.3, however, does not play nicely with this feature.  The reason is that Tez has classes under the org.apache.hadoop package hierarchy (e.g., org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat).
> Hadoop's ApplicationClassLoader class, which implements the HADOOP_USE_CLIENT_CLASSLOADER=true feature, in its default configuration will refuse to load classes inside the org.apache.hadoop packages, instead delegating to the parent classloader.  See the implementation for reference:
> * https://github.com/c9n/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ApplicationClassLoader.java
> With the way that Elastic MapReduce 4.7.0 sets up the classpath for Tez 0.8.3, tez-mapreduce-0.8.3.jar is in the client classpath, so my Cascading application gets this *extremely confusing* failure:
> 1. The JVM can load the `org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder` class successfully;
> 2. But it gets a `NoClassDefFoundError` for `org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat`
> I say "extremely confusing" because *both of these classes are in the same jar*!  This surprising difference is caused by ApplicationClassLoader, which logs its configuration at the beginning of the job:
> {code}
> 16/06/11 00:51:15 INFO util.ApplicationClassLoader: system classes: [java., javax.accessibility., javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml]
> {code}
> This can also be verified by exporting HADOOP_OPTS='-verbose:class' before running my application:
> {code}
> [Loaded org.apache.tez.mapreduce.partition.MRPartitioner from file:/usr/lib/tez/tez-mapreduce-0.8.3.jar]
> [Loaded org.apache.tez.mapreduce.hadoop.MRInputHelpers from file:/usr/lib/tez/tez-mapreduce-0.8.3.jar]
> [Loaded org.apache.tez.mapreduce.input.MRInput$MRInputHelpersInternal from file:/usr/lib/tez/tez-mapreduce-0.8.3.jar]
> [Loaded org.apache.hadoop.mapreduce.InputFormat from file:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.7.2-amzn-2.jar]
> ...
> 16/06/11 00:51:32 ERROR dataplatform.Main: Uncaught exception
> java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/split/TezGroupedSplitsInputFormat
>         at org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder.createGeneratorDataSource(MRInput.java:325)
>         at org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder.build(MRInput.java:249)
>         at cascading.flow.tez.Hadoop2TezFlowStep.createVertex(Hadoop2TezFlowStep.java:515)
>         at cascading.flow.tez.Hadoop2TezFlowStep.createDAG(Hadoop2TezFlowStep.java:216)
>         at cascading.flow.tez.Hadoop2TezFlowStep.createFlowStepJob(Hadoop2TezFlowStep.java:197)
>         at cascading.flow.tez.Hadoop2TezFlowStep.createFlowStepJob(Hadoop2TezFlowStep.java:123)
>         at cascading.flow.planner.BaseFlowStep.getCreateFlowStepJob(BaseFlowStep.java:916)
>         at cascading.flow.BaseFlow.initializeNewJobsMap(BaseFlow.java:1353)
>         at cascading.flow.BaseFlow.initialize(BaseFlow.java:247)
>         at cascading.flow.planner.FlowPlanner.buildFlow(FlowPlanner.java:203)
>         at cascading.flow.FlowConnector.connect(FlowConnector.java:456)
>         at com.progressfin.dataplatform.sip.SipAddressFlow.buildFlow(SipAddressFlow.java:70)
>         at com.progressfin.dataplatform.AllTheFlows.getAllFlows(AllTheFlows.java:141)
>         at com.progressfin.dataplatform.AllTheFlows.getEverythingCascade(AllTheFlows.java:119)
>         at com.progressfin.dataplatform.Main.run(Main.java:114)
>         at com.progressfin.dataplatform.Main.main(Main.java:81)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:200)
>         at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:168)
>         ... 22 more
> {code}
> So if I may suggest a solution, perhaps Tez should refrain from putting any classes under the org.apache.hadoop package, because Hadoop may refuse to load them under some configurations!
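
As a complement to -verbose:class, a small diagnostic like the following can report which classloader and jar a given class resolves to, or that it cannot be loaded at all. This is only a sketch: the class name ClassOrigin and the method describe are hypothetical, and it assumes the thread context classloader is the loader under investigation.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports where a class is loaded from.
final class ClassOrigin {

    // Tries to load the class without initializing it and describes the result.
    static String describe(String className, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader);
            ClassLoader owner = c.getClassLoader();
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return className
                    + " loaded by "
                    + (owner == null ? "bootstrap" : owner.getClass().getName())
                    + " from "
                    + (src == null ? "<unknown>" : String.valueOf(src.getLocation()));
        } catch (ClassNotFoundException | LinkageError e) {
            // Covers both a missing class and a NoClassDefFoundError raised
            // while resolving it.
            return className + " NOT loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumption: the context classloader is the one whose behaviour
        // we want to inspect.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : args) {
            System.out.println(describe(name, loader));
        }
    }
}
```

Called from the application with the two class names from the failure above, it would make the difference between them visible in a single run without scanning the full -verbose:class output.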


