Posted to issues@tez.apache.org by "Luis Casillas (JIRA)" <ji...@apache.org> on 2016/06/11 01:26:21 UTC

[jira] [Updated] (TEZ-3299) Tez is incompatible with HADOOP_USE_CLIENT_CLASSLOADER=true

     [ https://issues.apache.org/jira/browse/TEZ-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luis Casillas updated TEZ-3299:
-------------------------------
    Description: 
The ticket HADOOP-10893 introduced a new environment variable, HADOOP_USE_CLIENT_CLASSLOADER, that makes the hadoop jar command put the client application's own bundled jars (in the jar file's lib/ directory) ahead of those bundled by the Hadoop installation.

Tez 0.8.3, however, does not work with this feature.  The reason is that Tez ships classes under the org.apache.hadoop package hierarchy (e.g., org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat).
Hadoop's ApplicationClassLoader class, which implements the HADOOP_USE_CLIENT_CLASSLOADER=true feature, in its default configuration refuses to load classes in the org.apache.hadoop packages itself and instead delegates them to the parent classloader.  See the implementation for reference:

* https://github.com/c9n/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ApplicationClassLoader.java

Because of the way Elastic MapReduce 4.7.0 sets up the classpath for Tez 0.8.3, tez-mapreduce-0.8.3.jar is on the client classpath, so my Cascading application gets this *extremely confusing* failure:

1. The JVM can load the `org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder` class successfully;
2. But it gets a `NoClassDefFoundError` for `org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat`

I say "extremely confusing" because *both of these classes are in the same jar*!  The difference is caused by ApplicationClassLoader, which logs its configuration at the beginning of the job:

{code}
16/06/11 00:51:15 INFO util.ApplicationClassLoader: system classes: [java., javax.accessibility., javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml]
{code}
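
To make the effect of that list concrete, here is a minimal sketch of prefix-based system-class matching. It is a simplified illustration and not Hadoop's actual code: the class name SystemClassCheck, the method isSystemClass as written here, and the shortened list are assumptions of the example. It only assumes what the log line above suggests, namely that entries ending in "." are package prefixes and entries starting with "-" are exclusions.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; NOT the Hadoop implementation.
public class SystemClassCheck {

    // Returns true if className falls under a package prefix in the list
    // and is not covered by a "-" (exclusion) entry.
    static boolean isSystemClass(String className, List<String> systemClasses) {
        boolean result = false;
        for (String entry : systemClasses) {
            boolean include = true;
            String prefix = entry;
            if (prefix.startsWith("-")) {
                prefix = prefix.substring(1);
                include = false;
            }
            // Only package-style entries (ending in ".") are handled here.
            if (prefix.endsWith(".") && className.startsWith(prefix)) {
                if (!include) {
                    return false; // an exclusion entry always wins
                }
                result = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical subset of the list logged above.
        List<String> systemClasses = Arrays.asList(
                "java.", "-javax.xml.rpc.", "javax.xml.",
                "org.apache.log4j.", "org.apache.hadoop.");

        String[] names = {
            "org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder",
            "org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat"
        };
        for (String name : names) {
            System.out.println(name + " -> system class? "
                    + isSystemClass(name, systemClasses));
        }
    }
}
```

Under these assumptions the org.apache.tez class is not a system class, so the client classloader loads it from the Tez jar, while the org.apache.hadoop class is treated as a system class and handed to the parent classloader, which does not have the Tez jar.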

This can also be verified by exporting HADOOP_OPTS='-verbose:class' before running my application:

{code}
[Loaded org.apache.tez.mapreduce.partition.MRPartitioner from file:/usr/lib/tez/tez-mapreduce-0.8.3.jar]
[Loaded org.apache.tez.mapreduce.hadoop.MRInputHelpers from file:/usr/lib/tez/tez-mapreduce-0.8.3.jar]
[Loaded org.apache.tez.mapreduce.input.MRInput$MRInputHelpersInternal from file:/usr/lib/tez/tez-mapreduce-0.8.3.jar]
[Loaded org.apache.hadoop.mapreduce.InputFormat from file:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.7.2-amzn-2.jar]

...

16/06/11 00:51:32 ERROR dataplatform.Main: Uncaught exception
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/split/TezGroupedSplitsInputFormat
        at org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder.createGeneratorDataSource(MRInput.java:325)
        at org.apache.tez.mapreduce.input.MRInput$MRInputConfigBuilder.build(MRInput.java:249)
        at cascading.flow.tez.Hadoop2TezFlowStep.createVertex(Hadoop2TezFlowStep.java:515)
        at cascading.flow.tez.Hadoop2TezFlowStep.createDAG(Hadoop2TezFlowStep.java:216)
        at cascading.flow.tez.Hadoop2TezFlowStep.createFlowStepJob(Hadoop2TezFlowStep.java:197)
        at cascading.flow.tez.Hadoop2TezFlowStep.createFlowStepJob(Hadoop2TezFlowStep.java:123)
        at cascading.flow.planner.BaseFlowStep.getCreateFlowStepJob(BaseFlowStep.java:916)
        at cascading.flow.BaseFlow.initializeNewJobsMap(BaseFlow.java:1353)
        at cascading.flow.BaseFlow.initialize(BaseFlow.java:247)
        at cascading.flow.planner.FlowPlanner.buildFlow(FlowPlanner.java:203)
        at cascading.flow.FlowConnector.connect(FlowConnector.java:456)
        at com.progressfin.dataplatform.sip.SipAddressFlow.buildFlow(SipAddressFlow.java:70)
        at com.progressfin.dataplatform.AllTheFlows.getAllFlows(AllTheFlows.java:141)
        at com.progressfin.dataplatform.AllTheFlows.getEverythingCascade(AllTheFlows.java:119)
        at com.progressfin.dataplatform.Main.run(Main.java:114)
        at com.progressfin.dataplatform.Main.main(Main.java:81)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:200)
        at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:168)
        ... 22 more
{code}
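
As an alternative to reading -verbose:class output, a small diagnostic like the following could report where a given class comes from. This is only a sketch: the class name WhereLoaded and the method describe are hypothetical, and it assumes that the thread context classloader is the one the job actually uses when run from inside the application.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Hadoop or Tez.
public class WhereLoaded {

    // Tries to load className through the given loader (without initializing it)
    // and reports the jar or directory it came from, or the failure.
    static String describe(String className, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String origin = (src == null || src.getLocation() == null)
                    ? "bootstrap/unknown"
                    : src.getLocation().toString();
            return className + " <- " + origin + " via " + c.getClassLoader();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " NOT LOADABLE: " + e;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // Class names are passed on the command line.
        for (String name : args) {
            System.out.println(describe(name, loader));
        }
    }
}
```

Calling describe for the two classes named above from within the failing job should show the same split: one class resolved from the Tez jar, the other not loadable.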

So if I may suggest a solution, perhaps Tez should refrain from putting any classes under the org.apache.hadoop package, because Hadoop may refuse to load them under some configurations!


> Tez is incompatible with HADOOP_USE_CLIENT_CLASSLOADER=true
> -----------------------------------------------------------
>
>                 Key: TEZ-3299
>                 URL: https://issues.apache.org/jira/browse/TEZ-3299
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.8.3
>         Environment: Elastic MapReduce 4.7.0
>            Reporter: Luis Casillas
>


