Posted to issues@spark.apache.org by "Xiaochen Ouyang (Jira)" <ji...@apache.org> on 2021/02/24 03:29:00 UTC

[jira] [Comment Edited] (SPARK-33212) Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile

    [ https://issues.apache.org/jira/browse/SPARK-33212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289618#comment-17289618 ] 

Xiaochen Ouyang edited comment on SPARK-33212 at 2/24/21, 3:28 AM:
-------------------------------------------------------------------

Hi [~csun], we submitted a Spark application with the command `spark-submit --master yarn --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark*.jar`.

1. We get an AmIpFilter ClassNotFoundException, because there is no 'hadoop-client-minicluster.jar' on the class path. So we removed the line {color:#de350b}_'<scope>test</scope>'_{color} from the parent pom.xml and resource-manager/yarn/pom.xml (a sketch of the failing class lookup follows after step 3).

2. Rebuild the Spark project, deploy the binary jars and submit the application.

3. We get a new exception as follows:

+2021-02-24 08:36:54,391 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
 java.lang.IllegalStateException: class org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is not a javax.servlet.Filter+
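
Step 1's ClassNotFoundException comes from the filter class being looked up by name: in YARN mode the AM IP filter is configured by its fully-qualified class name and the driver loads it reflectively. A minimal, hypothetical sketch of that lookup (not actual Spark source):

{code:java}
// Minimal, hypothetical sketch (not Spark source) of the reflective lookup from step 1:
// the filter is configured by its fully-qualified class name and loaded reflectively,
// so a jar missing from the driver class path surfaces as ClassNotFoundException.
public class AmIpFilterLookupSketch {
    public static void main(String[] args) {
        String filterClass = "org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter";
        try {
            Class<?> clazz = Class.forName(filterClass);
            System.out.println("Loaded filter class: " + clazz.getName());
        } catch (ClassNotFoundException e) {
            // Without hadoop-client-minicluster.jar (or another jar providing the class)
            // on the class path, this branch is hit and SparkContext initialization fails.
            System.out.println("ClassNotFoundException: " + filterClass);
        }
    }
}
{code}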

 

The key reason is that the Spark driver classloader expects the loaded `AmIpFilter` class to implement javax.servlet.Filter, but in the shaded jar the class implements the relocated interface, imported as 'import +{color:#de350b}org.apache.hadoop.shaded.javax.servlet.Filter{color}+'. So `AmIpFilter` cannot be loaded reflectively as a servlet filter in the Spark driver process.
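
The mismatch can be shown with a small, self-contained sketch (local stand-in interfaces only, none of the real Hadoop or servlet classes): because the relocated Filter interface is a different class at runtime than javax.servlet.Filter, the container's assignability check fails even though the AmIpFilter source looks unchanged.

{code:java}
// Self-contained sketch of the type mismatch, using local stand-ins:
// UnshadedFilter plays the role of javax.servlet.Filter (what the driver's servlet
// container checks against) and RelocatedFilter plays the role of
// org.apache.hadoop.shaded.javax.servlet.Filter (what the shaded AmIpFilter implements).
public class ShadedFilterMismatchSketch {

    interface UnshadedFilter { }   // stand-in for javax.servlet.Filter
    interface RelocatedFilter { }  // stand-in for org.apache.hadoop.shaded.javax.servlet.Filter

    // Stand-in for the AmIpFilter class as it appears inside the shaded jar.
    static class ShadedAmIpFilter implements RelocatedFilter { }

    public static void main(String[] args) {
        // The servlet container effectively performs this check before installing the filter.
        boolean isFilter = UnshadedFilter.class.isAssignableFrom(ShadedAmIpFilter.class);
        System.out.println(isFilter);  // false -> "... is not a javax.servlet.Filter"
    }
}
{code}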

 

 


> Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
> -------------------------------------------------------------------------
>
>                 Key: SPARK-33212
>                 URL: https://issues.apache.org/jira/browse/SPARK-33212
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Spark Submit, SQL, YARN
>    Affects Versions: 3.0.1
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>              Labels: releasenotes
>             Fix For: 3.2.0
>
>
> Hadoop 3.x+ offers shaded client jars: hadoop-client-api and hadoop-client-runtime, which shade 3rd party dependencies such as Guava, protobuf, jetty etc. This Jira switches Spark to use these jars instead of hadoop-common, hadoop-client etc. Benefits include:
>  * It unblocks Spark from upgrading to Hadoop 3.2.2/3.3.0+. The newer versions of Hadoop have migrated to Guava 27.0+, and in order to resolve Guava conflicts, Spark depends on Hadoop not leaking these dependencies.
>  * It makes the Spark/Hadoop dependency relationship cleaner. Currently Spark uses both client-side and server-side Hadoop APIs from modules such as hadoop-common, hadoop-yarn-server-common etc. Moving to hadoop-client-api allows Spark to use only the public/client API from the Hadoop side.
>  * It provides better isolation from Hadoop dependencies. In the future Spark can evolve more easily without worrying about dependencies pulled in from the Hadoop side (which used to be many).
> *There are some behavior changes introduced with this JIRA, when people use Spark compiled with Hadoop 3.x:*
> - Users now need to make sure the class path contains the `hadoop-client-api` and `hadoop-client-runtime` jars when they deploy Spark with the `hadoop-provided` option. In addition, it is highly recommended that they put these two jars before other Hadoop jars in the class path. Otherwise, conflicts such as those from Guava could happen if classes are loaded from the other, non-shaded Hadoop jars.
> - Since the new shaded Hadoop clients no longer include 3rd party dependencies, users who used to depend on these now need to explicitly put those jars in their class path.
> Ideally the above should go to release notes.


