Posted to issues@spark.apache.org by "Abhishek Modi (Jira)" <ji...@apache.org> on 2019/10/15 02:12:00 UTC

[jira] [Created] (SPARK-29472) Mechanism for Excluding Jars at Launch for YARN

Abhishek Modi created SPARK-29472:
-------------------------------------

             Summary: Mechanism for Excluding Jars at Launch for YARN
                 Key: SPARK-29472
                 URL: https://issues.apache.org/jira/browse/SPARK-29472
             Project: Spark
          Issue Type: New Feature
          Components: YARN
    Affects Versions: 2.4.4
            Reporter: Abhishek Modi


*Summary*

It would be convenient if there were an easy way to exclude jars from Spark’s classpath at launch time. This would complement the way in which jars can be added to the classpath using {{extraClassPath}}.
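For reference, the additive path looks roughly like the sketch below, assuming the application wants its own Parquet jar visible to the driver and executors (the jar path is hypothetical, and in practice these keys are usually supplied via {{spark-submit --conf}} or {{spark-defaults.conf}} so they take effect before the JVMs start):

{code:scala}
import org.apache.spark.SparkConf

// Add an application-provided jar to the driver and executor classpaths.
// The path is illustrative only and must exist on the relevant machines.
val conf = new SparkConf()
  .set("spark.driver.extraClassPath", "/opt/app/libs/parquet-hadoop-1.11.0.jar")
  .set("spark.executor.extraClassPath", "/opt/app/libs/parquet-hadoop-1.11.0.jar")
{code}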


*Context*

The Spark build contains its dependency jars in the {{/jars}} directory. These jars become part of the executor’s classpath. By default on YARN, these jars are packaged and distributed to containers at launch ({{spark-submit}}) time.


While developing Spark applications, customers sometimes need to debug using different versions of dependencies. This can become difficult if the dependency (e.g. Parquet 1.11.0) is one that Spark already ships in {{/jars}} (e.g. Parquet 1.10.1 in Spark 2.4), because the version included with Spark is preferentially loaded.


Configurations such as {{userClassPathFirst}} are available, but they often come with side effects of their own. For example, if the customer’s build includes Avro, they will likely see {{Caused by: java.lang.LinkageError: loader constraint violation: when resolving method "org.apache.spark.SparkConf.registerAvroSchemas(Lscala/collection/Seq;)Lorg/apache/spark/SparkConf;" the class loader (instance of org/apache/spark/util/ChildFirstURLClassLoader) of the current class, com/uber/marmaray/common/spark/SparkFactory, and the class loader (instance of sun/misc/Launcher$AppClassLoader) for the method's defining class, org/apache/spark/SparkConf, have different Class objects for the type scala/collection/Seq used in the signature}}. Resolving such issues often takes many hours.
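For concreteness, the child-first workaround referred to above is typically enabled with the two keys below; a minimal sketch (the keys are real Spark configurations, and enabling them is exactly what can lead to the {{LinkageError}} quoted above):

{code:scala}
import org.apache.spark.SparkConf

// Child-first class loading: user-supplied jars win over Spark's bundled jars.
// This is the workaround discussed above, and also the source of the
// ChildFirstURLClassLoader LinkageError quoted in this issue.
val conf = new SparkConf()
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")
{code}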


To deal with these sorts of issues, customers often download the Spark build, remove the conflicting jars, and then run {{spark-submit}}. Other times, customers may not be able to run {{spark-submit}} directly because it is gated behind a Spark Job Server. In that case, customers may try downloading the build, removing the jars, and then using configurations such as {{spark.yarn.dist.jars}} or {{spark.yarn.dist.archives}}. Both of these options are undesirable: they are operationally heavy, error-prone, and often result in the customer’s Spark builds drifting out of sync with the authoritative build.
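A rough sketch of that second workaround (the keys are real, but all paths are hypothetical, and in practice they would normally be passed via {{spark-submit --conf}} or {{spark-defaults.conf}} rather than set in application code):

{code:scala}
import org.apache.spark.SparkConf

// Ship a locally pruned/patched set of dependencies instead of relying on
// the jars bundled with the cluster's Spark build. Paths are illustrative
// only; keeping such a pruned copy in sync with the authoritative build is
// the operational burden described above.
val conf = new SparkConf()
  .set("spark.yarn.dist.jars", "/tmp/patched-deps/parquet-hadoop-1.11.0.jar")
  .set("spark.yarn.dist.archives", "/tmp/patched-deps.tgz")
{code}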


*Solution*

I’d like to propose adding a {{spark.yarn.jars.exclusionRegex}} configuration. Customers could provide a regex such as {{.*parquet.*}}, and jar files matching this regex would not be included in the driver and executor classpaths.
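A minimal sketch of the intended behaviour; note that {{spark.yarn.jars.exclusionRegex}} and the {{shouldDistribute}} helper below are part of this proposal, not existing Spark behaviour:

{code:scala}
import org.apache.spark.SparkConf

// Proposed (not yet existing) configuration key, illustrated with the
// example regex from this issue.
val conf = new SparkConf().set("spark.yarn.jars.exclusionRegex", ".*parquet.*")
val exclusionRegex = conf.get("spark.yarn.jars.exclusionRegex", "")

// A jar under /jars would only be distributed and put on the classpath
// if its file name does not match the exclusion regex.
def shouldDistribute(jarName: String): Boolean =
  exclusionRegex.isEmpty || !jarName.matches(exclusionRegex)

shouldDistribute("parquet-hadoop-1.10.1.jar")   // false: excluded
shouldDistribute("spark-core_2.11-2.4.4.jar")   // true: kept
{code}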


