Posted to dev@oozie.apache.org by "Satish Subhashrao Saley (JIRA)" <ji...@apache.org> on 2016/09/15 22:19:20 UTC

[jira] [Commented] (OOZIE-2606) Set spark.yarn.jars to fix Spark 2.0 with Oozie

    [ https://issues.apache.org/jira/browse/OOZIE-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494685#comment-15494685 ] 

Satish Subhashrao Saley commented on OOZIE-2606:
------------------------------------------------

- With Spark 1.x, we *can* set both {{spark.yarn.jar}} and {{spark.yarn.jars}}; {{spark.yarn.jars}} is ignored anyway.
- Setting {{spark.yarn.jar}} for spark-1 avoids distributing the spark-yarn/spark-assembly jar multiple times.
- With Spark 2.x, we *cannot* set both {{spark.yarn.jar}} and {{spark.yarn.jars}}; having both causes problems.

- The approach in the patch is to look at the Spark version and populate the configuration accordingly. To check the Spark version, I read the "Specification-Version" field in the jar manifest. (Any cleaner alternatives?)
- We are still keeping the {{--files}} option, as it is required for spark-1 and does not cause any issues with spark-2, even if some of the URIs in {{spark.yarn.jars}} and {{--files}} are the same. Files are distributed only once.
- Also, for Spark 2.x, we need to bump up the versions of some libraries. I created profiles for spark-2 and spark-1 (spark-1 being the default). For me, Spark 1.x did not work with the newer versions of those libraries.
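
For illustration only, the version check and property selection described above could be sketched roughly as follows. This is not the code in the attached patch: the class name, method names and parameters are hypothetical, and falling back to spark-1 when the manifest has no usable "Specification-Version" is an assumption based on spark-1 being the default profile.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not taken from the OOZIE-2606 patch.
final class SparkYarnJarsSketch {

    private SparkYarnJarsSketch() {
    }

    // Reads "Specification-Version" from the main section of the jar manifest.
    // Returns null if the jar has no manifest or the attribute is absent.
    static String readSpecificationVersion(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.SPECIFICATION_VERSION);
        }
    }

    // Extracts the major version from a string such as "2.0.0".
    // Assumption: anything missing or unparsable is treated as spark-1.
    static int majorVersion(String specVersion) {
        if (specVersion == null || specVersion.trim().isEmpty()) {
            return 1;
        }
        String trimmed = specVersion.trim();
        int dot = trimmed.indexOf('.');
        String major = dot < 0 ? trimmed : trimmed.substring(0, dot);
        try {
            return Integer.parseInt(major);
        } catch (NumberFormatException e) {
            return 1;
        }
    }

    // Picks exactly one of the two properties so that Spark 2.x never sees both.
    // assemblyJarUri and allJarUris are placeholders supplied by the caller.
    static Map<String, String> yarnJarConf(String specVersion, String assemblyJarUri, String allJarUris) {
        Map<String, String> conf = new LinkedHashMap<>();
        if (majorVersion(specVersion) >= 2) {
            conf.put("spark.yarn.jars", allJarUris);
        } else {
            conf.put("spark.yarn.jar", assemblyJarUri);
        }
        return conf;
    }
}
```

A caller would pass the result of {{readSpecificationVersion}} for the Spark jar found in the sharelib into {{yarnJarConf}} and copy the returned entry into the Spark configuration. Setting only one property per version is a simplification; as noted above, spark-1 would also tolerate both being set.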

> Set spark.yarn.jars to fix Spark 2.0 with Oozie
> -----------------------------------------------
>
>                 Key: OOZIE-2606
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2606
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 4.2.0
>            Reporter: Jonathan Kelly
>            Assignee: Satish Subhashrao Saley
>              Labels: spark, spark2.0.0
>             Fix For: 4.3.0
>
>         Attachments: OOZIE-2606-2.patch, OOZIE-2606.patch
>
>
> Oozie adds all of the jars in the Oozie Spark sharelib to the DistributedCache such that all jars will be present in the current working directory of the YARN container (as well as in the container classpath). However, this is not quite enough to make Spark 2.0 work, since Spark 2.0 by default looks for the jars in assembly/target/scala-2.11/jars [1] (as if it is a locally built distribution for development) and will not find them in the current working directory.
> To fix this, we can set spark.yarn.jars to *.jar so that it finds the jars in the current working directory rather than looking in the wrong place. [2]
> [1] https://github.com/apache/spark/blob/v2.0.0-rc2/launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java#L357
> [2] https://github.com/apache/spark/blob/v2.0.0-rc2/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L476
> Note: This property will be ignored by Spark 1.x.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)