Posted to dev@oozie.apache.org by "Robert Kanter (JIRA)" <ji...@apache.org> on 2015/08/14 01:43:46 UTC

[jira] [Updated] (OOZIE-2277) Honor oozie.action.sharelib.for.spark in Spark jobs

     [ https://issues.apache.org/jira/browse/OOZIE-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-2277:
---------------------------------
    Attachment: OOZIE-2277.002.patch

The 002 patch should be correct now.  With Marcelo and Hari's help, and lots of trial and error, I was able to figure out which configs we need to set, and to what values, for local, yarn-client, and yarn-cluster modes.  I put a large comment in {{SparkMain}} explaining what needs to be done for each mode.  The patch also makes the {{SparkConfigurationService}} ignore the {{spark.yarn.jar}} property, as it can conflict with the Spark jars in the sharelib, especially in yarn-client mode.  I also changed the Spark sharelib pom a bit to make sure everything we need is there.  The patch got a little more complicated, so I put it up on RB: https://reviews.apache.org/r/37452/
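
To illustrate the {{spark.yarn.jar}} conflict mentioned above: if a Spark configuration directory picked up by the {{SparkConfigurationService}} pins that property to a cluster-local assembly, it can shadow the Spark jars shipped in the Oozie sharelib.  A minimal sketch of such a {{spark-defaults.conf}} (the paths and the second property are illustrative, not from the patch):

```
# Hypothetical spark-defaults.conf loaded by SparkConfigurationService.
# spark.yarn.jar pointing at a cluster-local assembly can conflict with
# the Spark jars in the Oozie sharelib, especially in yarn-client mode,
# which is why the 002 patch ignores this property:
spark.yarn.jar        hdfs://namenode:8020/user/spark/share/lib/spark-assembly.jar
spark.executor.memory 2g
```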

> Honor oozie.action.sharelib.for.spark in Spark jobs
> ---------------------------------------------------
>
>                 Key: OOZIE-2277
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2277
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Ryan Brush
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-2277.001.patch, OOZIE-2277.002.patch
>
>
> Shared libraries specified by oozie.action.sharelib.for.spark are not visible in the Spark job itself. For instance, setting oozie.action.sharelib.for.spark to "spark,hcat" will not make the hcat jars usable in the Spark job. This is inconsistent with other actions (such as Java and MapReduce actions).
> Since the Spark action just calls SparkSubmit, it looks like we would need to explicitly pass the jars for the specified sharelibs into the SparkSubmit operation so they are available to the Spark operation itself. 
> One option: we can just pass the HDFS URLs to that command via the --jars parameter. This is actually what I've done to work around this issue; it makes for a long SparkSubmit command but works. 
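
For illustration, the workaround described in the issue (passing the sharelib jar URLs to SparkSubmit via {{--jars}}) might look like the following workflow fragment.  All names, paths, and the action schema version here are assumptions, not taken from the patch:

```xml
<!-- Hypothetical workflow.xml fragment; jar paths and class names are illustrative. -->
<action name="spark-node">
  <spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- Ask Oozie to put both the spark and hcat sharelibs on the classpath -->
      <property>
        <name>oozie.action.sharelib.for.spark</name>
        <value>spark,hcat</value>
      </property>
    </configuration>
    <master>yarn-cluster</master>
    <name>MySparkJob</name>
    <class>com.example.MyMain</class>
    <jar>${nameNode}/apps/myapp/myapp.jar</jar>
    <!-- Workaround: explicitly hand the hcat jar URLs to SparkSubmit -->
    <spark-opts>--jars hdfs://namenode:8020/user/oozie/share/lib/hcat/hive-hcatalog-core.jar</spark-opts>
  </spark>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

As the reporter notes, listing every sharelib jar this way makes for a long SparkSubmit command, which is why having Oozie honor {{oozie.action.sharelib.for.spark}} directly is preferable.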



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)