Posted to dev@oozie.apache.org by "Robert Kanter (JIRA)" <ji...@apache.org> on 2015/07/30 03:05:05 UTC

[jira] [Updated] (OOZIE-2277) Honor oozie.action.sharelib.for.spark in Spark jobs

     [ https://issues.apache.org/jira/browse/OOZIE-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-2277:
---------------------------------
    Attachment: OOZIE-2277.001.patch

I couldn't get {{--jars}} to work.  I talked to some Spark guys and they said to use the {{SPARK_DIST_CLASSPATH}} env var.  Unfortunately, we can't easily do that because of how the Launcher Job works.  But it turns out we can use {{spark.executor.extraClassPath}} and {{spark.driver.extraClassPath}} to add the jars.

The patch sets {{spark.executor.extraClassPath}} and {{spark.driver.extraClassPath}} (or appends to them if already defined) to the launcher job's classpath.  This classpath will have all of the localized jars that the user added via the sharelib, lib/ dir, etc. and anything Oozie or Hadoop added (basically, the classpath that the other actions normally get).
It also fixes 
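To illustrate the approach (this is a sketch, not the actual patch code; the class and method names are made up), the merge logic boils down to appending the launcher job's classpath to any user-supplied {{extraClassPath}} value and passing both properties as {{--conf}} arguments to SparkSubmit:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the classpath handling described above.
// Names here are illustrative; they are not from the OOZIE-2277 patch itself.
class SparkClasspathSketch {
    static final String EXECUTOR_EXTRA_CP = "spark.executor.extraClassPath";
    static final String DRIVER_EXTRA_CP = "spark.driver.extraClassPath";

    /**
     * Merge a user-defined extraClassPath value (may be null) with the
     * launcher job's classpath, preserving whatever the user already set.
     * Uses ':' as the separator (Unix classpath separator).
     */
    static String mergeClasspath(String existing, String launcherClasspath) {
        if (existing == null || existing.isEmpty()) {
            return launcherClasspath;
        }
        return existing + ":" + launcherClasspath;
    }

    /** Build the --conf arguments that would be handed to SparkSubmit. */
    static List<String> buildConfArgs(String existingExecutorCp,
                                      String existingDriverCp,
                                      String launcherClasspath) {
        List<String> args = new ArrayList<>();
        args.add("--conf");
        args.add(EXECUTOR_EXTRA_CP + "="
                + mergeClasspath(existingExecutorCp, launcherClasspath));
        args.add("--conf");
        args.add(DRIVER_EXTRA_CP + "="
                + mergeClasspath(existingDriverCp, launcherClasspath));
        return args;
    }
}
```

Because the launcher classpath already contains the localized sharelib and lib/ jars, appending it this way makes those jars visible to both the driver and the executors without having to touch {{SPARK_DIST_CLASSPATH}}.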

> Honor oozie.action.sharelib.for.spark in Spark jobs
> ---------------------------------------------------
>
>                 Key: OOZIE-2277
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2277
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Ryan Brush
>            Assignee: Robert Kanter
>            Priority: Minor
>         Attachments: OOZIE-2277.001.patch
>
>
> Shared libraries specified by oozie.action.sharelib.for.spark are not visible in the Spark job itself. For instance, setting oozie.action.sharelib.for.spark to "spark,hcat" will not make the hcat jars usable in the Spark job. This is inconsistent with other actions (such as Java and MapReduce actions).
> Since the Spark action just calls SparkSubmit, it looks like we would need to explicitly pass the jars for the specified sharelibs into the SparkSubmit operation so they are available to the Spark operation itself. 
> One option: we can just pass the HDFS URLs to that command via the --jars parameter. This is actually what I've done to work around this issue; it makes for a long SparkSubmit command but works. 
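For reference, the {{--jars}} workaround described above would look roughly like this (hostnames, paths, and jar names here are purely illustrative):

```shell
# Pass the sharelib jars' HDFS URLs to spark-submit via --jars so the
# driver and executors can see them. Works, but the command gets long.
spark-submit \
  --master yarn \
  --jars hdfs://namenode:8020/user/oozie/share/lib/hcat/hive-hcatalog-core.jar,hdfs://namenode:8020/user/oozie/share/lib/hcat/hive-metastore.jar \
  --class com.example.MySparkApp \
  my-spark-app.jar
```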



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)