Posted to dev@oozie.apache.org by "Robert Kanter (JIRA)" <ji...@apache.org> on 2015/03/13 22:56:38 UTC

[jira] [Created] (OOZIE-2170) Oozie should automatically set configs to make Spark jobs show up in the Spark History Server

Robert Kanter created OOZIE-2170:
------------------------------------

             Summary: Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
                 Key: OOZIE-2170
                 URL: https://issues.apache.org/jira/browse/OOZIE-2170
             Project: Oozie
          Issue Type: Improvement
          Components: action
    Affects Versions: trunk
            Reporter: Robert Kanter
            Assignee: Robert Kanter


If you use "yarn-cluster" as the Spark action's master, the Spark jobs don't show up in the Spark History Server, and the Spark AM doesn't link to it properly.

The user needs to set this in their Spark action in the workflow.xml:
{code:xml}
<spark-opts>--conf spark.yarn.historyServer.address=http://SPH:18088 --conf spark.eventLog.dir=hdfs://NN:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true</spark-opts>
{code}

It would be nice if Oozie did this automatically via some oozie-site.xml config(s).  We could do something similar to how the Hadoop configs are set up: Oozie would load a Spark .conf file from a directory chosen based on the RM specified in the <job-tracker>.
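
As a rough sketch of that idea, the oozie-site.xml fragment below shows the existing Hadoop mapping next to a possible Spark counterpart. The first property is the one Oozie already uses to pick a Hadoop conf directory per RM/NN authority. The second property, including its name and its value, is purely hypothetical and only illustrates the shape such a setting might take.
{code:xml}
<!-- Existing mechanism: comma-separated AUTHORITY=HADOOP_CONF_DIR pairs; '*' matches any authority -->
<property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=hadoop-conf</value>
</property>

<!-- HYPOTHETICAL: the property name and the directory are placeholders, not an existing Oozie setting -->
<property>
    <name>oozie.example.spark.configurations</name>
    <value>*=spark-conf</value>
</property>
{code}
With a mapping like this, the history server and event log settings could live in a Spark .conf file under the mapped directory instead of being repeated in every workflow's <spark-opts>.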

While we're at it, it might be good to document how to use Spark on YARN:
# Include the spark-assembly jar with your workflow (unfortunately, it is not published to Maven)
# Specify "yarn-cluster" as the master
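
The two steps above, combined with the <spark-opts> workaround, might look roughly like this in a workflow.xml. This is a minimal sketch, not tested documentation: the workflow name, the job name, the class, the jar path, and the ${jobTracker}/${nameNode} parameters are placeholders, and SPH and NN stand in for the Spark History Server and NameNode hosts as in the snippet earlier in this message.
{code:xml}
<workflow-app name="spark-on-yarn-example" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- Step 2: run the Spark driver on YARN in cluster mode -->
            <master>yarn-cluster</master>
            <name>ExampleSparkJob</name>
            <!-- Placeholder class and jar; substitute your own application -->
            <class>com.example.ExampleSparkJob</class>
            <jar>${nameNode}/user/${wf:user()}/apps/spark-example/lib/example-spark-job.jar</jar>
            <!-- Workaround from this issue: make the job visible in the Spark History Server -->
            <spark-opts>--conf spark.yarn.historyServer.address=http://SPH:18088 --conf spark.eventLog.dir=hdfs://NN:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
{code}
For step 1, one option is to copy the spark-assembly jar into the workflow application's lib/ directory on HDFS next to the application jar, so that it is added to the action's classpath.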



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)