Posted to dev@mahout.apache.org by "Pat Ferrel (JIRA)" <ji...@apache.org> on 2016/03/17 16:22:33 UTC

[jira] [Resolved] (MAHOUT-1762) Pick up $SPARK_HOME/conf/spark-defaults.conf on startup

     [ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pat Ferrel resolved MAHOUT-1762.
--------------------------------
    Resolution: Won't Fix

We don't know of anything this blocks, and moving to spark-submit was voted down; it would only apply to the Mahout CLI drivers anyway. All CLI drivers support passthrough of arbitrary key=value pairs, which go into the SparkConf, and when using Mahout as a library you can create any SparkConf you like.
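
As an illustration of the library path, here is a minimal sketch. It assumes the {{mahoutSparkContext}} helper from the {{org.apache.mahout.sparkbindings}} package object; parameter names and defaults may differ between versions.

{code}
import org.apache.spark.SparkConf
import org.apache.mahout.sparkbindings._

// Build whatever SparkConf you need, including settings that would
// otherwise come from spark-defaults.conf.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-Dhdp.version=2.2.0.0-2041")
  .set("spark.yarn.am.extraJavaOptions", "-Dhdp.version=2.2.0.0-2041")

// Hand the conf to Mahout when creating the distributed context
// (the named parameters here are an assumption about the helper's signature).
implicit val ctx = mahoutSparkContext(
  masterUrl = "yarn-client",
  appName = "my-mahout-app",
  sparkConf = conf)
{code}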

Will not fix unless someone can explain the need. 

> Pick up $SPARK_HOME/conf/spark-defaults.conf on startup
> -------------------------------------------------------
>
>                 Key: MAHOUT-1762
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1762
>             Project: Mahout
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Sergey Tryuber
>            Assignee: Pat Ferrel
>             Fix For: 1.0.0
>
>
> [spark-defaults.conf|http://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties] is intended to hold the global configuration for a Spark cluster. For example, in our HDP2.2 environment it contains:
> {noformat}
> spark.driver.extraJavaOptions      -Dhdp.version=2.2.0.0-2041
> spark.yarn.am.extraJavaOptions     -Dhdp.version=2.2.0.0-2041
> {noformat}
> and there are many other useful settings. A user expects that when they start the Spark shell, it will just work. Unfortunately this does not happen with the Mahout Spark shell, because it ignores the Spark configuration and the user has to copy-paste lots of options into _MAHOUT_OPTS_.
> This happens because [org.apache.mahout.sparkbindings.shell.Main|https://github.com/apache/mahout/blob/master/spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/Main.scala] is executed directly in the [initialization script|https://github.com/apache/mahout/blob/master/bin/mahout]:
> {code}
> "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" "org.apache.mahout.sparkbindings.shell.Main" $@
> {code}
> In contrast, the Spark shell is invoked indirectly through spark-submit in the [spark-shell|https://github.com/apache/spark/blob/master/bin/spark-shell] script:
> {code}
> "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "$@"
> {code}
> [SparkSubmit|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala] contains an additional initialization layer that loads the properties file (see the SparkSubmitArguments#mergeDefaultSparkProperties method).
> So there are two possible solutions:
> * use proper Spark-like initialization logic (see the sketch after this list)
> * use a thin wrapper script, as H2O Sparkling Water does ([sparkling-shell|https://github.com/h2oai/sparkling-water/blob/master/bin/sparkling-shell])
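>
> A rough sketch of the first option, purely for illustration (the property-file location, the parsing, and the helper name below are assumptions, not actual SparkSubmit code):
> {code}
> import java.io.{File, FileInputStream}
> import java.util.Properties
> import scala.collection.JavaConverters._
> import org.apache.spark.SparkConf
>
> // Mimic SparkSubmit's default-properties handling: read
> // $SPARK_HOME/conf/spark-defaults.conf and fold every spark.* entry
> // into the SparkConf before the Mahout shell creates its context.
> def loadSparkDefaults(conf: SparkConf): SparkConf = {
>   sys.env.get("SPARK_HOME")
>     .map(home => new File(home, "conf/spark-defaults.conf"))
>     .filter(_.isFile)
>     .foreach { file =>
>       val props = new Properties()
>       val in = new FileInputStream(file)
>       try props.load(in) finally in.close()
>       props.asScala.foreach { case (k, v) =>
>         // Keep values already set explicitly; defaults only fill gaps.
>         if (k.startsWith("spark.") && !conf.contains(k)) conf.set(k, v.trim)
>       }
>     }
>   conf
> }
> {code}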



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)