Posted to reviews@spark.apache.org by pwendell <gi...@git.apache.org> on 2014/04/02 08:13:41 UTC

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

GitHub user pwendell opened a pull request:

    https://github.com/apache/spark/pull/299

    [WIP] Clean up and simplify Spark configuration

    Over time, as we've added more deployment modes, the user-facing configuration options in Spark have gotten a bit unwieldy. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch, but it makes the following improvements:
    
    1. Improves `spark-env.sh.template`, which was missing a lot of settings that users now put in that file.
    2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead, it introduces the config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
    3. Adds the ability to set these same variables for the driver using `spark-submit`.
    4. Allows loading system properties from a `spark-defaults.conf` file when running `spark-submit`. This allows setting both SparkConf options and other system properties used by `spark-submit` (a sketch of items 2-4 follows this list).
    5. Makes `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.
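
    For illustration, here is a minimal sketch of items 2-4, using the property names as they appear in the diffs below. The file contents, paths, jar, and class names are hypothetical placeholders, not part of this patch:

    ```bash
    # Hypothetical conf/spark-defaults.conf read by spark-submit (item 4);
    # each line is a property key followed by its value:
    #
    #   spark.master                     spark://master:7077
    #   spark.executor.extraJavaOptions  -XX:+UseConcMarkSweepGC
    #   spark.executor.extraClassPath    /opt/extra/lib/custom.jar

    # Driver-side equivalents set through spark-submit flags (item 3); the
    # application jar and main class are placeholders.
    ./bin/spark-submit \
      --class org.example.MyApp \
      --driver-java-options "-XX:+UseConcMarkSweepGC" \
      --driver-class-path /opt/extra/lib/custom.jar \
      --driver-library-path /opt/native/lib \
      myapp.jar
    ```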

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pwendell/spark config-cleanup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/299.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #299
    
----
commit 7be5d8a08aecd94df0d36501781535c5d1aaa1c1
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-03-31T20:16:46Z

    Change spark.local.dir -> SPARK_LOCAL_DIRS

commit 8804e39e2fc0af20bc4e1603b15af0dd0ddf5985
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-04-01T07:13:52Z

    Stash of adding config options in submit script and YARN

commit 5e9485f9b58de132d6734e995f7e002a9413a518
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-04-01T22:00:42Z

    executorJavaOpts

commit f3598014134df4c36f12221a8ee7c6dbb9ad9d7e
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-04-01T22:10:31Z

    Remove SPARK_LIBRARY_PATH

commit 581391a0ea273689c3e74611f283df168da0565d
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-04-01T22:18:42Z

    SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings

commit a2c2bf091899bdef6511fb977cab42cedfab4995
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-04-01T23:31:11Z

    Small clean-up

commit 82191a6d0f3a7dc79640c6a1dc1597bf22287dfe
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-04-02T00:35:13Z

    Don't ship executor envs

commit a57c4f87f9026dcf0a37f41d3ed1785182bae82e
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-04-02T01:02:43Z

    Clean up terminology inside of spark-env script

commit 3ffc125683ceeed1ea07dd28c89c85b3ae31aa26
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-04-02T05:45:59Z

    Library path and classpath for drivers

----



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40913772
  
     Merged build triggered. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40428500
  
    Merged build finished. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11718975
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -208,6 +208,81 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
         new SparkConf(false).setAll(settings)
       }
     
    +  /** Checks for illegal or deprecated config settings. Throws an exception for the former. Not
    +    * idempotent - may mutate this conf object to convert deprecated settings to supported ones. */
    +  private[spark] def validateSettings() {
    +    if (settings.contains("spark.local.dir")) {
    +      val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
    +        "the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
    +      logWarning(msg)
    +    }
    +
    +    val executorOptsKey = "spark.executor.extraJavaOptions"
    +    val executorClasspathKey = "spark.executor.extraClassPath"
    +    val driverOptsKey = "spark.driver.extraJavaOptions"
    +    val driverClassPathKey = "spark.driver.extraClassPath"
    +
    +    // Validate spark.executor.extraJavaOptions
    +    settings.get(executorOptsKey).map { javaOpts =>
    +      if (javaOpts.contains("-Dspark")) {
    +        val msg = s"$executorOptsKey is not allowed to set Spark options. Was '$javaOpts'"
    +        throw new Exception(msg)
    +      }
    +      if (javaOpts.contains("-Xmx") || javaOpts.contains("-Xms")) {
    +        val msg = s"$executorOptsKey is not allowed to alter memory settings (was '$javaOpts'). " +
    +          "Use spark.executor.memory instead."
    +        throw new Exception(msg)
    +      }
    +    }
    +
    +    // Check for legacy configs
    +    sys.env.get("SPARK_JAVA_OPTS").foreach { value =>
    +      val error =
    +        s"""
    +          |SPARK_JAVA_OPTS was detected (set to '$value').
    +          |This has undefined behavior when running on a cluster and is deprecated in Spark 1.0+.
    +          |
    +          |Please instead use:
    +          | - ./spark-submit with conf/spark-defaults.conf to set properties for an application
    +          | - ./spark-submit with --driver-java-options to set -X options for a driver
    +          | - spark.executor.executor.extraJavaOptions to set -X options for executors
    --- End diff --
    
    repetition of "executor"



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11721526
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -333,15 +325,29 @@ trait ClientBase extends Logging {
         if (useConcurrentAndIncrementalGC) {
           // In our expts, using (default) throughput collector has severe perf ramifications in
           // multi-tenant machines
    -      JAVA_OPTS += " -XX:+UseConcMarkSweepGC "
    -      JAVA_OPTS += " -XX:+CMSIncrementalMode "
    -      JAVA_OPTS += " -XX:+CMSIncrementalPacing "
    -      JAVA_OPTS += " -XX:CMSIncrementalDutyCycleMin=0 "
    -      JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
    +      JAVA_OPTS += "-XX:+UseConcMarkSweepGC"
    +      JAVA_OPTS += "-XX:+CMSIncrementalMode"
    +      JAVA_OPTS += "-XX:+CMSIncrementalPacing"
    +      JAVA_OPTS += "-XX:CMSIncrementalDutyCycleMin=0"
    +      JAVA_OPTS += "-XX:CMSIncrementalDutyCycle=10"
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +    // TODO: it might be nicer to pass these as an internal environment variable rather than
    +    // as Java options, due to complications with string parsing of nested quotes.
    +    if (args.amClass == classOf[ExecutorLauncher].getName) {
    +      // If we are being launched in client mode, forward the spark-conf options
    +      // onto the executor launcher
    +      for ((k, v) <- sparkConf.getAll) {
    +        JAVA_OPTS += "-D" + k + "=" + "\\\"" + v + "\\\""
    --- End diff --
    
    Oops, you're right.
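
    For context, a minimal shell sketch of the quoting issue the escaping in the diff works around (the option value below is hypothetical):

    ```bash
    # A conf value containing spaces falls apart into separate arguments
    # unless it is quoted when the command line is assembled.
    OPTS='-verbose:gc -XX:+PrintGCDetails'

    # Without escaping, a later shell parse splits the value in two:
    echo java -Dspark.executor.extraJavaOptions=$OPTS

    # With embedded quotes, the value survives a later round of shell parsing:
    echo java -Dspark.executor.extraJavaOptions=\"$OPTS\"
    ```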



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40453639
  
    Merged build started. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40326340
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14107/



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40458976
  
    Merged build finished. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40325340
  
    Merged build started. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11220055
  
    --- Diff: docs/configuration.md ---
    @@ -586,6 +589,16 @@ Apart from these, the following properties are also available, and may be useful
         Number of cores to allocate for each task.
       </td>
     </tr>
    +<tr>
    +  <td>spark.executor.extraJavaOptions</td>
    +  <td>(none)</td>
    +  <td>
    +    A string of extra JVM options to pass to executors. For instance, GC settings or custom
    +    paths for native code. Note that it is illegal to set Spark properties or heap size 
    --- End diff --
    
    Why is library path specified individually in spark.driver.libraryPath, but as a part of extraJavaOptions here?



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11327148
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -208,6 +210,26 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
         new SparkConf(false).setAll(settings)
       }
     
    +  /** Checks for illegal or deprecated config settings. Throws an exception for the former. */
    +  private def validateSettings() {
    +    if (settings.contains("spark.local.dir")) {
    +      val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
    +        "the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
    +      logWarning(msg)
    +    }
    +    val executorOptsKey = "spark.executor.extraJavaOptions"
    +    settings.get(executorOptsKey).map { javaOpts =>
    +      if (javaOpts.contains("-Dspark")) {
    +        val msg = s"$executorOptsKey is not allowed to set Spark options. Was '$javaOpts'"
    +        throw new Exception(msg)
    +      }
    +      if (javaOpts.contains("-Xmx") || javaOpts.contains("-Xms")) {
    +        val msg = s"$executorOptsKey is not allowed to alter memory settings. Was '$javaOpts'"
    +        throw new Exception(msg)
    +      }
    --- End diff --
    
    This check is here because of the existence of spark.executor.memory. We always set -Xmx and -Xms when starting executors (to a default if no value is specified), so these options would conflict if the user also sets them here.
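
    For illustration, a small sketch of the intended split (values are hypothetical, e.g. in conf/spark-defaults.conf):

    ```bash
    # Heap size goes through spark.executor.memory; extraJavaOptions is for
    # non-memory flags such as GC tuning:
    #
    #   spark.executor.memory            2g
    #   spark.executor.extraJavaOptions  -XX:+UseConcMarkSweepGC
    #
    # Something like the line below would be rejected by validateSettings,
    # since the launcher already derives -Xmx/-Xms from spark.executor.memory:
    #
    #   spark.executor.extraJavaOptions  -Xmx4g
    ```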



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11720287
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -123,6 +142,14 @@ object SparkSubmit {
     
         val options = List[OptionAssigner](
           new OptionAssigner(appArgs.master, ALL_CLUSTER_MGRS, false, sysProp = "spark.master"),
    +
    +      new OptionAssigner(appArgs.driverExtraClassPath, STANDALONE | YARN, true,
    +        sysProp = "spark.driver.extraClassPath"),
    +      new OptionAssigner(appArgs.driverExtraJavaOptions, STANDALONE | YARN, true,
    +        sysProp = "spark.driver.extraJavaOpts"),
    +      new OptionAssigner(appArgs.driverExtraLibraryPath, STANDALONE | YARN, true,
    +        sysProp = "spark.driver.extraLibraryPath"),
    +
           new OptionAssigner(appArgs.driverMemory, YARN, true, clOption = "--driver-memory"),
           new OptionAssigner(appArgs.name, YARN, true, clOption = "--name"),
    --- End diff --
    
    Could we add an OptionAssigner to set spark.app.name?
    For example:
    ```scala
    new OptionAssigner(appArgs.name, STANDALONE | MESOS | YARN, false,
            sysProp = "spark.app.name"),
    ```



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11774783
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,22 +1,42 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark programs.
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/run-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    --- End diff --
    
    SPARK_YARN_APP_JAR is no longer used.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11719922
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -123,6 +142,14 @@ object SparkSubmit {
     
         val options = List[OptionAssigner](
           new OptionAssigner(appArgs.master, ALL_CLUSTER_MGRS, false, sysProp = "spark.master"),
    +
    --- End diff --
    
    nit: one new line too many



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11719879
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -17,14 +17,17 @@
     
     package org.apache.spark.deploy
     
    -import java.io.{PrintStream, File}
    +import java.io.{IOException, FileInputStream, PrintStream, File}
    --- End diff --
    
    and group `org.apache.spark` imports together



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11220148
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
    +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
    +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
    +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
    +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
    +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
    +
    +# Options for the daemons used in the standalone deploy mode:
     # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
     # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
    +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g "-Dx=y")
    --- End diff --
    
    SPARK_WORKER_OPTS is not listed here (though note that SPARK_DAEMON_OPTS/MEMORY is the more general option for both)
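
    For reference, a small spark-env.sh sketch of these daemon options (values are hypothetical; the general variable in the scripts is SPARK_DAEMON_JAVA_OPTS):

    ```bash
    # JVM properties for the standalone master and worker daemons individually.
    SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=4"
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"

    # Options applied to all standalone daemons (master and workers).
    SPARK_DAEMON_JAVA_OPTS="-verbose:gc"
    ```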



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11567194
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -340,8 +341,22 @@ trait ClientBase extends Logging {
           JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +
    --- End diff --
    
    Thinking a bit more, we have two options here:
    
    (a) make a backwards-incompatible change here, so people have to rewrite their jobs
    (b) continue supporting shipping SPARK_JAVA_OPTS from the driver to the executors for the entire 1.X family of Spark releases (i.e. probably years).
    
    I guess we can do (a), but I might give a loud error message here so that users change this.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40832146
  
    Merged build started. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40915519
  
     Merged build triggered. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r15976048
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -64,9 +64,10 @@ private[spark] class Executor(
       // to what Yarn on this system said was available. This will be used later when SparkEnv
       // created.
       if (java.lang.Boolean.valueOf(
    -      System.getProperty("SPARK_YARN_MODE", System.getenv("SPARK_YARN_MODE"))))
    -  {
    +      System.getProperty("SPARK_YARN_MODE", System.getenv("SPARK_YARN_MODE")))) {
         conf.set("spark.local.dir", getYarnLocalDirs())
    +  } else if (sys.env.contains("SPARK_LOCAL_DIRS")) {
    +    conf.set("spark.local.dir", sys.env("SPARK_LOCAL_DIRS"))
    --- End diff --
    
    Maybe the problem here lies with spark-ec2's default configuration: it sets SPARK_LOCAL_DIRS on the master when that variable should really only be used on workers, and it does not set `spark.local.dir`.
    
    I think the current documentation for SPARK_LOCAL_DIRS sort of suggests that it acts as an override, without any caveats about whether it should only be used on workers, etc.





[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11355444
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
    +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
    +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
    +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
    +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
    +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
    +
    +# Options for the daemons used in the standalone deploy mode:
     # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
     # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
    +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g "-Dx=y")
    --- End diff --
    
    Hm... so shouldn't we list that here as well?



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11721568
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -313,13 +305,13 @@ trait ClientBase extends Logging {
     
         val amMemory = calculateAMMemory(newApp)
     
    -    var JAVA_OPTS = ""
    +    var JAVA_OPTS = ListBuffer[String]()
    --- End diff --
    
    I think this can be a val



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40914960
  
    Merged build started. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11355670
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
    +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
    +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
    +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
    +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
    +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
    +
    +# Options for the daemons used in the standalone deploy mode:
     # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
     # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
    +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g "-Dx=y")
     # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
     # - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
    --- End diff --
    
    Oh I see, the former is the total amount of memory for all executors on one machine, but the latter is the memory given to the Worker daemon thread that launches these executors...
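
    A small spark-env.sh sketch of that distinction (values are hypothetical):

    ```bash
    # Total memory that executors may use on this machine, divided among the
    # executors launched by the Worker.
    SPARK_WORKER_MEMORY=16g

    # Heap size of the Worker and Master daemon JVMs themselves.
    SPARK_DAEMON_MEMORY=1g
    ```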



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40831695
  
    Merged build started. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40788261
  
    Merged build started. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11720122
  
    --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -42,6 +42,9 @@ import org.apache.spark.deploy.SparkHadoopUtil
     import org.apache.spark.util.Utils
     
     
    +/**
    + * An application master that runs the users driver program and allocates executors.
    --- End diff --
    
    user's



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40453632
  
     Merged build triggered. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40642445
  
    SPARK_CLASSPATH only did anything if you were running in yarn-client mode, for the client itself. yarn-standalone didn't propagate it to the real CLASSPATH. With the changes you made, it now does.
    
    I think it's fine to leave since some people were asking for it. Let's just not document it.



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11513514
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    --- End diff --
    
    Are we calling it spark-example or run-example?



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40324104
  
    Merged build finished. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-41046566
  
    I don't see where you addressed my comment about spark.authenticate not being propagated to the executors on YARN properly. I gave it a quick try and it's not working.



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40324072
  
    Merged build started. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11777280
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -108,6 +110,21 @@ object SparkSubmit {
         val sysProps = new HashMap[String, String]()
         var childMainClass = ""
     
    +    // Load system properties by default from the file, if present
    +    if (appArgs.verbose) printStream.println(s"Using properties file: ${appArgs.propertiesFile}")
    +    Option(appArgs.propertiesFile).foreach { filename =>
    --- End diff --
    
    Is the properties file intended to be only for driver settings? These properties aren't propagating to the executors on YARN.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40954634
  
    Okay - I played around with this a bunch this weekend and added some more tests. Unfortunately this type of thing is hard to unit test well since the changes are pervasive throughout the code base and affect things exogenous to the creation of a SparkContext.
    
    We'll continue testing this in the next few days during QA. For now I think I'm going to merge this.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11776742
  
    --- Diff: bin/spark-submit ---
    @@ -25,8 +25,13 @@ while (($#)); do
         DEPLOY_MODE=$2
       elif [ $1 = "--driver-memory" ]; then
         DRIVER_MEMORY=$2
    +  elif [ $1 = "--driver-library-path" ]; then
    +    export _SPARK_LIBRARY_PATH=$2
    +  elif [ $1 = "--driver-class-path" ]; then
    +    export SPARK_CLASSPATH="$SPARK_CLASSPATH:$2"
    +  elif [ $1 = "--driver-java-options" ]; then
    --- End diff --
    
    this doesn't match the usage in SparkSubmit, where it's --driver-java-opts



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11282611
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -340,8 +341,22 @@ trait ClientBase extends Logging {
           JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +
    --- End diff --
    
    application-specific defines, -X* config values, etc.



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40325635
  
     Merged build triggered. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11355234
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
    +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
    +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
    +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
    +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
    +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
    +
    +# Options for the daemons used in the standalone deploy mode:
     # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
     # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
    +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g "-Dx=y")
    --- End diff --
    
    Also, is SPARK_MASTER_MEMORY missing here?



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11356743
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientClusterScheduler.scala ---
    @@ -29,7 +29,7 @@ import org.apache.spark.util.Utils
      */
     private[spark] class YarnClientClusterScheduler(sc: SparkContext, conf: Configuration) extends TaskSchedulerImpl(sc) {
     
    -  def this(sc: SparkContext) = this(sc, new Configuration())
    +  def this(sc: SparkContext) = this(sc, sc.getConf)
    --- End diff --
    
    Maybe I'm missing something here, but doesn't sc.getConf return a `SparkConf`, not a Hadoop `Configuration`?



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11235668
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -340,8 +341,22 @@ trait ClientBase extends Logging {
           JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +
    --- End diff --
    
    Removing support for this is going to break too many jobs that are currently run via cron; this is going to make things very messy.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40914955
  
     Merged build triggered. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40787245
  
     Build triggered. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40324069
  
     Merged build triggered. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40425601
  
    Merged build started. 


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-41114666
  
    @tgravescs I'll fix this - sorry I thought it worked.


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r15975800
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -64,9 +64,10 @@ private[spark] class Executor(
       // to what Yarn on this system said was available. This will be used later when SparkEnv
       // created.
       if (java.lang.Boolean.valueOf(
    -      System.getProperty("SPARK_YARN_MODE", System.getenv("SPARK_YARN_MODE"))))
    -  {
    +      System.getProperty("SPARK_YARN_MODE", System.getenv("SPARK_YARN_MODE")))) {
         conf.set("spark.local.dir", getYarnLocalDirs())
    +  } else if (sys.env.contains("SPARK_LOCAL_DIRS")) {
    +    conf.set("spark.local.dir", sys.env("SPARK_LOCAL_DIRS"))
    --- End diff --
    
    @pwendell 
    
    If we're running in local mode, then SparkEnv will have already been created and DiskBlockManager will have already created the local dirs using the previous value of "spark.local.dir". When we change "spark.local.dir" here, the local Executor will attempt to use local directories that might not exist, causing problems for local jobs that use addFile().
    
    I discovered this issue when debugging some spark-perf tests in local mode on an EC2 node.


---



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11720047
  
    --- Diff: docs/configuration.md ---
    @@ -666,13 +696,7 @@ The following variables can be set in `spark-env.sh`:
     * `JAVA_HOME`, the location where Java is installed (if it's not on your default `PATH`)
     * `PYSPARK_PYTHON`, the Python binary to use for PySpark
     * `SPARK_LOCAL_IP`, to configure which IP address of the machine to bind to.
    --- End diff --
    
    This is no longer true


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11235928
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -340,8 +341,22 @@ trait ClientBase extends Logging {
           JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +
    --- End diff --
    
    @mridulm we could add this back to make it backwards-compatible and give a warning. Would that make sense?
    
    Can you give examples of what people are setting in SPARK_JAVA_OPTS? Just curious how people are using it. Also, what does cron have to do with it?


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11236051
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -340,8 +341,22 @@ trait ClientBase extends Logging {
           JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +
    --- End diff --
    
    cron as in periodically run via Oozie or just normal cron.
    So not manually triggered, and failure of those jobs won't even be noticed for a while (and only after they have already impacted other things).


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40918746
  
    Merged build started. 


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40615869
  
    @tgravescs `spark.driver.extraClassPath` and `spark.executor.extraClassPath` exist mostly to provide equivalent support to setting SPARK_CLASSPATH, which has always been around in previous versions. I also don't like the fact that we support this kind of thing... we could make it unsupported on YARN if you want, or undocumented.
    
    I tested this on a local YARN cluster. We're doing some QA this week on YARN 2.2 clusters as well.
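
    For reference, a hedged sketch of how an application could set these keys programmatically (equivalent key/value lines could go in conf/spark-defaults.conf for spark-submit); the paths are purely illustrative:

        import org.apache.spark.SparkConf

        object ExtraClassPathSketch {
          val conf = new SparkConf()
            .set("spark.driver.extraClassPath", "/opt/site/libs/*")    // illustrative path
            .set("spark.executor.extraClassPath", "/opt/site/libs/*")  // illustrative path
        }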


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-39402736
  
    We will need to continue supporting a few of these (SPARK_JAVA_OPTS, for example), which are currently getting pretty heavily used.
    We can issue a deprecation warning to the user, but removing them will break too many cron'ed jobs already running.


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40321268
  
     Merged build triggered. 


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-39292232
  
    Merged build started. 


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40326046
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14106/


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11355049
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
    +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
    +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
    +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
    +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
    +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
    +
    +# Options for the daemons used in the standalone deploy mode:
     # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
     # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
    +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g "-Dx=y")
    --- End diff --
    
    What is the plan for SPARK_DAEMON_*? Do we plan to keep them around?


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11719981
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
    @@ -42,11 +42,21 @@ private[spark] class SparkDeploySchedulerBackend(
     
         // The endpoint for executors to talk to us
         val driverUrl = "akka.tcp://spark@%s:%s/user/%s".format(
    -      conf.get("spark.driver.host"),  conf.get("spark.driver.port"),
    +      conf.get("spark.driver.host"), conf.get("spark.driver.port"),
           CoarseGrainedSchedulerBackend.ACTOR_NAME)
    -    val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}", "{{WORKER_URL}}")
    +    val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}",
    +      "{{CORES}}", "{{WORKER_URL}}")
    --- End diff --
    
    nit: couldn't this be on the line above?


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40790471
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14231/


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40914986
  
    Merged build finished. 


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11356283
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -221,6 +247,13 @@ object SparkSubmit {
         val url = localJarFile.getAbsoluteFile.toURI.toURL
         loader.addURL(url)
       }
    +
    +  private def getDefaultProperties(file: File): Seq[(String, String)] = {
    +    val inputStream = new FileInputStream(file)
    +    val properties = new Properties()
    +    properties.load(inputStream)
    +    properties.stringPropertyNames().toSeq.map(k => (k, properties(k)))
    +  }
    --- End diff --
    
    It would be good to add a try/catch here.
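
    One possible shape for that, sketched here rather than taken from the patch: wrap the load in try/catch/finally so the stream is always closed and a bad file produces a readable error.

        import java.io.{File, FileInputStream, IOException}
        import java.util.Properties
        import scala.collection.JavaConverters._

        object DefaultPropertiesSketch {
          def getDefaultProperties(file: File): Seq[(String, String)] = {
            val inputStream = new FileInputStream(file)
            try {
              val properties = new Properties()
              properties.load(inputStream)
              properties.stringPropertyNames().asScala.toSeq.map(k => (k, properties.getProperty(k)))
            } catch {
              case e: IOException =>
                throw new RuntimeException(s"Failed to load properties from ${file.getPath}", e)
            } finally {
              inputStream.close()  // always release the stream
            }
          }
        }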


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40831838
  
    Merged build finished. 


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11513371
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    --- End diff --
    
    Do you want to say what classes this affects? Or maybe say "Spark programs".


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40831840
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14242/


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11235552
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -208,6 +210,26 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
         new SparkConf(false).setAll(settings)
       }
     
    +  /** Checks for illegal or deprecated config settings. Throws an exception for the former. */
    +  private def validateSettings() {
    +    if (settings.contains("spark.local.dir")) {
    +      val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
    +        "the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
    +      logWarning(msg)
    +    }
    +    val executorOptsKey = "spark.executor.extraJavaOptions"
    +    settings.get(executorOptsKey).map { javaOpts =>
    +      if (javaOpts.contains("-Dspark")) {
    +        val msg = s"$executorOptsKey is not allowed to set Spark options. Was '$javaOpts'"
    +        throw new Exception(msg)
    +      }
    +      if (javaOpts.contains("-Xmx") || javaOpts.contains("-Xms")) {
    +        val msg = s"$executorOptsKey is not allowed to alter memory settings. Was '$javaOpts'"
    +        throw new Exception(msg)
    +      }
    --- End diff --
    
    The `contains` check is not a very general assumption to make... though it might be reasonably practical for a first cut.
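
    A small illustration of that limitation with made-up option strings: a bare substring check catches the intended case but also trips on options that merely contain the text.

        object ContainsCheckSketch extends App {
          // Mirrors the shape of the validation above, not the actual SparkConf code.
          def looksLikeSparkOption(javaOpts: String): Boolean = javaOpts.contains("-Dspark")

          println(looksLikeSparkOption("-Dspark.serializer=org.apache.spark.serializer.KryoSerializer")) // true, intended
          println(looksLikeSparkOption("-Dsparkle.effects=on"))    // true, a false positive
          println(looksLikeSparkOption("-XX:+UseConcMarkSweepGC")) // false
        }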


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11718962
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -123,6 +142,14 @@ object SparkSubmit {
     
         val options = List[OptionAssigner](
           new OptionAssigner(appArgs.master, ALL_CLUSTER_MGRS, false, sysProp = "spark.master"),
    +
    +      new OptionAssigner(appArgs.driverExtraClassPath, STANDALONE | YARN, true,
    +        sysProp = "spark.driver.extraClassPath"),
    +      new OptionAssigner(appArgs.driverExtraJavaOptions, STANDALONE | YARN, true,
    +        sysProp = "spark.driver.extraJavaOpts"),
    --- End diff --
    
    Options, not Opts?


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40916422
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14277/


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11777147
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -208,6 +208,82 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
         new SparkConf(false).setAll(settings)
       }
     
    +  /** Checks for illegal or deprecated config settings. Throws an exception for the former. Not
    +    * idempotent - may mutate this conf object to convert deprecated settings to supported ones. */
    +  private[spark] def validateSettings() {
    +    if (settings.contains("spark.local.dir")) {
    +      val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
    +        "the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
    +      logWarning(msg)
    +    }
    +
    +    val executorOptsKey = "spark.executor.extraJavaOptions"
    +    val executorClasspathKey = "spark.executor.extraClassPath"
    +    val driverOptsKey = "spark.driver.extraJavaOptions"
    +    val driverClassPathKey = "spark.driver.extraClassPath"
    +
    +    // Validate spark.executor.extraJavaOptions
    +    settings.get(executorOptsKey).map { javaOpts =>
    +      if (javaOpts.contains("-Dspark")) {
    +        val msg = s"$executorOptsKey is not allowed to set Spark options (was '$javaOpts)'. " +
    --- End diff --
    
    Sorry I misunderstood this setting.  Ignore that comment. 


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40788256
  
     Merged build triggered. 


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by liyezhang556520 <gi...@git.apache.org>.
Github user liyezhang556520 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r15046137
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -123,20 +134,22 @@ private[spark] class CoarseMesosSchedulerBackend(
           conf.get("spark.driver.host"),
           conf.get("spark.driver.port"),
           CoarseGrainedSchedulerBackend.ACTOR_NAME)
    +
         val uri = conf.get("spark.executor.uri", null)
         if (uri == null) {
           val runScript = new File(sparkHome, "./bin/spark-class").getCanonicalPath
           command.setValue(
    -        "\"%s\" org.apache.spark.executor.CoarseGrainedExecutorBackend %s %s %s %d".format(
    -          runScript, driverUrl, offer.getSlaveId.getValue, offer.getHostname, numCores))
    +        "\"%s\" org.apache.spark.executor.CoarseGrainedExecutorBackend %s %s %s %s %d".format(
    +          runScript, extraOpts, driverUrl, offer.getSlaveId.getValue, offer.getHostname, numCores))
    --- End diff --
    
    @pwendell `extraOpts` will be read as arguments of `org.apache.spark.executor.CoarseGrainedExecutorBackend`, which will lead to a parse error if `extraOpts` is not empty. Because the 4th argument of `CoarseGrainedExecutorBackend` is of `Int` type, a non-empty `extraOpts` means a value other than `numCores` ends up as the 4th argument, fails the cast to Int, and causes an error. Do I have any misunderstanding of this? :)
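
    To make that concrete, a simplified sketch of the positional parsing involved (a hypothetical parser, not the real CoarseGrainedExecutorBackend main): any extra tokens spliced in ahead of the core count push the numeric argument out of position.

        object ArgShiftSketch extends App {
          // Expected layout: <driverUrl> <executorId> <hostname> <cores>
          def parse(args: Array[String]): (String, String, String, Int) =
            (args(0), args(1), args(2), args(3).toInt)

          val base = Array("akka.tcp://spark@driver:7077/user/CoarseGrainedScheduler", "exec-1", "host-1", "4")
          println(parse(base))  // parses cleanly

          // With a non-empty extraOpts inserted before the driver URL, "4" is no longer
          // at index 3 and the toInt call throws NumberFormatException.
          val shifted = Array("-Dfoo=bar") ++ base
          println(parse(shifted))
        }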


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40597042
  
    Sorry for my delay; the documentation changes on YARN look good.
    
    I see you added new functionality allowing the extra driver/executor classpath to be specified. I had thought about doing this, but I was still on the fence about it. The reason is that we generally push people to ship everything they really need with their application; this isolates them from system dependencies, which makes it easier to upgrade without affecting customers. It has been asked about by a couple of people though, and since there are so many different setup options I think it's OK to have.
    
    Have you had a chance to test it on YARN? Hopefully I'll have time to try it today.


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11355031
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
    +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
    +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    --- End diff --
    
    If both SPARK_MASTER_MEMORY and SPARK_DAEMON_MEMORY are set, which one takes precedence?


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11219984
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -340,8 +341,22 @@ trait ClientBase extends Logging {
           JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +
    +    if (args.amClass == classOf[ExecutorLauncher].getName) {
    +      // If we are being launched in client mode, forward the spark-conf options
    +      // onto the executor launcher
    +      for ((k, v) <- sparkConf.getAll) {
    +        JAVA_OPTS += s"-D$k=$v"
    +      }
    +    } else {
    +      // If we are being launched in standalone mode, capture and forward any spark
    +      // system properties (e.g. set by spark-class).
    +      for ((k, v) <- sys.props.filterKeys(_.startsWith("spark"))) {
    +        JAVA_OPTS += s"-D$k=$v"
    +      }
    +      // TODO: honor driver classpath here: sys.props.get("spark.driver.classPath")
    +      sys.props.get("spark.driver.javaOpts").map(opts => JAVA_OPTS += opts)
    +      sys.props.get("spark.driver.libraryPath").map(p => JAVA_OPTS + s"-Djava.library.path=$p")
    --- End diff --
    
    foreach and +=
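
    Roughly what that amounts to, with JAVA_OPTS modeled as a plain mutable string just for illustration: use foreach for the side effect and += so the appended text is actually kept (a bare `JAVA_OPTS + ...` discards its result).

        object ForeachAppendSketch extends App {
          var JAVA_OPTS = "-Xloggc:gc.log"
          sys.props.get("spark.driver.javaOpts").foreach(opts => JAVA_OPTS += " " + opts)
          sys.props.get("spark.driver.libraryPath").foreach(p => JAVA_OPTS += s" -Djava.library.path=$p")
          println(JAVA_OPTS)
        }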


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40913775
  
    Merged build started. 


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11720080
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -333,15 +325,29 @@ trait ClientBase extends Logging {
         if (useConcurrentAndIncrementalGC) {
           // In our expts, using (default) throughput collector has severe perf ramifications in
           // multi-tenant machines
    -      JAVA_OPTS += " -XX:+UseConcMarkSweepGC "
    -      JAVA_OPTS += " -XX:+CMSIncrementalMode "
    -      JAVA_OPTS += " -XX:+CMSIncrementalPacing "
    -      JAVA_OPTS += " -XX:CMSIncrementalDutyCycleMin=0 "
    -      JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
    +      JAVA_OPTS += "-XX:+UseConcMarkSweepGC"
    +      JAVA_OPTS += "-XX:+CMSIncrementalMode"
    +      JAVA_OPTS += "-XX:+CMSIncrementalPacing"
    +      JAVA_OPTS += "-XX:CMSIncrementalDutyCycleMin=0"
    +      JAVA_OPTS += "-XX:CMSIncrementalDutyCycle=10"
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +    // TODO: it might be nicer to pass these as an internal environment variable rather than
    +    // as Java options, due to complications with string parsing of nested quotes.
    +    if (args.amClass == classOf[ExecutorLauncher].getName) {
    +      // If we are being launched in client mode, forward the spark-conf options
    +      // onto the executor launcher
    +      for ((k, v) <- sparkConf.getAll) {
    +        JAVA_OPTS += "-D" + k + "=" + "\\\"" + v + "\\\""
    --- End diff --
    
    Do you need a space before `-D`? Otherwise when you string them together, it'll be squished into one giant string without spaces.
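
    A tiny sketch of the concern, assuming the options end up joined by plain string concatenation: without a leading space (or a join on spaces) the -D flags run together into one unparseable token.

        object JavaOptsSpacingSketch extends App {
          val props = Seq("spark.app.name" -> "demo", "spark.ui.port" -> "4040")

          var squished = ""
          var spaced = ""
          for ((k, v) <- props) {
            squished += "-D" + k + "=" + "\\\"" + v + "\\\""  // mirrors the quoting above, no space
            spaced += " -D" + k + "=" + "\\\"" + v + "\\\""   // leading space keeps flags separate
          }
          println(squished)     // -Dspark.app.name=\"demo\"-Dspark.ui.port=\"4040\"
          println(spaced.trim)  // -Dspark.app.name=\"demo\" -Dspark.ui.port=\"4040\"
        }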


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40915520
  
    Merged build started. 


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11354857
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    --- End diff --
    
    Looks like this is duplicated at the very end of the file.


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40832133
  
     Merged build triggered. 


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40326339
  
    Merged build finished. 


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11355250
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
    +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
    +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
    +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
    +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
    +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
    +
    +# Options for the daemons used in the standalone deploy mode:
     # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
     # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
    +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g "-Dx=y")
     # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
     # - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
    --- End diff --
    
    If both SPARK_WORKER_MEMORY and SPARK_DAEMON_MEMORY are set, which one takes precedence?


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11236035
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -340,8 +341,22 @@ trait ClientBase extends Logging {
           JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +
    --- End diff --
    
    Warning the user would be great, just don't remove support for it :-)
    I don't have my scripts at home, but these are currently used to specify application-specific defines (which won't start with the 'spark' prefix), etc., IIRC. There is no other way to do it right now.


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11720017
  
    --- Diff: docs/configuration.md ---
    @@ -643,6 +646,34 @@ Apart from these, the following properties are also available, and may be useful
         Number of cores to allocate for each task.
       </td>
     </tr>
    +<tr>
    +  <td>spark.executor.extraJavaOptions</td>
    +  <td>(none)</td>
    +  <td>
    +    A string of extra JVM options to pass to executors. For instance, GC settings or other
    +    logging. Note that it is illegal to set Spark properties or heap size settings with this 
    +    option. Spark properties should be set using a SparkConf object or the 
    +    spark-defaults.conf file used with the spark-submit script. Heap size settings can be set
    +    with spark.executor.memory.
    +  </td>
    +</tr>
    +<tr>
    +  <td>spark.executor.extraClassPath</td>
    +  <td>(none)</td>
    +  <td>
    +    Extra classpath entries to append to the classpath of executors. This exists primarily
    +    for backwards-compatiblity with older versions of Spark. Users typically should not need
    --- End diff --
    
    compatibility


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11719873
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -17,14 +17,17 @@
     
     package org.apache.spark.deploy
     
    -import java.io.{PrintStream, File}
    +import java.io.{IOException, FileInputStream, PrintStream, File}
    --- End diff --
    
    super small nit: alphabetize these


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40955189
  
    Is this tied to a Spark JIRA?


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40321304
  
    Merged build finished. 


---

[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11220181
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
    +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
    +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
    +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
    +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
    +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
    +
    +# Options for the daemons used in the standalone deploy mode:
     # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
     # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
    +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g "-Dx=y")
     # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
     # - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
    --- End diff --
    
    We should probably specify that this is actually the total amount of memory to give to executors spawned from this worker; the current wording could confuse people.


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11718991
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -208,6 +208,81 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
         new SparkConf(false).setAll(settings)
       }
     
    +  /** Checks for illegal or deprecated config settings. Throws an exception for the former. Not
    +    * idempotent - may mutate this conf object to convert deprecated settings to supported ones. */
    +  private[spark] def validateSettings() {
    +    if (settings.contains("spark.local.dir")) {
    +      val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
    +        "the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
    +      logWarning(msg)
    +    }
    +
    +    val executorOptsKey = "spark.executor.extraJavaOptions"
    +    val executorClasspathKey = "spark.executor.extraClassPath"
    +    val driverOptsKey = "spark.driver.extraJavaOptions"
    +    val driverClassPathKey = "spark.driver.extraClassPath"
    +
    +    // Validate spark.executor.extraJavaOptions
    +    settings.get(executorOptsKey).map { javaOpts =>
    +      if (javaOpts.contains("-Dspark")) {
    +        val msg = s"$executorOptsKey is not allowed to set Spark options. Was '$javaOpts'"
    +        throw new Exception(msg)
    +      }
    +      if (javaOpts.contains("-Xmx") || javaOpts.contains("-Xms")) {
    +        val msg = s"$executorOptsKey is not allowed to alter memory settings (was '$javaOpts'). " +
    +          "Use spark.executor.memory instead."
    +        throw new Exception(msg)
    +      }
    +    }
    +
    +    // Check for legacy configs
    +    sys.env.get("SPARK_JAVA_OPTS").foreach { value =>
    +      val error =
    +        s"""
    +          |SPARK_JAVA_OPTS was detected (set to '$value').
    +          |This has undefined behavior when running on a cluster and is deprecated in Spark 1.0+.
    +          |
    +          |Please instead use:
    +          | - ./spark-submit with conf/spark-defaults.conf to set properties for an application
    +          | - ./spark-submit with --driver-java-options to set -X options for a driver
    +          | - spark.executor.executor.extraJavaOptions to set -X options for executors
    +          | - SPARK_DAEMON_OPTS to set java options for standalone daemons (i.e. master, worker)
    +        """.stripMargin
    +      logError(error)
    +
    +      for (key <- Seq(executorOptsKey, driverOptsKey)) {
    +        if (getOption(key).isDefined) {
    +          throw new SparkException(s"Found both $key and SPARK_JAVA_OPTS. Use only the former.")
    +        } else {
    +          logWarning(s"Setting '$key' to '$value' as a work-around.")
    +          set(key, value)
    +        }
    +      }
    +    }
    +
    +    sys.env.get("SPARK_CLASSPATH").foreach { value =>
    +      val error =
    +        s"""
    +          |SPARK_CLASSPATH was detected (set to '$value').
    +          | This has undefined behavior when running on a cluster and is deprecated in Spark 1.0+.
    +          |
    +          |Please instead use:
    +          | - ./spark-submit with --driver-class-path to augment the driver classpath
    +          | - spark.executor.executor.extraClassPath to augment the executor classpath
    --- End diff --
    
    repetition of executor


---

[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40920109
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14281/



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40831681
  
     Merged build triggered. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11355356
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
    +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
    +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
    +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
    +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
    +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
    +
    +# Options for the daemons used in the standalone deploy mode:
     # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
     # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
    +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g "-Dx=y")
    --- End diff --
    
    there isn't actually a SPARK_MASTER_MEMORY, SPARK_DAEMON_MEMORY is the only way to set this



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11221895
  
    --- Diff: docs/configuration.md ---
    @@ -586,6 +589,16 @@ Apart from these, the following properties are also available, and may be useful
         Number of cores to allocate for each task.
       </td>
     </tr>
    +<tr>
    +  <td>spark.executor.extraJavaOptions</td>
    +  <td>(none)</td>
    +  <td>
    +    A string of extra JVM options to pass to executors. For instance, GC settings or custom
    +    paths for native code. Note that it is illegal to set Spark properties or heap size 
    --- End diff --
    
    Ya sorry - I need to update the executor to break out these options.



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11567397
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
    @@ -42,11 +42,16 @@ private[spark] class SparkDeploySchedulerBackend(
     
         // The endpoint for executors to talk to us
         val driverUrl = "akka.tcp://spark@%s:%s/user/%s".format(
    -      conf.get("spark.driver.host"),  conf.get("spark.driver.port"),
    +      conf.get("spark.driver.host"), conf.get("spark.driver.port"),
           CoarseGrainedSchedulerBackend.ACTOR_NAME)
    -    val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}", "{{WORKER_URL}}")
    +    val args = sc.conf.get("spark.executor.extraJavaOptions").split(" ") ++
    --- End diff --
    
    Good catch - I think we might just have to treat these as a single blob. It's not ideal but I noticed this is what YARN already does.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40787253
  
    Build started. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11775108
  
    --- Diff: docs/configuration.md ---
    @@ -649,6 +652,34 @@ Apart from these, the following properties are also available, and may be useful
         Number of cores to allocate for each task.
       </td>
     </tr>
    +<tr>
    +  <td>spark.executor.extraJavaOptions</td>
    +  <td>(none)</td>
    +  <td>
    +    A string of extra JVM options to pass to executors. For instance, GC settings or other
    +    logging. Note that it is illegal to set Spark properties or heap size settings with this 
    +    option. Spark properties should be set using a SparkConf object or the 
    +    spark-defaults.conf file used with the spark-submit script. Heap size settings can be set
    --- End diff --
    
    should this be spark-defaults.properties since that is what the code looks for?
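
    A rough sketch of the mechanism being discussed (not the actual SparkSubmit code; the file name and helper are illustrative): read key/value pairs from a Java properties-format file and keep only the spark.* entries.

    ```scala
    import java.io.FileInputStream
    import java.util.Properties
    import scala.collection.JavaConverters._

    // Hypothetical helper: load spark.* defaults from a properties-format file.
    def loadSparkDefaults(path: String): Map[String, String] = {
      val props = new Properties()
      val in = new FileInputStream(path)
      try props.load(in) finally in.close()
      props.asScala.toMap.filter { case (k, _) => k.startsWith("spark.") }
    }

    // e.g. loadSparkDefaults("conf/spark-defaults.conf").foreach { case (k, v) => sys.props(k) = v }
    ```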



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40914987
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14276/



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40830907
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14241/



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11793588
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -108,6 +110,21 @@ object SparkSubmit {
         val sysProps = new HashMap[String, String]()
         var childMainClass = ""
     
    +    // Load system properties by default from the file, if present
    +    if (appArgs.verbose) printStream.println(s"Using properties file: ${appArgs.propertiesFile}")
    +    Option(appArgs.propertiesFile).foreach { filename =>
    --- End diff --
    
    I was trying to use spark.authenticate and it isn't being turned on.
    
    I don't think this setting can be propagated this way, though, since when the executor starts it uses Akka to talk back to the driver in order to get those settings. It has to know the secret ahead of time, so perhaps a few settings need to be handled specially.
    
    Looking at some other settings they are getting propagated properly.
    
    I don't know if this is related to this jira, but I'm also seeing weird behavior when an application fails: the AM seems to stay around even though it finished with the RM as failed. I'll try to look into this some more.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11779935
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -108,6 +110,21 @@ object SparkSubmit {
         val sysProps = new HashMap[String, String]()
         var childMainClass = ""
     
    +    // Load system properties by default from the file, if present
    +    if (appArgs.verbose) printStream.println(s"Using properties file: ${appArgs.propertiesFile}")
    +    Option(appArgs.propertiesFile).foreach { filename =>
    --- End diff --
    
    They are propagated to the executors because the driver bundles up all of its Spark properties and sends them to the executors. Is there a path you see where they don't get propagated?
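
    A minimal sketch of that path (assuming a running cluster and the 1.0-era API; spark.myapp.flag is a made-up property): a value set in the driver's SparkConf travels with the application, so task code can read it back from the executor-side SparkEnv.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

    object ConfPropagationSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("conf-propagation").set("spark.myapp.flag", "on")
        val sc = new SparkContext(conf)
        // Each task reads the property from the conf the driver shipped over.
        val seen = sc.parallelize(1 to 4, 4)
          .map(_ => SparkEnv.get.conf.get("spark.myapp.flag", "missing"))
          .collect()
        println(seen.mkString(","))  // expected: on,on,on,on
        sc.stop()
      }
    }
    ```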



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40913815
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14275/



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11235627
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -208,6 +210,26 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
         new SparkConf(false).setAll(settings)
       }
     
    +  /** Checks for illegal or deprecated config settings. Throws an exception for the former. */
    +  private def validateSettings() {
    +    if (settings.contains("spark.local.dir")) {
    +      val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
    +        "the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
    +      logWarning(msg)
    +    }
    +    val executorOptsKey = "spark.executor.extraJavaOptions"
    +    settings.get(executorOptsKey).map { javaOpts =>
    +      if (javaOpts.contains("-Dspark")) {
    +        val msg = s"$executorOptsKey is not allowed to set Spark options. Was '$javaOpts'"
    +        throw new Exception(msg)
    +      }
    +      if (javaOpts.contains("-Xmx") || javaOpts.contains("-Xms")) {
    +        val msg = s"$executorOptsKey is not allowed to alter memory settings. Was '$javaOpts'"
    --- End diff --
    
    Maybe tell the user what property to set instead.
    
    Also, we could do this same check for SPARK_JAVA_OPTS, since setting the memory there is overridden by whatever variant of SPARK_MEM is in use.
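
    Something along these lines, maybe (illustrative sketch, not the patch itself): reject heap flags in the env-provided opts and point the user at the supported property.

    ```scala
    // Hypothetical check: fail fast if a deprecated opts string tries to set heap size.
    def checkNoHeapFlags(source: String, opts: String): Unit = {
      if (opts.contains("-Xmx") || opts.contains("-Xms")) {
        throw new IllegalArgumentException(
          s"$source must not set JVM heap size (was '$opts'); use spark.executor.memory instead.")
      }
    }

    // e.g. sys.env.get("SPARK_JAVA_OPTS").foreach(opts => checkNoHeapFlags("SPARK_JAVA_OPTS", opts))
    ```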



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11355327
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
    +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
    +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
    +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
    +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
    +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
    +
    +# Options for the daemons used in the standalone deploy mode:
     # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
     # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
    +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g "-Dx=y")
     # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
     # - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
    --- End diff --
    
    those two are unrelated (unfortunately)



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11324035
  
    --- Diff: bin/run-example ---
    @@ -75,7 +75,6 @@ fi
     
     # Set JAVA_OPTS to be able to load native libraries and to set heap size
     JAVA_OPTS="$SPARK_JAVA_OPTS"
    -JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
    --- End diff --
    
    It seems there is currently no way to set the library path for examples, do we need one?



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40324105
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14105/



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11219968
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -340,8 +341,22 @@ trait ClientBase extends Logging {
           JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +
    +    if (args.amClass == classOf[ExecutorLauncher].getName) {
    +      // If we are being launched in client mode, forward the spark-conf options
    +      // onto the executor launcher
    +      for ((k, v) <- sparkConf.getAll) {
    +        JAVA_OPTS += s"-D$k=$v"
    +      }
    +    } else {
    +      // If we are being launched in standalone mode, capture and forward any spark
    +      // system properties (e.g. set by spark-class).
    +      for ((k, v) <- sys.props.filterKeys(_.startsWith("spark"))) {
    +        JAVA_OPTS += s"-D$k=$v"
    +      }
    +      // TODO: honor driver classpath here: sys.props.get("spark.driver.classPath")
    +      sys.props.get("spark.driver.javaOpts").map(opts => JAVA_OPTS += opts)
    --- End diff --
    
    foreach
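
    i.e. something like this (illustrative): the body runs purely for its side effect, so foreach states the intent and avoids building a discarded Option[Unit].

    ```scala
    var JAVA_OPTS = ""
    // foreach instead of map: we only want the side effect of appending the opts.
    sys.props.get("spark.driver.javaOpts").foreach(opts => JAVA_OPTS += " " + opts)
    ```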



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11513505
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
    +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
    +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    +# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
    +# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
    +# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
    +# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
    +
    +# Options for the daemons used in the standalone deploy mode:
     # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
     # - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
    +# - SPARK_MASTER_OPTS, to set config properties at the master (e.g "-Dx=y")
     # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
     # - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
     # - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT
    --- End diff --
    
    Do we want to list all these options in `docs/configuration.md` as well? Right now it has a subset of them, e.g. no YARN options. We could also have it point to the template.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40916420
  
    Merged build finished. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40425587
  
     Merged build triggered. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11720118
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -424,29 +431,29 @@ object ClientBase {
         }
       }
     
    -  def populateClasspath(conf: Configuration, sparkConf: SparkConf, addLog4j: Boolean, env: HashMap[String, String]) {
    -    Apps.addToEnvironment(env, Environment.CLASSPATH.name, Environment.PWD.$())
    +  def populateClasspath(conf: Configuration, sparkConf: SparkConf, addLog4j: Boolean, env: HashMap[String, String],
    +      extraClassPath: Option[String] = None) {
    +
    +    /** Add entry to the classpath. */
    +    def addClasspathEntry(path: String) = Apps.addToEnvironment(env, Environment.CLASSPATH.name, path)
    +    /** Add entry to the classpath. Interpreted as a path relative to the working directory. */
    +    def addPwdClasspathEntry(entry: String) = addClasspathEntry(Environment.PWD.$() + Path.SEPARATOR + entry)
    +
    +    extraClassPath.foreach(addClasspathEntry)
    +
    +    addClasspathEntry(Environment.PWD.$())
         // If log4j present, ensure ours overrides all others
    -    if (addLog4j) {
    -      Apps.addToEnvironment(env, Environment.CLASSPATH.name, Environment.PWD.$() +
    -        Path.SEPARATOR + LOG4J_PROP)
    -    }
    +    if (addLog4j) addPwdClasspathEntry(LOG4J_PROP)
         // Normally the users app.jar is last in case conflicts with spark jars
    -    val userClasspathFirst = sparkConf.get("spark.yarn.user.classpath.first", "false")
    -      .toBoolean
    -    if (userClasspathFirst) {
    -      Apps.addToEnvironment(env, Environment.CLASSPATH.name, Environment.PWD.$() +
    -        Path.SEPARATOR + APP_JAR)
    -    }
    -    Apps.addToEnvironment(env, Environment.CLASSPATH.name, Environment.PWD.$() +
    -      Path.SEPARATOR + SPARK_JAR)
    -    ClientBase.populateHadoopClasspath(conf, env)
    -
    -    if (!userClasspathFirst) {
    -      Apps.addToEnvironment(env, Environment.CLASSPATH.name, Environment.PWD.$() +
    -        Path.SEPARATOR + APP_JAR)
    +    if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {
    +      addPwdClasspathEntry(APP_JAR)
    +      addPwdClasspathEntry(SPARK_JAR)
    +      ClientBase.populateHadoopClasspath(conf, env)
    +    } else {
    +      addPwdClasspathEntry(SPARK_JAR)
    +      ClientBase.populateHadoopClasspath(conf, env)
    +      addPwdClasspathEntry(APP_JAR)
    --- End diff --
    
    Nice, the old code here really bugged me.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11774661
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
    @@ -122,6 +141,22 @@ private[spark] class SparkSubmitArguments(args: Array[String]) {
           driverCores = value
           parseOpts(tail)
     
    +    case ("--driver-class-path") :: value :: tail =>
    +      driverExtraClassPath = value
    +      parseOpts(tail)
    +
    +    case ("--driver-java-opts") :: value :: tail =>
    +      driverExtraJavaOptions = value
    +      parseOpts(tail)
    +
    +    case ("--driver-library-path") :: value :: tail =>
    +      driverExtraLibraryPath = value
    +      parseOpts(tail)
    +
    +    case ("--properties-file") :: value :: tail =>
    --- End diff --
    
    this isn't being printed in the usage output of the spark-submit script.



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11235965
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/Client.scala ---
    @@ -54,8 +54,15 @@ private class ClientActor(driverArgs: ClientArguments, conf: SparkConf) extends
             System.getenv().foreach{case (k, v) => env(k) = v}
     
             val mainClass = "org.apache.spark.deploy.worker.DriverWrapper"
    +        val classPathEntries = sys.props.get("spark.driver.classPath").toSeq.flatMap { cp =>
    +          cp.split(java.io.File.pathSeparator)
    +        }
    +        val libraryPathEntries = sys.props.get("spark.driver.libraryPath").toSeq.flatMap { cp =>
    +          cp.split(java.io.File.pathSeparator)
    +        }
    +        val javaOpts = sys.props.get("spark.driver.javaOpts").toSeq
    --- End diff --
    
    hm ya, I should probably split them up. I wasn't kidding when I put `[WIP]` in this PR title!



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11719892
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -108,6 +112,21 @@ object SparkSubmit {
         val sysProps = new HashMap[String, String]()
         var childMainClass = ""
     
    +    // Load system properties by default from the file, if present
    +    if (appArgs.verbose) printStream.println(s"Using properties file: ${appArgs.propertiesFile}")
    +    Option(appArgs.propertiesFile).map { filename =>
    --- End diff --
    
    foreach



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11720264
  
    --- Diff: docs/configuration.md ---
    @@ -666,13 +696,7 @@ The following variables can be set in `spark-env.sh`:
     * `JAVA_HOME`, the location where Java is installed (if it's not on your default `PATH`)
     * `PYSPARK_PYTHON`, the Python binary to use for PySpark
     * `SPARK_LOCAL_IP`, to configure which IP address of the machine to bind to.
    --- End diff --
    
    I think this remains true for Akka utils, right?



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-39295723
  
    Merged build finished. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11444014
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read in YARN client mode
    +# - SPARK_YARN_APP_JAR, Path to your application’s JAR file (required)
    +# - SPARK_WORKER_INSTANCES, Number of workers to start (Default: 2)
    +# - SPARK_WORKER_CORES, Number of cores for the workers (Default: 1).
    +# - SPARK_WORKER_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
    +# - SPARK_MASTER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
    --- End diff --
    
    If this is for yarn client mode, you'll want to update these to match the new env variables:
    * `SPARK_EXECUTOR_INSTANCES`, Number of executors to start (Default: 2)
    * `SPARK_EXECUTOR_CORES`, Number of cores per executor (Default: 1).
    * `SPARK_EXECUTOR_MEMORY`, Memory per executor (e.g. 1000M, 2G) (Default: 1G)
    * `SPARK_DRIVER_MEMORY`, Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
    * `SPARK_YARN_APP_NAME`, The name of your application (Default: Spark)
    * `SPARK_YARN_QUEUE`, The YARN queue to use for allocation requests (Default: 'default')
    * `SPARK_YARN_DIST_FILES`, Comma separated list of files to be distributed with the job.
    * `SPARK_YARN_DIST_ARCHIVES`, Comma separated list of archives to be distributed with the job.




[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11221304
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    --- End diff --
    
    Example values for many or all of these options would be helpful.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40835371
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40920106
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11235687
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/Client.scala ---
    @@ -54,8 +54,15 @@ private class ClientActor(driverArgs: ClientArguments, conf: SparkConf) extends
             System.getenv().foreach{case (k, v) => env(k) = v}
     
             val mainClass = "org.apache.spark.deploy.worker.DriverWrapper"
    +        val classPathEntries = sys.props.get("spark.driver.classPath").toSeq.flatMap { cp =>
    +          cp.split(java.io.File.pathSeparator)
    +        }
    +        val libraryPathEntries = sys.props.get("spark.driver.libraryPath").toSeq.flatMap { cp =>
    +          cp.split(java.io.File.pathSeparator)
    +        }
    +        val javaOpts = sys.props.get("spark.driver.javaOpts").toSeq
    --- End diff --
    
    This will be a Seq("-Xblah -Xblah2 -Xblah3") rather than any sort of set of options.
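
    A quick illustration (option values made up):

    ```scala
    val javaOpts = Some("-XX:+UseConcMarkSweepGC -verbose:gc").toSeq
    println(javaOpts.length)  // 1 -- the whole string is a single element, not two options
    println(javaOpts)         // List(-XX:+UseConcMarkSweepGC -verbose:gc)
    ```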



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11220246
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
     # - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
    -# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
    -#   we recommend setting app-wide options in the application's driver program.
    -#     Examples of node-specific options : -Dspark.local.dir, GC options
    -#     Examples of app-wide options : -Dspark.serializer
    -#
    -# If using the standalone deploy mode, you can also set variables for it here:
    +# - SPARK_CLASSPATH, default classpath entries to append
    --- End diff --
    
    nit: maybe reorder SPARK_CLASSPATH to be right below _DNS to allow users to more easily pattern-match these three together



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40326045
  
    Merged build finished. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40787295
  
    Build finished. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11763430
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -123,6 +142,14 @@ object SparkSubmit {
     
         val options = List[OptionAssigner](
           new OptionAssigner(appArgs.master, ALL_CLUSTER_MGRS, false, sysProp = "spark.master"),
    +
    +      new OptionAssigner(appArgs.driverExtraClassPath, STANDALONE | YARN, true,
    +        sysProp = "spark.driver.extraClassPath"),
    +      new OptionAssigner(appArgs.driverExtraJavaOptions, STANDALONE | YARN, true,
    +        sysProp = "spark.driver.extraJavaOpts"),
    --- End diff --
    
    Good catch - I'm ashamed I don't have unit tests for this patch.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-41150404
  
    ```
    ./bin/spark-submit /opt/spark/classes/toona-assembly-assembly-1.0.0-SNAPSHOT.jar --master spark://spark:7077 --deploy-mode client --class com.zhe800.toona.als.computation.DealCF --arg hdfs://192.168.10.39:8020/user/hadoop/testdata/20140304/*  --arg /opt/spark/toona/uidIndex --arg /opt/spark/toona/rating --arg /opt/spark/toona/model --verbose
    ```
    => 
    ```
    Using properties file: /opt/spark/spark-1.0.0-cdh3/conf/spark-defaults.conf
    Adding default property: spark.executor.memory=13g
    Adding default property: spark.eventLog.enabled=true
    Adding default property: spark.eventLog.dir=/opt/spark/logs/
    Adding default property: spark.master=spark://spark:7077
    Using properties file: /opt/spark/spark-1.0.0-cdh3/conf/spark-defaults.conf
    Adding default property: spark.executor.memory=13g
    Adding default property: spark.eventLog.enabled=true
    Adding default property: spark.eventLog.dir=/opt/spark/logs/
    Adding default property: spark.master=spark://spark:7077
    ```
    Why is the properties file loaded and each default property added twice?



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11327153
  
    --- Diff: conf/spark-env.sh.template ---
    @@ -1,19 +1,36 @@
     #!/usr/bin/env bash
     
    -# This file contains environment variables required to run Spark. Copy it as
    -# spark-env.sh and edit that to configure Spark for your site.
    -#
    -# The following variables can be set in this file:
    +# This file is sourced when running various Spark classes. 
    +# Copy it as spark-env.sh and edit that to configure Spark for your site.
    +
    +# Options read when launching programs locally with 
    +# ./bin/spark-example or ./bin/spark-submit
    +# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
    +# - SPARK_CLASSPATH, default classpath entries to append
    +
    +# Options read by executors and drivers running inside the cluster
     # - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
    +# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
    +# - SPARK_LOCAL_DIRS, shuffle directories to use on this node
    --- End diff --
    
    This is also used for RDD blocks stored on disk and external sort spilling



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40835373
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14243/



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11719747
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -208,6 +208,81 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
         new SparkConf(false).setAll(settings)
       }
     
    +  /** Checks for illegal or deprecated config settings. Throws an exception for the former. Not
    +    * idempotent - may mutate this conf object to convert deprecated settings to supported ones. */
    +  private[spark] def validateSettings() {
    +    if (settings.contains("spark.local.dir")) {
    +      val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
    +        "the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
    +      logWarning(msg)
    +    }
    +
    +    val executorOptsKey = "spark.executor.extraJavaOptions"
    +    val executorClasspathKey = "spark.executor.extraClassPath"
    +    val driverOptsKey = "spark.driver.extraJavaOptions"
    +    val driverClassPathKey = "spark.driver.extraClassPath"
    +
    +    // Validate spark.executor.extraJavaOptions
    +    settings.get(executorOptsKey).map { javaOpts =>
    +      if (javaOpts.contains("-Dspark")) {
    +        val msg = s"$executorOptsKey is not allowed to set Spark options. Was '$javaOpts'"
    --- End diff --
    
    Maybe also add how the user is supposed to set `-Dspark*` options. Small nit: `Was '$javaOpts'` is inconsistent with what it is below (i.e. `was '$javaOpts'`)



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-41150159
  
    SPARK_DAEMON_OPTS seems to have no effect



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11219913
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala ---
    @@ -56,9 +56,10 @@ trait ExecutorRunnableUtil extends Logging {
         // Set the JVM memory
         val executorMemoryString = executorMemory + "m"
         JAVA_OPTS += "-Xms" + executorMemoryString + " -Xmx" + executorMemoryString + " "
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += env("SPARK_JAVA_OPTS") + " "
    -    }
    +
    +    // Set extra Java options for the executor
    +    val executorOpts = sys.props.find(_._1.contains("spark.executor.extraJavaOptions"))
    +    JAVA_OPTS += executorOpts
    --- End diff --
    
    I think this returns an Option? String concatenation may not be the correct solution here.
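
    A rough sketch of the concern (illustrative, not the actual patch): find returns an Option[(String, String)], so appending it directly splices text like Some((spark.executor.extraJavaOptions,-verbose:gc)) into the command line. Reading just the value and appending it explicitly avoids that.

    ```scala
    var JAVA_OPTS = "-Xms1024m -Xmx1024m "
    // Append the value only if the property is set, rather than the Option itself.
    sys.props.get("spark.executor.extraJavaOptions").foreach { opts =>
      JAVA_OPTS += opts + " "
    }
    ```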



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40830904
  
    Merged build finished. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11567219
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -340,8 +341,22 @@ trait ClientBase extends Logging {
           JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +
    --- End diff --
    
    Hm actually I'm not so sure. The existing behavior is really confusing because it means that if SPARK_JAVA_OPTS is set on the executors and the driver... the behavior is basically undefined. It might be worth it to bite the bullet here rather than continue to support this unpredictable behavior for a long time. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11235611
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
    @@ -42,11 +42,16 @@ private[spark] class SparkDeploySchedulerBackend(
     
         // The endpoint for executors to talk to us
         val driverUrl = "akka.tcp://spark@%s:%s/user/%s".format(
    -      conf.get("spark.driver.host"),  conf.get("spark.driver.port"),
    +      conf.get("spark.driver.host"), conf.get("spark.driver.port"),
           CoarseGrainedSchedulerBackend.ACTOR_NAME)
    -    val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}", "{{WORKER_URL}}")
    +    val args = sc.conf.get("spark.executor.extraJavaOptions").split(" ") ++
    --- End diff --
    
    This would fail when we have quoted strings
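    For example (just an illustration of the failure mode; the tokenizer below is a hypothetical helper, not existing Spark API):

        val opts = """-Dlog.dir="/tmp/my logs" -verbose:gc"""

        opts.split(" ")
        // gives: -Dlog.dir="/tmp/my | logs" | -verbose:gc   (the quoted path is torn apart)

        // A quote-aware tokenizer would be needed instead of a plain split:
        def tokenize(s: String): Seq[String] =
          """("[^"]*"|\S)+""".r.findAllIn(s).toSeq

        tokenize(opts)
        // gives: -Dlog.dir="/tmp/my logs" | -verbose:gc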



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40830818
  
     Merged build triggered. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-39295726
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13670/



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40918739
  
     Merged build triggered. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11235466
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala ---
    @@ -74,13 +74,17 @@ private[spark] class DriverRunner(
     
               // Make sure user application jar is on the classpath
               // TODO: If we add ability to submit multiple jars they should also be added here
    -          val env = Map(driverDesc.command.environment.toSeq: _*)
    -          env("SPARK_CLASSPATH") = env.getOrElse("SPARK_CLASSPATH", "") + s":$localJarFilename"
    -          val newCommand = Command(driverDesc.command.mainClass,
    -            driverDesc.command.arguments.map(substituteVariables), env)
    +          val classPath = driverDesc.command.classPathEntries ++ Seq(s":$localJarFilename")
    --- End diff --
    
    Remove the ':'
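    i.e. something like the following (names follow the diff above), since the entries are joined with the path separator later and a leading ':' would add an empty classpath entry:

        val classPath = driverDesc.command.classPathEntries ++ Seq(localJarFilename)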



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40787296
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14227/



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11239389
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -340,8 +341,22 @@ trait ClientBase extends Logging {
           JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +
    --- End diff --
    
    Okay, sounds good. If you have examples of what values this is being used for, that would be helpful (e.g. are they setting GC options, application-specific system properties, or something else).



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/299



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40458978
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14143/



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40790468
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40830827
  
    Merged build started. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40430015
  
    @tgravescs In the last pass I added a few light documentation bits and some minor changes in the YARN code. Mind taking a look?



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40325334
  
     Merged build triggered. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11720277
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
    @@ -333,15 +325,29 @@ trait ClientBase extends Logging {
         if (useConcurrentAndIncrementalGC) {
           // In our expts, using (default) throughput collector has severe perf ramifications in
           // multi-tenant machines
    -      JAVA_OPTS += " -XX:+UseConcMarkSweepGC "
    -      JAVA_OPTS += " -XX:+CMSIncrementalMode "
    -      JAVA_OPTS += " -XX:+CMSIncrementalPacing "
    -      JAVA_OPTS += " -XX:CMSIncrementalDutyCycleMin=0 "
    -      JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
    +      JAVA_OPTS += "-XX:+UseConcMarkSweepGC"
    +      JAVA_OPTS += "-XX:+CMSIncrementalMode"
    +      JAVA_OPTS += "-XX:+CMSIncrementalPacing"
    +      JAVA_OPTS += "-XX:CMSIncrementalDutyCycleMin=0"
    +      JAVA_OPTS += "-XX:CMSIncrementalDutyCycle=10"
         }
     
    -    if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
    -      JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
    +    // TODO: it might be nicer to pass these as an internal environment variable rather than
    +    // as Java options, due to complications with string parsing of nested quotes.
    +    if (args.amClass == classOf[ExecutorLauncher].getName) {
    +      // If we are being launched in client mode, forward the spark-conf options
    +      // onto the executor launcher
    +      for ((k, v) <- sparkConf.getAll) {
    +        JAVA_OPTS += "-D" + k + "=" + "\\\"" + v + "\\\""
    --- End diff --
    
    JAVA_OPTS is a sequence of strings, so the space gets inserted when the sequence is joined inside of YARN.
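    Roughly like this (illustrative values only; this assumes YARN joins the sequence with single spaces when building the container command):

        import scala.collection.mutable.ListBuffer

        val JAVA_OPTS = ListBuffer[String]()
        val sparkConfEntries = Seq("spark.app.name" -> "my app", "spark.ui.port" -> "4040")
        for ((k, v) <- sparkConfEntries) {
          // Each entry becomes its own element; the escaped quotes keep
          // multi-word values together once the shell runs the joined command.
          JAVA_OPTS += "-D" + k + "=" + "\\\"" + v + "\\\""
        }
        JAVA_OPTS.mkString(" ")
        // -> -Dspark.app.name=\"my app\" -Dspark.ui.port=\"4040\"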



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40325638
  
    Merged build started. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-39292226
  
     Merged build triggered. 



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40325568
  
    Did a pass but still need to write a testing framework for this and update some documentation.
    
    @mridulm I settled on the approach of providing support for SPARK_JAVA_OPTS by simply converting it into `spark.executor.extraJavaOptions` if it's present (/cc @aarondav). But it does give a loud warning telling users to change... since the current approach has some issues.
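    Roughly, the fallback looks like this (a sketch in the style of `validateSettings()`, not the exact patch):

        // `settings` is assumed to be the mutable map used in validateSettings() above.
        sys.env.get("SPARK_JAVA_OPTS").foreach { value =>
          logWarning("SPARK_JAVA_OPTS was detected; it is deprecated in 1.0+. Use " +
            "spark.executor.extraJavaOptions / spark.driver.extraJavaOptions instead.")
          if (!settings.contains("spark.executor.extraJavaOptions")) {
            settings("spark.executor.extraJavaOptions") = value
          }
        }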



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40321305
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14102/



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40913814
  
    Merged build finished. 



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11775119
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -208,6 +208,82 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
         new SparkConf(false).setAll(settings)
       }
     
    +  /** Checks for illegal or deprecated config settings. Throws an exception for the former. Not
    +    * idempotent - may mutate this conf object to convert deprecated settings to supported ones. */
    +  private[spark] def validateSettings() {
    +    if (settings.contains("spark.local.dir")) {
    +      val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
    +        "the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
    +      logWarning(msg)
    +    }
    +
    +    val executorOptsKey = "spark.executor.extraJavaOptions"
    +    val executorClasspathKey = "spark.executor.extraClassPath"
    +    val driverOptsKey = "spark.driver.extraJavaOptions"
    +    val driverClassPathKey = "spark.driver.extraClassPath"
    +
    +    // Validate spark.executor.extraJavaOptions
    +    settings.get(executorOptsKey).map { javaOpts =>
    +      if (javaOpts.contains("-Dspark")) {
    +        val msg = s"$executorOptsKey is not allowed to set Spark options (was '$javaOpts)'. " +
    +          "Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit."
    +        throw new Exception(msg)
    +      }
    +      if (javaOpts.contains("-Xmx") || javaOpts.contains("-Xms")) {
    +        val msg = s"$executorOptsKey is not allowed to alter memory settings (was '$javaOpts'). " +
    +          "Use spark.executor.memory instead."
    +        throw new Exception(msg)
    +      }
    +    }
    +
    +    // Check for legacy configs
    +    sys.env.get("SPARK_JAVA_OPTS").foreach { value =>
    +      val error =
    +        s"""
    +          |SPARK_JAVA_OPTS was detected (set to '$value').
    +          |This has undefined behavior when running on a cluster and is deprecated in Spark 1.0+.
    +          |
    +          |Please instead use:
    +          | - ./spark-submit with conf/spark-defaults.conf to set properties for an application
    --- End diff --
    
    Should this be spark-defaults.properties, since that is what the code looks for?
    
    
    Also are you adding a spark-defaults.properties.template or something with documentation?
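    For reference, loading it as a plain Java properties file would look roughly like this (a hypothetical sketch, not the actual spark-submit code; the helper name and filter are placeholders):

        import java.io.FileInputStream
        import java.util.Properties
        import scala.collection.JavaConverters._

        def loadDefaults(path: String): Unit = {
          val props = new Properties()
          val in = new FileInputStream(path)
          try props.load(in) finally in.close()
          // Copy spark.* entries into system properties without clobbering
          // anything the user set explicitly on the command line.
          for ((k, v) <- props.asScala if k.startsWith("spark.")) {
            sys.props.getOrElseUpdate(k, v)
          }
        }

        // The file itself would then just contain lines such as:
        //   spark.executor.memory=2g
        //   spark.executor.extraJavaOptions=-XX:+PrintGCDetails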



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/299#discussion_r11775869
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
    @@ -208,6 +208,82 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
         new SparkConf(false).setAll(settings)
       }
     
    +  /** Checks for illegal or deprecated config settings. Throws an exception for the former. Not
    +    * idempotent - may mutate this conf object to convert deprecated settings to supported ones. */
    +  private[spark] def validateSettings() {
    +    if (settings.contains("spark.local.dir")) {
    +      val msg = "In Spark 1.0 and later spark.local.dir will be overridden by the value set by " +
    +        "the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN)."
    +      logWarning(msg)
    +    }
    +
    +    val executorOptsKey = "spark.executor.extraJavaOptions"
    +    val executorClasspathKey = "spark.executor.extraClassPath"
    +    val driverOptsKey = "spark.driver.extraJavaOptions"
    +    val driverClassPathKey = "spark.driver.extraClassPath"
    +
    +    // Validate spark.executor.extraJavaOptions
    +    settings.get(executorOptsKey).map { javaOpts =>
    +      if (javaOpts.contains("-Dspark")) {
    +        val msg = s"$executorOptsKey is not allowed to set Spark options (was '$javaOpts)'. " +
    --- End diff --
    
    I'm getting this error message when trying to set things in the properties file while running on YARN:
    
    $ cat spark-conf.properties 
    spark.executor.extraJavaOptions=-Dspark.testing=foo



[GitHub] spark pull request: Clean up and simplify Spark configuration

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40428502
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14121/



[GitHub] spark pull request: [WIP] Clean up and simplify Spark configuratio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/299#issuecomment-40321272
  
    Merged build started. 

