Posted to reviews@spark.apache.org by andrewor14 <gi...@git.apache.org> on 2014/08/08 04:09:50 UTC

[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/1845

    [SPARK-2849 / 2914] Handle certain Spark configs in bash correctly

    We currently rely on Java's properties file parser to read `spark-defaults.conf`. However, certain Spark configs need to be processed independently of the JVM. These fall into two general categories:
    
    **1) spark.driver.***
    
    In client deploy mode, the driver is launched from within `SparkSubmit`'s JVM. This means by the time we parse Spark configs from `spark-defaults.conf`, it is already too late to control certain properties of the driver's JVM. We currently ignore these configs in client mode altogether.
    ```
    spark.driver.memory
    spark.driver.extraJavaOptions
    spark.driver.extraClassPath
    spark.driver.extraLibraryPath
    ```
    **2) spark.*.extraJavaOptions**
    
    These configs pack a list of Java options into a single string. Currently, if any of the options contain escaped double quotes or backslashes, the splitting of these options will likely be incorrect, and the Java command used to start the executors may be corrupted. The relevant configs here include the following.
    
    ```
    spark.driver.extraJavaOptions
    spark.executor.extraJavaOptions
    ```
    
    For both categories, we need to preemptively parse the Spark configs in bash. There is a lot of trickery involved in performing heavy-duty parsing in bash, so I have moved much of the complexity to a new file `bin/utils.sh`. I have tested this locally with escaped double quotes, backslashes, whitespace, and combinations of the above, and the behavior is as expected.
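To illustrate the splitting problem being solved, here is a minimal bash sketch. The helper name is hypothetical and this is not the actual `bin/utils.sh` logic, which handles escaped quotes and backslashes without evaluating its input:

```shell
# Hypothetical sketch of quote-aware splitting, NOT the real bin/utils.sh.
# eval honors double quotes when building the array, so a quoted value
# containing spaces stays a single token. Only safe on trusted input.
split_java_options() {
  local opts="$1"
  eval "SPLIT_OPTS=($opts)"
}

split_java_options '-XX:+PrintGCDetails -Dkey="a value with spaces"'
printf '%s\n' "${SPLIT_OPTS[@]}"
```

Naive whitespace splitting would break `-Dkey="a value with spaces"` into four tokens; quote-aware splitting keeps it as one.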
    
    The changes build directly on top of my old PR at #1770.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark handle-configs-bash

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1845.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1845
    
----
commit 250cb955efe9c9bdf24be6cefcfb1dfa71d39bc4
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-04T20:12:15Z

    Do not ignore spark.driver.extra* for client mode

commit a2ab1b0a3a976e361ea86fc20fc7083e7f9885ca
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-06T04:32:05Z

    Parse spark.driver.extra* in bash

commit 0025474d7412607e1ca620d1942393d78a28b7f8
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-06T04:35:16Z

    Revert SparkSubmit handling of --driver-* options for only cluster mode

commit 63ed2e9dada83402d2502efb9759524be74c04b2
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-06T04:36:00Z

    Merge branch 'master' of github.com:apache/spark into submit-driver-extra

commit 75ee6b4a1c6df1a911cf62ded81e4eabb737b345
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-06T04:36:35Z

    Remove accidentally added file

commit 8843562bb6883a092e6f4032f05fe01932db086b
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-06T04:37:11Z

    Fix compilation issues...

commit 98dd8e327ac5940d8a4a3820027645ee4b88178e
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-06T04:39:07Z

    Add warning if properties file does not exist

commit 130f295e085d95e8205d882174a5667d29b3b1f2
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-06T05:12:28Z

    Handle spark.driver.memory too

commit 4edcaa8027961578246c5cfa8a2d82a92a031265
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-06T06:17:56Z

    Redirect stdout to stderr for python

commit e5cfb4627df353125f8f2382bad4bb35aa03c7fb
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-06T20:26:04Z

    Collapse duplicate code + fix potential whitespace issues

commit 4ec22a154c428e7e581d43d86bd7ef9d1308ca45
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-06T20:26:42Z

    Merge branch 'master' of github.com:apache/spark into submit-driver-extra

commit ef12f74b9b7e7edcefb6b82cb53de3eccbf0d9ad
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-06T20:31:32Z

    Minor formatting

commit fa2136ed14145f8fa18f40e6e1a3a776048c01ab
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T05:23:44Z

    Escape Java options + parse java properties files properly

commit dec23439ad82718f786ea022b1f118f202687cc1
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T05:28:38Z

    Only export variables if they exist

commit a4df3c4165ce4546742fbd0b9d92ea612973bb2e
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T05:47:57Z

    Move parsing and escaping logic to utils.sh
    
    This commit also fixes a deadly typo.

commit de765c9813275b939741e1b78567b2443fab5f2d
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T06:22:05Z

    Print spark-class command properly

commit 8e552b733d52ada89dd7c0e8692fcca87fc00d26
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T06:25:36Z

    Include an example of spark.*.extraJavaOptions
    
    Right now it's not super obvious how to specify multiple
    java options, especially ones with white spaces.

commit c13a2cb75b49cfbf7ae6765d900123595a5db076
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T06:26:38Z

    Merge branch 'master' of github.com:apache/spark into submit-driver-extra

commit c854859be8a604ac04c74488e7729423c47acd37
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T06:38:39Z

    Add small comment

commit 1cdc6b15ff375bfb0ce3fe3f6b6c434dc4e30947
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T19:33:55Z

    Fix bug: escape escaped double quotes properly
    
    The previous code used to ignore all closing quotes if the same token
    also has an escaped double quote. For example, in
    
    -Dkey="I am the \"man\""
    
    the last token contains both escaped quotes and valid quotes. This
    used to be interpreted as a token that doesn't have a closing quote
    when it actually does. This is fixed in this commit.
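The check described above can be sketched in bash (hypothetical helper, not the committed code): strip every escaped quote first, then test whether a bare quote remains.

```shell
# Hypothetical illustration of the fix: a token like -Dkey="I am the \"man\""
# contains escaped quotes AND a real closing quote. Delete the \" pairs
# first; any quote left over is a real (unescaped) one.
has_unescaped_quote() {
  local token="$1"
  local stripped="${token//\\\"/}"   # remove every literal \" sequence
  [[ "$stripped" == *\"* ]]
}

has_unescaped_quote '-Dkey="I am the \"man\""' && echo "has a real closing quote"
```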

commit 45a1eb996773fa1828d1d489cbc451f2033845e0
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T20:28:56Z

    Fix bug: escape escaped backslashes and quotes properly...
    
    This makes our parsing consistent with the way Java parses
    its java opts.

commit aabfc7e1da8897b266020da6c480cbe7d774bc99
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T21:24:57Z

    escape -> split (minor)

commit a992ae2ba7067cf76fba0e3ef192b275eee40b57
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T23:51:16Z

    Escape spark.*.extraJavaOptions correctly
    
    We previously never dealt with this correctly, in that we evaluated
    all backslashes twice, once when passing spark.*.extraJavaOptions
    into SparkSubmit, and another time when calling
    Utils.splitCommandString.
    
    This means we need to pass the raw values of these configs directly
    to the JVM without evaluating the backslashes when launching
    SparkSubmit. The way we do this is through a few custom environment
    variables.
    
    As of this commit, the user should follow the format outlined in
    spark-defaults.conf.template for spark.*.extraJavaOptions, and
    the expected java options (with quotes, whitespaces and backslashes
    and everything) will be propagated to the driver or the executors
    correctly.
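The double-evaluation problem described above can be demonstrated in bash (illustrative only; `printf '%b'` stands in for each round of backslash interpretation, and the value is made up):

```shell
# Illustration of why evaluating backslashes twice is destructive (not Spark code).
# The user writes \\ to mean one literal backslash.
raw='-Dpath=C:\\temp'
once=$(printf '%b' "$raw")    # first evaluation: \\ becomes \
twice=$(printf '%b' "$once")  # second evaluation then turns \t into a TAB
echo "$once"
echo "$twice"
```

After one evaluation the value is correct (`-Dpath=C:\temp`); after a second, the remaining backslash is consumed and the option is corrupted, which is why the raw config values must reach the JVM unevaluated.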

commit c7b99267c577195882c965029884941b79cc8ed0
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T23:51:29Z

    Minor changes to spark-defaults.conf.template
    
    ... to highlight our new-found ability to deal with special values.

commit 5d8f8c481951269033ecc1ec0435060bdc25e926
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-07T23:52:29Z

    Merge branch 'master' of github.com:apache/spark into submit-driver-extra

commit e793e5f56c5c62d94fe0f2ac3d8aefc1d0b1573e
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-08T01:38:17Z

    Handle multi-line arguments

commit c2273fcad9a4f3e5cdfba56eb88e483100728a8a
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-08T01:53:20Z

    Fix typo (minor)

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52743041
  
    test this please




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16397452
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -1421,3 +1421,24 @@ private[spark] object Utils extends Logging {
       }
     
     }
    +
    +/**
    + * A utility class to redirect the child process's stdout or stderr.
    + */
    +private[spark] class RedirectThread(in: InputStream, out: OutputStream, name: String)
    +  extends Thread(name) {
    +
    +  setDaemon(true)
    +  override def run() {
    +    scala.util.control.Exception.ignoring(classOf[IOException]) {
    +      // FIXME: We copy the stream on the level of bytes to avoid encoding problems.
    --- End diff --
    
    Do you understand this comment? I don't.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52840537
  
    **Tests timed out** after a configured wait of `120m`.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16496323
  
    --- Diff: bin/spark-class ---
    @@ -146,10 +155,28 @@ if $cygwin; then
     fi
     export CLASSPATH
     
    -if [ "$SPARK_PRINT_LAUNCH_COMMAND" == "1" ]; then
    -  echo -n "Spark Command: " 1>&2
    -  echo "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@" 1>&2
    -  echo -e "========================================\n" 1>&2
    +# In Spark submit client mode, the driver is launched in the same JVM as Spark submit itself.
    +# Here we must parse the properties file for relevant "spark.driver.*" configs before launching
    +# the driver JVM itself. Instead of handling this complexity in Bash, we launch a separate JVM
    +# to prepare the launch environment of this driver JVM.
    +
    +if [ -n "$SPARK_SUBMIT_BOOTSTRAP_DRIVER" ]; then
    +  # This is used only if the properties file actually contains these special configs
    +  # Export the environment variables needed by SparkSubmitDriverBootstrapper
    +  export RUNNER
    +  export CLASSPATH
    +  export JAVA_OPTS
    +  export OUR_JAVA_MEM
    +  export SPARK_CLASS=1
    +  shift # Ignore main class and use our own
    --- End diff --
    
    Just so I understand - is this only ever used when the main class is `SparkSubmit`? It might be good to say that in the comment. The way it is now makes it sound like we could substitute a different class from the one that was specified, but in fact, all this does is ignore the class here and we hard code it in the bootstrapper.
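For readers unfamiliar with the `shift` idiom being discussed, a toy sketch (the argument values are made up; only the class substitution mirrors the PR, which hard-codes `SparkSubmitDriverBootstrapper`):

```shell
# Toy illustration of the shift above (not the actual spark-class script).
# The caller passes the main class as $1; the bootstrapper path discards
# it and hard-codes its own class instead.
set -- org.apache.spark.deploy.SparkSubmit --master local[2] app.jar
shift   # drop $1, the caller-supplied main class
CMD="org.apache.spark.deploy.SparkSubmitDriverBootstrapper $*"
echo "$CMD"
```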




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52745225
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18929/consoleFull) for   PR 1845 at commit [`9a778f6`](https://github.com/apache/spark/commit/9a778f66d812886b3322b32ae92fccf9d8415657).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52821333
  
    Yeah, I'm not super fond of having two JVMs running either, but I just haven't looked into what we need to do for Windows. We might try to cut out this layer in the future once we come up with a better design that suits both environments.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52586241
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18807/consoleFull) for   PR 1845 at commit [`c84f5c8`](https://github.com/apache/spark/commit/c84f5c8ac6c0d75930a3545bf98b25c6d6287186).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(input: String = "data/mllib/sample_linear_regression_data.txt")`
      * `  case class Params(input: String = "data/mllib/sample_linear_regression_data.txt")`
      * `  case class Params(input: String = "data/mllib/sample_binary_classification_data.txt")`





[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52850096
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18993/consoleFull) for   PR 1845 at commit [`bed4bdf`](https://github.com/apache/spark/commit/bed4bdff74e941f791080e9407fc5295af10f677).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  shift # Ignore main class (org.apache.spark.deploy.SparkSubmit) and use our own`





[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52818583
  
    test this please




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-51679922
  
    test this please




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16427882
  
    --- Diff: bin/spark-class ---
    @@ -73,11 +75,16 @@ case "$1" in
         OUR_JAVA_MEM=${SPARK_EXECUTOR_MEMORY:-$DEFAULT_MEM}
         ;;
     
    -  # Spark submit uses SPARK_SUBMIT_OPTS and SPARK_JAVA_OPTS
    -    'org.apache.spark.deploy.SparkSubmit')
    -    OUR_JAVA_OPTS="$SPARK_JAVA_OPTS $SPARK_SUBMIT_OPTS \
    -      -Djava.library.path=$SPARK_SUBMIT_LIBRARY_PATH"
    +  # Spark submit uses SPARK_JAVA_OPTS + SPARK_SUBMIT_JAVA_OPTS + SPARK_DRIVER_MEMORY + SPARK_SUBMIT_DRIVER_MEMORY.
    --- End diff --
    
    could you wrap this?




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52591994
  
    Hey Andrew, I took a pass. Overall I think this is the best approach we can take right now (using a bootstrap java process). I felt like there were several attempts to overly generalize this that probably aren't worth doing now. I think it's fine to just read the environment variables directly rather than going through options. Also, the way it is now, it adds some complexity where we try to merge options. This could lead to some really confusing outcomes for users; I'd just defer to the normal precedence we have in the rest of the spark submit options, where the flags take strict precedence over values in the config file.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52819405
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18969/consoleFull) for   PR 1845 at commit [`24dba60`](https://github.com/apache/spark/commit/24dba6067b33719e7eb25b0fb536bc2df72d34a4).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52674976
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18859/consoleFull) for   PR 1845 at commit [`9ba37e2`](https://github.com/apache/spark/commit/9ba37e2b17a78ebddded6f4fbd30297b8365f57c).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52742757
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18934/consoleFull) for   PR 1845 at commit [`24dba60`](https://github.com/apache/spark/commit/24dba6067b33719e7eb25b0fb536bc2df72d34a4).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52383497
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18657/consoleFull) for   PR 1845 at commit [`c886568`](https://github.com/apache/spark/commit/c886568687adfda65ae27d0fefa184b345a770ad).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52608216
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18836/consoleFull) for   PR 1845 at commit [`d6488f9`](https://github.com/apache/spark/commit/d6488f92ddc15860747ef6ed4082e7ffca62c468).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52744175
  
    Let me try this once more. Jenkins, test this please.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52384399
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18657/consoleFull) for   PR 1845 at commit [`c886568`](https://github.com/apache/spark/commit/c886568687adfda65ae27d0fefa184b345a770ad).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52592106
  
    Also, one other thing: if you wanted to name that class `DriverBootstrapLauncher` or something, it might be better than `SparkClassLauncher`. It's quite specific to the case of launching the driver in client mode right now.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52820004
  
    @vanzin Yeah, the tricky thing is not with Bash but with Windows. Even the least complicated logic becomes many tens of lines of code. In your example, you're using `\r\n` as the delimiter, but the user's command may itself contain these characters, so we would need to do more escaping there. I'm not saying that it's impossible; I just mean that it will be complicated because we have to parse it again in Bash. The original form of this PR went down that road and was significantly more complicated, and doing the same in Windows would be even more complex and error prone.
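The delimiter-collision problem raised here can be shown in a few lines of bash (the option values are illustrative, not from the PR):

```shell
# Why a bare newline delimiter is unsafe (illustration, not Spark code):
# if an option value itself contains the delimiter, naive splitting
# cuts the token in half.
opts=$'-Dgreeting=hello\nworld\n-Dother=1'   # first value contains a newline
IFS=$'\n' read -r -d '' -a parts <<< "$opts" || true
echo "${#parts[@]} tokens"   # 3 tokens instead of the intended 2
```

Any in-band delimiter needs its own escaping scheme, which is exactly the extra complexity being avoided by the bootstrapper JVM.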




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52685572
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18862/consoleFull) for   PR 1845 at commit [`d0f20db`](https://github.com/apache/spark/commit/d0f20db0e81551bb5f7447fe5491cc2ae5817421).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52734362
  
    @sryza yeah I spoke with @mateiz about this one a bit offline. Basically, in this case there is no way to correctly parse these expressions in both bash and Windows that is not extremely complex -- much more complex than the original logic of just trapping some command-line arguments. So in this more limited case it was deemed acceptable to launch a separate JVM first.
    
    I like the idea of containing this more (cc @vanzin) - but I don't think this is possible since bash can only receive the output of the command as a string. Bash can natively support the notion of sending a list of arguments to a subprocess, but AFAIK it can't read a list back.
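
    As a point of reference, bash 4+ can read a newline-delimited list back into an array (this is the mechanism vanzin sketches elsewhere in the thread). The sketch below is illustrative only, with `print_args` as a hypothetical stand-in for the subprocess:

    ```shell
    #!/usr/bin/env bash
    # Hypothetical stand-in for a subprocess that prints one argument per line.
    print_args() {
      printf '%s\n' '-Dfoo=bar baz' '-Xmx512m'
    }

    # readarray -t (bash 4+) reads each line into one array element.
    readarray -t opts < <(print_args)

    echo "count=${#opts[@]}"   # → count=2
    echo "first=${opts[0]}"    # → first=-Dfoo=bar baz
    ```

    The embedded space survives intact; the approach only breaks down when an argument itself contains the delimiter, which is the escaping problem discussed above.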




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16428889
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala ---
    @@ -0,0 +1,122 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import org.apache.spark.util.{RedirectThread, Utils}
    +
    +/**
    + * Launch an application through Spark submit in client mode with the appropriate classpath,
    + * library paths, java options and memory. These properties of the JVM must be set before the
    + * driver JVM is launched. The sole purpose of this class is to avoid handling the complexity
    + * of parsing the properties file for such relevant configs in BASH.
    + *
    + * Usage: org.apache.spark.deploy.SparkSubmitDriverBootstrapper <application args>
    + */
    +private[spark] object SparkSubmitDriverBootstrapper {
    +
    +  // Note: This class depends on the behavior of `bin/spark-class` and `bin/spark-submit`.
    +  // Any changes made there must be reflected in this file.
    +
    +  def main(args: Array[String]): Unit = {
    +    val submitArgs = args
    +    val runner = sys.env("RUNNER")
    +    val classpath = sys.env("CLASSPATH")
    +    val javaOpts = sys.env("JAVA_OPTS")
    +    val defaultDriverMemory = sys.env("OUR_JAVA_MEM")
    +
    +    // Spark submit specific environment variables
    +    val deployMode = sys.env("SPARK_SUBMIT_DEPLOY_MODE")
    +    val propertiesFile = sys.env("SPARK_SUBMIT_PROPERTIES_FILE")
    +    val bootstrapDriver = sys.env("SPARK_SUBMIT_BOOTSTRAP_DRIVER")
    +    val submitDriverMemory = sys.env.get("SPARK_SUBMIT_DRIVER_MEMORY")
    +    val submitLibraryPath = sys.env.get("SPARK_SUBMIT_LIBRARY_PATH")
    +    val submitClasspath = sys.env.get("SPARK_SUBMIT_CLASSPATH")
    +    val submitJavaOpts = sys.env.get("SPARK_SUBMIT_JAVA_OPTS")
    +
    +    assume(runner != null, "RUNNER must be set")
    +    assume(classpath != null, "CLASSPATH must be set")
    +    assume(javaOpts != null, "JAVA_OPTS must be set")
    +    assume(defaultDriverMemory != null, "OUR_JAVA_MEM must be set")
    +    assume(deployMode == "client", "SPARK_SUBMIT_DEPLOY_MODE must be \"client\"!")
    +    assume(propertiesFile != null, "SPARK_SUBMIT_PROPERTIES_FILE must be set")
    +    assume(bootstrapDriver != null, "SPARK_SUBMIT_BOOTSTRAP_DRIVER must be set!")
    +
    +    // Parse the properties file for the equivalent spark.driver.* configs
    +    val properties = SparkSubmitArguments.getPropertiesFromFile(new File(propertiesFile)).toMap
    +    val confDriverMemory = properties.get("spark.driver.memory").getOrElse(defaultDriverMemory)
    +    val confLibraryPath = properties.get("spark.driver.extraLibraryPath").getOrElse("")
    +    val confClasspath = properties.get("spark.driver.extraClassPath").getOrElse("")
    +    val confJavaOpts = properties.get("spark.driver.extraJavaOptions").getOrElse("")
    +
    +    // Favor Spark submit arguments over the equivalent configs in the properties file.
    +    // Note that we do not actually use the Spark submit values for library path, classpath,
    +    // and java opts here, because we have already captured them in BASH.
    --- End diff --
    
    Java opts and Bash




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52744147
  
    Jenkins is failing because of some mllib / streaming leftover state




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52818212
  
    @vanzin this example shows array concatenation in bash and passing an array into the arguments of a sub-process. I don't see how it's related to the exact proposal here. What we need is a way to parse a string into a native argument array that is portable across Windows and bash. In this case, we'd need to convert the output of your python program back into a bash array in order to pass it to a separate subprocess.





[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16397316
  
    --- Diff: bin/spark-class ---
    @@ -101,11 +103,12 @@ fi
     # Set JAVA_OPTS to be able to load native libraries and to set heap size
     JAVA_OPTS="-XX:MaxPermSize=128m $OUR_JAVA_OPTS"
     JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
    +
     # Load extra JAVA_OPTS from conf/java-opts, if it exists
     if [ -e "$FWDIR/conf/java-opts" ] ; then
       JAVA_OPTS="$JAVA_OPTS `cat $FWDIR/conf/java-opts`"
     fi
    -export JAVA_OPTS
    --- End diff --
    
    It's not consumed as far as I can tell.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52743943
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18938/consoleFull) for   PR 1845 at commit [`24dba60`](https://github.com/apache/spark/commit/24dba6067b33719e7eb25b0fb536bc2df72d34a4).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52384351
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18656/consoleFull) for   PR 1845 at commit [`7a4190a`](https://github.com/apache/spark/commit/7a4190af01454768fe2334c00129d2d535ed57fb).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52691293
  
    **Tests timed out** after a configured wait of `120m`.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16395985
  
    --- Diff: bin/spark-class ---
    @@ -101,11 +103,12 @@ fi
     # Set JAVA_OPTS to be able to load native libraries and to set heap size
     JAVA_OPTS="-XX:MaxPermSize=128m $OUR_JAVA_OPTS"
     JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
    +
     # Load extra JAVA_OPTS from conf/java-opts, if it exists
     if [ -e "$FWDIR/conf/java-opts" ] ; then
       JAVA_OPTS="$JAVA_OPTS `cat $FWDIR/conf/java-opts`"
     fi
    -export JAVA_OPTS
    --- End diff --
    
    do you know why this was exported before? I looked and I couldn't find anything downstream that consumes this




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-51696855
  
    QA results for PR 1845:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test output:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18258/consoleFull




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-51567374
  
    Tests sound like a great idea.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52819428
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18969/consoleFull) for   PR 1845 at commit [`24dba60`](https://github.com/apache/spark/commit/24dba6067b33719e7eb25b0fb536bc2df72d34a4).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52818953
  
    @pwendell my proposal is that the scala code should take care of figuring out all the arguments to be passed to the JVM / SparkSubmit. In my example, the python program is what the JVM invocation would be (i.e. replace "python -c ... ${x[@]}" with "java ${x[@]}", and the "test" function would be an invocation to the "SparkSubmitBootstrap" class). Note there is no string parsing; I'm using "IFS" to say "each line is an element in the array", so the bootstrap code would print one argument per line (and not a single line with all arguments).
    
    I was just trying to show that it's possible to handle quoting, since @andrewor14 did not think it was possible.
    
    I'm ok with you guys just going with the current approach, but I think all this logic that is now split between bash and scala is getting way overcomplicated, and it would be much better to have everything in scala.
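
    A minimal sketch of the IFS technique described above (hypothetical names; `bootstrap` is a stand-in, not the PR's actual code):

    ```shell
    #!/usr/bin/env bash
    # Hypothetical stand-in for the bootstrap class: one argument per line.
    bootstrap() {
      printf '%s\n' '--conf' 'spark.driver.extraJavaOptions=-Dkey=a b'
    }

    # With IFS set to a newline only, the unquoted substitution splits on
    # line boundaries, so spaces inside an argument are preserved.
    # (Use `set -f` first if arguments may contain glob characters.)
    old_ifs=$IFS
    IFS=$'\n'
    args=( $(bootstrap) )
    IFS=$old_ifs

    echo "count=${#args[@]}"   # → count=2
    echo "last=${args[1]}"     # → last=spark.driver.extraJavaOptions=-Dkey=a b
    ```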




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16427923
  
    --- Diff: bin/spark-class ---
    @@ -146,10 +153,27 @@ if $cygwin; then
     fi
     export CLASSPATH
     
    -if [ "$SPARK_PRINT_LAUNCH_COMMAND" == "1" ]; then
    -  echo -n "Spark Command: " 1>&2
    -  echo "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@" 1>&2
    -  echo -e "========================================\n" 1>&2
    +# In Spark submit client mode, the driver is launched in the same JVM as Spark submit itself.
    +# Here we must parse the properties file for relevant "spark.driver.*" configs before launching
    +# the driver JVM itself. Instead of handling this complexity in BASH, we launch a separate JVM
    --- End diff --
    
    I think it's just "Bash" not "BASH"




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16280819
  
    --- Diff: bin/spark-class ---
    @@ -101,11 +106,16 @@ fi
     # Set JAVA_OPTS to be able to load native libraries and to set heap size
     JAVA_OPTS="-XX:MaxPermSize=128m $OUR_JAVA_OPTS"
     JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
    +
     # Load extra JAVA_OPTS from conf/java-opts, if it exists
     if [ -e "$FWDIR/conf/java-opts" ] ; then
       JAVA_OPTS="$JAVA_OPTS `cat $FWDIR/conf/java-opts`"
     fi
    -export JAVA_OPTS
    +
    +# Split JAVA_OPTS properly to handle whitespace, double quotes and backslashes
    +# This exports the split java options into SPLIT_JAVA_OPTS
    +split_java_options "$JAVA_OPTS"
    --- End diff --
    
    This actually solves a more general problem than those reported in SPARK-2849 and SPARK-2914... the problem/feature is that in general we don't support quotes in any of the java option strings we have. I tested this in master and confirmed it doesn't work:
    
    ```
    $ export SPARK_JAVA_OPTS="-Dfoo=\"bar baz\""
    $ ./bin/spark-shell
    Spark assembly has been built with Hive, including Datanucleus jars on classpath
    Error: Could not find or load main class baz"
    ```
    So it might be good to create another JIRA as well for this PR. One that just explains that none of the JAVA_OPTS variants we have correctly support quoted strings.
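
    To make the failure mode concrete, here is a minimal reproduction (illustrative only) of how default word splitting mangles the quoted value:

    ```shell
    #!/usr/bin/env bash
    SPARK_JAVA_OPTS='-Dfoo="bar baz"'

    # Unquoted expansion splits on the embedded space; the literal double
    # quotes are NOT re-parsed, so one option arrives as two broken tokens.
    set -- $SPARK_JAVA_OPTS

    echo "argc=$#"     # → argc=2
    echo "second=$2"   # → second=baz"
    ```

    The JVM then treats the stray `baz"` token as the main class, which matches the `Could not find or load main class baz"` error shown above.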




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16396854
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkClassLauncher.scala ---
    @@ -0,0 +1,116 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import org.apache.spark.util.{RedirectThread, Utils}
    +
    +/**
    + * Wrapper of `bin/spark-class` that prepares the launch environment of the child JVM properly.
    + * This is currently only used for running Spark submit in client mode. The goal moving forward
    + * is to use this class for all use cases of `bin/spark-class`.
    + */
    +object SparkClassLauncher {
    +
    +  /**
    +   * Launch a Spark class with the given class paths, library paths, java options and memory.
    +   * If we are launching an application through Spark submit in client mode, we must also
    +   * take into account special `spark.driver.*` properties needed to start the driver JVM.
    +   */
    +  def main(args: Array[String]): Unit = {
    +    if (args.size < 8) {
    +      System.err.println(
    +        """
    +          |Usage: org.apache.spark.deploy.SparkClassLauncher
    +          |
    +          |  [properties file]    - path to your Spark properties file
    +          |  [java runner]        - command to launch the child JVM
    +          |  [java class paths]   - class paths to pass to the child JVM
    +          |  [java library paths] - library paths to pass to the child JVM
    +          |  [java opts]          - java options to pass to the child JVM
    +          |  [java memory]        - memory used to launch the child JVM
    +          |  [client mode]        - whether the child JVM will run the Spark driver
    +          |  [main class]         - main class to run in the child JVM
    +          |  <main args>          - arguments passed to this main class
    +          |
    +          |Example:
    +          |  org.apache.spark.deploy.SparkClassLauncher.SparkClassLauncher
    +          |    conf/spark-defaults.conf java /classpath1:/classpath2 /librarypath1:/librarypath2
    +          |    "-XX:-UseParallelGC -Dsome=property" 5g true org.apache.spark.deploy.SparkSubmit
    +          |    --master local --class org.apache.spark.examples.SparkPi 10
    +        """.stripMargin)
    +      System.exit(1)
    +    }
    +    val propertiesFile = args(0)
    +    val javaRunner = args(1)
    +    val clClassPaths = args(2)
    +    val clLibraryPaths = args(3)
    +    val clJavaOpts = Utils.splitCommandString(args(4))
    +    val clJavaMemory = args(5)
    +    val clientMode = args(6) == "true"
    --- End diff --
    
    I can't see a case where this would ever not be "true" - to keep it simple and understandable it might be good to just omit this command line argument and the associated logic.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52822517
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18973/consoleFull) for   PR 1845 at commit [`24dba60`](https://github.com/apache/spark/commit/24dba6067b33719e7eb25b0fb536bc2df72d34a4).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52675629
  
    Hi @andrewor14,
    
    First thanks for working on this. But I wonder if a better approach wouldn't be to, instead, use this bootstrap class to just create the actual SparkSubmit command line and still execute the SparkSubmit VM directly from bash?
    
    Something like this:
    
        javaArgs=$(java SparkSubmitBootstrapper args)
        exec java $javaArgs
    
    That way we can keep all the logic regarding figuring out the command line to be executed in Scala code, instead of the current split-brain state. The bash script could be greatly simplified that way, I think. It also has the advantage of just keeping one VM alive while the app is running.
    
    What do you think?




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52821841
  
    Jenkins, test this please.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52582819
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18805/consoleFull) for   PR 1845 at commit [`b71f52b`](https://github.com/apache/spark/commit/b71f52b7adaf5b3e67e17e31fa4fe11089494e1a).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16396112
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkClassLauncher.scala ---
    @@ -0,0 +1,116 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import org.apache.spark.util.{RedirectThread, Utils}
    +
    +/**
    + * Wrapper of `bin/spark-class` that prepares the launch environment of the child JVM properly.
    + * This is currently only used for running Spark submit in client mode. The goal moving forward
    --- End diff --
    
    I would structure that goal as a TODO that appears internally in the class rather than in the public docstring.



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52713424
  
    **Update.** Looks like pyspark wasn't actually broken after all. I was testing out @vanzin's proposal and ran into a Python stdout interference message that turned out to be highly misleading. I have filed this separately here: [SPARK-3140](https://issues.apache.org/jira/browse/SPARK-3140).
    
    Also, I realize that having `SparkSubmitDriverBootstrapper` output a list of arguments back to Bash won't actually work: Bash would need to split that output, which may contain quoted whitespace (among other characters that need escaping), so we're back to the original problem of splitting the command string. It may be possible to do some escaping in Scala and then unescaping in Bash, but that gets complicated quickly.
    
    The open issue now is getting pyspark's `SparkSubmitDriverBootstrapper` subprocess to exit along with the parent JVM.
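    The splitting problem described above can be reproduced directly in the shell. This is a minimal sketch with a made-up option string, not code from the PR: naive word splitting mangles a quoted value, while re-parsing with `eval set --` preserves it.

```shell
#!/usr/bin/env bash
# Hypothetical java options string, as it might appear in spark-defaults.conf
OPTS='-Dkey="a value with spaces" -XX:-UseParallelGC'

# Naive splitting: the shell splits on every space, breaking the quoted value
set -- $OPTS
echo "naive split: $# words"        # 5 words -- the quoted value is mangled

# Quote-aware splitting: eval re-parses the string, honoring the quotes
eval set -- "$OPTS"
echo "quote-aware split: $# words"  # 2 words -- the quoted value stays intact
echo "first option: $1"             # -Dkey=a value with spaces
```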



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52384612
  
    test this please



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52709203
  
    Thanks @vanzin, I think that's actually a good idea. I have observed that the subprocess doesn't actually exit when running pyspark, and removing one layer of indirection from this whole stack would keep things simpler.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52608223
  
    @pwendell Thanks for your feedback. I think I have addressed all of your comments. In particular, I have fixed the precedence order of the relevant properties and allowed the parameters to be passed as environment variables rather than as command line arguments. I also came up with a new name; see if you like it. I do not have access to a Windows test environment at the moment, so I would prefer to defer that fix to another PR.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16397499
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkClassLauncher.scala ---
    @@ -0,0 +1,116 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import org.apache.spark.util.{RedirectThread, Utils}
    +
    +/**
    + * Wrapper of `bin/spark-class` that prepares the launch environment of the child JVM properly.
    + * This is currently only used for running Spark submit in client mode. The goal moving forward
    + * is to use this class for all use cases of `bin/spark-class`.
    + */
    +object SparkClassLauncher {
    +
    +  /**
    +   * Launch a Spark class with the given class paths, library paths, java options and memory.
    +   * If we are launching an application through Spark submit in client mode, we must also
    +   * take into account special `spark.driver.*` properties needed to start the driver JVM.
    +   */
    +  def main(args: Array[String]): Unit = {
    +    if (args.size < 8) {
    +      System.err.println(
    +        """
    +          |Usage: org.apache.spark.deploy.SparkClassLauncher
    +          |
    +          |  [properties file]    - path to your Spark properties file
    +          |  [java runner]        - command to launch the child JVM
    +          |  [java class paths]   - class paths to pass to the child JVM
    +          |  [java library paths] - library paths to pass to the child JVM
    +          |  [java opts]          - java options to pass to the child JVM
    +          |  [java memory]        - memory used to launch the child JVM
    +          |  [client mode]        - whether the child JVM will run the Spark driver
    +          |  [main class]         - main class to run in the child JVM
    +          |  <main args>          - arguments passed to this main class
    +          |
    +          |Example:
    +          |  org.apache.spark.deploy.SparkClassLauncher.SparkClassLauncher
    +          |    conf/spark-defaults.conf java /classpath1:/classpath2 /librarypath1:/librarypath2
    +          |    "-XX:-UseParallelGC -Dsome=property" 5g true org.apache.spark.deploy.SparkSubmit
    +          |    --master local --class org.apache.spark.examples.SparkPi 10
    +        """.stripMargin)
    +      System.exit(1)
    +    }
    +    val propertiesFile = args(0)
    +    val javaRunner = args(1)
    +    val clClassPaths = args(2)
    +    val clLibraryPaths = args(3)
    +    val clJavaOpts = Utils.splitCommandString(args(4))
    +    val clJavaMemory = args(5)
    +    val clientMode = args(6) == "true"
    +    val mainClass = args(7)
    +
    +    // In client deploy mode, parse the properties file for certain `spark.driver.*` configs.
    +    // These configs encode java options, class paths, and library paths needed to launch the JVM.
    +    val properties =
    +      if (clientMode) {
    +        SparkSubmitArguments.getPropertiesFromFile(new File(propertiesFile)).toMap
    +      } else {
    +        Map[String, String]()
    +      }
    +    val confDriverMemory = properties.get("spark.driver.memory")
    +    val confClassPaths = properties.get("spark.driver.extraClassPath")
    +    val confLibraryPaths = properties.get("spark.driver.extraLibraryPath")
    +    val confJavaOpts = properties.get("spark.driver.extraJavaOptions")
    +
    +    // Merge relevant command line values with the config equivalents, if any
    +    val javaMemory =
    +      if (clientMode) {
    +        confDriverMemory.getOrElse(clJavaMemory)
    +      } else {
    +        clJavaMemory
    +      }
    +    val pathSeparator = sys.props("path.separator")
    +    val classPaths = clClassPaths + confClassPaths.map(pathSeparator + _).getOrElse("")
    +    val libraryPaths = clLibraryPaths + confLibraryPaths.map(pathSeparator + _).getOrElse("")
    +    val javaOpts = clJavaOpts ++ confJavaOpts.map(Utils.splitCommandString).getOrElse(Seq.empty)
    +    val filteredJavaOpts = javaOpts.distinct.filterNot { opt =>
    +      opt.startsWith("-Djava.library.path") || opt.startsWith("-Xms") || opt.startsWith("-Xmx")
    --- End diff --
    
    Does this mean that if I pass `--driver-memory` as a flag and also include `spark.driver.memory` in the config file, the config file will take precedence?



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16280454
  
    --- Diff: bin/utils.sh ---
    @@ -0,0 +1,108 @@
    +#!/usr/bin/env bash
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# * ---------------------------------------------------- *
    +# |  Utility functions for launching Spark applications  |
    +# * ---------------------------------------------------- *
    +
    +# Parse the value of a config from a java properties file according to the specifications in
    +# http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html#load(java.io.Reader).
    +# This accepts the name of the config as an argument, and expects the path of the property
    +# file to be found in PROPERTIES_FILE. The value is returned through JAVA_PROPERTY_VALUE.
    +parse_java_property() {
    +  JAVA_PROPERTY_VALUE=""  # return value
    +  config_buffer=""        # buffer for collecting parts of a config value
    +  multi_line=0            # whether this config is spanning multiple lines
    +  while read -r line; do
    +    # Strip leading and trailing whitespace
    +    line=$(echo "$line" | sed "s/^[[:space:]]\(.*\)[[:space:]]*$/\1/")
    +    contains_config=$(echo "$line" | grep -e "^$1")
    +    if [[ -n "$contains_config" || "$multi_line" == 1 ]]; then
    +      has_more_lines=$(echo "$line" | grep -e "\\\\$")
    +      if [[ -n "$has_more_lines" ]]; then
    --- End diff --
    
    We discussed this already today, but I'd prefer not to deal with multiline statements at all here to keep this simpler.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16330493
  
    --- Diff: bin/utils.sh ---
    @@ -17,9 +16,53 @@
     # limitations under the License.
     #
     
    +# * ---------------------------------------------------- *
    +# |  Utility functions for launching Spark applications  |
    +# * ---------------------------------------------------- *
    +
    +# Parse the value of a config from a java properties file according to the specifications in
    +# http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html#load(java.io.Reader),
    +# with the exception of the support for multi-line arguments. This accepts the name of the
    +# config as an argument, and expects the path of the property file to be found in
    +# PROPERTIES_FILE. The value is returned through JAVA_PROPERTY_VALUE.
    +function parse_java_property() {
    +  JAVA_PROPERTY_VALUE=$(\
    +    grep "^[[:space:]]*$1" "$PROPERTIES_FILE" | \
    +    head -n 1 | \
    +    sed "s/^[[:space:]]*$1//g" | \
    +    sed "s/^[[:space:]]*[:=]\{0,1\}//g" | \
    +    sed "s/^[[:space:]]*//g" | \
    +    sed "s/[[:space:]]*$//g"
    +  )
    +  export JAVA_PROPERTY_VALUE
    +}
    +
    +# Properly split java options, dealing with whitespace, double quotes and backslashes.
    +# This accepts a string and returns the resulting list through SPLIT_JAVA_OPTS.
    +# For security reasons, this is isolated in its own function.
    +function split_java_options() {
    +  eval set -- "$1"
    --- End diff --
    
    It's fair to say "I don't think this allows people to do anything they couldn't already do", assuming the input (local conf file) is considered trusted. Obviously, it shouldn't be called with untrusted inputs without validation.
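    A minimal sketch of the trust boundary being discussed (the option strings below are made up, not from the PR): `eval` re-evaluates its input as shell code, so metacharacters in a trusted conf value behave as intended, while the same mechanism would execute anything embedded in an untrusted string.

```shell
#!/usr/bin/env bash
# Trusted input from a local conf file: the embedded quotes are honored
OPTS='-Xmx1g "-Dname=some value"'
eval set -- "$OPTS"
echo "parsed $# options; second is: $2"

# Untrusted input: eval would EXECUTE the command substitution below,
# which is why this function must never see unvalidated user input.
MALICIOUS='-Xmx1g $(rm -rf /tmp/scratch)'
# eval set -- "$MALICIOUS"    # do NOT do this
```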



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52743953
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18938/consoleFull) for PR 1845 at commit [`24dba60`](https://github.com/apache/spark/commit/24dba6067b33719e7eb25b0fb536bc2df72d34a4).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16397548
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkClassLauncher.scala ---
    @@ -0,0 +1,116 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import org.apache.spark.util.{RedirectThread, Utils}
    +
    +/**
    + * Wrapper of `bin/spark-class` that prepares the launch environment of the child JVM properly.
    + * This is currently only used for running Spark submit in client mode. The goal moving forward
    + * is to use this class for all use cases of `bin/spark-class`.
    + */
    +object SparkClassLauncher {
    +
    +  /**
    +   * Launch a Spark class with the given class paths, library paths, java options and memory.
    +   * If we are launching an application through Spark submit in client mode, we must also
    +   * take into account special `spark.driver.*` properties needed to start the driver JVM.
    +   */
    +  def main(args: Array[String]): Unit = {
    +    if (args.size < 8) {
    +      System.err.println(
    +        """
    +          |Usage: org.apache.spark.deploy.SparkClassLauncher
    +          |
    +          |  [properties file]    - path to your Spark properties file
    +          |  [java runner]        - command to launch the child JVM
    +          |  [java class paths]   - class paths to pass to the child JVM
    +          |  [java library paths] - library paths to pass to the child JVM
    +          |  [java opts]          - java options to pass to the child JVM
    +          |  [java memory]        - memory used to launch the child JVM
    +          |  [client mode]        - whether the child JVM will run the Spark driver
    +          |  [main class]         - main class to run in the child JVM
    +          |  <main args>          - arguments passed to this main class
    +          |
    +          |Example:
    +          |  org.apache.spark.deploy.SparkClassLauncher.SparkClassLauncher
    +          |    conf/spark-defaults.conf java /classpath1:/classpath2 /librarypath1:/librarypath2
    +          |    "-XX:-UseParallelGC -Dsome=property" 5g true org.apache.spark.deploy.SparkSubmit
    +          |    --master local --class org.apache.spark.examples.SparkPi 10
    +        """.stripMargin)
    +      System.exit(1)
    +    }
    +    val propertiesFile = args(0)
    +    val javaRunner = args(1)
    +    val clClassPaths = args(2)
    +    val clLibraryPaths = args(3)
    +    val clJavaOpts = Utils.splitCommandString(args(4))
    +    val clJavaMemory = args(5)
    +    val clientMode = args(6) == "true"
    +    val mainClass = args(7)
    +
    +    // In client deploy mode, parse the properties file for certain `spark.driver.*` configs.
    +    // These configs encode java options, class paths, and library paths needed to launch the JVM.
    +    val properties =
    +      if (clientMode) {
    +        SparkSubmitArguments.getPropertiesFromFile(new File(propertiesFile)).toMap
    +      } else {
    +        Map[String, String]()
    +      }
    +    val confDriverMemory = properties.get("spark.driver.memory")
    +    val confClassPaths = properties.get("spark.driver.extraClassPath")
    +    val confLibraryPaths = properties.get("spark.driver.extraLibraryPath")
    +    val confJavaOpts = properties.get("spark.driver.extraJavaOptions")
    +
    +    // Merge relevant command line values with the config equivalents, if any
    +    val javaMemory =
    +      if (clientMode) {
    +        confDriverMemory.getOrElse(clJavaMemory)
    +      } else {
    +        clJavaMemory
    +      }
    +    val pathSeparator = sys.props("path.separator")
    +    val classPaths = clClassPaths + confClassPaths.map(pathSeparator + _).getOrElse("")
    +    val libraryPaths = clLibraryPaths + confLibraryPaths.map(pathSeparator + _).getOrElse("")
    --- End diff --
    
    Rather than appending one to the other, I think a user setting `--driver-library-path` should simply take precedence over any driver library path defined in the conf. That's the behavior we have in general for configs; merging them here could cause some confusion.
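    In bash, the override-rather-than-merge behavior suggested here is a one-line default expansion. The variable names below are illustrative, not the ones used in the PR:

```shell
#!/usr/bin/env bash
# Value parsed from spark-defaults.conf (illustrative)
CONF_LIBRARY_PATH="/opt/conf/lib"

# Case 1: the user passed --driver-library-path, so the flag wins outright
CLI_LIBRARY_PATH="/opt/cli/lib"
echo "${CLI_LIBRARY_PATH:-$CONF_LIBRARY_PATH}"   # /opt/cli/lib

# Case 2: no flag was given, so we fall back to the conf value
CLI_LIBRARY_PATH=""
echo "${CLI_LIBRARY_PATH:-$CONF_LIBRARY_PATH}"   # /opt/conf/lib
```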



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16323472
  
    --- Diff: bin/utils.sh ---
    @@ -17,9 +16,53 @@
     # limitations under the License.
     #
     
    +# * ---------------------------------------------------- *
    +# |  Utility functions for launching Spark applications  |
    +# * ---------------------------------------------------- *
    +
    +# Parse the value of a config from a java properties file according to the specifications in
    +# http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html#load(java.io.Reader),
    +# with the exception of the support for multi-line arguments. This accepts the name of the
    +# config as an argument, and expects the path of the property file to be found in
    +# PROPERTIES_FILE. The value is returned through JAVA_PROPERTY_VALUE.
    +function parse_java_property() {
    +  JAVA_PROPERTY_VALUE=$(\
    +    grep "^[[:space:]]*$1" "$PROPERTIES_FILE" | \
    +    head -n 1 | \
    +    sed "s/^[[:space:]]*$1//g" | \
    +    sed "s/^[[:space:]]*[:=]\{0,1\}//g" | \
    +    sed "s/^[[:space:]]*//g" | \
    +    sed "s/[[:space:]]*$//g"
    +  )
    +  export JAVA_PROPERTY_VALUE
    +}
    +
    +# Properly split java options, dealing with whitespace, double quotes and backslashes.
    +# This accepts a string and returns the resulting list through SPLIT_JAVA_OPTS.
    +# For security reasons, this is isolated in its own function.
    +function split_java_options() {
    +  eval set -- "$1"
    --- End diff --
    
    Do we need to document the security concerns with this function? Basically, we can't call it on user input except from a script directly invoked by the user. It might make sense to rename it `split_java_options_insecure`, or even just `split_command_string_insecure` (similar to the naming in Spark).



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16428762
  
    --- Diff: bin/spark-submit ---
    @@ -17,28 +17,46 @@
     # limitations under the License.
     #
     
    +# NOTE: Any changes in this file must be reflected in SparkClassLauncher.scala!
    +
     export SPARK_HOME="$(cd `dirname $0`/..; pwd)"
     ORIG_ARGS=("$@")
     
     while (($#)); do
       if [ "$1" = "--deploy-mode" ]; then
    -    DEPLOY_MODE=$2
    +    SPARK_SUBMIT_DEPLOY_MODE=$2
    +  elif [ "$1" = "--properties-file" ]; then
    +    SPARK_SUBMIT_PROPERTIES_FILE=$2
       elif [ "$1" = "--driver-memory" ]; then
    -    DRIVER_MEMORY=$2
    +    export SPARK_SUBMIT_DRIVER_MEMORY=$2
       elif [ "$1" = "--driver-library-path" ]; then
         export SPARK_SUBMIT_LIBRARY_PATH=$2
       elif [ "$1" = "--driver-class-path" ]; then
         export SPARK_SUBMIT_CLASSPATH=$2
       elif [ "$1" = "--driver-java-options" ]; then
    -    export SPARK_SUBMIT_OPTS=$2
    --- End diff --
    
    I believe SPARK_SUBMIT_OPTS has become a slightly public API, since some people have used it in scripts to avoid exactly the problem this PR is trying to solve. So it might be good to initialize SPARK_SUBMIT_JAVA_OPTS to the value of SPARK_SUBMIT_OPTS, with a comment that it's there for compatibility.
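    The backwards-compatibility shim suggested here could look like the following sketch. The seeding line is an assumption about how it might be implemented, not code from the diff:

```shell
#!/usr/bin/env bash
# Simulate a user who scripted against the old public variable
SPARK_SUBMIT_OPTS="-Dspark.ui.port=4045"

# Seed the new variable from the old one unless it is already set,
# so existing scripts keep working (kept for compatibility)
SPARK_SUBMIT_JAVA_OPTS="${SPARK_SUBMIT_JAVA_OPTS:-$SPARK_SUBMIT_OPTS}"
echo "$SPARK_SUBMIT_JAVA_OPTS"   # -Dspark.ui.port=4045
```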



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52744341
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18939/consoleFull) for PR 1845 at commit [`24dba60`](https://github.com/apache/spark/commit/24dba6067b33719e7eb25b0fb536bc2df72d34a4).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16397299
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkClassLauncher.scala ---
    @@ -0,0 +1,116 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import org.apache.spark.util.{RedirectThread, Utils}
    +
    +/**
    + * Wrapper of `bin/spark-class` that prepares the launch environment of the child JVM properly.
    + * This is currently only used for running Spark submit in client mode. The goal moving forward
    + * is to use this class for all use cases of `bin/spark-class`.
    + */
    +object SparkClassLauncher {
    +
    +  /**
    +   * Launch a Spark class with the given class paths, library paths, java options and memory.
    +   * If we are launching an application through Spark submit in client mode, we must also
    +   * take into account special `spark.driver.*` properties needed to start the driver JVM.
    +   */
    +  def main(args: Array[String]): Unit = {
    +    if (args.size < 8) {
    +      System.err.println(
    +        """
    +          |Usage: org.apache.spark.deploy.SparkClassLauncher
    +          |
    +          |  [properties file]    - path to your Spark properties file
    +          |  [java runner]        - command to launch the child JVM
    +          |  [java class paths]   - class paths to pass to the child JVM
    +          |  [java library paths] - library paths to pass to the child JVM
    +          |  [java opts]          - java options to pass to the child JVM
    +          |  [java memory]        - memory used to launch the child JVM
    +          |  [client mode]        - whether the child JVM will run the Spark driver
    +          |  [main class]         - main class to run in the child JVM
    +          |  <main args>          - arguments passed to this main class
    +          |
    +          |Example:
    +          |  org.apache.spark.deploy.SparkClassLauncher.SparkClassLauncher
    +          |    conf/spark-defaults.conf java /classpath1:/classpath2 /librarypath1:/librarypath2
    +          |    "-XX:-UseParallelGC -Dsome=property" 5g true org.apache.spark.deploy.SparkSubmit
    +          |    --master local --class org.apache.spark.examples.SparkPi 10
    +        """.stripMargin)
    +      System.exit(1)
    +    }
    +    val propertiesFile = args(0)
    +    val javaRunner = args(1)
    +    val clClassPaths = args(2)
    +    val clLibraryPaths = args(3)
    +    val clJavaOpts = Utils.splitCommandString(args(4))
    +    val clJavaMemory = args(5)
    +    val clientMode = args(6) == "true"
    --- End diff --
    
    This is anticipating the future where all usages of `bin/spark-class` are routed through this class. I was hesitant to remove this because it would technically break backward compatibility if we decide to add a new flag in the future, since this is not `private[spark]` or anything. Should I just add a warning saying this should only be called from `bin/spark-class` instead?




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52557810
  
    Working on a new version of this that involves launching two JVMs. Hold off reviewing for now and stay tuned for a revised version.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52613410
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18836/consoleFull) for   PR 1845 at commit [`d6488f9`](https://github.com/apache/spark/commit/d6488f92ddc15860747ef6ed4082e7ffca62c468).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16397360
  
    --- Diff: bin/spark-class ---
    @@ -146,10 +149,28 @@ if $cygwin; then
     fi
     export CLASSPATH
     
    -if [ "$SPARK_PRINT_LAUNCH_COMMAND" == "1" ]; then
    +if [ -n "$SPARK_PRINT_LAUNCH_COMMAND" ]; then
       echo -n "Spark Command: " 1>&2
       echo "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@" 1>&2
       echo -e "========================================\n" 1>&2
     fi
     
    -exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
    +# In Spark submit client mode, the driver is launched in the same JVM as Spark submit itself.
    +# Here we must parse the properties file for relevant "spark.driver.*" configs for launching
    +# the driver JVM itself.
    +
    +if [ -n "$SPARK_SUBMIT_CLIENT_MODE" ]; then
    +  # This is currently used only if the properties file actually consists of these special configs
    --- End diff --
    
    There is no doc anywhere here that explains why this `SparkClassLauncher` is needed - notably, to help bootstrap setting certain java options that we don't want to parse in bash. It'd be good to say this here or somewhere in a comment in `SparkClassLauncher`.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52385431
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18661/consoleFull) for   PR 1845 at commit [`c886568`](https://github.com/apache/spark/commit/c886568687adfda65ae27d0fefa184b345a770ad).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16396219
  
    --- Diff: bin/spark-submit ---
    @@ -36,9 +38,27 @@ while (($#)); do
     done
     
     DEPLOY_MODE=${DEPLOY_MODE:-"client"}
    +DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
    +PROPERTIES_FILE=${PROPERTIES_FILE:-"$DEFAULT_PROPERTIES_FILE"}
     
    -if [ -n "$DRIVER_MEMORY" ] && [ $DEPLOY_MODE == "client" ]; then
    -  export SPARK_DRIVER_MEMORY=$DRIVER_MEMORY
    +# For client mode, the driver will be launched in the same JVM that launches
    +# SparkSubmit, so we may need to read the properties file for any extra class
    +# paths, library paths, java options and memory early on. Otherwise, it will
    +# be too late by the time the JVM has started.
    +
    +if [ "$DEPLOY_MODE" == "client" ]; then
    --- End diff --
    
    These script changes seem simple enough that it makes sense to port them to windows in the same patch.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52851182
  
    Okay, LGTM as an immediate fix - I want to pull this into an RC.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52743537
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18937/consoleFull) for   PR 1845 at commit [`24dba60`](https://github.com/apache/spark/commit/24dba6067b33719e7eb25b0fb536bc2df72d34a4).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16403594
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -1421,3 +1421,24 @@ private[spark] object Utils extends Logging {
       }
     
     }
    +
    +/**
    + * A utility class to redirect the child process's stdout or stderr.
    + */
    +private[spark] class RedirectThread(in: InputStream, out: OutputStream, name: String)
    +  extends Thread(name) {
    +
    +  setDaemon(true)
    +  override def run() {
    +    scala.util.control.Exception.ignoring(classOf[IOException]) {
    +      // FIXME: We copy the stream on the level of bytes to avoid encoding problems.
    --- End diff --
    
    *shrug*




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16322977
  
    --- Diff: bin/utils.sh ---
    @@ -0,0 +1,108 @@
    +#!/usr/bin/env bash
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# * ---------------------------------------------------- *
    +# |  Utility functions for launching Spark applications  |
    +# * ---------------------------------------------------- *
    +
    +# Parse the value of a config from a java properties file according to the specifications in
    +# http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html#load(java.io.Reader).
    +# This accepts the name of the config as an argument, and expects the path of the property
    +# file to be found in PROPERTIES_FILE. The value is returned through JAVA_PROPERTY_VALUE.
    +parse_java_property() {
    +  JAVA_PROPERTY_VALUE=""  # return value
    +  config_buffer=""        # buffer for collecting parts of a config value
    +  multi_line=0            # whether this config is spanning multiple lines
    +  while read -r line; do
    +    # Strip leading and trailing whitespace
    +    line=$(echo "$line" | sed "s/^[[:space:]]\(.*\)[[:space:]]*$/\1/")
    +    contains_config=$(echo "$line" | grep -e "^$1")
    +    if [[ -n "$contains_config" || "$multi_line" == 1 ]]; then
    +      has_more_lines=$(echo "$line" | grep -e "\\\\$")
    +      if [[ -n "$has_more_lines" ]]; then
    +        # Strip trailing backslash
    +        line=$(echo "$line" | sed "s/\\\\$//")
    +        config_buffer="$config_buffer $line"
    +        multi_line=1
    +      else
    +        JAVA_PROPERTY_VALUE="$config_buffer $line"
    +        break
    +      fi
    +    fi
    +  done < "$PROPERTIES_FILE"
    +
    +  # Actually extract the value of the config
    +  JAVA_PROPERTY_VALUE=$( \
    +    echo "$JAVA_PROPERTY_VALUE" | \
    +    sed "s/$1//" | \
    +    sed "s/^[[:space:]]*[:=]\{0,1\}//" | \
    +    sed "s/^[[:space:]]*\(.*\)[[:space:]]*$/\1/g" \
    +  )
    +  export JAVA_PROPERTY_VALUE
    +}
    +
    +# Properly split java options, dealing with whitespace, double quotes and backslashes.
    +# This accepts a string and returns the resulting list through SPLIT_JAVA_OPTS.
    +split_java_options() {
    +  SPLIT_JAVA_OPTS=()  # return value
    --- End diff --
    
    Yeah, that does exactly what we want.
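As an aside for readers of this thread, the quote-aware splitting being discussed here can be approximated in bash by delegating tokenization to `xargs`, which understands double quotes. This is only an illustrative sketch of the idea under discussion, not the code from the PR:

```shell
#!/usr/bin/env bash
# Illustrative sketch (not the PR's implementation): split a java-opts
# string into an array while honoring double quotes, by letting xargs
# do the tokenization. xargs treats a double-quoted span as part of a
# single token, so -Dsome="a b" comes back as one element.
split_java_options() {
  SPLIT_JAVA_OPTS=()  # return value
  while IFS= read -r opt; do
    SPLIT_JAVA_OPTS+=("$opt")
  done < <(echo "$1" | xargs -n 1 echo)
}

split_java_options '-XX:-UseParallelGC -Dsome="property with spaces"'
printf '%s\n' "${#SPLIT_JAVA_OPTS[@]}"  # token count
printf '%s\n' "${SPLIT_JAVA_OPTS[1]}"   # second token, quotes stripped
```

Note that `xargs` also interprets backslash escapes, so this sketch handles only the simple cases; the PR's `Utils.splitCommandString` exists precisely because the full escaping rules are painful to get right in shell.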




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52843308
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18993/consoleFull) for   PR 1845 at commit [`bed4bdf`](https://github.com/apache/spark/commit/bed4bdff74e941f791080e9407fc5295af10f677).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16428811
  
    --- Diff: bin/spark-submit ---
    @@ -17,28 +17,46 @@
     # limitations under the License.
     #
     
    +# NOTE: Any changes in this file must be reflected in SparkClassLauncher.scala!
    +
     export SPARK_HOME="$(cd `dirname $0`/..; pwd)"
     ORIG_ARGS=("$@")
     
     while (($#)); do
       if [ "$1" = "--deploy-mode" ]; then
    -    DEPLOY_MODE=$2
    +    SPARK_SUBMIT_DEPLOY_MODE=$2
    +  elif [ "$1" = "--properties-file" ]; then
    +    SPARK_SUBMIT_PROPERTIES_FILE=$2
       elif [ "$1" = "--driver-memory" ]; then
    -    DRIVER_MEMORY=$2
    +    export SPARK_SUBMIT_DRIVER_MEMORY=$2
       elif [ "$1" = "--driver-library-path" ]; then
         export SPARK_SUBMIT_LIBRARY_PATH=$2
       elif [ "$1" = "--driver-class-path" ]; then
         export SPARK_SUBMIT_CLASSPATH=$2
       elif [ "$1" = "--driver-java-options" ]; then
    -    export SPARK_SUBMIT_OPTS=$2
    +    export SPARK_SUBMIT_JAVA_OPTS=$2
       fi
       shift
     done
     
    -DEPLOY_MODE=${DEPLOY_MODE:-"client"}
    +DEFAULT_PROPERTIES_FILE="$SPARK_HOME/conf/spark-defaults.conf"
    +export SPARK_SUBMIT_DEPLOY_MODE=${SPARK_SUBMIT_DEPLOY_MODE:-"client"}
    +export SPARK_SUBMIT_PROPERTIES_FILE=${SPARK_SUBMIT_PROPERTIES_FILE:-"$DEFAULT_PROPERTIES_FILE"}
    +
    +# For client mode, the driver will be launched in the same JVM that launches
    +# SparkSubmit, so we may need to read the properties file for any extra class
    +# paths, library paths, java options and memory early on. Otherwise, it will
    +# be too late by the time the JVM has started.
     
    -if [ -n "$DRIVER_MEMORY" ] && [ $DEPLOY_MODE == "client" ]; then
    -  export SPARK_DRIVER_MEMORY=$DRIVER_MEMORY
    +if [ "$SPARK_SUBMIT_DEPLOY_MODE" == "client" ]; then
    +  # Parse the properties file only if the special configs exist
    +  contains_special_configs=$(
    +    grep -e "spark.driver.extra*\|spark.driver.memory" "$SPARK_SUBMIT_PROPERTIES_FILE" | \
    --- End diff --
    
    If `SPARK_SUBMIT_PROPERTIES_FILE` does not exist, will this line cause the script to fail?
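For what it's worth, the failure mode being asked about can be sidestepped with an existence check before the `grep`. A minimal sketch of the guard, with an illustrative (deliberately nonexistent) file path:

```shell
#!/usr/bin/env bash
# Sketch of guarding the grep: on a missing file, grep prints an error
# to stderr and returns non-zero, so check that the file exists first.
PROPERTIES_FILE="/nonexistent/spark-defaults.conf"  # illustrative path

contains_special_configs=""
if [ -f "$PROPERTIES_FILE" ]; then
  contains_special_configs=$(
    grep -e "spark.driver.extra\|spark.driver.memory" "$PROPERTIES_FILE"
  )
fi

# Reached without grep ever touching the missing file.
echo "special configs: '${contains_special_configs}'"
```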




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16397578
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkClassLauncher.scala ---
    @@ -0,0 +1,116 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import org.apache.spark.util.{RedirectThread, Utils}
    +
    +/**
    + * Wrapper of `bin/spark-class` that prepares the launch environment of the child JVM properly.
    + * This is currently only used for running Spark submit in client mode. The goal moving forward
    + * is to use this class for all use cases of `bin/spark-class`.
    + */
    +object SparkClassLauncher {
    +
    +  /**
    +   * Launch a Spark class with the given class paths, library paths, java options and memory.
    +   * If we are launching an application through Spark submit in client mode, we must also
    +   * take into account special `spark.driver.*` properties needed to start the driver JVM.
    +   */
    +  def main(args: Array[String]): Unit = {
    +    if (args.size < 8) {
    +      System.err.println(
    +        """
    +          |Usage: org.apache.spark.deploy.SparkClassLauncher
    +          |
    +          |  [properties file]    - path to your Spark properties file
    +          |  [java runner]        - command to launch the child JVM
    +          |  [java class paths]   - class paths to pass to the child JVM
    +          |  [java library paths] - library paths to pass to the child JVM
    +          |  [java opts]          - java options to pass to the child JVM
    +          |  [java memory]        - memory used to launch the child JVM
    +          |  [client mode]        - whether the child JVM will run the Spark driver
    +          |  [main class]         - main class to run in the child JVM
    +          |  <main args>          - arguments passed to this main class
    +          |
    +          |Example:
    +          |  org.apache.spark.deploy.SparkClassLauncher.SparkClassLauncher
    +          |    conf/spark-defaults.conf java /classpath1:/classpath2 /librarypath1:/librarypath2
    +          |    "-XX:-UseParallelGC -Dsome=property" 5g true org.apache.spark.deploy.SparkSubmit
    +          |    --master local --class org.apache.spark.examples.SparkPi 10
    +        """.stripMargin)
    +      System.exit(1)
    +    }
    +    val propertiesFile = args(0)
    +    val javaRunner = args(1)
    +    val clClassPaths = args(2)
    +    val clLibraryPaths = args(3)
    +    val clJavaOpts = Utils.splitCommandString(args(4))
    +    val clJavaMemory = args(5)
    +    val clientMode = args(6) == "true"
    --- End diff --
    
    This is an internal tool - it doesn't need to have compatibility at all.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52849125
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18991/consoleFull) for   PR 1845 at commit [`24dba60`](https://github.com/apache/spark/commit/24dba6067b33719e7eb25b0fb536bc2df72d34a4).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52743675
  
    Hm, test this please!




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52585848
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18805/consoleFull) for   PR 1845 at commit [`b71f52b`](https://github.com/apache/spark/commit/b71f52b7adaf5b3e67e17e31fa4fe11089494e1a).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-51680007
  
    QA tests have started for PR 1845. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18247/consoleFull




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52840984
  
    Jenkins, test this please.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52684123
  
    Looks like pyspark is broken; still investigating.




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52821311
  
    @vanzin that's actually a decent idea and something we should seriously consider doing. I want to get this patch in for 1.1, since the current behavior is very confusing for users, but it's worth looking at the design of just rewriting the entire thing to simplify it.



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52729431
  
    @andrewor14 the scala code would have to generate the arguments properly quoted/escaped for bash.
    
    @sryza it's hard to not run two jvms if you want to support all the possible ways of writing a properties file. It's just too hard to do in pure bash (or with standard posix tools without resorting to python / perl).
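    As a concrete (hypothetical) illustration of the point above: Java properties files allow backslash line continuations, which a naive single-pass `grep`/`sed` extraction silently drops. The key name and file below are made up for the example.

```shell
#!/usr/bin/env bash
# Demonstrate how a grep-based reader loses a continued properties value.
props="$(mktemp)"
cat > "$props" <<'EOF'
spark.executor.extraJavaOptions=-XX:+UseG1GC \
                                -Dkey=value
EOF
# A single-pass grep sees only the first physical line,
# so the continuation line is dropped without any error.
value="$(grep '^spark.executor.extraJavaOptions' "$props" | sed 's/^[^=]*=//')"
echo "$value"
```

    The `-Dkey=value` half of the option string never makes it into `value`, which is exactly the kind of silent corruption that makes pure-bash parsing hard to get right.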



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52713440
  
    When we added spark-submit originally, we went with the current approach (Bash->Scala) because @mateiz had concerns about the overhead of starting two JVMs.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-51556022
  
    QA tests have started for PR 1845. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18169/consoleFull



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-51680769
  
    QA results for PR 1845:
    - This patch FAILED unit tests.
    - This patch merges cleanly.
    - This patch adds no public classes.

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18247/consoleFull



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-51565448
  
    Given the complexity of the bash scripts, can we make sure we have some tests for this?



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-51695688
  
    QA tests have started for PR 1845. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18258/consoleFull



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52820810
  
    I see. Well, you could go the other way and still move everything to the scala bootstrapper, and always execute two VMs... still achieves simplification of the scripts, although now you always have two VMs running. :-/



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52667793
  
    Hey Andrew - this is looking good. Only some minor comments here and I want to test it a bit.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-51674448
  
    QA tests have started for PR 1845. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18239/consoleFull



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-51695612
  
    retest this please



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16281080
  
    --- Diff: bin/utils.sh ---
    @@ -0,0 +1,108 @@
    +#!/usr/bin/env bash
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# * ---------------------------------------------------- *
    +# |  Utility functions for launching Spark applications  |
    +# * ---------------------------------------------------- *
    +
    +# Parse the value of a config from a java properties file according to the specifications in
    +# http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html#load(java.io.Reader).
    +# This accepts the name of the config as an argument, and expects the path of the property
    +# file to be found in PROPERTIES_FILE. The value is returned through JAVA_PROPERTY_VALUE.
    +parse_java_property() {
    +  JAVA_PROPERTY_VALUE=""  # return value
    +  config_buffer=""        # buffer for collecting parts of a config value
    +  multi_line=0            # whether this config is spanning multiple lines
    +  while read -r line; do
    +    # Strip leading and trailing whitespace
    +    line=$(echo "$line" | sed "s/^[[:space:]]\(.*\)[[:space:]]*$/\1/")
    +    contains_config=$(echo "$line" | grep -e "^$1")
    +    if [[ -n "$contains_config" || "$multi_line" == 1 ]]; then
    +      has_more_lines=$(echo "$line" | grep -e "\\\\$")
    +      if [[ -n "$has_more_lines" ]]; then
    +        # Strip trailing backslash
    +        line=$(echo "$line" | sed "s/\\\\$//")
    +        config_buffer="$config_buffer $line"
    +        multi_line=1
    +      else
    +        JAVA_PROPERTY_VALUE="$config_buffer $line"
    +        break
    +      fi
    +    fi
    +  done < "$PROPERTIES_FILE"
    +
    +  # Actually extract the value of the config
    +  JAVA_PROPERTY_VALUE=$( \
    +    echo "$JAVA_PROPERTY_VALUE" | \
    +    sed "s/$1//" | \
    +    sed "s/^[[:space:]]*[:=]\{0,1\}//" | \
    +    sed "s/^[[:space:]]*\(.*\)[[:space:]]*$/\1/g" \
    +  )
    +  export JAVA_PROPERTY_VALUE
    +}
    +
    +# Properly split java options, dealing with whitespace, double quotes and backslashes.
    +# This accepts a string and returns the resulting list through SPLIT_JAVA_OPTS.
    +split_java_options() {
    +  SPLIT_JAVA_OPTS=()  # return value
    --- End diff --
    
    I think this could be solved much more easily using an `eval` statement. At first I thought there were some security concerns, but I don't think this allows people to do anything they couldn't already do:
    ```
    test="One \"This \\\"is two\" Three"
    eval "set -- $test"
    for i in "$@"
    do
      echo $i
    done
    ```
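
    A hedged variant of the same idea, restricted to POSIX sh (no bash arrays); the function name `split_count` is illustrative and not from the patch:

```shell
#!/bin/sh
# Split a string of java options via eval, printing the resulting
# argument count followed by one argument per line.
split_count() {
  eval "set -- $1"   # re-parse the string with shell quoting rules
  echo "$#"
  for arg; do echo "$arg"; done
}

split_count 'One "This \"is two" Three'
```

    This yields three arguments, with the escaped inner quotes preserved in the middle one, which is the splitting behavior the patch needs for `spark.*.extraJavaOptions`.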



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16323564
  
    --- Diff: bin/utils.sh ---
    @@ -17,9 +16,53 @@
     # limitations under the License.
     #
     
    +# * ---------------------------------------------------- *
    +# |  Utility functions for launching Spark applications  |
    +# * ---------------------------------------------------- *
    +
    +# Parse the value of a config from a java properties file according to the specifications in
    +# http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html#load(java.io.Reader),
    +# with the exception of the support for multi-line arguments. This accepts the name of the
    +# config as an argument, and expects the path of the property file to be found in
    +# PROPERTIES_FILE. The value is returned through JAVA_PROPERTY_VALUE.
    +function parse_java_property() {
    +  JAVA_PROPERTY_VALUE=$(\
    +    grep "^[[:space:]]*$1" "$PROPERTIES_FILE" | \
    +    head -n 1 | \
    +    sed "s/^[[:space:]]*$1//g" | \
    +    sed "s/^[[:space:]]*[:=]\{0,1\}//g" | \
    +    sed "s/^[[:space:]]*//g" | \
    +    sed "s/[[:space:]]*$//g"
    +  )
    +  export JAVA_PROPERTY_VALUE
    +}
    +
    +# Properly split java options, dealing with whitespace, double quotes and backslashes.
    +# This accepts a string and returns the resulting list through SPLIT_JAVA_OPTS.
    +# For security reasons, this is isolated in its own function.
    +function split_java_options() {
    +  eval set -- "$1"
    --- End diff --
    
    @kanzhang any comment on the security concern here?
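
    For context, the concern is that `eval` executes anything embedded in the value rather than treating it as data. A minimal hypothetical demonstration (paths created via `mktemp` for the example):

```shell
#!/usr/bin/env bash
# Show that a command substitution smuggled into a "config value"
# is executed when the value is split with eval.
demo_dir="$(mktemp -d)"
opts="-Dx=1 \$(touch $demo_dir/injected)"   # opts is plain data at this point
eval "set -- $opts"                          # ...but eval runs the $( ) inside it
if [ -e "$demo_dir/injected" ]; then
  echo "command substitution executed"
fi
```

    Whether this matters depends on who controls `spark-defaults.conf`; anyone who can edit that file can typically already run commands as the submitting user.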



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52677947
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18862/consoleFull) for   PR 1845 at commit [`d0f20db`](https://github.com/apache/spark/commit/d0f20db0e81551bb5f7447fe5491cc2ae5817421).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52276624
  
    Hey @andrewor14, it's great to see this being fixed. I think things can be simplified a lot by deferring to bash's `eval` to handle the argument parsing. It's worth seeing whether that handles all the cases here.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52383274
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18656/consoleFull) for   PR 1845 at commit [`7a4190a`](https://github.com/apache/spark/commit/7a4190af01454768fe2334c00129d2d535ed57fb).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16397290
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkClassLauncher.scala ---
    @@ -0,0 +1,116 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import org.apache.spark.util.{RedirectThread, Utils}
    +
    +/**
    + * Wrapper of `bin/spark-class` that prepares the launch environment of the child JVM properly.
    + * This is currently only used for running Spark submit in client mode. The goal moving forward
    + * is to use this class for all use cases of `bin/spark-class`.
    + */
    +object SparkClassLauncher {
    +
    +  /**
    +   * Launch a Spark class with the given class paths, library paths, java options and memory.
    +   * If we are launching an application through Spark submit in client mode, we must also
    +   * take into account special `spark.driver.*` properties needed to start the driver JVM.
    +   */
    +  def main(args: Array[String]): Unit = {
    +    if (args.size < 8) {
    +      System.err.println(
    +        """
    +          |Usage: org.apache.spark.deploy.SparkClassLauncher
    --- End diff --
    
    To keep it simpler for now, rather than having a bunch of command line arguments, why not just directly read the environment variables set in `spark-class`?



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1845



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52743530
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18937/consoleFull) for   PR 1845 at commit [`24dba60`](https://github.com/apache/spark/commit/24dba6067b33719e7eb25b0fb536bc2df72d34a4).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52742770
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18934/consoleFull) for   PR 1845 at commit [`24dba60`](https://github.com/apache/spark/commit/24dba6067b33719e7eb25b0fb536bc2df72d34a4).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52383086
  
    test this please!



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16504521
  
    --- Diff: bin/spark-class ---
    @@ -17,6 +17,8 @@
     # limitations under the License.
     #
     
    +# NOTE: Any changes to this file must be reflected in SparkSubmitDriverBootstrapper.scala!
    +
     cygwin=false
     case "`uname`" in
         CYGWIN*) cygwin=true;;
    --- End diff --
    
    Ok



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r16500557
  
    --- Diff: bin/spark-class ---
    @@ -17,6 +17,8 @@
     # limitations under the License.
     #
     
    +# NOTE: Any changes to this file must be reflected in SparkSubmitDriverBootstrapper.scala!
    +
     cygwin=false
     case "`uname`" in
         CYGWIN*) cygwin=true;;
    --- End diff --
    
    below we should change this to advise people to set `spark.driver.memory` instead of `SPARK_DRIVER_MEMORY`. Also we should remove the references in `spark-env.sh`.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52583116
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18807/consoleFull) for   PR 1845 at commit [`c84f5c8`](https://github.com/apache/spark/commit/c84f5c8ac6c0d75930a3545bf98b25c6d6287186).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52384652
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18661/consoleFull) for   PR 1845 at commit [`c886568`](https://github.com/apache/spark/commit/c886568687adfda65ae27d0fefa184b345a770ad).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52742587
  
    Hi everyone, thanks for your comments. I have tested the latest commit, and I think this is ready to go. I have fixed the PySpark behavior where the subprocess doesn't actually terminate, in a similar way to how Py4j's `JavaGateway` handles it (see `--die-on-broken-pipe`, which we currently use for the direct `SparkSubmit` code path). I have also verified that the precedence order of the various driver configs is as expected.



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52809564
  
    A little fun with bash:
    
        test() {
          echo "arg1"
          echo '$arg2'
          echo "'arg3"
          echo '"arg4'
          echo "a r g 5"
          echo "$@"
        }
        
        IFS=$'\r\n'
        x=($(test abcde fghi))
        exec python -c "import sys; print str(len(sys.argv[1:])) + ' : ' + repr(sys.argv[1:])" "${x[@]}"
    
    Output:
    
        6 : ['arg1', '$arg2', "'arg3", '"arg4', 'a r g 5', 'abcde fghi']
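
    The trick above can be reduced to a minimal sketch: setting `IFS` to newlines (and `\r`) before an unquoted command substitution makes bash split the output only at line boundaries, so quotes and embedded spaces inside each line survive intact. The `emit` function and variable names below are illustrative, not from the PR:

    ```shell
    # With the default IFS (space, tab, newline), word splitting would
    # break "a r g 1" into four elements; restricting IFS to newlines
    # keeps each emitted line as a single array element.
    emit() {
      echo "a r g 1"
      echo '"arg 2'      # the embedded double quote survives as-is
    }

    IFS=$'\n'
    x=($(emit))

    echo "count=${#x[@]}"
    echo "first=${x[0]}"
    echo "second=${x[1]}"
    ```

    One caveat: pathname expansion still applies to the unquoted substitution, so this relies on none of the emitted lines containing glob characters such as `*` or `?` (the snippet above has the same caveat).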




[GitHub] spark pull request: [SPARK-2849 / 2914] Handle certain Spark confi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-51558467
  
    QA results for PR 1845:
    - This patch PASSES unit tests.
    - This patch merges cleanly
    - This patch adds no public classes

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18169/consoleFull




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52740893
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18929/consoleFull) for PR 1845 at commit [`9a778f6`](https://github.com/apache/spark/commit/9a778f66d812886b3322b32ae92fccf9d8415657).
     * This patch merges cleanly.




Re: [GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by Sean Owen <so...@cloudera.com>.
(Guava also has ByteStreams.copy for this, and I know it's a dependency
already - I forget if commons IO is.)



[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1845#discussion_r17213316
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -1421,3 +1421,24 @@ private[spark] object Utils extends Logging {
       }
     
     }
    +
    +/**
    + * A utility class to redirect the child process's stdout or stderr.
    + */
    +private[spark] class RedirectThread(in: InputStream, out: OutputStream, name: String)
    +  extends Thread(name) {
    +
    +  setDaemon(true)
    +  override def run() {
    +    scala.util.control.Exception.ignoring(classOf[IOException]) {
    +      // FIXME: We copy the stream on the level of bytes to avoid encoding problems.
    --- End diff --
    
    I'm guessing the original author was reading Strings before, which requires defining how to interpret the bytes as characters (an encoding like ASCII or UTF-8). There were probably byte sequences coming through that weren't valid UTF-8, so exceptions were being thrown. Since this just reads from an InputStream and writes to an OutputStream, copying raw bytes is also much more efficient than reading bytes, interpreting them as characters, converting back to bytes, and sending those out the other side.
    
    Conveniently, Apache has an ```IOUtils.copy(inputStream, outputStream, bufferSize)``` [method](http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/IOUtils.html#copy(java.io.InputStream, java.io.OutputStream, int)) that would do exactly this.
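
    The point about byte-level copying can be reproduced with a small sketch: a copy that never decodes its input passes data through unchanged even when the bytes are not valid text in any encoding. The file names and `emit`-style setup here are illustrative, not from the PR:

    ```shell
    # Write bytes that are NOT valid UTF-8, then copy them without any
    # decoding step (cat copies bytes verbatim). A text-oriented copy
    # that decoded to characters first could fail or mangle this data.
    tmpdir=$(mktemp -d)
    printf '\xff\xfe raw bytes' > "$tmpdir/in.bin"   # 0xFF/0xFE are invalid UTF-8
    cat "$tmpdir/in.bin" > "$tmpdir/out.bin"         # byte-for-byte copy

    if cmp -s "$tmpdir/in.bin" "$tmpdir/out.bin"; then
      status=identical
    else
      status=different
    fi
    echo "$status"
    rm -r "$tmpdir"
    ```

    This is essentially what the `RedirectThread` loop (and `IOUtils.copy`) does: move bytes from input to output with no character interpretation in between.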




[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1845#issuecomment-52841889
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18991/consoleFull) for PR 1845 at commit [`24dba60`](https://github.com/apache/spark/commit/24dba6067b33719e7eb25b0fb536bc2df72d34a4).
     * This patch merges cleanly.

