Posted to reviews@spark.apache.org by andrewor14 <gi...@git.apache.org> on 2014/08/26 08:34:20 UTC

[GitHub] spark pull request: [WIP][SPARK-3167] Handle special driver config...

GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/2129

    [WIP][SPARK-3167] Handle special driver configs in Windows

    This is an attempt to bring the Windows scripts up to speed after the recent changes in #1845.
    
    This is still WIP because there is an issue with getting `SparkSubmitDriverBootstrapper` to work. More specifically, the `SparkSubmit` subprocess is not picking up `stdin` from the console as expected, because there is now an extra layer of execution in this code path.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark windows-config

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2129
    
----
commit 83ebe601032867327988940073de4ee08a42c3fe
Author: Andrew Or <an...@gmail.com>
Date:   2014-08-26T06:28:48Z

    Parse special driver configs in Windows (broken)
    
    Note that this is still currently broken. There is an issue with
    using SparkSubmitDriverBootstrapper with windows; the stdin is not
    being picked up properly by the SparkSubmit subprocess. This must
    be fixed before the PR is merged.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53529009
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19291/consoleFull) for   PR 2129 at commit [`881a8f0`](https://github.com/apache/spark/commit/881a8f0d03046bf074776a8e6c820a99fad02d11).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53524099
  
    Hey Jenkins, test this please



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53524644
  
    **-- Update --**
    
    As of the latest commit, I have tested this with setting the `--driver-*` options and the corresponding `spark.driver.*` configs. The order of precedence is the same as established in #1845, i.e. the command line arguments override the values in the properties file. I have also verified that all subprocesses terminate after the parent exits in both Scala and Python.
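
    The precedence rule described above can be sketched as follows. This is an illustrative Python sketch, not Spark's actual implementation; the function and config names are hypothetical.

    ```python
    # Hypothetical sketch of the precedence rule: values passed on the command
    # line (e.g. --driver-memory) override the corresponding spark.driver.*
    # entries loaded from the properties file, matching the behavior
    # established in #1845.
    def resolve_driver_config(cli_args, properties):
        """Merge spark.driver.* settings, CLI flags winning over the file."""
        merged = dict(properties)   # start from the properties-file values
        merged.update(cli_args)     # command-line flags take precedence
        return merged

    file_conf = {"spark.driver.memory": "1g",
                 "spark.driver.extraJavaOptions": "-XX:+UseG1GC"}
    cli_conf = {"spark.driver.memory": "4g"}

    print(resolve_driver_config(cli_conf, file_conf))
    ```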



[GitHub] spark pull request: [WIP][SPARK-3167] Handle special driver config...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2129#discussion_r16752033
  
    --- Diff: bin/spark-class2.cmd ---
    @@ -115,5 +125,27 @@ rem Figure out where java is.
     set RUNNER=java
     if not "x%JAVA_HOME%"=="x" set RUNNER=%JAVA_HOME%\bin\java
     
    -"%RUNNER%" -cp "%CLASSPATH%" %JAVA_OPTS% %*
    +rem In Spark submit client mode, the driver is launched in the same JVM as Spark submit itself.
    +rem Here we must parse the properties file for relevant "spark.driver.*" configs before launching
    +rem the driver JVM itself. Instead of handling this complexity in Bash, we launch a separate JVM
    +rem to prepare the launch environment of this driver JVM.
    +
    +rem In this case, leave out the main class (org.apache.spark.deploy.SparkSubmit) and use our own.
    +rem Leaving out the first argument is surprisingly difficult to do in Windows. Note that this must
    +rem be done here because the Windows "shift" command does not work in a conditional block.
    +set BOOTSTRAP_ARGS=
    +shift
    +:start_parse
    +if "%~1" == "" goto end_parse
    +set BOOTSTRAP_ARGS=%BOOTSTRAP_ARGS% %~1
    +shift
    +goto start_parse
    +:end_parse
    +
    +if not [%SPARK_SUBMIT_BOOTSTRAP_DRIVER%] == [] (
    +  set SPARK_CLASS=1
    +  "%RUNNER%" org.apache.spark.deploy.SparkSubmitDriverBootstrapper %BOOTSTRAP_ARGS%
    --- End diff --
    
    This should be fixed in this commit https://github.com/andrewor14/spark/commit/f97daa20c184fb4b68d7ff5de1172a8e94e38b2f
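
    For readers unfamiliar with batch scripting, the `shift` loop in the diff above is doing what the following Python sketch does: dropping the first argument (the main class) and re-joining the remainder, since cmd.exe has no direct equivalent of Bash's `"${@:2}"`. Names here are illustrative only.

    ```python
    # Sketch of what the batch "shift"/"goto" loop accomplishes: skip the
    # main-class argument and collect everything after it, space-joined the
    # way BOOTSTRAP_ARGS is built up in spark-class2.cmd.
    def bootstrap_args(argv):
        """Return all arguments after the main class, space-joined."""
        return " ".join(argv[1:])

    print(bootstrap_args(
        ["org.apache.spark.deploy.SparkSubmit", "--master", "local", "app.py"]))
    ```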



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53529332
  
    Okay I'm gonna merge this - thanks Andrew.



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53526021
  
    Hey @andrewor14 - a few minor comments, but otherwise LGTM. At this point you are the foremost expert on Windows scripting, so I can't add much value reviewing that.



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53523598
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19277/consoleFull) for   PR 2129 at commit [`72004c2`](https://github.com/apache/spark/commit/72004c2b6bd502244cb1e9e055d26bd1f064ce5a).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `rem In this case, leave out the main class (org.apache.spark.deploy.SparkSubmit) and use our own.`




[GitHub] spark pull request: [WIP][SPARK-3167] Handle special driver config...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53385409
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19201/consoleFull) for   PR 2129 at commit [`83ebe60`](https://github.com/apache/spark/commit/83ebe601032867327988940073de4ee08a42c3fe).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `rem In this case, leave out the main class (org.apache.spark.deploy.SparkSubmit) and use our own.`




[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53516287
  
    Windows, test this please.



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2129#discussion_r16756387
  
    --- Diff: bin/spark-class2.cmd ---
    @@ -67,10 +69,18 @@ rem Executors use SPARK_JAVA_OPTS + SPARK_EXECUTOR_MEMORY.
       set OUR_JAVA_OPTS=%SPARK_JAVA_OPTS% %SPARK_EXECUTOR_OPTS%
       if not "x%SPARK_EXECUTOR_MEMORY%"=="x" set OUR_JAVA_MEM=%SPARK_EXECUTOR_MEMORY%
     
    -rem All drivers use SPARK_JAVA_OPTS + SPARK_DRIVER_MEMORY. The repl also uses SPARK_REPL_OPTS.
    -) else if "%1"=="org.apache.spark.repl.Main" (
    -  set OUR_JAVA_OPTS=%SPARK_JAVA_OPTS% %SPARK_REPL_OPTS%
    +rem Spark submit uses SPARK_JAVA_OPTS + SPARK_SUBMIT_OPTS +
    +rem SPARK_DRIVER_MEMORY + SPARK_SUBMIT_DRIVER_MEMORY.
    +rem The repl also uses SPARK_REPL_OPTS.
    +) else if "%1"=="org.apache.spark.deploy.SparkSubmit" (
    +  set OUR_JAVA_OPTS=%SPARK_JAVA_OPTS% %SPARK_SUBMIT_OPTS% %SPARK_REPL_OPTS%
    +  if not "x%SPARK_SUBMIT_LIBRARY_PATH%"=="x" (
    +    set OUR_JAVA_OPTS=!OUR_JAVA_OPTS! -Djava.library.path=%SPARK_SUBMIT_LIBRARY_PATH%
    --- End diff --
    
    The `!VAR_NAME!` syntax is explained here: https://github.com/andrewor14/spark/commit/72004c2b6bd502244cb1e9e055d26bd1f064ce5a



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53524010
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19278/consoleFull) for   PR 2129 at commit [`afcffea`](https://github.com/apache/spark/commit/afcffead52e508808c12eb32f281c2ecb0800480).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2129



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2129#discussion_r16753486
  
    --- Diff: bin/spark-class2.cmd ---
    @@ -115,5 +125,27 @@ rem Figure out where java is.
     set RUNNER=java
     if not "x%JAVA_HOME%"=="x" set RUNNER=%JAVA_HOME%\bin\java
     
    -"%RUNNER%" -cp "%CLASSPATH%" %JAVA_OPTS% %*
    +rem In Spark submit client mode, the driver is launched in the same JVM as Spark submit itself.
    +rem Here we must parse the properties file for relevant "spark.driver.*" configs before launching
    +rem the driver JVM itself. Instead of handling this complexity in Bash, we launch a separate JVM
    +rem to prepare the launch environment of this driver JVM.
    +
    +rem In this case, leave out the main class (org.apache.spark.deploy.SparkSubmit) and use our own.
    +rem Leaving out the first argument is surprisingly difficult to do in Windows. Note that this must
    +rem be done here because the Windows "shift" command does not work in a conditional block.
    +set BOOTSTRAP_ARGS=
    +shift
    +:start_parse
    +if "%~1" == "" goto end_parse
    +set BOOTSTRAP_ARGS=%BOOTSTRAP_ARGS% %~1
    +shift
    +goto start_parse
    +:end_parse
    +
    +if not [%SPARK_SUBMIT_BOOTSTRAP_DRIVER%] == [] (
    +  set SPARK_CLASS=1
    +  "%RUNNER%" org.apache.spark.deploy.SparkSubmitDriverBootstrapper %BOOTSTRAP_ARGS%
    --- End diff --
    
    JK, this was actually fixed in https://github.com/andrewor14/spark/commit/35caecc899796da1ad4851185644ff591d479270



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2129#discussion_r16758263
  
    --- Diff: python/pyspark/java_gateway.py ---
    @@ -69,6 +70,22 @@ def preexec_func():
                     error_msg += "--------------------------------------------------------------\n"
                 raise Exception(error_msg)
     
    +        # In Windows, ensure the Java child processes do not linger after Python has exited.
    +        # In UNIX-based systems, the child process can kill itself on broken pipe (i.e. when
    +        # the parent process' stdin sends an EOF). In Windows, however, this is not possible
    +        # because java.lang.Process reads directly from the parent process' stdin, contending
    +        # with any opportunity to read an EOF from the parent. Note that this is only best
    +        # effort and will not take effect if the python process is violently terminated.
    +        if on_windows:
    +            # In Windows, the child process here is "spark-submit.cmd", not the JVM itself
    +            # (because the UNIX "exec" command is not available). This means we cannot simply
    +            # call proc.kill(), which kills only the "spark-submit.cmd" process but not the
    +            # JVMs. Instead, we use "taskkill" with the tree-kill option "/t" to terminate all
    +            # child processes in the tree.
    +            def killChild():
    +                Popen(["cmd", "/c", "taskkill", "/f", "/t", "/pid", str(proc.pid)])
    --- End diff --
    
    Sure. If we are to link it I'd rather provide a more official one, say http://technet.microsoft.com/en-us/library/bb491009.aspx
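
    The tree-kill approach under discussion can be demonstrated in isolation. This is a standalone sketch, not the `java_gateway.py` code itself: it spawns a throwaway child process, then tree-kills it with `taskkill /f /t` on Windows (falling back to a plain kill elsewhere, where the `exec`-ed child is the real process).

    ```python
    # Sketch of the tree-kill discussed above: on Windows, killing only the
    # spark-submit.cmd wrapper would leave the JVM alive, so "taskkill /t"
    # terminates the entire process tree rooted at the wrapper's pid.
    import os
    import subprocess
    import sys

    # Throwaway child standing in for the spark-submit.cmd wrapper
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

    if os.name == "nt":
        # /f = force, /t = kill the whole tree rooted at the given pid
        subprocess.Popen(["cmd", "/c", "taskkill", "/f", "/t", "/pid", str(proc.pid)])
    else:
        proc.kill()  # on UNIX, exec means the child *is* the real process

    proc.wait()
    print(proc.poll() is not None)
    ```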



[GitHub] spark pull request: [WIP][SPARK-3167] Handle special driver config...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2129#discussion_r16697858
  
    --- Diff: bin/spark-class2.cmd ---
    @@ -115,5 +125,27 @@ rem Figure out where java is.
     set RUNNER=java
     if not "x%JAVA_HOME%"=="x" set RUNNER=%JAVA_HOME%\bin\java
     
    -"%RUNNER%" -cp "%CLASSPATH%" %JAVA_OPTS% %*
    +rem In Spark submit client mode, the driver is launched in the same JVM as Spark submit itself.
    +rem Here we must parse the properties file for relevant "spark.driver.*" configs before launching
    +rem the driver JVM itself. Instead of handling this complexity in Bash, we launch a separate JVM
    +rem to prepare the launch environment of this driver JVM.
    +
    +rem In this case, leave out the main class (org.apache.spark.deploy.SparkSubmit) and use our own.
    +rem Leaving out the first argument is surprisingly difficult to do in Windows. Note that this must
    +rem be done here because the Windows "shift" command does not work in a conditional block.
    +set BOOTSTRAP_ARGS=
    +shift
    +:start_parse
    +if "%~1" == "" goto end_parse
    +set BOOTSTRAP_ARGS=%BOOTSTRAP_ARGS% %~1
    +shift
    +goto start_parse
    +:end_parse
    +
    +if not [%SPARK_SUBMIT_BOOTSTRAP_DRIVER%] == [] (
    +  set SPARK_CLASS=1
    +  "%RUNNER%" org.apache.spark.deploy.SparkSubmitDriverBootstrapper %BOOTSTRAP_ARGS%
    --- End diff --
    
    This is not working yet. See commit message https://github.com/andrewor14/spark/commit/83ebe601032867327988940073de4ee08a42c3fe for more detail.



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53523933
  
    test this please



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53528831
  
    Jenkins, test this please



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53523666
  
    As of the latest commit, I have tested this with setting the `--driver-*` options and the corresponding `spark.driver.*` configs. The order of precedence is the same as established in #1845, i.e. the command line arguments override the values in the properties file. I have also verified that all subprocesses terminate after the parent exits in both Scala and Python.



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53526552
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19279/consoleFull) for   PR 2129 at commit [`22b1acd`](https://github.com/apache/spark/commit/22b1acd32a44c477548108daa2f1328499b98556).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `rem In this case, leave out the main class (org.apache.spark.deploy.SparkSubmit) and use our own.`




[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2129#discussion_r16758239
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala ---
    @@ -133,17 +133,24 @@ private[spark] object SparkSubmitDriverBootstrapper {
         val process = builder.start()
     
         // Redirect stdin, stdout, and stderr to/from the child JVM
    -    val stdinThread = new RedirectThread(System.in, process.getOutputStream, "redirect stdin")
         val stdoutThread = new RedirectThread(process.getInputStream, System.out, "redirect stdout")
         val stderrThread = new RedirectThread(process.getErrorStream, System.err, "redirect stderr")
    -    stdinThread.start()
         stdoutThread.start()
         stderrThread.start()
     
    -    // Terminate on broken pipe, which signals that the parent process has exited. This is
    -    // important for the PySpark shell, where Spark submit itself is a python subprocess.
    -    stdinThread.join()
    -    process.destroy()
    +    // In Windows, the subprocess reads directly from our stdin, so we should avoid spawning
    --- End diff --
    
    Not that I'm aware of :/
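
    For context, the `RedirectThread` pattern shown in the diff above (a thread that pumps bytes from one stream to another until EOF) has a rough Python analogue; the names below are illustrative, not Spark's API.

    ```python
    # Rough analogue of RedirectThread: a thread that copies bytes from a
    # source stream to a destination until the source reaches EOF. Spark uses
    # this to forward the child JVM's stdout/stderr to its own; the stdin
    # redirect thread is the one being dropped on Windows in this diff.
    import io
    import threading

    def redirect(src, dst):
        """Pump src to dst in fixed-size chunks until EOF."""
        for chunk in iter(lambda: src.read(1024), b""):
            dst.write(chunk)

    src = io.BytesIO(b"hello from the child JVM\n")  # stands in for process.getInputStream
    dst = io.BytesIO()                               # stands in for System.out
    t = threading.Thread(target=redirect, args=(src, dst), daemon=True)
    t.start()
    t.join()
    print(dst.getvalue().decode(), end="")
    ```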



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2129#discussion_r16757019
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala ---
    @@ -133,17 +133,24 @@ private[spark] object SparkSubmitDriverBootstrapper {
         val process = builder.start()
     
         // Redirect stdin, stdout, and stderr to/from the child JVM
    -    val stdinThread = new RedirectThread(System.in, process.getOutputStream, "redirect stdin")
         val stdoutThread = new RedirectThread(process.getInputStream, System.out, "redirect stdout")
         val stderrThread = new RedirectThread(process.getErrorStream, System.err, "redirect stderr")
    -    stdinThread.start()
         stdoutThread.start()
         stderrThread.start()
     
    -    // Terminate on broken pipe, which signals that the parent process has exited. This is
    -    // important for the PySpark shell, where Spark submit itself is a python subprocess.
    -    stdinThread.join()
    -    process.destroy()
    +    // In Windows, the subprocess reads directly from our stdin, so we should avoid spawning
    --- End diff --
    
    Is there any public location where this behavior is specified?



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53524053
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19278/consoleFull) for   PR 2129 at commit [`afcffea`](https://github.com/apache/spark/commit/afcffead52e508808c12eb32f281c2ecb0800480).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `rem In this case, leave out the main class (org.apache.spark.deploy.SparkSubmit) and use our own.`




[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53532419
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19291/consoleFull) for PR 2129 at commit [`881a8f0`](https://github.com/apache/spark/commit/881a8f0d03046bf074776a8e6c820a99fad02d11).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `rem In this case, leave out the main class (org.apache.spark.deploy.SparkSubmit) and use our own.`





[GitHub] spark pull request: [WIP][SPARK-3167] Handle special driver config...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53381288
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19201/consoleFull) for PR 2129 at commit [`83ebe60`](https://github.com/apache/spark/commit/83ebe601032867327988940073de4ee08a42c3fe).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53523542
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19277/consoleFull) for PR 2129 at commit [`72004c2`](https://github.com/apache/spark/commit/72004c2b6bd502244cb1e9e055d26bd1f064ce5a).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2129#discussion_r16757138
  
    --- Diff: python/pyspark/java_gateway.py ---
    @@ -69,6 +70,22 @@ def preexec_func():
                     error_msg += "--------------------------------------------------------------\n"
                 raise Exception(error_msg)
     
    +        # In Windows, ensure the Java child processes do not linger after Python has exited.
    +        # In UNIX-based systems, the child process can kill itself on broken pipe (i.e. when
    +        # the parent process' stdin sends an EOF). In Windows, however, this is not possible
    +        # because java.lang.Process reads directly from the parent process' stdin, contending
    +        # with any opportunity to read an EOF from the parent. Note that this is only best
    +        # effort and will not take effect if the python process is violently terminated.
    +        if on_windows:
    +            # In Windows, the child process here is "spark-submit.cmd", not the JVM itself
    +            # (because the UNIX "exec" command is not available). This means we cannot simply
    +            # call proc.kill(), which kills only the "spark-submit.cmd" process but not the
    +            # JVMs. Instead, we use "taskkill" with the tree-kill option "/t" to terminate all
    +            # child processes in the tree.
    +            def killChild():
    +                Popen(["cmd", "/c", "taskkill", "/f", "/t", "/pid", str(proc.pid)])
    --- End diff --
    
    I would link to wherever you found this - maybe here?
    http://stackoverflow.com/questions/1230669/subprocess-deleting-child-processes-in-windows
    
    Also, this is a bit scary to just issue kill commands (let's hope `proc.pid` is correct!) but it appears to be the best way to do this ATM.

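For reference, the tree-kill technique discussed in this review can be sketched roughly as follows. This is not the actual patch code: the quoted diff is truncated at the `Popen` line, so the function names here (`build_taskkill_cmd`, `ensure_child_dies_with_us`) and the `atexit` registration are illustrative assumptions, not part of PR 2129.

```python
# Sketch of the best-effort Windows cleanup described above. Because the
# child on Windows is "spark-submit.cmd" rather than the JVM itself,
# proc.kill() would only terminate the outer cmd shell; "taskkill" with
# /f (force) and /t (tree) terminates the whole process tree instead.
import atexit
import platform
from subprocess import Popen


def build_taskkill_cmd(pid):
    """Return the Windows command line that force-kills the process
    tree rooted at the given pid. Pure function, so it can be inspected
    without actually running taskkill."""
    return ["cmd", "/c", "taskkill", "/f", "/t", "/pid", str(pid)]


def ensure_child_dies_with_us(proc):
    """Register a best-effort exit hook that kills proc's process tree
    on Windows. As noted in the patch comment, this does not take
    effect if the Python process is terminated violently."""
    if platform.system() == "Windows":
        def kill_child():
            Popen(build_taskkill_cmd(proc.pid))
        atexit.register(kill_child)
```

The reviewer's caveat applies here too: issuing a forced tree kill trusts that `proc.pid` still refers to the intended process, but on Windows there is no equivalent of the UNIX `exec`/broken-pipe mechanism, so this appears to be the practical option.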



[GitHub] spark pull request: [SPARK-3167] Handle special driver configs in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2129#issuecomment-53524224
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19279/consoleFull) for PR 2129 at commit [`22b1acd`](https://github.com/apache/spark/commit/22b1acd32a44c477548108daa2f1328499b98556).
     * This patch merges cleanly.

