You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2018/01/17 19:44:21 UTC

[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/20297

    [SPARK-23020][CORE] Fix races in launcher code, test.

    The race in the code is because the handle might update
    its state to the wrong state if the connection handling
    thread is still processing incoming data; so the handle
    needs to wait for the connection to finish up before
    checking the final state.
    
    The race in the test is because when waiting for a handle
    to reach a final state, the waitFor() method needs to wait
    until all handle state is updated (which also includes
    waiting for the connection thread above to finish).
    Otherwise, waitFor() may return too early, which would cause
    a bunch of different races (like the listener not being yet
    notified of the state change, or being in the middle of
    being notified, or the handle not being properly disposed
    and causing postChecks() to assert).
    
    On top of that I found, by code inspection, a couple of
    potential races that could make a handle end up in the
    wrong state when being killed.
    
    The original version of this fix introduced the flipped
    version of the first race described above; the connection
    closing might override the handle state before the
    handle might have a chance to do cleanup. The fix there
    is to only dispose of the handle from the connection
    when there is an error, and let the handle dispose
    itself in the normal case.
    
    The fix also cause a bug in YarnClusterSuite to be surfaced;
    the code was checking for a file in the classpath that was
    not expected to be there in client mode. Because of the above
    issues, the error was not propagating correctly and the (buggy)
    test was incorrectly passing.
    
    Tested by running the existing unit tests a lot (and not
    seeing the errors I was seeing before).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-23020

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20297.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20297
    
----
commit 8bde21a1cbdab3c49a85c1da960f4d9c7bf70064
Author: Marcelo Vanzin <va...@...>
Date:   2018-01-16T06:40:44Z

    [SPARK-23020][CORE] Fix races in launcher code, test.
    
    The race in the code is because the handle might update
    its state to the wrong state if the connection handling
    thread is still processing incoming data; so the handle
    needs to wait for the connection to finish up before
    checking the final state.
    
    The race in the test is because when waiting for a handle
    to reach a final state, the waitFor() method needs to wait
    until all handle state is updated (which also includes
    waiting for the connection thread above to finish).
    Otherwise, waitFor() may return too early, which would cause
    a bunch of different races (like the listener not being yet
    notified of the state change, or being in the middle of
    being notified, or the handle not being properly disposed
    and causing postChecks() to assert).
    
    On top of that I found, by code inspection, a couple of
    potential races that could make a handle end up in the
    wrong state when being killed.
    
    The original version of this fix introduced the flipped
    version of the first race described above; the connection
    closing might override the handle state before the
    handle might have a chance to do cleanup. The fix there
    is to only dispose of the handle from the connection
    when there is an error, and let the handle dispose
    itself in the normal case.
    
    The fix also cause a bug in YarnClusterSuite to be surfaced;
    the code was checking for a file in the classpath that was
    not expected to be there in client mode. Because of the above
    issues, the error was not propagating correctly and the (buggy)
    test was incorrectly passing.
    
    Tested by running the existing unit tests a lot (and not
    seeing the errors I was seeing before).

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20297


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162694343
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java ---
    @@ -331,23 +358,27 @@ protected void handle(Message msg) throws IOException {
               timeout.cancel();
             }
             close();
    +        if (handle != null) {
    +          handle.dispose();
    +        }
           } finally {
             timeoutTimer.purge();
           }
         }
     
         @Override
         public void close() throws IOException {
    +      if (!isOpen()) {
    +        return;
    +      }
    +
           synchronized (clients) {
             clients.remove(this);
           }
    -      super.close();
    -      if (handle != null) {
    -        if (!handle.getState().isFinal()) {
    -          LOG.log(Level.WARNING, "Lost connection to spark application.");
    -          handle.setState(SparkAppHandle.State.LOST);
    -        }
    -        handle.disconnect();
    --- End diff --
    
    See https://github.com/apache/spark/pull/20297#pullrequestreview-89568079


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162524725
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherConnection.java ---
    @@ -95,15 +95,15 @@ protected synchronized void send(Message msg) throws IOException {
       }
     
       @Override
    -  public void close() throws IOException {
    +  public synchronized void close() throws IOException {
    --- End diff --
    
    do we still need to change this method?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162525142
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java ---
    @@ -331,23 +358,27 @@ protected void handle(Message msg) throws IOException {
               timeout.cancel();
             }
             close();
    +        if (handle != null) {
    +          handle.dispose();
    +        }
           } finally {
             timeoutTimer.purge();
           }
         }
     
         @Override
         public void close() throws IOException {
    +      if (!isOpen()) {
    +        return;
    +      }
    +
           synchronized (clients) {
             clients.remove(this);
           }
    -      super.close();
    -      if (handle != null) {
    -        if (!handle.getState().isFinal()) {
    -          LOG.log(Level.WARNING, "Lost connection to spark application.");
    -          handle.setState(SparkAppHandle.State.LOST);
    -        }
    -        handle.disconnect();
    --- End diff --
    
    why we don't disconnect now?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/45/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162524615
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
    @@ -48,14 +48,16 @@ public synchronized void disconnect() {
     
       @Override
       public synchronized void kill() {
    -    disconnect();
    -    if (childProc != null) {
    -      if (childProc.isAlive()) {
    -        childProc.destroyForcibly();
    +    if (!isDisposed()) {
    +      setState(State.KILLED);
    --- End diff --
    
    so we should set the state to `KILLED` once the `kill` method is called? Even the code below fails(throw exception), the state should still be `KILLED`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162694174
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java ---
    @@ -331,23 +358,27 @@ protected void handle(Message msg) throws IOException {
               timeout.cancel();
             }
             close();
    +        if (handle != null) {
    +          handle.dispose();
    +        }
           } finally {
             timeoutTimer.purge();
           }
         }
     
         @Override
         public void close() throws IOException {
    +      if (!isOpen()) {
    +        return;
    +      }
    +
           synchronized (clients) {
             clients.remove(this);
           }
    -      super.close();
    -      if (handle != null) {
    -        if (!handle.getState().isFinal()) {
    -          LOG.log(Level.WARNING, "Lost connection to spark application.");
    -          handle.setState(SparkAppHandle.State.LOST);
    -        }
    -        handle.disconnect();
    +
    +      synchronized (this) {
    +        super.close();
    +        notifyAll();
    --- End diff --
    
    See L239.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r163014278
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
    @@ -48,14 +48,16 @@ public synchronized void disconnect() {
     
       @Override
       public synchronized void kill() {
    -    disconnect();
    -    if (childProc != null) {
    -      if (childProc.isAlive()) {
    -        childProc.destroyForcibly();
    +    if (!isDisposed()) {
    +      setState(State.KILLED);
    --- End diff --
    
    Then the comment I made in the previous PR applies. Closing the socket / killing the child can have other implications (like changing the state) and it's easier to reason about what happens if the desired state change happens first.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86289/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #86289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86289/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #4067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4067/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86405/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86351/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    @sameeragarwal @cloud-fan @gengliangwang since you guys looked at the previous PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162693890
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherConnection.java ---
    @@ -95,15 +95,15 @@ protected synchronized void send(Message msg) throws IOException {
       }
     
       @Override
    -  public void close() throws IOException {
    +  public synchronized void close() throws IOException {
    --- End diff --
    
    We never *needed* to change it, but the extra code wasn't doing anything useful, so I chose the simpler version.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #4060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4060/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #4067 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4067/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162159354
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java ---
    @@ -331,23 +358,27 @@ protected void handle(Message msg) throws IOException {
               timeout.cancel();
             }
             close();
    +        if (handle != null) {
    --- End diff --
    
    This is the fix for the new race described in the summary (the code is moved here from below).
    
    This changes behavior slightly: the handle now waits for the child process / thread to finish before disposing itself, whereas before that would happen as soon as the connection between the child process / thread was closed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    thanks, merging to master/2.3!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #86395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86395/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #4062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4062/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162548031
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherConnection.java ---
    @@ -95,15 +95,15 @@ protected synchronized void send(Message msg) throws IOException {
       }
     
       @Override
    -  public void close() throws IOException {
    +  public synchronized void close() throws IOException {
         if (!closed) {
    --- End diff --
    
    => isOpen


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #4066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4066/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #86405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86405/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #4060 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4060/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r163119635
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
    @@ -48,14 +48,16 @@ public synchronized void disconnect() {
     
       @Override
       public synchronized void kill() {
    -    disconnect();
    -    if (childProc != null) {
    -      if (childProc.isAlive()) {
    -        childProc.destroyForcibly();
    +    if (!isDisposed()) {
    +      setState(State.KILLED);
    --- End diff --
    
    ok makes sense.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #4066 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4066/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #4062 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4062/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162851582
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
    @@ -48,14 +48,16 @@ public synchronized void disconnect() {
     
       @Override
       public synchronized void kill() {
    -    disconnect();
    -    if (childProc != null) {
    -      if (childProc.isAlive()) {
    -        childProc.destroyForcibly();
    +    if (!isDisposed()) {
    +      setState(State.KILLED);
    --- End diff --
    
    Even the order doesn't matter, I think it's more conventional to set the state at the end.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162550217
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
    @@ -48,14 +48,16 @@ public synchronized void disconnect() {
     
       @Override
       public synchronized void kill() {
    -    disconnect();
    -    if (childProc != null) {
    -      if (childProc.isAlive()) {
    -        childProc.destroyForcibly();
    +    if (!isDisposed()) {
    +      setState(State.KILLED);
    --- End diff --
    
    +1,I have the same question in last review. We should figure it out.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86395/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #4061 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4061/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #86405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86405/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #4068 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4068/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #86395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86395/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #4068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4068/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    I've kicked a bunch of test builds in parallel to further rule out any flakiness on jenkins. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #86351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86351/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #86289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86289/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162159436
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -381,7 +381,9 @@ private object YarnClusterDriver extends Logging with Matchers {
     
           // Verify that the config archive is correctly placed in the classpath of all containers.
           val confFile = "/" + Client.SPARK_CONF_FILE
    -      assert(getClass().getResource(confFile) != null)
    +      if (conf.getOption(SparkLauncher.DEPLOY_MODE) == Some("cluster")) {
    --- End diff --
    
    This is the bug in the YARN tests that the fix uncovered.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/52/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #4061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4061/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    I kicked an extra couple of builds aside from the one that should auto-trigger.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/8/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162693731
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java ---
    @@ -48,14 +48,16 @@ public synchronized void disconnect() {
     
       @Override
       public synchronized void kill() {
    -    disconnect();
    -    if (childProc != null) {
    -      if (childProc.isAlive()) {
    -        childProc.destroyForcibly();
    +    if (!isDisposed()) {
    +      setState(State.KILLED);
    --- End diff --
    
    None of the calls below should raise exceptions. Even the socket close is wrapped in a try..catch.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20297#discussion_r162525240
  
    --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java ---
    @@ -331,23 +358,27 @@ protected void handle(Message msg) throws IOException {
               timeout.cancel();
             }
             close();
    +        if (handle != null) {
    +          handle.dispose();
    +        }
           } finally {
             timeoutTimer.purge();
           }
         }
     
         @Override
         public void close() throws IOException {
    +      if (!isOpen()) {
    +        return;
    +      }
    +
           synchronized (clients) {
             clients.remove(this);
           }
    -      super.close();
    -      if (handle != null) {
    -        if (!handle.getState().isFinal()) {
    -          LOG.log(Level.WARNING, "Lost connection to spark application.");
    -          handle.setState(SparkAppHandle.State.LOST);
    -        }
    -        handle.disconnect();
    +
    +      synchronized (this) {
    +        super.close();
    +        notifyAll();
    --- End diff --
    
    why `notifyAll`? who might be waiting?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20297
  
    **[Test build #86351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86351/testReport)** for PR 20297 at commit [`8bde21a`](https://github.com/apache/spark/commit/8bde21a1cbdab3c49a85c1da960f4d9c7bf70064).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org