Posted to reviews@spark.apache.org by jose-torres <gi...@git.apache.org> on 2018/02/13 19:52:58 UTC

[GitHub] spark pull request #20602: [SPARK-23416][SS] handle streaming interrupts in ...

GitHub user jose-torres opened a pull request:

    https://github.com/apache/spark/pull/20602

    [SPARK-23416][SS] handle streaming interrupts in ThreadUtils.awaitResult()

    ## What changes were proposed in this pull request?
    
    StreamExecution.isInterruptedByStop() implements a whitelist of exceptions that indicate a benign stop() call. The DataSourceV2 write path introduces a new kind of exception, a SparkException caused by an InterruptedException, which must be added to this whitelist. (This exception comes from an interrupt in ThreadUtils.awaitResult().)
    
    ## How was this patch tested?
    
    Existing unit tests. Unfortunately, the underlying flakiness is reasonably rare, so I don't have a good idea for how to test that this resolves it.
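
For readers unfamiliar with the whitelist described in the pull request above, the following is a minimal sketch of the idea, not the actual `StreamExecution` code. It assumes the check recurses into the cause of a whitelisted wrapper exception; `StandInSparkException` and `StopWhitelistSketch` are hypothetical names introduced so the example compiles without Spark on the classpath.

```scala
import java.io.{InterruptedIOException, UncheckedIOException}
import java.util.concurrent.ExecutionException

// Hypothetical stand-in for org.apache.spark.SparkException.
class StandInSparkException(message: String, cause: Throwable)
  extends Exception(message, cause)

object StopWhitelistSketch {
  // Returns true if `e` looks like the result of a benign stop(): either an
  // interrupt itself, or a whitelisted wrapper whose cause chain ends in one.
  def isInterruptedByStop(e: Throwable): Boolean = e match {
    case _: InterruptedException | _: InterruptedIOException =>
      true
    // Assumed shape of the proposed change: the wrapper type is added to the
    // alternation, and the decision is delegated to its cause.
    case w @ (_: UncheckedIOException |
              _: ExecutionException |
              _: StandInSparkException) if w.getCause != null =>
      isInterruptedByStop(w.getCause)
    case _ =>
      false
  }
}
```

Under this assumption, a wrapper whose cause is not an interrupt is still reported as a real failure, because only the cause decides the outcome.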


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jose-torres/spark SPARK-23416

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20602.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20602
    
----
commit 24a621e1c30ca02436a6725c5646d216bf2d7118
Author: Jose Torres <jo...@...>
Date:   2018-02-13T19:50:01Z

    add accommodation for ThreadUtils.awaitResult()

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    **[Test build #87421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87421/testReport)** for PR 20602 at commit [`24a621e`](https://github.com/apache/spark/commit/24a621e1c30ca02436a6725c5646d216bf2d7118).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    /cc @dongjoon-hyun @mgaido91 @tdas @zsxwing 


---



[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    **[Test build #87421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87421/testReport)** for PR 20602 at commit [`24a621e`](https://github.com/apache/spark/commit/24a621e1c30ca02436a6725c5646d216bf2d7118).


---



[GitHub] spark pull request #20602: [SPARK-23416][SS] handle streaming interrupts in ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20602#discussion_r167993164
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -369,7 +370,11 @@ abstract class StreamExecution(
             //                      exception
             // UncheckedExecutionException - thrown by codes that cannot throw a checked
             //                               ExecutionException, such as BiFunction.apply
    -        case e2 @ (_: UncheckedIOException | _: ExecutionException | _: UncheckedExecutionException)
    +        // SparkException - thrown if the interrupt happens in the middle of an RPC wait
    +        case e2 @ (_: UncheckedIOException |
    +                   _: ExecutionException |
    +                   _: UncheckedExecutionException |
    +                   _: SparkException)
    --- End diff --
    
    I am especially unsure about this one. `SparkException` can be thrown in a great number of cases that aren't necessarily caused by the stop operation, so I don't think it is a good idea to add it.


---



[GitHub] spark pull request #20602: [SPARK-23416][SS] handle streaming interrupts in ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20602#discussion_r167995867
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -369,7 +370,11 @@ abstract class StreamExecution(
             //                      exception
             // UncheckedExecutionException - thrown by codes that cannot throw a checked
             //                               ExecutionException, such as BiFunction.apply
    -        case e2 @ (_: UncheckedIOException | _: ExecutionException | _: UncheckedExecutionException)
    +        // SparkException - thrown if the interrupt happens in the middle of an RPC wait
    +        case e2 @ (_: UncheckedIOException |
    +                   _: ExecutionException |
    +                   _: UncheckedExecutionException |
    +                   _: SparkException)
    --- End diff --
    
    Not really. But as you said, relying on a list of expected exceptions is a bit fragile. Can't we rethink this and make a longer-term change, rather than a quick fix that seems a bit unsafe to me? Currently it is only causing test flakiness, so I don't think there is a great need to fix it immediately.


---



[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    **[Test build #87423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87423/testReport)** for PR 20602 at commit [`f620083`](https://github.com/apache/spark/commit/f6200837f30570edc2dc3145e8651d1f238c8e9b).


---



[GitHub] spark pull request #20602: [SPARK-23416][SS] handle streaming interrupts in ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20602#discussion_r168060518
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -369,7 +370,11 @@ abstract class StreamExecution(
             //                      exception
             // UncheckedExecutionException - thrown by codes that cannot throw a checked
             //                               ExecutionException, such as BiFunction.apply
    -        case e2 @ (_: UncheckedIOException | _: ExecutionException | _: UncheckedExecutionException)
    +        // SparkException - thrown if the interrupt happens in the middle of an RPC wait
    --- End diff --
    
    Does this mean the issue has nothing to do with `WriteToDataSourceV2Exec`?


---



[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87423/
    Test PASSed.


---



[GitHub] spark pull request #20602: [SPARK-23416][SS] handle streaming interrupts in ...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20602#discussion_r167998592
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -369,7 +370,11 @@ abstract class StreamExecution(
             //                      exception
             // UncheckedExecutionException - thrown by codes that cannot throw a checked
             //                               ExecutionException, such as BiFunction.apply
    -        case e2 @ (_: UncheckedIOException | _: ExecutionException | _: UncheckedExecutionException)
    +        // SparkException - thrown if the interrupt happens in the middle of an RPC wait
    +        case e2 @ (_: UncheckedIOException |
    +                   _: ExecutionException |
    +                   _: UncheckedExecutionException |
    +                   _: SparkException)
    --- End diff --
    
    SGTM. I'll close this then.


---



[GitHub] spark pull request #20602: [SPARK-23416][SS] handle streaming interrupts in ...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres closed the pull request at:

    https://github.com/apache/spark/pull/20602


---



[GitHub] spark pull request #20602: [SPARK-23416][SS] handle streaming interrupts in ...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20602#discussion_r167994087
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -369,7 +370,11 @@ abstract class StreamExecution(
             //                      exception
             // UncheckedExecutionException - thrown by codes that cannot throw a checked
             //                               ExecutionException, such as BiFunction.apply
    -        case e2 @ (_: UncheckedIOException | _: ExecutionException | _: UncheckedExecutionException)
    +        // SparkException - thrown if the interrupt happens in the middle of an RPC wait
    +        case e2 @ (_: UncheckedIOException |
    +                   _: ExecutionException |
    +                   _: UncheckedExecutionException |
    +                   _: SparkException)
    --- End diff --
    
    I agree with the worry, but I'm not sure I see a better solution.
    
    Other alternatives I can think of are matching against the specific exception message string, or changing ThreadUtils.awaitResult() to throw a custom exception. Do you have any thoughts?
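
The second alternative mentioned in this comment, a custom exception, can be sketched as follows. This is an illustration under stated assumptions rather than a proposal for the real `ThreadUtils.awaitResult()`: `AwaitInterruptedException`, `CustomExceptionSketch`, `wrapAwait`, and `isBenignStop` are all hypothetical names, and the blocking call is abstracted as a by-name parameter.

```scala
// Hypothetical dedicated exception type: thrown only when a wait is
// interrupted, so callers can match on it without catching unrelated errors.
class AwaitInterruptedException(cause: InterruptedException)
  extends Exception("Interrupted while awaiting a result", cause)

object CustomExceptionSketch {
  // Runs a blocking computation and translates an interrupt into the
  // dedicated exception type. All other exceptions propagate unchanged.
  def wrapAwait[T](body: => T): T =
    try body
    catch {
      case ie: InterruptedException => throw new AwaitInterruptedException(ie)
    }

  // A caller-side check that matches only the dedicated type, instead of a
  // broad exception class that is thrown in many unrelated situations.
  def isBenignStop(e: Throwable): Boolean = e match {
    case _: InterruptedException | _: AwaitInterruptedException => true
    case _ => false
  }
}
```

The trade-off is that the matching stays precise, at the cost of adding a new exception type that every caller of the wait helper has to know about.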


---



[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    @jose-torres it wraps all `Throwable`s with SparkException. I would prefer to throw fatal errors directly.
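
One way to read the preference stated in this comment is to wrap only non-fatal exceptions and let everything else propagate untouched. The sketch below uses `scala.util.control.NonFatal` for that split; whether this matches the commenter's definition of "fatal" is an assumption, and `WrappedJobException`, `FatalPassThroughSketch`, and `runWrapped` are hypothetical names, not Spark APIs.

```scala
import scala.util.control.NonFatal

// Hypothetical wrapper exception standing in for a SparkException-style wrapper.
class WrappedJobException(message: String, cause: Throwable)
  extends Exception(message, cause)

object FatalPassThroughSketch {
  // Wraps non-fatal failures in WrappedJobException. NonFatal does not match
  // InterruptedException, VirtualMachineError, LinkageError, or
  // ControlThrowable, so those propagate to the caller unwrapped.
  def runWrapped[T](body: => T): T =
    try body
    catch {
      case NonFatal(t) => throw new WrappedJobException("Job aborted.", t)
    }
}
```

With this split, an interrupt raised inside `body` reaches the caller as a plain `InterruptedException`, so a check for interrupts does not need to know about the wrapper type.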


---



[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    **[Test build #87423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87423/testReport)** for PR 20602 at commit [`f620083`](https://github.com/apache/spark/commit/f6200837f30570edc2dc3145e8651d1f238c8e9b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by jose-torres <gi...@git.apache.org>.
Github user jose-torres commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    Can you clarify? I don't think that WriteToDataSourceV2Exec is doing anything wrong here.


---



[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87421/
    Test FAILed.


---



[GitHub] spark issue #20602: [SPARK-23416][SS] handle streaming interrupts in ThreadU...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/20602
  
    Can we just fix WriteToDataSourceV2Exec instead?


---
