Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/24 11:03:04 UTC

[GitHub] [spark] lxian opened a new pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

lxian opened a new pull request #34098:
URL: https://github.com/apache/spark/pull/34098


   
   ### What changes were proposed in this pull request?
   Catch exceptions thrown during `TaskSchedulerImpl.stop()` so that all components can be stopped properly.
   
   ### Why are the changes needed?
   Otherwise, some threads are not stopped when the Spark session is restarted.
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   ### How was this patch tested?
   It was tested as follows:
   1. Create a new Spark session in yarn-client mode.
   2. Kill the Spark application on YARN.
   3. Check that the Spark context is stopped, then create a new Spark session.
   4. Repeat the steps above several times and verify that the number of task-result-getter threads does not increase.
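
The thread-count check in the last step could be automated along these lines. This is a minimal sketch, not part of the PR: the object name `ThreadLeakCheck` and the method `liveThreadsWithPrefix` are hypothetical, and the prefix to pass in (for example `task-result-getter`) is taken from the description above.

```scala
object ThreadLeakCheck {
  // Count live JVM threads whose name starts with the given prefix.
  // Thread.getAllStackTraces only reports threads that are currently alive.
  def liveThreadsWithPrefix(prefix: String): Int = {
    val threads = Thread.getAllStackTraces.keySet.iterator()
    var count = 0
    while (threads.hasNext) {
      if (threads.next().getName.startsWith(prefix)) count += 1
    }
    count
  }
}
```

Calling `ThreadLeakCheck.liveThreadsWithPrefix("task-result-getter")` before and after each restart and comparing the two numbers would show whether threads from a previous session are still around.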
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] lxian commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
lxian commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-930080438


   > > I've checked SparkEnv.stop and I didn't find obvious unchecked exceptions that would break the stop(). I think it may not be necessary to do the same try..catch to SparkEnv.stop
   > 
   > We have the following which can throw exceptions:
   > 
   > * `mapOutputTracker.stop()` can throw SparkException in case of timeout.
   > * `blockManager.stop()` can throw `InterruptedException`.
   > * `metricsSystem.stop()` could throw exception - depends on the sink.
   > * `rpcEnv.shutdown()` could throw `InterruptedException` (and others ?).
   >   
   >   * `rpcEnv.awaitTermination` could throw `InterruptedException`.
   > 
   > Note that `sparkEnv.stop` itself is protected - and so would not cause sc stop to be blocked. Pls check if any of the above would prevent cleanup - and so reinit of a new sc to fail/be problematic.
   
   1. `mapOutputTracker.stop()`. I think the `SparkException` is already handled by a try..catch. I copied the code snippet below:
   ```
     override def stop(): Unit = {
       mapOutputTrackerMasterMessages.offer(PoisonPill)
       threadpool.shutdown()
       try {
         sendTracker(StopMapOutputTracker)
       } catch {
         case e: SparkException =>
           logError("Could not tell tracker we are stopping.", e)
       }
       trackerEndpoint = null
       shuffleStatuses.clear()
     }
   ``` 
   2. `metricsSystem.stop()`. I've checked the current implementations of `Sink`, and I didn't find one that would throw an exception.
   3. As for `InterruptedException`, I don't think it belongs to the "NonFatal" type of Throwables. Maybe the best way to handle it is to just let it propagate?
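
The point about `InterruptedException` can be checked directly against `scala.util.control.NonFatal`, which is the standard-library extractor for this kind of handler. The sketch below is only an illustration; the object and method names are hypothetical.

```scala
import scala.util.control.NonFatal

object NonFatalCheck {
  // True when a `case NonFatal(e)` handler would match the throwable.
  def caughtByNonFatal(t: Throwable): Boolean = t match {
    case NonFatal(_) => true
    case _           => false
  }
}
```

An ordinary `RuntimeException` is matched, while `InterruptedException` (like `VirtualMachineError` subclasses such as `OutOfMemoryError`) is not, so a handler built on `NonFatal` lets it propagate.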
   




[GitHub] [spark] mridulm commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-955647214


   Thanks for digging more @lxian !
   To add to the answers to my queries.
   
   * Re: `mapOutputTracker.stop()` can throw `SparkException` in case of timeout
     * As @lxian pointed out, this can't happen now after Holden's changes (I think I might have been looking at a different branch, sorry for the confusion).
   * Re: `metricsSystem.stop()` could throw exception - depends on the sink.
     * As @lxian detailed, the current Spark Sinks should not cause this to happen. Having said that:
     * Spark supports plugging in custom Sinks, so looking only at what exists in our codebase is unfortunately insufficient.
       * An exception here prevents everything else in `SparkEnv.stop` from running.
     * To be defensive, handling this would be better - thoughts?
   
   Both of these below are related to `InterruptedException`:
   * `blockManager.stop()` can throw `InterruptedException`
   * `rpcEnv.awaitTermination` could throw `InterruptedException`
   
   I agree with @lxian, that is not caught by `Utils.tryLogNonFatalError` anyway - so let us preserve existing behavior for that.
   
   Given the above, can we address the potential issue with Sink.close ?
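
One way to picture the defensive handling discussed above is to isolate each sink's shutdown so that a misbehaving custom sink cannot block the rest. This is a sketch under stated assumptions: `DemoSink` and `SinkShutdown` are hypothetical stand-ins, not Spark's actual `Sink` trait or `MetricsSystem` code.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical stand-in for a metrics sink; not Spark's actual Sink trait.
trait DemoSink {
  def name: String
  def stop(): Unit
}

object SinkShutdown {
  // Stop every sink in order. A non-fatal failure in one sink is logged
  // and skipped; the names of the sinks that stopped cleanly are returned.
  def stopAll(sinks: Seq[DemoSink]): Seq[String] = {
    val stopped = ArrayBuffer.empty[String]
    sinks.foreach { sink =>
      try {
        sink.stop()
        stopped += sink.name
      } catch {
        case NonFatal(e) =>
          System.err.println(s"Ignoring failure while stopping ${sink.name}: $e")
      }
    }
    stopped.toSeq
  }
}
```

Because the handler uses `NonFatal`, an `InterruptedException` would still propagate, which matches the existing behavior the thread agrees to preserve.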
   
   




[GitHub] [spark] mridulm edited a comment on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
mridulm edited a comment on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-928579930


   > I've checked SparkEnv.stop and I didn't find obvious unchecked exceptions that would break the stop(). I think it may not be necessary to do the same try..catch to SparkEnv.stop
   
   We have the following which can throw exceptions:
   * `mapOutputTracker.stop()` can throw SparkException in case of timeout.
   * `blockManager.stop()` can throw `InterruptedException`.
   * `metricsSystem.stop()` could throw exception - depends on the sink.
   * `rpcEnv.shutdown()` could throw `InterruptedException` (and others ?).
     * `rpcEnv.awaitTermination` could throw `InterruptedException`.
   
   Note that `sparkEnv.stop` itself is protected, so it would not cause sc stop to be blocked.
   Please check whether any of the above would prevent cleanup and so cause reinit of a new sc to fail or be problematic.






[GitHub] [spark] SparkQA commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927421647


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48149/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927452514


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48149/
   




[GitHub] [spark] HyukjinKwon commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927224007


   cc @mridulm, @Ngone51 and @tgravescs FYI




[GitHub] [spark] mridulm commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927237439


   Do you want to do the same within `SparkEnv.stop` as well ?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927492269


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143637/
   




[GitHub] [spark] HyukjinKwon commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927403243


   ok to test






[GitHub] [spark] srowen commented on a change in pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #34098:
URL: https://github.com/apache/spark/pull/34098#discussion_r739814914



##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -928,13 +928,17 @@ private[spark] class TaskSchedulerImpl(
   override def stop(): Unit = {
     speculationScheduler.shutdown()
     if (backend != null) {
-      backend.stop()
+      Utils.tryLogNonFatalError {
+        backend.stop()

Review comment:
       How about wrapping the others too?
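
Wrapping every step, as suggested here, might look roughly like the sketch below. It is an illustration only: `tryLogNonFatal` is a simplified stand-in for Spark's `Utils.tryLogNonFatalError` (it prints instead of using Spark's logger), and the step names and their order are assumptions, not the actual body of `TaskSchedulerImpl.stop()`.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

object StopSequence {
  // Simplified stand-in for Utils.tryLogNonFatalError: run the block and
  // log, rather than rethrow, any non-fatal exception.
  def tryLogNonFatal(block: => Unit): Unit =
    try block
    catch {
      case NonFatal(e) => System.err.println(s"Uncaught exception during stop: $e")
    }

  // Run each named stop step in order, each inside its own wrapper, and
  // return the names of the steps that completed without throwing.
  def stopAll(steps: Seq[(String, () => Unit)]): Seq[String] = {
    val done = ArrayBuffer.empty[String]
    steps.foreach { case (name, step) =>
      tryLogNonFatal {
        step()
        done += name
      }
    }
    done.toSeq
  }
}
```

With each step wrapped separately, a failure in the first step no longer prevents the later ones from running, which is the behavior the PR is after.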






[GitHub] [spark] mridulm edited a comment on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
mridulm edited a comment on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927237439


   The change looks good to me.
   Do you want to do the same within `SparkEnv.stop` as well ?




[GitHub] [spark] AmplabJenkins commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-926542548


   Can one of the admins verify this patch?




[GitHub] [spark] SparkQA commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927443807


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48149/
   




[GitHub] [spark] mridulm commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-928579930


   > I've checked SparkEnv.stop and I didn't find obvious unchecked exceptions that would break the stop(). I think it may not be necessary to do the same try..catch to SparkEnv.stop
   
   We have the following which can throw exceptions:
   * `mapOutputTracker.stop()` can throw SparkException in case of timeout.
   * `blockManager.stop()` can throw `InterruptedException`.
   * `metricsSystem.stop()` could throw exception - depends on the sink.
   * `rpcEnv.shutdown()` could throw `InterruptedException` (and others ?).
     * `rpcEnv.awaitTermination` could throw `InterruptedException`.
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-926542548


   Can one of the admins verify this patch?




[GitHub] [spark] SparkQA removed a comment on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927403582


   **[Test build #143637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143637/testReport)** for PR 34098 at commit [`b7580d1`](https://github.com/apache/spark/commit/b7580d17bc37c84e06ba8cd15f0677cf731330df).




[GitHub] [spark] lxian commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
lxian commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927786322


   > The change looks good to me.
   > Do you want to do the same within `SparkEnv.stop` as well ?
   
   I've checked `SparkEnv.stop` and I didn't find obvious unchecked exceptions that would break the stop(). I think it may not be necessary to do the same try..catch to `SparkEnv.stop`




[GitHub] [spark] mridulm edited a comment on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
mridulm edited a comment on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-955647214


   Thanks for digging more @lxian !
   Apologies for the delay in getting back on this; and to add to the answers to my queries.
   
   * Re: `mapOutputTracker.stop()` can throw `SparkException` in case of timeout
     * As @lxian pointed out, this can't happen now after Holden's changes (I think I might have been looking at a different branch, sorry for the confusion).
   * Re: `metricsSystem.stop()` could throw exception - depends on the sink.
     * As @lxian detailed, the current Spark Sinks should not cause this to happen. Having said that:
     * Spark supports plugging in custom Sinks, so looking only at what exists in our codebase is unfortunately insufficient.
       * An exception here prevents everything else in `SparkEnv.stop` from running.
     * To be defensive, handling this would be better - thoughts?
   
   Both of these below are related to `InterruptedException`:
   * `blockManager.stop()` can throw `InterruptedException`
   * `rpcEnv.awaitTermination` could throw `InterruptedException`
   
   I agree with @lxian, that is not caught by `Utils.tryLogNonFatalError` anyway - so let us preserve existing behavior for that.
   
   Given the above, can we address the potential issue with Sink.close ?
   
   




[GitHub] [spark] AmplabJenkins commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927492269


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143637/
   




[GitHub] [spark] mridulm commented on a change in pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34098:
URL: https://github.com/apache/spark/pull/34098#discussion_r717152525



##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -928,13 +928,17 @@ private[spark] class TaskSchedulerImpl(
   override def stop(): Unit = {
     speculationScheduler.shutdown()
     if (backend != null) {
-      backend.stop()
+      Utils.tryLogNonFatalError {
+        backend.stop()

Review comment:
       Checking more, what is the exception thrown in `barrierCoordinator.stop`?
   That should be defensive, and should not have resulted in failures.






[GitHub] [spark] Ngone51 commented on a change in pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #34098:
URL: https://github.com/apache/spark/pull/34098#discussion_r716307491



##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -928,13 +928,17 @@ private[spark] class TaskSchedulerImpl(
   override def stop(): Unit = {
     speculationScheduler.shutdown()
     if (backend != null) {
-      backend.stop()
+      Utils.tryLogNonFatalError {
+        backend.stop()

Review comment:
       What's the exception you encountered here?






[GitHub] [spark] lxian commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
lxian commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927291772


   > The change looks good to me.
   > Do you want to do the same within `SparkEnv.stop` as well ?
   
   Sure, let me do that as well.




[GitHub] [spark] AmplabJenkins commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927452514


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48149/
   




[GitHub] [spark] tgravescs commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
tgravescs commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927853746


   The change looks fine to me; it would be nice to have the stack trace of the exception thrown.




[GitHub] [spark] SparkQA commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927483398


   **[Test build #143637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143637/testReport)** for PR 34098 at commit [`b7580d1`](https://github.com/apache/spark/commit/b7580d17bc37c84e06ba8cd15f0677cf731330df).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] github-actions[bot] commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-1033189624


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] github-actions[bot] closed pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #34098:
URL: https://github.com/apache/spark/pull/34098


   




[GitHub] [spark] SparkQA commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-927403582


   **[Test build #143637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143637/testReport)** for PR 34098 at commit [`b7580d1`](https://github.com/apache/spark/commit/b7580d17bc37c84e06ba8cd15f0677cf731330df).




[GitHub] [spark] mridulm commented on a change in pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34098:
URL: https://github.com/apache/spark/pull/34098#discussion_r717152525



##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -928,13 +928,17 @@ private[spark] class TaskSchedulerImpl(
   override def stop(): Unit = {
     speculationScheduler.shutdown()
     if (backend != null) {
-      backend.stop()
+      Utils.tryLogNonFatalError {
+        backend.stop()

Review comment:
       Looking into this further: what exception is thrown in `barrierCoordinator.stop`?
   That should be defensive, and should not have resulted in failures.






[GitHub] [spark] lxian commented on a change in pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
lxian commented on a change in pull request #34098:
URL: https://github.com/apache/spark/pull/34098#discussion_r716570505



##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -928,13 +928,17 @@ private[spark] class TaskSchedulerImpl(
   override def stop(): Unit = {
     speculationScheduler.shutdown()
     if (backend != null) {
-      backend.stop()
+      Utils.tryLogNonFatalError {
+        backend.stop()

Review comment:
       When deployed on YARN, `org.apache.spark.scheduler.cluster.YarnSchedulerBackend#stop` calls `requestTotalExecutors()` as part of stopping. If the YARN application has already been killed, sending that RPC fails with an IOException.
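
A minimal sketch of the failure mode described in this comment, assuming nothing beyond the Scala standard library. `GuardedStopSketch`, `backendStop`, `taskResultGetterStop` and the local `tryLogNonFatalError` are hypothetical stand-ins for illustration only (the last one mimics the shape of Spark's `Utils.tryLogNonFatalError`); none of this is the actual `TaskSchedulerImpl` code.

```scala
import java.io.IOException
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

object GuardedStopSketch {
  // Hypothetical stand-in for Spark's Utils.tryLogNonFatalError:
  // run the block, log any non-fatal exception, and carry on.
  def tryLogNonFatalError(block: => Unit): Unit = {
    try {
      block
    } catch {
      case NonFatal(e) =>
        System.err.println(s"Uncaught exception while stopping: $e")
    }
  }

  // Records which components were actually stopped, in order.
  val stopped = ArrayBuffer.empty[String]

  // Hypothetical backend whose stop() fails because the RPC cannot be
  // sent once the YARN application is already gone.
  def backendStop(): Unit = throw new IOException("Failed to send RPC")

  def taskResultGetterStop(): Unit = stopped += "taskResultGetter"

  // Without a guard, the exception from the backend skips everything after it.
  def stopUnguarded(): Unit = {
    backendStop()
    taskResultGetterStop()
  }

  // With each step guarded, later components are still stopped.
  def stopGuarded(): Unit = {
    tryLogNonFatalError { backendStop() }
    tryLogNonFatalError { taskResultGetterStop() }
  }
}
```

In this sketch `stopUnguarded()` propagates the IOException and never reaches the result getter, which would be consistent with the leftover task-result-getter threads reported in the PR description, while `stopGuarded()` logs the failure and continues.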






[GitHub] [spark] mridulm commented on pull request #34098: [SPARK-36842][Core] TaskSchedulerImpl - stop TaskResultGetter properly

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34098:
URL: https://github.com/apache/spark/pull/34098#issuecomment-928579930


   > I've checked SparkEnv.stop and I didn't find obvious unchecked exceptions that would break the stop(). I think it may not be necessary to do the same try..catch to SparkEnv.stop
   
   We have the following which can throw exceptions:
   * `mapOutputTracker.stop()` can throw SparkException in case of timeout.
   * `blockManager.stop()` can throw `InterruptedException`.
   * `metricsSystem.stop()` could throw exception - depends on the sink.
   * `rpcEnv.shutdown()` could throw `InterruptedException` (and others ?).
     * `rpcEnv.awaitTermination` could throw `InterruptedException`.
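
One way to guard a sequence of stop calls like the ones listed above might look like the sketch below. `StopAllSketch` and `stopAll` are hypothetical names, not part of `SparkEnv`. One detail worth noting: `scala.util.control.NonFatal` does not match `InterruptedException`, so a guard built only on `NonFatal` would let the interrupt cases in the list propagate; the sketch handles them separately and restores the interrupt flag once all components have been attempted.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

object StopAllSketch {
  // Hypothetical helper: run every stop action in order so that one failure
  // never skips the rest. Returns the names of the actions that failed.
  def stopAll(actions: Seq[(String, () => Unit)]): Seq[String] = {
    val failed = ArrayBuffer.empty[String]
    var interrupted = false
    actions.foreach { case (name, stop) =>
      try {
        stop()
      } catch {
        // NonFatal does not match InterruptedException, so catch it explicitly.
        case _: InterruptedException =>
          failed += name
          interrupted = true
        case NonFatal(e) =>
          System.err.println(s"Exception while stopping $name: $e")
          failed += name
      }
    }
    // Restore the interrupt flag only after every action has been attempted.
    if (interrupted) Thread.currentThread().interrupt()
    failed.toList
  }
}
```

Deferring the re-interrupt until the end is a design choice in this sketch: setting the flag immediately could cause later blocking stop calls to fail straight away.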
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org