You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/09 21:07:17 UTC

[GitHub] [spark] holdenk opened a new pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

holdenk opened a new pull request #29992:
URL: https://github.com/apache/spark/pull/29992


   ### What changes were proposed in this pull request?
   
   Decommissioning can run out of time resulting in some race condition, these race conditions result in confusing error messages but not negative impact.
   
   ### Why are the changes needed?
   
   The NPE & element missing errors in the log can create a missunderstanding.
   
   ### Does this PR introduce _any_ user-facing change?
   Logs change.
   
   ### How was this patch tested?
   Existing tests pass.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-713227168


   I can give a try. Do you have any stacktrace or symptom for this NPE issue? Seems SPARK-32881 did not throw NPE.
   
   @dongjoon-hyun, should SPARK-32881 be resolved after this fix, or is it open mistakenly?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-713645293


   So I think we get the null if the mapStatus is deleted _during_ the update. The problem is checking if it's null doesn't  guarantee it won't be deleted.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706489349


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34215/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706416005


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34203/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706422540


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706486767


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34215/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-709777531


   gentle ping. This PR introduces the very first place that catches and suppresses `NullPointerException` in the codebase - the second place that catches `NullPointerException` (`mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala`) rethrows NPE.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-708110379


   How did we test this, @holdenk? I can make a quick followup for that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706403237


   **[Test build #129599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129599/testReport)** for PR 29992 at commit [`1ca46a4`](https://github.com/apache/spark/commit/1ca46a477020efa4e5bd11e4d55a82ef1e1b2254).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-717728435






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706445738


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706501529


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129612/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706482611






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-713332365


   Thanks, @HyukjinKwon . I resolved SPARK-32881 now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706489354






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706489354






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706445738






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706403237


   **[Test build #129599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129599/testReport)** for PR 29992 at commit [`1ca46a4`](https://github.com/apache/spark/commit/1ca46a477020efa4e5bd11e4d55a82ef1e1b2254).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706501525


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706501525






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #29992:
URL: https://github.com/apache/spark/pull/29992


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-713337745


   Actually, this PR fixes both `java.lang.NullPointerException` and `java.util.NoSuchElementException`.
   I reported NPE to her personally in addition to SPARK-32881.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29992:
URL: https://github.com/apache/spark/pull/29992#discussion_r502742469



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -125,14 +125,19 @@ private class ShuffleStatus(numPartitions: Int) extends Logging {
    * Update the map output location (e.g. during migration).
    */
   def updateMapOutput(mapId: Long, bmAddress: BlockManagerId): Unit = withWriteLock {
-    val mapStatusOpt = mapStatuses.find(_.mapId == mapId)
-    mapStatusOpt match {
-      case Some(mapStatus) =>
-        logInfo(s"Updating map output for ${mapId} to ${bmAddress}")
-        mapStatus.updateLocation(bmAddress)
-        invalidateSerializedMapOutputStatusCache()
-      case None =>
-        logError(s"Asked to update map output ${mapId} for untracked map status.")
+    try {
+      val mapStatusOpt = mapStatuses.find(_.mapId == mapId)
+      mapStatusOpt match {
+        case Some(mapStatus) =>
+          logInfo(s"Updating map output for ${mapId} to ${bmAddress}")
+          mapStatus.updateLocation(bmAddress)
+          invalidateSerializedMapOutputStatusCache()
+        case None =>
+          logWarning(s"Asked to update map output ${mapId} for untracked map status.")
+      }
+    } catch {
+      case e: java.lang.NullPointerException =>

Review comment:
       Shall we remove `java.lang.`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706482990


   **[Test build #129612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129612/testReport)** for PR 29992 at commit [`1ca46a4`](https://github.com/apache/spark/commit/1ca46a477020efa4e5bd11e4d55a82ef1e1b2254).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk edited a comment on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
holdenk edited a comment on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-713645293


   So I think we get the null if the mapStatus is deleted _during_ the update. The problem is checking if it's null doesn't  guarantee it won't be deleted after the check.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-713216544


   gentile ping. Would you mind asking to address my comments? I thought it wouldn't be difficult to address.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon removed a comment on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
HyukjinKwon removed a comment on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-717728435


   @holden, I am very sorry but can you guide me how you tested and/or share a traceback?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706422546


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34203/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-713345672


   That's good. I just wonder if we can just:
   - `blockManagerInfo.contains(...)` instead of catching `java.util.NoSuchElementException`
   -  Check if something is null, and just ignore instead of catching `java.lang.NullPointerException`
   
   I don't believe catching `java.lang.NullPointerException` in a block is a good practice because it will catch all NPE that can potentially happen everywhere.
   
   Would you mind if I ask the stacktrace? At least I want to understand why catching NPE is necessary even if I fail to make a fix.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29992:
URL: https://github.com/apache/spark/pull/29992#discussion_r504358081



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -125,14 +125,19 @@ private class ShuffleStatus(numPartitions: Int) extends Logging {
    * Update the map output location (e.g. during migration).
    */
   def updateMapOutput(mapId: Long, bmAddress: BlockManagerId): Unit = withWriteLock {
-    val mapStatusOpt = mapStatuses.find(_.mapId == mapId)
-    mapStatusOpt match {
-      case Some(mapStatus) =>
-        logInfo(s"Updating map output for ${mapId} to ${bmAddress}")
-        mapStatus.updateLocation(bmAddress)
-        invalidateSerializedMapOutputStatusCache()
-      case None =>
-        logError(s"Asked to update map output ${mapId} for untracked map status.")
+    try {
+      val mapStatusOpt = mapStatuses.find(_.mapId == mapId)
+      mapStatusOpt match {
+        case Some(mapStatus) =>
+          logInfo(s"Updating map output for ${mapId} to ${bmAddress}")
+          mapStatus.updateLocation(bmAddress)
+          invalidateSerializedMapOutputStatusCache()
+        case None =>
+          logWarning(s"Asked to update map output ${mapId} for untracked map status.")
+      }
+    } catch {
+      case e: java.lang.NullPointerException =>

Review comment:
       Quick question: can we avoid catching `NullPointerException`? It's a bit odd that we catch `NullPointerException`. We could just switch to if-else I guess.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706422546


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34203/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706422523


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34203/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706482990


   **[Test build #129612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129612/testReport)** for PR 29992 at commit [`1ca46a4`](https://github.com/apache/spark/commit/1ca46a477020efa4e5bd11e4d55a82ef1e1b2254).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706445351


   **[Test build #129599 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129599/testReport)** for PR 29992 at commit [`1ca46a4`](https://github.com/apache/spark/commit/1ca46a477020efa4e5bd11e4d55a82ef1e1b2254).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706501279


   **[Test build #129612 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129612/testReport)** for PR 29992 at commit [`1ca46a4`](https://github.com/apache/spark/commit/1ca46a477020efa4e5bd11e4d55a82ef1e1b2254).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29992:
URL: https://github.com/apache/spark/pull/29992#discussion_r502742624



##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##########
@@ -380,16 +380,22 @@ class BlockManagerMasterEndpoint(
    * @return Seq of ReplicateBlock
    */
   private def getReplicateInfoForRDDBlocks(blockManagerId: BlockManagerId): Seq[ReplicateBlock] = {
-    val info = blockManagerInfo(blockManagerId)
+    try {
+      val info = blockManagerInfo(blockManagerId)
 
-    val rddBlocks = info.blocks.keySet().asScala.filter(_.isRDD)
-    rddBlocks.map { blockId =>
-      val currentBlockLocations = blockLocations.get(blockId)
-      val maxReplicas = currentBlockLocations.size + 1
-      val remainingLocations = currentBlockLocations.toSeq.filter(bm => bm != blockManagerId)
-      val replicateMsg = ReplicateBlock(blockId, remainingLocations, maxReplicas)
-      replicateMsg
-    }.toSeq
+      val rddBlocks = info.blocks.keySet().asScala.filter(_.isRDD)
+      rddBlocks.map { blockId =>
+        val currentBlockLocations = blockLocations.get(blockId)
+        val maxReplicas = currentBlockLocations.size + 1
+        val remainingLocations = currentBlockLocations.toSeq.filter(bm => bm != blockManagerId)
+        val replicateMsg = ReplicateBlock(blockId, remainingLocations, maxReplicas)
+        replicateMsg
+      }.toSeq
+    } catch {
+      // If the block manager has already exited, nothing to replicate.
+      case e: java.util.NoSuchElementException =>

Review comment:
       Shall we import?

##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##########
@@ -380,16 +380,22 @@ class BlockManagerMasterEndpoint(
    * @return Seq of ReplicateBlock
    */
   private def getReplicateInfoForRDDBlocks(blockManagerId: BlockManagerId): Seq[ReplicateBlock] = {
-    val info = blockManagerInfo(blockManagerId)
+    try {
+      val info = blockManagerInfo(blockManagerId)
 
-    val rddBlocks = info.blocks.keySet().asScala.filter(_.isRDD)
-    rddBlocks.map { blockId =>
-      val currentBlockLocations = blockLocations.get(blockId)
-      val maxReplicas = currentBlockLocations.size + 1
-      val remainingLocations = currentBlockLocations.toSeq.filter(bm => bm != blockManagerId)
-      val replicateMsg = ReplicateBlock(blockId, remainingLocations, maxReplicas)
-      replicateMsg
-    }.toSeq
+      val rddBlocks = info.blocks.keySet().asScala.filter(_.isRDD)
+      rddBlocks.map { blockId =>
+        val currentBlockLocations = blockLocations.get(blockId)
+        val maxReplicas = currentBlockLocations.size + 1
+        val remainingLocations = currentBlockLocations.toSeq.filter(bm => bm != blockManagerId)
+        val replicateMsg = ReplicateBlock(blockId, remainingLocations, maxReplicas)
+        replicateMsg
+      }.toSeq
+    } catch {
+      // If the block manager has already exited, nothing to replicate.
+      case e: java.util.NoSuchElementException =>

Review comment:
       Also, it would be great if we have a warning log here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-706445741


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129599/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #29992: [SPARK-32881][CORE] Catch some race condition errors and log them more clearly

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #29992:
URL: https://github.com/apache/spark/pull/29992#issuecomment-713224196


   Hey sorry I didn’t see your comments. To avoid this I don’t think we could do if else, we’d have to introduce locking I think. Since this only happens under a race condition where we don’t care about the data I think it’s ok. If you’d prefer we could also take that part out since the NPE will get eaten later. Happy to review a PR with whatever approach you want or work on a follow up if you want.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org