You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "warrenzhu25 (via GitHub)" <gi...@apache.org> on 2023/05/07 18:14:56 UTC

[GitHub] [spark] warrenzhu25 opened a new pull request, #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

warrenzhu25 opened a new pull request, #41082:
URL: https://github.com/apache/spark/pull/41082

   ### What changes were proposed in this pull request?
   Executor timeout should be max of idle, shuffle and rdd timeout
   
   
   ### Why are the changes needed?
   Wrong timeout value when combining idle, shuffle and rdd timeout
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added test in `ExecutorMonitorSuite`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] warrenzhu25 commented on pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

Posted by "warrenzhu25 (via GitHub)" <gi...@apache.org>.
warrenzhu25 commented on PR #41082:
URL: https://github.com/apache/spark/pull/41082#issuecomment-1537557668

   > The removed code seems to be originated from Apache Spark 3.0.0. Could you give a correct `Affected Versions` to [SPARK-43398](https://issues.apache.org/jira/browse/SPARK-43398) if you think this is a bug, @warrenzhu25 ?
   > 
   > ![Screenshot 2023-05-07 at 1 22 51 PM](https://user-images.githubusercontent.com/9700541/236700961-c5886940-1c55-40ff-9555-6980447f751d.png)
   
   Done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] warrenzhu25 commented on a diff in pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

Posted by "warrenzhu25 (via GitHub)" <gi...@apache.org>.
warrenzhu25 commented on code in PR #41082:
URL: https://github.com/apache/spark/pull/41082#discussion_r1192807969


##########
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala:
##########
@@ -563,17 +563,10 @@ private[spark] class ExecutorMonitor(
     def updateTimeout(): Unit = {
       val oldDeadline = timeoutAt
       val newDeadline = if (idleStart >= 0) {
-        val timeout = if (cachedBlocks.nonEmpty || (shuffleIds != null && shuffleIds.nonEmpty)) {
-          val _cacheTimeout = if (cachedBlocks.nonEmpty) storageTimeoutNs else Long.MaxValue
-          val _shuffleTimeout = if (shuffleIds != null && shuffleIds.nonEmpty) {
-            shuffleTimeoutNs
-          } else {
-            Long.MaxValue
-          }
-          math.min(_cacheTimeout, _shuffleTimeout)
-        } else {
-          idleTimeoutNs
-        }
+        val _cacheTimeout = if (cachedBlocks.nonEmpty) storageTimeoutNs else 0
+        val _shuffleTimeout = if (shuffleIds.nonEmpty) shuffleTimeoutNs else 0

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] warrenzhu25 commented on a diff in pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

Posted by "warrenzhu25 (via GitHub)" <gi...@apache.org>.
warrenzhu25 commented on code in PR #41082:
URL: https://github.com/apache/spark/pull/41082#discussion_r1192512299


##########
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala:
##########
@@ -563,18 +563,7 @@ private[spark] class ExecutorMonitor(
     def updateTimeout(): Unit = {
       val oldDeadline = timeoutAt
       val newDeadline = if (idleStart >= 0) {
-        val timeout = if (cachedBlocks.nonEmpty || (shuffleIds != null && shuffleIds.nonEmpty)) {
-          val _cacheTimeout = if (cachedBlocks.nonEmpty) storageTimeoutNs else Long.MaxValue
-          val _shuffleTimeout = if (shuffleIds != null && shuffleIds.nonEmpty) {
-            shuffleTimeoutNs
-          } else {
-            Long.MaxValue
-          }
-          math.min(_cacheTimeout, _shuffleTimeout)
-        } else {
-          idleTimeoutNs
-        }
-        val deadline = idleStart + timeout
+        val deadline = idleStart + getTimeout()

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #41082:
URL: https://github.com/apache/spark/pull/41082#discussion_r1192758370


##########
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala:
##########
@@ -563,17 +563,10 @@ private[spark] class ExecutorMonitor(
     def updateTimeout(): Unit = {
       val oldDeadline = timeoutAt
       val newDeadline = if (idleStart >= 0) {
-        val timeout = if (cachedBlocks.nonEmpty || (shuffleIds != null && shuffleIds.nonEmpty)) {
-          val _cacheTimeout = if (cachedBlocks.nonEmpty) storageTimeoutNs else Long.MaxValue
-          val _shuffleTimeout = if (shuffleIds != null && shuffleIds.nonEmpty) {
-            shuffleTimeoutNs
-          } else {
-            Long.MaxValue
-          }
-          math.min(_cacheTimeout, _shuffleTimeout)
-        } else {
-          idleTimeoutNs
-        }
+        val _cacheTimeout = if (cachedBlocks.nonEmpty) storageTimeoutNs else 0
+        val _shuffleTimeout = if (shuffleIds.nonEmpty) shuffleTimeoutNs else 0

Review Comment:
   BTW, could you add a test case which `shuffleTrackingEnabled` is false? I guess your new code might hit NPE in that test case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout
URL: https://github.com/apache/spark/pull/41082


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #41082:
URL: https://github.com/apache/spark/pull/41082#discussion_r1187982520


##########
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala:
##########
@@ -550,7 +550,7 @@ private[spark] class ExecutorMonitor(
 
     // The set of shuffles for which shuffle data is held by the executor.
     // This should only be used in the event thread.
-    private val shuffleIds = if (shuffleTrackingEnabled) new mutable.HashSet[Int]() else null
+    private val shuffleIds = new mutable.HashSet[Int]()

Review Comment:
   Please revert this part because `if (shuffleIds != null && shuffleIds.nonEmpty)` is totally fine.
   For this kind of PR, we need to minimize as much as possible in order to backport easily, @warrenzhu25 .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #41082:
URL: https://github.com/apache/spark/pull/41082#discussion_r1192757699


##########
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala:
##########
@@ -563,17 +563,10 @@ private[spark] class ExecutorMonitor(
     def updateTimeout(): Unit = {
       val oldDeadline = timeoutAt
       val newDeadline = if (idleStart >= 0) {
-        val timeout = if (cachedBlocks.nonEmpty || (shuffleIds != null && shuffleIds.nonEmpty)) {
-          val _cacheTimeout = if (cachedBlocks.nonEmpty) storageTimeoutNs else Long.MaxValue
-          val _shuffleTimeout = if (shuffleIds != null && shuffleIds.nonEmpty) {
-            shuffleTimeoutNs
-          } else {
-            Long.MaxValue
-          }
-          math.min(_cacheTimeout, _shuffleTimeout)
-        } else {
-          idleTimeoutNs
-        }
+        val _cacheTimeout = if (cachedBlocks.nonEmpty) storageTimeoutNs else 0
+        val _shuffleTimeout = if (shuffleIds.nonEmpty) shuffleTimeoutNs else 0

Review Comment:
   `shuffleIds != null && shuffleIds.nonEmpty` instead of `shuffleIds.nonEmpty`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #41082:
URL: https://github.com/apache/spark/pull/41082#issuecomment-1586768341

   Also, cc @Ngone51 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #41082:
URL: https://github.com/apache/spark/pull/41082#issuecomment-1586758974

   `sql - other` module test failure is a know flaky issue. Thank you, @warrenzhu25 and @mridulm .
   
   Merged to master/3.4/3.3.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] warrenzhu25 commented on pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

Posted by "warrenzhu25 (via GitHub)" <gi...@apache.org>.
warrenzhu25 commented on PR #41082:
URL: https://github.com/apache/spark/pull/41082#issuecomment-1537509158

   @dongjoon-hyun help take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] warrenzhu25 commented on a diff in pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

Posted by "warrenzhu25 (via GitHub)" <gi...@apache.org>.
warrenzhu25 commented on code in PR #41082:
URL: https://github.com/apache/spark/pull/41082#discussion_r1192512098


##########
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala:
##########
@@ -550,7 +550,7 @@ private[spark] class ExecutorMonitor(
 
     // The set of shuffles for which shuffle data is held by the executor.
     // This should only be used in the event thread.
-    private val shuffleIds = if (shuffleTrackingEnabled) new mutable.HashSet[Int]() else null
+    private val shuffleIds = new mutable.HashSet[Int]()

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41082: [SPARK-43398][CORE] Executor timeout should be max of idle shuffle and rdd timeout

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #41082:
URL: https://github.com/apache/spark/pull/41082#discussion_r1187983009


##########
core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala:
##########
@@ -563,18 +563,7 @@ private[spark] class ExecutorMonitor(
     def updateTimeout(): Unit = {
       val oldDeadline = timeoutAt
       val newDeadline = if (idleStart >= 0) {
-        val timeout = if (cachedBlocks.nonEmpty || (shuffleIds != null && shuffleIds.nonEmpty)) {
-          val _cacheTimeout = if (cachedBlocks.nonEmpty) storageTimeoutNs else Long.MaxValue
-          val _shuffleTimeout = if (shuffleIds != null && shuffleIds.nonEmpty) {
-            shuffleTimeoutNs
-          } else {
-            Long.MaxValue
-          }
-          math.min(_cacheTimeout, _shuffleTimeout)
-        } else {
-          idleTimeoutNs
-        }
-        val deadline = idleStart + timeout
+        val deadline = idleStart + getTimeout()

Review Comment:
   Please inline the code instead of making one-time used method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org