You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/26 23:02:49 UTC

[GitHub] [spark] holdenk opened a new pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

holdenk opened a new pull request #30155:
URL: https://github.com/apache/spark/pull/30155


   
   ### What changes were proposed in this pull request?
   
   Make pod allocation executor timeouts configurable. Keep all known pods in mind when allocating executors to avoid over allocating if the pending time is much higher than the allocation interval.
   
   ### Why are the changes needed?
   The current executor timeouts do not match that of all real world clusters especially under load. While this can be worked around by increasing the allocation batch delay, that will decrease the speed at which the total number of executors will be able to be requested.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes new configuration property
   
   
   ### How was this patch tested?
   
   Updated existing test to use the timeout from the new configuration property. Verified test failed without the update.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716912043






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30155:
URL: https://github.com/apache/spark/pull/30155#discussion_r512327125



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -227,6 +227,14 @@ private[spark] object Config extends Logging {
       .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
       .createWithDefaultString("1s")
 
+  val KUBERNETES_ALLOCATION_EXECUTOR_TIMEOUT =
+    ConfigBuilder("spark.kubernetes.allocation.executor.timeout")
+      .doc("Time to wait before considering a pending executor timedout.")
+      .version("3.1.0")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")

Review comment:
       Is this `Allocation batch delay`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30155:
URL: https://github.com/apache/spark/pull/30155#discussion_r512328539



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -227,6 +227,14 @@ private[spark] object Config extends Logging {
       .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
       .createWithDefaultString("1s")
 
+  val KUBERNETES_ALLOCATION_EXECUTOR_TIMEOUT =
+    ConfigBuilder("spark.kubernetes.allocation.executor.timeout")
+      .doc("Time to wait before considering a pending executor timedout.")
+      .version("3.1.0")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
+      .createWithDefaultString("600s")

Review comment:
       Technically, this is the existing value, `600s`. So, which value did you need to change? I'm curious about your use cases. And, do you want to propose a different default value?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30155:
URL: https://github.com/apache/spark/pull/30155#discussion_r512430454



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -227,6 +227,14 @@ private[spark] object Config extends Logging {
       .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
       .createWithDefaultString("1s")
 
+  val KUBERNETES_ALLOCATION_EXECUTOR_TIMEOUT =
+    ConfigBuilder("spark.kubernetes.allocation.executor.timeout")
+      .doc("Time to wait before considering a pending executor timedout.")
+      .version("3.1.0")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
+      .createWithDefaultString("600s")

Review comment:
       Then, we had better mention explicitly that this PR is increasing the value in this PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716875202






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30155:
URL: https://github.com/apache/spark/pull/30155#discussion_r512327125



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -227,6 +227,14 @@ private[spark] object Config extends Logging {
       .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
       .createWithDefaultString("1s")
 
+  val KUBERNETES_ALLOCATION_EXECUTOR_TIMEOUT =
+    ConfigBuilder("spark.kubernetes.allocation.executor.timeout")
+      .doc("Time to wait before considering a pending executor timedout.")
+      .version("3.1.0")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")

Review comment:
       Is this `Allocation batch delay`? `Timeout` and `Delay` have different meanings.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716886195


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34903/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716871108


   **[Test build #130301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130301/testReport)** for PR 30155 at commit [`4e08327`](https://github.com/apache/spark/commit/4e083274441c544cb33f6f8b5d3fbc544c7fa7fc).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716894347


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716894352


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34903/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716894338


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34903/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716896945






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716871108


   **[Test build #130301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130301/testReport)** for PR 30155 at commit [`4e08327`](https://github.com/apache/spark/commit/4e083274441c544cb33f6f8b5d3fbc544c7fa7fc).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30155:
URL: https://github.com/apache/spark/pull/30155#discussion_r512330529



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##########
@@ -214,10 +216,9 @@ private[spark] class ExecutorPodsAllocator(
     }
 
     if (newlyCreatedExecutors.isEmpty
-        && currentPendingExecutors.isEmpty
-        && currentRunningCount < currentTotalExpectedExecutors) {
+        && knownPodCount < currentTotalExpectedExecutors) {

Review comment:
       @holdenk . Please split this part.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716894347






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716875052


   **[Test build #130301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130301/testReport)** for PR 30155 at commit [`4e08327`](https://github.com/apache/spark/commit/4e083274441c544cb33f6f8b5d3fbc544c7fa7fc).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #30155:
URL: https://github.com/apache/spark/pull/30155#discussion_r512337317



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -227,6 +227,14 @@ private[spark] object Config extends Logging {
       .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
       .createWithDefaultString("1s")
 
+  val KUBERNETES_ALLOCATION_EXECUTOR_TIMEOUT =
+    ConfigBuilder("spark.kubernetes.allocation.executor.timeout")
+      .doc("Time to wait before considering a pending executor timedout.")
+      .version("3.1.0")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
+      .createWithDefaultString("600s")

Review comment:
       I believe the existing value is 60s.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716912027


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34906/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716896945






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #30155:
URL: https://github.com/apache/spark/pull/30155#discussion_r512926302



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -227,6 +227,14 @@ private[spark] object Config extends Logging {
       .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
       .createWithDefaultString("1s")
 
+  val KUBERNETES_ALLOCATION_EXECUTOR_TIMEOUT =
+    ConfigBuilder("spark.kubernetes.allocation.executor.timeout")
+      .doc("Time to wait before considering a pending executor timedout.")
+      .version("3.1.0")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
+      .createWithDefaultString("600s")

Review comment:
       Good call




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716896795


   **[Test build #130304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130304/testReport)** for PR 30155 at commit [`90ad4d1`](https://github.com/apache/spark/commit/90ad4d140a80df379384f81ffbbdfec758eb1b39).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716889302


   Revert the change in ExecutorPodsAllocatorSuite.scala and then try and run the test


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #30155: [SPARK-33231][SPARK-33262][CORE] Make pod allocation executor timeouts configurable & allow scheduling with pending pods.

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-717438026


   Added the 2nd JIRA and updated the description @dongjoon-hyun 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716905709


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34906/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716892631


   **[Test build #130304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130304/testReport)** for PR 30155 at commit [`90ad4d1`](https://github.com/apache/spark/commit/90ad4d140a80df379384f81ffbbdfec758eb1b39).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30155:
URL: https://github.com/apache/spark/pull/30155#discussion_r512330374



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##########
@@ -214,10 +216,9 @@ private[spark] class ExecutorPodsAllocator(
     }
 
     if (newlyCreatedExecutors.isEmpty
-        && currentPendingExecutors.isEmpty
-        && currentRunningCount < currentTotalExpectedExecutors) {
+        && knownPodCount < currentTotalExpectedExecutors) {

Review comment:
       This is irrelevant to `[SPARK-33231][CORE] Make pod allocation executor timeouts configurable`. Could you make another PR about this logic change?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716875202






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #30155:
URL: https://github.com/apache/spark/pull/30155#discussion_r512337517



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -227,6 +227,14 @@ private[spark] object Config extends Logging {
       .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
       .createWithDefaultString("1s")
 
+  val KUBERNETES_ALLOCATION_EXECUTOR_TIMEOUT =
+    ConfigBuilder("spark.kubernetes.allocation.executor.timeout")
+      .doc("Time to wait before considering a pending executor timedout.")
+      .version("3.1.0")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")

Review comment:
       Good catch, thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #30155: [SPARK-33231][SPARK-33262][CORE] Make pod allocation executor timeouts configurable & allow scheduling with pending pods.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #30155:
URL: https://github.com/apache/spark/pull/30155


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30155: [SPARK-33231][SPARK-33262][CORE] Make pod allocation executor timeouts configurable & allow scheduling with pending pods.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30155:
URL: https://github.com/apache/spark/pull/30155#discussion_r512942534



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##########
@@ -214,10 +216,9 @@ private[spark] class ExecutorPodsAllocator(
     }
 
     if (newlyCreatedExecutors.isEmpty
-        && currentPendingExecutors.isEmpty
-        && currentRunningCount < currentTotalExpectedExecutors) {
+        && knownPodCount < currentTotalExpectedExecutors) {
       val numExecutorsToAllocate = math.min(
-        currentTotalExpectedExecutors - currentRunningCount, podAllocationSize)
+        currentTotalExpectedExecutors - knownPodCount, podAllocationSize)

Review comment:
       Could you add a test case for `SPARK-33262`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30155:
URL: https://github.com/apache/spark/pull/30155#discussion_r512328539



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -227,6 +227,14 @@ private[spark] object Config extends Logging {
       .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
       .createWithDefaultString("1s")
 
+  val KUBERNETES_ALLOCATION_EXECUTOR_TIMEOUT =
+    ConfigBuilder("spark.kubernetes.allocation.executor.timeout")
+      .doc("Time to wait before considering a pending executor timedout.")
+      .version("3.1.0")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .checkValue(value => value > 0, "Allocation batch delay must be a positive time value.")
+      .createWithDefaultString("600s")

Review comment:
       Technically, this is the existing value, `600s`. So, which value did you need to change? I'm curious about your use cases.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #30155:
URL: https://github.com/apache/spark/pull/30155#discussion_r512336669



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##########
@@ -214,10 +216,9 @@ private[spark] class ExecutorPodsAllocator(
     }
 
     if (newlyCreatedExecutors.isEmpty
-        && currentPendingExecutors.isEmpty
-        && currentRunningCount < currentTotalExpectedExecutors) {
+        && knownPodCount < currentTotalExpectedExecutors) {

Review comment:
       So it's not unrelated, but I can give it a second JIRA? Having more time available means that depending on currentRunningCount can be stale information.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716912043






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30155: [SPARK-33231][CORE] Make pod allocation executor timeouts configurable

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30155:
URL: https://github.com/apache/spark/pull/30155#issuecomment-716892631


   **[Test build #130304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130304/testReport)** for PR 30155 at commit [`90ad4d1`](https://github.com/apache/spark/commit/90ad4d140a80df379384f81ffbbdfec758eb1b39).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org