Posted to reviews@spark.apache.org by "holdenk (via GitHub)" <gi...@apache.org> on 2023/07/26 22:36:38 UTC

[GitHub] [spark] holdenk opened a new pull request, #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

holdenk opened a new pull request, #39825:
URL: https://github.com/apache/spark/pull/39825

   ### What changes were proposed in this pull request?
   
   Log allocation stalls when we want to allocate pods but are unable to during a K8s snapshot event.
   Trigger an allocation event without blocking on a snapshot, provided that there is enough room under maxPendingPods.
   
   ### Why are the changes needed?
   
   Dynamic allocation in Spark on K8s can be difficult to debug and is prone to stalling in heavily loaded clusters, and waiting for snapshot events adds an unnecessary delay to pod allocation.
   
   ### Does this PR introduce _any_ user-facing change?
   
   New log messages, faster pod scale up.
   
   ### How was this patch tested?
   
   Modified an existing test to verify that we both trigger allocation while pods are pending and track when we are stalled.
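   
   Below is a minimal, self-contained Scala sketch of the two proposed behaviours. It is not the actual `ExecutorPodsAllocator` code; the simplified types and the `requestPods` callback are illustrative assumptions:
   
   ```scala
   import java.util.concurrent.atomic.AtomicInteger
   
   // Simplified model of the proposed behaviour, not the real ExecutorPodsAllocator.
   object AllocatorSketch {
     val numOutstandingPods = new AtomicInteger(0)
     @volatile var stalledStartTime: Option[Long] = None // Spark tracks this as a nullable field
   
     // Called when the scheduler changes the expected executor count.
     // `requestPods` performs one allocation round and returns how many pods it requested.
     def onExpectedExecutorsChanged(
         maxPendingPods: Int,
         nowMillis: Long)(requestPods: () => Int): Unit = {
       if (numOutstandingPods.get() < maxPendingPods) {
         // SPARK-42260: trigger an allocation round now instead of waiting for the next snapshot.
         val requested = requestPods()
         if (requested == 0) {
           // SPARK-42261: we wanted pods but could not request any, so record when the stall began.
           if (stalledStartTime.isEmpty) {
             stalledStartTime = Some(nowMillis)
             println(s"Executor allocation stalled at $nowMillis; outstanding=${numOutstandingPods.get()}")
           }
         } else {
           stalledStartTime = None
         }
       }
     }
   }
   ```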




Re: [PR] [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot
URL: https://github.com/apache/spark/pull/39825




[GitHub] [spark] dongjoon-hyun commented on pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #39825:
URL: https://github.com/apache/spark/pull/39825#issuecomment-1421008520

   As I mentioned in the previous comment, Apache Spark is known to get explosions of pending pods and, as a result, of EKS resources. I was surprised that SPARK-42261 was filed as a bug, @holdenk.
   > is there some context I'm missing? Did y'all get a stuck cluster with the previous behavior?
   
   I thought you were using `StatefulSet` only?




Re: [PR] [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #39825:
URL: https://github.com/apache/spark/pull/39825#issuecomment-1922540334

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39825:
URL: https://github.com/apache/spark/pull/39825#discussion_r1099652817


##########
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocatorSuite.scala:
##########
@@ -186,7 +186,107 @@ class ExecutorPodsAllocatorSuite extends SparkFunSuite with BeforeAndAfter {
     rpb.require(ereq).require(treq)
     val rp = rpb.build()
 
-    val confWithLowMaxPendingPods = conf.clone.set(KUBERNETES_MAX_PENDING_PODS.key, "3")
+    val confWithMediumMaxPendingPods = conf.clone.set(
+      KUBERNETES_MAX_PENDING_PODS.key, "20").set(
+      KUBERNETES_ALLOCATION_BLOCK_ON_SNAPSHOT.key, "false").set(
+      KUBERNETES_ALLOCATION_BATCH_SIZE.key, "20")
+    assert(podsAllocatorUnderTest.stalledStartTime == null)
+    podsAllocatorUnderTest = new ExecutorPodsAllocator(confWithMediumMaxPendingPods, secMgr,
+      executorBuilder, kubernetesClient, snapshotsStore, waitForExecutorPodsClock)
+    podsAllocatorUnderTest.start(TEST_SPARK_APP_ID, schedulerBackend)
+
+    podsAllocatorUnderTest.setTotalExpectedExecutors(Map(defaultProfile -> 2, rp -> 3))
+    assert(podsAllocatorUnderTest.numOutstandingPods.get() == 5)
+    // We should have allocated something in each RPI so we don't count as "stalled."
+    assert(podsAllocatorUnderTest.stalledStartTime == null)
+
+    // We should not yet have stalled.
+    podsAllocatorUnderTest.setTotalExpectedExecutors(Map(rp -> 100))
+    assert(podsAllocatorUnderTest.numOutstandingPods.get() == 20)
+    assert(podsAllocatorUnderTest.stalledStartTime == null)
+
+    // And now we stall.
+    podsAllocatorUnderTest.setTotalExpectedExecutors(Map(rp -> 200))
+    assert(podsAllocatorUnderTest.numOutstandingPods.get() == 20)
+    assert(podsAllocatorUnderTest.stalledStartTime != null)
+  }
+
+  test("SPARK-42261: Don't allow allocations without snapshot by default (except new rpID)") {
+    when(podOperations
+      .withField("status.phase", "Pending"))
+      .thenReturn(podOperations)
+    when(podOperations
+      .withLabel(SPARK_APP_ID_LABEL, TEST_SPARK_APP_ID))
+      .thenReturn(podOperations)
+    when(podOperations
+      .withLabel(SPARK_ROLE_LABEL, SPARK_POD_EXECUTOR_ROLE))
+      .thenReturn(podOperations)
+    when(podOperations
+      .withLabelIn(meq(SPARK_EXECUTOR_ID_LABEL), any()))
+      .thenReturn(podOperations)
+
+    val startTime = Instant.now.toEpochMilli
+    waitForExecutorPodsClock.setTime(startTime)
+
+    val rpb = new ResourceProfileBuilder()
+    val ereq = new ExecutorResourceRequests()
+    val treq = new TaskResourceRequests()
+    ereq.cores(4).memory("2g")
+    treq.cpus(2)
+    rpb.require(ereq).require(treq)
+    val rp = rpb.build()
+
+    val confWithMediumMaxPendingPods = conf.clone.set(
+      KUBERNETES_MAX_PENDING_PODS.key, "20").set(
+      KUBERNETES_ALLOCATION_BATCH_SIZE.key, "20")
+    assert(podsAllocatorUnderTest.stalledStartTime == null)
+    podsAllocatorUnderTest = new ExecutorPodsAllocator(confWithMediumMaxPendingPods, secMgr,
+      executorBuilder, kubernetesClient, snapshotsStore, waitForExecutorPodsClock)
+    podsAllocatorUnderTest.start(TEST_SPARK_APP_ID, schedulerBackend)
+
+    podsAllocatorUnderTest.setTotalExpectedExecutors(Map(defaultProfile -> 2, rp -> 3))
+    assert(podsAllocatorUnderTest.numOutstandingPods.get() == 5)
+    // We should have allocated something in each RPI so we don't count as "stalled."
+    assert(podsAllocatorUnderTest.stalledStartTime == null)
+
+    podsAllocatorUnderTest.setTotalExpectedExecutors(Map(rp -> 100))
+    assert(podsAllocatorUnderTest.numOutstandingPods.get() == 5)
+    assert(podsAllocatorUnderTest.stalledStartTime == null)
+
+    podsAllocatorUnderTest.setTotalExpectedExecutors(Map(rp -> 200))
+    assert(podsAllocatorUnderTest.numOutstandingPods.get() == 5)
+    assert(podsAllocatorUnderTest.stalledStartTime == null)
+  }
+
+  test("SPARK-36052: pending pod limit with multiple resource profiles & SPARK-42261") {

Review Comment:
   Oh, is this the original test case? Then, could you move the following test cases after the `SPARK-36052` test case?
   ```
   test("SPARK-42261: Allow allocations without snapshot up to min of max pending & alloc size.") {
   test("SPARK-42261: Don't allow allocations without snapshot by default (except new rpID)") {
   
   ```





[GitHub] [spark] github-actions[bot] closed pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot
URL: https://github.com/apache/spark/pull/39825




[GitHub] [spark] attilapiros commented on a diff in pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

Posted by "attilapiros (via GitHub)" <gi...@apache.org>.
attilapiros commented on code in PR #39825:
URL: https://github.com/apache/spark/pull/39825#discussion_r1092616469


##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala:
##########
@@ -141,9 +143,26 @@ class ExecutorPodsAllocator(
       totalExpectedExecutorsPerResourceProfileId.put(rp.id, numExecs)
     }
     logDebug(s"Set total expected execs to $totalExpectedExecutorsPerResourceProfileId")
-    if (numOutstandingPods.get() == 0) {
+    if (numOutstandingPods.get() < maxPendingPods) {

Review Comment:
   The [default of KUBERNETES_MAX_PENDING_PODS is Int.MaxValue](https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala#L737) (to keep the old behaviour from when it was introduced), and the main intention of `numOutstandingPods` was to slow down upscaling at very steep peaks:
   https://github.com/apache/spark/blob/b5b40113a64b4dbbcd4efe86da4409f2be8e9c56/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala#L397-L399
   
   What about using the allocation batch size (or rather a factor of it, as a good lower limit)?
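   
   A minimal sketch (plain Scala, not Spark internals) of the gate this suggestion implies; the object and method names are illustrative assumptions:
   
   ```scala
   object AllocationGate {
     // Only trigger an allocation round outside of a snapshot while the number of
     // outstanding pods stays below the smaller of the pending-pod cap and the
     // allocation batch size (spark.kubernetes.allocation.batch.size).
     def mayAllocateWithoutSnapshot(
         outstandingPods: Int,
         maxPendingPods: Int,
         allocationBatchSize: Int): Boolean =
       outstandingPods < math.min(maxPendingPods, allocationBatchSize)
   
     def main(args: Array[String]): Unit = {
       // With the default Int.MaxValue cap, the batch size alone bounds eager allocation.
       println(mayAllocateWithoutSnapshot(3, Int.MaxValue, 5))  // true
       println(mayAllocateWithoutSnapshot(20, Int.MaxValue, 5)) // false
     }
   }
   ```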
    
   





[GitHub] [spark] holdenk commented on a diff in pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

Posted by "holdenk (via GitHub)" <gi...@apache.org>.
holdenk commented on code in PR #39825:
URL: https://github.com/apache/spark/pull/39825#discussion_r1097976550


##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala:
##########
@@ -141,9 +143,26 @@ class ExecutorPodsAllocator(
       totalExpectedExecutorsPerResourceProfileId.put(rp.id, numExecs)
     }
     logDebug(s"Set total expected execs to $totalExpectedExecutorsPerResourceProfileId")
-    if (numOutstandingPods.get() == 0) {
+    if (numOutstandingPods.get() < maxPendingPods) {

Review Comment:
   That seems reasonable to me; I'll change it over.





[GitHub] [spark] holdenk commented on a diff in pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

Posted by "holdenk (via GitHub)" <gi...@apache.org>.
holdenk commented on code in PR #39825:
URL: https://github.com/apache/spark/pull/39825#discussion_r1100813043


##########
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocatorSuite.scala:
##########
@@ -161,7 +161,7 @@ class ExecutorPodsAllocatorSuite extends SparkFunSuite with BeforeAndAfter {
     assert(ExecutorPodsAllocator.splitSlots(seq2, 4) === Seq(("a", 2), ("b", 1), ("c", 1)))
   }
 
-  test("SPARK-36052: pending pod limit with multiple resource profiles") {
+  test("SPARK-42261: Allow allocations without snapshot up to min of max pending & alloc size.") {

Review Comment:
   Sounds good, will refactor next week.





[GitHub] [spark] holdenk commented on pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

Posted by "holdenk (via GitHub)" <gi...@apache.org>.
holdenk commented on PR #39825:
URL: https://github.com/apache/spark/pull/39825#issuecomment-1419876529

   Ok @dongjoon-hyun, I hear the -1 concern around excessive scale-up -- but blocking scale-up on snapshot delivery (what we are doing today) seems like kind of a hack. What about if we use "allocation batch size" to gate it like @attilapiros suggested?
   
   The other option would be to add this behind a feature flag that is off by default (e.g. `enableAllocationWithPendingPods`, set to `false` by default).
   
   The current logic in the master branch seems to be based on 0343854f54b48b206ca434accec99355011560c2 / https://github.com/apache/spark/pull/25236 from @vanzin, which introduced `hasPendingPods` (master now tracks the number of pending pods instead) and seems in line with that PR's stated goal ("More responsive dynamic allocation with K8S"). I don't see anything mentioned in it about reducing the number of pending resources in EKS, but maybe there was a discussion off-PR/off-list that I don't see.
   
   Do any of those work for your concern?
   
   Just as an aside, I'm a little surprised by a veto (https://spark.apache.org/committers.html / https://www.apache.org/foundation/voting.html) this early in the conversation around a proposed change; is there some context I'm missing? Did y'all get a stuck cluster with the previous behavior?




[GitHub] [spark] github-actions[bot] commented on pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #39825:
URL: https://github.com/apache/spark/pull/39825#issuecomment-1555391404

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39825:
URL: https://github.com/apache/spark/pull/39825#discussion_r1099651597


##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala:
##########
@@ -737,6 +737,18 @@ private[spark] object Config extends Logging {
       .checkValue(value => value > 0, "Maximum number of pending pods should be a positive integer")
       .createWithDefault(Int.MaxValue)
 
+  val KUBERNETES_ALLOCATION_BLOCK_ON_SNAPSHOT =
+    ConfigBuilder("spark.kubernetes.allocation.blockOnSnapshot")
+      .doc("By default Spark on Kube will not trigger a new allocation until Kube delivers a new " +
+        "snapshot; this allows implicit feedback from the cluster manager in that if it is too " +
+        "busy, snapshots may be backed up. Setting this to false increases the speed at which " +
+        "Spark can scale up but does increase the risk of an excessive number of pending " +
+        s"resources in some environments. See ${KUBERNETES_MAX_PENDING_PODS.key} " +

Review Comment:
   Thank you for mentioning the risks.
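   
   For illustration, this is how a job might opt in if the flag proposed in this PR were available (the PR was later closed without being merged, so `spark.kubernetes.allocation.blockOnSnapshot` is not part of released Spark):
   
   ```scala
   import org.apache.spark.SparkConf
   
   object BlockOnSnapshotOptIn {
     // Hypothetical opt-in: disable blocking on snapshots but keep a finite cap on
     // pending pods so a busy cluster cannot accumulate unbounded pending resources.
     val conf: SparkConf = new SparkConf()
       .set("spark.dynamicAllocation.enabled", "true")
       .set("spark.kubernetes.allocation.maxPendingPods", "200")
       .set("spark.kubernetes.allocation.blockOnSnapshot", "false") // proposed key, not in released Spark
   }
   ```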





Re: [PR] [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot [spark]

Posted by "09306677806 (via GitHub)" <gi...@apache.org>.
09306677806 commented on PR #39825:
URL: https://github.com/apache/spark/pull/39825#issuecomment-1922569119

   Hi, I can't log in. My phone was hacked. I got into trouble. My whole life
   is gone. Can you give me a chance to log in?
   
   On Friday, February 2, 2024 at 03:50, github-actions[bot] <***@***.***> wrote:
   
   > We're closing this PR because it hasn't been updated in a while. This
   > isn't a judgement on the merit of the PR in any way. It's just a way of
   > keeping the PR queue manageable.
   > If you'd like to revive this PR, please reopen it and ask a committer to
   > remove the Stale tag!




[GitHub] [spark] holdenk commented on pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

Posted by "holdenk (via GitHub)" <gi...@apache.org>.
holdenk commented on PR #39825:
URL: https://github.com/apache/spark/pull/39825#issuecomment-1421268974

   We're exploring a mixture of different ways of handling allocation depending on the job.
   
   So I filed [SPARK-42261](https://issues.apache.org/jira/browse/SPARK-42261) as a bug because (from the commit history) not allocating until a snapshot is delivered does not seem to be an intentional feature, and it slows down scale-up in what (to me) is an unexpected way. I'm happy to change it from "bug" to "improvement".
   
   But more to the point: does having this as an opt-in feature make you feel comfortable with dropping the -1? Or do you think that having this as an optional feature is bad? I think it's reasonable for us to want to support fast scale-up for non-EKS deployments (not everyone is going to use EKS, let alone PVCs on EKS).




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39825:
URL: https://github.com/apache/spark/pull/39825#discussion_r1099650989


##########
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocatorSuite.scala:
##########
@@ -161,7 +161,7 @@ class ExecutorPodsAllocatorSuite extends SparkFunSuite with BeforeAndAfter {
     assert(ExecutorPodsAllocator.splitSlots(seq2, 4) === Seq(("a", 2), ("b", 1), ("c", 1)))
   }
 
-  test("SPARK-36052: pending pod limit with multiple resource profiles") {
+  test("SPARK-42261: Allow allocations without snapshot up to min of max pending & alloc size.") {

Review Comment:
   Please add a new test case while keeping the original test coverage by using configurations.





[GitHub] [spark] dongjoon-hyun commented on pull request #39825: [SPARK-42261][SPARK-42260][K8S] Log Allocation Stalls and Trigger Allocation event without blocking on snapshot

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #39825:
URL: https://github.com/apache/spark/pull/39825#issuecomment-1421976910

   Yes, last week, I was only -1 on the [SPARK-42261 "K8s will not allocate more execs if there are any pending execs until next snapshot"](https://issues.apache.org/jira/browse/SPARK-42261) part because it was reported as a bug. For SPARK-42260, I wasn't sure because its `Target Version` was `3.4.1`.
   
   For the following, I'm open to new feature approaches.
   
   > What about if we use "allocation batch size" to gate it like @attilapiros suggested?
   
   For the following (which is more important), I haven't caught up on this PR's latest commits yet, but I didn't mean to prevent alternatives. Even in Spark 3.4, I believe we can still backport a required K8s alternative if we need it, @holdenk.
   > But more to the point: does having this as an opt-in feature make you feel comfortable with dropping the -1? Or do you think that having this as an optional feature is bad? I think it's reasonable for us to want to support fast scale-up for non-EKS deployments (not everyone is going to use EKS, let alone PVCs on EKS).



