Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/09 07:23:39 UTC

[GitHub] [spark] dongjoon-hyun opened a new pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

dongjoon-hyun opened a new pull request #35783:
URL: https://github.com/apache/spark/pull/35783


   ### What changes were proposed in this pull request?
   
   This PR aims to remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile` for Apache Spark 3.3.
   
   ### Why are the changes needed?
   
    There are several batch execution scheduler options, including custom schedulers, in a K8s environment.
    It is better to isolate scheduler-specific settings than to introduce a new Spark configuration for them.
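
    A rough sketch of the migration this implies (illustrative only: the config keys come from this PR's title, while the queue name `queue1` and the PodGroup template contents are assumptions based on the Volcano API, not part of this PR):

    ```scala
    // Before (removed by this PR):
    //   spark.kubernetes.job.queue=queue1
    // After: carry the queue in a Volcano PodGroup template instead.
    val conf = new org.apache.spark.SparkConf()
      .set("spark.kubernetes.driver.podGroupTemplateFile",
        "/path/to/driver-podgroup-template.yaml")

    // where driver-podgroup-template.yaml would look roughly like:
    //   apiVersion: scheduling.volcano.sh/v1beta1
    //   kind: PodGroup
    //   spec:
    //     queue: queue1
    ```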
   
   ### Does this PR introduce _any_ user-facing change?
   
    No, the previous configuration has not been released yet.
   
   ### How was this patch tested?
   
   Pass the CIs and K8s IT.




[GitHub] [spark] dongjoon-hyun commented on pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35783:
URL: https://github.com/apache/spark/pull/35783#issuecomment-1063215458


    Thank you, @viirya, @yaooqinn, @Yikun, @martin-g, @k82cn.
   Merged to master for Apache Spark 3.3.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35783:
URL: https://github.com/apache/spark/pull/35783#discussion_r822930303



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -306,13 +306,6 @@ private[spark] object Config extends Logging {
       .stringConf
       .createOptional
 
-  val KUBERNETES_JOB_QUEUE = ConfigBuilder("spark.kubernetes.job.queue")

Review comment:
       Thanks






[GitHub] [spark] Yikun commented on a change in pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
Yikun commented on a change in pull request #35783:
URL: https://github.com/apache/spark/pull/35783#discussion_r822562344



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##########
@@ -306,13 +306,6 @@ private[spark] object Config extends Logging {
       .stringConf
       .createOptional
 
-  val KUBERNETES_JOB_QUEUE = ConfigBuilder("spark.kubernetes.job.queue")

Review comment:
       Please also remove this in https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md






[GitHub] [spark] k82cn commented on pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
k82cn commented on pull request #35783:
URL: https://github.com/apache/spark/pull/35783#issuecomment-1062833666


   > @viirya @martin-g @yaooqinn Thanks for your review. And sorry for the late reply; frankly, I was a little concerned about flexibility before, but now I'm +1 on this.
   > 
   > If needed, we can still carefully select some configurations in the future to overwrite template values.
   > 
   > I also took some time to gather more feedback from our internal and local users/developers (@yaooqinn @aidaizyy @william-wang @k82cn) who are using Kubernetes or Spark with Volcano. They also think it's a good approach.
   > 
   > Thanks @dongjoon-hyun for your help! LGTM!
   
   Overall, that's ok to me :) But it's better to have related parameters to make it easier.




[GitHub] [spark] Yikun commented on a change in pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
Yikun commented on a change in pull request #35783:
URL: https://github.com/apache/spark/pull/35783#discussion_r822448101



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
##########
@@ -60,7 +59,6 @@ private[spark] class VolcanoFeatureStep extends KubernetesDriverCustomFeatureCon
 
     var spec = pg.getSpec
     if (spec == null) spec = new PodGroupSpec

Review comment:
       Yep, Volcano will create a default queue and also use the [default queue internally](https://github.com/volcano-sh/volcano/blob/167e048eace4c7fcd192f8167359e9ad97c54545/pkg/webhooks/admission/jobs/mutate/mutate_job.go#L136) if spec.queue is not specified.






[GitHub] [spark] martin-g commented on a change in pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #35783:
URL: https://github.com/apache/spark/pull/35783#discussion_r822394391



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
##########
@@ -60,7 +59,6 @@ private[spark] class VolcanoFeatureStep extends KubernetesDriverCustomFeatureCon
 
     var spec = pg.getSpec
     if (spec == null) spec = new PodGroupSpec

Review comment:
       https://volcano.sh/en/docs/queue/#default-queue says `Subsequent jobs that are not assigned to a queue will be assigned to queue default.` It is not clear whether the same applies to pod groups.
   https://volcano.sh/en/docs/podgroup/ does not say anything explicit about this either.






[GitHub] [spark] martin-g commented on a change in pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #35783:
URL: https://github.com/apache/spark/pull/35783#discussion_r822371243



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
##########
@@ -60,7 +59,6 @@ private[spark] class VolcanoFeatureStep extends KubernetesDriverCustomFeatureCon
 
     var spec = pg.getSpec
     if (spec == null) spec = new PodGroupSpec

Review comment:
       AFAIU the `queue` should be in the podgroup template yaml.






[GitHub] [spark] viirya commented on a change in pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #35783:
URL: https://github.com/apache/spark/pull/35783#discussion_r822373332



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
##########
@@ -60,7 +59,6 @@ private[spark] class VolcanoFeatureStep extends KubernetesDriverCustomFeatureCon
 
     var spec = pg.getSpec
     if (spec == null) spec = new PodGroupSpec

Review comment:
       I think this is for the case where no podgroup template is specified?






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #35783:
URL: https://github.com/apache/spark/pull/35783#discussion_r822378878



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
##########
@@ -60,7 +59,6 @@ private[spark] class VolcanoFeatureStep extends KubernetesDriverCustomFeatureCon
 
     var spec = pg.getSpec
     if (spec == null) spec = new PodGroupSpec

Review comment:
       To @viirya , yes. Volcano has a default queue and uses it for PodGroups without a queue spec. That is Volcano's pre-defined behavior.
   - https://volcano.sh/en/docs/queue/#default-queue
   
   To @martin-g , **no**, as you can see in the above Volcano document. Where did you get that understanding?
   > AFAIU the queue should be in the podgroup template yaml






[GitHub] [spark] dongjoon-hyun closed pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #35783:
URL: https://github.com/apache/spark/pull/35783


   




[GitHub] [spark] martin-g commented on a change in pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #35783:
URL: https://github.com/apache/spark/pull/35783#discussion_r822376481



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
##########
@@ -60,7 +59,6 @@ private[spark] class VolcanoFeatureStep extends KubernetesDriverCustomFeatureCon
 
     var spec = pg.getSpec
     if (spec == null) spec = new PodGroupSpec

Review comment:
       You are right!
   If `val pg = new PodGroup()` (line 53), then the PG won't have a queue.
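
   A minimal sketch of that situation (assuming the fabric8 Volcano model classes used by this feature step; illustrative only, not code from the PR):

   ```scala
   import io.fabric8.volcano.scheduling.v1beta1.{PodGroup, PodGroupSpec}

   // With no podgroup template, the spec starts out empty, so no queue is set
   // on the Spark side; Volcano's admission webhook then assigns the `default` queue.
   val pg = new PodGroup()
   var spec = pg.getSpec
   if (spec == null) spec = new PodGroupSpec
   assert(spec.getQueue == null) // queue left unset; Volcano fills in "default"
   ```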






[GitHub] [spark] Yikun commented on a change in pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
Yikun commented on a change in pull request #35783:
URL: https://github.com/apache/spark/pull/35783#discussion_r822529440



##########
File path: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/VolcanoTestsSuite.scala
##########
@@ -200,7 +200,7 @@ private[spark] trait VolcanoTestsSuite extends BeforeAndAfterEach { k8sSuite: Ku
       groupLoc: Option[String] = None,
       queue: Option[String] = None,
       driverTemplate: Option[String] = None): SparkAppConf = {
-    val conf = kubernetesTestComponents.newSparkAppConf()
+    var conf = kubernetesTestComponents.newSparkAppConf()

Review comment:
       ```suggestion
       val conf = kubernetesTestComponents.newSparkAppConf()
   ```

##########
File path: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/VolcanoTestsSuite.scala
##########
@@ -210,7 +210,12 @@ private[spark] trait VolcanoTestsSuite extends BeforeAndAfterEach { k8sSuite: Ku
       .set(KUBERNETES_SCHEDULER_NAME.key, "volcano")
       .set(KUBERNETES_DRIVER_POD_FEATURE_STEPS.key, VOLCANO_FEATURE_STEP)
       .set(KUBERNETES_EXECUTOR_POD_FEATURE_STEPS.key, VOLCANO_FEATURE_STEP)
-    queue.foreach(conf.set(KUBERNETES_JOB_QUEUE.key, _))
+    queue.foreach { q =>
+      conf = conf.set(KUBERNETES_DRIVER_PODGROUP_TEMPLATE_FILE.key,

Review comment:
       ```suggestion
         conf.set(KUBERNETES_DRIVER_PODGROUP_TEMPLATE_FILE.key,
   ```
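
   The shape of the change under discussion, as a hypothetical sketch (the helper name and the template YAML below are illustrative, not the actual test code):

   ```scala
   import java.nio.file.Files

   // Write a minimal Volcano PodGroup template that only sets spec.queue, then
   // point spark.kubernetes.driver.podGroupTemplateFile at it instead of the
   // removed spark.kubernetes.job.queue config.
   def podGroupTemplateFor(queue: String): String = {
     val yaml =
       s"""apiVersion: scheduling.volcano.sh/v1beta1
          |kind: PodGroup
          |spec:
          |  queue: $queue
          |""".stripMargin
     val file = Files.createTempFile("driver-podgroup-", ".yml")
     Files.write(file, yaml.getBytes("UTF-8"))
     file.toString
   }
   ```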






[GitHub] [spark] dongjoon-hyun commented on pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35783:
URL: https://github.com/apache/spark/pull/35783#issuecomment-1063204294


    No, @k82cn. That's not better, because it is `volcano`-specific. Apache Spark wants to be open and extensible to all custom schedulers. To do that, we need clear isolation between schedulers.
    > Overall, that's ok to me :) But it's better to have related parameters to make it easier.
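
    A hedged sketch of the isolation being described (the config keys mirror the constants used in this PR's test changes; the exact string values and class name are given as understood here and should be treated as assumptions):

    ```scala
    // Spark core carries only generic scheduler hooks; anything Volcano-specific
    // (e.g. the queue) lives in that scheduler's own PodGroup template file,
    // referenced via spark.kubernetes.driver.podGroupTemplateFile.
    val conf = new org.apache.spark.SparkConf()
      .set("spark.kubernetes.scheduler.name", "volcano")
      .set("spark.kubernetes.driver.pod.featureSteps",
        "org.apache.spark.deploy.k8s.features.VolcanoFeatureStep")
      .set("spark.kubernetes.executor.pod.featureSteps",
        "org.apache.spark.deploy.k8s.features.VolcanoFeatureStep")
    ```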
   
   




[GitHub] [spark] dongjoon-hyun commented on pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35783:
URL: https://github.com/apache/spark/pull/35783#issuecomment-1063214496


   The last commit updates documentation and changes `var` to `val` in the test case.
   
   - UT passed.
   ```
   [info] BasicDriverFeatureStepSuite:
   [info] - Check the pod respects all configurations from the user. (200 milliseconds)
   [info] - Check driver pod respects kubernetes driver request cores (9 milliseconds)
   [info] - Check appropriate entrypoint rerouting for various bindings (3 milliseconds)
   [info] - memory overhead factor: java (2 milliseconds)
   [info] - memory overhead factor: python default (2 milliseconds)
   [info] - memory overhead factor: python w/ override (2 milliseconds)
   [info] - memory overhead factor: r default (1 millisecond)
   [info] - SPARK-35493: make spark.blockManager.port be able to be fallen back to in driver pod (3 milliseconds)
   [info] - SPARK-36075: Check driver pod respects nodeSelector/driverNodeSelector (2 milliseconds)
   [info] EnvSecretsFeatureStepSuite:
   [info] - sets up all keyRefs (3 milliseconds)
   [info] ExecutorPodsPollingSnapshotSourceSuite:
   [info] - Items returned by the API should be pushed to the event queue (17 milliseconds)
   [info] - SPARK-36334: Support pod listing with resource version (7 milliseconds)
   [info] VolcanoFeatureStepSuite:
   [info] - SPARK-36061: Driver Pod with Volcano PodGroup (329 milliseconds)
   [info] - SPARK-36061: Executor Pod with Volcano PodGroup (2 milliseconds)
   [info] - SPARK-38423: Support priorityClassName (31 milliseconds)
   [info] - SPARK-38455: Support driver podgroup template (77 milliseconds)
   [info] - SPARK-38455: Support executor podgroup template (8 milliseconds)
   [info] ExecutorPodsSnapshotSuite:
   [info] - States are interpreted correctly from pod metadata. (14 milliseconds)
   [info] - SPARK-30821: States are interpreted correctly from pod metadata when configured to check all containers. (3 milliseconds)
   [info] - Updates add new pods for non-matching ids and edit existing pods for matching ids (1 millisecond)
   [info] ExecutorKubernetesCredentialsFeatureStepSuite:
   [info] - configure spark pod with executor service account (2 milliseconds)
   [info] - configure spark pod with with driver service account and without executor service account (0 milliseconds)
   [info] - configure spark pod with with driver service account and with executor service account (1 millisecond)
   [info] DriverKubernetesCredentialsFeatureStepSuite:
   [info] - Don't set any credentials (4 milliseconds)
   [info] - Only set credentials that are manually mounted. (1 millisecond)
   [info] - Mount credentials from the submission client as a secret. (31 milliseconds)
   [info] PodTemplateConfigMapStepSuite:
   [info] - Do nothing when executor template is not specified (1 millisecond)
   [info] - Mounts executor template volume if config specified (45 milliseconds)
   [info] KubernetesExecutorBuilderSuite:
   [info] - use empty initial pod if template is not specified (45 milliseconds)
   [info] - SPARK-36059: set custom scheduler (70 milliseconds)
   [info] - load pod template if specified (18 milliseconds)
   [info] - configure a custom test step (17 milliseconds)
   [info] - SPARK-37145: configure a custom test step with base config (14 milliseconds)
   [info] - SPARK-37145: configure a custom test step with driver or executor config (20 milliseconds)
   [info] - SPARK-37145: configure a custom test step with wrong type config (12 milliseconds)
   [info] - SPARK-37145: configure a custom test step with wrong name (12 milliseconds)
   [info] - complain about misconfigured pod template (11 milliseconds)
   [info] KubernetesConfSuite:
   [info] - Resolve driver labels, annotations, secret mount paths, envs, and memory overhead (2 milliseconds)
   [info] - Basic executor translated fields. (0 milliseconds)
   [info] - resource profile not default. (0 milliseconds)
   [info] - Image pull secrets. (0 milliseconds)
   [info] - Set executor labels, annotations, and secrets (1 millisecond)
   [info] - Verify that executorEnv key conforms to the regular specification (1 millisecond)
   [info] - SPARK-36075: Set nodeSelector, driverNodeSelector, executorNodeSelect (1 millisecond)
   [info] - SPARK-36059: Set driver.scheduler and executor.scheduler (1 millisecond)
   [info] - SPARK-37735: access appId in KubernetesConf (1 millisecond)
   [info] - SPARK-36566: get app name label (1 millisecond)
   [info] BasicExecutorFeatureStepSuite:
   [info] - test spark resource missing vendor (7 milliseconds)
   [info] - test spark resource missing amount (1 millisecond)
   [info] - basic executor pod with resources (8 milliseconds)
   [info] - basic executor pod has reasonable defaults (8 milliseconds)
   [info] - executor pod hostnames get truncated to 63 characters (8 milliseconds)
   [info] - SPARK-35460: invalid PodNamePrefixes (1 millisecond)
   [info] - hostname truncation generates valid host names (18 milliseconds)
   [info] - classpath and extra java options get translated into environment variables (7 milliseconds)
   [info] - SPARK-32655 Support appId/execId placeholder in SPARK_EXECUTOR_DIRS (6 milliseconds)
   [info] - test executor pyspark memory (6 milliseconds)
   [info] - auth secret propagation (8 milliseconds)
   [info] - Auth secret shouldn't propagate if files are loaded. (9 milliseconds)
   [info] - SPARK-32661 test executor offheap memory (7 milliseconds)
   [info] - basic resourceprofile (7 milliseconds)
   [info] - resourceprofile with gpus (7 milliseconds)
   [info] - Verify spark conf dir is mounted as configmap volume on executor pod's container. (7 milliseconds)
   [info] - SPARK-34316 Disable configmap volume on executor pod's container (6 milliseconds)
   [info] - SPARK-35482: user correct block manager port for executor pods (8 milliseconds)
   [info] - SPARK-35969: Make the pod prefix more readable and tallied with K8S DNS Label Names (11 milliseconds)
   [info] - SPARK-36075: Check executor pod respects nodeSelector/executorNodeSelector (6 milliseconds)
   [info] KubernetesVolumeUtilsSuite:
   [info] - Parses hostPath volumes correctly (1 millisecond)
   [info] - Parses subPath correctly (0 milliseconds)
   [info] - Parses persistentVolumeClaim volumes correctly (1 millisecond)
   [info] - Parses emptyDir volumes correctly (1 millisecond)
   [info] - Parses emptyDir volume options can be optional (0 milliseconds)
   [info] - Defaults optional readOnly to false (0 milliseconds)
   [info] - Fails on missing mount key (0 milliseconds)
   [info] - Fails on missing option key (1 millisecond)
   [info] - SPARK-33063: Fails on missing option key in persistentVolumeClaim (0 milliseconds)
   [info] - Parses read-only nfs volumes correctly (1 millisecond)
   [info] - Parses read/write nfs volumes correctly (0 milliseconds)
   [info] - Fails on missing path option (0 milliseconds)
   [info] - Fails on missing server option (1 millisecond)
   [info] ExecutorRollPluginSuite:
   [info] - Empty executor list (9 milliseconds)
   [info] - Driver summary should be ignored (3 milliseconds)
   [info] - A one-item executor list (4 milliseconds)
   [info] - SPARK-37806: All policy should ignore executor if totalTasks < minTasks (1 millisecond)
   [info] - Policy: ID (1 millisecond)
   [info] - Policy: ADD_TIME (1 millisecond)
   [info] - Policy: TOTAL_GC_TIME (0 milliseconds)
   [info] - Policy: TOTAL_DURATION (0 milliseconds)
   [info] - Policy: FAILED_TASKS (0 milliseconds)
   [info] - Policy: AVERAGE_DURATION (1 millisecond)
   [info] - Policy: OUTLIER - Work like TOTAL_DURATION if there is no outlier (0 milliseconds)
   [info] - Policy: OUTLIER - Detect an average task duration outlier (0 milliseconds)
   [info] - Policy: OUTLIER - Detect a total task duration outlier (1 millisecond)
   [info] - Policy: OUTLIER - Detect a total GC time outlier (1 millisecond)
   [info] - Policy: OUTLIER_NO_FALLBACK - Return None if there are no outliers (0 milliseconds)
   [info] - Policy: OUTLIER_NO_FALLBACK - Detect an average task duration outlier (1 millisecond)
   [info] - Policy: OUTLIER_NO_FALLBACK - Detect a total task duration outlier (0 milliseconds)
   [info] - Policy: OUTLIER_NO_FALLBACK - Detect a total GC time outlier (1 millisecond)
   [info] KubernetesClusterSchedulerBackendSuite:
   [info] - Start all components (4 milliseconds)
   [info] - Stop all components (69 milliseconds)
   [info] - Remove executor (26 milliseconds)
   [info] - Kill executors (50 milliseconds)
   [info] - SPARK-34407: CoarseGrainedSchedulerBackend.stop may throw SparkException (5 milliseconds)
   [info] - SPARK-34469: Ignore RegisterExecutor when SparkContext is stopped (1 millisecond)
   [info] - Dynamically fetch an executor ID (1 millisecond)
   [info] KubernetesDriverBuilderSuite:
   [info] - use empty initial pod if template is not specified (32 milliseconds)
   [info] - SPARK-36059: set custom scheduler (34 milliseconds)
   [info] - load pod template if specified (18 milliseconds)
   [info] - configure a custom test step (19 milliseconds)
   [info] - SPARK-37145: configure a custom test step with base config (19 milliseconds)
   [info] - SPARK-37145: configure a custom test step with driver or executor config (18 milliseconds)
   [info] - SPARK-37145: configure a custom test step with wrong type config (6 milliseconds)
   [info] - SPARK-37145: configure a custom test step with wrong name (6 milliseconds)
   [info] - complain about misconfigured pod template (6 milliseconds)
   [info] - SPARK-37331: check driver pre kubernetes resource, empty by default (13 milliseconds)
   [info] - SPARK-37331: check driver pre kubernetes resource as expected (12 milliseconds)
   [info] LocalDirsFeatureStepSuite:
   [info] - Resolve to default local dir if neither env nor configuration are set (0 milliseconds)
   [info] - Use configured local dirs split on comma if provided. (1 millisecond)
   [info] - Use tmpfs to back default local dir (1 millisecond)
   [info] - local dir on mounted volume (1 millisecond)
   [info] ExecutorPodsWatchSnapshotSourceSuite:
   [info] - Watch events should be pushed to the snapshots store as snapshot updates. (1 millisecond)
   [info] ExecutorPodsAllocatorSuite:
   [info] - SPARK-36052: test splitSlots (1 millisecond)
   [info] - SPARK-36052: pending pod limit with multiple resource profiles (20 milliseconds)
   [info] - Initially request executors in batches. Do not request another batch if the first has not finished. (3 milliseconds)
   [info] - Request executors in batches. Allow another batch to be requested if all pending executors start running. (4 milliseconds)
   [info] - When a current batch reaches error states immediately, re-request them on the next batch. (3 milliseconds)
   [info] - Verify stopping deletes the labeled pods (0 milliseconds)
   [info] - When an executor is requested but the API does not report it in a reasonable time, retry requesting that executor. (4 milliseconds)
   [info] - SPARK-28487: scale up and down on target executor count changes (4 milliseconds)
   [info] - SPARK-34334: correctly identify timed out pending pod requests as excess (2 milliseconds)
   [info] - SPARK-33099: Respect executor idle timeout configuration (2 milliseconds)
   [info] - SPARK-34361: scheduler backend known pods with multiple resource profiles at downscaling (7 milliseconds)
   [info] - SPARK-33288: multiple resource profiles (5 milliseconds)
   [info] - SPARK-33262: pod allocator does not stall with pending pods (3 milliseconds)
   [info] - SPARK-35416: Support PersistentVolumeClaim Reuse (10 milliseconds)
   [info] - print the pod name instead of Some(name) if pod is absent (1 millisecond)
   [info] ExecutorPodsSnapshotsStoreSuite:
   [info] - Subscribers get notified of events periodically. (2 milliseconds)
   [info] - Even without sending events, initially receive an empty buffer. (1 millisecond)
   [info] - Replacing the snapshot passes the new snapshot to subscribers. (0 milliseconds)
   [info] ExecutorPodsLifecycleManagerSuite:
   [info] - When an executor reaches error states immediately, remove from the scheduler backend. (14 milliseconds)
   [info] - Don't remove executors twice from Spark but remove from K8s repeatedly. (1 millisecond)
   [info] - When the scheduler backend lists executor ids that aren't present in the cluster, remove those executors from Spark. (2 milliseconds)
   [info] - Keep executor pods in k8s if configured. (2 milliseconds)
   [info] StatefulSetAllocatorSuite:
   [info] - Validate initial statefulSet creation & cleanup with two resource profiles (12 milliseconds)
   [info] - Validate statefulSet scale up (1 millisecond)
   [info] HadoopConfDriverFeatureStepSuite:
   [info] - mount hadoop config map if defined (1 millisecond)
   [info] - create hadoop config map if config dir is defined (2 milliseconds)
   [info] KubernetesClusterManagerSuite:
   [info] - constructing a AbstractPodsAllocator works (2 milliseconds)
   [info] KubernetesClientUtilsSuite:
   [info] - verify load files, loads only allowed files and not the disallowed files. (11 milliseconds)
   [info] - verify load files, truncates the content to maxSize, when keys are very large in number. (1 second, 283 milliseconds)
   [info] - verify load files, truncates the content to maxSize, when keys are equal in length. (2 milliseconds)
   [info] - verify that configmap built as expected (1 millisecond)
   [info] MountVolumesFeatureStepSuite:
   [info] - Mounts hostPath volumes (0 milliseconds)
   [info] - Mounts persistentVolumeClaims (1 millisecond)
   [info] - SPARK-32713 Mounts parameterized persistentVolumeClaims in executors (1 millisecond)
   [info] - Create and mounts persistentVolumeClaims in driver (1 millisecond)
   [info] - Create and mount persistentVolumeClaims in executors (0 milliseconds)
   [info] - Mounts emptyDir (2 milliseconds)
   [info] - Mounts emptyDir with no options (0 milliseconds)
   [info] - Mounts read/write nfs volumes (2 milliseconds)
   [info] - Mounts read-only nfs volumes (0 milliseconds)
   [info] - Mounts multiple volumes (1 millisecond)
   [info] - mountPath should be unique (1 millisecond)
   [info] - Mounts subpath on emptyDir (0 milliseconds)
   [info] - Mounts subpath on persistentVolumeClaims (1 millisecond)
   [info] - Mounts multiple subpaths (1 millisecond)
   [info] ClientSuite:
   [info] - The client should configure the pod using the builder. (4 milliseconds)
   [info] - The client should create Kubernetes resources (1 millisecond)
    [info] - SPARK-37331: The client should create Kubernetes resources with pre resources (2 milliseconds)
   [info] - All files from SPARK_CONF_DIR, except templates, spark config, binary files and are within size limit, should be populated to pod's configMap. (6 milliseconds)
   [info] - Waiting for app completion should stall on the watcher (0 milliseconds)
   [info] K8sSubmitOpSuite:
   [info] - List app status (3 milliseconds)
   [info] - List status for multiple apps with glob (1 millisecond)
   [info] - Kill app (0 milliseconds)
   [info] - Kill app with gracePeriod (1 millisecond)
   [info] - Kill multiple apps with glob without gracePeriod (0 milliseconds)
   [info] KubernetesLocalDiskShuffleDataIOSuite:
   [info] - recompute is not blocked by the recovery (5 seconds, 406 milliseconds)
   [info] - Partial recompute shuffle data (6 seconds, 91 milliseconds)
   [info] - A new rdd and full recovery of old data (6 seconds, 98 milliseconds)
   [info] - Multi stages (4 seconds, 779 milliseconds)
   [info] KerberosConfDriverFeatureStepSuite:
   [info] - mount krb5 config map if defined (13 milliseconds)
   [info] - create krb5.conf config map if local config provided (11 milliseconds)
   [info] - create keytab secret if client keytab file used (7 milliseconds)
   [info] - do nothing if container-local keytab used (5 milliseconds)
   [info] - mount delegation tokens if provided (6 milliseconds)
   [info] - create delegation tokens if needed (17 milliseconds)
   [info] - do nothing if no config and no tokens (10 milliseconds)
   [info] MountSecretsFeatureStepSuite:
   [info] - mounts all given secrets (2 milliseconds)
   [info] DriverServiceFeatureStepSuite:
   [info] - Headless service has a port for the driver RPC, the block manager and driver ui. (2 milliseconds)
   [info] - Hostname and ports are set according to the service name. (0 milliseconds)
   [info] - Ports should resolve to defaults in SparkConf and in the service. (0 milliseconds)
   [info] - Long prefixes should switch to using a generated unique name. (3 milliseconds)
   [info] - Disallow bind address and driver host to be set explicitly. (1 millisecond)
   [info] DriverCommandFeatureStepSuite:
   [info] - java resource (0 milliseconds)
   [info] - python resource (1 millisecond)
   [info] - python executable precedence (1 millisecond)
   [info] - R resource (0 milliseconds)
   [info] - SPARK-25355: java resource args with proxy-user (0 milliseconds)
   [info] - SPARK-25355: python resource args with proxy-user (0 milliseconds)
   [info] - SPARK-25355: R resource args with proxy-user (0 milliseconds)
   [info] KubernetesUtilsSuite:
   [info] - Selects the given container as spark container. (1 millisecond)
   [info] - Selects the first container if no container name is given. (0 milliseconds)
   [info] - Falls back to the first container if given container name does not exist. (0 milliseconds)
   [info] - constructs spark pod correctly with pod template with no containers (0 milliseconds)
   [info] - SPARK-38201: check uploadFileToHadoopCompatibleFS with different delSrc and overwrite (76 milliseconds)
   [info] Run completed in 28 seconds, 304 milliseconds.
   [info] Total number of tests run: 205
   [info] Suites: completed 33, aborted 0
   [info] Tests: succeeded 205, failed 0, canceled 0, ignored 0, pending 0
   [info] All tests passed.
   [success] Total time: 37 s, completed Mar 9, 2022 9:18:17 AM
   ```
   
   - IT passed.
   ```
   [info] KubernetesSuite:
   [info] - Run SparkPi with no resources (10 seconds, 874 milliseconds)
   [info] - Run SparkPi with no resources & statefulset allocation (9 seconds, 705 milliseconds)
   [info] - Run SparkPi with a very long application name. (9 seconds, 724 milliseconds)
   [info] - Use SparkLauncher.NO_RESOURCE (9 seconds, 648 milliseconds)
   [info] - Run SparkPi with a master URL without a scheme. (9 seconds, 689 milliseconds)
   [info] - Run SparkPi with an argument. (9 seconds, 632 milliseconds)
   [info] - Run SparkPi with custom labels, annotations, and environment variables. (9 seconds, 749 milliseconds)
   [info] - All pods have the same service account by default (9 seconds, 646 milliseconds)
   [info] - Run extraJVMOptions check on driver (4 seconds, 502 milliseconds)
   [info] - Run SparkRemoteFileTest using a remote data file (9 seconds, 746 milliseconds)
   [info] - Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties (15 seconds, 131 milliseconds)
   [info] - Run SparkPi with env and mount secrets. (18 seconds, 590 milliseconds)
   [info] - Run PySpark on simple pi.py example (10 seconds, 808 milliseconds)
   [info] - Run PySpark to test a pyfiles example (11 seconds, 814 milliseconds)
   [info] - Run PySpark with memory customization (9 seconds, 655 milliseconds)
   [info] - Run in client mode. (7 seconds, 306 milliseconds)
   [info] - Start pod creation from template (9 seconds, 788 milliseconds)
   [info] - SPARK-38398: Schedule pod creation from template (9 seconds, 745 milliseconds)
   [info] - Test basic decommissioning (42 seconds, 121 milliseconds)
   [info] - Test basic decommissioning with shuffle cleanup (42 seconds, 274 milliseconds)
   [info] *** Test still running after 2 minutes, 13 seconds: suite name: KubernetesSuite, test name: Test decommissioning with dynamic allocation & shuffle cleanups.
   [info] - Test decommissioning with dynamic allocation & shuffle cleanups (2 minutes, 41 seconds)
   [info] - Test decommissioning timeouts (41 seconds, 775 milliseconds)
   [info] - SPARK-37576: Rolling decommissioning (1 minute, 6 seconds)
   [info] - Run SparkR on simple dataframe.R example (12 seconds, 699 milliseconds)
   [info] VolcanoSuite:
   [info] - Run SparkPi with no resources (10 seconds, 600 milliseconds)
   [info] - Run SparkPi with no resources & statefulset allocation (10 seconds, 690 milliseconds)
   [info] - Run SparkPi with a very long application name. (10 seconds, 663 milliseconds)
   [info] - Use SparkLauncher.NO_RESOURCE (10 seconds, 665 milliseconds)
   [info] - Run SparkPi with a master URL without a scheme. (10 seconds, 736 milliseconds)
   [info] - Run SparkPi with an argument. (10 seconds, 705 milliseconds)
   [info] - Run SparkPi with custom labels, annotations, and environment variables. (10 seconds, 645 milliseconds)
   [info] - All pods have the same service account by default (10 seconds, 669 milliseconds)
   [info] - Run extraJVMOptions check on driver (5 seconds, 591 milliseconds)
   [info] - Run SparkRemoteFileTest using a remote data file (10 seconds, 664 milliseconds)
   [info] - Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties (16 seconds, 375 milliseconds)
   [info] - Run SparkPi with env and mount secrets. (20 seconds, 707 milliseconds)
   [info] - Run PySpark on simple pi.py example (11 seconds, 680 milliseconds)
   [info] - Run PySpark to test a pyfiles example (12 seconds, 783 milliseconds)
   [info] - Run PySpark with memory customization (10 seconds, 708 milliseconds)
   [info] - Run in client mode. (7 seconds, 222 milliseconds)
   [info] - Start pod creation from template (10 seconds, 765 milliseconds)
   [info] - SPARK-38398: Schedule pod creation from template (10 seconds, 772 milliseconds)
   [info] - Test basic decommissioning (42 seconds, 213 milliseconds)
   [info] - Test basic decommissioning with shuffle cleanup (43 seconds, 377 milliseconds)
   [info] - Test decommissioning with dynamic allocation & shuffle cleanups (2 minutes, 42 seconds)
   [info] - Test decommissioning timeouts (42 seconds, 791 milliseconds)
   [info] - SPARK-37576: Rolling decommissioning (1 minute, 8 seconds)
   [info] - Run SparkR on simple dataframe.R example (12 seconds, 764 milliseconds)
   [info] - Run SparkPi with volcano scheduler (10 seconds, 742 milliseconds)
   [info] - SPARK-38188: Run SparkPi jobs with 2 queues (only 1 enabled) (13 seconds, 462 milliseconds)
   [info] - SPARK-38188: Run SparkPi jobs with 2 queues (all enabled) (21 seconds, 397 milliseconds)
   [info] - SPARK-38423: Run SparkPi Jobs with priorityClassName (15 seconds, 291 milliseconds)
   [info] - SPARK-38423: Run driver job to validate priority order (16 seconds, 398 milliseconds)
   [info] Run completed in 28 minutes, 15 seconds.
   [info] Total number of tests run: 53
   [info] Suites: completed 2, aborted 0
   [info] Tests: succeeded 53, failed 0, canceled 0, ignored 0, pending 0
   [info] All tests passed.
   [success] Total time: 1805 s (30:05), completed Mar 9, 2022 9:56:20 AM
   ```




[GitHub] [spark] dongjoon-hyun commented on pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35783:
URL: https://github.com/apache/spark/pull/35783#issuecomment-1062627859


   cc @viirya and @Yikun 




[GitHub] [spark] viirya commented on a change in pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #35783:
URL: https://github.com/apache/spark/pull/35783#discussion_r822361895



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
##########
@@ -60,7 +59,6 @@ private[spark] class VolcanoFeatureStep extends KubernetesDriverCustomFeatureCon
 
     var spec = pg.getSpec
     if (spec == null) spec = new PodGroupSpec

Review comment:
       Hmm, when `spec = new PodGroupSpec`, what queue will it use?







[GitHub] [spark] martin-g commented on a change in pull request #35783: [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #35783:
URL: https://github.com/apache/spark/pull/35783#discussion_r822394872



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
##########
@@ -60,7 +59,6 @@ private[spark] class VolcanoFeatureStep extends KubernetesDriverCustomFeatureCon
 
     var spec = pg.getSpec
     if (spec == null) spec = new PodGroupSpec

Review comment:
       I just asked in Volcano Slack and they confirmed that the `default` queue will be used in this case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org