You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "turboFei (via GitHub)" <gi...@apache.org> on 2023/05/16 03:52:33 UTC

[GitHub] [spark] turboFei opened a new pull request, #41181: [SPARK-43504] Mount hadoop config map in executor side

turboFei opened a new pull request, #41181:
URL: https://github.com/apache/spark/pull/41181

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1549820782

   seems the k8s integration testing is stuck, will check this pr in our dev hadoop cluster tomorrow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1550976280

   > this PR doesn't create new ConfigMaps, it either uses a user pre-set config map (no creation) or just reuse the config map created by driver which is created if necessary.
   
   yes, this PR doesn't create new ConfigMap.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1571318994

   Merged to master for Apache Spark 3.5.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1558489805

   > Hadoop and spark are different components, it is better to maintain them separately.
   
   I do not fully agree. In the early days, Hadoop may be special. We have a specific code path to read HADOOP_CONF_DIR. But now Hadoop is an option, as we have other options for storage and scheduling, especially on the cloud or Kubernetes.
   
   Maybe we shall treat it like hive configurations or other third-party components to reduce the maintenance burden and complexity of the deployment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] advancedxy commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1550797386

   > Thank you for making a PR, @turboFei .
   > 
   > However, this PR might cause a outage because the number of configMap is controlled by quota.
   > 
   > ```
   > $ kubectl describe quota | grep configmaps
   > count/configmaps                                                  4     150
   > ```
   > 
   > To avoid the production outage, this should be under a new configuration with `false` by default at least.
   
   150 is a bit small for serious production usage, we may add this note in the `running_on_k8s.md`  documentation.
   
   And BTW, this MR doesn't create new ConfigMaps, it either uses a user pre-set config map (no creation) or just reuse the config map created by driver which is created if necessary. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1551222246

   thanks @advancedxy add more UT to check additionalPodSystemProperties and executor pod.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on a diff in pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on code in PR #41181:
URL: https://github.com/apache/spark/pull/41181#discussion_r1195084378


##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, PodBuilder, VolumeBuilder}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
+import org.apache.spark.deploy.k8s.Config.KUBERNETES_EXECUTOR_DISABLE_CONFIGMAP
+import org.apache.spark.deploy.k8s.Constants._
+
+/**
+ * Mounts the Hadoop configuration on the executor pod.
+ */
+private[spark] class HadoopConfExecutorFeatureStep(conf: KubernetesConf)
+  extends KubernetesFeatureConfigStep {
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+    conf.getOption(HADOOP_CONFIG_MAP_NAME) match {

Review Comment:
   thanks, will check it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1556452364

   > And putting and locating extra configuration files in SPARK_HOME/conf is also a suggested way from our docs, so is this step necessary?
   
   I think it is necessary.
   
   Hadoop and spark are different components, it is better to maintain them separately.
   
   In our company, we have conf version for hadoop conf, so we do not put hadoop config files under SPARK_HOME/conf and use soft link to manage the hadoop conf.
   
   
   > Alternatively, if both exist, what is the precedence between them? Is it idempotent?
   
   In this pr, it just mounts the hadoop config map in the executor side and the hadoop conf mounted is absolute same with that in driver pod.
   
   As shown below, the SPARK_CONF_DIR has higher precedence.
   https://github.com/apache/spark/blob/e096bce604ed6fab37713437ac0d985673910537/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L68-L76
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1562455824

   > I can also help with a use case for this, usually the submission client is on a single environment (Lets say we have it on cloud), and with spark on k8s, we can easily run jobs in different envs like in private Cloud Clusters being submitted from public Cloud. Where we would need diff properties to be passed for the submission client as well as for drivers and executors. This is also a use case where mounting the hadoopConfMap in executors would help in making the task easy to maintain the configs.
   
   
   Yes, I think this pr is general for hadoop conf use case, and it does not create more resource because it just use the existing config map.
   
   @yaooqinn @dongjoon-hyun could you help to take another look? Appreciated for your help.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] advancedxy commented on a diff in pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #41181:
URL: https://github.com/apache/spark/pull/41181#discussion_r1196324778


##########
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStepSuite.scala:
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.k8s.features
+
+import java.io.File
+import java.nio.charset.StandardCharsets.UTF_8
+
+import com.google.common.io.Files
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.deploy.k8s.{KubernetesTestConf, SecretVolumeUtils, SparkPod}
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.features.KubernetesFeaturesTestUtils.containerHasEnvVar
+import org.apache.spark.util.{SparkConfWithEnv, Utils}
+
+class HadoopConfExecutorFeatureStepSuite extends SparkFunSuite  {
+  import SecretVolumeUtils._
+
+  test("SPARK-43504: mounts the hadoop config map on the executor pod") {
+    val confDir = Utils.createTempDir()
+    val confFiles = Set("core-site.xml", "hdfs-site.xml")
+
+    confFiles.foreach { f =>
+      Files.write("some data", new File(confDir, f), UTF_8)
+    }
+
+    val driverSparkConf = new SparkConfWithEnv(
+      Map(ENV_HADOOP_CONF_DIR -> confDir.getAbsolutePath()))
+    val executorSparkConf = new SparkConf(false)
+
+    val driverConf = KubernetesTestConf.createDriverConf(sparkConf = driverSparkConf)
+    val driverStep = new HadoopConfDriverFeatureStep(driverConf)
+    driverStep.getAdditionalPodSystemProperties().foreach { case (key, value) =>

Review Comment:
   nit: you may need to add/update `HadoopConfDriverFeatureStepSuite` to make sure the `additionalPodSystemProperties` is correctly generated: empty or add a new entry.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1558479516

   gentle ping @yaooqinn @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1575994314

   thanks, @dongjoon-hyun and @turboFei. Late +1 from my side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1558506309

   > Maybe we shall treat it like hive configurations or other third-party components to reduce the maintenance burden and complexity of the deployment.
   
   I believe different companies treat hadoop conf differently.
   
   For ebay, we add conf version for hadoop conf, because it is used by
   - public  hadoop client nodes
   - private hadoop client nodes
   - hadoop service nodes(nn, rm, hms, kyuubi)
   
   <img width="974" alt="image" src="https://github.com/apache/spark/assets/6757692/83d503bf-bbfb-4ede-80e9-120ad8a24b38">
   
   and we have an RESTful service to download the hdaoop conf and we use soft link to manage them locally.
   
   Recently, we are making migration, from spark3 + hadoop2 to spark3 + hadoop3.
   
   For hadoop2 and hadoop3, their hadoop conf are even different for us.
   
   So to manage the hadoop conf well,  in ebay, we do not want to put the hadoop conf files and spark conf files together.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1549024117

   > would you mind updating "Does this PR introduce any user-facing change?"
   
   updated


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod
URL: https://github.com/apache/spark/pull/41181


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1571327090

   thanks all !!!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] infoankitp commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "infoankitp (via GitHub)" <gi...@apache.org>.
infoankitp commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1562256974

   I can also help with a use case for this, usually the submission client is on a single environment (Lets say we have it on cloud), and with spark on k8s, we can easily run jobs in different envs like in private Cloud Clusters being submitted from public Cloud. Where we would need diff properties to be passed for the submission client as well as for drivers and executors. This is also a use case where mounting the hadoopConfMap in executors would help in making the task easy to maintain the configs. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1552365072

   gentle ping @dongjoon-hyun would you like to review again? thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1551118696

   the UT has passed, gentle ping @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1550983146

   Oh, got it. Thank you for correcting me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] advancedxy commented on a diff in pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #41181:
URL: https://github.com/apache/spark/pull/41181#discussion_r1195068724


##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, PodBuilder, VolumeBuilder}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
+import org.apache.spark.deploy.k8s.Config.KUBERNETES_EXECUTOR_DISABLE_CONFIGMAP
+import org.apache.spark.deploy.k8s.Constants._
+
+/**
+ * Mounts the Hadoop configuration on the executor pod.
+ */
+private[spark] class HadoopConfExecutorFeatureStep(conf: KubernetesConf)
+  extends KubernetesFeatureConfigStep {
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+    conf.getOption(HADOOP_CONFIG_MAP_NAME) match {
+      case Some(hadoopConfigMap) if !conf.get(KUBERNETES_EXECUTOR_DISABLE_CONFIGMAP) =>

Review Comment:
   `KUBERNETES_EXECUTOR_DISABLE_CONFIGMAP`  I believe this configuration is for SPARK_CONF_DIR config map solely?  Other places such as `PodTemplateConfigMapStep` still creates ConfigMaps.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] advancedxy commented on a diff in pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on code in PR #41181:
URL: https://github.com/apache/spark/pull/41181#discussion_r1195069910


##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/HadoopConfExecutorFeatureStep.scala:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, PodBuilder, VolumeBuilder}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
+import org.apache.spark.deploy.k8s.Config.KUBERNETES_EXECUTOR_DISABLE_CONFIGMAP
+import org.apache.spark.deploy.k8s.Constants._
+
+/**
+ * Mounts the Hadoop configuration on the executor pod.
+ */
+private[spark] class HadoopConfExecutorFeatureStep(conf: KubernetesConf)
+  extends KubernetesFeatureConfigStep {
+
+  override def configurePod(pod: SparkPod): SparkPod = {
+    conf.getOption(HADOOP_CONFIG_MAP_NAME) match {

Review Comment:
   Nit: you may use `pod.transform` like `HadoopConfDriverFeatureStep`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] pan3793 commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1549019299

   I encountered the same issues w/ Spark 3.3.1. This sounds like a regression (I suppose it works before SPARK-25815, I don't have experience w/ running such an old version of Spark on K8s).
   
   The key point is that the executor needs to download artifacts during the bootstrap phase, so the assumption in SPARK-25815 is not always true.
   
   > adding the Hadoop config to the executor pods: this is not needed
   > since the Spark driver will serialize the Hadoop config and send
   > it to executors when running tasks.
   
   Given the executor use `SparkHadoopUtil.get.newConfiguration(conf)` to construct Hadoop conf,  we can put the related hdfs/s3 configurations into `spark-defaults.conf` w/ `spark.hadoop.` prefix as a workaround.
   
   https://github.com/apache/spark/blob/0df4c01b7c4d4476fe0de9dccb3425cc1295fc85/core/src/main/scala/org/apache/spark/executor/Executor.scala#L1006-L1012
   
   This PR definitely fixes some use cases, @turboFei would you mind updating "Does this PR introduce any user-facing change?"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "turboFei (via GitHub)" <gi...@apache.org>.
turboFei commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1550791604

   thanks for comments, I will check it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #41181:
URL: https://github.com/apache/spark/pull/41181#issuecomment-1554062154

   The Hadoop configurations can be propagated after https://github.com/apache/spark/pull/27735. And putting and locating extra configuration files in SPARK_HOME/conf is also a suggested way from our docs, so is this step necessary?
   
   Alternatively, if both exist, what is the precedence between them? Is it idempotent?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org