You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "EnricoMi (via GitHub)" <gi...@apache.org> on 2024/03/11 14:23:51 UTC

[PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

EnricoMi opened a new pull request, #45464:
URL: https://github.com/apache/spark/pull/45464

   ### What changes were proposed in this pull request?
   Make Kubernetes resource manager support existing config `spark.ui.custom.executor.log.url`.
   
   Allow for
   
       spark.ui.custom.executor.log.url="https://my.custom.url/logs?app={{APP_ID}}&executor={{EXECUTOR_ID}}"
   
   ### Why are the changes needed?
   Running Spark on Kubernetes requires persisting the logs elsewhere. Having the Spark UI link to those logs is very useful. This is currently only supported by YARN.
   
   ### Does this PR introduce _any_ user-facing change?
   Spark UI provides links to logs when run on Kubernetes.
   
   ### How was this patch tested?
   Unit test and manually tested on minikube K8S cluster.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #45464:
URL: https://github.com/apache/spark/pull/45464#issuecomment-1988639547

   @EnricoMi this looks much simpler than my previous attempt https://github.com/apache/spark/pull/38357


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

Posted by "EnricoMi (via GitHub)" <gi...@apache.org>.
EnricoMi commented on PR #45464:
URL: https://github.com/apache/spark/pull/45464#issuecomment-2021252924

   CC @dongjoon-hyun


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

Posted by "EnricoMi (via GitHub)" <gi...@apache.org>.
EnricoMi commented on PR #45464:
URL: https://github.com/apache/spark/pull/45464#issuecomment-2067722199

   @dongjoon-hyun What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

Posted by "EnricoMi (via GitHub)" <gi...@apache.org>.
EnricoMi commented on code in PR #45464:
URL: https://github.com/apache/spark/pull/45464#discussion_r1519987571


##########
docs/configuration.md:
##########
@@ -1627,15 +1627,13 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.ui.custom.executor.log.url</code></td>
   <td>(none)</td>
   <td>
-    Specifies custom spark executor log URL for supporting external log service instead of using cluster
+    Specifies custom Spark executor log URL for supporting external log service instead of using cluster
     managers' application log URLs in Spark UI. Spark will support some path variables via patterns
     which can vary on cluster manager. Please check the documentation for your cluster manager to
     see which patterns are supported, if any. <p/>
     Please note that this configuration also replaces original log urls in event log,
     which will be also effective when accessing the application on history server. The new log urls must be
     permanent, otherwise you might have dead link for executor log urls.
-    <p/>
-    For now, only YARN mode supports this configuration

Review Comment:
   I think the `Please check the documentation for your cluster manager to see which patterns are supported, if any.` is sufficient, there is no need to list which manager supports this conf and which don't. That list easily gets out-dated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on PR #45464:
URL: https://github.com/apache/spark/pull/45464#issuecomment-1988591514

   +CC @thejdeep 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

Posted by "EnricoMi (via GitHub)" <gi...@apache.org>.
EnricoMi commented on code in PR #45464:
URL: https://github.com/apache/spark/pull/45464#discussion_r1520050341


##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala:
##########
@@ -28,6 +28,46 @@ import org.apache.spark.rpc._
 import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
 import org.apache.spark.util.Utils
 
+/**
+ * Custom implementation of CoarseGrainedExecutorBackend for Kubernetes resource manager.
+ * This class provides kubernetes executor attributes.
+ */
+private[spark] class KubernetesExecutorBackend(
+    rpcEnv: RpcEnv,
+    appId: String,
+    driverUrl: String,
+    executorId: String,
+    bindAddress: String,
+    hostname: String,
+    podName: String,
+    cores: Int,
+    env: SparkEnv,
+    resourcesFile: Option[String],
+    resourceProfile: ResourceProfile)
+  extends CoarseGrainedExecutorBackend(
+    rpcEnv,
+    driverUrl,
+    executorId,
+    bindAddress,
+    hostname,
+    cores,
+    env,
+    resourcesFile,
+    resourceProfile) with Logging {
+
+  override def extractAttributes: Map[String, String] = {
+    super.extractAttributes ++
+      Map(
+        "LOG_FILES" -> "log",
+        "APP_ID" -> appId,
+        "EXECUTOR_ID" -> executorId,
+        "HOSTNAME" -> hostname,
+        "POD_NAME" -> podName

Review Comment:
   I have added the namespace.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

Posted by "EnricoMi (via GitHub)" <gi...@apache.org>.
EnricoMi commented on PR #45464:
URL: https://github.com/apache/spark/pull/45464#issuecomment-1988911223

   > @EnricoMi this looks much simpler than my previous attempt #38357
   
   Thanks for the pointer! I have a PR for driver log support in the pipeline.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45464:
URL: https://github.com/apache/spark/pull/45464#discussion_r1519873684


##########
docs/configuration.md:
##########
@@ -1627,15 +1627,13 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.ui.custom.executor.log.url</code></td>
   <td>(none)</td>
   <td>
-    Specifies custom spark executor log URL for supporting external log service instead of using cluster
+    Specifies custom Spark executor log URL for supporting external log service instead of using cluster
     managers' application log URLs in Spark UI. Spark will support some path variables via patterns
     which can vary on cluster manager. Please check the documentation for your cluster manager to
     see which patterns are supported, if any. <p/>
     Please note that this configuration also replaces original log urls in event log,
     which will be also effective when accessing the application on history server. The new log urls must be
     permanent, otherwise you might have dead link for executor log urls.
-    <p/>
-    For now, only YARN mode supports this configuration

Review Comment:
   standalone?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #45464:
URL: https://github.com/apache/spark/pull/45464#discussion_r1519858538


##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala:
##########
@@ -28,6 +28,46 @@ import org.apache.spark.rpc._
 import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
 import org.apache.spark.util.Utils
 
+/**
+ * Custom implementation of CoarseGrainedExecutorBackend for Kubernetes resource manager.
+ * This class provides kubernetes executor attributes.
+ */
+private[spark] class KubernetesExecutorBackend(
+    rpcEnv: RpcEnv,
+    appId: String,
+    driverUrl: String,
+    executorId: String,
+    bindAddress: String,
+    hostname: String,
+    podName: String,
+    cores: Int,
+    env: SparkEnv,
+    resourcesFile: Option[String],
+    resourceProfile: ResourceProfile)
+  extends CoarseGrainedExecutorBackend(
+    rpcEnv,
+    driverUrl,
+    executorId,
+    bindAddress,
+    hostname,
+    cores,
+    env,
+    resourcesFile,
+    resourceProfile) with Logging {
+
+  override def extractAttributes: Map[String, String] = {
+    super.extractAttributes ++
+      Map(
+        "LOG_FILES" -> "log",
+        "APP_ID" -> appId,
+        "EXECUTOR_ID" -> executorId,
+        "HOSTNAME" -> hostname,
+        "POD_NAME" -> podName

Review Comment:
   the external log service for K8s is likely to use `namespace` and `pod name` to query the logs, could you please expose NAMESPACE too?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org