You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "holdenk (via GitHub)" <gi...@apache.org> on 2023/03/11 21:13:41 UTC

[GitHub] [spark] holdenk opened a new pull request, #40383: [SPARK-42762][K8S][MINOR] Improve logging in K8s on disconnect when using statefulsets.

holdenk opened a new pull request, #40383:
URL: https://github.com/apache/spark/pull/40383

   
   ### What changes were proposed in this pull request?
   
   Track network connections during exec ID allocation.
   
   ### Why are the changes needed?
   
   Disconnect log messages are overwhelmed making it difficult to isolate networking issues.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Log will become easier to read.
   
   ### How was this patch tested?
   
   Existing tests indicate no regression, log only change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #40383: [SPARK-42762][K8S] Improve logging in K8s on disconnect when using StatefulSets

Posted by "holdenk (via GitHub)" <gi...@apache.org>.
holdenk commented on PR #40383:
URL: https://github.com/apache/spark/pull/40383#issuecomment-1465064951

   So we only keep it in the window between an executor requesting it's ID and it disconnecting to restabmudh a proper connection (generally <1s). Given it's a few bytes of information per outstanding executor without an ID assigned yet I'm not super worried about the memory usage.
   
   We could also close the connection from the driver side instead I suppose but that might cause a race condition with the executor (would have to check how we handle disconnects and if we process outstanding messages).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40383: [SPARK-42762][K8S] Improve logging in K8s on disconnect when using StatefulSets

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40383:
URL: https://github.com/apache/spark/pull/40383#discussion_r1133169570


##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala:
##########
@@ -294,10 +295,15 @@ private[spark] class KubernetesClusterSchedulerBackend(
   }
 
   private class KubernetesDriverEndpoint extends DriverEndpoint {
+
+    protected val execIDRequester = new HashMap[RpcAddress, String]
+
     private def generateExecID(context: RpcCallContext): PartialFunction[Any, Unit] = {
       case x: GenerateExecID =>
         val newId = execId.incrementAndGet().toString
         context.reply(newId)
+        val executorAddress = context.senderAddress
+        execIDRequester(executorAddress) = newId

Review Comment:
   Is this used by `StatefulSets` case only?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40383: [SPARK-42762][K8S][MINOR] Improve logging in K8s on disconnect when using statefulsets.

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40383:
URL: https://github.com/apache/spark/pull/40383#discussion_r1133168958


##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala:
##########
@@ -294,10 +295,15 @@ private[spark] class KubernetesClusterSchedulerBackend(
   }
 
   private class KubernetesDriverEndpoint extends DriverEndpoint {
+
+    protected val execIDRequester = new HashMap[RpcAddress, String]

Review Comment:
   This looks like monotonically increasing and will cause OOM in JVM in case of dynamic allocation or executor rolling case, especially long-running batch jobs or streaming jobs. What do you think about that, @holdenk ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #40383: [SPARK-42762][K8S] Improve logging in K8s on disconnect when using StatefulSets

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #40383:
URL: https://github.com/apache/spark/pull/40383#issuecomment-1465067025

   Thank you for the answer. Let me follow the code patch again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40383: [SPARK-42762][K8S][MINOR] Improve logging in K8s on disconnect when using statefulsets.

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40383:
URL: https://github.com/apache/spark/pull/40383#discussion_r1133168958


##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala:
##########
@@ -294,10 +295,15 @@ private[spark] class KubernetesClusterSchedulerBackend(
   }
 
   private class KubernetesDriverEndpoint extends DriverEndpoint {
+
+    protected val execIDRequester = new HashMap[RpcAddress, String]

Review Comment:
   This looks like monotonically increasing and will cause OOM in JVM in case of dynamic allocation or executor rolling case. What do you think about that, @holdenk ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #40383: [SPARK-42762][K8S] Improve logging in K8s on disconnect when using StatefulSets

Posted by "holdenk (via GitHub)" <gi...@apache.org>.
holdenk commented on PR #40383:
URL: https://github.com/apache/spark/pull/40383#issuecomment-1465065222

   it would be used by in any situation where the executors don't have IDs pre-assigned to them, that might include other allocation techniques now that its pluggable but the only built in one is statefulsets.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #40383: [SPARK-42762][K8S] Improve logging in K8s on disconnect when using StatefulSets

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #40383: [SPARK-42762][K8S] Improve logging in K8s on disconnect when using StatefulSets
URL: https://github.com/apache/spark/pull/40383


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org