You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "zhouyifan279 (via GitHub)" <gi...@apache.org> on 2023/04/03 13:02:39 UTC

[GitHub] [spark] zhouyifan279 opened a new pull request, #40645: [SPARK-43014] Do not override `spark.app.submitTime` in k8s cluster mode driver

zhouyifan279 opened a new pull request, #40645:
URL: https://github.com/apache/spark/pull/40645

   
   ### What changes were proposed in this pull request?
   Do not override `spark.app.submitTime` in k8s cluster mode driver
   
   ### Why are the changes needed?
   If `spark.app.submitTime` is overridden, we can not get real submit time.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'
   
   
   ### How was this patch tested?
   Add test "SPARK-43014: Do not override `spark.app.submitTime` in k8s cluster mode driver"
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #40645: [SPARK-43014][CORE][K8S] Support `spark.kubernetes.setSubmitTimeInDriver`

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1540367826

   Unfortunately, it seems to fail again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1164416835


##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -904,7 +904,12 @@ private[spark] class SparkSubmit extends Logging {
       childArgs ++= Seq("--verbose")
     }
 
-    sparkConf.set("spark.app.submitTime", System.currentTimeMillis().toString)
+    val setSubmitTimeInClusterModeDriver =
+      sparkConf.getBoolean("spark.kubernetes.setSubmitTimeInDriver", false)

Review Comment:
   When we introduce a new feature, we hide it first at least for a single Apache Spark release in order to provide a testing capability without the default behavior changes. In other words, this should be `true`.
   ```scala
   - sparkConf.getBoolean("spark.kubernetes.setSubmitTimeInDriver", false)
   - sparkConf.getBoolean("spark.kubernetes.setSubmitTimeInDriver", true)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on a diff in pull request #40645: [SPARK-43014][CORE] Support `spark.kubernetes.setSubmitTimeInDriver`

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1188057220


##########
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala:
##########
@@ -550,6 +550,44 @@ class SparkSubmitSuite
     Files.delete(Paths.get("TestUDTF.jar"))
   }
 
+  test("SPARK-43014: Set `spark.app.submitTime` if missing ") {
+    val clArgs1 = Seq(
+      "--deploy-mode", "client",
+      "--master", "k8s://host:port",
+      "--class", "org.SomeClass",
+      "/home/thejar.jar")
+    val appArgs1 = new SparkSubmitArguments(clArgs1)
+    val (_, _, conf1, _) = submit.prepareSubmitEnvironment(appArgs1)
+    conf1.getOption("spark.app.submitTime").isDefined should be(true)
+
+    val submitTime = 1234567.toString
+    val clArgs2 = Seq(
+      "--deploy-mode", "client",
+      "--master", "k8s://host:port",
+      "--class", "org.SomeClass",
+      "--conf", "spark.app.submitTime=" + submitTime,
+      "/home/thejar.jar")
+    val appArgs2 = new SparkSubmitArguments(clArgs2)
+    val (_, _, conf2, _) = submit.prepareSubmitEnvironment(appArgs2)
+    conf2.get("spark.app.submitTime") should be (submitTime)
+  }
+
+  test("SPARK-43014: Overwrite `spark.app.submitTime` in k8s cluster mode driver " +
+    "when `spark.kubernetes.setSubmitTimeInDriver` is true") {

Review Comment:
   OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on a diff in pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1156739751


##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -904,7 +904,10 @@ private[spark] class SparkSubmit extends Logging {
       childArgs ++= Seq("--verbose")
     }
 
-    sparkConf.set("spark.app.submitTime", System.currentTimeMillis().toString)
+    if (!isKubernetesClusterModeDriver) {

Review Comment:
   Simply setIfMissing? then it will be set when we first touch it
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1504982149

   @dongjoon-hyun Added a new configuration `spark.kubernetes.setSubmitTimeInDriver` to control which `submitTime` Spark should use.
   Now Spark will set `spark.app.submitTime` in the following cases:
   
   1. The application is deployed using spark-submit in client/cluster mode. 
   SubmitTime is the time spark-submit executed.
   2. The application is deployed by other ways, such as [spark-on-k8s-operator](https://googlecloudplatform.github.io/spark-on-k8s-operator/)
   SubmitTime is the time driver pod started.
   3. The application is deployed using spark-submit in cluster mode and `spark.kubernetes.setSubmitTimeInDriver` is true. 
   SubmitTime is the time driver pod started.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on a diff in pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1156739751


##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -904,7 +904,10 @@ private[spark] class SparkSubmit extends Logging {
       childArgs ++= Seq("--verbose")
     }
 
-    sparkConf.set("spark.app.submitTime", System.currentTimeMillis().toString)
+    if (!isKubernetesClusterModeDriver) {

Review Comment:
   Simply setIfMissing?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1495345568

   cc @yaooqinn @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1186490829


##########
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala:
##########
@@ -550,6 +550,44 @@ class SparkSubmitSuite
     Files.delete(Paths.get("TestUDTF.jar"))
   }
 
+  test("SPARK-43014: Set `spark.app.submitTime` if missing ") {
+    val clArgs1 = Seq(
+      "--deploy-mode", "client",
+      "--master", "k8s://host:port",
+      "--class", "org.SomeClass",
+      "/home/thejar.jar")
+    val appArgs1 = new SparkSubmitArguments(clArgs1)
+    val (_, _, conf1, _) = submit.prepareSubmitEnvironment(appArgs1)
+    conf1.getOption("spark.app.submitTime").isDefined should be(true)
+
+    val submitTime = 1234567.toString

Review Comment:
   ```scala
   - val submitTime = 1234567.toString
   + val submitTime = "1234567"
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1186489865


##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -904,7 +904,12 @@ private[spark] class SparkSubmit extends Logging {
       childArgs ++= Seq("--verbose")
     }
 
-    sparkConf.set("spark.app.submitTime", System.currentTimeMillis().toString)
+    val setSubmitTimeInClusterModeDriver =
+      sparkConf.getBoolean("spark.kubernetes.setSubmitTimeInDriver", true)
+    if (!sparkConf.contains("spark.app.submitTime")
+      || isKubernetesClusterModeDriver && setSubmitTimeInClusterModeDriver ) {

Review Comment:
   Extra space.
   - `setSubmitTimeInClusterModeDriver )` -> `setSubmitTimeInClusterModeDriver)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on a diff in pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1164912069


##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -904,7 +904,12 @@ private[spark] class SparkSubmit extends Logging {
       childArgs ++= Seq("--verbose")
     }
 
-    sparkConf.set("spark.app.submitTime", System.currentTimeMillis().toString)
+    val setSubmitTimeInClusterModeDriver =
+      sparkConf.getBoolean("spark.kubernetes.setSubmitTimeInDriver", false)

Review Comment:
   @dongjoon-hyun OK.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #40645: [SPARK-43014][CORE][K8S] Support `spark.kubernetes.setSubmitTimeInDriver`

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #40645: [SPARK-43014][CORE][K8S] Support `spark.kubernetes.setSubmitTimeInDriver`
URL: https://github.com/apache/spark/pull/40645


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on a diff in pull request #40645: [SPARK-43014][CORE] Support `spark.kubernetes.setSubmitTimeInDriver`

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1188050033


##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -904,7 +904,12 @@ private[spark] class SparkSubmit extends Logging {
       childArgs ++= Seq("--verbose")
     }
 
-    sparkConf.set("spark.app.submitTime", System.currentTimeMillis().toString)
+    val setSubmitTimeInClusterModeDriver =
+      sparkConf.getBoolean("spark.kubernetes.setSubmitTimeInDriver", true)
+    if (!sparkConf.contains("spark.app.submitTime")
+      || isKubernetesClusterModeDriver && setSubmitTimeInClusterModeDriver ) {

Review Comment:
   Fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on pull request #40645: [SPARK-43014][CORE][K8S] Support `spark.kubernetes.setSubmitTimeInDriver`

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1540395802

   > Unfortunately, it seems to fail again.
   
   Let me trigger it again. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1535595335

   Gently ping @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on a diff in pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1159227423


##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -904,7 +904,10 @@ private[spark] class SparkSubmit extends Logging {
       childArgs ++= Seq("--verbose")
     }
 
-    sparkConf.set("spark.app.submitTime", System.currentTimeMillis().toString)
+    if (!isKubernetesClusterModeDriver) {

Review Comment:
   I am neutral about `override it`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on a diff in pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1156977007


##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -904,7 +904,10 @@ private[spark] class SparkSubmit extends Logging {
       childArgs ++= Seq("--verbose")
     }
 
-    sparkConf.set("spark.app.submitTime", System.currentTimeMillis().toString)
+    if (!isKubernetesClusterModeDriver) {

Review Comment:
   > Simply setIfMissing? then it will be set when we first touch it
   
   If user provide conf `spark.app.submitTime`, I think we should overwrite it.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1501206457

   No because it will be a new regression in terms of the vulnerability. You are aiming to fix one thing and open another bug.
   > We'd better not to handle this case in Spark's code.
   
   That's the reason why I counter-propose you to add `a configuration` to support this proposed behavior and the existing behavior, @zhouyifan279 .
   - https://github.com/apache/spark/pull/40645#pullrequestreview-1370202973
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on pull request #40645: [SPARK-43014][CORE][K8S] Support `spark.kubernetes.setSubmitTimeInDriver`

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1541170207

   > Thank you, @zhouyifan279 .
   > 
   > BTW, when I said the following.
   > 
   > > Please re-trigger the failed test pipeline.
   > 
   > It means this. `Re-run failed jobs` is always faster. ![Screen Shot 2023-05-09 at 9 25 51 AM](https://user-images.githubusercontent.com/9700541/237159446-59d15792-5d5e-4ee5-9ab8-61341391cb23.png)
   
   Thanks for the information. Didn't know that before. 😂 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on pull request #40645: [SPARK-43014][CORE][K8S] Support `spark.kubernetes.setSubmitTimeInDriver`

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1541993717

   > This PR never passes the unit tests. According to the GitHub Action log, you had better verify the following test suite locally, @zhouyifan279 . In the community, we are unable to merge a PR with UT failures.
   > 
   > ```
   > [info] ReplE2ESuite:
   > sh: 1: cannot open /dev/tty: No such device or address
   > sh: 1: cannot open /dev/tty: No such device or address
   > sh: 1: cannot open /dev/tty: No such device or address
   > sh: 1: cannot open /dev/tty: No such device or address
   > sh: 1: cannot open /dev/tty: No such device or address
   > sh: 1: cannot open /dev/tty: No such device or address
   > ```
   @dongjoon-hyun CI succeeded after I rebase on master branch.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1495341213

   Failed tests are irrelevant


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1495658982

   > I understand what you aim to achieve in this PR. In case of K8s Cluster Mode, the driver pod has a driver `ConfigMap` containing `spark.app.submitTime` already. So, you want to keep that. However, this is not safe because of the timezone between the user machine and the K8s control plane server.
   > 
   > If we want to propose this improvement to include the driver pod pending time, we need to add a configuration to control this feature, @zhouyifan279 . Could you try to add a configuration and add an integration test case too?
   
   @dongjoon-hyun Can you elaborate about how timezone affects the computed driver pod pending time.
   
   As I understand, both `spark.app.submitTime` and `spark.app.startTime` is milliseconds between the current time and midnight, January 1, 1970 UTC.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40645: [SPARK-43014][CORE] Support `spark.kubernetes.setSubmitTimeInDriver`

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1186493926


##########
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala:
##########
@@ -550,6 +550,44 @@ class SparkSubmitSuite
     Files.delete(Paths.get("TestUDTF.jar"))
   }
 
+  test("SPARK-43014: Set `spark.app.submitTime` if missing ") {
+    val clArgs1 = Seq(
+      "--deploy-mode", "client",
+      "--master", "k8s://host:port",
+      "--class", "org.SomeClass",
+      "/home/thejar.jar")
+    val appArgs1 = new SparkSubmitArguments(clArgs1)
+    val (_, _, conf1, _) = submit.prepareSubmitEnvironment(appArgs1)
+    conf1.getOption("spark.app.submitTime").isDefined should be(true)
+
+    val submitTime = 1234567.toString
+    val clArgs2 = Seq(
+      "--deploy-mode", "client",
+      "--master", "k8s://host:port",
+      "--class", "org.SomeClass",
+      "--conf", "spark.app.submitTime=" + submitTime,
+      "/home/thejar.jar")
+    val appArgs2 = new SparkSubmitArguments(clArgs2)
+    val (_, _, conf2, _) = submit.prepareSubmitEnvironment(appArgs2)
+    conf2.get("spark.app.submitTime") should be (submitTime)
+  }
+
+  test("SPARK-43014: Overwrite `spark.app.submitTime` in k8s cluster mode driver " +
+    "when `spark.kubernetes.setSubmitTimeInDriver` is true") {

Review Comment:
   We need to have a test case `false` instead of this because `true` is the default, doesn't we?
   ```
   val setSubmitTimeInClusterModeDriver =
         sparkConf.getBoolean("spark.kubernetes.setSubmitTimeInDriver", true)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on a diff in pull request #40645: [SPARK-43014][CORE] Support `spark.kubernetes.setSubmitTimeInDriver`

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1188057220


##########
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala:
##########
@@ -550,6 +550,44 @@ class SparkSubmitSuite
     Files.delete(Paths.get("TestUDTF.jar"))
   }
 
+  test("SPARK-43014: Set `spark.app.submitTime` if missing ") {
+    val clArgs1 = Seq(
+      "--deploy-mode", "client",
+      "--master", "k8s://host:port",
+      "--class", "org.SomeClass",
+      "/home/thejar.jar")
+    val appArgs1 = new SparkSubmitArguments(clArgs1)
+    val (_, _, conf1, _) = submit.prepareSubmitEnvironment(appArgs1)
+    conf1.getOption("spark.app.submitTime").isDefined should be(true)
+
+    val submitTime = 1234567.toString
+    val clArgs2 = Seq(
+      "--deploy-mode", "client",
+      "--master", "k8s://host:port",
+      "--class", "org.SomeClass",
+      "--conf", "spark.app.submitTime=" + submitTime,
+      "/home/thejar.jar")
+    val appArgs2 = new SparkSubmitArguments(clArgs2)
+    val (_, _, conf2, _) = submit.prepareSubmitEnvironment(appArgs2)
+    conf2.get("spark.app.submitTime") should be (submitTime)
+  }
+
+  test("SPARK-43014: Overwrite `spark.app.submitTime` in k8s cluster mode driver " +
+    "when `spark.kubernetes.setSubmitTimeInDriver` is true") {

Review Comment:
   Sounds reasonable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1514065355

   @dongjoon-hyun do you have any more inputs on this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhouyifan279 commented on pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "zhouyifan279 (via GitHub)" <gi...@apache.org>.
zhouyifan279 commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1500887774

   > Any updates, @zhouyifan279 ?
   
   @dongjoon-hyun sorry for response late. 
   
   Speaking from my limited experience, I think very few users may encounter the case you mentioned.
   We'd better not to handle this case in Spark's code. 
   
   Besides, if user changed system time by intention, he can handle it by himself easily:
   1. Inject a custom spark property, such as `spark.app.systemTimeBias` into SparkConf. 
   2. The real driver pod pending time would be: `startTime - systemTimeBias - submitTime`.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1496541323

   Sorry for misleading you. You are right about timezone. What I imagined was more like the following case.
   ```
   $ docker run -it --rm --cap-add SYS_TIME openjdk:latest bash
   
   bash-4.4# date -s '@2147483647'
   Tue Jan 19 03:14:07 UTC 2038
   
   bash-4.4# jshell
   |  Welcome to JShell -- Version 18.0.2.1
   |  For an introduction type: /help intro
   
   jshell> java.lang.System.currentTimeMillis()
   $1 ==> 2147483657595
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #40645: [SPARK-43014][CORE][K8S] Support `spark.kubernetes.setSubmitTimeInDriver`

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1540497458

   Thank you, @zhouyifan279 .
   
   BTW, when I said the following.
   > Please re-trigger the failed test pipeline.
   
   It means this. `Re-run failed jobs` is always faster.
   ![Screen Shot 2023-05-09 at 9 25 51 AM](https://github.com/apache/spark/assets/9700541/59d15792-5d5e-4ee5-9ab8-61341391cb23)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40645: [SPARK-43014][CORE] Support `spark.kubernetes.setSubmitTimeInDriver`

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40645:
URL: https://github.com/apache/spark/pull/40645#discussion_r1186495319


##########
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala:
##########
@@ -550,6 +550,44 @@ class SparkSubmitSuite
     Files.delete(Paths.get("TestUDTF.jar"))
   }
 
+  test("SPARK-43014: Set `spark.app.submitTime` if missing ") {
+    val clArgs1 = Seq(
+      "--deploy-mode", "client",
+      "--master", "k8s://host:port",
+      "--class", "org.SomeClass",
+      "/home/thejar.jar")
+    val appArgs1 = new SparkSubmitArguments(clArgs1)
+    val (_, _, conf1, _) = submit.prepareSubmitEnvironment(appArgs1)
+    conf1.getOption("spark.app.submitTime").isDefined should be(true)
+
+    val submitTime = 1234567.toString
+    val clArgs2 = Seq(
+      "--deploy-mode", "client",
+      "--master", "k8s://host:port",
+      "--class", "org.SomeClass",
+      "--conf", "spark.app.submitTime=" + submitTime,
+      "/home/thejar.jar")
+    val appArgs2 = new SparkSubmitArguments(clArgs2)
+    val (_, _, conf2, _) = submit.prepareSubmitEnvironment(appArgs2)
+    conf2.get("spark.app.submitTime") should be (submitTime)
+  }
+
+  test("SPARK-43014: Overwrite `spark.app.submitTime` in k8s cluster mode driver " +
+    "when `spark.kubernetes.setSubmitTimeInDriver` is true") {
+    val submitTime = System.currentTimeMillis() - 1000
+    val clArgs = Seq(
+      "--deploy-mode", "client",
+      "--master", "k8s://host:port",
+      "--class", "org.SomeClass",
+      "--conf", "spark.app.submitTime=" + submitTime,
+      "--conf", "spark.kubernetes.submitInDriver=true",
+      "--conf", "spark.kubernetes.setSubmitTimeInDriver=true",
+      "/home/thejar.jar")
+    val appArgs = new SparkSubmitArguments(clArgs)
+    val (_, _, conf, _) = submit.prepareSubmitEnvironment(appArgs)
+    conf.getLong("spark.app.submitTime", -1) should be > submitTime

Review Comment:
   Actually, this case looks misleading because this check seems to be valid due to `isKubernetesClusterModeDriver` instead of `"--conf", "spark.kubernetes.setSubmitTimeInDriver=true",`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #40645: [SPARK-43014][CORE][K8S] Support `spark.kubernetes.setSubmitTimeInDriver`

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #40645:
URL: https://github.com/apache/spark/pull/40645#issuecomment-1542515219

   Great! Thank you, @zhouyifan279 .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org