Posted to reviews@spark.apache.org by "robreeves (via GitHub)" <gi...@apache.org> on 2023/04/05 20:06:46 UTC

[GitHub] [spark] robreeves commented on a diff in pull request #40637: [SPARK-43002][YARN] Modify yarn client application report logging frequency to reduce noise

robreeves commented on code in PR #40637:
URL: https://github.com/apache/spark/pull/40637#discussion_r1158959568


##########
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:
##########
@@ -1162,7 +1165,11 @@ private[spark] class Client(
       val state = report.getYarnApplicationState
 
       if (logApplicationReport) {
-        logInfo(s"Application report for $appId (state: $state)")
+        if (reportsSinceLastLog == reportsTillNextLog || lastState != state) {

Review Comment:
   Right now this can't happen, but I'm usually wary of a strict equality check here in case some bug increments `reportsSinceLastLog` without ever reaching this line. A more defensive way to code this is `reportsSinceLastLog >= reportsTillNextLog`.
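
   For illustration only, a minimal sketch of the defensive comparison being suggested, using the names from the diff above (the counter reset shown here is an assumption, since it is not visible in this hunk):

   ```scala
   if (logApplicationReport) {
     // >= guards against a hypothetical bug where the counter keeps being
     // incremented without this branch ever being reached.
     if (reportsSinceLastLog >= reportsTillNextLog || lastState != state) {
       logInfo(s"Application report for $appId (state: $state)")
       reportsSinceLastLog = 0  // reset is an assumption; not visible in this hunk
     }
   }
   ```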



##########
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:
##########
@@ -1142,8 +1142,11 @@ private[spark] class Client(
       logApplicationReport: Boolean = true,
       interval: Long = sparkConf.get(REPORT_INTERVAL)): YarnAppReport = {
     var lastState: YarnApplicationState = null
+    val reportsTillNextLog: Int = sparkConf.get(REPORT_LOG_FREQUENCY)
+    var reportsSinceLastLog: Int = 0
     while (true) {
       Thread.sleep(interval)
+      reportsSinceLastLog += 1

Review Comment:
   A bit of a nit, but I prefer to keep the modification of `reportsSinceLastLog` as close to the logging section as possible (line 1166). Right now there is a big section of unrelated code between where it is incremented and where it is used, which forces the reader to jump around more.
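
   As a rough sketch of the locality I mean (the report-fetching helper and the counter reset are assumptions, since the full loop body is not shown in this hunk):

   ```scala
   while (true) {
     Thread.sleep(interval)
     val report = getApplicationReport(appId)  // assumed helper, not part of this hunk
     val state = report.getYarnApplicationState

     // ... unrelated report handling omitted ...

     // Increment right next to where the counter is read, so the whole
     // counting/logging flow can be followed in one place.
     reportsSinceLastLog += 1
     if (logApplicationReport &&
         (reportsSinceLastLog >= reportsTillNextLog || lastState != state)) {
       logInfo(s"Application report for $appId (state: $state)")
       reportsSinceLastLog = 0
     }

     lastState = state
   }
   ```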



##########
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala:
##########
@@ -225,6 +225,15 @@ package object config extends Logging {
     .timeConf(TimeUnit.MILLISECONDS)
     .createWithDefaultString("1s")
 
+  private[spark] val REPORT_LOG_FREQUENCY = {
+    ConfigBuilder("spark.yarn.report.logging.frequency")
+      .doc("Number of application reports processed until the next application status is logged." +

Review Comment:
   To be more precise, it is the **maximum** number of reports processed, because a state change causes the status to be logged right away. I think the state-change behavior is worth mentioning in the doc.
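
   One possible wording that captures both points, keeping the config name from the diff above (the builder calls after `.doc` are assumptions on my part, since this hunk is truncated):

   ```scala
   private[spark] val REPORT_LOG_FREQUENCY =
     ConfigBuilder("spark.yarn.report.logging.frequency")
       .doc("Maximum number of application reports processed until the next " +
         "application status is logged. If there is a change of state, the " +
         "application status is logged regardless of the number of reports " +
         "processed since the last log.")
       .intConf
       .checkValue(_ > 0, "logging frequency should be a positive number")
       .createWithDefault(1)
   ```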



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

