Posted to reviews@spark.apache.org by "gengliangwang (via GitHub)" <gi...@apache.org> on 2024/03/26 18:58:03 UTC

[PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

gengliangwang opened a new pull request, #45729:
URL: https://github.com/apache/spark/pull/45729

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'common/utils/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   Introduce Structured Logging Framework as per [SPIP: Structured Logging Framework for Apache Spark](https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing) .
   * The default logging output format will be JSON Lines. For example:
   ```
   {
      "ts":"2023-03-12T12:02:46.661-0700",
      "level":"ERROR",
      "msg":"Cannot determine whether executor 289 is alive or not",
      "context":{
          "executor_id":"289"
      },
      "exception":{
         "class":"org.apache.spark.SparkException",
         "msg":"Exception thrown in awaitResult",
         "stackTrace":"..."
      },
      "source":"BlockManagerMasterEndpoint"
   } 
   ```
   * Introduce a new configuration `spark.log.structuredLogging.enabled` to control the default log4j configuration. It is `true` by default. Users can disable it to get plain-text log output.
   * The change will start with the `logError` method. An example of the API change:
   from
   ```
   logError(s"Cannot determine whether executor $executorId is alive or not.", e)
   ```
   to
   ```
   logError(log"Cannot determine whether executor ${MDC(EXECUTOR_ID, executorId)} is alive or not.", e)
   ```
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   To enhance Apache Spark's logging system by implementing structured logging. This transition will change the format of the default log output from plain text to JSON lines, making it more analyzable.
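
   For instance, once the output is JSON Lines, the logs can be loaded and queried with Spark itself. A minimal sketch (the log path and session setup here are illustrative, not part of this PR):
   ```
   // Illustrative only: analyze structured logs with Spark itself.
   import org.apache.spark.sql.SparkSession

   val spark = SparkSession.builder().appName("log-analysis").getOrCreate()
   import spark.implicits._

   // spark.read.json handles JSON Lines input, one record per line.
   val logs = spark.read.json("path/to/driver-logs/*.log")
   logs.filter($"level" === "ERROR")
     .select($"ts", $"msg", $"context.executor_id")
     .show(truncate = false)
   ```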
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   Yes, the default log output format will be JSON Lines instead of plain text. Users can restore the plain text output by disabling the configuration `spark.log.structuredLogging.enabled`.
   If a user has a customized log4j configuration, there is no change in the log output.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   New unit tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   Yes, some of the code comments are from GitHub Copilot.




Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540498116


##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -190,6 +281,7 @@ private[spark] object Logging {
   @volatile private var initialized = false
   @volatile private var defaultRootLevel: Level = null
   @volatile private var defaultSparkLog4jConfig = false
+  @volatile private var useStructuredLogging = true

Review Comment:
   How about `structuredLoggingEnabled`?





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang closed pull request #45729: [SPARK-47574][INFRA] Introduce Structured Logging Framework
URL: https://github.com/apache/spark/pull/45729




Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on PR #45729:
URL: https://github.com/apache/spark/pull/45729#issuecomment-2026713363

   @amaliujia @dtenedor @beliefer @HyukjinKwon @cloud-fan Thanks for the reviews!
   The PR has been open for 3 days. I am merging this one to master and moving forward.




Re: [PR] [SPARK-47574][CORE] Introduce Structured Logging Framework [spark]

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1553727210


##########
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.File
+import java.nio.file.Files
+
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase extends AnyFunSuite // scalastyle:ignore funsuite
+  with Logging {
+
+  protected def logFilePath: String
+
+  protected lazy val logFile: File = {
+    val pwd = new File(".").getCanonicalPath
+    new File(pwd + "/" + logFilePath)
+  }
+
+  // Returns the log output appended to the file while running the given function.
+  protected def captureLogOutput(f: () => Unit): String = {
+    val content = if (logFile.exists()) {
+      Files.readString(logFile.toPath)
+    } else {
+      ""
+    }
+    f()
+    val newContent = Files.readString(logFile.toPath)
+    newContent.substring(content.length)
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  private val className = this.getClass.getName.stripSuffix("$")
+  override def logFilePath: String = "target/structured.log"
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    val logOutput = captureLogOutput(() => logError(msg))
+
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}\n""".r
+    // scalastyle:on
+    assert(pattern.matches(logOutput))
+  }
+
+  test("Structured logging with MDC") {
+    val logOutput = captureLogOutput(() => logError(log"Lost executor ${MDC(EXECUTOR_ID, "1")}."))
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern1 = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"Lost executor 1.","context":\\{"executor_id":"1"},"logger":"$className"}\n""".r
+    // scalastyle:on
+    assert(pattern1.matches(logOutput))
+  }
+
+  test("Structured exception logging with MDC") {
+    val exception = new RuntimeException("OOM")
+    val logOutput = captureLogOutput(() =>
+      logError(log"Error in executor ${MDC(EXECUTOR_ID, "1")}.", exception))
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"Error in executor 1.","context":\\{"executor_id":"1"},"exception":\\{"class":"java.lang.RuntimeException","msg":"OOM","stacktrace":.*},"logger":"$className"}\n""".r

Review Comment:
   1. you should be deserializing this rather than just doing pattern matching, then validate the contents (a sketch follows below)
   2. and verify that on nested exceptions, you get both messages and stack traces
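
   e.g., a rough sketch of point 1 with Jackson (already a Spark dependency); the field names follow the test pattern above, and `logOutput` is the captured record:
   ```
   // Sketch only: deserialize the emitted record and assert on its fields.
   import com.fasterxml.jackson.databind.ObjectMapper

   val mapper = new ObjectMapper()
   val node = mapper.readTree(logOutput)
   assert(node.get("level").asText() == "ERROR")
   assert(node.get("context").get("executor_id").asText() == "1")
   val exc = node.get("exception")
   assert(exc.get("class").asText() == "java.lang.RuntimeException")
   assert(exc.get("msg").asText() == "OOM")
   assert(exc.get("stacktrace") != null) // presence check; nested causes need their own assertions
   ```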





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1544097323


##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -228,6 +228,11 @@ private[spark] class SparkSubmit extends Logging {
     val childClasspath = new ArrayBuffer[String]()
     val sparkConf = args.toSparkConf()
     if (sparkConf.contains("spark.local.connect")) sparkConf.remove("spark.remote")
+    if (sparkConf.getBoolean(STRUCTURED_LOGGING_ENABLED.key, defaultValue = true)) {

Review Comment:
   does this work for spark-shell as well? And thriftserver?





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1546593404


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -141,6 +141,16 @@ package object config {
         "Ensure that memory overhead is a double greater than 0")
       .createWithDefault(0.1)
 
+  private[spark] val STRUCTURED_LOGGING_ENABLED =
+    ConfigBuilder("spark.log.structuredLogging.enabled")

Review Comment:
   Thanks, I just created https://issues.apache.org/jira/browse/SPARK-47671 for this.





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "tgravescs (via GitHub)" <gi...@apache.org>.
tgravescs commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1546337511


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -141,6 +141,16 @@ package object config {
         "Ensure that memory overhead is a double greater than 0")
       .createWithDefault(0.1)
 
+  private[spark] val STRUCTURED_LOGGING_ENABLED =
+    ConfigBuilder("spark.log.structuredLogging.enabled")

Review Comment:
   this config needs to be documented in `configuration.md`





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540319742


##########
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r

Review Comment:
   The output is always a single line of JSON. https://logging.apache.org/log4j/2.x/manual/json-template-layout.html doesn't support pretty printing. Also, parsing multi-line JSON is slower than parsing a single line.





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1544095644


##########
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.File
+import java.nio.file.Files
+
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase extends AnyFunSuite // scalastyle:ignore funsuite

Review Comment:
   why can't we use `SparkFunSuite`?





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1544763733


##########
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r

Review Comment:
   @gengliangwang I know, I was talking about the unit test case, not the production JSON output itself. It is fine that the production JSON record is on one line. I am suggesting that we format the JSON in the unit tests in a more readable way, and then ignore whitespace when comparing the actual JSON record against the expected one in each test case.
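
   A rough sketch of what I mean (the helper name is made up); parsing both sides makes the comparison whitespace-insensitive, so the expected JSON can be a readable multi-line literal. The variable `ts` field would need to be dropped or masked first:
   ```
   import com.fasterxml.jackson.databind.ObjectMapper
   import com.fasterxml.jackson.databind.node.ObjectNode

   val mapper = new ObjectMapper()

   // Compare parsed JSON trees instead of raw strings.
   def assertSameJsonIgnoringTs(actual: String, expected: String): Unit = {
     def parse(s: String): ObjectNode = {
       val node = mapper.readTree(s).asInstanceOf[ObjectNode]
       node.remove("ts") // the timestamp differs on every run
       node
     }
     assert(parse(actual) == parse(expected))
   }
   ```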





Re: [PR] [SPARK-47574][CORE] Introduce Structured Logging Framework [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #45729:
URL: https://github.com/apache/spark/pull/45729#issuecomment-2038263897

   @gengliangwang thanks for the clarification, that makes sense.




Re: [PR] [SPARK-47574][CORE] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on PR #45729:
URL: https://github.com/apache/spark/pull/45729#issuecomment-2038127812

   > Seems it does not get implemented in this PR; in the migration PRs, we still inject the Spark identifiers like APP_ID manually in each message.
   
   This will be done.
   The current migration is still necessary; for example, there are logs about executors/workers on the driver.
   
   
   > Another question: as we use the `enum LogKey` to track all known MDC keys, is it possible to inject custom keys?
   
   It is possible: developers can either use ThreadContext from Log4j or create a customized MessageWithContext entry:
   ```
   case class MessageWithContext(message: String, context: java.util.HashMap[String, String])
   ```
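
   For the ThreadContext route, a rough sketch (the custom key name is made up, and this assumes the JSON layout serializes the full thread context map):
   ```
   import org.apache.logging.log4j.CloseableThreadContext

   // Sketch: inject a custom key that is not in the LogKey enum.
   val ctx = CloseableThreadContext.put("node_label", "gpu-pool") // hypothetical label
   try {
     logError("Node failed health check") // the record would carry context: {"node_label": "gpu-pool"}
   } finally {
     ctx.close()
   }
   ```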




Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1544105135


##########
core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:
##########
@@ -228,6 +228,11 @@ private[spark] class SparkSubmit extends Logging {
     val childClasspath = new ArrayBuffer[String]()
     val sparkConf = args.toSparkConf()
     if (sparkConf.contains("spark.local.connect")) sparkConf.remove("spark.remote")
+    if (sparkConf.getBoolean(STRUCTURED_LOGGING_ENABLED.key, defaultValue = true)) {

Review Comment:
   Yes





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1542390853


##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -55,6 +91,43 @@ trait Logging {
     log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): MessageWithContext = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {
+          case mdc: MDC =>
+            sb.append(mdc.value)
+            if (Logging.isStructuredLoggingEnabled) {
+              map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+            }
+          case other =>
+            // Note: all the arguments are supposed to be MDCs, but we only throw an exception
+            //       if we are running in test mode. This is to avoid breaking existing code.

Review Comment:
   Thanks, this is a great suggestion!



##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -190,6 +281,7 @@ private[spark] object Logging {
   @volatile private var initialized = false
   @volatile private var defaultRootLevel: Level = null
   @volatile private var defaultSparkLog4jConfig = false
+  @volatile private var useStructuredLogging = true

Review Comment:
   Sure, updated.





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540312938


##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -55,6 +77,43 @@ trait Logging {
     log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): (String, Option[Instance]) = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {

Review Comment:
   There is another code block at the end of the loop:
   ```
           if (processedParts.hasNext) {
             sb.append(processedParts.next())
           }
   ```
   I am trying to build both the map and the string in one loop.
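
   Putting the fragments together, the whole loop looks roughly like this (a simplified, self-contained sketch; `MessageWithContext` is the helper case class mentioned elsewhere in this thread, and the real implementation differs in details):
   ```
   import java.util.Locale

   object LogKey extends Enumeration { val EXECUTOR_ID = Value }
   case class MDC(key: LogKey.Value, value: String)
   case class MessageWithContext(message: String, context: java.util.HashMap[String, String])

   object InterpolatorSketch {
     implicit class LogStringContext(val sc: StringContext) {
       def log(args: Any*): MessageWithContext = {
         val processedParts = sc.parts.iterator
         val sb = new StringBuilder(processedParts.next())
         val map = new java.util.HashMap[String, String]()

         args.foreach { arg =>
           arg match {
             case mdc: MDC =>
               sb.append(mdc.value) // the MDC value stays inline in the message
               map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
             case other =>
               throw new IllegalArgumentException(s"Argument $other is not a MDC")
           }
           // the block quoted above: append the literal part after this argument
           if (processedParts.hasNext) {
             sb.append(processedParts.next())
           }
         }
         MessageWithContext(sb.toString(), map)
       }
     }
   }
   // usage: import InterpolatorSketch._; log"Lost executor ${MDC(LogKey.EXECUTOR_ID, "1")}."
   ```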





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540500697


##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -55,6 +91,43 @@ trait Logging {
     log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): MessageWithContext = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {
+          case mdc: MDC =>
+            sb.append(mdc.value)
+            if (Logging.isStructuredLoggingEnabled) {
+              map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+            }
+          case other =>
+            // Note: all the arguments are supposed to be MDCs, but we only throw an exception
+            //       if we are running in test mode. This is to avoid breaking existing code.

Review Comment:
   Shall we change `args: Any*` to `args: MDC*`? So we can avoid the match path.





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1542066261


##########
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r
+    // scalastyle:on
+    assert(pattern.matches(logOutput))

Review Comment:
   Regarding the test case readability, I am wondering if we could at least put the value of `logOutput` as a comment here, with newlines and whitespace inserted for better readability, so we can read the pattern and then quickly read the value in the comment to understand what this test case does?





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1542066757


##########
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r
+    // scalastyle:on
+    assert(pattern.matches(logOutput))

Review Comment:
   Please ignore this if the value is not stable, since whatever we put in the comment would become stale very soon.





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540084201


##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -55,6 +76,37 @@ trait Logging {
     log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): (String, Instance) = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {
+          case mdc: MDC =>
+            sb.append(mdc.value)
+            if (Logging.isStructuredLoggingEnabled) {
+              map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+            }
+          case other =>
+            throw new IllegalArgumentException(s"Argument $other is not a MDC")

Review Comment:
   Do we need a SparkException and an error class here?



##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -55,6 +76,37 @@ trait Logging {
     log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): (String, Instance) = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {
+          case mdc: MDC =>
+            sb.append(mdc.value)
+            if (Logging.isStructuredLoggingEnabled) {
+              map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+            }
+          case other =>
+            throw new IllegalArgumentException(s"Argument $other is not a MDC")
+        }
+        if (processedParts.hasNext) {
+          sb.append(processedParts.next())
+        }
+      }
+
+      // Create a CloseableThreadContext and apply the context map
+      val closeableContext = if (Logging.isStructuredLoggingEnabled) {
+        CloseableThreadContext.putAll(map)
+      } else {
+        null

Review Comment:
   Why choose to return null rather than an empty `CloseableThreadContext`, which could better mitigate NPEs?
   
   How will callers deal with this null in the existing implementation?
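
   For comparison, with an `Option[Instance]` (which a later revision in this thread adopts), callers could avoid null entirely. Roughly (helper name made up):
   ```
   import org.apache.logging.log4j.CloseableThreadContext

   // Sketch: Option instead of null; close the context only if present.
   def withLogContext(ctx: Option[CloseableThreadContext.Instance])(body: => Unit): Unit = {
     try body finally ctx.foreach(_.close()) // no-op when structured logging is disabled
   }
   ```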





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540294855


##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -17,17 +17,39 @@
 
 package org.apache.spark.internal
 
+import java.util.Locale
+
 import scala.jdk.CollectionConverters._
 
-import org.apache.logging.log4j.{Level, LogManager}
+import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager}
+import org.apache.logging.log4j.CloseableThreadContext.Instance
 import org.apache.logging.log4j.core.{Filter, LifeCycle, LogEvent, Logger => Log4jLogger, LoggerContext}
 import org.apache.logging.log4j.core.appender.ConsoleAppender
 import org.apache.logging.log4j.core.config.DefaultConfiguration
 import org.apache.logging.log4j.core.filter.AbstractFilter
 import org.slf4j.{Logger, LoggerFactory}
 
+import org.apache.spark.SparkException
 import org.apache.spark.internal.Logging.SparkShellLoggingFilter
-import org.apache.spark.util.SparkClassUtils
+import org.apache.spark.util.{SparkClassUtils, SparkEnvUtils}
+
+// Mapped Diagnostic Context (MDC) that will be used in log messages.
+// The values of the MDC will be inline in the log message, while the key-value pairs will be
+// part of the ThreadContext.
+case class MDC(key: LogKey.Value, value: String)

Review Comment:
   And we are putting the class name into the log message. I am trying to avoid a long class name here.





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1545654750


##########
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r

Review Comment:
   @dtenedor @gengliangwang 
   Considering the readability of the JSON format in UTs, I submitted a PR to improve it:
   https://github.com/apache/spark/pull/45784
   
   





Re: [PR] [SPARK-47574][CORE] Introduce Structured Logging Framework [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #45729:
URL: https://github.com/apache/spark/pull/45729#issuecomment-2038083361

   @gengliangwang I see the SPIP docs say
   
   > Spark identifiers (e.g., query ID, executor ID, task ID) will be [tagged using ThreadContext](https://logging.apache.org/log4j/2.x/manual/thread-context.html#fish-tagging), e.g., ThreadContext.set(EXECUTOR_ID, executorId).
   
   Seems it does not get implemented in this PR; in the migration PRs, we still inject the Spark identifiers like APP_ID manually in each message. Another question: as we use the `enum LogKey` to track all known MDC keys, is it possible to inject custom keys? For example, users may have custom labels on the Spark nodes and also want to aggregate logs by those custom labels.




Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1544104949


##########
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.File
+import java.nio.file.Files
+
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase extends AnyFunSuite // scalastyle:ignore funsuite

Review Comment:
   This test suite is under the common/utils module and can't import `SparkFunSuite`.





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540233859


##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -17,17 +17,39 @@
 
 package org.apache.spark.internal
 
+import java.util.Locale
+
 import scala.jdk.CollectionConverters._
 
-import org.apache.logging.log4j.{Level, LogManager}
+import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager}
+import org.apache.logging.log4j.CloseableThreadContext.Instance
 import org.apache.logging.log4j.core.{Filter, LifeCycle, LogEvent, Logger => Log4jLogger, LoggerContext}
 import org.apache.logging.log4j.core.appender.ConsoleAppender
 import org.apache.logging.log4j.core.config.DefaultConfiguration
 import org.apache.logging.log4j.core.filter.AbstractFilter
 import org.slf4j.{Logger, LoggerFactory}
 
+import org.apache.spark.SparkException
 import org.apache.spark.internal.Logging.SparkShellLoggingFilter
-import org.apache.spark.util.SparkClassUtils
+import org.apache.spark.util.{SparkClassUtils, SparkEnvUtils}
+
+// Mapped Diagnostic Context (MDC) that will be used in log messages.
+// The values of the MDC will be inline in the log message, while the key-value pairs will be
+// part of the ThreadContext.
+case class MDC(key: LogKey.Value, value: String)

Review Comment:
   This would probably be more readable fully spelled out (`MappedDiagnosticContext`) rather than as an acronym, no?



##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -55,6 +77,43 @@ trait Logging {
     log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): (String, Option[Instance]) = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {

Review Comment:
   You can skip this line and the `arg =>` at the end of the previous line, and just list the cases directly instead.
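   For reference, the suggested shape is a partial-function literal; a minimal sketch based on the diff above (the `processedParts` append that follows the match would then move into the MDC case, which is safe because the other case throws):
   
   ```
   args.foreach {
     case mdc: MDC =>
       sb.append(mdc.value)
       if (Logging.isStructuredLoggingEnabled) {
         map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
       }
       if (processedParts.hasNext) {
         sb.append(processedParts.next())
       }
     case other =>
       throw new IllegalArgumentException(s"Argument $other is not a MDC")
   }
   ```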



##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -17,17 +17,39 @@
 
 package org.apache.spark.internal
 
+import java.util.Locale
+
 import scala.jdk.CollectionConverters._
 
-import org.apache.logging.log4j.{Level, LogManager}
+import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager}
+import org.apache.logging.log4j.CloseableThreadContext.Instance
 import org.apache.logging.log4j.core.{Filter, LifeCycle, LogEvent, Logger => Log4jLogger, LoggerContext}
 import org.apache.logging.log4j.core.appender.ConsoleAppender
 import org.apache.logging.log4j.core.config.DefaultConfiguration
 import org.apache.logging.log4j.core.filter.AbstractFilter
 import org.slf4j.{Logger, LoggerFactory}
 
+import org.apache.spark.SparkException
 import org.apache.spark.internal.Logging.SparkShellLoggingFilter
-import org.apache.spark.util.SparkClassUtils
+import org.apache.spark.util.{SparkClassUtils, SparkEnvUtils}
+
+// Mapped Diagnostic Context (MDC) that will be used in log messages.
+// The values of the MDC will be inline in the log message, while the key-value pairs will be
+// part of the ThreadContext.
+case class MDC(key: LogKey.Value, value: String)
+
+class LogEntry(entry: => (String, Option[Instance])) {

Review Comment:
   This pair of `(String, Option[Instance])` might be more readable as a helper case class, since we return it frequently.
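   For illustration, a minimal sketch of such a helper; the name `MessageWithContext` is made up here:
   
   ```
   // A named pair reads better at call sites than a raw (String, Option[Instance]).
   case class MessageWithContext(message: String, context: Option[Instance])
   
   class LogEntry(entry: => MessageWithContext)
   ```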



##########
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##########
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.internal
+
+object LogKey extends Enumeration {

Review Comment:
   Let's add a comment saying what this represents?
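   For example, something along these lines (a sketch, not necessarily the committed wording); `EXECUTOR_ID` is the one key visible elsewhere in this PR:
   
   ```
   /**
    * Various keys used for mapped diagnostic contexts (MDC) in logging.
    * All structured logging keys should be defined here for standardization.
    */
   object LogKey extends Enumeration {
     val EXECUTOR_ID = Value
   }
   ```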



##########
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r

Review Comment:
   Thanks for adding this test suite. It looks like it will become the main means of verifying that the structured logging framework works as intended as we develop it. I know we want to regex out spurious text in the result here, but the set of expected test cases might be easier to read if we keep the whitespace formatting in each expected result. Then, when we compare expected results against actual results, we can strip whitespace from both. For example, this becomes something like:
   
   ```
     val x =
       s"""
         |{
         |  "ts": [^"]+,
         |  "level": "ERROR",
         |  "msg": "This is a log message",
         |  "logger": $className
         |}
         |""".stripMargin
   ```
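   
   The comparison step this implies could be a sketch like the following; `compact` is a hypothetical helper, and regex metacharacters such as `{` would still need escaping in the expected string:
   
   ```
   // Strip all whitespace from both sides before matching, so the expected
   // pattern can stay indented for readability.
   private def compact(s: String): String = s.replaceAll("\\s", "")
   
   assert(compact(expectedPattern).r.matches(compact(logOutput)))
   ```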





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540294626


##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -17,17 +17,39 @@
 
 package org.apache.spark.internal
 
+import java.util.Locale
+
 import scala.jdk.CollectionConverters._
 
-import org.apache.logging.log4j.{Level, LogManager}
+import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager}
+import org.apache.logging.log4j.CloseableThreadContext.Instance
 import org.apache.logging.log4j.core.{Filter, LifeCycle, LogEvent, Logger => Log4jLogger, LoggerContext}
 import org.apache.logging.log4j.core.appender.ConsoleAppender
 import org.apache.logging.log4j.core.config.DefaultConfiguration
 import org.apache.logging.log4j.core.filter.AbstractFilter
 import org.slf4j.{Logger, LoggerFactory}
 
+import org.apache.spark.SparkException
 import org.apache.spark.internal.Logging.SparkShellLoggingFilter
-import org.apache.spark.util.SparkClassUtils
+import org.apache.spark.util.{SparkClassUtils, SparkEnvUtils}
+
+// Mapped Diagnostic Context (MDC) that will be used in log messages.
+// The values of the MDC will be inline in the log message, while the key-value pairs will be
+// part of the ThreadContext.
+case class MDC(key: LogKey.Value, value: String)

Review Comment:
   This is a well-known term: https://logging.apache.org/log4j/1.x/apidocs/org/apache/log4j/MDC.html





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540122275


##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -55,6 +76,37 @@ trait Logging {
     log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): (String, Instance) = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {
+          case mdc: MDC =>
+            sb.append(mdc.value)
+            if (Logging.isStructuredLoggingEnabled) {
+              map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+            }
+          case other =>
+            throw new IllegalArgumentException(s"Argument $other is not a MDC")

Review Comment:
   Good point. I am changing it to an error that only happens during testing, so I don't think we need an error class here.
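   Presumably something along these lines, given the `SparkEnvUtils` import in the diff; this is a sketch, and the production fallback shown is an assumption rather than the committed behavior:
   
   ```
   case other =>
     // Fail fast in tests to catch non-MDC arguments early.
     if (SparkEnvUtils.isTesting) {
       throw new IllegalArgumentException(s"Argument $other is not a MDC")
     } else {
       // Assumed production fallback: inline the value instead of failing.
       sb.append(other.toString)
     }
   ```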





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540335763


##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -17,17 +17,39 @@
 
 package org.apache.spark.internal
 
+import java.util.Locale
+
 import scala.jdk.CollectionConverters._
 
-import org.apache.logging.log4j.{Level, LogManager}
+import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager}
+import org.apache.logging.log4j.CloseableThreadContext.Instance
 import org.apache.logging.log4j.core.{Filter, LifeCycle, LogEvent, Logger => Log4jLogger, LoggerContext}
 import org.apache.logging.log4j.core.appender.ConsoleAppender
 import org.apache.logging.log4j.core.config.DefaultConfiguration
 import org.apache.logging.log4j.core.filter.AbstractFilter
 import org.slf4j.{Logger, LoggerFactory}
 
+import org.apache.spark.SparkException
 import org.apache.spark.internal.Logging.SparkShellLoggingFilter
-import org.apache.spark.util.SparkClassUtils
+import org.apache.spark.util.{SparkClassUtils, SparkEnvUtils}
+
+// Mapped Diagnostic Context (MDC) that will be used in log messages.
+// The values of the MDC will be inline in the log message, while the key-value pairs will be
+// part of the ThreadContext.
+case class MDC(key: LogKey.Value, value: String)
+
+class LogEntry(entry: => (String, Option[Instance])) {

Review Comment:
   Updated. I tried to avoid creating more internal classes, but the cost should be trivial anyway.





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540335898


##########
common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:
##########
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.internal
+
+object LogKey extends Enumeration {

Review Comment:
   Thanks, updated





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on PR #45729:
URL: https://github.com/apache/spark/pull/45729#issuecomment-2021250316

   cc @steveloughran @dtenedor @bart-samwel as well




Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1540122993


##########
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##########
@@ -55,6 +76,37 @@ trait Logging {
     log_
   }
 
+  implicit class LogStringContext(val sc: StringContext) {
+    def log(args: Any*): (String, Instance) = {
+      val processedParts = sc.parts.iterator
+      val sb = new StringBuilder(processedParts.next())
+      lazy val map = new java.util.HashMap[String, String]()
+
+      args.foreach { arg =>
+        arg match {
+          case mdc: MDC =>
+            sb.append(mdc.value)
+            if (Logging.isStructuredLoggingEnabled) {
+              map.put(mdc.key.toString.toLowerCase(Locale.ROOT), mdc.value)
+            }
+          case other =>
+            throw new IllegalArgumentException(s"Argument $other is not a MDC")
+        }
+        if (processedParts.hasNext) {
+          sb.append(processedParts.next())
+        }
+      }
+
+      // Create a CloseableThreadContext and apply the context map
+      val closeableContext = if (Logging.isStructuredLoggingEnabled) {
+        CloseableThreadContext.putAll(map)
+      } else {
+        null

Review Comment:
   This is to avoid the extra overhead of creating a `CloseableThreadContext` instance. I am changing this to `None` for safety.
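   With `None`, the tail of the interpolator becomes something like the following sketch, matching the `(String, Option[Instance])` return type shown in the later revision of this diff:
   
   ```
   // Only create a CloseableThreadContext when structured logging is enabled;
   // otherwise there is no context map worth attaching.
   val closeableContext = if (Logging.isStructuredLoggingEnabled) {
     Some(CloseableThreadContext.putAll(map))
   } else {
     None
   }
   (sb.toString, closeableContext)
   ```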





Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1542066261


##########
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r
+    // scalastyle:on
+    assert(pattern.matches(logOutput))

Review Comment:
   Regarding test case readability, I am wondering if we could at least put the value of `logOutput` as a comment here, with newlines and whitespace inserted for better readability, so we can read the pattern and then quickly read the value in the comment to understand what this test case does?
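   For example, the comment could show the (normally single-line) output pretty-printed, along these lines; the timestamp value is illustrative:
   
   ```
   // Example logOutput, reformatted with newlines and indentation for readability
   // (the actual output is a single JSON line):
   // {
   //   "ts": "2024-03-26T18:58:03.123-0700",
   //   "level": "ERROR",
   //   "msg": "This is a log message",
   //   "logger": "org.apache.spark.util.StructuredLoggingSuite"
   // }
   ```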






Re: [PR] [SPARK-47574][INFRA] Introduce Structured Logging Framework [spark]

Posted by "gengliangwang (via GitHub)" <gi...@apache.org>.
gengliangwang commented on code in PR #45729:
URL: https://github.com/apache/spark/pull/45729#discussion_r1542391302


##########
common/utils/src/test/scala/org/apache/spark/util/StructuredLoggingSuite.scala:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+import java.io.{ByteArrayOutputStream, PrintStream}
+
+import org.apache.commons.io.output.TeeOutputStream
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+
+import org.apache.spark.internal.{Logging, MDC}
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+
+abstract class LoggingSuiteBase
+    extends AnyFunSuite // scalastyle:ignore funsuite
+    with BeforeAndAfterAll
+    with BeforeAndAfterEach
+    with Logging {
+  protected val outContent = new ByteArrayOutputStream()
+  protected val originalErr = System.err
+
+  override def beforeAll(): Unit = {
+    val teeStream = new TeeOutputStream(originalErr, outContent)
+    System.setErr(new PrintStream(teeStream))
+  }
+
+  override def afterAll(): Unit = {
+    System.setErr(originalErr)
+  }
+
+  override def afterEach(): Unit = {
+    outContent.reset()
+  }
+}
+
+class StructuredLoggingSuite extends LoggingSuiteBase {
+  val className = this.getClass.getName.stripSuffix("$")
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    Logging.enableStructuredLogging()
+  }
+
+  test("Structured logging") {
+    val msg = "This is a log message"
+    logError(msg)
+
+    val logOutput = outContent.toString.split("\n").filter(_.contains(msg)).head
+    assert(logOutput.nonEmpty)
+    // scalastyle:off line.size.limit
+    val pattern = s"""\\{"ts":"[^"]+","level":"ERROR","msg":"This is a log message","logger":"$className"}""".r
+    // scalastyle:on
+    assert(pattern.matches(logOutput))

Review Comment:
   I got the idea. Please check the latest test.


