You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "ramesh-muthusamy (via GitHub)" <gi...@apache.org> on 2023/08/09 17:19:04 UTC

[GitHub] [spark] ramesh-muthusamy opened a new pull request, #42416: [SPARK-44741][Metrics]Spark StatsD metrics reporter to support metrics filter option

ramesh-muthusamy opened a new pull request, #42416:
URL: https://github.com/apache/spark/pull/42416

   
   
   ### What changes were proposed in this pull request?
   
   Adding additional option in StatsdSink to allow passing regex as a filter option for reporting metrics
   Added unit test case to validate the application of filter in the StatsdSink
   Updated metrics config to represent the added config for StatsdSink
   
   ### Why are the changes needed?
   
   In the current state Spark metrics instances send a large range of metrics that are useful for debugging and performance analysis. There are cases where the consumers would like to switch between deeper metrics and only specific custom metrics that are required for system maintenance. This option introduced in StatsdSink extends the existing filtering option in the reporter and exposes the same as an option. This implementation is similar to GraphiteSink that supports a similar feature.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes , This PR exposes an additional config that users can provide optionally to filter the metrics that are being reported. This is a optional field and defaulted to the existing behaviour where the sink reports all the metrics.
   
   
   ### How was this patch tested?
   
   Unit tests has been added in the PR to validate the change
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42416: [SPARK-44741][CORE] Spark StatsD metrics reporter to support metrics filter option

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #42416:
URL: https://github.com/apache/spark/pull/42416#discussion_r1289211807


##########
core/src/main/scala/org/apache/spark/metrics/sink/StatsdSink.scala:
##########
@@ -43,6 +44,8 @@ private[spark] class StatsdSink(
     val property: Properties, val registry: MetricRegistry) extends Sink with Logging {
   import StatsdSink._
 
+  def propertyToOption(prop: String): Option[String] = Option(property.getProperty(prop))

Review Comment:
   Since this is used once, we had better remove this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on pull request #42416: [SPARK-44741][Metrics]Spark StatsD metrics reporter to support metrics filter option

Posted by "dbtsai (via GitHub)" <gi...@apache.org>.
dbtsai commented on PR #42416:
URL: https://github.com/apache/spark/pull/42416#issuecomment-1671878698

   Cc @dongjoon-hyun for review 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42416: [SPARK-44741][CORE] Spark StatsD metrics reporter to support metrics filter option

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #42416:
URL: https://github.com/apache/spark/pull/42416#discussion_r1289212677


##########
core/src/main/scala/org/apache/spark/metrics/sink/StatsdSink.scala:
##########
@@ -53,9 +56,18 @@ private[spark] class StatsdSink(
 
   val prefix = property.getProperty(STATSD_KEY_PREFIX, STATSD_DEFAULT_PREFIX)
 
+  val filter = propertyToOption(STATSD_KEY_REGEX) match {

Review Comment:
   And, maybe the following?
   ```scala
   - val filter = propertyToOption(STATSD_KEY_REGEX) match {
   + val filter = Option(property.getProperty(STATSD_KEY_REGEX)) match {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ramesh-muthusamy commented on a diff in pull request #42416: [SPARK-44741][CORE] Spark StatsD metrics reporter to support metrics filter option

Posted by "ramesh-muthusamy (via GitHub)" <gi...@apache.org>.
ramesh-muthusamy commented on code in PR #42416:
URL: https://github.com/apache/spark/pull/42416#discussion_r1289131446


##########
core/src/test/scala/org/apache/spark/metrics/sink/StatsdSinkSuite.scala:
##########
@@ -171,5 +174,32 @@ class StatsdSinkSuite extends SparkFunSuite {
       }
     }
   }
+
+
+  test("metrics StatsD sink with filtered Gauge") {
+    withSocketAndSink { (socket, sink) =>
+      val gauge = new Gauge[Double] {
+        override def getValue: Double = 1.23
+      }
+
+      val filteredMetricKeys = Set(
+        "test-123.driver.CodeGenerator.generatedMethodSize",
+        "test-123.driver.CodeGenerator.generatedMethodSize"
+      )
+
+      filteredMetricKeys.foreach(sink.registry.register(_, gauge))
+
+      sink.registry.register("test-123.driver.spark.streaming.local.latency", gauge)
+      sink.report()
+
+      val p = new DatagramPacket(new Array[Byte](socketBufferSize), socketBufferSize)

Review Comment:
   @dongjoon-hyun  resolved the issue, these propped up from the rebase that I did. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42416: [SPARK-44741][CORE] Spark StatsD metrics reporter to support metrics filter option

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #42416:
URL: https://github.com/apache/spark/pull/42416#discussion_r1289208286


##########
core/src/test/scala/org/apache/spark/metrics/sink/StatsdSinkSuite.scala:
##########
@@ -171,5 +174,32 @@ class StatsdSinkSuite extends SparkFunSuite {
       }
     }
   }
+
+
+  test("metrics StatsD sink with filtered Gauge") {
+    withSocketAndSink { (socket, sink) =>
+      val gauge = new Gauge[Double] {
+        override def getValue: Double = 1.23
+      }
+
+      val filteredMetricKeys = Set(
+        "test-123.driver.CodeGenerator.generatedMethodSize",
+        "test-123.driver.CodeGenerator.generatedMethodSize"
+      )
+
+      filteredMetricKeys.foreach(sink.registry.register(_, gauge))
+
+      sink.registry.register("test-123.driver.spark.streaming.local.latency", gauge)
+      sink.report()
+
+      val p = new DatagramPacket(new Array[Byte](socketBufferSize), socketBufferSize)

Review Comment:
   Thank you for updating.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #42416: [SPARK-44741][Metrics]Spark StatsD metrics reporter to support metrics filter option

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #42416:
URL: https://github.com/apache/spark/pull/42416#issuecomment-1671895210

   Sure. Thank you, @dbtsai and @ramesh-muthusamy .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42416: [SPARK-44741][CORE] Support regex-based MetricFilter in `StatsdSink`

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #42416:
URL: https://github.com/apache/spark/pull/42416#discussion_r1289383886


##########
core/src/test/scala/org/apache/spark/metrics/sink/StatsdSinkSuite.scala:
##########
@@ -59,6 +61,7 @@ class StatsdSinkSuite extends SparkFunSuite {
     val props = new Properties
     defaultProps.foreach(e => props.put(e._1, e._2))
     props.put(STATSD_KEY_PORT, socket.getLocalPort.toString)
+    props.put(STATSD_KEY_REGEX, "test-[0-9]+.driver.(CodeGenerator|BlockManager)")

Review Comment:
   This seems to break all the other test cases except the newly added one. Could you run this whole test suite locally and fix the bug?
   ```
   [info] *** 4 TESTS FAILED ***
   [error] Failed: Total 3673, Failed 4, Errors 0, Passed 3669, Ignored 10, Canceled 4
   [error] Failed tests:
   [error] 	org.apache.spark.metrics.sink.StatsdSinkSuite
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42416: [SPARK-44741][CORE] Spark StatsD metrics reporter to support metrics filter option

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #42416:
URL: https://github.com/apache/spark/pull/42416#discussion_r1289041739


##########
core/src/test/scala/org/apache/spark/metrics/sink/StatsdSinkSuite.scala:
##########
@@ -171,5 +174,32 @@ class StatsdSinkSuite extends SparkFunSuite {
       }
     }
   }
+
+
+  test("metrics StatsD sink with filtered Gauge") {
+    withSocketAndSink { (socket, sink) =>
+      val gauge = new Gauge[Double] {
+        override def getValue: Double = 1.23
+      }
+
+      val filteredMetricKeys = Set(
+        "test-123.driver.CodeGenerator.generatedMethodSize",
+        "test-123.driver.CodeGenerator.generatedMethodSize"
+      )
+
+      filteredMetricKeys.foreach(sink.registry.register(_, gauge))
+
+      sink.registry.register("test-123.driver.spark.streaming.local.latency", gauge)
+      sink.report()
+
+      val p = new DatagramPacket(new Array[Byte](socketBufferSize), socketBufferSize)

Review Comment:
   Could you fix the compilation error, @ramesh-muthusamy ?
   ```
   [info] compiling 330 Scala sources and 29 Java sources to /home/runner/work/spark/spark/core/target/scala-2.12/test-classes ...
   [error] /home/runner/work/spark/spark/core/src/test/scala/org/apache/spark/metrics/sink/StatsdSinkSuite.scala:195:50: not found: value socketBufferSize
   [error]       val p = new DatagramPacket(new Array[Byte](socketBufferSize), socketBufferSize)
   [error]                                                  ^
   [error] /home/runner/work/spark/spark/core/src/test/scala/org/apache/spark/metrics/sink/StatsdSinkSuite.scala:195:69: not found: value socketBufferSize
   [error]       val p = new DatagramPacket(new Array[Byte](socketBufferSize), socketBufferSize)
   [error]                                                                     ^
   [error] two errors found
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ramesh-muthusamy commented on a diff in pull request #42416: [SPARK-44741][CORE] Support regex-based MetricFilter in `StatsdSink`

Posted by "ramesh-muthusamy (via GitHub)" <gi...@apache.org>.
ramesh-muthusamy commented on code in PR #42416:
URL: https://github.com/apache/spark/pull/42416#discussion_r1289513072


##########
core/src/test/scala/org/apache/spark/metrics/sink/StatsdSinkSuite.scala:
##########
@@ -59,6 +61,7 @@ class StatsdSinkSuite extends SparkFunSuite {
     val props = new Properties
     defaultProps.foreach(e => props.put(e._1, e._2))
     props.put(STATSD_KEY_PORT, socket.getLocalPort.toString)
+    props.put(STATSD_KEY_REGEX, "test-[0-9]+.driver.(CodeGenerator|BlockManager)")

Review Comment:
   All of them were timing out locally, I wondered if that was to do with the statsd setup. In this case looks like the timeout is happening since the test keeps waiting for the result and filter is blocking the message from being sent. I confirmed this behaviour locally and have resolved the same. The StatsdTestSuite now  runs successfully locally. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #42416: [SPARK-44741][CORE] Support regex-based MetricFilter in `StatsdSink`

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #42416: [SPARK-44741][CORE] Support regex-based MetricFilter in `StatsdSink`
URL: https://github.com/apache/spark/pull/42416


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ramesh-muthusamy commented on a diff in pull request #42416: [SPARK-44741][CORE] Support regex-based MetricFilter in `StatsdSink`

Posted by "ramesh-muthusamy (via GitHub)" <gi...@apache.org>.
ramesh-muthusamy commented on code in PR #42416:
URL: https://github.com/apache/spark/pull/42416#discussion_r1289239353


##########
core/src/main/scala/org/apache/spark/metrics/sink/StatsdSink.scala:
##########
@@ -43,6 +44,8 @@ private[spark] class StatsdSink(
     val property: Properties, val registry: MetricRegistry) extends Sink with Logging {
   import StatsdSink._
 
+  def propertyToOption(prop: String): Option[String] = Option(property.getProperty(prop))

Review Comment:
   yes makes sense 



##########
core/src/main/scala/org/apache/spark/metrics/sink/StatsdSink.scala:
##########
@@ -53,9 +56,18 @@ private[spark] class StatsdSink(
 
   val prefix = property.getProperty(STATSD_KEY_PREFIX, STATSD_DEFAULT_PREFIX)
 
+  val filter = propertyToOption(STATSD_KEY_REGEX) match {

Review Comment:
   agreed, updated 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org