Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/04/28 05:32:46 UTC

[GitHub] [kafka] cmccabe opened a new pull request #8569: KIP-551: Expose disk read and write metrics

cmccabe opened a new pull request #8569:
URL: https://github.com/apache/kafka/pull/8569


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mumrah commented on a change in pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
mumrah commented on a change in pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#discussion_r425443889



##########
File path: core/src/main/scala/kafka/metrics/LinuxIoMetricsCollector.scala
##########
@@ -0,0 +1,102 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package kafka.server
+
+import java.nio.file.{Files, Paths}
+
+import org.apache.kafka.common.utils.Time
+import org.slf4j.Logger
+
+import scala.jdk.CollectionConverters._
+
+/**
+ * Retrieves Linux /proc/self/io metrics.
+ */
+class LinuxIoMetricsCollector(procRoot: String, val time: Time, val logger: Logger) {
+  import LinuxIoMetricsCollector._
+  var lastUpdateMs = -1L
+  var cachedReadBytes = 0L
+  var cachedWriteBytes = 0L
+  val path = Paths.get(procRoot, "self", "io")
+
+  def readBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()

Review comment:
       Minor nit: could the time check move into the `updateValues` method, since that's where `lastUpdateMs` is set?
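
One way to read this suggestion is sketched below. It is a minimal illustration, not the PR's code: `CachedIoCounters`, `maybeUpdate`, and the injected `nowMs` and `load` functions are hypothetical names introduced so that the example runs without Kafka's `Time` class or a real `/proc` file.

```scala
// Hypothetical sketch: the "has the clock advanced?" check lives in one place
// instead of being repeated in readBytes() and writeBytes().
class CachedIoCounters(nowMs: () => Long, load: () => (Long, Long)) {
  private var lastUpdateMs = -1L
  private var cachedReadBytes = 0L
  private var cachedWriteBytes = 0L

  // Refreshes the cached pair only when the clock differs from the last refresh time.
  // Callers must hold the lock on `this`.
  private def maybeUpdate(): Unit = {
    val curMs = nowMs()
    if (curMs != lastUpdateMs) {
      val (r, w) = load()
      cachedReadBytes = r
      cachedWriteBytes = w
      lastUpdateMs = curMs
    }
  }

  def readBytes(): Long = this.synchronized { maybeUpdate(); cachedReadBytes }

  def writeBytes(): Long = this.synchronized { maybeUpdate(); cachedWriteBytes }
}
```

With this shape, both accessors share a single code path for the staleness check, so a later change to the refresh policy only has to be made once.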




[GitHub] [kafka] cmccabe edited a comment on pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
cmccabe edited a comment on pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#issuecomment-620884289


   > In addition to block-level read/write, would there be a benefit to expose file system read/write metrics?
   
   It's better to have that discussion on the mailing list.  This PR is just about KIP-551.


[GitHub] [kafka] cmccabe commented on a change in pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
cmccabe commented on a change in pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#discussion_r422584829



##########
File path: core/src/main/scala/kafka/metrics/LinuxIoMetricsCollector.scala
##########
@@ -0,0 +1,96 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package kafka.server
+
+import java.nio.file.{Files, Paths}
+
+import org.apache.kafka.common.utils.Time
+import org.slf4j.Logger
+
+import scala.jdk.CollectionConverters._
+
+/**
+ * Retrieves Linux /proc/self/io metrics.
+ */
+class LinuxIoMetricsCollector(val procPath: String, val time: Time, val logger: Logger) {
+  import LinuxIoMetricsCollector._
+  var lastUpdateMs = -1L
+  var cachedReadBytes = 0L
+  var cachedWriteBytes = 0L
+
+  def readBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedReadBytes
+  }
+
+  def writeBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedWriteBytes
+  }
+
+  /**
+   * Read /proc/self/io.
+   *
+   * Generally, each line in this file contains a prefix followed by a colon and a number.
+   *
+   * For example, it might contain this:
+   * rchar: 4052
+   * wchar: 0
+   * syscr: 13
+   * syscw: 0
+   * read_bytes: 0
+   * write_bytes: 0
+   * cancelled_write_bytes: 0
+   */
+  def updateValues(now: Long): Boolean = this.synchronized {

Review comment:
       Assuming there's no background refresh thread, the only thread that is reading from `/proc` is the thread calling `readBytes`.  So you still have to wait for that read, whether or not you use a lock here.




[GitHub] [kafka] Hangleton commented on a change in pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
Hangleton commented on a change in pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#discussion_r416391699



##########
File path: core/src/main/scala/kafka/metrics/LinuxIoMetricsCollector.scala
##########
@@ -0,0 +1,96 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package kafka.server
+
+import java.nio.file.{Files, Paths}
+
+import org.apache.kafka.common.utils.Time
+import org.slf4j.Logger
+
+import scala.jdk.CollectionConverters._
+
+/**
+ * Retrieves Linux /proc/self/io metrics.
+ */
+class LinuxIoMetricsCollector(val procPath: String, val time: Time, val logger: Logger) {
+  import LinuxIoMetricsCollector._
+  var lastUpdateMs = -1L
+  var cachedReadBytes = 0L
+  var cachedWriteBytes = 0L
+
+  def readBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedReadBytes
+  }
+
+  def writeBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedWriteBytes
+  }
+
+  /**
+   * Read /proc/self/io.
+   *
+   * Generally, each line in this file contains a prefix followed by a colon and a number.
+   *
+   * For example, it might contain this:
+   * rchar: 4052
+   * wchar: 0
+   * syscr: 13
+   * syscw: 0
+   * read_bytes: 0
+   * write_bytes: 0
+   * cancelled_write_bytes: 0
+   */
+  def updateValues(now: Long): Boolean = this.synchronized {

Review comment:
       Nit: do you think the lock should be held while reading `/proc`, or restricted to the update of `lastUpdateMs`, `cachedReadBytes` and `cachedWriteBytes`?
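
For illustration, the narrower-lock alternative raised here could look roughly like the sketch below. Everything in it is an assumption made for the example: the class and object names are hypothetical, and the prefix strings are guessed from the sample `/proc/self/io` output quoted in the diff, since the values of `READ_BYTES_PREFIX` and `WRITE_BYTES_PREFIX` are not shown in this thread.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object NarrowLockSketch {
  // Assumed prefixes, based on the sample file contents in the diff's doc comment.
  val ReadPrefix = "read_bytes: "
  val WritePrefix = "write_bytes: "

  // Pure parsing step: touches no shared state, so it needs no lock.
  // Returns -1 for a counter whose line is absent.
  def parse(lines: Iterable[String]): (Long, Long) = {
    var read = -1L
    var write = -1L
    lines.foreach { line =>
      if (line.startsWith(ReadPrefix)) read = line.substring(ReadPrefix.length).toLong
      else if (line.startsWith(WritePrefix)) write = line.substring(WritePrefix.length).toLong
    }
    (read, write)
  }
}

class NarrowLockCollector(ioFile: Path) {
  private var lastUpdateMs = -1L
  private var cachedReadBytes = 0L
  private var cachedWriteBytes = 0L

  def update(now: Long): Unit = {
    // File read and parsing happen outside the lock...
    val (r, w) = NarrowLockSketch.parse(Files.readAllLines(ioFile).asScala)
    // ...and only the three field assignments are guarded.
    this.synchronized {
      cachedReadBytes = r
      cachedWriteBytes = w
      lastUpdateMs = now
    }
  }

  // Returns (lastUpdateMs, readBytes, writeBytes) as one consistent snapshot.
  def snapshot: (Long, Long, Long) =
    this.synchronized { (lastUpdateMs, cachedReadBytes, cachedWriteBytes) }
}
```

The trade-off in this variant is that two threads arriving together may both read the file, and a slower reader can overwrite a newer value. A caller that needs a fresh value still has to wait for the read either way.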




[GitHub] [kafka] cmccabe commented on pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
cmccabe commented on pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#issuecomment-620884289


   > In addition to block-level read/write, would there be a benefit to expose file system read/write metrics?
   It's better to have that discussion on the mailing list.  This PR is just about KIP-551.


[GitHub] [kafka] Hangleton commented on pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
Hangleton commented on pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#issuecomment-629612091


   @mumrah 
   
   I think it would be preferable to avoid using JNI libraries, because it can add maintenance overhead and require additional configurations for tests. It also adds some risks inherent to the safety of the native implementation provided by the library.


[GitHub] [kafka] hackaugusto edited a comment on pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
hackaugusto edited a comment on pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#issuecomment-1040320969


   @d1egoaz it seems the metric names on the wiki are wrong. This exposes `kafka.server:type=KafkaServer,name=linux-disk-read-bytes` and `kafka.server:type=KafkaServer,name=linux-disk-write-bytes`
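
As a rough way to check which names a given JVM actually exposes, the sketch below builds the two object names quoted above and asks the platform MBean server whether they are registered. `DiskMetricNames` and `isRegistered` are hypothetical helpers for this example; only standard `javax.management` calls are used, and the code has to run inside the broker's JVM (for example from an agent or test) to see the broker's MBeans.

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName

object DiskMetricNames {
  // Object names exactly as quoted in the comment above.
  val ReadBytes = new ObjectName("kafka.server:type=KafkaServer,name=linux-disk-read-bytes")
  val WriteBytes = new ObjectName("kafka.server:type=KafkaServer,name=linux-disk-write-bytes")

  // True only if an MBean with this name is registered in the current JVM.
  def isRegistered(name: ObjectName): Boolean =
    ManagementFactory.getPlatformMBeanServer.isRegistered(name)
}
```

Run in a JVM that is not a Kafka broker, `isRegistered` returns false for both names. The same check from a remote JMX client would go through an `MBeanServerConnection` instead.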


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hackaugusto commented on pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
hackaugusto commented on pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#issuecomment-1040320969


   @d1egoaz it seems the metric names on the wiki are wrong. This exposes `kafka.server:type=KafkaServer,name=linux-disk-read-bytes` and `kafka.server:type=KafkaServer,name=linux-disk-write-bytes`


[GitHub] [kafka] cmccabe commented on a change in pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
cmccabe commented on a change in pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#discussion_r422584924



##########
File path: core/src/main/scala/kafka/metrics/LinuxIoMetricsCollector.scala
##########
@@ -0,0 +1,96 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package kafka.server
+
+import java.nio.file.{Files, Paths}
+
+import org.apache.kafka.common.utils.Time
+import org.slf4j.Logger
+
+import scala.jdk.CollectionConverters._
+
+/**
+ * Retrieves Linux /proc/self/io metrics.
+ */
+class LinuxIoMetricsCollector(val procPath: String, val time: Time, val logger: Logger) {
+  import LinuxIoMetricsCollector._
+  var lastUpdateMs = -1L
+  var cachedReadBytes = 0L
+  var cachedWriteBytes = 0L
+
+  def readBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedReadBytes
+  }
+
+  def writeBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedWriteBytes
+  }
+
+  /**
+   * Read /proc/self/io.
+   *
+   * Generally, each line in this file contains a prefix followed by a colon and a number.
+   *
+   * For example, it might contain this:
+   * rchar: 4052
+   * wchar: 0
+   * syscr: 13
+   * syscw: 0
+   * read_bytes: 0
+   * write_bytes: 0
+   * cancelled_write_bytes: 0
+   */
+  def updateValues(now: Long): Boolean = this.synchronized {

Review comment:
       Also there is not going to be an I/O stall reading from `/proc` since `/proc` is not a real disk.




[GitHub] [kafka] mimaison commented on a change in pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
mimaison commented on a change in pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#discussion_r421525324



##########
File path: core/src/main/scala/kafka/metrics/LinuxIoMetricsCollector.scala
##########
@@ -0,0 +1,96 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package kafka.server
+
+import java.nio.file.{Files, Paths}
+
+import org.apache.kafka.common.utils.Time
+import org.slf4j.Logger
+
+import scala.jdk.CollectionConverters._
+
+/**
+ * Retrieves Linux /proc/self/io metrics.
+ */
+class LinuxIoMetricsCollector(val procPath: String, val time: Time, val logger: Logger) {
+  import LinuxIoMetricsCollector._
+  var lastUpdateMs = -1L
+  var cachedReadBytes = 0L
+  var cachedWriteBytes = 0L
+
+  def readBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {

Review comment:
       Should we add a small jitter to `lastUpdateMs`, since both metrics should trigger around the same time? I.e.:
   ```
   if (curMs > lastUpdateMs + 100) {
   ```
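
To make the effect of such a threshold concrete, here is a small standalone sketch. `RefreshPolicy`, `isStale`, and `countRefreshes` are hypothetical names, and the 100 ms default simply mirrors the value floated in the snippet above.

```scala
object RefreshPolicy {
  // Same comparison as the snippet above, with the tolerance made a parameter.
  def isStale(curMs: Long, lastUpdateMs: Long, maxAgeMs: Long = 100L): Boolean =
    curMs > lastUpdateMs + maxAgeMs

  // Counts how many refreshes a sequence of call timestamps would trigger.
  def countRefreshes(callTimesMs: Seq[Long], maxAgeMs: Long): Int = {
    // Sentinel far in the past; halved so that last + maxAgeMs cannot overflow.
    var last = Long.MinValue / 2
    var refreshes = 0
    callTimesMs.foreach { t =>
      if (isStale(t, last, maxAgeMs)) {
        refreshes += 1
        last = t
      }
    }
    refreshes
  }
}
```

One detail worth noting: with the `-1L` initial value used in the diff, `isStale(0L, -1L)` is false, so a test clock that starts near zero would not trigger the first refresh under this comparison. A wall-clock timestamp would.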

##########
File path: core/src/main/scala/kafka/metrics/LinuxIoMetricsCollector.scala
##########
@@ -0,0 +1,96 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package kafka.server
+
+import java.nio.file.{Files, Paths}
+
+import org.apache.kafka.common.utils.Time
+import org.slf4j.Logger
+
+import scala.jdk.CollectionConverters._
+
+/**
+ * Retrieves Linux /proc/self/io metrics.
+ */
+class LinuxIoMetricsCollector(val procPath: String, val time: Time, val logger: Logger) {
+  import LinuxIoMetricsCollector._
+  var lastUpdateMs = -1L
+  var cachedReadBytes = 0L
+  var cachedWriteBytes = 0L
+
+  def readBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedReadBytes
+  }
+
+  def writeBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedWriteBytes
+  }
+
+  /**
+   * Read /proc/self/io.
+   *
+   * Generally, each line in this file contains a prefix followed by a colon and a number.
+   *
+   * For example, it might contain this:
+   * rchar: 4052
+   * wchar: 0
+   * syscr: 13
+   * syscw: 0
+   * read_bytes: 0
+   * write_bytes: 0
+   * cancelled_write_bytes: 0
+   */
+  def updateValues(now: Long): Boolean = this.synchronized {
+    try {
+      cachedReadBytes = -1
+      cachedWriteBytes = -1
+      val lines = Files.readAllLines(Paths.get(procPath, "self", "io")).asScala
+      lines.foreach(line => {
+        if (line.startsWith(READ_BYTES_PREFIX)) {
+          cachedReadBytes = line.substring(READ_BYTES_PREFIX.size).toLong
+        } else if (line.startsWith(WRITE_BYTES_PREFIX)) {
+          cachedWriteBytes = line.substring(WRITE_BYTES_PREFIX.size).toLong
+        }
+      })
+      lastUpdateMs = now
+      true
+    } catch {
+      case t: Throwable => {
+        logger.warn("LinuxIoMetricsCollector: unable to update metrics", t)
+        false
+      }
+    }
+  }
+
+  def usable(): Boolean = {
+    updateValues(time.milliseconds())

Review comment:
       This prints a stacktrace at startup when not on Linux. I wonder if we should check if `/proc` exists and only print a stacktrace if it exists and we can't read it. WDYT?
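
A sketch of the kind of check being proposed, under the assumption that "not on Linux" can be approximated by "the proc root directory does not exist". `ProcProbe` and its three states are hypothetical names for this example and are not taken from the PR.

```scala
import java.nio.file.{Files, Path}

object ProcProbe {
  sealed trait Status
  case object NotLinux extends Status   // no proc root: expected on other platforms, stay quiet
  case object Unreadable extends Status // proc root exists but self/io cannot be read: worth a warning
  case object Usable extends Status

  def probe(procRoot: Path): Status = {
    val io = procRoot.resolve("self").resolve("io")
    if (!Files.isDirectory(procRoot)) NotLinux
    else if (!Files.isReadable(io)) Unreadable
    else Usable
  }
}
```

A caller could then log a stack trace only in the `Unreadable` case and skip registering the metrics silently in the `NotLinux` case.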




[GitHub] [kafka] cmccabe commented on a change in pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
cmccabe commented on a change in pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#discussion_r422584924



##########
File path: core/src/main/scala/kafka/metrics/LinuxIoMetricsCollector.scala
##########
@@ -0,0 +1,96 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package kafka.server
+
+import java.nio.file.{Files, Paths}
+
+import org.apache.kafka.common.utils.Time
+import org.slf4j.Logger
+
+import scala.jdk.CollectionConverters._
+
+/**
+ * Retrieves Linux /proc/self/io metrics.
+ */
+class LinuxIoMetricsCollector(val procPath: String, val time: Time, val logger: Logger) {
+  import LinuxIoMetricsCollector._
+  var lastUpdateMs = -1L
+  var cachedReadBytes = 0L
+  var cachedWriteBytes = 0L
+
+  def readBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedReadBytes
+  }
+
+  def writeBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedWriteBytes
+  }
+
+  /**
+   * Read /proc/self/io.
+   *
+   * Generally, each line in this file contains a prefix followed by a colon and a number.
+   *
+   * For example, it might contain this:
+   * rchar: 4052
+   * wchar: 0
+   * syscr: 13
+   * syscw: 0
+   * read_bytes: 0
+   * write_bytes: 0
+   * cancelled_write_bytes: 0
+   */
+  def updateValues(now: Long): Boolean = this.synchronized {

Review comment:
       Also there is not going to be an I/O stall reading from `/proc` since `/proc` is not a real disk.  I would still prefer to do fewer system calls rather than more, but I don't think that's worth adding a background thread for.




[GitHub] [kafka] cmccabe commented on pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
cmccabe commented on pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#issuecomment-630352959


   @mumrah :  Good question.  I don't think anyone has looked at Sigar.  I guess the question is whether we want to get into the business of doing general-purpose node monitoring.  I think many people would say no.  We're doing this metric mainly because it's very simple to check, and also very impactful for Kafka (starting heavy disk reads often correlates with performance tanking).


[GitHub] [kafka] cmccabe commented on a change in pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
cmccabe commented on a change in pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#discussion_r416961523



##########
File path: core/src/main/scala/kafka/metrics/LinuxIoMetricsCollector.scala
##########
@@ -0,0 +1,96 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package kafka.server
+
+import java.nio.file.{Files, Paths}
+
+import org.apache.kafka.common.utils.Time
+import org.slf4j.Logger
+
+import scala.jdk.CollectionConverters._
+
+/**
+ * Retrieves Linux /proc/self/io metrics.
+ */
+class LinuxIoMetricsCollector(val procPath: String, val time: Time, val logger: Logger) {
+  import LinuxIoMetricsCollector._
+  var lastUpdateMs = -1L
+  var cachedReadBytes = 0L
+  var cachedWriteBytes = 0L
+
+  def readBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedReadBytes
+  }
+
+  def writeBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedWriteBytes
+  }
+
+  /**
+   * Read /proc/self/io.
+   *
+   * Generally, each line in this file contains a prefix followed by a colon and a number.
+   *
+   * For example, it might contain this:
+   * rchar: 4052
+   * wchar: 0
+   * syscr: 13
+   * syscw: 0
+   * read_bytes: 0
+   * write_bytes: 0
+   * cancelled_write_bytes: 0
+   */
+  def updateValues(now: Long): Boolean = this.synchronized {

Review comment:
       Unless we choose to read this file in a background thread, there isn't a reason to avoid using a lock here.




[GitHub] [kafka] cmccabe commented on a change in pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
cmccabe commented on a change in pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#discussion_r422585408



##########
File path: core/src/main/scala/kafka/metrics/LinuxIoMetricsCollector.scala
##########
@@ -0,0 +1,96 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package kafka.server
+
+import java.nio.file.{Files, Paths}
+
+import org.apache.kafka.common.utils.Time
+import org.slf4j.Logger
+
+import scala.jdk.CollectionConverters._
+
+/**
+ * Retrieves Linux /proc/self/io metrics.
+ */
+class LinuxIoMetricsCollector(val procPath: String, val time: Time, val logger: Logger) {
+  import LinuxIoMetricsCollector._
+  var lastUpdateMs = -1L
+  var cachedReadBytes = 0L
+  var cachedWriteBytes = 0L
+
+  def readBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedReadBytes
+  }
+
+  def writeBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {
+      updateValues(curMs)
+    }
+    cachedWriteBytes
+  }
+
+  /**
+   * Read /proc/self/io.
+   *
+   * Generally, each line in this file contains a prefix followed by a colon and a number.
+   *
+   * For example, it might contain this:
+   * rchar: 4052
+   * wchar: 0
+   * syscr: 13
+   * syscw: 0
+   * read_bytes: 0
+   * write_bytes: 0
+   * cancelled_write_bytes: 0
+   */
+  def updateValues(now: Long): Boolean = this.synchronized {
+    try {
+      cachedReadBytes = -1
+      cachedWriteBytes = -1
+      val lines = Files.readAllLines(Paths.get(procPath, "self", "io")).asScala
+      lines.foreach(line => {
+        if (line.startsWith(READ_BYTES_PREFIX)) {
+          cachedReadBytes = line.substring(READ_BYTES_PREFIX.size).toLong
+        } else if (line.startsWith(WRITE_BYTES_PREFIX)) {
+          cachedWriteBytes = line.substring(WRITE_BYTES_PREFIX.size).toLong
+        }
+      })
+      lastUpdateMs = now
+      true
+    } catch {
+      case t: Throwable => {
+        logger.warn("LinuxIoMetricsCollector: unable to update metrics", t)
+        false
+      }
+    }
+  }
+
+  def usable(): Boolean = {
+    updateValues(time.milliseconds())

Review comment:
       Good point.  I will add a check.







[GitHub] [kafka] hackaugusto commented on pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
hackaugusto commented on pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#issuecomment-1040320969


   @d1egoaz it seems the metric names on the wiki are wrong. This PR exposes `kafka.server:type=KafkaServer,name=linux-disk-read-bytes` and `kafka.server:type=KafkaServer,name=linux-disk-write-bytes`.
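
   One way to check which of the two naming schemes a running JVM registers is to query the MBean server with a wildcard on the `name` key. The following is only a sketch: `LinuxDiskMetricNames` is a hypothetical helper rather than anything in Kafka, it inspects the local JVM only (a remote broker would need a JMX connector), and it assumes the gauge publishes its reading as an attribute called `Value`.

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName
import scala.jdk.CollectionConverters._

// Hypothetical helper for illustration; not part of Kafka.
object LinuxDiskMetricNames {
  // Property-value wildcard: matches MBeans whose "name" key starts with "linux-disk-".
  val pattern = new ObjectName("kafka.server:type=KafkaServer,name=linux-disk-*")

  // True when the given MBean name is covered by the pattern above.
  def matches(mbeanName: String): Boolean = pattern.apply(new ObjectName(mbeanName))

  def main(args: Array[String]): Unit = {
    val server = ManagementFactory.getPlatformMBeanServer
    val found = server.queryNames(pattern, null).asScala
    if (found.isEmpty) println("no linux-disk-* MBeans registered in this JVM")
    found.foreach { name =>
      // Assumption: the gauge exposes its reading as an attribute named "Value".
      println(s"$name = ${server.getAttribute(name, "Value")}")
    }
  }
}
```

   The pattern covers both `linux-disk-read-bytes` and `linux-disk-write-bytes`, and it does not match the `TotalDiskReadBytes` name listed on the wiki.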





[GitHub] [kafka] d1egoaz commented on pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
d1egoaz commented on pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#issuecomment-772811614


   @cmccabe 👋 I wonder how this works.
   
   According to https://cwiki.apache.org/confluence/display/KAFKA/KIP-551%3A+Expose+disk+read+and+write+metrics (status resolved)
   
   These two new metrics should have been created:
   
   Full Name | Type | Description
   -- | -- | --
   kafka.server:type=KafkaServer,name=TotalDiskReadBytes | 64-bit gauge | The total number of bytes read by the broker process.  This includes reads from all disks.  It does not include reads that came out of page cache.
   kafka.server:type=KafkaServer,name=TotalDiskWriteBytes | 64-bit gauge | The total number of bytes written by the broker process.  This includes writes from all disks.
   
   However, I don't see them exposed via JMX.
   
   Am I missing something? Do I need to enable this in a different way?
   
   Thanks
   
   
   BTW, the Jira ticket mentioned on the Confluence page doesn't mention this PR.
   





[GitHub] [kafka] cmccabe commented on a change in pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
cmccabe commented on a change in pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#discussion_r422585559



##########
File path: core/src/main/scala/kafka/metrics/LinuxIoMetricsCollector.scala
##########
@@ -0,0 +1,96 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package kafka.server
+
+import java.nio.file.{Files, Paths}
+
+import org.apache.kafka.common.utils.Time
+import org.slf4j.Logger
+
+import scala.jdk.CollectionConverters._
+
+/**
+ * Retrieves Linux /proc/self/io metrics.
+ */
+class LinuxIoMetricsCollector(val procPath: String, val time: Time, val logger: Logger) {
+  import LinuxIoMetricsCollector._
+  var lastUpdateMs = -1L
+  var cachedReadBytes = 0L
+  var cachedWriteBytes = 0L
+
+  def readBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()
+    if (curMs != lastUpdateMs) {

Review comment:
       This check is only there to prevent reading from /proc more than once per millisecond.  In practice I expect far fewer reads than that, since we only read when the metric values are fetched.
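
       The per-millisecond guard can be sketched in isolation as follows. This is a minimal illustration, not the code from the PR: `MillisecondCache`, `clock` and `load` are hypothetical names, with `load` standing in for the read of `/proc/self/io` and `clock` for `time.milliseconds()`.

```scala
// Hypothetical sketch of the guard; names are illustrative, not from the PR.
final class MillisecondCache(clock: () => Long, load: () => Long) {
  private var lastUpdateMs = -1L
  private var cached = 0L

  // Calls load() at most once per distinct clock reading; otherwise returns the cached value.
  def value(): Long = this.synchronized {
    val nowMs = clock()
    if (nowMs != lastUpdateMs) {
      cached = load()
      lastUpdateMs = nowMs
    }
    cached
  }
}

object MillisecondCacheDemo {
  def main(args: Array[String]): Unit = {
    var fakeNowMs = 100L // stand-in for a mock clock
    var loads = 0        // counts how often the "expensive" read runs
    val cache = new MillisecondCache(() => fakeNowMs, () => { loads += 1; loads.toLong })

    cache.value()
    cache.value() // same millisecond, so no second load
    println(s"loads after two calls in one ms: $loads") // prints 1

    fakeNowMs += 1
    cache.value() // clock advanced, so the value is reloaded
    println(s"loads after the clock advances: $loads") // prints 2
  }
}
```

       Because the comparison is `!=` rather than `<`, the guard compares clock readings for inequality only, so a reload happens whenever the reading changes.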







[GitHub] [kafka] cmccabe commented on a change in pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
cmccabe commented on a change in pull request #8569:
URL: https://github.com/apache/kafka/pull/8569#discussion_r426808657



##########
File path: core/src/main/scala/kafka/metrics/LinuxIoMetricsCollector.scala
##########
@@ -0,0 +1,102 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package kafka.server
+
+import java.nio.file.{Files, Paths}
+
+import org.apache.kafka.common.utils.Time
+import org.slf4j.Logger
+
+import scala.jdk.CollectionConverters._
+
+/**
+ * Retrieves Linux /proc/self/io metrics.
+ */
+class LinuxIoMetricsCollector(procRoot: String, val time: Time, val logger: Logger) {
+  import LinuxIoMetricsCollector._
+  var lastUpdateMs = -1L
+  var cachedReadBytes = 0L
+  var cachedWriteBytes = 0L
+  val path = Paths.get(procRoot, "self", "io")
+
+  def readBytes(): Long = this.synchronized {
+    val curMs = time.milliseconds()

Review comment:
       Interesting idea, but that would complicate the `usable` function, right?  Probably better to leave it where it is.







[GitHub] [kafka] cmccabe merged pull request #8569: KIP-551: Expose disk read and write metrics

Posted by GitBox <gi...@apache.org>.
cmccabe merged pull request #8569:
URL: https://github.com/apache/kafka/pull/8569


   

