You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/06/11 03:37:50 UTC

[GitHub] [kafka] skaundinya15 opened a new pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

skaundinya15 opened a new pull request #8850:
URL: https://github.com/apache/kafka/pull/8850


   As specified in https://issues.apache.org/jira/browse/KAFKA-10141, it would be helpful to include as much information as possible when deleting log segments. This patch introduces log messages that give more specific details as to why the log segment was deleted and the specific metadata regarding that log segment.
   
   
   *More detailed description of your change,
   if necessary. The PR title and PR message become
   the squashed commit message, so use a separate
   comment to ping reviewers.*
   
   *Summary of testing strategy (including rationale)
   for the feature or bug fix. Unit and/or integration
   tests are expected for any behaviour change and
   system tests should be considered for larger changes.*
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] skaundinya15 commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
skaundinya15 commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r438573463



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1784,8 +1784,18 @@ class Log(@volatile private var _dir: File,
   private def deleteRetentionMsBreachedSegments(): Int = {
     if (config.retentionMs < 0) return 0
     val startMs = time.milliseconds
-    deleteOldSegments((segment, _) => startMs - segment.largestTimestamp > config.retentionMs,
-      reason = s"retention time ${config.retentionMs}ms breach")
+
+    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
+      if (startMs - segment.largestTimestamp > config.retentionMs) {
+        info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
+          s" retentionMs breach. Largest timestamp of segment is ${segment.largestTimestamp}")
+        true
+      } else {
+        false
+      }
+    }
+
+    deleteOldSegments(shouldDelete, reason = s"retention time ${config.retentionMs}ms breach")

Review comment:
       Yeah I agree, I think it would be redundant to keep the `reason` in `deleteOldSegments`. I'll remove that and as you suggested just mentioned the number of segments getting deleted.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#issuecomment-646126053


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#issuecomment-646369171






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] ijuma commented on pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
ijuma commented on pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#issuecomment-646631148


   I guess a downside is that we would always generate the string even if logging was only at the warn level.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#issuecomment-646125726


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#issuecomment-642428104


   ok to test


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#issuecomment-646368944


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r438559177



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1784,8 +1784,18 @@ class Log(@volatile private var _dir: File,
   private def deleteRetentionMsBreachedSegments(): Int = {
     if (config.retentionMs < 0) return 0
     val startMs = time.milliseconds
-    deleteOldSegments((segment, _) => startMs - segment.largestTimestamp > config.retentionMs,
-      reason = s"retention time ${config.retentionMs}ms breach")
+
+    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
+      if (startMs - segment.largestTimestamp > config.retentionMs) {
+        info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +

Review comment:
       `LogSegment.largestTimestamp` may refer to either the largest record timestamp for newer formats or the last modified time of the segment for older formats. I think it would be helpful if the log message indicated which case it is. Perhaps we could add a method like this to `LogSegment`?
   ```scala
   def largestRecordTimestamp: Option[Long]
   ```

##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1804,8 +1816,15 @@ class Log(@volatile private var _dir: File,
   }
 
   private def deleteLogStartOffsetBreachedSegments(): Int = {
-    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) =
-      nextSegmentOpt.exists(_.baseOffset <= logStartOffset)
+    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
+      if (nextSegmentOpt.exists(_.baseOffset <= logStartOffset)) {
+        info (s"Segment with base offset ${segment.baseOffset} will be deleted due to" +

Review comment:
       nit: space after `info`

##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1784,8 +1784,18 @@ class Log(@volatile private var _dir: File,
   private def deleteRetentionMsBreachedSegments(): Int = {
     if (config.retentionMs < 0) return 0
     val startMs = time.milliseconds
-    deleteOldSegments((segment, _) => startMs - segment.largestTimestamp > config.retentionMs,
-      reason = s"retention time ${config.retentionMs}ms breach")
+
+    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
+      if (startMs - segment.largestTimestamp > config.retentionMs) {
+        info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
+          s" retentionMs breach. Largest timestamp of segment is ${segment.largestTimestamp}")
+        true
+      } else {
+        false
+      }
+    }
+
+    deleteOldSegments(shouldDelete, reason = s"retention time ${config.retentionMs}ms breach")

Review comment:
       With the logging we have above, do you think we still need the message in `deleteOldSegments`? Perhaps we could make it more concise at least. Maybe just mention the number of segments to be deleted for example.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#issuecomment-646125880


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] skaundinya15 commented on pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
skaundinya15 commented on pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#issuecomment-645560370


   Thanks for the comments @hachikuji. I've addressed all of your feedback, ready for another look whenever you are.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] skaundinya15 commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
skaundinya15 commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r441755277



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1784,8 +1785,26 @@ class Log(@volatile private var _dir: File,
   private def deleteRetentionMsBreachedSegments(): Int = {
     if (config.retentionMs < 0) return 0
     val startMs = time.milliseconds
-    deleteOldSegments((segment, _) => startMs - segment.largestTimestamp > config.retentionMs,
-      reason = s"retention time ${config.retentionMs}ms breach")
+
+    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
+      if (startMs - segment.largestTimestamp > config.retentionMs) {
+        segment.largestRecordTimestamp match {
+          case Some(ts) =>
+            info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
+              s" retentionMs breach based on the largest record timestamp from the segment, which" +

Review comment:
       Will add it back in




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r441699978



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1784,8 +1785,26 @@ class Log(@volatile private var _dir: File,
   private def deleteRetentionMsBreachedSegments(): Int = {
     if (config.retentionMs < 0) return 0
     val startMs = time.milliseconds
-    deleteOldSegments((segment, _) => startMs - segment.largestTimestamp > config.retentionMs,
-      reason = s"retention time ${config.retentionMs}ms breach")
+
+    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
+      if (startMs - segment.largestTimestamp > config.retentionMs) {
+        segment.largestRecordTimestamp match {
+          case Some(ts) =>
+            info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
+              s" retentionMs breach based on the largest record timestamp from the segment, which" +

Review comment:
       We lost the reference to `config.retentionMs`.

##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1794,20 +1813,30 @@ class Log(@volatile private var _dir: File,
     def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
       if (diff - segment.size >= 0) {
         diff -= segment.size
+        info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
+          s" retentionSize breach. Segment size is ${segment.size} and total log size after" +

Review comment:
       Similarly, can you mention `config.retentionSize`?

##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1794,20 +1813,30 @@ class Log(@volatile private var _dir: File,
     def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
       if (diff - segment.size >= 0) {
         diff -= segment.size
+        info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
+          s" retentionSize breach. Segment size is ${segment.size} and total log size after" +
+          s" deletion will be ${size - diff}")
         true
       } else {
         false
       }
     }
 
-    deleteOldSegments(shouldDelete, reason = s"retention size in bytes ${config.retentionSize} breach")
+    deleteOldSegments(shouldDelete)
   }
 
   private def deleteLogStartOffsetBreachedSegments(): Int = {
-    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) =
-      nextSegmentOpt.exists(_.baseOffset <= logStartOffset)
+    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
+      if (nextSegmentOpt.exists(_.baseOffset <= logStartOffset)) {
+        info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
+          s" startOffset breach. logStartOffset is ${logStartOffset}")

Review comment:
       nit: braces unnecessary for simple variable references

##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1702,11 +1702,12 @@ class Log(@volatile private var _dir: File,
    *                  (if there is one) and returns true iff it is deletable
    * @return The number of segments deleted
    */
-  private def deleteOldSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean, reason: String): Int = {
+  private def deleteOldSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean) = {

Review comment:
       In that case, perhaps we can remove the log message? We already have the following logging when in the `deleteSegments` path:
   ```scala
     private def deleteSegmentFiles(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
       segments.foreach(_.changeFileSuffixes("", Log.DeletedFileSuffix))
   
       def deleteSegments(): Unit = {
         info(s"Deleting segments ${segments.mkString(",")}")
         maybeHandleIOException(s"Error while deleting segments for $topicPartition in dir ${dir.getParent}") {
           segments.foreach(_.deleteIfExists())
         }
       }
   
       if (asyncDelete) {
         info(s"Scheduling segments for deletion ${segments.mkString(",")}")
         scheduler.schedule("delete-file", () => deleteSegments, delay = config.fileDeleteDelayMs)
       } else {
         deleteSegments()
       }
     }
   ```
   So it seems this message is not adding any additional value.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r441930200



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1794,20 +1811,30 @@ class Log(@volatile private var _dir: File,
     def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
       if (diff - segment.size >= 0) {
         diff -= segment.size
+        info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
+          s" ${config.retentionSize} breach. Segment size is ${segment.size} and total log size" +

Review comment:
       nit: ".. will be deleted due to **retention size in bytes** ${config.retentionSize} breach."




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] skaundinya15 commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
skaundinya15 commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r438578146



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1784,8 +1784,18 @@ class Log(@volatile private var _dir: File,
   private def deleteRetentionMsBreachedSegments(): Int = {
     if (config.retentionMs < 0) return 0
     val startMs = time.milliseconds
-    deleteOldSegments((segment, _) => startMs - segment.largestTimestamp > config.retentionMs,
-      reason = s"retention time ${config.retentionMs}ms breach")
+
+    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
+      if (startMs - segment.largestTimestamp > config.retentionMs) {
+        info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +

Review comment:
       That's a good point, I'll implement that method you suggested and using that method log the message accordingly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] skaundinya15 commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
skaundinya15 commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r439155424



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1702,11 +1702,12 @@ class Log(@volatile private var _dir: File,
    *                  (if there is one) and returns true iff it is deletable
    * @return The number of segments deleted
    */
-  private def deleteOldSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean, reason: String): Int = {
+  private def deleteOldSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean) = {

Review comment:
       Wouldn't keeping the reason be a redundant? Since for every segment we delete we are logging exactly why we are deleting and the details surrounding the deleition.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dhruvilshah3 commented on pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
dhruvilshah3 commented on pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#issuecomment-667454222


   I attempted to improve the logging further. This also removes the side effect of logging as part of evaluating the predicate. https://github.com/apache/kafka/pull/9110


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] skaundinya15 commented on pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
skaundinya15 commented on pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#issuecomment-643012766


   Thanks for the reviews @hachikuji. I updated the PR per your suggestions and left some follow up comments for clarification. Whenever you're ready, the PR is ready for another review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] skaundinya15 commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
skaundinya15 commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r441753269



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1784,8 +1785,26 @@ class Log(@volatile private var _dir: File,
   private def deleteRetentionMsBreachedSegments(): Int = {
     if (config.retentionMs < 0) return 0
     val startMs = time.milliseconds
-    deleteOldSegments((segment, _) => startMs - segment.largestTimestamp > config.retentionMs,
-      reason = s"retention time ${config.retentionMs}ms breach")
+
+    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
+      if (startMs - segment.largestTimestamp > config.retentionMs) {
+        segment.largestRecordTimestamp match {
+          case Some(ts) =>
+            info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
+              s" retentionMs breach based on the largest record timestamp from the segment, which" +

Review comment:
       Will add it back in.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r441703376



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1702,11 +1702,12 @@ class Log(@volatile private var _dir: File,
    *                  (if there is one) and returns true iff it is deletable
    * @return The number of segments deleted
    */
-  private def deleteOldSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean, reason: String): Int = {
+  private def deleteOldSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean) = {

Review comment:
       In that case, perhaps we can remove the log message? We already have the following logging in the `deleteSegments` path:
   ```scala
     private def deleteSegmentFiles(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
       segments.foreach(_.changeFileSuffixes("", Log.DeletedFileSuffix))
   
       def deleteSegments(): Unit = {
         info(s"Deleting segments ${segments.mkString(",")}")
         maybeHandleIOException(s"Error while deleting segments for $topicPartition in dir ${dir.getParent}") {
           segments.foreach(_.deleteIfExists())
         }
       }
   
       if (asyncDelete) {
         info(s"Scheduling segments for deletion ${segments.mkString(",")}")
         scheduler.schedule("delete-file", () => deleteSegments, delay = config.fileDeleteDelayMs)
       } else {
         deleteSegments()
       }
     }
   ```
   So it seems this message is not adding any additional value.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji merged pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji merged pull request #8850:
URL: https://github.com/apache/kafka/pull/8850


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r438997622



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1702,11 +1702,12 @@ class Log(@volatile private var _dir: File,
    *                  (if there is one) and returns true iff it is deletable
    * @return The number of segments deleted
    */
-  private def deleteOldSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean, reason: String): Int = {
+  private def deleteOldSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean) = {

Review comment:
       Can we keep the reason? On second thought, maybe it's fine to leave this as is.

##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1794,20 +1811,29 @@ class Log(@volatile private var _dir: File,
     def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
       if (diff - segment.size >= 0) {
         diff -= segment.size
+        info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +

Review comment:
       In addition, it might be useful to know the total log size. Maybe we could include `size - diff` as the size after deletion?

##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1784,8 +1785,24 @@ class Log(@volatile private var _dir: File,
   private def deleteRetentionMsBreachedSegments(): Int = {
     if (config.retentionMs < 0) return 0
     val startMs = time.milliseconds
-    deleteOldSegments((segment, _) => startMs - segment.largestTimestamp > config.retentionMs,
-      reason = s"retention time ${config.retentionMs}ms breach")
+
+    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
+      if (startMs - segment.largestTimestamp > config.retentionMs) {
+        segment.largestRecordTimestamp match {
+          case Some(ts) =>
+            info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
+              s" retentionMs breach. Largest record timestamp of segment is $ts")

Review comment:
       nit: could we make the connection clearer? How about this?
   ```scala
      info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
                 s" retentionMs breach based on the largest record timestamp from the segment, which is $ts")
   ```
   
   Also, we seem to have lost mention o




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] skaundinya15 commented on pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
skaundinya15 commented on pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#issuecomment-645797928


   Thanks @hachikuji, addressed the comment.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] skaundinya15 commented on pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
skaundinya15 commented on pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#issuecomment-642454876


   Thanks for the review @hachikuji, just updated the PR addressing your comments. The PR is ready for a review whenever you are.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] skaundinya15 commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
skaundinya15 commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r441752358



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1702,11 +1702,12 @@ class Log(@volatile private var _dir: File,
    *                  (if there is one) and returns true iff it is deletable
    * @return The number of segments deleted
    */
-  private def deleteOldSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean, reason: String): Int = {
+  private def deleteOldSegments(predicate: (LogSegment, Option[LogSegment]) => Boolean) = {

Review comment:
       Sure, will do.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] skaundinya15 commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
skaundinya15 commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r439156564



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1784,8 +1785,24 @@ class Log(@volatile private var _dir: File,
   private def deleteRetentionMsBreachedSegments(): Int = {
     if (config.retentionMs < 0) return 0
     val startMs = time.milliseconds
-    deleteOldSegments((segment, _) => startMs - segment.largestTimestamp > config.retentionMs,
-      reason = s"retention time ${config.retentionMs}ms breach")
+
+    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
+      if (startMs - segment.largestTimestamp > config.retentionMs) {
+        segment.largestRecordTimestamp match {
+          case Some(ts) =>
+            info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
+              s" retentionMs breach. Largest record timestamp of segment is $ts")

Review comment:
       > Also, we seem to have lost mention o
   
   I'll clarify the log message as you suggested. Could you explain what you meant by the above?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #8850: KAFKA-10141: Add more detail to log segment delete messages

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #8850:
URL: https://github.com/apache/kafka/pull/8850#discussion_r441705818



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -1784,8 +1785,24 @@ class Log(@volatile private var _dir: File,
   private def deleteRetentionMsBreachedSegments(): Int = {
     if (config.retentionMs < 0) return 0
     val startMs = time.milliseconds
-    deleteOldSegments((segment, _) => startMs - segment.largestTimestamp > config.retentionMs,
-      reason = s"retention time ${config.retentionMs}ms breach")
+
+    def shouldDelete(segment: LogSegment, nextSegmentOpt: Option[LogSegment]) = {
+      if (startMs - segment.largestTimestamp > config.retentionMs) {
+        segment.largestRecordTimestamp match {
+          case Some(ts) =>
+            info(s"Segment with base offset ${segment.baseOffset} will be deleted due to" +
+              s" retentionMs breach. Largest record timestamp of segment is $ts")

Review comment:
       Sorry, forgot to finish my thought. I think I was going to mention that we lost the reference to the configuration value.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org