You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/08/01 02:21:36 UTC

[GitHub] [kafka] dhruvilshah3 opened a new pull request #9110: MINOR: Ensure a reason is logged for every segment deletion

dhruvilshah3 opened a new pull request #9110:
URL: https://github.com/apache/kafka/pull/9110


   This PR improves the logging for segment deletion to ensure that a reason is logged for every segment that is deleted.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dhruvilshah3 commented on pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
dhruvilshah3 commented on pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#issuecomment-668151967


   Thanks for the reviews @kowshik, @hachikuji, @ijuma. I have addressed the comments.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] kowshik commented on a change in pull request #9110: MINOR: Ensure a reason is logged for every segment deletion

Posted by GitBox <gi...@apache.org>.
kowshik commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r464213840



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,13 +2210,16 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
+        println(s"${reason.reasonString(this, toDelete)}")

Review comment:
       Did you mean to use `info` level logging?
   

##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2686,3 +2670,50 @@ object LogMetricNames {
     List(NumLogSegments, LogStartOffset, LogEndOffset, Size)
   }
 }
+
+sealed trait SegmentDeletionReason {
+  def reasonString(log: Log, toDelete: Iterable[LogSegment]): String
+}
+
+case object RetentionMsBreachDeletion extends SegmentDeletionReason {
+  override def reasonString(log: Log, toDelete: Iterable[LogSegment]): String = {
+    s"Deleting segments due to retention time ${log.config.retentionMs}ms breach: ${toDelete.mkString(",")}"
+  }
+}
+
+case object RetentionSizeBreachDeletion extends SegmentDeletionReason {
+  override def reasonString(log: Log, toDelete: Iterable[LogSegment]): String = {
+    s"Deleting segments due to retention size ${log.config.retentionSize} breach. " +
+      s"Current log size is ${log.size}. ${toDelete.mkString(",")}"

Review comment:
       nit: replace "." with ":" 
   
   `s"Current log size is ${log.size}: ${toDelete.mkString(",")}"`

##########
File path: core/src/main/scala/kafka/log/LogSegment.scala
##########
@@ -413,7 +413,7 @@ class LogSegment private[log] (val log: FileRecords,
   override def toString: String = "LogSegment(baseOffset=" + baseOffset +
     ", size=" + size +
     ", lastModifiedTime=" + lastModified +
-    ", largestTime=" + largestTimestamp +
+    ", largestRecordTimestamp=" + largestRecordTimestamp +

Review comment:
       Should we keep `largestTime` from LHS, and, print both `largestRecordTimestamp` and `largestTime` ?

##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
       @dhruvilshah3 Sounds good!
   
   @ijuma Good point. I'd guess that also during initial hotset reduction there will be an increase in logging, but that will be temporary.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dhruvilshah3 commented on a change in pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
dhruvilshah3 commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r464100823



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
       @ijuma We log one message per deleted segment. This could cause temporary increase in log volume when DeleteRecords is used or when retention is lowered, for example.
   
   Overall, we have a few options with different tradeoffs:
   
   1. Log a common reason per batch being deleted, including base offsets of segments being deleted. eg.
   ```
   Deleting segments due to retention time 999ms breach. BaseOffsets: (0,5,...).
   ```
   
   2. Log a common reason per batch being deleted, including base offsets and metadata of segments. eg. 
   ```
   Deleting segments due to retention time 999ms breach: LogSegment(baseOffset=0, size=360, lastModifiedTime=1596387738000, largestRecordTimestamp=Some(1596387737414)),LogSegment(baseOffset=5, size=360, lastModifiedTime=1596387738000, largestRecordTimestamp=Some(1596387737414)),...
   ```
   
   3. Log one message per segment being deleted. This is the current behavior. eg.
   ```
   Segment with base offset 0 will be deleted due to retention time 999ms breach based on the largest record timestamp from the segment, which is ...
   Segment with base offset 5 will be deleted due to retention time 999ms breach based on the largest record timestamp from the segment, which is ...
   ...
   ```
   
   Doing (2) may be a reasonable tradeoff. It eliminates some of the redundancy at the cost of making it to glean per segment metadata. Let me know what you think.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dhruvilshah3 commented on a change in pull request #9110: MINOR: Ensure a reason is logged for every segment deletion

Posted by GitBox <gi...@apache.org>.
dhruvilshah3 commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r464100823



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
       @ijuma We log one message per deleted segment. This could cause temporary increase in log volume when DeleteRecords is used or when retention is lowered, for example.
   
   Overall, we have a few options with different tradeoffs:
   
   1. Log a common reason per batch being deleted, including base offsets of segments being deleted. This was the behavior before https://github.com/apache/kafka/pull/8850. eg.
   ```
   Deleting segments due to retention time 999ms breach. BaseOffsets: (0,5,...).
   ```
   
   2. Log a common reason per batch being deleted, including base offsets and metadata of segments. eg. 
   ```
   Deleting segments due to retention time 999ms breach: LogSegment(baseOffset=0, size=360, lastModifiedTime=1596387738000, largestRecordTimestamp=Some(1596387737414)),LogSegment(baseOffset=5, size=360, lastModifiedTime=1596387738000, largestRecordTimestamp=Some(1596387737414)),...
   ```
   
   3. Log one message per segment being deleted. This is the current behavior. eg.
   ```
   Segment with base offset 0 will be deleted due to retention time 999ms breach based on the largest record timestamp from the segment, which is ...
   Segment with base offset 5 will be deleted due to retention time 999ms breach based on the largest record timestamp from the segment, which is ...
   ...
   ```
   
   Doing (2) may be a reasonable tradeoff. It eliminates some of the redundancy at the cost of making it to glean per segment metadata. Let me know what you think.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dhruvilshah3 commented on a change in pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
dhruvilshah3 commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r466601556



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,13 +2210,16 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
+        info(s"${reason.logReason(this, toDelete)}")

Review comment:
       Thanks for catching. I fixed this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] ijuma commented on a change in pull request #9110: MINOR: Ensure a reason is logged for every segment deletion

Posted by GitBox <gi...@apache.org>.
ijuma commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r463980962



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
       Can we think about cases where this could be an issue? Say delete records is used, causing a large number of segments to be deleted Could that trigger excessive logging?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dhruvilshah3 edited a comment on pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
dhruvilshah3 edited a comment on pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#issuecomment-668151967


   Thanks for the reviews @kowshik, @hachikuji, @ijuma. I have addressed the comments. Let me know what you think.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r466550492



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,13 +2210,16 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
+        info(s"${reason.logReason(this, toDelete)}")

Review comment:
       Don't we need to get rid of the call to `info` here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] kowshik commented on a change in pull request #9110: MINOR: Ensure a reason is logged for every segment deletion

Posted by GitBox <gi...@apache.org>.
kowshik commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r464012842



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
       @dhruvilshah3 Sounds good!
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r465221060



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
       I think it is important to capture the segment level details. In the past, we have had trouble explaining precisely why a specific segment got deleted. For example, was it because of the last modified time or the record timestamp? When users are looking to understand why data is deleted, we should be able to provide a clear answer.
   
   My personal preference is probably 3) because I hate dealing with lists of things in log messages. Simple grepping no longer work to extract the details. Big messages also messes up console scrolling and can choke downstream systems. For segments, I am not so worried about log noise because the rate of segment creation is not _that_ high. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on a change in pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
hachikuji commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r464545190



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,13 +2210,16 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
+        info(s"${reason.reasonString(this, toDelete)}")

Review comment:
       A little annoying to need to pass through segments just to be added to each log message individually. Maybe we could do it like this instead
   ```scala
   info(s"Deleting segments due to ${reason.reasonString(this)}: $toDelete")
   ```

##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2686,3 +2670,50 @@ object LogMetricNames {
     List(NumLogSegments, LogStartOffset, LogEndOffset, Size)
   }
 }
+
+sealed trait SegmentDeletionReason {
+  def reasonString(log: Log, toDelete: Iterable[LogSegment]): String
+}
+
+case object RetentionMsBreachDeletion extends SegmentDeletionReason {

Review comment:
       nit: is it necessary to add `Deletion` to all of these? Maybe only `LogDeletion` needs it since it is referring to deletion of the log itself.

##########
File path: core/src/main/scala/kafka/log/LogSegment.scala
##########
@@ -413,7 +413,7 @@ class LogSegment private[log] (val log: FileRecords,
   override def toString: String = "LogSegment(baseOffset=" + baseOffset +
     ", size=" + size +
     ", lastModifiedTime=" + lastModified +
-    ", largestTime=" + largestTimestamp +
+    ", largestRecordTimestamp=" + largestRecordTimestamp +

Review comment:
       I'm ok with the change. I think it's better to reflect the underlying fields directly and redundant information just adds noise to the logs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#issuecomment-670127847


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dhruvilshah3 commented on a change in pull request #9110: MINOR: Ensure a reason is logged for every segment deletion

Posted by GitBox <gi...@apache.org>.
dhruvilshah3 commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r463975069



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
       While verbose, I think having the granularity of each segment is useful. This allows us to easily reason about why a particular segment was deleted. Note that we switched from a single log per batch to a log per segment in https://github.com/apache/kafka/pull/8850.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] ijuma commented on pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
ijuma commented on pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#issuecomment-668162389


   ok to test


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#issuecomment-669552517


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#issuecomment-669552682


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#issuecomment-670239077


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji commented on pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
hachikuji commented on pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#issuecomment-670126784






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] hachikuji merged pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
hachikuji merged pull request #9110:
URL: https://github.com/apache/kafka/pull/9110


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] kowshik commented on a change in pull request #9110: MINOR: Ensure a reason is logged for every segment deletion

Posted by GitBox <gi...@apache.org>.
kowshik commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r463914087



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
       If we passed in the deletion reason further into the `deleteSegmentFiles` method, it seems we can print the reason string just once for a batch of segments being deleted. And within the reason string, we can provide the reason for deleting the batch:
   
   https://github.com/confluentinc/ce-kafka/blob/master/core/src/main/scala/kafka/log/Log.scala#L2519
   https://github.com/confluentinc/ce-kafka/blob/master/core/src/main/scala/kafka/log/Log.scala#L2526
   
   ex: `info("Deleting segments due to $reason: ${segments.mkString(",")}"`
   
   where `$reason` provides `due to retention time 1200000ms breach`.
   
   The drawback is that sometimes we can not print segment-specific information since the error message is at a batch level. But generally it may be that segment-level deletion information could bloat our server logging, so it may be better to batch the logging instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dhruvilshah3 commented on a change in pull request #9110: MINOR: Ensure a reason is logged for all segment deletion operations

Posted by GitBox <gi...@apache.org>.
dhruvilshah3 commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r465939741



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
       I think this is reasonable. Logging a segment per line will make it easier for us to diagnose issues. I made the change to log a segment per line for retention-related deletions. We still batch all segments in a single line for all other deletion events, eg. log deletion, truncation, etc.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] kowshik commented on a change in pull request #9110: MINOR: Ensure a reason is logged for every segment deletion

Posted by GitBox <gi...@apache.org>.
kowshik commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r463914087



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
    * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
         // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
         // removing the deleted segment, we should force materialization of the iterator here, so that results of the
         // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
       If we passed in the deletion reason further into the `deleteSegmentFiles` method, it seems we can print the reason string just once for a batch of segments being deleted. And within the reason string, we can provide the reason for deleting the batch:
   
   https://github.com/confluentinc/ce-kafka/blob/master/core/src/main/scala/kafka/log/Log.scala#L2519
   https://github.com/confluentinc/ce-kafka/blob/master/core/src/main/scala/kafka/log/Log.scala#L2526
   
   ex: `info("Deleting segments due to $reason: ${segments.mkString(",")}"`
   
   where `$reason` provides `due to retention time 1200000ms breach`.
   
   The drawback is that sometimes we can not print segment-specific information since the error message is at a batch level. But generally it may be that segment-level deletion information could bloat our server logging, so it may be better to batch the logging instead.
   
   What are your thoughts?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org