Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/07/07 12:58:23 UTC

[GitHub] [kafka] ijuma commented on a change in pull request #8936: KAFKA-10207: Fixed padded timeindex causing premature data deletion

ijuma commented on a change in pull request #8936:
URL: https://github.com/apache/kafka/pull/8936#discussion_r450843637



##########
File path: core/src/test/scala/unit/kafka/log/TimeIndexTest.scala
##########
@@ -148,5 +148,39 @@ class TimeIndexTest {
     idx.close()
   }
 
-}
 
+  /**
+   * In certain cases, index files fail to have their pre-allocated 0 bytes trimmed from the tail
+   * when a new segment is rolled. This causes a silent failure at the next startup where all retention
+   * windows are breached purging out data whether or not the window was really breached.
+   * KAFKA-10207
+   */
+  @Test
+  def testLoadingUntrimmedIndex(): Unit = {
+    // A larger index size must be specified or the starting offset will round down
+    // preventing this issue from being reproduced. Configs default to 10mb.
+    val max1MbEntryCount = 100000
+    // Create a file that will exist on disk and be removed when we are done
+    val file = nonExistantTempFile()
+    file.deleteOnExit()
+    // create an index that can have up to 100000 entries, about 1mb
+    var idx2 = new TimeIndex(file, baseOffset = 0, max1MbEntryCount * 12)
+    // Append less than the maximum number of entries, leaving 0 bytes padding the end
+    for (i <- 1 until max1MbEntryCount)
+      idx2.maybeAppend(i, i)
+
+    idx2.flush()
+    // jvm 1.8.0_191 fails to always flush shrinking resize to zfs disk

Review comment:
       Apache Kafka contributors were involved in the discussion about reverting that change, but we were only aware of the performance impact. It sounds like you're saying that it also resulted in incorrect behavior?
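
       For context, the failure mode described in the test's doc comment can be sketched in isolation. This is a simplified model, not Kafka's `TimeIndex` code: it assumes a 12-byte entry (8-byte timestamp plus 4-byte relative offset, matching the `* 12` in the test) and assumes the entry count is derived from the file length alone. The object and method names are hypothetical.

```scala
import java.nio.ByteBuffer

// Simplified model of a pre-allocated time index; NOT Kafka's implementation.
object UntrimmedTimeIndexSketch {
  // Assumption: 8-byte timestamp followed by a 4-byte relative offset.
  val EntrySize = 12

  // Build a zero-filled buffer sized for `capacityEntries`, then write only `written` entries.
  // ByteBuffer.allocate zero-initialises its contents, like a pre-allocated index file.
  def buildPadded(capacityEntries: Int, written: Int): ByteBuffer = {
    val buf = ByteBuffer.allocate(capacityEntries * EntrySize)
    for (i <- 1 to written) {
      buf.putLong(i.toLong) // timestamp
      buf.putInt(i)         // relative offset
    }
    buf
  }

  // Entry count derived from length alone, so trailing zero padding counts as entries.
  def entriesByLength(buf: ByteBuffer): Int = buf.capacity() / EntrySize

  // Read entry n (0-based) with absolute gets, leaving the buffer position untouched.
  def entry(buf: ByteBuffer, n: Int): (Long, Int) =
    (buf.getLong(n * EntrySize), buf.getInt(n * EntrySize + 8))

  def main(args: Array[String]): Unit = {
    val buf = buildPadded(capacityEntries = 10, written = 4)
    val lastByLength = entry(buf, entriesByLength(buf) - 1)
    val lastWritten = entry(buf, 3)
    println(s"last entry by file length: $lastByLength")
    println(s"last entry actually written: $lastWritten")
  }
}
```

       In this model the last entry located by file length has timestamp 0, while the last entry actually written has timestamp 4. A retention check that trusts the former would treat the segment as arbitrarily old, which is consistent with the premature deletion the doc comment describes.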



