Posted to dev@kafka.apache.org by "John Roesler (JIRA)" <ji...@apache.org> on 2018/06/18 21:46:00 UTC

[jira] [Created] (KAFKA-7072) Kafka Streams may drop RocksDB window segments before they expire

John Roesler created KAFKA-7072:
-----------------------------------

             Summary: Kafka Streams may drop RocksDB window segments before they expire
                 Key: KAFKA-7072
                 URL: https://issues.apache.org/jira/browse/KAFKA-7072
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 1.0.1, 1.1.0, 0.11.0.2, 1.0.0, 0.11.0.1, 0.11.0.0, 2.0.0
            Reporter: John Roesler
            Assignee: John Roesler
             Fix For: 2.1.0


The current implementation of Segments used by the RocksDB session and time window stores conflicts with our current timestamp management model.

The current segmentation approach allows configuration of a fixed number of segments (let's say *4*) and a fixed retention time. We essentially divide up the retention time into the available number of segments:
{quote}{{<---------|-----------------------------|}}
{{   expiration date                 right now}}
{{          \-------retention time--------/}}
{{          |  seg 0  |  seg 1  |  seg 2  |  seg 3  |}}
{quote}
Note that we keep one extra segment so that we can record new events while some events in seg 0 are actually expired (but we only drop whole segments, so the expired events just get to hang around).
{quote}{{<-------------|-----------------------------|}}
{{       expiration date                 right now}}
{{              \-------retention time--------/}}
{{          |  seg 0  |  seg 1  |  seg 2  |  seg 3  |}}
{quote}
When it's time to provision segment 4, we know that segment 0 is completely expired, so we drop it:
{quote}{{<-------------------|-----------------------------|}}
{{             expiration date                 right now}}
{{                    \-------retention time--------/}}
{{                    |  seg 1  |  seg 2  |  seg 3  |  seg 4  |}}
{quote}
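To make the arithmetic concrete, here is a minimal sketch of the segmentation scheme described above. It is illustrative only; the names (SegmentMath, segmentId, minLiveSegmentId) and constants are assumptions, not the actual Segments internals:
{code:java}
// Illustrative sketch of segment arithmetic; not actual Kafka Streams code.
public class SegmentMath {
    // Hypothetical configuration: 4 segments covering a 30s retention period.
    static final long RETENTION_MS = 30_000L;
    static final int NUM_SEGMENTS = 4;
    // Retention is split across (NUM_SEGMENTS - 1) intervals; the extra
    // segment absorbs new events while old ones age out.
    static final long SEGMENT_INTERVAL_MS = RETENTION_MS / (NUM_SEGMENTS - 1);

    // Each timestamp maps to exactly one segment by integer division.
    static long segmentId(long timestampMs) {
        return timestampMs / SEGMENT_INTERVAL_MS;
    }

    // Only whole segments are dropped: once the segment for maxTimestampMs
    // is provisioned, every segment with a smaller id than this is deleted.
    static long minLiveSegmentId(long maxTimestampMs) {
        return segmentId(maxTimestampMs) - NUM_SEGMENTS + 1;
    }

    public static void main(String[] args) {
        // Provisioning segment 4 (timestamps 40_000..49_999) drops segment 0.
        System.out.println(segmentId(45_000L));        // 4
        System.out.println(minLiveSegmentId(45_000L)); // 1 -> seg 0 is gone
    }
}
{code}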
 

However, the current timestamp management model allows for records from the future. Namely, because we define stream time as the minimum buffered timestamp (nondecreasing), we can have a buffer like [ 5, 2, 6 ]: the stream time will be 2, but we'll handle the record with timestamp 5 next. Referring to the example above, this means we could wind up having to provision segment 4 before segment 0 expires!
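A small sketch of that buffer example (illustrative only; this is not the actual Kafka Streams buffering code):
{code:java}
import java.util.List;

public class StreamTimeSketch {
    public static void main(String[] args) {
        // Buffered record timestamps, in arrival order: [5, 2, 6].
        List<Long> buffered = List.of(5L, 2L, 6L);

        // Stream time = minimum buffered timestamp (nondecreasing over time).
        long streamTime =
            buffered.stream().mapToLong(Long::longValue).min().getAsLong();
        System.out.println("stream time: " + streamTime); // 2

        // Yet the record handled next is the head of the buffer, timestamp 5:
        // a "future" record relative to stream time.
        System.out.println("next record: " + buffered.get(0)); // 5
    }
}
{code}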

Let's say "f" is our future event:
{quote}{{<-------------------|-----------------------------|----f}}
{{             expiration date                 right now}}
{{                    \-------retention time--------/}}
{{                    |  seg 1  |  seg 2  |  seg 3  |  seg 4  |}}
{quote}
Should we drop segment 0 prematurely? Or should we crash and refuse to process "f"?

Today, we do the former, and this is probably the better choice. If we refuse to process "f", then we cannot make progress ever again.

Dropping segment 0 prematurely is a bummer, but users could also set the retention time high enough that they don't expect to get any events late enough to need segment 0. Worst case: since we can have many future events without advancing stream time, those events could be sparse enough that each requires its own segment, which would eat deeply into the retention time, dropping many segments that should still be live.
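To illustrate that worst case under the same assumed constants as the earlier sketch (4 segments, 10s intervals, 30s retention; again not actual Kafka Streams code):
{code:java}
public class FutureEventSketch {
    static final int NUM_SEGMENTS = 4;
    static final long SEGMENT_INTERVAL_MS = 10_000L; // 30s retention / 3

    public static void main(String[] args) {
        long streamTimeMs = 100_000L;               // stream time is stuck here
        long expirationMs = streamTimeMs - 30_000L; // data back to 70s is live
        System.out.printf("stream time %ds, expiration %ds%n",
                streamTimeMs / 1000, expirationMs / 1000);

        // Sparse future events, each landing in its own fresh segment.
        long[] futureEvents = {140_000L, 150_000L, 160_000L};
        for (long ts : futureEvents) {
            long provisioned = ts / SEGMENT_INTERVAL_MS;
            long dropBelow = provisioned - NUM_SEGMENTS + 1;
            System.out.printf("event t=%ds: provision seg %d, drop segs below %d%n",
                    ts / 1000, provisioned, dropBelow);
        }
        // Prints "drop segs below 11, 12, 13": segments 7..10, which cover
        // 70s..110s and are still within retention by stream time, get
        // deleted even though their data has not expired.
    }
}
{code}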


