Posted to jira@kafka.apache.org by "Matthias J. Sax (Jira)" <ji...@apache.org> on 2019/10/22 19:55:00 UTC

[jira] [Resolved] (KAFKA-4113) Allow KTable bootstrap

     [ https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax resolved KAFKA-4113.
------------------------------------
    Resolution: Invalid

Closing this ticket as "invalid" because KTables are populated based on record timestamps, and hence the idea of a "blind bootstrap" does not apply. Instead, an indirect bootstrap is possible: ensuring that the table record timestamps are smaller than the first stream record timestamp effectively "bootstraps" the KTable.

For the case that somebody wants to completely decouple the KTable from the KStream (i.e., disable time synchronization and get table semantics similar to those of a global KTable), a custom timestamp extractor that always returns zero can be used.
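
To illustrate the two points above, here is a toy model in plain Java. It is not Kafka Streams code: the class TimestampSyncModel, the Rec type, and the processingOrder method are hypothetical names made up for this sketch, and the model ignores partitions, buffering, and network delays. It only assumes that processing always picks the head record with the smaller timestamp. In Kafka Streams itself, the corresponding hook would be an implementation of the TimestampExtractor interface, set for the table's source topic (for example via Consumed#withTimestampExtractor).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.ToLongFunction;

/** Toy model of timestamp-synchronized processing; not Kafka Streams code. */
final class TimestampSyncModel {

    /** A record with a label and the timestamp embedded in it. */
    static final class Rec {
        final String label;
        final long timestamp;

        Rec(String label, long timestamp) {
            this.label = label;
            this.timestamp = timestamp;
        }
    }

    /**
     * Merges a table input and a stream input by always taking the head record
     * with the smaller timestamp; ties go to the table side. The table side's
     * timestamp is computed by the given extractor, the stream side uses the
     * embedded timestamp.
     */
    static List<String> processingOrder(List<Rec> table, List<Rec> stream,
                                        ToLongFunction<Rec> tableExtractor) {
        Deque<Rec> t = new ArrayDeque<>(table);
        Deque<Rec> s = new ArrayDeque<>(stream);
        List<String> order = new ArrayList<>();
        while (!t.isEmpty() || !s.isEmpty()) {
            boolean takeTable = s.isEmpty()
                || (!t.isEmpty() && tableExtractor.applyAsLong(t.peek()) <= s.peek().timestamp);
            order.add((takeTable ? t : s).poll().label);
        }
        return order;
    }

    public static void main(String[] args) {
        List<Rec> table = Arrays.asList(new Rec("T1", 5), new Rec("T2", 20));
        List<Rec> stream = Arrays.asList(new Rec("S1", 10), new Rec("S2", 30));

        // Embedded timestamps: table and stream records interleave by time.
        System.out.println(processingOrder(table, stream, r -> r.timestamp)); // prints [T1, S1, T2, S2]

        // Extractor that always returns zero: the whole table is applied first.
        System.out.println(processingOrder(table, stream, r -> 0L)); // prints [T1, T2, S1, S2]
    }
}
```

In this model, the zero-returning extractor has the same effect as writing all table records with timestamps smaller than the first stream record: every table update is applied before any stream record is processed.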

The current timestamp mechanism (cf. KAFKA-3514) is not perfect yet, but the gaps are already tracked via KAFKA-6542 and KAFKA-7458, so we don't need to track them twice.
 

> Allow KTable bootstrap
> ----------------------
>
>                 Key: KAFKA-4113
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4113
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> On the mailing list, there are multiple requests about the possibility to "fully populate" a KTable before actual stream processing starts.
> Even if it is somewhat difficult to define when the initial populating phase should end, there are multiple possibilities:
> The main idea is that there is a rarely updated topic that contains the data. Only after this topic has been read completely and the KTable is ready should the application start processing. This implies that, on startup, the current end offsets of the partitions must be fetched and stored, and once the KTable has been populated up to those offsets, stream processing can start.
> Other discussed ideas are:
> 1) an initial fixed time period for populating
> (it might be hard for a user to estimate the correct value)
> 2) an "idle" period, i.e., if a KTable receives no update for a certain
> time, we consider it populated
> 3) a timestamp cutoff point, i.e., all records with an older timestamp
> belong to the initial populating phase
> The API change is not decided yet, and the API design is part of this JIRA.
> One suggestion (for option (4)) was:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without reading any other topics until a record with timestamp 1000 is seen.
> {noformat}


