Posted to jira@kafka.apache.org by "Guozhang Wang (JIRA)" <ji...@apache.org> on 2018/01/18 22:30:00 UTC

[jira] [Commented] (KAFKA-3184) Add Checkpoint for In-memory State Store

    [ https://issues.apache.org/jira/browse/KAFKA-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331348#comment-16331348 ] 

Guozhang Wang commented on KAFKA-3184:
--------------------------------------

[~yuzhihong@gmail.com] sorry for the late reply!

Your understanding is basically right, and here are my thoughts about the flushing:

1. Flushing should be neither expensive nor stop-the-world, since flush may be called on each commit. I was thinking that it could either be async (but then we need to keep the checkpointed offset values consistent with the checkpoint image on disk itself), or we could checkpoint only every N flush calls.

2. As for {{persistent()}}, currently it is only used in {{ProcessorStateManager#checkpoint}} and {{StoreChangelogReader#restoredOffsets}}; with in-memory state stores being checkpointed periodically, I think we can just deprecate this flag and let these two callers always checkpoint / save restored offsets.
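To make the "every N flush calls" option from point 1 more concrete, here is a minimal sketch. Everything in it is hypothetical: EveryNFlushStore, its methods and the file layout are not Kafka Streams APIs. The store counts flush() calls and only writes a checkpoint on every N-th one. It writes the changelog offset and the store image into the same file and then renames that file into place, so the checkpointed offset cannot disagree with the image on disk.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store that checkpoints only on every N-th flush (not a Kafka API). */
class EveryNFlushStore {
    private final Map<String, String> data = new HashMap<>();
    private final Path checkpointFile;
    private final int everyN;
    private long flushCount = 0;
    private long appliedOffset = -1L; // last changelog offset reflected in `data`

    EveryNFlushStore(Path checkpointFile, int everyN) {
        if (everyN < 1) throw new IllegalArgumentException("everyN must be >= 1");
        this.checkpointFile = checkpointFile;
        this.everyN = everyN;
    }

    void put(String key, String value, long changelogOffset) {
        data.put(key, value);
        appliedOffset = changelogOffset;
    }

    /** Called on every commit; returns true only when a checkpoint was actually written. */
    boolean flush() {
        flushCount++;
        if (flushCount % everyN != 0) {
            return false; // cheap path: most commits skip the disk write entirely
        }
        writeCheckpoint();
        return true;
    }

    private void writeCheckpoint() {
        Path tmp = checkpointFile.resolveSibling(checkpointFile.getFileName() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            // Offset and image go into the same file, so they always describe the same state.
            out.writeLong(appliedOffset);
            out.writeObject(new HashMap<>(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            // Rename into place so a reader never observes a half-written checkpoint.
            Files.move(tmp, checkpointFile, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back the offset stored alongside the last checkpoint image. */
    static long readCheckpointedOffset(Path checkpointFile) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(checkpointFile))) {
            return in.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Java serialization is used here only to keep the sketch short; a real store would want its own compact format.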

> Add Checkpoint for In-memory State Store
> ----------------------------------------
>
>                 Key: KAFKA-3184
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3184
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Priority: Major
>              Labels: user-experience
>
> Currently Kafka Streams does not checkpoint persistent state stores upon committing, because doing so would be expensive: it is "stop-the-world" and writes to disk. For example, RocksDB would naively require copying the whole file directory to make a copy.
> However, for in-memory stores, checkpointing may be doable asynchronously and hence quickly. The benefit of having intermediate checkpoints is avoiding a restore from scratch when no standby tasks are present.
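The asynchronous checkpointing of in-memory stores described in the issue could look roughly like the following sketch. It assumes a hypothetical AsyncCheckpointingStore and a caller-supplied sink that persists an (offset, image) pair; neither is a Kafka Streams API. The commit thread only copies the in-memory map and the last applied offset, both under the same lock so they stay consistent, while the actual write runs on a background thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

/** Hypothetical in-memory store with asynchronous checkpointing (not a Kafka Streams API). */
class AsyncCheckpointingStore {
    private final Map<String, String> data = new HashMap<>();
    private long appliedOffset = -1L; // last changelog offset reflected in `data`
    private final BiConsumer<Long, Map<String, String>> sink; // assumed: persists (offset, image)
    // A single writer thread keeps checkpoints ordered; daemon so it never blocks JVM exit.
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "checkpoint-writer");
        t.setDaemon(true);
        return t;
    });

    AsyncCheckpointingStore(BiConsumer<Long, Map<String, String>> sink) {
        this.sink = sink;
    }

    synchronized void put(String key, String value, long changelogOffset) {
        data.put(key, value);
        appliedOffset = changelogOffset;
    }

    /** Snapshots synchronously, writes asynchronously; completes with the checkpointed offset. */
    CompletableFuture<Long> checkpointAsync() {
        final long offset;
        final Map<String, String> image;
        synchronized (this) {
            // Image and offset are captured under the same lock, so they match exactly.
            image = new HashMap<>(data);
            offset = appliedOffset;
        }
        // The slow part (serialization, disk I/O) runs off the commit thread.
        return CompletableFuture.supplyAsync(() -> {
            sink.accept(offset, image);
            return offset;
        }, writer);
    }

    void close() {
        writer.shutdown();
    }
}
```

The copy is still O(n) in the number of entries; what this avoids is blocking the commit on disk I/O.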



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)