Posted to dev@kafka.apache.org by "Eno Thereska (JIRA)" <ji...@apache.org> on 2016/11/11 12:55:58 UTC

[jira] [Updated] (KAFKA-4317) RocksDB checkpoint files lost on kill -9

     [ https://issues.apache.org/jira/browse/KAFKA-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eno Thereska updated KAFKA-4317:
--------------------------------
    Assignee:     (was: Guozhang Wang)

> RocksDB checkpoint files lost on kill -9
> ----------------------------------------
>
>                 Key: KAFKA-4317
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4317
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.10.0.1
>            Reporter: Greg Fodor
>
> Right now, the checkpoint files for logged RocksDB stores are written during a graceful shutdown, and removed upon restoration. Unfortunately this means that in a scenario where the process is forcibly killed, the checkpoint files are not there, so all RocksDB stores are rematerialized from scratch on the next launch.
> In a way, this is good, because it simulates bootstrapping a new node (for example, it's a good way to see how much I/O is used to rematerialize the stores); however, it leads to longer recovery times when a non-graceful shutdown occurs and we want to get the job up and running again.
> There seem to be two possible things to consider:
> - Simply do not remove the checkpoint files on restoring. This way, a kill -9 would only require restoring the data produced to the source topics since the last graceful shutdown.
> - Continually update the checkpoint files (perhaps on commit) -- this would result in the least amount of overhead/latency in restarting, but the additional complexity may not be worth it.
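> For illustration only (none of the class or method names below come from the actual Streams code; this is just a sketch of the second option): the checkpoint could be rewritten on every commit by writing the offsets to a temporary file and atomically renaming it into place, so that a kill -9 leaves either the previous checkpoint or the new one on disk, never a torn file.
> {code:java}
> import java.io.IOException;
> import java.nio.charset.StandardCharsets;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.StandardCopyOption;
> import java.nio.file.StandardOpenOption;
> import java.util.Map;
>
> // Sketch only; not the existing Kafka Streams implementation.
> public class CheckpointSketch {
>     private final Path checkpointFile;
>
>     public CheckpointSketch(Path stateDir) {
>         this.checkpointFile = stateDir.resolve(".checkpoint");
>     }
>
>     // Called from the commit path with the latest flushed changelog offsets,
>     // keyed here by a "topic-partition" string purely for illustration.
>     public void writeOnCommit(Map<String, Long> offsets) throws IOException {
>         Path tmp = checkpointFile.resolveSibling(checkpointFile.getFileName() + ".tmp");
>         StringBuilder sb = new StringBuilder();
>         for (Map.Entry<String, Long> e : offsets.entrySet()) {
>             sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
>         }
>         // Write and sync the temp file first...
>         Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8),
>                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
>                     StandardOpenOption.WRITE, StandardOpenOption.SYNC);
>         // ...then atomically rename it over the checkpoint, so a crash at any
>         // point leaves either the old file or the new file, never a partial one.
>         Files.move(tmp, checkpointFile, StandardCopyOption.REPLACE_EXISTING,
>                    StandardCopyOption.ATOMIC_MOVE);
>     }
> }
> {code}
> The per-commit cost would be one small file write plus a rename per commit interval, which is the overhead/latency trade-off the second option makes in exchange for faster restarts.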



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)