Posted to dev@kafka.apache.org by "Jay Kreps (JIRA)" <ji...@apache.org> on 2014/07/22 01:58:40 UTC

[jira] [Resolved] (KAFKA-1539) Due to OS caching Kafka might lose offset files which causes full reset of data

     [ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps resolved KAFKA-1539.
------------------------------

    Resolution: Fixed

I'm checking this in since it seems to fix a clear problem, but [~arhagnel] it would still be good to get confirmation that the problem you were reproducing is fixed by this.

> Due to OS caching Kafka might lose offset files which causes full reset of data
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-1539
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1539
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.8.1.1
>            Reporter: Dmitry Bugaychenko
>            Assignee: Jay Kreps
>         Attachments: KAFKA-1539.patch
>
>
> Seen this while testing power failures and disk failures. Due to caching at the OS level (e.g. XFS can cache data for up to 30 seconds), after a failure we got offset files of zero length. This dramatically slows down broker startup (it has to re-check all segments), and if the high watermark offsets are lost it simply erases all data and starts recovering from other brokers (looks funny - first spending 2-3 hours re-checking logs and then deleting them all due to the missing high watermark).
> Proposal: introduce offset file rotation. Keep two versions of the offset file, write to the older one, and read from the newest valid one. That way the offset checkpoint interval could be configured so that at least one file is always fully flushed and valid.
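The rotation scheme proposed above can be sketched roughly as follows. This is a hypothetical illustration, not the attached KAFKA-1539.patch: the file names, the mtime-based "older/newer" choice, and the validity check (non-empty and parseable) are all assumptions made for the example. The key idea is that a crash that zeroes out the file being written leaves the other, previously flushed file intact.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical sketch of offset-file rotation: keep two checkpoint files,
// always overwrite the older one, and on startup read from the newest file
// that passes a validity check. File names and layout are illustrative only.
public class RotatingCheckpoint {
    private final Path fileA;
    private final Path fileB;

    public RotatingCheckpoint(Path dir) {
        this.fileA = dir.resolve("offset-checkpoint.0");
        this.fileB = dir.resolve("offset-checkpoint.1");
    }

    // Write the offset to whichever file is older, via a temp file and an
    // atomic rename so a crash never leaves a half-written target in place.
    public void write(long offset) throws IOException {
        Path target = olderOf(fileA, fileB);
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, List.of(Long.toString(offset)));
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                   StandardCopyOption.ATOMIC_MOVE);
    }

    // Read from the newest valid file; fall back to the older one if the
    // newest was truncated to zero length by a crash.
    public long read() throws IOException {
        Path first = newerOf(fileA, fileB);
        Path second = (first == fileA) ? fileB : fileA;
        for (Path p : new Path[] { first, second }) {
            Long v = tryParse(p);
            if (v != null) return v;
        }
        return -1L; // no valid checkpoint: caller must recover from replicas
    }

    private Path olderOf(Path a, Path b) throws IOException {
        return mtime(a) <= mtime(b) ? a : b;
    }

    private Path newerOf(Path a, Path b) throws IOException {
        return mtime(a) > mtime(b) ? a : b;
    }

    private long mtime(Path p) throws IOException {
        return Files.exists(p) ? Files.getLastModifiedTime(p).toMillis() : -1L;
    }

    // A file is valid only if it is non-empty and parses as a long;
    // the zero-length files seen after power loss fail this check.
    private Long tryParse(Path p) {
        try {
            List<String> lines = Files.readAllLines(p);
            if (lines.isEmpty() || lines.get(0).trim().isEmpty()) return null;
            return Long.parseLong(lines.get(0).trim());
        } catch (IOException | NumberFormatException e) {
            return null;
        }
    }
}
```

A real implementation would also need to fsync the file and its directory before considering a checkpoint durable; the atomic rename alone only protects against partial writes, not against data still sitting in the OS page cache.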



--
This message was sent by Atlassian JIRA
(v6.2#6252)