Posted to oak-issues@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2016/06/02 13:37:59 UTC

[jira] [Commented] (OAK-4291) FileStore.flush prone to races leading to corruption

    [ https://issues.apache.org/jira/browse/OAK-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312310#comment-15312310 ] 

Michael Dürig commented on OAK-4291:
------------------------------------

Thanks for the feedback and patch. Makes a lot of sense! I'm happy we can get rid of the extra {{flushMonitor}}. 

Point 3 is not just about consistency though. Without the {{flushMonitor}}, flush must only consider the writers that were borrowed (and not all that are disposed at that point). Doing otherwise could cause a concurrent thread in flush to get stuck on {{safeEnterWhen(poolMonitor, allReturned(toReturn))}}, as its borrowed writers were already cleared by the earlier thread.
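
To illustrate the scenario, here is a minimal sketch of how I read it; it is not the actual {{SegmentBufferWriterPool}} code. The class {{PoolSketch}} and its methods are hypothetical, plain strings stand in for {{SegmentBufferWriter}} instances, and {{synchronized}} replaces the {{poolMonitor}}. The point is that each flush removes only its own {{toReturn}} writers from the disposed set, so a concurrent flush still finds the writers it is waiting for.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a writer pool; not Oak's implementation.
class PoolSketch {
    private final Set<String> borrowed = new HashSet<>();
    private final Set<String> disposed = new HashSet<>();

    synchronized void borrow(String writer) {
        borrowed.add(writer);
    }

    // A flush claims the writers that are borrowed right now and clears
    // the set, so they get disposed once they are returned.
    synchronized Set<String> claimBorrowed() {
        Set<String> toReturn = new HashSet<>(borrowed);
        borrowed.clear();
        return toReturn;
    }

    // A writer that was claimed by a flush in the meantime is disposed.
    synchronized void returnWriter(String writer) {
        if (!borrowed.remove(writer)) {
            disposed.add(writer);
        }
    }

    // Condition a flush waits for: all of *its* writers were returned.
    synchronized boolean allReturned(Collection<String> toReturn) {
        return disposed.containsAll(toReturn);
    }

    // Remove only the writers of this flush. Clearing the whole disposed
    // set here would also remove writers another flush is waiting for.
    synchronized void collect(Collection<String> toReturn) {
        disposed.removeAll(toReturn);
    }
}
```

If {{collect}} cleared the entire disposed set instead, {{allReturned}} could never become true for a second flush whose writers had already been returned.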




> FileStore.flush prone to races leading to corruption
> ----------------------------------------------------
>
>                 Key: OAK-4291
>                 URL: https://issues.apache.org/jira/browse/OAK-4291
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>            Priority: Critical
>              Labels: resilience
>             Fix For: 1.6
>
>         Attachments: OAK-4291-02.patch, OAK_4291-UTs.patch, OAK_4291.patch
>
>
> There is a small window in {{FileStore.flush}} that could lead to data corruption: if we crash right after setting the persisted head but before any delay-flushed {{SegmentBufferWriter}} instance flushes (see {{SegmentBufferWriterPool.returnWriter()}}) then that data is lost although it might already be referenced from the persisted head.
> We need to come up with a test case for this. 
> A possible fix would be to return a future from {{SegmentWriter.flush}} and rely on a completion callback. Such a change would most likely also be useful for OAK-3690. 
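
A rough sketch of the future-based idea from the description, under stated assumptions: {{FlushFutureSketch}}, {{flushWriters}} and the string head ids are hypothetical and not part of Oak. It only shows the ordering, namely that the persisted head is set in a completion callback after the writers have flushed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; none of these names are Oak APIs.
class FlushFutureSketch {
    // Stand-in for the persisted head.
    final AtomicReference<String> persistedHead = new AtomicReference<>("head-0");

    // Completed by whoever finishes flushing the delay-flushed writers.
    final CompletableFuture<Void> writersFlushed = new CompletableFuture<>();

    // Stand-in for a flush of the writers that returns a future.
    CompletableFuture<Void> flushWriters() {
        return writersFlushed;
    }

    // The new head is persisted only once the writers' future completes.
    CompletableFuture<Void> flush(String newHead) {
        return flushWriters().thenRun(() -> persistedHead.set(newHead));
    }
}
```

With this ordering, a crash before the writers have flushed leaves the old head in place, so the persisted head would not reference data that was never written.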



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)