Posted to oak-issues@jackrabbit.apache.org by "Francesco Mari (JIRA)" <ji...@apache.org> on 2017/10/31 14:53:00 UTC

[jira] [Updated] (OAK-6888) Flushing the FileStore might return before data is persisted

     [ https://issues.apache.org/jira/browse/OAK-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Mari updated OAK-6888:
--------------------------------
    Attachment: failure.txt

A real-world example of such a failure is an execution of {{testSyncBigBlob}} in {{ExternalPrivateStoreIT}}, slightly edited for clarity.

In the example, a background flush prevents the "main" thread from performing a flush. When the standby synchronizes with the primary, the old head state ({{473f3a4f-bf18-4d4f-a2aa-736ed4a64944.00000005}}) is transferred. Later, when the contents of the primary and the standby instances are compared, the new head state ({{8574c330-29ca-491a-a66e-b5b0d1b6b75e.0000000b}}) is used instead. By that time the background flush operation has completed, so the primary {{FileStore}} has a different persisted head state than the standby.

> Flushing the FileStore might return before data is persisted
> ------------------------------------------------------------
>
>                 Key: OAK-6888
>                 URL: https://issues.apache.org/jira/browse/OAK-6888
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>             Fix For: 1.8, 1.7.11
>
>         Attachments: failure.txt
>
>
> The implementation of {{FileStore#flush}} might return before all the expected data is persisted on disk. 
> The root cause of this behaviour is the implementation of {{TarRevisions#flush}}, which is too lenient when acquiring the lock for the journal file. If a background flush operation is in progress and a user calls {{FileStore#flush}}, that method returns immediately because the lock of the journal file is already owned by the background flush operation. The caller therefore has no guarantee that everything committed before the call to {{FileStore#flush}} is persisted to disk when the method returns.
> A fix for this problem might be to create an additional implementation of flush. The current implementation, needed for the background flush thread, will not be exposed to the users of {{FileStore}}. The new implementation of {{TarRevisions#flush}} should have stricter semantics and always guarantee that the persisted head contains everything visible to the user of {{FileStore}} before the flush operation was started.
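
The difference between the lenient and the strict flush can be illustrated with a minimal sketch. It assumes the journal is guarded by a {{java.util.concurrent.locks.ReentrantLock}}; the class {{FlushSketch}}, its methods and the {{slowDisk}} hook are hypothetical names made up for this illustration and are not the actual {{TarRevisions}} or {{FileStore}} code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the journal handling; not Oak's implementation.
class FlushSketch {
    private final ReentrantLock journalLock = new ReentrantLock();
    private volatile String head = "r0";          // latest committed state
    private volatile String persistedHead = "r0"; // state "written to disk"

    void commit(String newHead) { head = newHead; }

    String persistedHead() { return persistedHead; }

    // Lenient variant: gives up at once if another flush owns the journal lock.
    // slowDisk is an illustrative hook that simulates a slow write.
    boolean tryFlush(Runnable slowDisk) {
        if (!journalLock.tryLock()) {
            return false; // nothing persisted on behalf of this caller
        }
        try {
            writeHead(slowDisk);
            return true;
        } finally {
            journalLock.unlock();
        }
    }

    // Strict variant: waits for the lock, then persists the head seen at that point.
    void flush() {
        journalLock.lock();
        try {
            writeHead(() -> { });
        } finally {
            journalLock.unlock();
        }
    }

    private void writeHead(Runnable slowDisk) {
        String state = head;   // head captured while holding the lock
        slowDisk.run();        // simulated I/O latency
        persistedHead = state;
    }

    public static void main(String[] args) throws InterruptedException {
        FlushSketch store = new FlushSketch();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Background flush that holds the journal lock until released.
        Thread background = new Thread(() -> store.tryFlush(() -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        background.start();
        started.await();

        store.commit("r1");
        boolean flushed = store.tryFlush(() -> { });
        System.out.println(flushed + " " + store.persistedHead()); // false r0

        // The strict flush blocks until the background flush is done.
        Thread strict = new Thread(store::flush);
        strict.start();
        release.countDown();
        background.join();
        strict.join();
        System.out.println(store.persistedHead()); // r1
    }
}
```

In this sketch the lenient call returns while the persisted head is still {{r0}}, even though {{r1}} was committed before the call. The strict call returns only after {{r1}} has been persisted, at the cost of blocking the caller while another flush is in progress.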


