Posted to oak-issues@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2018/10/23 07:01:00 UTC

[jira] [Comment Edited] (OAK-7852) Blocked background flush can cause severe data loss

    [ https://issues.apache.org/jira/browse/OAK-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660184#comment-16660184 ] 

Michael Dürig edited comment on OAK-7852 at 10/23/18 7:00 AM:
--------------------------------------------------------------

I implemented a patch for the different approach mentioned in my previous comment: [https://github.com/mduerig/jackrabbit-oak/commits/OAK-7852-2]. This introduces two thresholds. After a certain time without a {{flush}}, a warning is written to the log for each further write operation, but no more than once per second. After some more time without a {{flush}}, when the second threshold is reached, an error is written to the log and further write operations fail with {{IOException: "Write operations disallowed: transient write operations not flushed for too long"}} until a {{flush}} occurs.
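
For illustration only, the two-threshold behaviour described above could be sketched roughly as follows. This is not the code from the linked branch: the class name {{FlushGuard}}, its methods, the injectable clock and the use of {{System.err}} in place of a real logger are all assumptions made for this sketch.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

/**
 * Hypothetical guard that tracks the time since the last flush.
 * All names here are illustrative and not taken from the actual patch.
 */
class FlushGuard {
    private static final long WARN_INTERVAL_MILLIS = 1_000;

    private final long warnAfterMillis;  // first threshold: start warning
    private final long failAfterMillis;  // second threshold: reject writes
    private final LongSupplier clock;    // injectable time source (assumption, eases testing)

    private long lastFlush;
    private long nextWarningAt = Long.MIN_VALUE;
    private boolean errorLogged;
    private int warnings;

    FlushGuard(long warnAfterMillis, long failAfterMillis, LongSupplier clock) {
        if (warnAfterMillis > failAfterMillis) {
            throw new IllegalArgumentException("warn threshold must not exceed fail threshold");
        }
        this.warnAfterMillis = warnAfterMillis;
        this.failAfterMillis = failAfterMillis;
        this.clock = clock;
        this.lastFlush = clock.getAsLong();
    }

    /** Call after every successful flush; re-enables write operations. */
    synchronized void flushed() {
        lastFlush = clock.getAsLong();
        nextWarningAt = Long.MIN_VALUE;
        errorLogged = false;
    }

    /** Call before every write operation. */
    synchronized void beforeWrite() throws IOException {
        long now = clock.getAsLong();
        long sinceFlush = now - lastFlush;
        if (sinceFlush >= failAfterMillis) {
            if (!errorLogged) {
                // Log the error once, then keep rejecting writes until flushed()
                System.err.println("ERROR: no flush for " + sinceFlush + " ms, rejecting writes");
                errorLogged = true;
            }
            throw new IOException(
                "Write operations disallowed: transient write operations not flushed for too long");
        }
        if (sinceFlush >= warnAfterMillis && now >= nextWarningAt) {
            // Rate limit: at most one warning per second
            System.err.println("WARN: no flush for " + sinceFlush + " ms");
            nextWarningAt = now + WARN_INTERVAL_MILLIS;
            warnings++;
        }
    }

    /** Number of warnings emitted so far (exposed for inspection only). */
    synchronized int warnings() {
        return warnings;
    }
}
```

In such a sketch every write path would call {{beforeWrite()}} first and the background flush task would call {{flushed()}} on success, so a blocked flush first shows up as rate-limited warnings and eventually as failing writes rather than as silently accumulating unflushed data.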

[~frm], please have a look.


was (Author: mduerig):
I implemented a patch for the different approach mentioned in my previous comment: [https://github.com/mduerig/jackrabbit-oak/commits/OAK-7852-2]. This introduces two thresholds. After a certain time without a {{flush}}, a warning is written to the log for each further write operation, but no more than once per second. After some more time without a {{flush}}, when the second threshold is reached, an error is written to the log and further write operations fail with {{IOException: Write operations disallowed: transient write operations not flushed for too long}}.

[~frm], please have a look.

> Blocked background flush can cause severe data loss
> ---------------------------------------------------
>
>                 Key: OAK-7852
>                 URL: https://issues.apache.org/jira/browse/OAK-7852
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>            Priority: Major
>             Fix For: 1.10
>
>
> When the {{FileStore background task}} fails (e.g. because of a deadlock) and the {{FileStore}} is subsequently shut down in an unclean way ({{kill -9}}), there is a risk of severe data loss. Although a journal could be reconstructed from the segments, there is a chance that most if not all of the revisions written since the failure of the background task are inconsistent and fail with a {{SegmentNotFoundException}} ({{SNFE}}).
> The expectation for such a case should be that a journal could be reconstructed from the segments and that all but the last few revisions are consistent.


