Posted to oak-issues@jackrabbit.apache.org by "Julian Sedding (Jira)" <ji...@apache.org> on 2022/05/26 15:28:00 UTC

[jira] [Created] (OAK-9785) Tar SegmentStore can be corrupted during compaction

Julian Sedding created OAK-9785:
-----------------------------------

             Summary: Tar SegmentStore can be corrupted during compaction
                 Key: OAK-9785
                 URL: https://issues.apache.org/jira/browse/OAK-9785
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: segment-tar
    Affects Versions: 1.42.0
            Reporter: Julian Sedding


There is a scenario where a segment store can become corrupted, leading to {{SegmentNotFoundException}}s with very "young" {{SegmentId}}s, i.e. with an age in the 1-2 digit millisecond range, e.g. {{SegmentId age=2ms}}.

The scenario I observed looks as follows:
 - a blob is "lost" from the external blob store (presumably due to incorrect cloning of the instance, most likely only happens with unfortunate timing)
 - a tail revision GC run is performed (not sure if it matters that this was a tail compaction)
 -- the missing blob is encountered during compaction
 -- an exception other than an {{IOException}} (IIRC it was an {{IllegalArgumentException}}) is thrown due to the missing blob
 -- revision GC fails WITHOUT properly being aborted, and thus the partially written revision of the compaction run is not removed
 - more data is written on the instance
 - a full revision GC run is performed
 -- a referenced segment is removed due to the incorrect/confused revision data
 - the {{SegmentNotFoundException}} is first observed either during the remainder of that compaction run or the next time the affected node is requested, usually during a traversal

The root cause is in [{{AbstractCompactionStrategy}}|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/AbstractCompactionStrategy.java#L233], where only {{IOException}}s are caught.

In order to improve the robustness of the code, I think we need to catch all {{Throwable}}s. Otherwise we cannot guarantee that compaction is correctly aborted.
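
To illustrate the difference, here is a minimal, self-contained sketch. It is not the actual Oak code: {{CompactionAbortSketch}}, {{FakeStore}}, {{Compactor}} and the generation id {{"gen-1"}} are hypothetical stand-ins, and the real cleanup in {{AbstractCompactionStrategy}} is more involved. It only shows that an unchecked exception bypasses a {{catch (IOException)}} block and leaves the partially written revision behind, whereas {{catch (Throwable)}} runs the cleanup.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Oak code: models only "partial revision must be removed on failure".
class CompactionAbortSketch {

    /** Hypothetical stand-in for the store's bookkeeping of written revisions. */
    static final class FakeStore {
        final List<String> revisions = new ArrayList<>();
        void beginRevision(String id) { revisions.add(id); }
        void removeRevision(String id) { revisions.remove(id); }
    }

    /** Hypothetical compaction step; may fail with checked or unchecked exceptions. */
    interface Compactor {
        void compact() throws IOException;
    }

    /** Current behaviour as described in this issue: only IOException triggers the abort. */
    static boolean compactCatchingIOExceptionOnly(FakeStore store, Compactor compactor) {
        store.beginRevision("gen-1");
        try {
            compactor.compact();
            return true;
        } catch (IOException e) {
            store.removeRevision("gen-1"); // abort: drop the partially written revision
            return false;
        }
        // An IllegalArgumentException skips the catch block, so "gen-1" stays in the store.
    }

    /** Proposed behaviour: any Throwable triggers the abort. */
    static boolean compactCatchingThrowable(FakeStore store, Compactor compactor) {
        store.beginRevision("gen-1");
        try {
            compactor.compact();
            return true;
        } catch (Throwable t) {
            store.removeRevision("gen-1"); // abort regardless of the exception type
            return false;
        }
    }
}
```

With a compactor that throws an {{IllegalArgumentException}} (as with the missing blob above), the first variant propagates the exception and the store still contains {{"gen-1"}}; the second variant returns {{false}} and the store is empty again.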



--
This message was sent by Atlassian Jira
(v8.20.7#820007)