Posted to oak-issues@jackrabbit.apache.org by "Stefan Egli (Jira)" <ji...@apache.org> on 2020/09/01 12:08:00 UTC

[jira] [Commented] (OAK-9176) sweep upgrade of pre 1.8 branch commits not always sets "_bc" for parents/root

    [ https://issues.apache.org/jira/browse/OAK-9176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188390#comment-17188390 ] 

Stefan Egli commented on OAK-9176:
----------------------------------

[Added|https://github.com/stefan-egli/jackrabbit-oak/commit/65382ccaaca434ed718c464fd1799a9f80e72870] sweep2 to the [working branch|https://github.com/stefan-egli/jackrabbit-oak/commits/issue/OAK-9176] :
* the settings collection now has a {{sweep2Status}} entry containing a {{status}}, which is either {{sweeping}} or {{swept}}, plus a {{lock}} holding the clusterId of the instance that is currently sweeping or that completed the sweep.
* if the above status is not set, {{Sweep2Helper.isSweep2Necessary()}} inspects the root document to determine whether a sweep2 is necessary
* at startup, if a sweep2 is found to be necessary, a new, dedicated, one-time {{backgroundSweep2}} thread scans the repository for any missing {{_bc}} entries.
* if the instance doing a sweep2 crashes while another instance in the cluster is still running, that other instance will do the sweep2 again; this repeats until a complete sweep2 has been done.
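
The status/lock handling described in the list above can be sketched roughly as follows. This is only an illustrative in-memory model, not the code on the working branch: the class {{Sweep2StatusSketch}}, its methods and the use of an {{AtomicReference}} in place of the settings collection are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory stand-in for the sweep2Status entry (not Oak code). */
final class Sweep2StatusSketch {

    enum Status { SWEEPING, SWEPT }

    /** Immutable snapshot: the status plus the clusterId stored in the lock. */
    static final class Entry {
        final Status status;
        final int lockClusterId;

        Entry(Status status, int lockClusterId) {
            this.status = status;
            this.lockClusterId = lockClusterId;
        }
    }

    // null models "status not set yet"
    private final AtomicReference<Entry> entry = new AtomicReference<>(null);

    /** Returns true if the given instance holds the sweeping lock afterwards. */
    boolean tryAcquire(int clusterId) {
        Entry current = entry.get();
        if (current == null) {
            // first instance to get here becomes the sweeper
            return entry.compareAndSet(null, new Entry(Status.SWEEPING, clusterId));
        }
        return current.status == Status.SWEEPING && current.lockClusterId == clusterId;
    }

    /**
     * Takes over from a crashed sweeper. Assumes the caller has already
     * established that the previous lock holder is no longer active.
     */
    boolean takeOver(int crashedClusterId, int newClusterId) {
        Entry current = entry.get();
        if (current == null || current.status != Status.SWEEPING
                || current.lockClusterId != crashedClusterId) {
            return false;
        }
        return entry.compareAndSet(current, new Entry(Status.SWEEPING, newClusterId));
    }

    /** Marks the sweep as complete; only the current lock holder may do so. */
    boolean markSwept(int clusterId) {
        Entry current = entry.get();
        if (current == null || current.status != Status.SWEEPING
                || current.lockClusterId != clusterId) {
            return false;
        }
        return entry.compareAndSet(current, new Entry(Status.SWEPT, clusterId));
    }

    boolean isSwept() {
        Entry current = entry.get();
        return current != null && current.status == Status.SWEPT;
    }

    public static void main(String[] args) {
        Sweep2StatusSketch status = new Sweep2StatusSketch();
        System.out.println("instance 1 acquires: " + status.tryAcquire(1));
        System.out.println("instance 2 acquires: " + status.tryAcquire(2));
        // instance 1 crashes, instance 2 redoes the sweep2
        System.out.println("instance 2 takes over: " + status.takeOver(1, 2));
        System.out.println("instance 2 marks swept: " + status.markSwept(2));
        System.out.println("swept: " + status.isSwept());
    }
}
```

In this sketch the compare-and-set stands in for a conditional update on the settings document, so that only one instance at a time can move the status forward.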

missing:
* since a sweep2 might be long-running and put a high load on the backend, throttling might be useful

> sweep upgrade of pre 1.8 branch commits not always sets "_bc" for parents/root
> ------------------------------------------------------------------------------
>
>                 Key: OAK-9176
>                 URL: https://issues.apache.org/jira/browse/OAK-9176
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: documentmk
>    Affects Versions: 1.32.0
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Major
>
> OAK-5869 in Oak 1.8 introduced the _annotation of documents with branch commits_ to facilitate Revision GC. This means that branch commits from 1.8 on are reflected in an explicit {{"_bc"}} entry (see also the [branches|https://jackrabbit.apache.org/oak/docs/nodestore/documentmk.html#branches] documentation for further details).
> Generally speaking, {{"_bc"}} entries are set in the following cases:
>  # for property changes on the document itself
>  # for new children (i.e. via {{"_deleted" : "false"}}) on the new child's document
>  # for parents of new children (i.e. via {{"_commitRoot": ".."}} to indicate a change for conflict handling) on the parent's document
>  # for the root document (or more precisely the commitRoot, which for branch commits is the root)
> For repositories created prior to 1.8 an [upgrade|https://jackrabbit.apache.org/oak/docs/nodestore/document/upgrade.html] mechanism was [introduced|https://github.com/apache/jackrabbit-oak/commit/e2aad1c0e867b148e81d4de9001e18551c1d8a5c] (in OAK-3712): a _sweep_ takes care of automatically annotating existing pre-1.8 documents accordingly. This is ultimately handled in [{{NodeDocumentSweeper.sweepOne}}|https://github.com/apache/jackrabbit-oak/blob/d35346d4d446908c7019e931cb54d88824c1a637/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/NodeDocumentSweeper.java#L179], which is invoked via {{forceBackgroundSweep}}.
> This upgrade mechanism does not properly handle the last 2 cases (listed above), *if* the parent/root node itself doesn't have any further property changes. This is because {{sweepOne()}}, which is in charge of the upgrade, only considers [PROPERTY_OR_DELETED|https://github.com/apache/jackrabbit-oak/blob/d35346d4d446908c7019e931cb54d88824c1a637/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/NodeDocumentSweeper.java#L181] document entries and therefore misses otherwise unchanged parents and the root.
> This can therefore lead to nodes that had children created before Oak 1.8 not having a {{"_bc"}} entry. The same applies to the root, which might also lack {{"_bc"}} entries for pre Oak 1.8 branch commits.
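
The gap described in the issue can be illustrated with a small, heavily simplified model. It assumes a document is just a map from entry name to the set of revisions recorded under that name, and that a "property or deleted" filter accepts {{"_deleted"}} plus any name not starting with an underscore. The class {{SweepFilterSketch}}, the revision label {{rev-A}} and the filter rule are hypothetical simplifications, not the actual {{NodeDocumentSweeper}} logic.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of a sweep that only looks at property/_deleted entries. */
final class SweepFilterSketch {

    /** Simplified filter: "_deleted" or any name without a leading underscore. */
    static boolean isPropertyOrDeleted(String key) {
        return key.equals("_deleted") || !key.startsWith("_");
    }

    /** Revisions such a sweep would look at for the given document. */
    static Set<String> revisionsSeen(Map<String, Set<String>> doc) {
        Set<String> seen = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : doc.entrySet()) {
            if (isPropertyOrDeleted(e.getKey())) {
                seen.addAll(e.getValue());
            }
        }
        return seen;
    }

    /** Revisions recorded only under "_commitRoot", which the filter never sees. */
    static Set<String> revisionsMissed(Map<String, Set<String>> doc) {
        Set<String> missed =
                new TreeSet<>(doc.getOrDefault("_commitRoot", Collections.emptySet()));
        missed.removeAll(revisionsSeen(doc));
        return missed;
    }

    public static void main(String[] args) {
        // new child: the change shows up under "_deleted", so the filter sees it
        Map<String, Set<String>> child = new LinkedHashMap<>();
        child.put("_deleted", Collections.singleton("rev-A"));

        // otherwise unchanged parent: only a "_commitRoot" entry for the same revision
        Map<String, Set<String>> parent = new LinkedHashMap<>();
        parent.put("_commitRoot", Collections.singleton("rev-A"));

        System.out.println("child missed:  " + revisionsMissed(child));
        System.out.println("parent missed: " + revisionsMissed(parent));
    }
}
```

In this model the parent's only trace of the branch commit sits under {{"_commitRoot"}}, so a sweep restricted to property and {{"_deleted"}} entries has nothing on the parent to annotate.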


