Posted to oak-issues@jackrabbit.apache.org by "Marcel Reutegger (JIRA)" <ji...@apache.org> on 2014/03/19 21:11:54 UTC

[jira] [Resolved] (OAK-1555) Inefficient node state diff with old revisions

     [ https://issues.apache.org/jira/browse/OAK-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger resolved OAK-1555.
-----------------------------------

    Resolution: Fixed

Implemented a journal-like diff cache on MongoDB using a capped collection: http://svn.apache.org/r1579378
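A capped collection keeps documents in insertion order and discards the oldest ones once it is full. The following is a minimal in-memory sketch of that behavior, not the code committed in r1579378. It assumes revisions are increasing integers, and `CappedJournal` and `entries_between` are hypothetical names. The point it shows is that a bounded journal can only answer diffs whose base revision it still covers; otherwise the caller has to fall back to a regular tree diff.

```python
from collections import deque
from typing import Deque, List, Optional, Set, Tuple

Entry = Tuple[int, Set[str]]  # (revision, paths changed by that commit)


class CappedJournal:
    """Hypothetical in-memory stand-in for a capped collection of commits.

    Revisions are plain increasing integers here (an assumption; real
    revisions also carry a counter and a cluster id).
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # deque(maxlen=...) silently drops the oldest entry on overflow,
        # like the insertion-order eviction of a capped collection.
        self._entries: Deque[Entry] = deque(maxlen=capacity)
        self._evicted_up_to: Optional[int] = None  # newest dropped revision

    def append(self, rev: int, changed_paths: Set[str]) -> None:
        if len(self._entries) == self._entries.maxlen:
            # The oldest entry is about to be dropped; remember its revision.
            self._evicted_up_to = self._entries[0][0]
        self._entries.append((rev, set(changed_paths)))

    def entries_between(self, from_rev: int, to_rev: int) -> Optional[List[Entry]]:
        """Return entries with from_rev < rev <= to_rev, or None if some of
        them may already have been evicted (fall back to a regular diff)."""
        if self._evicted_up_to is not None and self._evicted_up_to > from_rev:
            return None
        return [(r, p) for r, p in self._entries if from_rev < r <= to_rev]


journal = CappedJournal(capacity=3)
for rev, paths in [(1, {"/a"}), (2, {"/b"}), (3, {"/a/c"}), (4, {"/d"})]:
    journal.append(rev, paths)

print(journal.entries_between(2, 4))  # entries for revisions 3 and 4
print(journal.entries_between(0, 4))  # None: revision 1 was evicted
```

The capacity therefore bounds how far back in time a diff can be served from the journal.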

> Inefficient node state diff with old revisions
> ----------------------------------------------
>
>                 Key: OAK-1555
>                 URL: https://issues.apache.org/jira/browse/OAK-1555
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: core, mongomk
>    Affects Versions: 0.18
>            Reporter: Marcel Reutegger
>            Assignee: Marcel Reutegger
>            Priority: Blocker
>             Fix For: 0.19
>
>
> As part of OAK-1429 a number of improvements were implemented, but one issue remains when a node state diff is done against older revisions.
> The DocumentNodeStore keeps a modified timestamp on each document and updates it whenever the document is modified explicitly, or implicitly when a descendant document is updated. With this timestamp the store is able to tell when a subtree was last modified. The diff implementation becomes inefficient when both revisions being compared are older than the modified timestamp of a subtree. Because the timestamp only records the most recent modification, the implementation cannot tell whether the subtree changed between the two revisions, and it tends to read many more nodes than were actually modified.
> Improvements from OAK-1394 and OAK-1429 helped quite a bit because the diff cache in the DocumentNodeStore is proactively filled by the commits. However, in addition to the observation listeners that perform diffs, there is also the async index update, which periodically performs a diff. These diffs usually go further back in time; they are the inefficient ones and they also have a negative impact on the diff cache.
> A solution to this problem was already discussed in a recent Oak conference call: the DocumentNodeStore keeps a journal of commits and uses it to answer node state diff calls. With this journal the store should also be able to diff efficiently across multiple commits. A number of options were discussed, including whether to implement the journal with a local file or a capped MongoDB collection.
> Ideas for alternative solutions are welcome...
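
The limitation of the modified timestamp described in the issue can be sketched with a toy tree. This is a hypothetical model, not the DocumentNodeStore code: `Node`, `touch` and `diff_candidates` are illustrative names and timestamps are plain integers. A subtree whose timestamp is not newer than the older revision can be skipped, but because the timestamp only records the latest change, a later modification forces the diff to descend even when nothing changed between the two revisions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    modified: int = 0  # time of the latest change anywhere in this subtree
    children: Dict[str, "Node"] = field(default_factory=dict)


def touch(root: Node, path: str, ts: int) -> None:
    """Record a change at `path` and propagate `ts` to every ancestor."""
    node = root
    node.modified = max(node.modified, ts)
    for name in [n for n in path.split("/") if n]:
        node = node.children.setdefault(name, Node(name))
        node.modified = max(node.modified, ts)


def diff_candidates(node: Node, from_ts: int, visited: List[str], path: str = "") -> None:
    """Visit every subtree that may have changed after `from_ts`.

    Only the lower bound can prune: `modified` holds the latest change, so a
    subtree changed after the newer revision looks exactly like one changed
    between the two revisions.
    """
    visited.append(path or "/")
    for name, child in node.children.items():
        if child.modified > from_ts:
            diff_candidates(child, from_ts, visited, f"{path}/{name}")


root = Node("")
touch(root, "/a/x", 5)
for i in range(3):
    touch(root, f"/b/c{i}", 100)  # changes made long after the diff window

# Diff between revisions at t=10 and t=20: nothing changed in that window,
# yet the whole /b subtree is read because its timestamp is 100.
visited: List[str] = []
diff_candidates(root, 10, visited)
print(visited)
```

The older a diff's revisions are, the more subtrees carry newer timestamps, so the number of nodes read grows with the age of the diff rather than with the number of actual changes.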

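The interaction between long-range diffs and the diff cache can be illustrated with a small cache model. Assumptions, none of which come from the issue: the cache is keyed by `(from_rev, to_rev)` pairs, uses least-recently-used eviction, and each commit inserts one entry for its pair of consecutive revisions; `DiffCache` is a hypothetical name. A listener diffing adjacent revisions hits the pre-filled entries, while a diff spanning several commits misses and, once computed and cached, can push out entries that other readers still need.

```python
from collections import OrderedDict
from typing import Optional, Set, Tuple

Key = Tuple[int, int]  # (from_rev, to_rev)


class DiffCache:
    """Hypothetical LRU cache of diff results keyed by revision pair."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[Key, Set[str]]" = OrderedDict()

    def put(self, key: Key, changes: Set[str]) -> None:
        self._data[key] = changes
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def get(self, key: Key) -> Optional[Set[str]]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]


cache = DiffCache(capacity=4)
for rev in range(1, 5):
    cache.put((rev - 1, rev), {f"/n{rev}"})  # filled proactively by each commit

print(cache.get((3, 4)))  # hit: an observation listener diffing the last commit
print(cache.get((0, 4)))  # miss: a diff spanning four commits
cache.put((0, 4), {"/n1", "/n2", "/n3", "/n4"})
print(cache.get((0, 1)))  # None: evicted by the wide diff
```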

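With a journal, a diff across several commits no longer depends on timestamps: the changed paths recorded for each commit in the range can be folded into a map from parent path to changed child names, and the diff only descends into those children. A minimal sketch, assuming journal entries are `(revision, changed paths)` pairs with integer revisions; `changed_children` is a hypothetical helper, not an Oak API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def changed_children(
    entries: Iterable[Tuple[int, Set[str]]], from_rev: int, to_rev: int
) -> Dict[str, Set[str]]:
    """Fold journal entries with from_rev < rev <= to_rev into
    parent path -> names of children that changed below it."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for rev, paths in entries:
        if not from_rev < rev <= to_rev:
            continue
        for path in paths:
            parent = "/"
            for name in [n for n in path.split("/") if n]:
                # Every ancestor gets an entry, so the diff can walk down from "/".
                result[parent].add(name)
                parent = parent.rstrip("/") + "/" + name
    return dict(result)


journal = [(1, {"/a/c"}), (2, {"/b"}), (3, {"/a/d"}), (4, {"/e"})]
print(changed_children(journal, 1, 3))  # only revisions 2 and 3 are folded in
```

The work done is proportional to the number of changed paths in the range, independent of how old the two revisions are.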

--
This message was sent by Atlassian JIRA
(v6.2#6252)