Posted to oak-issues@jackrabbit.apache.org by "Marcel Reutegger (JIRA)" <ji...@apache.org> on 2014/03/19 21:11:54 UTC
[jira] [Resolved] (OAK-1555) Inefficient node state diff with old revisions
[ https://issues.apache.org/jira/browse/OAK-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcel Reutegger resolved OAK-1555.
-----------------------------------
Resolution: Fixed
Implemented a journal-like diff cache on MongoDB with a capped collection: http://svn.apache.org/r1579378
> Inefficient node state diff with old revisions
> ----------------------------------------------
>
> Key: OAK-1555
> URL: https://issues.apache.org/jira/browse/OAK-1555
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: core, mongomk
> Affects Versions: 0.18
> Reporter: Marcel Reutegger
> Assignee: Marcel Reutegger
> Priority: Blocker
> Fix For: 0.19
>
>
> As part of OAK-1429 a number of improvements were implemented, but one issue remains when a node state diff is done with older revisions.
> The DocumentNodeStore keeps a modified timestamp on each document and updates it whenever the document is explicitly modified, or implicitly when a descendant document is updated. With this timestamp the store is able to tell when a subtree was last modified. The diff implementation becomes inefficient when the two revisions to compare are both older than the modified timestamp of a document tree. In this case the implementation tends to read many more nodes than were actually modified, because the timestamp only records the most recent change and cannot tell whether the subtree also changed between the two older revisions.
> Improvements from OAK-1394 and OAK-1429 helped quite a bit because the diff cache in the DocumentNodeStore is proactively filled by the commits. However, in addition to the observation listeners that perform diffs, there is also the async index update, which periodically performs a diff. Those diffs usually go further back in time; they are the inefficient ones, and they also have a negative impact on the diff cache.
> A solution to this problem was already discussed in a recent Oak conf call. The DocumentNodeStore keeps a journal of commits and uses it to answer node state diff calls. With this journal the store should also be able to efficiently diff across multiple commits. A number of options were discussed, including whether to implement the journal with a local file or a capped MongoDB collection.
> Ideas for alternative solutions are welcome...
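
To make the journal idea above concrete, here is a minimal in-memory sketch. It is not Oak's implementation: the class name CommitJournal, its methods, and the use of plain long values as revisions are all assumptions made for illustration. It keeps a bounded list of commits (loosely mimicking a capped collection) and answers a diff between two revisions by taking the union of the paths changed by the commits in between. When the requested range reaches back past an evicted commit, it reports that it cannot answer, so a caller would have to fall back to a regular diff.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch only; none of these names come from Oak.
public class CommitJournal {

    // One journal entry: a commit revision and the paths it changed.
    private static final class Entry {
        final long revision;
        final Set<String> changedPaths;

        Entry(long revision, Set<String> changedPaths) {
            this.revision = revision;
            this.changedPaths = new TreeSet<>(changedPaths);
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final int capacity;
    // Highest revision that has been evicted from the journal so far.
    private long evictedUpTo = Long.MIN_VALUE;

    public CommitJournal(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    // Appends a commit; revisions must be strictly increasing.
    public void record(long revision, Set<String> changedPaths) {
        if (!entries.isEmpty() && revision <= entries.peekLast().revision) {
            throw new IllegalArgumentException("revisions must increase");
        }
        if (entries.size() == capacity) {
            // Bounded like a capped collection: drop the oldest entry.
            evictedUpTo = entries.removeFirst().revision;
        }
        entries.addLast(new Entry(revision, changedPaths));
    }

    // Returns the paths changed by commits with from < revision <= to,
    // or Optional.empty() if the journal no longer covers that range.
    public Optional<Set<String>> changedPaths(long from, long to) {
        if (from > to) {
            throw new IllegalArgumentException("from must be <= to");
        }
        if (from < evictedUpTo) {
            return Optional.empty(); // caller must fall back to a full diff
        }
        Set<String> result = new TreeSet<>();
        for (Entry e : entries) {
            if (e.revision > from && e.revision <= to) {
                result.addAll(e.changedPaths);
            }
        }
        return Optional.of(result);
    }
}
```

In this sketch a diff spanning several commits costs one pass over the journal and does not depend on the size of the node tree, which is the property the description asks for. The capacity bound is the trade-off: a diff that goes back further than the journal does is not answered from it.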
--
This message was sent by Atlassian JIRA
(v6.2#6252)