Posted to oak-issues@jackrabbit.apache.org by "Marcel Reutegger (JIRA)" <ji...@apache.org> on 2013/02/12 15:27:12 UTC

[jira] [Commented] (OAK-591) Improve KernelNodeStore cache efficiency

    [ https://issues.apache.org/jira/browse/OAK-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576637#comment-13576637 ] 

Marcel Reutegger commented on OAK-591:
--------------------------------------

Turns out it's not that trivial. Even if the implementation supports the :hash or :id property, getNodes() leaves it up to the implementation whether the returned JSON string includes these properties for the child nodes. IMO it would be legal for an implementation to return the :hash in one call and then omit it in another getNodes() call further down the hierarchy. If that's the case, each KernelNodeState has to track the revision at which the root node state was initially read, as a fallback for when there is no hash or id later. But this again puts data into the KernelNodeState that changes with every modification to the repository.

I think using the id or hash is only feasible if the implementation always returns these properties (or never, in which case the current revision+path lookup would be used).
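To make the trade-off concrete, here is a minimal sketch of the key selection discussed above. It is not the KernelNodeStore code: the class name, the cacheKey() helper and the key prefixes are hypothetical, and it assumes the child properties from the getNodes() JSON have already been parsed into a map. Only the :hash and :id property names come from the discussion.

```java
import java.util.Map;

// Hypothetical sketch, not part of oak-core.
class CacheKeySketch {

    /**
     * Picks a cache key for a child node.
     * Prefers :hash, then :id (both independent of the read revision),
     * and falls back to the current revision+path lookup otherwise.
     */
    static String cacheKey(Map<String, String> childProps,
                           String revision, String path) {
        String hash = childProps.get(":hash");
        if (hash != null) {
            return "hash:" + hash;
        }
        String id = childProps.get(":id");
        if (id != null) {
            return "id:" + id;
        }
        // Fallback: this key changes with every new revision,
        // even if the node itself did not change.
        return "rev:" + revision + "@" + path;
    }

    public static void main(String[] args) {
        Map<String, String> withHash = Map.of(":hash", "abc123");
        Map<String, String> plain = Map.of();

        // Same key across revisions when a hash is available.
        System.out.println(cacheKey(withHash, "r1", "/content/a"));
        System.out.println(cacheKey(withHash, "r2", "/content/a"));

        // Different keys across revisions when it is not.
        System.out.println(cacheKey(plain, "r1", "/content/a"));
        System.out.println(cacheKey(plain, "r2", "/content/a"));
    }
}
```

The fallback branch is exactly the problematic part: it needs a revision, so a node state that may hit this branch further down the hierarchy has to carry one around, which is why this only seems workable if the properties are returned always or never.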

Am I misreading the contract of MK.getNodes() or can I depend on an implementation to always return hashes or ids for child nodes if it did once?
                
> Improve KernelNodeStore cache efficiency
> ----------------------------------------
>
>                 Key: OAK-591
>                 URL: https://issues.apache.org/jira/browse/OAK-591
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.6
>            Reporter: Marcel Reutegger
>         Attachments: mk.log.gz, OAK-591.patch
>
>
> The cache in KernelNodeStore references entries with a path+revision combo. This mapping quickly becomes inefficient when there are writes on the repository. Whenever something is changed, the complete cache basically becomes invalid and oak-core needs to re-fetch nodes again, even though they didn't change. The attached test shows this behaviour. The test initially creates 10 nodes and lets a thread read those nodes repeatedly. To make the test somewhat realistic the reader acquires a new session in every run through the loop. This is to simulate e.g. a request which acquires a new session every time (Apache Sling does it that way). At the same time writes occur but in a separate part of the repository. As can be seen in the logs, the nodes are read from the MicroKernel whenever something changes anywhere in the repository. Obviously this is not limited to the test nodes. The log also shows repeated reads of node type, user and index nodes. None of them change while the test runs.
