Posted to oak-issues@jackrabbit.apache.org by "Alex Parvulescu (JIRA)" <ji...@apache.org> on 2014/03/12 11:27:44 UTC
[jira] [Comment Edited] (OAK-1465) performance degradation with growing index size on Oak-Mongo

    [ https://issues.apache.org/jira/browse/OAK-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931613#comment-13931613 ] 

Alex Parvulescu edited comment on OAK-1465 at 3/12/14 10:26 AM:
----------------------------------------------------------------

This is what the property index updates look like [0]: each save operation triggers 2 index updates (_before_ and _after_ are the index keys):
 - one node type (oak:Unstructured)
 - one property type (here the property has the same value as the node name)

My profiling session shows a lot of cache misses on DocumentNodeStore#getNode, and given the high frequency of small commits I don't see any code tweaks that I could make to speed up this test.

It would be interesting to add some sort of output of the cache stats after the tests. I wanted to at least see them, but I found it ridiculously hard to get a reference to that stats object.

I'm un-assigning myself from this issue, but I'm still open to any ideas for improvement, so feel free to point out anything I might have missed in the indexing code.

A small thing I've noticed is that NodeBuilder#getChildNodeNames, in the case of the DocumentNodeState, uses the default AbstractNodeState impl, which simply calls #getChildNodeEntries and then extracts the names. I did not see heavy usage of this method (I ran into it in the IndexUpdate#collectIndexEditors method), so I don't think it is very important to provide a more efficient implementation.
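
To illustrate the point about the default implementation, here is a minimal sketch. It does not use the real Oak classes: SketchNodeState, BackedState, NamesOnlyState and the statesLoaded counter are hypothetical stand-ins, and the assumption is only that building a child entry costs more than reading its name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChildNamesSketch {

    /** Hypothetical stand-in for a child entry: a name plus its loaded state. */
    public static final class Entry {
        public final String name;
        public final Object state;

        public Entry(String name, Object state) {
            this.name = name;
            this.state = state;
        }
    }

    /** Hypothetical base class mirroring the "default impl" described above. */
    public abstract static class SketchNodeState {
        public abstract List<Entry> getChildNodeEntries();

        // Default behaviour: build every entry, then keep only the names.
        public List<String> getChildNodeNames() {
            List<String> names = new ArrayList<>();
            for (Entry e : getChildNodeEntries()) {
                names.add(e.name);
            }
            return names;
        }
    }

    /** Inherits the default; counts how many child states had to be loaded. */
    public static class BackedState extends SketchNodeState {
        protected final List<String> childNames;
        public int statesLoaded = 0;

        public BackedState(String... names) {
            this.childNames = Arrays.asList(names);
        }

        @Override
        public List<Entry> getChildNodeEntries() {
            List<Entry> entries = new ArrayList<>();
            for (String name : childNames) {
                statesLoaded++; // stands in for a (possibly uncached) state lookup
                entries.add(new Entry(name, new Object()));
            }
            return entries;
        }
    }

    /** Overrides the default and answers from the names alone. */
    public static class NamesOnlyState extends BackedState {
        public NamesOnlyState(String... names) {
            super(names);
        }

        @Override
        public List<String> getChildNodeNames() {
            return new ArrayList<>(childNames);
        }
    }

    public static void main(String[] args) {
        BackedState viaEntries = new BackedState("a", "b", "c");
        BackedState direct = new NamesOnlyState("a", "b", "c");
        System.out.println(viaEntries.getChildNodeNames() + " loads=" + viaEntries.statesLoaded);
        System.out.println(direct.getChildNodeNames() + " loads=" + direct.statesLoaded);
    }
}
```

Both variants return the same names; the difference is only in how many child states get touched along the way, which matters little if the method is rarely called.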


[0]
{code}
update on /test19b6d919/testNode/level1_49/217f0ea5-190c-4c56-8b7d-c4b180c670a1
    before []
    after  [217f0ea5-190c-4c56-8b7d-c4b180c670a1]
update on /test19b6d919/testNode/level1_49/217f0ea5-190c-4c56-8b7d-c4b180c670a1
    before []
    after  [oak%3AUnstructured]
{code}
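
For readers unfamiliar with the _before_/_after_ notation in the log above, the following is a rough sketch of how such an update could be applied to an in-memory index. It is an assumption for illustration, not the actual Oak property index code: IndexUpdateSketch, update and pathsFor are hypothetical names, and the index is modelled as a plain map from key to node paths.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class IndexUpdateSketch {

    // index key -> paths of the nodes currently indexed under that key
    private final Map<String, Set<String>> entries = new HashMap<>();

    /** Applies one update: drop the path from keys it lost, add it to keys it gained. */
    public void update(String path, Set<String> before, Set<String> after) {
        for (String key : before) {
            if (!after.contains(key)) {
                Set<String> paths = entries.get(key);
                if (paths != null) {
                    paths.remove(path);
                    if (paths.isEmpty()) {
                        entries.remove(key);
                    }
                }
            }
        }
        for (String key : after) {
            if (!before.contains(key)) {
                entries.computeIfAbsent(key, k -> new HashSet<>()).add(path);
            }
        }
    }

    /** Returns the paths indexed under the given key (empty if none). */
    public Set<String> pathsFor(String key) {
        Set<String> paths = entries.get(key);
        return paths == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(paths);
    }

    public static void main(String[] args) {
        String path = "/test19b6d919/testNode/level1_49/217f0ea5-190c-4c56-8b7d-c4b180c670a1";
        IndexUpdateSketch propertyIndex = new IndexUpdateSketch();
        IndexUpdateSketch nodeTypeIndex = new IndexUpdateSketch();

        // The two updates from the log: a new node has no keys before the save.
        propertyIndex.update(path, Collections.<String>emptySet(),
                Collections.singleton("217f0ea5-190c-4c56-8b7d-c4b180c670a1"));
        nodeTypeIndex.update(path, Collections.<String>emptySet(),
                Collections.singleton("oak%3AUnstructured"));

        System.out.println(propertyIndex.pathsFor("217f0ea5-190c-4c56-8b7d-c4b180c670a1"));
        System.out.println(nodeTypeIndex.pathsFor("oak%3AUnstructured"));
    }
}
```

In this model every saved node costs two separate index writes, which matches the "2 index updates per save" pattern in the log.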


> performance degradation with growing index size on Oak-Mongo
> ------------------------------------------------------------
>
>                 Key: OAK-1465
>                 URL: https://issues.apache.org/jira/browse/OAK-1465
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: mongomk
>    Affects Versions: 0.17.1
>            Reporter: Stefan Egli
>            Assignee: Alex Parvulescu
>            Priority: Blocker
>             Fix For: 0.19
>
>         Attachments: CreateManyIndexedNodesTest.java
>
>
> Tested with an oak-snapshot of Monday Feb 24, 10AM EST.
> Noticed that as the number of nodes indexed (e.g. with respect to a particular property) grows, adding nodes becomes slower and slower.
> Will attach an oak-run benchmark to underline this. Basically the scenario where this occurred was:
>  * have a number of "level 1" nodes (eg 100)
>  * under those "level 1" nodes, add a growing list of children, each with a property that is indexed (ie that index is actually growing and is probably causing the slowdown).
