Posted to oak-issues@jackrabbit.apache.org by "Alex Parvulescu (JIRA)" <ji...@apache.org> on 2017/01/25 14:28:26 UTC

[jira] [Comment Edited] (OAK-5499) IndexUpdate can do multiple traversal of a content tree during initial index when there are sub-root indices

    [ https://issues.apache.org/jira/browse/OAK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837804#comment-15837804 ] 

Alex Parvulescu edited comment on OAK-5499 at 1/25/17 2:27 PM:
---------------------------------------------------------------

Attaching a possible fix. I decided to investigate a different approach: skip the out-of-band indexing if the base state is {{MISSING_NODE}}, since this is the only case where the extra traversal is very expensive (see [OAK-5499-v2-fix.patch|^OAK-5499-v2-fix.patch]).

The way it would work is that the first index's reindex becomes part of the full traversal (no longer a dedicated reindex traversal). It would also pick up other index definitions that need a reindex and include those in the current traversal (no longer spawning out-of-band reindex traversals).
Unfortunately this is a pain to test, so I don't have anything better than some logs. I also attached a version of the patch that lets anyone see the logs locally (see [OAK-5499-v2-demo.patch|^OAK-5499-v2-demo.patch]).
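
To illustrate the idea, here is a minimal sketch of a single pass that registers indexers as it discovers them. It is a toy model under stated assumptions, not the patch itself: {{Node}}, {{Indexer}} and {{CollapsedReindexSketch}} are hypothetical stand-ins, not Oak classes, and the tree is the small example from this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: Node and Indexer are hypothetical stand-ins, not Oak classes.
final class CollapsedReindexSketch {

    static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();
        boolean reindex; // set on index definition nodes that need a reindex

        Node child(String name) {
            return children.computeIfAbsent(name, k -> new Node());
        }
    }

    static final class Indexer {
        final String definitionPath;
        int nodesSeen;

        Indexer(String definitionPath) {
            this.definitionPath = definitionPath;
        }
    }

    final List<Indexer> registered = new ArrayList<>();
    int nodesVisited;

    // Single depth-first pass; 'inherited' holds the indexers registered by ancestors.
    void traverse(Node node, String path, List<Indexer> inherited) {
        nodesVisited++;
        List<Indexer> active = new ArrayList<>(inherited);

        // Peek under oak:index before descending, so new indexers join this pass.
        Node defs = node.children.get("oak:index");
        if (defs != null) {
            for (Map.Entry<String, Node> def : defs.children.entrySet()) {
                if (def.getValue().reindex) {
                    Indexer indexer = new Indexer(join(path, "oak:index") + "/" + def.getKey());
                    registered.add(indexer);
                    active.add(indexer);
                }
            }
        }

        // Every active indexer is shown this node during the same pass.
        for (Indexer indexer : active) {
            indexer.nodesSeen++;
        }
        for (Map.Entry<String, Node> child : node.children.entrySet()) {
            traverse(child.getValue(), join(path, child.getKey()), active);
        }
    }

    static String join(String parent, String name) {
        return parent.equals("/") ? "/" + name : parent + "/" + name;
    }

    // Builds the example tree from this issue.
    static Node exampleTree() {
        Node root = new Node();
        root.child("content").child("childContent").child("c0").child("c1");
        root.child("content").child("oak:index").child("foo2Index").reindex = true;
        root.child("oak:index").child("foo1Index").reindex = true;
        return root;
    }

    public static void main(String[] args) {
        CollapsedReindexSketch sketch = new CollapsedReindexSketch();
        sketch.traverse(exampleTree(), "/", new ArrayList<>());
        System.out.println("nodes visited: " + sketch.nodesVisited);
        for (Indexer indexer : sketch.registered) {
            System.out.println(indexer.definitionPath + " saw " + indexer.nodesSeen + " nodes");
        }
    }
}
```

In this toy model the tree is walked once (9 nodes), the root-level indexer is shown 9 nodes and the {{/content}} indexer is shown 6. The sketch does not model the {{MISSING_NODE}} check or the diff callbacks.
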

To simplify feedback, here's the output without the patch (_IU_ is the IndexUpdate class, _CNA_ is the childNodeAdded call, _E0_ and _E1_ are the two indexers):
{noformat}
[IU] Reindexing [/oak:index/foo1Index]
[E0] /
[E0] /content
[E0] /content/childContent
[E0] /content/childContent/c0
[E0] /content/childContent/c0/c1
[E0] /content/oak:index
[E0] /content/oak:index/foo2Index
[E0] /oak:index
[E0] /oak:index/foo1Index
Reindexing done for [/oak:index/foo1Index]
[IU] CNA /
[IU] Reindexing [/content/oak:index/foo2Index]
[E1] /
[E1] /childContent
[E1] /childContent/c0
[E1] /childContent/c0/c1
[E1] /oak:index
[E1] /oak:index/foo2Index
Reindexing done for [/content/oak:index/foo2Index]
[IU] CNA /content
[IU] CNA /content/childContent
[IU] CNA /content/childContent/c0
[IU] CNA /content
[IU] CNA /content/oak:index
[IU] CNA /
[IU] CNA /oak:index
{noformat}
We can see the extra traversals happening, whereas with the patch both indexes' reindexing is collapsed into the main traversal thread:
{noformat}
[IU] Reindexing [/oak:index/foo1Index]
[E0] /
[IU] CNA /
[IU] Reindexing [/content/oak:index/foo2Index]
[E1] /
[E0] /content
[IU] CNA /content
[E1] /childContent
[E0] /content/childContent
[IU] CNA /content/childContent
[E1] /childContent/c0
[E0] /content/childContent/c0
[IU] CNA /content/childContent/c0
[E1] /childContent/c0/c1
[E0] /content/childContent/c0/c1
[IU] CNA /content
[E1] /oak:index
[E0] /content/oak:index
[IU] CNA /content/oak:index
[E1] /oak:index/foo2Index
[E0] /content/oak:index/foo2Index
[IU] CNA /
[E0] /oak:index
[IU] CNA /oak:index
[E0] /oak:index/foo1Index
{noformat}

I took special care to preserve the current logging style on reindex, and I believe I managed to do that, but there may be aspects I missed. Feedback is very much appreciated!




> IndexUpdate can do multiple traversal of a content tree during initial index when there are sub-root indices
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: OAK-5499
>                 URL: https://issues.apache.org/jira/browse/OAK-5499
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>            Priority: Minor
>             Fix For: 1.8
>
>         Attachments: OAK-5499.patch, OAK-5499-v2-demo.patch, OAK-5499-v2-fix.patch
>
>
> If we have index definitions such as:
> {noformat}
> /oak:index/foo1Index
> /content
>    /oak:index/foo2Index
> {noformat}
> then the initial indexing process \[0] would traverse the tree under {{/content}} twice: once while indexing for the top-level indices, and again when it starts to index the newly discovered {{foo2Index}} while traversing {{/content/oak:index}}.
> What we can do is this: when the first diff processes {{/content}} and discovers a node named {{oak:index}}, it can actively go into that tree, peek at the index definitions under it, and register them as required. The diff can then proceed under {{/content}} while the new indices also receive the diffs (avoiding another traversal).
> \[0] first-time indexing, or when {{/:async}} gets deleted, or when the checkpoint for the async index couldn't be retrieved
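
The cost described in the issue can be estimated with a small toy calculation. This is a rough sketch under stated assumptions, not Oak code: {{TraversalCostSketch}} and its {{Node}} type are hypothetical, and it assumes one dedicated traversal per index definition over the subtree that owns the {{oak:index}} node.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy cost model only: these types are hypothetical and not part of Oak.
final class TraversalCostSketch {

    static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();

        Node child(String name) {
            return children.computeIfAbsent(name, k -> new Node());
        }
    }

    // Creates every node along an absolute path such as "/content/oak:index/foo2Index".
    static Node add(Node root, String path) {
        Node node = root;
        for (String name : path.split("/")) {
            if (!name.isEmpty()) {
                node = node.child(name);
            }
        }
        return node;
    }

    // Number of nodes in the subtree, i.e. the cost of one full traversal of it.
    static int size(Node node) {
        int total = 1;
        for (Node child : node.children.values()) {
            total += size(child);
        }
        return total;
    }

    // Assumed cost when each index definition triggers its own traversal of the
    // subtree that contains its oak:index node.
    static int dedicatedVisits(Node node) {
        int visits = 0;
        Node defs = node.children.get("oak:index");
        if (defs != null) {
            visits += defs.children.size() * size(node);
        }
        for (Node child : node.children.values()) {
            visits += dedicatedVisits(child);
        }
        return visits;
    }

    // The example layout from the issue description plus some content.
    static Node exampleTree() {
        Node root = new Node();
        add(root, "/oak:index/foo1Index");
        add(root, "/content/oak:index/foo2Index");
        add(root, "/content/childContent/c0/c1");
        return root;
    }

    public static void main(String[] args) {
        Node root = exampleTree();
        // prints "dedicated=15 single=9"
        System.out.println("dedicated=" + dedicatedVisits(root) + " single=" + size(root));
    }
}
```

For the example tree, dedicated traversals touch 15 nodes (9 for the root index plus 6 for the index under {{/content}}), while a single pass touches 9. The gap grows with the size of the subtree under each sub-root index.
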


