Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2017/01/26 09:38:24 UTC

[jira] [Comment Edited] (OAK-5324) Enable property index reindexing via oak-run

    [ https://issues.apache.org/jira/browse/OAK-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837941#comment-15837941 ] 

Thomas Mueller edited comment on OAK-5324 at 1/26/17 9:37 AM:
--------------------------------------------------------------

> But I assume this issue is rather about a way to introduce a new index or update an existing one when the system is online, right? In that case, the branch-less mode is off the table.

I see. I wrote a tool that allows managing indexes (creating, changing, reindexing, removing) using a script, for both the regular and the branch-less mode now:
http://svn.apache.org/r1780222

> At least for new indexes we could try to improve the branch handling in the DocumentNodeStore.

If that turns out to be much easier, we could probably make reindexing a special case of creating a new index. For example, re-index into a new hidden child node, ":data_1", ":data_2",..., so that the existing nodes are not changed. And only change the pointer to the latest ":data_x" node at the end, maybe in a separate commit. After that, the old, outdated ":data_(n-1)" node could be removed step-by-step using multiple commits, or in one commit (which can't conflict).
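To make the idea concrete, here is a minimal sketch in Python. It is not Oak code: the index node is modelled as a plain dict, the "active" key stands in for the pointer, and the function name is hypothetical. Each numbered step would be its own commit in a real store.

```python
def reindex_into_new_generation(index_node, content, prop):
    """Build a fresh hidden ':data_n' child, then switch the pointer to it.

    index_node: dict standing in for the index definition node (assumption).
    content:    dict mapping path -> properties dict (assumption).
    prop:       name of the indexed property.
    """
    current = index_node.get("active")  # e.g. ":data_1", or None if never built
    n = int(current.rsplit("_", 1)[1]) + 1 if current else 1
    new_name = f":data_{n}"

    # Step 1: build the new generation. The existing ":data_x" node is untouched,
    # so readers and concurrent writers of the old data are not affected.
    data = {}
    for path, props in content.items():
        if prop in props:
            data.setdefault(props[prop], []).append(path)
    index_node[new_name] = data

    # Step 2: switch the pointer to the new generation (a small, separate commit).
    index_node["active"] = new_name

    # Step 3: remove the outdated generation. Nothing points at it any more,
    # so this could also be done step-by-step over several commits.
    if current:
        del index_node[current]
    return new_name
```

In this sketch, queries would always go through index_node["active"], so they see either the complete old generation or the complete new one.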

Another option might be to split indexing into multiple commits. For example, use a "fromPath" .. "toPath" range, and only re-index part of the repository at a time.
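A rough sketch of the range-based variant, again in Python and not using any Oak API. The half-open range (fromPath inclusive, toPath exclusive), the plain string comparison of paths, and both function names are assumptions made for illustration only.

```python
def reindex_range(index_data, content, prop, from_path, to_path):
    """Re-index only paths with from_path <= path < to_path.

    Returns the number of index entries added; in a real store this
    call would correspond to one commit.
    """
    added = 0
    for path in sorted(content):
        if from_path <= path < to_path:
            value = content[path].get(prop)
            if value is not None:
                paths = index_data.setdefault(value, [])
                if path not in paths:  # keep the operation repeatable
                    paths.append(path)
                    added += 1
    return added


def reindex_in_batches(content, prop, boundaries):
    """Re-index the whole content as a sequence of small range commits.

    boundaries: sorted list of paths; consecutive pairs form the ranges.
    """
    index_data = {}
    for from_path, to_path in zip(boundaries, boundaries[1:]):
        reindex_range(index_data, content, prop, from_path, to_path)
    return index_data
```

For example, boundaries such as ["/", "/content", "/\uffff"] would split the work into two commits. How to pick boundaries so that each range stays small is left open here.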

> Async re-index? Does that disable synchronous index updates while it is re-indexing?

I don't know currently.




was (Author: tmueller):
> But I assume this issue is rather about a way to introduce a new index or update an existing one when the system is online, right? In that case, the branch-less mode is off the table.

I see. I wrote a tool that allows managing indexes (creating, changing, reindexing, removing) using a script, for both the regular and the branch-less mode now:
http://svn.apache.org/r1780222

> At least for new indexes we could try to improve the branch handling in the DocumentNodeStore.

If that turns out to be much easier, we could probably make reindexing a special case of creating a new index. For example, re-index into a new hidden child node, ":data_1", ":data_2",..., so that the existing nodes are not changed. And only change the pointer to the latest ":data_x" node at the very end, in a separate commit.

Another option might be to split indexing into multiple commits. For example, use a "fromPath" .. "toPath" range, and only re-index part of the repository at a time.

> Async re-index? Does that disable synchronous index updates while it is re-indexing?

I don't know currently.



> Enable property index reindexing via oak-run
> --------------------------------------------
>
>                 Key: OAK-5324
>                 URL: https://issues.apache.org/jira/browse/OAK-5324
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: documentmk, run
>            Reporter: Chetan Mehrotra
>            Assignee: Thomas Mueller
>             Fix For: 1.6, 1.8
>
>
> Currently, introducing a new property index or reindexing an existing property index is problematic on DocumentNodeStore. This happens because doing so results in either 
> # Persisted branch - Which is slow at times and has issues related to conflict handling
> # Large in memory branch which increases heap pressure
> To enable this use case, we should add some tooling to oak-run that uses a different approach to achieve the same result. 


