Posted to oak-issues@jackrabbit.apache.org by "Alex Parvulescu (JIRA)" <ji...@apache.org> on 2014/03/27 21:07:15 UTC
[jira] [Comment Edited] (OAK-1456) Non-blocking reindexing
[ https://issues.apache.org/jira/browse/OAK-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949861#comment-13949861 ]
Alex Parvulescu edited comment on OAK-1456 at 3/27/14 8:06 PM:
---------------------------------------------------------------
Attaching an initial patch for feedback.
The idea is that for property indexes that have the 'reindex-async' flag set to true, when the 'reindex' flag is raised, the reindex will happen asynchronously.
The process should work as follows:
- Raising the 'reindex' flag makes the property index editor consider an index for a full reindex. If the 'reindex-async' flag is also present, the editor will simply set the 'async' property on the index (async = 'async-reindex') and ignore it.
- There will be a new thread (a copy of the 'async' one) dedicated to these special property indexes. It runs in the background, picks up the aforementioned index and runs a full reindex on it.
The trick here is that this thread waits until it completes a cycle without any changes, and only _then_ removes the 'async' property, thus switching the property index back to synchronous mode.
So one open issue here is that the switch from async back to sync takes at least 2 cycles (10 seconds with the current setting).
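To make the flag handling easier to discuss, here is a rough toy model of the steps above. It is only a sketch and is not taken from the patch: the class AsyncReindexSketch, the IndexDefinition type, the pendingChanges counter and both method names are made up for illustration, and the assumption that the background thread is the one clearing the 'reindex' flag is mine.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the flag handling; none of these types are Oak classes. */
final class AsyncReindexSketch {

    /** Hypothetical stand-in for an index definition node: a bag of properties. */
    static final class IndexDefinition {
        final Map<String, Object> props = new HashMap<>();
        int pendingChanges; // content changes not yet reflected in the index
    }

    /** Synchronous editor: returns true if it handed the index to the background thread. */
    static boolean editorHandsOff(IndexDefinition def) {
        boolean reindex = Boolean.TRUE.equals(def.props.get("reindex"));
        boolean reindexAsync = Boolean.TRUE.equals(def.props.get("reindex-async"));
        if (reindex && reindexAsync) {
            def.props.put("async", "async-reindex");
            return true; // the editor ignores this index from now on
        }
        return false;
    }

    /** One background cycle: returns true once the index is synchronous again. */
    static boolean backgroundCycle(IndexDefinition def) {
        if (!"async-reindex".equals(def.props.get("async"))) {
            return true; // not on the async-reindex lane
        }
        if (Boolean.TRUE.equals(def.props.get("reindex")) || def.pendingChanges > 0) {
            // Assumed behaviour: run the full reindex (or catch up) and clear the flag.
            def.props.put("reindex", false);
            def.pendingChanges = 0;
            return false; // this cycle saw changes, so stay async
        }
        // A full cycle without changes: switch back to synchronous mode.
        def.props.remove("async");
        return true;
    }
}
```

In this model the first cycle does the reindex and the second one is the quiet cycle that removes 'async', which is where the minimum of 2 cycles comes from.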
The mechanism works fine; the issues are around installing a second async thread for the new channel. I had to make some tweaks to the current async indexer, but nothing too disruptive (there were still some assumptions that only one async indexer runs at a time). Also, I had to add a _synchronized_ to the _checkpoint_ method in the SegmentNodeStore because of a race where the 2 async indexers were running at the same time and one of them consistently could not create the checkpoint (adding some delay fixed the problem).
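For reference, the shape of the _synchronized_ change can be sketched like this. It is not the SegmentNodeStore code, and the actual cause of the race is not analysed here: ToyStore, its HashMap of checkpoints and runTwoIndexers are hypothetical, and the sketch only shows two indexer threads going through one guarded checkpoint method.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy illustration of guarding checkpoint creation; not Oak code. */
final class CheckpointSketch {

    /** Hypothetical store whose checkpoint bookkeeping is not thread-safe on its own. */
    static final class ToyStore {
        private final Map<String, Long> checkpoints = new HashMap<>();
        private long counter;

        /** synchronized: only one indexer at a time may create a checkpoint. */
        synchronized String checkpoint(long lifetimeMillis) {
            String name = "cp-" + (++counter);
            checkpoints.put(name, System.currentTimeMillis() + lifetimeMillis);
            return name;
        }

        synchronized int checkpointCount() {
            return checkpoints.size();
        }
    }

    /** Runs two indexer threads against one store and returns the number of checkpoints. */
    static int runTwoIndexers(int perThread) {
        ToyStore store = new ToyStore();
        Runnable indexer = () -> {
            for (int i = 0; i < perThread; i++) {
                store.checkpoint(1000L);
            }
        };
        Thread first = new Thread(indexer, "async");
        Thread second = new Thread(indexer, "async-reindex");
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for indexers", e);
        }
        return store.checkpointCount();
    }
}
```

With the method guarded, every call from either thread ends up with its own checkpoint entry; without the guard, the counter and the map could be updated concurrently and some calls could be lost.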
I'm not 100% sure whether switching from async to sync is prone to losing some information because of the retry policy of the merge operation that will introduce the changes.
Also, I think reindexing under a different node is not necessary, as the changes should be visible after the commit that contains everything.
[~jukkaz] maybe you can take a quick look at the patch?
> Non-blocking reindexing
> -----------------------
>
> Key: OAK-1456
> URL: https://issues.apache.org/jira/browse/OAK-1456
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: query
> Reporter: Michael Marth
> Assignee: Alex Parvulescu
> Priority: Blocker
> Labels: production, resilience
> Fix For: 0.20
>
> Attachments: OAK-1456.patch
>
>
> For huge Oak repos it will be essential to be able to re-index some or all indexes in a non-blocking way in case they go out of sync (i.e. the repo is still operational while the re-indexing takes place).
> For an asynchronous index this should not be much of a problem. One could drop it and recreate it (as an added benefit it might be nice if the user could simply add a property "reindex" to the index definition node to trigger this).
> For synchronous indexes, I suggest the mechanism creates an asynchronous index behind the scenes first and, once it has caught up:
> * blocks writes (?)
> * removes the existing synchronous index
> * moves asynchronous index in its place and makes it synchronous