Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2017/07/17 05:56:00 UTC

[jira] [Commented] (OAK-6246) Support for out of band indexing with read only access to NodeStore

    [ https://issues.apache.org/jira/browse/OAK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089348#comment-16089348 ] 

Chetan Mehrotra commented on OAK-6246:
--------------------------------------

This feature is now implemented. It supports two modes:

h3. Reindex and import in single command

In this mode the oak-run tool connects to the repository in read-write mode, performs the indexing, and then imports the index back into the system, all in a single command.

{noformat}
java -jar oak-run*.jar index --reindex --index-paths=/oak:index/lucene --read-write mongodb://server:27017/oak
{noformat}

This is done in the following steps:
# oak-run connects to the repository in read-write mode, creates a checkpoint, and then disconnects.
# oak-run connects to the repository in read-only mode and reindexes up to the checkpoint state. Connecting read-only ensures that the long-running indexing task has minimal side effects on the repository, i.e. no lease-related writes, no monitoring for recovery, etc.
# Once indexing is done, it reconnects to the NodeStore in read-write mode and performs the index import:
## Pause the current async indexers
## Import the external index files
## Bring the external index state up to date with the current checkpoint of its lane. For example, if index /oak:index/lucene (lane async) was indexed up to cp1 in the previous step and the async lane currently refers to checkpoint cp2, this step updates the index with the diff from cp1 -> cp2
## Resume the async indexer
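To make the ordering of the single-command flow concrete, here is a toy Python model of it. It is only a sketch under stated assumptions: ToyNodeStore, build_index, apply_diff and reindex_and_import are hypothetical names, a "checkpoint" is modeled as a frozen copy of the content and an "index" as a plain dict, so none of this is the oak-run or Oak API. What it does show is why the catch-up step exists: content keeps changing while the read-only reindex runs, so the imported index has to be advanced from cp1 to the lane's current checkpoint cp2 while the lane is paused.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNodeStore:
    """Hypothetical in-memory stand-in for a NodeStore (not the Oak API)."""
    head: dict = field(default_factory=dict)             # node path -> value
    checkpoints: dict = field(default_factory=dict)      # checkpoint name -> frozen copy of head
    lane_checkpoint: dict = field(default_factory=dict)  # async lane -> checkpoint name
    indexes: dict = field(default_factory=dict)          # index path -> {node path: value}
    paused_lanes: set = field(default_factory=set)       # lanes a real indexer would skip

    def checkpoint(self, name):
        self.checkpoints[name] = dict(self.head)
        return name


def build_index(snapshot):
    # Full reindex: here an "index" is simply a copy of every node's value.
    return dict(snapshot)


def apply_diff(index, before, after):
    # Catch-up: apply only what changed between two checkpoint snapshots.
    for path in before.keys() - after.keys():
        index.pop(path, None)        # node removed
    for path, value in after.items():
        if before.get(path) != value:
            index[path] = value      # node added or changed
    return index


def reindex_and_import(store, index_path, lane, while_indexing=None):
    # Step 1 (read-write): create checkpoint cp1, then disconnect.
    cp1 = store.checkpoint("cp1")
    # Step 2 (read-only): reindex from the cp1 snapshot; nothing is written to the store.
    snapshot = store.checkpoints[cp1]
    offline_index = build_index(snapshot)
    if while_indexing:
        while_indexing()             # simulates writes and async lane progress meanwhile
    # Step 3 (read-write): pause the lane, import, catch up cp1 -> cp2, resume.
    store.paused_lanes.add(lane)
    try:
        store.indexes[index_path] = offline_index
        cp2 = store.lane_checkpoint[lane]
        apply_diff(offline_index, snapshot, store.checkpoints[cp2])
    finally:
        store.paused_lanes.discard(lane)
    return store.indexes[index_path]


store = ToyNodeStore(head={"/content/a": "v1", "/content/b": "v1"})


def advance_async_lane():
    # Content changes during the reindex and the async lane moves on to cp2.
    store.head["/content/b"] = "v2"
    del store.head["/content/a"]
    store.head["/content/c"] = "v1"
    store.lane_checkpoint["async"] = store.checkpoint("cp2")


print(reindex_and_import(store, "/oak:index/lucene", "async", advance_async_lane))
# {'/content/b': 'v2', '/content/c': 'v1'}
```

Pausing the lane inside try/finally means the lane is resumed even if the import or catch-up fails; whether oak-run itself handles failures this way is not covered by this comment.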

h3. Reindex and import in different commands

In this mode the admin needs to perform the three main steps from the previous section explicitly and then import the indexes, either via oak-run or via the IndexMBean. See OAK-6271 for import details.
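In the two-command mode the hand-off between the out-of-band indexing run and the import is a folder on disk. A minimal sketch, assuming the dump records the checkpoint the index was built against (the issue description mentions "some metadata" but not its format): the file names index-data.json and index-metadata.json, the keys indexPath/checkpoint, and the helpers export_index and read_for_import are all hypothetical, and the actual oak-run output layout is not modeled here. The point is that the importer needs that recorded checkpoint to know where the catch-up diff starts.

```python
import json
import tempfile
from pathlib import Path


def export_index(out_dir, index_path, checkpoint, index_data):
    # Out-of-band run: dump index data plus metadata naming the checkpoint it was built at.
    # File names and metadata keys are hypothetical.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "index-data.json").write_text(json.dumps(index_data))
    meta = {"indexPath": index_path, "checkpoint": checkpoint}
    (out / "index-metadata.json").write_text(json.dumps(meta))


def read_for_import(out_dir, lane_checkpoints, lane):
    # Import run: load the dump and derive the catch-up range (from, to) for the lane.
    out = Path(out_dir)
    meta = json.loads((out / "index-metadata.json").read_text())
    data = json.loads((out / "index-data.json").read_text())
    catch_up = (meta["checkpoint"], lane_checkpoints[lane])
    return meta["indexPath"], data, catch_up


with tempfile.TemporaryDirectory() as tmp:
    export_index(tmp, "/oak:index/lucene", "cp1", {"/content/a": "v1"})
    path, data, (start, end) = read_for_import(tmp, {"async": "cp2"}, "async")
    print(path, start, end)  # /oak:index/lucene cp1 cp2
```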

> Support for out of band indexing with read only access to NodeStore
> -------------------------------------------------------------------
>
>                 Key: OAK-6246
>                 URL: https://issues.apache.org/jira/browse/OAK-6246
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: run
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>
>
> Provide support for out-of-band indexing where oak-run is connected in read-only mode to the NodeStore and indexes are stored on the file system. These are then imported back by the target system.
> Had a discussion with [~catholicon] and the following flow was determined:
> # Admin would provision a checkpoint via CheckpointMBean
> # oak-run index connects to the NodeStore in read-only mode and is passed
> #* checkpoint from previous step
> #* list of indexes which need to be reindexed
> # oak-run index logic would then proceed with reindexing. However, the created index data would be stored locally. This would make use of
> #* DirectoryFactory - OAK-6243
> #* Copy-on-write NodeStore approach as used in OAK-6220
> # Once indexing is completed, it would dump all indexes to an output folder with some metadata
> # Then the admin can copy this index data and use an MBean on the target setup to "import" it back. This import would need to
> #* Pause the current async indexers
> #* Import the external index files
> #* Bring the external indexes up to date with their respective lanes' checkpoints
> #* Resume the async indexer
> The benefits of this approach are:
> # We only need to backport the import logic. Everything else can be implemented in trunk and need not be backported.
> # Using read-only mode allows oak-run from trunk to be safely connected to any of the older versions
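The copy-on-write NodeStore approach referenced above (OAK-6220) is what lets indexing run against a store opened read-only: reads fall through to the base store, while any writes the indexer makes are kept in a local overlay. A minimal Python sketch of that idea, using a hypothetical CopyOnWriteStore class over a plain dict rather than Oak's NodeStore API:

```python
from types import MappingProxyType


class CopyOnWriteStore:
    """Hypothetical sketch: reads fall through to a read-only base, writes stay local."""

    _DELETED = object()  # marker for nodes removed in the overlay

    def __init__(self, base):
        self._base = MappingProxyType(base)  # read-only view: this class cannot write to base
        self._local = {}                     # path -> value or _DELETED marker

    def get(self, path, default=None):
        if path in self._local:
            value = self._local[path]
            return default if value is self._DELETED else value
        return self._base.get(path, default)

    def set(self, path, value):
        self._local[path] = value

    def remove(self, path):
        self._local[path] = self._DELETED

    def local_changes(self):
        # Writes made during the indexing run, never pushed to the base store.
        return {p: v for p, v in self._local.items() if v is not self._DELETED}


base = {"/oak:index/lucene/reindex": True}
cow = CopyOnWriteStore(base)
cow.set("/oak:index/lucene/reindex", False)  # the write lands in the overlay only
print(cow.get("/oak:index/lucene/reindex"), base["/oak:index/lucene/reindex"])  # False True
```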



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)