Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2015/05/05 16:39:00 UTC

[jira] [Commented] (OAK-2247) CopyOnWriteDirectory implementation for Lucene for use in indexing

    [ https://issues.apache.org/jira/browse/OAK-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528534#comment-14528534 ] 

Chetan Mehrotra commented on OAK-2247:
--------------------------------------

[initial patch|^OAK-2247-v1.patch]. It takes the following design approach:

# Depending on the config, the Directory created via {{LuceneIndexEditorContext}} is wrapped by {{CopyOnWriteDirectory}} (COW).
# COW maintains a queue of changes to be applied to the backing remote directory. Every change is first made to a local (FS-based) directory. This ensures that the underlying NodeBuilder is never accessed concurrently.
# COW wraps each IndexOutput it creates in a {{CopyOnCloseIndexOutput}}. This wrapper pushes a task onto the queue once it is closed.
# Before the directory is closed, all tasks in the queue are completed.
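The four steps above can be illustrated with a small, self-contained sketch. It is not the code in the patch: {{CopyOnWriteSketch}}, {{writeFile}} and {{onClose}} are hypothetical names, the remote directory is modeled as a plain in-memory map instead of an OakDirectory, and a single-threaded executor stands in for the queue so that the remote side is only ever touched by one thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the copy-on-write flow; not the code in OAK-2247-v1.patch.
class CopyOnWriteSketch {
    private final Path localDir;               // local FS-based directory that receives all writes
    private final Map<String, byte[]> remote;  // stand-in for the remote (NodeBuilder-backed) directory
    // One worker thread: the remote side is only touched by this thread, never concurrently.
    private final ExecutorService copier = Executors.newSingleThreadExecutor();
    private final List<Future<?>> pending = new ArrayList<>();

    CopyOnWriteSketch(Path localDir, Map<String, byte[]> remote) {
        this.localDir = localDir;
        this.remote = remote;
    }

    /** Writes a file locally, then "closes" it, which queues the remote copy. */
    void writeFile(String name, byte[] data) throws IOException {
        Path local = localDir.resolve(name);
        Files.write(local, data);  // synchronous, local write
        onClose(name, local);
    }

    /** Plays the role of CopyOnCloseIndexOutput.close(): push a copy task onto the queue. */
    private void onClose(String name, Path local) {
        pending.add(copier.submit(() -> {
            remote.put(name, Files.readAllBytes(local));
            return null;
        }));
    }

    /** Before the directory is closed, wait for every queued copy; rethrows any copy failure. */
    void close() throws InterruptedException, ExecutionException {
        try {
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            copier.shutdown();
        }
    }
}
```

Collecting the futures and calling {{get()}} in {{close()}} is one simple way to make "finish the complete queue" also surface copy failures to the caller instead of losing them on the worker thread.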

*Incremental Indexing*
For incremental indexing, the logic includes an optimization: where possible, it reuses the index files already copied locally (by CopyOnRead). For this to work, the {{indexPath}} (the path where the index configuration is defined) must be provided, because in the current Oak design an index editor does not know the path of the index config node builder. This path is required to pick up the correct local directory.
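Why the {{indexPath}} matters can be shown with a hypothetical helper: if the local directory for an index is derived only from its path, an editor that lacks the path cannot locate the files CopyOnRead already copied. The SHA-256 naming scheme and the "same name and same length" reuse check below are assumptions made for illustration, not Oak's actual layout or rule; {{LocalIndexDirs}}, {{dirFor}} and {{canReuse}} are invented names.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; the directory layout and reuse rule are assumptions, not Oak's scheme.
class LocalIndexDirs {
    private final Path baseDir;

    LocalIndexDirs(Path baseDir) {
        this.baseDir = baseDir;
    }

    /** Maps an index path such as "/oak:index/lucene" to a stable local directory. */
    Path dirFor(String indexPath) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(indexPath.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return baseDir.resolve(hex.toString());
    }

    /** Assumed rule: a file can be reused if a local copy with the same name and length exists. */
    boolean canReuse(String indexPath, String fileName, long remoteLength) throws Exception {
        Path local = dirFor(indexPath).resolve(fileName);
        return Files.exists(local) && Files.size(local) == remoteLength;
    }
}
```

A name-plus-length check leans on Lucene index files being write-once; a real implementation may need a stronger check.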

This approach:
* Avoids copying temporary files that get deleted during indexing itself
* Lets Lucene make use of native file system support, while the written files are copied in the background
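One way the first benefit can arise is sketched below, under the assumption that copies are queued per file and a deletion arriving before the copy has run simply drops it from the queue (the patch may achieve this differently). {{PendingCopyQueue}}, {{fileClosed}}, {{fileDeleted}} and {{drain}} are hypothetical names, and the remote directory is again modeled as an in-memory map.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: copies are queued on close and only performed when the queue is drained.
class PendingCopyQueue {
    private final Map<String, byte[]> pending = new LinkedHashMap<>(); // name -> local bytes, in close order
    private final Map<String, byte[]> remote = new HashMap<>();        // stand-in for the remote directory
    private int copies;                                                // remote writes actually performed

    synchronized void fileClosed(String name, byte[] data) {
        pending.put(name, data);
    }

    /** Lucene deleted the file (e.g. merged away); a not-yet-run copy is dropped, never sent. */
    synchronized void fileDeleted(String name) {
        if (pending.remove(name) == null) {
            remote.remove(name); // copy already happened: delete the remote file too
        }
    }

    synchronized void drain() {
        for (Map.Entry<String, byte[]> e : pending.entrySet()) {
            remote.put(e.getKey(), e.getValue());
            copies++;
        }
        pending.clear();
    }

    synchronized int copies() {
        return copies;
    }

    synchronized boolean remoteHas(String name) {
        return remote.containsKey(name);
    }
}
```

For example, if a segment file is closed and then merged away before the background copy reaches it, {{drain()}} performs no remote write for it at all.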

[~teofili] [~alex.parvulescu] Can you have a look?
[~mduerig] Can you review the approach taken to process the jobs? It's based on the way processing is done via an executor in BackgroundObserver.
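For context, the general pattern can be sketched as follows: jobs go onto a queue, and at most one drain task is submitted to the executor at a time, so jobs run serially and in queue order even on a multi-threaded pool. This is a hypothetical sketch loosely modeled on that idea ({{SerialJobProcessor}} is not an Oak class); the actual BackgroundObserver code differs in detail, e.g. in how it tracks the current task.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical "single drainer" job processor; not the actual BackgroundObserver code.
class SerialJobProcessor {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false); // true while a drain task is pending/running
    private final Executor executor;

    SerialJobProcessor(Executor executor) {
        this.executor = executor;
    }

    void add(Runnable job) {
        queue.add(job);
        scheduleIfIdle();
    }

    private void scheduleIfIdle() {
        // Only the caller that flips false -> true submits a drain task, so jobs never overlap.
        if (scheduled.compareAndSet(false, true)) {
            executor.execute(this::drain);
        }
    }

    private void drain() {
        try {
            Runnable job;
            while ((job = queue.poll()) != null) {
                job.run();
            }
        } finally {
            scheduled.set(false);
            // A job may have been queued after the last poll but before the flag was cleared.
            if (!queue.isEmpty()) {
                scheduleIfIdle();
            }
        }
    }
}
```

The re-check after clearing the flag closes the race where a producer's {{compareAndSet}} fails just before the drainer finishes, which would otherwise leave a job stranded in the queue.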



> CopyOnWriteDirectory implementation for Lucene for use in indexing
> ------------------------------------------------------------------
>
>                 Key: OAK-2247
>                 URL: https://issues.apache.org/jira/browse/OAK-2247
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>         Attachments: OAK-2247-v1.patch
>
>
> Currently a Lucene index is written directly to OakDirectory. In the reindex case, the Lucene merge policy might read the written index files again and then perform a segment merge. This can lower performance when OakDirectory is writing to remote storage.
> Instead, we can implement a CopyOnWriteDirectory along similar lines to OAK-1724, where CopyOnReadDirectory support copies the index locally for faster access.
> At a high level, the flow would be:
> # While writing the index, each index file is first written to a local directory
> # Any write is done locally, and once a file is written it is copied asynchronously to OakDirectory
> # When the IndexWriter is closed, it waits until all writes are completed
> This needs to be benchmarked against existing reindex timings to see if it is actually beneficial



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)