Posted to oak-issues@jackrabbit.apache.org by "Vikas Saurabh (JIRA)" <ji...@apache.org> on 2017/05/28 22:30:04 UTC

[jira] [Assigned] (OAK-2808) Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC

     [ https://issues.apache.org/jira/browse/OAK-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Saurabh reassigned OAK-2808:
----------------------------------

    Assignee: Vikas Saurabh  (was: Thomas Mueller)

Done with the following commits on trunk:
[r1796552|https://svn.apache.org/r1796552] - Main meaty implementation of the feature with tests
[r1796554|https://svn.apache.org/r1796554] - Wire up implementation done above with lucene indexing
[r1796555|https://svn.apache.org/r1796555] - An integration test which runs against mongo with and without fds

[~chetanm], as discussed earlier, we now instantiate the directory factory on each indexing cycle. But that required a hacky modification of {{setDirectoryFactory}} (used in oak-run for the out-of-band indexing feature) - I'm not feeling so good about the approach :-/

Things left to do:
* OAK-6227 - [~chetanm], I looked at doing it the jmx way we discussed, but afaics we'd have to depend on the sun management package... I want to avoid that. Otoh, {{NodeStore}} gives listCheckpoint and checkpointInfo - maybe we can just standardize on {{oak:created}} in checkpointInfo and do the find-min logic in oak-lucene!?
* Set up scheduling of the actual collection of blobs marked deleted during indexing
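
For illustration, the find-min idea from the OAK-6227 item could look roughly like the sketch below. It is only a sketch under assumptions: the checkpoint info is passed in as plain maps (checkpoint id to info map) instead of being read from {{NodeStore}}, {{oak:created}} is assumed to hold a timestamp in epoch millis, and the class and method names ({{OldestCheckpoint}}, {{oldestCreated}}) are made up for the example and are not part of Oak.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

public class OldestCheckpoint {

    // Assumption: property name as proposed in the comment above, value in epoch millis.
    static final String CREATED = "oak:created";

    /**
     * Returns the smallest creation time among the given checkpoints.
     * The input maps a checkpoint id to its info map (hypothetical shape).
     * Checkpoints without a parseable oak:created value are skipped.
     */
    static OptionalLong oldestCreated(Map<String, Map<String, String>> checkpointInfos) {
        long min = Long.MAX_VALUE;
        boolean found = false;
        for (Map<String, String> info : checkpointInfos.values()) {
            String created = info.get(CREATED);
            if (created == null) {
                continue; // checkpoint carries no creation time
            }
            try {
                min = Math.min(min, Long.parseLong(created.trim()));
                found = true;
            } catch (NumberFormatException e) {
                // ignore malformed values instead of failing the whole scan
            }
        }
        return found ? OptionalLong.of(min) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> infos = new LinkedHashMap<>();
        infos.put("cp-1", Collections.singletonMap(CREATED, "1495900000000"));
        infos.put("cp-2", Collections.singletonMap(CREATED, "1495800000000"));
        infos.put("cp-3", Collections.singletonMap("name", "no-timestamp"));
        // Prints the oldest creation time found, if any.
        System.out.println(oldestCreated(infos));
    }
}
```

Returning an empty {{OptionalLong}} when no checkpoint carries the property leaves it to the caller to decide what "no checkpoints" means for the collection step.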

> Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC
> ----------------------------------------------------------------------------------------------------
>
>                 Key: OAK-2808
>                 URL: https://issues.apache.org/jira/browse/OAK-2808
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>              Labels: datastore, performance
>             Fix For: 1.8
>
>         Attachments: copyonread-stats.png, OAK-2808-1.patch
>
>
> Now that Lucene index files are stored within the DataStore, our usage
> pattern of the DataStore has changed between JR2 and Oak.
> With JR2 the writes were mostly application driven, i.e. if an application
> stores a pdf/image file then that would be stored in the DataStore. JR2 by
> default would not write anything to the DataStore. Further, in deployments
> where a large amount of binary content is present, systems tend to
> share the DataStore to avoid duplication of storage. In such cases
> running Blob GC is a non-trivial task as it involves a manual step and
> coordination across multiple deployments. Due to this, systems tend to
> run GC less frequently.
> Now with Oak, apart from the application, the Oak system itself *actively*
> uses the DataStore to store the index files for Lucene, and there the
> churn might be much higher, i.e. the frequency of creation and deletion of
> index files is a lot higher. This would accelerate the rate of garbage
> generation and thus put a lot more pressure on the DataStore storage
> requirements.
> Discussion thread http://markmail.org/thread/iybd3eq2bh372zrl



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)