Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2016/06/30 11:10:10 UTC
[jira] [Commented] (OAK-3629) Index corruption seen with CopyOnRead when index definition is recreated

    [ https://issues.apache.org/jira/browse/OAK-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356914#comment-15356914 ] 

Chetan Mehrotra commented on OAK-3629:
--------------------------------------

*Implementation Details*

To fix this issue we have changed the directory structure used for local storage. Each reindex of the index at a given JCR path now creates a new directory:

{noformat}
lucene-1467282418224
├── default
│   ├── ...
│   ├── segments_2
│   └── segments.gen
└── index-details.txt
{noformat}

* Top level dir <index path>-<uid>
** index path - Consists of at most the last 2 elements of the index path, excluding oak:index, and keeps only the safe chars a-zA-Z0-9_. All other chars are removed. Using the actual index name simplifies mapping the index dir on the filesystem to the actual index path in JCR
** uid - Unique id as set in index config (details below)
* {{default}} - The top dir is a container. For an Oak Lucene index at any path there can be multiple actual Lucene directories. Currently we have a "default" which maps to ":data". Later we can also store the suggest index etc. (OAK-3916)
* {{index-details.txt}} - Contains metadata such as the timestamp when the index directory was created on the filesystem and the index JCR path
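
The naming rule above can be sketched roughly as follows. This is only an illustration, not the code committed for this issue: the class and method names are hypothetical, and two points the description leaves open are assumptions here, namely that oak:index is dropped before the last 2 elements are picked and that the two elements are joined with "_".

```java
import java.util.ArrayList;
import java.util.List;

public class IndexDirNameSketch {

    // Hypothetical helper (not an Oak API): builds "<index path>-<uid>".
    static String dirName(String indexPath, long uid) {
        List<String> kept = new ArrayList<>();
        for (String element : indexPath.split("/")) {
            // Skip empty segments and the oak:index container node.
            if (element.isEmpty() || element.equals("oak:index")) {
                continue;
            }
            // Keep only the safe chars a-zA-Z0-9_.
            String safe = element.replaceAll("[^a-zA-Z0-9_]", "");
            if (!safe.isEmpty()) {
                kept.add(safe);
            }
        }
        // Use at most the last 2 remaining elements.
        int from = Math.max(0, kept.size() - 2);
        // Assumption: "_" as the separator between the two elements.
        String name = String.join("_", kept.subList(from, kept.size()));
        return name + "-" + uid;
    }

    public static void main(String[] args) {
        // Prints lucene-1467282418224, matching the example directory above.
        System.out.println(dirName("/oak:index/lucene", 1467282418224L));
    }
}
```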

*Unique Id*
{{LuceneIndexEditorContext}} generates and stores a unique id which, per the current impl, is a timestamp (always increasing via the Clock API). Once generated it is stored under <index node>/:status/uid. On reindex all such hidden nodes get removed, which causes the unique id to be regenerated. This id is then combined with the index name (derived from the path).

If the index gets reindexed, this leads to a new directory name. This ensures that directory names do not collide on a given cluster node.
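
To illustrate why an always-increasing timestamp avoids collisions on one cluster node, here is a minimal sketch. It does not use the Oak Clock API; the class {{UidSketch}} and its {{next}} method are hypothetical stand-ins, and the "previous value + 1" fallback is an assumption about how strict monotonicity could be achieved.

```java
import java.util.concurrent.atomic.AtomicLong;

public class UidSketch {

    // Last id handed out by this instance.
    private final AtomicLong last = new AtomicLong();

    // Returns the given time in millis, or last + 1 if the clock has not
    // advanced (or went backwards), so ids are strictly increasing.
    long next(long nowMillis) {
        return last.updateAndGet(prev -> Math.max(prev + 1, nowMillis));
    }

    public static void main(String[] args) {
        UidSketch uids = new UidSketch();
        long first = uids.next(System.currentTimeMillis());
        long second = uids.next(System.currentTimeMillis());
        // Two reindex runs in the same millisecond still get distinct ids.
        System.out.println(second > first);
    }
}
```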

* trunk
** 1750769 - Base implementation

> Index corruption seen with CopyOnRead when index definition is recreated
> ------------------------------------------------------------------------
>
>                 Key: OAK-3629
>                 URL: https://issues.apache.org/jira/browse/OAK-3629
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Blocker
>             Fix For: 1.6
>
>
> CopyOnRead logic relies on {{reindexCount}} to determine the name of the directory into which index files are copied. In the normal flow, if the index is reindexed then this count is increased and the newer index files are copied to a new directory.
> However, if the index definition node gets recreated by some deployment process then this count is reset to 0. As a result, the index files newly created by reindexing are copied to an already existing directory, and that can lead to corruption.
> So what happened here was
> # System started with index definition I1 and indexing completed with index files saved under index/hash(indexpath)/1 (where 1 is the current reindex count)
> # A new index definition package was deployed which reset the reindex count. Reindexing then happened again and the CopyOnRead logic, per the current design, reused the existing index directory. It so happens that Lucene creates files with the same name and same size but different content. This defeats CopyOnRead's length-based index corruption check and thus causes the new Lucene index to become corrupt
> *Note that here the corruption is transient, i.e. the persisted index is not corrupted*. Only the locally copied index gets corrupted. Cleaning up the index directory fixes the issue and can be used as a workaround.
> *Fix*
> After discussing with [~tmueller], the following approach can be used.
> Instead of relying on the reindex count we can maintain a hidden, randomly generated uuid and store it in the index config. This would be used to derive the name of the directory on the filesystem. If the index definition gets reset then the uuid can be regenerated.
> *Workaround*
> Before restart, clean the directory used by CopyOnRead, which is <repo home>/index


