Posted to oak-issues@jackrabbit.apache.org by "Marcel Reutegger (JIRA)" <ji...@apache.org> on 2015/04/20 15:39:59 UTC

[jira] [Commented] (OAK-2778) DocumentNodeState is null for revision rx-x-x

    [ https://issues.apache.org/jira/browse/OAK-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502794#comment-14502794 ] 

Marcel Reutegger commented on OAK-2778:
---------------------------------------

The root cause is most likely a race condition in the VersionGarbageCollector. It first gets a list of candidate documents to remove. The condition for those documents is: the _deletedOnce flag is true and the _modified timestamp is older than one day. Then the VersionGarbageCollector goes through each of the documents and checks whether the node based on that document exists at the head revision as of the time the GC started. If the node doesn't exist, the document is considered garbage and removed.
The race condition occurs when the node is re-created after the GC has started, while the GC is still going through the candidate documents. The VersionGarbageCollector will then consider the document garbage even though it was modified after the given timestamp.
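
To make the sequence concrete, here is a minimal, self-contained sketch of the race as described above. It is not Oak code: the class GcRaceSketch, the Doc type and all method names are hypothetical stand-ins, and the in-memory map only imitates a document store. The guarded variant at the end is just one conceivable way to close the window (re-checking the modification time at removal); it is an assumption for illustration, not necessarily how the issue will be fixed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GcRaceSketch {

    /** Hypothetical stand-in for a stored document; not Oak's data model. */
    static final class Doc {
        boolean deletedOnce;
        long modified;   // last modification time, arbitrary units
        boolean exists;  // whether the node currently exists

        Doc(boolean deletedOnce, long modified, boolean exists) {
            this.deletedOnce = deletedOnce;
            this.modified = modified;
            this.exists = exists;
        }
    }

    final Map<String, Doc> store = new HashMap<>();

    /** Step 1: candidates have deletedOnce set and were modified before the cutoff. */
    List<String> findCandidates(long cutoff) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Doc> e : store.entrySet()) {
            Doc d = e.getValue();
            if (d.deletedOnce && d.modified < cutoff) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    /** Ids of the nodes that exist at the head revision captured when the GC starts. */
    Set<String> snapshotExisting() {
        Set<String> ids = new HashSet<>();
        for (Map.Entry<String, Doc> e : store.entrySet()) {
            if (e.getValue().exists) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    /** A concurrent commit re-creates the node, which also bumps the modified time. */
    void recreate(String id, long now) {
        store.put(id, new Doc(true, now, true));
    }

    /** Step 2 as described: only the snapshot taken at GC start is consulted. */
    int removeGarbage(List<String> candidates, Set<String> existedAtStart) {
        int removed = 0;
        for (String id : candidates) {
            if (!existedAtStart.contains(id) && store.remove(id) != null) {
                removed++;
            }
        }
        return removed;
    }

    /** Hypothetical guard: skip documents modified since the candidate query. */
    int removeGarbageGuarded(List<String> candidates, Set<String> existedAtStart, long cutoff) {
        int removed = 0;
        for (String id : candidates) {
            Doc d = store.get(id);
            if (d != null && !existedAtStart.contains(id) && d.modified < cutoff) {
                store.remove(id);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        GcRaceSketch gc = new GcRaceSketch();
        String id = "/oak:index/lucene/:data/_1";              // illustrative path only
        gc.store.put(id, new Doc(true, 10, false));            // deleted long ago
        List<String> candidates = gc.findCandidates(100);      // GC starts
        Set<String> existedAtStart = gc.snapshotExisting();
        gc.recreate(id, 200);                                  // re-created while GC runs
        gc.removeGarbage(candidates, existedAtStart);
        System.out.println("live document lost: " + !gc.store.containsKey(id));
    }
}
```

In the unguarded path the re-created document is removed because the decision rests entirely on state captured before the re-creation. In a real store the re-check and the removal would also have to happen atomically, which this sketch does not attempt to model.
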
The race condition is usually quite unlikely for the Lucene index update. Almost all Lucene files are written once and never modified. The exception is when a re-index occurs. Oak will then delete all existing Lucene files and start a new index from scratch (atomically). This means the new index files will use names of files that existed before. Again, in most cases even a re-index is not a problem, because of the way Lucene assigns names to files. Almost all Lucene files have a generation suffix, an alphanumeric (base-36) number that increments with each new generation. The suffix is monotonically increasing and starts at 0. This means the garbage collector will always remove the oldest generations and not touch the active part of the index. Even if a re-index occurs, the previously active Lucene index will be at a rather high generation, which makes it unlikely to overlap with the generations of the new index.

The probability increases considerably if re-indexes occur frequently, e.g. once a day. In that case the Lucene generations eligible for garbage collection will overlap with the new generations used by the active index.
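
The overlap argument can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the class GenerationOverlapSketch, the generation ranges, and the simplified naming of "_" plus a counter. The sketch renders the counter in base 36 (Character.MAX_RADIX), which is what a name like _1co.cfe in the reported exception suggests; the radix does not matter for the argument, only that counting restarts at 0 after a re-index.

```java
import java.util.Set;
import java.util.TreeSet;

public class GenerationOverlapSketch {

    /** Simplified file name for a generation, e.g. 0 -> "_0", 36 -> "_10". */
    static String name(long generation) {
        return "_" + Long.toString(generation, Character.MAX_RADIX);
    }

    /** Names for the generations in [from, to). */
    static Set<String> names(long from, long to) {
        Set<String> result = new TreeSet<>();
        for (long g = from; g < to; g++) {
            result.add(name(g));
        }
        return result;
    }

    /** Names that are both GC candidates and in use by the active index. */
    static Set<String> overlap(Set<String> gcCandidates, Set<String> active) {
        Set<String> result = new TreeSet<>(gcCandidates);
        result.retainAll(active);
        return result;
    }

    public static void main(String[] args) {
        // Long-lived index, re-indexed once: the old files sit at high
        // generations, while the fresh index starts again at 0.
        Set<String> oldHigh = names(900, 1000);
        Set<String> freshIndex = names(0, 20);
        System.out.println("rare re-index overlap:  " + overlap(oldHigh, freshIndex).size());

        // Daily re-index: yesterday's index only reached low generations,
        // which today's index passes through again.
        Set<String> yesterday = names(0, 50);
        Set<String> today = names(0, 30);
        System.out.println("daily re-index overlap: " + overlap(yesterday, today).size());
    }
}
```

With the made-up ranges above, the rarely re-indexed case shares no names between GC candidates and the active index, while the daily case shares every name the new index has used so far. Each shared name is a document where the race in the VersionGarbageCollector can hit.
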

I'll try to create a test to reproduce the issue.

> DocumentNodeState is null for revision rx-x-x
> ---------------------------------------------
>
>                 Key: OAK-2778
>                 URL: https://issues.apache.org/jira/browse/OAK-2778
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: core, mongomk
>    Affects Versions: 1.0, 1.2
>            Reporter: Marcel Reutegger
>            Assignee: Marcel Reutegger
>             Fix For: 1.3.0
>
>
> On a system running Oak 1.0.12 the following exception is seen repeatedly when the async index update tries to update a lucene index:
> {noformat}
> org.apache.sling.commons.scheduler.impl.QuartzScheduler Exception during job execution of org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate@6be42cde : DocumentNodeState is null for revision r14cbbd50ad2-0-1 of /oak:index/lucene/:data/_1co.cfe (aborting getChildNodes())
> org.apache.jackrabbit.oak.plugins.document.DocumentStoreException: DocumentNodeState is null for revision r14cbbd50ad2-0-1 of /oak:index/lucene/:data/_1co.cfe (aborting getChildNodes())
> at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore$6.apply(DocumentNodeStore.java:925)
> at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore$6.apply(DocumentNodeStore.java:919)
> at com.google.common.collect.Iterators$8.transform(Iterators.java:794)
> at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
> at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
> at org.apache.jackrabbit.oak.plugins.document.DocumentNodeState$ChildNodeEntryIterator.next(DocumentNodeState.java:618)
> at org.apache.jackrabbit.oak.plugins.document.DocumentNodeState$ChildNodeEntryIterator.next(DocumentNodeState.java:587)
> at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
> at com.google.common.collect.Iterators.addAll(Iterators.java:357)
> at com.google.common.collect.Lists.newArrayList(Lists.java:146)
> at com.google.common.collect.Iterables.toCollection(Iterables.java:334)
> at com.google.common.collect.Iterables.toArray(Iterables.java:312)
> at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory.listAll(OakDirectory.java:69)
> at org.apache.lucene.index.DirectoryReader.indexExists(DirectoryReader.java:339)
> at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:720)
> at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorContext.getWriter(LuceneIndexEditorContext.java:134)
> at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.addOrUpdate(LuceneIndexEditor.java:260)
> at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:171)
> at org.apache.jackrabbit.oak.spi.commit.CompositeEditor.leave(CompositeEditor.java:74)
> at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.leave(VisibleEditor.java:63)
> at org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeAdded(EditorDiff.java:130)
> at org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:160)
> {noformat}
> A similar issue was already fixed with OAK-2420.


