Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2017/12/19 05:26:00 UTC

[jira] [Commented] (OAK-7074) Ensure that all Documents are read with document order traversal indexing

    [ https://issues.apache.org/jira/browse/OAK-7074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296232#comment-16296232 ] 

Chetan Mehrotra commented on OAK-7074:
--------------------------------------

With 1818634, sorting now uses distinct mode to avoid duplicates.

[~catholicon] mentioned in an offline discussion that, for duplicates, we just need to ensure NodeStateEntries are unique per path. It does not matter which entry is picked for a given path. Further, a document may appear more than once in a cursor traversal in one of the following cases:

# Document was updated - If a document gets updated, it may be moved around and thus may appear twice in natural order traversal. While sorting we can pick either copy, as the NodeState view for the checkpoint revision would be the same for both Mongo documents.
# Document was moved due to internal design of Mongo - Mongo may move a document around without an update (say, due to some compaction process). In that case we are not sure about the consistency guarantee of natural order traversal, i.e. is it possible that a document may not get reflected in the cursor result at all if Mongo is in use?

So, based on #1, we just need to ensure that sorting removes any duplicates.
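
For illustration, the "one entry per path, any copy will do" rule can be sketched as below. This is only a minimal in-memory sketch, not the code from the commit: PathEntry and DistinctSortSketch are hypothetical stand-ins (PathEntry for a NodeStateEntry), plain String ordering is assumed instead of whatever path ordering the real sorter uses, and the real sort would not hold everything in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a NodeStateEntry: a path plus an opaque payload.
final class PathEntry {
    final String path;
    final String payload;

    PathEntry(String path, String payload) {
        this.path = path;
        this.payload = payload;
    }
}

final class DistinctSortSketch {
    // Returns the entries sorted by path (plain String order, an assumption)
    // with at most one entry per path. Which duplicate survives does not
    // matter for correctness; this sketch keeps the first one seen.
    static List<PathEntry> sortDistinct(Iterable<PathEntry> traversal) {
        Map<String, PathEntry> byPath = new TreeMap<>();
        for (PathEntry e : traversal) {
            byPath.putIfAbsent(e.path, e);
        }
        return new ArrayList<>(byPath.values());
    }

    public static void main(String[] args) {
        // "/b" shows up twice, as if the document had been moved during traversal.
        List<PathEntry> cursor = Arrays.asList(
                new PathEntry("/b", "first copy"),
                new PathEntry("/a", "only copy"),
                new PathEntry("/b", "second copy"));
        for (PathEntry e : sortDistinct(cursor)) {
            System.out.println(e.path + " -> " + e.payload);
        }
    }
}
```

Keeping the first copy is an arbitrary choice here; per case #1 both copies yield the same NodeState view at the checkpoint revision, so keeping the last one would be equally valid. Note that this only addresses duplicates, not the possibly missed documents from case #2.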

> Ensure that all Documents are read with document order traversal indexing
> -------------------------------------------------------------------------
>
>                 Key: OAK-7074
>                 URL: https://issues.apache.org/jira/browse/OAK-7074
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk, run
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>
>
> With OAK-6353, support was added for document order traversal indexing. In this mode we open a DB cursor and try to read all documents from it using document order traversal. Such a cursor may remain open for a long time (2-4 hrs), and it is possible that documents may get reordered by the Mongo storage engine. This raises two aspects to think about:
> # Duplicate documents - The same document may appear more than once in the result set
> # Possibly missed document - It is possible that a document got moved and was therefore missed by the cursor
> Both these aspects would need to be handled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)