
Posted to oak-issues@jackrabbit.apache.org by "Marcel Reutegger (JIRA)" <ji...@apache.org> on 2015/01/06 08:47:34 UTC

[jira] [Updated] (OAK-2359) read is inefficient when there are many split documents

     [ https://issues.apache.org/jira/browse/OAK-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger updated OAK-2359:
----------------------------------
    Summary: read is inefficient when there are many split documents  (was: diffImpl is inefficient when there are many split documents)

> read is inefficient when there are many split documents
> -------------------------------------------------------
>
>                 Key: OAK-2359
>                 URL: https://issues.apache.org/jira/browse/OAK-2359
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.0.8
>         Environment: 1.0.8.r1644758
>            Reporter: Stefan Egli
>            Assignee: Marcel Reutegger
>            Priority: Critical
>             Fix For: 1.1.4, 1.0.10
>
>         Attachments: oak2359patch.diff
>
>
> As reported in OAK-2358, there is a potential problem with revisionGC not cleaning up split documents properly (in 1.0.8.r1644758 at least).
> As a side effect, having many garbage revisions makes the diffImpl algorithm very slow. Normally it takes only a few milliseconds, but for nodes with many split documents I can see diffImpl take hundreds of milliseconds, sometimes up to a few seconds. As a result, observation events are dequeued more slowly than they are enqueued, so the observation queue is never drained and event handling falls further and further behind.
> Adding some logging showed that diffImpl often reads many split documents, which supports the assumption that the diffImpl slowness is a side effect of revisionGC not cleaning up revisions. That said, diffImpl should probably still be able to run fast, since all the revisions it needs to look at should be in the main document, not in split documents.
> I don't have a test case handy for this at the moment, unfortunately. If more comes up, I'll add more details here.
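
The growing observation backlog described above is plain rate arithmetic: if each dequeue (one diffImpl call) takes longer than the average gap between enqueued events, the queue grows linearly over time. The following is a minimal Python sketch under that assumption; the arrival interval and processing times are hypothetical inputs, not measurements from this issue.

```python
def queue_length_after(seconds, arrival_interval_ms, processing_ms):
    """Approximate observation-queue length after `seconds`.

    Hypothetical model: events arrive every `arrival_interval_ms` and each
    dequeue (one diff) takes a constant `processing_ms`.
    """
    total_ms = seconds * 1000
    enqueued = total_ms // arrival_interval_ms
    # Upper bound on how many events the consumer could have processed.
    dequeued = total_ms // processing_ms
    # If the consumer is faster than the producer, the queue stays empty.
    return max(0, enqueued - dequeued)
```

With events arriving every 50 ms, a 5 ms diff keeps the queue empty, while a 500 ms diff leaves roughly 1,080 events queued after one minute and about twice that after two minutes.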

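The expectation in the description, that the revisions diffImpl needs should be found in the main document, can be illustrated with a toy model. This is a hypothetical Python sketch, not Oak's actual document implementation: a document keeps recent revisions locally and older ones in a list of split documents (newest first), and a lookup only touches split documents when the requested revision is missing from the main document. The `Doc` class, `read_value` function and `reads` counter are illustrative names standing in for the extra document reads seen in the logging.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Toy document mapping revision -> value (hypothetical model)."""
    revisions: dict
    # Older revisions moved out of the main document, newest first.
    split_docs: list = field(default_factory=list)


def read_value(main, revision, stats):
    """Return the value for `revision`, checking the main document first.

    Split documents are only read when the revision is not in the main
    document; every split document visited increments stats["reads"].
    """
    if revision in main.revisions:
        return main.revisions[revision]
    for split in main.split_docs:
        stats["reads"] += 1  # each split document costs one extra read
        if revision in split.revisions:
            return split.revisions[revision]
    return None
```

In this model, reading the newest revision costs no split-document reads, while reading the oldest of 100 split revisions costs 100, which is why a read path that must fall back to many split documents becomes slow.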


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)