Posted to oak-issues@jackrabbit.apache.org by "Tomek Rękawek (JIRA)" <ji...@apache.org> on 2016/12/16 11:17:58 UTC

[jira] [Updated] (OAK-4751) Improve the checkpoint migration performance

     [ https://issues.apache.org/jira/browse/OAK-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomek Rękawek updated OAK-4751:
-------------------------------
    Fix Version/s: 1.4.12

> Improve the checkpoint migration performance
> --------------------------------------------
>
>                 Key: OAK-4751
>                 URL: https://issues.apache.org/jira/browse/OAK-4751
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar, upgrade
>            Reporter: Tomek Rękawek
>            Assignee: Tomek Rękawek
>             Fix For: 1.5.10, 1.4.12, 1.6
>
>         Attachments: OAK-4751.patch
>
>
> (based on [~alex.parvulescu] input):
> During the segment -> segment-tar migration, a fair amount of time is taken by the deduplication process. The repository ingests large amounts of content (a checkpoint is the equivalent of a full repo state) and, on deduplicating the data, finds that it is already available in the destination repository.
> This happens because the diff mechanism cannot be efficient across repositories.
> For example: on the source repo we have the root state r0 and a checkpoint cp0 that is very close to r0. diff(r0, cp0) is extremely cheap, measured in milliseconds. But the sidegrade copies r0 to the destination repository (r0 -> rx1) and then runs diff(rx1, cp0), which becomes very expensive: the 2 node states don't originate from the same repository, so diffing falls back to a slow content-equals comparison. Moreover, the content is almost equal, so a huge number of cycles is wasted deduplicating data across the 2 repositories.
> I have no easy solution here other than looking into providing a diff mechanism that compares the 2 local states, diff(r0, cp0), but applies the delta to the destination repository (on top of rx1). I'm not sure how easy this will turn out to be, or whether it's worth the effort.
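
The idea in the description can be illustrated with a toy model. The sketch below is not Oak code and does not use Oak's API: Repo, Node, diff, and apply_delta are hypothetical names, and "record id" stands in for whatever identity a real store would use to recognize an unchanged subtree. It assumes that two node states can be recognized as equal cheaply only when they belong to the same repository, and it counts how many nodes each diff has to visit.

```python
class Node:
    """Immutable node state owned by one (toy) repository."""
    def __init__(self, repo, rec_id, value, children):
        self.repo, self.rec_id = repo, rec_id
        self.value, self.children = value, children  # children: name -> Node


class Repo:
    """Toy repository that hands out record ids for the nodes it stores."""
    def __init__(self, name):
        self.name, self._next_id = name, 0

    def node(self, value, children=None):
        self._next_id += 1
        return Node(self, self._next_id, value, dict(children or {}))

    def copy_from(self, other):
        """Deep-copy a foreign node state into this repository."""
        return self.node(other.value,
                         {n: self.copy_from(c) for n, c in other.children.items()})


def diff(before, after, stats):
    """Return a delta turning `before` into `after`, or None if they are equal."""
    stats["visited"] += 1
    if before.repo is after.repo and before.rec_id == after.rec_id:
        return None  # same record in the same repo: skip the whole subtree
    # Otherwise fall back to comparing content, child by child.
    delta = {"removed": [n for n in before.children if n not in after.children],
             "added": {}, "changed": {}}
    if before.value != after.value:
        delta["value"] = after.value
    for name, child in after.children.items():
        if name not in before.children:
            delta["added"][name] = child
        else:
            sub = diff(before.children[name], child, stats)
            if sub is not None:
                delta["changed"][name] = sub
    if "value" in delta or delta["removed"] or delta["added"] or delta["changed"]:
        return delta
    return None


def apply_delta(dest_repo, base, delta):
    """Build a new state in dest_repo by applying `delta` on top of `base`."""
    if delta is None:
        return base
    children = {n: c for n, c in base.children.items() if n not in delta["removed"]}
    for name, node in delta["added"].items():
        children[name] = dest_repo.copy_from(node)
    for name, sub in delta["changed"].items():
        children[name] = apply_delta(dest_repo, base.children[name], sub)
    return dest_repo.node(delta.get("value", base.value), children)


# Source repo: r0 and cp0 share one large subtree and differ in one small node.
src, dst = Repo("segment"), Repo("segment-tar")
big = src.node("big", {f"n{i}": src.node(str(i)) for i in range(1000)})
cp0 = src.node("root", {"content": big, "prop": src.node("old")})
r0 = src.node("root", {"content": big, "prop": src.node("new")})

rx1 = dst.copy_from(r0)            # r0 -> rx1, as the sidegrade does

slow = {"visited": 0}
diff(rx1, cp0, slow)               # cross-repository: no identity shortcut

fast = {"visited": 0}
delta = diff(r0, cp0, fast)        # local diff: shared subtree is skipped
cpx = apply_delta(dst, rx1, delta) # checkpoint rebuilt on top of rx1

print(fast["visited"], slow["visited"])
```

In this toy model the local diff visits only the root and its two children, while the cross-repository diff walks every node of the shared subtree. The rebuilt checkpoint cpx reuses the "content" subtree already stored under rx1, so nothing is written twice. Whether a real implementation can carry a delta between stores this simply is exactly the open question raised in the description.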



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)