Posted to oak-issues@jackrabbit.apache.org by "Julian Sedding (JIRA)" <ji...@apache.org> on 2015/04/07 14:21:12 UTC

[jira] [Commented] (OAK-2626) Optimize binary comparison for merge during upgrade

    [ https://issues.apache.org/jira/browse/OAK-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483081#comment-14483081 ] 

Julian Sedding commented on OAK-2626:
-------------------------------------

The time taken for an upgrade can be split into the time taken to *copy the data* and the time taken to *execute the commit hooks*.

This optimization relies on comparing blobs by reference. It is designed to avoid file-system access where possible, i.e. reference calculation is reduced to a string operation.

For the initial step of *copying the data*, the attached patch is sufficient, as the {{JackrabbitNodeState}}'s anonymous {{AbstractBlob}} implements {{equals}} with a reference comparison, before falling back to {{AbstractBlob#equals()}}.

In order to benefit from the optimization during the *execution of commit hooks*, the patch from OAK-2627, which adds a reference comparison to {{AbstractBlob}} itself, needs to be applied as well. This is because when the commit hooks are executed, any compared {{NodeState}}s are of the same type (e.g. {{SegmentNodeState}} or {{DocumentNodeState}}).
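
To illustrate the idea of comparing by reference before falling back to a content comparison, here is a minimal sketch. It is not the code from either patch: {{SketchBlob}}, its fields and the {{contentReads}} counter are hypothetical names, and it assumes that two equal, non-null references imply identical content.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a blob. This is NOT Oak's Blob API.
final class SketchBlob {
    private final String reference; // null if the store cannot provide a reference
    private final byte[] content;
    int contentReads = 0;           // counts the (expensive) content reads, for illustration

    SketchBlob(String reference, byte[] content) {
        this.reference = reference;
        this.content = content;
    }

    byte[] readContent() {
        contentReads++;
        return content;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SketchBlob)) {
            return false;
        }
        SketchBlob that = (SketchBlob) other;
        // Fast path: a plain string comparison, no content is read.
        // Assumption: equal references always point to identical content.
        if (reference != null && reference.equals(that.reference)) {
            return true;
        }
        // Slow path: fall back to comparing the actual content.
        return Arrays.equals(readContent(), that.readContent());
    }

    @Override
    public int hashCode() {
        // A constant stays consistent with equals() without reading any content.
        return 0;
    }
}
```

With two blobs that share a reference, {{equals}} returns without touching the content. Only when a reference is missing or differs does the sketch read both contents.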

To activate the optimization, {{ReferenceOptimizedBlobStore}} needs to be used (as a drop-in replacement) instead of {{DataStoreBlobStore}}.

!incremental-upgrade-no-changes.png!

The graph shows four scenarios run with TarMK + FDS (500k nodes copied from an AEM instance, ~2/3 are digital assets, ~1/3 are websites). In each scenario, the source repository is copied a second time without any changes.

# copy: No optimizations. Essentially, the entire repository is copied again (34 sec) and then compared for the commit-hooks. No NodeStates are shared, so a full repository traversal is done for the comparison (63 sec).
# copy + binary-optimization: Optimized blob comparison by reference. As above, the entire repository is copied again (34 sec) and then compared for the commit-hooks. A full repository traversal is done for the comparison, but blob comparison is optimized (14 sec).
# recursive-copy: Content is copied recursively (43 sec). All properties are compared during copy and set only if changed (see OAK-2619). Since there are no changes, no time is required to execute commit-hooks (0 sec).
# recursive-copy + blob-optimization: As above, but the recursive copy benefits from optimized binary comparison (15 sec). Again, no changes were made, hence commit hooks require no time (0 sec).
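
The "compare during copy and set only if changed" behaviour of the recursive-copy scenarios can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the OAK-2619 implementation: {{SketchNode}} and {{RecursiveCopySketch}} are hypothetical names, properties are plain map entries with non-null values, and removal of properties or children that no longer exist in the source is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory node: named properties plus named child nodes.
final class SketchNode {
    final Map<String, Object> properties = new LinkedHashMap<>();
    final Map<String, SketchNode> children = new LinkedHashMap<>();
}

final class RecursiveCopySketch {

    /**
     * Copies source into target recursively and returns the number of
     * properties that actually had to be written.
     */
    static int copy(SketchNode source, SketchNode target) {
        int writes = 0;
        for (Map.Entry<String, Object> e : source.properties.entrySet()) {
            String name = e.getKey();
            Object value = e.getValue();
            // Write only if the property is missing or its value differs.
            // For binaries, this comparison is where a cheap equals() pays off.
            if (!target.properties.containsKey(name)
                    || !Objects.equals(value, target.properties.get(name))) {
                target.properties.put(name, value);
                writes++;
            }
        }
        for (Map.Entry<String, SketchNode> e : source.children.entrySet()) {
            SketchNode child =
                    target.children.computeIfAbsent(e.getKey(), k -> new SketchNode());
            writes += copy(e.getValue(), child);
        }
        return writes;
    }
}
```

In this sketch a second copy of unchanged content returns zero writes, which corresponds to the case above where nothing is left for the commit hooks to process.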

For reference, the first run for all four scenarios is very uniform: 33-36 sec for copy and 19 sec for the commit hooks (comparing against EmptyNodeState is fast), i.e. a total of 52-57 sec.

> Optimize binary comparison for merge during upgrade 
> ----------------------------------------------------
>
>                 Key: OAK-2626
>                 URL: https://issues.apache.org/jira/browse/OAK-2626
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: upgrade
>    Affects Versions: 1.1.7
>            Reporter: Julian Sedding
>            Priority: Minor
>         Attachments: OAK-2626.patch, incremental-upgrade-no-changes.png
>
>
> In OAK-2619 I propose to support repeated upgrades into the same NodeStore.
> This issue does not optimize the first run, but any subsequent run benefits from the proposed changes.
> One use-case for this feature is to import all content several days before the upgrade and then copy only the delta on the day of the upgrade.
> Assuming that both the source and target repositories use the same FileDataStore, binaries could be efficiently compared by their references.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)