Posted to oak-issues@jackrabbit.apache.org by "Stefan Egli (JIRA)" <ji...@apache.org> on 2015/02/05 15:05:35 UTC

[jira] [Created] (OAK-2480) Incremental (FileStore)Backup copies the entire source instead of just the delta

Stefan Egli created OAK-2480:
--------------------------------

             Summary: Incremental (FileStore)Backup copies the entire source instead of just the delta
                 Key: OAK-2480
                 URL: https://issues.apache.org/jira/browse/OAK-2480
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: run
    Affects Versions: 1.1.5
            Reporter: Stefan Egli


Running the FileStoreBackup (in oak-run) several times in sequence should correspond to an incremental backup. The expectation is that an incremental backup is resource-friendly, i.e. that it only adds the delta/diff that changed since the last backup. Instead, what can be seen at the moment is that it copies the entire source store again on each 'incremental' backup.

Tested with the latest trunk snapshot.

I suspect the problem is as follows: on the first backup, the FileStoreBackup takes a checkpoint created in the source store and adds it as a property "checkpoint" to the backup root node, next to the actual backup, which is stored in '/root'.
On subsequent incremental runs, the backup retrieves that "checkpoint" property from the backup and hands it to the compactor as the base for the diff.
The problem seems to be that Compactor.compact calls process(), which does a writer.writeNode(before), where 'before' is the checkpoint in the origin store but 'writer' is a writer of the backup store. SegmentWriter.writeNode() then fails to find the 'before' segment, and thus traverses the entire tree and copies it from the origin to the backup.

So the problem appears to be that the code assumes it will find this 'checkpoint-before' state in the backup, which is not the case.
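
To illustrate the suspected behaviour, here is a minimal toy model in Java. It is not Oak code: the classes Node and Store and the method tree() are hypothetical, and Store.writeNode() only mimics the assumption made above, namely that a writer can reuse a node only if that node already belongs to its own store and otherwise copies the whole subtree.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BackupSketch {

    /** Toy node: remembers which store owns it and its named children. */
    static final class Node {
        final Store owner;
        final Map<String, Node> children;

        Node(Store owner, Map<String, Node> children) {
            this.owner = owner;
            this.children = children;
        }
    }

    /** Toy store that counts how many nodes had to be copied into it. */
    static final class Store {
        int writes = 0;

        /**
         * Assumed behaviour: a node owned by this store is reused as-is;
         * a node from any other store is copied recursively.
         */
        Node writeNode(Node node) {
            if (node.owner == this) {
                return node; // record already lives here, nothing to copy
            }
            Map<String, Node> copies = new LinkedHashMap<>();
            for (Map.Entry<String, Node> e : node.children.entrySet()) {
                copies.put(e.getKey(), writeNode(e.getValue()));
            }
            writes++;
            return new Node(this, copies);
        }
    }

    /** Builds a root with the given number of leaf children, all owned by store. */
    static Node tree(Store store, int leaves) {
        Map<String, Node> kids = new LinkedHashMap<>();
        for (int i = 0; i < leaves; i++) {
            kids.put("n" + i, new Node(store, new LinkedHashMap<>()));
        }
        return new Node(store, kids);
    }

    public static void main(String[] args) {
        Store origin = new Store();
        Store backup = new Store();

        Node checkpoint = tree(origin, 1000);           // checkpoint lives in the origin store
        Node backupHead = backup.writeNode(checkpoint); // first, full backup
        System.out.println("first backup wrote " + backup.writes + " nodes");

        backup.writes = 0;
        backup.writeNode(checkpoint); // 'before' comes from the origin: copied again in full
        System.out.println("base = origin checkpoint: " + backup.writes + " nodes");

        backup.writes = 0;
        backup.writeNode(backupHead); // 'before' already in the backup: nothing to copy
        System.out.println("base = backup head: " + backup.writes + " nodes");
    }
}
```

In this model, writing the origin checkpoint a second time costs as much as the first full backup, while a base state that already lives in the backup store costs nothing. Whether the real SegmentWriter behaves exactly like this is the open question of this issue.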

One solution would be to do the diff not between the checkpoint and the current origin head, but between the backup head and the origin head instead. Apparently this was not the intention, though, as it would mean reading through the entire backup to do the diff, and that would be inefficient...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)