Posted to dev@subversion.apache.org by Philip Martin <ph...@wandisco.com> on 2011/02/17 11:54:38 UTC

svnadmin hotcopy --incremental

Somebody responsible for backing up a large FSFS repository asked me if
it were possible to do an incremental hotcopy.  An incremental hotcopy
would update a previous hotcopy to the current HEAD and would only need
to copy the rev files newer than the previous hotcopy.  This might
involve deleting rev files if the packing has changed.
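
A minimal sketch of the rev-file step described above, not an actual svnadmin implementation. It assumes a simplified, hypothetical layout in which each revision is a single file named by its number in one flat directory; real FSFS repositories may be sharded or packed, which this sketch ignores. The names youngest_rev and copy_new_revs are illustrative only.

```python
import os
import shutil


def youngest_rev(revs_dir):
    """Return the highest revision number present in revs_dir, or -1 if none."""
    revs = [int(name) for name in os.listdir(revs_dir) if name.isdigit()]
    return max(revs, default=-1)


def copy_new_revs(src_revs, dst_revs):
    """Copy only the rev files newer than anything already in dst_revs.

    Assumes a flat, unpacked layout (one file per revision), which is a
    simplification of the real FSFS on-disk format.
    """
    start = youngest_rev(dst_revs) + 1
    end = youngest_rev(src_revs)
    copied = []
    for rev in range(start, end + 1):
        shutil.copy2(os.path.join(src_revs, str(rev)),
                     os.path.join(dst_revs, str(rev)))
        copied.append(rev)
    return copied
```

Running copy_new_revs a second time with no new commits in the source copies nothing, which is the property an incremental backup relies on.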

Some strategy to deal with revprops would be needed: copy them all, read
them all and copy the ones that have changed, copy the ones with newer
timestamps, something else.  The locks directory would need to be
deleted and copied completely.
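
As an illustration of the "read them all and copy the ones that have changed" strategy, here is a rough sketch. It assumes, hypothetically, one revprop file per revision in a flat directory (no packing); sync_revprops is an invented name, not an existing Subversion function.

```python
import filecmp
import os
import shutil


def sync_revprops(src_dir, dst_dir):
    """Copy revprop files that are missing from, or differ in, dst_dir.

    Every source file is read and compared byte for byte, so this costs
    a full read of the revprops but writes only what actually changed.
    """
    changed = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        # shallow=False forces a content comparison instead of trusting
        # size and mtime alone.
        if not os.path.exists(dst) or not filecmp.cmp(src, dst, shallow=False):
            shutil.copy2(src, dst)
            changed.append(name)
    return changed
```

The timestamp-based alternative would replace the content comparison with a check of modification times, trading the full read for a dependence on reliable clocks.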

Incremental hotcopy would start with a valid repository and end with a
valid repository, but interrupting it part way through might result in an
invalid repository.

I don't think we can easily do this for BDB, so this would be an
FSFS-only feature; since svnadmin already has BDB-only flags this should
not be too much of a problem.

Anyone see any problems with this approach?  Does it sound like a good
idea?

-- 
Philip

Re: svnadmin hotcopy --incremental

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Philip Martin wrote on Sat, Feb 19, 2011 at 10:30:17 +0000:
> Philip Martin <ph...@wandisco.com> writes:
> 
> > Daniel Shahaf <d....@daniel.shahaf.name> writes:
> >>
> >> Either method would need to account for old-revprop changes and for 'svn
> >> lock' locks.
> >
> > Unless we start recording some sort of history for revprops the only
> > option for svnsync is to resend all revprops; hotcopy has more options.
> 
> As you know, hotcopy gets locks wrong at present (issue 3750).  I've
> just thought of another thing to add to that issue but tigris.org is
> down so I'm recording it here:
> 
> hotcopy currently copies revs then locks.  Nothing prevents the source
> adding a rev and a lock after hotcopy has copied revs and before it
> copies locks.  That means the copy could have a lock on a file that does
> not exist in the copy.  I'm not sure there is any copying order we can
> use that will avoid the problem; we may need to add a post-copy step to
> audit the locks.
> 

Or we could start recording locks in the revprops storage...

e.g., if I lock a file FOO with HEAD=N, then we set an internal revprop
on rM (where rM is when FOO@N's noderev was created).  It wouldn't be
visible as a revprop outside of the FS.  Not sure on the details.

I need to think more about this.

> -- 
> Philip

Re: svnadmin hotcopy --incremental

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> Daniel Shahaf <d....@daniel.shahaf.name> writes:
>>
>> Either method would need to account for old-revprop changes and for 'svn
>> lock' locks.
>
> Unless we start recording some sort of history for revprops the only
> option for svnsync is to resend all revprops; hotcopy has more options.

As you know, hotcopy gets locks wrong at present (issue 3750).  I've
just thought of another thing to add to that issue but tigris.org is
down so I'm recording it here:

hotcopy currently copies revs then locks.  Nothing prevents the source
adding a rev and a lock after hotcopy has copied revs and before it
copies locks.  That means the copy could have a lock on a file that does
not exist in the copy.  I'm not sure there is any copying order we can
use that will avoid the problem; we may need to add a post-copy step to
audit the locks.
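
A post-copy audit of this kind could look roughly like the sketch below. It is only an outline under stated assumptions: the locked paths and the set of paths existing at the copy's HEAD are passed in as plain Python collections, and how they would be obtained from a real repository is left out. audit_locks is a hypothetical name.

```python
def audit_locks(locked_paths, paths_at_head):
    """Return the locked paths that do not exist at the copy's HEAD.

    Both arguments are plain iterables of repository paths; gathering
    them from an actual hotcopy is outside the scope of this sketch.
    """
    existing = set(paths_at_head)
    return sorted(path for path in locked_paths if path not in existing)
```

Any path this returns is a lock that arrived without its file, which the audit step could then remove from the copy or report.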

-- 
Philip

Re: svnadmin hotcopy --incremental

Posted by Philip Martin <ph...@wandisco.com>.
Daniel Shahaf <d....@daniel.shahaf.name> writes:

> Philip Martin wrote on Sat, Feb 19, 2011 at 09:43:49 +0000:
>> At the moment svnsync doesn't do exclusive file locks or old
>> revprops that have changed.  svnsync is unlikely to ever update old
>> revprops automatically; it is likely that it will always need some
>> external system to track revprop changes.
>> 
>
> Either method would need to account for old-revprop changes and for 'svn
> lock' locks.

Unless we start recording some sort of history for revprops the only
option for svnsync is to resend all revprops; hotcopy has more options.

-- 
Philip

Re: svnadmin hotcopy --incremental

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Sat, Feb 19, 2011 at 11:45:54 +0200:
> (For example, WebDAV mirrors also suffer from both of these problems)

Oops; I already said that in another mail.  Sorry for the duplication.

Re: svnadmin hotcopy --incremental

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Philip Martin wrote on Sat, Feb 19, 2011 at 09:43:49 +0000:
> Daniel Shahaf <d....@daniel.shahaf.name> writes:
> 
> > Why can't you svnsync the master to the hotcopy using file:// URLs?
> > (and create the svn:sync-* props before / remove them after, if that's
> > a problem)
> 
> That's one option.  svnsync is probably less efficient in terms of disk
> IO.

Fair point.

> At the moment svnsync doesn't do exclusive file locks or old
> revprops that have changed.  svnsync is unlikely to ever update old
> revprops automatically; it is likely that it will always need some
> external system to track revprop changes.
> 

Either method would need to account for old-revprop changes and for 'svn
lock' locks.  (For example, WebDAV mirrors also suffer from both of
these problems)

> -- 
> Philip

Re: svnadmin hotcopy --incremental

Posted by Philip Martin <ph...@wandisco.com>.
Daniel Shahaf <d....@daniel.shahaf.name> writes:

> Why can't you svnsync the master to the hotcopy using file:// URLs?
> (and create the svn:sync-* props before / remove them after, if that's
> a problem)

That's one option.  svnsync is probably less efficient in terms of disk
IO.  At the moment svnsync doesn't do exclusive file locks or old
revprops that have changed.  svnsync is unlikely to ever update old
revprops automatically; it is likely that it will always need some
external system to track revprop changes.

-- 
Philip

Re: svnadmin hotcopy --incremental

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Why can't you svnsync the master to the hotcopy using file:// URLs?
(and create the svn:sync-* props before / remove them after, if that's
a problem)

Philip Martin wrote on Thu, Feb 17, 2011 at 10:54:38 +0000:
> Somebody responsible for backing up a large FSFS repository asked me if
> it were possible to do an incremental hotcopy.  An incremental hotcopy
> would update a previous hotcopy to the current HEAD and would only need
> to copy the rev files newer than the previous hotcopy.  This might
> involve deleting rev files if the packing has changed.
> 
> Some strategy to deal with revprops would be needed: copy them all, read
> them all and copy the ones that have changed, copy the ones with newer
> timestamps, something else.  The locks directory would need to be
> deleted and copied completely.
> 
> Incremental hotcopy would start with a valid repository and end with a
> valid repository, but interrupting it part way through might result in an
> invalid repository.
> 
> I don't think we can easily do this for BDB, so this would be an
> FSFS-only feature; since svnadmin already has BDB-only flags this should
> not be too much of a problem.
> 
> Anyone see any problems with this approach?  Does it sound like a good
> idea?
> 
> -- 
> Philip

Re: svnadmin hotcopy --incremental

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> rsync on a live repository is becoming less reliable: it doesn't handle
> exclusive file locks (issue 3750) or 1.7 packed revprops.

The rep-cache is a problem as well.

-- 
Philip

Re: svnadmin hotcopy --incremental

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Sperling <st...@elego.de> writes:

> It would be nice to come up with a solution that leaves the repository
> in a well-defined state even if the operation is interrupted. rsync cannot
> do that.

So long as we do things in a reasonable order it will probably be
possible to restart the incremental hotcopy even if a previous
incremental was interrupted and left an invalid repository.

rsync on a live repository is becoming less reliable: it doesn't handle
exclusive file locks (issue 3750) or 1.7 packed revprops.


-- 
Philip

Re: svnadmin hotcopy --incremental

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Feb 17, 2011 at 01:26:55PM +0200, Daniel Shahaf wrote:
> Stefan Sperling wrote on Thu, Feb 17, 2011 at 12:29:09 +0100:
> > On Thu, Feb 17, 2011 at 01:14:40PM +0200, Daniel Shahaf wrote:
> > > Stefan Sperling wrote on Thu, Feb 17, 2011 at 12:11:06 +0100:
> > > > People are using rsync instead of hotcopy for this reason (and as long as
> > > > 'current' is copied first this is probably the best way of making incremental
> > > > backups).
> > > 
> > > Nowadays 'svnadmin recover' can recreate 'current', can't it?
> > 
> > Yes, but it's annoying for admins to have to worry about this while
> > they're restoring from backup with all their users stomping on their
> > toes because the service is down.
> 
> Then run 'recover' when you make the backup, not when you restore it.

There are many ways to peel an orange :)

Re: svnadmin hotcopy --incremental

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Stefan Sperling wrote on Thu, Feb 17, 2011 at 12:29:09 +0100:
> On Thu, Feb 17, 2011 at 01:14:40PM +0200, Daniel Shahaf wrote:
> > Stefan Sperling wrote on Thu, Feb 17, 2011 at 12:11:06 +0100:
> > > People are using rsync instead of hotcopy for this reason (and as long as
> > > 'current' is copied first this is probably the best way of making incremental
> > > backups).
> > 
> > Nowadays 'svnadmin recover' can recreate 'current', can't it?
> 
> Yes, but it's annoying for admins to have to worry about this while
> they're restoring from backup with all their users stomping on their
> toes because the service is down.

Then run 'recover' when you make the backup, not when you restore it.

Re: svnadmin hotcopy --incremental

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Feb 17, 2011 at 01:14:40PM +0200, Daniel Shahaf wrote:
> Stefan Sperling wrote on Thu, Feb 17, 2011 at 12:11:06 +0100:
> > People are using rsync instead of hotcopy for this reason (and as long as
> > 'current' is copied first this is probably the best way of making incremental
> > backups).
> 
> Nowadays 'svnadmin recover' can recreate 'current', can't it?

Yes, but it's annoying for admins to have to worry about this while
they're restoring from backup with all their users stomping on their
toes because the service is down.

Re: svnadmin hotcopy --incremental

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Stefan Sperling wrote on Thu, Feb 17, 2011 at 12:11:06 +0100:
> People are using rsync instead of hotcopy for this reason (and as long as
> 'current' is copied first this is probably the best way of making incremental
> backups).

Nowadays 'svnadmin recover' can recreate 'current', can't it?

Re: svnadmin hotcopy --incremental

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Feb 17, 2011 at 10:54:38AM +0000, Philip Martin wrote:
> Somebody responsible for backing up a large FSFS repository asked me if
> it were possible to do an incremental hotcopy.  An incremental hotcopy
> would update a previous hotcopy to the current HEAD and would only need
> to copy the rev files newer than the previous hotcopy.  This might
> involve deleting rev files if the packing has changed.
> 
> Some strategy to deal with revprops would be needed: copy them all, read
> them all and copy the ones that have changed, copy the ones with newer
> timestamps, something else.  The locks directory would need to be
> deleted and copied completely.
> 
> Incremental hotcopy would start with a valid repository and end with a
> valid repository, but interrupting it part way through might result in an
> invalid repository.
> 
> I don't think we can easily do this for BDB, so this would be an
> FSFS-only feature; since svnadmin already has BDB-only flags this should
> not be too much of a problem.
> 
> Anyone see any problems with this approach?  Does it sound like a good
> idea?

People are using rsync instead of hotcopy for this reason (and as long as
'current' is copied first this is probably the best way of making incremental
backups).

So it would be great if we had a built-in way to support this.

It would be nice to come up with a solution that leaves the repository
in a well-defined state even if the operation is interrupted. rsync cannot
do that.
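
To illustrate why the order matters, here is a small sketch of a copy that reads 'current' first. It assumes a simplified, hypothetical layout: a db directory holding a 'current' file whose first whitespace-separated token is the youngest revision number, and a flat 'revs' directory with one file per revision. Writing 'current' last in the destination via an atomic rename is an extra choice made for this sketch, not something proposed in the thread. backup_current_first is an invented name.

```python
import os
import shutil


def backup_current_first(src_db, dst_db):
    """Copy rev files up to the revision named by the source 'current'."""
    # Read 'current' before touching any rev file: every revision it
    # names already exists in the source, even if commits continue.
    with open(os.path.join(src_db, "current")) as f:
        current_text = f.read()
    youngest = int(current_text.split()[0])

    dst_revs = os.path.join(dst_db, "revs")
    os.makedirs(dst_revs, exist_ok=True)
    for rev in range(youngest + 1):
        src = os.path.join(src_db, "revs", str(rev))
        dst = os.path.join(dst_revs, str(rev))
        if not os.path.exists(dst):
            shutil.copy2(src, dst)

    # Publish 'current' in the destination only after all the revisions
    # it refers to are in place (assumption of this sketch).
    tmp = os.path.join(dst_db, "current.tmp")
    with open(tmp, "w") as f:
        f.write(current_text)
    os.replace(tmp, os.path.join(dst_db, "current"))
    return youngest
```

Revisions committed to the source after 'current' was read are simply left for the next run, so the destination never claims a revision it does not contain.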