Posted to users@subversion.apache.org by Soren 'Frank' Munch <sm...@u5.com> on 2005/07/02 19:12:45 UTC

Backing up - is db/current written last? (was: Berkeley...)

Daniel Berlin wrote:
> When I started looking at Subversion for our company, the mandatory
> requirements were reliability and availability. We have development
> offices in several countries and require 24-hour access. I waited until
> FSFS was available and stable and have since begun pilot testing.
>
> FSFS with rsync is indeed the way to go. We use rsync to backup our
> repositories being careful to backup db/current first. The backup cron
> job is amazingly fast - maybe a few seconds every hour, so we get highly
> reliable backups with essentially no impact on the server. We copy to a
> standby server (to be located in a disaster recovery center), so should
> the main server fail, or we lose network connectivity, we can be up and
> running again in under an hour.

Smart, and done with packages only: Subversion and rsync.

One thing that struck me is that backing up the db/current first is no 
guarantee?

A commit initiated just before backup-time must result in a corrupted backup, 
virtually every time:

db/current is written (by svn doing "commit zero")
The backup starts and backs up db/current first. The same backup procedure then 
proceeds (e.g. by running rsync) with the rest of the files.
But when rsync starts, commit zero is still running and writing to other files 
that should correspond to db/current. We have no control over which files 
rsync reads before commit zero has updated them and which it reads after.

So it is hard to imagine how commit zero could be correctly backed up.

To me it seems a better procedure would be to back up db/current first, then 
_wait_ until commit zero has surely completed (with plenty of margin added), 
and then run rsync. This way we can be sure that all files corresponding to 
commit zero have been fully written.

It does not matter if another commit has started while we wait, as the files 
written by commit zero will not be changed, and so we would have a backup with 
integrity (and possibly 'unreferenced' files belonging to commits done before 
our backup was completed).

Have I understood this correctly?

- - -

Another matter, which is an OS thing: what happens if a write request is 
made to a file while another process is reading from it? Does e.g. UNIX 
automatically take care of some locking? If so, it must mean that it is 
"wrong" to write to any part of a file while it is being read?

Or do we risk that db/current is being read by the backup procedure and 
then changed by svn while we read it, resulting in corruption?

- - -

Once "The server is down for maintenance, please come back soon..." was good 
enough... :-)

Soren 'Frank'


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Backing up - is db/current written last?

Posted by Daniel Berlin <db...@dberlin.org>.
On Mon, 2005-07-04 at 14:02 +0200, Jacob Atzen wrote:
> On Sat, Jul 02, 2005 at 10:32:37PM -0400, Daniel Berlin wrote:
> > Rsync builds the list of files first, *then* transfers them.
> > 
> > Thus, the case to worry about is if it transfers db/current *after*
> > another commit has occurred during the rsync (and the newly created
> > revision file is thus not in rsync's list of files), and then you have a
> > db/current that refers to something that wasn't transferred.
> > 
> > Hence the reason to transfer it first.
> 
> Why not use svnadmin hotcopy[1] and stop worrying instead?

Because it takes time to copy 6.9 gig around to a temporary directory :)




Re: Backing up - is db/current written last?

Posted by Jacob Atzen <ja...@aub.dk>.
On Sat, Jul 02, 2005 at 10:32:37PM -0400, Daniel Berlin wrote:
> Rsync builds the list of files first, *then* transfers them.
> 
> Thus, the case to worry about is if it transfers db/current *after*
> another commit has occurred during the rsync (and the newly created
> revision file is thus not in rsync's list of files), and then you have a
> db/current that refers to something that wasn't transferred.
> 
> Hence the reason to transfer it first.

Why not use svnadmin hotcopy[1] and stop worrying instead? That is, do a
hotcopy on the repository server and then rsync the copy to the backup
server. I also like to keep a dump of every single revision that goes
into the repository, just in case. I do this by having a post-commit
hook that simply dumps the latest delta into a file on each commit like
this:

$SVNADMIN dump "$REPOS" --incremental --revision "$REV" | bzip2 > \
    "$BACKUPDIR/$REPOSNAME/incremental/$REPOSNAME.$REV.dump.bz2"

[1]: <http://svnbook.red-bean.com/en/1.1/re33.html>
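
The post-commit hook described above could be fleshed out roughly as
follows. This is only a sketch: the default BACKUPDIR location, the
assumption that svnadmin is on the PATH, and the helper names dump_path
and dump_revision are illustrative and not taken from the original
setup. It relies on Subversion passing the repository path and the new
revision number to the hook as its first two arguments.

```shell
#!/bin/sh
# Sketch of hooks/post-commit. Subversion invokes the hook with the
# repository path and the new revision number as its first two arguments.
SVNADMIN=${SVNADMIN:-svnadmin}            # assumption: svnadmin is on PATH
BACKUPDIR=${BACKUPDIR:-/var/backups/svn}  # assumption: backup location

# Build the output path used in the one-liner above (hypothetical helper).
dump_path() {
    repos=$1
    rev=$2
    name=$(basename "$repos")
    printf '%s/%s/incremental/%s.%s.dump.bz2\n' \
        "$BACKUPDIR" "$name" "$name" "$rev"
}

# Dump a single revision as an incremental, bzip2-compressed file.
# Note: the pipeline's exit status is bzip2's, so a failing svnadmin
# is not detected by this sketch.
dump_revision() {
    out=$(dump_path "$1" "$2")
    mkdir -p "$(dirname "$out")" || return 1
    "$SVNADMIN" dump "$1" --incremental --revision "$2" | bzip2 > "$out"
}

# Run only when called with arguments, the way Subversion calls a hook.
if [ "$#" -ge 2 ]; then
    dump_revision "$1" "$2"
fi
```

Each commit then leaves one small file per revision, which can be
loaded back in order with svnadmin load if the repository is ever lost.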

-- 
Cheers,
- Jacob Atzen


Re: Backing up - is db/current written last?

Posted by Daniel Berlin <db...@dberlin.org>.
On Sun, 2005-07-03 at 05:01 +0700, Soren 'Frank' Munch wrote:
> Daniel wrote:
> > db/current is written last by svn.
> >
> > Also, fsfs is still transactional, so the commits are stored in the
> > transaction directory until they are moved into place, and *then*
> > db/current is updated.
> 
> Great! In that case db/current _must_ point to a set of valid files and there's 
> no reason to delay rsync.
Rsync builds the list of files first, *then* transfers them.

Thus, the case to worry about is if it transfers db/current *after*
another commit has occurred during the rsync (and the newly created
revision file is thus not in rsync's list of files), and then you have a
db/current that refers to something that wasn't transferred.

Hence the reason to transfer it first.
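
A minimal sketch of that ordering, for illustration only. It assumes
local source and destination paths; backup_fsfs and the
current.snapshot file name are made up for this example, and cp is used
as a stand-in when rsync is not installed. Rather than excluding
db/current from the bulk copy, the sketch snapshots it first and
installs that snapshot last, which gives the same result: the backup's
db/current never names a revision that was not copied.

```shell
#!/bin/sh
# Sketch: back up an FSFS repository so that the backup's db/current
# never points past the revision files that were copied.
# backup_fsfs is a hypothetical helper; SRC and DST are local paths.
backup_fsfs() {
    src=$1
    dst=$2
    mkdir -p "$dst/db" || return 1

    # 1. Snapshot db/current before anything else. Since svn writes it
    #    last, every revision it names is already on disk.
    cp "$src/db/current" "$dst/db/current.snapshot" || return 1

    # 2. Copy the rest of the tree. Commits that land during this step
    #    may add revision files the snapshot does not mention; those
    #    extra files are harmless.
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$src/" "$dst/" || return 1
    else
        cp -R "$src/." "$dst/" || return 1   # stand-in for this sketch only
    fi

    # 3. Install the early snapshot as the backup's db/current, replacing
    #    any newer copy picked up in step 2.
    mv "$dst/db/current.snapshot" "$dst/db/current"
}
```

Usage would look like "backup_fsfs /srv/svn/project /backup/svn/project"
(both paths hypothetical). For a remote standby server, step 1 would
become an scp and step 2 an rsync over ssh; adding --delete to rsync
would remove the snapshot file, so the sketch avoids it.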



Re: Backing up - is db/current written last?

Posted by Soren 'Frank' Munch <sm...@u5.com>.
Daniel wrote:
> db/current is written last by svn.
>
> Also, fsfs is still transactional, so the commits are stored in the
> transaction directory until they are moved into place, and *then*
> db/current is updated.

Great! In that case db/current _must_ point to a set of valid files and there's 
no reason to delay rsync.

> Thus, at worst, you will end up with a backup that does not include the
> last revision, *not* a corrupt repository.

I figured out this would be the worst case if db/current was not written last.

- - -

Well, the bottom line must be that grabbing db/current first leads to 
correct backups of the commit that wrote it. I assume that is what 
hot-backup.py is doing, but I am one of those who like to keep up the 
illusion that I know what goes on, and the raw "(s)cp db/current ..." 
followed by rsync is very appealing.

Thanks for the information from the front-line!

Soren 'Frank'


Re: Backing up - is db/current written last? (was: Berkeley...)

Posted by Daniel Berlin <db...@dberlin.org>.
On Sun, 2005-07-03 at 02:12 +0700, Soren 'Frank' Munch wrote:
> Daniel Berlin wrote:
> > When I started looking at Subversion for our company, the mandatory
> > requirements were reliability and availability. We have development
> > offices in several countries and require 24-hour access. I waited until
> > FSFS was available and stable and have since begun pilot testing.
> >
> > FSFS with rsync is indeed the way to go. We use rsync to backup our
> > repositories being careful to backup db/current first. The backup cron
> > job is amazingly fast - maybe a few seconds every hour, so we get highly
> > reliable backups with essentially no impact on the server. We copy to a
> > standby server (to be located in a disaster recovery center), so should
> > the main server fail, or we lose network connectivity, we can be up and
> > running again in under an hour.
> 
> Smart, and done with packages only: Subversion and rsync.
> 
> One thing that struck me is that backing up the db/current first is no 
> guarantee?
> 
> A commit initiated just before backup-time must result in a corrupted backup, 
> virtually every time:
> 
> db/current is written (by svn doing "commit zero")

db/current is written last by svn.

Also, fsfs is still transactional, so the commits are stored in the
transaction directory until they are moved into place, and *then*
db/current is updated.

Thus, at worst, you will end up with a backup that does not include the
last revision, *not* a corrupt repository.
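
To make the ordering concrete, here is a toy model of it in shell. It
is not Subversion's code: toy_init and toy_commit are invented names,
the file contents are simplified to a bare revision number, and there
is no write lock, so it assumes one committer at a time. It only shows
why a reader of db/current never sees a revision number whose file is
not yet in place.

```shell
#!/bin/sh
# Toy model of "build in the transaction area, move into place, update
# db/current last". NOT Subversion's code; names and contents are
# simplified for illustration.

toy_init() {
    mkdir -p "$1/db/revs" "$1/db/transactions" || return 1
    echo 0 > "$1/db/current"
}

toy_commit() {
    repo=$1
    data=$2
    rev=$(( $(cat "$repo/db/current") + 1 ))

    # 1. Build the new revision in the transaction area, out of sight.
    printf '%s\n' "$data" > "$repo/db/transactions/$rev.txn" || return 1

    # 2. Move it into place. A rename within one filesystem is atomic.
    mv "$repo/db/transactions/$rev.txn" "$repo/db/revs/$rev" || return 1

    # 3. Only now publish the new revision number, again via a rename,
    #    so a reader sees either the old db/current or the new one.
    echo "$rev" > "$repo/db/current.tmp" || return 1
    mv "$repo/db/current.tmp" "$repo/db/current"
}
```

If a backup reads db/current between steps 2 and 3 of this model, it
gets the previous revision number plus one extra, unreferenced revision
file, which matches the worst case described above.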


