Posted to users@subversion.apache.org by Soren 'Frank' Munch <sm...@u5.com> on 2005/07/01 20:01:19 UTC

Berkeley is good stuff!

Hi,

reading the last postings one could get the idea that Subversion with Berkeley 
is a risky affair.

We have been using Berkeley for 8 years or so, on dozens of servers, as a part 
of Postfix, Postgres, MySQL, Cyrus and for about a year with Subversion on 
FreeBSD. Some of the servers handle tens of thousands of page loads per 
day and an equal number of mails.

Status: We have never had a problem with Berkeley, not one in all that time! It 
sounds too good to be true, but it isn't. The only problem I recall - which 
seems to have gone away - is numerous entries in syslog over several years 
about hundreds of Berkeley 'lockers'.

This despite all the occasional problems one can expect, like a server crash 
now and then. I have repaired quite a few MySQL dbs over time, and a few 
Postgres ones too, but have yet to gain any experience with repairing Berkeley 
(or Svn for that matter).

A couple of recent posters report problems on OS X; maybe there's a problem 
on that system. So some have had other experiences with Berkeley/Svn, but 
considering the service this combination has already delivered, the strong 
statements we have seen seem really misleading.

I think it is far more correct to say that if you are using Svn+Berkeley you 
are very unlikely to experience any problems. At the same time I could 
imagine that fsfs is a better choice for most average-size repos. I have a gut 
feeling that Berkeley is a little big and complicated, with some known 
drawbacks (of which instability is _not_ one, unless the whole system 
crashes). Should I one day find the time I might migrate our old repos.

IMHO!

Soren 'Frank'

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Backing up - is db/current written last?

Posted by Daniel Berlin <db...@dberlin.org>.
On Mon, 2005-07-04 at 14:02 +0200, Jacob Atzen wrote:
> On Sat, Jul 02, 2005 at 10:32:37PM -0400, Daniel Berlin wrote:
> > Rsync builds the list of files first, *then* transfers them.
> > 
> > Thus, the case to worry about is if it transfers db/current *after*
> > another commit has occurred during the rsync (and the newly created
> > revision file is thus not in rsync's list of files), and then you have a
> > db/current that refers to something that wasn't transferred.
> > 
> > Hence the reason to transfer it first.
> 
> Why not use svnadmin hotcopy[1] and stop worrying instead?

Because it takes time to copy 6.9 gig around to a temporary directory :)




Re: Backing up - is db/current written last?

Posted by Jacob Atzen <ja...@aub.dk>.
On Sat, Jul 02, 2005 at 10:32:37PM -0400, Daniel Berlin wrote:
> Rsync builds the list of files first, *then* transfers them.
> 
> Thus, the case to worry about is if it transfers db/current *after*
> another commit has occurred during the rsync (and the newly created
> revision file is thus not in rsync's list of files), and then you have a
> db/current that refers to something that wasn't transferred.
> 
> Hence the reason to transfer it first.

Why not use svnadmin hotcopy[1] and stop worrying instead? That is, do a
hotcopy on the repository server and then rsync the copy to the backup
server. I also like to keep a dump of every single revision that goes
into the repository, just in case. I do this by having a post-commit
hook that simply dumps the latest delta into a file on each commit like
this:

$SVNADMIN dump "$REPOS" --incremental --revision "$REV" | bzip2 > \
    "$BACKUPDIR/$REPOSNAME/incremental/$REPOSNAME.$REV.dump.bz2"

[1]: <http://svnbook.red-bean.com/en/1.1/re33.html>
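
As a rough sketch, the one-liner above could sit inside a complete post-commit hook along these lines. Subversion passes the repository path and the new revision number as the first two arguments. The function name dump_revision, the default BACKUPDIR and the PATH fallback are illustrative assumptions, not part of the setup described here; the dump is written to a file before compressing so that a failing svnadmin can be detected.

```shell
#!/bin/sh
# Hooks usually run with an almost empty environment, so fall back to a
# basic PATH if none is set (assumption: tools live in /usr/bin or /bin).
PATH="${PATH:-/usr/bin:/bin}"
export PATH

# Set SVNADMIN to a full path if svnadmin is not on the PATH above.
SVNADMIN="${SVNADMIN:-svnadmin}"
BACKUPDIR="${BACKUPDIR:-/var/backups/svn}"   # hypothetical location

# Dump exactly one revision as an incremental, bzip2-compressed file.
dump_revision() {
    repos=$1                     # repository path, no trailing slash
    rev=$2                       # revision that was just committed
    reposname=${repos##*/}       # last path component
    outdir="$BACKUPDIR/$reposname/incremental"
    dumpfile="$outdir/$reposname.$rev.dump"

    mkdir -p "$outdir" || return 1
    # Write the dump to a file first so a failing svnadmin is noticed.
    if ! "$SVNADMIN" dump "$repos" --incremental --revision "$rev" --quiet > "$dumpfile"; then
        rm -f "$dumpfile"
        return 1
    fi
    # Produces $dumpfile.bz2 and removes the uncompressed file.
    bzip2 -f "$dumpfile"
}

# When run as a hook, the two positional arguments are present.
if [ "$#" -ge 2 ]; then
    dump_revision "$1" "$2"
fi
```

Saved as an executable hooks/post-commit inside the repository, this would leave one compressed dump file per revision under $BACKUPDIR.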

-- 
Cheers,
- Jacob Atzen


Re: Backing up - is db/current written last?

Posted by Daniel Berlin <db...@dberlin.org>.
On Sun, 2005-07-03 at 05:01 +0700, Soren 'Frank' Munch wrote:
> Daniel wrote:
> > db/current is written last by svn.
> >
> > Also, fsfs is still transactional, so the commits are stored in the
> > transaction directory until they are moved into place, and *then*
> > db/current is updated.
> 
> Great! In that case db/current _must_ point to a set of valid files and there's 
> no reason to delay rsync.
Rsync builds the list of files first, *then* transfers them.

Thus, the case to worry about is if it transfers db/current *after*
another commit has occurred during the rsync (and the newly created
revision file is thus not in rsync's list of files), and then you have a
db/current that refers to something that wasn't transferred.

Hence the reason to transfer it first.
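
A minimal sketch of the ordering described above, for an FSFS repository and a destination reachable as a local path. The function name backup_fsfs and the temporary name current.snapshot are made up for illustration. Holding the snapshot back until rsync has finished is an extra precaution of this sketch, not something stated in this thread.

```shell
#!/bin/sh
# Sketch: back up an FSFS repository with rsync, reading db/current
# before rsync builds its file list. Both paths come from the caller.

backup_fsfs() {
    src=$1      # live repository
    dst=$2      # backup copy (a local path here, for simplicity)
    mkdir -p "$dst/db" || return 1

    # 1. Snapshot db/current first, so it can only name revisions that
    #    already exist in the live repository at this moment.
    cp -p "$src/db/current" "$dst/db/current.snapshot" || return 1

    # 2. Copy everything else. Revisions committed after step 1 may be
    #    picked up too; the snapshot simply does not reference them.
    rsync -a --exclude=/db/current "$src/" "$dst/" || return 1

    # 3. Only now publish the snapshot as the backup's db/current.
    mv "$dst/db/current.snapshot" "$dst/db/current"
}
```

A call such as backup_fsfs /path/to/repo /path/to/backup (both paths hypothetical) could then be run from cron.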



Re: Backing up - is db/current written last?

Posted by Soren 'Frank' Munch <sm...@u5.com>.
Daniel wrote:
> db/current is written last by svn.
>
> Also, fsfs is still transactional, so the commits are stored in the
> transaction directory until they are moved into place, and *then*
> db/current is updated.

Great! In that case db/current _must_ point to a set of valid files and there's 
no reason to delay rsync.

> Thus, at worst, you will end up with a backup that does not include the
> last revision, *not* a corrupt repository.

I figured out this would be the worst case if db/current was not written last.

- - -

Well, the bottom line must be that grabbing db/current first leads to 
correct backups of the commit that wrote it. I assume this is what hot-backup.py 
is doing, but I am one of those who like to keep up the illusion that I 
know what goes on, so the raw "(s)cp db/current ..." and subsequent rsync is 
very appealing.

Thanks for the information from the front-line!

Soren 'Frank'


Re: Backing up - is db/current written last? (was: Berkeley...)

Posted by Daniel Berlin <db...@dberlin.org>.
On Sun, 2005-07-03 at 02:12 +0700, Soren 'Frank' Munch wrote:
> John Waycott wrote:
> > When I started looking at Subversion for our company, the mandatory
> > requirements were reliability and availability. We have development
> > offices in several countries and require 24-hour access. I waited until
> > FSFS was available and stable and have since begun pilot testing.
> >
> > FSFS with rsync is indeed the way to go. We use rsync to back up our
> > repositories, being careful to back up db/current first. The backup cron
> > job is amazingly fast - maybe a few seconds every hour, so we get highly
> > reliable backups with essentially no impact on the server. We copy to a
> > standby server (to be located in a disaster recovery center), so should
> > the main server fail, or we lose network connectivity, we can be up and
> > running again in under an hour.
> 
> Smart, and done with only two packages, Subversion and rsync.
> 
> One thing that struck me is that backing up the db/current first is no 
> guarantee?
> 
> A commit initiated just before backup-time must result in a corrupted backup, 
> virtually every time:
> 
> db/current is written (by svn doing "commit zero")

db/current is written last by svn.

Also, fsfs is still transactional, so the commits are stored in the
transaction directory until they are moved into place, and *then*
db/current is updated.

Thus, at worst, you will end up with a backup that does not include the
last revision, *not* a corrupt repository.
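
A small sanity check for a finished backup might look like the sketch below. It assumes the flat db/revs/N layout and that the first field of db/current is the youngest revision, as in the FSFS of this period; later FSFS formats shard or pack revision files, so the path test would need adapting. The function name check_fsfs_backup is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that the revision named in a backup's db/current has a
# revision file in the same backup. Assumes the flat db/revs/N layout.

check_fsfs_backup() {
    backup=$1
    youngest=
    # The youngest revision is the first whitespace-separated field.
    # read returns non-zero on a missing final newline, so also accept
    # the case where a value was read anyway.
    read -r youngest _rest < "$backup/db/current" || [ -n "$youngest" ] || return 1

    if [ -f "$backup/db/revs/$youngest" ]; then
        echo "ok: r$youngest present"
    else
        echo "missing: db/revs/$youngest" >&2
        return 1
    fi
}
```

This only checks that the file exists; running svnadmin verify against the backup copy is the more thorough test.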




Backing up - is db/current written last? (was: Berkeley...)

Posted by Soren 'Frank' Munch <sm...@u5.com>.
John Waycott wrote:
> When I started looking at Subversion for our company, the mandatory
> requirements were reliability and availability. We have development
> offices in several countries and require 24-hour access. I waited until
> FSFS was available and stable and have since begun pilot testing.
>
> FSFS with rsync is indeed the way to go. We use rsync to back up our
> repositories, being careful to back up db/current first. The backup cron
> job is amazingly fast - maybe a few seconds every hour, so we get highly
> reliable backups with essentially no impact on the server. We copy to a
> standby server (to be located in a disaster recovery center), so should
> the main server fail, or we lose network connectivity, we can be up and
> running again in under an hour.

Smart, and done with only two packages, Subversion and rsync.

One thing that struck me is that backing up the db/current first is no 
guarantee?

A commit initiated just before backup-time must result in a corrupted backup, 
virtually every time:

db/current is written (by svn doing "commit zero").
The backup starts and backs up db/current first. The same backup procedure then 
proceeds (e.g. by running rsync) with the rest of the files.
But when rsync starts, commit zero is still running and writing to other files 
which should correspond to db/current. We have no control over which 
files rsync reads before commit zero has updated them, and which after.

So it is hard to imagine how commit zero could be correctly backed up.

To me it seems a better procedure would be to back up db/current first, then 
_wait_ until commit zero surely is completed (with plenty of margin added) 
and then run rsync. This way we can be sure that the update of all files 
corresponding to commit zero is completed.

It does not matter if another commit has started while we wait, as the files 
written by commit zero will not be changed, and so we would have a backup with 
integrity (and possibly unreferenced files belonging to commits done before 
our backup was completed).

Have I understood this correctly?

- - -

Another matter, which is an OS thing, is what happens if a write request is 
made to a file while another process reads from it. Does e.g. UNIX 
automatically take care of some locking? If so, it must mean that it is 
"wrong" to write to any part of a file while it is being read?

Or can we risk that db/current is being read by the backup procedure and 
then changed by svn while we read it, resulting in corruption?

- - -

Once "The server is down for maintenance, please come back soon..." was good 
enough... :-)

Soren 'Frank'



Re: Berkeley is good stuff!

Posted by John Waycott <ja...@cox.net>.
Daniel Berlin wrote:

>In FSFS, older revisions are immutable.
>This means that the data in them won't ever be written to.  The only
>thing you *really* need to ensure is that the db/current file gets
>copied first, so that it doesn't reference a revision file that appeared
>since you started your copy.
>Other than that, something like rsync should make this *very* fast.

When I started looking at Subversion for our company, the mandatory 
requirements were reliability and availability. We have development 
offices in several countries and require 24-hour access. I waited until 
FSFS was available and stable and have since begun pilot testing.

FSFS with rsync is indeed the way to go. We use rsync to back up our 
repositories, being careful to back up db/current first. The backup cron 
job is amazingly fast - maybe a few seconds every hour, so we get highly 
reliable backups with essentially no impact on the server. We copy to a 
standby server (to be located in a disaster recovery center), so should 
the main server fail, or we lose network connectivity, we can be up and 
running again in under an hour.


Re: Berkeley is good stuff!

Posted by Daniel Berlin <db...@dberlin.org>.
On Sat, 2005-07-02 at 08:33 +0700, Soren 'Frank' Munch wrote:
> Ryan Schmidt wrote :
> > I was under the impression that a hot-backup is only necessary for
> > BDB repositories -- that FSFS repositories can simply be copied using
> > a normal operating system copy. In other words, the existence of
> > hot-copy for BDB is not a benefit of BDB; the necessity of using hot-copy
> > for BDB is a drawback of BDB. At least that's what the list has led
> > me to believe.
> 
> But repos running on Berkeley can also be backed up with normal operations 
> like cp -Rfp or tar. The tricky question is how long we need to take the 
> svn service down to ensure that data written to the repos during the backup 
> doesn't screw up the integrity of the files.
> 
> I think it is like this:
> 
> 1. fsfs without using a snapshot file system (or bdb by simple copying): We must 
> ensure that no write operations take place while we back up the files.

In FSFS, older revisions are immutable.
This means that the data in them won't ever be written to.  The only
thing you *really* need to ensure is that the db/current file gets
copied first, so that it doesn't reference a revision file that appeared
since you started your copy.
Other than that, something like rsync should make this *very* fast.




Re: Berkeley is good stuff!

Posted by Soren 'Frank' Munch <sm...@u5.com>.
Ryan Schmidt wrote :
> I was under the impression that a hot-backup is only necessary for
> BDB repositories -- that FSFS repositories can simply be copied using
> a normal operating system copy. In other words, the existence of
> hot-copy for BDB is not a benefit of BDB; the necessity of using hot-copy
> for BDB is a drawback of BDB. At least that's what the list has led
> me to believe.

But repos running on Berkeley can also be backed up with normal operations 
like cp -Rfp or tar. The tricky question is how long we need to take the 
svn service down to ensure that data written to the repos during the backup 
doesn't screw up the integrity of the files.

I think it is like this:

1. fsfs without using a snapshot file system (or bdb by simple copying): We must 
ensure that no write operations take place while we back up the files. This 
means closing the svn service, copying the files and, when completed, opening 
svn again. This is the method we currently use. If the repo is very large 
this means some downtime.

2. fsfs using a snapshot-able file system: Closing the server, taking a snapshot, 
opening again. In other words the snapshot functionality allows us to close for a 
very short time, but close we still must; we can't risk snapshotting in the 
middle of an operation.

3. BDB, using snapshot - I'd imagine that this is similar to 2. Or is there 
even some mechanism in svn+hotbackup that can ensure data integrity without 
stopping the svn server at all, even in the middle of an operation? As there 
is already 'atomicity' built into svn this could be the case. If so this 
would be a real asset for a busy svn system.

Somebody must know about this... :-)

Soren 'Frank'


On Saturday 02 July 2005 06:56, Ryan Schmidt wrote:
> On 02.07.2005, at 01:17, Soren 'Frank' Munch wrote:
> > On Saturday 02 July 2005 04:42, Christopher Kreager wrote:
> >> We chose the Berkeley path with the intention to use the hotbackup,
> >> which helps for running two db back to back for recovery with less
> >> down time. At the moment SVNADMIN can not seem to handle loading the
> >> dump file of this size when restoring to BDB, but works just fine
> >> going to FSFS.
> >
> > FreeBSD 5.x/6.x has a filesystem snapshot-option. It can most probably
> > be used to get "hotbackups" with fsfs. I don't know if Linux/OS X has
> > something similar.


Re: Berkeley is good stuff!

Posted by Ryan Schmidt <su...@ryandesign.com>.
On 02.07.2005, at 01:17, Soren 'Frank' Munch wrote:

> On Saturday 02 July 2005 04:42, Christopher Kreager wrote:
>
>> We chose the Berkeley path with the intention to use the hotbackup,
>> which helps for running two db back to back for recovery with less
>> down time. At the moment SVNADMIN can not seem to handle loading the
>> dump file of this size when restoring to BDB, but works just fine
>> going to FSFS.
>
> FreeBSD 5.x/6.x has a filesystem snapshot-option. It can most probably
> be used to get "hotbackups" with fsfs. I don't know if Linux/OS X has
> something similar.

I was under the impression that a hot-backup is only necessary for
BDB repositories -- that FSFS repositories can simply be copied using
a normal operating system copy. In other words, the existence of
hot-copy for BDB is not a benefit of BDB; the necessity of using hot-copy
for BDB is a drawback of BDB. At least that's what the list has led
me to believe.




Re: Berkeley is good stuff!

Posted by Soren 'Frank' Munch <sm...@u5.com>.
Hi Chris,

most interesting.  Hmmm, instead of taking the trouble of upgrading to a new 
Berkeley version I think I'd rather switch to fsfs for our old repos.

BTW, FreeBSD 5.x/6.x has a filesystem snapshot-option. It can most probably be 
used to get "hotbackups" with fsfs. I don't know if Linux/OS X has something 
similar.

Best regards

Soren 'Frank'

On Saturday 02 July 2005 04:42, Christopher Kreager wrote:
>  I would agree with you, and have only been using it from Oct 04 till now,
> upgrading svn as I go. But there is some configuration mixture with BDB
> 4.2 to 4.3 in this move to svn 1.2.0 that only one of our 7 repos did not
> handle well, namely our largest at 4+ GB.
> I'm not saying it is a Berkeley issue or an SVN issue, but it seems to be
> some mix or procedure in converting over in our case.
>
> We chose the Berkeley path with the intention to use the hotbackup, which
> helps for running two db back to back for recovery with less down time. At
> the moment SVNADMIN cannot seem to handle loading the dump file of this
> size when restoring to BDB, but works just fine going to FSFS.
>


Re: Berkeley is good stuff!

Posted by "Jim C. Nasby" <de...@decibel.org>.
On Sat, Jul 02, 2005 at 03:01:19AM +0700, Soren 'Frank' Munch wrote:
> Hi,
> 
> reading the last postings one could get the idea that Subversion with Berkeley 
> is a risky affair.
> 
> We have been using Berkeley for 8 years or so, on dozens of servers, as a part 
> of Postfix, Postgres, MySQL, Cyrus and for about a year with Subversion on 
> FreeBSD. Some of the servers handle tens of thousands of page loads per 
> day and an equal number of mails.

PostgreSQL is in no way associated with BDB.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Berkeley is good stuff!

Posted by Christopher Kreager <la...@gmail.com>.
 I would agree with you, and have only been using it from Oct 04 till now, 
upgrading svn as I go. But there is some configuration mixture with BDB 4.2 to 
4.3 in this move to svn 1.2.0 that only one of our 7 repos did not handle 
well, namely our largest at 4+ GB.
I'm not saying it is a Berkeley issue or an SVN issue, but it seems to be some 
mix or procedure in converting over in our case.

We chose the Berkeley path with the intention to use the hotbackup, which 
helps for running two db back to back for recovery with less down time. At 
the moment SVNADMIN cannot seem to handle loading the dump file of this 
size when restoring to BDB, but works just fine going to FSFS.

 

On 7/1/05, Soren 'Frank' Munch <sm...@u5.com> wrote:
> 
> Hi,
> 
> reading the last postings one could get the idea that Subversion with
> Berkeley is a risky affair.
> 
> We have been using Berkeley for 8 years or so, on dozens of servers, as a
> part of Postfix, Postgres, MySQL, Cyrus and for about a year with
> Subversion on FreeBSD. Some of the servers handle tens of thousands of
> page loads per day and an equal number of mails.
> 
> Status: We have never had a problem with Berkeley, not one in all that
> time! It sounds too good to be true, but it isn't. The only problem I
> recall - which seems to have gone away - is numerous entries in syslog
> over several years about hundreds of Berkeley 'lockers'.
> 
> This despite all the occasional problems one can expect, like a server
> crash now and then. I have repaired quite a few MySQL dbs over time, and
> a few Postgres ones too, but have yet to gain any experience with
> repairing Berkeley (or Svn for that matter).
> 
> A couple of recent posters report problems on OS X; maybe there's a
> problem on that system. So some have had other experiences with
> Berkeley/Svn, but considering the service this combination has already
> delivered, the strong statements we have seen seem really misleading.
> 
> I think it is far more correct to say that if you are using Svn+Berkeley
> you are very unlikely to experience any problems. At the same time I could
> imagine that fsfs is a better choice for most average-size repos. I have a
> gut feeling that Berkeley is a little big and complicated, with some known
> drawbacks (of which instability is _not_ one, unless the whole system
> crashes). Should I one day find the time I might migrate our old repos.
> 
> IMHO!
> 
> Soren 'Frank'


-- 
- lan