Posted to users@subversion.apache.org by zhiwei chen <zh...@gmail.com> on 2011/08/02 08:07:48 UTC

how to back svn repositories

Hi, everyone.

We have many svn repositories, more than 100,000, but every repository is
smaller than 1024 MB.

So, which svn backup strategy should I use?

Re: how to back svn repositories

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Tue, Aug 2, 2011 at 2:26 AM, Venkata Badipatla
<ve...@persistent.co.in> wrote:
> svnadmin dump is the correct strategy to follow. Since you have so many
> repos, you can write a simple Perl script that creates the dump files for
> all of them.

OK, *now* I'm going to waggle my finger at you. I find this typical
of some ancient and simplistic Perl "solutions" I've seen lately, where
a "simple matter of programming" turned into a "simple matter of
ignoring the realities".

Writing a "simple Perl script" to chew through 100,000 repositories,
all approaching 1 GB in size, means reading nearly 100 TB of data. Even
with fast local disk, that is a very significant hardware load on the
server. 100 TB at 80 MB/second, for a reasonably fast modern disk,
is about 2 weeks. It would easily take that long to restore, and it is
going to *CHURN* the backups, so identical backups still get
overwritten rather than checked and left alone. And it is *begging*
for "split-brain" problems: changes committed during those 2 weeks
exist in people's working copies but not in the failover or slave
repository, and chaos ensues when a working copy contains committed
changes, at a given revision number, that the new master server does
not have.
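
The "2 weeks" figure is easy to check. This is only a rough sketch of the
arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes)
and a sustained 80 MB/second with no other overhead:

```python
# Back-of-the-envelope estimate; decimal units and zero overhead are assumptions.
TOTAL_BYTES = 100 * 10**12   # ~100 TB: 100,000 repositories near 1 GB each
THROUGHPUT = 80 * 10**6      # 80 MB/second sustained


def transfer_days(total_bytes, bytes_per_second):
    """Return the number of days needed to read total_bytes once."""
    seconds = total_bytes / bytes_per_second
    return seconds / 86400  # seconds per day


days = transfer_days(TOTAL_BYTES, THROUGHPUT)
print(f"{days:.1f} days")  # prints "14.5 days"
```

That is one full pass in one direction; a restore is a second pass of the
same size.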

It will also entirely neglect all configuration files, such as
pre-commit and post-commit scripts.

One *could* do something with svnadmin dump of incremental changes:
some sort of flag recording where the previous successful transfer left
off could be used to start each dump at the first new revision, so only
the updates are dumped and preserved. It gets tricky to maintain. But
that's all built right into svnsync anyway, so one might as well use
that to start with rather than this simplistic approach. If you need to
stay this simple for whatever reason, an svnsync push from the primary
server might serve.
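
For what it's worth, the bookkeeping for incremental dumps might look
roughly like the sketch below. The state file, the output file naming and
the helper names are all hypothetical; the only real commands used are
"svnadmin dump -r LOWER:UPPER --incremental" and "svnlook youngest". It
still does nothing about hook scripts or configuration files.

```python
import subprocess
from pathlib import Path


def next_dump_range(last_dumped, youngest):
    """Return (lower, upper) for the next incremental dump, or None if up to date.

    last_dumped is the highest revision already dumped, or None if no dump
    has been taken yet.
    """
    lower = 0 if last_dumped is None else last_dumped + 1
    if lower > youngest:
        return None
    return (lower, youngest)


def dump_command(repo_path, lower, upper):
    """Build the svnadmin command line for one incremental dump."""
    return ["svnadmin", "dump", str(repo_path),
            "-r", f"{lower}:{upper}", "--incremental"]


def read_state(state_file):
    """Read the last dumped revision from the (hypothetical) state file."""
    path = Path(state_file)
    if not path.exists():
        return None
    return int(path.read_text().strip())


def backup_once(repo_path, state_file, out_dir):
    """Dump any new revisions; record progress only after the dump succeeds."""
    youngest = int(subprocess.check_output(
        ["svnlook", "youngest", str(repo_path)]).decode().strip())
    rev_range = next_dump_range(read_state(state_file), youngest)
    if rev_range is None:
        return None  # nothing new since the last successful dump
    lower, upper = rev_range
    out_path = Path(out_dir) / f"{Path(repo_path).name}.r{lower}-{upper}.dump"
    with open(out_path, "wb") as out:
        subprocess.check_call(dump_command(repo_path, lower, upper), stdout=out)
    Path(state_file).write_text(f"{upper}\n")
    return out_path
```

Multiply that state tracking by 100,000 repositories and you can see why it
gets tricky to maintain.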



RE: how to back svn repositories

Posted by Venkata Badipatla <ve...@persistent.co.in>.
svnadmin dump is the correct strategy to follow. Since you have so many repos, you can write a simple Perl script that creates the dump files for all of them.

--Venkat



Re: how to back svn repositories

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Tue, Aug 2, 2011 at 2:07 AM, zhiwei chen <zh...@gmail.com> wrote:
> Hi, everyone.
> We have many svn repositories, more than 100,000, but every repository is
> smaller than 1024 MB.
> So, which svn backup strategy should I use?

Great bird of space, what are you running? SourceForge? You're
approaching 100 TB of repository space!!!

I have to assume that 99.9% of these are idle, auto-generated
repositories created as part of some regression testing or continuous
build structure. I went through something like this with an in-house
backup system that used a database to manage hardlinks, where most of
the directories had no actual edits or unlocked files in them. I had
to optimize it by basically ignoring everything that was the inactive
equivalent of a tag, which turned an insane 5 day restoration procedure
into a 2 hour restoration procedure.

I assume that the old, stable repositories are what most of us would
use as tags: suitable to lock down and back up with rsync, star, or
a similar tool that will not re-copy every byte every time you run it,
that can be run twice without overwriting already transmitted files,
and that can be gracefully managed to select or deselect targets. This
will mirror not only the revisions, but also the file ownership,
authentication and scripting internal to the repository. It won't
mirror HTTP access or web configs, or SSH-based access configurations,
so treat those separately.
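
Selecting targets could be as simple as the sketch below. It assumes FSFS
repositories, where db/current is rewritten on every commit, so its
modification time is a cheap "has this repository changed" test; edits to
revision properties would not be noticed this way. The function names and
the destination layout are hypothetical.

```python
from pathlib import Path


def changed_repositories(parent_dir, since_epoch):
    """Yield repositories under parent_dir whose db/current is newer than since_epoch."""
    for entry in sorted(Path(parent_dir).iterdir()):
        current = entry / "db" / "current"
        # Anything without db/current is not treated as a repository.
        if current.is_file() and current.stat().st_mtime > since_epoch:
            yield entry


def rsync_command(repo_path, destination):
    """Build an rsync call for one repository: archive mode, drop stale files."""
    name = Path(repo_path).name
    return ["rsync", "-a", "--delete", f"{repo_path}/", f"{destination}/{name}/"]
```

Only the repositories that come back from changed_repositories() need to be
handed to rsync; the idle 99.9% are skipped without reading their contents.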

That said, the databases can be synchronized with svnsync on a remote
server for efficiency, and to help avoid corruption issues from
mirroring files in the midst of database interactions. This will *not*
gain you failover repositories with identical UUIDs suitable for
"svn switch" operations, but it will allow you to update your
backup server's Subversion binaries without interfering with the
primary system. Any repository that has had updates since the last
svnsync, svnadmin dump, or other backup, however, will be
prone to "split-brain" problems, where a new revision submitted on the
failover or recovered server does not match the revision that previously
had the same number on the original server, and chaos will ensue.

Split-brain is something that people don't seem to worry about much
for small repositories: you can notify your clients that they need to
re-checkout their working copies and copy over their working files,
and they'll only lose some recent commits. But it's potentially
really, really nasty for automated procedures.

Frankly, this is the point where you call WANdisco and say "Hi, I've
got a problem: do you have a commercial grade solution?" They have
tools that will do multi-master setups and avoid the "split-brain"
problem, and have probably already addressed the backup needs.

RE: how to back svn repositories

Posted by Bob Archer <Bo...@amsi.com>.
> Hi, everyone.
>
> We have many svn repositories, more than 100,000, but every
> repository is smaller than 1024 MB.
>
> So, which svn backup strategy should I use?

I think there are three basic options...

1. Back up to tape... this is what we do. Yes, we may lose intraday changes, but that is a risk we are taking.

2. Use svnadmin hotcopy to copy the repos to a separate drive system.

3. Use svnsync to mirror all the repositories. This also gives you a pseudo hot-swap situation. Of course, you have to have the hardware available to do it, so this is probably the most expensive option.
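
Options 2 and 3 could be scripted roughly as below. This is just a sketch: the paths, URLs and function names are made up. For hotcopy, the destination is assumed not to exist yet. For svnsync, the mirror repository is assumed to have been created already with svnadmin create and to have a pre-revprop-change hook that lets the sync user set revision properties.

```python
from pathlib import PurePosixPath


def hotcopy_command(repo_path, backup_root):
    """Build 'svnadmin hotcopy SOURCE DEST' for one repository."""
    name = PurePosixPath(repo_path).name
    dest = str(PurePosixPath(backup_root) / name)
    return ["svnadmin", "hotcopy", str(repo_path), dest]


def svnsync_commands(source_url, mirror_url):
    """Build the one-time initialize call and the repeatable synchronize call."""
    return [
        # Run once per mirror: the destination URL comes first, then the source.
        ["svnsync", "initialize", mirror_url, source_url],
        # Run as often as needed to pull over new revisions.
        ["svnsync", "synchronize", mirror_url],
    ]
```

Each returned list can be handed to subprocess.check_call() in a loop over the repositories.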

Yes, a very basic answer, I know.

BOb