Posted to users@subversion.apache.org by BD <cc...@gmail.com> on 2010/05/07 15:26:40 UTC

Active-Active Clustering with Subversion

Hi All,

I'm starting a new project to consolidate all svn repos across our company
into a single instance. Originally we looked at doing an active-passive
cluster, but after looking at the loads on the current individual svn repos,
we are thinking that an active-active cluster would be preferable.

My question is, is it possible/safe to have two apache/svn nodes accessing
the same repo on the same storage system, shared out via nfs v3? Of course
the repo DB will be formatted with type FSFS, but we are concerned about data
corruption with multiple nodes doing commits to the same repo. Does anyone
have any experience using svn in this or a similar configuration?

Thanks
B

Re: Active-Active Clustering with Subversion

Posted by km...@rockwellcollins.com.
BD <cc...@gmail.com> wrote on 05/10/2010 09:52:39 AM:
> That's a very interesting way of looking at this problem. It does
> make sense that multiple commit processes coming from the same
> machine really wouldn't be that different from the question I was
> asking. I guess from here I'll have to do some testing somehow, with
> NFS in the mix, and see if I can purposely corrupt data by running
> many commit requests from two separate apache nodes. But if what
> you're saying is right, it sounds like I shouldn't have much in the way
> of problems.

Not to get off track again, but have you ever thought of just using
more cores/cpus in one physical server instead of multiple physical
servers?  Yes, you lose hardware failover, but you do not risk any
potential bizarre data corruption issues caused by multiple writers
on NFS.

I have seen near saturation on dual 1Gb network links on a dual
cpu/dual core svn server without using NFS for storage.  (Disk
I/O is on a separate SAN fiber network.)  I would hate to go
back to network based Disk I/O...

Kevin R.

Re: Active-Active Clustering with Subversion

Posted by BD <cc...@gmail.com>.
Thanks Hyrum,

That's a very interesting way of looking at this problem. It does make sense
that multiple commit processes coming from the same machine really wouldn't
be that different from the question I was asking. I guess from here I'll
have to do some testing somehow, with NFS in the mix, and see if I can
purposely corrupt data by running many commit requests from two separate
apache nodes. But if what you're saying is right, it sounds like I shouldn't
have much in the way of problems.
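
A minimal sketch of the kind of test described above, assuming one working
copy per worker, each checked out through a different apache node, and the
svn and svnadmin command-line clients on the PATH. The function names, the
per-worker file names and the number of rounds are all hypothetical; only
the "svn add", "svn commit" and "svnadmin verify" commands are real.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def commit_cmd(working_copy: str, message: str) -> list[str]:
    """Command line for one non-interactive commit of a working copy."""
    return ["svn", "commit", "--non-interactive", "-m", message, working_copy]


def verify_cmd(repo_path: str) -> list[str]:
    """Command line asking svnadmin to check the repository's revisions."""
    return ["svnadmin", "verify", repo_path]


def touch_and_commit(working_copy: str, worker: int, rounds: int) -> int:
    """Append to a per-worker file and commit it `rounds` times.

    Returns the number of commits that exited with a non-zero status.
    """
    failures = 0
    target = Path(working_copy) / f"worker-{worker}.txt"
    for i in range(rounds):
        with target.open("a") as fh:
            fh.write(f"round {i}\n")
        if i == 0:
            # Put the file under version control before the first commit.
            subprocess.run(["svn", "add", "--force", str(target)], check=False)
        message = f"worker {worker} round {i}"
        if subprocess.run(commit_cmd(working_copy, message)).returncode != 0:
            failures += 1
    return failures


def stress_test(working_copies: list[str], repo_path: str, rounds: int = 100) -> bool:
    """Commit from every working copy in parallel, then verify the repository."""
    with ThreadPoolExecutor(max_workers=max(1, len(working_copies))) as pool:
        futures = [
            pool.submit(touch_and_commit, wc, n, rounds)
            for n, wc in enumerate(working_copies)
        ]
        failures = sum(f.result() for f in futures)
    verified = subprocess.run(verify_cmd(repo_path)).returncode == 0
    return failures == 0 and verified
```

Each worker touches only its own file, so the commits should not conflict
with one another and any failure points at the server side. The verify step
has to run on a host that can see the repository directory itself. A clean
run does not prove the setup is safe; it only fails to show a problem.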

Thanks again!
B

On Mon, May 10, 2010 at 5:03 AM, Hyrum K. Wright <
hyrum_wright@mail.utexas.edu> wrote:

>
>
> On Fri, May 7, 2010 at 8:08 PM, BD <cc...@gmail.com> wrote:
>
>>
>> So the question remains, taking physical constraints out of the question,
>> is there anyone out there who knows about managing the risks associated
>> with having two or more apache/svn nodes accessing repos that are stored on
>> a shared NFS storage system, with the SVN DBs using FSFS?
>
>
> I can't comment on your specific situation, but Subversion repositories are
> designed to be accessed by multiple concurrent processes, even if these
> processes are located on separate hosts.  When using a single instance
> of Apache, for example, multiple requests can often spawn multiple processes
> which all interact (correctly) with the Subversion repository.  In addition,
> the write-serialization window is relatively small, and writers do not block
> readers, so even during long-running parallel commits, read operations will
> still work as expected.
>
> Throwing NFS in the mix here may complicate things a bit, but probably not
> by much.
>
> -Hyrum
>

Re: Active-Active Clustering with Subversion

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
On Fri, May 7, 2010 at 8:08 PM, BD <cc...@gmail.com> wrote:

>
> So the question remains, taking physical constraints out of the question, is
> there anyone out there who knows about managing the risks associated with
> having two or more apache/svn nodes accessing repos that are stored on a
> shared NFS storage system, with the SVN DBs using FSFS?


I can't comment on your specific situation, but Subversion repositories are
designed to be accessed by multiple concurrent processes, even if these
processes are located on separate hosts.  When using a single instance
of Apache, for example, multiple requests can often spawn multiple processes
which all interact (correctly) with the Subversion repository.  In addition,
the write-serialization window is relatively small, and writers do not block
readers, so even during long-running parallel commits, read operations will
still work as expected.
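
As a rough illustration of why a short write-serialization window leaves
readers unblocked, here is a toy model. It is not FSFS's actual code or
on-disk format: the ToyRepo class, its file names and its layout are
assumptions made up for the sketch. The slow work of staging data happens
outside the lock, the lock is held only while the new revision is published
with atomic renames, and readers never take the lock at all.

```python
import fcntl
import os
import tempfile
from pathlib import Path


class ToyRepo:
    """Toy append-only store: one file per revision plus a 'current' pointer."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.current = self.root / "current"
        self.lock_path = self.root / "write-lock"
        if not self.current.exists():
            self.current.write_text("0\n")
        self.lock_path.touch()

    def youngest(self) -> int:
        """Readers take no lock; they only read the pointer file."""
        return int(self.current.read_text())

    def read(self, rev: int) -> str:
        """Revision files are never modified once published."""
        return (self.root / f"r{rev}").read_text()

    def commit(self, data: str) -> int:
        # Slow part, outside the lock: stage the new revision's data.
        fd, staged = tempfile.mkstemp(dir=self.root, prefix="txn-")
        with os.fdopen(fd, "w") as fh:
            fh.write(data)
        # Short serialized window: choose the revision number and publish.
        with open(self.lock_path, "w") as lock:
            fcntl.lockf(lock, fcntl.LOCK_EX)
            rev = self.youngest() + 1
            os.rename(staged, self.root / f"r{rev}")
            pointer_tmp = self.root / "current.tmp"
            pointer_tmp.write_text(f"{rev}\n")
            os.replace(pointer_tmp, self.current)  # atomic pointer swap
        # Closing the lock file releases the lock.
        return rev
```

A reader that calls youngest() during a commit sees either the old or the
new revision number, never a half-written one, because the pointer is
swapped in a single rename.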

Throwing NFS in the mix here may complicate things a bit, but probably not
by much.

-Hyrum

Re: Active-Active Clustering with Subversion

Posted by BD <cc...@gmail.com>.
Thanks Les,

I know NFS itself can certainly be a bottleneck. However, we will be
devoting at least three shelves of disk on our NetApp 3070, which in our
standard RAID group size will make for about 38 data spindles, and we will
have 256 GB of read cache per head on a two-head storage system.

Initially we don't expect compute capacity to be a problem with our SVN
setup, but we are a growing company and are planning this SVN cluster to be
scalable with the organization.

So the question remains, taking physical constraints out of the question, is
there anyone out there who knows about managing the risks associated with
having two or more apache/svn nodes accessing repos that are stored on a
shared NFS storage system, with the SVN DBs using FSFS?

B

On Fri, May 7, 2010 at 1:05 PM, Les Mikesell <le...@gmail.com> wrote:

> On 5/7/2010 10:26 AM, BD wrote:
>
>> Hi All,
>>
>> I'm starting a new project to consolidate all svn repos across our
>> company into a single instance. Originally we looked at doing an
>> active-passive cluster, but after looking at the loads on the current
>> individual svn repos, we are thinking that an active-active cluster
>> would be preferable.
>>
>> My question is, is it possible/safe to have two apache/svn nodes
>> accessing the same repo on the same storage system, shared out via nfs
>> v3? Of course the repo DB will be formatted with type FSFS, but we are
>> concerned about data corruption with multiple nodes doing commits to the
>> same repo. Does anyone have any experience using svn in this or a
>> similar configuration?
>>
>
> The underlying disk system itself is probably the bottleneck so
> distributing access isn't likely to help performance that much anyway. I'd
> expect bigger gains from beefing up the storage unit (make the raid
> distribute over more drives, don't share those drives with other work, use a
> controller with battery-backed buffering, etc.).
>
> --
>  Les Mikesell
>   lesmikesell@gmail.com
>

Re: Active-Active Clustering with Subversion

Posted by Les Mikesell <le...@gmail.com>.
On 5/7/2010 10:26 AM, BD wrote:
> Hi All,
>
> I'm starting a new project to consolidate all svn repos across our
> company into a single instance. Originally we looked at doing an
> active-passive cluster, but after looking at the loads on the current
> individual svn repos, we are thinking that an active-active cluster
> would be preferable.
>
> My question is, is it possible/safe to have two apache/svn nodes
> accessing the same repo on the same storage system, shared out via nfs
> v3? Of course the repo DB will be formatted with type FSFS, but we are
> concerned about data corruption with multiple nodes doing commits to the
> same repo. Does anyone have any experience using svn in this or a
> similar configuration?

The underlying disk system itself is probably the bottleneck so 
distributing access isn't likely to help performance that much anyway. 
I'd expect bigger gains from beefing up the storage unit (make the raid 
distribute over more drives, don't share those drives with other work, 
use a controller with battery-backed buffering, etc.).

-- 
   Les Mikesell
    lesmikesell@gmail.com

Re: Active-Active Clustering with Subversion

Posted by BD <cc...@gmail.com>.
Thanks for the reply Ryan,

I'll have to look further into how locking is set up on our NetApp FAS 3070.
We were also considering using GFS to handle the locking. Have you heard
anything about users having multiple svn compute nodes connecting to a repo
on GFS and using a distributed lock manager?

I saw that some people were using svnsync and writethrough proxying. I was
concerned about the read-only copies keeping up during nightly builds when
our developers will often go through thousands of commits. I'll have to look
into the documentation for svnsync a little closer.

B

On Fri, May 7, 2010 at 11:50 AM, Ryan Schmidt <
subversion-2010b@ryandesign.com> wrote:

>
> On May 7, 2010, at 10:26, BD wrote:
>
> > I'm starting a new project to consolidate all svn repos across our
> company into a single instance. Originally we looked at doing an
> active-passive cluster, but after looking at the loads on the current
> individual svn repos, we are thinking that an active-active cluster would be
> preferable.
> >
> > My question is, is it possible/safe to have two apache/svn nodes
> accessing the same repo on the same storage system, shared out via nfs v3?
> Of course the repo DB will be formatted with type FSFS, but we are concerned
> about data corruption with multiple nodes doing commits to the same repo.
> Does anyone have any experience using svn in this or a similar
> configuration?
>
> Hosting a repo on NFS can work, but so many people write here for help
> after trying to do so and finding it doesn't work for them. It depends on
> whether your NFS implementation supports proper locking.
>
> I've been told before that to do active-active clustering, you would want
> to have the repository data located on a cluster filesystem (e.g. Apple
> Xsan) accessed by both servers. Otherwise data corruption would indeed be a
> concern.
>
> But, these days, you could have a simpler setup with two (or more)
> standalone servers which mirror each other's contents using svnsync. Write
> requests would have to happen on a single master server only, but the
> mirrors could be configured with a writethrough proxy to make this
> transparent. You should be able to find documentation on setting these up.
>
>

Re: Active-Active Clustering with Subversion

Posted by Ryan Schmidt <su...@ryandesign.com>.
On May 7, 2010, at 10:26, BD wrote:

> I'm starting a new project to consolidate all svn repos across our company into a single instance. Originally we looked at doing an active-passive cluster, but after looking at the loads on the current individual svn repos, we are thinking that an active-active cluster would be preferable.
> 
> My question is, is it possible/safe to have two apache/svn nodes accessing the same repo on the same storage system, shared out via nfs v3? Of course the repo DB will be formatted with type FSFS, but we are concerned about data corruption with multiple nodes doing commits to the same repo. Does anyone have any experience using svn in this or a similar configuration?

Hosting a repo on NFS can work, but so many people write here for help after trying to do so and finding it doesn't work for them. It depends on whether your NFS implementation supports proper locking.
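
One way to probe whether a mount honours locks at all is sketched below. It assumes the locking that matters is POSIX advisory locking as exposed by Python's fcntl.lockf, which may not be the whole story for any given Subversion build or NFS server. The function name and the probe file are hypothetical.

```python
import fcntl
import subprocess
import sys

# Code run in a second process: try to take the same lock without waiting.
CHILD = """
import fcntl, sys
with open(sys.argv[1], "a") as fh:
    try:
        fcntl.lockf(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit(1)  # refused: the lock held elsewhere is being honoured
sys.exit(0)          # granted: the lock held elsewhere was ignored
"""


def lock_is_enforced(path: str) -> bool:
    """Hold an exclusive lock on `path` and check a second process is refused."""
    with open(path, "a") as fh:
        fcntl.lockf(fh, fcntl.LOCK_EX)
        child = subprocess.run([sys.executable, "-c", CHILD, path])
        return child.returncode == 1
```

Run against a scratch file on the NFS mount, this only checks two processes on the same host. For the two-node case the holder and the prober would have to run on different machines, which this sketch does not automate.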

I've been told before that to do active-active clustering, you would want to have the repository data located on a cluster filesystem (e.g. Apple Xsan) accessed by both servers. Otherwise data corruption would indeed be a concern.

But, these days, you could have a simpler setup with two (or more) standalone servers which mirror each other's contents using svnsync. Write requests would have to happen on a single master server only, but the mirrors could be configured with a writethrough proxy to make this transparent. You should be able to find documentation on setting these up.
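
A rough sketch of driving such mirrors from the master, for example from a post-commit hook. The helper names and any URLs passed in are hypothetical; "svnsync initialize" and "svnsync synchronize" are the real subcommands. It assumes each mirror repository already exists and has a pre-revprop-change hook that lets svnsync set revision properties. The write-through side is configured separately in each mirror's Apache setup (mod_dav_svn's SVNMasterURI directive) and is not shown here.

```python
import subprocess


def svnsync_init_cmd(mirror_url: str, master_url: str) -> list[str]:
    """One-time setup: tell the mirror which repository it copies from."""
    return ["svnsync", "initialize", "--non-interactive", mirror_url, master_url]


def svnsync_sync_cmd(mirror_url: str) -> list[str]:
    """Copy any revisions the mirror does not have yet."""
    return ["svnsync", "synchronize", "--non-interactive", mirror_url]


def push_to_mirrors(mirror_urls: list[str]) -> dict[str, bool]:
    """Synchronize every mirror in turn; map each URL to whether it succeeded."""
    results = {}
    for url in mirror_urls:
        results[url] = subprocess.run(svnsync_sync_cmd(url)).returncode == 0
    return results
```

Calling push_to_mirrors after every commit keeps the mirrors close to the master, but a burst of commits will still leave them briefly behind, so anything that must see the latest revision should read from the master.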