Posted to users@subversion.apache.org by Simon <tz...@snkmail.com> on 2011/08/15 03:34:15 UTC

Delay syncing to mirror repositories causing issues

We have a main master repository and a number of mirror slave repositories at a bunch of locations that are set up as webdav transparent write-through proxies. These are synced by a process similar to svnsync, and this all seems to work okay.

However, it is inevitable that there is a delay in the commits at the master repository propagating out to the slaves. This is not usually a problem, except when a large commit has been made where the transfer time of the revision data is significant. In this situation a client that uses the slave repository can have its commit blocked because it is unable to update to the latest revision while the slave repository is out of sync. This is unfortunate because it makes the slave repository somewhat useless until the sync has had time to resolve itself. In a recent situation our slave was out of sync for around 3.5 hours.

Is there a workaround for this situation?
Switching the working copies back to the master is not really feasible at present because we run different UUIDs in the slave repositories, and I think our users would find this too cumbersome (or too complex!).

I was thinking that if the client had knowledge of the master repository (perhaps as an additional property in the slave repository's properties) it would be possible for it to defer back to the master for the updates under these circumstances.

I have a couple of other thoughts on this but I was wondering if anyone has some experience in this area?

Regards,

Simon

Re: Delay syncing to mirror repositories causing issues

Posted by Simon Takita <st...@broadcom.com>.

On 15/08/2011, at 23:30 , Nico Kadel-Garcia nkadel-at-gmail.com |subversion users list| wrote:
> This is *precisely* the situation I warned about.... last week? When
> someone else was trying to set up that kind of live mirror pretending
> to be a master-master setup. 3.5 hours is quite impressive,
> though. How did that happen, if you don't mind giving more detail?

Essentially we run one master and use hook scripts to sync out to the remote slave servers. In this instance there was a ~700Mb binary blob checked in to the master. The 3.5 hour delay was predominantly transfer time of the 700Mb transaction set across our loaded inter-office link.
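
For a sense of scale, those two figures imply a very low effective throughput. A rough calculation, assuming "700Mb" means 700 megabytes (10^6 bytes each) and that the whole 3.5 hours was transfer time; neither assumption is confirmed here:

```python
# Rough effective throughput implied by the figures in this message.
# Assumptions (not confirmed in the thread): "700Mb" = 700 * 10**6 bytes,
# and the full 3.5 hours was spent transferring data.

def effective_throughput(size_bytes: float, seconds: float) -> tuple[float, float]:
    """Return (kilobytes per second, megabits per second)."""
    bytes_per_second = size_bytes / seconds
    return bytes_per_second / 1e3, bytes_per_second * 8 / 1e6


kb_per_s, mbit_per_s = effective_throughput(700e6, 3.5 * 3600)
print(f"{kb_per_s:.1f} kB/s, {mbit_per_s:.2f} Mbit/s")
```

Under those assumptions the link delivered a bit under half a megabit per second to this transfer.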

Re: Delay syncing to mirror repositories causing issues

Posted by Stefan Sperling <st...@elego.de>.

On Mon, Aug 15, 2011 at 10:31:32AM -0400, Nico Kadel-Garcia wrote:
> On Mon, Aug 15, 2011 at 9:45 AM, Stefan Sperling <st...@elego.de> wrote:
> > AFAIK they don't modify Subversion's code. Their solution proxies webdav
> > traffic between Subversion clients and servers, like a man-in-the-middle.
> > The licence of Subversion doesn't matter in this case.
> 
> I'd be..... really, really surprised by that. I'd expect the
> pre-commit hooks, at least, to do some kind of verification of the
> local server's state as  the designated master node, to avoid the
> split-brain situations.

Pre-commit hooks aren't part of Subversion's code.

There are many examples where a non-copyleft licence allows companies
to build products that couldn't be proprietary if the original code
was copyleft. But this isn't one of them. What WD is doing would work
even if Subversion was proprietary software and the on-wire protocol was
reverse-engineered.

> But since it's closed source, I don't have
> access to it. Do you have a copy you can check on?

No. I don't have access to wandisco's code. I only know about what's
written on their website.

Re: Delay syncing to mirror repositories causing issues

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Stefan Sperling wrote on Tue, Aug 16, 2011 at 15:58:13 +0200:
> Right. I think the slave should be selective about what it forwards to
> the master. E.g. requests for log messages can certainly be sent to the
> master without causing much harm. The only real problem would be an update
> to a revision that is currently syncing or about to be synced.
> A diff/blame operation that involves this revision might also cause
> undesired traffic.
> 

For that matter you could try and cache those revprops somewhere, even
though the FS' youngest revision is older than them.

I'd probably prefer to see them separated from the FS revprops proper
--- ie, from the revprops that correspond to existing revisions --- in
order to maintain sanity of the FS code maintainers.

> We should try to improve the error message users get to see. If mod_dav_svn
> were to peek at the svn:sync-* properties to determine whether a sync is
> happening, it could annotate error messages for failed read requests with
> a "please try again later" message. (Yes, this assumes svnsync is being
> used -- dump/load isn't really the standard way of doing this, sorry).

Agreed; the proxy could check for svn:sync-lock, svn:rdump-lock,
and maybe svn:I-am-syncing-the-proxy-via-some-other-means, revprops on
r0.

And it could include the URL of the master in the error message...
though, strictly speaking, I'm not sure that we expose that URL to
clients today?  (so we may want to make this disclosure of information
optional)
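
A minimal sketch of that idea, written as a plain Python function rather than as mod_dav_svn code. The function name, the message wording, and the way the r0 revprops are handed in (a dict) are all assumptions for illustration; only the property names come from the discussion above.

```python
# Hypothetical sketch: annotate a failed read with a "try again later" hint
# when a sync appears to be in progress. The r0 revprops are passed in as a
# plain dict; how a real proxy would read them is out of scope here.

# Lock property names taken from the discussion above; treat the tuple as
# configurable for sites that sync by other means.
SYNC_LOCK_PROPS = ("svn:sync-lock", "svn:rdump-lock")


def annotate_read_error(message, r0_revprops, master_url=None):
    """Return `message`, extended with a hint if a sync lock is set on r0.

    master_url is only mentioned when explicitly supplied, so disclosing
    the master's location stays optional.
    """
    syncing = any(name in r0_revprops for name in SYNC_LOCK_PROPS)
    if not syncing:
        return message
    hint = "A sync to this mirror is in progress; please try again later."
    if master_url is not None:
        hint += f" The master repository is at {master_url}."
    return f"{message} ({hint})"
```

Leaving `master_url` unset by default keeps the disclosure question above an explicit administrator choice.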

Re: Delay syncing to mirror repositories causing issues

Posted by Stefan Sperling <st...@elego.de>.

On Tue, Aug 16, 2011 at 09:31:21AM +1000, Simon Takita wrote:
> 
> On 16/08/2011, at 02:34 , Stefan Sperling stsp-at-elego.de |subversion users list| wrote:
> 
> > On Mon, Aug 15, 2011 at 11:06:29AM -0500, Les Mikesell wrote:
> >> I suppose the direct access could help in the case where the
> >> revision taking too long to sync is not the same data the client
> >> needs for its update, but otherwise it could make things worse.
> > 
> > Good point.
> > 
> > I was thinking of operations like 'svn log', 'svn diff' etc.
> > An update will need to pull the same data the sync is getting, of course.
> 
> 
> Direct access would certainly help in the case where the revision in transit was unrelated to a client's working copy. In fact I would probably expect this to be the usual case.
> 
> Even though deferring to the master in this case (where the transaction is related to the working copy) could make things worse in terms of absolute sync time, the current situation is that the slave can't be used for some operations during this period. In our situation, fully coherent access to what the master server sees is a higher priority than update time, but I understand that others may have different priorities here.
> 

Right. I think the slave should be selective about what it forwards to
the master. E.g. requests for log messages can certainly be sent to the
master without causing much harm. The only real problem would be an update
to a revision that is currently syncing or about to be synced.
A diff/blame operation that involves this revision might also cause
undesired traffic.
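
The selective forwarding described above could be sketched roughly as follows. This is a hypothetical routing rule, not existing Subversion behaviour; the operation names, the "retry-later" outcome, and the choice of which operations count as cheap are assumptions.

```python
# Hypothetical routing rule for a slave that can forward some reads to the
# master. This is not how the write-through proxy behaves today.

# Operations assumed cheap enough to forward (metadata only).
CHEAP_OPERATIONS = {"log"}


def route_read(operation, requested_rev, slave_youngest):
    """Return "slave", "master", or "retry-later" for a read request."""
    if requested_rev <= slave_youngest:
        return "slave"          # the slave already has the data
    if operation in CHEAP_OPERATIONS:
        return "master"         # small response, forwarding does little harm
    # update/diff/blame on a revision still in transit would pull the same
    # bulk data over the link a second time, so ask the client to wait.
    return "retry-later"
```

A site that values coherence over link load could instead route every not-yet-synced read to the master by widening `CHEAP_OPERATIONS`.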

We should try to improve the error message users get to see. If mod_dav_svn
were to peek at the svn:sync-* properties to determine whether a sync is
happening, it could annotate error messages for failed read requests with
a "please try again later" message. (Yes, this assumes svnsync is being
used -- dump/load isn't really the standard way of doing this, sorry).

Re: Delay syncing to mirror repositories causing issues

Posted by Simon Takita <st...@broadcom.com>.

On 16/08/2011, at 02:34 , Stefan Sperling stsp-at-elego.de |subversion users list| wrote:

> On Mon, Aug 15, 2011 at 11:06:29AM -0500, Les Mikesell wrote:
>> I suppose the direct access could help in the case where the
>> revision taking too long to sync is not the same data the client
>> needs for its update, but otherwise it could make things worse.
> 
> Good point.
> 
> I was thinking of operations like 'svn log', 'svn diff' etc.
> An update will need to pull the same data the sync is getting, of course.

Direct access would certainly help in the case where the revision in transit was unrelated to a client's working copy. In fact I would probably expect this to be the usual case.

Even though deferring to the master in this case (where the transaction is related to the working copy) could make things worse in terms of absolute sync time, the current situation is that the slave can't be used for some operations during this period. In our situation, fully coherent access to what the master server sees is a higher priority than update time, but I understand that others may have different priorities here.

Re: Delay syncing to mirror repositories causing issues

Posted by Stefan Sperling <st...@elego.de>.

On Mon, Aug 15, 2011 at 11:06:29AM -0500, Les Mikesell wrote:
> I suppose the direct access could help in the case where the
> revision taking too long to sync is not the same data the client
> needs for its update, but otherwise it could make things worse.

Good point.

I was thinking of operations like 'svn log', 'svn diff' etc.
An update will need to pull the same data the sync is getting, of course.

Re: Delay syncing to mirror repositories causing issues

Posted by Les Mikesell <le...@gmail.com>.

On 8/15/2011 10:34 AM, Stefan Sperling wrote:
> On Mon, Aug 15, 2011 at 10:06:39AM -0500, Les Mikesell wrote:
>> I can see how you might do a quorum based locking scheme there to
>> make things reliable in the case of a partitioned network with
>> multiple replicas, but what can it do to improve the time it takes
>> for a certain amount of new/uncached data to make it to the other
>> side of a slow network?  Don't the rules of physics still apply?
>
> I believe with WD clients using a slave server can access data while it is
> being copied to the slave because read-requests for data that isn't yet
> available on the slave are proxied to the master.

How are you going to try to improve throughput by pulling multiple
copies when the network is too slow to get one across for the cache?

> This is something Subversion's write-through proxy could do, too.
> But it doesn't right now. All read-requests are answered by the slave
> and they fail if requested data isn't available yet.

I suppose the direct access could help in the case where the revision 
taking too long to sync is not the same data the client needs for its 
update, but otherwise it could make things worse.  This might make a 
good case for not putting multiple projects in the same repo, though.

-- 
   Les Mikesell
    lesmikesell@gmail.com

Re: Delay syncing to mirror repositories causing issues

Posted by Stefan Sperling <st...@elego.de>.

On Mon, Aug 15, 2011 at 05:34:59PM +0200, Stefan Sperling wrote:
> I believe with WD clients using a slave server can access data while it is
> being copied to the slave because read-requests for data that isn't yet
> available on the slave are proxied to the master.
> 
> This is something Subversion's write-through proxy could do, too.
> But it doesn't right now. All read-requests are answered by the slave
> and they fail if requested data isn't available yet.

BTW, see http://subversion.tigris.org/issues/show_bug.cgi?id=2988

Re: Delay syncing to mirror repositories causing issues

Posted by Stefan Sperling <st...@elego.de>.

On Mon, Aug 15, 2011 at 10:06:39AM -0500, Les Mikesell wrote:
> I can see how you might do a quorum based locking scheme there to
> make things reliable in the case of a partitioned network with
> multiple replicas, but what can it do to improve the time it takes
> for a certain amount of new/uncached data to make it to the other
> side of a slow network?  Don't the rules of physics still apply?

I believe with WD clients using a slave server can access data while it is
being copied to the slave because read-requests for data that isn't yet
available on the slave are proxied to the master.

This is something Subversion's write-through proxy could do, too.
But it doesn't right now. All read-requests are answered by the slave
and they fail if requested data isn't available yet.

Re: Delay syncing to mirror repositories causing issues

Posted by Ian Wild <ia...@wandisco.com>.

On Mon, Aug 15, 2011 at 4:06 PM, Les Mikesell <le...@gmail.com> wrote:

>
> I can see how you might do a quorum based locking scheme there to make
> things reliable in the case of a partitioned network with multiple replicas,
> but what can it do to improve the time it takes for a certain amount of
> new/uncached data to make it to the other side of a slow network?  Don't the
> rules of physics still apply?
>
>
Hi Les,

Yes, the rules of physics still apply, but the key with WANdisco is that the
commit always happens at the local node, so anyone else using that local
node to do the checkout gets the very latest version. There is no concept of
a slave server with WANdisco. The quorum is established at the time of the
commit and the mechanism provides a guaranteed way to ensure that the same
commits are applied to all servers in the same order, but not necessarily at
the same time (a server could be down, and would only catch-up its missed
transactions when it came back on line).

I should also add that Subversion Multisite in no way changes the operation
of the underlying Subversion binaries and we are not implemented with hooks.
In fact the product is a proxy server which sits between the client and
server and reads/replicates write traffic as it's sent to all other
servers.

WANdisco have some huge customers and the product is used to solve these
exact issues by thousands of developers every day. It's a very robust
solution all round... If anyone on this list would like to get access to
a trial copy to prove out the claims then I'm sure I can arrange that, just
drop me a mail and I will be happy to sort.

Best Wishes,

Ian

--
Ian Wild
WANdisco, Inc.

http://www.wandisco.com

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: Delay syncing to mirror repositories causing issues

Posted by Les Mikesell <le...@gmail.com>.

On 8/15/2011 9:31 AM, Nico Kadel-Garcia wrote:
> On Mon, Aug 15, 2011 at 9:45 AM, Stefan Sperling<st...@elego.de>  wrote:
>> On Mon, Aug 15, 2011 at 09:30:58AM -0400, Nico Kadel-Garcia wrote:
>>> Note that this is also one of the cases where the selection of the
>>> Apache license for Subversion, rather than GPL, means that Wandisco
>>> can build a business plan on selling these commercially enhanced
>>> versions of Subversion without ever publishing the code.....
>>
>> AFAIK they don't modify Subversion's code. Their solution proxies webdav
>> traffic between Subversion clients and servers, like a man-in-the-middle.
>> The licence of Subversion doesn't matter in this case.
>
> I'd be..... really, really surprised by that. I'd expect the
> pre-commit hooks, at least, to do some kind of verification of the
> local server's state as  the designated master node, to avoid the
> split-brain situations. But since it's closed source, I don't have
> access to it. Do you have a copy you can check on?

A non-standard pre-commit hook wouldn't technically be part of 
subversion code either...

I can see how you might do a quorum based locking scheme there to make 
things reliable in the case of a partitioned network with multiple 
replicas, but what can it do to improve the time it takes for a certain 
amount of new/uncached data to make it to the other side of a slow 
network?  Don't the rules of physics still apply?

-- 
   Les Mikesell
    lesmikesell@gmail.com

Re: Delay syncing to mirror repositories causing issues

Posted by Nico Kadel-Garcia <nk...@gmail.com>.

On Mon, Aug 15, 2011 at 9:45 AM, Stefan Sperling <st...@elego.de> wrote:
> On Mon, Aug 15, 2011 at 09:30:58AM -0400, Nico Kadel-Garcia wrote:
>> Note that this is also one of the cases where the selection of the
>> Apache license for Subversion, rather than GPL, means that Wandisco
>> can build a business plan on selling these commercially enhanced
>> versions of Subversion without ever publishing the code.....
>
> AFAIK they don't modify Subversion's code. Their solution proxies webdav
> traffic between Subversion clients and servers, like a man-in-the-middle.
> The licence of Subversion doesn't matter in this case.

I'd be..... really, really surprised by that. I'd expect the
pre-commit hooks, at least, to do some kind of verification of the
local server's state as  the designated master node, to avoid the
split-brain situations. But since it's closed source, I don't have
access to it. Do you have a copy you can check on?

Re: Delay syncing to mirror repositories causing issues

Posted by Stefan Sperling <st...@elego.de>.

On Mon, Aug 15, 2011 at 09:30:58AM -0400, Nico Kadel-Garcia wrote:
> Note that this is also one of the cases where the selection of the
> Apache license for Subversion, rather than GPL, means that Wandisco
> can build a business plan on selling these commercially enhanced
> versions of Subversion without ever publishing the code.....

AFAIK they don't modify Subversion's code. Their solution proxies webdav
traffic between Subversion clients and servers, like a man-in-the-middle.
The licence of Subversion doesn't matter in this case.

Re: Delay syncing to mirror repositories causing issues

Posted by Nico Kadel-Garcia <nk...@gmail.com>.

On Sun, Aug 14, 2011 at 9:34 PM, Simon <tz...@snkmail.com> wrote:
> We have a main master repository and a number of mirror slave repositories at a bunch of locations that are set up as webdav transparent write-through proxies. These are synced by a process similar to svnsync, and this all seems to work okay.
>
> However, it is inevitable that there is a delay in the commits at the master repository propagating out to the slaves. This is not usually a problem, except when a large commit has been made where the transfer time of the revision data is significant. In this situation a client that uses the slave repository can have its commit blocked because it is unable to update to the latest revision while the slave repository is out of sync. This is unfortunate because it makes the slave repository somewhat useless until the sync has had time to resolve itself. In a recent situation our slave was out of sync for around 3.5 hours.
>
> Is there a workaround for this situation?
> Switching the working copies back to the master is not really feasible at present because we run different UUIDs in the slave repositories, and I think our users would find this too cumbersome (or too complex!).

This is *precisely* the situation I warned about.... last week? When
someone else was trying to set up that kind of live mirror pretending
to be a master-master setup. 3.5 hours is quite impressive,
though. How did that happen, if you don't mind giving more detail?

> I was thinking that if the client had knowledge of the master repository (perhaps as an additional property in the slave repositories properties) it would be possible for it to defer back to the master for the updates under these circumstances.
>
> I have a couple of other thoughts on this but I was wondering if anyone has some experience in this area?
>
> Regard,
>
> Simon

This looks like what WanDisco's "Multi-Site" tool does, with some
interesting proxying and production grade state management to
designate a preferred master and proxy traffic to it as necessary,
especially commits. There's an explanation of it at
http://www.wandisco.com/subversion/multisite.

Note that this is also one of the cases where the selection of the
Apache license for Subversion, rather than GPL, means that Wandisco
can build a business plan on selling these commercially enhanced
versions of Subversion without ever publishing the code.....

Re: Delay syncing to mirror repositories causing issues

Posted by Simon Takita <st...@broadcom.com>.

On 15/08/2011, at 22:54 , Stefan Sperling stsp-at-elego.de |subversion users list| wrote:

> On Mon, Aug 15, 2011 at 01:34:15AM +0000, Simon wrote:
>> We have a main master repository and a number of mirror slave
>> repositories at a bunch of locations that are set up as webdav
>> transparent write-through proxies. These are synced by a process
>> similar to svnsync, and this all seems to work okay.
> 
> So you're using a different synchronisation process than svnsync?
> How does it work? What prevents you from using svnsync? What's the
> commit-transfer latency of your procedure compared to the latency of svnsync?

Our sync mechanism essentially uses svnadmin dump at the master and svnadmin load at the slaves, with some home-cooked supervisory functionality to gel it all together. This may not be optimal, but my understanding is that our sync mechanism was developed to work around some prior problems in svnsync for large commits. It is possible that the original svnsync problem has long since been fixed, but the infrastructure is now somewhat standardised and I have very limited control over it.
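
As a rough illustration of a dump/load step of this kind (not our actual scripts), the two commands could be built like this. Repository paths and revision numbers are placeholders, the use of --incremental and --deltas is an assumption, and the supervisory logic is not shown.

```python
# Sketch of a dump/load sync step. Paths and revision numbers are
# placeholders; the supervisory logic mentioned above is not shown.

def incremental_dump_cmd(master_repo_path, first_rev, last_rev):
    """argv for dumping only revisions first_rev..last_rev."""
    return [
        "svnadmin", "dump", master_repo_path,
        "-r", f"{first_rev}:{last_rev}",
        "--incremental",   # first dumped revision is a diff, not a full tree
        "--deltas",        # smaller stream, at the cost of more CPU
    ]


def load_cmd(slave_repo_path):
    """argv for loading a dump stream read from standard input."""
    return ["svnadmin", "load", slave_repo_path]
```

The output of the first command would be fed to the second on its standard input, carried across the WAN link by whatever transport the site uses.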

>> However, it is inevitable that there is delay in the commits at the
>> master repository propagating out to the slaves. This is not usually a
>> problem, except when a large commit has been made where the transfer
>> time of the revision data is significant. In this situation a
>> client that uses the slave repository can have its commit blocked
>> because it is unable to update to the latest revision because the
>> slave repository is out of sync. This is unfortunate because it makes
>> the slave repository somewhat useless until the sync has time to
>> resolve itself. In a recent situation our slave was out of sync for
>> around 3.5 hours.
>> 
>> Is there a workaround for this situation?
> 
> You're not telling why one or more commits took 3.5 hours to sync.
> Why was the data set larger than usual? Maybe if you provide more
> information about this someone will come up with a workaround.

This was a 700Mb checkin synced out to slaves over a loaded WAN link, so this was predominantly transfer time of the data blob for the commit.


> One scenario where a large commit can happen is when a lot of new
> revisions are imported from a dumpfile, e.g. when a project is moved
> from one repository to another.
> The Apache Software Foundation (ASF) uses a write-through proxy setup.
> The main server is in the US and the mirror is in Europe. Occasionally,
> new code is imported with history when a project joins the ASF. 
> The new revisions are stored in a dump file which is copied to the master
> and the slave. Next, commit access is temporarily disabled on both servers.
> The dump file is loaded into both repositories. svnsync meta data is updated
> on the slave to mark the current head revision as synced (this number
> is in the svn:sync-last-merged-rev revision property at revision 0). 
> Now when commit access is re-enabled the master and slave are already
> in sync. The resulting downtime is lower than if the newly imported
> revisions were imported at the master and synced via svnsync.

Re: Delay syncing to mirror repositories causing issues

Posted by Stefan Sperling <st...@elego.de>.

On Mon, Aug 15, 2011 at 01:34:15AM +0000, Simon wrote:
> We have a main master repository and a number of mirror slave
> repositories at a bunch of locations that are set up as webdav
> transparent write-through proxies. These are synced by a process
> similar to svnsync, and this all seems to work okay.

So you're using a different synchronisation process than svnsync?
How does it work? What prevents you from using svnsync? What's the
commit-transfer latency of your procedure compared to the latency of svnsync?

> However, it is inevitable that there is delay in the commits at the
> master repository propagating out to the slaves. This is not usually a
> problem, except when a large commit has been made where the transfer
> time of the revision data is significant. In this situation a
> client that uses the slave repository can have its commit blocked
> because it is unable to update to the latest revision because the
> slave repository is out of sync. This is unfortunate because it makes
> the slave repository somewhat useless until the sync has time to
> resolve itself. In a recent situation our slave was out of sync for
> around 3.5 hours.
> 
> Is there a workaround for this situation?

You're not telling why one or more commits took 3.5 hours to sync.
Why was the data set larger than usual? Maybe if you provide more
information about this someone will come up with a workaround.

One scenario where a large commit can happen is when a lot of new
revisions are imported from a dumpfile, e.g. when a project is moved
from one repository to another.
The Apache Software Foundation (ASF) uses a write-through proxy setup.
The main server is in the US and the mirror is in Europe. Occasionally,
new code is imported with history when a project joins the ASF. 
The new revisions are stored in a dump file which is copied to the master
and the slave. Next, commit access is temporarily disabled on both servers.
The dump file is loaded into both repositories. svnsync meta data is updated
on the slave to mark the current head revision as synced (this number
is in the svn:sync-last-merged-rev revision property at revision 0). 
Now when commit access is re-enabled the master and slave are already
in sync. The resulting downtime is lower than if the newly imported
revisions were imported at the master and synced via svnsync.
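
The steps above can be sketched as command lines, built here as Python argument lists. This is an illustration under assumptions, not the ASF's actual tooling: repository paths and the value file are placeholders, and disabling and re-enabling commit access is site-specific and not shown.

```python
# Sketch of the procedure described above. Repository paths and file names
# are placeholders chosen for illustration.

def load_dump_cmd(repo_path):
    """argv for loading the dump file (fed on stdin) into one repository."""
    return ["svnadmin", "load", repo_path]


def youngest_cmd(repo_path):
    """argv that prints the repository's head revision number."""
    return ["svnlook", "youngest", repo_path]


def mark_synced_cmd(slave_repo_path, value_file):
    """argv that sets svn:sync-last-merged-rev on revision 0 of the slave.

    value_file is assumed to contain just the head revision number.
    """
    return [
        "svnadmin", "setrevprop", slave_repo_path,
        "-r", "0", "svn:sync-last-merged-rev", value_file,
    ]
```

The load command would be run once against the master and once against the slave, and the revprop set only on the slave, before commit access is re-enabled.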

Re: Delay syncing to mirror repositories causing issues

Posted by Thorsten Schöning <ts...@am-soft.de>.

Hello Simon,
on Monday, 15 August 2011 at 03:34 you wrote:

> I have a couple of other thoughts on this but I was wondering if
> anyone has some experience in this area?

Sounds like what you really want is to spend some money and get a
working solution:

http://www.wandisco.com/subversion/multisite

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning
AM-SoFT IT-Systeme - Hameln | Potsdam | Leipzig
 
Phone:   Potsdam: 0331-743881-0
Email:   tschoening@am-soft.de
Web:     http://www.am-soft.de

AM-SoFT GmbH IT-Systeme, Konsumhof 1-5, 14482 Potsdam
Potsdam District Court, HRB 21278 P, Managing Director: Andreas Muchow