
Posted to dev@subversion.apache.org by Auke Jilderda <au...@philips.com> on 2004/10/26 08:14:00 UTC

More robust handling of shaky network connections?

Trying to use Subversion on a fairly large project, we repeatedly run into
some networking issues that cause SVN to bail out with error messages such
as: "Could not read response body: An existing connection was forcibly
closed by the remote host." and "Could not create SSL connection through
proxy server".  We suspect our corporate proxy server infrastructure
might be causing this, but we have yet to track down the precise cause.

In any case, it raises an interesting issue: apparently, Subversion does
not try to re-connect a busted network connection, and I'm wondering why
that is.  Being a networked application that often needs an open
connection for a prolonged amount of time (e.g. we've seen commits that
take over two hours), it might be wise to provide some sort of graceful
degradation (e.g. trying once or twice to restore a broken connection
before bailing out).  Any arguments in favour of or against this?


Auke


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: More robust handling of shaky network connections?

Posted by "C. Michael Pilato" <cm...@collab.net>.

Auke Jilderda <au...@philips.com> writes:

> In any case, it raises an interesting issue: apparently, Subversion
> does not try to re-connect a busted network connection, and I'm
> wondering why that is.  Being a networked application that often
> needs an open connection for a prolonged amount of time (e.g. we've
> seen commits that take over two hours), it might be wise to provide
> some sort of graceful degradation (e.g. trying once or twice to
> restore a broken connection before bailing out).  Any arguments in
> favour of or against this?

This is (obviously) a good idea.  It's the implementation that I fear.
Subversion's modularity keeps the network stuff (that moves the data
which transforms a working copy from state to state) well away from
the working copy management code (which actually understands those
states).  If a long-lived request, like a REPORT used during a
checkout or update operation, were to die in the middle somewhere, the
repository access layer would be completely oblivious to the details
of the half-finished operation.

Because our most widely used data transfer API is the "editor", which
demands depth-first tree ordering with no revisitation, the RA layer
would need to somehow signal the WC layer about the network problem so
that either the WC could roll back to the same state it had before the
initial request, or at least be placed into a mode where it expected
to see much of the same data changes that it already saw (and know
that this is okay).
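
To make the second option concrete, here is a minimal Python sketch of
a working copy "placed into a mode where it expected to see much of
the same data changes": an editor that remembers which paths it has
already completed and quietly skips them when an interrupted drive is
restarted from the top.  WorkingCopy, ReplayTolerantEditor, drive and
the two-callback interface are all hypothetical simplifications, not
Subversion's actual editor API, and the sketch assumes each add_file
is applied atomically (a real editor would have to treat a half-written
file as not yet completed).

```python
from dataclasses import dataclass, field


@dataclass
class WorkingCopy:
    """Toy stand-in for working-copy state (hypothetical)."""
    dirs: set = field(default_factory=set)
    files: dict = field(default_factory=dict)   # path -> contents


class ReplayTolerantEditor:
    """Hypothetical editor that tolerates a replayed prefix of a drive.

    Paths finished during an earlier, interrupted drive are remembered in
    `completed`; when the drive is restarted from the beginning, changes
    to those paths are recognised as repeats and skipped, not rejected.
    """

    def __init__(self, wc: WorkingCopy):
        self.wc = wc
        self.completed = set()
        self.skipped = 0

    def add_directory(self, path: str) -> None:
        if path in self.completed:
            self.skipped += 1
            return
        self.wc.dirs.add(path)
        self.completed.add(path)     # mark only after the change is applied

    def add_file(self, path: str, contents: str) -> None:
        # Assumption: the whole file arrives in this one call.
        if path in self.completed:
            self.skipped += 1
            return
        self.wc.files[path] = contents
        self.completed.add(path)     # mark only after the change is applied


def drive(editor, ops, fail_after=None):
    """Feed ("dir"|"file", path, contents) ops in depth-first order,
    raising ConnectionError before op number `fail_after` to simulate
    the network dropping mid-drive."""
    for i, (kind, path, contents) in enumerate(ops):
        if i == fail_after:
            raise ConnectionError("connection dropped")
        if kind == "dir":
            editor.add_directory(path)
        else:
            editor.add_file(path, contents)


ops = [("dir", "trunk", None),
       ("file", "trunk/a.c", "int a;"),
       ("file", "trunk/b.c", "int b;")]
wc = WorkingCopy()
editor = ReplayTolerantEditor(wc)
try:
    drive(editor, ops, fail_after=2)   # dies after two operations
except ConnectionError:
    drive(editor, ops)                 # full restart; repeats are skipped
print(sorted(wc.files), editor.skipped)
```

One obvious weakness of this sketch: it only works if the restarted
drive repeats exactly the same prefix.  If the repository changed
between attempts, skipping a "seen" path could hide a newer version.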

Allow me to wonder aloud so that my ignorance is easier to see.

Could this be accomplished strictly at the RA layer level?  What if
the RA modules kept track of exactly where they were in processing a
request when the connection dropped, and then, on repetition, ignored
everything up to that point?  I'm thinking about the likes of
'wget -c' (continue where I left off).  So, for example, if
libsvn_ra_dav knew it had read 12,015 bytes off the stream
successfully before something died, it would repeat the request,
ignore the first 12,015 bytes, and then continue processing at the
12,016th byte.  The working copy code (and perhaps even the user)
would be oblivious to a problem having occurred.  Something tells me
it just ain't that simple.
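
The 'wget -c' idea can be sketched in a few lines of Python.
ResumingReader and make_flaky_server are hypothetical names, not part
of libsvn_ra_dav or any real library.  The sketch assumes that
re-issuing the request yields a byte-for-byte identical response,
which is precisely the assumption that may not hold in practice.

```python
from typing import Callable, Iterator, List


class ResumingReader:
    """Hypothetical 'wget -c'-style reader over a repeatable response.

    `open_response` must return a fresh iterator over the same bytes every
    time it is called (for example by re-issuing the same request).  The
    reader counts the bytes it has handed to its consumer; after a dropped
    connection it re-issues the request and discards that many bytes, so
    the consumer never sees anything twice.
    """

    def __init__(self, open_response: Callable[[], Iterator[bytes]],
                 max_retries: int = 2):
        self.open_response = open_response
        self.max_retries = max_retries
        self.delivered = 0

    def chunks(self) -> Iterator[bytes]:
        retries = 0
        while True:
            to_skip = self.delivered
            try:
                for chunk in self.open_response():
                    if to_skip >= len(chunk):
                        to_skip -= len(chunk)      # whole chunk seen before
                        continue
                    chunk, to_skip = chunk[to_skip:], 0   # trim seen prefix
                    self.delivered += len(chunk)
                    yield chunk
                return                             # response fully consumed
            except ConnectionError:
                retries += 1
                if retries > self.max_retries:
                    raise                          # give up, as today


def make_flaky_server(payload: bytes, chunk_size: int,
                      fail_at: List[int]) -> Callable[[], Iterator[bytes]]:
    """Simulated server: request n dies at the first chunk boundary at or
    past fail_at[n] bytes; requests beyond len(fail_at) succeed."""
    failures = list(fail_at)

    def open_response() -> Iterator[bytes]:
        limit = failures.pop(0) if failures else None
        sent = 0
        for i in range(0, len(payload), chunk_size):
            if limit is not None and sent >= limit:
                raise ConnectionError("connection forcibly closed")
            chunk = payload[i:i + chunk_size]
            sent += len(chunk)
            yield chunk

    return open_response


payload = b"0123456789" * 5
reader = ResumingReader(make_flaky_server(payload, 7, fail_at=[10, 30]))
data = b"".join(reader.chunks())
print(data == payload, reader.delivered)   # prints: True 50
```

The consumer of reader.chunks() sees one clean stream even though two
connections died; the cost is re-downloading the skipped prefix on
every retry.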

Could this be accomplished strictly at the client layer level?  We've
done a lot of work to make operations like checkouts and updates
restartable.  There are still bugs in these areas (switches, notably),
and some stuff that basically works but looks scary (merges showing
'G' for everything previously merged), but if we could get our
subcommands to a place where the larger operation could be safely
re-attempted, and where the RA layers return clear indications (in the
form of predictable, dedicated error codes) of when a failure has
occurred for a network integrity reason, then perhaps this kind of
re-attempt processing could happen even well up into the client
libraries.
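
A client-layer retry along these lines might look like the following
Python sketch.  NetworkIntegrityError stands in for the "predictable,
dedicated error codes" the RA layers would need to return; it is
hypothetical, as is retry_restartable, and the sketch assumes the
wrapped operation (a checkout or update, say) is safe to re-attempt
from scratch.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NetworkIntegrityError(Exception):
    """Hypothetical dedicated error: the RA layer raises this only when a
    failure is known to be caused by the network, not by the data."""


def retry_restartable(operation: Callable[[], T], attempts: int = 3,
                      delay: float = 1.0) -> T:
    """Run a restartable client operation, re-attempting it from scratch
    when it fails with NetworkIntegrityError.

    Other exceptions propagate immediately: re-running would not help and
    could mask a real problem.  The final attempt's error is not caught.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts):
        try:
            return operation()
        except NetworkIntegrityError:
            time.sleep(delay * attempt)      # linear back-off between tries
    return operation()                       # last try: let errors escape


# Usage: a simulated "update" whose connection drops twice.
calls = 0


def flaky_update() -> str:
    global calls
    calls += 1
    if calls < 3:
        raise NetworkIntegrityError("Could not read response body")
    return "update complete"


print(retry_restartable(flaky_update, delay=0.0), calls)
```

Only the dedicated network error triggers a retry; anything else
surfaces on the first failure, so a genuine bug or conflict is not
hidden behind repeated attempts.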


Re: More robust handling of shaky network connections?

Posted by kf...@collab.net.

Bruce Elrick <br...@elrick.ca> writes:
> As Auke says, with two-hour application operations, it would be nice
> if a short network interruption (which is sufficient to break a TCP
> connection) did not lose the much longer-lived operation.
>
> Anyway, it would be a large undertaking.  My point is that there are
> valid reasons for there being additional failure handling above the
> network connection layer, or, to put it the other way, TCP gives you a
> very low level of failure recovery, but certainly not good enough for
> every case.

Hmmm, okay, that does sound reasonable, yeah.

Mike Pilato has already speculated about how we might implement this.
I can't really add to what he said.  The trick of picking up an
operation where it left off (as opposed to restarting it) is pretty
hard in Subversion.


Re: More robust handling of shaky network connections?

Posted by Bruce Elrick <br...@elrick.ca>.

kfogel@collab.net wrote:

> Auke Jilderda <au...@philips.com> writes:
>
>> Trying to use Subversion on a fairly large project, we repeatedly run into
>> some networking issues that cause SVN to bail out with error messages such
>> as: "Could not read response body: An existing connection was forcibly
>> closed by the remote host." and "Could not create SSL connection through
>> proxy server".  We suspect our corporate proxy server infrastructure
>> might be causing this, but we have yet to track down the precise cause.
>>
>> In any case, it raises an interesting issue: apparently, Subversion does
>> not try to re-connect a busted network connection, and I'm wondering why
>> that is.  Being a networked application that often needs an open
>> connection for a prolonged amount of time (e.g. we've seen commits that
>> take over two hours), it might be wise to provide some sort of graceful
>> degradation (e.g. trying once or twice to restore a broken connection
>> before bailing out).  Any arguments in favour of or against this?
>
> I feel a little uncomfortable making Subversion take on
> responsibilities that should belong to the transport layer.  Wasn't
> TCP supposed to take care of this stuff for us?  If it's not, can we
> realistically do much better?

TCP is designed to account for packet loss and has generalized
algorithms with corresponding parameters to accomplish this, but it is
quite generic.  As well, it exists purely to maintain a communication
session from a networking point of view.  There are certainly cases
where, from a TCP point of view, it has exhausted its algorithms and
the communication session gets killed (from the POV of the client TCP
stack, the server TCP stack, or both).  But from an application
session POV, if another TCP connection can be established, it makes
sense not to abandon the application session (and all the application
state, on both server and client) so that they can continue on a new
TCP connection.

I've worked with a product (Tivoli Storage Manager, which coincidentally 
does versioned backups) whose backup/archive client will try to 
re-establish sessions with the server if the original TCP session dies 
and pick up where it left off; it actually maps its sessions to TCP 
connections and simply allows a client operation to continue by creating 
a new application session on top of a new TCP connection.  Once you
have experienced that kind of reconnect at the application layer, you
really appreciate it.
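
The idea that a client operation can outlive any single TCP connection
can be modelled in a small Python simulation.  The server keeps the
operation's state under a token, and the client reconnects with that
token plus a count of what it has already received.  OperationServer,
connect and run_client are hypothetical names, and this is not a
description of how Tivoli Storage Manager or Subversion actually work.

```python
import itertools
import secrets


class OperationServer:
    """Toy server whose operations outlive individual connections.

    Each operation's output is stored under a random token, so a client
    that loses its connection can open a new one and resume.
    """

    def __init__(self):
        self.operations = {}

    def open_operation(self, messages):
        token = secrets.token_hex(8)
        self.operations[token] = list(messages)
        return token

    def connect(self, token, received, drop_after=None):
        """Model one connection: send messages after the first `received`.

        `drop_after` simulates the connection dying after that many sends.
        """
        for sent, msg in enumerate(self.operations[token][received:]):
            if sent == drop_after:
                raise ConnectionResetError("transport connection lost")
            yield msg


def run_client(server, token, drops):
    """Drive one operation to completion across several connections.

    `drops` lists, per connection, how many messages get through before
    that connection fails; a final connection is allowed to finish.
    """
    received = []
    for drop_after in itertools.chain(drops, [None]):
        try:
            for msg in server.connect(token, len(received), drop_after):
                received.append(msg)
            break                        # operation finished
        except ConnectionResetError:
            continue                     # new connection, same operation
    return received


server = OperationServer()
token = server.open_operation(["m0", "m1", "m2", "m3", "m4"])
print(run_client(server, token, drops=[2, 1]))
```

Unlike a blind replay, the server here cooperates: it is told where to
resume, so nothing already delivered is sent again.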

As Auke says, with two-hour application operations, it would be nice if
a short network interruption (which is sufficient to break a TCP
connection) did not lose the much longer-lived operation.

Anyway, it would be a large undertaking.  My point is that there are
valid reasons for there being additional failure handling above the
network connection layer, or, to put it the other way, TCP gives you a
very low level of failure recovery, but certainly not good enough for
every case.

Cheers...
Bruce


Re: More robust handling of shaky network connections?

Posted by kf...@collab.net.

Auke Jilderda <au...@philips.com> writes:
> Trying to use Subversion on a fairly large project, we repeatedly run into
> some networking issues that cause SVN to bail out with error messages such
> as: "Could not read response body: An existing connection was forcibly
> closed by the remote host." and "Could not create SSL connection through
> proxy server".  We suspect our corporate proxy server infrastructure
> might be causing this, but we have yet to track down the precise cause.
>
> In any case, it raises an interesting issue: apparently, Subversion does
> not try to re-connect a busted network connection, and I'm wondering why
> that is.  Being a networked application that often needs an open
> connection for a prolonged amount of time (e.g. we've seen commits that
> take over two hours), it might be wise to provide some sort of graceful
> degradation (e.g. trying once or twice to restore a broken connection
> before bailing out).  Any arguments in favour of or against this?

I feel a little uncomfortable making Subversion take on
responsibilities that should belong to the transport layer.  Wasn't
TCP supposed to take care of this stuff for us?  If it's not, can we
realistically do much better?

