Posted to users@subversion.apache.org by Justin Johnson <ju...@honesthacker.com> on 2010/03/12 14:34:47 UTC

Error "An existing connection was forcibly closed by the remote host" with F5 content switch or working copy on shared drive

Hi,

I'm trying to understand why the following error occurs.

svn: REPORT request failed on '/svn/reponame/!svn/vcc/default'
svn: REPORT of '/svn/reponame/!svn/vcc/default': Could not read response
body: An existing connection was forcibly closed by the remote host.
(http://HOSTNAME)
command exit code: 1

I've seen this error in a couple of scenarios:
1) when performing a checkout on a Windows box with the working copy stored
on a drive mapped to a NAS share
2) when performing a checkout on a Windows box and the server is an F5
content switch that just redirects traffic to the Subversion server

The first scenario is of less concern to me, but I mention it anyway since I
think it is the same problem.

For the second scenario, I worked with someone on our networking team to
understand the problem.  What he discovered and how he "resolved" it with
our F5 content switch can be found below.  The server is running Solaris 10,
Subversion 1.6.6, Apache 2.2.11, and repositories are served via HTTP.  The
client is running Windows XP SP3 and Subversion 1.6.7 (error occurs with
TortoiseSVN as well), but the error also occurs on Windows Server 2003.  I
haven't tested any other Windows client OSes and haven't seen the error on
UNIX, but suspect the underlying problem may exist there and the OS handles
it more gracefully.  Here is the explanation by my networking contact.

****
The problem that is presenting is that the client's receive buffer is
filling up and staying full for a long period of time.  When this occurs, he
advertises a tcp window size of 0 in packets he sends to the destination
F5.  This also happens when he goes directly against a server.  The server
seems to tolerate it while the F5 does not.

Last year,  I took traces of the traffic against the server by the client
directly, and through the F5, and saw that the server was seeing different
MTU and options from the F5.   I modified the standard TCP profile on the F5
to have it proxy the TCP options the client offered so the server would get
them.  I also set it to proxy the MTU setting the client offered. This
seemed to have fixed the problem at that time.  But your current testing
failed.

Upon closer inspection, I determined that the F5 was resetting the
connections, not the server as I had previously thought.  This time, I
turned off those two options from last year and increased the Maximum
Segment Retransmissions from the default of 8 to 16.  This controls the
number of times the F5 resends a packet after it gets no response.  This
also controls the zero window probes he sends to see if the client can
receive data yet.  TCP uses a back-off algorithm and increases the time
between retries.  With 8 attempts, the total retry time is just over a
minute.  I suspect retries of 16 will cause it to retry for 5 or 10 minutes.

I would really like to get this in front of SVN developers, because
something is getting hosed on the client that causes him to stop pulling off
the receive buffer.  If the zero window lasted 10 seconds or so, it would
not be a problem.  But for him to in effect go offline for over a minute is,
I believe, a bug.  We can just assume that the reason the error does not
occur when you hit the server directly is that the Sun box handles the  zero
window issue differently, or it might just retry more than 8 times by
default.  Might be a question for the UNIX team as to the retry count.  If
we get some time, we could do some packet captures and find out for certain.

Yesterday and today, I did a few other things that *did not* help.  I
increased the TCP receive buffers on the client side sessions, then on the
server side sessions, then both.  I then turned off all of the tcp options
in the F5 default TCP profile.
****
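
The stall described in the explanation above can be reproduced in miniature on a loopback connection. The following is only a sketch, not anything taken from Subversion, neon, or the F5: the function name `stall_sender` and its parameters are made up for illustration. One side keeps sending while the other side never calls recv(), so the receiver's buffer fills, TCP flow control stops the sender, and the sender only becomes writable again once the receiver drains its buffer.

```python
import select
import socket


def stall_sender(chunk_size=64 * 1024, limit=256 * 1024 * 1024, idle=0.5):
    """Send to a peer that never reads and report how the sender behaves.

    Returns (bytes_queued, stalled, resumed_after_drain).
    All names and default values here are illustrative assumptions.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    listener.close()
    try:
        sender.setblocking(False)
        chunk = b"x" * chunk_size
        queued = 0
        # The receiver does not call recv() in this loop, mimicking a client
        # that has stopped pulling data off its receive buffer.
        while queued < limit:
            _, writable, _ = select.select([], [sender], [], idle)
            if not writable:
                break  # no buffer space for `idle` seconds: the sender is stalled
            try:
                queued += sender.send(chunk)
            except BlockingIOError:
                continue
        stalled = queued < limit

        # Now drain the receiver, as a healthy client would.
        receiver.settimeout(5.0)
        remaining = queued
        while remaining > 0:
            data = receiver.recv(min(remaining, 1 << 20))
            if not data:
                break
            remaining -= len(data)

        # With the receive buffer emptied, the sender should be writable again.
        _, writable, _ = select.select([], [sender], [], 5.0)
        return queued, stalled, bool(writable)
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    queued, stalled, resumed = stall_sender()
    print(f"queued {queued} bytes; stalled={stalled}; resumed={resumed}")
```

How many bytes get queued before the stall depends on the operating system's buffer sizes. The sketch only shows the mechanism; it says nothing about why the Subversion client stops reading.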

So, in summary, my problem is currently "resolved" by increasing the Maximum
Segment Retransmissions from the default of 8 to 16 on the F5.  However, as
I mentioned above I've seen this problem when connecting directly to the
Subversion server and storing the working copy on a network drive.
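
To see why doubling the retry count stretches the window from about a minute to several minutes, here is a rough sketch of the back-off arithmetic. The initial interval (0.25 s) and the cap (60 s) are assumptions chosen only because they happen to match the figures quoted above; they are not documented F5 values, and `total_retry_time` is a hypothetical helper.

```python
def total_retry_time(retries, initial_rto, max_rto):
    """Sum the waits between retries under exponential back-off.

    Each wait is double the previous one, capped at max_rto seconds.
    initial_rto and max_rto are assumed values, not F5 settings.
    """
    total = 0.0
    interval = initial_rto
    for _ in range(retries):
        total += min(interval, max_rto)
        interval *= 2
    return total


if __name__ == "__main__":
    # Assumed: first wait of 0.25 s, waits capped at 60 s.
    print(total_retry_time(8, 0.25, 60.0))   # 63.75 seconds, just over a minute
    print(total_retry_time(16, 0.25, 60.0))  # 543.75 seconds, about 9 minutes
```

Under these assumptions the first 8 retries take 63.75 seconds in total, and retries 9 through 16 each wait the full 60-second cap, which is where most of the extra time comes from.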

Does anyone have any ideas?  Is this something that can be fixed in the
Subversion code itself?

Thanks.
Justin

Re: Error "An existing connection was forcibly closed by the remote host" with F5 content switch or working copy on shared drive

Posted by Les Mikesell <le...@gmail.com>.
This sounds like the checkout is just slow in creating files, triggering server
timeouts that are probably configurable.  When talking directly to the server,
this should be controlled by the Apache 'TimeOut' directive, which defaults to
300 seconds.  It is also possible for intermediate NAT or firewall devices to
time out apparently inactive connections, but normally they would have a much
longer timeout for TCP connections.
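
If the server-side timeout is what is being hit, raising it would look roughly like the fragment below. This is a sketch only: the value 600 is an arbitrary illustration, not a recommendation, and the right place for the directive depends on how the server is set up.

```apache
# httpd.conf, server-wide or inside the relevant <VirtualHost>
# 600 is an illustrative value; the Apache 2.2 default is 300 seconds.
TimeOut 600
```

The directive only takes effect after httpd is restarted or reloaded.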

  -Les



RE: Error "An existing connection was forcibly closed by the remote host" with F5 content switch or working copy on shared drive

Posted by Bert Huijben <be...@qqmail.nl>.
Hi,

Subversion uses the neon (or serf) library to connect to WebDAV repositories. It doesn't change any TCP settings itself, nor does it handle the TCP connections. (As far as I can tell, neither library has any specific MTU handling or anything like that; they just use the self-tuning support implemented in the operating system.)

On the server side everything is handled by the Apache httpd process, and our mod_dav_svn just receives the pre-parsed requests, so there is no TCP handling in the Subversion layer there either.

So I don't think the Subversion project can really fix this for you (as part of Subversion). You might find developers of the other projects on our development list, though. (Neon, Serf, and Apache httpd have development lists of their own, too.)

Bert

 

From: Justin Johnson [mailto:justin@honesthacker.com] 
Sent: Wednesday, 17 March 2010 12:47
To: users@subversion.apache.org
Subject: Re: Error "An existing connection was forcibly closed by the remote host" with F5 content switch or working copy on shared drive

 

No responses?  This seems like something more for the dev list, but I wanted to follow protocol and wait for a response from the users list first.

Thanks.
Justin


Re: Error "An existing connection was forcibly closed by the remote host" with F5 content switch or working copy on shared drive

Posted by Justin Johnson <ju...@honesthacker.com>.
No responses?  This seems like something more for the dev list, but I wanted
to follow protocol and wait for a response from the users list first.

Thanks.
Justin