You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@subversion.apache.org by Stuart MacDonald <st...@gmail.com> on 2013/12/16 16:56:32 UTC

Slow VM, svn client drops connection with FIN packet

I am seeing exactly this issue:
https://groups.google.com/forum/#!topic/subversion_users/AZ4M62Qwmjc but do
not find a bug report for it in the database. The linked bug is for
something similar but unrelated. Can I file this?

...Stu

stumacdonald@stumacdonald-VirtualBox:~/src/ianywhere$ svn --version
svn, version 1.7.13 (r1516569)
compiled Aug 27 2013, 09:06:07

Copyright (C) 2013 The Apache Software Foundation.
This software consists of contributions made by many people; see the NOTICE
file for more information.
Subversion is open source software, see http://subversion.apache.org/

The following repository access (RA) modules are available:

* ra_neon : Module for accessing a repository via WebDAV protocol using
Neon.
- handles 'http' scheme
- handles 'https' scheme
* ra_svn : Module for accessing a repository using the svn network protocol.
- with Cyrus SASL authentication
- handles 'svn' scheme
* ra_local : Module for accessing a repository on local disk.
- handles 'file' scheme

Re: Slow VM, svn client drops connection with FIN packet

Posted by Stuart MacDonald <st...@gmail.com>.
Thanks, I'll pass it along. It mentions that commit access is required to
exploit this though, so I think IT will probably ignore this if they
haven't already patched it.

...Stu


On Mon, Dec 16, 2013 at 2:20 PM, Ben Reser <be...@reser.org> wrote:

> On 12/16/13 11:08 AM, Stuart MacDonald wrote:
> > svn is 1.7.7  (we are not planning to upgrade for some time)
>
> This doesn't help with your issue but if you need ammo to convince IT to
> upgrade:
> https://subversion.apache.org/security/CVE-2013-4131-advisory.txt
>
> If it's a distribution package it might have been patched without changing
> the
> version number.
>

Re: Slow VM, svn client drops connection with FIN packet

Posted by Ben Reser <be...@reser.org>.
On 12/16/13 11:08 AM, Stuart MacDonald wrote:
> svn is 1.7.7  (we are not planning to upgrade for some time)

This doesn't help with your issue but if you need ammo to convince IT to upgrade:
https://subversion.apache.org/security/CVE-2013-4131-advisory.txt

If it's a distribution package it might have been patched without changing the
version number.

Re: Slow VM, svn client drops connection with FIN packet

Posted by Stuart MacDonald <st...@gmail.com>.
IT says we have:
nginx 1.05 with plans to move to 1.3.9
svn is 1.7.7  (we are not planning to upgrade for some time)

...Stu


On Mon, Dec 16, 2013 at 12:45 PM, Stuart MacDonald <stuartm.coding@gmail.com
> wrote:

> The link I provided is *exactly* what I'm seeing, so I didn't see the need
> to repeat that post. I'm in a VM, the client drops the connection with an
> erroneous [FIN, ACK], just after the TCP window opens up again.
>
> Hm, one detail I can provide now that's not in that post:
> U  directory/file1
> U  directory/file2
> svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
> response body: connection was closed by server (http://<server>)
>
> I don't know what's on the server side, and may not be able to find out,
> but I'll inquire. Hm, actually the network trace shows "Server:
> nginx/1.0.5". Client operation  is 'svn up'.
>
> What else would you like me to do/provide?
>
> ...Stu
>
>
>
> On Mon, Dec 16, 2013 at 12:05 PM, Ben Reser <be...@reser.org> wrote:
>
>> On Mon Dec 16 07:56:32 2013, Stuart MacDonald wrote:
>> >
>> > I am seeing exactly this
>> > issue:
>> https://groups.google.com/forum/#!topic/subversion_users/AZ4M62Qwmjc
>> > but do not find a bug report for it in the database. The linked bug is
>> for
>> > something similar but unrelated. Can I file this?
>> >
>> > ...Stu
>> >
>> > stumacdonald@stumacdonald-VirtualBox:~/src/ianywhere$ svn --version
>> > svn, version 1.7.13 (r1516569)
>> > compiled Aug 27 2013, 09:06:07
>> >
>> > Copyright (C) 2013 The Apache Software Foundation.
>> > This software consists of contributions made by many people; see the
>> NOTICE
>> > file for more information.
>> > Subversion is open source software, see http://subversion.apache.org/
>> >
>> > The following repository access (RA) modules are available:
>> >
>> > * ra_neon : Module for accessing a repository via WebDAV protocol using
>> Neon.
>> > - handles 'http' scheme
>> > - handles 'https' scheme
>> > * ra_svn : Module for accessing a repository using the svn network
>> protocol.
>> > - with Cyrus SASL authentication
>> > - handles 'svn' scheme
>> > * ra_local : Module for accessing a repository on local disk.
>> > - handles 'file' scheme
>>
>> Before reporting an issue you really ought to try it with the most
>> recent version of Subversion (1.7.14 and 1.8.5).  Since 1.8.0 we only
>> use the serf http-library.  If you can trigger the same issue with
>> 1.8.5 then that's likely a sign that we aren't tending the TCP
>> connections as we should (vs the neon http library that your version
>> output shows you're using with 1.7.13).  Another point is that you
>> really should provide more details here.  What versions of httpd and
>> svn are on the server side?  What client operation(s) are you having
>> the issue with.
>>
>> Without a lot more research on your part the bug isn't likely to get
>> much interest.
>>
>
>

Re: Slow VM, svn client drops connection with FIN packet

Posted by st...@gmail.com.
I replied yesterday, but my post seems to have disappeared.

I figured it out, and the key was your reproduction and my realisation I 
wasn't seeing a FIN from my server. It turns out that in every case the last
392598 71.906863000 10.222.3.88 10.14.11.50 HTTP 1434 Continuation or 
non-HTTP traffic
line from the server is actually also a FIN packet, and Wireshark (or the 
TCP dissector) is "hiding" that; if I look at the packet I can see the FIN 
flag, but it's not mentioned in the Info field in the packet list. So the 
FINs from my client are close handshake responses not initiations, and it 
is the server timing out after all.

Thanks for the help. I'm checking with my IT dept to see if they can bump 
up the timeout.

...Stu

On Tuesday, January 14, 2014 11:55:39 AM UTC-5, Philip Martin wrote:
>
> Stuart MacDonald <stuartm...@gmail.com <javascript:>> writes: 
>
> > svn: E175002: REPORT of '/svn/repos/deckard-65x/!svn/me': Could not read 
> > response body: connection was closed by server (http://foundry51.qnx.com) 
>
>
> I can provoke this on my local LAN by setting the Apache timeout on the 
> server to a low value such as 5 seconds.  Then I run a checkout that 
> generates a large REPORT response and I use gdb to interrupt the client 
> at subversion/libsvn_ra_neon/fetch.c:1709 so that the client is in the 
> middle of reading the repsonse.  I wait for the server timeout to 
> expire and then resume the client and I get the error: 
>
> svn: E175002: REPORT of '/obj/repo/!svn/me': Could not read response body: 
> connection was closed by server (http://peri.local:8888) 
>
> One fix is to increase the server timeout to handle your slow client. 
>
> -- 
> Philip Martin | Subversion Committer 
> WANdisco // *Non-Stop Data* 
>

Re: Slow VM, svn client drops connection with FIN packet

Posted by Philip Martin <ph...@wandisco.com>.
Stuart MacDonald <st...@gmail.com> writes:

> svn: E175002: REPORT of '/svn/repos/deckard-65x/!svn/me': Could not read
> response body: connection was closed by server (http://foundry51.qnx.com)

I can provoke this on my local LAN by setting the Apache timeout on the
server to a low value such as 5 seconds.  Then I run a checkout that
generates a large REPORT response and I use gdb to interrupt the client
at subversion/libsvn_ra_neon/fetch.c:1709 so that the client is in the
middle of reading the repsonse.  I wait for the server timeout to
expire and then resume the client and I get the error:

svn: E175002: REPORT of '/obj/repo/!svn/me': Could not read response body: connection was closed by server (http://peri.local:8888)

One fix is to increase the server timeout to handle your slow client.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Slow VM, svn client drops connection with FIN packet

Posted by Stuart MacDonald <st...@gmail.com>.
Here's this morning's failure. For testing, I've set http-timeout=67. I am
trying to see if the zero window is a coincidence or not.

The debug output:
Reading 8192 bytes of response body.
Got 8192 bytes.
XML: Parsing 8192 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Got 6180 bytes.
XML: Parsing 6180 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Aborted request (-3): Could not read response body
sess: Closing connection.
sess: Connection closed.
Request ends, status 200 class 2xx, error line:
Could not read response body: connection was closed by server
Running destroy hooks.
Request ends.
svn: E175002: REPORT of '/svn/repos/deckard-65x/!svn/me': Could not read
response body: connection was closed by server (http://foundry51.qnx.com)
sess: Destroying session.
sess: Destroying session.

The network capture:
391174    498.610439000    10.222.3.88    10.14.11.50    HTTP    1434
Continuation or non-HTTP traffic
391175    498.610495000    10.14.11.50    10.222.3.88    TCP    54    37480
> http [ACK] Seq=1945767 Ack=480574851 Win=1280 Len=0
391176    498.842179000    10.222.3.88    10.14.11.50    HTTP    1334
[TCP Window Full] Continuation or non-HTTP traffic
391177    499.042264000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391178    499.297972000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391179    499.298065000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391180    499.796762000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391181    499.796811000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391182    500.778349000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391183    500.778405000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391184    502.728191000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391185    502.728260000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391186    506.614031000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391187    506.614093000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391188    514.371638000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391189    514.371694000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391190    529.872211000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391191    *REF*    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391192    30.988601000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391193    30.988645000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391194    71.282655000    10.14.11.50    10.222.3.88    TCP    54    [TCP
Window Update] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=1171456
Len=0
391195    71.296598000    10.222.3.88    10.14.11.50    HTTP    1434
Continuation or non-HTTP traffic
which results in:
392598    71.906863000    10.222.3.88    10.14.11.50    HTTP    1434
Continuation or non-HTTP traffic
392599    71.906925000    10.14.11.50    10.222.3.88    TCP    54    37480
> http [ACK] Seq=1945767 Ack=482310892 Win=987904 Len=0
392600    72.043276000    10.14.11.50    10.222.3.88    TCP    54    [TCP
Window Update] 37480 > http [ACK] Seq=1945767 Ack=482310892 Win=1188096
Len=0
392601    72.375325000    10.14.11.50    10.222.3.88    TCP    54    37480
> http [FIN, ACK] Seq=1945767 Ack=482310892 Win=1188096 Len=0
392602    72.388690000    10.222.3.88    10.14.11.50    TCP    60    http >
37480 [ACK] Seq=482310892 Ack=1945768 Win=908288 Len=0

There are two interesting things:
The long gap after the *REF* packet is unusual, it's the first time I've
seen that.
The FIN packet is 72 seconds later, not 67 like I set the timeout. OTOH,
it's close, and, given the long gap no packets occur at FIN - 67 seconds.

I'm going to keep trying random timeout values.

...Stu


On Tue, Jan 14, 2014 at 9:27 AM, Stuart MacDonald
<st...@gmail.com>wrote:

> I don't have access to the server log, but you can see from the wireshark
> capture that the *client* closed the connection (look for the RST packet),
> not the server. The error of "socket closed" is consistent with that.
>
> I suspect that there's an issue handling the "http-timeout" timer; it gets
> triggered during a TCP zero window situation, and then not canceled
> properly, and then it fires, closing the socket.
>
> That would also fit the data of the bug being in 1.6 (reported by someone
> else, but I linked to it in my original post), 1.7 and 1.8; it's a bug in
> svn's handling of the underlying HTTP interaction layer, and not in that
> layer or the server or somewhere else.
>
> I could be wrong. When I look at the evidence so far, that's what I see.
>
> ...Stu
>
>
> On Tue, Jan 14, 2014 at 6:33 AM, Philip Martin <philip.martin@wandisco.com
> > wrote:
>
>> stuartm.coding@gmail.com writes:
>>
>> > I have another failure, with some debug info, and some updates.
>> >
>> > We're switching to a different system for this particular repository,
>> so I
>> > may or may not be able to continue debug efforts. :-(
>> >
>> > Since there's no debug info available for the svn 1.8 client, I haven't
>> > been running that.
>> >
>> > I went a full week without a failure, which I suspect is due to the
>> > enormous amount of debugging that neon-debug-mask=255 produces, so this
>> > week I dialed it back to 127 and got a failure on the first try.
>> >
>> > Here's the latest failure. First up, the debug info:
>> > Got 8192 bytes.
>> > XML: Parsing 8192 bytes.
>> > XML: char-data (235) returns 0
>> > XML: xmlParseChunk returned 0
>> > Reading 8192 bytes of response body.
>> > Got 6600 bytes.
>> > XML: Parsing 6600 bytes.
>> > XML: char-data (235) returns 0
>> > XML: xmlParseChunk returned 0
>> > Reading 8192 bytes of response body.
>> > Aborted request (-3): Could not read response body
>>
>> That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
>> defined in ne_socket.h:
>>
>> #define NE_SOCK_CLOSED (-3)
>> /* Connection was reset (e.g. server crashed) */
>>
>> > sess: Closing connection.
>> > sess: Connection closed.
>> > Request ends, status 200 class 2xx, error line:
>> > Could not read response body: connection was closed by server
>> > Running destroy hooks.
>> > Request ends.
>> > svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
>> > response body: connection was closed by server (http://<server>)
>> > sess: Destroying session.
>> > sess: Destroying session.
>>
>> What does the server error log show?
>>
>> --
>> Philip Martin | Subversion Committer
>> WANdisco // *Non-Stop Data*
>>
>
>

Re: Slow VM, svn client drops connection with FIN packet

Posted by Stuart MacDonald <st...@gmail.com>.
I don't have access to the server log, but you can see from the wireshark
capture that the *client* closed the connection (look for the RST packet),
not the server. The error of "socket closed" is consistent with that.

I suspect that there's an issue handling the "http-timeout" timer; it gets
triggered during a TCP zero window situation, and then not canceled
properly, and then it fires, closing the socket.

That would also fit the data of the bug being in 1.6 (reported by someone
else, but I linked to it in my original post), 1.7 and 1.8; it's a bug in
svn's handling of the underlying HTTP interaction layer, and not in that
layer or the server or somewhere else.

I could be wrong. When I look at the evidence so far, that's what I see.

...Stu


On Tue, Jan 14, 2014 at 6:33 AM, Philip Martin
<ph...@wandisco.com>wrote:

> stuartm.coding@gmail.com writes:
>
> > I have another failure, with some debug info, and some updates.
> >
> > We're switching to a different system for this particular repository, so
> I
> > may or may not be able to continue debug efforts. :-(
> >
> > Since there's no debug info available for the svn 1.8 client, I haven't
> > been running that.
> >
> > I went a full week without a failure, which I suspect is due to the
> > enormous amount of debugging that neon-debug-mask=255 produces, so this
> > week I dialed it back to 127 and got a failure on the first try.
> >
> > Here's the latest failure. First up, the debug info:
> > Got 8192 bytes.
> > XML: Parsing 8192 bytes.
> > XML: char-data (235) returns 0
> > XML: xmlParseChunk returned 0
> > Reading 8192 bytes of response body.
> > Got 6600 bytes.
> > XML: Parsing 6600 bytes.
> > XML: char-data (235) returns 0
> > XML: xmlParseChunk returned 0
> > Reading 8192 bytes of response body.
> > Aborted request (-3): Could not read response body
>
> That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
> defined in ne_socket.h:
>
> #define NE_SOCK_CLOSED (-3)
> /* Connection was reset (e.g. server crashed) */
>
> > sess: Closing connection.
> > sess: Connection closed.
> > Request ends, status 200 class 2xx, error line:
> > Could not read response body: connection was closed by server
> > Running destroy hooks.
> > Request ends.
> > svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
> > response body: connection was closed by server (http://<server>)
> > sess: Destroying session.
> > sess: Destroying session.
>
> What does the server error log show?
>
> --
> Philip Martin | Subversion Committer
> WANdisco // *Non-Stop Data*
>

Re: Slow VM, svn client drops connection with FIN packet

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
> defined in ne_socket.h:
>
> #define NE_SOCK_CLOSED (-3)
> /* Connection was reset (e.g. server crashed) */

/* Socket was closed */
#define NE_SOCK_CLOSED (-3)

The comment associated with the constant is "Socket was closed" not
"Connection was reset".

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Slow VM, svn client drops connection with FIN packet

Posted by Philip Martin <ph...@wandisco.com>.
stuartm.coding@gmail.com writes:

> I have another failure, with some debug info, and some updates.
>
> We're switching to a different system for this particular repository, so I 
> may or may not be able to continue debug efforts. :-(
>
> Since there's no debug info available for the svn 1.8 client, I haven't 
> been running that.
>
> I went a full week without a failure, which I suspect is due to the 
> enormous amount of debugging that neon-debug-mask=255 produces, so this 
> week I dialed it back to 127 and got a failure on the first try.
>
> Here's the latest failure. First up, the debug info:
> Got 8192 bytes.
> XML: Parsing 8192 bytes.
> XML: char-data (235) returns 0
> XML: xmlParseChunk returned 0
> Reading 8192 bytes of response body.
> Got 6600 bytes.
> XML: Parsing 6600 bytes.
> XML: char-data (235) returns 0
> XML: xmlParseChunk returned 0
> Reading 8192 bytes of response body.
> Aborted request (-3): Could not read response body

That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
defined in ne_socket.h:

#define NE_SOCK_CLOSED (-3)
/* Connection was reset (e.g. server crashed) */

> sess: Closing connection.
> sess: Connection closed.
> Request ends, status 200 class 2xx, error line:
> Could not read response body: connection was closed by server
> Running destroy hooks.
> Request ends.
> svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read 
> response body: connection was closed by server (http://<server>)
> sess: Destroying session.
> sess: Destroying session.

What does the server error log show?

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Slow VM, svn client drops connection with FIN packet

Posted by st...@gmail.com.
I have another failure, with some debug info, and some updates.

We're switching to a different system for this particular repository, so I 
may or may not be able to continue debug efforts. :-(

Since there's no debug info available for the svn 1.8 client, I haven't 
been running that.

I went a full week without a failure, which I suspect is due to the 
enormous amount of debugging that neon-debug-mask=255 produces, so this 
week I dialed it back to 127 and got a failure on the first try.

Here's the latest failure. First up, the debug info:
Got 8192 bytes.
XML: Parsing 8192 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Got 6600 bytes.
XML: Parsing 6600 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Aborted request (-3): Could not read response body
sess: Closing connection.
sess: Connection closed.
Request ends, status 200 class 2xx, error line:
Could not read response body: connection was closed by server
Running destroy hooks.
Request ends.
svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read 
response body: connection was closed by server (http://<server>)
sess: Destroying session.
sess: Destroying session.

The capture info, with the default timeout which I am assuming is 600 
seconds:
77081    82.621499000    10.222.3.88    10.14.11.50    HTTP    1434    
Continuation or non-HTTP traffic
77082    82.621541000    10.14.11.50    10.222.3.88    TCP    54    36603 > 
http [ACK] Seq=1850961 Ack=92395908 Win=1280 Len=0
77083    *REF*    10.222.3.88    10.14.11.50    HTTP    1334    [TCP Window 
Full] Continuation or non-HTTP traffic
77084    0.200407000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] 36603 > http [ACK] Seq=1850961 Ack=92397188 Win=0 Len=0
77085    0.458337000    10.222.3.88    10.14.11.50    TCP    60    [TCP 
Keep-Alive] http > 36603 [ACK] Seq=92397187 Ack=1850961 Win=893952 Len=0
77086    0.458390000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] 36603 > http [ACK] Seq=1850961 Ack=92397188 Win=0 Len=0
77087    0.960584000    10.222.3.88    10.14.11.50    TCP    60    [TCP 
Keep-Alive] http > 36603 [ACK] Seq=92397187 Ack=1850961 Win=893952 Len=0
77088    0.960653000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] 36603 > http [ACK] Seq=1850961 Ack=92397188 Win=0 Len=0
77089    1.950177000    10.222.3.88    10.14.11.50    TCP    60    [TCP 
Keep-Alive] http > 36603 [ACK] Seq=92397187 Ack=1850961 Win=893952 Len=0
77090    1.950213000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] 36603 > http [ACK] Seq=1850961 Ack=92397188 Win=0 Len=0
77092    2.991154000    10.222.3.88    10.14.11.50    HTTP    1434    
Continuation or non-HTTP traffic
which gives
395638    524.989307000    10.222.3.88    10.14.11.50    HTTP    1434    
Continuation or non-HTTP traffic
395639    524.989401000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=763605828 
Win=746496 Len=0
395640    524.989606000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=763615488 
Win=736768 Len=0
395641    524.989883000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=763629288 
Win=722944 Len=0
395642    524.990140000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=763643088 
Win=709120 Len=0
395643    525.869551000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=766011168 
Win=1366016 Len=0
395644    526.501877000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=767377268 Win=0 Len=0
395645    527.769324000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768063228 
Win=1101056 Len=0
395646    527.783100000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768128088 
Win=1116416 Len=0
395647    527.786772000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768215028 
Win=1154816 Len=0
395648    527.815465000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768347508 
Win=1218304 Len=0
395649    527.818509000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768373728 
Win=1204736 Len=0
395650    527.872518000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768698028 
Win=1233664 Len=0
395651    528.026681000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=769930368 
Win=1239040 Len=0
395652    528.086233000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=770440968 
Win=1298432 Len=0
395653    528.683101000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=771739448 Win=0 Len=0
395654    529.544542000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=772977408 
Win=757504 Len=0
395655    529.591855000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=773252028 
Win=798976 Len=0
395656    529.660855000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=773881308 
Win=423424 Len=0
395657    529.688608000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=774109008 
Win=669184 Len=0
395658    529.860154000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=775638048 
Win=611328 Len=0
395659    530.033617000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=777532788 
Win=1091328 Len=0
395660    530.224557000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=779399928 
Win=1686272 Len=0
395661    530.375431000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=781086288 
Win=716032 Len=0
395662    531.350496000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=785885928 
Win=739584 Len=0
395663    531.625875000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=786624228 Win=1280 
Len=0
395664    532.084955000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=786625508 Win=0 Len=0
395665    532.880166000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=786974748 
Win=721664 Len=0
395666    532.910934000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787118268 
Win=832000 Len=0
395667    532.920303000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787183128 
Win=785152 Len=0
395668    532.955237000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787427388 
Win=1219072 Len=0
395669    532.955463000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
Window Update] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=787427388 Win=1248768 Len=0
395670    532.960093000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787490868 
Win=1323264 Len=0
395671    532.970228000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787557108 
Win=1404672 Len=0
395672    532.984501000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787678548 
Win=1502720 Len=0
395673    532.999351000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787786188 
Win=1514496 Len=0
395674    533.019030000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=788026308 
Win=1484800 Len=0
395675    533.032385000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=788144988 
Win=1488896 Len=0
395676    533.047272000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=788299548 
Win=1450240 Len=0
395677    533.188960000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=790268808 
Win=953344 Len=0
395678    533.984603000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=794394908 Win=0 Len=0
395679    549.627263000    10.222.3.88    10.14.11.50    TCP    60    [TCP 
Previous segment not captured] http > 36603 [ACK] Seq=794394907 Ack=1850961 
Win=893952 Len=0
395680    549.627315000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=794394908 Win=0 Len=0
395681    565.449108000    10.222.3.88    10.14.11.50    TCP    60    [TCP 
Dup ACK 395679#1] http > 36603 [ACK] Seq=794394907 Ack=1850961 Win=893952 
Len=0
395682    565.449169000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=794394908 Win=0 Len=0
395683    599.797841000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=794858688 
Win=1141248 Len=0
395684    599.810002000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=794875248 
Win=1172992 Len=0
395685    599.825538000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=794978748 
Win=1188096 Len=0
395686    599.839904000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=795000828 
Win=1233664 Len=0
395687    599.843870000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=795054648 
Win=1185280 Len=0
395688    599.857927000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=795096048 
Win=1149440 Len=0
395689    600.137326000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [FIN, ACK] Seq=1850961 Ack=795809509 
Win=1526272 Len=0

Any thoughts, now that there's a bit of debug info?

...Stu

On Monday, December 23, 2013 5:40:30 PM UTC-5, Stuart MacDonald wrote:
>
> On Mon, Dec 23, 2013 at 4:47 PM, Stuart MacDonald <
> stuartm.coding@gmail.com> wrote:
>
>> I took some time last Fri to see what was 180 seconds prior to the FIN 
>> packet in the trace (this is the 1.7.14 failure):
>>  289223    259.418521000    10.222.3.88    10.14.10.32    HTTP    
>> 1434    Continuation or non-HTTP traffic
>> 289224    259.418583000    10.14.10.32    10.222.3.88    TCP    54    
>> 60442 > http [ACK] Seq=2151 Ack=358465616 Win=1280 Len=0
>> 289225    259.649415000    10.222.3.88    10.14.10.32    HTTP    1334    
>> [TCP Window Full] Continuation or non-HTTP traffic
>> 289226    259.855605000    10.14.10.32    10.222.3.88    TCP    54    
>> [TCP ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
>> 289227    259.868018000    10.222.3.88    10.14.10.32    HTTP    1334    
>> [TCP Window Full] [TCP Retransmission] Continuation or non-HTTP traffic
>> 289228    *REF*    10.14.10.32    10.222.3.88    TCP    66    [TCP 
>> ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0 
>> SLE=358465616 SRE=358466896
>> 289229    0.383858000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
>> Window Update] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=2062848 Len=0
>> 289230    0.398005000    10.222.3.88    10.14.10.32    HTTP    1434    
>> Continuation or non-HTTP traffic
>> which yields
>> 444658    178.073895000    10.222.3.88    10.14.10.32    HTTP    1434    
>> Continuation or non-HTTP traffic
>> 444659    178.073944000    10.14.10.32    10.222.3.88    TCP    54    
>> 60442 > http [ACK] Seq=2151 Ack=550721697 Win=542208 Len=0
>> 444660    179.906931000    10.14.10.32    10.222.3.88    TCP    54    
>> [TCP Window Update] 60442 > http [ACK] Seq=2151 Ack=550721697 Win=1966336 
>> Len=0
>> 444661    180.152119000    10.14.10.32    10.222.3.88    TCP    54    
>> 60442 > http [FIN, ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
>> 444662    180.165529000    10.222.3.88    10.14.10.32    TCP    60    
>> http > 60442 [ACK] Seq=550721697 Ack=2152 Win=10240 Len=0
>>  
>> If the time ref is set at (wireshark-)packet 289227 then the FIN packet 
>> would be at 179.something elapsed. There are a lot of TCP zero windows in 
>> the trace though, that one isn't special AFAICT. There's plenty before and 
>> after it.
>>
>> If someone can let me know what the default for http-timeout is for 
>> 1.8.5/ra_serf, I'll do the same check in this morning's failure and see 
>> what's there.
>>
>
> Some googling 
> <http://mail-archives.apache.org/mod_mbox/subversion-commits/201211.mbox/%3C20121120183435.0E86A23888E3@eris.apache.org%3E>reveals 
> that the default timeout is likely to be 600 seconds. Let's see what we 
> have there (this is the 1.8.5 failure):
> 708    25.462185000    10.222.3.88    10.14.10.85    TCP    1434    [TCP 
> segment of a reassembled PDU]
> 709    25.462580000    10.14.10.85    10.222.3.88    TCP    54    9415 > 
> http [ACK] Seq=2713 Ack=753855 Win=1280 Len=0
> 710    25.707269000    10.222.3.88    10.14.10.85    TCP    1334    [TCP 
> Window Full] [TCP segment of a reassembled PDU]
> 711    25.907229000    10.14.10.85    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 9415 > http [ACK] Seq=2713 Ack=755135 Win=0 Len=0
> 712    *REF*    10.14.10.85    10.222.3.88    TCP    54    [TCP Window 
> Update] 9415 > http [ACK] Seq=2713 Ack=755135 Win=3328 Len=0
> 713    0.013860000    10.222.3.88    10.14.10.85    TCP    1434    [TCP 
> segment of a reassembled PDU]
> which yields
> 496958    599.344977000    10.222.3.88    10.14.10.85    HTTP    1434    
> Continuation or non-HTTP traffic
> 496959    599.345056000    10.14.10.85    10.222.3.88    TCP    54    9415 
> > http [ACK] Seq=2713 Ack=612899776 Win=1760768 Len=0
> 496960    599.345098000    10.14.10.85    10.222.3.88    TCP    54    [TCP 
> Window Update] 9415 > http [ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
> 496961    600.011551000    10.14.10.85    10.222.3.88    TCP    54    9415 
> > http [FIN, ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
> 496962    600.025110000    10.222.3.88    10.14.10.85    TCP    60    http 
> > 9415 [ACK] Seq=612899776 Ack=2714 Win=12288 Len=0
>
> So that's interesting that it's also at a TCP zero window, but like the 
> 1.7.14 failure, there's nothing special about that one; there are several 
> before it and many after it.
>
> From 2013-12-17, 1.7.13 (WANdisco again), http-timeout=180, ra_neon:
> 380006    238.310421000    10.222.3.88    10.14.10.32    HTTP    1434    
> Continuation or non-HTTP traffic
> 380007    238.310501000    10.14.10.32    10.222.3.88    TCP    54    
> 12926 > http [ACK] Seq=2151 Ack=464948988 Win=1280 Len=0
> 380008    *REF*    10.222.3.88    10.14.10.32    HTTP    1334    [TCP 
> Window Full] Continuation or non-HTTP traffic
> 380009    0.200088000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
> 380010    0.456070000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
> 380011    0.456130000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
> 380012    0.953776000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
> 380013    0.953813000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
> 380014    1.935785000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
> 380015    1.935818000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
> 380016    2.018723000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> Window Update] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=2472960 Len=0
> 380017    2.032298000    10.222.3.88    10.14.10.32    HTTP    1434    
> Continuation or non-HTTP traffic
> which yields
> 444920    179.032704000    10.222.3.88    10.14.10.32    HTTP    1434    
> Continuation or non-HTTP traffic
> 444921    179.032758000    10.14.10.32    10.222.3.88    TCP    54    
> 12926 > http [ACK] Seq=2151 Ack=545077309 Win=198656 Len=0
> 444922    179.889276000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> Window Update] 12926 > http [ACK] Seq=2151 Ack=545077309 Win=905216 Len=0
> 444923    180.088824000    10.14.10.32    10.222.3.88    TCP    54    
> 12926 > http [FIN, ACK] Seq=2151 Ack=545077309 Win=905216 Len=0
> 444924    180.103081000    10.222.3.88    10.14.10.32    TCP    60    http 
> > 12926 [ACK] Seq=545077309 Ack=2152 Win=10240 Len=0
>
> From 2013-12-16, same as 2013-12-17:
> 178168    140.424974000    10.222.3.88    10.14.10.32    HTTP    1434    
> Continuation or non-HTTP traffic
> 178169    140.425010000    10.14.10.32    10.222.3.88    TCP    54    
> 33813 > http [ACK] Seq=2151 Ack=220218202 Win=1280 Len=0
> 178170    140.658514000    10.222.3.88    10.14.10.32    HTTP    1334    
> [TCP Window Full] Continuation or non-HTTP traffic
> 178171    140.858753000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178172    141.119647000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178173    141.119690000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178174    141.627353000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178175    141.627385000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178176    142.628795000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178177    142.628839000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178178    144.619506000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178179    144.619574000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178180    148.585393000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178181    148.585447000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178182    156.502765000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178183    *REF*    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178184    15.821813000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178185    15.821845000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178186    43.448451000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> Window Update] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=2856448 Len=0
> 178187    43.461842000    10.222.3.88    10.14.10.32    HTTP    1434    
> Continuation or non-HTTP traffic
> which yields
> 291476    173.722084000    10.222.3.88    10.14.10.32    HTTP    1434    
> Continuation or non-HTTP traffic
> 291477    173.722125000    10.14.10.32    10.222.3.88    TCP    54    
> 33813 > http [ACK] Seq=2151 Ack=361058243 Win=52224 Len=0
> 291478    178.322196000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> Window Update] 33813 > http [ACK] Seq=2151 Ack=361058243 Win=1489664 Len=0
> 291479    178.364417000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> Window Update] 33813 > http [ACK] Seq=2151 Ack=361058243 Win=3042816 Len=0
> 291480    180.974205000    10.14.10.32    10.222.3.88    TCP    54    
> 33813 > http [FIN, ACK] Seq=2151 Ack=361058243 Win=3042816 Len=0
> 291481    180.988706000    10.222.3.88    10.14.10.32    TCP    60    http 
> > 33813 [ACK] Seq=361058243 Ack=2152 Win=10240 Len=0
>
> Nice to see the exponential backoff at work during the TCP zero window.
>
> Maybe this is a red herring. I do find it unusual that all the http 
> timeout start points seem to be during TCP zero windows.
>
> ...Stu
>

Re: Slow VM, svn client drops connection with FIN packet

Posted by Stuart MacDonald <st...@gmail.com>.
On Mon, Dec 23, 2013 at 4:47 PM, Stuart MacDonald
<st...@gmail.com>wrote:

> I took some time last Fri to see what was 180 seconds prior to the FIN
> packet in the trace (this is the 1.7.14 failure):
> 289223    259.418521000    10.222.3.88    10.14.10.32    HTTP    1434
> Continuation or non-HTTP traffic
> 289224    259.418583000    10.14.10.32    10.222.3.88    TCP    54
> 60442 > http [ACK] Seq=2151 Ack=358465616 Win=1280 Len=0
> 289225    259.649415000    10.222.3.88    10.14.10.32    HTTP    1334
> [TCP Window Full] Continuation or non-HTTP traffic
> 289226    259.855605000    10.14.10.32    10.222.3.88    TCP    54    [TCP
> ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
> 289227    259.868018000    10.222.3.88    10.14.10.32    HTTP    1334
> [TCP Window Full] [TCP Retransmission] Continuation or non-HTTP traffic
> 289228    *REF*    10.14.10.32    10.222.3.88    TCP    66    [TCP
> ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
> SLE=358465616 SRE=358466896
> 289229    0.383858000    10.14.10.32    10.222.3.88    TCP    54    [TCP
> Window Update] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=2062848 Len=0
> 289230    0.398005000    10.222.3.88    10.14.10.32    HTTP    1434
> Continuation or non-HTTP traffic
> which yields
> 444658    178.073895000    10.222.3.88    10.14.10.32    HTTP    1434
> Continuation or non-HTTP traffic
> 444659    178.073944000    10.14.10.32    10.222.3.88    TCP    54
> 60442 > http [ACK] Seq=2151 Ack=550721697 Win=542208 Len=0
> 444660    179.906931000    10.14.10.32    10.222.3.88    TCP    54    [TCP
> Window Update] 60442 > http [ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
> 444661    180.152119000    10.14.10.32    10.222.3.88    TCP    54
> 60442 > http [FIN, ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
> 444662    180.165529000    10.222.3.88    10.14.10.32    TCP    60    http
> > 60442 [ACK] Seq=550721697 Ack=2152 Win=10240 Len=0
>
> If the time ref is set at (wireshark-)packet 289227 then the FIN packet
> would be at 179.something elapsed. There are a lot of TCP zero windows in
> the trace though, that one isn't special AFAICT. There's plenty before and
> after it.
>
> If someone can let me know what the default for http-timeout is for
> 1.8.5/ra_serf, I'll do the same check in this morning's failure and see
> what's there.
>

Some googling
<http://mail-archives.apache.org/mod_mbox/subversion-commits/201211.mbox/%3C20121120183435.0E86A23888E3@eris.apache.org%3E>reveals
that the default timeout is likely to be 600 seconds. Let's see what we
have there (this is the 1.8.5 failure):
708    25.462185000    10.222.3.88    10.14.10.85    TCP    1434    [TCP
segment of a reassembled PDU]
709    25.462580000    10.14.10.85    10.222.3.88    TCP    54    9415 >
http [ACK] Seq=2713 Ack=753855 Win=1280 Len=0
710    25.707269000    10.222.3.88    10.14.10.85    TCP    1334    [TCP
Window Full] [TCP segment of a reassembled PDU]
711    25.907229000    10.14.10.85    10.222.3.88    TCP    54    [TCP
ZeroWindow] 9415 > http [ACK] Seq=2713 Ack=755135 Win=0 Len=0
712    *REF*    10.14.10.85    10.222.3.88    TCP    54    [TCP Window
Update] 9415 > http [ACK] Seq=2713 Ack=755135 Win=3328 Len=0
713    0.013860000    10.222.3.88    10.14.10.85    TCP    1434    [TCP
segment of a reassembled PDU]
which yields
496958    599.344977000    10.222.3.88    10.14.10.85    HTTP    1434
Continuation or non-HTTP traffic
496959    599.345056000    10.14.10.85    10.222.3.88    TCP    54    9415
> http [ACK] Seq=2713 Ack=612899776 Win=1760768 Len=0
496960    599.345098000    10.14.10.85    10.222.3.88    TCP    54    [TCP
Window Update] 9415 > http [ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
496961    600.011551000    10.14.10.85    10.222.3.88    TCP    54    9415
> http [FIN, ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
496962    600.025110000    10.222.3.88    10.14.10.85    TCP    60    http
> 9415 [ACK] Seq=612899776 Ack=2714 Win=12288 Len=0

So that's interesting that it's also at a TCP zero window, but like the
1.7.14 failure, there's nothing special about that one; there are several
before it and many after it.

>From 2013-12-17, 1.7.13 (WANdisco again), http-timeout=180, ra_neon:
380006    238.310421000    10.222.3.88    10.14.10.32    HTTP    1434
Continuation or non-HTTP traffic
380007    238.310501000    10.14.10.32    10.222.3.88    TCP    54    12926
> http [ACK] Seq=2151 Ack=464948988 Win=1280 Len=0
380008    *REF*    10.222.3.88    10.14.10.32    HTTP    1334    [TCP
Window Full] Continuation or non-HTTP traffic
380009    0.200088000    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
380010    0.456070000    10.222.3.88    10.14.10.32    TCP    60    [TCP
Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
380011    0.456130000    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
380012    0.953776000    10.222.3.88    10.14.10.32    TCP    60    [TCP
Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
380013    0.953813000    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
380014    1.935785000    10.222.3.88    10.14.10.32    TCP    60    [TCP
Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
380015    1.935818000    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
380016    2.018723000    10.14.10.32    10.222.3.88    TCP    54    [TCP
Window Update] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=2472960 Len=0
380017    2.032298000    10.222.3.88    10.14.10.32    HTTP    1434
Continuation or non-HTTP traffic
which yields
444920    179.032704000    10.222.3.88    10.14.10.32    HTTP    1434
Continuation or non-HTTP traffic
444921    179.032758000    10.14.10.32    10.222.3.88    TCP    54    12926
> http [ACK] Seq=2151 Ack=545077309 Win=198656 Len=0
444922    179.889276000    10.14.10.32    10.222.3.88    TCP    54    [TCP
Window Update] 12926 > http [ACK] Seq=2151 Ack=545077309 Win=905216 Len=0
444923    180.088824000    10.14.10.32    10.222.3.88    TCP    54    12926
> http [FIN, ACK] Seq=2151 Ack=545077309 Win=905216 Len=0
444924    180.103081000    10.222.3.88    10.14.10.32    TCP    60    http
> 12926 [ACK] Seq=545077309 Ack=2152 Win=10240 Len=0

>From 2013-12-16, same as 2013-12-17:
178168    140.424974000    10.222.3.88    10.14.10.32    HTTP    1434
Continuation or non-HTTP traffic
178169    140.425010000    10.14.10.32    10.222.3.88    TCP    54    33813
> http [ACK] Seq=2151 Ack=220218202 Win=1280 Len=0
178170    140.658514000    10.222.3.88    10.14.10.32    HTTP    1334
[TCP Window Full] Continuation or non-HTTP traffic
178171    140.858753000    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178172    141.119647000    10.222.3.88    10.14.10.32    TCP    60    [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178173    141.119690000    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178174    141.627353000    10.222.3.88    10.14.10.32    TCP    60    [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178175    141.627385000    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178176    142.628795000    10.222.3.88    10.14.10.32    TCP    60    [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178177    142.628839000    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178178    144.619506000    10.222.3.88    10.14.10.32    TCP    60    [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178179    144.619574000    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178180    148.585393000    10.222.3.88    10.14.10.32    TCP    60    [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178181    148.585447000    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178182    156.502765000    10.222.3.88    10.14.10.32    TCP    60    [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178183    *REF*    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178184    15.821813000    10.222.3.88    10.14.10.32    TCP    60    [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178185    15.821845000    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178186    43.448451000    10.14.10.32    10.222.3.88    TCP    54    [TCP
Window Update] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=2856448 Len=0
178187    43.461842000    10.222.3.88    10.14.10.32    HTTP    1434
Continuation or non-HTTP traffic
which yields
291476    173.722084000    10.222.3.88    10.14.10.32    HTTP    1434
Continuation or non-HTTP traffic
291477    173.722125000    10.14.10.32    10.222.3.88    TCP    54    33813
> http [ACK] Seq=2151 Ack=361058243 Win=52224 Len=0
291478    178.322196000    10.14.10.32    10.222.3.88    TCP    54    [TCP
Window Update] 33813 > http [ACK] Seq=2151 Ack=361058243 Win=1489664 Len=0
291479    178.364417000    10.14.10.32    10.222.3.88    TCP    54    [TCP
Window Update] 33813 > http [ACK] Seq=2151 Ack=361058243 Win=3042816 Len=0
291480    180.974205000    10.14.10.32    10.222.3.88    TCP    54    33813
> http [FIN, ACK] Seq=2151 Ack=361058243 Win=3042816 Len=0
291481    180.988706000    10.222.3.88    10.14.10.32    TCP    60    http
> 33813 [ACK] Seq=361058243 Ack=2152 Win=10240 Len=0

Nice to see the exponential backoff at work during the TCP zero window.

Maybe this is a red herring. I do find it unusual that all the http timeout
start points seem to be during TCP zero windows.

...Stu

Re: Slow VM, svn client drops connection with FIN packet

Posted by Stuart MacDonald <st...@gmail.com>.
On Mon, Dec 23, 2013 at 3:51 PM, Lieven Govaerts <lg...@apache.org> wrote:

> to be clear, you've tested:
> - svn 1.7.13 with ra_neon, svn 1.7.14 with ra_neon and 1.8.5 with ra_serf.
> - to an svn 1.7.7 server
> - over http, with nginx 1.0.5. as intermediate proxy
> - running 'svn up'
> - on an Ubuntu guest (version ______) using Virtualbox (version
> __________) on host OS ________.
>
> Can you fill in the blanks?
>

- on an Ubuntu guest (version 12.10) using Virtualbox (version 4.2.12
r84980) on host OS Windows 7 64-bit Enterprise SP1
For VirtualBox the guest extensions are installed; the adapter is set in
NAT mode.

The error "The server sent a truncated HTTP response body." you're
> seeing from ra_serf, means that ra_serf is busy reading a response
> from the server but all data isn't arriving.
>

The network trace shows the data arriving:

svn 1.8.5:
496958    625.444757000    10.222.3.88    10.14.10.85    HTTP    1434
Continuation or non-HTTP traffic
496959    625.444836000    10.14.10.85    10.222.3.88    TCP    54    9415
> http [ACK] Seq=2713 Ack=612899776 Win=1760768 Len=0
496960    625.444878000    10.14.10.85    10.222.3.88    TCP    54    [TCP
Window Update] 9415 > http [ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
496961    626.111331000    10.14.10.85    10.222.3.88    TCP    54    9415
> http [FIN, ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
496962    626.124890000    10.222.3.88    10.14.10.85    TCP    60    http
> 9415 [ACK] Seq=612899776 Ack=2714 Win=12288 Len=0

svn 1.7.14:
444658    437.941946000    10.222.3.88    10.14.10.32    HTTP    1434
Continuation or non-HTTP traffic
444659    437.941995000    10.14.10.32    10.222.3.88    TCP    54    60442
> http [ACK] Seq=2151 Ack=550721697 Win=542208 Len=0
444660    439.774982000    10.14.10.32    10.222.3.88    TCP    54    [TCP
Window Update] 60442 > http [ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
444661    440.020170000    10.14.10.32    10.222.3.88    TCP    54    60442
> http [FIN, ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
444662    440.033580000    10.222.3.88    10.14.10.32    TCP    60    http
> 60442 [ACK] Seq=550721697 Ack=2152 Win=10240 Len=0

One thing to note: a while ago my IT dept had me set "http-timeout=180" to
see if that fixed the problem. The 1.7.14 network capture is with that set
from last Fri. The 1.8.5 is without it set from this morning. I found that
setting it did reduce the frequency of occurrence, but not resolve it
entirely; I went from failing about 3 out of 5 mornings to 1 out of 5. I've
removed it (restored to using the default values) in order to have more
frequent failures for quicker testing.

We have seen this caused by proxies tampering with the response data.
> In this case I suppose the connection is shut down unexpectedly first,
> which ra_serf interprets as 'the connection closed, but I didn't get
> all data for this response yet" => error.
>

I would doubt that the proxy is tampering with the data, otherwise svn
would fail all the time, yes?

Note that the client is sending the [FIN, ACK], not the server.


> You have tested two versions of subversion, each with their own http
> client implementation (serf and neon). Both libraries have been
> implemented completely independently by different people, without
> sharing code. I find it difficult to believe that both libraries have
> the same bug, but ok, it's not impossible.
>

Agreed, it seems unlikely, but not impossible. I'd look first at the svn
layer that uses the libraries.

I took some time last Fri to see what was 180 seconds prior to the FIN
packet in the trace (this is the 1.7.14 failure):
289223    259.418521000    10.222.3.88    10.14.10.32    HTTP    1434
Continuation or non-HTTP traffic
289224    259.418583000    10.14.10.32    10.222.3.88    TCP    54    60442
> http [ACK] Seq=2151 Ack=358465616 Win=1280 Len=0
289225    259.649415000    10.222.3.88    10.14.10.32    HTTP    1334
[TCP Window Full] Continuation or non-HTTP traffic
289226    259.855605000    10.14.10.32    10.222.3.88    TCP    54    [TCP
ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
289227    259.868018000    10.222.3.88    10.14.10.32    HTTP    1334
[TCP Window Full] [TCP Retransmission] Continuation or non-HTTP traffic
289228    *REF*    10.14.10.32    10.222.3.88    TCP    66    [TCP
ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
SLE=358465616 SRE=358466896
289229    0.383858000    10.14.10.32    10.222.3.88    TCP    54    [TCP
Window Update] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=2062848 Len=0
289230    0.398005000    10.222.3.88    10.14.10.32    HTTP    1434
Continuation or non-HTTP traffic
which yields
444658    178.073895000    10.222.3.88    10.14.10.32    HTTP    1434
Continuation or non-HTTP traffic
444659    178.073944000    10.14.10.32    10.222.3.88    TCP    54    60442
> http [ACK] Seq=2151 Ack=550721697 Win=542208 Len=0
444660    179.906931000    10.14.10.32    10.222.3.88    TCP    54    [TCP
Window Update] 60442 > http [ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
444661    180.152119000    10.14.10.32    10.222.3.88    TCP    54    60442
> http [FIN, ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
444662    180.165529000    10.222.3.88    10.14.10.32    TCP    60    http
> 60442 [ACK] Seq=550721697 Ack=2152 Win=10240 Len=0

If the time ref is set at (wireshark-)packet 289227 then the FIN packet
would be at 179.something elapsed. There are a lot of TCP zero windows in
the trace though, that one isn't special AFAICT. There's plenty before and
after it.

If someone can let me know what the default for http-timeout is for
1.8.5/ra_serf, I'll do the same check in this morning's failure and see
what's there.

In case of serf I can add that if serf would actively drop the
> connection due to a timeout, you'd see a different error message. In
> fact, I've searched for error E120106 and it's only raised when a
> clean EOF is returned from the network layer, whereas in case of error
> (TCP RST) the nw layer returns a ECONNABORTED or ECONNRESET status
> code.
>

Right, but that would be if the server drops the connection? The client is
dropping the connection.


> So, my first guess would be a problem with either the guest OS, or
> with the Virtualbox network code (the latter I think is more likely).
>

The packets are being delivered, so I'd expect not, but that's an area I
didn't consider. On that note: the wireshark capture is from the Win7 host,
not the guest. I'm assuming the captures would be the same in both cases,
but that may not be true.


> 1. If you like and are allowed to, you can send me the wireshark .pcap
> file privately (replace username/passwd/basic/digest authn headers
> with X, but keep the same # of chars!).
>

I've been assuming I'm not allowed to; I'd like to leave this to a last
resort.


> Also, the debug logging of both http client libraries would be nice:
> - For svn 1.7.x put neon-debug-mask=255 in your servers file.
>

Done.


> - For 1.8.5 this won't work anymore, you'll have to compile serf from
> source to enable debugging (flags CONN_VERBOSE, SOCK_VERBOSE,
> SOCK_MSG_VERBOSE in serf_private.h).
>

I tried compiling from source last week, but it was pretty painful. I'll
try again over the holidays.


> 2. I'd also test this same operation on a physical machine, with the
> same OS and/or with a different OS.
>

The rest of my team are not running a VM and nobody else sees this problem.
Also, they've syncing the entire repository, I'm only syncing a few (large)
branches to save disk space, so their operations are heavier than mine.
Keep in mind: the post that I linked to in my original post mentions that
the problem seems to be due to the slowness of the VM while svn is
processing the incoming data. I'm expecting this to be a necessary
contributing factor, although not a sufficient factor. IOW I wouldn't
expect this problem to show up on the host OS.

Hm. Given that I'm the only person out of a team of about 10 distributed
coast to coast across Canada, doesn't that rule out the proxy?

I have TortoiseSVN installed, I'll see if I can setup something on the host
OS.


> 3. Regardless of this issue, the upgrade of nginx is important for svn
> 1.8.5, as older versions of nginx have problems with serf's use of
> HTTP 1.1 features. There's a fallback mechanism now, but at the cost
> of some reduced performance.


I'll mention this to our IT.

...Stu

Re: Slow VM, svn client drops connection with FIN packet

Posted by Lieven Govaerts <lg...@apache.org>.
Hi Stu,

On Mon, Dec 23, 2013 at 6:11 PM,  <st...@gmail.com> wrote:
> On Tuesday, December 17, 2013 10:32:34 AM UTC-5, Branko Čibej wrote:
>>
>> On 17.12.2013 16:19, Stuart MacDonald wrote:
>> > The Wireshark trace shows clearly and conclusively this is a bug in
>> > the client. The client drops the connection by sending a FIN packet
>> > unexpectedly.
>>
>> It could be waiting for a response from the proxy that never reaches the
>> client, and dropping the connection after a timeout.
>>
>> Regardless, you should still try to reproduce this with the latest
>> client (ideally, both latest 1.7 and 1.8) to determine if the problem
>> has been fixed. Especially in 1.8, there have been a lot of improvements
>> in the HTTP protocol driver.
>
>
> I'm using WANdisco's packages to test with, as it turned out to be easier to
> get those set up than to compile from source with all the optional bits I
> need.
>
> I have seen the exact same issue on 1.7.14:
> svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read response
> body: connection was closed by server (http://<server>)
> and the Wireshark capture shows the client sending a [FIN, ACK] to close the
> connection, despite it only being a fraction of a second since the last
> server traffic. This always occurs just after the TCP window opening up
> again, not sure if that's significant.
>
> This *also* occurs on 1.8.5, although the error message is more informative:
> svn: E120106: ra_serf: The server sent a truncated HTTP response body.
> The Wireshark capture is *exactly* *identical*; after some server traffic,
> the TCP window opens up and the client sends a [FIN, ACK] to close the
> connection.

to be clear, you've tested:
- svn 1.7.13 with ra_neon, svn 1.7.14 with ra_neon and 1.8.5 with ra_serf.
- to an svn 1.7.7 server
- over http, with nginx 1.0.5. as intermediate proxy
- running 'svn up'
- on an Ubuntu guest (version ______) using Virtualbox (version
__________) on host OS ________.

Can you fill in the blanks?

The error "The server sent a truncated HTTP response body." you're
seeing from ra_serf, means that ra_serf is busy reading a response
from the server but all data isn't arriving.

We have seen this caused by proxies tampering with the response data.
In this case I suppose the connection is shut down unexpectedly first,
which ra_serf interprets as 'the connection closed, but I didn't get
all data for this response yet" => error.

> Having taken a closer look at the network captures; the failure pattern is
> always:
> - 1380 byte packets from the server
> - an ACK from the client
> - a TCP window update from the client (the window hadn't gone to 0 in any
> cases, this is just a regular update)
> - in one case there's another window update from the client
> - a delay of 0 - 4 seconds where there is no traffic in either direction
> - the client sends the [FIN, ACK]
>
>
> Now, I'm not sure what constitutes a truncated HTTP response body, as the
> traffic looks the same in both cases, large blocks of encoded data. So I
> think the 1.8.x error message is also incorrect, but possibly less incorrect
> than the 1.7.x error message.
>

Your setup has many different layers so many possible failing components.
>From your transcript of the wireshark trace it looks like the
connection is shut down on the client side (like you said).

You have tested two versions of subversion, each with their own http
client implementation (serf and neon). Both libraries have been
implemented completely independently by different people, without
sharing code. I find it difficult to believe that both libraries have
the same bug, but ok, it's not impossible.

In case of serf I can add that if serf would actively drop the
connection due to a timeout, you'd see a different error message. In
fact, I've searched for error E120106 and it's only raised when a
clean EOF is returned from the network layer, whereas in case of error
(TCP RST) the nw layer returns a ECONNABORTED or ECONNRESET status
code.

So, my first guess would be a problem with either the guest OS, or
with the Virtualbox network code (the latter I think is more likely).

> What's the next step?

1. If you like and are allowed to, you can send me the wireshark .pcap
file privately (replace username/passwd/basic/digest authn headers
with X, but keep the same # of chars!).
Also, the debug logging of both http client libraries would be nice:
- For svn 1.7.x put neon-debug-mask=255 in your servers file.
- For 1.8.5 this won't work anymore, you'll have to compile serf from
source to enable debugging (flags CONN_VERBOSE, SOCK_VERBOSE,
SOCK_MSG_VERBOSE in serf_private.h).

Maybe with this information I can confirm my reasoning or find another
potential cause.

2. I'd also test this same operation on a physical machine, with the
same OS and/or with a different OS.

3. Regardless of this issue, the upgrade of nginx is important for svn
1.8.5, as older versions of nginx have problems with serf's use of
HTTP 1.1 features. There's a fallback mechanism now, but at the cost
of some reduced performance.

>
> ...Stu
>

Lieven

Re: Slow VM, svn client drops connection with FIN packet

Posted by st...@gmail.com.
On Tuesday, December 17, 2013 10:32:34 AM UTC-5, Branko Čibej wrote:
>
> On 17.12.2013 16:19, Stuart MacDonald wrote: 
> > The Wireshark trace shows clearly and conclusively this is a bug in 
> > the client. The client drops the connection by sending a FIN packet 
> > unexpectedly. 
>
> It could be waiting for a response from the proxy that never reaches the 
> client, and dropping the connection after a timeout. 
>
> Regardless, you should still try to reproduce this with the latest 
> client (ideally, both latest 1.7 and 1.8) to determine if the problem 
> has been fixed. Especially in 1.8, there have been a lot of improvements 
> in the HTTP protocol driver.
>

I'm using WANdisco's packages to test with, as it turned out to be easier 
to get those set up than to compile from source with all the optional bits 
I need.

I have seen the exact same issue on 1.7.14:
svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read 
response body: connection was closed by server (http://<server>)
and the Wireshark capture shows the client sending a [FIN, ACK] to close 
the connection, despite it only being a fraction of a second since the last 
server traffic. This always occurs just after the TCP window opening up 
again, not sure if that's significant.

This *also* occurs on 1.8.5, although the error message is more informative:
svn: E120106: ra_serf: The server sent a truncated HTTP response body.
The Wireshark capture is *exactly* *identical*; after some server traffic, 
the TCP window opens up and the client sends a [FIN, ACK] to close the 
connection.

Having taken a closer look at the network captures; the failure pattern is 
always:
- 1380 byte packets from the server
- an ACK from the client
- a TCP window update from the client (the window hadn't gone to 0 in any 
cases, this is just a regular update)
- in one case there's another window update from the client
- a delay of 0 - 4 seconds where there is no traffic in either direction
- the client sends the [FIN, ACK]

Now, I'm not sure what constitutes a truncated HTTP response body, as the 
traffic looks the same in both cases, large blocks of encoded data. So I 
think the 1.8.x error message is also incorrect, but possibly less 
incorrect than the 1.7.x error message.

What's the next step?

...Stu



 

Re: Slow VM, svn client drops connection with FIN packet

Posted by Branko Čibej <br...@wandisco.com>.
On 17.12.2013 16:19, Stuart MacDonald wrote:
> On Monday, December 16, 2013 1:45:11 PM UTC-5, Branko Čibej wrote:
>
>     So your server is using an nginx proxy, and your 1.7 client
>     doesn't work with it. The thing to do is to try to reproduce this
>     with 1.8, and if it's reproducible, it's still most likely a proxy
>     bug (nginx 1.0.5 is rather old). Reporting this to the server
>     administrator would be the next step.
>
>
> The Wireshark trace shows clearly and conclusively this is a bug in
> the client. The client drops the connection by sending a FIN packet
> unexpectedly.

It could be waiting for a response from the proxy that never reaches the
client, and dropping the connection after a timeout.

Regardless, you should still try to reproduce this with the latest
client (ideally, both latest 1.7 and 1.8) to determine if the problem
has been fixed. Especially in 1.8, there have been a lot of improvements
in the HTTP protocol driver.


-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: Slow VM, svn client drops connection with FIN packet

Posted by Stuart MacDonald <st...@gmail.com>.
On Monday, December 16, 2013 1:45:11 PM UTC-5, Branko Čibej wrote:
>
>  So your server is using an nginx proxy, and your 1.7 client doesn't work 
> with it. The thing to do is to try to reproduce this with 1.8, and if it's 
> reproducible, it's still most likely a proxy bug (nginx 1.0.5 is rather 
> old). Reporting this to the server administrator would be the next step.
>

The Wireshark trace shows clearly and conclusively this is a bug in the 
client. The client drops the connection by sending a FIN packet 
unexpectedly.

...Stu

Re: Slow VM, svn client drops connection with FIN packet

Posted by Branko Čibej <br...@wandisco.com>.
On 16.12.2013 18:45, Stuart MacDonald wrote:
> The link I provided is *exactly* what I'm seeing, so I didn't see the
> need to repeat that post. I'm in a VM, the client drops the connection
> with an erroneous [FIN, ACK], just after the TCP window opens up again.
>
> Hm, one detail I can provide now that's not in that post:
> U  directory/file1
> U  directory/file2
> svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
> response body: connection was closed by server (http://<server>)
>
> I don't know what's on the server side, and may not be able to find
> out, but I'll inquire. Hm, actually the network trace shows "Server:
> nginx/1.0.5". Client operation  is 'svn up'.


So your server is using an nginx proxy, and your 1.7 client doesn't work
with it. The thing to do is to try to reproduce this with 1.8, and if
it's reproducible, it's still most likely a proxy bug (nginx 1.0.5 is
rather old). Reporting this to the server administrator would be the
next step.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com

Re: Slow VM, svn client drops connection with FIN packet

Posted by Stuart MacDonald <st...@gmail.com>.
The link I provided is *exactly* what I'm seeing, so I didn't see the need
to repeat that post. I'm in a VM, the client drops the connection with an
erroneous [FIN, ACK], just after the TCP window opens up again.

Hm, one detail I can provide now that's not in that post:
U  directory/file1
U  directory/file2
svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
response body: connection was closed by server (http://<server>)

I don't know what's on the server side, and may not be able to find out,
but I'll inquire. Hm, actually the network trace shows "Server:
nginx/1.0.5". Client operation  is 'svn up'.

What else would you like me to do/provide?

...Stu



On Mon, Dec 16, 2013 at 12:05 PM, Ben Reser <be...@reser.org> wrote:

> On Mon Dec 16 07:56:32 2013, Stuart MacDonald wrote:
> >
> > I am seeing exactly this
> > issue:
> https://groups.google.com/forum/#!topic/subversion_users/AZ4M62Qwmjc
> > but do not find a bug report for it in the database. The linked bug is
> for
> > something similar but unrelated. Can I file this?
> >
> > ...Stu
> >
> > stumacdonald@stumacdonald-VirtualBox:~/src/ianywhere$ svn --version
> > svn, version 1.7.13 (r1516569)
> > compiled Aug 27 2013, 09:06:07
> >
> > Copyright (C) 2013 The Apache Software Foundation.
> > This software consists of contributions made by many people; see the
> NOTICE
> > file for more information.
> > Subversion is open source software, see http://subversion.apache.org/
> >
> > The following repository access (RA) modules are available:
> >
> > * ra_neon : Module for accessing a repository via WebDAV protocol using
> Neon.
> > - handles 'http' scheme
> > - handles 'https' scheme
> > * ra_svn : Module for accessing a repository using the svn network
> protocol.
> > - with Cyrus SASL authentication
> > - handles 'svn' scheme
> > * ra_local : Module for accessing a repository on local disk.
> > - handles 'file' scheme
>
> Before reporting an issue you really ought to try it with the most
> recent version of Subversion (1.7.14 and 1.8.5).  Since 1.8.0 we only
> use the serf http-library.  If you can trigger the same issue with
> 1.8.5 then that's likely a sign that we aren't tending the TCP
> connections as we should (vs the neon http library that your version
> output shows you're using with 1.7.13).  Another point is that you
> really should provide more details here.  What versions of httpd and
> svn are on the server side?  What client operation(s) are you having
> the issue with.
>
> Without a lot more research on your part the bug isn't likely to get
> much interest.
>

Re: Slow VM, svn client drops connection with FIN packet

Posted by Ben Reser <be...@reser.org>.
On Mon Dec 16 07:56:32 2013, Stuart MacDonald wrote:
>
> I am seeing exactly this
> issue: https://groups.google.com/forum/#!topic/subversion_users/AZ4M62Qwmjc
> but do not find a bug report for it in the database. The linked bug is for
> something similar but unrelated. Can I file this?
>
> ...Stu
>
> stumacdonald@stumacdonald-VirtualBox:~/src/ianywhere$ svn --version
> svn, version 1.7.13 (r1516569)
> compiled Aug 27 2013, 09:06:07
>
> Copyright (C) 2013 The Apache Software Foundation.
> This software consists of contributions made by many people; see the NOTICE
> file for more information.
> Subversion is open source software, see http://subversion.apache.org/
>
> The following repository access (RA) modules are available:
>
> * ra_neon : Module for accessing a repository via WebDAV protocol using Neon.
> - handles 'http' scheme
> - handles 'https' scheme
> * ra_svn : Module for accessing a repository using the svn network protocol.
> - with Cyrus SASL authentication
> - handles 'svn' scheme
> * ra_local : Module for accessing a repository on local disk.
> - handles 'file' scheme

Before reporting an issue you really ought to try it with the most 
recent version of Subversion (1.7.14 and 1.8.5).  Since 1.8.0 we only 
use the serf http-library.  If you can trigger the same issue with 
1.8.5 then that's likely a sign that we aren't tending the TCP 
connections as we should (vs the neon http library that your version 
output shows you're using with 1.7.13).  Another point is that you 
really should provide more details here.  What versions of httpd and 
svn are on the server side?  What client operation(s) are you having 
the issue with.

Without a lot more research on your part the bug isn't likely to get 
much interest.

Re: Slow VM, svn client drops connection with FIN packet

Posted by Thorsten Schöning <ts...@am-soft.de>.
Guten Tag Stuart MacDonald,
am Montag, 16. Dezember 2013 um 18:31 schrieben Sie:

> If someone can point me at a Ubuntu-compatible package[...]

We use the package form Wandisco on our LTS 12.04.

http://subversion.apache.org/packages.html#ubuntu

Mit freundlichen Grüßen,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail:Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Telefon...........05151-  9468- 55
Fax...............05151-  9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow


Re: Slow VM, svn client drops connection with FIN packet

Posted by Stuart MacDonald <st...@gmail.com>.
On Mon, Dec 16, 2013 at 12:45 PM, Mark Phippard <ma...@gmail.com> wrote:

> On Mon, Dec 16, 2013 at 12:31 PM, Stuart MacDonald <
> stuartm.coding@gmail.com> wrote:
>
>> If someone can point me at a Ubuntu-compatible package, I'm more than
>> happy to upgrade. Last time I looked (within the last two months) I was
>> unable to find anything later than what I'm running. I'd rather not spend
>> the time compiling from source if I can avoid it.
>>
>
> http://subversion.apache.org/packages.html#ubuntu
>

Yeah. The Ubuntu packages don't have either 1.7.14 or 1.8.5. The WANdisco
packages require registration. What I said stands, there are no current
Ubuntu-compatible packages.

So, compiling from sources.

...Stu

Re: Slow VM, svn client drops connection with FIN packet

Posted by Mark Phippard <ma...@gmail.com>.
On Mon, Dec 16, 2013 at 12:31 PM, Stuart MacDonald <stuartm.coding@gmail.com
> wrote:

> If someone can point me at a Ubuntu-compatible package, I'm more than
> happy to upgrade. Last time I looked (within the last two months) I was
> unable to find anything later than what I'm running. I'd rather not spend
> the time compiling from source if I can avoid it.
>

http://subversion.apache.org/packages.html#ubuntu

Please elucidate "capture traces". I already have a Wireshark capture of
> the failure. I see exactly what the poster saw: the client drops the
> connection with an unexplained [FIN, ACK].
>

I meant capture new traces once you are on 1.8.5 (assuming the problem
still exists).  Different libraries are used now, so the error might
manifest differently.


> This is easily recreated for me: I run this update once a morning, and see
> the failure about 3 days out of 5. I've been working with my internal IT
> group, but they haven't been able to help much. They did have me add
> 'http-timeout = 180" to the .subversion/servers [global] section; this cut
> down the failures to about 1 out of 5 days.
>

Simply saying that this is not the sort of error that a dev can attach a
debugger to, so having traces from the latest version might help.



> Agreed; getting the bug fixed is the victory. However, this is clearly two
> bugs: 1) the error message is wrong, 2) the client is dropping the
> connection. Those need to be in the bug database unless they are already
> present, or already fixed. Neither seemed to be the case when I posted.
>

You are not using 1.8, so how do you know?  The vast majority of bugs do
not go through the tracker.  I am simply pointing out that getting this
into the tracker will not get it any more attention.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: Slow VM, svn client drops connection with FIN packet

Posted by Stuart MacDonald <st...@gmail.com>.
If someone can point me at a Ubuntu-compatible package, I'm more than happy
to upgrade. Last time I looked (within the last two months) I was unable to
find anything later than what I'm running. I'd rather not spend the time
compiling from source if I can avoid it.

Please elucidate "capture traces". I already have a Wireshark capture of
the failure. I see exactly what the poster saw: the client drops the
connection with an unexplained [FIN, ACK].

This is easily recreated for me: I run this update once a morning, and see
the failure about 3 days out of 5. I've been working with my internal IT
group, but they haven't been able to help much. They did have me add
'http-timeout = 180" to the .subversion/servers [global] section; this cut
down the failures to about 1 out of 5 days.

Agreed; getting the bug fixed is the victory. However, this is clearly two
bugs: 1) the error message is wrong, 2) the client is dropping the
connection. Those need to be in the bug database unless they are already
present, or already fixed. Neither seemed to be the case when I posted.

...Stu


On Mon, Dec 16, 2013 at 11:59 AM, Mark Phippard <ma...@gmail.com> wrote:

> On Mon, Dec 16, 2013 at 10:56 AM, Stuart MacDonald <
> stuartm.coding@gmail.com> wrote:
>
>>
>> I am seeing exactly this issue:
>> https://groups.google.com/forum/#!topic/subversion_users/AZ4M62Qwmjc but
>> do not find a bug report for it in the database. The linked bug is for
>> something similar but unrelated. Can I file this?
>>
>> ...Stu
>>
>> stumacdonald@stumacdonald-VirtualBox:~/src/ianywhere$ svn --version
>> svn, version 1.7.13 (r1516569)
>> compiled Aug 27 2013, 09:06:07
>>
>
>
> Subversion 1.8 has an entirely new HTTP networking layer.  So the first
> thing you should do is start with the latest version (1.8.5) so you can see
> if the problem is either fixed or manifests differently.  You will likely
> also need to capture traces etc. to have any chance of getting this solved
> since it is not something that can be easily recreated.
>
> Finally, getting something filed in the issue tracker is not a victory.
>  This mailing list is the best place to get attention for your problem and
> interact with others etc.
>
> --
> Thanks
>
> Mark Phippard
> http://markphip.blogspot.com/
>

Re: Slow VM, svn client drops connection with FIN packet

Posted by Mark Phippard <ma...@gmail.com>.
On Mon, Dec 16, 2013 at 10:56 AM, Stuart MacDonald <stuartm.coding@gmail.com
> wrote:

>
> I am seeing exactly this issue:
> https://groups.google.com/forum/#!topic/subversion_users/AZ4M62Qwmjc but
> do not find a bug report for it in the database. The linked bug is for
> something similar but unrelated. Can I file this?
>
> ...Stu
>
> stumacdonald@stumacdonald-VirtualBox:~/src/ianywhere$ svn --version
> svn, version 1.7.13 (r1516569)
> compiled Aug 27 2013, 09:06:07
>


Subversion 1.8 has an entirely new HTTP networking layer.  So the first
thing you should do is start with the latest version (1.8.5) so you can see
if the problem is either fixed or manifests differently.  You will likely
also need to capture traces etc. to have any chance of getting this solved
since it is not something that can be easily recreated.

Finally, getting something filed in the issue tracker is not a victory.
 This mailing list is the best place to get attention for your problem and
interact with others etc.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/