You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@subversion.apache.org by Stuart MacDonald <st...@gmail.com> on 2013/12/16 16:56:32 UTC
Slow VM, svn client drops connection with FIN packet
I am seeing exactly this issue:
https://groups.google.com/forum/#!topic/subversion_users/AZ4M62Qwmjc but do
not find a bug report for it in the database. The linked bug is for
something similar but unrelated. Can I file this?
...Stu
stumacdonald@stumacdonald-VirtualBox:~/src/ianywhere$ svn --version
svn, version 1.7.13 (r1516569)
compiled Aug 27 2013, 09:06:07
Copyright (C) 2013 The Apache Software Foundation.
This software consists of contributions made by many people; see the NOTICE
file for more information.
Subversion is open source software, see http://subversion.apache.org/
The following repository access (RA) modules are available:
* ra_neon : Module for accessing a repository via WebDAV protocol using
Neon.
- handles 'http' scheme
- handles 'https' scheme
* ra_svn : Module for accessing a repository using the svn network protocol.
- with Cyrus SASL authentication
- handles 'svn' scheme
* ra_local : Module for accessing a repository on local disk.
- handles 'file' scheme
Re: Slow VM, svn client drops connection with FIN packet
Posted by Stuart MacDonald <st...@gmail.com>.
Thanks, I'll pass it along. It mentions that commit access is required to
exploit this though, so I think IT will probably ignore this if they
haven't already patched it.
...Stu
On Mon, Dec 16, 2013 at 2:20 PM, Ben Reser <be...@reser.org> wrote:
> On 12/16/13 11:08 AM, Stuart MacDonald wrote:
> > svn is 1.7.7 (we are not planning to upgrade for some time)
>
> This doesn't help with your issue but if you need ammo to convince IT to
> upgrade:
> https://subversion.apache.org/security/CVE-2013-4131-advisory.txt
>
> If it's a distribution package it might have been patched without changing
> the
> version number.
>
Re: Slow VM, svn client drops connection with FIN packet
Posted by Ben Reser <be...@reser.org>.
On 12/16/13 11:08 AM, Stuart MacDonald wrote:
> svn is 1.7.7 (we are not planning to upgrade for some time)
This doesn't help with your issue but if you need ammo to convince IT to upgrade:
https://subversion.apache.org/security/CVE-2013-4131-advisory.txt
If it's a distribution package it might have been patched without changing the
version number.
Re: Slow VM, svn client drops connection with FIN packet
Posted by Stuart MacDonald <st...@gmail.com>.
IT says we have:
nginx 1.05 with plans to move to 1.3.9
svn is 1.7.7 (we are not planning to upgrade for some time)
...Stu
On Mon, Dec 16, 2013 at 12:45 PM, Stuart MacDonald <stuartm.coding@gmail.com
> wrote:
> The link I provided is *exactly* what I'm seeing, so I didn't see the need
> to repeat that post. I'm in a VM, the client drops the connection with an
> erroneous [FIN, ACK], just after the TCP window opens up again.
>
> Hm, one detail I can provide now that's not in that post:
> U directory/file1
> U directory/file2
> svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
> response body: connection was closed by server (http://<server>)
>
> I don't know what's on the server side, and may not be able to find out,
> but I'll inquire. Hm, actually the network trace shows "Server:
> nginx/1.0.5". Client operation is 'svn up'.
>
> What else would you like me to do/provide?
>
> ...Stu
>
>
>
> On Mon, Dec 16, 2013 at 12:05 PM, Ben Reser <be...@reser.org> wrote:
>
>> On Mon Dec 16 07:56:32 2013, Stuart MacDonald wrote:
>> >
>> > I am seeing exactly this
>> > issue:
>> https://groups.google.com/forum/#!topic/subversion_users/AZ4M62Qwmjc
>> > but do not find a bug report for it in the database. The linked bug is
>> for
>> > something similar but unrelated. Can I file this?
>> >
>> > ...Stu
>> >
>> > stumacdonald@stumacdonald-VirtualBox:~/src/ianywhere$ svn --version
>> > svn, version 1.7.13 (r1516569)
>> > compiled Aug 27 2013, 09:06:07
>> >
>> > Copyright (C) 2013 The Apache Software Foundation.
>> > This software consists of contributions made by many people; see the
>> NOTICE
>> > file for more information.
>> > Subversion is open source software, see http://subversion.apache.org/
>> >
>> > The following repository access (RA) modules are available:
>> >
>> > * ra_neon : Module for accessing a repository via WebDAV protocol using
>> Neon.
>> > - handles 'http' scheme
>> > - handles 'https' scheme
>> > * ra_svn : Module for accessing a repository using the svn network
>> protocol.
>> > - with Cyrus SASL authentication
>> > - handles 'svn' scheme
>> > * ra_local : Module for accessing a repository on local disk.
>> > - handles 'file' scheme
>>
>> Before reporting an issue you really ought to try it with the most
>> recent version of Subversion (1.7.14 and 1.8.5). Since 1.8.0 we only
>> use the serf http-library. If you can trigger the same issue with
>> 1.8.5 then that's likely a sign that we aren't tending the TCP
>> connections as we should (vs the neon http library that your version
>> output shows you're using with 1.7.13). Another point is that you
>> really should provide more details here. What versions of httpd and
>> svn are on the server side? What client operation(s) are you having
>> the issue with.
>>
>> Without a lot more research on your part the bug isn't likely to get
>> much interest.
>>
>
>
Re: Slow VM, svn client drops connection with FIN packet
Posted by st...@gmail.com.
I replied yesterday, but my post seems to have disappeared.
I figured it out, and the key was your reproduction and my realisation I
wasn't seeing a FIN from my server. It turns out that in every case the last
392598 71.906863000 10.222.3.88 10.14.11.50 HTTP 1434 Continuation or
non-HTTP traffic
line from the server is actually also a FIN packet, and Wireshark (or the
TCP dissector) is "hiding" that; if I look at the packet I can see the FIN
flag, but it's not mentioned in the Info field in the packet list. So the
FINs from my client are close handshake responses not initiations, and it
is the server timing out after all.
Thanks for the help. I'm checking with my IT dept to see if they can bump
up the timeout.
...Stu
On Tuesday, January 14, 2014 11:55:39 AM UTC-5, Philip Martin wrote:
>
> Stuart MacDonald <stuartm...@gmail.com <javascript:>> writes:
>
> > svn: E175002: REPORT of '/svn/repos/deckard-65x/!svn/me': Could not read
> > response body: connection was closed by server (http://foundry51.qnx.com)
>
>
> I can provoke this on my local LAN by setting the Apache timeout on the
> server to a low value such as 5 seconds. Then I run a checkout that
> generates a large REPORT response and I use gdb to interrupt the client
> at subversion/libsvn_ra_neon/fetch.c:1709 so that the client is in the
> middle of reading the repsonse. I wait for the server timeout to
> expire and then resume the client and I get the error:
>
> svn: E175002: REPORT of '/obj/repo/!svn/me': Could not read response body:
> connection was closed by server (http://peri.local:8888)
>
> One fix is to increase the server timeout to handle your slow client.
>
> --
> Philip Martin | Subversion Committer
> WANdisco // *Non-Stop Data*
>
Re: Slow VM, svn client drops connection with FIN packet
Posted by Philip Martin <ph...@wandisco.com>.
Stuart MacDonald <st...@gmail.com> writes:
> svn: E175002: REPORT of '/svn/repos/deckard-65x/!svn/me': Could not read
> response body: connection was closed by server (http://foundry51.qnx.com)
I can provoke this on my local LAN by setting the Apache timeout on the
server to a low value such as 5 seconds. Then I run a checkout that
generates a large REPORT response and I use gdb to interrupt the client
at subversion/libsvn_ra_neon/fetch.c:1709 so that the client is in the
middle of reading the repsonse. I wait for the server timeout to
expire and then resume the client and I get the error:
svn: E175002: REPORT of '/obj/repo/!svn/me': Could not read response body: connection was closed by server (http://peri.local:8888)
One fix is to increase the server timeout to handle your slow client.
--
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*
Re: Slow VM, svn client drops connection with FIN packet
Posted by Stuart MacDonald <st...@gmail.com>.
Here's this morning's failure. For testing, I've set http-timeout=67. I am
trying to see if the zero window is a coincidence or not.
The debug output:
Reading 8192 bytes of response body.
Got 8192 bytes.
XML: Parsing 8192 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Got 6180 bytes.
XML: Parsing 6180 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Aborted request (-3): Could not read response body
sess: Closing connection.
sess: Connection closed.
Request ends, status 200 class 2xx, error line:
Could not read response body: connection was closed by server
Running destroy hooks.
Request ends.
svn: E175002: REPORT of '/svn/repos/deckard-65x/!svn/me': Could not read
response body: connection was closed by server (http://foundry51.qnx.com)
sess: Destroying session.
sess: Destroying session.
The network capture:
391174 498.610439000 10.222.3.88 10.14.11.50 HTTP 1434
Continuation or non-HTTP traffic
391175 498.610495000 10.14.11.50 10.222.3.88 TCP 54 37480
> http [ACK] Seq=1945767 Ack=480574851 Win=1280 Len=0
391176 498.842179000 10.222.3.88 10.14.11.50 HTTP 1334
[TCP Window Full] Continuation or non-HTTP traffic
391177 499.042264000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391178 499.297972000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391179 499.298065000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391180 499.796762000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391181 499.796811000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391182 500.778349000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391183 500.778405000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391184 502.728191000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391185 502.728260000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391186 506.614031000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391187 506.614093000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391188 514.371638000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391189 514.371694000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391190 529.872211000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391191 *REF* 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391192 30.988601000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391193 30.988645000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391194 71.282655000 10.14.11.50 10.222.3.88 TCP 54 [TCP
Window Update] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=1171456
Len=0
391195 71.296598000 10.222.3.88 10.14.11.50 HTTP 1434
Continuation or non-HTTP traffic
which results in:
392598 71.906863000 10.222.3.88 10.14.11.50 HTTP 1434
Continuation or non-HTTP traffic
392599 71.906925000 10.14.11.50 10.222.3.88 TCP 54 37480
> http [ACK] Seq=1945767 Ack=482310892 Win=987904 Len=0
392600 72.043276000 10.14.11.50 10.222.3.88 TCP 54 [TCP
Window Update] 37480 > http [ACK] Seq=1945767 Ack=482310892 Win=1188096
Len=0
392601 72.375325000 10.14.11.50 10.222.3.88 TCP 54 37480
> http [FIN, ACK] Seq=1945767 Ack=482310892 Win=1188096 Len=0
392602 72.388690000 10.222.3.88 10.14.11.50 TCP 60 http >
37480 [ACK] Seq=482310892 Ack=1945768 Win=908288 Len=0
There are two interesting things:
The long gap after the *REF* packet is unusual, it's the first time I've
seen that.
The FIN packet is 72 seconds later, not 67 like I set the timeout. OTOH,
it's close, and, given the long gap no packets occur at FIN - 67 seconds.
I'm going to keep trying random timeout values.
...Stu
On Tue, Jan 14, 2014 at 9:27 AM, Stuart MacDonald
<st...@gmail.com>wrote:
> I don't have access to the server log, but you can see from the wireshark
> capture that the *client* closed the connection (look for the RST packet),
> not the server. The error of "socket closed" is consistent with that.
>
> I suspect that there's an issue handling the "http-timeout" timer; it gets
> triggered during a TCP zero window situation, and then not canceled
> properly, and then it fires, closing the socket.
>
> That would also fit the data of the bug being in 1.6 (reported by someone
> else, but I linked to it in my original post), 1.7 and 1.8; it's a bug in
> svn's handling of the underlying HTTP interaction layer, and not in that
> layer or the server or somewhere else.
>
> I could be wrong. When I look at the evidence so far, that's what I see.
>
> ...Stu
>
>
> On Tue, Jan 14, 2014 at 6:33 AM, Philip Martin <philip.martin@wandisco.com
> > wrote:
>
>> stuartm.coding@gmail.com writes:
>>
>> > I have another failure, with some debug info, and some updates.
>> >
>> > We're switching to a different system for this particular repository,
>> so I
>> > may or may not be able to continue debug efforts. :-(
>> >
>> > Since there's no debug info available for the svn 1.8 client, I haven't
>> > been running that.
>> >
>> > I went a full week without a failure, which I suspect is due to the
>> > enormous amount of debugging that neon-debug-mask=255 produces, so this
>> > week I dialed it back to 127 and got a failure on the first try.
>> >
>> > Here's the latest failure. First up, the debug info:
>> > Got 8192 bytes.
>> > XML: Parsing 8192 bytes.
>> > XML: char-data (235) returns 0
>> > XML: xmlParseChunk returned 0
>> > Reading 8192 bytes of response body.
>> > Got 6600 bytes.
>> > XML: Parsing 6600 bytes.
>> > XML: char-data (235) returns 0
>> > XML: xmlParseChunk returned 0
>> > Reading 8192 bytes of response body.
>> > Aborted request (-3): Could not read response body
>>
>> That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
>> defined in ne_socket.h:
>>
>> #define NE_SOCK_CLOSED (-3)
>> /* Connection was reset (e.g. server crashed) */
>>
>> > sess: Closing connection.
>> > sess: Connection closed.
>> > Request ends, status 200 class 2xx, error line:
>> > Could not read response body: connection was closed by server
>> > Running destroy hooks.
>> > Request ends.
>> > svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
>> > response body: connection was closed by server (http://<server>)
>> > sess: Destroying session.
>> > sess: Destroying session.
>>
>> What does the server error log show?
>>
>> --
>> Philip Martin | Subversion Committer
>> WANdisco // *Non-Stop Data*
>>
>
>
Re: Slow VM, svn client drops connection with FIN packet
Posted by Stuart MacDonald <st...@gmail.com>.
I don't have access to the server log, but you can see from the wireshark
capture that the *client* closed the connection (look for the RST packet),
not the server. The error of "socket closed" is consistent with that.
I suspect that there's an issue handling the "http-timeout" timer; it gets
triggered during a TCP zero window situation, and then not canceled
properly, and then it fires, closing the socket.
That would also fit the data of the bug being in 1.6 (reported by someone
else, but I linked to it in my original post), 1.7 and 1.8; it's a bug in
svn's handling of the underlying HTTP interaction layer, and not in that
layer or the server or somewhere else.
I could be wrong. When I look at the evidence so far, that's what I see.
...Stu
On Tue, Jan 14, 2014 at 6:33 AM, Philip Martin
<ph...@wandisco.com>wrote:
> stuartm.coding@gmail.com writes:
>
> > I have another failure, with some debug info, and some updates.
> >
> > We're switching to a different system for this particular repository, so
> I
> > may or may not be able to continue debug efforts. :-(
> >
> > Since there's no debug info available for the svn 1.8 client, I haven't
> > been running that.
> >
> > I went a full week without a failure, which I suspect is due to the
> > enormous amount of debugging that neon-debug-mask=255 produces, so this
> > week I dialed it back to 127 and got a failure on the first try.
> >
> > Here's the latest failure. First up, the debug info:
> > Got 8192 bytes.
> > XML: Parsing 8192 bytes.
> > XML: char-data (235) returns 0
> > XML: xmlParseChunk returned 0
> > Reading 8192 bytes of response body.
> > Got 6600 bytes.
> > XML: Parsing 6600 bytes.
> > XML: char-data (235) returns 0
> > XML: xmlParseChunk returned 0
> > Reading 8192 bytes of response body.
> > Aborted request (-3): Could not read response body
>
> That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
> defined in ne_socket.h:
>
> #define NE_SOCK_CLOSED (-3)
> /* Connection was reset (e.g. server crashed) */
>
> > sess: Closing connection.
> > sess: Connection closed.
> > Request ends, status 200 class 2xx, error line:
> > Could not read response body: connection was closed by server
> > Running destroy hooks.
> > Request ends.
> > svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
> > response body: connection was closed by server (http://<server>)
> > sess: Destroying session.
> > sess: Destroying session.
>
> What does the server error log show?
>
> --
> Philip Martin | Subversion Committer
> WANdisco // *Non-Stop Data*
>
Re: Slow VM, svn client drops connection with FIN packet
Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:
> That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
> defined in ne_socket.h:
>
> #define NE_SOCK_CLOSED (-3)
> /* Connection was reset (e.g. server crashed) */
/* Socket was closed */
#define NE_SOCK_CLOSED (-3)
The comment associated with the constant is "Socket was closed" not
"Connection was reset".
--
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*
Re: Slow VM, svn client drops connection with FIN packet
Posted by Philip Martin <ph...@wandisco.com>.
stuartm.coding@gmail.com writes:
> I have another failure, with some debug info, and some updates.
>
> We're switching to a different system for this particular repository, so I
> may or may not be able to continue debug efforts. :-(
>
> Since there's no debug info available for the svn 1.8 client, I haven't
> been running that.
>
> I went a full week without a failure, which I suspect is due to the
> enormous amount of debugging that neon-debug-mask=255 produces, so this
> week I dialed it back to 127 and got a failure on the first try.
>
> Here's the latest failure. First up, the debug info:
> Got 8192 bytes.
> XML: Parsing 8192 bytes.
> XML: char-data (235) returns 0
> XML: xmlParseChunk returned 0
> Reading 8192 bytes of response body.
> Got 6600 bytes.
> XML: Parsing 6600 bytes.
> XML: char-data (235) returns 0
> XML: xmlParseChunk returned 0
> Reading 8192 bytes of response body.
> Aborted request (-3): Could not read response body
That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
defined in ne_socket.h:
#define NE_SOCK_CLOSED (-3)
/* Connection was reset (e.g. server crashed) */
> sess: Closing connection.
> sess: Connection closed.
> Request ends, status 200 class 2xx, error line:
> Could not read response body: connection was closed by server
> Running destroy hooks.
> Request ends.
> svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
> response body: connection was closed by server (http://<server>)
> sess: Destroying session.
> sess: Destroying session.
What does the server error log show?
--
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*
Re: Slow VM, svn client drops connection with FIN packet
Posted by st...@gmail.com.
I have another failure, with some debug info, and some updates.
We're switching to a different system for this particular repository, so I
may or may not be able to continue debug efforts. :-(
Since there's no debug info available for the svn 1.8 client, I haven't
been running that.
I went a full week without a failure, which I suspect is due to the
enormous amount of debugging that neon-debug-mask=255 produces, so this
week I dialed it back to 127 and got a failure on the first try.
Here's the latest failure. First up, the debug info:
Got 8192 bytes.
XML: Parsing 8192 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Got 6600 bytes.
XML: Parsing 6600 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Aborted request (-3): Could not read response body
sess: Closing connection.
sess: Connection closed.
Request ends, status 200 class 2xx, error line:
Could not read response body: connection was closed by server
Running destroy hooks.
Request ends.
svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
response body: connection was closed by server (http://<server>)
sess: Destroying session.
sess: Destroying session.
The capture info, with the default timeout which I am assuming is 600
seconds:
77081 82.621499000 10.222.3.88 10.14.11.50 HTTP 1434
Continuation or non-HTTP traffic
77082 82.621541000 10.14.11.50 10.222.3.88 TCP 54 36603 >
http [ACK] Seq=1850961 Ack=92395908 Win=1280 Len=0
77083 *REF* 10.222.3.88 10.14.11.50 HTTP 1334 [TCP Window
Full] Continuation or non-HTTP traffic
77084 0.200407000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 36603 > http [ACK] Seq=1850961 Ack=92397188 Win=0 Len=0
77085 0.458337000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 36603 [ACK] Seq=92397187 Ack=1850961 Win=893952 Len=0
77086 0.458390000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 36603 > http [ACK] Seq=1850961 Ack=92397188 Win=0 Len=0
77087 0.960584000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 36603 [ACK] Seq=92397187 Ack=1850961 Win=893952 Len=0
77088 0.960653000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 36603 > http [ACK] Seq=1850961 Ack=92397188 Win=0 Len=0
77089 1.950177000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Keep-Alive] http > 36603 [ACK] Seq=92397187 Ack=1850961 Win=893952 Len=0
77090 1.950213000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] 36603 > http [ACK] Seq=1850961 Ack=92397188 Win=0 Len=0
77092 2.991154000 10.222.3.88 10.14.11.50 HTTP 1434
Continuation or non-HTTP traffic
which gives
395638 524.989307000 10.222.3.88 10.14.11.50 HTTP 1434
Continuation or non-HTTP traffic
395639 524.989401000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=763605828
Win=746496 Len=0
395640 524.989606000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=763615488
Win=736768 Len=0
395641 524.989883000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=763629288
Win=722944 Len=0
395642 524.990140000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=763643088
Win=709120 Len=0
395643 525.869551000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=766011168
Win=1366016 Len=0
395644 526.501877000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961
Ack=767377268 Win=0 Len=0
395645 527.769324000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768063228
Win=1101056 Len=0
395646 527.783100000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768128088
Win=1116416 Len=0
395647 527.786772000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768215028
Win=1154816 Len=0
395648 527.815465000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768347508
Win=1218304 Len=0
395649 527.818509000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768373728
Win=1204736 Len=0
395650 527.872518000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768698028
Win=1233664 Len=0
395651 528.026681000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=769930368
Win=1239040 Len=0
395652 528.086233000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=770440968
Win=1298432 Len=0
395653 528.683101000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961
Ack=771739448 Win=0 Len=0
395654 529.544542000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=772977408
Win=757504 Len=0
395655 529.591855000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=773252028
Win=798976 Len=0
395656 529.660855000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=773881308
Win=423424 Len=0
395657 529.688608000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=774109008
Win=669184 Len=0
395658 529.860154000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=775638048
Win=611328 Len=0
395659 530.033617000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=777532788
Win=1091328 Len=0
395660 530.224557000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=779399928
Win=1686272 Len=0
395661 530.375431000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=781086288
Win=716032 Len=0
395662 531.350496000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=785885928
Win=739584 Len=0
395663 531.625875000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=786624228 Win=1280
Len=0
395664 532.084955000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961
Ack=786625508 Win=0 Len=0
395665 532.880166000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=786974748
Win=721664 Len=0
395666 532.910934000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787118268
Win=832000 Len=0
395667 532.920303000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787183128
Win=785152 Len=0
395668 532.955237000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787427388
Win=1219072 Len=0
395669 532.955463000 10.14.11.50 10.222.3.88 TCP 54 [TCP
Window Update] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961
Ack=787427388 Win=1248768 Len=0
395670 532.960093000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787490868
Win=1323264 Len=0
395671 532.970228000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787557108
Win=1404672 Len=0
395672 532.984501000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787678548
Win=1502720 Len=0
395673 532.999351000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787786188
Win=1514496 Len=0
395674 533.019030000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=788026308
Win=1484800 Len=0
395675 533.032385000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=788144988
Win=1488896 Len=0
395676 533.047272000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=788299548
Win=1450240 Len=0
395677 533.188960000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=790268808
Win=953344 Len=0
395678 533.984603000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961
Ack=794394908 Win=0 Len=0
395679 549.627263000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Previous segment not captured] http > 36603 [ACK] Seq=794394907 Ack=1850961
Win=893952 Len=0
395680 549.627315000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961
Ack=794394908 Win=0 Len=0
395681 565.449108000 10.222.3.88 10.14.11.50 TCP 60 [TCP
Dup ACK 395679#1] http > 36603 [ACK] Seq=794394907 Ack=1850961 Win=893952
Len=0
395682 565.449169000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961
Ack=794394908 Win=0 Len=0
395683 599.797841000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=794858688
Win=1141248 Len=0
395684 599.810002000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=794875248
Win=1172992 Len=0
395685 599.825538000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=794978748
Win=1188096 Len=0
395686 599.839904000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=795000828
Win=1233664 Len=0
395687 599.843870000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=795054648
Win=1185280 Len=0
395688 599.857927000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=795096048
Win=1149440 Len=0
395689 600.137326000 10.14.11.50 10.222.3.88 TCP 54 [TCP
ACKed unseen segment] 36603 > http [FIN, ACK] Seq=1850961 Ack=795809509
Win=1526272 Len=0
Any thoughts, now that there's a bit of debug info?
...Stu
On Monday, December 23, 2013 5:40:30 PM UTC-5, Stuart MacDonald wrote:
>
> On Mon, Dec 23, 2013 at 4:47 PM, Stuart MacDonald <
> stuartm.coding@gmail.com> wrote:
>
>> I took some time last Fri to see what was 180 seconds prior to the FIN
>> packet in the trace (this is the 1.7.14 failure):
>> 289223 259.418521000 10.222.3.88 10.14.10.32 HTTP
>> 1434 Continuation or non-HTTP traffic
>> 289224 259.418583000 10.14.10.32 10.222.3.88 TCP 54
>> 60442 > http [ACK] Seq=2151 Ack=358465616 Win=1280 Len=0
>> 289225 259.649415000 10.222.3.88 10.14.10.32 HTTP 1334
>> [TCP Window Full] Continuation or non-HTTP traffic
>> 289226 259.855605000 10.14.10.32 10.222.3.88 TCP 54
>> [TCP ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
>> 289227 259.868018000 10.222.3.88 10.14.10.32 HTTP 1334
>> [TCP Window Full] [TCP Retransmission] Continuation or non-HTTP traffic
>> 289228 *REF* 10.14.10.32 10.222.3.88 TCP 66 [TCP
>> ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
>> SLE=358465616 SRE=358466896
>> 289229 0.383858000 10.14.10.32 10.222.3.88 TCP 54 [TCP
>> Window Update] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=2062848 Len=0
>> 289230 0.398005000 10.222.3.88 10.14.10.32 HTTP 1434
>> Continuation or non-HTTP traffic
>> which yields
>> 444658 178.073895000 10.222.3.88 10.14.10.32 HTTP 1434
>> Continuation or non-HTTP traffic
>> 444659 178.073944000 10.14.10.32 10.222.3.88 TCP 54
>> 60442 > http [ACK] Seq=2151 Ack=550721697 Win=542208 Len=0
>> 444660 179.906931000 10.14.10.32 10.222.3.88 TCP 54
>> [TCP Window Update] 60442 > http [ACK] Seq=2151 Ack=550721697 Win=1966336
>> Len=0
>> 444661 180.152119000 10.14.10.32 10.222.3.88 TCP 54
>> 60442 > http [FIN, ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
>> 444662 180.165529000 10.222.3.88 10.14.10.32 TCP 60
>> http > 60442 [ACK] Seq=550721697 Ack=2152 Win=10240 Len=0
>>
>> If the time ref is set at (wireshark-)packet 289227 then the FIN packet
>> would be at 179.something elapsed. There are a lot of TCP zero windows in
>> the trace though, that one isn't special AFAICT. There's plenty before and
>> after it.
>>
>> If someone can let me know what the default for http-timeout is for
>> 1.8.5/ra_serf, I'll do the same check in this morning's failure and see
>> what's there.
>>
>
> Some googling
> <http://mail-archives.apache.org/mod_mbox/subversion-commits/201211.mbox/%3C20121120183435.0E86A23888E3@eris.apache.org%3E>reveals
> that the default timeout is likely to be 600 seconds. Let's see what we
> have there (this is the 1.8.5 failure):
> 708 25.462185000 10.222.3.88 10.14.10.85 TCP 1434 [TCP
> segment of a reassembled PDU]
> 709 25.462580000 10.14.10.85 10.222.3.88 TCP 54 9415 >
> http [ACK] Seq=2713 Ack=753855 Win=1280 Len=0
> 710 25.707269000 10.222.3.88 10.14.10.85 TCP 1334 [TCP
> Window Full] [TCP segment of a reassembled PDU]
> 711 25.907229000 10.14.10.85 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 9415 > http [ACK] Seq=2713 Ack=755135 Win=0 Len=0
> 712 *REF* 10.14.10.85 10.222.3.88 TCP 54 [TCP Window
> Update] 9415 > http [ACK] Seq=2713 Ack=755135 Win=3328 Len=0
> 713 0.013860000 10.222.3.88 10.14.10.85 TCP 1434 [TCP
> segment of a reassembled PDU]
> which yields
> 496958 599.344977000 10.222.3.88 10.14.10.85 HTTP 1434
> Continuation or non-HTTP traffic
> 496959 599.345056000 10.14.10.85 10.222.3.88 TCP 54 9415
> > http [ACK] Seq=2713 Ack=612899776 Win=1760768 Len=0
> 496960 599.345098000 10.14.10.85 10.222.3.88 TCP 54 [TCP
> Window Update] 9415 > http [ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
> 496961 600.011551000 10.14.10.85 10.222.3.88 TCP 54 9415
> > http [FIN, ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
> 496962 600.025110000 10.222.3.88 10.14.10.85 TCP 60 http
> > 9415 [ACK] Seq=612899776 Ack=2714 Win=12288 Len=0
>
> So that's interesting that it's also at a TCP zero window, but like the
> 1.7.14 failure, there's nothing special about that one; there are several
> before it and many after it.
>
> From 2013-12-17, 1.7.13 (WANdisco again), http-timeout=180, ra_neon:
> 380006 238.310421000 10.222.3.88 10.14.10.32 HTTP 1434
> Continuation or non-HTTP traffic
> 380007 238.310501000 10.14.10.32 10.222.3.88 TCP 54
> 12926 > http [ACK] Seq=2151 Ack=464948988 Win=1280 Len=0
> 380008 *REF* 10.222.3.88 10.14.10.32 HTTP 1334 [TCP
> Window Full] Continuation or non-HTTP traffic
> 380009 0.200088000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
> 380010 0.456070000 10.222.3.88 10.14.10.32 TCP 60 [TCP
> Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
> 380011 0.456130000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
> 380012 0.953776000 10.222.3.88 10.14.10.32 TCP 60 [TCP
> Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
> 380013 0.953813000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
> 380014 1.935785000 10.222.3.88 10.14.10.32 TCP 60 [TCP
> Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
> 380015 1.935818000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
> 380016 2.018723000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> Window Update] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=2472960 Len=0
> 380017 2.032298000 10.222.3.88 10.14.10.32 HTTP 1434
> Continuation or non-HTTP traffic
> which yields
> 444920 179.032704000 10.222.3.88 10.14.10.32 HTTP 1434
> Continuation or non-HTTP traffic
> 444921 179.032758000 10.14.10.32 10.222.3.88 TCP 54
> 12926 > http [ACK] Seq=2151 Ack=545077309 Win=198656 Len=0
> 444922 179.889276000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> Window Update] 12926 > http [ACK] Seq=2151 Ack=545077309 Win=905216 Len=0
> 444923 180.088824000 10.14.10.32 10.222.3.88 TCP 54
> 12926 > http [FIN, ACK] Seq=2151 Ack=545077309 Win=905216 Len=0
> 444924 180.103081000 10.222.3.88 10.14.10.32 TCP 60 http
> > 12926 [ACK] Seq=545077309 Ack=2152 Win=10240 Len=0
>
> From 2013-12-16, same as 2013-12-17:
> 178168 140.424974000 10.222.3.88 10.14.10.32 HTTP 1434
> Continuation or non-HTTP traffic
> 178169 140.425010000 10.14.10.32 10.222.3.88 TCP 54
> 33813 > http [ACK] Seq=2151 Ack=220218202 Win=1280 Len=0
> 178170 140.658514000 10.222.3.88 10.14.10.32 HTTP 1334
> [TCP Window Full] Continuation or non-HTTP traffic
> 178171 140.858753000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178172 141.119647000 10.222.3.88 10.14.10.32 TCP 60 [TCP
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178173 141.119690000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178174 141.627353000 10.222.3.88 10.14.10.32 TCP 60 [TCP
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178175 141.627385000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178176 142.628795000 10.222.3.88 10.14.10.32 TCP 60 [TCP
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178177 142.628839000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178178 144.619506000 10.222.3.88 10.14.10.32 TCP 60 [TCP
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178179 144.619574000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178180 148.585393000 10.222.3.88 10.14.10.32 TCP 60 [TCP
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178181 148.585447000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178182 156.502765000 10.222.3.88 10.14.10.32 TCP 60 [TCP
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178183 *REF* 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178184 15.821813000 10.222.3.88 10.14.10.32 TCP 60 [TCP
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178185 15.821845000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178186 43.448451000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> Window Update] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=2856448 Len=0
> 178187 43.461842000 10.222.3.88 10.14.10.32 HTTP 1434
> Continuation or non-HTTP traffic
> which yields
> 291476 173.722084000 10.222.3.88 10.14.10.32 HTTP 1434
> Continuation or non-HTTP traffic
> 291477 173.722125000 10.14.10.32 10.222.3.88 TCP 54
> 33813 > http [ACK] Seq=2151 Ack=361058243 Win=52224 Len=0
> 291478 178.322196000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> Window Update] 33813 > http [ACK] Seq=2151 Ack=361058243 Win=1489664 Len=0
> 291479 178.364417000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> Window Update] 33813 > http [ACK] Seq=2151 Ack=361058243 Win=3042816 Len=0
> 291480 180.974205000 10.14.10.32 10.222.3.88 TCP 54
> 33813 > http [FIN, ACK] Seq=2151 Ack=361058243 Win=3042816 Len=0
> 291481 180.988706000 10.222.3.88 10.14.10.32 TCP 60 http
> > 33813 [ACK] Seq=361058243 Ack=2152 Win=10240 Len=0
>
> Nice to see the exponential backoff at work during the TCP zero window.
>
> Maybe this is a red herring. I do find it unusual that all the http
> timeout start points seem to be during TCP zero windows.
>
> ...Stu
>
Re: Slow VM, svn client drops connection with FIN packet
Posted by Stuart MacDonald <st...@gmail.com>.
On Mon, Dec 23, 2013 at 4:47 PM, Stuart MacDonald
<st...@gmail.com>wrote:
> I took some time last Fri to see what was 180 seconds prior to the FIN
> packet in the trace (this is the 1.7.14 failure):
> 289223 259.418521000 10.222.3.88 10.14.10.32 HTTP 1434
> Continuation or non-HTTP traffic
> 289224 259.418583000 10.14.10.32 10.222.3.88 TCP 54
> 60442 > http [ACK] Seq=2151 Ack=358465616 Win=1280 Len=0
> 289225 259.649415000 10.222.3.88 10.14.10.32 HTTP 1334
> [TCP Window Full] Continuation or non-HTTP traffic
> 289226 259.855605000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
> 289227 259.868018000 10.222.3.88 10.14.10.32 HTTP 1334
> [TCP Window Full] [TCP Retransmission] Continuation or non-HTTP traffic
> 289228 *REF* 10.14.10.32 10.222.3.88 TCP 66 [TCP
> ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
> SLE=358465616 SRE=358466896
> 289229 0.383858000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> Window Update] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=2062848 Len=0
> 289230 0.398005000 10.222.3.88 10.14.10.32 HTTP 1434
> Continuation or non-HTTP traffic
> which yields
> 444658 178.073895000 10.222.3.88 10.14.10.32 HTTP 1434
> Continuation or non-HTTP traffic
> 444659 178.073944000 10.14.10.32 10.222.3.88 TCP 54
> 60442 > http [ACK] Seq=2151 Ack=550721697 Win=542208 Len=0
> 444660 179.906931000 10.14.10.32 10.222.3.88 TCP 54 [TCP
> Window Update] 60442 > http [ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
> 444661 180.152119000 10.14.10.32 10.222.3.88 TCP 54
> 60442 > http [FIN, ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
> 444662 180.165529000 10.222.3.88 10.14.10.32 TCP 60 http
> > 60442 [ACK] Seq=550721697 Ack=2152 Win=10240 Len=0
>
> If the time ref is set at (wireshark-)packet 289227 then the FIN packet
> would be at 179.something elapsed. There are a lot of TCP zero windows in
> the trace though, that one isn't special AFAICT. There's plenty before and
> after it.
>
> If someone can let me know what the default for http-timeout is for
> 1.8.5/ra_serf, I'll do the same check in this morning's failure and see
> what's there.
>
Some googling
<http://mail-archives.apache.org/mod_mbox/subversion-commits/201211.mbox/%3C20121120183435.0E86A23888E3@eris.apache.org%3E>reveals
that the default timeout is likely to be 600 seconds. Let's see what we
have there (this is the 1.8.5 failure):
708 25.462185000 10.222.3.88 10.14.10.85 TCP 1434 [TCP
segment of a reassembled PDU]
709 25.462580000 10.14.10.85 10.222.3.88 TCP 54 9415 >
http [ACK] Seq=2713 Ack=753855 Win=1280 Len=0
710 25.707269000 10.222.3.88 10.14.10.85 TCP 1334 [TCP
Window Full] [TCP segment of a reassembled PDU]
711 25.907229000 10.14.10.85 10.222.3.88 TCP 54 [TCP
ZeroWindow] 9415 > http [ACK] Seq=2713 Ack=755135 Win=0 Len=0
712 *REF* 10.14.10.85 10.222.3.88 TCP 54 [TCP Window
Update] 9415 > http [ACK] Seq=2713 Ack=755135 Win=3328 Len=0
713 0.013860000 10.222.3.88 10.14.10.85 TCP 1434 [TCP
segment of a reassembled PDU]
which yields
496958 599.344977000 10.222.3.88 10.14.10.85 HTTP 1434
Continuation or non-HTTP traffic
496959 599.345056000 10.14.10.85 10.222.3.88 TCP 54 9415
> http [ACK] Seq=2713 Ack=612899776 Win=1760768 Len=0
496960 599.345098000 10.14.10.85 10.222.3.88 TCP 54 [TCP
Window Update] 9415 > http [ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
496961 600.011551000 10.14.10.85 10.222.3.88 TCP 54 9415
> http [FIN, ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
496962 600.025110000 10.222.3.88 10.14.10.85 TCP 60 http
> 9415 [ACK] Seq=612899776 Ack=2714 Win=12288 Len=0
So that's interesting that it's also at a TCP zero window, but like the
1.7.14 failure, there's nothing special about that one; there are several
before it and many after it.
>From 2013-12-17, 1.7.13 (WANdisco again), http-timeout=180, ra_neon:
380006 238.310421000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
380007 238.310501000 10.14.10.32 10.222.3.88 TCP 54 12926
> http [ACK] Seq=2151 Ack=464948988 Win=1280 Len=0
380008 *REF* 10.222.3.88 10.14.10.32 HTTP 1334 [TCP
Window Full] Continuation or non-HTTP traffic
380009 0.200088000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
380010 0.456070000 10.222.3.88 10.14.10.32 TCP 60 [TCP
Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
380011 0.456130000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
380012 0.953776000 10.222.3.88 10.14.10.32 TCP 60 [TCP
Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
380013 0.953813000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
380014 1.935785000 10.222.3.88 10.14.10.32 TCP 60 [TCP
Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
380015 1.935818000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
380016 2.018723000 10.14.10.32 10.222.3.88 TCP 54 [TCP
Window Update] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=2472960 Len=0
380017 2.032298000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
which yields
444920 179.032704000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
444921 179.032758000 10.14.10.32 10.222.3.88 TCP 54 12926
> http [ACK] Seq=2151 Ack=545077309 Win=198656 Len=0
444922 179.889276000 10.14.10.32 10.222.3.88 TCP 54 [TCP
Window Update] 12926 > http [ACK] Seq=2151 Ack=545077309 Win=905216 Len=0
444923 180.088824000 10.14.10.32 10.222.3.88 TCP 54 12926
> http [FIN, ACK] Seq=2151 Ack=545077309 Win=905216 Len=0
444924 180.103081000 10.222.3.88 10.14.10.32 TCP 60 http
> 12926 [ACK] Seq=545077309 Ack=2152 Win=10240 Len=0
>From 2013-12-16, same as 2013-12-17:
178168 140.424974000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
178169 140.425010000 10.14.10.32 10.222.3.88 TCP 54 33813
> http [ACK] Seq=2151 Ack=220218202 Win=1280 Len=0
178170 140.658514000 10.222.3.88 10.14.10.32 HTTP 1334
[TCP Window Full] Continuation or non-HTTP traffic
178171 140.858753000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178172 141.119647000 10.222.3.88 10.14.10.32 TCP 60 [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178173 141.119690000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178174 141.627353000 10.222.3.88 10.14.10.32 TCP 60 [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178175 141.627385000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178176 142.628795000 10.222.3.88 10.14.10.32 TCP 60 [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178177 142.628839000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178178 144.619506000 10.222.3.88 10.14.10.32 TCP 60 [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178179 144.619574000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178180 148.585393000 10.222.3.88 10.14.10.32 TCP 60 [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178181 148.585447000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178182 156.502765000 10.222.3.88 10.14.10.32 TCP 60 [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178183 *REF* 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178184 15.821813000 10.222.3.88 10.14.10.32 TCP 60 [TCP
Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
178185 15.821845000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
178186 43.448451000 10.14.10.32 10.222.3.88 TCP 54 [TCP
Window Update] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=2856448 Len=0
178187 43.461842000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
which yields
291476 173.722084000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
291477 173.722125000 10.14.10.32 10.222.3.88 TCP 54 33813
> http [ACK] Seq=2151 Ack=361058243 Win=52224 Len=0
291478 178.322196000 10.14.10.32 10.222.3.88 TCP 54 [TCP
Window Update] 33813 > http [ACK] Seq=2151 Ack=361058243 Win=1489664 Len=0
291479 178.364417000 10.14.10.32 10.222.3.88 TCP 54 [TCP
Window Update] 33813 > http [ACK] Seq=2151 Ack=361058243 Win=3042816 Len=0
291480 180.974205000 10.14.10.32 10.222.3.88 TCP 54 33813
> http [FIN, ACK] Seq=2151 Ack=361058243 Win=3042816 Len=0
291481 180.988706000 10.222.3.88 10.14.10.32 TCP 60 http
> 33813 [ACK] Seq=361058243 Ack=2152 Win=10240 Len=0
Nice to see the exponential backoff at work during the TCP zero window.
Maybe this is a red herring. I do find it unusual that all the http timeout
start points seem to be during TCP zero windows.
...Stu
Re: Slow VM, svn client drops connection with FIN packet
Posted by Stuart MacDonald <st...@gmail.com>.
On Mon, Dec 23, 2013 at 3:51 PM, Lieven Govaerts <lg...@apache.org> wrote:
> to be clear, you've tested:
> - svn 1.7.13 with ra_neon, svn 1.7.14 with ra_neon and 1.8.5 with ra_serf.
> - to an svn 1.7.7 server
> - over http, with nginx 1.0.5. as intermediate proxy
> - running 'svn up'
> - on an Ubuntu guest (version ______) using Virtualbox (version
> __________) on host OS ________.
>
> Can you fill in the blanks?
>
- on an Ubuntu guest (version 12.10) using Virtualbox (version 4.2.12
r84980) on host OS Windows 7 64-bit Enterprise SP1
For VirtualBox the guest extensions are installed; the adapter is set in
NAT mode.
The error "The server sent a truncated HTTP response body." you're
> seeing from ra_serf, means that ra_serf is busy reading a response
> from the server but all data isn't arriving.
>
The network trace shows the data arriving:
svn 1.8.5:
496958 625.444757000 10.222.3.88 10.14.10.85 HTTP 1434
Continuation or non-HTTP traffic
496959 625.444836000 10.14.10.85 10.222.3.88 TCP 54 9415
> http [ACK] Seq=2713 Ack=612899776 Win=1760768 Len=0
496960 625.444878000 10.14.10.85 10.222.3.88 TCP 54 [TCP
Window Update] 9415 > http [ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
496961 626.111331000 10.14.10.85 10.222.3.88 TCP 54 9415
> http [FIN, ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
496962 626.124890000 10.222.3.88 10.14.10.85 TCP 60 http
> 9415 [ACK] Seq=612899776 Ack=2714 Win=12288 Len=0
svn 1.7.14:
444658 437.941946000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
444659 437.941995000 10.14.10.32 10.222.3.88 TCP 54 60442
> http [ACK] Seq=2151 Ack=550721697 Win=542208 Len=0
444660 439.774982000 10.14.10.32 10.222.3.88 TCP 54 [TCP
Window Update] 60442 > http [ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
444661 440.020170000 10.14.10.32 10.222.3.88 TCP 54 60442
> http [FIN, ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
444662 440.033580000 10.222.3.88 10.14.10.32 TCP 60 http
> 60442 [ACK] Seq=550721697 Ack=2152 Win=10240 Len=0
One thing to note: a while ago my IT dept had me set "http-timeout=180" to
see if that fixed the problem. The 1.7.14 network capture is with that set
from last Fri. The 1.8.5 is without it set from this morning. I found that
setting it did reduce the frequency of occurrence, but not resolve it
entirely; I went from failing about 3 out of 5 mornings to 1 out of 5. I've
removed it (restored to using the default values) in order to have more
frequent failures for quicker testing.
We have seen this caused by proxies tampering with the response data.
> In this case I suppose the connection is shut down unexpectedly first,
> which ra_serf interprets as 'the connection closed, but I didn't get
> all data for this response yet" => error.
>
I would doubt that the proxy is tampering with the data, otherwise svn
would fail all the time, yes?
Note that the client is sending the [FIN, ACK], not the server.
> You have tested two versions of subversion, each with their own http
> client implementation (serf and neon). Both libraries have been
> implemented completely independently by different people, without
> sharing code. I find it difficult to believe that both libraries have
> the same bug, but ok, it's not impossible.
>
Agreed, it seems unlikely, but not impossible. I'd look first at the svn
layer that uses the libraries.
I took some time last Fri to see what was 180 seconds prior to the FIN
packet in the trace (this is the 1.7.14 failure):
289223 259.418521000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
289224 259.418583000 10.14.10.32 10.222.3.88 TCP 54 60442
> http [ACK] Seq=2151 Ack=358465616 Win=1280 Len=0
289225 259.649415000 10.222.3.88 10.14.10.32 HTTP 1334
[TCP Window Full] Continuation or non-HTTP traffic
289226 259.855605000 10.14.10.32 10.222.3.88 TCP 54 [TCP
ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
289227 259.868018000 10.222.3.88 10.14.10.32 HTTP 1334
[TCP Window Full] [TCP Retransmission] Continuation or non-HTTP traffic
289228 *REF* 10.14.10.32 10.222.3.88 TCP 66 [TCP
ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
SLE=358465616 SRE=358466896
289229 0.383858000 10.14.10.32 10.222.3.88 TCP 54 [TCP
Window Update] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=2062848 Len=0
289230 0.398005000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
which yields
444658 178.073895000 10.222.3.88 10.14.10.32 HTTP 1434
Continuation or non-HTTP traffic
444659 178.073944000 10.14.10.32 10.222.3.88 TCP 54 60442
> http [ACK] Seq=2151 Ack=550721697 Win=542208 Len=0
444660 179.906931000 10.14.10.32 10.222.3.88 TCP 54 [TCP
Window Update] 60442 > http [ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
444661 180.152119000 10.14.10.32 10.222.3.88 TCP 54 60442
> http [FIN, ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
444662 180.165529000 10.222.3.88 10.14.10.32 TCP 60 http
> 60442 [ACK] Seq=550721697 Ack=2152 Win=10240 Len=0
If the time ref is set at (wireshark-)packet 289227 then the FIN packet
would be at 179.something elapsed. There are a lot of TCP zero windows in
the trace though, that one isn't special AFAICT. There's plenty before and
after it.
If someone can let me know what the default for http-timeout is for
1.8.5/ra_serf, I'll do the same check in this morning's failure and see
what's there.
In case of serf I can add that if serf would actively drop the
> connection due to a timeout, you'd see a different error message. In
> fact, I've searched for error E120106 and it's only raised when a
> clean EOF is returned from the network layer, whereas in case of error
> (TCP RST) the nw layer returns a ECONNABORTED or ECONNRESET status
> code.
>
Right, but that would be if the server drops the connection? The client is
dropping the connection.
> So, my first guess would be a problem with either the guest OS, or
> with the Virtualbox network code (the latter I think is more likely).
>
The packets are being delivered, so I'd expect not, but that's an area I
didn't consider. On that note: the wireshark capture is from the Win7 host,
not the guest. I'm assuming the captures would be the same in both cases,
but that may not be true.
> 1. If you like and are allowed to, you can send me the wireshark .pcap
> file privately (replace username/passwd/basic/digest authn headers
> with X, but keep the same # of chars!).
>
I've been assuming I'm not allowed to; I'd like to leave this to a last
resort.
> Also, the debug logging of both http client libraries would be nice:
> - For svn 1.7.x put neon-debug-mask=255 in your servers file.
>
Done.
> - For 1.8.5 this won't work anymore, you'll have to compile serf from
> source to enable debugging (flags CONN_VERBOSE, SOCK_VERBOSE,
> SOCK_MSG_VERBOSE in serf_private.h).
>
I tried compiling from source last week, but it was pretty painful. I'll
try again over the holidays.
> 2. I'd also test this same operation on a physical machine, with the
> same OS and/or with a different OS.
>
The rest of my team are not running a VM and nobody else sees this problem.
Also, they've syncing the entire repository, I'm only syncing a few (large)
branches to save disk space, so their operations are heavier than mine.
Keep in mind: the post that I linked to in my original post mentions that
the problem seems to be due to the slowness of the VM while svn is
processing the incoming data. I'm expecting this to be a necessary
contributing factor, although not a sufficient factor. IOW I wouldn't
expect this problem to show up on the host OS.
Hm. Given that I'm the only person out of a team of about 10 distributed
coast to coast across Canada, doesn't that rule out the proxy?
I have TortoiseSVN installed, I'll see if I can setup something on the host
OS.
> 3. Regardless of this issue, the upgrade of nginx is important for svn
> 1.8.5, as older versions of nginx have problems with serf's use of
> HTTP 1.1 features. There's a fallback mechanism now, but at the cost
> of some reduced performance.
I'll mention this to our IT.
...Stu
Re: Slow VM, svn client drops connection with FIN packet
Posted by Lieven Govaerts <lg...@apache.org>.
Hi Stu,
On Mon, Dec 23, 2013 at 6:11 PM, <st...@gmail.com> wrote:
> On Tuesday, December 17, 2013 10:32:34 AM UTC-5, Branko Čibej wrote:
>>
>> On 17.12.2013 16:19, Stuart MacDonald wrote:
>> > The Wireshark trace shows clearly and conclusively this is a bug in
>> > the client. The client drops the connection by sending a FIN packet
>> > unexpectedly.
>>
>> It could be waiting for a response from the proxy that never reaches the
>> client, and dropping the connection after a timeout.
>>
>> Regardless, you should still try to reproduce this with the latest
>> client (ideally, both latest 1.7 and 1.8) to determine if the problem
>> has been fixed. Especially in 1.8, there have been a lot of improvements
>> in the HTTP protocol driver.
>
>
> I'm using WANdisco's packages to test with, as it turned out to be easier to
> get those set up than to compile from source with all the optional bits I
> need.
>
> I have seen the exact same issue on 1.7.14:
> svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read response
> body: connection was closed by server (http://<server>)
> and the Wireshark capture shows the client sending a [FIN, ACK] to close the
> connection, despite it only being a fraction of a second since the last
> server traffic. This always occurs just after the TCP window opening up
> again, not sure if that's significant.
>
> This *also* occurs on 1.8.5, although the error message is more informative:
> svn: E120106: ra_serf: The server sent a truncated HTTP response body.
> The Wireshark capture is *exactly* *identical*; after some server traffic,
> the TCP window opens up and the client sends a [FIN, ACK] to close the
> connection.
to be clear, you've tested:
- svn 1.7.13 with ra_neon, svn 1.7.14 with ra_neon and 1.8.5 with ra_serf.
- to an svn 1.7.7 server
- over http, with nginx 1.0.5. as intermediate proxy
- running 'svn up'
- on an Ubuntu guest (version ______) using Virtualbox (version
__________) on host OS ________.
Can you fill in the blanks?
The error "The server sent a truncated HTTP response body." you're
seeing from ra_serf, means that ra_serf is busy reading a response
from the server but all data isn't arriving.
We have seen this caused by proxies tampering with the response data.
In this case I suppose the connection is shut down unexpectedly first,
which ra_serf interprets as 'the connection closed, but I didn't get
all data for this response yet" => error.
> Having taken a closer look at the network captures; the failure pattern is
> always:
> - 1380 byte packets from the server
> - an ACK from the client
> - a TCP window update from the client (the window hadn't gone to 0 in any
> cases, this is just a regular update)
> - in one case there's another window update from the client
> - a delay of 0 - 4 seconds where there is no traffic in either direction
> - the client sends the [FIN, ACK]
>
>
> Now, I'm not sure what constitutes a truncated HTTP response body, as the
> traffic looks the same in both cases, large blocks of encoded data. So I
> think the 1.8.x error message is also incorrect, but possibly less incorrect
> than the 1.7.x error message.
>
Your setup has many different layers so many possible failing components.
>From your transcript of the wireshark trace it looks like the
connection is shut down on the client side (like you said).
You have tested two versions of subversion, each with their own http
client implementation (serf and neon). Both libraries have been
implemented completely independently by different people, without
sharing code. I find it difficult to believe that both libraries have
the same bug, but ok, it's not impossible.
In case of serf I can add that if serf would actively drop the
connection due to a timeout, you'd see a different error message. In
fact, I've searched for error E120106 and it's only raised when a
clean EOF is returned from the network layer, whereas in case of error
(TCP RST) the nw layer returns a ECONNABORTED or ECONNRESET status
code.
So, my first guess would be a problem with either the guest OS, or
with the Virtualbox network code (the latter I think is more likely).
> What's the next step?
1. If you like and are allowed to, you can send me the wireshark .pcap
file privately (replace username/passwd/basic/digest authn headers
with X, but keep the same # of chars!).
Also, the debug logging of both http client libraries would be nice:
- For svn 1.7.x put neon-debug-mask=255 in your servers file.
- For 1.8.5 this won't work anymore, you'll have to compile serf from
source to enable debugging (flags CONN_VERBOSE, SOCK_VERBOSE,
SOCK_MSG_VERBOSE in serf_private.h).
Maybe with this information I can confirm my reasoning or find another
potential cause.
2. I'd also test this same operation on a physical machine, with the
same OS and/or with a different OS.
3. Regardless of this issue, the upgrade of nginx is important for svn
1.8.5, as older versions of nginx have problems with serf's use of
HTTP 1.1 features. There's a fallback mechanism now, but at the cost
of some reduced performance.
>
> ...Stu
>
Lieven
Re: Slow VM, svn client drops connection with FIN packet
Posted by st...@gmail.com.
On Tuesday, December 17, 2013 10:32:34 AM UTC-5, Branko Čibej wrote:
>
> On 17.12.2013 16:19, Stuart MacDonald wrote:
> > The Wireshark trace shows clearly and conclusively this is a bug in
> > the client. The client drops the connection by sending a FIN packet
> > unexpectedly.
>
> It could be waiting for a response from the proxy that never reaches the
> client, and dropping the connection after a timeout.
>
> Regardless, you should still try to reproduce this with the latest
> client (ideally, both latest 1.7 and 1.8) to determine if the problem
> has been fixed. Especially in 1.8, there have been a lot of improvements
> in the HTTP protocol driver.
>
I'm using WANdisco's packages to test with, as it turned out to be easier
to get those set up than to compile from source with all the optional bits
I need.
I have seen the exact same issue on 1.7.14:
svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
response body: connection was closed by server (http://<server>)
and the Wireshark capture shows the client sending a [FIN, ACK] to close
the connection, despite it only being a fraction of a second since the last
server traffic. This always occurs just after the TCP window opening up
again, not sure if that's significant.
This *also* occurs on 1.8.5, although the error message is more informative:
svn: E120106: ra_serf: The server sent a truncated HTTP response body.
The Wireshark capture is *exactly* *identical*; after some server traffic,
the TCP window opens up and the client sends a [FIN, ACK] to close the
connection.
Having taken a closer look at the network captures; the failure pattern is
always:
- 1380 byte packets from the server
- an ACK from the client
- a TCP window update from the client (the window hadn't gone to 0 in any
cases, this is just a regular update)
- in one case there's another window update from the client
- a delay of 0 - 4 seconds where there is no traffic in either direction
- the client sends the [FIN, ACK]
Now, I'm not sure what constitutes a truncated HTTP response body, as the
traffic looks the same in both cases, large blocks of encoded data. So I
think the 1.8.x error message is also incorrect, but possibly less
incorrect than the 1.7.x error message.
What's the next step?
...Stu
Re: Slow VM, svn client drops connection with FIN packet
Posted by Branko Čibej <br...@wandisco.com>.
On 17.12.2013 16:19, Stuart MacDonald wrote:
> On Monday, December 16, 2013 1:45:11 PM UTC-5, Branko Čibej wrote:
>
> So your server is using an nginx proxy, and your 1.7 client
> doesn't work with it. The thing to do is to try to reproduce this
> with 1.8, and if it's reproducible, it's still most likely a proxy
> bug (nginx 1.0.5 is rather old). Reporting this to the server
> administrator would be the next step.
>
>
> The Wireshark trace shows clearly and conclusively this is a bug in
> the client. The client drops the connection by sending a FIN packet
> unexpectedly.
It could be waiting for a response from the proxy that never reaches the
client, and dropping the connection after a timeout.
Regardless, you should still try to reproduce this with the latest
client (ideally, both latest 1.7 and 1.8) to determine if the problem
has been fixed. Especially in 1.8, there have been a lot of improvements
in the HTTP protocol driver.
-- Brane
--
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com
Re: Slow VM, svn client drops connection with FIN packet
Posted by Stuart MacDonald <st...@gmail.com>.
On Monday, December 16, 2013 1:45:11 PM UTC-5, Branko Čibej wrote:
>
> So your server is using an nginx proxy, and your 1.7 client doesn't work
> with it. The thing to do is to try to reproduce this with 1.8, and if it's
> reproducible, it's still most likely a proxy bug (nginx 1.0.5 is rather
> old). Reporting this to the server administrator would be the next step.
>
The Wireshark trace shows clearly and conclusively this is a bug in the
client. The client drops the connection by sending a FIN packet
unexpectedly.
...Stu
Re: Slow VM, svn client drops connection with FIN packet
Posted by Branko Čibej <br...@wandisco.com>.
On 16.12.2013 18:45, Stuart MacDonald wrote:
> The link I provided is *exactly* what I'm seeing, so I didn't see the
> need to repeat that post. I'm in a VM, the client drops the connection
> with an erroneous [FIN, ACK], just after the TCP window opens up again.
>
> Hm, one detail I can provide now that's not in that post:
> U directory/file1
> U directory/file2
> svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
> response body: connection was closed by server (http://<server>)
>
> I don't know what's on the server side, and may not be able to find
> out, but I'll inquire. Hm, actually the network trace shows "Server:
> nginx/1.0.5". Client operation is 'svn up'.
So your server is using an nginx proxy, and your 1.7 client doesn't work
with it. The thing to do is to try to reproduce this with 1.8, and if
it's reproducible, it's still most likely a proxy bug (nginx 1.0.5 is
rather old). Reporting this to the server administrator would be the
next step.
-- Brane
--
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com
Re: Slow VM, svn client drops connection with FIN packet
Posted by Stuart MacDonald <st...@gmail.com>.
The link I provided is *exactly* what I'm seeing, so I didn't see the need
to repeat that post. I'm in a VM, the client drops the connection with an
erroneous [FIN, ACK], just after the TCP window opens up again.
Hm, one detail I can provide now that's not in that post:
U directory/file1
U directory/file2
svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
response body: connection was closed by server (http://<server>)
I don't know what's on the server side, and may not be able to find out,
but I'll inquire. Hm, actually the network trace shows "Server:
nginx/1.0.5". Client operation is 'svn up'.
What else would you like me to do/provide?
...Stu
On Mon, Dec 16, 2013 at 12:05 PM, Ben Reser <be...@reser.org> wrote:
> On Mon Dec 16 07:56:32 2013, Stuart MacDonald wrote:
> >
> > I am seeing exactly this
> > issue:
> https://groups.google.com/forum/#!topic/subversion_users/AZ4M62Qwmjc
> > but do not find a bug report for it in the database. The linked bug is
> for
> > something similar but unrelated. Can I file this?
> >
> > ...Stu
> >
> > stumacdonald@stumacdonald-VirtualBox:~/src/ianywhere$ svn --version
> > svn, version 1.7.13 (r1516569)
> > compiled Aug 27 2013, 09:06:07
> >
> > Copyright (C) 2013 The Apache Software Foundation.
> > This software consists of contributions made by many people; see the
> NOTICE
> > file for more information.
> > Subversion is open source software, see http://subversion.apache.org/
> >
> > The following repository access (RA) modules are available:
> >
> > * ra_neon : Module for accessing a repository via WebDAV protocol using
> Neon.
> > - handles 'http' scheme
> > - handles 'https' scheme
> > * ra_svn : Module for accessing a repository using the svn network
> protocol.
> > - with Cyrus SASL authentication
> > - handles 'svn' scheme
> > * ra_local : Module for accessing a repository on local disk.
> > - handles 'file' scheme
>
> Before reporting an issue you really ought to try it with the most
> recent version of Subversion (1.7.14 and 1.8.5). Since 1.8.0 we only
> use the serf http-library. If you can trigger the same issue with
> 1.8.5 then that's likely a sign that we aren't tending the TCP
> connections as we should (vs the neon http library that your version
> output shows you're using with 1.7.13). Another point is that you
> really should provide more details here. What versions of httpd and
> svn are on the server side? What client operation(s) are you having
> the issue with.
>
> Without a lot more research on your part the bug isn't likely to get
> much interest.
>
Re: Slow VM, svn client drops connection with FIN packet
Posted by Ben Reser <be...@reser.org>.
On Mon Dec 16 07:56:32 2013, Stuart MacDonald wrote:
>
> I am seeing exactly this
> issue: https://groups.google.com/forum/#!topic/subversion_users/AZ4M62Qwmjc
> but do not find a bug report for it in the database. The linked bug is for
> something similar but unrelated. Can I file this?
>
> ...Stu
>
> stumacdonald@stumacdonald-VirtualBox:~/src/ianywhere$ svn --version
> svn, version 1.7.13 (r1516569)
> compiled Aug 27 2013, 09:06:07
>
> Copyright (C) 2013 The Apache Software Foundation.
> This software consists of contributions made by many people; see the NOTICE
> file for more information.
> Subversion is open source software, see http://subversion.apache.org/
>
> The following repository access (RA) modules are available:
>
> * ra_neon : Module for accessing a repository via WebDAV protocol using Neon.
> - handles 'http' scheme
> - handles 'https' scheme
> * ra_svn : Module for accessing a repository using the svn network protocol.
> - with Cyrus SASL authentication
> - handles 'svn' scheme
> * ra_local : Module for accessing a repository on local disk.
> - handles 'file' scheme
Before reporting an issue you really ought to try it with the most
recent version of Subversion (1.7.14 and 1.8.5). Since 1.8.0 we only
use the serf http-library. If you can trigger the same issue with
1.8.5 then that's likely a sign that we aren't tending the TCP
connections as we should (vs the neon http library that your version
output shows you're using with 1.7.13). Another point is that you
really should provide more details here. What versions of httpd and
svn are on the server side? What client operation(s) are you having
the issue with.
Without a lot more research on your part the bug isn't likely to get
much interest.
Re: Slow VM, svn client drops connection with FIN packet
Posted by Thorsten Schöning <ts...@am-soft.de>.
Guten Tag Stuart MacDonald,
am Montag, 16. Dezember 2013 um 18:31 schrieben Sie:
> If someone can point me at a Ubuntu-compatible package[...]
We use the package form Wandisco on our LTS 12.04.
http://subversion.apache.org/packages.html#ubuntu
Mit freundlichen Grüßen,
Thorsten Schöning
--
Thorsten Schöning E-Mail:Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme http://www.AM-SoFT.de/
Telefon...........05151- 9468- 55
Fax...............05151- 9468- 88
Mobil..............0178-8 9468- 04
AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow
Re: Slow VM, svn client drops connection with FIN packet
Posted by Stuart MacDonald <st...@gmail.com>.
On Mon, Dec 16, 2013 at 12:45 PM, Mark Phippard <ma...@gmail.com> wrote:
> On Mon, Dec 16, 2013 at 12:31 PM, Stuart MacDonald <
> stuartm.coding@gmail.com> wrote:
>
>> If someone can point me at a Ubuntu-compatible package, I'm more than
>> happy to upgrade. Last time I looked (within the last two months) I was
>> unable to find anything later than what I'm running. I'd rather not spend
>> the time compiling from source if I can avoid it.
>>
>
> http://subversion.apache.org/packages.html#ubuntu
>
Yeah. The Ubuntu packages don't have either 1.7.14 or 1.8.5. The WANdisco
packages require registration. What I said stands, there are no current
Ubuntu-compatible packages.
So, compiling from sources.
...Stu
Re: Slow VM, svn client drops connection with FIN packet
Posted by Mark Phippard <ma...@gmail.com>.
On Mon, Dec 16, 2013 at 12:31 PM, Stuart MacDonald <stuartm.coding@gmail.com
> wrote:
> If someone can point me at a Ubuntu-compatible package, I'm more than
> happy to upgrade. Last time I looked (within the last two months) I was
> unable to find anything later than what I'm running. I'd rather not spend
> the time compiling from source if I can avoid it.
>
http://subversion.apache.org/packages.html#ubuntu
Please elucidate "capture traces". I already have a Wireshark capture of
> the failure. I see exactly what the poster saw: the client drops the
> connection with an unexplained [FIN, ACK].
>
I meant capture new traces once you are on 1.8.5 (assuming the problem
still exists). Different libraries are used now, so the error might
manifest differently.
> This is easily recreated for me: I run this update once a morning, and see
> the failure about 3 days out of 5. I've been working with my internal IT
> group, but they haven't been able to help much. They did have me add
> 'http-timeout = 180" to the .subversion/servers [global] section; this cut
> down the failures to about 1 out of 5 days.
>
Simply saying that this is not the sort of error that a dev can attach a
debugger to, so having traces from the latest version might help.
> Agreed; getting the bug fixed is the victory. However, this is clearly two
> bugs: 1) the error message is wrong, 2) the client is dropping the
> connection. Those need to be in the bug database unless they are already
> present, or already fixed. Neither seemed to be the case when I posted.
>
You are not using 1.8, so how do you know? The vast majority of bugs do
not go through the tracker. I am simply pointing out that getting this
into the tracker will not get it any more attention.
--
Thanks
Mark Phippard
http://markphip.blogspot.com/
Re: Slow VM, svn client drops connection with FIN packet
Posted by Stuart MacDonald <st...@gmail.com>.
If someone can point me at a Ubuntu-compatible package, I'm more than happy
to upgrade. Last time I looked (within the last two months) I was unable to
find anything later than what I'm running. I'd rather not spend the time
compiling from source if I can avoid it.
Please elucidate "capture traces". I already have a Wireshark capture of
the failure. I see exactly what the poster saw: the client drops the
connection with an unexplained [FIN, ACK].
This is easily recreated for me: I run this update once a morning, and see
the failure about 3 days out of 5. I've been working with my internal IT
group, but they haven't been able to help much. They did have me add
'http-timeout = 180" to the .subversion/servers [global] section; this cut
down the failures to about 1 out of 5 days.
Agreed; getting the bug fixed is the victory. However, this is clearly two
bugs: 1) the error message is wrong, 2) the client is dropping the
connection. Those need to be in the bug database unless they are already
present, or already fixed. Neither seemed to be the case when I posted.
...Stu
On Mon, Dec 16, 2013 at 11:59 AM, Mark Phippard <ma...@gmail.com> wrote:
> On Mon, Dec 16, 2013 at 10:56 AM, Stuart MacDonald <
> stuartm.coding@gmail.com> wrote:
>
>>
>> I am seeing exactly this issue:
>> https://groups.google.com/forum/#!topic/subversion_users/AZ4M62Qwmjc but
>> do not find a bug report for it in the database. The linked bug is for
>> something similar but unrelated. Can I file this?
>>
>> ...Stu
>>
>> stumacdonald@stumacdonald-VirtualBox:~/src/ianywhere$ svn --version
>> svn, version 1.7.13 (r1516569)
>> compiled Aug 27 2013, 09:06:07
>>
>
>
> Subversion 1.8 has an entirely new HTTP networking layer. So the first
> thing you should do is start with the latest version (1.8.5) so you can see
> if the problem is either fixed or manifests differently. You will likely
> also need to capture traces etc. to have any chance of getting this solved
> since it is not something that can be easily recreated.
>
> Finally, getting something filed in the issue tracker is not a victory.
> This mailing list is the best place to get attention for your problem and
> interact with others etc.
>
> --
> Thanks
>
> Mark Phippard
> http://markphip.blogspot.com/
>
Re: Slow VM, svn client drops connection with FIN packet
Posted by Mark Phippard <ma...@gmail.com>.
On Mon, Dec 16, 2013 at 10:56 AM, Stuart MacDonald <stuartm.coding@gmail.com
> wrote:
>
> I am seeing exactly this issue:
> https://groups.google.com/forum/#!topic/subversion_users/AZ4M62Qwmjc but
> do not find a bug report for it in the database. The linked bug is for
> something similar but unrelated. Can I file this?
>
> ...Stu
>
> stumacdonald@stumacdonald-VirtualBox:~/src/ianywhere$ svn --version
> svn, version 1.7.13 (r1516569)
> compiled Aug 27 2013, 09:06:07
>
Subversion 1.8 has an entirely new HTTP networking layer. So the first
thing you should do is start with the latest version (1.8.5) so you can see
if the problem is either fixed or manifests differently. You will likely
also need to capture traces etc. to have any chance of getting this solved
since it is not something that can be easily recreated.
Finally, getting something filed in the issue tracker is not a victory.
This mailing list is the best place to get attention for your problem and
interact with others etc.
--
Thanks
Mark Phippard
http://markphip.blogspot.com/