You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@subversion.apache.org by st...@gmail.com on 2014/01/13 18:07:01 UTC

Re: Slow VM, svn client drops connection with FIN packet

I have another failure, with some debug info, and some updates.

We're switching to a different system for this particular repository, so I 
may or may not be able to continue debug efforts. :-(

Since there's no debug info available for the svn 1.8 client, I haven't 
been running that.

I went a full week without a failure, which I suspect is due to the 
enormous amount of debugging that neon-debug-mask=255 produces, so this 
week I dialed it back to 127 and got a failure on the first try.

Here's the latest failure. First up, the debug info:
Got 8192 bytes.
XML: Parsing 8192 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Got 6600 bytes.
XML: Parsing 6600 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Aborted request (-3): Could not read response body
sess: Closing connection.
sess: Connection closed.
Request ends, status 200 class 2xx, error line:
Could not read response body: connection was closed by server
Running destroy hooks.
Request ends.
svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read 
response body: connection was closed by server (http://<server>)
sess: Destroying session.
sess: Destroying session.

The capture info, with the default timeout which I am assuming is 600 
seconds:
77081    82.621499000    10.222.3.88    10.14.11.50    HTTP    1434    
Continuation or non-HTTP traffic
77082    82.621541000    10.14.11.50    10.222.3.88    TCP    54    36603 > 
http [ACK] Seq=1850961 Ack=92395908 Win=1280 Len=0
77083    *REF*    10.222.3.88    10.14.11.50    HTTP    1334    [TCP Window 
Full] Continuation or non-HTTP traffic
77084    0.200407000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] 36603 > http [ACK] Seq=1850961 Ack=92397188 Win=0 Len=0
77085    0.458337000    10.222.3.88    10.14.11.50    TCP    60    [TCP 
Keep-Alive] http > 36603 [ACK] Seq=92397187 Ack=1850961 Win=893952 Len=0
77086    0.458390000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] 36603 > http [ACK] Seq=1850961 Ack=92397188 Win=0 Len=0
77087    0.960584000    10.222.3.88    10.14.11.50    TCP    60    [TCP 
Keep-Alive] http > 36603 [ACK] Seq=92397187 Ack=1850961 Win=893952 Len=0
77088    0.960653000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] 36603 > http [ACK] Seq=1850961 Ack=92397188 Win=0 Len=0
77089    1.950177000    10.222.3.88    10.14.11.50    TCP    60    [TCP 
Keep-Alive] http > 36603 [ACK] Seq=92397187 Ack=1850961 Win=893952 Len=0
77090    1.950213000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] 36603 > http [ACK] Seq=1850961 Ack=92397188 Win=0 Len=0
77092    2.991154000    10.222.3.88    10.14.11.50    HTTP    1434    
Continuation or non-HTTP traffic
which gives
395638    524.989307000    10.222.3.88    10.14.11.50    HTTP    1434    
Continuation or non-HTTP traffic
395639    524.989401000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=763605828 
Win=746496 Len=0
395640    524.989606000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=763615488 
Win=736768 Len=0
395641    524.989883000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=763629288 
Win=722944 Len=0
395642    524.990140000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=763643088 
Win=709120 Len=0
395643    525.869551000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=766011168 
Win=1366016 Len=0
395644    526.501877000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=767377268 Win=0 Len=0
395645    527.769324000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768063228 
Win=1101056 Len=0
395646    527.783100000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768128088 
Win=1116416 Len=0
395647    527.786772000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768215028 
Win=1154816 Len=0
395648    527.815465000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768347508 
Win=1218304 Len=0
395649    527.818509000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768373728 
Win=1204736 Len=0
395650    527.872518000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=768698028 
Win=1233664 Len=0
395651    528.026681000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=769930368 
Win=1239040 Len=0
395652    528.086233000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=770440968 
Win=1298432 Len=0
395653    528.683101000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=771739448 Win=0 Len=0
395654    529.544542000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=772977408 
Win=757504 Len=0
395655    529.591855000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=773252028 
Win=798976 Len=0
395656    529.660855000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=773881308 
Win=423424 Len=0
395657    529.688608000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=774109008 
Win=669184 Len=0
395658    529.860154000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=775638048 
Win=611328 Len=0
395659    530.033617000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=777532788 
Win=1091328 Len=0
395660    530.224557000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=779399928 
Win=1686272 Len=0
395661    530.375431000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=781086288 
Win=716032 Len=0
395662    531.350496000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=785885928 
Win=739584 Len=0
395663    531.625875000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=786624228 Win=1280 
Len=0
395664    532.084955000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=786625508 Win=0 Len=0
395665    532.880166000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=786974748 
Win=721664 Len=0
395666    532.910934000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787118268 
Win=832000 Len=0
395667    532.920303000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787183128 
Win=785152 Len=0
395668    532.955237000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787427388 
Win=1219072 Len=0
395669    532.955463000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
Window Update] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=787427388 Win=1248768 Len=0
395670    532.960093000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787490868 
Win=1323264 Len=0
395671    532.970228000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787557108 
Win=1404672 Len=0
395672    532.984501000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787678548 
Win=1502720 Len=0
395673    532.999351000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=787786188 
Win=1514496 Len=0
395674    533.019030000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=788026308 
Win=1484800 Len=0
395675    533.032385000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=788144988 
Win=1488896 Len=0
395676    533.047272000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=788299548 
Win=1450240 Len=0
395677    533.188960000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=790268808 
Win=953344 Len=0
395678    533.984603000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=794394908 Win=0 Len=0
395679    549.627263000    10.222.3.88    10.14.11.50    TCP    60    [TCP 
Previous segment not captured] http > 36603 [ACK] Seq=794394907 Ack=1850961 
Win=893952 Len=0
395680    549.627315000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=794394908 Win=0 Len=0
395681    565.449108000    10.222.3.88    10.14.11.50    TCP    60    [TCP 
Dup ACK 395679#1] http > 36603 [ACK] Seq=794394907 Ack=1850961 Win=893952 
Len=0
395682    565.449169000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ZeroWindow] [TCP ACKed unseen segment] 36603 > http [ACK] Seq=1850961 
Ack=794394908 Win=0 Len=0
395683    599.797841000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=794858688 
Win=1141248 Len=0
395684    599.810002000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=794875248 
Win=1172992 Len=0
395685    599.825538000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=794978748 
Win=1188096 Len=0
395686    599.839904000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=795000828 
Win=1233664 Len=0
395687    599.843870000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=795054648 
Win=1185280 Len=0
395688    599.857927000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [ACK] Seq=1850961 Ack=795096048 
Win=1149440 Len=0
395689    600.137326000    10.14.11.50    10.222.3.88    TCP    54    [TCP 
ACKed unseen segment] 36603 > http [FIN, ACK] Seq=1850961 Ack=795809509 
Win=1526272 Len=0

Any thoughts, now that there's a bit of debug info?

...Stu

On Monday, December 23, 2013 5:40:30 PM UTC-5, Stuart MacDonald wrote:
>
> On Mon, Dec 23, 2013 at 4:47 PM, Stuart MacDonald <
> stuartm.coding@gmail.com> wrote:
>
>> I took some time last Fri to see what was 180 seconds prior to the FIN 
>> packet in the trace (this is the 1.7.14 failure):
>>  289223    259.418521000    10.222.3.88    10.14.10.32    HTTP    
>> 1434    Continuation or non-HTTP traffic
>> 289224    259.418583000    10.14.10.32    10.222.3.88    TCP    54    
>> 60442 > http [ACK] Seq=2151 Ack=358465616 Win=1280 Len=0
>> 289225    259.649415000    10.222.3.88    10.14.10.32    HTTP    1334    
>> [TCP Window Full] Continuation or non-HTTP traffic
>> 289226    259.855605000    10.14.10.32    10.222.3.88    TCP    54    
>> [TCP ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0
>> 289227    259.868018000    10.222.3.88    10.14.10.32    HTTP    1334    
>> [TCP Window Full] [TCP Retransmission] Continuation or non-HTTP traffic
>> 289228    *REF*    10.14.10.32    10.222.3.88    TCP    66    [TCP 
>> ZeroWindow] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=0 Len=0 
>> SLE=358465616 SRE=358466896
>> 289229    0.383858000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
>> Window Update] 60442 > http [ACK] Seq=2151 Ack=358466896 Win=2062848 Len=0
>> 289230    0.398005000    10.222.3.88    10.14.10.32    HTTP    1434    
>> Continuation or non-HTTP traffic
>> which yields
>> 444658    178.073895000    10.222.3.88    10.14.10.32    HTTP    1434    
>> Continuation or non-HTTP traffic
>> 444659    178.073944000    10.14.10.32    10.222.3.88    TCP    54    
>> 60442 > http [ACK] Seq=2151 Ack=550721697 Win=542208 Len=0
>> 444660    179.906931000    10.14.10.32    10.222.3.88    TCP    54    
>> [TCP Window Update] 60442 > http [ACK] Seq=2151 Ack=550721697 Win=1966336 
>> Len=0
>> 444661    180.152119000    10.14.10.32    10.222.3.88    TCP    54    
>> 60442 > http [FIN, ACK] Seq=2151 Ack=550721697 Win=1966336 Len=0
>> 444662    180.165529000    10.222.3.88    10.14.10.32    TCP    60    
>> http > 60442 [ACK] Seq=550721697 Ack=2152 Win=10240 Len=0
>>  
>> If the time ref is set at (wireshark-)packet 289227 then the FIN packet 
>> would be at 179.something elapsed. There are a lot of TCP zero windows in 
>> the trace though, that one isn't special AFAICT. There's plenty before and 
>> after it.
>>
>> If someone can let me know what the default for http-timeout is for 
>> 1.8.5/ra_serf, I'll do the same check in this morning's failure and see 
>> what's there.
>>
>
> Some googling 
> <http://mail-archives.apache.org/mod_mbox/subversion-commits/201211.mbox/%3C20121120183435.0E86A23888E3@eris.apache.org%3E>reveals 
> that the default timeout is likely to be 600 seconds. Let's see what we 
> have there (this is the 1.8.5 failure):
> 708    25.462185000    10.222.3.88    10.14.10.85    TCP    1434    [TCP 
> segment of a reassembled PDU]
> 709    25.462580000    10.14.10.85    10.222.3.88    TCP    54    9415 > 
> http [ACK] Seq=2713 Ack=753855 Win=1280 Len=0
> 710    25.707269000    10.222.3.88    10.14.10.85    TCP    1334    [TCP 
> Window Full] [TCP segment of a reassembled PDU]
> 711    25.907229000    10.14.10.85    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 9415 > http [ACK] Seq=2713 Ack=755135 Win=0 Len=0
> 712    *REF*    10.14.10.85    10.222.3.88    TCP    54    [TCP Window 
> Update] 9415 > http [ACK] Seq=2713 Ack=755135 Win=3328 Len=0
> 713    0.013860000    10.222.3.88    10.14.10.85    TCP    1434    [TCP 
> segment of a reassembled PDU]
> which yields
> 496958    599.344977000    10.222.3.88    10.14.10.85    HTTP    1434    
> Continuation or non-HTTP traffic
> 496959    599.345056000    10.14.10.85    10.222.3.88    TCP    54    9415 
> > http [ACK] Seq=2713 Ack=612899776 Win=1760768 Len=0
> 496960    599.345098000    10.14.10.85    10.222.3.88    TCP    54    [TCP 
> Window Update] 9415 > http [ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
> 496961    600.011551000    10.14.10.85    10.222.3.88    TCP    54    9415 
> > http [FIN, ACK] Seq=2713 Ack=612899776 Win=1762048 Len=0
> 496962    600.025110000    10.222.3.88    10.14.10.85    TCP    60    http 
> > 9415 [ACK] Seq=612899776 Ack=2714 Win=12288 Len=0
>
> So that's interesting that it's also at a TCP zero window, but like the 
> 1.7.14 failure, there's nothing special about that one; there are several 
> before it and many after it.
>
> From 2013-12-17, 1.7.13 (WANdisco again), http-timeout=180, ra_neon:
> 380006    238.310421000    10.222.3.88    10.14.10.32    HTTP    1434    
> Continuation or non-HTTP traffic
> 380007    238.310501000    10.14.10.32    10.222.3.88    TCP    54    
> 12926 > http [ACK] Seq=2151 Ack=464948988 Win=1280 Len=0
> 380008    *REF*    10.222.3.88    10.14.10.32    HTTP    1334    [TCP 
> Window Full] Continuation or non-HTTP traffic
> 380009    0.200088000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
> 380010    0.456070000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
> 380011    0.456130000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
> 380012    0.953776000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
> 380013    0.953813000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
> 380014    1.935785000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 12926 [ACK] Seq=464950267 Ack=2151 Win=10240 Len=0
> 380015    1.935818000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=0 Len=0
> 380016    2.018723000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> Window Update] 12926 > http [ACK] Seq=2151 Ack=464950268 Win=2472960 Len=0
> 380017    2.032298000    10.222.3.88    10.14.10.32    HTTP    1434    
> Continuation or non-HTTP traffic
> which yields
> 444920    179.032704000    10.222.3.88    10.14.10.32    HTTP    1434    
> Continuation or non-HTTP traffic
> 444921    179.032758000    10.14.10.32    10.222.3.88    TCP    54    
> 12926 > http [ACK] Seq=2151 Ack=545077309 Win=198656 Len=0
> 444922    179.889276000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> Window Update] 12926 > http [ACK] Seq=2151 Ack=545077309 Win=905216 Len=0
> 444923    180.088824000    10.14.10.32    10.222.3.88    TCP    54    
> 12926 > http [FIN, ACK] Seq=2151 Ack=545077309 Win=905216 Len=0
> 444924    180.103081000    10.222.3.88    10.14.10.32    TCP    60    http 
> > 12926 [ACK] Seq=545077309 Ack=2152 Win=10240 Len=0
>
> From 2013-12-16, same as 2013-12-17:
> 178168    140.424974000    10.222.3.88    10.14.10.32    HTTP    1434    
> Continuation or non-HTTP traffic
> 178169    140.425010000    10.14.10.32    10.222.3.88    TCP    54    
> 33813 > http [ACK] Seq=2151 Ack=220218202 Win=1280 Len=0
> 178170    140.658514000    10.222.3.88    10.14.10.32    HTTP    1334    
> [TCP Window Full] Continuation or non-HTTP traffic
> 178171    140.858753000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178172    141.119647000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178173    141.119690000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178174    141.627353000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178175    141.627385000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178176    142.628795000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178177    142.628839000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178178    144.619506000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178179    144.619574000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178180    148.585393000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178181    148.585447000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178182    156.502765000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178183    *REF*    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178184    15.821813000    10.222.3.88    10.14.10.32    TCP    60    [TCP 
> Keep-Alive] http > 33813 [ACK] Seq=220219481 Ack=2151 Win=10240 Len=0
> 178185    15.821845000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> ZeroWindow] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=0 Len=0
> 178186    43.448451000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> Window Update] 33813 > http [ACK] Seq=2151 Ack=220219482 Win=2856448 Len=0
> 178187    43.461842000    10.222.3.88    10.14.10.32    HTTP    1434    
> Continuation or non-HTTP traffic
> which yields
> 291476    173.722084000    10.222.3.88    10.14.10.32    HTTP    1434    
> Continuation or non-HTTP traffic
> 291477    173.722125000    10.14.10.32    10.222.3.88    TCP    54    
> 33813 > http [ACK] Seq=2151 Ack=361058243 Win=52224 Len=0
> 291478    178.322196000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> Window Update] 33813 > http [ACK] Seq=2151 Ack=361058243 Win=1489664 Len=0
> 291479    178.364417000    10.14.10.32    10.222.3.88    TCP    54    [TCP 
> Window Update] 33813 > http [ACK] Seq=2151 Ack=361058243 Win=3042816 Len=0
> 291480    180.974205000    10.14.10.32    10.222.3.88    TCP    54    
> 33813 > http [FIN, ACK] Seq=2151 Ack=361058243 Win=3042816 Len=0
> 291481    180.988706000    10.222.3.88    10.14.10.32    TCP    60    http 
> > 33813 [ACK] Seq=361058243 Ack=2152 Win=10240 Len=0
>
> Nice to see the exponential backoff at work during the TCP zero window.
>
> Maybe this is a red herring. I do find it unusual that all the http 
> timeout start points seem to be during TCP zero windows.
>
> ...Stu
>

Re: Slow VM, svn client drops connection with FIN packet

Posted by st...@gmail.com.
I replied yesterday, but my post seems to have disappeared.

I figured it out, and the key was your reproduction and my realisation I 
wasn't seeing a FIN from my server. It turns out that in every case the last
392598 71.906863000 10.222.3.88 10.14.11.50 HTTP 1434 Continuation or 
non-HTTP traffic
line from the server is actually also a FIN packet, and Wireshark (or the 
TCP dissector) is "hiding" that; if I look at the packet I can see the FIN 
flag, but it's not mentioned in the Info field in the packet list. So the 
FINs from my client are close handshake responses not initiations, and it 
is the server timing out after all.

Thanks for the help. I'm checking with my IT dept to see if they can bump 
up the timeout.

...Stu

On Tuesday, January 14, 2014 11:55:39 AM UTC-5, Philip Martin wrote:
>
> Stuart MacDonald <stuartm...@gmail.com <javascript:>> writes: 
>
> > svn: E175002: REPORT of '/svn/repos/deckard-65x/!svn/me': Could not read 
> > response body: connection was closed by server (http://foundry51.qnx.com) 
>
>
> I can provoke this on my local LAN by setting the Apache timeout on the 
> server to a low value such as 5 seconds.  Then I run a checkout that 
> generates a large REPORT response and I use gdb to interrupt the client 
> at subversion/libsvn_ra_neon/fetch.c:1709 so that the client is in the 
> middle of reading the repsonse.  I wait for the server timeout to 
> expire and then resume the client and I get the error: 
>
> svn: E175002: REPORT of '/obj/repo/!svn/me': Could not read response body: 
> connection was closed by server (http://peri.local:8888) 
>
> One fix is to increase the server timeout to handle your slow client. 
>
> -- 
> Philip Martin | Subversion Committer 
> WANdisco // *Non-Stop Data* 
>

Re: Slow VM, svn client drops connection with FIN packet

Posted by Philip Martin <ph...@wandisco.com>.
Stuart MacDonald <st...@gmail.com> writes:

> svn: E175002: REPORT of '/svn/repos/deckard-65x/!svn/me': Could not read
> response body: connection was closed by server (http://foundry51.qnx.com)

I can provoke this on my local LAN by setting the Apache timeout on the
server to a low value such as 5 seconds.  Then I run a checkout that
generates a large REPORT response and I use gdb to interrupt the client
at subversion/libsvn_ra_neon/fetch.c:1709 so that the client is in the
middle of reading the repsonse.  I wait for the server timeout to
expire and then resume the client and I get the error:

svn: E175002: REPORT of '/obj/repo/!svn/me': Could not read response body: connection was closed by server (http://peri.local:8888)

One fix is to increase the server timeout to handle your slow client.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Slow VM, svn client drops connection with FIN packet

Posted by Stuart MacDonald <st...@gmail.com>.
Here's this morning's failure. For testing, I've set http-timeout=67. I am
trying to see if the zero window is a coincidence or not.

The debug output:
Reading 8192 bytes of response body.
Got 8192 bytes.
XML: Parsing 8192 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Got 6180 bytes.
XML: Parsing 6180 bytes.
XML: char-data (235) returns 0
XML: xmlParseChunk returned 0
Reading 8192 bytes of response body.
Aborted request (-3): Could not read response body
sess: Closing connection.
sess: Connection closed.
Request ends, status 200 class 2xx, error line:
Could not read response body: connection was closed by server
Running destroy hooks.
Request ends.
svn: E175002: REPORT of '/svn/repos/deckard-65x/!svn/me': Could not read
response body: connection was closed by server (http://foundry51.qnx.com)
sess: Destroying session.
sess: Destroying session.

The network capture:
391174    498.610439000    10.222.3.88    10.14.11.50    HTTP    1434
Continuation or non-HTTP traffic
391175    498.610495000    10.14.11.50    10.222.3.88    TCP    54    37480
> http [ACK] Seq=1945767 Ack=480574851 Win=1280 Len=0
391176    498.842179000    10.222.3.88    10.14.11.50    HTTP    1334
[TCP Window Full] Continuation or non-HTTP traffic
391177    499.042264000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391178    499.297972000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391179    499.298065000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391180    499.796762000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391181    499.796811000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391182    500.778349000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391183    500.778405000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391184    502.728191000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391185    502.728260000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391186    506.614031000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391187    506.614093000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391188    514.371638000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391189    514.371694000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391190    529.872211000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391191    *REF*    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391192    30.988601000    10.222.3.88    10.14.11.50    TCP    60    [TCP
Keep-Alive] http > 37480 [ACK] Seq=480576130 Ack=1945767 Win=908288 Len=0
391193    30.988645000    10.14.11.50    10.222.3.88    TCP    54    [TCP
ZeroWindow] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=0 Len=0
391194    71.282655000    10.14.11.50    10.222.3.88    TCP    54    [TCP
Window Update] 37480 > http [ACK] Seq=1945767 Ack=480576131 Win=1171456
Len=0
391195    71.296598000    10.222.3.88    10.14.11.50    HTTP    1434
Continuation or non-HTTP traffic
which results in:
392598    71.906863000    10.222.3.88    10.14.11.50    HTTP    1434
Continuation or non-HTTP traffic
392599    71.906925000    10.14.11.50    10.222.3.88    TCP    54    37480
> http [ACK] Seq=1945767 Ack=482310892 Win=987904 Len=0
392600    72.043276000    10.14.11.50    10.222.3.88    TCP    54    [TCP
Window Update] 37480 > http [ACK] Seq=1945767 Ack=482310892 Win=1188096
Len=0
392601    72.375325000    10.14.11.50    10.222.3.88    TCP    54    37480
> http [FIN, ACK] Seq=1945767 Ack=482310892 Win=1188096 Len=0
392602    72.388690000    10.222.3.88    10.14.11.50    TCP    60    http >
37480 [ACK] Seq=482310892 Ack=1945768 Win=908288 Len=0

There are two interesting things:
The long gap after the *REF* packet is unusual, it's the first time I've
seen that.
The FIN packet is 72 seconds later, not 67 like I set the timeout. OTOH,
it's close, and, given the long gap no packets occur at FIN - 67 seconds.

I'm going to keep trying random timeout values.

...Stu


On Tue, Jan 14, 2014 at 9:27 AM, Stuart MacDonald
<st...@gmail.com>wrote:

> I don't have access to the server log, but you can see from the wireshark
> capture that the *client* closed the connection (look for the RST packet),
> not the server. The error of "socket closed" is consistent with that.
>
> I suspect that there's an issue handling the "http-timeout" timer; it gets
> triggered during a TCP zero window situation, and then not canceled
> properly, and then it fires, closing the socket.
>
> That would also fit the data of the bug being in 1.6 (reported by someone
> else, but I linked to it in my original post), 1.7 and 1.8; it's a bug in
> svn's handling of the underlying HTTP interaction layer, and not in that
> layer or the server or somewhere else.
>
> I could be wrong. When I look at the evidence so far, that's what I see.
>
> ...Stu
>
>
> On Tue, Jan 14, 2014 at 6:33 AM, Philip Martin <philip.martin@wandisco.com
> > wrote:
>
>> stuartm.coding@gmail.com writes:
>>
>> > I have another failure, with some debug info, and some updates.
>> >
>> > We're switching to a different system for this particular repository,
>> so I
>> > may or may not be able to continue debug efforts. :-(
>> >
>> > Since there's no debug info available for the svn 1.8 client, I haven't
>> > been running that.
>> >
>> > I went a full week without a failure, which I suspect is due to the
>> > enormous amount of debugging that neon-debug-mask=255 produces, so this
>> > week I dialed it back to 127 and got a failure on the first try.
>> >
>> > Here's the latest failure. First up, the debug info:
>> > Got 8192 bytes.
>> > XML: Parsing 8192 bytes.
>> > XML: char-data (235) returns 0
>> > XML: xmlParseChunk returned 0
>> > Reading 8192 bytes of response body.
>> > Got 6600 bytes.
>> > XML: Parsing 6600 bytes.
>> > XML: char-data (235) returns 0
>> > XML: xmlParseChunk returned 0
>> > Reading 8192 bytes of response body.
>> > Aborted request (-3): Could not read response body
>>
>> That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
>> defined in ne_socket.h:
>>
>> #define NE_SOCK_CLOSED (-3)
>> /* Connection was reset (e.g. server crashed) */
>>
>> > sess: Closing connection.
>> > sess: Connection closed.
>> > Request ends, status 200 class 2xx, error line:
>> > Could not read response body: connection was closed by server
>> > Running destroy hooks.
>> > Request ends.
>> > svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
>> > response body: connection was closed by server (http://<server>)
>> > sess: Destroying session.
>> > sess: Destroying session.
>>
>> What does the server error log show?
>>
>> --
>> Philip Martin | Subversion Committer
>> WANdisco // *Non-Stop Data*
>>
>
>

Re: Slow VM, svn client drops connection with FIN packet

Posted by Stuart MacDonald <st...@gmail.com>.
I don't have access to the server log, but you can see from the wireshark
capture that the *client* closed the connection (look for the RST packet),
not the server. The error of "socket closed" is consistent with that.

I suspect that there's an issue handling the "http-timeout" timer; it gets
triggered during a TCP zero window situation, and then not canceled
properly, and then it fires, closing the socket.

That would also fit the data of the bug being in 1.6 (reported by someone
else, but I linked to it in my original post), 1.7 and 1.8; it's a bug in
svn's handling of the underlying HTTP interaction layer, and not in that
layer or the server or somewhere else.

I could be wrong. When I look at the evidence so far, that's what I see.

...Stu


On Tue, Jan 14, 2014 at 6:33 AM, Philip Martin
<ph...@wandisco.com>wrote:

> stuartm.coding@gmail.com writes:
>
> > I have another failure, with some debug info, and some updates.
> >
> > We're switching to a different system for this particular repository, so
> I
> > may or may not be able to continue debug efforts. :-(
> >
> > Since there's no debug info available for the svn 1.8 client, I haven't
> > been running that.
> >
> > I went a full week without a failure, which I suspect is due to the
> > enormous amount of debugging that neon-debug-mask=255 produces, so this
> > week I dialed it back to 127 and got a failure on the first try.
> >
> > Here's the latest failure. First up, the debug info:
> > Got 8192 bytes.
> > XML: Parsing 8192 bytes.
> > XML: char-data (235) returns 0
> > XML: xmlParseChunk returned 0
> > Reading 8192 bytes of response body.
> > Got 6600 bytes.
> > XML: Parsing 6600 bytes.
> > XML: char-data (235) returns 0
> > XML: xmlParseChunk returned 0
> > Reading 8192 bytes of response body.
> > Aborted request (-3): Could not read response body
>
> That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
> defined in ne_socket.h:
>
> #define NE_SOCK_CLOSED (-3)
> /* Connection was reset (e.g. server crashed) */
>
> > sess: Closing connection.
> > sess: Connection closed.
> > Request ends, status 200 class 2xx, error line:
> > Could not read response body: connection was closed by server
> > Running destroy hooks.
> > Request ends.
> > svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read
> > response body: connection was closed by server (http://<server>)
> > sess: Destroying session.
> > sess: Destroying session.
>
> What does the server error log show?
>
> --
> Philip Martin | Subversion Committer
> WANdisco // *Non-Stop Data*
>

Re: Slow VM, svn client drops connection with FIN packet

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
> defined in ne_socket.h:
>
> #define NE_SOCK_CLOSED (-3)
> /* Connection was reset (e.g. server crashed) */

/* Socket was closed */
#define NE_SOCK_CLOSED (-3)

The comment associated with the constant is "Socket was closed" not
"Connection was reset".

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: Slow VM, svn client drops connection with FIN packet

Posted by Philip Martin <ph...@wandisco.com>.
stuartm.coding@gmail.com writes:

> I have another failure, with some debug info, and some updates.
>
> We're switching to a different system for this particular repository, so I 
> may or may not be able to continue debug efforts. :-(
>
> Since there's no debug info available for the svn 1.8 client, I haven't 
> been running that.
>
> I went a full week without a failure, which I suspect is due to the 
> enormous amount of debugging that neon-debug-mask=255 produces, so this 
> week I dialed it back to 127 and got a failure on the first try.
>
> Here's the latest failure. First up, the debug info:
> Got 8192 bytes.
> XML: Parsing 8192 bytes.
> XML: char-data (235) returns 0
> XML: xmlParseChunk returned 0
> Reading 8192 bytes of response body.
> Got 6600 bytes.
> XML: Parsing 6600 bytes.
> XML: char-data (235) returns 0
> XML: xmlParseChunk returned 0
> Reading 8192 bytes of response body.
> Aborted request (-3): Could not read response body

That message comes from ne_request.c and the -3 is NE_SOCK_CLOSED
defined in ne_socket.h:

#define NE_SOCK_CLOSED (-3)
/* Connection was reset (e.g. server crashed) */

> sess: Closing connection.
> sess: Connection closed.
> Request ends, status 200 class 2xx, error line:
> Could not read response body: connection was closed by server
> Running destroy hooks.
> Request ends.
> svn: E175002: REPORT of '/svn/repos/<name>/!svn/me': Could not read 
> response body: connection was closed by server (http://<server>)
> sess: Destroying session.
> sess: Destroying session.

What does the server error log show?

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*