Posted to users@httpd.apache.org by Jacob Champion <ch...@gmail.com> on 2016/08/08 23:14:49 UTC

Re: [users@httpd] Apache 2.4.12+ on Windows x64 stops responding to requests

On 07/25/2016 11:13 AM, Arthur Ramsey wrote:
> I think I will try the following settings first, but failing that I'll
> give the x86 build a try.
>
> AcceptFilter https none

Any follow-up on this? I've been digging into the AcceptEx() 
implementation, since it looks like there have been intermittent 
problems with it for a while now. If you have a reproduction case, I'd 
be happy to run with it.

--Jacob

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Apache 2.4.12+ on Windows x64 stops responding to requests

Posted by Paul Spangler <pa...@ni.com>.
On 8/9/2016 12:07 PM, Jacob Champion wrote:
> At this point, my primary suspect is our use of recycled OVERLAPPED
> structs without reinitializing them to zero. To make matters worse,
> we're setting the OVERLAPPED's internal .Pointer field in the
> AcceptFilter 'data' case -- which we're not supposed to be doing to
> begin with [1]. We don't do that in the 'connect' filter.
>
> This is all just theorycrafting, though. I'll try to reproduce on my end
> too.
>
> --Jacob
>
> [1]
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms684342(v=vs.85).aspx
> (the Members > Pointer section)
>
I think I've finally had some success finding a reproduction of this 
issue, though it's somewhat involved. I set up an instance of Apache 
2.4.16 64-bit (built from source) on a Windows 7 machine and spun up an 
instance of WANem (http://wanem.sourceforge.net/) in a VirtualBox VM 
hosted on my client machine (also Windows 7).

WANem configuration (Advanced Mode):
Bandwidth - 100Mbps
Random Disconnect Type - tcp-reset
Random Disconnect MTTF Low - 1
Random Disconnect MTTF High - 3
Random Disconnect MTTR Low - 0
Random Disconnect MTTR High - 0

This instructs WANem to inject a TCP RST into connections that pass 
through it every 1 to 3 seconds (then recover after 0 seconds).

Then on my client machine, I added a route to the server that passes 
through the WANem gateway (cmd prompt: ROUTE ADD <server-ip> <WANem-ip>).

Finally, I ran a program on the client that makes 10 cURL requests in 
parallel repeatedly, performing a GET on a simple index.html page (well, 
a 28 KB HTML page). Eventually, even requests made to localhost on the 
server machine stop responding (they hang until the client times out).
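
For anyone trying to repeat this, the client side of the setup above could look roughly like the sketch below. It is not the program used in this test: it uses Python's urllib instead of cURL, and the names fetch, run_round, and hammer are made up for illustration. It fires a batch of parallel GETs, records either the status code or the exception type (resets injected by WANem are expected), and repeats.

```python
import concurrent.futures
import urllib.request


def fetch(url, timeout=10.0):
    """Return the HTTP status, or the exception class name on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            return resp.status
    except Exception as exc:  # resets and timeouts are expected in this test
        return type(exc).__name__


def run_round(url, parallel=10, fetch_fn=fetch):
    """Issue `parallel` GET requests at once and collect the outcomes."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(fetch_fn, [url] * parallel))


def hammer(url, rounds, parallel=10, fetch_fn=fetch):
    """Repeat run_round `rounds` times, yielding (round number, outcomes)."""
    for n in range(rounds):
        yield n, run_round(url, parallel, fetch_fn)
```

Looping over hammer("http://<server-ip>/index.html", rounds=1000) and printing each result gives one line per round. Once the server stops responding, every entry in a round would presumably turn into a timeout error rather than a status code.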

Nothing shows in the error logs (I tried up to debug verbosity), and 
once it reproduces, no more entries appear in the access logs. I have to 
restart the server, though I haven't tried letting it sit for a period 
of time to see if it recovers on its own.

When I do the whole process again with "AcceptFilter http connect", it 
does not reproduce, and requests continue to work (when not being reset 
by WANem).

Not easy to set up, but at least it doesn't involve a browser or 
specific content on the server. It sometimes reproduces almost 
immediately, and usually within 10 seconds or so.

I'll see if Wireshark shows anything interesting going on around the RSTs.
-- 
Paul Spangler
LabVIEW R&D
National Instruments



Re: [users@httpd] Apache 2.4.12+ on Windows x64 stops responding to requests

Posted by Jacob Champion <ch...@gmail.com>.
On 08/09/2016 07:22 AM, Paul Spangler wrote:
> Though in our case, we only needed to use
>
> AcceptFilter http connect
> AcceptFilter https connect
>
> rather than turning it off completely using "none". Setting it to
> connect allows the server to recycle sockets.
>
> I'll see if I can't look back into it and try to find a reproduction
> case again that I can narrow down.

Thanks Paul!

At this point, my primary suspect is our use of recycled OVERLAPPED 
structs without reinitializing them to zero. To make matters worse, 
we're setting the OVERLAPPED's internal .Pointer field in the 
AcceptFilter 'data' case -- which we're not supposed to be doing to 
begin with [1]. We don't do that in the 'connect' filter.

This is all just theorycrafting, though. I'll try to reproduce on my end 
too.

--Jacob

[1] 
https://msdn.microsoft.com/en-us/library/windows/desktop/ms684342(v=vs.85).aspx 
(the Members > Pointer section)
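
To illustrate the hypothesis above (and only the hypothesis; this is not httpd code), here is a minimal Python/ctypes mock of the OVERLAPPED layout described in [1], with a helper that zeroes a recycled struct before reuse while keeping its event handle. The class and function names are made up, and whether httpd should preserve hEvent in this way is an assumption.

```python
import ctypes


class _Offsets(ctypes.Structure):
    _fields_ = [("Offset", ctypes.c_uint32), ("OffsetHigh", ctypes.c_uint32)]


class _OffsetOrPointer(ctypes.Union):
    # Offset/OffsetHigh share storage with Pointer, as in the Win32 layout.
    _anonymous_ = ("offsets",)
    _fields_ = [("offsets", _Offsets), ("Pointer", ctypes.c_void_p)]


class Overlapped(ctypes.Structure):
    """Mock of the Win32 OVERLAPPED layout; illustration only."""
    _anonymous_ = ("u",)
    _fields_ = [
        ("Internal", ctypes.c_size_t),      # stands in for ULONG_PTR
        ("InternalHigh", ctypes.c_size_t),  # stands in for ULONG_PTR
        ("u", _OffsetOrPointer),
        ("hEvent", ctypes.c_void_p),
    ]


def reset_for_reuse(ov):
    """Zero every member of a recycled struct, keeping only the event handle."""
    event = ov.hEvent
    ctypes.memset(ctypes.addressof(ov), 0, ctypes.sizeof(ov))
    ov.hEvent = event
    return ov
```

The point of the sketch is that stale values in Internal, InternalHigh, or Pointer from a previous operation do not carry over into the next one. In C the equivalent would be a memset over the struct before it is handed back to AcceptEx().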



Re: [users@httpd] Apache 2.4.12+ on Windows x64 stops responding to requests

Posted by Paul Spangler <pa...@ni.com>.
On 8/8/2016 6:14 PM, Jacob Champion wrote:
> On 07/25/2016 11:13 AM, Arthur Ramsey wrote:
>> I think I will try the following settings first, but failing that I'll
>> give the x86 build a try.
>>
>> AcceptFilter https none
>
> Any follow-up on this? I've been digging into the AcceptEx()
> implementation, since it looks like there have been intermittent
> problems with it for a while now. If you have a reproduction case, I'd
> be happy to run with it.
>
> --Jacob

We've also seen this in our 64-bit Apache 2.4.16 build on Windows. 
Specifically, accessing or refreshing a particular page in Chrome would 
lead to a 10-20 second hang for all clients. Firefox would sometimes 
cause a 2-5 second hang, and IE had no effect. The page had a 
mixture of static files (mainly JavaScript) and dynamic content loaded 
via Ajax. I was able to determine from the access log that the actual 
request processing took a normal amount of time, so the extra time must 
have occurred before the request start time was recorded (i.e. during 
accept/reading the request line).
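
The reasoning above can be made concrete with a small calculation. This is only a sketch: it assumes the access log was configured with %D (time taken to serve the request, in microseconds) as the last field, which is not stated in this message, and that the client measured its own wall-clock time per request. The function names and the sample log line are invented for illustration.

```python
def logged_duration_s(log_line):
    """Read a trailing %D field (microseconds) and convert it to seconds."""
    return int(log_line.rsplit(None, 1)[-1]) / 1_000_000


def time_before_request_start(client_elapsed_s, log_line):
    """Client-observed time that the server's own timing does not cover."""
    return client_elapsed_s - logged_duration_s(log_line)


# Invented sample entry, not taken from a real log: the server reports 35 ms.
sample = ('10.0.0.5 - - [09/Aug/2016:10:00:15 -0500] '
          '"GET /app/page.html HTTP/1.1" 200 28672 35000')

# If the client waited 15.2 s in total, almost all of it was spent before
# the server started timing the request.
unaccounted = time_before_request_start(15.2, sample)
```

A large unaccounted value alongside a normal %D is what points at the accept path rather than at request processing.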

Though in our case, we only needed to use

AcceptFilter http connect
AcceptFilter https connect

rather than turning it off completely using "none". Setting it to 
connect allows the server to recycle sockets.

I'll see if I can't look back into it and try to find a reproduction 
case again that I can narrow down.

-- 
Paul Spangler
LabVIEW R&D
National Instruments
