Posted to users@httpd.apache.org by Tom Evans <te...@googlemail.com> on 2012/03/06 14:01:42 UTC

[users@httpd] Re: Disappearing requests / tuning event MPM, 2.2.22

On Tue, Feb 28, 2012 at 2:50 PM, Tom Evans <te...@googlemail.com> wrote:
> Hi all
>
> [I'm re-reading this, and it is a bit of a convoluted setup - I
> appreciate any eyes that read this!]
>
> Hardware: 2 x Dell 2850, 2 x Xeon 5140 2.33 GHz, 4 GB RAM
> OS: FreeBSD 7.1-RELEASE-p4
> Server version: Apache/2.2.22 (FreeBSD)
> Server built:   Feb 13 2012 22:29:44
>
> At $JOB we use Apache as a reverse proxy - we have a pair of
> servers to which all our web requests are round-robin routed. These
> servers then provide SSL termination, serve static content and
> reverse proxy onto backend servers for dynamic content.
>
> We in fact run two httpd server instances on each server; one using
> the worker MPM to provide SSL termination, and one using the event
> MPM to serve static content and reverse proxied content.
>
> First off, I'm not a network guy; I can find out more about this
> routing stuff if you think it's relevant though. IIRC it works like
> this: both boxes have all the public IP addresses for our websites
> allocated on the loopback interface, and the edge routers round robin
> requests to a pair of CARP/VRRP IP addresses on the Apache boxes. By
> controlling which box has which CARP address, we can control which
> box(es) are receiving traffic.
>
> So our problems started when we put all traffic through one box,
> whilst we upgraded to 2.2.22. Some of our websites are served through
> a CDN, and we could observe from our office that a significant
> proportion of requests sent via the CDN never reached our server. We
> can see from our squid proxy log that requests were made that were
> never received or logged by Apache.
>
> We also had reports from our clients and users that the websites (even
> non-CDN sites) were subjectively 'slow' once we were operating on
> just one box. We think these were requests failing to reach our
> server and subsequently being retried.
>
> We can quite clearly identify when this happens with sites served from
> the CDN, as each timeout results in the CDN returning a 503 to us,
> which we can detect in our squid proxy logs and use to track how
> frequently this happens. When all the traffic was put through one of
> the Apache frontend proxies, the error rate we could detect was 5
> times higher than when we spread the load across both frontend
> proxies.
>

To try and make it clearer, I've made an ASCII-art diagram of our architecture:

>        +---------+
>        |  LAN    |
>        +---------+
>             |
>        +---------+
>        | SQUID   |
>        +---------+
>             |
>        ~~~~~~~~~~~
>        ~   inet  ~
>        ~~~~~~~~~~~
>             |
>        ~~~~~~~~~~~
>        ~  CDN    ~
>        ~~~~~~~~~~~
>             |
>        +---------+
>        |   FW    |
>        +---------+
>            /\
>     +-----+  +-------+
>     |                |
>  +-------+      +---------+
>  | FEP 1 |      |  FEP 2  |
>  +-------+      +---------+
>    |\_____          |\_____
>    |      \         |      \
>  +-----+  +-----+  +-----+ +-----+
>  | BE1 |  | BE2 |  | BE3 | | BE4 |
>  +-----+  +-----+  +-----+ +-----+
>

Key:
LAN - our corporate LAN
SQUID - our corporate squid proxy
inet - our corporate internet
CDN - our partner CDN's network
FW - our data centre edge firewall
FEP 1/2 - our front end proxies - httpd-event 2.2.22
BE 1/2/3/4 - our backend web servers, varied (mainly httpd-prefork 2.2).

Hopefully that will come through unmangled...
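
For concreteness, the split on each FEP looks roughly like this (a
sketch only - the ports, paths and backend names here are invented
for illustration, not our real config):

# worker-MPM instance: SSL termination, proxying to the event instance
# (port numbers and paths below are placeholders)
Listen 443
SSLEngine on
SSLCertificateFile /usr/local/etc/apache22/ssl/site.pem
ProxyPass / http://127.0.0.1:8080/
ProxyPassReverse / http://127.0.0.1:8080/

# event-MPM instance: static content and reverse proxy to the backends
Listen 8080
DocumentRoot "/usr/local/www/static"
ProxyPass /app/ http://be1.internal/app/
ProxyPassReverse /app/ http://be1.internal/app/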

So, we've been trying to track disappearing requests. We see lots of
requests that go via the CDN to reach our data centre failing with
error code 503. This error message is produced by the CDN, and the
request is not logged in either of the FEPs.

We've been trying to track what happens with tcpdump running at SQUID
and at FW. At SQUID, we see a POST request for a resource, followed by
a long wait, and then a 503 generated by the CDN. Interestingly, 95%
of the failing requests are POST requests.

Tracking that at FW, we see the request coming in, and no reply from
the FEP. The connection is a keep-alive connection, and had just
completed a similar request 4 seconds previously, to which we returned
a 200 and data. This (failing) request is made on the same connection,
we reply with an ACK, then no data for 47 seconds (same wait as seen
by squid), and finally the connection is closed with a FIN.


Any ideas?

Cheers

Tom



Re: [users@httpd] enable HTTPD to support multi-layer certificates (ca chain)

Posted by Igor Cicimov <ic...@gmail.com>.
I wonder why the setting SSLCertificateChainFile would even exist if
you could get away with option 2?
 On Mar 8, 2012 6:21 PM, "Durairaj, Srinivasan (NSN - IN/Hyderabad)" <
srinivasan.durairaj@nsn.com> wrote:

> Hi,
> I want to enable HTTPD to support multi-layer certificates (CA chain).
> I have 2 options:
> Option 1:
> We can configure SSLCertificateFile (EE file) and SSLCertificateChainFile
> (CA Chain)
>
> Option 2:
> We can configure SSLCertificateFile (EE+CA Chain)
>
> When we tested, we found that Option 1 worked and Option 2 did not.
> Any idea if I have missed anything in Option 2, or how to make Option 2 work?
> HTTPD version is 2.2.3
>
> Regards
> Srini
>

Re: [users@httpd] enable HTTPD to support multi-layer certificates (ca chain)

Posted by Noel Butler <no...@ausics.net>.
On Thu, 2012-03-08 at 07:58 -0500, Mark Montague wrote:

> On March 8, 2012 2:09, "Durairaj, Srinivasan (NSN - IN/Hyderabad)" 
> <sr...@nsn.com> wrote:
> > I want to enable HTTPD to support multi-layer certificates (CA chain).
> > I have 2 options:
> > Option 1:
> > We can configure SSLCertificateFile (EE file) and SSLCertificateChainFile (CA Chain)
> >
> > Option 2:
> > We can configure SSLCertificateFile (EE+CA Chain)
> >
> > When we tested, we found that Option 1 worked and Option 2 did not.
> > Any idea if I have missed anything in Option 2, or how to make Option 2 work?
> > HTTPD version is 2.2.3
> 
> Why do you think Option 2 should work?  What is bad about Option 1?  
> What problem are you trying to solve?
> 


I agree. So many people using option 2 in other software
(postfix/dovecot etc.) get the order WRONG, and the chain fails, half
the time without them even knowing.  I've seen plenty of people ask
for a chain option to be introduced in other software, because it
avoids the guessing game for newbies. I haven't actually tried option
2 in httpd, but maybe it does work, and the OP, like many before him,
got the order wrong.
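
For reference, when a combined file does work, the order software
generally expects is leaf first (a skeleton for illustration, not a
real file):

# site-combined.pem - the order matters
-----BEGIN CERTIFICATE-----
(server/EE certificate FIRST)
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
(then the intermediate CA certificate(s), leaf-most issuer first)
-----END CERTIFICATE-----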


Re: [users@httpd] enable HTTPD to support multi-layer certificates (ca chain)

Posted by Mark Montague <ma...@catseye.org>.
On March 8, 2012 2:09, "Durairaj, Srinivasan (NSN - IN/Hyderabad)" 
<sr...@nsn.com> wrote:
> I want to enable HTTPD to support multi-layer certificates (CA chain).
> I have 2 options:
> Option 1:
> We can configure SSLCertificateFile (EE file) and SSLCertificateChainFile (CA Chain)
>
> Option 2:
> We can configure SSLCertificateFile (EE+CA Chain)
>
> When we tested, we found that Option 1 worked and Option 2 did not.
> Any idea if I have missed anything in Option 2, or how to make Option 2 work?
> HTTPD version is 2.2.3

Why do you think Option 2 should work?  What is bad about Option 1?  
What problem are you trying to solve?

The documentation is pretty clear.  
https://httpd.apache.org/docs/2.2/mod/mod_ssl.html#sslcertificatefile 
says that the file specified by SSLCertificateFile contains the 
certificate for the server and, optionally, the private key.  It does 
not mention anything about CA certificates.  On the other hand, 
https://httpd.apache.org/docs/2.2/mod/mod_ssl.html#sslcertificatechainfile 
says that SSLCertificateChainFile specifies the "all-in-one" file of 
CA certificates forming the server certificate's chain, starting with 
the issuing CA certificate and ranging up to the root CA certificate.
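
In configuration terms, the documented arrangement (Option 1) looks
like this - the file paths are made up for illustration:

# server (EE) certificate, and optionally its private key
SSLCertificateFile      /etc/ssl/server.crt
SSLCertificateKeyFile   /etc/ssl/server.key
# the intermediate (and optionally root) CA certificates
SSLCertificateChainFile /etc/ssl/ca-chain.crt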

--
   Mark Montague
   mark@catseye.org




[users@httpd] enable HTTPD to support multi-layer certificates (ca chain)

Posted by "Durairaj, Srinivasan (NSN - IN/Hyderabad)" <sr...@nsn.com>.
Hi,
I want to enable HTTPD to support multi-layer certificates (CA chain).
I have 2 options:
Option 1:
We can configure SSLCertificateFile (EE file) and SSLCertificateChainFile (CA Chain)

Option 2:
We can configure SSLCertificateFile (EE+CA Chain)

When we tested, we found that Option 1 worked and Option 2 did not.
Any idea if I have missed anything in Option 2, or how to make Option 2 work?
HTTPD version is 2.2.3

Regards
Srini


Re: [users@httpd] Re: Disappearing requests / tuning event MPM, 2.2.22

Posted by Rainer Jung <ra...@kippdata.de>.
On 06.03.2012 18:26, Tom Evans wrote:
> On Tue, Mar 6, 2012 at 1:44 PM, Tom Evans <te...@googlemail.com> wrote:
>> On Tue, Mar 6, 2012 at 1:01 PM, Tom Evans <te...@googlemail.com> wrote:
>>> So, we've been trying to track disappearing requests. We see lots of
>>> requests that go via the CDN to reach our data centre failing with
>>> error code 503. This error message is produced by the CDN, and the
>>> request is not logged in either of the FEPs.
>>>
>>> We've been trying to track what happens with tcpdump running at SQUID
>>> and at FW. At SQUID, we see a POST request for a resource, followed by
>>> a long wait, and then a 503 generated by the CDN. Interestingly, 95%
>>> of the failing requests are POST requests.
>>>
>>> Tracking that at FW, we see the request coming in, and no reply from
>>> the FEP. The connection is a keep-alive connection, and had just
>>> completed a similar request 4 seconds previously, to which we returned
>>> a 200 and data. This (failing) request is made on the same connection,
>>> we reply with an ACK, then no data for 47 seconds (same wait as seen
>>> by squid), and finally the connection is closed with a FIN.
>>>
>>
>> Sorry, one final thing - we can see these hanging connections on the FEP:
>>
>> netstat -an | head -n 2 ; netstat -an | fgrep EST | fgrep -v  "tcp4       0"
>>
>> This shows the established sockets with unread recv-q. Obviously not
>> every socket shown is hanging; but by observing it over an extended
>> (10s) period, you can quickly see connections whose recv-q is not
>> drained.
>>
>
> A final follow up for today. We have dramatically* improved the error
> rates by tuning the event MPM, so that child processes were not being
> constantly reaped and re-spawned.
>
> In brief, we massively increased MaxSpareThreads, so that it wouldn't
> start reaping until more than 75% of potential workers (MaxClients)
> are idle. We're now running:
>
> StartServers 8
> MaxClients 1024
> MinSpareThreads 128
> MaxSpareThreads 768
> ThreadsPerChild 64
>
> We are now not seeing Apache children getting reaped or re-spawned
> (good!) and we're also not seeing any hanging established connections
> with unread recv-q, nor any failures from our squid proxy (good!). I
> don't think we've solved anything though, I think we have just
> engineered a sweet spot where the problems do not occur (not good!).
>
> Our tentative hypothesis for what is happening is this:
> - Apache notices that there are too many idle workers, and decides to
> shut down one of the processes.
> - It marks that process as shutting down, and no new requests are
> allocated to workers from that process.
> - Meanwhile, a keep-alive socket which is allocated to that child
> process comes alive again, and a new request is pushed down it.
> - Apache never bothers to read the request, as the child is marked as
> shutting down.
> - Once the child finishes all outstanding requests, it does indeed
> shut down, and the OS sends a FIN to close the unread socket.
>
> Does this sound remotely possible? I would really appreciate some
> advice/insight here.

Yes, it reminds me of a similar observation using event on 2.2 about 
1-2 years ago. When Apache needed to recycle a process, due to spare 
thread checking, MaxRequestsPerChild or just a graceful restart, the 
event MPM handled existing keep-alive connections less gracefully than 
other MPMs. The expectation, though, was that an HTTP client using 
keep-alive should be able to resend a failed request on another 
connection, because race conditions cannot be avoided with HTTP 
keep-alive (it is always possible that the server closes the 
connection while, in parallel, the client starts to send the next 
request). It's just that event seems to show that behaviour more often 
than other MPMs.

I have not yet checked whether the overhauled event MPM in 2.4 handles 
this better.

Originally I was able to reproduce this behaviour by:

- setting MaxRequestsPerChild to trigger frequent process recycling
- increasing the keep-alive count and timeout to allow longer than 
usual keep-alive usage
- adding the process PID to the AccessLog LogFormat (%P) - a sketch of 
such a LogFormat follows this list
- adding the keep-alive count of a connection to the AccessLog LogFormat
- (I think) adding the client connection port to the AccessLog LogFormat
- adding some log statements to ab.c, which show the local port number 
when sending a request on an established connection fails
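
The logging part looked roughly like this (a sketch from memory, not
my exact configuration; %P and %k are mod_log_config escapes, while a
client-port escape (%{remote}p in newer versions) may not be
available on 2.2):

LogFormat "%h %t \"%r\" %>s %b pid=%P keepalives=%k" kadebug
CustomLog logs/kadebug_log kadebug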

It could be seen that connections were failing exactly when processes 
were stopped, and that the failing connections were the ones that had 
not completely exhausted their allowed keep-alive count.

If you can easily reproduce the problem and have a test environment, it 
would be very interesting to see whether 2.4 behaves better.

Adjusting the min and max spare threads (increasing the difference 
between the two) to reduce creation and destruction of processes is a 
good optimization nevertheless.

Regards,

Rainer



[users@httpd] Re: Disappearing requests / tuning event MPM, 2.2.22

Posted by Tom Evans <te...@googlemail.com>.
On Tue, Mar 6, 2012 at 1:44 PM, Tom Evans <te...@googlemail.com> wrote:
> On Tue, Mar 6, 2012 at 1:01 PM, Tom Evans <te...@googlemail.com> wrote:
>> So, we've been trying to track disappearing requests. We see lots of
>> requests that go via the CDN to reach our data centre failing with
>> error code 503. This error message is produced by the CDN, and the
>> request is not logged in either of the FEPs.
>>
>> We've been trying to track what happens with tcpdump running at SQUID
>> and at FW. At SQUID, we see a POST request for a resource, followed by
>> a long wait, and then a 503 generated by the CDN. Interestingly, 95%
>> of the failing requests are POST requests.
>>
>> Tracking that at FW, we see the request coming in, and no reply from
>> the FEP. The connection is a keep-alive connection, and had just
>> completed a similar request 4 seconds previously, to which we returned
>> a 200 and data. This (failing) request is made on the same connection,
>> we reply with an ACK, then no data for 47 seconds (same wait as seen
>> by squid), and finally the connection is closed with a FIN.
>>
>
> Sorry, one final thing - we can see these hanging connections on the FEP:
>
> netstat -an | head -n 2 ; netstat -an | fgrep EST | fgrep -v  "tcp4       0"
>
> This shows the established sockets with unread recv-q. Obviously not
> every socket shown is hanging; but by observing it over an extended
> (10s) period, you can quickly see connections whose recv-q is not
> drained.
>

A final follow up for today. We have dramatically* improved the error
rates by tuning the event MPM, so that child processes were not being
constantly reaped and re-spawned.

In brief, we massively increased MaxSpareThreads, so that it wouldn't
start reaping until more than 75% of potential workers (MaxClients)
are idle. We're now running:

StartServers 8
MaxClients 1024
MinSpareThreads 128
MaxSpareThreads 768
ThreadsPerChild 64

We are now not seeing Apache children getting reaped or re-spawned
(good!) and we're also not seeing any hanging established connections
with unread recv-q, nor any failures from our squid proxy (good!). I
don't think we've solved anything though, I think we have just
engineered a sweet spot where the problems do not occur (not good!).

Our tentative hypothesis for what is happening is this:
- Apache notices that there are too many idle workers, and decides to
shut down one of the processes.
- It marks that process as shutting down, and no new requests are
allocated to workers from that process.
- Meanwhile, a keep-alive socket which is allocated to that child
process comes alive again, and a new request is pushed down it.
- Apache never bothers to read the request, as the child is marked as
shutting down.
- Once the child finishes all outstanding requests, it does indeed
shut down, and the OS sends a FIN to close the unread socket.

Does this sound remotely possible? I would really appreciate some
advice/insight here.

When I get a chance, I will try to engineer a config that puts httpd
in this sort of state, and a test case that should expose this.
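
In the meantime, I imagine something along these lines would do it,
following Rainer's recipe (untested; the hostname and the numbers are
invented, chosen purely to force frequent process recycling):

# event MPM tuned to recycle children aggressively - test rig only
StartServers 2
MaxClients 128
MinSpareThreads 16
MaxSpareThreads 32
ThreadsPerChild 64
MaxRequestsPerChild 50
KeepAlive On
MaxKeepAliveRequests 1000
KeepAliveTimeout 60

# then hammer it over keep-alive connections and watch for failures
# (testhost is a placeholder):
ab -k -n 100000 -c 50 http://testhost/static-file.html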

Cheers

Tom

* So much so that, 20 minutes after making the changes, my boss
suggested we all retire to the pub and celebrate.



[users@httpd] Re: Disappearing requests / tuning event MPM, 2.2.22

Posted by Tom Evans <te...@googlemail.com>.
On Tue, Mar 6, 2012 at 1:01 PM, Tom Evans <te...@googlemail.com> wrote:
> So, we've been trying to track disappearing requests. We see lots of
> requests that go via the CDN to reach our data centre failing with
> error code 503. This error message is produced by the CDN, and the
> request is not logged in either of the FEPs.
>
> We've been trying to track what happens with tcpdump running at SQUID
> and at FW. At SQUID, we see a POST request for a resource, followed by
> a long wait, and then a 503 generated by the CDN. Interestingly, 95%
> of the failing requests are POST requests.
>
> Tracking that at FW, we see the request coming in, and no reply from
> the FEP. The connection is a keep-alive connection, and had just
> completed a similar request 4 seconds previously, to which we returned
> a 200 and data. This (failing) request is made on the same connection,
> we reply with an ACK, then no data for 47 seconds (same wait as seen
> by squid), and finally the connection is closed with a FIN.
>

Sorry, one final thing - we can see these hanging connections on the FEP:

netstat -an | head -n 2 ; netstat -an | fgrep EST | fgrep -v  "tcp4       0"

This shows the established sockets with unread recv-q. Obviously not
every socket shown is hanging; but by observing it over an extended
(10s) period, you can quickly see connections whose recv-q is not
drained.
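
To make those easier to spot, a rough loop like this (assuming the
FreeBSD netstat column layout of Proto, Recv-Q, Send-Q, Local Address,
Foreign Address, State) flags sockets whose recv-q stays non-empty
across ten one-second samples:

# print endpoints of established TCP sockets with unread recv-q,
# once a second; report pairs that showed up in all 10 samples
for i in 1 2 3 4 5 6 7 8 9 10; do
    netstat -an | awk '$1 ~ /^tcp/ && $6 == "ESTABLISHED" && $2 > 0 { print $4, $5 }'
    sleep 1
done | sort | uniq -c | awk '$1 == 10 { print "stuck:", $2, $3 }'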

Cheers

Tom
