Posted to dev@httpd.apache.org by "Akins, Brian" <Br...@turner.com> on 2005/06/17 15:11:36 UTC

Keepalives

Here's the problem:

If you want to use keepalives, all of your workers (threads/procs/whatever)
can become busy just waiting on another request on a keepalive connection.
Raising MaxClients does not help.

The Event MPM does not seem to really help this situation.  It seems to
only make each keepalive connection "cheaper."  It can still allow all
workers to be blocking on keepalives.


Short Term solution:

This is what we did.  We use the worker MPM.  We wrote a simple module that
keeps track of how many keepalive connections are active.  When a threshold
is reached, it does not allow any more keepalives.  (Basically it sets
r->connection->keepalive = AP_CONN_CLOSE).  This works for us, but the limit
is per process and only works for threaded MPMs.
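
Roughly, such a module can look like the sketch below.  This is only an
illustration, not necessarily what we actually run: the module name, the
choice of the fixups hook, and the 2048 threshold are all placeholders.

/* mod_limit_keepalive (hypothetical name) -- per-process cap on the
 * number of connections allowed to keep alive.  Untested sketch. */
#include "httpd.h"
#include "http_config.h"
#include "http_request.h"
#include "apr_pools.h"
#include "apr_atomic.h"

#define KA_THRESHOLD 2048                 /* example per-process limit */

module AP_MODULE_DECLARE_DATA limit_keepalive_module;

static volatile apr_uint32_t ka_count;    /* connections currently counted */

static apr_status_t ka_cleanup(void *unused)
{
    apr_atomic_dec32(&ka_count);          /* connection closed, free a slot */
    return APR_SUCCESS;
}

static int ka_fixups(request_rec *r)
{
    conn_rec *c = r->connection;

    if (apr_atomic_read32(&ka_count) >= KA_THRESHOLD) {
        c->keepalive = AP_CONN_CLOSE;     /* over the threshold: force close */
    }
    else if (!ap_get_module_config(c->conn_config, &limit_keepalive_module)) {
        /* first request on this connection: count it once, and uncount it
         * when the connection pool is destroyed */
        ap_set_module_config(c->conn_config, &limit_keepalive_module,
                             (void *)1);  /* any non-NULL marker */
        apr_atomic_inc32(&ka_count);
        apr_pool_cleanup_register(c->pool, NULL, ka_cleanup,
                                  apr_pool_cleanup_null);
    }
    return DECLINED;
}

static void register_hooks(apr_pool_t *p)
{
    ap_hook_fixups(ka_fixups, NULL, NULL, APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA limit_keepalive_module = {
    STANDARD20_MODULE_STUFF,
    NULL, NULL, NULL, NULL, NULL,
    register_hooks
};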


Long Term solution:

Keep track of keepalives in the scoreboard (or somewhere else). Allow
admins to set a threshold for keepalives:

MaxClients 1024
MaxConcurrentKeepalives 768

Or something like that.


Thoughts?  I am willing to write the code if this seems desirable.  Should
this just be another module or in the http core?



-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies



Re: Keepalives

Posted by Bill Stoddard <bi...@wstoddard.com>.
Brian Akins wrote:
> Bill Stoddard wrote:
> 
>> If the event MPM is working properly, then a worker thread should not 
>> be blocking waiting for the next ka
>> request. You still have the overhead of the tcp connection and some 
>> storage used by httpd to manage connection
>> events but both of those are small compared to a blocking thread.
> 
> 
> Should there be an upper limit on how many connections to have in 
> keepalive, even when using event? Say you have 100 worker threads, you 
> wouldn't want to have 8192 keepalive connections.  So you would want 
> some limit.
> 
>> Both approaches sound pragmatic (+.5) although I would like to think 
>> the best long term solution is to
>> completely decouple TCP connections from worker threads. 
> 
> 
> I really like the event mpm, but I still think there has to be an upper 
> limit on how many connections to allow to keepalive.
> 
>> is an experiment in that direction but
>> it still has a long ways to go. Earliest I could see this happening is 
>> in the v 2.4 timeframe.
> 
> 
> We've been doing some testing with the current 2.1 implementation, and 
> it works, it just currently doesn't offer much advantage over worker for 
> us.  If num keepalives == maxclients, you can't accept any more 
> connections.  
Interesting point; it's been a while since I looked at the event MPM but I thought (mistakenly) that 
maxclients accounting was adjusted to reflect max number of concurrently active worker threads rather than 
active tcp connections. I agree we need some kind of upper limit on the max number of TCP connections into a 
running instance of httpd, regardless of whether those connections are associated with a worker thread or not.

Bill


Re: Keepalives

Posted by Greg Ames <gr...@remulak.net>.
Greg Ames wrote:
> Brian Akins wrote:

>> We've been doing some testing with the current 2.1 implementation, and 
>> it works, it just currently doesn't offer much advantage over worker 
>> for us.  If num keepalives == maxclients, you can't accept any more 
>> connections.  
>  
> that's a surprise, and it sounds like a bug.  I'll investigate.  

the event mpm in httpd-2.1 trunk is working fine for me.  running a specweb99 
mini-benchmark with MaxClients 50 and Listen 8092, I see:

[gregames@tarpon built]$ while true; do netstat -an | grep -c 
"8092.*ESTABLISHED"; done
146
163
162
188
164
166
149
145
157
152
163

...so with this setup, I have roughly 3 connections for every worker thread, 
including the idle threads.

here is a server-status: 
http://people.apache.org/~gregames/event-server-status.html

how are you counting connections?

Greg


Re: Keepalives

Posted by Greg Ames <gr...@remulak.net>.
Brian Akins wrote:
> Bill Stoddard wrote:
> 
>> If the event MPM is working properly, then a worker thread should not 
>> be blocking waiting for the next ka
>> request. You still have the overhead of the tcp connection and some 
>> storage used by httpd to manage connection
>> events but both of those are small compared to a blocking thread.
> 
> 
> Should there be an upper limit on how many connections to have in 
> keepalive, even when using event? Say you have 100 worker threads, you 
> wouldn't want to have 8192 keepalive connections.  So you would want 
> some limit.

> I really like the event mpm, but I still think there has to be an upper 
> limit on how many connections to allow to keepalive.

I'm pleased to hear you've tried the event mpm.

not sure why there has to be a limit.  are you talking about connections per 
worker process?  except for the size of the pollset, I didn't see a need to put 
a limit on the number of connections per worker process back when I was stress 
testing it with specweb99.  when a worker process was saturated with active 
threads, the listener thread would block in ap_queue_info_wait_for_idler() until 
a worker thread freed up.  in the meantime, other processes would grab the new 
connections.  so it was sort of self-balancing as far as distributing 
connections among processes.

not sure if the current code still behaves that way.  I plan to find out soon 
though.

> We've been doing some testing with the current 2.1 implementation, and 
> it works, it just currently doesn't offer much advantage over worker for 
> us.  If num keepalives == maxclients, you can't accept any more 
> connections.  

that's a surprise, and it sounds like a bug.  I'll investigate.  it used to be 
that maxclients was really max worker threads and you could have far more 
connections than threads.

thanks for the feedback.

Greg



Re: Keepalives

Posted by Brian Akins <br...@turner.com>.
Bill Stoddard wrote:
> If the event MPM is working properly, then a worker thread should not be 
> blocking waiting for the next ka
> request. You still have the overhead of the tcp connection and some 
> storage used by httpd to manage connection
> events but both of those are small compared to a blocking thread.

Should there be an upper limit on how many connections to have in 
keepalive, even when using event? Say you have 100 worker threads, you 
wouldn't want to have 8192 keepalive connections.  So you would want 
some limit.

> Both approaches sound pragmatic (+.5) although I would like to think the 
> best long term solution is to
> completely decouple TCP connections from worker threads. 

I really like the event mpm, but I still think there has to be an upper 
limit on how many connections to allow to keepalive.

> is an experiment in that direction but
> it still has a long ways to go. Earliest I could see this happening is 
> in the v 2.4 timeframe.

We've been doing some testing with the current 2.1 implementation, and 
it works, it just currently doesn't offer much advantage over worker for 
us.  If num keepalives == maxclients, you can't accept any more 
connections.  I want to be able to limit total number of keepalives.



-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: Keepalives

Posted by Bill Stoddard <st...@apache.org>.
Akins, Brian wrote:
> Here's the problem:
> 
> If you want to use keepalives, all of your workers (threads/procs/whatever)
> can become busy just waiting on another request on a keepalive connection.
> Raising MaxClients does not help.
> 
> The Event MPM does not seem to really help this situation.  It seems to
> only make each keepalive connection "cheaper."  It can still allow all
> workers to be blocking on keepalives.

If the event MPM is working properly, then a worker thread should not be blocking waiting for the next ka 
request. You still have the overhead of the tcp connection and some storage used by httpd to manage connection 
events but both of those are small compared to a blocking thread.

> 
> 
> Short Term solution:
> 
> This is what we did.  We use the worker MPM.  We wrote a simple module that
> keeps track of how many keepalive connections are active.  When a threshold
> is reached, it does not allow any more keepalives.  (Basically it sets
> r->connection->keepalive = AP_CONN_CLOSE).  This works for us, but the limit
> is per process and only works for threaded MPMs.
> 
> 
> Long Term solution:
> 
> Keep track of keepalives in the scoreboard (or somewhere else). Allow
> admins to set a threshold for keepalives:
> 
> MaxClients 1024
> MaxConcurrentKeepalives 768
> 
> Or something like that.
> 
> 
> Thoughts?  

Both approaches sound pragmatic (+.5) although I would like to think the best long term solution is to 
completely decouple TCP connections from worker threads. The event MPM is an experiment in that direction but 
it still has a long ways to go. Earliest I could see this happening is in the v 2.4 timeframe.

Bill


Re: Keepalives

Posted by Brian Akins <br...@turner.com>.
Any interest/objections to adding another MPM query?

AP_MPMQ_IDLE_WORKERS

(or some other name)

in worker.c, we could just add this to ap_mpm_query:


    case AP_MPMQ_IDLE_WORKERS:
        *result = ap_idle_thread_count;
        return APR_SUCCESS;


and in perform_idle_server_maintenance we would update ap_idle_thread_count.
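
For what it's worth, a module could then read it with something like the
sketch below.  AP_MPMQ_IDLE_WORKERS is of course the proposed, not-yet-existing
query code, and ap_idle_thread_count the new counter worker.c would maintain.

#include "httpd.h"
#include "http_log.h"
#include "ap_mpm.h"

static int log_idle_workers(request_rec *r)
{
    int idle = 0;

    /* AP_MPMQ_IDLE_WORKERS is the proposed query code from above */
    if (ap_mpm_query(AP_MPMQ_IDLE_WORKERS, &idle) == APR_SUCCESS) {
        ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, r,
                      "idle worker threads in this child: %d", idle);
    }
    return DECLINED;
}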

I can submit a patch if anyone thinks this has a chance of being committed.



-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: Keepalives

Posted by Brian Akins <br...@turner.com>.
Nick Kew wrote:
> Could that be done dynamically?  As in, make the max keepalive time a
> function of how near the server is to running out of spare workers?

Sure.  I'd have to poke around a bit to see the best way to do it. 
Speed is of utmost concern for us. I guess I could dynamically change 
r->server->keep_alive_max or r->server->keep_alive_timeout? Maybe make 
the timeout a "sliding" timeout, something like:

/*calculate max_clients by querying mpm*/
/*is there a good, fast way to get idle workers?*/
/*store keep_alive_timeout somewhere*/

r->server->keep_alive_timeout = keepalive_timeout / (max_clients / idle_workers);
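
Fleshed out a bit, that might look like the sketch below.  Still just a
sketch: AP_MPMQ_IDLE_WORKERS doesn't exist yet, the MaxClients arithmetic
only holds roughly for a threaded MPM, the 15-second base is an example,
and poking r->server from a request hook is hand-waving since the
server_rec is shared across threads.

/* assumes the usual httpd.h, http_config.h, http_request.h, ap_mpm.h,
 * and apr_time.h includes */
static int scale_keepalive_timeout(request_rec *r)
{
    int max_daemons = 0, threads_per_child = 0, idle = 0;
    int max_clients, scaled;
    int base_timeout_sec = 15;                      /* stashed configured value */

    ap_mpm_query(AP_MPMQ_MAX_DAEMONS, &max_daemons);
    ap_mpm_query(AP_MPMQ_MAX_THREADS, &threads_per_child);
    ap_mpm_query(AP_MPMQ_IDLE_WORKERS, &idle);      /* hypothetical query */

    max_clients = max_daemons * threads_per_child;  /* ~MaxClients for worker */

    if (idle > 0 && max_clients > 0) {
        /* same ratio as the formula above, rearranged so integer
         * division doesn't truncate to zero */
        scaled = base_timeout_sec * idle / max_clients;
        if (scaled < 1) {
            scaled = 1;                             /* never drop to zero */
        }
        /* recent 2.1-dev stores this as an apr interval; adjust if yours
         * still uses plain seconds */
        r->server->keep_alive_timeout = apr_time_from_sec(scaled);
    }
    return DECLINED;
}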


Thoughts?


> Also, have you looked into making keepalive dependent on resource type?
> E.g. use them for HTML docs - which typically have inline contents - but
> not for other media types unless REFERER is a local HTML page?

Sounds horribly slow... Also, in our case, HTML and other content come 
from separate server pools.  But most pages are made up of a few HTML 
pages.  (You have to look at the HTML source to see what I mean).

Also, we have some "app" servers that often have all connections tied up 
in keepalive because the front ends open tons of keepalives (I have no 
direct control of them).

I was hoping for a more generic solution that would maybe help others. 
I'm sure there are others with similar situations.



-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: Keepalives

Posted by Nick Kew <ni...@webthing.com>.
Akins, Brian wrote:

> Short Term solution:
> 
> This is what we did.  We use the worker MPM.  We wrote a simple module that
> keeps track of how many keepalive connections are active.  When a threshold
> is reached, it does not allow any more keepalives.  (Basically it sets
> r->connection->keepalive = AP_CONN_CLOSE).  This works for us, but the limit
> is per process and only works for threaded MPMs.

Could that be done dynamically?  As in, make the max keepalive time a
function of how near the server is to running out of spare workers?

Oh, and is the default still ridiculously high?  ISTR it being 15 secs
at one time - not sure if that ever got changed.

Also, have you looked into making keepalive dependent on resource type?
E.g. use them for HTML docs - which typically have inline contents - but
not for other media types unless REFERER is a local HTML page?


> Long Term solution:
> 
> Keep track of keepalives in the scoreboard (or somewhere else). Allow
> admins to set a threshold for keepalives:
> 
> MaxClients 1024
> MaxConcurrentKeepalives 768
> 
> Or something like that.
> 
> 
> Thoughts?  I am willing to write the code if this seems desirable.  Should
> this just be another module or in the http core?

Is that a candidate application for the monitor hook?  Other things
being equal, I'd make it a module.

-- 
Nick Kew

Re: Keepalives

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 10:12 AM 6/17/2005, Brian Akins wrote:
>>Adding an indexed list of 'counts' would be
>>very lightweight, and one atomic increment and decrement per state
>>change.  This would probably be more efficient than walking the
>>entire list.
>
>Sounds good.  Of course, when changing from one state to another you would always have to decrement the previous state and increment the new one.  The way the core seems to be now, that would require some careful examination of the code to ensure all the state changes were covered.

In that exact order :)  Much better to 'under report' a given state,
and consider that under reporting (if you sum during an update) to be
a product of state changes.

I think ++, -- would be much more misleading, since the server would
be taking more actions than could possibly occur at once.

Bill



Re: Keepalives

Posted by Brian Akins <br...@turner.com>.
William A. Rowe, Jr. wrote:
> Yes it makes sense.  But I'd encourage you to consider dropping that
> keepalive time and see if the problem isn't significantly mitigated.

It is mitigated somewhat, but we still hit maxclients without our "hack" 
in place.


> Right now, it does take cycles to walk the
> scoreboard to determine the number in a given state (and this is
> somewhat fuzzy since values are flipping as you walk along the
> list of workers.)  


I know the worker MPM, for example, keeps a count of idle workers 
internally.  Maybe just an mpm query to retrieve that value would be 
good?  All MPMs keep track of this in some fashion because they all 
know when maxclients is reached.


> Adding an indexed list of 'counts' would be
> very lightweight, and one atomic increment and decrement per state
> change.  This would probably be more efficient than walking the
> entire list.

Sounds good.  Of course, when changing from one state to another you 
would always have to decrement the previous state and increment the new 
one.  The way the core seems to be now, that would require some careful 
examination of the code to ensure all the state changes were covered.



-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: Keepalives

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 09:27 AM 6/17/2005, Brian Akins wrote:
>>Also, I'd be very concerned
>>about additional load - clients who are retrieving many gifs (with
>>no pause at all) in a pipelined fashion will end up hurting the
>overall resource usage if you force them back to HTTP/1.0 behavior.
>
>Yes, but if all threads are "waiting" for x seconds for keepalives (even if it is 3-5 seconds), the server cannot service any new clients.  I'm willing to take an overall resource hit (and "inconvenience" some clients) to maintain the overall availability of the server.
>
>Does that make any sense?  It does to me, but I may not be explaining our problem well.

Yes it makes sense.  But I'd encourage you to consider dropping that
keepalive time and see if the problem isn't significantly mitigated.

We have a schema today to create 'parallel' scoreboards, but perhaps
in the core we should offer this as a public API to module authors,
to keep it very simple?

I believe keepalive-blocked read should be able to be determined
from the scoreboard.  As far as 'counting' states, that would be
somewhat interesting.  Right now, it does take cycles to walk the
scoreboard to determine the number in a given state (and this is
somewhat fuzzy since values are flipping as you walk along the
list of workers.)  Adding an indexed list of 'counts' would be
very lightweight, and one atomic increment and decrement per state
change.  This would probably be more efficient than walking the
entire list.

In any case, I would simply extend counts for all registered
request states in the scoreboard, rather than a one-off for
every state someone becomes interested in.
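
A bare-bones illustration of the shape of those counters is below.  The
helper names are hypothetical and the array is per-process here; the real
thing would live in the scoreboard's shared memory so every child
contributes to the totals.

#include "scoreboard.h"
#include "apr_atomic.h"

/* one counter per worker state; SERVER_BUSY_KEEPALIVE, SERVER_NUM_STATUS
 * etc. come from scoreboard.h */
static volatile apr_uint32_t state_counts[SERVER_NUM_STATUS];

static void note_state_change(int old_state, int new_state)
{
    /* decrement first, then increment, so anyone summing the array while
     * we're mid-update can only under-count, never over-count */
    apr_atomic_dec32(&state_counts[old_state]);
    apr_atomic_inc32(&state_counts[new_state]);
}

/* a keepalive limiter could then do a single cheap read: */
static int keepalive_pressure(apr_uint32_t limit)
{
    return apr_atomic_read32(&state_counts[SERVER_BUSY_KEEPALIVE]) >= limit;
}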

Bill  


Re: Keepalives

Posted by Brian Akins <br...@turner.com>.
William A. Rowe, Jr. wrote:

> No, it doesn't :)  But lowering the keepalive threshold to three
> to five seconds does.  

For us, in heavy loads, that's 3-5 seconds that a thread cannot process 
a new client.  Under normal circumstances, the 15 seconds is fine, but 
when we are stressed, we need to free threads as quickly as possible.


>  Also, I'd be very concerned
> about additional load - clients who are retrieving many gifs (with
> no pause at all) in a pipelined fashion will end up hurting the
> overall resource usage if you force them back to HTTP/1.0 behavior.

Yes, but if all threads are "waiting" for x seconds for keepalives (even 
if it is 3-5 seconds), the server cannot service any new clients.  I'm 
willing to take an overall resource hit (and "inconvenience" some 
clients) to maintain the overall availability of the server.

Does that make any sense?  It does to me, but I may not be explaining 
our problem well.


-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: Keepalives

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 08:11 AM 6/17/2005, Akins, Brian wrote:

>If you want to use keepalives, all of your workers (threads/procs/whatever) 
>can become busy just waiting on another request on a keepalive connection. 
>Raising MaxClients does not help. 

No, it doesn't :)  But lowering the keepalive threshold to three
to five seconds does.  We are lowering the 'example' keepalive
timeout in the next releases.

Keepalives as originally implemented were to help users with
loading additional images (and now, css stylesheets.)  And the
time was set to allow a user to grab 'another page' if they
quickly clicked through.  But the latter use is not a good use
of child slots, and the former use case has a -much- lower
window these days, as most browsers are quite fast to compose
the base document and determine what images are required.

With the relative disappearance of 1200baud dialup, 15 seconds
for the client to sit and think about grabbing more documents
to compose the current page is very silly :)

>The Event MPM does not seem to really help this situation.  It seems to 
>only make each keepalive connection "cheaper."  It can still allow all 
>workers to be blocking on keepalives. 
>
>Short Term solution: 
>
>This is what we did.  We use the worker MPM.  We wrote a simple module that 
>keeps track of how many keepalive connections are active.  When a threshold 
>is reached, it does not allow any more keepalives.  (Basically it sets 
>r->connection->keepalive = AP_CONN_CLOSE).  This works for us, but the limit 
>is per process and only works for threaded MPMs. 
>
>Long Term solution: 
>
>Keep track of keepalives in the scoreboard (or somewhere else). Allow 
>admins to set a threshold for keepalives: 
>
>MaxClients 1024 
>MaxConcurrentKeepalives 768 
>
>Or something like that. 
>
>Thoughts?  I am willing to write the code if this seems desirable.  Should 
>this just be another module or in the http core? 

If you experiment with setting the keepalive window to 3 seconds
or so, how does that affect your test?  Also, I'd be very concerned
about additional load - clients who are retrieving many gifs (with
no pause at all) in a pipelined fashion will end up hurting the
overall resource usage if you force them back to HTTP/1.0 behavior.
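
For reference, that experiment needs no new code, just the stock
directives, with KeepAliveTimeout dropped from the 15 discussed above to
something like 3:

    KeepAlive On
    MaxKeepAliveRequests 100
    KeepAliveTimeout 3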

Bill



Re: Keepalives

Posted by Paul Querna <ch...@force-elite.com>.
..... Snipping all the other issues, which are largely valid and do 
contain some good ideas....

Akins, Brian wrote:

> Here's the problem:
>
> If you want to use keepalives, all of your workers 
> (threads/procs/whatever)
> can become busy just waiting on another request on a keepalive 
> connection.
> Raising MaxClients does not help.
>
> The Event MPM does not seem to really help this situation.  It seems to
> only make each keepalive connection "cheaper."  It can still allow all
> workers to be blocking on keepalives.
>

Can you be more detailed on this?  It really _should_ help, and my 
testing says it does.  What are you seeing behavior-wise?  Any changes 
you would like made?