
Posted to dev@httpd.apache.org by Greg Ames <gr...@raleigh.ibm.com> on 2001/02/08 20:55:15 UTC

apache.org - MaxClients problem with prefork

Brian B. brought up 2.0 on apache.org briefly yesterday.  It started off
pretty well, but then got to a point where it had 120 children running,
needed more, and they wouldn't start.  MaxClients is 450.  It runs fine
on 1.3, so we can't blame the OS config.  I don't know if this is recent
breakage or has been there for a while; we didn't notice it before because
of the seg faults.

Anyway, Paul Reder managed to drive prefork to MaxClients=450 on his
Linux test box pretty easily, and Jeff T got up to some MaxClients value
without any problem either.  I believe the critical piece of code is
perform_idle_server_maintenance() in prefork.c.  I don't see any glaring
errors so far, but I noticed that it logs a couple of messages: 

  "server reached MaxClients setting, consider raising the MaxClients
setting"

I assume we don't get that on 2.0 apache.org 

                    "server seems busy, (you may need "
                    "to increase StartServers, or Min/MaxSpareServers), "
                    "spawning %d children, there are %d idle, and "
                    "%d total children"

Brian, do you see that in yesterday's error log?  and where is the log,
btw?

I'm wondering if perform_idle_server_maintenance() even gets a chance to
run when we hit this problem.  I believe it runs in the parent process;
maybe the parent is ill.

My plan is to bump up KeepAliveTimeout on apache.org:8092 (to keep the
children tied up longer) and do some log replay testing to see if I can
get up to MaxClients (probably set to 250 or so to be nice).  Anybody
have any hints/other ideas/words of wisdom?  
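Why a longer KeepAliveTimeout should tie children up can be estimated with Little's law (busy servers = arrival rate x time each connection holds a server). The helper below is a rough sketch, assuming each connection holds a prefork child for its service time plus the full keep-alive timeout, which is a worst case since clients may close sooner; the function and parameter names are illustrative, not from httpd.

```c
/* Rough sizing via Little's law (L = lambda * W).
 * Assumption: every connection keeps one child busy for its service
 * time plus the whole keep-alive timeout (worst case). */
static double busy_children(double conns_per_sec,
                            double service_sec,
                            double keepalive_timeout_sec)
{
    /* W = time one connection occupies a child */
    double hold_time = service_sec + keepalive_timeout_sec;
    /* L = lambda * W */
    return conns_per_sec * hold_time;
}
```

For example, busy_children(10, 0.5, 15) gives 155 children versus 5 with no keep-alive wait, so a modest log-replay rate can push the server toward a MaxClients of 250 once the timeout is raised.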

Thanks,
Greg

Re: apache.org - MaxClients problem with prefork

Posted by Bill Stoddard <bi...@wstoddard.com>.

> Bill Stoddard wrote:
> >
> > > > > Brian, do you see that in yesterday's error log?
> > >
> > > Ooops...after I sent this, I noticed that this message only is generated
> > > with LogLevel warn.  Sorry about that.  I'll change it to LogLevel error
> > > in my next build so we do get this message.
> >
> > No code change or rebuild is necessary. Just change the LogLevel setting in
> > httpd.conf (LogLevel debug).
> >
>
> Won't that spew a lot of stuff we're not interested in, though?  I don't
> have any experience doing this in a production environment, so I really
> don't know.
>

LogLevel warn shouldn't be too bad under 1.3.

Bill

Re: apache.org - MaxClients problem with prefork

Posted by Greg Ames <gr...@raleigh.ibm.com>.

Bill Stoddard wrote:
> 
> > > > Brian, do you see that in yesterday's error log?
> >
> > Ooops...after I sent this, I noticed that this message only is generated
> > with LogLevel warn.  Sorry about that.  I'll change it to LogLevel error
> > in my next build so we do get this message.
> 
> No code change or rebuild is necessary. Just change the LogLevel setting in
> httpd.conf (LogLevel debug).
> 

Won't that spew a lot of stuff we're not interested in, though?  I don't
have any experience doing this in a production environment, so I really
don't know.

Greg

Re: apache.org - MaxClients problem with prefork

Posted by Bill Stoddard <bi...@wstoddard.com>.

> > > Brian, do you see that in yesterday's error log?
>
> Ooops...after I sent this, I noticed that this message only is generated
> with LogLevel warn.  Sorry about that.  I'll change it to LogLevel error
> in my next build so we do get this message.

No code change or rebuild is necessary. Just change the LogLevel setting in
httpd.conf (LogLevel debug).

Bill
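LogLevel acts as a severity threshold: a message is written only if it is at least as severe as the configured level, so LogLevel debug also lets through every notice, info, and debug message, which is the volume Greg is worried about. A minimal C model of that filter, assuming syslog-style numbering from 0 (emerg) to 7 (debug), which Apache's APLOG_* constants follow; the enum and function names are hypothetical:

```c
/* Severity levels in syslog order: a lower number is more severe. */
enum log_level {
    LVL_EMERG = 0, LVL_ALERT, LVL_CRIT, LVL_ERR,
    LVL_WARNING, LVL_NOTICE, LVL_INFO, LVL_DEBUG
};

/* A message is written when it is at least as severe as the configured
 * threshold, i.e. its number is <= the configured level's number. */
static int would_log(enum log_level msg, enum log_level configured)
{
    return msg <= configured;
}

/* Count how many severity levels pass a given threshold. */
static int levels_enabled(enum log_level configured)
{
    int n = 0;
    for (int l = LVL_EMERG; l <= LVL_DEBUG; l++) {
        if (would_log((enum log_level)l, configured)) {
            n++;
        }
    }
    return n;
}
```

With this model, levels_enabled(LVL_WARNING) is 5 and levels_enabled(LVL_DEBUG) is 8.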

Re: apache.org - MaxClients problem with prefork

Posted by Greg Ames <gr...@raleigh.ibm.com>.

Brian Behlendorf wrote:
> 
> On Thu, 8 Feb 2001, Greg Ames wrote:
> > I assume we don't get that on 2.0 apache.org
> >
> >                     "server seems busy, (you may need "
> >                     "to increase StartServers, or Min/MaxSpareServers), "
> >                     "spawning %d children, there are %d idle, and "
> >                     "%d total children"
> >
> > Brian, do you see that in yesterday's error log?  

Ooops...after I sent this, I noticed that this message only is generated
with LogLevel warn.  Sorry about that.  I'll change it to LogLevel error
in my next build so we do get this message. This message is logged when
the code is in exponential process expansion mode and is starting at
least 8 processes in one shot, so we should not get flooded with them.
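The exponential expansion described above can be modelled in a few lines. This is a simplified stand-in for perform_idle_server_maintenance(), not the prefork.c code: the struct, the function name, the doubling rule, and the cap of 32 children per pass are assumptions for illustration; only the "at least 8 processes in one shot" logging threshold and the two message texts come from this thread.

```c
#include <stdio.h>

/* Assumed cap on children started in one pass (illustrative value). */
#define SKETCH_MAX_SPAWN_RATE 32

struct spawn_state {
    int spawn_rate;    /* children to start on the next short-handed pass */
    int reported_max;  /* nonzero once "reached MaxClients" was logged */
};

/* One simplified maintenance pass; returns how many children to start. */
static int maintenance_pass(struct spawn_state *st, int idle, int total,
                            int min_spare, int max_clients)
{
    if (idle >= min_spare) {
        st->spawn_rate = 1;            /* enough idle children: reset */
        return 0;
    }

    int free_slots = max_clients - total;
    if (free_slots <= 0) {
        if (!st->reported_max) {       /* log this only once */
            fprintf(stderr, "server reached MaxClients setting, "
                            "consider raising the MaxClients setting\n");
            st->reported_max = 1;
        }
        st->spawn_rate = 1;
        return 0;
    }

    int n = st->spawn_rate < free_slots ? st->spawn_rate : free_slots;
    if (st->spawn_rate >= 8) {         /* "at least 8 in one shot" */
        fprintf(stderr, "server seems busy, spawning %d children, "
                        "there are %d idle, and %d total children\n",
                n, idle, total);
    }
    if (st->spawn_rate < SKETCH_MAX_SPAWN_RATE) {
        st->spawn_rate *= 2;           /* exponential expansion */
    }
    return n;
}
```

With no idle children, successive passes start 1, 2, 4, 8, ... children until the free slots run out. In this model, growth only continues while something keeps calling the function, which is why a parent that stops running maintenance would freeze the child count wherever it happened to be.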

We did bring up 2.0 on port 8092 yesterday, and got up to MaxClients =
450 clients using log replay.  So it's not a simple FreeBSD or daedalus
issue AFAICT.

Since we are able to grow from 50 to 120 processes before it quits, the
perform_idle_server_maintenance code must be at least partially
working.  I'm wondering if the parent process could be stuck somehow
(i.e., blocked forever in a syscall, looping, ignoring timer pops, not
getting scheduled) once we get to 120 processes, so that
perform_idle_server_maintenance isn't getting called. 
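One hypothetical way to test the stuck-parent theory without a debugger is a heartbeat: the parent stamps the time whenever its maintenance routine runs, and a separate checker flags it when the stamp goes stale. Nothing like this appears in prefork.c as far as this thread shows; the struct, the functions, and the assumption that maintenance runs roughly once a second are all illustrative.

```c
#include <time.h>

/* Hypothetical: in a real server this would live in shared memory
 * (e.g. next to the scoreboard) so an outside monitor can read it. */
struct parent_heartbeat {
    time_t last_maintenance;
};

/* Called by the parent each time its maintenance routine runs. */
static void heartbeat_touch(struct parent_heartbeat *hb, time_t now)
{
    hb->last_maintenance = now;
}

/* If maintenance normally runs about once a second, a gap much longer
 * than that suggests the parent is blocked, looping, or not scheduled. */
static int parent_looks_stuck(const struct parent_heartbeat *hb,
                              time_t now, double max_gap_sec)
{
    return difftime(now, hb->last_maintenance) > max_gap_sec;
}
```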

I will "cvs up", tweak the log level for the "spawning children" msg,
re-build, test it to ensure the msg is logged, and play with kill to see
if I can force a core dump.  That ought to tell us what the parent
process is up to.
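A small C sketch of the "play with kill" step: read the parent's pid from the file named by the PidFile directive and send it SIGQUIT, whose default action is to terminate the process with a core dump. The function names are hypothetical, and whether httpd's parent leaves SIGQUIT at its default disposition is an assumption to check first; a core file also only appears if RLIMIT_CORE allows it. This kills the parent, so it suits a test instance such as the one on port 8092.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Read a pid from a PidFile-style file (one decimal number).
 * Returns the pid, or -1 if the file cannot be read or parsed. */
static pid_t read_pid_file(const char *path)
{
    long pid = -1;
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }
    if (fscanf(fp, "%ld", &pid) != 1 || pid <= 0) {
        pid = -1;
    }
    fclose(fp);
    return (pid_t)pid;
}

/* Send SIGQUIT to one process. Refuses pid <= 0, because kill(0, ...)
 * and kill(-1, ...) would signal a whole process group or every process
 * we are allowed to signal. Returns 0 on success, -1 with errno set. */
static int request_core(pid_t pid)
{
    if (pid <= 0) {
        errno = EINVAL;
        return -1;
    }
    return kill(pid, SIGQUIT);
}
```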

Greg

Re: apache.org - MaxClients problem with prefork

Posted by Greg Ames <gr...@raleigh.ibm.com>.

rbb@covalent.net wrote:
> 
> > > > This was before we fixed the
> > > > memory leak in the file_buckets, correct?
> > >
> > > Incorrect.  The last two tests were with Cliff's patch applied.
> > >
> >
> > But not my patch.
> 
> That's the one I meant.  Cliff's patch was important but it would have
> been seen on any server.  On the other hand, Bill's will only show up on a
> machine like apache.org, where we use mod_include on every file.
> 

Didn't notice it...sorry.  The next build will have it, and it won't be
an issue.   hmmm...I wonder how quickly Bill's leak would bog things
down on apache.org...

Greg

Re: apache.org - MaxClients problem with prefork

Posted by rb...@covalent.net.

> > > This was before we fixed the
> > > memory leak in the file_buckets, correct?  
> > 
> > Incorrect.  The last two tests were with Cliff's patch applied.  
> > 
> 
> But not my patch.

That's the one I meant.  Cliff's patch was important but it would have
been seen on any server.  On the other hand, Bill's will only show up on a
machine like apache.org, where we use mod_include on every file.

Ryan
_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------

Re: apache.org - MaxClients problem with prefork

Posted by Bill Stoddard <bi...@wstoddard.com>.

> btw, I did a bunch of ab benchmarks against apache.org:8092 and
> apache.org (1.3).  2.0 came out ahead in 13 out of 16 runs.  These were
> quick-n-dirty tests, serving apache_pb.gif vs. autoindex, keepalive vs.
> no keepalives, network vs. no network, and concurrency level of 1 vs.
> 10.  So IMO once we get the "not enough children" issue solved, prefork
> will be rocking, and we can move on.
> 
> > This was before we fixed the
> > memory leak in the file_buckets, correct?  
> 
> Incorrect.  The last two tests were with Cliff's patch applied.  
> 

But not my patch.

Bill

Re: apache.org - MaxClients problem with prefork

Posted by Greg Ames <gr...@raleigh.ibm.com>.

rbb@covalent.net wrote:
> 
> How do we know we couldn't start enough processes to keep up?  

Brian saw it. We probably were communicating off-list; my apologies.  In
any case, we seem to get stuck at around 120 processes when we go live
now.

> We have
> never seen that problem before in any of the other tests on
> apache.org.  

Well, we see it now, for at least the last two tests.  Before we were
core dumping quite a bit, so I don't know if we would have noticed this.

>           Nothing has changed in the perform_idle_server_maintenance
> function, and this test was done before I hacked around in the scoreboard
> code.

As I said in another message or two, perform_idle_server_maintenance
looks OK.  I don't know what it is, but we need to find out.

> 
> I would believe that we were seeing a spike on apache.org that
> caused us to not be able to keep up, but every test I have run in the past
> to try to force my machine to reach MaxClients has gotten me the correct
> number of servers, so I don't understand what has changed.

Same here.  Even apache.org:8092 got up to MaxClients = 450 yesterday.

> 
> This would also relate nicely to what Brian said when he took the server
> down.  The performance was horrendous.  

Actually, Brian said the performance was "snappy" before we got behind
due to lack of children on the last two runs.

btw, I did a bunch of ab benchmarks against apache.org:8092 and
apache.org (1.3).  2.0 came out ahead in 13 out of 16 runs.  These were
quick-n-dirty tests, serving apache_pb.gif vs. autoindex, keepalive vs.
no keepalives, network vs. no network, and concurrency level of 1 vs.
10.  So IMO once we get the "not enough children" issue solved, prefork
will be rocking, and we can move on.

> This was before we fixed the
> memory leak in the file_buckets, correct?  

Incorrect.  The last two tests were with Cliff's patch applied.  

> Can we turn this back on
> tonight and see what happens?  Maybe turning logging down to info or
> debug?

Yeah, I'm going to tweak the log level for the "spawning children" msg,
and figure out what we need for diagnostics in case the parent process
is misbehaving somehow.  I'm thinking core dump via kill; Jeff suggested
truss; other ideas appreciated.

Greg

Re: apache.org - MaxClients problem with prefork

Posted by rb...@covalent.net.

On Fri, 9 Feb 2001, Greg Ames wrote:

> rbb@covalent.net wrote:
>
> > > Nope; the log is /logs/www/error_log, rotated nightly to
> > > /x2/logarchive/www/error_log.0.gz, kept only one more day.  I don't see
> > > anything in the log except lots of
> > >
> > > [Wed Feb 07 13:11:33 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
> > > [Wed Feb 07 13:11:34 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
> > > [Wed Feb 07 13:11:35 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
> > > [Wed Feb 07 13:11:35 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
> > 
> > I honestly believe this has to do with David Reid's change to store the
> > socket options.  I haven't had a chance to investigate though.
> > 
> 
> This message indicates that the connection died due to a TCP
> reset...setsockopt() is an innocent victim.  We know we couldn't start
> enough processes to handle the incoming workload, and TCP's accept queue
> was overflowing.  I think the resets are related to us not being able to
> keep up - due to our users getting frustrated and killing Netscape or
> whatever.

How do we know we couldn't start enough processes to keep up?  We have
never seen that problem before in any of the other tests on
apache.org.  Nothing has changed in the perform_idle_server_maintenance
function, and this test was done before I hacked around in the scoreboard
code.

I would believe that we were seeing a spike on apache.org that
caused us to not be able to keep up, but every test I have run in the past
to try to force my machine to reach MaxClients has gotten me the correct
number of servers, so I don't understand what has changed.

This would also relate nicely to what Brian said when he took the server
down.  The performance was horrendous.  This was before we fixed the
memory leak in the file_buckets, correct?  Can we turn this back on
tonight and see what happens?  Maybe turning logging down to info or
debug?

Ryan

Re: apache.org - MaxClients problem with prefork

Posted by Greg Ames <gr...@raleigh.ibm.com>.

rbb@covalent.net wrote:
> 
> > >                     "server seems busy, (you may need "
> > >                     "to increase StartServers, or Min/MaxSpareServers), "
> > >                     "spawning %d children, there are %d idle, and "
> > >                     "%d total children"
> > >
> > > Brian, do you see that in yesterday's error log?  and where is the log,
> > > btw?
> >
> > Nope; the log is /logs/www/error_log, rotated nightly to
> > /x2/logarchive/www/error_log.0.gz, kept only one more day.  I don't see
> > anything in the log except lots of
> >
> > [Wed Feb 07 13:11:33 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
> > [Wed Feb 07 13:11:34 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
> > [Wed Feb 07 13:11:35 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
> > [Wed Feb 07 13:11:35 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
> 
> I honestly believe this has to do with David Reid's change to store the
> socket options.  I haven't had a chance to investigate though.
> 

This message indicates that the connection died due to a TCP
reset...setsockopt() is an innocent victim.  We know we couldn't start
enough processes to handle the incoming workload, and TCP's accept queue
was overflowing.  I think the resets are related to us not being able to
keep up - due to our users getting frustrated and killing Netscape or
whatever.
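Greg's reading of the log (errno 54 is ECONNRESET on FreeBSD; Linux uses a different number) suggests treating a reset reported by setsockopt() as "client already gone" rather than as a socket-option bug. A hedged C sketch of that classification; the enum and function names are hypothetical and this is not httpd's code:

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Outcome of trying to enable TCP_NODELAY on an accepted socket. */
enum nodelay_result {
    NODELAY_OK,
    NODELAY_PEER_GONE,   /* peer reset the connection before we got to it */
    NODELAY_OTHER_ERROR  /* anything else: worth a real warning */
};

/* Map an errno value from setsockopt() to an outcome. ECONNRESET means
 * the connection was already reset by the client; setsockopt() is just
 * the first call that noticed. */
static enum nodelay_result classify_nodelay_errno(int err)
{
    if (err == 0) {
        return NODELAY_OK;
    }
    if (err == ECONNRESET) {
        return NODELAY_PEER_GONE;
    }
    return NODELAY_OTHER_ERROR;
}

/* Enable TCP_NODELAY on fd and classify any failure. */
static enum nodelay_result enable_nodelay(int fd)
{
    int on = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on) == 0) {
        return NODELAY_OK;
    }
    return classify_nodelay_errno(errno);
}
```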

Greg

Re: apache.org - MaxClients problem with prefork

Posted by rb...@covalent.net.

> >                     "server seems busy, (you may need "
> >                     "to increase StartServers, or Min/MaxSpareServers), "
> >                     "spawning %d children, there are %d idle, and "
> >                     "%d total children"
> >
> > Brian, do you see that in yesterday's error log?  and where is the log,
> > btw?
> 
> Nope; the log is /logs/www/error_log, rotated nightly to
> /x2/logarchive/www/error_log.0.gz, kept only one more day.  I don't see
> anything in the log except lots of
> 
> [Wed Feb 07 13:11:33 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
> [Wed Feb 07 13:11:34 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
> [Wed Feb 07 13:11:35 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
> [Wed Feb 07 13:11:35 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)

I honestly believe this has to do with David Reid's change to store the
socket options.  I haven't had a chance to investigate though.

Ryan

Re: apache.org - MaxClients problem with prefork

Posted by Brian Behlendorf <br...@collab.net>.

On Thu, 8 Feb 2001, Greg Ames wrote:
> I assume we don't get that on 2.0 apache.org
>
>                     "server seems busy, (you may need "
>                     "to increase StartServers, or Min/MaxSpareServers), "
>                     "spawning %d children, there are %d idle, and "
>                     "%d total children"
>
> Brian, do you see that in yesterday's error log?  and where is the log,
> btw?

Nope; the log is /logs/www/error_log, rotated nightly to
/x2/logarchive/www/error_log.0.gz, kept only one more day.  I don't see
anything in the log except lots of

[Wed Feb 07 13:11:33 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
[Wed Feb 07 13:11:34 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
[Wed Feb 07 13:11:35 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)
[Wed Feb 07 13:11:35 2001] [warn] (54)Connection reset by peer: setsockopt: (TCP_NODELAY)

	Brian