Posted to bugs@httpd.apache.org by bu...@apache.org on 2012/07/17 04:21:21 UTC

[Bug 53555] New: Scoreboard full error with event/ssl

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

          Priority: P2
            Bug ID: 53555
          Assignee: bugs@httpd.apache.org
           Summary: Scoreboard full error with event/ssl
          Severity: normal
    Classification: Unclassified
                OS: FreeBSD
          Reporter: astrange@ithinksw.com
          Hardware: PC
            Status: NEW
           Version: 2.4.2
         Component: mpm_event
           Product: Apache httpd-2

A high-traffic web server using event MPM and mostly receiving HTTPS requests
frequently got the error "scoreboard is full, not at MaxRequestWorkers" and
showed very bad performance.

We fixed the issue by reverting from 2.4.2 to 2.2.22, still using event MPM.

Related httpd.conf settings:

 StartServers 16
 MinSpareThreads 4
 MaxSpareThreads 4
 ListenBacklog 4096
 Timeout 5

Unfortunately we don't have a capture of the server-status page, and
increasing the log level didn't seem to show much.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #2 from Niklas Edmundsson <ni...@acc.umu.se> ---
We've seen AH00485: scoreboard is full, not at MaxRequestWorkers on 2.4.4 with
the event MPM, no SSL involved.

Haven't figured out the exact conditions yet, but involved are:
* High/varying load, causing worker processes to be spawned and killed,
  filling up the scoreboard with G:s.
* Server reloads due to config changes.

I suspect the root cause is that server processes are flagged for killing and,
when capacity is needed again later, a new process is created instead of the
existing one being revived. If you have a lot of slow connections (this is a
file archive serving DVD images etc.), such processes can add up.

The scoreboard can look like this after a while:

----------8<----------------
       Connections       Threads     Async connections
PID    total  accepting  busy  idle  writing  keep-alive  closing
14465     94  no            0     0       72           0       21
28881    132  yes           0     0       79           0        6
23632    582  no            0     0      523           0       51
32314     43  no            0     0       28           0       15
13766    577  no            0     0      564           1        2
337       42  no            0     0       28           0       13
19580     39  no            0     0       27           0       12
30603    478  no            0     0      424           0       52
32163    177  no            0     0      136           0       24
16159    429  no            0     0      374           0       54
15376     93  no            0     0       45           0       47
32478    124  no            0     0       86           0       38
30604    395  yes           2    48      390           3        0
30667     61  no            0     0       38           0       17
31569     58  no            0     0       27           0       20
19614    161  no            0     0      117           0       44
32286    253  yes           0    50      252           0        0
17643    454  yes           2    48      445           0        3
23353     49  no            0     0       27           2       20
31581    145  no            0     0      106           0       34
Sum     4386                4   146     3788           6      473

LGLGGGLLGLGLGLLLLLGLGLGLLLLLGLLLLLLLLLLLLLLGGGLGLLGGGGLGGLGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGLGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGLGLGGGGGLGLLGGGLGLLLLLLGGGLLLLLGGLGLGLLLGGGLGLLLGLGLLGL
LGGLLLLGGGGGGLGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGLGL
GGLLGGGLLGGLGLGGGGGLLGGGGLGLLLLLLGGGGGLGGGGGGLLLLLGLLLGLLLLLLLGL
LLLGLLLGLGLGGGLGLGGGGLLGLGGLLLGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GLGGGGGGGGGGGGGGGGGGGGGGGGGLGGGGGGGGGGGGLGGGGGGGGGGGGGGGGGGGGGGG
GGLLLLLGLLLLGLLLLGLGLLLLGGLGLLLLLGLLGLLLLLLLLLLGGLLLGLGGGGGGGGGG
GGGGLGGGGLGGGLGGGGGGGLGGGGGGGGGGGGGGGGGGGGLGGGGGLLGGGGLLGGGLGLLG
GGGLGGLLGGGGLGGLGGLGLGGL____________________WW__________________
__________GGGGGGGGGGGGGGGGGGGGGGGGGGLGGLGLGGGGGGGGGGGGGGGGGGLLGG
LGLLGLGLGGGLLGLGGLLLLGLGGGLGLLGGLGLLGLGLLGLGGGLGGGGGGGGGGGGGLGGG
GLGGLGGGGGLGGGGGGGGGGLGLGGLLGLGG________________________________
____________________W___W_______________________________________
____GLGLLLLLLLGGGLLGGLLLGGLLLLLLGGLGLLGGLLGGGGLGLLLGGGLLGGLGLGGG
LLGLGGLLLLGLGLLGGGGGGLLGGGGGGLLGGGGLGLGL
----------8<----------------
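
A dump like the one above is easier to reason about once the state letters are
counted. The following is a minimal Python sketch, not part of httpd: the
function name and the sample string are hypothetical, and the state key is the
one mod_status prints below the scoreboard on the server-status page.

```python
from collections import Counter

# Key as printed by mod_status on the server-status page.
STATE_NAMES = {
    "_": "Waiting for Connection",
    "S": "Starting up",
    "R": "Reading Request",
    "W": "Sending Reply",
    "K": "Keepalive (read)",
    "D": "DNS Lookup",
    "C": "Closing connection",
    "L": "Logging",
    "G": "Gracefully finishing",
    "I": "Idle cleanup of worker",
    ".": "Open slot with no current process",
}


def summarize_scoreboard(text):
    """Count worker states in a pasted scoreboard, ignoring whitespace."""
    counts = Counter(ch for ch in text if not ch.isspace())
    unknown = set(counts) - set(STATE_NAMES)
    if unknown:
        raise ValueError(f"unexpected scoreboard characters: {sorted(unknown)}")
    return counts


# Hypothetical two-row sample, built explicitly rather than copied from the dump.
sample = "G" * 5 + "L" * 3 + "_" * 15 + "W" + "\n" + "G" * 24
counts = summarize_scoreboard(sample)
for state, n in counts.most_common():
    print(f"{state}  {n:4d}  {STATE_NAMES[state]}")
```

Pasting the real scoreboard text into `summarize_scoreboard` shows at a glance
what share of the slots is held by G and L entries compared with slots that can
still take requests.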

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

Jean-Loup C. <p...@hfox.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |p@hfox.org

--- Comment #10 from Jean-Loup C. <p...@hfox.org> ---
I see behavior similar to that described here (with no SSL involved) with httpd
2.4.9.

I got a lot of AH00485: "scoreboard is full, not at MaxRequestWorkers" errors.
httpd is still serving requests, but one worker is in the graceful finishing
state and is taking 100% CPU.

The worker was in this state for about 24h, until I kill(1)ed it.

Threads stats:

__________________W_____________________________________________
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

Unfortunately I don't have any other info from the status page.

strace of the worker shows an epoll_wait infinite loop:

    [...]
    epoll_wait(10, {}, 128, 100)            = 0
    epoll_wait(10, {}, 128, 100)            = 0
    epoll_wait(10, {}, 128, 100)            = 0
    [...]

mpm event config:

    StartServers         1
    ServerLimit          4
    MinSpareThreads      4
    MaxRequestWorkers    128
    ThreadsPerChild      64
    ThreadLimit          64
    AsyncRequestWorkerFactor 4
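
The arithmetic behind this configuration can be sketched in a few lines of
Python. This is only a rough model: it assumes the MPM uses
MaxRequestWorkers / ThreadsPerChild process slots and does not start new
processes beyond that, which matches the two 64-thread rows shown above but is
not taken from the httpd source. The function names are hypothetical.

```python
def process_slots(max_request_workers, threads_per_child):
    """Process slots in use under the assumed sizing rule."""
    return max_request_workers // threads_per_child


def usable_workers(max_request_workers, threads_per_child, graceful_processes):
    """Threads left for new requests while some processes linger in G state."""
    slots = process_slots(max_request_workers, threads_per_child)
    free_slots = max(slots - graceful_processes, 0)
    return free_slots * threads_per_child


# Values from the config above: MaxRequestWorkers 128, ThreadsPerChild 64.
slots = process_slots(128, 64)
usable = usable_workers(128, 64, graceful_processes=1)
print(f"process slots: {slots}, usable worker threads: {usable}")
```

Under these assumptions a single process stuck in the graceful finishing state
removes half of the configured capacity, and a second one would leave no
threads at all for new requests.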

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

Greg Ames <gr...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gregames@apache.org

--- Comment #1 from Greg Ames <gr...@apache.org> ---
This may be obvious, but the server-status page is a huge help in analyzing
scoreboard full issues.  Do you remember what it looked like?  What state codes
were most prevalent?  The scoreboard can fill up quickly if a back end server
stalls.

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

Daniel Lemsing <da...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P1
           Hardware|PC                          |Sun
            Version|2.4.2                       |2.4.6
                 OS|FreeBSD                     |SunOS
           Severity|normal                      |major

--- Comment #7 from Daniel Lemsing <da...@gmail.com> ---
Recently hit this error in a high traffic production web server (Apache 2.4.6)
leading to an outage.

Has anyone had success in overcoming this issue by amending the Apache
configuration?

If so, what did you change?

Also, can anyone offer any suggestions on what triggers this issue?

Being a production server, rolling back to 2.2.22 is not preferable.

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

Jan de Groot <in...@jandegrootict.nl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |info@jandegrootict.nl

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #8 from Niklas Edmundsson <ni...@acc.umu.se> ---
One of the gotchas with this is that the scoreboard seems to be sized to cater
for MaxRequestWorkers, with no margins for server reloads etc.

In our case, where it can take days for processes to exit if people are
downloading large files over slow connections, we can easily end up in a
situation where multiple server reloads (due to config changes etc.) cause the
scoreboard to fill up with old server processes in graceful-shutdown mode,
leaving no space for new processes to do some actual work.

I can see a few ways to work around this:

1) Simply make the scoreboard bigger. I'd like a default size-multiplier of 2
for the event MPM, but configurable so we can set it to 4 or something for our
setup. An alternative is to set a ridiculously large MaxRequestWorkers to get a
big enough scoreboard, but one DOS and we're out of scoreboard anyway.

2) Kill off the oldest gracefully-exiting processes when we can't spawn a new
process to do useful work.

The ideal solution is probably a mix of these two.

Also, I'm wondering if this is also somehow related to the "server dies for a
while when doing reload" issue. We're still at httpd 2.4.6 though, so I can't
say for certain that some of these issues aren't already fixed.
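
The two workarounds listed above can be compared with a toy simulation in
Python. It is a sketch under strong assumptions, not a description of httpd
internals: every reload turns all active processes into lingering ones, the
lingering processes never exit on their own during the simulation, and all
names (`simulate_reloads`, `multiplier`, `kill_oldest`) are hypothetical.

```python
from collections import deque


def simulate_reloads(base_processes, multiplier, reloads, kill_oldest=False):
    """Return the number of active processes after each reload."""
    total_slots = base_processes * multiplier
    graceful = deque()  # reload generation of each lingering process, oldest first
    active = base_processes
    history = []
    for generation in range(reloads):
        # A reload puts every currently active process into graceful shutdown.
        graceful.extend([generation] * active)
        free = total_slots - len(graceful)
        if kill_oldest:
            # Workaround 2: evict the oldest lingering processes until there is room.
            while free < base_processes and graceful:
                graceful.popleft()
                free += 1
        active = min(base_processes, max(free, 0))
        history.append(active)
    return history


print(simulate_reloads(2, multiplier=1, reloads=3))
print(simulate_reloads(2, multiplier=2, reloads=3))
print(simulate_reloads(2, multiplier=4, reloads=3))
print(simulate_reloads(2, multiplier=1, reloads=3, kill_oldest=True))
```

In this model a larger multiplier only postpones the point at which no slots
are left, while evicting the oldest lingering processes keeps capacity
available at the cost of cutting off their downloads, which is in line with the
suggestion that a mix of both is probably needed.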

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #9 from Ryan Egesdahl <de...@gmail.com> ---
In case it matters any, this problem appears to be specific to the Event MPM. I
had it happening on a server, and when I switched it to the Worker MPM, it
stopped. However, what I did notice is that the same server periodically had
all of its workers taken up with requests, so that may be relevant to the
problem as well.

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

Ludovico Cavedon <lu...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ludovico.cavedon@gmail.com

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

Niklas Edmundsson <ni...@acc.umu.se> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nikke@acc.umu.se

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

Daniel Lemsing <da...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |daniel.lemsing@gmail.com

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #6 from Greg Ames <gr...@apache.org> ---
I looked at apache.org and the code.  The Ls are normal when a gracefully
exiting process had an active thread.  Sorry for jumping to conclusions.

close_listeners sets all the G states during graceful shutdown.  (Unfortunately
this means we can no longer see which threads are active vs. idle - not sure
having the G state is worth it.) Any active threads which finish their requests
will log and set the L state before exiting.  The Gs that remain could
represent exited threads or active requests - we can't tell from server-status.

The processes that didn't exit have active connections.  If they are due to slow
downloads, maybe the thing to do is to tune for fewer or no graceful process
terminations when the traffic drops by raising MaxSpareThreads.
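
The effect of raising MaxSpareThreads can be illustrated with a toy Python
model of the idle-thread check. It assumes a simplified rule (more idle threads
than MaxSpareThreads: one process is asked to exit gracefully; fewer than
MinSpareThreads: one is started), ignores the feedback that retiring a process
has on the idle count, and uses made-up idle-thread samples. The real event MPM
logic has more conditions, and all names here are hypothetical.

```python
def maintenance_action(idle_threads, min_spare, max_spare):
    """Simplified idle-thread check (assumed rule, not httpd source)."""
    if idle_threads > max_spare:
        return "retire"  # one process is asked to exit gracefully
    if idle_threads < min_spare:
        return "spawn"
    return "keep"


def count_retirements(idle_samples, min_spare, max_spare):
    """Number of samples at which a graceful termination would be triggered."""
    return sum(
        maintenance_action(idle, min_spare, max_spare) == "retire"
        for idle in idle_samples
    )


# Hypothetical idle-thread counts observed while traffic fluctuates.
idle_samples = [10, 60, 120, 40, 90, 3]
tight = count_retirements(idle_samples, min_spare=4, max_spare=4)
loose = count_retirements(idle_samples, min_spare=25, max_spare=250)
print(f"tight limits: {tight} retirements, loose limits: {loose} retirements")
```

With the tight limits nearly every dip in traffic retires a process, and each
retired process can then linger in the scoreboard for as long as its slowest
connection lasts.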

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #4 from Niklas Edmundsson <ni...@acc.umu.se> ---
> OK, there are many worker processes that hang while trying to shut down,
> probably due to traffic fluctuations. The only two states we see in the
> scoreboard are G and L.  The G should be transient and can probably be
> ignored.  The Ls look like the cause of the hangs.

Transient for the G:s can mean days in this case, think slooow ADSL connection
downloading a DVD image...

> L means the threads are hung while trying to write to the log.  Normally you
> never see this with logs on a reasonably fast local hard drive.  Are the log
> files NFS mounted or something like that?

No, local filesystem. But I'll have to double check that we're not doing
anything overly clever on the log front...

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #11 from Andrei Boros <an...@yahoo.com> ---
Apache 2.4.10 on Slackware Linux 14.1 x86_64 platform.

I am seeing this about once a minute in the logs:
AH00485: scoreboard is full, not at MaxRequestWorkers

I was able to recover only by a forced restart (stop then start).

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #5 from Rainer Jung <ra...@kippdata.de> ---
Greg,

I didn't check the code, but to me it seems that a "G" letter does not mean
there's no more work going on. The server-status on our own
www.(eu|us).apache.org shows the same G plus L mixture for about a minute
(varying) whenever a process dies due to MaxConnectionsPerChild. When I checked
such processes, they had open client connections and were still sending data to
the client. So it was correct they were still around, but the status letters
"G" or "L" for those gracefully exiting children are not showing those details.

[Bug 53555] Scoreboard full error with event/ssl

Posted by bu...@apache.org.

https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #3 from Greg Ames <gr...@apache.org> ---
(In reply to Niklas Edmundsson from comment #2)
> We've seen AH00485: scoreboard is full, not at MaxRequestWorkers on 2.4.4
> with the event MPM, no SSL involved.

> PID	Connections 	Threads	Async connections
> total	accepting	busy	idle	writing	keep-alive	closing
> 14465	94	no	0	0	72	0	21
> 28881	132	yes	0	0	79	0	6
> 23632	582	no	0	0	523	0	51
> 32314	43	no	0	0	28	0	15
> 13766	577	no	0	0	564	1	2

> LGLGGGLLGLGLGLLLLLGLGLGLLLLLGLLLLLLLLLLLLLLGGGLGLLGGGGLGGLGGGGGG
> GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGLGGGGGGGGGGGGGGGGGGGGGGGGG

OK, there are many worker processes that hang while trying to shut down,
probably due to traffic fluctuations. The only two states we see in the
scoreboard are G and L.  The G should be transient and can probably be ignored.
The Ls look like the cause of the hangs.

L means the threads are hung while trying to write to the log.  Normally you
never see this with logs on a reasonably fast local hard drive.  Are the log
files NFS mounted or something like that?

Greg
