Posted to dev@httpd.apache.org by Jeff Trawick <tr...@gmail.com> on 2007/10/25 17:00:36 UTC

bogus "Bad pid (%d) in scoreboard slot %d" messages when restarting 1.3

I think this is the problem: When a child is reaped normally after
exiting due to MaxSpareServers or MaxRequestsPerChild, it remains in
the scoreboard with status set to SERVER_DEAD, and it is removed from
the pid table.

Often that slot will be reused by a child created subsequently.

If it is never reused before termination or hard restart,
reclaim_child_processes() will see it in this code and complain that
it isn't in the pid table:

        for (i = 0; i < max_daemons_limit; ++i) {
            int pid = ap_scoreboard_image->parent[i].pid;

            if (pid == my_pid || pid == 0)
                continue;

            if (!in_pid_table(pid)) {
                ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, server_conf,
                             "Bad pid (%d) in scoreboard slot %d", pid, i);
                continue;
            }

But it doesn't need to complain if the child is in the scoreboard with
state SERVER_DEAD, since that means it exited previously and is out of
the pid table.

Here's a hack to look out for that situation:

             if (!in_pid_table(pid)) {
-                ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, server_conf,
-                             "Bad pid (%d) in scoreboard slot %d", pid, i);
+                /* Report an error if the scoreboard state for this child is
+                 * something besides SERVER_DEAD or if we can't find the
+                 * child slot.
+                 *
+                 * It is okay to find it with state SERVER_DEAD.  The child
+                 * exited normally, the state was set to SERVER_DEAD, and we
+                 * didn't subsequently reuse that scoreboard slot for another
+                 * child.
+                 */
+                int child_slot = find_child_by_pid(pid);
+
+                if (child_slot < 0
+                    || ap_scoreboard_image->servers[child_slot].status != SERVER_DEAD) {
+                    ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, server_conf,
+                                 "Bad pid (%d) in scoreboard slot %d", pid, i);
+                }
+                else {
+                    ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, server_conf,
+                                 "avoided bad pid msg for %d; child_slot %d, status %d",
+                                 pid, child_slot, child_slot >= 0 ?
+                                 ap_scoreboard_image->servers[child_slot].status : -1);
+                }
                 continue;
             }

The "avoided bad pid" msg is just for debugging, of course.
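
For illustration, with the debugging branch dropped, the decision the patch makes can be sketched as a standalone predicate. This is a simplified model, not the 1.3 source: should_log_bad_pid(), its parameters, and the TOY_ status constants are hypothetical names, and the value chosen for the "ready" state is arbitrary. Only the rule itself (stay quiet when the pid maps to a slot whose status is SERVER_DEAD) comes from the patch.

```c
/* Hypothetical status values for this sketch; the "ready" value is arbitrary. */
#define TOY_SERVER_DEAD  0
#define TOY_SERVER_READY 2

/*
 * Decide whether a pid that is missing from the pid table deserves the
 * "Bad pid" message.
 *   child_slot: slot index found for the pid, or negative if none was found
 *   status:     scoreboard status of that slot (ignored when child_slot < 0)
 * Returns 1 to log the error, 0 to stay quiet.
 */
static int should_log_bad_pid(int child_slot, int status)
{
    if (child_slot < 0)
        return 1;                      /* pid is not in the scoreboard at all */
    return status != TOY_SERVER_DEAD;  /* dead slot means the child exited normally */
}
```

Keeping the rule in one small function would make it easy to see that a missing slot and a live slot both still produce the error.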

This is perhaps not cool on whatever imaginary machines keep the
scoreboard in a file.  I never fully grokked the sync-scoreboard-image
requirements.

Re: bogus "Bad pid (%d) in scoreboard slot %d" messages when restarting 1.3

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Oct 26, 2007, at 7:32 AM, Jeff Trawick wrote:

>
> sure; I'm lacking cycles at the moment to start looking through the
> code for potential fallout; hope to start looking soon
>

by the by, I'll develop the (minor) patch while also stepping
through as well...

Re: bogus "Bad pid (%d) in scoreboard slot %d" messages when restarting 1.3

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Oct 26, 2007, at 8:13 AM, Jim Jagielski wrote:

>>
>> sure; I'm lacking cycles at the moment to start looking through the
>> code for potential fallout; hope to start looking soon
>>
>
> I spent just a little bit of time, but the current code
> has a mishmash of logic checking for pid == 0 or SERVER_DEAD
> and sometimes both but not always. The constant assumption
> is that a pid of 0 means no child. So from a cursory check,
> this should be fixed anyway, and will also address the
> current issue as well.
>

btw, the only fallout I saw from a quick check last night
dealt with mod_status on TPF where there is an express
section of code that wants to show the PIDs of the
"living dead":

                         if (score_record.status == SERVER_DEAD)
#ifdef TPF
                             if (kill(ps_record.pid, 0) == 0) {
                                 /* on TPF show PIDs of the living dead */
                                 ap_rprintf(r,
                                 "<b>Server %d-%d</b> (%d): %d|%lu|%lu [",
                                 i, (int) ps_record.generation,
                                 (int)ps_record.pid, (int) conn_lres,
                                 my_lres, lres);
                             } else
#endif /* TPF */

Note 2 things:

   1. We don't bother to check if that is actually a "valid" pid,
      that is, a pid that belongs to us.
   2. But we just want to see if the process is still around
      anyway.

To be honest, this looks weird to me, since a bunch of the (unused) slots
will be SERVER_DEAD (which is 0) with a pid of 0, so TPF appears
to handle this OK (I'm guessing the fact that TPF has no group
concept means its kill(0, 0) is different from Unix's)...

Recall that when we init the scoreboard anyway, we memset 0, so all unused
slots are SERVER_DEAD and pid == 0.
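
To make the kill(0, 0) concern concrete: on POSIX systems, a pid of 0 addresses the caller's own process group, and signal 0 only performs the existence and permission check, so the call normally succeeds. The sketch below contrasts a raw probe shaped like the TPF check with a guarded one that treats pid <= 0 as "no child" first. probe_raw() and probe_guarded() are hypothetical helper names, not functions from the 1.3 source, and nothing here says how TPF itself behaves.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Raw probe: returns 1 if kill(pid, 0) succeeds.  With pid == 0 on a
 * POSIX system this asks about the caller's own process group, so an
 * unused slot (pid 0) would look "alive". */
static int probe_raw(pid_t pid)
{
    return kill(pid, 0) == 0;
}

/* Guarded probe: treat pid <= 0 as "no child" before asking the kernel. */
static int probe_guarded(pid_t pid)
{
    if (pid <= 0)
        return 0;
    return kill(pid, 0) == 0;
}
```

Note that neither probe checks whether the pid actually belongs to the server, which is point 1 in the list above.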

Re: bogus "Bad pid (%d) in scoreboard slot %d" messages when restarting 1.3

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Oct 26, 2007, at 7:32 AM, Jeff Trawick wrote:

> On 10/25/07, Jim Jagielski <ji...@jagunet.com> wrote:
>> On Oct 25, 2007, at 11:00 AM, Jeff Trawick wrote:
>>
>>> I think this is the problem: When a child is reaped normally after
>>> exiting due to MaxSpareServers or MaxRequestsPerChild, it remains in
>>> the scoreboard with status set to SERVER_DEAD, and it is removed from
>>> the pid table.
>>>
>>> Often that slot will be reused by a child created subsequently.
>>>
>>> If it is never reused before termination or hard restart,
>>> reclaim_child_processes() will see it in this code and complain that
>>> it isn't in the pid table:
>>
>> Yep... that appears to be it. When setting SERVER_DEAD we
>> aren't resetting the pid as well. Instead of working around
>> that, wouldn't the most straightforward approach be to
>> sync setting SERVER_DEAD status with also setting pid to 0?
>> This could be done in ap_update_child_status() which would
>> also hopefully address those file-based scoreboards as well.
>
> sure; I'm lacking cycles at the moment to start looking through the
> code for potential fallout; hope to start looking soon
>

I spent just a little bit of time, but the current code
has a mishmash of logic checking for pid == 0 or SERVER_DEAD
and sometimes both but not always. The constant assumption
is that a pid of 0 means no child. So from a cursory check,
this should be fixed anyway, and will also address the
current issue as well.

Re: bogus "Bad pid (%d) in scoreboard slot %d" messages when restarting 1.3

Posted by Jeff Trawick <tr...@gmail.com>.
On 10/25/07, Jim Jagielski <ji...@jagunet.com> wrote:
> On Oct 25, 2007, at 11:00 AM, Jeff Trawick wrote:
>
> > I think this is the problem: When a child is reaped normally after
> > exiting due to MaxSpareServers or MaxRequestsPerChild, it remains in
> > the scoreboard with status set to SERVER_DEAD, and it is removed from
> > the pid table.
> >
> > Often that slot will be reused by a child created subsequently.
> >
> > If it is never reused before termination or hard restart,
> > reclaim_child_processes() will see it in this code and complain that
> > it isn't in the pid table:
>
> Yep... that appears to be it. When setting SERVER_DEAD we
> aren't resetting the pid as well. Instead of working around
> that, wouldn't the most straightforward approach be to
> sync setting SERVER_DEAD status with also setting pid to 0?
> This could be done in ap_update_child_status() which would
> also hopefully address those file-based scoreboards as well.

sure; I'm lacking cycles at the moment to start looking through the
code for potential fallout; hope to start looking soon

Re: bogus "Bad pid (%d) in scoreboard slot %d" messages when restarting 1.3

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Oct 25, 2007, at 11:00 AM, Jeff Trawick wrote:

> I think this is the problem: When a child is reaped normally after
> exiting due to MaxSpareServers or MaxRequestsPerChild, it remains in
> the scoreboard with status set to SERVER_DEAD, and it is removed from
> the pid table.
>
> Often that slot will be reused by a child created subsequently.
>
> If it is never reused before termination or hard restart,
> reclaim_child_processes() will see it in this code and complain that
> it isn't in the pid table:

Yep... that appears to be it. When setting SERVER_DEAD we
aren't resetting the pid as well. Instead of working around
that, wouldn't the most straightforward approach be to
sync setting SERVER_DEAD status with also setting pid to 0?
This could be done in ap_update_child_status() which would
also hopefully address those file-based scoreboards as well.
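
A minimal sketch of that idea, assuming a toy scoreboard rather than the real 1.3 structures: toy_scoreboard, toy_init(), toy_update_child_status(), toy_count_stale_pids(), and the TOY_ constants are all hypothetical names invented for illustration, and the real ap_update_child_status() has a different signature and does more work. The only point shown is that clearing the pid in the same place the status becomes SERVER_DEAD leaves no slot that is dead but still carries an old pid.

```c
#include <string.h>
#include <sys/types.h>

#define TOY_SERVER_DEAD  0   /* a zeroed slot reads as dead */
#define TOY_SERVER_READY 2   /* arbitrary nonzero state for illustration */
#define TOY_SLOTS        4

typedef struct {
    int   status[TOY_SLOTS];  /* stands in for the per-child status */
    pid_t pid[TOY_SLOTS];     /* stands in for the per-child pid */
} toy_scoreboard;

/* Zero the whole board: every slot starts as dead with pid 0. */
static void toy_init(toy_scoreboard *sb)
{
    memset(sb, 0, sizeof *sb);
}

/* Set a slot's status and return the old status, or -1 for a bad slot.
 * When the new status is dead, the pid is cleared in the same step. */
static int toy_update_child_status(toy_scoreboard *sb, int slot, int status)
{
    int old;

    if (slot < 0 || slot >= TOY_SLOTS)
        return -1;
    old = sb->status[slot];
    sb->status[slot] = status;
    if (status == TOY_SERVER_DEAD)
        sb->pid[slot] = 0;
    return old;
}

/* Count slots that are dead but still carry a nonzero pid. */
static int toy_count_stale_pids(const toy_scoreboard *sb)
{
    int i, n = 0;

    for (i = 0; i < TOY_SLOTS; ++i)
        if (sb->status[i] == TOY_SERVER_DEAD && sb->pid[i] != 0)
            ++n;
    return n;
}
```

With the pid cleared this way, a loop that skips pid == 0 would never reach the pid-table check for a child that exited normally.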