Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/10/02 17:11:43 UTC

[Bug 5665] New: spamd doesn't recognize when children have exited

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5665

           Summary: spamd doesn't recognize when children have exited
           Product: Spamassassin
           Version: 3.2.3
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P4
         Component: spamc/spamd
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: kdeugau@vianet.ca


We're running a cluster of 4 spamd servers on Debian etch, amd64.  Since a recent
upgrade to 3.2.3, we've started seeing spamd fail to notice that exiting children
have in fact exited (according to ps and top); it retains a ghost record for each
one in the K state.

Over time, this fills up spamd's internal child tracking table, and eventually
all processing stalls out.

With the default values for --min-children, --min-spare, and
--max-conn-per-child, the first ghost entry shows up within about 15 minutes. 
Raising one or several of these in combination seems to make the problem less
likely.

Each ghost entry appears together with a set of log entries like these:

prefork: cannot ping 25046, file handle not defined, child likely to still be processing SIGCHLD handler after killing itself
prefork: killing failed child 25046 fd=undefined at /opt/spamassassin-3.2.3/share/perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 171.
prefork: kill of failed child 25046 failed: No such process
prefork: killed child 25046

This appears to be similar to bug 5313, but inverted;  the child processes *are*
killed successfully according to the OS, but spamd doesn't find out about it. 
Checking with ps or top shows that the PID in the log has in fact exited.
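
The "No such process" message in the log corresponds to the ESRCH error from
kill(2): no process with that PID exists any more, so the child has already
exited and been reaped, yet its entry stays in spamd's child table.  As an
illustration only (spamd is written in Perl and this is not its code), here is a
minimal Python sketch, assuming a POSIX system, of the signal-0 existence check
that tells a live child apart from one that is already gone.  The function name
pid_has_exited is hypothetical.

```python
import errno
import os


def pid_has_exited(pid: int) -> bool:
    """Return True if the kernel no longer has a process with this PID.

    Signal 0 performs the existence and permission checks without
    delivering anything. ESRCH ("No such process") means the PID is
    gone: the child has exited *and* been reaped. An exited child that
    has not been reaped yet (a zombie) still counts as existing.
    """
    try:
        os.kill(pid, 0)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return True
        if exc.errno == errno.EPERM:
            # The process exists but belongs to another user.
            return False
        raise
    return False
```

For example, once the parent has reaped a child with os.waitpid(child, 0),
pid_has_exited(child) returns True, while pid_has_exited(os.getpid()) returns
False.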

Enabling --round-robin seems to be working around the problem for now, but the
overall system load is much higher.

SA is installed from source on all four machines by a script set up to keep the
installations as close as possible to identical.

The Bayes DB is in MySQL on one machine;  that system is slightly slower to lose
track of its spamd children than the other 3.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5665] spamd keeps dead kids in state 'K', causing child hash to fill up

jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|spamd doesn't recognize when|spamd keeps dead kids in
                   |children have exited        |state 'K', causing child
                   |                            |hash to fill up
   Target Milestone|Undefined                   |3.2.4

[Bug 5665] spamd keeps dead kids in state 'K', causing child hash to fill up

jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #4142 is|0                           |1
           obsolete|                            |




------- Additional Comments From jm@jmason.org  2007-10-05 11:22 -------
Created an attachment (id=4143)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4143&action=view)
minor tweak

Kris, could you try this version of the patch?  It removes a redundant
delete_socket_for_child() call and quiets down the debugging, but otherwise
should be exactly the same.

[Bug 5665] spamd doesn't recognize when children have exited

------- Additional Comments From kdeugau@vianet.ca  2007-10-02 14:23 -------
Created an attachment (id=4142)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4142&action=view)
Quick hack to clean up ghost K-state children

The attached patch seems to be working to eliminate the ghost K-state children;
I've patched the four production machines that were showing the problem and
all four are stable.  One has been running for ~3 hours; without the patch it
would have accumulated ~5-8 (possibly more) ghost children in that time.

The patch as-is includes some debug "logging", and could probably be vastly
improved.
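
The attached patches themselves are not reproduced in this thread.  Purely to
illustrate the idea of a cleanup pass over the child table, here is a minimal
Python sketch, assuming a POSIX system and a child table modelled as a dict that
maps PID to a one-letter state.  The names reap_ghosts, _pid_gone and
STATE_KILLED are hypothetical; this is not the code in SpamdForkScaling.pm.

```python
from __future__ import annotations

import os

STATE_KILLED = "K"  # hypothetical label mirroring spamd's 'K' state


def _pid_gone(pid: int) -> bool:
    """True if no process with this PID exists (kill(pid, 0) -> ESRCH)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return True
    except PermissionError:
        return False  # exists, but owned by another user
    return False


def reap_ghosts(children: dict[int, str]) -> list[int]:
    """Drop 'K'-state entries whose processes have really gone away.

    `children` maps pid -> one-letter state (a hypothetical stand-in
    for spamd's child table). Returns the PIDs that were removed.
    """
    removed = []
    for pid, state in list(children.items()):
        if state != STATE_KILLED:
            continue
        try:
            # Still our unreaped child? Reap it now without blocking;
            # returns (0, 0) if it is still running.
            done_pid, _status = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            # Not (or no longer) our child: probably already reaped, e.g.
            # by a SIGCHLD handler. Fall back to an existence check.
            done_pid = pid if _pid_gone(pid) else 0
        if done_pid == pid:
            del children[pid]
            removed.append(pid)
    return removed
```

Trying waitpid() with WNOHANG first means a child that is still a zombie gets
reaped and dropped in the same pass, while a child that something else has
already reaped (the case in the original log, where the kill fails with "No such
process") is caught by the existence check.  Entries for children that are still
running are left alone.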

[Bug 5665] [review] spamd keeps dead kids in state 'K', causing child hash to fill up

jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|spamd keeps dead kids in    |[review] spamd keeps dead
                   |state 'K', causing child    |kids in state 'K', causing
                   |hash to fill up             |child hash to fill up
  Status Whiteboard|                            |needs 2 votes for 3.2




------- Additional Comments From jm@jmason.org  2007-10-07 03:59 -------
ok, applied to 3.3.0:

: jm 189...; svn commit -m "bug 5665: spamd may fail to notice that a child has
completed exiting, and keeps it in the child list in state 'K', eventually
filling up the child list with 'ghost' children.  fix"
lib/Mail/SpamAssassin/SpamdForkScaling.pm
Sending        lib/Mail/SpamAssassin/SpamdForkScaling.pm
Transmitting file data .
Committed revision 582610.


committers, votes please...

[Bug 5665] spamd keeps dead kids in state 'K', causing child hash to fill up

------- Additional Comments From kdeugau@vianet.ca  2007-10-03 10:11 -------
(In reply to comment #2)
> Kris, if that patch works OK, it looks good to me.  Could you monitor it for a
> few more days and let me know if it's still working, by the end of that?  If
> it is, I'll add the patch to SVN and 3.2.x.

ACK OK.

FWIW, it's been stable well beyond the point of "spamd ran out of child slots"
already, but I'll still watch it for another day or so to make sure it doesn't
eat the servers or stomp all over something else.

On one machine I'm seeing the "prefork: debug:" notes every ~3 minutes.  O_o

I honestly can't tell whether this is just papering over the "real" problem
somewhere else, or doing exactly what I intended and providing a little extra
cleanup where it's needed.

[Bug 5665] spamd doesn't recognize when children have exited

------- Additional Comments From jm@jmason.org  2007-10-03 03:20 -------
Kris, if that patch works OK, it looks good to me.  Could you monitor it for a
few more days and let me know if it's still working, by the end of that?  If it
is, I'll add the patch to SVN and 3.2.x.

[Bug 5665] spamd keeps dead kids in state 'K', causing child hash to fill up

------- Additional Comments From kdeugau@vianet.ca  2007-10-05 11:08 -------
(In reply to comment #3)
> FWIW, it's been stable well beyond the point of "spamd ran out of child slots"
> already, but I'll still watch it for another day or so to make sure it doesn't
> eat the servers or stomp all over something else.

Still stable on all four machines that were showing the problem.  None have
needed a spamd restart since I applied the patch;  unpatched spamd would run out
of child slots within 6-8 hours at most.  No apparent problems with any other
services (not that there's much else beyond SA).  No zombie children left
hanging around where there shouldn't be.

[Bug 5665] [review] spamd keeps dead kids in state 'K', causing child hash to fill up

sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|needs 1 votes for 3.2       |ready to commit for 3.2




------- Additional Comments From sidney@sidney.com  2007-12-16 11:09 -------
+1

[Bug 5665] spamd keeps dead kids in state 'K', causing child hash to fill up

------- Additional Comments From kdeugau@vianet.ca  2007-10-05 14:47 -------
(In reply to comment #5)
> Created an attachment (id=4143)
> --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4143&action=view)
> minor tweak
> 
> Kris, could you try this version of the patch?  It removes a redundant
> delete_socket_for_child() call and quiets down the debugging, but otherwise
> should be exactly the same.

Seems to be working;  one system has been stable for 3 hours so far.  Unpatched,
ghost children usually show up within 15-20 minutes.

[Bug 5665] [review] spamd keeps dead kids in state 'K', causing child hash to fill up

jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From jm@jmason.org  2007-12-16 13:20 -------
fix checked in for 3.2.x: r604706

[Bug 5665] [review] spamd keeps dead kids in state 'K', causing child hash to fill up

spamassassin@dostech.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|needs 2 votes for 3.2       |needs 1 votes for 3.2




------- Additional Comments From spamassassin@dostech.ca  2007-11-06 13:35 -------
From my memory of how SpamdForkScaling works, it looks safe, so +1.