You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/06/20 17:25:14 UTC

[Bug 4410] New: syswrite() on closed filehandle

http://bugzilla.spamassassin.org/show_bug.cgi?id=4410

           Summary: syswrite() on closed filehandle
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: spamc/spamd
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: jbrunette@excel.net


SpamAssassin 3.1.0pre1, Fedora Core 4, perl 5.8.6.

After a few minutes of spamd running seemingly happy, it sent out about 20
copies of the following attached output. They appear to be related to the spamd
child forking system.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410





------- Additional Comments From jm@jmason.org  2005-06-20 22:41 -------
actually, I can answer my own question -- looking at my server's logs, I can
spot a few of these.  it's totally harmless (although ugly).



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.1.0                       |Undefined




------- Additional Comments From jm@jmason.org  2005-06-20 14:59 -------
that sounds very likely.  the idea is that main_ping_kids can mark
a child as probably dead ("K"), but until SIGCHLD is received,
the process is still around so it shouldn't be removed from the list.

once the SIGCHLD is received, then the process is definitely dead
and should be removed from that list and its socket deleted.  so
I think you're spot on.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410





------- Additional Comments From jm@jmason.org  2005-06-20 22:39 -------
btw, enough about the fix -- which is indeed correct btw.

what are the *effects* of this bug?  how much hosage is involved? ;)  does it
stabilise and carry on scanning spam, or does the master spamd become unusable?

in terms of what's triggering it, it happens 2.5 to 5 minutes after a transition
from reasonably heavy load to idleness (ie. a child was over-idle, and was killed).

anyway, fix applied:
Sending        lib/Mail/SpamAssassin/SpamdForkScaling.pm
Transmitting file data .
Committed revision 191624.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410





------- Additional Comments From jbrunette@excel.net  2005-06-20 09:36 -------
child_exited() is not being called for some reason, and when main_ping_kids()
runs, it sees entries in $self->{backchannel}->{kids} for kids that have long
passed.  Looking into why child_exited() isn't being called.

Er...call me crazy but I can't find any code calling "child_exited()" anywhere?




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410





------- Additional Comments From jbrunette@excel.net  2005-06-20 13:00 -------
I'm not suggesting that this is *the* fix, but adding the
delete_socket_for_child() call like this into SpamdForkScaling::child_exited():

  delete $self->{kids}->{$pid};
+ $self->{backchannel}->delete_socket_for_child($pid);

...has, at least for me, stopped the errors and makes spamd seem "happy".  I'm
sure there's a more proper fix...this is just an FYI.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410





------- Additional Comments From jbrunette@excel.net  2005-06-20 08:58 -------
Created an attachment (id=2946)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2946&action=view)
life of a spamd child which generates errors

Here is a bit more info about a particular child which crashed and burned.  It
was created but never scanned a message and was told to die.  However, I do see
spamd children which did scan message(s) and still caused errors like this.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410





------- Additional Comments From jbrunette@excel.net  2005-06-20 09:55 -------
child_exited is referenced in the spamd executable, of course.

Attaching a full spamd debug log from start to finish.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410





------- Additional Comments From jbrunette@excel.net  2005-06-21 06:21 -------
There is a somewhat large negative impact.  Eventually (or possibly, immediately
after the first set of errors), spamd may think there are too many children
running and not spawn any more than min_children at a time.

When the errors occur, main_ping_kids() runs child_error_kill(), sending it the
long-dead kid's pid.  child_error_kill() runs set_child_state() on that pid. 
Now there's an entry in $self->{kids} for the pid that's long gone and will
never be acted on.  If there are enough of these, adapt_num_children() won't
spawn any more kids until spamd is restarted.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410





------- Additional Comments From jbrunette@excel.net  2005-06-20 08:26 -------
Created an attachment (id=2945)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2945&action=view)
syslog output of spamd errors

The only variations are the GEN## and process ids of the children it's
complaining about.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410


parkerm@pobox.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.1.0






------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410





------- Additional Comments From jm@jmason.org  2005-06-21 10:02 -------
'When the errors occur, main_ping_kids() runs child_error_kill(), sending it the
long-dead kid's pid.  child_error_kill() runs set_child_state() on that pid. 
Now there's an entry in $self->{kids} for the pid that's long gone and will
never be acted on.  If there are enough of these, adapt_num_children() won't
spawn any more kids until spamd is restarted.'

actually, yes, you're right -- and the effect is that it's now limited to (e.g.)
two children only instead of whatever --max-children is set to, limiting its
throughput.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410





------- Additional Comments From jbrunette@excel.net  2005-06-20 12:02 -------
More investigation, correct me if I'm wrong.

main_ping_kids() goes through the list of kids in the backchannel object. 
However, when a child exits, I do not think the entry in backchannel->{kids} is
removed for it.  When main_ping_kids() goes through backchannel->{kids}, it
finds entries for every kid that existed up to that point.  The socket objects
still exist but the sockets have been closed since the child exited prior.

The only time backchannel->delete_socket_for_child($kid) is called is when
syswrite_with_retry() in main_ping_kids() fails on the child's socket (which has
been closed).  This is after the errors occur and the damage has been done.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410


jbrunette@excel.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From jbrunette@excel.net  2005-06-30 06:52 -------
3.1.0pre2 includes this fix, closing.  Thanks!



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410





------- Additional Comments From jbrunette@excel.net  2005-06-20 10:22 -------
spamd command line: spamd -q -m 20 -x -s local1 -u nobody



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410





------- Additional Comments From jbrunette@excel.net  2005-06-20 08:30 -------
The "prefork: child states" message just before that was:

Jun 20 10:16:05 mke-lnx-02 spamassassin[5007]: prefork: child states: BIBI

Afterword, they looked like this:

Jun 20 10:16:46 mke-lnx-02 spamassassin[5007]: prefork: child states:
BBKKKKKKKKKKKKKKKKKKKKKKK
Jun 20 10:16:46 mke-lnx-02 spamassassin[5007]: prefork: server reached
--max-clients setting, consider raising it
Jun 20 10:16:46 mke-lnx-02 spamassassin[5048]: spamd: connection from
green.excel.net [209.103.224.2] at port 49941
Jun 20 10:16:46 mke-lnx-02 spamassassin[5007]: prefork: child states:
BBKKKKKKKKKKKKKKKKKKKKKKK
Jun 20 10:16:46 mke-lnx-02 spamassassin[5007]: prefork: server reached
--max-clients setting, consider raising it

Of note, the processes of the "K" children were gone from the system.  Spamd
thought they were still running, however, and only allowed two child processes
to do any work.





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4410] syswrite() on closed filehandle

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4410


jbrunette@excel.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #2945 is|0                           |1
           obsolete|                            |
Attachment #2946 is|0                           |1
           obsolete|                            |




------- Additional Comments From jbrunette@excel.net  2005-06-20 09:57 -------
Created an attachment (id=2947)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2947&action=view)
spamd debug log with errors




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.