Posted to users@spamassassin.apache.org by Mabry Tyson <Ty...@AI.SRI.COM> on 2005/12/30 19:33:00 UTC
Spamd (3.1.0) fails to terminate -- suggested patch
Once every day or two we hit what looks like an infinite loop in
spamd 3.1.0, where the process fails to terminate. After a while some
resource is used up and the system stops accepting new spamd
connections. The process accumulates run time, but is not running at
100% of the CPU.
In the syslogs we find
> Dec 30 06:25:05 Savory spamd[31005]: prefork: syswrite(7) failed,
> retrying... at
> /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm
> line 554.
> Dec 30 06:25:30 Savory last message repeated 5 times
> ...
> Dec 30 08:00:30 Savory last message repeated 12 times
> Dec 30 08:01:05 Savory last message repeated 7 times
I killed the process here.
> Dec 30 08:01:10 Savory spamd[31005]: spamd: handled cleanup of child
> pid 12022 due to SIGCHLD
> Dec 30 08:01:10 Savory spamd[31005]: prefork: write of ping failed to
> 12022 fd=7: Broken pipe at
> /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm
> line 287.
> Dec 30 08:01:10 Savory spamd[31005]: prefork: killing failed child
> 12022 fd=7 at
> /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm
> line 123.
> Dec 30 08:01:10 Savory spamd[31005]: prefork: kill of failed child
> 12022 failed: No such process
> Dec 30 08:01:10 Savory spamd[31005]: prefork: killed child 12022 at
> /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm
> line 137.
> Dec 30 08:01:10 Savory spamd[31005]: spamd: server successfully
> spawned child process, pid 23037
> Dec 30 08:01:10 Savory spamd[31005]: spamd: server successfully
> spawned child process, pid 23038
> Dec 30 08:01:10 Savory spamd[31005]: prefork: lost idle kids, so still
> overloaded at
> /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm
> line 262.
>
It appears that the routine "syswrite_with_retry" in
SpamdForkScaling.pm needs to be modified to time out eventually rather
than loop indefinitely. That routine has:
> warn "prefork: syswrite(".$sock->fileno.") failed, retrying...";
>
> # give it 5 seconds to recover. we retry indefinitely.
We are running with --timeout-child=300 where the docs say:
> --timeout-tcp=number
>     This option specifies the number of seconds to wait for headers
>     from a client (spamc) before closing the connection. The minimum
>     value is 1, the default value is 30, and a value of 0 will
>     disable socket timeouts completely.
>
> --timeout-child=number
>     This option specifies the number of seconds to wait for a spamd
>     child to process or check a message. The minimum value is 1,
>     the default value is 300, and a value of 0 will disable child
>     timeouts completely.
>
At this point we don't know why the syswrite is failing but, whatever
the reason, the timeout should be obeyed. Losing this one message is a
lot better than losing all the messages while spamd sits and does
nothing useful.
I can imagine various ways of modifying the code. Here's a suggested
change based on the code in sysread_with_timeout, but I haven't run
it long enough to hit the failure again.
> savory:/usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin# diff -e
> SpamdForkScaling-FCS.pm SpamdForkScaling.pm
> 556c
> # ok, we didn't get it first time. We'll have to start using
> # select() and timeouts (which is slower).
>
> my $now = time();
> if (!defined $deadline) {
> # set this. it'll be close enough ;)
> $deadline = $now + TOUT_WRITE_MAX;
> }
> elsif ($now > $deadline) {
> # timed out! report failure
> warn "prefork: syswrite(".$sock->fileno.") failed after " .
> TOUT_WRITE_MAX . " secs";
> return undef;
> }
>
>
> # give it 5 seconds to recover. we retry until the timeout
> .
> 545a
> my $deadline; # we only set this if the first write fails
> .
> 69a
> # timeout for a syswrite() on the command channel. if we go this long
> # without writing, it's an error.
> use constant TOUT_WRITE_MAX => 300;
> .
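Read together, the three hunks above amount to a retry loop bounded by a deadline. Here is a minimal standalone sketch of that idea. It is not the actual SpamdForkScaling.pm code: write_with_deadline and its $try_write callback are hypothetical names that stand in for one syswrite() attempt, and the 5-second pause is taken from the existing comment in syswrite_with_retry.

```perl
use strict;
use warnings;

# Assumed cap, mirroring TOUT_WRITE_MAX in the suggested patch.
use constant TOUT_WRITE_MAX => 300;

# write_with_deadline($try_write, $max_secs, $pause_secs)
#
# $try_write is a hypothetical callback standing in for a single
# syswrite() attempt: it returns a true value on success and a false
# value on failure. Returns the number of attempts made on success,
# or undef once the deadline has passed.
sub write_with_deadline {
  my ($try_write, $max_secs, $pause_secs) = @_;
  $max_secs   = TOUT_WRITE_MAX unless defined $max_secs;
  $pause_secs = 5              unless defined $pause_secs;

  my $deadline;      # only set once the first attempt has failed
  my $attempts = 0;

  while (1) {
    $attempts++;
    return $attempts if $try_write->();

    my $now = time();
    if (!defined $deadline) {
      # first failure: start the clock
      $deadline = $now + $max_secs;
    }
    elsif ($now > $deadline) {
      # timed out: report failure instead of retrying forever
      warn "write failed after $max_secs secs ($attempts attempts)\n";
      return;
    }

    # pause before retrying; 4-arg select() with no handles acts as a
    # sleep that accepts fractional seconds
    select(undef, undef, undef, $pause_secs);
  }
}
```

A caller would treat an undef result the way the existing code already treats a failed ping write, for example by killing and replacing the child, rather than blocking the parent indefinitely.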
For our installed spamassassin,
spamd --version shows
SpamAssassin Server version 3.1.0
running on Perl 5.8.3
uname -a shows
Linux savory 2.4.18-1-686 #1 Wed Apr 14 18:20:10 UTC 2004 i686 GNU/Linux
Re: Spamd (3.1.0) fails to terminate -- suggested patch
Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 30/12/2005 1:33 PM, Mabry Tyson wrote:
> Once every day or two we hit what looks like an infinite loop in
> spamd 3.1.0, where the process fails to terminate. After a while some
> resource is used up and the system stops accepting new spamd
> connections. The process accumulates run time, but is not running at
> 100% of the CPU.
>
> In the syslogs we find
>
>> Dec 30 06:25:05 Savory spamd[31005]: prefork: syswrite(7) failed,
>> retrying... at
>> /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm
>> line 554.
>> Dec 30 06:25:30 Savory last message repeated 5 times
>> ...
>> Dec 30 08:00:30 Savory last message repeated 12 times
>> Dec 30 08:01:05 Savory last message repeated 7 times
PLEASE take a look at the following bug:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4590
...and compare your patch to Justin's. If you can try out Justin's
patch and comment on whether it works for you, it'd be appreciated.
Daryl