Posted to users@spamassassin.apache.org by Mabry Tyson <Ty...@AI.SRI.COM> on 2005/12/30 19:33:00 UTC

Spamd (3.1.0) fails to terminate -- suggested patch

Every day or two we hit an infinite loop in spamd 3.1.0 in which a
process fails to terminate.  After a while some resource is exhausted
and the system stops accepting new spamd connections.  The stuck
process accumulates run time, but is not running at 100% of the CPU.

In the syslogs we find

> Dec 30 06:25:05 Savory spamd[31005]: prefork: syswrite(7) failed, 
> retrying... at 
> /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm 
> line 554.
> Dec 30 06:25:30 Savory last message repeated 5 times
> ...
> Dec 30 08:00:30 Savory last message repeated 12 times
> Dec 30 08:01:05 Savory last message repeated 7 times

I killed the process here.

> Dec 30 08:01:10 Savory spamd[31005]: spamd: handled cleanup of child 
> pid 12022 due to SIGCHLD
> Dec 30 08:01:10 Savory spamd[31005]: prefork: write of ping failed to 
> 12022 fd=7: Broken pipe at 
> /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm 
> line 287.
> Dec 30 08:01:10 Savory spamd[31005]: prefork: killing failed child 
> 12022 fd=7 at 
> /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm 
> line 123.
> Dec 30 08:01:10 Savory spamd[31005]: prefork: kill of failed child 
> 12022 failed: No such process
> Dec 30 08:01:10 Savory spamd[31005]: prefork: killed child 12022 at 
> /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm 
> line 137.
> Dec 30 08:01:10 Savory spamd[31005]: spamd: server successfully 
> spawned child process, pid 23037
> Dec 30 08:01:10 Savory spamd[31005]: spamd: server successfully 
> spawned child process, pid 23038
> Dec 30 08:01:10 Savory spamd[31005]: prefork: lost idle kids, so still 
> overloaded at 
> /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm 
> line 262.
>
It appears that the routine "syswrite_with_retry" in
SpamdForkScaling.pm needs to be modified to time out eventually rather
than loop indefinitely.  That routine has:

>     warn "prefork: syswrite(".$sock->fileno.") failed, retrying...";
>
>     # give it 5 seconds to recover.  we retry indefinitely.


We are running with --timeout-child=300 where the docs say:

>        --timeout-tcp=number
>            This option specifies the number of seconds to wait for
>            headers from a client (spamc) before closing the connection.
>            The minimum value is 1, the default value is 30, and a value
>            of 0 will disable socket timeouts completely.
>
>        --timeout-child=number
>            This option specifies the number of seconds to wait for a
>            spamd child to process or check a message.  The minimum
>            value is 1, the default value is 300, and a value of 0 will
>            disable child timeouts completely.
>
At this point we don't know why the syswrite is failing but, whatever
the reason, the timeout should be obeyed.  Losing this one message is
a lot better than losing all the messages while the process sits and
does nothing useful.

I can imagine various ways of modifying the code.  Here's a suggested
change based on the code in sysread_with_timeout.  I haven't run it
long enough to hit the failure again, so it is not yet tested against
the actual hang.

> savory:/usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin# diff -e SpamdForkScaling-FCS.pm SpamdForkScaling.pm
> 556c
>     # ok, we didn't get it first time.  We'll have to start using
>     # select() and timeouts (which is slower).
>
>     my $now = time();
>     if (!defined $deadline) {
>       # set this.  it'll be close enough ;)
>       $deadline = $now + TOUT_WRITE_MAX;
>     }
>     elsif ($now > $deadline) {
>       # timed out!  report failure
>       warn "prefork: syswrite(".$sock->fileno.") failed after " .
>         TOUT_WRITE_MAX . " secs";
>       return undef;
>     }
>
>
>     # give it 5 seconds to recover.  we retry until the timeout
> .
> 545a
>   my $deadline; # we only set this if the first write fails
> .
> 69a
> # timeout for a syswrite() on the command channel.  if we go this long
> # without writing, it's an error.
> use constant TOUT_WRITE_MAX       => 300;
> .
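
Since the ed-style diff above is hard to read out of context, here is a
minimal standalone sketch of the same idea: retry a failing write, but
give up once a deadline has passed.  This is not the SpamAssassin source.
The function name syswrite_with_deadline, the RETRY_WAIT constant and the
optional $max_secs/$retry_wait parameters are made up for illustration;
only TOUT_WRITE_MAX and the 5-second wait come from the patch.  It also
assumes that EAGAIN, EWOULDBLOCK and EINTR are the errors worth retrying.

```perl
use strict;
use warnings;
use Errno qw(EAGAIN EWOULDBLOCK EINTR);

use constant TOUT_WRITE_MAX => 300;  # same value as in the patch above
use constant RETRY_WAIT     => 5;    # hypothetical name for the 5-second wait

# Write all of $buf to $sock.  Returns the number of bytes written, or
# undef if a non-retryable error occurs or the deadline passes.
sub syswrite_with_deadline {
  my ($sock, $buf, $max_secs, $retry_wait) = @_;
  $max_secs   = TOUT_WRITE_MAX unless defined $max_secs;
  $retry_wait = RETRY_WAIT     unless defined $retry_wait;

  my $written = 0;
  my $deadline;    # only set once the first write fails

  while ($written < length($buf)) {
    my $n = syswrite($sock, $buf, length($buf) - $written, $written);
    if (defined $n) {
      $written += $n;
      next;
    }

    # anything other than a "try again later" error is fatal
    return undef unless $! == EAGAIN || $! == EWOULDBLOCK || $! == EINTR;

    my $now = time();
    if (!defined $deadline) {
      $deadline = $now + $max_secs;
    }
    elsif ($now > $deadline) {
      warn "syswrite(" . fileno($sock) . ") failed after $max_secs secs\n";
      return undef;
    }

    # wait up to $retry_wait seconds for the descriptor to become writable
    my $wbits = '';
    vec($wbits, fileno($sock), 1) = 1;
    select(undef, $wbits, undef, $retry_wait);
  }
  return $written;
}

# Example: the pipe has room, so the first write succeeds immediately.
pipe(my $reader, my $writer) or die "pipe: $!";
my $n = syswrite_with_deadline($writer, "ping\n");
print "wrote $n bytes\n";
```

As in the patch, the deadline is only computed after the first failed
write, so the common case pays no extra cost.  time() has one-second
resolution, so the cutoff is approximate.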





For our installed spamassassin,

spamd --version   shows
SpamAssassin Server version 3.1.0
  running on Perl 5.8.3

uname -a shows
Linux savory 2.4.18-1-686 #1 Wed Apr 14 18:20:10 UTC 2004 i686 GNU/Linux




Re: Spamd (3.1.0) fails to terminate -- suggested patch

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 30/12/2005 1:33 PM, Mabry Tyson wrote:
> We keep occasionally (once a day or two) hitting an infinite loop in 
> spamd 3.1.0 where the process fails to terminate.   We find that after a 
> while some resource is used up and the system stops accepting new spamd 
> connections.   The process accumulates run time, but is not running at 
> 100% of the cpu.
> 
> In the syslogs we find
> 
>> Dec 30 06:25:05 Savory spamd[31005]: prefork: syswrite(7) failed, 
>> retrying... at 
>> /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm 
>> line 554.
>> Dec 30 06:25:30 Savory last message repeated 5 times
>> ...
>> Dec 30 08:00:30 Savory last message repeated 12 times
>> Dec 30 08:01:05 Savory last message repeated 7 times

PLEASE take a look at the following bug:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4590


...and compare your patch to Justin's.  If you can try out Justin's 
patch and comment on whether it works for you, it'd be appreciated.


Daryl