Posted to users@spamassassin.apache.org by Gene Black <gb...@stctelecom.net> on 2005/05/05 23:18:44 UTC

No rebirth for spamd children

I've been running SA on Debian stable for a few years now without any 
real issues, upgrading SA and related items as needed from time to time, 
originally through Debian, later from source, and last night (in a vain 
attempt to fix the problem I'm about to describe) through backports.org.

A few days ago (with no recent changes that I'm aware of, other than a 
few standard package upgrades through the Debian package system that may 
or may not have upgraded a few Perl modules) spam started pouring in. 
A closer investigation showed no markup on the e-mails, so SA wasn't 
checking them, and I found this message in the logs:

sm-mta[23911]: j45GLfLE023892: timeout waiting for input from local during Draining Input

I restarted SA and everything cleared. I shrugged and went about my 
business. 12-24 hours later it happened again. I quit shrugging and 
started digging. I noted that telnetting straight into SA would yield a 
connection, but no response from SA. I turned on debugging, restarted 
and waited. 12-24 hours later it died again. Checked the logs. Nothing. 
It logs just like everything is normal and then at some point it stops 
logging - and it stops right after it finishes a transaction, so it's 
not like it's stopping in the middle or something. Also, it seems to 
stop a bit (a few hours I think) before I start getting the Draining 
Input messages (BTW - I've got sendmail handing to procmail which is 
calling SA). Closer inspection of the running processes shows that while 
my main SA process is sitting there fine:

/usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir -D -d 
--pidfile=/var/run/spamd.pid

There are no children in sight. (I think maybe once during one failure I 
actually saw 1 child. Most of the time I see none though.) There are 
however, plenty of spamc clients sitting around waiting on service.

I upgraded SA last night to the latest version (the one I was running 
wasn't that old, but it was still back in the 2 series) to:

SpamAssassin version 3.0.2
   running on Perl version 5.6.1

Well, today it's still pulling this stunt on me. Given that an upgrade 
didn't fix it, and I don't see a lot of other people talking about this 
issue, I suspect it has to do with the Perl libraries and threading. 
Threading in Perl isn't something I know a lot about though. For kicks 
and grins I decided to see if it could respawn children at all and 
locked --max-conn-per-child down to 1. It definitely respawns without a 
problem. The log clearly shows:

spamd[30798]: server hit by SIGCHLD
spamd[30798]: handled cleanup of child pid 5442
spamd[30798]: server successfully spawned child process, pid 5525
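
To run the same check over a longer stretch of log, the cleanup and spawn messages can be counted per master PID. This is only a rough Python sketch. It assumes the log lines have exactly the wording shown above, and the name respawn_balance is made up for illustration:

```python
import re

# Patterns taken from the two spamd log messages quoted above.
CLEANUP = re.compile(r"spamd\[(\d+)\]: handled cleanup of child pid (\d+)")
SPAWN = re.compile(
    r"spamd\[(\d+)\]: server successfully spawned child process, pid (\d+)"
)


def respawn_balance(log_lines):
    """Return {master_pid: (cleanups, spawns)} for the given log lines."""
    counts = {}
    for line in log_lines:
        for idx, pattern in enumerate((CLEANUP, SPAWN)):
            match = pattern.search(line)
            if match:
                master = int(match.group(1))
                pair = counts.setdefault(master, [0, 0])
                pair[idx] += 1
    return {master: tuple(pair) for master, pair in counts.items()}


sample = [
    "spamd[30798]: server hit by SIGCHLD",
    "spamd[30798]: handled cleanup of child pid 5442",
    "spamd[30798]: server successfully spawned child process, pid 5525",
]
print(respawn_balance(sample))
```

If a master shows more cleanups than spawns over the log window, that would suggest children are being reaped without being replaced.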

Well, I let that run for a few minutes and it wouldn't die, so for 
performance I cranked --max-conn-per-child up to 25 and stopped/started 
(I get a little paranoid about HUP and restart sometimes) SA. About 
2-3 hours later (much shorter this time) it died again. The process list 
looked a bit different this go-around though:

  5586 ?        S      1:07 /usr/sbin/spamd --create-prefs 
--max-children 5 --helper-home-dir --max-conn-per-child 25 -d 
--pidfile=/var/run/spamd.pid
  8481 ?        S      0:00 spamc
10090 ?        S      0:00 spamc
10117 ?        S      0:00 spamc
10276 ?        S      0:00 spamc
10703 ?        S      0:00 spamc
10787 ?        S      0:00 spamc
11266 ?        S      0:00 spamc
11270 ?        S      0:00 spamc
11681 ?        S      0:00 spamc
11967 ?        S      0:00 spamc
11978 ?        S      0:00 spamc
11995 ?        S      0:00 spamc
12203 ?        S      0:00 spamc
12305 ?        S      0:00 spamc
12328 ?        S      0:00 spamc
12494 ?        S      0:00 spamc
12740 ?        S      0:00 spamc
12745 ?        S      0:00 spamc
13226 ?        S      0:00 spamc
13236 ?        S      0:00 spamc
13579 ?        S      0:00 spamc
13773 ?        S      0:00 spamc
14818 ?        S      0:00 spamc
15222 ?        S      0:00 spamc
15290 ?        S      0:00 spamc
15322 ?        S      0:00 spamc
16526 ?        S      0:00 spamc
17537 ?        S      0:00 spamc
17680 ?        S      0:00 spamc
18067 ?        S      0:00 spamc
18563 ?        S      0:00 spamc
18717 ?        S      0:00 spamc
18942 ?        S      0:00 spamc
19079 ?        S      0:00 spamc
19819 ?        S      0:00 /usr/sbin/spamd --create-prefs --max-children 
5 --helper-home-dir --max-conn-per-child 25 -d --pidfile=/var/run/spamd.pid

Notice the two different master processes - /usr/sbin/spamd (or that's 
what I assume they are anyway). For normal operation, things look like this:

20088 ?        S      0:00 /usr/sbin/spamd --create-prefs --max-children 
5 --helper-home-dir --max-conn-per-child 25 -d --pidfile=/var/run/spamd.pid
20097 ?        S      0:03 spamd child
20098 ?        S      0:03 spamd child
20099 ?        S      0:03 spamd child
20100 ?        S      0:02 spamd child
20101 ?        S      0:03 spamd child


Not sure why there were two master processes there in that crash. Both 
appeared to be using the same pid file. Maybe it's just an anomaly and 
not related (I always try to be aware of the fact that any problem I 
troubleshoot could actually be multiple unrelated problems occurring at 
the same time).


Anyway, I'm not really sure where to go with this other than to hit CPAN 
and start upgrading Perl libraries that look like they might be involved 
(a very un-Debian-like thing to do). Does anyone have experience with 
this, or can anyone point to the possible problem?

Stats:

Linux version 2.4.20 (gcc version 2.95.4 20011002 (Debian prerelease)) 
#11 Mon Dec 1 18:39:20 EST 2003
Debian stable
backports.org SA (SpamAssassin version 3.0.2)
Perl 5.6.1
procmail v3.22 2001/09/10
Sendmail 8.12.3+3.5Wbeta/8.12.3/Debian-7.1
POSIX.pm version 1.03
System uptime: Around 520 days
Intel(R) Pentium(R) 4 CPU 2.40GHz
512MB RAM
Dell PowerEdge Server
More than enough free disk space on all partitions.
Load average of around 0.5
Around 106 processes
2GB of swap space
Around 100MB of free memory

There's probably something someone wants that I've not supplied here. 
Tell me what it is and I'll dig it out.

Any help would be appreciated.

Thanks!

Gene