Posted to users@spamassassin.apache.org by "Dan Mahoney, System Admin" <da...@prime.gushi.org> on 2006/04/02 07:11:06 UTC

Re: Spamd keeps getting hung up!

On Fri, 31 Mar 2006, Justin Mason wrote:

Hey, if anyone is around RIGHT NOW: I'm getting the issue, it's
repeatable, and I can't figure out strace. I'm trying

  strace -o /home/danm/strace.log -f -ttt /usr/local/bin/spamd -D \
    -u spamd -i -A 72.9.101.130,65.125.237.232,65.125.228.130,127.0.0.1 \
    -q -d -m 40 -r /tmp/spamd.pid -l --min-spare=5 --max-spare=20

but it's only capturing about one line of output to the logfile.


I should probably note that the BSD standard seems to be truss rather than
strace, but strace IS in ports and installed on my spamd box.

If you can catch me via instant messenger (gushiDotOrg) or try email, I
might be able to help nail this one down, at least as long as this barrage
continues, assuming we can get a workable strace.

It's the same tinsc user again, getting flooded, and I'm now capturing
their messages for later analysis. Admittedly the capture happens after
spamassassin hits them, but the filter also catches them after I have to
KILL spamassassin, which lets us easily see which ones were being
processed when it was killed (since they will lack the SA headers).
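For anyone who wants to pick those messages out of a capture folder, here is
a minimal Python sketch of the "lacks the SA headers" check. The header names
are an assumption (SpamAssassin's usual X-Spam-Checker-Version and
X-Spam-Status markup; a local config may rename or drop them), and the
function names are made up for illustration.

```python
from email import message_from_string

# Assumption: these are the headers SpamAssassin adds on this box.
# Adjust the tuple if the local configuration uses different names.
SA_HEADERS = ("X-Spam-Checker-Version", "X-Spam-Status")


def lacks_sa_headers(raw_message: str) -> bool:
    """Return True if the message carries none of the SA headers."""
    msg = message_from_string(raw_message)
    # Header lookup on a Message object is case-insensitive.
    return not any(name in msg for name in SA_HEADERS)


def unscanned(messages: dict) -> list:
    """Given {name: raw message text}, list the names SA never marked up."""
    return sorted(name for name, raw in messages.items()
                  if lacks_sa_headers(raw))
```

Feeding it the captured messages for the flooded user should leave only the
ones that were in flight when spamd was killed.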

Sorry for the bad punctuation, I'm on satellite.

-Dan

>
> sounds like a new ticket is in order, alright. btw if it *is* load-related,
> an "strace -f -ttt" log will show that pretty clearly.
>
> --j.
>
> Daryl C. W. O'Shea writes:
>> (copying Justin since this has to do with pre-forking)
>>
>> Dan Mahoney, System Admin wrote:
>>> On Fri, 10 Mar 2006, Daryl C. W. O'Shea wrote:
>>>
>>>> On 3/10/2006 11:22 AM, Dan Mahoney, System Admin wrote:
>>
>>> Okay,
>>>
>>> I'm still getting these issues.  I've corrected every other issue that's
>>> plagued us, and the thing still locks up.  USUALLY when a user gets some
>>> form of dictionary spam.  For the users I can identify I've been keeping
>>> copies of their stuff.
>>>
>>> NOTE: This is under a stock 3.1.1, if there are any other patches I
>>> should be using from the previous conversations that are NOT in 3.1.1,
>>> please let me know, and I'll make sure I have those too.  I'm seeing
>>> lots of the following:
>>>
>>> Mar 30 21:52:14 quark spamd[45835]: __alarm__
>>> Mar 30 21:52:14 quark spamd[45835]: __alarm__
>>> Mar 30 21:52:14 quark spamd[45835]: spamd: copy_config timeout (with
>>> empty $@), respawning child process after 25 messages at
>>> /usr/local/bin/spamd line 982.
>>> Mar 30 21:52:16 quark spamd[52479]: __alarm__
>>> Mar 30 21:52:16 quark spamd[52479]: __alarm__
>>> Mar 30 21:52:16 quark spamd[52479]: spamd: copy_config timeout (with
>>> empty $@), respawning child process after 9 messages at
>>> /usr/local/bin/spamd line 982.
>>
>> This indicates that the patch from bug 4699 is working -- spamd now
>> recognizes that the alarm timed out on copy_config.
>>
>>
>>> And also some of this:
>>>
>>> Mar 30 21:52:31 quark spamd[42292]: syswrite() on closed filehandle
>>> GEN88 at /usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451.
>>> Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in
>>> concatenation (.) or string at
>>> /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
>>> line 330.
>>> Mar 30 21:52:31 quark spamd[42292]: prefork: write of ping failed to
>>> 52479 fd=:  at
>>> /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
>>> line 330.
>>> Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in
>>> concatenation (.) or string at
>>> /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
>>> line 127.
>>> Mar 30 21:52:31 quark spamd[42292]: prefork: killing failed child 52479
>>> fd= at
>>> /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
>>> line 127.
>>> Mar 30 21:52:31 quark spamd[42292]: prefork: killed child 52479 at
>>> /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
>>> line 141.
>>> Mar 30 21:52:31 quark spamd[42292]: syswrite() on closed filehandle
>>> GEN70 at /usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451.
>>> Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in
>>> concatenation (.) or string at
>>> /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
>>> line 330.
>>> Mar 30 21:52:31 quark spamd[42292]: prefork: write of ping failed to
>>> 45835 fd=:  at
>>> /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
>>> line 330.
>>> Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in
>>> concatenation (.) or string at
>>> /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
>>> line 127.
>>> Mar 30 21:52:31 quark spamd[42292]: prefork: killing failed child 45835
>>> fd= at
>>> /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
>>> line 127.
>>> Mar 30 21:52:31 quark spamd[42292]: prefork: killed child 45835 at
>>> /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm
>>> line 141.
>>
>> This indicates that the child is exiting, but SpamdForkScaling doesn't
>> know about it until a ping fails 150 seconds later, so a new child isn't
>> spawned for a long time after one of them commits suicide.
>>
>>
>>> Example at or around Mar 30 01:48:16 in this file:
>>>
>>> http://www.gushi.org/maillog33106-0.txt
>>>
>>> And another similar lockup at Mar 30 21:49:50 -- SAME USER, go figure.
>>>
>>> I don't have archived copies of this user's mail -- yet.  I've set up
>>> archiving for them, and we have everything from now forward, but I'm
>>> convinced there's SOMETHING in the spam they're getting that causes a
>>> lockup.
>>
>> I think it's actually load related... spamd is timing out the
>> copy_config sooner than it's really taking under high load.  If you were
>> to change the alarm value from 10 to 100 or so (around spamd line 949),
>> this may go away.
>>
>> Any idea what sort of load averages you've got when this starts to
>> happen?  It looks like it starts off with a couple children timing out,
>> then you become short on children, mail starts stacking up, and it
>> snowballs from there.
>>
>>
>> BTW, we should probably find or open a bugzilla ticket for this.  Bug
>> 4699 is related.  The pre-fork issue is probably another bug of its own.
>>
>>
>> Daryl
>
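The two log patterns quoted above (a child logging "copy_config timeout",
then the parent logging "prefork: killed child" for the same PID) can be
paired up to see how long the parent took to notice each dead child. This is
only a rough Python sketch: it assumes each syslog entry sits on one line in
the real maillog (the quoting above wraps them), it assumes the year because
syslog stamps omit it, and the function names are hypothetical.

```python
import re
from datetime import datetime

STAMP = r"(\w{3}\s+\d+ \d\d:\d\d:\d\d)"
# Child side: spamd[PID] reports its own copy_config timeout.
TIMEOUT_RE = re.compile(
    STAMP + r" \S+ spamd\[(\d+)\]: spamd: copy_config timeout")
# Parent side: the prefork code reports which child PID it reaped.
KILLED_RE = re.compile(
    STAMP + r" \S+ spamd\[\d+\]: prefork: killed child (\d+)")


def parse_stamp(stamp, year=2006):
    # Assumption: the year is supplied by the caller, not by the log.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def reap_delays(lines):
    """Map child PID -> seconds between its timeout and the parent's kill."""
    timed_out = {}
    delays = {}
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timed_out[int(m.group(2))] = parse_stamp(m.group(1))
            continue
        m = KILLED_RE.search(line)
        if m and int(m.group(2)) in timed_out:
            pid = int(m.group(2))
            gap = parse_stamp(m.group(1)) - timed_out[pid]
            delays[pid] = gap.total_seconds()
    return delays
```

Running reap_delays(open("maillog")) over a lockup window would show whether
the gaps grow as the barrage continues, which might help tell a load problem
from a message-specific one.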

--

"Hey Guys, does anyone know what 'poon tang' is?"

-C.S. Dave, July 8, 2K, about 12:30AM

--------Dan Mahoney--------
Techie,  Sysadmin,  WebGeek
Gushi on efnet/undernet IRC
ICQ: 13735144   AIM: LarpGM
Site:  http://www.gushi.org
---------------------------