You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2013/05/21 15:47:33 UTC

[Bug 6936] New: Not all Mails are scanned

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6936

            Bug ID: 6936
           Summary: Not all Mails are scanned
           Product: Spamassassin
           Version: 3.3.2
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: minor
          Priority: P2
         Component: spamassassin
          Assignee: dev@spamassassin.apache.org
          Reporter: frank.urban@commerzbank.com
    Classification: Unclassified

Not all of my mails are scanned by SpamAssassin,
I found these lines in my logfile:

May 21 15:21:27 mail2 spamd[1009]: spamd: handled cleanup of child pid [30320]
due to SIGCHLD: interrupted, signal 2 (0002)
May 21 15:21:27 mail2 spamd[1009]: syswrite() on closed filehandle GEN367324 at
/usr/lib/perl5/5.10.0/x86_64-linux-thread-multi/IO/Handle.pm line 461.
May 21 15:21:27 mail2 spamd[1009]: Use of uninitialized value in concatenation
(.) or string at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 497.
May 21 15:21:27 mail2 spamd[1009]: prefork: killing rogue child 310, failed to
write on fd :
May 21 15:21:27 mail2 spamd[1009]: prefork: killing failed child 310
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6936] Not all Mails are scanned

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6936

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@pccc.com

--- Comment #1 from Kevin A. McGrail <km...@pccc.com> ---
Looks like a partial duplicate of Bug 5390 but unsure.  Might want to try trunk
and see if that resolves your issue.

How much memory do you have on this box and what are your spamd parameters?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6936] Not all Mails are scanned

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6936

--- Comment #3 from Mark Martinec <Ma...@ijs.si> ---
(In reply to comment #1)
> Looks like a partial duplicate of Bug 5390 but unsure.

Not really, the Bug 5390 is about killing idle spamd child processes
and is truly just an innocent warning.  This PR is about spamd
desperately (and somewhat clumsily) trying to kill borked child
processes.

These logs are just showing the spamd side of the story, i.e.
its attempts to clear child processes. The interesting part
would be to see the debug log of one of such child processes
to be able to see what is going on there.

> Might want to try trunk and see if that resolves your issue.

Indeed.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6936] Not all Mails are scanned

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6936

--- Comment #8 from Karsten Bräckelmann <gu...@rudersport.de> ---
(In reply to comment #5)
> Would this be enough?
> startproc $SPAMD_BIN --max-children=300 --min-spare=20 -d -r
> /var/run/spamd.pid

min-spare 20...

(In reply to comment #2)
> May 21 15:21:27 mail2 spamd[1009]: prefork: adjust: 22 idle children more
> than 21 maximum idle children. Decreasing spamd children: 30320 killed.

... and max-spare 21 with a total max-children of 300.

Matches the configuration, but likely not the best idea anyway. The default
max-spare is 2, and spamd automatically sets it to min-spare +1, if max-spare
is less than min-spare. Which it is.

Most likely unrelated, but you might want to manually set max-spare to a saner
value.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6936] Not all Mails are scanned

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6936

--- Comment #7 from Frank Urban <fr...@commerzbank.com> ---
>300 children concerns me.  What is your free like with that many processes running?
560.000 scanned mails yesterday means around 400 mail per minute in average on
one of my cluster servers.
My statistic from yesterday says that I had a maximum of 1500Spams/per minute
on my whole cluster.

>I and others are running trunk on production systems.  So yes.
OK. Will think about

>> May 22 14:45:11 mail4 spamd[1096]: razor2: razor2 check failed: Permission
>> denied razor2: razor2 had unknown error during check at
>> /usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/Plugin/Razor2.pm line 208,
>> <GEN30368> line 1. at
>> /usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/Plugin/Razor2.pm line 325.

>Weird.  Do you have SELinux running or anything that might interfere with processes?  Any limits on memory or File descriptors that might be interfering?
Its OpenSusE. But I think we should not mix the problems inside this bug report

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6936] Not all Mails are scanned

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6936

Frank Urban <fr...@commerzbank.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |frank.urban@commerzbank.com

--- Comment #2 from Frank Urban <fr...@commerzbank.com> ---
Yes. this seemed to be the same problem.
But as far as I understand the bug will be fixed by turning the message from
warn to debug so it will not be shown in the log anymore and that's all.
But I see a real problem based on this .
Exactly at this time a SPAM mail passed the filter without being scanned.

The box has 32GByte RAM. Yesterday this box scanned 560.000 mails and I found
these error in the log

cat mail2.log |grep prefork: |grep failed

May 21 01:12:11 mail2 spamd[1033]: prefork: sysread(9) failed after 300 secs at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 654.
May 21 01:12:14 mail2 spamd[1034]: prefork: sysread(10) failed after 300 secs
at /usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line
654.
May 21 08:27:20 mail2 spamd[1009]: prefork: killing rogue child 446, failed to
write on fd : 
May 21 08:27:20 mail2 spamd[1009]: prefork: killing failed child 446
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 10:08:15 mail2 spamd[1009]: prefork: killing rogue child 328, failed to
write on fd : 
May 21 10:08:15 mail2 spamd[1009]: prefork: killing failed child 328
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 10:38:43 mail2 spamd[1009]: prefork: killing rogue child 395, failed to
write on fd : 
May 21 10:38:43 mail2 spamd[1009]: prefork: killing failed child 395
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 10:44:25 mail2 spamd[1009]: prefork: killing rogue child 515, failed to
write on fd : 
May 21 10:44:25 mail2 spamd[1009]: prefork: killing failed child 515
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 11:00:19 mail2 spamd[1009]: prefork: killing rogue child 570, failed to
write on fd : 
May 21 11:00:19 mail2 spamd[1009]: prefork: killing failed child 570
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 11:03:10 mail2 spamd[1009]: prefork: killing rogue child 415, failed to
write on fd : 
May 21 11:03:10 mail2 spamd[1009]: prefork: killing failed child 415
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 11:37:48 mail2 spamd[1009]: prefork: killing rogue child 539, failed to
write on fd : 
May 21 11:37:48 mail2 spamd[1009]: prefork: killing failed child 539
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 12:02:33 mail2 spamd[1009]: prefork: killing rogue child 491, failed to
write on fd : 
May 21 12:02:33 mail2 spamd[1009]: prefork: killing failed child 491
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 13:00:24 mail2 spamd[1009]: prefork: killing rogue child 338, failed to
write on fd : 
May 21 13:00:24 mail2 spamd[1009]: prefork: killing failed child 338
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 14:31:02 mail2 spamd[1009]: prefork: killing rogue child 354, failed to
write on fd : 
May 21 14:31:02 mail2 spamd[1009]: prefork: killing failed child 354
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 14:55:47 mail2 spamd[1009]: prefork: killing rogue child 363, failed to
write on fd : 
May 21 14:55:47 mail2 spamd[1009]: prefork: killing failed child 363
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 15:01:57 mail2 spamd[1009]: prefork: killing rogue child 4255, failed to
write on fd : 
May 21 15:01:57 mail2 spamd[1009]: prefork: killing failed child 4255
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 15:21:27 mail2 spamd[1009]: prefork: killing rogue child 310, failed to
write on fd : 
May 21 15:21:27 mail2 spamd[1009]: prefork: killing failed child 310
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 15:51:40 mail2 spamd[1009]: prefork: killing rogue child 567, failed to
write on fd : 
May 21 15:51:40 mail2 spamd[1009]: prefork: killing failed child 567
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 16:11:26 mail2 spamd[1009]: prefork: killing rogue child 328, failed to
write on fd : 
May 21 16:11:26 mail2 spamd[1009]: prefork: killing failed child 328
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 16:39:51 mail2 spamd[1009]: prefork: killing rogue child 642, failed to
write on fd : 
May 21 16:39:51 mail2 spamd[1009]: prefork: killing failed child 642
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 17:27:27 mail2 spamd[1009]: prefork: killing rogue child 421, failed to
write on fd : 
May 21 17:27:27 mail2 spamd[1009]: prefork: killing failed child 421
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 18:25:03 mail2 spamd[1009]: prefork: killing rogue child 324, failed to
write on fd : 
May 21 18:25:03 mail2 spamd[1009]: prefork: killing failed child 324
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 18:36:08 mail2 spamd[1009]: prefork: killing rogue child 398, failed to
write on fd : 
May 21 18:36:08 mail2 spamd[1009]: prefork: killing failed child 398
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 21:00:29 mail2 spamd[1009]: prefork: killing rogue child 436, failed to
write on fd : 
May 21 21:00:29 mail2 spamd[1009]: prefork: killing failed child 436
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 21:20:01 mail2 spamd[1009]: prefork: killing rogue child 444, failed to
write on fd : 
May 21 21:20:01 mail2 spamd[1009]: prefork: killing failed child 444
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.

One of my users send me a SPAM mail that was not scanned at exactly this time
(15:21:27)
Yesterday I saw the same with another example and so I opened this bug report.

I tried to filter out the interesting part of the log file at this time. Should
be this here:

May 21 15:21:25 mail2 postfix/smtpd[28254]: B16912DC103:
client=unknown[109.167.225.72]
May 21 15:21:27 mail2 postfix/cleanup[4454]: B16912DC103:
message-id=<20...@mail.example.com>
May 21 15:21:27 mail2 postfix/qmgr[18855]: B16912DC103: from=<>, size=676,
nrcpt=1 (queue active)
May 21 15:21:27 mail2 postfix/pipe[24380]: B16912DC103:
to=<as...@example.com>, relay=filter, delay=1.6, delays=1.4/0.04/0/0.13,
dsn=2.0.0, status=sent (delivered via filter service)
May 21 15:21:27 mail2 postfix/qmgr[18855]: B16912DC103: removed

ay 21 15:21:27 mail2 spamd[1009]: prefork: adjust: 23 idle children more than
21 maximum idle children. Decreasing spamd children: 30321 killed.
May 21 15:21:27 mail2 spamd[1009]: spamd: handled cleanup of child pid [30321]
due to SIGCHLD: interrupted, signal 2 (0002)
May 21 15:21:27 mail2 spamd[1009]: prefork: child states:
BBBBBIBBBBBBBBBIBBBBBBBBBIBBBBBBBBBBBBBIBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBIIIIIIIIIIIIIIIIII
May 21 15:21:27 mail2 spamd[1009]: prefork: adjust: 22 idle children more than
21 maximum idle children. Decreasing spamd children: 30320 killed.
May 21 15:21:27 mail2 spamd[1009]: spamd: handled cleanup of child pid [30320]
due to SIGCHLD: interrupted, signal 2 (0002)
May 21 15:21:27 mail2 spamd[1009]: syswrite() on closed filehandle GEN367324 at
/usr/lib/perl5/5.10.0/x86_64-linux-thread-multi/IO/Handle.pm line 461.
May 21 15:21:27 mail2 spamd[1009]: Use of uninitialized value in concatenation
(.) or string at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 497.
May 21 15:21:27 mail2 spamd[1009]: prefork: killing rogue child 310, failed to
write on fd :
May 21 15:21:27 mail2 spamd[1009]: prefork: killing failed child 310
fd=undefined at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 172.
May 21 15:21:27 mail2 spamd[1009]: prefork: error closing socket: Bad file
descriptor
May 21 15:21:27 mail2 spamd[1009]: prefork: killed child 310
May 21 15:21:27 mail2 spamd[1009]: spamd: handled cleanup of child pid [310]
due to SIGCHLD: interrupted, signal 2 (0002)

May 21 15:21:27 mail2 postfix/pickup[23728]: 319512DC119: uid=1002
from=<MAILER-DAEMON>
May 21 15:21:27 mail2 postfix/cleanup[25491]: 319512DC119:
message-id=<20...@mail.example.com>
May 21 15:21:27 mail2 postfix/qmgr[18855]: 319512DC119: from=<>, size=812,
nrcpt=1 (queue active)
May 21 15:21:27 mail2 postfix/smtp[29253]: 319512DC119:
to=<as...@example.com>,
relay=intern.postfix.example.com[172.16.71.210]:25, delay=0.05,
delays=0.02/0/0.02/0, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as
3A4B420026E)
May 21 15:21:27 mail2 postfix/qmgr[18855]: 319512DC119: removed

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6936] Not all Mails are scanned

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6936

--- Comment #5 from Frank Urban <fr...@commerzbank.com> ---
Would this be enough?
startproc $SPAMD_BIN --max-children=300 --min-spare=20 -d -r /var/run/spamd.pid

How should I try to trunk?  On my production machine?

I does not happen so often and I can see sometimes some other errors in my log.
For example:

May 22 15:11:04 mail4 spamd[420]: rules: failed to run __freemail_reply test,
skipping:
May 22 15:11:04 mail4 spamd[420]:  (Assertion rx->sublen >= (s - rx->subbeg) +
i failed: [...] file "regcomp.c", line 5109 at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/Plugin/FreeMail.pm line 347,
<GEN47834> line 341.

May 22 14:45:11 mail4 spamd[1096]: razor2: razor2 check failed: Permission
denied razor2: razor2 had unknown error during check at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/Plugin/Razor2.pm line 208,
<GEN30368> line 1. at
/usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/Plugin/Razor2.pm line 325.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6936] Not all Mails are scanned

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6936

--- Comment #4 from Kevin A. McGrail <km...@pccc.com> ---
I would reiterate to try trunk and see if it helps.  I also need to see your
spamd parameters.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6936] Not all Mails are scanned

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6936

--- Comment #6 from Kevin A. McGrail <km...@pccc.com> ---
(In reply to comment #5)
> Would this be enough?
> startproc $SPAMD_BIN --max-children=300 --min-spare=20 -d -r
> /var/run/spamd.pid

300 children concerns me.  What is your free like with that many processes
running?

> How should I try to trunk?  On my production machine?

I and others are running trunk on production systems.  So yes.

> I does not happen so often and I can see sometimes some other errors in my
> log. For example:
> 
> May 22 15:11:04 mail4 spamd[420]: rules: failed to run __freemail_reply
> test, skipping:
> May 22 15:11:04 mail4 spamd[420]:  (Assertion rx->sublen >= (s - rx->subbeg)
> + i failed: [...] file "regcomp.c", line 5109 at
> /usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/Plugin/FreeMail.pm line
> 347, <GEN47834> line 341.

No idea about this one. 

> May 22 14:45:11 mail4 spamd[1096]: razor2: razor2 check failed: Permission
> denied razor2: razor2 had unknown error during check at
> /usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/Plugin/Razor2.pm line 208,
> <GEN30368> line 1. at
> /usr/lib/perl5/site_perl/5.10.0/Mail/SpamAssassin/Plugin/Razor2.pm line 325.

Weird.  Do you have SELinux running or anything that might interfere with
processes?  Any limits on memory or File descriptors that might be interfering?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6936] Not all Mails are scanned

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6936

--- Comment #9 from Frank Urban <fr...@commerzbank.com> ---
>spamd automatically sets it to min-spare +1,
was new for me...
So I now set it to max-spare=200 and the whole 
May 23 07:51:53 mail2 spamd[11853]: prefork: adjust: 22 idle children more than
21 maximum idle children. Decreasing spamd children: 1674 killed.
messages are gone. Looks better.

-- 
You are receiving this mail because:
You are the assignee for the bug.