You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@spamassassin.apache.org by Kjetil Kjernsmo <kj...@kjernsmo.net> on 2006/07/09 13:06:58 UTC

Frequent "sysread not ready"-messages

Hi all!

I'm now running SA 3.1.3 on a machine on its own, and I'm seeing some 
problems. The main symptom is that quite a lot of email gets just this:
X-Spam-Status: No, hits= required=
	tests=

I'm calling spamd (on a box called "robin") from a qmail and qpsmptd 
setup running on a box (called "pooh") on the same LAN. I have patched 
qpsmptd to do this, but Ask just applied the patch to qpsmtpd, it was 
trivial. I use Pg 8.1 for the bayesian db, which runs on the same 
machine as spamd, but the spamd and Pg db are the only things running 
there, even though I have ambitions of running clamd there too. 

Now to the errors I'm seeing. Sometimes all goes well, and it returns
Jul  9 12:38:03 robin spamd[10366]: spamd: identified spam (20.4/5.0) 
for qpsmtpd:109 in 7.5 seconds, 1224 bytes.
Jul  9 12:38:03 robin spamd[10366]: spamd: result: Y 20 - 
BAYES_99,LONGWORDS,RCVD_ILLEGAL_IP,RCVD_IN_NJABL_DUL,RCVD_IN_WHOIS_BOGONS,URIBL_JP_SURBL,URIBL_OB_SURBL 
scantime=7.5,size=1224,user=qpsmtpd,uid=109,required_score=5.0,rhost=www.ausangate.kjernsmo.net,raddr=192.168.2.10,rport=47466,mid=<94...@compelled.noc.bootlick.gr>,bayes=0.99853074927797,autolearn=disabled
Jul  9 12:38:03 robin spamd[10366]: config: copying current conf from 
backup
Jul  9 12:38:03 robin spamd[10225]: prefork: child 10366: entering state 
1
Jul  9 12:38:03 robin spamd[10225]: prefork: new lowest idle kid: 10366
Jul  9 12:38:03 robin spamd[10225]: prefork: child reports idle
Jul  9 12:38:03 robin spamd[10225]: prefork: child states: II

But in the same second, it gets problems:

Jul  9 12:38:03 robin spamd[10366]: prefork: sysread(9) not ready, wait 
max 300 secs
Jul  9 12:38:09 robin spamd[10388]: prefork: periodic ping from spamd 
parent
Jul  9 12:38:09 robin spamd[10388]: prefork: sysread(9) not ready, wait 
max 300 secs

On the remote (calling) machine, the qpsmtpd log says:
Sun Jul  9 12:38:03 2006 pooh[27244]: check_spam: Yes, hits=20.4, 
required=5.0, tests=BAYES_99,LONGWORDS,RCVD_ILLEGAL_IP,RCVD_
IN_NJABL_DUL,RCVD_IN_WHOIS_BOGONS,URIBL_JP_SURBL,URIBL_OB_SURBL
Sun Jul  9 12:38:03 2006 pooh[27244]: spamassassin
Sun Jul  9 12:38:03 2006 pooh[27244]: 552 spam score exceeded threshold 
(#5.6.1)
Sun Jul  9 12:38:03 2006 pooh[25219]: cleaning up after 27245
Sun Jul  9 12:38:06 2006 pooh[27244]: dispatching QUIT
Sun Jul  9 12:38:06 2006 pooh[27244]: 221 pooh.kjernsmo.net closing 
connection. Have a wonderful day.
Sun Jul  9 12:38:06 2006 pooh[27244]: logging::file
Sun Jul  9 12:38:06 2006 pooh[27244]: rhsbl
Sun Jul  9 12:38:06 2006 pooh[27244]: dnsbl
Sun Jul  9 12:38:07 2006 pooh[25219]: cleaning up after 27244

So, it appears that the above problem can happen without the caller 
saying anything, or?

Diving deeper into that log, I see things like:
Sun Jul  9 12:44:01 2006 pooh[27263]: spamassassin
Sun Jul  9 12:44:01 2006 pooh[27263]: check_spam: No, hits=, required=, 
tests=

and that's what I see in my inbox too: Spammy messages that has the 
header, but no values. 

The log is loooong here, but I think this is the important snippet:

Jul  9 12:44:01 robin spamd[10225]: prefork: ordered 10366 to accept
Jul  9 12:44:01 robin spamd[10225]: prefork: sysread(7) not ready, wait 
max 300 secs
Jul  9 12:44:01 robin spamd[10225]: prefork: child 10366: entering state 
2
Jul  9 12:44:01 robin spamd[10225]: prefork: new lowest idle kid: 10388

10366 then proceeds to do a lot of work, at then it goes:
Jul  9 12:44:01 robin spamd[10366]: bayes: tok_get_all: token count: 255
Jul  9 12:44:01 robin spamd[10225]: prefork: new lowest idle kid: 10388
Jul  9 12:44:01 robin spamd[10225]: spamd: handled cleanup of child pid 
10366 due to SIGCHLD
Jul  9 12:44:01 robin spamd[10225]: prefork: child closed connection
Jul  9 12:44:01 robin spamd[10225]: prefork: child states: I
Jul  9 12:44:01 robin spamd[10225]: spamd: server successfully spawned 
child process, pid 10399
Jul  9 12:44:01 robin spamd[10225]: prefork: child 10399: entering state 
0
Jul  9 12:44:01 robin spamd[10399]: prefork: sysread(9) not ready, wait 
max 300 secs
Jul  9 12:44:01 robin spamd[10225]: prefork: new lowest idle kid: 10388
Jul  9 12:44:01 robin spamd[10225]: prefork: child 10399: entering state 
1
Jul  9 12:44:01 robin spamd[10225]: prefork: new lowest idle kid: 10388
Jul  9 12:44:01 robin spamd[10225]: prefork: child reports idle
Jul  9 12:44:01 robin spamd[10225]: prefork: child states: II

So, that's what I have... I had a brief look in the code, and found 
where the debug message originates but I don't know enough of SA to get 
me anywhere. Also, I cannot tell for sure if this sysread message has 
anything to do with the symptom, that spam doesn't get a score.

The environment is a nice mixup of Debian. SA itself is from unstable, 
many of the SA dependencies, including Perl and PostgreSQL 8.1 is from 
testing, but only the things that needed updating from SA is from 
testing, the rest of the system is still stable. So, perl is:
This is perl, v5.8.8 built for i486-linux-gnu-thread-multi

One weakness of this current machine is that it runs with DMA off. Which 
is curious, I've tried to enable it, I may have to recompile the 
kernel, I suppose. This will give it lower bandwidth to the disk than 
it should have, but I would still be really surprised if that could 
cause something like this, or?

Unfortunately (?), I'm going on a week's offline vacation in a couple of 
hours now, but I'll read your messages if any when I get back. :-)

Cheers,

Kjetil
-- 
Kjetil Kjernsmo
Programmer / Astrophysicist / Ski-orienteer / Orienteer / Mountaineer
kjetil@kjernsmo.net
Homepage: http://www.kjetil.kjernsmo.net/     OpenPGP KeyID: 6A6A0BBC

Re: Frequent "sysread not ready"-messages

Posted by jdow <jd...@earthlink.net>.
From: "Kjetil Kjernsmo" <kj...@kjernsmo.net>

> On Sunday 23 July 2006 00:52, jdow wrote:
>> We still need to know how SpamAssassin is invoked. Could you give an
>> anatomical description of your "mail goes in here and comes out
>> there" message digestion process with each tool that gets invoked
>> along the way at least named?
> 
> Yup, all that was in my original mail. I figured I'd try to be terse 
> this time! :-) spamd is running on a host robin (since then renamed to 
> eeyore). It is called by qpsmtpd 0.32 with remote spamd capabilities. 
> Since qpsmtpd is Perl-based, I'd be surprised if the calling code isn't 
> essentially identical to the code in spamassassin itself. It is very 
> simple, no amavisd or anything.

Spamd is "called" via spamc which is a C language thing.

>> It would also help to know how the daemon starts for spamd,
> 
> It is a Debian system, it has its own ways of launching things, but ps x 
> shows this:
> 
> /usr/sbin/spamd --ip-address --allowed-ips=192.168.2.0/24 --username 
> spamd --create-prefs --max-children 5 --helper-home-dir /var/opt/spamd 
> -d --pidfile=/var/opt/spamd/pid
> 
> I'm sure there isn't a permissions problem either, since it does run 
> most of the time.

> Before jumping onto qpsmtpd, I had been running Exim4 for several years. 
> I had seen occasional unscanned messages, say once a month. I thought I 
> had some kind of misconfiguration, so I didn't investigate further 
> (often, it were things that had come through e.g. my CPAN address). 
> This setup was not capable of using SA 3.1 like I wanted, so I split 
> off spamd but retained Exim4 at first. It pretty much had the same 
> symptoms, but I couldn't get Exim to give me any logs for debugging. 
> Anyway, I suspect this is independent of the caller, since the same 
> symptoms have been observed with Exim and qpsmtpd.

Regarding infrequent unscanned messages do they just happen to coincide
with a rules update time? That is the current unscanned messages problem.
I think 3.1.3 would have the fix for the PerMsgStatus.pm version of
unscanned messages.

I had the impression you were experiencing the cited problems with every
message. Having it happen with infrequent messages is a whole different
kettle or earthworms, most of which have been solved and the remainder of
which you can work around IF you can briefly tell qpsmtp to hold mail
for 3 or 4 minutes while the RDJ update is done. This should be QUITE
rare, however.

Another possible problem uniquely comes from the two machine configuration
if one or both machines have a firewall active on the interface net. There
are some ports that seem to get blocked for any use due to their frequent
appearance in nasty trojans. That could be causing you the upset.

{^_^}

Re: Frequent "sysread not ready"-messages

Posted by Kjetil Kjernsmo <kj...@kjernsmo.net>.
On Sunday 23 July 2006 00:52, jdow wrote:
> We still need to know how SpamAssassin is invoked. Could you give an
> anatomical description of your "mail goes in here and comes out
> there" message digestion process with each tool that gets invoked
> along the way at least named?

Yup, all that was in my original mail. I figured I'd try to be terse 
this time! :-) spamd is running on a host robin (since then renamed to 
eeyore). It is called by qpsmtpd 0.32 with remote spamd capabilities. 
Since qpsmtpd is Perl-based, I'd be surprised if the calling code isn't 
essentially identical to the code in spamassassin itself. It is very 
simple, no amavisd or anything.

> It would also help to know how the daemon starts for spamd,

It is a Debian system, it has its own ways of launching things, but ps x 
shows this:

/usr/sbin/spamd --ip-address --allowed-ips=192.168.2.0/24 --username 
spamd --create-prefs --max-children 5 --helper-home-dir /var/opt/spamd 
-d --pidfile=/var/opt/spamd/pid

I'm sure there isn't a permissions problem either, since it does run 
most of the time.

Before jumping onto qpsmtpd, I had been running Exim4 for several years. 
I had seen occasional unscanned messages, say once a month. I thought I 
had some kind of misconfiguration, so I didn't investigate further 
(often, it were things that had come through e.g. my CPAN address). 
This setup was not capable of using SA 3.1 like I wanted, so I split 
off spamd but retained Exim4 at first. It pretty much had the same 
symptoms, but I couldn't get Exim to give me any logs for debugging. 
Anyway, I suspect this is independent of the caller, since the same 
symptoms have been observed with Exim and qpsmtpd.

Best,

Kjetil
-- 
Kjetil Kjernsmo
Programmer / Astrophysicist / Ski-orienteer / Orienteer / Mountaineer
kjetil@kjernsmo.net
Homepage: http://www.kjetil.kjernsmo.net/     OpenPGP KeyID: 6A6A0BBC

Re: Frequent "sysread not ready"-messages

Posted by jdow <jd...@earthlink.net>.
From: "Kjetil Kjernsmo" <kj...@kjernsmo.net>

> Hello again!
>
> Sorry to follow up on myself, I guess it was a bad idea to post such a
> long message, and at the end say that I was going on vacation. I'm back
> now, and the problem has by no means disappeared. I would be very
> grateful if someone could have a look. I'll try to summarize, the
> previous message is in the archives at
> http://mail-archives.apache.org/mod_mbox/spamassassin-users/200607.mbox/%3c200607091307.00402.kjetil@kjernsmo.net%3e
>
> So, what I wrote was that:
>> The main symptom is that quite a lot of email gets just
>> this: X-Spam-Status: No, hits= required=
>> tests=
>
> This is SA 3.1.3. running on a box by itself, called remotely over a TCP
> socket. It is backed by a Pg 8.1 bayesian datastore.

We still need to know how SpamAssassin is invoked. Could you give an
anatomical description of your "mail goes in here and comes out there"
message digestion process with each tool that gets invoked along the
way at least named?

You know - mail comes in on SpooMail version 15.199.8 on machine one.
SpooMail calls MailMasher version 0.22.8982 for mail filtering. Then
MailMasher makes a TCP connection to SpamAssassin 3.1.3 running as
spamd on machine 2 using its own TCP tool in place of spamc. The mail
comes back through MailMasher back to SpooMail which then sends the
mail through MuckMail version 2.0.1 for delivery on machine 3.

It would also help to know how the daemon starts for spamd, since you
seem to be using it. (That suggests to me that amavisd-new would be a
bad way to try to run SpamAssassin without some special gyrations, for
example.) And the line in your MailMasher or equivalent that runs
spamc, which must be on the same machine as MailMasher or accessible
via some network file system, would help as well.

The more gozinta here gozouta there into the gozinta in this other
place might suggest to us where the problem lies and lead to either
a solution or a redirection to the support mailing list for MailMasher.

{^_^} 


Re: Frequent "sysread not ready"-messages

Posted by Kjetil Kjernsmo <kj...@kjernsmo.net>.
On Monday 24 July 2006 00:54, Daryl C. W. O'Shea wrote:
> Have a read through bug 4950.  This might be it.  If so, please
> provide as much info and log info as you can.  If not, please open a
> new bug.
>
> http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4590

Right, it looks like the same symptoms, if it has the same cause, I 
can't tell. In particular, I have very little load on my systems, and 
it is sysread rather than syswrite that grabbed my attention. In fact, 
if I grep for syswrite, I can't see it occuring, so I guess it is a new 
bug.
 
But since it happens many times every day here, I guess I can be of help 
in making straces and stuff. I very rarely do that, would some of the 
devs be on IRC today when I get back from work, in about 9 hours from 
now?

Cheers,

Kjetil
-- 
Kjetil Kjernsmo
Programmer / Astrophysicist / Ski-orienteer / Orienteer / Mountaineer
kjetil@kjernsmo.net
Homepage: http://www.kjetil.kjernsmo.net/     OpenPGP KeyID: 6A6A0BBC

Re: Frequent "sysread not ready"-messages

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
Kjetil Kjernsmo wrote:

> In the error logs, I see a lot of errors like this:
>> Jul  9 12:38:03 robin spamd[10366]: prefork: sysread(9) not ready,
>> wait max 300 secs
>> Jul  9 12:38:09 robin spamd[10388]: prefork: periodic ping from spamd
>> parent
>> Jul  9 12:38:09 robin spamd[10388]: prefork: sysread(9) not ready,
>> wait max 300 secs

Have a read through bug 4950.  This might be it.  If so, please provide 
as much info and log info as you can.  If not, please open a new bug.

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4590


Daryl

Re: Frequent "sysread not ready"-messages

Posted by Kjetil Kjernsmo <kj...@kjernsmo.net>.
Hello again!

Sorry to follow up on myself, I guess it was a bad idea to post such a 
long message, and at the end say that I was going on vacation. I'm back 
now, and the problem has by no means disappeared. I would be very 
grateful if someone could have a look. I'll try to summarize, the 
previous message is in the archives at 
http://mail-archives.apache.org/mod_mbox/spamassassin-users/200607.mbox/%3c200607091307.00402.kjetil@kjernsmo.net%3e

So, what I wrote was that:
> The main symptom is that quite a lot of email gets just
> this: X-Spam-Status: No, hits= required=
> 	tests=

This is SA 3.1.3. running on a box by itself, called remotely over a TCP 
socket. It is backed by a Pg 8.1 bayesian datastore.

In the error logs, I see a lot of errors like this:
> Jul  9 12:38:03 robin spamd[10366]: prefork: sysread(9) not ready,
> wait max 300 secs
> Jul  9 12:38:09 robin spamd[10388]: prefork: periodic ping from spamd
> parent
> Jul  9 12:38:09 robin spamd[10388]: prefork: sysread(9) not ready,
> wait max 300 secs


Also,
> Diving deeper into that log, I see things like:
> Sun Jul  9 12:44:01 2006 pooh[27263]: spamassassin
> Sun Jul  9 12:44:01 2006 pooh[27263]: check_spam: No, hits=,
> required=, tests=

And things have gone wrong.

Further, spamd has a dying child:

> 255 Jul  9 12:44:01 robin spamd[10225]: prefork: new lowest idle kid:
> 10388 Jul  9 12:44:01 robin spamd[10225]: spamd: handled cleanup of
> child pid 10366 due to SIGCHLD
> Jul  9 12:44:01 robin spamd[10225]: prefork: child closed connection
> Jul  9 12:44:01 robin spamd[10225]: prefork: child states: I
> Jul  9 12:44:01 robin spamd[10225]: spamd: server successfully
> spawned child process, pid 10399
> Jul  9 12:44:01 robin spamd[10225]: prefork: child 10399: entering
> state 0
> Jul  9 12:44:01 robin spamd[10399]: prefork: sysread(9) not ready,
> wait max 300 secs
> Jul  9 12:44:01 robin spamd[10225]: prefork: new lowest idle kid:
> 10388 Jul  9 12:44:01 robin spamd[10225]: prefork: child 10399:
> entering state 1
> Jul  9 12:44:01 robin spamd[10225]: prefork: new lowest idle kid:
> 10388 Jul  9 12:44:01 robin spamd[10225]: prefork: child reports idle
> Jul  9 12:44:01 robin spamd[10225]: prefork: child states: II

Any further clues on how to understand this problem would be 
appreciated.

Cheers,

Kjetil
-- 
Kjetil Kjernsmo
Programmer / Astrophysicist / Ski-orienteer / Orienteer / Mountaineer
kjetil@kjernsmo.net
Homepage: http://www.kjetil.kjernsmo.net/     OpenPGP KeyID: 6A6A0BBC