Posted to users@spamassassin.apache.org by ma...@seaan.net on 2007/02/27 15:46:39 UTC

per user bayes db: auto_expiry problem, spamd child timeout, very long scantimes

Hi all

I run a site for more than 2000 mailboxes with Postfix, SA 3.1.8 and
procmail. Every user has his own bayes db. Allow_user_rules is
deactivated.

I have a number of problems:

A number of emails pass spamd unfiltered due to spamd child timeouts.
Looking at the scantime, it is often far more than the 220s that are
defined as the timeout value. Some emails have a scantime of more than 900
seconds. Although I use SARE rules I do not blame them, because I already
had this problem with SA 3.0|1.x.
It is possible that this problem is linked to the second problem: I have a
timeout on auto_expiry.
To address both issues I followed the hints and tips that were discussed
here not long ago. Yesterday I disabled auto_expiry and now run
sa-learn --force-expire --sync manually for those users that are affected
by the expiry problem. I cannot possibly run a force-expire job from a
daily cron job for all users. This would simply use up the 24h a day has.
Also I have noticed that some users have 1 to 2 million tokens in the
bayes db. A number between 150k and 200k is normal.
The bayes_expiry_max_db_size default is 150'000 and I haven't changed
this value.
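
As a rough check on the point above about a daily force-expire run for all users, here is a small sketch of the arithmetic. With about 2000 mailboxes, a serial daily pass leaves roughly 43 seconds per user. The 60-second per-user duration is purely an illustrative assumption, not a measurement from this site.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_user_budget(num_users: int) -> float:
    """Seconds available per user if one serial pass must fit into a day."""
    return SECONDS_PER_DAY / num_users


def serial_run_hours(num_users: int, seconds_per_user: float) -> float:
    """Wall-clock hours for a serial pass over all users."""
    return num_users * seconds_per_user / 3600


# 2000 mailboxes is the figure from this post.
budget = per_user_budget(2000)

# Hypothetical: if each force-expire took 60 s, one pass would exceed a day.
hours = serial_run_hours(2000, 60.0)
print(f"budget per user: {budget:.1f}s, serial pass at 60s/user: {hours:.1f}h")
```

Anything slower than the per-user budget means a serial nightly job cannot finish before the next one is due.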

What are the possible reasons why auto_expiry wouldn't expire such a huge
number of tokens?

I do not see a relation to a huge load on the SpamAssassin Servers (I have
2 of them). The timeout problems happen when there is small load (10 out
of 20 spamds marked Busy) as well as when there are 45 spamds forked with
35 marked Busy.

I wonder if I have to migrate from bayes db per user to a site-wide bayes
db. What would change?

In particular, these are the error messages:
spamd[27428]: child processing timeout at spamd line 1086, <GEN209> line 503.
spamd[3692]: bayes: expire_old_tokens: child processing timeout at spamd
line 1086, <GEN245> line 56.

Thank you very much in advance for any hints. I'd be really grateful.

Philipp



Re: per user bayes db: auto_expiry problem --->> SOLVED

Posted by ma...@seaan.net.
Hi Folks,

It's been a while since I asked here how to solve bayes timeout and spamd
child timeout problems.
Well, at least for our environment I have found a solution that seems to
work.
Also, I have a theory about the reason for these bayes timeout and spamd
child timeout problems and I'd like to know whether this theory is
correct.

Symptoms:
 child processing timeout at spamd line 1086, <GEN786> line 108.
 child processing timeout at spamd line 1086, <GEN73> line 209.
 ...
 bayes: child processing timeout at spamd line 1086.


Reason:
spamc timeout set to 220s
spamd timeout set to 240s
procmail timeout set to 300s

First I did what everybody suggested: disabling bayes_auto_expire in
local.cf and doing the job manually per user. I wrote a script that
extracted from the maillog the users that had a scantime of more than 220s
and ran sa-learn -u $user --force-expire --sync for each of them. The
problem stayed unsolved. Then I changed the timeouts to values more than
twice as high.
Result: For nearly 2 days I had no timeout errors anymore. Then I checked
the logs once more and saw a lot of users having scantimes well above
300s but lower than the new values. Those were users that had never before
come up in my logs with such high scantimes. Then I basically ran
--force-expire --sync the whole day.
I realized that the manual force-expire job was not practical for 2700
users and a 2.5GB Bayes DB in mysql (myisam engine). I also realized that
doing the --force-expire job manually would probably mess up some or most
of the users' Bayes DBs.
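
The script mentioned above is not shown in this thread. A minimal sketch of the idea could look like the following. It assumes spamd result lines carry "scantime=<seconds>" and "user=<name>" fields separated by commas; the sample log lines and user names are made up for illustration. The sketch only builds the sa-learn command lines and does not execute them.

```python
import re
import shlex

# Assumed log layout: "... scantime=912.4,size=4096,user=bob,uid=501 ..."
RESULT_RE = re.compile(
    r"scantime=(?P<secs>\d+(?:\.\d+)?),.*?\buser=(?P<user>[^,\s]+)"
)


def slow_users(log_lines, threshold=220.0):
    """Return the sorted, de-duplicated users whose scantime exceeds threshold."""
    users = set()
    for line in log_lines:
        match = RESULT_RE.search(line)
        if match and float(match.group("secs")) > threshold:
            users.add(match.group("user"))
    return sorted(users)


def expire_command(user):
    """Build the sa-learn call used in this thread for a single user."""
    return ["sa-learn", "-u", user, "--force-expire", "--sync"]


# Made-up sample lines, only to show the shape of the input.
sample = [
    "spamd[101]: spamd: result: . 0 - scantime=3.1,size=2048,user=alice,uid=500",
    "spamd[102]: spamd: result: Y 12 - scantime=912.4,size=4096,user=bob,uid=501",
]

for user in slow_users(sample):
    print(shlex.join(expire_command(user)))
```

Printing the commands instead of running them makes it easy to review the list of affected users before touching any Bayes DB.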

I changed back to bayes_auto_expire 1 in local.cf and restarted spamd.
This is what happened next for a number of users:
bayes: expire_old_tokens: child processing timeout at spamd line 1086,
This was on Tuesday, March 14. Since then I have had no more problems
with spamd child timeouts.


I have not looked into the spamd code, and I think I shouldn't as I am
no perl coder. Nevertheless I have a theory why the short timeout values
could have such a heavy impact:

If the timeouts are too short, spamd under some circumstances cannot
finish the bayes expire job if bayes_auto_expire is enabled in local.cf. I
hope I correctly understand the expire job as a database cleanup job.
Thus, if it can't be finished, it turns from a cleanup job into a mess-up
job; the problem gets worse or at least stays as bad as it is.
Now, I hope that by changing the mysql engine from myisam to innodb, which
is capable of doing DB transactions and is suggested by the SpamAssassin
people in the Bayes manpages, the expire job gets finished even if spamd
suffers a timeout.
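
To illustrate what transactions generally provide for an interrupted cleanup, here is a small sketch. It uses SQLite from the Python standard library as a stand-in, not MySQL/InnoDB, and the "tokens" table and the Timeout exception are hypothetical, not SpamAssassin's schema or code. What it shows is atomicity: an interrupted cleanup is rolled back as a whole rather than left half-done. It does not show the job being completed after a timeout.

```python
import sqlite3


class Timeout(Exception):
    """Stands in for a child timeout interrupting the job (hypothetical)."""


def expire(conn, cutoff, fail_after=None):
    """Delete tokens older than cutoff inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        rows = conn.execute(
            "SELECT token FROM tokens WHERE atime < ?", (cutoff,)
        ).fetchall()
        for i, (token,) in enumerate(rows):
            if fail_after is not None and i == fail_after:
                raise Timeout("interrupted mid-expire")
            conn.execute("DELETE FROM tokens WHERE token = ?", (token,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (token TEXT PRIMARY KEY, atime INTEGER)")
conn.executemany(
    "INSERT INTO tokens VALUES (?, ?)", [(f"t{i}", i) for i in range(10)]
)
conn.commit()

# Interrupted run: two deletes happen, then the simulated timeout fires.
try:
    expire(conn, cutoff=5, fail_after=2)
except Timeout:
    pass
count_after_interrupt = conn.execute("SELECT COUNT(*) FROM tokens").fetchone()[0]

# Uninterrupted run: the five old tokens are removed and committed.
expire(conn, cutoff=5)
count_after_success = conn.execute("SELECT COUNT(*) FROM tokens").fetchone()[0]

print(count_after_interrupt, count_after_success)
```

After the interrupted run the table is unchanged, so a later expire attempt starts from a consistent state.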

Your comments?

Philipp






Re: per user bayes db: auto_expiry problem, spamd child timeout, very long scantimes

Posted by ma...@seaan.net.
> On Tue, 27 Feb 2007 mailinglists@seaan.net wrote:
>
>> Some emails have a scantime of more than 900 seconds.
>>
>> I do not see a relation to a huge load on the SpamAssassin Servers
>> (I have 2 of them). The timeout problems happen when there is
>> small load (10 out of 20 spamds marked Busy) as well as when there
>> are 45 spamds forked with 35 marked Busy.
>
> That really smells like swap thrashing. How much memory is in your SA
> servers, and what does procinfo / top report for swap used vs. swap
> available when things are going pear-shaped?

RAM doesn't seem to be the issue here. Both spamd boxes are equipped with
4GB RAM. Although all of it is in use, 2.4GB and 1GB respectively are used
for disk cache. Swap space is untouched and swap pages per second in|out
are near zero. CPU load on both boxes peaks at 10%.

Since both spamd boxes read and write their bayes data from and to one and
the same mysql box, I suspect the problem to be there.
This morning I ran a manual sa-learn --force-expire --sync job on the
users for whom the timeout problem occurred during the night. While
running the job I had several timeout errors on the bayes DB as well as on
certain spamd children.
In the coming days I will try to reproduce the problem by stressing the
SQL based bayes db using a parallelized sa-learn --force-expire job.
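
A parallelized force-expire run like the one planned above could be sketched roughly as follows. The worker count, the function names and the injectable runner are assumptions made for illustration; only the sa-learn options come from this thread. Nothing runs until stress() is called with a list of users.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def expire_command(user):
    """Build the sa-learn call used in this thread for a single user."""
    return ["sa-learn", "-u", user, "--force-expire", "--sync"]


def run_expire(user, runner=subprocess.run):
    """Run one expire job and return (user, exit code, elapsed seconds)."""
    start = time.monotonic()
    result = runner(expire_command(user), capture_output=True, text=True)
    return user, result.returncode, time.monotonic() - start


def stress(users, workers=4, runner=subprocess.run):
    """Run expire jobs for all users, at most `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: run_expire(u, runner), users))
```

Calling stress(["user1", "user2"], workers=2) would then return one tuple per user in input order. Raising the worker count step by step and watching the elapsed times should show at what concurrency the shared SQL box starts to produce timeouts.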

Philipp









Re: per user bayes db: auto_expiry problem, spamd child timeout, very long scantimes

Posted by "John D. Hardin" <jh...@impsec.org>.
On Tue, 27 Feb 2007 mailinglists@seaan.net wrote:

> Some emails have a scantime of more than 900 seconds.
> 
> I do not see a relation to a huge load on the SpamAssassin Servers
> (I have 2 of them). The timeout problems happen when there is
> small load (10 out of 20 spamds marked Busy) as well as when there
> are 45 spamds forked with 35 marked Busy.

That really smells like swap thrashing. How much memory is in your SA 
servers, and what does procinfo / top report for swap used vs. swap 
available when things are going pear-shaped?

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Users mistake widespread adoption of Microsoft Office as the
  development of a standard document format.
-----------------------------------------------------------------------
 14 days until Albert Einstein's 128th Birthday