Posted to users@spamassassin.apache.org by "Steve [Spamassasin]" <sp...@shic.co.uk> on 2006/10/26 13:40:55 UTC

bayes_toks, expiry and spamd...

This feels like a series of FAQs, but the usual answers don't
seem to answer my questions directly...

With SpamAssassin 3.1.4 I'm running spamd, and my global procmail uses
spamc to process mail. Individual users train/report with spamc too.
In an end-user account there's a .spamassassin directory containing:

auto-whitelist 
bayes_toks
user_prefs
bayes_journal
bayes_seen

All of which makes sense... Over time, however, there is a build-up of
bayes_toks.expire$$$$ files (where $ is a decimal digit) and I'm unclear
about these. Anecdotally, when there are lots of these
bayes_toks.expire$$$$ files, emails occasionally stop being
processed by spamassassin, and both mail and spam are delivered to my
inbox without any spamassassin headers. This happened most recently
overnight, after which no messages were processed for spam. I
restarted spamassassin and things seemed to work again... I ran
sa-learn --force-expire and it reported keeping ~17,000 tokens and
expiring ~6,000. My bayes_toks.expire$$$$ files remained. This left me
with lots of unanswered questions...

What causes the creation of a bayes_toks.expire$$$$ file?
Do bayes_toks.expire$$$$ files affect performance, or just consume disk
space?
What effect would deleting these files have on spamassassin Bayesian
processing?
Is it likely that the 'failure' of spamassassin arose as a consequence
of a growing number of entries in bayes_toks, or is it more likely a
fault triggered by processing a malicious mail?
I've seen vague references to time-out settings - is this likely a
configuration issue (if so, which configuration options should be my focus)?
The fact that my forced expiry kept < 75% of the tokens suggests to me
that expiry was not happening automatically... should it be?  How can I
tell if it is working?
Should I be regularly forcing expiry from a cron-job?



Re: bayes_toks, expiry and spamd...

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Oct 26, 2006 at 12:40:55PM +0100, Steve [Spamassasin] wrote:
> All of which makes sense... Over time, however, there is a build-up of 
> bayes_toks.expire$$$$ files (where $ is a decimal digit) and I'm unclear
> about these.    Anecdotally, when there are lots of these

They're the temporary files that exist while SA is trying to expire old tokens
from Bayes. If they're still around after the process exits, it's either a
bug or, more likely, something in your setup is killing the process (a timeout?).

> What causes the creation of a bayes_toks.expire$$$$ file?

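Bayes expiry. The expiry writes the tokens it keeps into that temporary file
and then renames it over bayes_toks; the digits are the process ID of the
process doing the expiry.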

> Do bayes_toks.expire$$$$ files affect performance, or just consume disk
> space?

I don't know how it would affect performance. If an expiry is running, the
file is in use by that expiry and will go away once the expiry completes.
Otherwise, it's an incomplete file that is just consuming disk space.

> What effect would deleting these files have on spamassassin Bayesian
> processing?

None, as long as the file(s) deleted aren't in use by an expiry.
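A minimal Python sketch of cleaning up leftovers under that condition. It
assumes (hypothetically) that an expiry file untouched for several hours is
no longer in use by any expiry; the six-hour threshold, the function names
and the list-only default are illustrative choices, not SpamAssassin
behaviour. Running it while spamd and sa-learn are stopped is the safest
option.

```python
import os
import time

# Hypothetical threshold: assume no healthy expiry run leaves its temporary
# file untouched for this long. Tune it for your own installation.
MAX_AGE_SECONDS = 6 * 60 * 60
PREFIX = "bayes_toks.expire"


def select_stale(entries, now, max_age=MAX_AGE_SECONDS):
    """Return the sorted names of expiry temp files older than max_age.

    entries is an iterable of (filename, mtime) pairs; other files are ignored.
    """
    return sorted(
        name
        for name, mtime in entries
        if name.startswith(PREFIX) and now - mtime > max_age
    )


def remove_stale_expire_files(sa_dir, max_age=MAX_AGE_SECONDS, dry_run=True):
    """List (dry_run=True) or delete stale expiry temp files in sa_dir."""
    entries = []
    for name in os.listdir(sa_dir):
        path = os.path.join(sa_dir, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getmtime(path)))
    stale = select_stale(entries, time.time(), max_age)
    if not dry_run:
        for name in stale:
            os.remove(os.path.join(sa_dir, name))
    return stale
```

For example, remove_stale_expire_files(os.path.expanduser("~/.spamassassin"))
only reports what it would delete; pass dry_run=False once the list looks
right.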

> Is it likely that the 'failure' of spamassassin arose as a consequence
> of a growing number of entries in bayes_toks, or is it more likely a
> fault triggered by processing a malicious mail?

It's not likely to be either. When an expiry kicks off, that child will be
busy until the expiry is done. If all of your children are busy, messages
will queue up waiting for a free child. If, however, whatever you use to call
SA decides to time out first, you don't get processing.
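To make the queueing argument concrete, here is a toy Python model. It is an
illustration, not how spamd is implemented: a fixed pool of children, some
of which start out busy with an expiry, and a caller that gives up after
client_timeout seconds. All names and numbers are hypothetical, and the
model assumes a request the caller abandons never occupies a child.

```python
import heapq


def simulate(max_children, expiry_busy_until, arrivals, scan_time, client_timeout):
    """Toy model of a fixed pool of scanning children.

    expiry_busy_until: times at which children currently running an expiry
    become free (at most max_children entries); the other children start idle.
    arrivals: times at which messages arrive, processed in time order.
    Returns one outcome per arrival, in time order: "scanned" if a child can
    finish the message within client_timeout of its arrival, else "unscanned".
    """
    if len(expiry_busy_until) > max_children:
        raise ValueError("more busy children than the pool allows")
    # Min-heap of the times at which each child next becomes free.
    free_at = list(expiry_busy_until) + [0.0] * (max_children - len(expiry_busy_until))
    heapq.heapify(free_at)
    outcomes = []
    for t in sorted(arrivals):
        start = max(t, free_at[0])  # wait for the earliest free child
        if start + scan_time - t > client_timeout:
            # The caller gives up first; in this model the child stays free.
            outcomes.append("unscanned")
            continue
        heapq.heapreplace(free_at, start + scan_time)
        outcomes.append("scanned")
    return outcomes
```

With two children both stuck in expiries until t=1000, messages arriving at
t=0 and t=10 come back unscanned under a 600-second caller timeout, while one
arriving at t=1000 is scanned.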

> I've seen vague references to time-out settings - is this likely a
> configuration issue (if so, which configuration options should be my focus)?

It would be the timeout of whatever you use to call SA, so I can't answer
that question.

> Should I be regularly forcing expiry from a cron-job?

It depends on your setup. It sounds like you'll want to disable auto expiry
(bayes_auto_expire 0) and run expiry periodically from cron, on a schedule
that suits your setup.
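A minimal Python sketch of the cron side, assuming bayes_auto_expire 0 has
been set in the relevant configuration so that only this job expires tokens.
It calls sa-learn --force-expire (with the optional --dbpath override); the
wrapper functions are hypothetical. It deliberately sets no timeout, since
killing sa-learn partway through an expiry is what would leave a
bayes_toks.expire file behind.

```python
import shutil
import subprocess
import sys


def build_expire_command(dbpath=None, sa_learn="sa-learn"):
    """Return the argument list for a forced Bayes expiry."""
    cmd = [sa_learn, "--force-expire"]
    if dbpath is not None:
        # Optional override of where sa-learn reads the Bayes DB from.
        cmd += ["--dbpath", dbpath]
    return cmd


def run_expiry(dbpath=None, sa_learn="sa-learn"):
    """Run one forced expiry to completion and return its exit status.

    No timeout is passed to subprocess.run on purpose: killing sa-learn
    mid-expiry can leave a bayes_toks.expire temporary file behind.
    """
    if shutil.which(sa_learn) is None:
        print(f"{sa_learn} not found on PATH", file=sys.stderr)
        return 127
    return subprocess.run(build_expire_command(dbpath, sa_learn)).returncode
```

Run it as the user who owns the Bayes database (a per-user crontab is one
option), at whatever interval suits your mail volume, and check the exit
status that run_expiry returns.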
