Posted to users@spamassassin.apache.org by andrij <an...@gmail.com> on 2010/08/02 14:29:32 UTC

DB tokens expiration

Hi all,

After I trained the Bayes classifier with several thousand e-mails, I ran
"sa-learn --dump magic" and got the following:

0.000          0          3          0  non-token data: bayes db version
0.000          0       5367          0  non-token data: nspam
0.000          0       3792          0  non-token data: nham
0.000          0     344519          0  non-token data: ntokens
0.000          0  847133240          0  non-token data: oldest atime
0.000          0 1274448689          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1280569532          0  non-token data: last expiry atime
0.000          0    2764800          0  non-token data: last expire atime delta
0.000          0     196817          0  non-token data: last expire reduction count

I have the default settings, i.e., "bayes_expiry_max_db_size 150000" and
"bayes_auto_expire 1".
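The magic values above are plain integers, so they are easy to compare against the configured limit. The sketch below parses the dump and measures how far "ntokens" is over bayes_expiry_max_db_size. It assumes each line has four numeric columns with the value in the third, followed by "non-token data: <name>"; that layout is inferred from this sample output, not from a format specification, and the names parse_magic and MAGIC_LINE are hypothetical.

```python
import re

# Assumed line layout (inferred from the sample output, not a spec):
# four numeric columns, the third holding the value, then
# "non-token data: <field name>".
MAGIC_LINE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$")

def parse_magic(text):
    """Return {field name: integer value} for every magic line in text."""
    values = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values

DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       5367          0  non-token data: nspam
0.000          0       3792          0  non-token data: nham
0.000          0     344519          0  non-token data: ntokens
0.000          0  847133240          0  non-token data: oldest atime
0.000          0 1274448689          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1280569532          0  non-token data: last expiry atime
0.000          0    2764800          0  non-token data: last expire atime delta
0.000          0     196817          0  non-token data: last expire reduction count
"""

MAX_DB_SIZE = 150000  # bayes_expiry_max_db_size, as set in the question

magic = parse_magic(DUMP)
print("ntokens:", magic["ntokens"])                          # 344519
print("over the limit by:", magic["ntokens"] - MAX_DB_SIZE)  # 194519
print("last expire atime delta in days:",
      magic["last expire atime delta"] / (24 * 60 * 60))     # 32.0
```

Note that the last expire atime delta works out to exactly 32 days.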

Why was the number of ntokens not reduced to 150000?

"last expiry atime" is greater than "newest atime". Does that mean a
reduction is about to occur?

Does the "last expire reduction count" mean that at time 1280569532 the
number of tokens will be reduced by 196817?

If I do not add any new tokens (so "newest atime" does not change), will
the reduction never occur?

Thank you.
-- 
View this message in context: http://old.nabble.com/DB-tokens-expiration-tp29324703p29324703.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: DB tokens expiration

Posted by RW <rw...@googlemail.com>.
On Mon, 2 Aug 2010 05:29:32 -0700 (PDT)
andrij <an...@gmail.com> wrote:

> 
> Hi all,
> 
> After I trained the Bayes classifier with several thousand
> e-mails, I ran "sa-learn --dump magic" and got the following:
> 

> Why was the number of ntokens not reduced to 150000?

The expiry algorithm isn't very good. Sometimes it fails to translate
the target token reduction into a sensible atime cut-off and simply
gives up.
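The "translate a token reduction into an atime cut-off" step can be made concrete with a toy model. The sketch below tries a ladder of doubling atime deltas, counts how many tokens each would expire, and gives up when no delta lands close enough to the goal. Every rule in it is an assumption for illustration, not SpamAssassin's actual code: the 12-hour base step, the cut-off measured back from the newest atime, the closest-to-goal selection, and the 50% TOLERANCE. The names reduction_counts, choose_delta, BASE_PERIOD and TOLERANCE are hypothetical, and the token atimes are synthetic.

```python
# Toy model of an expiry estimate using a ladder of doubling atime deltas.
# All rules are illustrative assumptions, not SpamAssassin's code.

BASE_PERIOD = 12 * 60 * 60  # assumed smallest delta: 12 hours, in seconds
TOLERANCE = 0.5             # hypothetical: best guess may miss goal by 50%

def reduction_counts(atimes, max_exponent):
    """Map each candidate delta (seconds) to how many tokens it would expire."""
    newest = max(atimes)
    counts = {}
    for i in range(max_exponent + 1):
        delta = BASE_PERIOD * 2 ** i
        cutoff = newest - delta  # tokens last used before this are "old"
        counts[delta] = sum(1 for t in atimes if t < cutoff)
    return counts

def choose_delta(atimes, goal, max_exponent):
    """Return the delta whose reduction is closest to goal, or None to give up."""
    counts = reduction_counts(atimes, max_exponent)
    best = min(counts, key=lambda d: abs(counts[d] - goal))
    if abs(counts[best] - goal) > TOLERANCE * goal:
        return None  # no candidate is close enough: skip this expiry run
    return best

# Hypothetical database: 1000 tokens, one last used every 5 days,
# so their atimes reach roughly 13.7 years back from NOW.
NOW = 1274448689
DAY = 24 * 60 * 60
atimes = [NOW - k * 5 * DAY for k in range(1000)]

print(choose_delta(atimes, goal=600, max_exponent=9))   # None
print(choose_delta(atimes, goal=600, max_exponent=14))  # 176947200 (2048 days)
```

In this model, with a maximum exponent of 9 the largest delta is 256 days, which still expires 948 of the 1000 tokens, far more than the goal of 600, so the run is abandoned. With 14 the ladder reaches 2048 days, which expires 590 tokens.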

In the file Conf.pm, find the line

  $self->{bayes_expiry_max_exponent} = 9;

and try changing the 9 to 14.   
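A rough way to see why the exponent matters for this particular database: compare the widest atime delta each exponent allows with the span between "oldest atime" and "newest atime" from the dump. This assumes the delta ladder starts at 12 hours and doubles up to 2**bayes_expiry_max_exponent times that base; that is our reading of the setting, not something stated in this thread, and max_delta_days is a hypothetical helper.

```python
# Assumption: deltas start at 12 hours and double up to
# 2**bayes_expiry_max_exponent times that base.
BASE_PERIOD = 12 * 60 * 60  # seconds
DAY = 24 * 60 * 60

def max_delta_days(max_exponent):
    """Largest atime delta the expiry would try, in days (under the assumption)."""
    return BASE_PERIOD * 2 ** max_exponent / DAY

# Values taken from the "sa-learn --dump magic" output in the question.
oldest_atime = 847133240
newest_atime = 1274448689
span_days = (newest_atime - oldest_atime) / DAY

print(f"token atime span: {span_days:.1f} days")  # 4945.8
for exponent in (9, 14):
    window = max_delta_days(exponent)
    print(f"max_exponent={exponent}: largest delta {window:g} days, "
          f"covers span: {window >= span_days}")
```

Under that assumption, 9 allows at most 256 days, while the token atimes here span roughly 13.5 years; 14 allows 8192 days, which covers the whole span.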

If that doesn't work, it may just be a matter of waiting for the
distribution of atimes to be changed by new mail. It will probably
work in the end.



> "last expiry atime" is greater than "newest atime". Does that mean a
> reduction is about to occur?

The last expiry atime is the time at which the expiry ran, which is more
recent than the atimes of the tokens in the database.
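Both values are Unix timestamps, so converting them to dates shows that "last expiry atime" records a past event rather than a scheduled one. The helper name utc is hypothetical; the two constants come from the dump in the question.

```python
from datetime import datetime, timezone

# Both values come from the "sa-learn --dump magic" output in the question.
NEWEST_ATIME = 1274448689       # atime of the most recently used token
LAST_EXPIRY_ATIME = 1280569532  # when the last expiry run took place

def utc(ts):
    """Convert a Unix timestamp (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)

newest = utc(NEWEST_ATIME)
expired_at = utc(LAST_EXPIRY_ATIME)
gap = expired_at - newest

print("newest token atime:", newest.isoformat())     # 2010-05-21T13:31:29+00:00
print("last expiry ran at:", expired_at.isoformat())  # 2010-07-31T09:45:32+00:00
print("expiry ran", gap.days, "days after the newest atime")  # 70
```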

> Does the "last expire reduction count" mean that at time 1280569532 the
> number of tokens will be reduced by 196817?

No, it means that 196817 tokens were removed at 1280569532.
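Read that way, the reduction count describes a completed run. The arithmetic below estimates the database size just before that run, assuming no tokens were learned after it (an assumption the thread does not confirm).

```python
# Values from the "sa-learn --dump magic" output in the question.
NTOKENS_NOW = 344519     # ntokens
LAST_REDUCTION = 196817  # last expire reduction count

# Assumes no tokens were learned after that expiry run, so the database
# held ntokens + reduction count tokens just before it ran.
ntokens_before = NTOKENS_NOW + LAST_REDUCTION
share_removed = LAST_REDUCTION / ntokens_before

print("tokens before the last expiry:", ntokens_before)  # 541336
print(f"share removed: {share_removed:.1%}")              # 36.4%
```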

> If I do not add any new tokens (so "newest atime" does not change)

Token atimes are updated when you scan mail, too.
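The point is that "newest atime" can keep moving even when nothing new is learned. The toy model below is hypothetical and is not SpamAssassin's implementation: a store maps each token to its atime, learning adds or refreshes tokens, and scanning only refreshes tokens already present. The timestamp passed as now is whatever the caller supplies.

```python
class TokenStore:
    """Toy Bayes token store (hypothetical, not SpamAssassin's implementation).

    Maps each token to its last access time ("atime"). Learning adds or
    refreshes tokens; scanning only refreshes tokens already in the store.
    """

    def __init__(self):
        self.atimes = {}

    def learn(self, tokens, now):
        for token in tokens:
            self.atimes[token] = now  # new token, or existing one touched

    def scan(self, tokens, now):
        for token in tokens:
            if token in self.atimes:      # scanning never adds tokens...
                self.atimes[token] = now  # ...but refreshes known ones

    def newest_atime(self):
        return max(self.atimes.values())


store = TokenStore()
store.learn(["free", "meeting"], now=1000)
store.scan(["meeting", "unseen"], now=2000)
print(len(store.atimes), store.newest_atime())  # 2 2000
```

No token was added by the scan, yet the newest atime moved from 1000 to 2000.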