Posted to users@spamassassin.apache.org by Federico Giannici <gi...@neomedia.it> on 2005/06/15 12:59:36 UTC

bayes_seen file of 340MB

We have a SpamAssassin installation with a single bayes database for all
our mailboxes (a couple thousand).

I think that the "bayes_toks" file has the expected size (around 8MB), 
but the "bayes_seen" file seems too big to me: around 340MB!
Is this size normal?
Doesn't a file that large slow down the queries?


Here is our "local.cf" content:

use_bayes 1
bayes_path /var/spamassassin/bayes
bayes_use_hapaxes 1
bayes_auto_learn 1
bayes_learn_to_journal 1
bayes_journal_max_size 1000000
bayes_expiry_max_db_size 250000
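To compare the database files against a config like the one above, here is a minimal Python sketch. It assumes the default DBM storage, where SpamAssassin derives the filenames by appending "_toks", "_seen" and "_journal" to the bayes_path prefix. The helpers bayes_files and report_sizes are hypothetical and illustrative only.

```python
import os
import sys

# Hypothetical helper: map a bayes_path prefix (as set in local.cf) to the
# per-database files. Assumes the default DBM store, where SpamAssassin
# appends _toks, _seen and _journal to the prefix.
def bayes_files(prefix):
    return {name: prefix + "_" + name for name in ("toks", "seen", "journal")}

def report_sizes(prefix):
    """Return {name: size in bytes, or None if the file does not exist}."""
    sizes = {}
    for name, path in bayes_files(prefix).items():
        try:
            sizes[name] = os.path.getsize(path)
        except FileNotFoundError:
            sizes[name] = None
    return sizes

if __name__ == "__main__":
    # Default prefix taken from the bayes_path line in the local.cf above.
    prefix = sys.argv[1] if len(sys.argv) > 1 else "/var/spamassassin/bayes"
    for name, size in report_sizes(prefix).items():
        shown = "missing" if size is None else f"{size / 2**20:.1f} MB"
        print(f"bayes_{name}: {shown}")
```

Note that bayes_expiry_max_db_size limits the token count in bayes_toks. Nothing in this config bounds bayes_seen, which is the point of the question.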


Here is the "sa-learn --dump magic" output:

0.000          0          3          0  non-token data: bayes db version
0.000          0    5103649          0  non-token data: nspam
0.000          0    1439768          0  non-token data: nham
0.000          0     448750          0  non-token data: ntokens
0.000          0 1118530322          0  non-token data: oldest atime
0.000          0 1118832322          0  non-token data: newest atime
0.000          0 1118832323          0  non-token data: last journal sync atime
0.000          0 1118797752          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0     186807          0  non-token data: last expire reduction count
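The atime fields in this output are Unix timestamps, which makes them hard to read directly. The following is a small Python sketch that parses the dump above (with the wrapped lines rejoined) and converts those fields to UTC dates. The helper names parse_magic and as_utc are illustrative and are not part of SpamAssassin.

```python
from datetime import datetime, timezone

# The "sa-learn --dump magic" output quoted above, wrapped lines rejoined.
DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0    5103649          0  non-token data: nspam
0.000          0    1439768          0  non-token data: nham
0.000          0     448750          0  non-token data: ntokens
0.000          0 1118530322          0  non-token data: oldest atime
0.000          0 1118832322          0  non-token data: newest atime
0.000          0 1118832323          0  non-token data: last journal sync atime
0.000          0 1118797752          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0     186807          0  non-token data: last expire reduction count
"""

PREFIX = "non-token data: "

def parse_magic(text):
    """Map each 'non-token data' description to the integer in the third column."""
    values = {}
    for line in text.splitlines():
        fields = line.split(None, 4)  # four numeric columns, then the description
        if len(fields) == 5 and fields[4].startswith(PREFIX):
            values[fields[4][len(PREFIX):]] = int(fields[2])
    return values

def as_utc(epoch):
    """Render a Unix timestamp as a 'YYYY-MM-DD HH:MM:SS' UTC string."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

magic = parse_magic(DUMP)
for key in ("oldest atime", "newest atime", "last expiry atime"):
    print(f"{key}: {as_utc(magic[key])} UTC")
span_days = (magic["newest atime"] - magic["oldest atime"]) / 86400
print(f"token atime span: {span_days:.1f} days")
```

This output describes the token database only. It says nothing about how many message IDs have accumulated in bayes_seen.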


Thanks.

-- 
___________________________________________________
     __
    |-                      giannici@neomedia.it
    |ederico Giannici      http://www.neomedia.it
___________________________________________________

Re: bayes_seen file of 340MB

Posted by Matt Kettler <mk...@comcast.net>.
At 06:59 AM 6/15/2005, Federico Giannici wrote:
>We have a SpamAssassin installation with a single bayes database for 
>all   our mailboxes (a couple thousand).
>
>I think that the "bayes_toks" file has the expected size (around 8MB), but 
>the "bayes_seen" file seems too big to me: around 340MB!
>Is this size normal?

Yes, bayes_seen doesn't have expiry (yet). It was completely overlooked in 
the original bayes design.

It should be addressed in 3.1.0, although I'm not sure how automatic that
will be unless you choose to disable bayes_seen.

http://bugzilla.spamassassin.org/show_bug.cgi?id=2975

In the interim, you can stop SA and delete the file.
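Moving the file aside instead of deleting it outright keeps a copy you can restore. The following is a minimal Python sketch of that idea. It assumes spamd and any sa-learn runs are already stopped (the sketch does not check this), and it assumes SpamAssassin creates a fresh bayes_seen on its next write, as the advice above implies. The function rotate_seen and its timestamped backup naming are hypothetical.

```python
import os
import time

def rotate_seen(prefix, keep_backup=True):
    """Move <prefix>_seen out of the way so a fresh one is started.

    Assumes SpamAssassin (spamd, sa-learn) is already stopped; this sketch
    does not verify that. Returns the backup path, or None if the file was
    deleted or did not exist in the first place.
    """
    seen = prefix + "_seen"
    if not os.path.exists(seen):
        return None
    if keep_backup:
        # Hypothetical naming scheme: bayes_seen.YYYYMMDDHHMMSS
        backup = f"{seen}.{time.strftime('%Y%m%d%H%M%S')}"
        os.rename(seen, backup)
        return backup
    os.remove(seen)
    return None
```

For example, rotate_seen("/var/spamassassin/bayes") would rename /var/spamassassin/bayes_seen in the setup above. Restart SpamAssassin afterwards.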

Be aware that when you do so, messages that have already been trained can
be learned a second time by sa-learn, since bayes_seen is what records which
messages have already been learned. This is not a big deal for most people,
but a few rely on dumping files into a directory and learning the whole
directory each time, including the files from the last learning run.
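The following Python toy model makes the double-counting risk concrete. It is not SpamAssassin's code. It only mimics the role bayes_seen plays, which is to remember which message IDs have been learned so that re-running the learner on them does nothing. It ignores everything else bayes_seen stores. The function learn_directory and the message IDs are hypothetical.

```python
# Toy model only: SpamAssassin's real learner is not implemented like this.

def learn_directory(messages, seen, counts):
    """messages: list of (msg_id, is_spam). Updates seen and counts in place."""
    for msg_id, is_spam in messages:
        if msg_id in seen:
            continue  # already learned, so skipped (the job bayes_seen does)
        seen.add(msg_id)
        counts["nspam" if is_spam else "nham"] += 1

corpus = [("<a@example>", True), ("<b@example>", True), ("<c@example>", False)]
counts = {"nspam": 0, "nham": 0}
seen = set()

learn_directory(corpus, seen, counts)  # first run
learn_directory(corpus, seen, counts)  # same directory again: nothing changes
print(counts)  # {'nspam': 2, 'nham': 1}

seen.clear()                           # the equivalent of deleting bayes_seen
learn_directory(corpus, seen, counts)  # the old files are now counted twice
print(counts)  # {'nspam': 4, 'nham': 2}
```

If you wipe bayes_seen, the simple workaround for a learn-the-whole-directory workflow is to move files out of the directory once they have been learned.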


>Doesn't a file that large slow down the queries?

I'm not sure, but probably. I try to wipe my bayes_seen on occasion.