Posted to users@spamassassin.apache.org by Steven Moix <st...@axianet.ch> on 2006/01/26 10:52:20 UTC

bayes_seen and bayes_toks DB size

Hello all,

I'm currently running a mail server with Postfix + amavisd-new + SA
3.1, with global Bayesian filtering and auto-learn enabled. It works
perfectly, except that for the past few days I have noticed that my
bayes_seen and bayes_toks databases are no longer growing. Here is
the current status (size in bytes, date, file):

20548 Jan 26 10:29 bayes_journal
323584 Jan 26 10:27 bayes_seen (That's exactly 316x1024)
5242880 Jan 26 10:27 bayes_toks (That's exactly 5120x1024)
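
The "exactly N x 1024" remarks above are easy to double-check. A minimal sketch in Python, where kib_blocks is a helper name invented for this example and the sizes are the ones from the listing:

```python
# Sizes in bytes, copied from the directory listing above.
sizes = {"bayes_seen": 323584, "bayes_toks": 5242880}


def kib_blocks(size_bytes):
    """Return (whole 1024-byte blocks, leftover bytes) for a file size."""
    return divmod(size_bytes, 1024)


for name, size in sizes.items():
    blocks, remainder = kib_blocks(size)
    print(f"{name}: {blocks} x 1024 bytes, remainder {remainder}")
```

A remainder of zero for both files means each size is a whole number of 1024-byte blocks, which fits a database that grows in fixed-size chunks rather than one entry at a time.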

The bayes_journal file cycles between 0 and 102400 bytes, in line with
the "bayes_journal_max_size 102400" directive. Every time it reaches
its maximum size, the timestamps on the bayes_seen and bayes_toks
files are updated, so something is being written to them.

I think I have reached the point where old tokens are simply being
replaced with new ones from the bayes_journal, and that this is why
the file sizes no longer increase. Am I right?

I also tried increasing "bayes_expiry_max_db_size" from 150000 to
500000, but it didn't change anything.

Thanks
Steven

Re: bayes_seen and bayes_toks DB size

Posted by Steven Moix <st...@axianet.ch>.
OK, that's a perfect answer to my questions. I hadn't thought of the
DB preallocation.

Thanks!

Steven

On Jan 26, 2006, at 4:45 PM, Theo Van Dinter wrote:

> On Thu, Jan 26, 2006 at 10:52:20AM +0100, Steven Moix wrote:
>> I'm currently running a mail server with Postfix + amavisd-new + SA
>> 3.1, with global Bayesian filtering and auto-learn enabled. It works
>> perfectly, except that for the past few days I have noticed that my
>> bayes_seen and bayes_toks databases are no longer growing. Here is
>> the current status (size in bytes, date, file):
>
> Yeah, that's perfectly fine.  Berkeley DB expands the file when it needs to,
> but preallocates space to be more efficient for new entries.  This works well,
> but makes it difficult to get the space back since the DB file stays the same
> size even if you delete all the entries -- which is why SA has to build a new
> DB, copy over entries, then delete and swap, whenever we do an expire.
>
>> I also tried increasing "bayes_expiry_max_db_size" from 150000 to
>> 500000, but it didn't change anything.
>
> That setting tells SA to let more tokens go into the DB, but we leave managing
> the DB file to Berkeley DB so it'll expand when it has to expand.
>
> -- 
> Randomly Generated Tagline:
> "Aiee!" - Linux kernel error message


Re: bayes_seen and bayes_toks DB size

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Jan 26, 2006 at 10:52:20AM +0100, Steven Moix wrote:
> I'm currently running a mail server with Postfix + amavisd-new + SA
> 3.1, with global Bayesian filtering and auto-learn enabled. It works
> perfectly, except that for the past few days I have noticed that my
> bayes_seen and bayes_toks databases are no longer growing. Here is
> the current status (size in bytes, date, file):

Yeah, that's perfectly fine.  Berkeley DB expands the file when it needs to,
but preallocates space to be more efficient for new entries.  This works well,
but makes it difficult to get the space back since the DB file stays the same
size even if you delete all the entries -- which is why SA has to build a new
DB, copy over entries, then delete and swap, whenever we do an expire.
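
The "build a new DB, copy, swap" idea can be sketched roughly as follows. This is not SpamAssassin's expiry code. It uses Python's dbm.dumb module purely as a stand-in for Berkeley DB, on the assumption that it behaves similarly in the one respect that matters here: its data file is never shrunk when entries are deleted. The names dat_size, rebuild_and_swap and demo are invented for the example.

```python
import dbm.dumb
import os
import tempfile


def dat_size(base):
    """Size in bytes of the dbm.dumb data file for the database at `base`."""
    return os.path.getsize(base + ".dat")


def rebuild_and_swap(base, keep):
    """Copy entries accepted by keep(key, value) into a fresh DB, then swap it in."""
    tmp = base + ".new"
    with dbm.dumb.open(base, "r") as old, dbm.dumb.open(tmp, "n") as new:
        for key in old.keys():
            value = old[key]
            if keep(key, value):
                new[key] = value
    # Swap the new data and index files over the old ones.
    for ext in (".dat", ".dir"):
        os.replace(tmp + ext, base + ext)
    # Drop stale backup index files left behind by dbm.dumb.
    for path in (base + ".bak", tmp + ".bak"):
        if os.path.exists(path):
            os.remove(path)


def demo():
    """Return data-file sizes: when full, after deletes, after the rebuild."""
    with tempfile.TemporaryDirectory() as workdir:
        base = os.path.join(workdir, "toks")
        with dbm.dumb.open(base, "n") as db:
            for i in range(200):
                db[f"tok{i}"] = b"x" * 100
        full = dat_size(base)

        # Deleting entries removes them from the index only.
        with dbm.dumb.open(base, "w") as db:
            for i in range(150):
                del db[f"tok{i}"]
        after_delete = dat_size(base)

        # Only a rebuild into a fresh file gives the space back.
        rebuild_and_swap(base, lambda key, value: True)
        after_rebuild = dat_size(base)
        return full, after_delete, after_rebuild


if __name__ == "__main__":
    print(demo())
```

In this sketch the data file is the same size after the deletes as it was when full, and only the rebuilt copy is smaller, which is the same pattern as a bayes_toks file that stays at a fixed size on disk.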

> I also tried increasing "bayes_expiry_max_db_size" from 150000 to
> 500000, but it didn't change anything.

That setting tells SA to let more tokens go into the DB, but we leave managing
the DB file to Berkeley DB so it'll expand when it has to expand.

-- 
Randomly Generated Tagline:
"Aiee!" - Linux kernel error message