Posted to users@spamassassin.apache.org by Stefan Jakobs <st...@rus.uni-stuttgart.de> on 2006/06/08 13:56:22 UTC

size of bayes db

Hello list,

I'm using SA 3.1.2 with amavis-new and postfix on a mail relay.
I turned on Bayes autolearning with the standard options, but my bayes_seen db
keeps growing; it is now at 1.1 GB.
Why doesn't SA reduce the size automatically?
What can I do to reduce the size of the db?
What is your experience with the Bayes db?

Thanks for help.
Greetings
Stefan

Re: size of bayes db

Posted by Kris Deugau <kd...@vianet.ca>.
Stefan Jakobs wrote:
> I'm using SA 3.1.2 with amavis-new and postfix on a mail relay.
> I turned on Bayes autolearning with the standard options, but my bayes_seen db
> keeps growing; it is now at 1.1 GB.
> Why doesn't SA reduce the size automatically?

Probably because its automatic expiry runs are getting interrupted by 
amavis-new.  Check back in the list archives;  quite a few people have 
had this problem.

For *any* file-based sitewide Bayes setup, IMO, you should set the SA 
options so it doesn't run automatic expiry, and set up a cron job to 
manually run the expiry process on a regular basis (daily is probably 
good for most sites;  *really* high-traffic sites can probably go every 
few hours but they should be using SQL-based Bayes anyway IMO <g>).
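A minimal sketch of that setup, not taken from this thread: `bayes_auto_expire` and `sa-learn --force-expire` are standard SpamAssassin, but the script path, the schedule, the `run` argument, and the `SA_LEARN` override are assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch of a cron-driven expiry wrapper. Assumes local.cf already contains:
#   bayes_auto_expire 0
# Hypothetical crontab entry for the user amavis-new runs as:
#   30 3 * * * /usr/local/bin/bayes-expire.sh run

# Command to run; override with SA_LEARN=echo for a dry run (hypothetical knob).
SA_LEARN="${SA_LEARN:-sa-learn}"

expire_bayes() {
    # Force a token expiry run right now.
    "$SA_LEARN" --force-expire
}

# Only expire when explicitly asked to, so the file can also be sourced safely.
if [ "${1:-}" = "run" ]; then
    expire_bayes
fi
```

Running it once by hand as the amavis-new user before adding the crontab entry should show whether the expiry finishes without permission errors.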

> What can I do, to reduce the size of the db?

Right away, you can manually expire tokens by running sa-learn 
--force-expire.

> What are your experience with the bayes db?

One legacy system still running 2.64 has had a stable Bayes db around 
40M for close to four years now.  (Possibly 5 years.  I don't recall 
when I upgraded to 2.5x on that box.)  Fairly early on, I disabled 
automatic expiry and set up a daily cron job to run the expiry process 
manually.  I've *never* had trouble with the database inflating out of 
control.

If you do set up a cron'ed expiry on your system, make sure it runs as 
the same user amavis-new is running as.  Otherwise you'll end up with 
file permission issues.

Check the man pages for your local SA install for the exact Bayes 
options you need to tweak.

-kgd

Re: size of bayes db

Posted by Kai Schaetzl <ma...@conactive.com>.
Stefan Jakobs wrote on Fri, 9 Jun 2006 11:06:47 +0200:

> It is a dbm db! The server processes ~80,000 mails per day, and the bayes_seen
> db is 5 months old.

If you count both dbs together, 1 GB might be what you end up with at this volume
and with no expiry. What's your "sa-learn --dump magic" output? That will show you
some statistics about your db. As an example, this is a dump of a 42 MB dbm
database. I let it expire at a threshold of about 1.5 million tokens.

0.000          0      47588          0  non-token data: nspam
0.000          0      87524          0  non-token data: nham
0.000          0    1231268          0  non-token data: ntokens
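To watch how the db grows over time, the token count can be pulled out of that dump. This is only a sketch: the function name `bayes_ntokens` is made up here, and it assumes the column layout of the sample above (value in the third column).

```shell
# Print the token count from `sa-learn --dump magic` output read on stdin.
# Assumes the layout of the sample above: the third column holds the value.
bayes_ntokens() {
    awk '/non-token data: ntokens/ { print $3 }'
}

# Usage (needs a working SpamAssassin install):
#   sa-learn --dump magic | bayes_ntokens
```

Logging that number from a daily cron job would show whether expiry is actually keeping the token count near the intended threshold.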


With such a large db you may be better off performance-wise with an SQL-based
one, but expect it to take even more space. With the volume of mail you get,
I'd expire everything older than a month.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com




Re: size of bayes db

Posted by Stefan Jakobs <st...@rus.uni-stuttgart.de>.
Hello,

On Thursday, 8 June 2006 at 22:31, Kai Schaetzl wrote:
> Stefan Jakobs wrote on Thu, 8 Jun 2006 13:56:22 +0200:
> > I turned on Bayes autolearning with the standard options, but my
> > bayes_seen db keeps growing; it is now at 1.1 GB.
>
> That is quite a lot indeed. Is this a dbm db? (SQL dbs are bigger because
> of indexing.) How much mail do you process per day?

It is a dbm db! The server processes ~80,000 mails per day, and the bayes_seen
db is 5 months old.

> Kai

Bye Stefan

Re: size of bayes db

Posted by Kai Schaetzl <ma...@conactive.com>.
Stefan Jakobs wrote on Thu, 8 Jun 2006 13:56:22 +0200:

> I turned on Bayes autolearning with the standard options, but my bayes_seen db
> keeps growing; it is now at 1.1 GB.

That is quite a lot indeed. Is this a dbm db? (SQL dbs are bigger because of
indexing.) How much mail do you process per day?

Kai
