Posted to users@spamassassin.apache.org by Richard Smits <R....@tudelft.nl> on 2007/06/12 09:29:53 UTC

How to decrease the bayes database size

Hello,

We really need some help here. It has come to our attention that our
bayes database is 2.4 GB in size. It is really slowing down our servers, and
they have a high CPU load.

We have tried the trick with sa-learn --force-expire, and it
deletes a lot of entries, but the file is not getting any smaller.

79K  Jun 12 09:26 bayes_journal
20M  Jun 12 09:26 bayes_toks
2.5G Jun 12 09:26 bayes_seen*

Does anyone have any tricks to help us out?

Greetings... Richard Smits

----
0.000          0          3          0  non-token data: bayes db version
0.000          0   14201082          0  non-token data: nspam
0.000          0    7760360          0  non-token data: nham
0.000          0     916962          0  non-token data: ntokens
0.000          0 1181559955          0  non-token data: oldest atime
0.000          0 1181633069          0  non-token data: newest atime
0.000          0 1181633115          0  non-token data: last journal sync atime
0.000          0 1181604237          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0     360013          0  non-token data: last expire reduction count
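
The atime fields in the dump above are Unix timestamps in seconds, so the window of token ages held in bayes_toks can be worked out with plain shell arithmetic. The following is only a small sketch using the values from this dump; the function name atime_span_hours is made up for illustration and is not part of SpamAssassin.

```shell
#!/bin/sh
# Hypothetical helper (not a SpamAssassin tool): whole hours between two
# Unix timestamps, using integer division.
atime_span_hours() {
  oldest="$1"
  newest="$2"
  echo $(( (newest - oldest) / 3600 ))
}

# Values copied from the dump: "oldest atime" and "newest atime".
atime_span_hours 1181559955 1181633069   # prints 20
```

By the same arithmetic, the "last expire atime delta" of 43200 seconds is 12 hours. The token database therefore covers well under a day of mail, which fits the small 20M bayes_toks file.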

----------------------

Re: How to decrease the bayes database size

Posted by Matt Kettler <mk...@verizon.net>.
Richard Smits wrote:
> Hello,
>
> We really need some help here. It has come to our attention that our
> bayes database is 2.4 GB in size. It is really slowing down our servers,
> and they have a high CPU load.
>
> We have tried the trick with sa-learn --force-expire, and it
> deletes a lot of entries, but the file is not getting any smaller.
>
> 79K  Jun 12 09:26 bayes_journal
> 20M  Jun 12 09:26 bayes_toks
> 2.5G Jun 12 09:26 bayes_seen*
>
> Does anyone have any tricks to help us out?
SpamAssassin does not have any expiry for bayes_seen. Expiry only
shrinks the bayes_toks file.

As of SA 3.0.0 it is safe to delete bayes_seen, so, as Phil Randal
suggested, you need to delete it.

See also:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=2975
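
To confirm which of the Bayes files is actually taking up the space before deleting anything, something like the sketch below could be used. largest_bayes_file is a made-up helper name, and the directory argument is an assumption: by default the per-user files live under ~/.spamassassin, but bayes_path may point somewhere else on a given setup.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the largest bayes_* file in a
# directory. Prints nothing if the directory holds no such files.
largest_bayes_file() {
  dir="$1"
  largest=""
  largest_size=-1
  for f in "$dir"/bayes_*; do
    [ -f "$f" ] || continue
    # $(( ... )) strips the padding some wc implementations add.
    size=$(( $(wc -c < "$f") ))
    if [ "$size" -gt "$largest_size" ]; then
      largest_size=$size
      largest=$f
    fi
  done
  if [ -n "$largest" ]; then
    basename "$largest"
  fi
}

# Usage (the path is an assumption, adjust it to your bayes_path):
#   largest_bayes_file "$HOME/.spamassassin"
```

With the file sizes from the original post, this would name bayes_seen, the one file that expiry does not touch.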

Re: How to decrease the bayes database size

Posted by Richard Smits <R....@tudelft.nl>.
Stéphane LEPREVOST wrote:
>  
> Thanks, Theo, for these useful answers.
> 
> As we're using auto_learn and never run sa-learn by hand, is there any
> particular risk if we simply delete the file?
> 
> Here's our Bayes configuration:
> 
> use_bayes 1
> use_bayes_rules 1
> bayes_auto_learn 1
> 
> -----Original Message-----
> From: Theo Van Dinter [mailto:felicity@apache.org]
> Sent: Tuesday, 12 June 2007 17:06
> To: users@spamassassin.apache.org
> Subject: Re: How to decrease the bayes database size
> 
> On Tue, Jun 12, 2007 at 10:07:15AM +0200, Stéphane LEPREVOST wrote:
>> Thanks for this tip but what about the efficiency of the Bayes 
>> Database after this operation ?
> 
> The _seen database just tracks which mails have been learned from, and has
> no effect on the ratings coming out of the Bayes system.
> 
>> Is there a way to export the real records of the file before deleting
>> it and then re-import them afterwards? Should we use something similar
>> to the check_whitelist and trim_whitelist tools?
> 
> There'd be no point to that, entries are only deleted rarely (whenever you
> do a "sa-learn --forget"), otherwise they're just added.
> 
> If you're not worried about relearning the same mail, then just delete the
> seen DB file.
> 
> --
> Randomly Selected Tagline:
> Last year we drove across the country...  We switched on the driving...
>  every half mile.  We had one cassette tape to listen to on the entire trip.
>  I don't remember what it was.
>  		-- Steven Wright
> 
> 

Thank you all for these useful answers. I have deleted the bayes_seen
file and things are looking better now, though not perfect.
Sometimes I get an amavisd process using about 2 GB of memory, which
seems way out of proportion.

17581 amavis    25   0 2549M 2.1G   444 R    21.9 72.1   3:15   1 amavisd

The process eventually goes away, but it really slows things down. Could
this be a corrupt database, or should I be looking at this from a different angle?

Greetings... Richard

RE: How to decrease the bayes database size

Posted by Stéphane LEPREVOST <st...@soget.fr>.
 
Thanks, Theo, for these useful answers.

As we're using auto_learn and never run sa-learn by hand, is there any
particular risk if we simply delete the file?

Here's our Bayes configuration:

use_bayes 1
use_bayes_rules 1
bayes_auto_learn 1

-----Original Message-----
From: Theo Van Dinter [mailto:felicity@apache.org]
Sent: Tuesday, 12 June 2007 17:06
To: users@spamassassin.apache.org
Subject: Re: How to decrease the bayes database size

On Tue, Jun 12, 2007 at 10:07:15AM +0200, Stéphane LEPREVOST wrote:
> Thanks for this tip but what about the efficiency of the Bayes 
> Database after this operation ?

The _seen database just tracks which mails have been learned from, and has
no effect on the ratings coming out of the Bayes system.

> Is there a way to export the real records of the file before deleting
> it and then re-import them afterwards? Should we use something similar
> to the check_whitelist and trim_whitelist tools?

There'd be no point to that, entries are only deleted rarely (whenever you
do a "sa-learn --forget"), otherwise they're just added.

If you're not worried about relearning the same mail, then just delete the
seen DB file.

--
Randomly Selected Tagline:
Last year we drove across the country...  We switched on the driving...
 every half mile.  We had one cassette tape to listen to on the entire trip.
 I don't remember what it was.
 		-- Steven Wright



Re: How to decrease the bayes database size

Posted by Theo Van Dinter <fe...@apache.org>.
On Tue, Jun 12, 2007 at 10:07:15AM +0200, Stéphane LEPREVOST wrote:
> Thanks for this tip but what about the efficiency of the Bayes Database
> after this operation ?

The _seen database just tracks which mails have been learned from, and has no
effect on the ratings coming out of the Bayes system.

> Is there a way to export the real records of the file before deleting it and
> then re-import them afterwards? Should we use something similar to the
> check_whitelist and trim_whitelist tools?

There'd be no point to that, entries are only deleted rarely (whenever you do
a "sa-learn --forget"), otherwise they're just added.

If you're not worried about relearning the same mail, then just delete the
seen DB file.

-- 
Randomly Selected Tagline:
Last year we drove across the country...  We switched on the driving...
 every half mile.  We had one cassette tape to listen to on the entire trip.
 I don't remember what it was.
 		-- Steven Wright

RE: How to decrease the bayes database size

Posted by Stéphane LEPREVOST <st...@soget.fr>.
Hi Phil,

Thanks for this tip, but what about the efficiency of the Bayes database
after this operation?

I was thinking that the more this file can "remember", the more efficient the
Bayes filtering is... within the limits of a reasonable file size, of course!

As Richard said, sa-learn --force-expire "deletes a lot of entries", but the
file's size still remains the same.

Is there a way to export the real records of the file before deleting it and
then re-import them afterwards? Should we use something similar to the
check_whitelist and trim_whitelist tools?

-----Original Message-----
From: Randal, Phil [mailto:prandal@herefordshire.gov.uk]
Sent: Tuesday, 12 June 2007 09:37
To: Richard Smits; users@spamassassin.apache.org
Subject: RE: How to decrease the bayes database size

bayes_seen just grows like topsy. All you need to do is delete it and let SA
recreate it.

Stop spamd / MailScanner / whatever.

check permissions on bayes_seen

rm bayes_seen

restart

do an sa-learn to make sure it still works (if it doesn't, reset permissions
on the newly created bayes_seen).

Cheers,

Phil
--
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK  

RE: How to decrease the bayes database size

Posted by "Randal, Phil" <pr...@herefordshire.gov.uk>.
bayes_seen just grows like topsy. All you need to do is delete it and
let SA recreate it.

Stop spamd / MailScanner / whatever.

check permissions on bayes_seen

rm bayes_seen

restart

do an sa-learn to make sure it still works (if it doesn't, reset
permissions on the newly created bayes_seen).
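
The delete step above could be wrapped roughly as follows. This is a sketch only: remove_bayes_seen is a made-up helper name, the directory is whatever your bayes_path points into, and stopping and restarting spamd, MailScanner or another scanner is deliberately left out because it depends on the installation.

```shell
#!/bin/sh
# Hypothetical helper: remove bayes_seen from a Bayes directory so that
# SpamAssassin can recreate it. The file's owner and mode are printed first
# so they can be compared with the recreated file afterwards.
remove_bayes_seen() {
  dir="$1"
  seen="$dir/bayes_seen"
  if [ ! -f "$seen" ]; then
    echo "no bayes_seen in $dir" >&2
    return 1
  fi
  ls -l "$seen"   # note the permissions before deleting
  rm -- "$seen"   # bayes_toks and bayes_journal are left untouched
}

# Usage, with the scanner stopped beforehand and restarted afterwards
# (the path is an assumption):
#   remove_bayes_seen /path/to/bayes/directory
```

After the restart, running sa-learn on a message as described above checks that learning still works and that the new bayes_seen has usable permissions.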

Cheers,

Phil
--
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK  

RE: How to decrease the bayes database size

Posted by Stéphane LEPREVOST <st...@soget.fr>.
Hi,

Same problem here with a 1.3G bayes_seen file.

There is no CPU load linked to it, but an oversized file is never good...

Can someone help us deal with this? As far as I remember, this problem has
been discussed many times here, but I never saw a trick for it.
