Posted to users@spamassassin.apache.org by Grant Peel <gp...@thenetnow.com> on 2010/08/26 16:11:33 UTC

Expiring Bayes

Hi all,

I have several hundred domains on a box. Each domain's mail is controlled
by a specific UNIX user.

Inside every user's directory, they have a user_prefs file.

While I have use_bayes 0 in the main config, some users have opted to turn
on bayes in their user_prefs.

This morning I noticed that one particular user's ~/.spamassassin/bayes*
files had grown to 1.5 GB.

I have put:

use_bayes 0
bayes_auto_learn        0
bayes_auto_expire       1
bayes_expiry_max_db_size 50000

in the local.cf file, and restarted spamd.

The database did not appear to trim, so I tried:

sa-learn -u "user" -D --force-expire

and the database is still 1.5 GB.

I know I am doing something(s) incorrect, but can't figure out what.

How do I properly trim the offending file(s)?

Is there a command to trim all databases (users) on the box?

Any advice would be appreciated.

SpamAssassin 3.2.5
FreeBSD 8.0

-Grant 



Re: Expiring Bayes

Posted by RW <rw...@googlemail.com>.
On Thu, 26 Aug 2010 16:25:22 +0200
Yet Another Ninja <sa...@alexb.ch> wrote:


> I bet the biggest is bayes_seen.
> You can safely delete the bayes_seen file (unless you plan to
> "unlearn" msgs). It will start growing again, fast.

If the account has never been expired, bayes_seen is likely to be
negligible compared to the token file. It's only when the total expired
tokens hugely exceed the current tokens that it becomes a comparable
size. 



On Thu, 26 Aug 2010 10:11:33 -0400
"Grant Peel" <gp...@thenetnow.com> wrote:

> I have put:
> 
> use_bayes 0

why?

> bayes_auto_learn        0
> bayes_auto_expire       1
> bayes_expiry_max_db_size 50000

50,000 is far too low: the (well out-of-date) default is 150,000, and
there's a hardcoded minimum of 100,000. By comparison, 1.5 GB is
approximately 50,000,000 tokens.
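Taking RW's numbers at face value, the arithmetic looks roughly like this. The 100,000-token floor and the ~30 bytes per token implied by "1.5 GB is approximately 50,000,000" are assumptions lifted from this post, not values checked against the SpamAssassin source, and the function names are hypothetical.

```python
# Rough arithmetic behind RW's reply. All constants are taken from the
# post itself (assumptions), not read from the SpamAssassin source.
HARDCODED_MIN_TOKENS = 100_000      # floor mentioned by RW
DEFAULT_MAX_DB_SIZE = 150_000       # the "well out-of-date" default
# Implied by "1.5 GB is approximately 50,000,000 tokens" (= 30.0 bytes/token)
BYTES_PER_TOKEN = 1_500_000_000 / 50_000_000


def effective_max_db_size(configured: int) -> int:
    """Hypothetical helper: assumes values below the floor are raised to it."""
    return max(configured, HARDCODED_MIN_TOKENS)


def estimate_tokens(db_size_bytes: int, bytes_per_token: float = BYTES_PER_TOKEN) -> int:
    """Very rough token count for a token DB of the given size on disk."""
    return int(db_size_bytes / bytes_per_token)


if __name__ == "__main__":
    configured = 50_000
    limit = effective_max_db_size(configured)
    tokens = estimate_tokens(1_500_000_000)
    print(f"configured {configured:,} -> effective {limit:,}")
    print(f"~{tokens:,} tokens, about {tokens // limit}x the limit")
```

So even with the floor applied, the file in question would be hundreds of times over the configured size.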

Re: Expiring Bayes

Posted by Yet Another Ninja <sa...@alexb.ch>.
On 2010-08-26 16:11, Grant Peel wrote:
> This morning I noticed that one particular user's ~/.spamassassin/bayes*
> files had grown to 1.5 GB.
> 
> The database did not appear to trim, so I tried:
> 
> sa-learn -u "user" -D --force-expire
> 
> and the database is still 1.5 GB.

I bet the biggest is bayes_seen.
You can safely delete the bayes_seen file (unless you plan to "unlearn"
msgs). It will start growing again, fast.

The bayes_toks file is the one that gets trimmed by expiration.
bayes_seen is what I call a parasite :-)
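To see which file is actually the big one before deleting anything, something like this lists the bayes_* files in a user's directory, largest first. It assumes the default per-user location (~/.spamassassin, no custom bayes_path); the function name is hypothetical.

```python
from pathlib import Path


def bayes_file_sizes(sa_dir: Path) -> list[tuple[str, int]]:
    """Return (filename, size in bytes) for each bayes_* file, largest first."""
    files = [p for p in sa_dir.glob("bayes_*") if p.is_file()]
    return sorted(((p.name, p.stat().st_size) for p in files),
                  key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumed default per-user location, as discussed in this thread.
    for name, size in bayes_file_sizes(Path.home() / ".spamassassin"):
        print(f"{size:>15,}  {name}")
```

Run as the account in question; a missing directory simply prints nothing.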

On a busy box, to avoid freezes I'd recommend setting
bayes_auto_expire       0

and doing a cron'd force-expire during low-traffic hours, either daily or
weekly, depending on the bayes_toks size.
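For a box with hundreds of per-user databases, a cron job along these lines could run the forced expiry for every account. It is a sketch that assumes the default ~/.spamassassin/bayes_toks location, that it runs as root, and that each account's login shell can run commands (an account with a nologin shell will not work with su - user -c). The helper names and the --run switch are made up for illustration.

```python
import os
import pwd
import shlex
import subprocess
import sys


def users_with_bayes_db() -> list[str]:
    """Hypothetical helper: local accounts that have ~/.spamassassin/bayes_toks."""
    names = set()
    for entry in pwd.getpwall():
        toks = os.path.join(entry.pw_dir, ".spamassassin", "bayes_toks")
        if os.path.isfile(toks):  # returns False (no exception) if unreadable
            names.add(entry.pw_name)
    return sorted(names)


def expire_command(user: str) -> list[str]:
    """Run sa-learn as the user so it opens that user's own Bayes files.

    The whole sa-learn command line is passed as ONE argument after -c.
    """
    return ["su", "-", user, "-c", "sa-learn --force-expire"]


def main(dry_run: bool = True) -> None:
    for user in users_with_bayes_db():
        cmd = expire_command(user)
        if dry_run:
            print(shlex.join(cmd))  # only show what would run
        else:
            # check=False: one failing account shouldn't stop the others
            result = subprocess.run(cmd, check=False)
            if result.returncode != 0:
                print(f"{user}: exited {result.returncode}", file=sys.stderr)


if __name__ == "__main__":
    # Dry run unless invoked with --run (e.g. from root's crontab).
    main(dry_run="--run" not in sys.argv[1:])
```

Without --run it only prints the commands, so the list can be checked before scheduling it for a quiet hour.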




h2h

Re: Expiring Bayes; aka bayes files stay BIG

Posted by RW <rw...@googlemail.com>.
On Wed, 15 Sep 2010 19:42:44 -0400
Dennis German <dg...@Real-World-Systems.com> wrote:

> On Sep 15, 2010, at 1:42 PM, RW wrote:
> 
> > On Wed, 15 Sep 2010 11:18:20 -0400
> > Dennis German <dg...@Real-World-Systems.com> wrote:

> >> I believe that bayes_seen is a perl hash and will not be reduced
> >> in size by deleting entries. The only way to reduce its size is
> >> to have a program read the current file, entry by entry, and output
> >> to a new file. This will not copy deleted entries, and the output
> >> will be significantly smaller. ...
> >>  Dennis German
> >> 
> > It's straightforward to do it with backup and restore, but the
> > problem is that there is no time field. You might just as well
> > delete the file periodically.
> 
> Thanks for the info; however, after running backup & restore:
> Before:
> 41,619,456 Sep 15 19:04 bayes_seen
> 2,543,616 Sep 15 19:04 bayes_toks 
> After:
> 43,511,808 Sep 15 19:26 bayes_seen
>  2,560,000 Sep 15 19:26 bayes_toks


Autodelete doesn't remove signatures at all, so there's no point in
compacting. I misunderstood what you were saying and thought you were
talking about removing entries. My point was that you can do that with
sa-learn, but there is no basis for selecting which entries to delete.
If you want to do it, it could presumably be done by switching to SQL
and adding a date field.

The trouble is that signatures shouldn't really be removed on a
multi-user system unless there is a centralized method for learning;
otherwise there may be people who are relying on them to be kept
indefinitely, e.g. they may occasionally run sa-learn --ham on a
read-mail folder.
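RW's "switch to SQL and add a date field" idea could look something like the following, with Python's sqlite3 standing in for whatever SQL backend is actually configured. The table layout and the added_at column are assumptions for illustration, not SpamAssassin's shipped schema, and SpamAssassin would not fill in such a column without changes of its own.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical table loosely modelled on a SQL bayes_seen table, plus an
# extra added_at column (the "date field"). Not SpamAssassin's real schema.
SCHEMA = """
CREATE TABLE bayes_seen (
    id       INTEGER NOT NULL,  -- per-user id
    msgid    TEXT    NOT NULL,  -- message signature
    flag     TEXT    NOT NULL,  -- learned as spam or ham
    added_at INTEGER NOT NULL,  -- hypothetical: UNIX time the entry was added
    PRIMARY KEY (id, msgid)
)
"""

DAY = 86_400


def prune_seen(conn: sqlite3.Connection, max_age_days: int,
               now: Optional[float] = None) -> int:
    """Delete seen entries older than max_age_days; return how many went."""
    if now is None:
        now = time.time()
    cutoff = int(now - max_age_days * DAY)
    with conn:  # commits the DELETE on success
        cur = conn.execute("DELETE FROM bayes_seen WHERE added_at < ?", (cutoff,))
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    now = time.time()
    conn.executemany("INSERT INTO bayes_seen VALUES (?, ?, ?, ?)", [
        (1, "sig-recent", "s", int(now - 10 * DAY)),
        (1, "sig-old", "h", int(now - 400 * DAY)),
    ])
    print(prune_seen(conn, max_age_days=365, now=now))  # 1
```

Given the caveat about multi-user systems, any such cutoff would need to be long enough that nobody is still relying on unlearning older mail.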

Re: Expiring Bayes; aka bayes files stay BIG

Posted by Dennis German <dg...@Real-World-Systems.com>.
On Sep 15, 2010, at 1:42 PM, RW wrote:

> On Wed, 15 Sep 2010 11:18:20 -0400
> Dennis German <dg...@Real-World-Systems.com> wrote:
> 
>> I believe that bayes_seen is a perl hash and will not be reduced in
>> size by deleting entries. The only way to reduce its size is to have
>> a program read the current file, entry by entry, and output to a new
>> file. This will not copy deleted entries, and the output will be
>> significantly smaller. ...
>>  Dennis German
>> 
> It's straightforward to do it with backup and restore, but the problem
> is that there is no time field. You might just as well delete
> the file periodically.

Thanks for the info; however, after running backup & restore:
Before:
41,619,456 Sep 15 19:04 bayes_seen
2,543,616 Sep 15 19:04 bayes_toks 
After:
43,511,808 Sep 15 19:26 bayes_seen
 2,560,000 Sep 15 19:26 bayes_toks


Re: Expiring Bayes; aka bayes files are BIG

Posted by RW <rw...@googlemail.com>.
On Wed, 15 Sep 2010 11:18:20 -0400
Dennis German <dg...@Real-World-Systems.com> wrote:

> I believe that bayes_seen is a perl hash and will not be reduced in
> size by deleting entries. The only way to reduce its size is to have
> a program read the current file, entry by entry, and output to a new
> file. This will not copy deleted entries, and the output will be
> significantly smaller. I don't know of any program, but if there is
> interest I might write one. Dennis German
> 
It's straightforward to do it with backup and restore, but the problem
is that there is no time field. You might just as well delete
the file periodically.

Re: Expiring Bayes; aka bayes files are BIG

Posted by Dennis German <dg...@Real-World-Systems.com>.
On Aug 26, 2010, at 10:11 AM, Grant Peel wrote:
> ...
> ~/.spamassassin/bayes* files had grown to 1.5 GB
> 
I believe that bayes_seen is a perl hash and will not be reduced in size by deleting entries.
The only way to reduce its size is to have a program read the current file, entry by entry, and
output to a new file. This will not copy deleted entries, and the output will be significantly smaller.
I don't know of any program, but if there is interest I might write one.
Dennis German
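The copy-to-a-new-file idea can be illustrated with Python's dbm.dumb module, chosen only because its data file visibly keeps the space of deleted entries. SpamAssassin's DBM store uses Perl's DB_File (Berkeley DB), not this format, so this shows the technique rather than being a tool for bayes_seen. Also, as RW points out in this thread, expiry never deletes bayes_seen entries in the first place, which is why backup & restore didn't shrink it. The helper names are hypothetical.

```python
import dbm.dumb
import os
from contextlib import closing


def dat_size(base: str) -> int:
    """Size of the data file behind a dbm.dumb database."""
    return os.path.getsize(base + ".dat")


def compact_copy(src_base: str, dst_base: str) -> None:
    """Copy every live key/value pair into a freshly created database.

    Deleted entries are simply not copied, so the new file has no dead space.
    """
    with closing(dbm.dumb.open(src_base, "r")) as src, \
            closing(dbm.dumb.open(dst_base, "n")) as dst:
        for key in src.keys():
            dst[key] = src[key]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "seen_old")
        new = os.path.join(tmp, "seen_new")
        with closing(dbm.dumb.open(old, "c")) as db:
            for i in range(1000):
                db[f"msg{i}"] = "s" * 100
            for i in range(900):
                del db[f"msg{i}"]  # dbm.dumb does not shrink the .dat file
        compact_copy(old, new)
        print(f"before: {dat_size(old):,} bytes, after: {dat_size(new):,} bytes")
```

The copy only helps when entries have actually been deleted; a file that only ever grows gains nothing from it.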


Re: Expiring Bayes

Posted by Bowie Bailey <Bo...@BUC.com>.
On 8/26/2010 10:11 AM, Grant Peel wrote:
> The database did not appear to trim, so I tried:
>
> sa-learn -u "user" -D --force-expire
>
> and the database is still 1.5 GB.

I believe the 'sa-learn -u' command only works when you are using an SQL
backend.

Try this:

su - user -c "sa-learn --force-expire"
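Quoting the sa-learn command matters because only the single argument after -c is taken as the command string. The words after it are either handed to the shell as positional parameters or, with some su implementations, rejected as unknown su options. Python's shlex module, which follows POSIX shell word splitting, shows the difference; command_string is a hypothetical helper.

```python
import shlex

# How a POSIX shell splits each variant of the command line.
unquoted = shlex.split("su - user -c sa-learn --force-expire")
quoted = shlex.split('su - user -c "sa-learn --force-expire"')


def command_string(argv: list[str]) -> str:
    """Return the single word that follows -c, i.e. what actually gets run."""
    return argv[argv.index("-c") + 1]


if __name__ == "__main__":
    print(command_string(unquoted))  # sa-learn
    print(command_string(quoted))    # sa-learn --force-expire
```

In the unquoted form --force-expire never reaches sa-learn.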

-- 
Bowie