Posted to users@spamassassin.apache.org by Dennis German <dg...@Real-World-Systems.com> on 2010/09/15 17:18:20 UTC

Re: Expiring Bayes; aka bayes files are BIG

On Aug 26, 2010, at 10:11 AM, Grant Peel wrote:
...
 ~/.spamassassin/bayes* files had grown to 1.5 GB
> I have put:
> use_bayes 0
> bayes_auto_learn        0
> bayes_auto_expire       1
> bayes_expiry_max_db_size 50000
> in the local.cf file, and restarted spamd.
> 
> The database did not appear to trim, so I tried:   sa-learn -u "user" -D --force-expire
> and the database is still 1.5 GB.
> I know I am doing something(s) incorrect, but can't figure out what.
> How do I properly trim the offending file(s)?
> Is there a command to trim all databases (users) on the box?
> Any advice would be appreciated.   Spamassassin 3.2.5,  FreeBSD 8.0
> -Grant 
> 
I believe that bayes_seen is a Perl hash and will not be reduced in size by deleting entries.
The only way to reduce its size is to have a program read the current file, entry by entry, and
write to a new file. Deleted entries are not copied, so the output will be significantly smaller.
I don't know of any such program, but if there is interest I might write one.
Dennis German
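
As an illustration of the copy-to-a-new-file idea described above, here is a
minimal sketch in Python. It uses Python's dbm.dumb module purely as a
stand-in: it is not SpamAssassin's on-disk bayes_seen format, and the names
compact, seen_old and seen_new are hypothetical. dbm.dumb also does not
reclaim space when keys are deleted, so it shows the same effect on a small
scale.

```python
import dbm.dumb
import os
import tempfile


def compact(src_path, dst_path):
    """Copy every live entry from src into a fresh database; return the count."""
    copied = 0
    with dbm.dumb.open(src_path, "r") as src, dbm.dumb.open(dst_path, "n") as dst:
        for key in src.keys():
            dst[key] = src[key]
            copied += 1
    return copied


def data_size(path):
    """Size in bytes of the dbm.dumb data file that belongs to path."""
    return os.path.getsize(path + ".dat")


workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "seen_old")  # hypothetical file names
new_path = os.path.join(workdir, "seen_new")

# Fill a database, then delete most of it; the data file keeps its size.
with dbm.dumb.open(old_path, "n") as db:
    for i in range(200):
        db[f"msgid-{i}".encode()] = b"s" * 100
    for i in range(150):
        del db[f"msgid-{i}".encode()]

live_entries = compact(old_path, new_path)
old_size = data_size(old_path)
new_size = data_size(new_path)
print(live_entries, old_size, new_size)
```

Only the entries that are still present get copied, so the new data file is
smaller than the old one. Whether the same gain is available for a real
bayes_seen file depends on entries actually having been deleted from it.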


Re: Expiring Bayes; aka bayes files stay BIG

Posted by RW <rw...@googlemail.com>.
On Wed, 15 Sep 2010 19:42:44 -0400
Dennis German <dg...@Real-World-Systems.com> wrote:

> On Sep 15, 2010, at 1:42 PM, RW wrote:
> 
> > On Wed, 15 Sep 2010 11:18:20 -0400
> > Dennis German <dg...@Real-World-Systems.com> wrote:

> >> I believe that  bayes_seen is a perl hash and will not be reduced
> >> in size by deleting entries. The only way to reduce its size is
> >> to have a program read the current file, entry by entry and output
> >> to a new file. This will not copy deleted entries and the output
> >> will be significantly smaller. ...
> >>  Dennis German
> >> 
> > It's straightforward to do it with backup and restore, but the
> > problem is that there is no time field. You might just as well
> > delete the file periodically.  
> 
> Thanks for the info. However, after running backup & restore:
> Before:
> 41,619,456 Sep 15 19:04 bayes_seen
> 2,543,616 Sep 15 19:04 bayes_toks 
> After:
> 43,511,808 Sep 15 19:26 bayes_seen
>  2,560,000 Sep 15 19:26 bayes_toks


Autodelete doesn't remove signatures at all, so there's no point in
compacting. I misunderstood what you were saying and thought you were
talking about removing entries. My point was that you can do that with
sa-learn, but there is no basis for selecting which entries to delete.
If you want to do it, it presumably could be done by switching to SQL
and adding a date field.
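
To make the "SQL plus a date field" idea concrete, here is a rough sketch
using Python's sqlite3 module. The table layout, the learned_at column and
the function names are assumptions made up for illustration; this is not
SpamAssassin's actual SQL schema.

```python
import sqlite3
import time


def open_seen_db(path=":memory:"):
    """Open (or create) a hypothetical 'seen' table that carries a timestamp."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        " msgid TEXT PRIMARY KEY,"
        " flag TEXT NOT NULL,"          # 's' for spam, 'h' for ham
        " learned_at INTEGER NOT NULL)"  # Unix time the message was learned
    )
    return conn


def record(conn, msgid, flag, learned_at=None):
    """Store a signature together with the time it was learned."""
    ts = int(time.time()) if learned_at is None else learned_at
    conn.execute("INSERT OR REPLACE INTO seen VALUES (?, ?, ?)", (msgid, flag, ts))


def expire_older_than(conn, cutoff):
    """Delete signatures learned before cutoff; return how many were removed."""
    cur = conn.execute("DELETE FROM seen WHERE learned_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


conn = open_seen_db()
record(conn, "old@example.invalid", "s", learned_at=1_000)
record(conn, "new@example.invalid", "h", learned_at=2_000)
removed = expire_older_than(conn, cutoff=1_500)
remaining = [row[0] for row in conn.execute("SELECT msgid FROM seen")]
print(removed, remaining)
```

With a timestamp on each row there is at least a basis for choosing which
signatures to drop, which the plain hash file does not offer.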

The trouble is that signatures shouldn't really be removed on a
multi-user system unless there is a centralized method for learning;
otherwise there may be people who are relying on them to be kept
indefinitely, e.g. they may occasionally run sa-learn --ham on a
read-mail folder.

Re: Expiring Bayes; aka bayes files stay BIG

Posted by Dennis German <dg...@Real-World-Systems.com>.
On Sep 15, 2010, at 1:42 PM, RW wrote:

> On Wed, 15 Sep 2010 11:18:20 -0400
> Dennis German <dg...@Real-World-Systems.com> wrote:
> 
>> On Aug 26, 2010, at 10:11 AM, Grant Peel wrote:
>> ...
>> ~/.spamassassin/bayes* files had grown to 1.5 GB
>>> I have put:
>>> use_bayes 0
>>> bayes_auto_learn        0
>>> bayes_auto_expire       1
>>> bayes_expiry_max_db_size 50000
>>> in the local.cf file, and restarted spamd.
>>> 
>>> The database did not appear to trim, so I tried:   sa-learn -u
>>> "user" -D --force-expire and the database is still 1.5 GB.
>>> I know I am doing something(s) incorrect, but can't figure out what.
>>> How do I properly trim the offending file(s)?
>>> Is there a command to trim all databases (users) on the box?
>>> Any advice would be appreciated.   Spamassassin 3.2.5,  FreeBSD 8.0
>>> -Grant 
>>> 
>> I believe that  bayes_seen is a perl hash and will not be reduced in
>> size by deleting entries. The only way to reduce its size is to have
>> a program read the current file, entry by entry and output to a new
>> file. This will not copy deleted entries and the output will be
>> significantly smaller. ...
>>  Dennis German
>> 
> It's straightforward to do it with backup and restore, but the problem
> is that there is no time field. You might just as well delete
> the file periodically.  

Thanks for the info. However, after running backup & restore:
Before:
41,619,456 Sep 15 19:04 bayes_seen
2,543,616 Sep 15 19:04 bayes_toks 
After:
43,511,808 Sep 15 19:26 bayes_seen
 2,560,000 Sep 15 19:26 bayes_toks


Re: Expiring Bayes; aka bayes files are BIG

Posted by RW <rw...@googlemail.com>.
On Wed, 15 Sep 2010 11:18:20 -0400
Dennis German <dg...@Real-World-Systems.com> wrote:

> On Aug 26, 2010, at 10:11 AM, Grant Peel wrote:
> ...
>  ~/.spamassassin/bayes* files had grown to 1.5 GB
> > I have put:
> > use_bayes 0
> > bayes_auto_learn        0
> > bayes_auto_expire       1
> > bayes_expiry_max_db_size 50000
> > in the local.cf file, and restarted spamd.
> > 
> > The database did not appear to trim, so I tried:   sa-learn -u
> > "user" -D --force-expire and the database is still 1.5 GB.
> > I know I am doing something(s) incorrect, but can't figure out what.
> > How do I properly trim the offending file(s)?
> > Is there a command to trim all databases (users) on the box?
> > Any advice would be appreciated.   Spamassassin 3.2.5,  FreeBSD 8.0
> > -Grant 
> > 
> I believe that  bayes_seen is a perl hash and will not be reduced in
> size by deleting entries. The only way to reduce its size is to have
> a program read the current file, entry by entry and output to a new
> file. This will not copy deleted entries and the output will be
> significantly smaller. I don't know of any program, but if there is
> interest I might write one. Dennis German
> 
It's straightforward to do it with backup and restore, but the problem
is that there is no time field. You might just as well delete
the file periodically.
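
For reference, the backup-and-restore round trip mentioned above could be
scripted roughly as follows. This is only a sketch: it assumes sa-learn is
on the PATH and is run as the user who owns the database, and the function
names and the backup file name are hypothetical. Note that --restore
replaces the existing Bayes data, so keep a copy of the dump.

```python
import subprocess


def backup_restore_commands(backup_file):
    """Return the two sa-learn invocations as argument lists, in run order."""
    return [
        ["sa-learn", "--backup"],                # dump is written to stdout
        ["sa-learn", "--restore", backup_file],  # rebuilds the db from the dump
    ]


def rebuild_bayes(backup_file):
    """Dump the Bayes database to backup_file, then rebuild it from that dump."""
    backup_cmd, restore_cmd = backup_restore_commands(backup_file)
    with open(backup_file, "w") as out:
        subprocess.run(backup_cmd, stdout=out, check=True)
    subprocess.run(restore_cmd, check=True)


# Example call (not executed here): rebuild_bayes("bayes_backup.txt")
```

As the file sizes reported elsewhere in this thread show, a round trip like
this does not necessarily make bayes_seen any smaller.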