Posted to users@spamassassin.apache.org by Grant Peel <gp...@thenetnow.com> on 2010/08/26 16:11:33 UTC
Expiring Bayes
Hi all,
I have several hundred domains on a box. Each domain's mail is controlled
by a specific UNIX user.
Every user's directory contains a user_prefs file.
While I have use_bayes 0 in the main config, some users have opted to turn
on bayes in their user_prefs.
This morning I noticed that one particular user's ~/.spamassassin/bayes* files had
grown to 1.5 GB.
I have put:
use_bayes 0
bayes_auto_learn 0
bayes_auto_expire 1
bayes_expiry_max_db_size 50000
in the local.cf file, and restarted spamd.
The database did not appear to trim, so I tried:
sa-learn -u "user" -D --force-expire
and the database is still 1.5 GB.
I know I am doing something incorrectly, but can't figure out what.
How do I properly trim the offending file(s)?
Is there a command to trim all databases (users) on the box?
Any advice would be appreciated.
Spamassassin 3.2.5
FreeBSD 8.0
-Grant
Re: Expiring Bayes
Posted by RW <rw...@googlemail.com>.
On Thu, 26 Aug 2010 16:25:22 +0200
Yet Another Ninja <sa...@alexb.ch> wrote:
> I bet the biggest is bayes_seen.
> You can safely delete the bayes_seen file (unless you plan to
> "unlearn" msgs). It will start growing again, fast.
If the account has never been expired, bayes_seen is likely to be
negligible compared to the token file. It's only when the total expired
tokens hugely exceed the current tokens that it becomes a comparable
size.
On Thu, 26 Aug 2010 10:11:33 -0400
"Grant Peel" <gp...@thenetnow.com> wrote:
> I have put:
>
> use_bayes 0
why?
> bayes_auto_learn 0
> bayes_auto_expire 1
> bayes_expiry_max_db_size 50000
50,000 is far too low. The well out-of-date default is 150,000, and
there's a hardcoded minimum of 100,000. By comparison, a 1.5GB database
holds approximately 50,000,000 tokens.
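To compare a user's database against these figures, the token count can be read from sa-learn's magic dump. This is a minimal sketch, assuming the usual `sa-learn --dump magic` layout in which the value is the third column of the `non-token data: ntokens` line. The function name `ntokens_from_dump` is made up for illustration.

```shell
#!/bin/sh
# Read `sa-learn --dump magic` output on stdin and print the token count.
# Assumption: the count is the third whitespace-separated column of the
# line ending in "non-token data: ntokens".
ntokens_from_dump() {
    awk '/non-token data: ntokens/ { print $3 }'
}

# Usage, run as the mailbox owner:
#   sa-learn --dump magic | ntokens_from_dump
```

If the number printed is far above bayes_expiry_max_db_size, expiry has probably never completed for that account.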
Re: Expiring Bayes
Posted by Yet Another Ninja <sa...@alexb.ch>.
On 2010-08-26 16:11, Grant Peel wrote:
> Hi all,
>
> I have several hundred domains on a box. Each domain's mail is
> controlled by a specific UNIX user.
>
> Inside every user's directory, they have a user_prefs file.
>
> While I have use_bayes 0 in the main config, some users have opted to
> turn on bayes in their user_prefs.
>
> This morning I noticed that one particular user's ~/.spamassassin/bayes* files
> had grown to 1.5 GB.
>
> I have put:
>
> use_bayes 0
> bayes_auto_learn 0
> bayes_auto_expire 1
> bayes_expiry_max_db_size 50000
>
> in the local.cf file, and restarted spamd.
>
> The database did not appear to trim, so I tried:
>
> sa-learn -u "user" -D --force-expire
>
> and the database is still 1.5 GB.
>
> I know I am doing something(s) incorrect, but can't figure out what.
>
> How do I properly trim the offending file(s)?
>
> Is there a command to trim all databases (users) on the box?
>
> Any advice would be appreciated.
I bet the biggest is bayes_seen.
You can safely delete the bayes_seen file (unless you plan to "unlearn"
msgs). It will start growing again, fast.
The bayes_toks file is the one which gets trimmed by expiration.
bayes_seen is what I call a parasite :-)
On a busy box, to avoid freezes I'd recommend setting
bayes_auto_expire 0
and doing a cron'd force-expire during low-traffic hours, either daily or
weekly, depending on the bayes_toks size.
h2h
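Before deciding what to delete, it may help to check which file is actually the large one. This is a rough sketch, assuming home directories sit under one base directory (for example /home) and that users keep the default ~/.spamassassin location. `bayes_sizes` is a hypothetical helper name, not part of SpamAssassin.

```shell
#!/bin/sh
# Print "<bytes> <path>" for every Bayes file under a base directory,
# largest first.
# Assumption: Bayes files live in <base>/<user>/.spamassassin/bayes_*.
bayes_sizes() {
    find "$1" -type f -path '*/.spamassassin/bayes_*' |
    while IFS= read -r f; do
        # wc -c pads its output on some systems, so strip the spaces.
        printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -rn
}

# Usage (as root, so every home directory is readable):
#   bayes_sizes /home | head
```

The top few lines show whether bayes_seen or bayes_toks accounts for the bulk of the space.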
Re: Expiring Bayes; aka bayes files stay BIG
Posted by RW <rw...@googlemail.com>.
On Wed, 15 Sep 2010 19:42:44 -0400
Dennis German <dg...@Real-World-Systems.com> wrote:
> On Sep 15, 2010, at 1:42 PM, RW wrote:
>
> > On Wed, 15 Sep 2010 11:18:20 -0400
> > Dennis German <dg...@Real-World-Systems.com> wrote:
> >> I believe that bayes_seen is a perl hash and will not be reduced
> >> in size by deleting entries. The only way to reduce its size is
> >> to have a program read the current file, entry by entry and output
> >> to a new file. This will not copy deleted entries and the output
> >> will be significantly smaller. ...
> >> Dennis German
> >>
> > It's straightforward to do it with backup and restore, but the
> > problem is that there is no time field. You might just as well
> > delete the file periodically.
>
> Thanks for the info; however, after running backup & restore:
> Before:
> 41,619,456 Sep 15 19:04 bayes_seen
> 2,543,616 Sep 15 19:04 bayes_toks
> After:
> 43,511,808 Sep 15 19:26 bayes_seen
> 2,560,000 Sep 15 19:26 bayes_toks
Auto-expiry doesn't remove signatures at all, so there's no point in
compacting. I misunderstood what you were saying and thought you were talking
about removing entries. My point was that you can do that with
sa-learn, but there is no basis for selecting which entries to delete.
If you want to do it, it presumably could be done by switching to SQL
and adding a date field.
The trouble is that signatures shouldn't really be removed on a
multi-user system unless there is a centralized method for learning;
otherwise there may be people who are relying on them to be kept
indefinitely, e.g. they may occasionally run sa-learn --ham on a
read-mail folder.
Re: Expiring Bayes; aka bayes files stay BIG
Posted by Dennis German <dg...@Real-World-Systems.com>.
On Sep 15, 2010, at 1:42 PM, RW wrote:
> On Wed, 15 Sep 2010 11:18:20 -0400
> Dennis German <dg...@Real-World-Systems.com> wrote:
>
>> On Aug 26, 2010, at 10:11 AM, Grant Peel wrote:
>> ...
>> ~/.spamassassin/bayes* files had grown to 1.5 GB
>>> I have put:
>>> use_bayes 0
>>> bayes_auto_learn 0
>>> bayes_auto_expire 1
>>> bayes_expiry_max_db_size 50000
>>> in the local.cf file, and restarted spamd.
>>>
>>> The database did not appear to trim, so I tried: sa-learn -u
>>> "user" -D --force-expire and the database is still 1.5 GB.
>>> I know I am doing something(s) incorrect, but can't figure out what.
>>> How do I properly trim the offending file(s)?
>>> Is there a command to trim all databases (users) on the box?
>>> Any advice would be appreciated. Spamassassin 3.2.5, FreeBSD 8.0
>>> -Grant
>>>
>> I believe that bayes_seen is a perl hash and will not be reduced in
>> size by deleting entries. The only way to reduce its size is to have
>> a program read the current file, entry by entry and output to a new
>> file. This will not copy deleted entries and the output will be
>> significantly smaller. ...
>> Dennis German
>>
> It's straightforward to do it with backup and restore, but the problem
> is that there is no time field. You might just as well delete
> the file periodically.
Thanks for the info; however, after running backup & restore:
Before:
41,619,456 Sep 15 19:04 bayes_seen
2,543,616 Sep 15 19:04 bayes_toks
After:
43,511,808 Sep 15 19:26 bayes_seen
2,560,000 Sep 15 19:26 bayes_toks
Re: Expiring Bayes; aka bayes files are BIG
Posted by RW <rw...@googlemail.com>.
On Wed, 15 Sep 2010 11:18:20 -0400
Dennis German <dg...@Real-World-Systems.com> wrote:
> On Aug 26, 2010, at 10:11 AM, Grant Peel wrote:
> ...
> ~/.spamassassin/bayes* files had grown to 1.5 GB
> > I have put:
> > use_bayes 0
> > bayes_auto_learn 0
> > bayes_auto_expire 1
> > bayes_expiry_max_db_size 50000
> > in the local.cf file, and restarted spamd.
> >
> > The database did not appear to trim, so I tried: sa-learn -u
> > "user" -D --force-expire and the database is still 1.5 GB.
> > I know I am doing something(s) incorrect, but can't figure out what.
> > How do I properly trim the offending file(s)?
> > Is there a command to trim all databases (users) on the box?
> > Any advice would be appreciated. Spamassassin 3.2.5, FreeBSD 8.0
> > -Grant
> >
> I believe that bayes_seen is a perl hash and will not be reduced in
> size by deleting entries. The only way to reduce its size is to have
> a program read the current file, entry by entry and output to a new
> file. This will not copy deleted entries and the output will be
> significantly smaller. I don't know of any program, but if there is
> interest I might write one. Dennis German
>
It's straightforward to do it with backup and restore, but the problem
is that there is no time field. You might just as well delete
the file periodically.
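For reference, the backup-and-restore cycle mentioned above might look roughly like this. It is a sketch rather than a tested procedure: `rebuild_bayes` and the `SA_LEARN` variable are illustrative names, and it relies on sa-learn's --backup and --restore options. As noted elsewhere in the thread, rebuilding only shrinks a file if entries have actually been removed from it.

```shell
#!/bin/sh
# Dump the current user's Bayes data to a text file, then rebuild the
# database from that dump.
# Assumption: SA_LEARN can be overridden, e.g. with a full path to sa-learn.
SA_LEARN=${SA_LEARN:-sa-learn}

rebuild_bayes() {
    dump=$1
    "$SA_LEARN" --backup > "$dump" || return 1
    # Restoring replaces the existing data, so refuse to go on if the
    # dump came out empty.
    [ -s "$dump" ] || return 1
    "$SA_LEARN" --restore="$dump"
}

# Usage, run as the mailbox owner while spamd is quiet:
#   rebuild_bayes /var/tmp/bayes-backup.txt
```

Keeping the dump file around until the rebuilt database has been checked gives a way back if the restore goes wrong.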
Re: Expiring Bayes; aka bayes files are BIG
Posted by Dennis German <dg...@Real-World-Systems.com>.
On Aug 26, 2010, at 10:11 AM, Grant Peel wrote:
...
~/.spamassassin/bayes* files had grown to 1.5 GB
> I have put:
> use_bayes 0
> bayes_auto_learn 0
> bayes_auto_expire 1
> bayes_expiry_max_db_size 50000
> in the local.cf file, and restarted spamd.
>
> The database did not appear to trim, so I tried: sa-learn -u "user" -D --force-expire
> and the database is still 1.5 GB.
> I know I am doing something(s) incorrect, but can't figure out what.
> How do I properly trim the offending file(s)?
> Is there a command to trim all databases (users) on the box?
> Any advice would be appreciated. Spamassassin 3.2.5, FreeBSD 8.0
> -Grant
>
I believe that bayes_seen is a Perl hash and will not be reduced in size by deleting entries.
The only way to reduce its size is to have a program read the current file, entry by entry, and
output to a new file. This will not copy deleted entries and the output will be significantly smaller.
I don't know of any such program, but if there is interest I might write one.
Dennis German
Re: Expiring Bayes
Posted by Bowie Bailey <Bo...@BUC.com>.
On 8/26/2010 10:11 AM, Grant Peel wrote:
> Hi all,
>
> I have several hundred domains on a box. Each domain's mail is
> controlled by a specific UNIX user.
>
> Inside every user's directory, they have a user_prefs file.
>
> While I have use_bayes 0 in the main config, some users have opted to
> turn on bayes in their user_prefs.
>
> This morning I noticed that one particular user's ~/.spamassassin/bayes*
> files had grown to 1.5 GB.
>
> I have put:
>
> use_bayes 0
> bayes_auto_learn 0
> bayes_auto_expire 1
> bayes_expiry_max_db_size 50000
>
> in the local.cf file, and restarted spamd.
>
> The database did not appear to trim, so I tried:
>
> sa-learn -u "user" -D --force-expire
>
> and the database is still 1.5 GB.
>
> I know I am doing something(s) incorrect, but can't figure out what.
>
> How do I properly trim the offending file(s)?
>
> Is there a command to trim all databases (users) on the box?
>
> Any advice would be appreciated.
>
> Spamassassin 3.2.5
> FreeBSD 8.0
I believe the 'sa-learn -u' command only works when you are using an SQL
backend.
Try this:
su - user -c "sa-learn --force-expire"
--
Bowie
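To cover every user on the box, the same command could be generated once per account. This is a minimal sketch, assuming home directories live under a single base directory and are named after the UNIX user, and that Bayes files are in the default ~/.spamassassin path. None of that is confirmed in the thread, and `list_expire_commands` is a hypothetical name.

```shell
#!/bin/sh
# Print one force-expire command for each user who has a bayes_toks file.
# Assumptions: homes are <base>/<user>, and <user> is also the login name.
list_expire_commands() {
    home_base=$1
    for toks in "$home_base"/*/.spamassassin/bayes_toks; do
        [ -f "$toks" ] || continue      # glob matched nothing, or not a file
        user_home=${toks%/.spamassassin/bayes_toks}
        user=${user_home##*/}
        printf 'su - %s -c "sa-learn --force-expire"\n' "$user"
    done
}

# Review the output first; then, from a root cron job in quiet hours:
#   list_expire_commands /home | sh
```

Printing the commands instead of running them directly makes it easy to inspect the list, or to try a single account, before scheduling the whole run.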