Posted to dev@spamassassin.apache.org by Olivier Mueller <om...@omx.ch> on 2008/03/10 14:22:04 UTC

Re: [Bug 5652] bayes_seen - auto expire / Re: bayes_seen = 256GB

Hello,

On Wed, 2007-11-14 at 00:43 -0800,
bugzilla-daemon@bugzilla.spamassassin.org wrote:
> http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5652
[...]
> So in the interim could I suggest an FAQ that acknowledges the problem and gives
> some sort of workaround/fudge, even if that is as simple (even if suboptimal) as
> removing "bayes_seen".  (Can this be safely done without needing to restart SA,
> MailScanner, etc?)  Or perhaps an automatically installed cron job that removes
> it if it exceeds a certain size.
> 
> (Obviously a proper solution in the next release would be ideal.  But if that is
> not possible, then some sort of FAQ/known-issue and/or fudge/workaround.)
[...]


So what would be the proper procedure at the moment?  I also have
several servers where the bayes_seen table keeps growing without bound
(a few GB), and would like to add a cron-based cleanup job like I did
for awl / tokens.

The systems are set up to work automatically (no manual sa-learn
runs, so no risk of learning the same messages twice).

So if I understand correctly (spent an hour browsing archives & faqs), I
could simply truncate the bayes_seen table every week or so, or add a
timestamp field and remove entries older than 1 week|month|...  and the
system would still work 100% fine? 

Thanks for a short confirmation & thanks for your great work,
regards from Switzerland,
Olivier
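
For illustration, the timestamp idea above could be sketched roughly as follows. This is only a sketch under stated assumptions: it uses an in-memory SQLite database so it runs anywhere, the id/msgid/flag columns are assumed to mirror the stock SQL layout, and the seen_at column and the purge_older_than helper are hypothetical (the stock table has no timestamp, so one would have to be added by hand).

```python
import sqlite3
import time

# Assumed layout: id/msgid/flag are believed to mirror the stock SQL schema;
# seen_at is a hypothetical column added for this sketch.
SCHEMA = """
CREATE TABLE bayes_seen (
    id      INTEGER NOT NULL,
    msgid   TEXT    NOT NULL,
    flag    TEXT    NOT NULL,
    seen_at INTEGER NOT NULL,
    PRIMARY KEY (id, msgid)
)
"""


def purge_older_than(conn, max_age_seconds, now=None):
    """Delete rows older than max_age_seconds; return the number removed."""
    now = int(time.time()) if now is None else now
    cutoff = now - max_age_seconds
    cur = conn.execute("DELETE FROM bayes_seen WHERE seen_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    week = 7 * 24 * 3600
    now = int(time.time())
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO bayes_seen VALUES (?, ?, ?, ?)",
        [
            (1, "old@example", "s", now - 2 * week),  # older than a week
            (1, "new@example", "h", now - 3600),      # one hour old
        ],
    )
    print("rows purged:", purge_older_than(conn, week, now=now))
```

A weekly cron job could call something like purge_older_than against the real database; the connection setup and SQL dialect would need adapting for MySQL or PostgreSQL.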



Re: [Bug 5652] bayes_seen - auto expire / Re: bayes_seen = 256GB

Posted by Yet Another Ninja <sa...@alexb.ch>.
On 3/10/2008 3:41 PM, Olivier Mueller wrote:
> Thanks for your feedback dear Ninja :)
> 
> On Mon, 2008-03-10 at 14:55 +0100, Yet Another Ninja wrote:
>>> So if I understand correctly (spent an hour browsing archives & faqs), I
>>> could simply truncate the bayes_seen table every week or so, or add a
>>> timestamp field and remove entries older than 1 week|month|...  and the
>>> system would still work 100% fine? 
>> Unless you plan to "forget" etc. on msgs, you can safely wipe/purge/delete 
>> the seen file completely. Bayes works fine without it and will recreate it 
>> if missing.
>>
>> No need to restart spamd/mailscanner/amavis/whatever after rm -f.
> 
> well, in my case it would be using the SQL version, so I guess that
> if I remove the table (w/o restarting spamd) it will break something,
> but truncating (empty) shouldn't be a problem. Are there other admins
> around doing that? :)

You wouldn't delete the table.
I think you'd TRUNCATE BAYES_SEEN (or whatever the table is named; 
MySQL is not in my skill list) to purge the data in it.
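
To make the difference between dropping and emptying concrete, here is a minimal sketch. It assumes an in-memory SQLite stand-in for the real database; the helper names empty_seen_table and table_exists are made up for this example. SQLite has no TRUNCATE statement, so a plain DELETE is used; on MySQL the corresponding statement would be TRUNCATE TABLE bayes_seen.

```python
import sqlite3


def empty_seen_table(conn):
    """Remove every row from bayes_seen but leave the table itself in place."""
    # SQLite has no TRUNCATE; DELETE without a WHERE clause empties the table.
    conn.execute("DELETE FROM bayes_seen")
    conn.commit()


def table_exists(conn, name):
    """Return True if a table with this name exists (SQLite-specific check)."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Assumed column layout, for illustration only.
    conn.execute(
        "CREATE TABLE bayes_seen (id INTEGER, msgid TEXT, flag TEXT, "
        "PRIMARY KEY (id, msgid))"
    )
    conn.execute("INSERT INTO bayes_seen VALUES (1, 'a@example', 's')")
    empty_seen_table(conn)
    print("table still present:", table_exists(conn, "bayes_seen"))
```

Because the table definition survives, a running daemon can keep inserting rows afterwards, which is the reason to empty rather than drop.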



Re: [Bug 5652] bayes_seen - auto expire / Re: bayes_seen = 256GB

Posted by Olivier Mueller <om...@omx.ch>.
Thanks for your feedback dear Ninja :)

On Mon, 2008-03-10 at 14:55 +0100, Yet Another Ninja wrote:
> > So if I understand correctly (spent an hour browsing archives & faqs), I
> > could simply truncate the bayes_seen table every week or so, or add a
> > timestamp field and remove entries older than 1 week|month|...  and the
> > system would still work 100% fine? 
> 
> Unless you plan to "forget" etc. on msgs, you can safely wipe/purge/delete 
> the seen file completely. Bayes works fine without it and will recreate it 
> if missing.
> 
> No need to restart spamd/mailscanner/amavis/whatever after rm -f.

well, in my case it would be using the SQL version, so I guess that
if I remove the table (w/o restarting spamd) it will break something,
but truncating (empty) shouldn't be a problem. Are there other admins
around doing that? :)

regards,
Olivier




Re: [Bug 5652] bayes_seen - auto expire / Re: bayes_seen = 256GB

Posted by Yet Another Ninja <sa...@alexb.ch>.
On 3/10/2008 2:22 PM, Olivier Mueller wrote:
> Hello,
> 
> On Wed, 2007-11-14 at 00:43 -0800,
> bugzilla-daemon@bugzilla.spamassassin.org wrote:
>> http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5652
> [...]
>> So in the interim could I suggest an FAQ that acknowledges the problem and gives
>> some sort of workaround/fudge, even if that is as simple (even if suboptimal) as
>> removing "bayes_seen".  (Can this be safely done without needing to restart SA,
>> MailScanner, etc?)  Or perhaps an automatically installed cron job that removes
>> it if it exceeds a certain size.
>>
>> (Obviously a proper solution in the next release would be ideal.  But if that is
>> not possible, then some sort of FAQ/known-issue and/or fudge/workaround.)
> [...]
> 
> 
> So what would be the proper procedure at the moment?  I also have
> several servers where the bayes_seen table keeps growing without bound
> (a few GB), and would like to add a cron-based cleanup job like I did
> for awl / tokens.
> 
> The systems are set up to work automatically (no manual sa-learn
> runs, so no risk of learning the same messages twice).
> 
> So if I understand correctly (spent an hour browsing archives & faqs), I
> could simply truncate the bayes_seen table every week or so, or add a
> timestamp field and remove entries older than 1 week|month|...  and the
> system would still work 100% fine? 

Unless you plan to "forget" etc. on msgs, you can safely wipe/purge/delete 
the seen file completely. Bayes works fine without it and will recreate it 
if missing.

No need to restart spamd/mailscanner/amavis/whatever after rm -f.

An SA switch to avoid its creation in the first place would be much 
appreciated... something for Bugzilla.
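
For the file-based setup, the "remove it if it exceeds a certain size" cron job suggested in the bug report might look something like this. It is a sketch only: the function name remove_if_oversized and the size limit are made up for this example, and the path to the seen file is left to the caller because it depends on the local bayes_path setting.

```python
import os
import tempfile


def remove_if_oversized(path, max_bytes):
    """Delete the file at path if it is larger than max_bytes.

    Returns True if the file was removed, False if it was missing or small
    enough to keep.
    """
    try:
        if os.path.getsize(path) <= max_bytes:
            return False
        os.remove(path)
    except FileNotFoundError:
        # Already gone (or removed by someone else in the meantime).
        return False
    return True


if __name__ == "__main__":
    # Demonstration on a throwaway file, not on a real bayes_seen file.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "bayes_seen")
        with open(demo, "wb") as fh:
            fh.write(b"x" * 2048)
        print("removed:", remove_if_oversized(demo, max_bytes=1024))
```

Since the seen file is reportedly recreated when missing and no restart is needed, a script like this could run from cron without touching the scanning daemons.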