Posted to users@spamassassin.apache.org by LuKreme <kr...@kreme.com> on 2009/04/01 00:36:05 UTC

Re: quirks with bayes ?

On 31-Mar-2009, at 14:24, James Wilkinson wrote:
> I wrote (about the AWL):
>> In the absence of any sort of expire mechanism¹ (see, for example,
>> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6059) one can do
>> a crude approximation by periodically resetting it.
>
> LuKreme wrote:
>> But why would you want to ever reset the AWL?
>
> To quote that bug report:
>    at this stage I don’t think it’s worth having on by default, given
>    the problems it causes for disk load and out-of-control bloated db
>    files eating lots of disk space and memory, vs the marginal gains in
>    accuracy it provides.  let’s set it off by default.

Wow.  I couldn't possibly disagree more.  AWL is crucial for keeping the
occasional 'spammish' message from people I correspond with often from
being flagged as spam. For example, I just got this from my cousin:

X-Spam-Status: No, score=3.4 required=5.0 tests=AWL,BAYES_95,EXTRA_MPART_TYPE,
	HTML_FONT_SIZE_HUGE,HTML_IMAGE_RATIO_02,HTML_MESSAGE,SPF_PASS,
	UNPARSEABLE_RELAY autolearn=no version=3.2.5
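
For anyone who wants to check which rules fired in a header like the one
above, here is a minimal Python sketch that unfolds the header and pulls
out the verdict, the score, the threshold, and the list of tests. The
function name parse_spam_status is made up for illustration and is not
part of SpamAssassin; the parsing assumes the "key=value" layout shown
in this message and may need adjusting for other report formats.

```python
import re

def parse_spam_status(header: str) -> dict:
    """Parse an X-Spam-Status header (hypothetical helper, not a SpamAssassin API)."""
    # Unfold continuation lines by collapsing every run of whitespace.
    flat = " ".join(header.split())
    # Drop the header name if the caller passed the full header line.
    flat = re.sub(r"^X-Spam-Status:\s*", "", flat, flags=re.IGNORECASE)

    verdict = flat.split(",", 1)[0].strip()
    score = float(re.search(r"score=(-?\d+(?:\.\d+)?)", flat).group(1))
    required = float(re.search(r"required=(-?\d+(?:\.\d+)?)", flat).group(1))

    # The test list runs until the next " key=" pair or the end of the header.
    match = re.search(r"tests=(.*?)(?=\s+\w+=|$)", flat)
    tests = [t.strip() for t in match.group(1).split(",") if t.strip()] if match else []

    return {"verdict": verdict, "score": score, "required": required, "tests": tests}


if __name__ == "__main__":
    example = (
        "X-Spam-Status: No, score=3.4 required=5.0 "
        "tests=AWL,BAYES_95,EXTRA_MPART_TYPE,\n"
        "\tHTML_FONT_SIZE_HUGE,HTML_IMAGE_RATIO_02,HTML_MESSAGE,SPF_PASS,\n"
        "\tUNPARSEABLE_RELAY autolearn=no version=3.2.5"
    )
    info = parse_spam_status(example)
    print("AWL fired:", "AWL" in info["tests"])
    print("below threshold:", info["score"] < info["required"])
```

Note that the header only shows that AWL fired, not how many points it
added or removed; that needs the full per-rule report.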

> One can temporarily get the db files under control by deleting them and
> letting SpamAssassin recreate them from scratch.

Under control?  They are not significantly large. The largest
auto-whitelist on my system is 10M and most of them are under 500K.

What's odd is that 'db41_dump185 autowhitelist' produces no readable
info, while 'strings autowhitelist' at least shows SOME information.

-- 
Can't stop the signal