
Posted to users@spamassassin.apache.org by LuKreme <kr...@kreme.com> on 2009/03/09 19:57:00 UTC

threshold_spam site wide?

I've seen a couple of mentions of adding

bayes_auto_learn_threshold_spam #.#

but these webpages have specified this setting should be in the  
user_prefs file.  Is that right, or can it go in the local.cf?

I've also seen conflicting information as to the value that is  
default.  On my system, it appears to be 15, but I have seen claims  
the default is 6.0, 8.0, 9.0, and 12.0.

-
Nihil est--in vita priore ego imperator Romanus fui.

Re: threshold_spam site wide?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Mon, 2009-03-09 at 15:41 -0600, LuKreme wrote:
> So why was this 11.9 autolearned?

See my previous post about a non-Bayes score-set being used. You beat me
by a minute with this follow-up question. ;)

> X-Spam-Status: Yes, score=11.9 required=5.0  
>   tests=AWL,BAYES_99, ... autolearn=spam version=3.2.5

AWL (your snippet doesn't include the score for that) and BAYES_99 are
being ignored for the auto-learn threshold.

Since you are running with network tests enabled, score-set 1 (without
Bayes) rather than 3 is used for auto-learning. The rules that do make a
significant difference:

50_scores.cf:score TVD_APPROVED 2.999 2.558 1.550 1.731 # n=2
50_scores.cf:score URIBL_JP_SURBL 0 2.857 0 1.501 # n=0 n=2
50_scores.cf:score URIBL_OB_SURBL 0 2.132 0 1.500 # n=0 n=2

That's about +2.8 for these, using a non-Bayes score-set. Minus
BAYES_99, minus AWL. Total score using score-set 1 is 12.008 -- above the
threshold.
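
As a rough illustration of where the "+2.8" comes from, here is a small
Python sketch. It only uses the three score lines quoted above. The
column meanings (0 = no Bayes/no net, 1 = no Bayes/net, 2 = Bayes/no net,
3 = Bayes/net) and the "drop the Bayes bit" mapping are my reading of the
SA docs, so treat them as assumptions; the function names are made up for
this example and are not SpamAssassin APIs.

```python
# Score lines as quoted from 50_scores.cf above; one column per score-set.
# Assumed column meaning: 0 = no Bayes/no net, 1 = no Bayes/net,
#                         2 = Bayes/no net,    3 = Bayes/net.
SCORES = {
    "TVD_APPROVED":   (2.999, 2.558, 1.550, 1.731),
    "URIBL_JP_SURBL": (0.0, 2.857, 0.0, 1.501),
    "URIBL_OB_SURBL": (0.0, 2.132, 0.0, 1.500),
}


def autolearn_set(live_set):
    """Hypothetical helper: drop the Bayes bit, so 3 -> 1 and 2 -> 0."""
    return live_set & 1


def autolearn_delta(rules, live_set):
    """Sum of (auto-learn score - live score) over the given rule names."""
    learn_set = autolearn_set(live_set)
    return sum(SCORES[r][learn_set] - SCORES[r][live_set] for r in rules)


# Bayes and network tests both enabled -> live score-set 3.
print(f"{autolearn_delta(SCORES, 3):+.3f}")  # prints +2.815
```

The AWL and BAYES_99 scores are not in the quoted header, so the sketch
stops at the delta and does not try to reproduce the 12.008 total.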

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Re: threshold_spam site wide?

Posted by LuKreme <kr...@kreme.com>.

On 9-Mar-2009, at 13:01, Michael Scheidell wrote:
> Default is 12.0.  Above this, email is learned as 'spam'. Both for
> Bayes and for AWL. The 6.0, 8.0, 9.0 all seem to almost be
> X-Spam-Flag values. Don't confuse them.

So why was this 11.9 autolearned?

X-Spam-Status: Yes, score=11.9 required=5.0
	tests=AWL,BAYES_99,DCC_CHECK,DIGEST_MULTIPLE,
	RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,
	SPF_PASS,TVD_APPROVED,URIBL_BLACK,URIBL_JP_SURBL,URIBL_OB_SURBL
	autolearn=spam version=3.2.5

-- 
Criticizing evolutionary theory because Darwin was limited is like
	claiming computers don't work because Chuck Babbage didn't
	foresee Duke Nukem 3.

Re: threshold_spam site wide?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

> > I've also seen conflicting information as to the value that is
> > default.  On my system, it appears to be 15, but I have seen claims
> > the default is 6.0, 8.0, 9.0, and 12.0.

Yup, as Michael said, default with SA 3.2.x is a spam threshold of 12.
However, this uses a non-Bayes score-set for score evaluation. Thus, some
rules contribute a different score for auto-learning (used for
comparison against the auto-learn thresholds only) than they do for the
final spam score, with Bayes enabled. Moreover, some rules are not even
taken into account, like Bayes and AWL for example.

See the docs.
  http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html


> Default is 12.0.  Above this, email is learned as 'spam'. Both for Bayes and
> for AWL. The 6.0, 8.0, 9.0 all seem to almost be X-Spam-Flag values.
> Don't confuse them.

Nope, the auto-learn threshold is NOT related to AWL in any way. AWL is
a pure score averager, which works on all mail -- regardless of the
auto-learn thresholds, the SA score, or even Bayes being enabled at all.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Re: threshold_spam site wide?

Posted by Michael Scheidell <sc...@secnap.net>.

> I've seen a couple of mentions of adding
> 
> bayes_auto_learn_threshold_spam #.#
> 
> but these webpages have specified this setting should be in the
> user_prefs file.  Is that right, or can it go in the local.cf?
> 
If you are using per-user Bayes, then it looks in local.cf first, then in
user_prefs for an override.

> I've also seen conflicting information as to the value that is
> default.  On my system, it appears to be 15, but I have seen claims
> the default is 6.0, 8.0, 9.0, and 12.0.

Default is 12.0.  Above this, email is learned as 'spam'. Both for Bayes and
for AWL. The 6.0, 8.0, 9.0 all seem to almost be X-Spam-Flag values.
Don't confuse them.

What to set it to?

Make sure you TRY to keep a 10/1 spam-to-ham ratio once you have
finished learning.

On a per-user basis, run this:

sa-learn --dump magic

Look at spam/ham count.

Spam should be about 10x ham (assuming a 90% spam ratio).

YMMV.
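
If you would rather not eyeball the counts, something like the Python
sketch below can pull them out of the dump. It assumes the usual
`sa-learn --dump magic` layout, where the nspam/nham lines end in
"non-token data: nspam" / "non-token data: nham" and the count is the
third column. The sample text and its numbers are invented for
illustration, and parse_counts is just a name I picked.

```python
def parse_counts(dump_text):
    """Return (nspam, nham) from `sa-learn --dump magic` style text.

    Assumes each line ends with the field name and carries the count
    in the third whitespace-separated column.
    """
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts["nspam"], counts["nham"]


# Invented sample output, for illustration only.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       9000          0  non-token data: nspam
0.000          0       1000          0  non-token data: nham
"""

nspam, nham = parse_counts(SAMPLE)
print(f"spam:ham = {nspam / nham:.1f}:1")  # prints spam:ham = 9.0:1
```

To use it on a real database, feed it the text that
`sa-learn --dump magic` prints for that user instead of SAMPLE.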


-- 
Michael Scheidell, CTO
>|SECNAP Network Security
Finalist 2009 Network Products Guide Hot Companies
FreeBSD SpamAssassin Ports maintainer

