Posted to dev@spamassassin.apache.org by Dave Ohlsson <da...@hotmail.com> on 2009/06/29 19:14:05 UTC

basic questions about scores

Hi,

I am new to Spamassassin and don't know Perl, so please bear with me...

Is it documented anywhere how Spamassassin scores work? I mean, documented such that even the layman that I am could understand...

More specifically:

1 - I can see from Mail-SpamAssassin-3.2.5/rules/50_scores.cf that some rules have one single score:
      score URIBL_GREY 0.25
      score URIBL_RED 0.001
    while some other rules have four:
      score WHOIS_WHOISGUARD 0 3.399 0 2.025 # n=0 n=2
      score WHOIS_WHOISPROT 0 0.000 0 1.000 # n=1 n=2
    How do these rules differ?

2 - I understood that, at a high level, Spamassassin works like this - please correct me if I am wrong:
    a - Spamassassin determines which rules a message matches
    b - Spamassassin somehow combines the scores of the rules matched in phase a and assigns a single, global score to the message
    c - based on the global score found in phase b and the user's settings, Spamassassin determines whether the message is ham, spam, or something in between
    Regarding phase b: Are the scores just added up, or is there some higher math involved?

-- dave

PS: Yes, I did have a look at http://wiki.apache.org/spamassassin/HowScoresAreAssigned and at http://wiki.apache.org/spamassassin/WhyUseRules...

Re: basic questions about scores

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Mon, 2009-06-29 at 13:14 -0400, Dave Ohlsson wrote:
> I am new to Spamassassin and don't know Perl, so please bear with
> me...
> 
> Is it documented anywhere how Spamassassin scores work? I mean,
> documented such that even the layman that I am could understand...

Please ask any basic or usage questions on the *users* mailing list.
This one is for development.

> More specifically:
> 
> 1 - I can see from Mail-SpamAssassin-3.2.5/rules/50_scores.cf that
> some rules have one single score while some other rules have four:
> How do these rules differ?

See the Conf docs, section Scoring Options. More specifically the
'score' configuration option.

http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html#scoring_options
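
In short, per that 'score' option: a rule with a single score always uses it. A rule with four scores uses the one matching how SpamAssassin is run, in this order: Bayes and network tests both off, network tests only, Bayes only, both on. A minimal sketch of that selection follows. It is Python for illustration only, not SpamAssassin's actual Perl code, and the helper name pick_score is made up:

```python
# Illustrative sketch only; pick_score is a hypothetical helper,
# not part of SpamAssassin.
def pick_score(score_line, use_bayes, use_net):
    """Return (rule_name, score) for one 'score' line from a .cf file."""
    # Drop any trailing comment such as "# n=0 n=2".
    body = score_line.split("#", 1)[0]
    parts = body.split()
    if len(parts) < 3 or parts[0] != "score":
        raise ValueError("not a score line: %r" % score_line)
    name = parts[1]
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        # A single score is used regardless of Bayes/network settings.
        return name, values[0]
    if len(values) != 4:
        raise ValueError("expected 1 or 4 scores: %r" % score_line)
    # Score set index: 0 = neither, 1 = network only,
    # 2 = Bayes only, 3 = Bayes and network.
    index = (2 if use_bayes else 0) + (1 if use_net else 0)
    return name, values[index]


print(pick_score("score WHOIS_WHOISGUARD 0 3.399 0 2.025 # n=0 n=2",
                 use_bayes=True, use_net=True))
```

With both Bayes and network tests enabled, the WHOIS_WHOISGUARD line above resolves to its fourth value, while URIBL_GREY would resolve to 0.25 in every configuration.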

> 2 - I understood that, at a high level, Spamassassin works like this -
> please correct me if I am wrong:
>     a - Spamassassin determines which rules a message matches
>     b - Spamassassin somehow combines the scores of the rules matched
> in phase a and assigns a single, global score to the message
>     c - based on the global score found in phase b and the user's
> settings, Spamassassin determines whether the message is ham, spam, or
> something in between

There's nothing in between, it's black and white. ;)  Though it is very
common to have at least two distinct spam categories based on the score,
where the lower scoring spam should be more carefully reviewed.

>     Regarding phase b: Are the scores just added up, or is there some
> higher math involved?

They're simply added.

The "higher math", i.e. the more magical thingies, is mostly done with meta
rules, and to a lesser extent with some special rule options -- the resulting
scores are of course treated just the same and simply added...
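
To make phases b and c concrete, here is a minimal sketch in Python (illustration only, not SpamAssassin's Perl code). It assumes the default required_score of 5.0 and that a total equal to the threshold already counts as spam. The function name classify and the rule HYPOTHETICAL_RULE are made up for the example:

```python
# Illustrative sketch only; classify is a hypothetical helper.
def classify(hit_scores, required_score=5.0):
    """Sum the scores of all rules that hit and compare to the threshold.

    hit_scores maps rule name -> score for every rule the message matched.
    Returns (total, is_spam).
    """
    total = sum(hit_scores.values())
    # Assumption: the message is spam once the total reaches the threshold.
    return total, total >= required_score


hits = {
    "URIBL_GREY": 0.25,
    "WHOIS_WHOISGUARD": 2.025,
    "HYPOTHETICAL_RULE": 3.0,  # made-up rule, for illustration
}
total, is_spam = classify(hits)
print("score=%.3f spam=%s" % (total, is_spam))
```

A meta rule would simply show up as one more entry in such a table once it matched; its score is added like any other.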


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}