Posted to users@spamassassin.apache.org by Harry Putnam <re...@newsguy.com> on 2013/09/15 17:19:12 UTC

SA Scoring... mysterious point loss

SA is letting mail thru as ham that should be spam apparently based on
what is too low a score (for my mail) for URIBL_JP_SURBL which was
1.9 by default.

I pushed it up to 4.

But then I see a report that shows a total score of 4.9 when
 4.0 is shown for URIBL_JP_SURBL
 1.0 is shown for SPF_SOFTFAIL

But the total score is 4.9.
-------        ---------       ---=---       ---------      -------- 
I assumed it had something to do with rounding or something so I
increased the score to 4.1 to get that message to break the spam level
of 5.

Now the same mail shows a total of 5.1

 4.1 is shown for URIBL_JP_SURBL
 1.0 is shown for SPF_SOFTFAIL

So what happened..? in one case a point (.1) is dropped and in the
other it is not.

Re: SA Scoring... mysterious point loss

Posted by RW <rw...@googlemail.com>.

On Sun, 15 Sep 2013 21:15:46 -0400
Harry Putnam wrote:

> RW <rw...@googlemail.com> writes:

> > I had a look into it, and it seems that rounding is handled in an
> > unusual way. It starts by rounding to the nearest 0.1, and then
> > subtracts 0.1 if the result is non-spam to avoid the case of:
> >
> > X-Spam-Status: No, score=5.0 required=5.0
> >
> > IMO simply rounding towards zero using int would be better. I think
> > most people understand rounding, this is a lot more disconcerting.
> >
> > None of this affects the result though, it's just what's displayed
> > in the headers.
> 
> Well thanks for the explanation, but your last statement there seems
> not to really be true.
> 
> I boosted the default 1.9 of URIBL_JP_SURBL score to 4 and something
> else had a score of 1 but still it was ruled ham with score 4.9,
> instead of 5 which would have made it spam.


In the scoreset you are using SPF_SOFTFAIL has a score of 0.972 which
gets rounded to 1.0 in the header.
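
The arithmetic can be checked with a short Python sketch. The 0.972 value
for SPF_SOFTFAIL and the 5.0 threshold come from this thread; the function
and variable names are purely illustrative, the "total >= threshold" test
is an assumption, and none of this is SpamAssassin code.

```python
# Per-rule scores as discussed in the thread.
SPF_SOFTFAIL = 0.972   # true score; shown as 1.0 in the brief summary
REQUIRED = 5.0         # spam threshold used in the thread


def summarize(uribl_score):
    """Return (true total, per-rule values as displayed, is_spam).

    Assumption: a message counts as spam when the unrounded total
    reaches the threshold.
    """
    total = uribl_score + SPF_SOFTFAIL
    shown = [f"{uribl_score:.1f}", f"{SPF_SOFTFAIL:.1f}"]
    return total, shown, total >= REQUIRED


for uribl in (4.0, 4.1):
    total, shown, is_spam = summarize(uribl)
    print(f"shown per rule: {shown}  true total: {total:.3f}  spam: {is_spam}")
```

Adding up the displayed per-rule values suggests 5.0 and 5.1, while the
unrounded totals are 4.972 and 5.072, so only the second crosses 5.0.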

Re: SA Scoring... mysterious point loss

Posted by Harry Putnam <re...@newsguy.com>.

RW <rw...@googlemail.com> writes:

> On Sun, 15 Sep 2013 11:19:12 -0400
> Harry Putnam wrote:
[...]

>> I assumed it had something to do with rounding or something so I
>> increased the score to 4.1 to get that message to break the spam level
>> of 5.
>> 
>> Now the same mail shows a total of 5.1
>> 
>>  4.1 is shown for URIBL_JP_SURBL
>>  1.0 is shown for SPF_SOFTFAIL
>> 
>> So what happened..? in one case a point (.1) is dropped and in the
>> other it is not.
>
> It's odd that 2 people should notice this almost simultaneously when it's
> been around for years (see "Score = 4.9" )
>
> I had a look into it, and it seems that rounding is handled in an
> unusual way. It starts by rounding to the nearest 0.1, and then
> subtracts 0.1 if the result is non-spam to avoid the case of:
>
> X-Spam-Status: No, score=5.0 required=5.0
>
> IMO simply rounding towards zero using int would be better. I think most
> people understand rounding, this is a lot more disconcerting.
>
> None of this affects the result though, it's just what's displayed in
> the headers.

Well thanks for the explanation, but your last statement there seems
not to really be true.

I boosted the default 1.9 of URIBL_JP_SURBL score to 4 and something
else had a score of 1 but still it was ruled ham with score 4.9,
instead of 5 which would have made it spam.

So there must be some tiny percentage of mail that gets wrongly sent
thru as ham due to the problem you outlined.  So, not just a matter of
what is displayed in headers.

Re: SA Scoring... mysterious point loss

Posted by RW <rw...@googlemail.com>.

On Tue, 17 Sep 2013 15:20:41 -0400
David F. Skoll wrote:

> On Tue, 17 Sep 2013 20:08:22 +0100
> RW <rw...@googlemail.com> wrote:
> 
> > It is a bit more complicated than I thought though. Rounding
> > towards zero produces sensible results for the 5.0 threshold, but it
> > becomes more complicated if one needs to handle thresholds close to,
> > or below, zero and which aren't multiples of 0.1.
> 
> Actually, 0.1 cannot be represented exactly in binary.  It is
> 0.0001100110011.... in base 2.

Yes, I know, that doesn't actually affect anything though. It's a
matter of allowing for the displayed threshold to be rounded as well as
the score.

Re: SA Scoring... mysterious point loss

Posted by "David F. Skoll" <df...@roaringpenguin.com>.

On Tue, 17 Sep 2013 20:08:22 +0100
RW <rw...@googlemail.com> wrote:

> It is a bit more complicated than I thought though. Rounding
> towards zero produces sensible results for the 5.0 threshold, but it
> becomes more complicated if one needs to handle thresholds close to, or
> below, zero and which aren't multiples of 0.1.

Actually, 0.1 cannot be represented exactly in binary.  It is
0.0001100110011.... in base 2.

Darn computers! :)
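
The binary expansion quoted above can be reproduced with exact rational
arithmetic. This is only an illustrative sketch; binary_fraction_digits is
a hypothetical helper name, not something from SpamAssassin.

```python
from fractions import Fraction


def binary_fraction_digits(value, n_digits):
    """Return the first n_digits binary digits after the point.

    Expects 0 <= value < 1. Uses repeated doubling: the integer part
    after each doubling is the next binary digit.
    """
    digits = []
    for _ in range(n_digits):
        value *= 2
        bit = int(value)      # 1 if the doubled value reached 1, else 0
        digits.append(str(bit))
        value -= bit          # keep only the fractional part
    return "".join(digits)


# Exact 1/10, so no floating-point error creeps into the expansion.
print("0." + binary_fraction_digits(Fraction(1, 10), 16))
```

The group "0011" repeats without end, which is why no finite binary
fraction equals 0.1 exactly.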

Regards,

David.

Re: SA Scoring... mysterious point loss

Posted by RW <rw...@googlemail.com>.

On Tue, 17 Sep 2013 10:12:03 +0200
Karsten Bräckelmann wrote:

> I assume he knows about all that. Yet, being confronted with the
> initial mystery of 4.9 vs 5.0 and a sneaky spam refusing to cross
> that all-magic threshold, he seems to have forgotten about rounding.

If you reread the original post you'll see that he did initially
attribute 4.0+1.0 giving 4.9 to rounding.  It became confusing when
4.0 was increased to 4.1 and the score jumped by 0.2.

It is a bit more complicated than I thought though. Rounding
towards zero produces sensible results for the 5.0 threshold, but it
becomes more complicated if one needs to handle thresholds close to, or
below, zero and which aren't multiples of 0.1.

Re: SA Scoring... mysterious point loss

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Mon, 2013-09-16 at 00:59 +0100, RW wrote:
> On Sun, 15 Sep 2013 11:19:12 -0400 Harry Putnam wrote:

The real reason for what you're observing here is (as RW pointed out in
a follow-up post), that SPF_SOFTFAIL has a score of 0.972 -- that, and
you looking at the rounded scores in the brief summary, rather than the
actual rules' scores.

> > SA is letting mail thru as ham that should be spam apparently based on
> > what is too low a score (for my mail) for URIBL_JP_SURBL which was
> > 1.9 by default.
> > 
> > I pushed it up to 4.
> > 
> > But then I see a report that shows a total score of 4.9 when
> >  4.0 is shown for URIBL_JP_SURBL
> >  1.0 is shown for SPF_SOFTFAIL
> > 
> > But the total score is 4.9.

4.972, to be precise. Less than 5.0, thus not spam. This is an edge
case, where correct rounding would result in the headers essentially
claiming "5.0 < 5.0". Since this is the threshold that matters most to
people, correct rounding would lead to much more confusion and an FAQ.

> > I assumed it had something to do with rounding or something so I
> > increased the score to 4.1 to get that message to break the spam level
> > of 5.
> > 
> > Now the same mail shows a total of 5.1

5.072, rounding to 5.1 with a precision of 1 decimal place.

> > So what happened..? in one case a point (.1) is dropped and in the
> > other it is not.
> 
> It's odd that 2 people should notice this almost simultaneously when it's
> been around for years (see "Score = 4.9" )
> 
> I had a look into it, and it seems that rounding is handled in an
> unusual way. It starts by rounding to the nearest 0.1, and then
> subtracts 0.1 if the result is non-spam to avoid the case of:

I prefer the term special case, rather than "handling rounding in an
unusual way", because in this one special case, SA simply does not
round.

> X-Spam-Status: No, score=5.0 required=5.0
> 
> IMO simply rounding towards zero using int would be better. I think most
> people understand rounding, this is a lot more disconcerting.

Most people do indeed understand rounding. They tend to forget about
this when dealing with the all-magic decision they care about -- spam or
not spam.

Example? This very thread. Given that Harry customizes scores, he seems
to have some experience with SA config, rules and scores. He checks the
Result header of his low scorers, so "oddities" like -0.0 are most
likely not new to him.

I assume he knows about all that. Yet, being confronted with the initial
mystery of 4.9 vs 5.0 and a sneaky spam refusing to cross that all-magic
threshold, he seems to have forgotten about rounding.

> None of this affects the result though, it's just what's displayed in
> the headers.

Very true.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Re: SA Scoring... mysterious point loss

Posted by RW <rw...@googlemail.com>.

On Sun, 15 Sep 2013 11:19:12 -0400
Harry Putnam wrote:

> SA is letting mail thru as ham that should be spam apparently based on
> what is too low a score (for my mail) for URIBL_JP_SURBL which was
> 1.9 by default.
> 
> I pushed it up to 4.
> 
> But then I see a report that shows a total score of 4.9 when
>  4.0 is shown for URIBL_JP_SURBL
>  1.0 is shown for SPF_SOFTFAIL
> 
> But the total score is 4.9.
> -------        ---------       ---=---       ---------      -------- 
> I assumed it had something to do with rounding or something so I
> increased the score to 4.1 to get that message to break the spam level
> of 5.
> 
> Now the same mail shows a total of 5.1
> 
>  4.1 is shown for URIBL_JP_SURBL
>  1.0 is shown for SPF_SOFTFAIL
> 
> So what happened..? in one case a point (.1) is dropped and in the
> other it is not.

It's odd that 2 people should notice this almost simultaneously when it's
been around for years (see "Score = 4.9" )

I had a look into it, and it seems that rounding is handled in an
unusual way. It starts by rounding to the nearest 0.1, and then
subtracts 0.1 if the result is non-spam to avoid the case of:

X-Spam-Status: No, score=5.0 required=5.0

IMO simply rounding towards zero using int would be better. I think most
people understand rounding, this is a lot more disconcerting.

None of this affects the result though, it's just what's displayed in
the headers.
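
A minimal Python sketch of the display behaviour described above, under
stated assumptions: the verdict is taken from the unrounded score, and 0.1
is subtracted from the displayed value only when a non-spam score would
otherwise round up to the threshold. The name displayed_score is
hypothetical and this is not SpamAssassin's actual code.

```python
def displayed_score(score, required=5.0):
    """Return (score string for the header, is_spam).

    Assumptions: is_spam is decided on the unrounded score, and the
    0.1 adjustment applies only to non-spam scores whose rounded value
    would reach the threshold.
    """
    is_spam = score >= required
    shown = round(score, 1)            # round to the nearest 0.1
    if not is_spam and shown >= required:
        shown -= 0.1                   # avoid "No, score=5.0 required=5.0"
    return f"{shown:.1f}", is_spam


for total in (4.972, 5.072, 3.14):
    shown, is_spam = displayed_score(total)
    verdict = "Yes" if is_spam else "No"
    print(f"X-Spam-Status: {verdict}, score={shown} required=5.0")
```

The adjustment changes only the string shown in the header; the verdict is
already fixed before any rounding takes place.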