Posted to users@spamassassin.apache.org by Craig McLean <cr...@craig.dnsalias.com> on 2005/05/15 17:49:33 UTC

Strange SA report maths.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,
Using SA 3.0.3 on FreeBSD, I noticed the following interesting maths in
the report from a message received a moment ago:

- -quote-

Content analysis details:   (4.1 points, 4.0 required)

  pts rule name              description
 ---- ---------------------- --------------------------------------------------
  0.0 NO_REAL_NAME           From: does not include a real name
  0.2 INVALID_DATE           Invalid Date: header (not RFC 2822)
  0.1 HTML_COMMENT_SAVED_URL BODY: HTML message is a saved web page
  3.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                             [score: 1.0000]
  0.0 HTML_MESSAGE           BODY: HTML included in message
  0.1 HTML_FONT_BIG          BODY: HTML tag for a big font size
  0.0 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars

- -quote-
(Full headers below.)

Now correct me if I'm wrong, but 3.5 + 0.2 + 0.1 + 0.1 is not 4.1 ?

Kind Regards,
Craig.

- -original headers-

Return-Path: <ae...@freesurf.fr>
Received: from mta126.mail.ukl.yahoo.com (mta126.mail.ukl.yahoo.com
[217.12.11.75])
	by craig.dnsalias.com (8.12.10/8.12.10) with SMTP id j4FFYBIW010078
	for <ho...@craig.dnsalias.com>; Sun, 15 May 2005 16:34:14 +0100 (BST)
	(envelope-from aeim@freesurf.fr)
X-Yahoo-Forwarded: from mcavityp@yahoo.co.uk to honey@craig.dnsalias.com
X-Rocket-Track: 0: 100 ; IPCR=n-w0,n100,g0 ; IP=212.43.206.16 ;
SERVER=217.12.12.165
Authentication-Results: mta126.mail.ukl.yahoo.com
   from=freesurf.fr; domainkeys=neutral (no sig)
X-Originating-IP: [212.43.206.16]
Received: from 212.43.206.16  (EHLO fidel.freesurf.fr) (212.43.206.16)
   by mta126.mail.ukl.yahoo.com with SMTP; Sun, 15 May 2005 15:33:43 +0000
Received: from acps-77b8dgiwqw (du-204-236.nat.dialup.freesurf.fr
[212.43.204.236])
	by fidel.freesurf.fr (Postfix) with SMTP id 326832A7CBA;
	Sun, 15 May 2005 17:33:36 +0200 (CEST)
From: <ae...@freesurf.fr>
To: <co...@fidel.freesurf.fr>
Reply-To: <ae...@freesurf.fr>
Subject: **SPAM (4.1)** International Cd e-mail adress
Date: Sun, 15 may 2005 16:49:36 +0200
Importance: normal
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_42876BF9.E8A0075B"
Message-Id: <20...@fidel.freesurf.fr>
X-Spam-Flag: YES
X-Spam-Status: Yes, score=4.1 required=4.0 tests=BAYES_99,
	HTML_COMMENT_SAVED_URL,HTML_FONT_BIG,HTML_MESSAGE,INVALID_DATE,
	MIME_QP_LONG_LINE,NO_REAL_NAME autolearn=no version=3.0.3
X-Spam-Report:
	*  0.0 NO_REAL_NAME From: does not include a real name
	*  0.2 INVALID_DATE Invalid Date: header (not RFC 2822)
	*  0.1 HTML_COMMENT_SAVED_URL BODY: HTML message is a saved web page
	*  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
	*      [score: 1.0000]
	*  0.0 HTML_MESSAGE BODY: HTML included in message
	*  0.1 HTML_FONT_BIG BODY: HTML tag for a big font size
	*  0.0 MIME_QP_LONG_LINE RAW: Quoted-printable line longer than 76 chars
X-Spam-Level: ****
X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on mail.vega
X-Virus-Scanned: ClamAV devel-20050513/879/Sun May 15 14:43:45 2005 on
vega-mail.vega
X-Virus-Status: Clean

This is a multi-part message in MIME format.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFCh2+MMDDagS2VwJ4RAr6KAJ92D9I4Vh8NHV26dZKCZfwzSe50hQCgzJet
rEGs1JUXc0QMQMn7J2qPQUo=
=p0Bn
-----END PGP SIGNATURE-----

Re: Strange SA report maths.

Posted by Matt Kettler <mk...@evi-inc.com>.
Craig McLean wrote:

> Can you be more specific? A search of wiki.apache.org/spamassassin shows
> 2 pages containing "rounding":
> StatusRounding - orphaned.
> RoundingIssues - this is not the issue I'm talking about, and in any
> case was fixed in 3.0.

Actually if you read:
http://wiki.apache.org/spamassassin/RoundingIssues

Your particular case would be MAGNIFIED by the bugfix, not corrected by it.

SA 2.6x used normal rounding when printing numbers (ie: 0.06 rounds to 0.1, 0.04
rounds to 0).

With 2.6 it was possible for the hits to look like they should have been higher
than they are (i.e. rounded numbers add up to 5.0, but the total score is 4.9).
It was also possible for them to look like they should be lower. Normal rounding
does that. Sometimes you round up, sometimes down.

SA 3.0 "fixes" this by always rounding down (0.06 rounds to 0). This doesn't
really "fix" anything, since a true fix is impossible. It does however shift
the nature of the problem around.

Now you'll never see a total that looks too low. Now you'll only ever see ones
that look too high. You'll also see on average more "errors" as there's never any
chance for two rounded numbers to cancel each other out. They'll also on average
be more severe in magnitude.

Any form of rounding introduces error. You can force that error to always be on
the low side (round down), always on the high side (round up), or split between
the two (normal rounding). It's a matter of picking which of the three is better
understood by users (and really, the average user doesn't understand this kind
of thing at all, so you're pretty much screwed no matter what you do).
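
The three choices can be compared side by side. What follows is only a Python sketch, not SpamAssassin's own (Perl) code: the scores are hypothetical values picked to make the effect visible, the names `shown` and `compare` are invented for the example, and "normal rounding" is assumed to mean round-half-up on exact decimals.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_UP

# Hypothetical per-rule scores, chosen only to make the effect visible.
SCORES = [Decimal(s) for s in ("0.06", "0.04", "2.45", "2.38")]
TENTH = Decimal("0.1")

def shown(value, mode):
    """Round one score to a single decimal place, as a report would print it."""
    return value.quantize(TENTH, rounding=mode)

def compare(scores, mode):
    """Return (total as shown, sum of the per-rule scores as shown)."""
    total_shown = shown(sum(scores, Decimal("0")), mode)
    sum_of_shown = sum((shown(s, mode) for s in scores), Decimal("0.0"))
    return total_shown, sum_of_shown

for label, mode in (("round down", ROUND_FLOOR),
                    ("round up", ROUND_CEILING),
                    ("normal", ROUND_HALF_UP)):
    total_shown, sum_of_shown = compare(SCORES, mode)
    print(f"{label:<10} total shown: {total_shown}  rules add up to: {sum_of_shown}")
```

With these made-up values the exact total is 4.93. Rounding down shows a total of 4.9 over rules that add up to 4.7 (the total looks too high), rounding up shows 5.0 over 5.1 (too low), and normal rounding shows 4.9 over 5.0.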

The only way to truly "fix" the "bug" is to print the whole, unrounded score for
everything. But that creates a very cluttered report, which would be another bug.
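
To illustrate that trade-off, here is a minimal Python sketch (again, not SpamAssassin's own code) that prints the same rules with one decimal place and with three. The helper `report_lines` and its `places` parameter are hypothetical names invented for this example; the three scores are taken from the figures quoted in this thread, and three decimals is assumed to be the full precision of the rule scores.

```python
from decimal import Decimal, ROUND_HALF_UP

# Three of the rule scores quoted in this thread (score set 4).
RULES = [("INVALID_DATE", "0.236"), ("HTML_FONT_BIG", "0.142"), ("BAYES_99", "3.5")]

def report_lines(rules, places):
    """Format each rule as '<score> <name>' with `places` decimal digits."""
    step = Decimal(10) ** -places  # places=1 -> 0.1, places=3 -> 0.001
    lines = []
    for name, score in rules:
        shown = Decimal(score).quantize(step, rounding=ROUND_HALF_UP)
        lines.append(f"{str(shown).rjust(6)} {name}")
    return lines

for places in (1, 3):
    print(f"-- {places} decimal place(s) --")
    print("\n".join(report_lines(RULES, places)))
```

The three-decimal lines add up exactly to their total, at the cost of a wider and busier report.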


Re: Strange SA report maths.

Posted by Theodore Heise <th...@heise.nu>.

On Sun, 15 May 2005, Craig McLean wrote:

> The scores on the doors (using set 4):
> NO_REAL_NAME 0.007
> INVALID_DATE 0.236
> HTML_COMMENT_SAVED_URL 0.146
> BAYES_99 3.5
> HTML_MESSAGE 0.001
> HTML_FONT_BIG 0.142
> MIME_QP_LONG_LINE 0.039
>
> The scores for each rule, when added together, are 4.071. Hence a
> total score of 4.1 (rounded) even though the individual rules when
> rounded before addition only score 3.9.
>
> Perhaps it would be useful to add this little quirk to the wiki?
> On the RoundingIssues page?

I added this.

-- 
Theodore (Ted) Heise     <th...@heise.nu>     Bloomington, IN, USA

Re: Strange SA report maths.

Posted by Craig McLean <cr...@craig.dnsalias.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Craig McLean wrote:
| Theodore Heise wrote:
| |
| | On Sun, 15 May 2005, Craig McLean wrote:
| |
| |
| |>-----BEGIN PGP SIGNED MESSAGE-----
| |>Hash: SHA1
| |>
| |>Loren Wilton wrote:
| |>|>Now correct me if I'm wrong, but 3.5 + 0.2 + 0.1 + 0.1 is not 4.1 ?
| |>|
| |>| Rounding.  See the wiki.
| |>
| |>Can you be more specific? A search of wiki.apache.org/spamassassin shows
| |>2 pages containing "rounding":
| |>StatusRounding - orphaned.
| |>RoundingIssues - this is not the issue I'm talking about, and in any
| |>case was fixed in 3.0.
| |
| |
| | I don't know what the wiki says, but here's my guess.  The scores applied
| | are actually three digits after the decimal.  In the header report
| | they are rounded off.  Suppose the exact scores in the example you
| | gave are 3.544 + 0.231 + 0.142 + 0.145.  These add up to 4.062,
| | which rounds to 4.1.
| |
|

Spot on. Have a cigar!

The scores on the doors (using set 4):
NO_REAL_NAME 0.007
INVALID_DATE 0.236
HTML_COMMENT_SAVED_URL 0.146
BAYES_99 3.5
HTML_MESSAGE 0.001
HTML_FONT_BIG 0.142
MIME_QP_LONG_LINE 0.039

The scores for each rule, when added together, are 4.071. Hence a total
score of 4.1 (rounded) even though the individual rules when rounded
before addition only score 3.9.
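
The arithmetic above can be checked mechanically. The following is a small Python sketch rather than anything SpamAssassin itself runs; it assumes round-half-up to one decimal place for both the per-rule figures and the total, and the names `RULE_SCORES` and `round_tenth` are made up for the illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores as listed in the message above (score set 4).
RULE_SCORES = {
    "NO_REAL_NAME": "0.007",
    "INVALID_DATE": "0.236",
    "HTML_COMMENT_SAVED_URL": "0.146",
    "BAYES_99": "3.5",
    "HTML_MESSAGE": "0.001",
    "HTML_FONT_BIG": "0.142",
    "MIME_QP_LONG_LINE": "0.039",
}
TENTH = Decimal("0.1")

def round_tenth(value):
    """Round to one decimal place (round-half-up is an assumption here)."""
    return value.quantize(TENTH, rounding=ROUND_HALF_UP)

exact = [Decimal(s) for s in RULE_SCORES.values()]

exact_total = sum(exact, Decimal("0"))            # add first ...
reported_total = round_tenth(exact_total)         # ... then round once
sum_of_rounded = sum((round_tenth(v) for v in exact), Decimal("0.0"))  # round first, then add

print(f"exact total:          {exact_total}")
print(f"reported total:       {reported_total}")
print(f"sum of rounded rules: {sum_of_rounded}")
```

Run as written, this prints an exact total of 4.071, a reported total of 4.1, and 3.9 for the sum of the individually rounded rules.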

Perhaps it would be useful to add this little quirk to the wiki? On the
RoundingIssues page?

Thanks again for all the help!

Craig.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFCh6LgMDDagS2VwJ4RAqdRAJ9Vr47iEXRiwX5BIwTusOZ+RsrcEwCgjKpM
Nb8XghueAODjyQJ40qV67d4=
=XCuU
-----END PGP SIGNATURE-----

Re: Strange SA report maths.

Posted by Craig McLean <cr...@craig.dnsalias.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Theodore Heise wrote:
|
| On Sun, 15 May 2005, Craig McLean wrote:
|
|
|>-----BEGIN PGP SIGNED MESSAGE-----
|>Hash: SHA1
|>
|>Loren Wilton wrote:
|>|>Now correct me if I'm wrong, but 3.5 + 0.2 + 0.1 + 0.1 is not 4.1 ?
|>|
|>| Rounding.  See the wiki.
|>
|>Can you be more specific? A search of wiki.apache.org/spamassassin shows
|>2 pages containing "rounding":
|>StatusRounding - orphaned.
|>RoundingIssues - this is not the issue I'm talking about, and in any
|>case was fixed in 3.0.
|
|
| I don't know what the wiki says, but here's my guess.  The scores applied
| are actually three digits after the decimal.  In the header report
| they are rounded off.  Suppose the exact scores in the example you
| gave are 3.544 + 0.231 + 0.142 + 0.145.  These add up to 4.062,
| which rounds to 4.1.
|

Yeah, that could well be it. I'll look at the scores in the .cf files
and see what gives..

Thanks!
Craig.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFCh50uMDDagS2VwJ4RAhHbAKCvXU7kmnRC3wjZBqwrvkz4UQQDOQCgsCCw
bP6QuwgSAvQHMzz/BzhWB4w=
=j+cu
-----END PGP SIGNATURE-----

Re: Strange SA report maths.

Posted by Theodore Heise <th...@heise.nu>.

On Sun, 15 May 2005, Craig McLean wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Loren Wilton wrote:
> |>Now correct me if I'm wrong, but 3.5 + 0.2 + 0.1 + 0.1 is not 4.1 ?
> |
> | Rounding.  See the wiki.
>
> Can you be more specific? A search of wiki.apache.org/spamassassin shows
> 2 pages containing "rounding":
> StatusRounding - orphaned.
> RoundingIssues - this is not the issue I'm talking about, and in any
> case was fixed in 3.0.

I don't know what the wiki says, but here's my guess.  The scores applied
are actually three digits after the decimal.  In the header report
they are rounded off.  Suppose the exact scores in the example you
gave are 3.544 + 0.231 + 0.142 + 0.145.  These add up to 4.062,
which rounds to 4.1.

-- 
Theodore (Ted) Heise     <th...@heise.nu>     Bloomington, IN, USA


Re: Strange SA report maths.

Posted by Craig McLean <cr...@craig.dnsalias.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Loren Wilton wrote:
|>Now correct me if I'm wrong, but 3.5 + 0.2 + 0.1 + 0.1 is not 4.1 ?
|
|
| Rounding.  See the wiki.
|

Can you be more specific? A search of wiki.apache.org/spamassassin shows
2 pages containing "rounding":
StatusRounding - orphaned.
RoundingIssues - this is not the issue I'm talking about, and in any
case was fixed in 3.0.

Yours in confusion,
Craig.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFCh3Y9MDDagS2VwJ4RAuRAAKC575Fqcj2bpzp8CzcVE4sTiYghogCfTE0r
4205GfjsZXFuTIismKdcqBg=
=g52E
-----END PGP SIGNATURE-----

Re: Strange SA report maths.

Posted by Loren Wilton <lw...@earthlink.net>.
> Now correct me if I'm wrong, but 3.5 + 0.2 + 0.1 + 0.1 is not 4.1 ?

Rounding.  See the wiki.

        Loren