Posted to users@spamassassin.apache.org by Chris Santerre <cs...@MerchantsOverseas.com> on 2004/09/29 16:55:55 UTC
Why such a low score?
What was the reason WS got such a low score in SA 3.0? 0.5 is a joke! Hell,
BigEvil was scored a 3 and no one complained, and it is the same data! I
don't understand. Did the mass check not go well?
Chris Santerre
System Admin and SARE Ninja
http://www.rulesemporium.com
http://www.surbl.org
'It is not the strongest of the species that survives,
not the most intelligent, but the one most responsive to change.'
Charles Darwin
Re: Why such a low score?
Posted by Matt Kettler <mk...@evi-inc.com>.
At 10:55 AM 9/29/2004, Chris Santerre wrote:
>What was the reason WS got such a low score in SA 3.0? 0.5 is a joke! Hell,
>BigEvil was scored a 3 and no one complained, and it is the same data! I
>don't understand. Did the mass check not go well?
Chris... Calm down a sec.
The score assigned by the GA does not indicate how well or poorly a rule
performs.
In this case WS probably got a low score due to a large amount of overlap
with the other URIBL rules.
Remember, the GA tunes the scores so the optimal amount of spam and ham end
up in the right baskets. This means that rule scores aren't a function of
just the rule, but a function of how that rule interacts with other rules.
Re: Why such a low score?
Posted by Raymond Dijkxhoorn <ra...@prolocation.net>.
Chris,
> What was the reason WS got such a low score in SA 3.0? 0.5 is a joke! Hell,
> BigEvil was scored a 3 and no one complained, and it is the same data! I
> don't understand. Did the mass check not go well?
We pointed this out several times: the mass check found way too many
FPs, and so SA scored it lower. It's 'our own' problem; we have
to get those FPs out. The scoring will be done again for SA 3.1, so let's try
to do better there...
And yes, I am also disappointed with this very low score; personally I
have raised it via my local.cf.
Bye,
Raymond
Re: Why such a low score?
Posted by Matt Kettler <mk...@evi-inc.com>.
At 10:55 AM 9/29/2004, Chris Santerre wrote:
>What was the reason WS got such a low score in SA 3.0? 0.5 is a joke! Hell,
>BigEvil was scored a 3 and no one complained, and it is the same data! I
>don't understand. Did the mass check not go well?
Upon closer inspection, the WS mass-check went pretty well, but WS had the
greatest number of nonspam hits of all the SURBL lists. It also hit the
most spam, but the OB list hit nearly as much spam, and almost no nonspam.
Since the GA treats FPs as 100 times worse than FNs, the GA is going to
heavily bias the score of any overlapping spam hits toward the rule that has
the fewest nonspam hits. I suspect that in the spam cases, most of the WS hits
also hit either OB or SC, which have better FP ratios, and the scores
assigned reflect this.
Admittedly the amount of nonspam WS hit is small (0.4%), but that's over 6
times more nonspam than OB did, and 100 times more than SC did.
Thus WS got a lowish score not for being a bad rule, but for not doing as
well as its neighbors that catch the same spams.
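As a rough illustration of the overlap argument above, here is a toy sketch. It is not SpamAssassin's GA: it swaps in a brute-force grid search over an invented corpus, and the rule names CLEAN, NOISY and OTHER, the message counts, and the fixed score of 4.0 are all assumptions made up for the example. Only the 100:1 FP weighting comes from this thread.

```python
from itertools import product

THRESHOLD = 5.0    # score at which a message is flagged as spam
FP_WEIGHT = 100    # an FP counts as 100 FNs, as described in the thread
OTHER_SCORE = 4.0  # fixed score of a hypothetical unrelated rule

# (is_spam, rules hit, number of messages): invented toy corpus
CORPUS = [
    (True,  {"CLEAN", "NOISY"}, 70),  # overlapping spam hits
    (True,  {"CLEAN"},           5),
    (True,  {"NOISY"},           5),
    (False, {"NOISY", "OTHER"},  4),  # ham that the noisy rule also hits
    (False, set(),              96),
]

def cost(scores):
    """Weighted error count: FN = 1 each, FP = FP_WEIGHT each."""
    total = 0
    for is_spam, hits, count in CORPUS:
        flagged = sum(scores[rule] for rule in hits) >= THRESHOLD
        if is_spam and not flagged:
            total += count
        elif not is_spam and flagged:
            total += FP_WEIGHT * count
    return total

def best_scores():
    """Try every CLEAN/NOISY score pair on a 0.0..5.0 grid (step 0.5)."""
    grid = [i * 0.5 for i in range(11)]
    candidates = (
        {"CLEAN": c, "NOISY": n, "OTHER": OTHER_SCORE}
        for c, n in product(grid, grid)
    )
    return min(candidates, key=cost)

if __name__ == "__main__":
    print(best_scores())
```

In this toy setup the search pushes almost all of the score onto the clean rule, because any meaningful score on the noisy rule tips the overlapping ham over the threshold. The noisy rule ends up with a far lower score even though it hits just as much spam.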
From STATISTICS-set1.txt:
OVERALL%    SPAM%    HAM%    S/O  RANK  SCORE  NAME
  10.497  15.8904  0.0008  1.000  0.98   2.01  URIBL_AB_SURBL
  18.019  27.2741  0.0046  1.000  0.97   3.90  URIBL_SC_SURBL
  49.029  74.1861  0.0654  0.999  0.74   2.00  URIBL_OB_SURBL
  51.999  78.4712  0.4756  0.994  0.45   0.54  URIBL_WS_SURBL
   0.010   0.0146  0.0012  0.927  0.39   0.84  URIBL_PH_SURBL
From STATISTICS-set3.txt:
OVERALL%    SPAM%    HAM%    S/O  RANK  SCORE  NAME
   7.022  14.4233  0.0061  1.000  0.95   4.26  URIBL_SC_SURBL
  30.471  62.5514  0.0632  0.999  0.74   3.21  URIBL_OB_SURBL
   2.950   6.0208  0.0385  0.994  0.73   0.42  URIBL_AB_SURBL
  33.807  68.9994  0.4494  0.994  0.47   1.46  URIBL_WS_SURBL
   0.019   0.0390  0.0008  0.981  0.44   2.00  URIBL_PH_SURBL
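To check the "over 6 times" and "100 times" figures against the HAM% columns quoted above, a small sketch like the following can be used. The values are copied from the two STATISTICS excerpts; the dictionary layout and the helper name ham_ratio are just assumptions for the example.

```python
# HAM% values copied from the STATISTICS-set1.txt and STATISTICS-set3.txt
# excerpts, keyed by the short SURBL list name.
HAM_PCT = {
    "set1": {"AB": 0.0008, "SC": 0.0046, "OB": 0.0654, "WS": 0.4756, "PH": 0.0012},
    "set3": {"SC": 0.0061, "OB": 0.0632, "AB": 0.0385, "WS": 0.4494, "PH": 0.0008},
}

def ham_ratio(score_set, rule, baseline):
    """How many times more nonspam `rule` hits than `baseline`."""
    table = HAM_PCT[score_set]
    return table[rule] / table[baseline]

if __name__ == "__main__":
    for score_set in ("set1", "set3"):
        print(
            score_set,
            "WS vs OB: %.1fx" % ham_ratio(score_set, "WS", "OB"),
            "WS vs SC: %.1fx" % ham_ratio(score_set, "WS", "SC"),
        )
```

For set1 this works out to roughly 7 times the nonspam hits of OB and about 100 times those of SC; for set3 the gap to SC is somewhat smaller.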
grep SURBL 50_scores.cf:
score URIBL_AB_SURBL 0 2.007 0 0.417
score URIBL_OB_SURBL 0 1.996 0 3.213
score URIBL_PH_SURBL 0 0.839 0 2.000
score URIBL_SC_SURBL 0 3.897 0 4.263
score URIBL_WS_SURBL 0 0.539 0 1.462
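For readers unfamiliar with the four-column score lines: SpamAssassin's score directive accepts four values, one per score set, in the order no Bayes/no network tests, no Bayes/network tests, Bayes/no network tests, Bayes/network tests. The second and fourth columns therefore line up with the set1 and set3 SCORE columns quoted above. A minimal parsing sketch follows; the function name parse_scores is an assumption for the example.

```python
# The grep output quoted above, verbatim.
SCORE_LINES = """\
score URIBL_AB_SURBL 0 2.007 0 0.417
score URIBL_OB_SURBL 0 1.996 0 3.213
score URIBL_PH_SURBL 0 0.839 0 2.000
score URIBL_SC_SURBL 0 3.897 0 4.263
score URIBL_WS_SURBL 0 0.539 0 1.462
"""

def parse_scores(text):
    """Map rule name -> (set0, set1, set2, set3) for four-value score lines."""
    scores = {}
    for line in text.splitlines():
        parts = line.split()
        # Only handle the four-value form shown in the grep output.
        if len(parts) != 6 or parts[0] != "score":
            continue
        scores[parts[1]] = tuple(float(value) for value in parts[2:])
    return scores

if __name__ == "__main__":
    for name, sets in sorted(parse_scores(SCORE_LINES).items()):
        # sets[1]: network tests without Bayes; sets[3]: network tests with Bayes
        print("%-16s set1=%.3f set3=%.3f" % (name, sets[1], sets[3]))
```

The zeros in the first and third columns are expected: the SURBL rules are network tests, so they do not contribute in the score sets where network tests are off.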