Posted to users@spamassassin.apache.org by Adam Katz <sp...@khopis.com> on 2006/11/06 22:58:37 UTC

Default SpamAssassin scores don't make sense

(Re-sending this email; the last one, sent 10/30 15:19 EST, was never
posted to the list, even though another message sent only half an hour
later made it through successfully.)

Why do default scores not increase with severity?  For example,
SpamAssassin 3.1.7 has inconsistent progression of default scores in
html obfuscation, dates set in the future, and spf marking:

score HTML_OBFUSCATE_05_10 1.421 1.169 1.522 1.449
score HTML_OBFUSCATE_10_20 1.936 1.397 2.371 1.770
score HTML_OBFUSCATE_20_30 2.720 2.720 3.145 3.400
score HTML_OBFUSCATE_30_40 2.480 2.480 2.867 2.859
score HTML_OBFUSCATE_40_50 2.160 2.160 2.498 2.640
score HTML_OBFUSCATE_50_60 2.049 2.061 2.342 2.031
score HTML_OBFUSCATE_60_70 1.637 1.592 1.892 1.652
score HTML_OBFUSCATE_70_80 1.440 1.507 1.680 1.472
score HTML_OBFUSCATE_80_90 1.244 1.191 1.397 0.982
score HTML_OBFUSCATE_90_100 0 # n=0 n=1 n=2 n=3

score DATE_IN_FUTURE_03_06 2.061 2.007 2.275 1.961
score DATE_IN_FUTURE_06_12 1.680 1.498 1.883 1.668
score DATE_IN_FUTURE_12_24 2.320 2.316 2.775 2.767
score DATE_IN_FUTURE_24_48 2.080 2.080 2.498 2.688
score DATE_IN_FUTURE_48_96 1.680 1.680 1.942 2.100
score DATE_IN_FUTURE_96_XX 1.920 1.888 2.276 2.403

score SPF_NEUTRAL  0 1.379 0 1.069
score SPF_SOFTFAIL 0 1.470 0 1.384
score SPF_FAIL     0 1.333 0 1.142

To keep this message on-topic, I am not commenting on whether the
scores fairly reflect message spaminess.  I am asking about their
fairness relative to each other: HTML_OBFUSCATE_80_90 should be higher
than HTML_OBFUSCATE_20_30, DATE_IN_FUTURE_96_XX should be higher than
DATE_IN_FUTURE_12_24, and SPF_FAIL should be higher than SPF_SOFTFAIL.
A large number of score sets seem quite arbitrary in their assignment.
While I'm happy to see this no longer includes the Bayesian scores, it
is still a huge surprise.

Is there an explanation guide online about how scores are chosen?  Is
this automated in some manner that seems to get incremental tests
weighted based more on frequency than on severity?  I try to keep my
rules tweaks minor, but my local.cf is getting bigger and bigger...  how
large is the typical local.cf for servers with 25-100 users?

Thank you,
Adam Katz


Re: Default SpamAssassin scores don't make sense

Posted by Matt Kettler <mk...@verizon.net>.
Adam Katz wrote:
> Theo Van Dinter wrote:
>   
>> http://wiki.apache.org/spamassassin/HowScoresAreAssigned
>>     
>
> Thanks, that's what I was looking for.
>
>   
>> The short version is that as far as SA and the perceptron (that which
>> generates the scores) are concerned, rules are independent.  There is no
>> "increase in severity", either a rule hits or it doesn't
>>     
>
> Bayes is a perfect example of this, and is mentioned as such on the very
> page you referenced.  Several filters, including those that I listed at
> the top of this thread, are indeed incremental, increasing in severity.
>  I am shocked to hear that there is nobody moderating the automated
> scores (an Alan Greenspan of the anti-spam world, so to speak).
>   


Nobody said that nobody moderates the scores. I myself spend a
considerable amount of time studying them.

However, none of us is so rash as to make adjustments just to make the
results look better. 99% of the time, investigations into "illogical"
scores turn up real-world evidence that explains them.
Let's take a brief look at your SPF example.

You'd expect SPF_FAIL to have a higher score than SPF_SOFTFAIL. However,
the real world shows otherwise.

Let's rip the results out of STATISTICS-set3.txt:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME

  3.437   4.8942   0.0396    0.992   0.80    1.38  SPF_SOFTFAIL
  2.550   3.5717   0.1676    0.955   0.53    1.14  SPF_FAIL

Look at the S/O for each. This is the fraction of the mail matched by
the rule that is actually spam, where 1.00 means 100% of the matching
messages were spam.
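As a sanity check on those numbers, S/O can be recomputed from the SPAM% and HAM% columns directly (this assumes the spam and ham corpora are weighted equally, which appears to match the published table):

```python
def s_o(spam_pct, ham_pct):
    """Spam/overall ratio: the fraction of a rule's hits that land on
    spam, computed from per-corpus hit percentages."""
    return spam_pct / (spam_pct + ham_pct)

# Values from the STATISTICS-set3.txt lines quoted above
print(round(s_o(4.8942, 0.0396), 3))  # SPF_SOFTFAIL -> 0.992
print(round(s_o(3.5717, 0.1676), 3))  # SPF_FAIL     -> 0.955
```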

Notice how the S/O of SPF_FAIL is actually LOWER than SOFTFAIL?

Why? Probably because more "aggressive" admins publish records with
-all without thinking about their whole network. The more cautious
folks, who have spent a lot of time thinking about their network, are
more likely to realize they might have missed something and use ~all
(softfail).

Human behavior is in no way linear, and SPF here is a result of the
behavior of the admin publishing the records. My explanation is a
guess, but it makes sense if you think about the general behavior of a
cautious admin compared to a "rabid" one.

Now let's look at DATE_IN_FUTURE..

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME

  1.605   2.2815   0.0264    0.989   0.75    1.96  DATE_IN_FUTURE_03_06
  0.926   1.2926   0.0716    0.948   0.56    1.67  DATE_IN_FUTURE_06_12
  1.986   2.8309   0.0151    0.995   0.81    2.77  DATE_IN_FUTURE_12_24
  0.260   0.3676   0.0075    0.980   0.53    2.69  DATE_IN_FUTURE_24_48
  0.089   0.1252   0.0038    0.971   0.40    2.10  DATE_IN_FUTURE_48_96
  0.245   0.3474   0.0075    0.979   0.52    2.40  DATE_IN_FUTURE_96_XX

Here again we see non-linearity in the S/O performance of the real world
data. Note that 06_12 has the lowest S/O of the lot, and, imagine that,
it got the lowest score too.

There's some degree of "non-fit" here, as DATE_IN_FUTURE_96_XX has the
highest score, but not the highest S/O. A study of the actual corpus
itself would likely show that this rule is more likely to match spam
that has very few other rules matching, hence the higher score. This is
a case of that "interaction with other rules" thing in my last message.

HTML_OBFUSCATE is a bit more complicated:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  0.637   0.9048   0.0132    0.986   0.66    1.45  HTML_OBFUSCATE_05_10
  0.921   1.3128   0.0075    0.994   0.74    1.77  HTML_OBFUSCATE_10_20
  0.671   0.9582   0.0000    1.000   0.70    3.40  HTML_OBFUSCATE_20_30
  0.406   0.5801   0.0000    1.000   0.63    2.86  HTML_OBFUSCATE_30_40
  0.198   0.2836   0.0000    1.000   0.51    2.64  HTML_OBFUSCATE_40_50
  0.242   0.3458   0.0000    1.000   0.54    2.03  HTML_OBFUSCATE_50_60
  0.081   0.1155   0.0000    1.000   0.40    1.65  HTML_OBFUSCATE_60_70
  0.055   0.0784   0.0000    1.000   0.38    1.47  HTML_OBFUSCATE_70_80
  0.012   0.0178   0.0000    1.000   0.31    0.98  HTML_OBFUSCATE_80_90
  0.004   0.0057   0.0000    1.000   0.29    0.00  HTML_OBFUSCATE_90_100

Here the S/O's show a clear upward trend. However, the hit-rates at
the upper end are very low. That's probably what's suppressing the
scores of 60_70 and higher: they just don't hit enough mail to be
relevant.

Re: Default SpamAssassin scores don't make sense

Posted by Adam Katz <sp...@khopis.com>.
Theo Van Dinter wrote:
> http://wiki.apache.org/spamassassin/HowScoresAreAssigned

Thanks, that's what I was looking for.

> The short version is that as far as SA and the perceptron (that which
> generates the scores) are concerned, rules are independent.  There is no
> "increase in severity", either a rule hits or it doesn't

Bayes is a perfect example of this, and is mentioned as such on the very
page you referenced.  Several filters, including those that I listed at
the top of this thread, are indeed incremental, increasing in severity.
 I am shocked to hear that there is nobody moderating the automated
scores (an Alan Greenspan of the anti-spam world, so to speak).

>> weighted based more on frequency than on severity?  I try to keep my
>> rules tweaks minor, but my local.cf is getting bigger and bigger...  how
>> large is the typical local.cf for servers with 25-100 users?
> 
> Most people, I think, leave most of the scores alone, which is good and bad.
> 
> FWIW, the suggested way to get the best SA performance for your mail
> server is to generate your own score sets from your own mails.  I don't
> actually know of anyone who does this though.

The wiki documentation seems to discourage modifying rule scores more
than encourage it.  We have a dozen or so custom rules and several dozen
score modifications, plus a good number of the CustomRulesets from the
wiki and the SARE collection are in full use.
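For context, the score modifications themselves are just one-line
overrides in local.cf, using the same syntax as the stock rules files.
The rule names below are real, but the values are purely illustrative,
not a recommendation:

```
# local.cf -- override stock scores (illustrative values only)
score SPF_FAIL             2.0
score SPF_SOFTFAIL         1.5
score HTML_OBFUSCATE_80_90 3.5
```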

All low-scoring spam caught at my company goes into a net for my IT
staff to review; the rare false positives get forwarded to the
intended recipients and sa-learn'ed as ham, and the offending scores
get reviewed.  A good 20-50% of the low-scoring caught spam was caught
only due to our custom filters and adjusted scores (note, these
numbers are with SA 2.63; our upgrade to 3.1.7 is scheduled for before
Thanksgiving while I work out the kinks).

-Adam

Re: Default SpamAssassin scores don't make sense

Posted by Theo Van Dinter <fe...@apache.org>.
On Mon, Nov 06, 2006 at 04:58:37PM -0500, Adam Katz wrote:
> Why do default scores not increase with severity?  For example,
> SpamAssassin 3.1.7 has inconsistent progression of default scores in
> html obfuscation, dates set in the future, and spf marking:

http://wiki.apache.org/spamassassin/HowScoresAreAssigned

The short version is that as far as SA and the perceptron (that which
generates the scores) are concerned, rules are independent.  There is no
"increase in severity", either a rule hits or it doesn't

> weighted based more on frequency than on severity?  I try to keep my
> rules tweaks minor, but my local.cf is getting bigger and bigger...  how
> large is the typical local.cf for servers with 25-100 users?

Most people, I think, leave most of the scores alone, which is good and bad.

FWIW, the suggested way to get the best SA performance for your mail
server is to generate your own score sets from your own mails.  I don't
actually know of anyone who does this though.

-- 
Randomly Selected Tagline:
"These periods are always 15 minutes shorter than I'd like them, and 
 probably 15 minutes longer than you'd like them."   - Prof. Van Bluemel

Re: Default SpamAssassin scores don't make sense

Posted by Matt Kettler <mk...@verizon.net>.
Adam Katz wrote:
> On Mon, 6 Nov 2006, John D. Hardin wrote:
>   
>> The default scores are generated by analyzing their performance
>> against hand-categorized corpora of actual emails. If a rule hits spam
>> often and ham rarely, it will be given a higher score than one that
>> hits spam often and ham occasionally.
>>     
>
> That sounds very Bayesian ... with Bayesian rules already doing that sort
> of logic, I would hope there is more human thinking put into score
> setting. 

Actually, in this case, a little human thinking will mislead you.
You're seeing only a tiny slice of the overall picture.

Fundamentally, which rules do and don't hit spam is not some kind of
linear equation. Hits are a function of human behaviors and of weird
quirks coded into a spam generation tool by its author. All of this is
very much NOT subject to any kind of simple rule like "10% is worse
than 20%".

When you start to realize this, you'll start to understand the scoring
process.. just a little.

Now consider that the rules are not scored individually. They're
scored as a collective set: a single equation in hundreds of
variables, all of which are simultaneously tweaked to achieve a "best
fit" to real-world data.

This makes the score of one rule a function not just of its own
behavior, but also of the other rules.

A rule might perform very well, but it might also match all the same
spam as another rule. If that other rule matches just slightly fewer
nonspams, there's a dramatic shift in score to favor the better of the two.

And thousands of smaller-scale instances of such balancing occur in
the scoring process.
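To see how jointly-fitted scores shift between overlapping rules, here
is a toy sketch in Python. This illustrates the idea only; it is not
SA's actual perceptron, and the messages, rules, and threshold are all
made up:

```python
def fit_scores(messages, labels, epochs=200, lr=0.1, threshold=1.0):
    """Toy joint fit: each message is a 0/1 vector of rule hits, and all
    rule scores are nudged together (perceptron-style updates) until
    total score vs. threshold separates spam (1) from ham (0)."""
    scores = [0.0] * len(messages[0])
    for _ in range(epochs):
        for hits, is_spam in zip(messages, labels):
            total = sum(s for s, h in zip(scores, hits) if h)
            if (total >= threshold) != bool(is_spam):
                sign = 1 if is_spam else -1
                scores = [s + sign * lr * h for s, h in zip(scores, hits)]
    return scores

# Rule 0 hits both spams and no ham; rule 1 hits one spam and one ham.
msgs   = [[1, 1], [1, 0], [0, 1]]   # [rule0 hit, rule1 hit] per message
labels = [1, 1, 0]                  # spam, spam, ham
scores = fit_scores(msgs, labels)
print(scores)  # rule 0 ends up with the larger score
```

Even though rule 1 does hit spam, rule 0 covers that same spam more
cleanly, so the joint fit concentrates most of the score on rule 0.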

If you put a LOT of human thinking into it, you'll come to understand
what's going on, but you really have to think about the big picture here.

Re: Default SpamAssassin scores don't make sense

Posted by Adam Katz <sp...@khopis.com>.
On Mon, 6 Nov 2006, John D. Hardin wrote:
> The default scores are generated by analyzing their performance
> against hand-categorized corpora of actual emails. If a rule hits spam
> often and ham rarely, it will be given a higher score than one that
> hits spam often and ham occasionally.

That sounds very Bayesian ... with Bayesian rules already doing that sort
of logic, I would hope there is more human thinking put into score
setting.  The bayes rules are very shiny and effective, but they are
supposed to assist the hand-drawn filters rather than have the filters
assist the bayes rules.  ... if that's the current SA thinking, I'll have
to reconsider CRM114 and other "better-than-bayes" systems.

> Rule performance against real-world traffic can be counterintuitive,
> and the rules' relation to each other isn't necessarily a part of the
> analysis.

That's where the human tweaking is supposed to happen: if gobs of
spam trip the 80% meter of some test while no ham does, and the 90%
meter is almost never hit by anything, the 90% meter should still
score higher than the 80% meter does.  If the 90% meter hits more ham
than spam despite the 80% meter hitting more spam than ham, the tests
need to be looked at closely rather than weighted inappropriately.
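The kind of sanity check described above is easy to automate. A sketch,
run against the last score column of the HTML_OBFUSCATE lines quoted at
the top of the thread:

```python
def non_monotonic_steps(ladder):
    """Return the indices where a severity ladder's score drops
    instead of rising (ladder ordered least to most severe)."""
    return [i for i in range(1, len(ladder))
            if ladder[i] < ladder[i - 1]]

# HTML_OBFUSCATE_05_10 .. _80_90, last score column from this thread
html = [1.449, 1.770, 3.400, 2.859, 2.640, 2.031, 1.652, 1.472, 0.982]
print(non_monotonic_steps(html))  # -> [3, 4, 5, 6, 7, 8]
```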

just my two cents, anyway

-Adam Katz

Re: Default SpamAssassin scores don't make sense

Posted by "John D. Hardin" <jh...@impsec.org>.
On Mon, 6 Nov 2006, Adam Katz wrote:

> Why do default scores not increase with severity?  For example,
> SpamAssassin 3.1.7 has inconsistent progression of default scores in
> html obfuscation, dates set in the future, and spf marking:

The default scores are generated by analyzing their performance
against hand-categorized corpora of actual emails. If a rule hits spam
often and ham rarely, it will be given a higher score than one that
hits spam often and ham occasionally.

Rule performance against real-world traffic can be counterintuitive,
and the rules' relation to each other isn't necessarily a part of the
analysis.

I'm sure somebody else will chime in with a relevant wiki URL...

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174    pgpk -a jhardin@impsec.org
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  The difference between ignorance and stupidity is that the stupid
  desire to remain ignorant.                             -- Jim Bacon
-----------------------------------------------------------------------
 Tomorrow: the campaign ads stop