Posted to commits@spamassassin.apache.org by sp...@incubator.apache.org on 2004/07/12 18:56:13 UTC
[SpamAssassin Wiki] Updated: HowScoresAreAssigned

   Date: 2004-07-12T09:56:13
   Editor: 12.148.252.66 <>
   Wiki: SpamAssassin Wiki
   Page: HowScoresAreAssigned
   URL: http://wiki.apache.org/spamassassin/HowScoresAreAssigned

   no comment

Change Log:

------------------------------------------------------------------------------
@@ -1,9 +1,9 @@
 = How are the scores assigned? =
 
-SA's scores are assigned using a genetic algorithm (GA), to optimise their efficiency and minimise false positives and false negatives. More information can be found on the 'Tests' page. Note that you can help this system by providing statistics on your mail spool.
+In SpamAssassin 2.x, the scores are assigned using a genetic algorithm (GA).  In SpamAssassin 3.x, the scores are assigned using a perceptron learner.  Both systems attempt to optimise the efficiency of the rules which are run, while also minimising the number of false positives and false negatives. More information can be found on the 'Tests' page. Note that you can help this system by providing statistics on your mail spool.
 
 Some DNS blacklist rules are distributed with scores of 0. These generally request or require payment, and as such are disabled by default. Feel free to enable the lookups, if you've paid for them.
 
 A score of 0 will stop a rule from being run.
 
-Note: Scores for "learn" rules, such as BAYES_*, that rate the probability that a message is spam, are scored using the same method. This can produce "confusing" scores, for instance, that have BAYES_80 with a higher score than BAYES_99. There are a few reasons for this. 1) The GA does not understand that BAYES_* are related to one another, they're separate rules that need separate scores. 2) More importantly, the higher the probability from a "learn" rule, the higher likelihood that the message also hit a bunch of other rules. This lets the GA lower the "learn" rule score due to the inevitable false positive, while also still marking the message as spam via the other rule scores. 
+Note: "Learn" rules such as BAYES_*, which rate the probability that a message is spam, are scored using the same method as every other rule. This can produce "confusing" scores, for instance BAYES_80 receiving a higher score than BAYES_99. There are two reasons for this. 1) The score generation system does not understand that the BAYES_* rules are related to one another; it treats them as separate rules that need separate scores. 2) More importantly, the higher the probability from a "learn" rule, the higher the likelihood that the message also hit a bunch of other rules. This lets the score generation system lower the "learn" rule score to account for the inevitable false positive, while still marking the message as spam via the sum of all rule scores.
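
The behaviour described in the note can be illustrated with a small Python sketch. It is not SpamAssassin code: every score value, the HYPOTHETICAL_* rule names, and the helper functions are assumptions made up for illustration, and the 5.0 threshold is assumed rather than taken from the page. It shows only two points from the text: a rule with a score of 0 is not run, and a message is judged on the sum of all rule scores, so BAYES_99 can carry a lower score than BAYES_80 and still contribute to a spam verdict.

```python
# Hypothetical scores for illustration only; not SpamAssassin's shipped values.
SCORES = {
    "BAYES_80": 2.0,
    "BAYES_99": 1.5,                  # lower than BAYES_80, as in the note
    "HYPOTHETICAL_BODY_RULE": 2.5,
    "HYPOTHETICAL_HEADER_RULE": 1.8,
    "HYPOTHETICAL_PAID_DNSBL": 0.0,   # score 0: the rule is disabled
}

REQUIRED_SCORE = 5.0  # assumed spam threshold


def active_rules(scores):
    """Return the names of rules that would be run (non-zero score)."""
    return {name for name, score in scores.items() if score != 0}


def total_score(hits, scores):
    """Sum the scores of the rules a message hit, skipping disabled rules."""
    enabled = active_rules(scores)
    return sum(scores[rule] for rule in hits if rule in enabled)


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """A message is marked as spam when its summed score reaches the threshold."""
    return total_score(hits, scores) >= required


# A message with a very high Bayes probability usually hits other rules too.
strong_spam = ["BAYES_99", "HYPOTHETICAL_BODY_RULE", "HYPOTHETICAL_HEADER_RULE"]
# A message that hits BAYES_80 and nothing else stays under the threshold.
lone_bayes = ["BAYES_80"]

print(is_spam(strong_spam, SCORES), is_spam(lone_bayes, SCORES))
```

In this sketch the BAYES_99 message is marked as spam because the other rules it hit carry most of the weight, while the message that hit only BAYES_80 is not, despite BAYES_80 having the higher individual score.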