Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2022/02/14 18:56:52 UTC

[Bug 7953] Inconsistent penalizing of TLD

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7953

Bill Cole <bi...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |billcole@apache.org
         Resolution|---                         |INVALID
             Status|NEW                         |RESOLVED

--- Comment #1 from Bill Cole <bi...@apache.org> ---
Citing mail-tester.com in a bug report here REDUCES your credibility. That site
is NOT an accurate representation of SpamAssassin scoring in the wild. They do
a terrible job of staying up to date and have at times had rules and scores
that appear to be entirely invented and/or scored locally. Their verbiage
clearly encourages an incorrect view of how SpamAssassin is designed to work:
we get far too many "bug reports" prompted by that site which are entirely
non-actionable non-bugs.

Like this one.

The statistics gathered and published by Spamhaus may be useful to you, or to
Spamhaus, but they are entirely irrelevant to the publication and scoring of
SpamAssassin rules. Most importantly, those stats count domain names, not
messages. What matters for spam filtering is whether a message is spam, not
whether a domain is in some way associated with spam. To illustrate, if
foo.space and bar.space were the only "bad" .space domains but together sent
100 times as much (all spam) mail as all other .space domains combined, it
would be useful (albeit sloppy, at that scale) to treat all .space mail as more
likely to be spam than not, even if 99.999% of .space domains never sent any
spam.
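
A minimal sketch of that arithmetic, with made-up numbers: the domain count and
the clean-mail volume below are assumptions chosen only to match the
illustration (two bad domains, 99.999% clean domains, 100 times the mail
volume). It shows how a per-domain statistic and a per-message statistic can
point in opposite directions for the same TLD.

```python
# Hypothetical figures; only the ratios come from the illustration above.
total_domains = 200_000       # assumed number of .space domains
bad_domains = 2               # foo.space and bar.space
clean_mail = 1_000            # assumed messages from all other .space domains
bad_mail = 100 * clean_mail   # the two bad domains send 100x as much, all spam

# Per-domain view: what share of domains is associated with spam?
bad_domain_share = bad_domains / total_domains

# Per-message view: what share of .space mail is spam?
spam_message_share = bad_mail / (bad_mail + clean_mail)

print(f"domains sending spam: {bad_domain_share:.3%}")    # 0.001%
print(f"messages that are spam: {spam_message_share:.1%}")  # 99.0%
```

A filter sees messages, not domains, so the second number is the one that
matters for rule scoring.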

See https://ruleqa.spamassassin.org for the details of how our rules score
against the manually classified corpora of ham and spam provided by some of our
users. This is an open system and we are always eager to add new dependable
sources to those corpora to get a wider sample. That system shows that the
rules you consider problematic match messages that are 97-100% spam.

Our default ruleset is published daily, based on the operation of that RuleQA
system. Inclusion and scoring of rules are controlled by that system
programmatically, with some manual limits to reduce false positives. SA is
*designed* *intentionally* to have rules whose scores are well below the spam
threshold (5 by default) match on non-spam messages. The fact that a trio of
related rules adds 2.225 points to a non-spam message's score is not a bug. ALL
messages are expected to match multiple rules, some good and some bad. Unless
there is concrete evidence of messages being broadly misclassified as spam,
SpamAssassin is functioning as designed. 
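
As a rough sketch of the additive model described above (not SpamAssassin's
actual code): the rule names and the per-rule split are hypothetical, and only
the 2.225 total and the default threshold of 5 come from this comment. The
sketch also assumes a message is flagged once its total reaches the threshold.

```python
SPAM_THRESHOLD = 5.0  # default threshold mentioned above


def classify(rule_scores, threshold=SPAM_THRESHOLD):
    """Sum the scores of all matched rules and compare with the threshold."""
    total = sum(rule_scores.values())
    # Assumption: reaching the threshold is enough to flag the message.
    return total, total >= threshold


# Hypothetical rule names and per-rule scores; the trio sums to 2.225.
tld_trio = {
    "HYPOTHETICAL_TLD_RULE_1": 1.0,
    "HYPOTHETICAL_TLD_RULE_2": 0.725,
    "HYPOTHETICAL_TLD_RULE_3": 0.5,
}
trio_total = sum(tld_trio.values())

# A non-spam message typically matches other rules too, good and bad.
matched = dict(tld_trio, HYPOTHETICAL_GOOD_RULE=-0.1)
total, is_spam = classify(matched)

print(f"trio adds {trio_total:.3f}, message total {total:.3f}, spam: {is_spam}")
```

With these invented numbers the message stays well under the threshold, which
is the expected outcome for a rule set whose individual scores are kept low.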


Bottom line: NOT A BUG.

-- 
You are receiving this mail because:
You are the assignee for the bug.