Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2022/06/14 08:55:23 UTC

[Bug 8008] New: Unacceptable penalization of XYZ TLD

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8008

            Bug ID: 8008
           Summary: Unacceptable penalization of XYZ TLD
           Product: Spamassassin
           Version: unspecified
          Hardware: PC
                OS: Mac OS X
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Rules
          Assignee: dev@spamassassin.apache.org
          Reporter: collimarco91@gmail.com
  Target Milestone: Undefined

Hello,

We have a legit domain with a .XYZ extension.

SpamAssassin gives a score of -2.8 just for the TLD!

This doesn't really make sense:

1. On the web the authority belongs to first level domains, not to the TLD
2. Evaluating a website based on a TLD is like evaluating a person based on the
skin color (even if a race commits more crimes, it doesn't mean that they are
all criminals).
3. In any case the XYZ TLD sends less spam than the COM TLD, so your
discrimination is against any logic (source:
https://www.spamhaus.org/statistics/tlds/)

Please fix this wrong, illogical, unmotivated behavior.
It's unacceptable that our domain, that never sent a single email of spam, gets
damaged due to a different TLD.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8008] Unacceptable penalization of XYZ TLD

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8008

Bill Cole <bi...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |billcole@apache.org
         Resolution|---                         |INVALID

--- Comment #1 from Bill Cole <bi...@apache.org> ---
(In reply to Marco Colli from comment #0)
> Hello,
> 
> We have a legit domain with a .XYZ extension.
> 
> SpamAssassin gives a score of -2.8 just for the TLD!

The fact that you claim a negative score indicates that you are using something
other than SpamAssassin OR that you are using some broken tool that reverses
the sign of SA scores. Based on nuances of how a domain is used in specific
messages, different rules may hit on different messages that you send. 

Note that 2.8 is NOT enough of a score to make SA mark a message as spam. The
standard threshold is 5.0 and without concrete examples of legitimate mail
scoring over that threshold, we don't consider a match on a rule to constitute
a bug of any sort. SA rules go through a daily QA process that examines
correlations and assigns scores based on real world mail flows, with the
expectation that some rules which correlate to spam will sometimes match
non-spam, and rules that correlate to non-spam will sometimes match spam. 
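
As a rough illustration of the threshold arithmetic described above, here is a
minimal Python sketch. It is not SpamAssassin code: the rule names and the
second score are hypothetical, and the only values taken from this thread are
the 2.8 score and the standard 5.0 threshold. It assumes a message is marked as
spam when the summed score reaches the threshold.

```python
# Illustrative sketch only; rule names below are hypothetical and do not
# come from any SpamAssassin ruleset.
REQUIRED_SCORE = 5.0  # the standard threshold mentioned in this thread


def classify(rule_hits, required_score=REQUIRED_SCORE):
    """Sum the scores of every rule that hit and compare to the threshold.

    rule_hits maps a rule name to the score that rule contributes.
    Returns (total_score, is_spam).
    """
    total = round(sum(rule_hits.values()), 1)
    return total, total >= required_score


# A single TLD-based hit worth 2.8 stays below the threshold by itself.
print(classify({"HYPOTHETICAL_TLD_RULE": 2.8}))

# Only in combination with other hits does the total cross 5.0.
print(classify({"HYPOTHETICAL_TLD_RULE": 2.8, "HYPOTHETICAL_OTHER_RULE": 2.5}))
```

The point of the sketch is that no single sub-threshold rule decides the
outcome; the classification depends on the sum of everything that matched.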

> This doesn't really make sense:
> 
> 1. On the web the authority belongs to first level domains, not to the TLD

SA rules are not based on notional rules of authority; they are based on
real-world experience and correlations, NOT necessarily causation. We make no
effort to evaluate causation or sender intention.

> 2. Evaluating a website

SA does not evaluate any website. SA evaluates email messages, one at a time.

> based on a TLD is like evaluating a person based on
> the skin color (even if a race commits more crimes, it doesn't mean that
> they are all criminals).

Domains are not people. People pay to use domains in specific TLDs of their 
choice. 

Because of past complaints regarding the scoring of .xyz domains, we defined a
specific testing rule in QA that looks only at .xyz domains, apart from the
broad production rules that match a long list of TLDs. As of the latest full
network QA run, over 96% of mail matching that rule is spam at the sites
submitting their data to us. See
https://ruleqa.spamassassin.org/20220604-r1901610-n/T_SCC_TLD_XYZ/detail for
what our contributors' mail flows show. 


> 3. In any case the XYZ TLD sends less spam than the COM TLD, so your
> discrimination is against any logic (source:
> https://www.spamhaus.org/statistics/tlds/)

The Spamhaus stats look at raw volume, which is not relevant to how filtering
is implemented. 

What IS relevant to the SA scoring algorithm is how well rules correlate to the
spam/ham determination. >96% of mail using a .xyz address in the From header is
spam. That's a strong enough correlation to be useful, regardless of the
causality chain that might explain it. 
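
The difference between raw volume and correlation can be sketched in a few
lines of Python. This is not the SA QA or score-assignment code, and every
count below is invented purely for illustration; the sketch only shows why a
TLD that sends less spam in absolute terms can still be the better spam
indicator.

```python
def spam_ratio(spam_hits, ham_hits):
    """Fraction of messages matching a rule that were classified as spam."""
    total = spam_hits + ham_hits
    if total == 0:
        raise ValueError("rule matched no messages")
    return spam_hits / total


# Hypothetical counts, chosen only to illustrate the argument.
small_tld = {"spam": 960, "ham": 40}          # low volume, mostly spam
large_tld = {"spam": 50_000, "ham": 950_000}  # high volume, mostly ham

for name, counts in (("small_tld", small_tld), ("large_tld", large_tld)):
    ratio = spam_ratio(counts["spam"], counts["ham"])
    print(f"{name}: {counts['spam']} spam messages, spam ratio {ratio:.2f}")
```

In this made-up data the large TLD sends far more spam by volume, yet a match
on the small TLD is a much stronger signal that a given message is spam.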

> Please fix this wrong, illogical, unmotivated behavior.

No. The relevant existing rules are good enough to retain without modification. 

> It's unacceptable that our domain, that never sent a single email of spam,
> gets damaged due to a different TLD.

If you have a concrete example of an actual mis-classification of legitimate
requested email by a site running SpamAssassin with the standard 5.0 threshold,
please provide it so that we can determine the best way to avoid that
false-positive result. The fact of a mislabeled sub-threshold score from an
unknown testing source is not an actionable issue. 

SA is operating AS DESIGNED.
