Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2013/04/09 02:02:43 UTC

[Bug 6926] New: cw, sx TLDs (and possibly others) not recognized

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6926

            Bug ID: 6926
           Summary: cw, sx TLDs (and possibly others) not recognized
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Libraries
          Assignee: dev@spamassassin.apache.org
          Reporter: julian@mehnle.net
    Classification: Unclassified

http://www.iana.org/domains/root/db lists a couple of ccTLDs that are not
recognized by SpamAssassin.  The two that occurred to me are cw and sx.  I
think they should be added to RegistrarBoundaries.pm.  Perhaps a comprehensive
sync with IANA is in order?  Do they provide their TLD list in machine readable
format?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 6926] cw, sx TLDs (and possibly others) not recognized

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6926

Joe Quinn <jq...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |jquinn+SAbug@pccc.com
         Resolution|---                         |FIXED

--- Comment #8 from Joe Quinn <jq...@pccc.com> ---
Added the two mentioned TLDs with revision 1575917.

Higher-level improvements to RegistrarBoundaries.pm likely belong on bug 6782.
I've cross-referenced this ticket, so we can continue discussion there.


[Bug 6926] cw, sx TLDs (and possibly others) not recognized

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6926

--- Comment #2 from Adam Katz <an...@khopis.com> ---
(In reply to comment #1)
> Which currently reveals we're missing:
> 
>     bl bq bv cw eh gb mf post sj ss sx um

Ah, now I see the comment in RegistrarBoundaries.pm:

# The following have been removed from the list because they are
# inactive, as can be seen in the Wikipedia articles about them
# as of 2008-02-08, e.g. http://en.wikipedia.org/wiki/.so_%28domain_name%29
#     bv gb pm sj so um yt

A quick summary of all the candidates:

bl - Saint Barthélemy (French), reserved but unassigned
bq - Caribbean Netherlands, designated but not yet used
bv - Bouvet Island (Norway), reserved and sponsored but unused
cw - Curaçao, new and in use, http://www.una.cw/cw_registry/
eh - Western Sahara, no recognized government, reserved but not in use
gb - Great Britain, abandoned except ~3 hosts under .dra.hmg.gb
mf - Saint Martin (French), reserved but unassigned
post - Universal Postal Union (snail mail), new as of August
sj - Svalbard and Jan Mayen (Norway), reserved and sponsored but unused
ss - South Sudan, new and registered, still pending (nation formed 2011-07)
sx - Sint Maarten (Netherlands), new, in use (open), http://registry.sx/
um - US Minor Outlying Islands (USA), revoked (why is it listed?)

It looks like it's still valid to avoid bv, gb, sj, and um.  We should add cw,
post, and sx.  The others may be on their way, but don't need inclusion quite
yet.

Given how .sx has open registration and looks like "sex," I expect it to
attract porn sites, which means spam.  We definitely want to include that one.

You can find these on Wikipedia at URLs like https://en.wikipedia.org/wiki/.so
(note the dot, also note that the disambiguation page from 2008 is gone) and
similarly at IANA at URLs like https://www.iana.org/domains/root/db/so.html


[Bug 6926] cw, sx TLDs (and possibly others) not recognized

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6926

Quanah Gibson-Mount <mi...@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mishikal@yahoo.com

--- Comment #4 from Quanah Gibson-Mount <mi...@yahoo.com> ---
We are seeing SPOOF_COM2COM and SPOOF_COM2OTH being triggered by domains on
.org.pe


[Bug 6926] cw, sx TLDs (and possibly others) not recognized

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6926

--- Comment #6 from Quanah Gibson-Mount <mi...@yahoo.com> ---
never mind, .pe isn't a new domain. :P


[Bug 6926] cw, sx TLDs (and possibly others) not recognized

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6926

--- Comment #5 from AXB <ax...@gmail.com> ---
(In reply to comment #4)
> We are seeing SPOOF_COM2COM and SPOOF_COM2OTH being triggered by domains on
> .org.pe

not relevant to this bug

please use the SA users list.


[Bug 6926] cw, sx TLDs (and possibly others) not recognized

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6926

Julian Mehnle <ju...@mehnle.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |julian@mehnle.net

--- Comment #7 from Julian Mehnle <ju...@mehnle.net> ---
Per comment #2, will anything be done about cw and sx?


[Bug 6926] cw, sx TLDs (and possibly others) not recognized

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6926

Adam Katz <an...@khopis.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |antispam@khopis.com

--- Comment #1 from Adam Katz <an...@khopis.com> ---
> Do they provide their TLD list in machine readable format?

How about:

    $ wget -qqO- https://www.iana.org/domains/root/db \
      |perl -ne 'if (m"/root/db/([^.]+)\.html") { print "$1\n" }' \
      > tld.txt

After that, you can run:

    $ sed '/^  ac ad/,/^  zm zw/!d; s/^  //; s/  */\n/g' \
      lib/Mail/SpamAssassin/Util/RegistrarBoundaries.pm \
        |grep -vwFf- tld.txt

Which currently reveals we're missing:

    bl bq bv cw eh gb mf post sj ss sx um
    (plus all the punycode IDNs, unless we track them elsewhere)

(I also ran the opposite.  We don't have any TLDs that aren't on IANA's list.)
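
Both directions of that comparison can be sketched with comm(1).  This is
only a sketch: sa_tld.txt is a hypothetical file holding the output of the
sed command above (one TLD per line), tld.txt is the IANA list fetched
earlier, and the function name is made up for illustration.

```shell
# only_in_second A B: print lines that appear in file B but not in file A.
# Both files are expected to hold one TLD per line; order does not matter.
only_in_second() {
  a_sorted=$(mktemp) || return 1
  b_sorted=$(mktemp) || return 1
  # comm(1) needs sorted input; LC_ALL=C keeps sort and comm in agreement.
  LC_ALL=C sort -u "$1" > "$a_sorted"
  LC_ALL=C sort -u "$2" > "$b_sorted"
  # -1 drops lines unique to A, -3 drops lines common to both files.
  LC_ALL=C comm -13 "$a_sorted" "$b_sorted"
  rm -f "$a_sorted" "$b_sorted"
}
```

Running "only_in_second sa_tld.txt tld.txt" would list TLDs that IANA has and
SpamAssassin lacks; swapping the two arguments gives the opposite check.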

We'll have to add these via util_rb_tld in sa-update in addition to
RegistrarBoundaries.pm so users don't have to wait for SA 3.4.0 to get this.
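
As an illustration only, such a stopgap might be a single configuration line
in a rules file shipped through sa-update.  Which file it would live in is an
assumption here, and the three TLDs are simply picked from the missing ones
listed above.

```
# Hypothetical rules-file entry; the file placement is an assumption.
util_rb_tld cw post sx
```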

While on the ~tld topic, I see we don't yet include
https://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
(for 2tld and 3tld).  I haven't vetted that to see if it's worthwhile, but in
doing some research a while back, it looked ideal.
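
To get a feel for what that file would contribute, here is a rough sketch.
It assumes the file has been saved locally (effective_tld_names.dat below is
just a local file name) and that it follows the usual Public Suffix List
layout: "//" comments, "*." wildcard rules, and "!" exception rules.  The
function name is hypothetical, and wildcard and exception rules are skipped
rather than interpreted.

```shell
# psl_by_labels N FILE: print plain suffix rules that have exactly N labels
# (N=2 for 2tld-style entries such as co.uk, N=3 for 3tld-style entries).
psl_by_labels() {
  awk -v n="$1" '
    /^\/\// || NF == 0 { next }      # skip comment lines and blank lines
    $1 ~ /^[*!]/       { next }      # skip wildcard and exception rules
    split($1, labels, ".") == n { print $1 }   # rule ends at first whitespace
  ' "$2"
}
```

For example, "psl_by_labels 2 effective_tld_names.dat" would print the
two-label suffixes and "psl_by_labels 3 effective_tld_names.dat" the
three-label ones.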


[Bug 6926] cw, sx TLDs (and possibly others) not recognized

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6926

D. Stussy <so...@kd6lvw.ampr.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |software+spamassassin@kd6lv
                   |                            |w.ampr.org

--- Comment #3 from D. Stussy <so...@kd6lvw.ampr.org> ---
Periodically walking the root zone's NSEC-RR list may be the best way to
regenerate the TLD part of the list, especially as we only care about currently
resolvable TLDs.  From there, the automated process would make any necessary
modifications (would these all be additions?).
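
A rough sketch of such a walk with dig follows.  It rests on several
assumptions: that the queried server is authoritative for the root zone and
answers NSEC queries for TLD names, that a.root-servers.net is a reasonable
default, and that the function name is only illustrative.  How a given
server or resolver treats an NSEC query at a delegation point has not been
verified here, so treat it as a starting point.

```shell
# walk_root_nsec [SERVER]: print each TLD found by following the root zone's
# NSEC chain, starting at the root and stopping when the chain wraps around.
walk_root_nsec() {
  server=${1:-a.root-servers.net}   # assumed default, override as needed
  owner=.
  while :; do
    # The first field of an NSEC record's rdata is the next owner name.
    next=$(dig +short +norecurse "@$server" "$owner" NSEC \
             | awk 'NR == 1 { print $1 }')
    # Stop on an empty answer, a wrap back to the root, or a stuck chain.
    if [ -z "$next" ] || [ "$next" = "." ] || [ "$next" = "$owner" ]; then
      break
    fi
    printf '%s\n' "${next%.}"       # strip the trailing dot
    owner=$next
  done
}
```

Something like "walk_root_nsec > tld.txt" would then produce a list that can
be compared against the one in RegistrarBoundaries.pm.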
