Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2013/04/11 03:13:39 UTC

[Bug 6926] cw, sx TLDs (and possibly others) not recognized

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6926

Adam Katz <an...@khopis.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |antispam@khopis.com

--- Comment #1 from Adam Katz <an...@khopis.com> ---
> Do they provide their TLD list in machine readable format?

How about:

    $ wget -qqO- https://www.iana.org/domains/root/db \
      |perl -ne 'if (m"/root/db/([^.]+)\.html") { print "$1\n" }' \
      > tld.txt

After that, you can run:

    $ sed '/^  ac ad/,/^  zm zw/!d; s/^  //; s/  */\n/g' \
      lib/Mail/SpamAssassin/Util/RegistrarBoundaries.pm \
        |grep -vwFf- tld.txt

Which currently reveals we're missing:

    bl bq bv cw eh gb mf post sj ss sx um
    (plus all the punycode IDNs, unless we track them elsewhere)

(I also ran the opposite.  We don't have any TLDs that aren't on IANA's list.)
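
For reference, both directions of that comparison can be sketched as one small
shell function. This is only an illustration: tld_diff and ours.txt are made-up
names, and it assumes each file holds one lowercase TLD per line. It uses
grep -x (whole-line match) where the command above uses -w, so that a short
entry cannot, in principle, match part of a hyphenated punycode name.

```shell
#!/bin/sh
# Sketch only: tld_diff is a hypothetical helper, not part of SpamAssassin.
# Usage: tld_diff REFERENCE LIST
# Prints every line of LIST that has no exact match in REFERENCE.
tld_diff() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # grep exits 1 when it prints nothing; treat that case as success.
    grep -vxFf "$1" "$2" || [ "$?" -eq 1 ]
}

# Assuming ours.txt holds the list extracted from RegistrarBoundaries.pm:
#   tld_diff ours.txt tld.txt    # on IANA's list but not ours
#   tld_diff tld.txt ours.txt    # on our list but not IANA's
```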

We'll have to add these via util_rb_tld in sa-update, in addition to
RegistrarBoundaries.pm, so users don't have to wait for SA 3.4.0 to get them.
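
As a rough sketch of how a list of missing TLDs could be turned into such a
line, assuming util_rb_tld takes a space-separated list of TLDs. The helper
name rb_tld_line and the file missing.txt are hypothetical, and where the
resulting line should live in the sa-update rules is not covered here.

```shell
#!/bin/sh
# Sketch only: rb_tld_line is a hypothetical helper, not part of SpamAssassin.
# Usage: rb_tld_line FILE
# Reads FILE (one TLD per line) and prints a single util_rb_tld line.
rb_tld_line() {
    # Print nothing for a missing or empty list.
    [ -s "$1" ] || return 0
    # Join the lines with spaces, then trim the trailing space.
    printf 'util_rb_tld %s\n' "$(tr '\n' ' ' < "$1" | sed 's/ *$//')"
}

# Assuming missing.txt holds the missing TLDs, one per line:
#   rb_tld_line missing.txt
```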

While on the TLD topic, I see we don't yet include Mozilla's effective TLD list,
https://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
(for 2tld and 3tld).  I haven't vetted it to see if it's worthwhile, but when I
did some research a while back, it looked ideal.
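
One rough way to see what that file would contribute at each level, assuming
its usual layout: one rule per line, comments starting with //, wildcard rules
starting with *. and exception rules starting with !. The helper name
psl_by_depth is hypothetical, and wildcard and exception rules are simply
skipped, so this is an exploration aid rather than a complete parser.

```shell
#!/bin/sh
# Sketch only: psl_by_depth is a hypothetical helper for exploring the file.
# Usage: psl_by_depth DEPTH FILE
# Prints the plain rules in FILE that have exactly DEPTH dot-separated labels.
psl_by_depth() {
    awk -F. -v depth="$1" '
        /^\/\// { next }        # comment line
        /^[ \t]*$/ { next }     # blank line
        /^[*!]/ { next }        # wildcard or exception rule, ignored here
        NF == depth { print }   # keep rules with the requested label count
    ' "$2"
}

# Assuming the list was saved locally as effective_tld_names.dat:
#   psl_by_depth 2 effective_tld_names.dat    # 2tld candidates
#   psl_by_depth 3 effective_tld_names.dat    # 3tld candidates
```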
