Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2008/02/07 15:49:30 UTC

[Bug 5813] several TLDs are not parsed by URI text scanner in PerMsgStatus.pm

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5813





------- Additional Comments From sidney@sidney.com  2008-02-07 06:49 -------
Looking at the list I posted in the description I noticed that .xxx is there
even though it is not a valid TLD. That led me to compare the table in
Util/RegistrarBoundaries.pm with the official list in
http://data.iana.org/TLD/tlds-alpha-by-domain.txt

I haven't done a careful diff, but I do notice that we include .cs, which is not
an official TLD, as well as .xxx, so the table is not correct. Does anyone have
any objection to my simply snarfing the official list into
Util/RegistrarBoundaries.pm, minus the 11 IDN .test TLDs (those are the ones
that look like XN--0ZWM56D and are in the root table only for an IDN test that
was recently run)?
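
For anyone who wants to do the careful diff, here is a rough sketch of the
comparison in Python. It is only an illustration, not code from SpamAssassin:
the function names and the sample data are made up, and it assumes that every
XN-- entry in the IANA file is one of the IDN test TLDs, which should be
checked against the real file before relying on it.

```python
def parse_iana_list(text):
    """Return the set of lowercase TLDs found in tlds-alpha-by-domain.txt content."""
    tlds = set()
    for line in text.splitlines():
        line = line.strip()
        # The file starts with a '#' comment line; skip comments and blanks.
        if not line or line.startswith("#"):
            continue
        tlds.add(line.lower())
    return tlds


def drop_idn_entries(tlds):
    """Drop entries that look like XN--0ZWM56D (assumed to be the IDN test TLDs)."""
    return {t for t in tlds if not t.startswith("xn--")}


def compare(local, official):
    """Return (entries only in the local table, entries only in the official list)."""
    return sorted(local - official), sorted(official - local)


# Hypothetical excerpt standing in for the downloaded IANA file.
SAMPLE_IANA = """# illustrative excerpt, not the real file
COM
NET
ORG
XN--0ZWM56D
"""

# Hypothetical stand-in for the table in Util/RegistrarBoundaries.pm.
SAMPLE_LOCAL = {"com", "net", "org", "cs", "xxx"}

official = drop_idn_entries(parse_iana_list(SAMPLE_IANA))
extra, missing = compare(SAMPLE_LOCAL, official)
print("in local table but not official:", extra)
print("official but missing locally:", missing)
```

With the real file and the real table substituted for the samples, the first
list would show entries such as .cs and .xxx, and the second would show any
official TLDs that the scanner currently misses.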

Also, the comment in PerMsgStatus.pm says that the regexp was generated using
Regexp::Optimizer, but the Regexp::Optimizer documentation suggests using
Regexp::List when you are dealing with a plain list of words with no
metacharacters, which is what we have here. Does anyone know a reason why we
should stick with Regexp::Optimizer?
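
To make the "plain list of words" case concrete, here is a rough Python sketch
of turning a word list into a compact alternation by merging common prefixes.
It illustrates the general idea only. It is not the algorithm used by either
Perl module, and the helper names and the sample word list are hypothetical.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie


def trie_to_pattern(node):
    """Convert a trie node into a regex fragment that shares common prefixes."""
    # A node holding only the end marker adds nothing to the pattern.
    if list(node) == [""]:
        return ""
    optional = "" in node  # a word ends here, but longer words continue
    alternatives = [
        re.escape(ch) + trie_to_pattern(node[ch])
        for ch in sorted(k for k in node if k != "")
    ]
    if len(alternatives) == 1 and not optional:
        return alternatives[0]
    body = "(?:" + "|".join(alternatives) + ")"
    return body + "?" if optional else body


# Hypothetical sample; the real input would be the full TLD table.
words = ["com", "coop", "cs", "net"]
pattern = trie_to_pattern(build_trie(words))
tld_re = re.compile(pattern, re.IGNORECASE)

print(pattern)
print([w for w in ["com", "COOP", "co", "xyz"] if tld_re.fullmatch(w)])
```

Since the input has no metacharacters, nothing needs to be parsed as a regexp
first; the list can be processed directly as words, which is the situation
described above.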



