Posted to dev@spamassassin.apache.org by Jeff Chan <je...@surbl.org> on 2004/04/22 00:43:18 UTC

Re: [SURBL-Discuss] ccTLDs and multiple queries

On Wednesday, April 21, 2004, 3:12:58 PM, Eric Kolve wrote:
> On Wed, Apr 21, 2004 at 03:00:52PM -0700, Jeff Chan wrote:
>> On Wednesday, April 21, 2004, 12:21:16 PM, John Fawcett wrote:
>> > From: "Eric Kolve"
>> >> Initially, when I released spamcopuri I decided to pretty much ignore
>> >> whether the TLD was a country code or not.  This results in about
>> >> twice as many queries as necessary, but guaranteed you would get
>> >> hits if the domain was listed.
>> >>
>> >> Now that people are pointing this at other RBLs besides just SURBL,
>> >> should we continue to do second and third level queries? Or just
>> >> the query that we assume to be necessary?  My concern is that not
>> >> all RBLs will process the domains according to a list such as
>> >> http://www.bestregistrar.com/help/ccTLD.htm.  I suppose the worst
>> >> case scenario is we end up getting a miss when we should be getting
>> >> a hit because one side presumes that say TLD .za has a subdomain 'foo',
>> >> when the server doesn't.  The server side would expect a second level,
>> >> while the client would do a third level query (this is why I wanted the
>> >> wildcard records).  I guess this really isn't that great a consequence
>> >> considering the savings and the fact that this shouldn't occur very often.
>> >>
>> >> I will go ahead and make this change if everyone is comfortable with the
>> >> known risk.
>> >>
>> > I think if an rhsbl is listing a second level registry domain
>> > (like .co.uk) then I think it's up to the list maintainer to implement
>> > the wild card so that xxxxx.co.uk returns an A record. I wouldn't
>> > worry about taking into account such an extreme case,
>> > since I cannot imagine any list wanting to do such widespread
>> > blocking.
>> 
>> Yes, the two level ccTLDs like co.uk should never get into a
>> SURBL.  Only registrar-type domains should, like foo.co.uk.
>> 
>> > I believe there should be a mechanism which distinguishes whether
>> > a second or third level lookup is required based on a static
>> > lists of domains known to have or not have subdomains.
>> > If nothing is known then the default should be to check both
>> > second and third as at present.
>> 
>> Aha, now I think I understand what's being proposed.
>> 
>> Currently SpamCopURI checks all domains at the second
>> and third level against a given SURBL, regardless of
>> whether the domain is in a ccTLD or not.
>> 
>> It sounds like Eric is proposing a change, where if a domain is
>> in the ccTLD list like co.uk, then the client should try to
>> extract and check a three level domain like foo.co.uk.  Otherwise
>> it should check two levels like foo.com.
>> 
>> Is that right?  If so it may be ok, though our list of ccTLDs is
>> slightly underspecified (there are some ccTLDs not in it).  Note
>> that my ccTLD list:

> Yes.  This is exactly what I am proposing.

Kewl.  Sounds good to me.  I'm cc'ing the SpamAssassin developers
to compare notes on how they're handling ccTLDs in message body
URI checks.

>>   http://spamcheck.freeapp.net/two-level-tlds
>> 
>> is (derived from but) slightly more complete than the one at
>> http://www.bestregistrar.com/help/ccTLD.htm ....
>> 
>> Worst case is that we miss a few ccTLDs.  Probably not too big a
>> deal given that most of the spam domains are .com, .biz, etc.
>> 
>> I believe Eric is also making a finer point that other SURBL data
>> sources may miss some unexpected geographic domains where foo.za
>> occurred and only two-level base-ccTLDs like foo.com.za were
>> expected. Not sure how to handle unusual cases like that.  I
>> suppose we'll need to rely on the country code authorities to be
>> somewhat consistent with respect to what levels they will allow
>> in their ccTLD.
>> 
>> Philosophical point: it's always possible that some spam domains
>> slip through the cracks, but if that happens often enough and
>> we spot them, we can always blacklist them manually.  Perfection
>> may not be possible, but we're certainly greatly increasing the
>> spam detection rates with this approach overall.

> My only concern is that we leave a wide enough hole that
> we end up playing catch-up while spammers run through various ccTLDs
> that we have mis-classified, using them for links.

Aha, but if a domain is not in the ccTLD list, won't we check it
on two levels on both the client and server sides and therefore
catch it?

In other words if somenewspamdomain.bg comes up, and it's not
in our ccTLD list, our client and server programs will
automatically test it as a two level domain and eventually catch
it.  In that case I think we're ok, and the only danger is
blocking new legitimate two level ccTLDs that we're not yet
aware of like newlegitimatetld.bg .
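
The rule discussed in this thread can be sketched roughly as follows.
This is only an illustration in Python, not SpamCopURI's actual code:
the function name registrar_domain and the three-entry TWO_LEVEL_TLDS
sample are hypothetical, and a real client would load the full
two-level-tlds list instead.

```python
# Hypothetical sample only; a real client would load the full
# two-level ccTLD list rather than hard-coding a few entries.
TWO_LEVEL_TLDS = {"co.uk", "com.za", "com.au"}


def registrar_domain(hostname, two_level_tlds=TWO_LEVEL_TLDS):
    """Return the domain to look up for a hostname found in a URI.

    If the last two labels form a known two-level ccTLD (co.uk),
    keep three labels (foo.co.uk).  Otherwise keep two (foo.com).
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in two_level_tlds:
        return ".".join(labels[-3:])
    # Anything not in the list falls back to a two level check,
    # which is the default case described above.
    return ".".join(labels[-2:])
```

With this sketch, www.foo.co.uk reduces to foo.co.uk and www.foo.com
to foo.com.  A name under an unlisted ccTLD such as
somenewspamdomain.bg is checked at two levels, so the same rule has to
be applied on the server side for the lookups to match.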


Jeff C.