Posted to dev@spamassassin.apache.org by Jeff Chan <je...@surbl.org> on 2004/04/19 03:49:33 UTC

Re: [SURBL-Discuss] RFC: SURBL software implementation guidelines

On Sunday, April 18, 2004, 6:08:11 PM, Simon Byrnand wrote:
> At 12:43 19/04/2004, Jeff Chan wrote:
>> >    2. Extract base (registrar) domains from those URIs. This
>> > includes removing any and all leading host names, subdomains,
>> > www., randomized subdomains, etc. In order to determine the
>> > base domain it may be necessary to use a table of country code
>> > TLDs (ccTLDs) such as the partially incomplete one SURBL uses.

[...]
> If a spammer were to register a domain in NZ it would look like:

> spammer.co.nz or spammer.net.nz or spammer.gen.nz etc.... randomised 
> subdomains that they could create on their own nameservers would look like 
> a65423xyz.spammer.co.nz or awef3242.fssf342.spammer.co.nz etc...

> Will the current code (of both SpamCopURI and the backend processing of
> the surbl servers, for that matter) incorrectly strip this down to co.nz? I
> ask because I have definitely seen DNS queries from SpamCopURI trying to
> look up co.nz.sc.surbl.org, which is wrong: that would cover a large
> fraction of the websites under the NZ domain hierarchy. It should be looking
> up spammer.co.nz, never co.nz.

> Is there any reliable way for the code to know what a base registrar domain
> is and how many tiers there are under that domain hierarchy? (May also be a
> non-trivial problem.)

The traditional solution to ccTLDs (Country Code TLDs) seems to
be to make a table of them, and make sure any extracted domains
are one domain level longer than the table entry.  So for
company.co.nz, don't take co.nz as the base domain, but instead
use company.co.nz, since we know from the table that co.nz is a
two-level country code TLD.
My slightly incomplete table of ccTLDs is at:

  http://spamcheck.freeapp.net/two-level-tlds
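
A minimal sketch of that table lookup, in Python rather than the
Perl that SpamAssassin uses.  The TWO_LEVEL_TLDS set is a
hypothetical four-entry stand-in for the full table, and the
function names base_domain and surbl_query_name are made up for
illustration; none of this is taken from SpamCopURI or the SURBL
data engine.

```python
# Hypothetical stand-in for the two-level ccTLD table; the real
# table has many more entries.
TWO_LEVEL_TLDS = {"co.nz", "net.nz", "gen.nz", "school.nz"}


def base_domain(hostname):
    """Return the base (registrar) domain of hostname, or None.

    None means the hostname is itself a bare TLD or two-level
    ccTLD (e.g. "co.nz") and should never be looked up.
    """
    labels = hostname.lower().rstrip(".").split(".")
    last_two = ".".join(labels[-2:])
    # Known two-level ccTLD: keep one more level than the table entry.
    keep = 3 if last_two in TWO_LEVEL_TLDS else 2
    if len(labels) < keep:
        return None
    return ".".join(labels[-keep:])


def surbl_query_name(hostname, zone="sc.surbl.org"):
    """Build the DNS name to query, or None if there is nothing to ask."""
    base = base_domain(hostname)
    return None if base is None else base + "." + zone


print(surbl_query_name("awef3242.fssf342.spammer.co.nz"))
print(surbl_query_name("co.nz"))
```

With this, the randomised subdomains collapse to
spammer.co.nz.sc.surbl.org, and a bare co.nz produces no query
at all.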

I think SpamAssassin (3.0?) in general has code to do that.
I'm sure SpamCop's internal processing of URIs also takes it into
account.  I'm not sure how Eric's SpamCopURI currently handles
it.  I do know that the current sc.surbl.org data engine will
capture them correctly.  I have somewhat of a kludge to get
rid of the two-level ccTLDs that would otherwise get through:
I let the engine do all the processing on them, then suppress
their output with a whitelist which includes the two-level
ccTLD domains.  It would probably be better to increase the
cutoff to three levels instead of two in my code whenever
handling a two-level ccTLD such as co.nz.  That would prevent
the processing of two-level ccTLDs themselves in the first place
while still leaving the processing of longer ccTLD domains
(i.e. complete ones like company.co.nz) in place.
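
The two approaches can be compared in a short Python sketch.  The
engine's internals are not described here, so the sample input,
the TWO_LEVEL_TLDS set and both function names are hypothetical.

```python
# Hypothetical subset of the two-level ccTLD table.
TWO_LEVEL_TLDS = {"co.nz", "net.nz"}


def suppress_with_whitelist(candidates):
    """Kludge: process everything, then drop bare two-level ccTLDs."""
    return [d for d in candidates if d not in TWO_LEVEL_TLDS]


def candidate_with_cutoff(hostname):
    """Alternative: choose the cutoff before a candidate is produced."""
    labels = hostname.lower().split(".")
    cutoff = 3 if ".".join(labels[-2:]) in TWO_LEVEL_TLDS else 2
    if len(labels) < cutoff:
        return None  # a bare two-level ccTLD never becomes a candidate
    return ".".join(labels[-cutoff:])


# Made-up engine output containing a stray co.nz entry.
engine_output = ["company.co.nz", "co.nz", "example.com"]
print(suppress_with_whitelist(engine_output))

# The same hosts handled with the raised cutoff instead.
hosts = ["www.company.co.nz", "co.nz", "www.example.com"]
print([d for d in map(candidate_with_cutoff, hosts) if d is not None])
```

Both calls end up with the same two domains; the difference is
that the cutoff version never spends any processing on co.nz.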

So the quick answer is that the data side of sc.surbl.org has it
pretty much covered, and I'm not sure about the message parsing
side of things in SA 2.63 and 3.0.

Jeff C.


Re: [SURBL-Discuss] RFC: SURBL software implementation guidelines

Posted by Jeff Chan <je...@surbl.org>.
On Sunday, April 18, 2004, 6:58:14 PM, Simon Byrnand wrote:
> At 13:49 19/04/2004, Jeff Chan wrote:
>>The traditional solution to ccTLDs (Country Code TLDs) seems to
>>be to make a table of them, and make sure any extracted domains
>>are one domain level longer.  So for company.co.nz, don't take
>>co.nz as the base domain, but instead use company.co.nz, since we
>>know from the table that co.nz is a two-level country code TLD.
>>My slightly incomplete table of ccTLDs is at:
>>
>>   http://spamcheck.freeapp.net/two-level-tlds

> Hmm, well your list has .co.nz and .net.nz but not .school.nz (as an example)

OK, I added school.nz.  Anyone know any others to add?  Contact
me off list.  :-)  The list of ccTLDs came mostly from a registrar's page:

  http://www.bestregistrar.com/help/ccTLD.htm

> What are the relative proportions of one-level to two-level country code
> TLDs?

See below.  In terms of spam domains, ccTLDs are not a major
problem.  The .com, .biz and .net TLDs have far more spam domains.

> Are there any other one-level hierarchies used by countries, apart from the
> generic .com .org .net .biz etc? Might be easier (and safer?) to assume
> the other way around - assume it's a two-level country code unless listed.
> Then you're only having to list the top level (.com for example) rather
> than trying to keep track of things like .co.nz, .net.nz and so on, which
> are subject to change at the discretion of the local registrar...

Yes, that's part of the problem.  Local TLD authorities seem to
be able to add whatever TLDs they like under their own CC.  Still
I think ccTLDs should be regarded as minor.  Certainly they are not
a major destination for spam messages.  Given that, handling the
non-ccTLDs as a first priority is probably the most efficient.
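
For comparison, the inverted rule suggested above (assume a
two-level ccTLD unless the TLD is listed) might look like the
following Python sketch.  The ONE_LEVEL_TLDS set holds only the
generic TLDs named in this thread, and the function name is
hypothetical.

```python
# One-level TLDs named in this thread; a real list would need every
# TLD that registers names directly at the second level.
ONE_LEVEL_TLDS = {"com", "org", "net", "biz", "info"}


def base_domain_inverted(hostname):
    """Keep two labels under listed TLDs, three under everything else."""
    labels = hostname.lower().rstrip(".").split(".")
    keep = 2 if labels[-1] in ONE_LEVEL_TLDS else 3
    if len(labels) < keep:
        return None
    return ".".join(labels[-keep:])


print(base_domain_inverted("a65423xyz.spammer.co.nz"))  # spammer.co.nz
print(base_domain_inverted("www.example.com"))          # example.com
print(base_domain_inverted("a1.example.de"))            # a1.example.de
```

The last line shows the cost of this rule: .de registers names
directly at the second level, so for an unlisted one-level ccTLD
a random subdomain survives in the result, and a bare example.de
would be mistaken for a two-level ccTLD and skipped.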

Here are some relative rankings of the TLDs in domain reports I
have from a couple of weeks' worth of SpamCop URI reports:

  TLD    Reports
  ---    -------
  com    1938
  biz     424
  net     322
  info     90
  org      79
  us       39
  ru       21
  de       20
  tv       13
  nl       12
  to       10
  ph        8
  cn        8
  cc        7
  br        7
  tw        6
  pl        6
  ch        6
  ws        5
  it        5
  fr        5
  es        5
  ro        4
  jp        4
  cl        4
  nu        3
  kr        3
  cz        3
  co        3
  za        2
  uk        2
  se        2
  pt        2
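
To put a number on how minor the ccTLDs are, the counts above can
be totalled with a few lines of Python.  The only assumption is
that every two-letter TLD in the table is a country code.

```python
# Report counts copied from the table above.
REPORTS = {
    "com": 1938, "biz": 424, "net": 322, "info": 90, "org": 79,
    "us": 39, "ru": 21, "de": 20, "tv": 13, "nl": 12, "to": 10,
    "ph": 8, "cn": 8, "cc": 7, "br": 7, "tw": 6, "pl": 6, "ch": 6,
    "ws": 5, "it": 5, "fr": 5, "es": 5, "ro": 4, "jp": 4, "cl": 4,
    "nu": 3, "kr": 3, "cz": 3, "co": 3, "za": 2, "uk": 2, "se": 2,
    "pt": 2,
}

total = sum(REPORTS.values())
# Assumption: a two-letter TLD is a country code.
cc_total = sum(n for tld, n in REPORTS.items() if len(tld) == 2)
generic_total = total - cc_total

print(f"total reports:  {total}")
print(f"generic TLDs:   {generic_total} ({generic_total / total:.1%})")
print(f"ccTLDs:         {cc_total} ({cc_total / total:.1%})")
```

That comes to 2853 of 3068 reports (about 93%) for the generic
TLDs and 215 (about 7%) for all the ccTLDs combined.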

Jeff C.


Re: [SURBL-Discuss] RFC: SURBL software implementation guidelines

Posted by Simon Byrnand <si...@igrin.co.nz>.
At 13:49 19/04/2004, Jeff Chan wrote:

>On Sunday, April 18, 2004, 6:08:11 PM, Simon Byrnand wrote:
> > At 12:43 19/04/2004, Jeff Chan wrote:
> >> >    2. Extract base (registrar) domains from those URIs. This
> >> > includes removing any and all leading host names, subdomains,
> >> > www., randomized subdomains, etc. In order to determine the
> >> > base domain it may be necessary to use a table of country code
> >> > TLDs (ccTLDs) such as the partially incomplete one SURBL uses.
>
>[...]
> > If a spammer were to register a domain in NZ it would look like:
>
> > spammer.co.nz or spammer.net.nz or spammer.gen.nz etc.... randomised
> > subdomains that they could create on their own nameservers would look like
> > a65423xyz.spammer.co.nz or awef3242.fssf342.spammer.co.nz etc...
>
> > Will the current code (of both SpamCopURI and the backend processing of
> > the surbl servers, for that matter) incorrectly strip this down to co.nz? I
> > ask because I have definitely seen DNS queries from SpamCopURI trying to
> > look up co.nz.sc.surbl.org, which is wrong: that would cover a large
> > fraction of the websites under the NZ domain hierarchy. It should be
> > looking up spammer.co.nz, never co.nz.
>
> > Is there any reliable way for the code to know what a base registrar
> > domain is and how many tiers there are under that domain hierarchy? (May
> > also be a non-trivial problem.)
>
>The traditional solution to ccTLDs (Country Code TLDs) seems to
>be to make a table of them, and make sure any extracted domains
>are one domain level longer.  So for company.co.nz, don't take
>co.nz as the base domain, but instead use company.co.nz, since we
>know from the table that co.nz is a two-level country code TLD.
>My slightly incomplete table of ccTLDs is at:
>
>   http://spamcheck.freeapp.net/two-level-tlds

Hmm, well your list has .co.nz and .net.nz but not .school.nz (as an example)

What are the relative proportions of one-level to two-level country code
TLDs?

Are there any other one-level hierarchies used by countries, apart from the
generic .com .org .net .biz etc? Might be easier (and safer?) to assume
the other way around - assume it's a two-level country code unless listed.
Then you're only having to list the top level (.com for example) rather
than trying to keep track of things like .co.nz, .net.nz and so on, which
are subject to change at the discretion of the local registrar...

Maybe I missed something :)

Regards,
Simon