Posted to dev@spamassassin.apache.org by Jeff Chan <je...@surbl.org> on 2004/04/16 13:20:44 UTC

Re: Fwd: Re: Fwd: Re: Announcing SURBL support in SA 2.63 and 3.0 plugins - bug ?

Simon Byrnand, Eric Kolve and I were having a discussion of what
characters are legal in domain names, due to junk showing up
around URIs and apparently confusing some of the SpamAssassin URI
parsing code.  Wanted to share some research and ask if anyone
has any other authoritative information on what characters are
currently legal for domain names.  This is relevant for anyone
trying to work with domain names.

Also Eric, please share any bugs you find in the SA URI parsing
code, preferably by opening a Bugzilla ticket, especially if you
can isolate the module, etc.:

   http://bugzilla.spamassassin.org/enter_bug.cgi


Here's a little research on the subject:


The original domain name RFC allowed only letters, digits and
hyphens in names:


  http://www.ietf.org/rfc/rfc1035.txt

<domain> ::= <subdomain> | " "

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9

Note that while upper and lower case letters are allowed in domain
names, no significance is attached to the case.  That is, two names with
the same spelling but different case are to be treated as if identical.
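
A minimal Python sketch of the grammar quoted above, for anyone who
wants to test candidate names against it.  The names LABEL_RE,
is_rfc1035_name and same_name are made up for illustration and are not
part of SpamAssassin or any library; a trailing dot is treated as an
empty label and rejected, which is a simplification.

```python
import re

# One <label>: a letter, optionally followed by letters, digits or
# hyphens, and ending in a letter or digit.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1035_name(name: str) -> bool:
    """True if every dot-separated label matches the quoted grammar."""
    if name == " ":  # the grammar's second alternative for <domain>
        return True
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))


def same_name(a: str, b: str) -> bool:
    """Compare two names without regard to case, per the note above."""
    return a.lower() == b.lower()
```

Under this grammar a label may not start with a digit or end with a
hyphen, so a name such as "foo-.com" is rejected.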


But RFC 2181 leaves things wide open with respect to names:

  http://www.ietf.org/rfc/rfc2181.txt

11. Name syntax

   Occasionally it is assumed that the Domain Name System serves only
   the purpose of mapping Internet host names to data, and mapping
   Internet addresses to host names.  This is not correct, the DNS is a
   general (if somewhat limited) hierarchical database, and can store
   almost any kind of data, for almost any purpose.

   The DNS itself places only one restriction on the particular labels
   that can be used to identify resource records.  That one restriction
   relates to the length of the label and the full name.  The length of
   any one label is limited to between 1 and 63 octets.  A full domain
   name is limited to 255 octets (including the separators).  The zero
   length full name is defined as representing the root of the DNS tree,
   and is typically written and displayed as ".".  Those restrictions
   aside, any binary string whatever can be used as the label of any
   resource record.  Similarly, any binary string can serve as the value
   of any record that includes a domain name as some or all of its value
   (SOA, NS, MX, PTR, CNAME, and any others that may be added).
   Implementations of the DNS protocols must not place any restrictions
   on the labels that can be used.  In particular, DNS servers must not
   refuse to serve a zone because it contains labels that might not be
   acceptable to some DNS client programs.  A DNS server may be
   configurable to issue warnings when loading, or even to refuse to
   load, a primary zone containing labels that might be considered
   questionable, however this should not happen by default.

   Note however, that the various applications that make use of DNS data
   can have restrictions imposed on what particular values are
   acceptable in their environment.  For example, that any binary label
   can have an MX record does not imply that any binary name can be used
   as the host part of an e-mail address.  Clients of the DNS can impose
   whatever restrictions are appropriate to their circumstances on the
   values they use as keys for DNS lookup requests, and on the values
   returned by the DNS.  If the client has such restrictions, it is
   solely responsible for validating the data from the DNS to ensure
   that it conforms before it makes any use of that data.
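
The only restriction RFC 2181 describes is on length, which can be
sketched as below.  This is an illustration, not code from any DNS
implementation: check_lengths is a hypothetical helper, and it assumes
the 255-octet limit is counted the way the wire format does it (one
length octet per label plus one for the root).

```python
from typing import Sequence


def check_lengths(labels: Sequence[bytes]) -> bool:
    """labels: the labels of one name as raw octets, root excluded."""
    # Each label must be between 1 and 63 octets long.
    if any(not 1 <= len(label) <= 63 for label in labels):
        return False
    # Assumption: one length octet per label, plus one for the root.
    total = sum(len(label) + 1 for label in labels) + 1
    return total <= 255
```

Note that the label contents are never inspected: any binary string of
an acceptable length passes, which is the point the RFC text makes.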


After scanning the RFC descriptions that were linked from
RFC 1035:

> RFC 1035      Domain names - implementation and specification.
>       
> Authors:      P.V. Mockapetris.
> Date:         Nov-01-1987
> Formats:      txt pdf
> Obsoletes:    RFC 0973, RFC 0882, RFC 0883
> Updated by:   RFC 1101, RFC 1183, RFC 1348, RFC 1876, RFC 1982,
> RFC 1995, RFC 1996, RFC 2065, RFC 2136, RFC 2181, RFC 2137, RFC
> 2308, RFC 2535, RFC 2845, RFC 3425, RFC 3658 
> Also: STD 0013


it appears that these may be the only two authoritative
statements on what characters can be in domain names:

  RFC 1035:  letters, numbers, hyphen
  RFC 2181:  implementations should support anything

Does anyone have any more info on what characters are legal
in domain names?

Jeff C.


Re: Fwd: Re: Fwd: Re: Announcing SURBL support in SA 2.63 and 3.0 plugins - bug ?

Posted by Tony Finch <do...@dotat.at>.
On Fri, 16 Apr 2004, Jeff Chan wrote:
>
> Does anyone have any more info on what characters are legal
> in domain names?

Domain names that are used as host names (i.e. most of them) must follow
the preferred name syntax in section 2.3.1 of RFC 1035 (same as specified
in RFC 952) with the relaxation specified in RFC 1123 that name components
may start with a digit (e.g. 3com.com).
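
As a rough Python sketch of that host name rule (letters, digits and
hyphens, with a leading digit allowed): HOST_LABEL_RE and is_hostname
are illustrative names only, and the 63-character label limit is taken
from the general DNS label limit rather than from this paragraph.

```python
import re

# A host name label: starts and ends with a letter or digit, with
# letters, digits or hyphens in between.
HOST_LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_hostname(name: str) -> bool:
    """True if every label fits the relaxed host name syntax."""
    return all(
        len(label) <= 63 and HOST_LABEL_RE.fullmatch(label) is not None
        for label in name.split(".")
    )
```

With this check "3com.com" is accepted, while a candidate with junk
attached to it, such as "example.com)" or "foo_bar.com", is not.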

However the DNS itself doesn't have any such restriction, and a domain
name that isn't a hostname may contain any character at all. See RFC 2317
for an example of using / in a domain name. Note also the DNS zone master
file format in section 5 of RFC 1035 which specifies how backslash
escaping can be used to place unusual characters in domain names, for
example a dot embedded in a label (rather than separating them as is
usual).
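
To illustrate the escaping, here is a small Python sketch that splits
a master-file style name into labels, handling the two escape forms
from RFC 1035 section 5.1: a backslash followed by a non-digit
character, and a backslash followed by three decimal digits giving an
octet value.  split_labels is a hypothetical helper, it assumes every
input character fits in one octet, and it does no validation of the
resulting labels.

```python
DIGITS = "0123456789"


def split_labels(name: str) -> list:
    """Split an escaped name into a list of labels (as bytes)."""
    labels, current = [], bytearray()
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "\\":
            rest = name[i + 1 : i + 4]
            if not rest:
                raise ValueError("dangling backslash")
            if rest[0] in DIGITS:
                # \DDD: exactly three decimal digits, value 0..255
                if (len(rest) != 3
                        or any(c not in DIGITS for c in rest)
                        or int(rest) > 255):
                    raise ValueError("bad \\DDD escape")
                current.append(int(rest))
                i += 4
            else:
                # \X: the character X is taken literally
                current += rest[0].encode("latin-1")
                i += 2
        elif ch == ".":
            # an unescaped dot ends the current label
            labels.append(bytes(current))
            current = bytearray()
            i += 1
        else:
            current += ch.encode("latin-1")
            i += 1
    if current:
        labels.append(bytes(current))
    return labels
```

For example, split_labels(r"a\.b.example.com") gives three labels, the
first of which is "a.b" with the dot embedded in it.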

-- 
Tony Finch  <do...@dotat.at>  http://dotat.at/