Posted to dev@httpd.apache.org by "William A. Rowe Jr." <wr...@rowe-clan.net> on 2012/07/30 22:54:47 UTC

Re: utf-8 -> punycode for ServerName|Alias?

On 4/7/2012 2:59 AM, Tim Bannister wrote:
> On 7 Apr 2012, at 07:33, William A. Rowe Jr. wrote:
> 
>> So we have live registrars, no longer "experimental", who are now registering domains in punycode.  Make of it what you will.
>>
>> Do we want to recognize non-ASCII strings in the ServerName|Alias directives as utf-8 -> punycode encodings?  Internally, from the time the servername field is assigned, it can be an ascii mapping.
> 
> I think this is more important for mass virtual hosting (VirtualDocumentRoot from mod_vhost_alias, etc). Users would create a document root directory named, eg, テスト.example and expect it to work. They don't know anything about Unicode, let alone punycode.
> I reckon a lot of users would work out quickly that only Roman characters work in domain names, but they aren't going to be able to work out how to rename that folder into the correct punycode nor to tell the folders apart if renamed in this way.
> 
> As a user: I already have a configuration file with a UTF-8 ServerAlias defined, that's just waiting for httpd to implement this feature … and until then, I have the punycoded version in there as well.

I've spent a bit more time on this.  The obvious issue of ambiguous domain
registrations is being handled on a registrar-by-registrar basis, and you
can get a nice summary of the punycode entries accepted by various registrars
here: http://www.mozilla.org/projects/security/tld-idn-policy-list.html

Thinking about which punycode names would be dangerous to represent as UTF-8,
I can't come up with any within the context of httpd.

 1. User VirtualHost ServerName/ServerAlias entries, or mod_vhost_alias
    entries.  These are controlled by the administrator, not affected by
    the remote client.  Provided that client-provided non-ASCII domains
    are refused, then punycode can be represented as UTF-8 in our access
    and error logs, server config directives and so forth when referring
    to the locally configured domain names.  We should always present
    these in things like mod_info and httpd -D DUMP_VHOSTS as name(punyname)
    to help the administrator to untangle any confusion.

 2. Location: headers and automated self-URL references must present
    the punycode URL in href= and other header fields, but may present the
    UTF-8 form in presentation contexts such as error pages or autoindexes.
    Whatever the W3C has to say about this in HTML5 is irrelevant if we don't
    know whether the user agent supports utf-8 -> punycode transliteration.
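A minimal sketch of the two presentation rules above, written in Python rather than httpd's C, and using Python's built-in "idna" codec (IDNA 2003) as a stand-in for the proposed apr-util transliteration. The helper names to_ascii, to_unicode, dump_vhost_name and location_header are hypothetical; the name(punyname) format follows item 1, and the Location: value always carries the punycode form per item 2.

```python
def to_ascii(host: str) -> str:
    """Punycode (ACE) form of a configured host name, via Python's IDNA 2003 codec."""
    return host.encode("idna").decode("ascii")


def to_unicode(host: str) -> str:
    """UTF-8 presentation form of a configured host name."""
    return to_ascii(host).encode("ascii").decode("idna")


def dump_vhost_name(host: str) -> str:
    """Render a configured name as name(punyname), as proposed for -D DUMP_VHOSTS."""
    ascii_name = to_ascii(host)
    unicode_name = to_unicode(host)
    if unicode_name == ascii_name:
        return ascii_name  # plain ASCII name: nothing to untangle
    return f"{unicode_name}({ascii_name})"


def location_header(scheme: str, host: str, path: str) -> str:
    """Location: value; always punycode, since the user agent may not transliterate."""
    return f"{scheme}://{to_ascii(host)}{path}"


print(dump_vhost_name("テスト.example"))                    # テスト.example(xn--zckzah.example)
print(location_header("http", "テスト.example", "/docs/"))  # http://xn--zckzah.example/docs/
```

Either spelling in the configuration yields the same dump line, which is the point of showing both forms to the administrator.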

What is less clear is what precautions we should take when functioning as
a forward proxy with proxy uri string contents, or presenting user-provided,
non-canonicalized host names.  I can imagine such translation being abused to
conceal some forms of XSS exploitation.
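One possible precaution for the proxy and non-canonical cases, sketched in Python under two assumptions: client-supplied host names are refused unless they are plain ASCII letters, digits and hyphens (plus an optional port), and any xn-- label must be valid punycode that round-trips. accept_client_host is a hypothetical helper, not an httpd or apr-util API; Python's built-in "idna" codec performs the round-trip check.

```python
import re

# One LDH label: 1-63 letters/digits/hyphens, not starting or ending with '-'.
_LABEL = r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)"
_HOST = re.compile(rf"(?P<name>{_LABEL}(?:\.{_LABEL})*)(?::[0-9]{{1,5}})?")


def accept_client_host(host: str) -> bool:
    """Refuse client-supplied hosts that are not plain ASCII or carry bad xn-- labels."""
    m = _HOST.fullmatch(host)
    if m is None:
        return False  # raw UTF-8 or other non-LDH characters from the client
    for label in m.group("name").split("."):
        label = label.lower()
        if label.startswith("xn--"):
            try:
                # The idna codec decodes the label and checks that it round-trips.
                label.encode("ascii").decode("idna")
            except UnicodeError:
                return False
    return True
```

An accepted value would then be logged as received, in punycode, rather than decoded for display.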

I'd start by assembling a patch to introduce punycode transliteration into the
apr-util library and another patch into httpd for vhost, mass-vhosting using
utf-8 path names, and presenting trusted utf-8 values for our error log and
field tokens.  Does anyone have concerns before I begin messing with this logic?
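For a sense of what the apr-util side involves, here is a minimal sketch of the RFC 3492 Punycode encoder, written in Python for brevity rather than as the C an apr-util patch would need. punycode_encode and to_ace_label are hypothetical names. The sketch deliberately skips the nameprep / UTS #46 mapping step (case folding, normalization, prohibited code points) that a real IDNA ToASCII applies first, so it should only be fed labels that are already normalized.

```python
# Parameter values fixed by RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    """Bias adaptation (RFC 3492, section 6.1)."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def _digit(d: int) -> str:
    """Digit value 0..25 -> 'a'..'z', 26..35 -> '0'..'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(label: str) -> str:
    """Encode one Unicode label (RFC 3492, section 6.3), without the xn-- prefix."""
    codepoints = [ord(c) for c in label]
    output = [c for c in label if ord(c) < 0x80]  # basic code points, in order
    h = b = len(output)
    if b > 0:
        output.append("-")  # delimiter between basic and encoded parts
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(codepoints):
        # Smallest code point not yet handled.
        m = min(cp for cp in codepoints if cp >= n)
        delta += (m - n) * (h + 1)
        n = m
        for cp in codepoints:
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    if k <= bias:
                        t = TMIN
                    elif k >= bias + TMAX:
                        t = TMAX
                    else:
                        t = k - bias
                    if q < t:
                        break
                    output.append(_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(_digit(q))
                bias = _adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


def to_ace_label(label: str) -> str:
    """ASCII labels pass through; others become 'xn--' + Punycode (no nameprep applied)."""
    return label if label.isascii() else "xn--" + punycode_encode(label)


print(to_ace_label("テスト"))  # xn--zckzah
```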




Re: utf-8 -> punycode for ServerName|Alias?

Posted by "William A. Rowe Jr." <wr...@rowe-clan.net>.
On 7/30/2012 3:11 PM, Tim Bannister wrote:
> On 30 Jul 2012, at 23:00, William A. Rowe Jr. wrote:
> 
>> Exactly my point.  If you configure a utf-8 hostname, we know in fact it is
>> a punycode encoding of that value, which is why I believe it makes sense to
>> represent both when you test the vhost configs with -D DUMP_VHOSTS.  If you
>> configure a punycode hostname, it will be accepted with no hassle.  There
>> is no such thing as an actual utf-8 or extended ASCII (8 bit) hostname.
> 
> At the moment I have configuration (not working, but “ready” anyway :-) for the same virtual host in UTF-8 and punycode variants. I could easily set one of them to differ from the other.
> 
> How will the new httpd handle this kind of situation? I think what I'd expect is a warning and then for one of them to take precedence and the other to be ignored.

I expect we would follow the same duplicate detection logic we currently employ
against ServerName/ServerAlias.
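A rough sketch of how duplicate detection could behave once both spellings are accepted: every configured name is first reduced to its lower-cased punycode form, so テスト.example and xn--zckzah.example collide. This assumes, as Tim suggests, that the first vhost in configuration order keeps the name and a warning is recorded for the later one. register_names is a hypothetical helper, and httpd's existing duplicate handling may differ in its details.

```python
def register_names(vhosts):
    """vhosts: (vhost_id, [ServerName and ServerAlias values]) pairs in config order.

    Returns (table, warnings): table maps each punycode key to the vhost that owns it.
    """
    table = {}
    warnings = []
    for vhost_id, names in vhosts:
        for name in names:
            # UTF-8 and punycode spellings of the same host meet at this key.
            key = name.encode("idna").decode("ascii").lower()
            owner = table.get(key)
            if owner is None:
                table[key] = vhost_id
            elif owner != vhost_id:
                # Host already claimed by an earlier vhost: the first one wins.
                warnings.append(f"{name} in {vhost_id} duplicates {key} from {owner}; ignored")
    return table, warnings


table, warns = register_names([
    ("vhost-a", ["テスト.example", "xn--zckzah.example"]),  # same vhost, both spellings
    ("vhost-b", ["XN--ZCKZAH.example"]),                  # collides with vhost-a
])
```

Both spellings inside one vhost are harmless; only the cross-vhost collision produces a warning.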


Re: utf-8 -> punycode for ServerName|Alias?

Posted by Tim Bannister <is...@jellybaby.net>.
On 30 Jul 2012, at 23:00, William A. Rowe Jr. wrote:

> Exactly my point.  If you configure a utf-8 hostname, we know in fact it is
> a punycode encoding of that value, which is why I believe it makes sense to
> represent both when you test the vhost configs with -D DUMP_VHOSTS.  If you
> configure a punycode hostname, it will be accepted with no hassle.  There
> is no such thing as an actual utf-8 or extended ASCII (8 bit) hostname.

At the moment I have configuration (not working, but “ready” anyway :-) for the same virtual host in UTF-8 and punycode variants. I could easily set one of them to differ from the other.

How will the new httpd handle this kind of situation? I think what I'd expect is a warning and then for one of them to take precedence and the other to be ignored.

-- 
Tim Bannister – isoma@jellybaby.net


Re: utf-8 -> punycode for ServerName|Alias?

Posted by "William A. Rowe Jr." <wr...@rowe-clan.net>.
On 7/30/2012 2:47 PM, Reindl Harald wrote:
> 
> 
> On 30.07.2012 22:54, William A. Rowe Jr. wrote:
>> What is less clear is what precautions we should take when functioning as
>> a forward proxy with proxy uri string contents, or presenting user-provided,
>> non-canonicalized host names.  I can imagine such translation being abused to
>> conceal some forms of XSS exploitation.
>>
>> I'd start by assembling a patch to introduce punycode transliteration into the
>> apr-util library and another patch into httpd for vhost, mass-vhosting using
>> utf-8 path names, and presenting trusted utf-8 values for our error log and
>> field tokens.  Does anyone have concerns before I begin messing with this logic?
> 
> IDN names have no business in server configs.
> 
> They are not in DNS, they are not in mail servers;
> everything at the server level works with punycode,
> and that is fine the way it is.

Exactly my point.  If you configure a utf-8 hostname, we know in fact it is
a punycode encoding of that value, which is why I believe it makes sense to
represent both when you test the vhost configs with -D DUMP_VHOSTS.  If you
configure a punycode hostname, it will be accepted with no hassle.  There
is no such thing as an actual utf-8 or extended ASCII (8 bit) hostname.

For adding the feature to mod_vhost_alias, I absolutely would not do that
without adding a flag (or a %-escape modifier) to determine whether we are
looking at the punycode host (default) or utf-8 representation (configurable).
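A sketch of that mod_vhost_alias option, assuming a simple one-directory-per-host layout under a single base path and a Host value already validated the way httpd validates request hosts. vhost_docroot and its use_utf8 switch are hypothetical stand-ins for the proposed flag or %-escape modifier: the default keeps the punycode directory name, and the UTF-8 form is derived with Python's "idna" codec only when asked for.

```python
import posixpath


def vhost_docroot(base: str, host: str, use_utf8: bool = False) -> str:
    """Map the request host to a per-host directory, mass-vhosting style."""
    if not host.isascii():
        raise ValueError("non-ASCII host from client refused")
    name = host.lower().rstrip(".")  # case-insensitive match; drop a trailing root dot
    if use_utf8:
        # Raises UnicodeError for a malformed xn-- label.
        name = name.encode("ascii").decode("idna")
    return posixpath.join(base, name)


print(vhost_docroot("/srv/vhosts", "xn--zckzah.example"))                 # /srv/vhosts/xn--zckzah.example
print(vhost_docroot("/srv/vhosts", "xn--zckzah.example", use_utf8=True))  # /srv/vhosts/テスト.example
```

With the switch on, a directory named テスト.example, as in Tim's example, would serve requests for xn--zckzah.example.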

If a significant number of folks enjoy reading punycode, I have no problem
keeping the access and error log entries in their original punycode form,
and for non-canonical hostnames I believe that would be for the best.



Re: utf-8 -> punycode for ServerName|Alias?

Posted by Reindl Harald <h....@thelounge.net>.

On 30.07.2012 22:54, William A. Rowe Jr. wrote:
> What is less clear is what precautions we should take when functioning as
> a forward proxy with proxy uri string contents, or presenting user-provided,
> non-canonicalized host names.  I can imagine such translation being abused to
> conceal some forms of XSS exploitation.
> 
> I'd start by assembling a patch to introduce punycode transliteration into the
> apr-util library and another patch into httpd for vhost, mass-vhosting using
> utf-8 path names, and presenting trusted utf-8 values for our error log and
> field tokens.  Does anyone have concerns before I begin messing with this logic?

IDN names have no business in server configs.

They are not in DNS, they are not in mail servers;
everything at the server level works with punycode,
and that is fine the way it is.