Posted to users@tomcat.apache.org by André Warnier <aw...@ice-sa.com> on 2013/02/17 19:54:07 UTC

Re: [OT] getRequestURI() in relation to Connector.URIEncoding

Mike Wilson wrote:
...
> 
> Example 2: path /ä in "binary" Unicode
>   GET /.. [0xC3,0xA4]
> 

To nitpick: this is not "binary Unicode". It is simply raw (non-URL-encoded) UTF-8, which 
is itself one encoding of Unicode.

The Unicode "code point" of "ä" is 0xE4 (decimal 228), usually written as U+00E4.
That would be the "binary Unicode" value of this character (although one could argue that 
"11100100" would be more proper for binary).
It is the position of this character in the overall Unicode character table.

In the UTF-8 encoding, this code point is encoded as the 2 bytes [0xC3,0xA4] (decimal [195,164]).
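
For anyone who wants to check these numbers, here is a minimal Java sketch of the 
difference between the code point and its UTF-8 bytes. The class and method names 
(CodePointVsUtf8, codePoint, binary, utf8Bytes) are made up for illustration; only 
the standard JDK calls are real.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows code point vs. UTF-8 bytes for one character.
class CodePointVsUtf8 {

    // Code point of the first character of s, in the usual U+XXXX notation.
    static String codePoint(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    // The same code point written out in base 2.
    static String binary(String s) {
        return Integer.toBinaryString(s.codePointAt(0));
    }

    // UTF-8 bytes of s, formatted like "[0xC3,0xA4]".
    static String utf8Bytes(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            // Mask with 0xFF because Java bytes are signed.
            sb.append(String.format("0x%02X", bytes[i] & 0xFF));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // "ä" written as an escape, so the source file encoding does not matter.
        String s = "\u00E4";
        System.out.println(codePoint(s));  // U+00E4
        System.out.println(binary(s));     // 11100100
        System.out.println(utf8Bytes(s));  // [0xC3,0xA4]
    }
}
```

The character is written as the escape \u00E4 on purpose: a literal "ä" in the source 
would itself depend on the encoding the compiler assumes for the file.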

Confusion in terminology (and, with it, in encodings) leads to "mojibake", which in German 
can be translated as "Buchstabensalat" (see http://en.wikipedia.org/wiki/Mojibake).

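As an illustration of how mojibake arises, here is a small sketch that takes the UTF-8 
bytes of "ä" and decodes them with the wrong charset. It assumes the decoding side uses 
ISO-8859-1 while the bytes are really UTF-8; the class and method names (MojibakeDemo, 
misdecode) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: UTF-8 bytes decoded with the wrong charset.
class MojibakeDemo {

    // Encodes s as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misdecode("\u00E4");
        // Print code points instead of glyphs so the console encoding cannot interfere.
        for (int i = 0; i < garbled.length(); i++) {
            System.out.printf("U+%04X%n", (int) garbled.charAt(i));
        }
    }
}
```

Each of the two bytes is turned into its own character (U+00C3 and U+00A4), so the 
single "ä" shows up as the two characters "Ã¤".
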
