Posted to bugs@httpd.apache.org by bu...@apache.org on 2012/07/17 01:55:18 UTC

[Bug 53554] New: Wrong case for hexadecimal percent encoding [patch]

https://issues.apache.org/bugzilla/show_bug.cgi?id=53554

          Priority: P2
            Bug ID: 53554
          Assignee: bugs@httpd.apache.org
           Summary: Wrong case for hexadecimal percent encoding [patch]
          Severity: normal
    Classification: Unclassified
                OS: Linux
          Reporter: tstarling@wikimedia.org
          Hardware: PC
            Status: NEW
           Version: 2.5-HEAD
         Component: mod_rewrite
           Product: Apache httpd-2

Created attachment 29069
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=29069&action=edit
Use uppercase hexadecimal digits in mod_rewrite

Apache mod_rewrite encodes special characters using lowercase hexadecimal
digits, for example Chráněná becomes Chr%c3%a1n%c4%9bn%c3%a1 instead of
Chr%C3%A1n%C4%9Bn%C3%A1. The use of a non-canonical URL breaks our caching
system. We can't use lowercase hexadecimal digits as our canonical URLs because
no browser sends URLs like that, so the cache would be even more badly broken.
Please use uppercase hexadecimal digits in URLs.
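
To make the difference concrete, here is a minimal Python sketch (not the
mod_rewrite code and not the attached patch). urllib.parse.quote is the
standard-library encoder and emits uppercase hex digits; quote_lower is a
hypothetical helper written only to reproduce the lowercase form described
above.

```python
from urllib.parse import quote


def quote_lower(text: str) -> str:
    """Percent-encode UTF-8 text with lowercase hex digits.

    Hypothetical helper for illustration only; it keeps the same
    characters unescaped as quote() does by default.
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch.isascii() and (ch.isalnum() or ch in "-._~/"):
            out.append(ch)
        else:
            out.append("%{:02x}".format(byte))
    return "".join(out)


title = "Chráněná"
upper = quote(title)        # Chr%C3%A1n%C4%9Bn%C3%A1
lower = quote_lower(title)  # Chr%c3%a1n%c4%9bn%c3%a1

# A cache that keys on the raw URL string treats these as two different URLs.
print(upper, lower, upper == lower)
```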

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


[Bug 53554] Wrong case for hexadecimal percent encoding [patch]

--- Comment #2 from Tim Starling <ts...@wikimedia.org> ---
(In reply to comment #1)
> In RFC 1738, about Uniform Resource Locators (URL)
> (http://www.rfc-editor.org/rfc/rfc1738.txt)
> 
> 
> it is written that:
> 
> >>>
> 2.2. URL Character Encoding Issues
> 
> [...]
> In addition, octets may be encoded by a character triplet consisting
> of the character "%" followed by the two hexadecimal digits (from
> "0123456789ABCDEF") which forming the hexadecimal value of the octet.
> (The characters "abcdef" may also be used in hexadecimal encodings.)
> [...]
> 
> <<<
> 
> 
> So, I guess that httpd is correct when encoding with lower case.
> 
> 
> I left the report open, just in case, but I think that it should be marked
> as RESOLVED, WONTFIX.

I think the RFC is pretty clear about which encoding is preferred, and it's not
the one httpd is using. You seem to be using a very loose definition of
"correct". There are two ways of doing it: one is preferred, the other is
idiosyncratic and breaks caching. It is a simple change and the patch is
attached.



[Bug 53554] Wrong case for hexadecimal percent encoding [patch]

--- Comment #3 from Wim Lewis <wi...@omnigroup.com> ---
Apache is not incorrect here; the cache is not performing its job as well as it
could: a well-written cache would compare URLs more intelligently than just a
simple string compare.

RFC 3986 (section 2.1) does say that URI producers should use uppercase
hexadecimal digits for percent-encodings, though, and many clients do have
bugs like this one when it comes to comparing URLs, so I think it would be
reasonable for Apache to change its behavior here. ("Be strict in what you
produce, but liberal in what you accept", and all that.)

http://tools.ietf.org/html/rfc3986#section-6.2 has more discussion on URL
comparison and normalization.
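
As a rough illustration of the comparison a cache could do, the following
Python sketch applies only the percent-encoding case normalization from RFC
3986 section 6.2.2.1 before comparing. The function names are made up for this
example, and a real cache would likely apply further normalization steps.

```python
import re

# Matches one percent-encoded triplet such as %c3 or %C3.
_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_pct_case(url: str) -> str:
    """Uppercase the hex digits of every percent-encoded triplet."""
    return _PCT_TRIPLET.sub(lambda m: m.group(0).upper(), url)


def same_after_case_normalization(a: str, b: str) -> bool:
    """Compare two URLs, ignoring the case of percent-encoding hex digits."""
    return normalize_pct_case(a) == normalize_pct_case(b)


print(same_after_case_normalization("Chr%c3%a1n%c4%9bn%c3%a1",
                                    "Chr%C3%A1n%C4%9Bn%C3%A1"))
```

Only the triplets are touched, so the rest of the URL keeps its case.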



[Bug 53554] Wrong case for hexadecimal percent encoding [patch]

--- Comment #1 from Christophe JAILLET <ch...@wanadoo.fr> ---
In RFC 1738, about Uniform Resource Locators (URL)
(http://www.rfc-editor.org/rfc/rfc1738.txt)


it is written that:

>>>
2.2. URL Character Encoding Issues

[...]
In addition, octets may be encoded by a character triplet consisting
of the character "%" followed by the two hexadecimal digits (from
"0123456789ABCDEF") which forming the hexadecimal value of the octet.
(The characters "abcdef" may also be used in hexadecimal encodings.)
[...]

<<<


So, I guess that httpd is correct when encoding with lower case.


I left the report open, just in case, but I think that it should be marked as
RESOLVED, WONTFIX.
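
A quick way to check that both spellings denote the same octets is the Python
sketch below. It uses the standard-library decoder urllib.parse.unquote_to_bytes
and says nothing about how httpd itself decodes.

```python
from urllib.parse import unquote_to_bytes

# The two encodings of the example from the report.
from_lower = unquote_to_bytes("Chr%c3%a1n%c4%9bn%c3%a1")
from_upper = unquote_to_bytes("Chr%C3%A1n%C4%9Bn%C3%A1")

# Both decode to the UTF-8 bytes of the same title.
print(from_lower == from_upper)
print(from_lower.decode("utf-8"))
```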
