Posted to modperl@perl.apache.org by Tatsuhiko Miyagawa <mi...@edge.co.jp> on 2002/11/28 22:55:54 UTC

ap_unescape_url can't escape %uXXXX

It seems that Apache's ap_unescape_url() can't handle %uXXXX-style
URI-escaped Unicode strings, so Apache::Request can't either,
while CGI.pm can.

Is this a known issue?


-- 
Tatsuhiko Miyagawa <mi...@edge.co.jp>

Re: Re[2]: ap_unescape_url can't escape %uXXXX

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Lee Goddard <ho...@LeeGoddard.com> writes:

> Any idea if/when that'll be incorporated into the distributions?
> 
> I currently have to have a handler check every incoming request
> for failure, and then convert with Unicode::String ....

It'll likely go into the next libapreq-1.X release, and be 
carried on to 2.X versions as well.  1.1 is now on its 
way to becoming official (I don't know if it'll ever 
appear on CPAN though), so there's nothing holding up further 
progress on httpd-apreq.

-- 
Joe Schaefer

Re[2]: ap_unescape_url can't escape %uXXXX

Posted by Lee Goddard <ho...@LeeGoddard.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: MD5

Hi IKEBE,

On Tuesday, January 28, 2003 at 1:32:43 PM, you wrote:


IT> I have written a small patch for libapreq that unescapes
IT> %uXXXX-style URI-escaped strings.
IT> The unescape algorithm is based on the one in CGI.pm.

Any idea if/when that'll be incorporated into the distributions?

I currently have to have a handler check every incoming request
for failure, and then convert with Unicode::String ....

- --
Cheers
 Lee "Of course, if everyone used Lynx, this wouldn't be a problem"
 Goddard

-----BEGIN PGP SIGNATURE-----
Version: 2.6

iQCVAwUAPjaBYadrfekeF/QBAQHDJgP/YnPgGH8c3emGgbwnuvAB3B2jIvnetcD0
2nyE4ODThKoRuITHRX5qa9FvHtz2ouNM+pgDr0wo6TRyJM7sqmpzXVy/0XYw6NUV
j8nxkBiELC4F9JWyf+a91rzvTOA/eXDPizrOC9/OgKKn+ZH86GftoeNd+KvhM4TG
kRmE6bJ5O+4=
=ZPvi
-----END PGP SIGNATURE-----


Re: ap_unescape_url can't escape %uXXXX

Posted by IKEBE Tomohiro <ik...@edge.co.jp>.
I have written a small patch for libapreq that unescapes
%uXXXX-style URI-escaped strings.
The unescape algorithm is based on the one in CGI.pm.
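
For readers following along, here is a minimal Python sketch of what
such an unescape step might look like. It is not the libapreq patch
itself (which is not shown in this thread) and not CGI.pm's code; the
helper name unescape_with_u is hypothetical, and the sketch assumes
the decoded bytes are meant to be UTF-8. Surrogate pairs sent as two
%uXXXX escapes are not recombined here; they come out as replacement
characters.

```python
import re

# Matches either %uXXXX (four hex digits) or %XX (two hex digits).
_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def unescape_with_u(s):
    """Decode %XX and %uXXXX escapes (hypothetical helper, assumes UTF-8)."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(s):
        # Copy the literal text between escapes unchanged.
        out += s[pos:m.start()].encode("utf-8")
        if m.group(1) is not None:
            # %uXXXX: turn the 16-bit value into UTF-8 bytes.
            # "surrogatepass" keeps lone surrogates from raising here.
            out += chr(int(m.group(1), 16)).encode("utf-8", "surrogatepass")
        else:
            # %XX: a single raw byte.
            out.append(int(m.group(2), 16))
        pos = m.end()
    out += s[pos:].encode("utf-8")
    # Invalid byte sequences become U+FFFD instead of raising.
    return out.decode("utf-8", "replace")


print(unescape_with_u("%u3042%20a%E3%81%84"))
```

Malformed escapes such as "%zz" are left untouched by this sketch,
which differs from a decoder that rejects the whole request.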

At Fri, 29 Nov 2002 06:55:54 +0900,
Tatsuhiko Miyagawa wrote:
> 
> It seems that Apache's ap_unescape_url() can't handle %uXXXX-style
> URI-escaped Unicode strings, so Apache::Request can't either,
> while CGI.pm can.
> 
> Is this a known issue?
> 
> 
> -- 
> Tatsuhiko Miyagawa <mi...@edge.co.jp>
> 

-- 
IKEBE Tomohiro <ik...@edge.co.jp>


Re: ap_unescape_url can't escape %uXXXX

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Tatsuhiko Miyagawa <mi...@edge.co.jp> writes:

> At 29 Nov 2002 02:17:31 -0500,
> Joe Schaefer wrote:
> > > It seems that Apache's ap_unescape_url() can't handle %uXXXX-style
> > > URI-escaped Unicode strings, so Apache::Request can't either,
> > > while CGI.pm can.
> 
> my WinIE 5.5 / WinIE 6.0 uses this style of URI escaping when you use
> javascript to submit page's content. 
  ^^^^^^^^^^

AFAICT ECMA-262 does NOT use %uXXXX for encoding full URIs (see 15.1.3's
description of the encodeURI function).  However, Appendix B.2 
does describe %uXXXX as an "extended feature" of the escape/unescape 
functions.
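
To make the distinction concrete, here is a rough Python sketch of the
two encodings. The names js_escape and js_encode_uri are hypothetical;
they only approximate the behaviour ECMA-262 describes for escape()
and encodeURI() and are not taken from any browser or from Apache.

```python
from urllib.parse import quote

# Characters that escape() leaves unencoded, per ECMA-262 B.2.
_ESCAPE_SAFE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./"
)


def js_escape(s):
    """Approximate escape(): %XX below 256, %uXXXX for other code units."""
    data = s.encode("utf-16-be", "surrogatepass")
    out = []
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]  # one UTF-16 code unit
        ch = chr(unit)
        if ch in _ESCAPE_SAFE:
            out.append(ch)
        elif unit < 256:
            out.append("%%%02X" % unit)
        else:
            out.append("%%u%04X" % unit)
    return "".join(out)


def js_encode_uri(s):
    """Approximate encodeURI(): percent-encode the UTF-8 bytes."""
    return quote(s, safe=";/?:@&=+$,#-_.!~*'()")


print(js_escape("\u3042 a"))      # %u3042%20a
print(js_encode_uri("\u3042 a"))  # %E3%81%82%20a
```

Only the first form produces %uXXXX, which is the form a plain %XX
decoder does not recognise.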

> (Well, I'm talking about MovableType's bookmarklet, if you're
> interested) 

Yes, I'm interested.  I'm just not certain about who to blame 
for that :-).  FWIW, I'm +1 on apreq supporting this, even if 
dev@httpd.apache.org decides against it.

-- 
Joe Schaefer

Re: ap_unescape_url can't escape %uXXXX

Posted by Tatsuhiko Miyagawa <mi...@edge.co.jp>.
At 29 Nov 2002 02:17:31 -0500,
Joe Schaefer wrote:
> > It seems that Apache's ap_unescape_url() can't handle %uXXXX-style
> > URI-escaped Unicode strings, so Apache::Request can't either,
> > while CGI.pm can.

My WinIE 5.5 / WinIE 6.0 use this style of URI escaping when you use
JavaScript to submit a page's content. (Well, I'm talking about
MovableType's bookmarklet, if you're interested.)
> 
> seems to indicate that this isn't a recommended practice. OTOH, IIRC the 
> apache source claims to support utf8 extension(s) of www-urlencoded
> ASCII, so if people really are using such encodings, supporting 
> "%uXXXX" in ap_unescape_url shouldn't hurt server performance at all.
> 
> In any case, putting together a patch of ap_unescape_url along the lines 
> of CGI::Util's utf8_chr() can't hurt :-).

Yep ;-)


-- 
Tatsuhiko Miyagawa <mi...@edge.co.jp>

Re: ap_unescape_url can't escape %uXXXX

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Tatsuhiko Miyagawa <mi...@edge.co.jp> writes:

> It seems that Apache's ap_unescape_url() can't handle %uXXXX-style
> URI-escaped Unicode strings, so Apache::Request can't either,
> while CGI.pm can.

You may want to take this issue up on dev@httpd.apache.org.
Personally I've never seen this kind of character encoding, 
and my reading of

  Section 8 at http://www.w3.org/TR/charmod/ 
  and RFC 2718, Section 2.2.5, 

seems to indicate that this isn't a recommended practice. OTOH, IIRC the 
apache source claims to support utf8 extension(s) of www-urlencoded
ASCII, so if people really are using such encodings, supporting 
"%uXXXX" in ap_unescape_url shouldn't hurt server performance at all.

In any case, putting together a patch of ap_unescape_url along the lines 
of CGI::Util's utf8_chr() can't hurt :-).
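
As an illustration of the kind of conversion meant here, the following
Python sketch turns a numeric code point into its UTF-8 byte sequence
by hand. It borrows only the name utf8_chr; it is not a transcription
of CGI::Util's Perl code, and a real ap_unescape_url patch would be
written in C.

```python
def utf8_chr(c):
    """Return the UTF-8 bytes for code point c (illustrative sketch)."""
    if c < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([c])
    if c < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (c >> 6), 0x80 | (c & 0x3F)])
    if c < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (c >> 12),
            0x80 | ((c >> 6) & 0x3F),
            0x80 | (c & 0x3F),
        ])
    if c < 0x110000:
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xF0 | (c >> 18),
            0x80 | ((c >> 12) & 0x3F),
            0x80 | ((c >> 6) & 0x3F),
            0x80 | (c & 0x3F),
        ])
    raise ValueError("code point out of range: %#x" % c)


print(utf8_chr(0x3042).hex())  # e38182
```

Since %uXXXX carries at most 16 bits, only the first three branches
are reachable from a single escape.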

-- 
Joe Schaefer
