Posted to apreq-dev@httpd.apache.org by Nikolay Ananiev <an...@thegdb.com> on 2005/09/17 21:00:56 UTC
Problem with APR::Request::encode and UTF8 data
APR::Request::encode has a problem escaping binary data (or utf8)
This utf8 string: "\x{5c0f}\x{98fc} \x{5f3e}.txt" after encoding
should become '%E5%B0%8F%E9%A3%BC%20%E5%BC%BE.txt'
but it doesn't. It becomes 'È%B0ðÝÇ®+È®¬.txt'
The following script demonstrates the problem and compares the results to
CGI.pm
#!/usr/bin/perl -w
use APR::Request;
use CGI::Util;
my $cgi_str = my $apr_str = "\x{5c0f}\x{98fc} \x{5f3e}.txt";
my $expected = '%E5%B0%8F%E9%A3%BC%20%E5%BC%BE.txt';
# Encode $apr_str to UTF-8 bytes (this clears the utf8 flag).
# CGI.pm does this internally, so $cgi_str is left as is.
utf8::encode($apr_str);
$apr_str = APR::Request::encode($apr_str);
$cgi_str = CGI::Util::escape($cgi_str);
print "APR::Request\n";
print "STRING: $apr_str\n";
print "EXPECTED: $expected\n";
print "\nCGI.pm\n";
print "STRING: $cgi_str\n";
print "EXPECTED: $expected\n";
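For reference, the expected string is the UTF-8 byte sequence of each character with every byte written as %XX. The following is a minimal C sketch of that derivation, not the apreq implementation. The names utf8_encode and escape_code_point are hypothetical, and the sketch only handles code points below U+10000, which covers the characters in the test string. Unlike a real encoder, it escapes every byte, including ASCII letters.

```c
#include <stddef.h>
#include <stdio.h>

/* Encode one Unicode code point (below U+10000) as UTF-8.
 * Returns the number of bytes written to out (1 to 3).
 * Hypothetical helper for illustration only. */
static size_t utf8_encode(unsigned cp, unsigned char out[3])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Write the UTF-8 bytes of cp into dest as %XX triplets.
 * dest must have room for 10 bytes (3 * 3 plus the NUL). Returns dest. */
static char *escape_code_point(unsigned cp, char *dest)
{
    unsigned char bytes[3];
    size_t n = utf8_encode(cp, bytes);
    char *d = dest;

    for (size_t i = 0; i < n; ++i)
        d += sprintf(d, "%%%02X", (unsigned)bytes[i]);
    *d = '\0';
    return dest;
}
```

Calling escape_code_point on 0x5C0F, 0x98FC and 0x5F3E yields the three triplet groups that appear in the expected string above, and 0x20 yields %20.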
Re: Problem with APR::Request::encode and UTF8 data
Posted by Nikolay Ananiev <an...@thegdb.com>.
"William A. Rowe, Jr." <wr...@rowe-clan.net> wrote in message
news:432C7228.2080007@rowe-clan.net...
> Nikolay Ananiev wrote:
> > APR::Request::encode has a problem escaping binary data (or utf8)
> > This utf8 string: "\x{5c0f}\x{98fc} \x{5f3e}.txt" after encoding
>
> Your problem is that this isn't (apparently) a utf8 stream, but a
> unicode stream. I'd be interested to know what this stream looks
> like if you dump it to a file (just the origin stream). It might
> be nothing more than an issue with your perl grammar.
>
> Bill
>
I thought it was utf8; I just took it from CGI.pm's t/util-58.t.
Anyway, I tested the same example with some Cyrillic characters
and the result is the same: instead of escaped bytes, I get a broken string.
Re: Problem with APR::Request::encode and UTF8 data
Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Nikolay Ananiev wrote:
> APR::Request::encode has a problem escaping binary data (or utf8)
> This utf8 string: "\x{5c0f}\x{98fc} \x{5f3e}.txt" after encoding
Your problem is that this isn't (apparently) a utf8 stream, but a
unicode stream. I'd be interested to know what this stream looks
like if you dump it to a file (just the origin stream). It might
be nothing more than an issue with your perl grammar.
Bill
Re: Problem with APR::Request::encode and UTF8 data
Posted by "Philip M. Gollucci" <pg...@p6m7g8.com>.
Nikolay Ananiev wrote:
> Thanks Joe! Your patch fixes the problem.
> I have one question.
> Currently the white space is encoded as a plus sign.
> Shouldn't it be encoded as %20 ?
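Both are correct; it depends on the encoding scheme you are using.
application/x-www-form-urlencoded (HTML form data) encodes a space as '+',
while generic URI percent-encoding (RFC 3986) encodes it as %20.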
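To make the difference concrete, here is a small C sketch of an escaper with a switch between the two conventions. It is an illustration under stated assumptions, not the apreq API: the function name escape_bytes and its form_style parameter are hypothetical, and the set of characters left unescaped (ASCII alphanumerics plus - . _ ~) is simply the set used in the patch posted in this thread.

```c
#include <stddef.h>

/* Escape slen bytes of src into dest and NUL-terminate the result.
 * dest must have room for 3 * slen + 1 bytes.
 * form_style != 0: a space becomes '+' (form-urlencoded convention).
 * form_style == 0: a space becomes "%20" (generic percent-encoding).
 * Returns the length of the escaped string.
 * Hypothetical helper for illustration only. */
static size_t escape_bytes(char *dest, const char *src, size_t slen,
                           int form_style)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *s = (const unsigned char *)src;
    char *d = dest;

    for (size_t i = 0; i < slen; ++i) {
        unsigned char c = s[i];
        /* Explicit ASCII ranges, so the result never depends on the locale. */
        int literal = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                   || (c >= '0' && c <= '9')
                   || c == '-' || c == '.' || c == '_' || c == '~';

        if (literal) {
            *d++ = (char)c;
        } else if (c == ' ' && form_style) {
            *d++ = '+';
        } else {
            *d++ = '%';
            *d++ = hex[c >> 4];
            *d++ = hex[c & 0x0F];
        }
    }
    *d = '\0';
    return (size_t)(d - dest);
}
```

With the input "a b.txt", form_style set to 1 gives "a+b.txt" and form_style set to 0 gives "a%20b.txt". A decoder has to know which convention was used, because a literal '+' means a space only in the form-urlencoded case.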
Re: Problem with APR::Request::encode and UTF8 data
Posted by Nikolay Ananiev <an...@thegdb.com>.
"Joe Schaefer" <jo...@sunstarsys.com> wrote in message
news:87fyryjmv7.fsf@gemini.sunstarsys.com...
> "Nikolay Ananiev" <an...@thegdb.com> writes:
>
> > Win2000 Advanced server, ActivePerl 5.8.7 (build 813)
> > mod_perl 2.0.2-dev, latest apreq2 svn, httpd 2.0.54
> > (httpd, apreq2 and mp2 are compiled with VS .NET 2002)
>
> See if this patch helps any:
>
> Index: library/util.c
> [...]
Thanks Joe! Your patch fixes the problem.
I have one question.
Currently the white space is encoded as a plus sign.
Shouldn't it be encoded as %20 ?
Re: Problem with APR::Request::encode and UTF8 data
Posted by Joe Schaefer <jo...@sunstarsys.com>.
"Nikolay Ananiev" <an...@thegdb.com> writes:
> Win2000 Advanced server, ActivePerl 5.8.7 (build 813)
> mod_perl 2.0.2-dev, latest apreq2 svn, httpd 2.0.54
> (httpd, apreq2 and mp2 are compiled with VS .NET 2002)
See if this patch helps any:
Index: library/util.c
===================================================================
--- library/util.c (revision 279108)
+++ library/util.c (working copy)
@@ -497,11 +497,11 @@
 {
     char *d = dest;
     const unsigned char *s = (const unsigned char *)src;
-    unsigned c;
+    unsigned char c;
     for ( ; s < (const unsigned char *)src + slen; ++s) {
         c = *s;
-        if ( apr_isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~' )
+        if ( (c < 0x80 && apr_isalnum(c)) || c == '-' || c == '.' || c == '_' || c == '~' )
             *d++ = c;
         else if ( c == ' ' )
--
Joe Schaefer
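The thread does not state why the guard helps, so the following is only one plausible reading: the C library's isalnum() is locale-dependent, and under some locales or C runtimes it may report bytes at or above 0x80 as alphanumeric, which would let raw UTF-8 bytes through unescaped. The sketch below contrasts the two tests using plain isalnum() from <ctype.h> in place of apr_isalnum(). The function names are hypothetical.

```c
#include <ctype.h>

/* Pre-patch style test: for bytes >= 0x80 the answer depends on
 * the current C locale, because isalnum() is locale-sensitive.
 * Hypothetical name, for illustration only. */
static int is_literal_unguarded(unsigned char c)
{
    return isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~';
}

/* Patched style test: bytes >= 0x80 never reach isalnum(), so they
 * are always treated as "must be escaped", whatever the locale is. */
static int is_literal_guarded(unsigned char c)
{
    return (c < 0x80 && isalnum(c))
        || c == '-' || c == '.' || c == '_' || c == '~';
}
```

The guarded version rejects every byte of a multi-byte UTF-8 sequence, since all such bytes are at or above 0x80. For ASCII letters and digits the two tests give the same answer.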
Re: Problem with APR::Request::encode and UTF8 data
Posted by Nikolay Ananiev <an...@thegdb.com>.
"Joe Schaefer" <jo...@sunstarsys.com> wrote in message
news:87k6hajt1a.fsf@gemini.sunstarsys.com...
> "Nikolay Ananiev" <an...@thegdb.com> writes:
>
> > APR::Request::encode has a problem escaping binary data (or utf8)
> > This utf8 string: "\x{5c0f}\x{98fc} \x{5f3e}.txt" after encoding
> > should become '%E5%B0%8F%E9%A3%BC%20%E5%BC%BE.txt'
> > but it doesn't. It becomes 'È%B0ðÝÇ®+È®¬.txt'
>
> Thanks. I can't reproduce the error. My perl version is 5.8.6.
> What's yours? And what platform are you on?
Win2000 Advanced server, ActivePerl 5.8.7 (build 813)
mod_perl 2.0.2-dev, latest apreq2 svn, httpd 2.0.54
(httpd, apreq2 and mp2 are compiled with VS .NET 2002)
Re: Problem with APR::Request::encode and UTF8 data
Posted by Joe Schaefer <jo...@sunstarsys.com>.
"Nikolay Ananiev" <an...@thegdb.com> writes:
> APR::Request::encode has a problem escaping binary data (or utf8)
> This utf8 string: "\x{5c0f}\x{98fc} \x{5f3e}.txt" after encoding
> should become '%E5%B0%8F%E9%A3%BC%20%E5%BC%BE.txt'
> but it doesn't. It becomes 'È%B0ðÝÇ®+È®¬.txt'
Thanks. I can't reproduce the error. My perl version is 5.8.6.
What's yours? And what platform are you on?
--
Joe Schaefer