Posted to apreq-dev@httpd.apache.org by Nikolay Ananiev <an...@thegdb.com> on 2005/09/17 21:00:56 UTC

Problem with APR::Request::encode and UTF8 data

APR::Request::encode has a problem escaping binary data (or UTF-8).
After encoding, this UTF-8 string: "\x{5c0f}\x{98fc} \x{5f3e}.txt"
should become '%E5%B0%8F%E9%A3%BC%20%E5%BC%BE.txt',
but it doesn't. It becomes 'È%B0ðÝÇ®+È®¬.txt'.
The following script demonstrates the problem and compares the results to
CGI.pm:

#!/usr/bin/perl -w

use APR::Request;
use CGI::Util;

my $cgi_str = my $apr_str = "\x{5c0f}\x{98fc} \x{5f3e}.txt";
my $expected = '%E5%B0%8F%E9%A3%BC%20%E5%BC%BE.txt';

# Strip the utf8 flag from $apr_str. CGI.pm strips the flag
# internally, so $cgi_str is left as-is.
utf8::encode($apr_str);

$apr_str = APR::Request::encode($apr_str);
$cgi_str = CGI::Util::escape($cgi_str);

print "APR::Request\n";
print "STRING: $apr_str\n";
print "EXPECTED: $expected\n";

print "\nCGI.pm\n";
print "STRING: $cgi_str\n";
print "EXPECTED: $expected\n";




Re: Problem with APR::Request::encode and UTF8 data

Posted by Nikolay Ananiev <an...@thegdb.com>.
"William A. Rowe, Jr." <wr...@rowe-clan.net> wrote in message
news:432C7228.2080007@rowe-clan.net...
> Nikolay Ananiev wrote:
> > APR::Request::encode has a problem escaping binary data (or utf8)
> > This utf8 string: "\x{5c0f}\x{98fc} \x{5f3e}.txt" after encoding
>
> Your problem is that this isn't (apparently) a utf8 stream, but a
> unicode stream.  I'd be interested to know what this stream looks
> like if you dump it to a file (just the origin stream).  It might
> be nothing more than an issue with your Perl grammar.
>
> Bill
>

I thought it was UTF-8; I just took it from CGI.pm's t/util-58.t.
Anyway, I tested the same example with some Cyrillic characters
and the result is the same: instead of escaped bytes, I get a broken string.




Re: Problem with APR::Request::encode and UTF8 data

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Nikolay Ananiev wrote:
> APR::Request::encode has a problem escaping binary data (or utf8)
> This utf8 string: "\x{5c0f}\x{98fc} \x{5f3e}.txt" after encoding

Your problem is that this isn't (apparently) a utf8 stream, but a
unicode stream.  I'd be interested to know what this stream looks
like if you dump it to a file (just the origin stream).  It might
be nothing more than an issue with your Perl grammar.

Bill

Re: Problem with APR::Request::encode and UTF8 data

Posted by "Philip M. Gollucci" <pg...@p6m7g8.com>.
Nikolay Ananiev wrote:
> Thanks Joe! Your patch fixes the problem.
> I have one question.
> Currently the white space is encoded as a plus sign.
> Shouldn't it be encoded as %20 ?
Both are correct -- it depends on the encoding scheme you are using.




Re: Problem with APR::Request::encode and UTF8 data

Posted by Nikolay Ananiev <an...@thegdb.com>.
"Joe Schaefer" <jo...@sunstarsys.com> wrote in message
news:87fyryjmv7.fsf@gemini.sunstarsys.com...
> "Nikolay Ananiev" <an...@thegdb.com> writes:
>
> > Win2000 Advanced server, ActivePerl 5.8.7 (build 813)
> > mod_perl 2.0.2-dev, latest apreq2 svn, httpd 2.0.54
> > (httpd, apreq2 and mp2 are compiled with VS .NET 2002)
>
> See if this patch helps any:
>
> Index: library/util.c
> [...]

Thanks Joe! Your patch fixes the problem.
I have one question.
Currently the white space is encoded as a plus sign.
Shouldn't it be encoded as %20?




Re: Problem with APR::Request::encode and UTF8 data

Posted by Joe Schaefer <jo...@sunstarsys.com>.
"Nikolay Ananiev" <an...@thegdb.com> writes:

> Win2000 Advanced server, ActivePerl 5.8.7 (build 813)
> mod_perl 2.0.2-dev, latest apreq2 svn, httpd 2.0.54
> (httpd, apreq2 and mp2 are compiled with VS .NET 2002)

See if this patch helps any:

Index: library/util.c
===================================================================
--- library/util.c	(revision 279108)
+++ library/util.c	(working copy)
@@ -497,11 +497,11 @@
 {
     char *d = dest;
     const unsigned char *s = (const unsigned char *)src;
-    unsigned c;
+    unsigned char c;
 
     for ( ; s < (const unsigned char *)src + slen; ++s) {
         c = *s;
-        if ( apr_isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~' )
+        if ( (c < 0x80 && apr_isalnum(c)) || c == '-' || c == '.' || c == '_' || c == '~' )
             *d++ = c;
 
         else if ( c == ' ' )


-- 
Joe Schaefer


Re: Problem with APR::Request::encode and UTF8 data

Posted by Nikolay Ananiev <an...@thegdb.com>.
"Joe Schaefer" <jo...@sunstarsys.com> wrote in message
news:87k6hajt1a.fsf@gemini.sunstarsys.com...
> "Nikolay Ananiev" <an...@thegdb.com> writes:
>
> > APR::Request::encode has a problem escaping binary data (or utf8)
> > This utf8 string: "\x{5c0f}\x{98fc} \x{5f3e}.txt" after encoding
> > should become '%E5%B0%8F%E9%A3%BC%20%E5%BC%BE.txt'
> > but it doesn't. It becomes 'È%B0ðÝÇ®+È®¬.txt'
>
> Thanks. I can't reproduce the error.  My perl version is 5.8.6.
> What's yours?  And what platform are you on?

Win2000 Advanced server, ActivePerl 5.8.7 (build 813)
mod_perl 2.0.2-dev, latest apreq2 svn, httpd 2.0.54
(httpd, apreq2 and mp2 are compiled with VS .NET 2002)




Re: Problem with APR::Request::encode and UTF8 data

Posted by Joe Schaefer <jo...@sunstarsys.com>.
"Nikolay Ananiev" <an...@thegdb.com> writes:

> APR::Request::encode has a problem escaping binary data (or utf8)
> This utf8 string: "\x{5c0f}\x{98fc} \x{5f3e}.txt" after encoding
> should become '%E5%B0%8F%E9%A3%BC%20%E5%BC%BE.txt'
> but it doesn't. It becomes 'È%B0ðÝÇ®+È®¬.txt'

Thanks. I can't reproduce the error.  My perl version is 5.8.6.
What's yours?  And what platform are you on?

-- 
Joe Schaefer