Posted to dev@perl.apache.org by David Wheeler <da...@kineticode.com> on 2004/09/10 04:52:13 UTC

Apache::Util::escape_html() and UTF-8

Hi All,

I got bitten by a bug in Apache::Util's escape_html() function in
mod_perl 1. It seems that it doesn't handle Perl's Unicode-encoded
strings! This patch demonstrates the issue (be sure that your editor
understands UTF-8):

--- modperl/t/net/perl/util.pl.~1.18.~	Sun May 25 03:54:08 2003
+++ modperl/t/net/perl/util.pl	Thu Sep  9 19:38:40 2004
@@ -74,6 +74,25 @@

  #print $esc_2;
  test ++$i, $esc eq $esc_2;
+
+# Make sure that escape_html() understands multibyte characters.
+my $utf8 = '<專輯>';
+my $esc_utf8 = '&lt;專輯&gt;';
+my $test_esc_utf8 = Apache::Util::escape_html($utf8);
+test ++$i, $test_esc_utf8 eq $esc_utf8;
+#print STDERR "Compare '$test_esc_utf8'\n     to '$esc_utf8'\n";
+
+eval { require Encode };
+unless ($@) {
+    # Make sure escape_html() properly handles strings with Perl's
+    # Unicode encoding.
+    $utf8 = Encode::decode_utf8($utf8);
+    $esc_utf8 = Encode::decode_utf8($esc_utf8);
+    $test_esc_utf8 = Apache::Util::escape_html($utf8);
+    test ++$i, $test_esc_utf8 eq $esc_utf8;
+    #print STDERR "Compare '$test_esc_utf8'\n     to '$esc_utf8'\n";
+}
+
  use Benchmark;

  =pod

========================End Patch ======================================

If I enable the print statements and look at the log, I see this:

Compare '&lt;專輯&gt;'
      to '&lt;專輯&gt;'
Compare '&lt;å°è¼¯&gt;'
      to '&lt;專輯&gt;'

The first escape appears to work correctly, but when I decode the  
string to Perl's Unicode representation, you can see how badly  
escape_html() munges the text!

Curiously, both tests fail, although the first conversion appears to be  
correct. This could be due to the behavior of C<eq>, though I'm not  
sure why. But it's the second test that's the more interesting, since  
it really screws things up.
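
The second pair of lines in the log is consistent with the escaped string coming back as raw UTF-8 bytes with Perl's UTF-8 flag dropped, although the thread does not confirm that this is what the C code does. A minimal sketch, using only the core Encode module and no Apache::Util at all, of how such a flagless copy compares against the decoded original; the encode_utf8() call here merely simulates the suspected flag loss:

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 bytes for '<專輯>', written as escapes so that the encoding of
# this file does not matter.
my $bytes = "<\xE5\xB0\x88\xE8\xBC\xAF>";

# Perl's internal Unicode representation: 4 characters, UTF-8 flag on.
my $chars = Encode::decode_utf8($bytes);

# ASSUMPTION: simulate a routine that hands back the same bytes without
# the UTF-8 flag. This is a guess at the failure mode, not the actual
# Apache::Util code.
my $flagless = Encode::encode_utf8($chars);

printf "chars:    %d characters\n", length $chars;
printf "flagless: %d characters\n", length $flagless;
print  "eq? ", ( $chars eq $flagless ? "yes" : "no" ), "\n";
```

Under that assumption, eq treats each byte of the flagless copy as a separate Latin-1 character, so the two strings differ even though they hold the same UTF-8 data.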

If you have trouble reading the Unicode characters in this email, I've  
also posted it to my blog.

    
http://www.justatheory.com/computers/programming/perl/mod_perl/ 
escape_html_utf8.html

Regards,

David


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@perl.apache.org
For additional commands, e-mail: dev-help@perl.apache.org


Re: Apache::Util::escape_html() and UTF-8

Posted by David Wheeler <da...@kineticode.com>.
On Sep 10, 2004, at 4:04 PM, Stas Bekman wrote:

> Any chance you have a patch to fix that too, David? At the moment we 
> have zero time to look at mp1 bugs.

No, that's C. It beats me. My workaround is:

   *escape_html = \&HTML::Entities::encode_entities;
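
For illustration, the same idea (doing the escaping in Perl rather than in C) can be sketched without any non-core dependency. The helper name escape_html_pure() and the choice of four escaped characters are assumptions made for this example; they are not taken from Apache::Util or HTML::Entities. Because the substitution happens inside Perl, the string keeps whatever representation it arrived in:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Apache::Util or HTML::Entities.
# ASSUMPTION: only these four markup characters need escaping.
my %ENTITY_FOR = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
);

sub escape_html_pure {
    my ($text) = @_;
    return unless defined $text;
    # A Perl-level substitution works on characters and keeps the
    # string's UTF-8 flag, so multibyte characters pass through intact.
    $text =~ s/([&<>"])/$ENTITY_FOR{$1}/g;
    return $text;
}

# UTF-8 bytes for '<專輯>' and the decoded character string.
my $bytes = "<\xE5\xB0\x88\xE8\xBC\xAF>";
my $chars = Encode::decode_utf8($bytes);

my $esc_bytes = escape_html_pure($bytes);
my $esc_chars = escape_html_pure($chars);

# Both representations should describe the same escaped text.
print "match\n" if Encode::encode_utf8($esc_chars) eq $esc_bytes;
```

Note that HTML::Entities::encode_entities(), called with its default arguments, also turns non-ASCII characters into numeric entities; passing an explicit second argument such as '<>&"' limits it to the markup characters.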

>  Once mp2 is out it might be more likely. Does it work fine with mp2?

Dunno, I haven't played with mp2 at all yet.

Regards,

David




Re: Apache::Util::escape_html() and UTF-8

Posted by Stas Bekman <st...@stason.org>.
David Wheeler wrote:
> Hi All,
> 
> I got bit by a bug with Apache::Util's escape_html() function in  
> mod_perl 1. It seems that it doesn't like Perl's Unicode encoded  
> strings! This patch demonstrates the issue (be sure that your editor  
> understands utf-8):

Any chance you have a patch to fix that too, David? At the moment we have 
zero time to look at mp1 bugs. Once mp2 is out it might be more likely. 
Does it work fine with mp2?



-- 
__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com
