Posted to dev@perl.apache.org by Eric Cholet <ch...@logilune.com> on 2002/03/25 19:22:39 UTC

Re: cvs commit: modperl/t/net/perl util.pl

--On Sunday, March 24, 2002 21:57:54 +0000 dougm@apache.org wrote:

> dougm       02/03/24 13:57:53
>
>   Modified:    .        Changes STATUS
>                src/modules/perl Util.xs
>                t/net/perl util.pl
>   Log:
>   Submitted by:   Geoff Young <ge...@modperlcookbook.org>
>   Reviewed by:	dougm
>   properly escape highbit chars in Apache::Utils::escape_html

This is uncool for those of us using a non-ASCII encoding and sending
out lots of characters with the 8th bit set, e.g. in a French page
many accented characters will be replaced by 6-byte sequences.
If I'm sending out "Content-type: text/html; charset=ISO-8859-1",
and calling escape_html to escape '<', '>' and the like, I'm going
to be serving quite a lot more bytes than before this patch.

However escape_html () has no clue as to what the character set is,
and whether it has been correctly specified in the Content-Type.
It has also been mentioned here that escape_html is only valid for
single-byte encodings.

So this patch does the right thing to escape the odd 8 bit char in
a mostly ASCII output, but users of other charsets should be warned
not to use it. I use HTML::Entities::encode($_[0], '<>&"') myself.
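To make the size cost concrete, here is a rough pure-Perl emulation of the two behaviours on an ISO-8859-1 byte string. `escape_minimal` and `escape_highbit` are hypothetical helpers written for illustration only. They are not the XS code in Util.xs. They merely approximate what HTML::Entities::encode($str, '<>&"') and the patched escape_html do: each accented byte grows from one byte to a six-byte reference such as `&#233;`.

```perl
use strict;
use warnings;

# Named entities for the four HTML metacharacters.
my %named = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');

# Hypothetical helper: escape only <, >, & and ", roughly what
# HTML::Entities::encode($str, '<>&"') does to a byte string.
sub escape_minimal {
    my ($str) = @_;
    $str =~ s/([<>&"])/$named{$1}/g;
    return $str;
}

# Hypothetical helper: additionally turn every byte with the 8th bit set
# into a numeric character reference, like the committed patch.
sub escape_highbit {
    my $str = escape_minimal($_[0]);
    $str =~ s/([\x80-\xff])/sprintf('&#%d;', ord $1)/ge;
    return $str;
}

# "<b>Été à Noël</b>" as ISO-8859-1 bytes: É=0xC9, é=0xE9, à=0xE0, ë=0xEB.
my $latin1 = "<b>\x{C9}t\x{E9} \x{E0} No\x{EB}l</b>";

printf "minimal: %d bytes\n", length(escape_minimal($latin1));    # 29 bytes
printf "highbit: %d bytes\n", length(escape_highbit($latin1));    # 49 bytes
```

With only four accented characters the escaped output is already about 70% larger. On a French page with many accented characters the difference adds up quickly, while a browser told "charset=ISO-8859-1" would have rendered the raw bytes correctly.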

Therefore I propose a doc patch to clear this up:

Index: Util.pm
===================================================================
RCS file: /home/cvs/modperl/Util/Util.pm,v
retrieving revision 1.8
diff -u -r1.8 Util.pm
--- Util.pm	4 Mar 2000 20:55:47 -0000	1.8
+++ Util.pm	25 Mar 2002 18:19:37 -0000
@@ -68,6 +68,13 @@

  my $esc = Apache::Util::escape_html($html);

+This function is unaware of its argument's character set and encoding.
+It assumes a single-byte encoding and escapes all characters with the
+8th bit set. Do not use it with multi-byte encodings such as utf8.
+When using a single byte non-ASCII encoding such as ISO-8859-1,
+consider specifying the character set in the Content-Type header,
+and using HTML::Entities to avoid unnecessary escaping.
+
 =item escape_uri

 This function replaces all unsafe characters in the $string with their


--
Eric Cholet


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@perl.apache.org
For additional commands, e-mail: dev-help@perl.apache.org


Re: cvs commit: modperl/t/net/perl util.pl

Posted by Issac Goldstand <ma...@beamartyr.net>.
A casual user won't understand that documentation... Hell, I'm not even 
sure I completely understand the implications of it and when to use/not 
use escape_html based on it...  I think an example is called for, but 
not in the POD...  Maybe in the Guide?

  Issac

Eric Cholet wrote:

> --On Sunday, March 24, 2002 21:57:54 +0000 dougm@apache.org wrote:
>
>> dougm       02/03/24 13:57:53
>>
>>   Modified:    .        Changes STATUS
>>                src/modules/perl Util.xs
>>                t/net/perl util.pl
>>   Log:
>>   Submitted by:   Geoff Young <ge...@modperlcookbook.org>
>>   Reviewed by:    dougm
>>   properly escape highbit chars in Apache::Utils::escape_html
>
>
> This is uncool for those of us using a non-ASCII encoding and sending
> out lots of characters with the 8th bit set, e.g. in a French page
> many accented characters will be replaced by 6-byte sequences.
> If I'm sending out "Content-type: text/html; charset=ISO-8859-1",
> and calling escape_html to escape '<', '>' and the like, I'm going
> to be serving quite a lot more bytes than before this patch.
>
> However escape_html () has no clue as to what the character set is,
> and whether it has been correctly specified in the Content-Type.
> It has also been mentioned here that escape_html is only valid for
> single-byte encodings.
>
> So this patch does the right thing to escape the odd 8 bit char in
> a mostly ASCII output, but users of other charsets should be warned
> not to use it. I use HTML::Entities::encode($_[0], '<>&"') myself.
>
> Therefore I propose a doc patch to clear this up:
>
> Index: Util.pm
> ===================================================================
> RCS file: /home/cvs/modperl/Util/Util.pm,v
> retrieving revision 1.8
> diff -u -r1.8 Util.pm
> --- Util.pm    4 Mar 2000 20:55:47 -0000    1.8
> +++ Util.pm    25 Mar 2002 18:19:37 -0000
> @@ -68,6 +68,13 @@
>
>  my $esc = Apache::Util::escape_html($html);
>
> +This function is unaware of its argument's character set and encoding.
> +It assumes a single-byte encoding and escapes all characters with the
> +8th bit set. Do not use it with multi-byte encodings such as utf8.
> +When using a single byte non-ASCII encoding such as ISO-8859-1,
> +consider specifying the character set in the Content-Type header,
> +and using HTML::Entities to avoid unnecessary escaping.
> +
> =item escape_uri
>
> This function replaces all unsafe characters in the $string with their
>
>
> -- 
> Eric Cholet
>
>


Re: cvs commit: modperl/t/net/perl util.pl

Posted by Eric Cholet <ch...@logilune.com>.
>>> This function will correctly escape US-ASCII output. If you're using
>>> a different character set such as UTF8, or need more control on
>>> the escaping process, use HTML::Entities.
>>
> I like it too for the simple reason that it seems simple and doesn't
> worry the casual user, or confuse him/her with problems that they have no
> need to understand...  Silly as that may seem, I can personally attest to
> having gotten stuck with numerous software packages because of
> documentation that seemed important and took me hours to understand, only
> to realize it had nothing to do with me, or my use of the software...

I saw that my previous proposal confused you, so I tried to keep this
one simple... mod_perl's POD is no place to teach people the intricacies
of character encodings and the like.


--
Eric Cholet



Re: cvs commit: modperl/t/net/perl util.pl

Posted by Issac Goldstand <ma...@beamartyr.net>.
Robin Berjon wrote:

>On Tuesday 26 March 2002 15:31, Eric Cholet wrote:
>
>>--On Monday, March 25, 2002 11:04:02 -0800 Doug MacEachern
>>
>>>what we've done with escape_html already (diverging from apache) is just
>>>plain wrong.  i don't want to take it any further.  this should be
>>>implemented properly for 2.0 (in apache) and/or HTML::Entities can be
>>>written in xs.  modperl is not the right place to implement this
>>>functionality.
>>>
>>How about adding this to the doc for escape_html:
>>
>>This function will correctly escape ASCII output. If you're using
>>a different character set such as UTF8, or need more control on
>>the escaping process, use HTML::Entities.
>>
>
>Sorry I missed the beginning of the thread due to being busy and could only 
>salvage the end from my trash. However, I get the gist and I think that 
>Eric's suggestion works well (with the tiny nit that I'd make that US-ASCII 
>instead of ASCII for the sake of correctness).
>
>escape_html() as it is now does indeed have a potential for causing breakage, 
>but fixing it on modperl's side is a task that's somewhere between quite hard 
>and impossible. IMHO a simple docpatch as the above is the way to go.
>
I like it too for the simple reason that it seems simple and doesn't 
worry the casual user, or confuse him/her with problems that they have 
no need to understand...  Silly as that may seem, I can personally 
attest to having gotten stuck with numerous software packages because 
of documentation that seemed important and took me hours to understand, 
only to realize it had nothing to do with me, or my use of the software...

  Issac




Re: cvs commit: modperl/t/net/perl util.pl

Posted by Robin Berjon <ro...@knowscape.com>.
On Tuesday 26 March 2002 15:31, Eric Cholet wrote:
> --On Monday, March 25, 2002 11:04:02 -0800 Doug MacEachern
> > what we've done with escape_html already (diverging from apache) is just
> > plain wrong.  i don't want to take it any further.  this should be
> > implemented properly for 2.0 (in apache) and/or HTML::Entities can be
> > written in xs.  modperl is not the right place to implement this
> > functionality.
>
> How about adding this to the doc for escape_html:
>
> This function will correctly escape ASCII output. If you're using
> a different character set such as UTF8, or need more control on
> the escaping process, use HTML::Entities.

Sorry I missed the beginning of the thread due to being busy and could only 
salvage the end from my trash. However, I get the gist and I think that 
Eric's suggestion works well (with the tiny nit that I'd make that US-ASCII 
instead of ASCII for the sake of correctness).

escape_html() as it is now does indeed have a potential for causing breakage, 
but fixing it on modperl's side is a task that's somewhere between quite hard 
and impossible. IMHO a simple docpatch as the above is the way to go.

-- 
_______________________________________________________________________
Robin Berjon <ro...@knowscape.com> -- CTO
k n o w s c a p e : // venture knowledge agency www.knowscape.com
-----------------------------------------------------------------------
Don't panic.



Re: cvs commit: modperl/t/net/perl util.pl

Posted by Eric Cholet <ch...@logilune.com>.

--On Monday, March 25, 2002 11:04:02 -0800 Doug MacEachern 
<do...@covalent.net> wrote:

> what we've done with escape_html already (diverging from apache) is just
> plain wrong.  i don't want to take it any further.  this should be
> implemented properly for 2.0 (in apache) and/or HTML::Entities can be
> written in xs.  modperl is not the right place to implement this
> functionality.

How about adding this to the doc for escape_html:

This function will correctly escape ASCII output. If you're using
a different character set such as UTF8, or need more control on
the escaping process, use HTML::Entities.
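Why a byte-oriented escaper is unsafe for UTF-8 output can be shown with a small sketch. It assumes a hypothetical `escape_highbit` helper (not the real Apache::Util code) that escapes every byte with the 8th bit set, and a page served as UTF-8. The two bytes of a UTF-8 "é" (0xC3 0xA9) become two separate references, which a browser renders as "Ã©" instead of "é".

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical byte-wise escaper: every byte with the 8th bit set becomes
# a numeric character reference, regardless of the real encoding.
sub escape_highbit {
    my ($bytes) = @_;
    $bytes =~ s/&/&amp;/g;     # must run first so later entities survive
    $bytes =~ s/</&lt;/g;
    $bytes =~ s/>/&gt;/g;
    $bytes =~ s/"/&quot;/g;
    $bytes =~ s/([\x80-\xff])/sprintf('&#%d;', ord $1)/ge;
    return $bytes;
}

# "é" is U+00E9; encoded as UTF-8 it is the two bytes 0xC3 0xA9.
my $utf8_bytes = encode('UTF-8', "\x{e9}");

my $escaped = escape_highbit($utf8_bytes);
print "$escaped\n";    # prints "&#195;&#169;", i.e. "Ã©" in a browser

# The correct reference for U+00E9 would have been "&#233;".
```

A character-aware tool needs to know the encoding to turn 0xC3 0xA9 back into U+00E9 before escaping, which is exactly the information escape_html does not have.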


--
Eric Cholet



Re: cvs commit: modperl/t/net/perl util.pl

Posted by Issac Goldstand <ma...@beamartyr.net>.
Stas Bekman wrote:

> Doug MacEachern wrote:
>
>> what we've done with escape_html already (diverging from apache) is 
>> just plain wrong.  i don't want to take it any further.  this should 
>> be implemented properly for 2.0 (in apache) and/or HTML::Entities can 
>> be written in xs.  modperl is not the right place to implement this 
>> functionality.
>
>
> So should this function be marked as 'deprecated' in 1.27?
> And the alternative is the slow sister from HTML::Entities.
>
> Issac Goldstand wrote:
> > A casual user won't understand that documentation... Hell, I'm not even
> > sure I completely understand the implications of it and when to use/not
> > use escape_html based on it...  I think an example is called for, but
> > not in the POD...  Maybe in the Guide?
>
> patches are welcome ;)

Let's see if Apache 2.0 implements it or not before making the 
"official" recommendation...  :-)

  Issac



Re: cvs commit: modperl/t/net/perl util.pl

Posted by Doug MacEachern <do...@covalent.net>.
On Tue, 26 Mar 2002, Stas Bekman wrote:
 
> So should this function be marked as 'deprecated' in 1.27?

no.  we just shouldn't enhance the function anymore in 1.x.



Re: cvs commit: modperl/t/net/perl util.pl

Posted by Stas Bekman <st...@stason.org>.
Doug MacEachern wrote:
> what we've done with escape_html already (diverging from apache) is just 
> plain wrong.  i don't want to take it any further.  this should be 
> implemented properly for 2.0 (in apache) and/or HTML::Entities can be 
> written in xs.  modperl is not the right place to implement this 
> functionality.

So should this function be marked as 'deprecated' in 1.27?
And the alternative is the slow sister from HTML::Entities.

Issac Goldstand wrote:
 > A casual user won't understand that documentation... Hell, I'm not even
 > sure I completely understand the implications of it and when to use/not
 > use escape_html based on it...  I think an example is called for, but
 > not in the POD...  Maybe in the Guide?

patches are welcome ;)

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com



Re: cvs commit: modperl/t/net/perl util.pl

Posted by Doug MacEachern <do...@covalent.net>.
what we've done with escape_html already (diverging from apache) is just 
plain wrong.  i don't want to take it any further.  this should be 
implemented properly for 2.0 (in apache) and/or HTML::Entities can be 
written in xs.  modperl is not the right place to implement this 
functionality.


Re: cvs commit: modperl/t/net/perl util.pl

Posted by Eric Cholet <ch...@logilune.com>.

--On Monday, March 25, 2002 10:29:11 -0800 Doug MacEachern 
<do...@covalent.net> wrote:

> i had a bad feeling about this.  we should not be implementing
> escape_html  to begin with, the functionality should all be in apache.
> i'm going to  back out the patch.  anybody care to make a doc patch to
> explain the  problems with escape_html before the patch went in?  thanks.

I believe the patch is useful though: in cases where the charset is not
explicitly specified and there's an odd character with the 8th bit set,
it will now do the right thing. I guess a lot of US coders would fall
into that situation... I suppose it's faster than HTML::Entities (I
haven't benchmarked it though). So I suspect the patch will fix more
situations than it breaks: if using a single-byte non-ASCII encoding, it
doesn't actually break anything, just adds bloat. If using a multi-byte
encoding, escape_html was broken/inapplicable already.


--
Eric Cholet



Re: cvs commit: modperl/t/net/perl util.pl

Posted by Geoffrey Young <ge...@modperlcookbook.org>.
Doug MacEachern wrote:
> 
> i had a bad feeling about this.  we should not be implementing escape_html
> to begin with, the functionality should all be in apache.  i'm going to
> back out the patch. 

sounds wise, especially considering people like Eric will end up with larger pages as a
result, while the patch fixes a rather obscure vulnerability, for which other solutions
(HTML::Entities) are available.

> anybody care to make a doc patch to explain the
> problems with escape_html before the patch went in?  

I nominate Robin, since I forget how it came up in the first place :)

IIRC it was due to this post

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2000-03/msg00750.html

and specifically an exploit involving browsers incorrectly assuming 0x8b as a "<" and 0x9b
as a ">", thus creating a way around escape_html().
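The bypass described above relies on the bytes 0x8B and 0x9B surviving the escaping step. A minimal sketch, using two hypothetical helpers (not the actual escape_html code), shows that an escaper handling only <, >, & and " passes those bytes through untouched, whereas also escaping high-bit bytes turns them into inert numeric references. That some browsers treat these bytes as angle brackets is taken from the thread and not verified here.

```perl
use strict;
use warnings;

# Hypothetical pre-patch behaviour: only the four HTML metacharacters.
sub escape_ascii_only {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical post-patch behaviour: also escape every byte >= 0x80.
sub escape_with_highbit {
    my $s = escape_ascii_only($_[0]);
    $s =~ s/([\x80-\xff])/sprintf('&#%d;', ord $1)/ge;
    return $s;
}

# Attacker-supplied input using 0x8B and 0x9B in place of "<" and ">".
my $input = "\x{8B}script\x{9B}alert(1)\x{8B}/script\x{9B}";

my $before = escape_ascii_only($input);
my $after  = escape_with_highbit($input);

# The raw 0x8B/0x9B bytes survive the ASCII-only escaper...
print "raw bytes survive: ", ($before =~ /[\x8B\x9B]/ ? "yes" : "no"), "\n";
# ...but become numeric references once high-bit bytes are escaped.
print "$after\n";    # &#139;script&#155;alert(1)&#139;/script&#155;
```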

Robin, does that accurately summarize it?  it's been far too long for me :)

--Geoff


Re: cvs commit: modperl/t/net/perl util.pl

Posted by Doug MacEachern <do...@covalent.net>.
i had a bad feeling about this.  we should not be implementing escape_html 
to begin with, the functionality should all be in apache.  i'm going to 
back out the patch.  anybody care to make a doc patch to explain the 
problems with escape_html before the patch went in?  thanks.


