Posted to dev@httpd.apache.org by Graham Leggett <mi...@sharp.fm> on 2010/09/18 19:10:26 UTC
mod_include: echo, entity encoding and UTF-8
Hi all,
When the SSI tag below is handled, the value of the string output to
the browser is entity encoded:
<!--#echo encoding="entity" var="MY_VAR"-->
This is done with a line that looks something like this:
/* PR#25202: escape anything non-ascii here */
echo_text = ap_escape_html2(ctx->dpool, val, 1);
The problem with the above is the parameter "1", which means that
non-ASCII characters are entity encoded as HTML escape sequences, and
in the process any non-ASCII text encoded as UTF-8 breaks.
What I propose we do is change the value for v2.3+ as follows:
echo_text = ap_escape_html2(ctx->dpool, val, 0);
This allows UTF-8 character sequences to be passed through unchanged.
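To make the difference concrete, here is a minimal sketch of an escaper
with a "toasc" style flag. It is a hypothetical stand-in, not the httpd
implementation: the name sketch_escape_html, the caller-supplied buffer
(instead of an APR pool) and the per-byte "&#NNN;" output form are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for an HTML escaper with a "toasc" flag.
 * Assumption: with toasc set, every byte above 127 is written as a
 * decimal character reference of that single byte.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the output buffer is too small.
 */
static int sketch_escape_html(char *out, size_t outlen, const char *in,
                              int toasc)
{
    size_t j = 0;

    assert(out != NULL && in != NULL);

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        char tmp[8];
        const char *rep;
        size_t n;

        if (c == '<')
            rep = "&lt;";
        else if (c == '>')
            rep = "&gt;";
        else if (c == '&')
            rep = "&amp;";
        else if (c == '"')
            rep = "&quot;";
        else if (toasc && c > 127) {
            /* Each byte of a multi-byte UTF-8 sequence is escaped alone. */
            snprintf(tmp, sizeof(tmp), "&#%3.3d;", c);
            rep = tmp;
        }
        else {
            /* Pass the byte through unchanged. */
            tmp[0] = (char)c;
            tmp[1] = '\0';
            rep = tmp;
        }

        n = strlen(rep);
        if (j + n + 1 > outlen)
            return -1;
        memcpy(out + j, rep, n);
        j += n;
    }

    if (j + 1 > outlen)
        return -1;
    out[j] = '\0';
    return (int)j;
}
```

Under these assumptions, the two UTF-8 bytes of "é" (0xC3 0xA9) come out
as "&#195;&#169;" when the flag is 1, which a browser would render as two
separate characters, and pass through untouched when the flag is 0. The
markup characters are escaped in both cases.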
Past discussion in PR#25202 seems to revolve around backwards
compatibility, though with v2.4+ we have the power to change this
behaviour.
Does allowing UTF-8 character sequences through introduce any
cross-site scripting risk? I understand not, but would like to confirm.
Regards,
Graham
--
Re: mod_include: echo, entity encoding and UTF-8
Posted by Graham Leggett <mi...@sharp.fm>.
On 18 Sep 2010, at 7:10 PM, Graham Leggett wrote:
> When the SSI tag below is handled, the value of the string output to
> the browser is entity encoded:
>
> <!--#echo encoding="entity" var="MY_VAR"-->
>
> This is done with a line that looks something like this:
>
> /* PR#25202: escape anything non-ascii here */
> echo_text = ap_escape_html2(ctx->dpool, val, 1);
>
> The problem with the above is the parameter "1", which means that
> non-ASCII characters are entity encoded as html escape sequences,
> and in the process anything encoded with UTF-8 (and is not ASCII)
> breaks.
Looking further at PR#25202, the change made for it caused the
regression described in PR#47686, in which UTF-8 support broke.
I've created a fix for this, where the "set" and "echo" SSI commands
have been taught to handle "encoding" and "decoding" parameters.
For both echo and set, the value is first decoded as specified by the
"decoding" parameter, and then encoded as specified by the "encoding"
parameter. This allows full control of the encoding and decoding of
variables and echoed parameters, depending on where they came from.
Both parameters can take a comma separated list of values, so that you
can, for example, strip off URL encoding and then entity encoding
before using a value, like this: decoding="url,entity".
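As a rough illustration of how such a list might be applied in order,
here is a minimal sketch. It is not the code from the patch: every
function name is hypothetical, only the "url" and "entity" names from
the example above are recognised, and the entity decoder knows just the
four basic entities.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-place URL decoder: turns %XX into the byte it names. */
static void sketch_url_decode(char *s)
{
    char *w = s;

    while (*s != '\0') {
        if (s[0] == '%' && isxdigit((unsigned char)s[1])
            && isxdigit((unsigned char)s[2])) {
            char hex[3] = { s[1], s[2], '\0' };
            *w++ = (char)strtol(hex, NULL, 16);
            s += 3;
        }
        else {
            *w++ = *s++;
        }
    }
    *w = '\0';
}

/* Hypothetical in-place entity decoder for &lt; &gt; &amp; &quot; only. */
static void sketch_entity_decode(char *s)
{
    static const struct { const char *name; char ch; } ents[] = {
        { "&lt;", '<' }, { "&gt;", '>' }, { "&amp;", '&' }, { "&quot;", '"' }
    };
    char *w = s;

    while (*s != '\0') {
        size_t i, n = 0;

        for (i = 0; i < sizeof(ents) / sizeof(ents[0]); i++) {
            size_t len = strlen(ents[i].name);
            if (strncmp(s, ents[i].name, len) == 0) {
                *w++ = ents[i].ch;
                n = len;
                break;
            }
        }
        if (n == 0) {
            /* Not a known entity: copy the byte as-is. */
            *w++ = *s;
            n = 1;
        }
        s += n;
    }
    *w = '\0';
}

/*
 * Applies each decoder named in a comma separated list, left to right.
 * Returns 0 on success, -1 on an unknown or empty name.
 */
static int sketch_apply_decoding(char *value, const char *list)
{
    assert(value != NULL && list != NULL);

    while (*list != '\0') {
        size_t len = strcspn(list, ",");

        if (len == 3 && strncmp(list, "url", 3) == 0)
            sketch_url_decode(value);
        else if (len == 6 && strncmp(list, "entity", 6) == 0)
            sketch_entity_decode(value);
        else
            return -1;

        list += len;
        if (*list == ',')
            list++;
    }
    return 0;
}
```

The order of the list matters in this sketch: with "url,entity" the
value "%26lt%3Bb%26gt%3B" is first URL decoded to "&lt;b&gt;" and then
entity decoded to "<b>", whereas "entity,url" would stop at "&lt;b&gt;"
because the entities only appear after the URL decoding step.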
Regards,
Graham
--