Posted to dev@myfaces.apache.org by Curtiss Howard <cu...@gmail.com> on 2011/08/01 19:52:36 UTC

[Shared] Bug in HTMLEncoder.encodeURIAttribute()

Hi,

One of the teams we work with was recently complaining that whenever
they pass an already-encoded URL to <h:graphicImage> that they get a
double encoding, which never happened before in our previous release
(which corresponded to the 1.2 timeframe).  It appears that this is
the result of a bug in encodeURIAttribute(), introduced after 1.2 and
before 2.0:

                    else if (c == '%')
                    {
                        if (i + 2 < string.length())
                        {
                            char c1 = string.charAt(i+1);
                            char c2 = string.charAt(i+2);
                            if ((( c1 >= '0' && c1 <='9') || (c1 >='A' && c1 <='Z')) &&
                                (( c2 >= '0' && c2 <='9') || (c2 >='A' && c2 <='Z')))
                            {
                                // do not percent encode, because it could be already encoded
                                // and we don't want encode it twice
                            }
                            else
                            {
                                app = percentEncode(c, UTF8);
                            }
                        }
                        else
                        {
                            app = percentEncode(c, UTF8);
                        }
                    }

(Forgive me if the formatting doesn't come out right...)

So whenever we find a %, we try to look ahead and see if the next two
characters constitute two hex characters.  If so, we leave the % alone
since this is clearly a portion of the string that has already been
encoded.

There are two problems with that code, however:

1. Checking for '0' <= c <= '9' || 'A' <= c <= 'Z' means that "%XX"
would remain unchanged, when really it needs to be "%25XX".  We should
be checking for 'A' <= c <= 'F' since we're talking about hex
characters.
2. The problem our colleagues saw: we aren't handling lowercase characters!

So I just want to make sure everyone is in agreement with my
assessment that we need to change the character range from A..Z to
A..F and also check for a..f.  If so, I'll make the change.
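For concreteness, the proposed range change could look like the following sketch (isHexDigit and isEncodedTriplet are hypothetical helper names for illustration, not methods in the existing HTMLEncoder):

```java
public class HexLookahead {

    // Hypothetical helper: true for 0-9, A-F, a-f, i.e. the digits that
    // can legally follow '%' in a percent-encoded triplet.
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'F')
            || (c >= 'a' && c <= 'f');
    }

    // True when the '%' at index i already starts a percent-encoded
    // triplet (e.g. "%2f" or "%2F") and should therefore be left alone.
    static boolean isEncodedTriplet(String s, int i) {
        return i + 2 < s.length()
            && isHexDigit(s.charAt(i + 1))
            && isHexDigit(s.charAt(i + 2));
    }

    public static void main(String[] args) {
        System.out.println(isEncodedTriplet("%2fpath", 0)); // true: lowercase hex now accepted
        System.out.println(isEncodedTriplet("%XXpath", 0)); // false: "XX" is not hex
    }
}
```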

Thanks,


Curtiss Howard

Re: [Shared] Bug in HTMLEncoder.encodeURIAttribute()

Posted by Curtiss Howard <cu...@gmail.com>.
On Mon, Aug 1, 2011 at 2:58 PM, Leonardo Uribe <lu...@gmail.com> wrote:
> Hi
>
> 2011/8/1 Curtiss Howard <cu...@gmail.com>:
>> Hi,
>>
>> One of the teams we work with was recently complaining that whenever
>> they pass an already-encoded URL to <h:graphicImage> that they get a
>> double encoding, which never happened before in our previous release
>> (which corresponded to the 1.2 timeframe).  It appears that this is
>> the result of a bug in encodeURIAttribute(), introduced after 1.2 and
>> before 2.0:
>>
>>                    else if (c == '%')
>>                    {
>>                        if (i + 2 < string.length())
>>                        {
>>                            char c1 = string.charAt(i+1);
>>                            char c2 = string.charAt(i+2);
>>                            if ((( c1 >= '0' && c1 <='9') || (c1 >='A' && c1 <='Z')) &&
>>                                (( c2 >= '0' && c2 <='9') || (c2 >='A' && c2 <='Z')))
>>                            {
>>                                // do not percent encode, because it could be already encoded
>>                                // and we don't want encode it twice
>>                            }
>>                            else
>>                            {
>>                                app = percentEncode(c, UTF8);
>>                            }
>>                        }
>>                        else
>>                        {
>>                            app = percentEncode(c, UTF8);
>>                        }
>>                    }
>>
>> (Forgive me if the formatting doesn't come out right...)
>
> Just for reference, the issue you are talking about is MYFACES-1841:
> "HtmlResponseWriterImpl.writeURIAttribute does not perform proper URLs
> encoding ( ex: & should be encoded in &amp )"
>

That's the issue where this code was introduced, yes.

>>
>> So whenever we find a %, we try to look ahead and see if the next two
>> characters constitute two hex characters.  If so, we leave the % alone
>> since this is clearly a portion of the string that has already been
>> encoded.
>>
>> There are two problems with that code however:
>>
>> 1. Checking for '0' <= c <= '9' || 'A' <= c <= 'Z' means that "%XX"
>> would remain unchanged, when really it needs to be "%25XX".  We should
>> be checking for 'A' <= c <= 'F' since we're talking about hex
>> characters.
>
> Are you mixing some javascript here? In theory, if the percent encoding
> was already done, there is no reason to do it again, because doing so
> could break the generated URI. The algorithm checks that condition, and
> this behavior is intentional. Maybe the problem could be solved by
> detecting this special condition when h:graphicImage does its job.
>

No, there's no javascript.  Just URLs.  Specifically, this is in a
Portal environment when using WSRP.  The image URLs are altered by
some Portal/WSRP magic into some sort of query string that includes
the original URL as a parameter (I don't follow all of the details;
I'm not a Portal guy).  Hence, they %-encode the URL before it reaches
the h:graphicImage renderer, which in turn calls
ResponseWriter.writeURIAttribute(), which in turn calls into
HTMLEncoder.encodeURIAttribute().

I've argued back and forth with them, but I don't see any reason why
they can't pass a %-encoded value into
ResponseWriter.writeURIAttribute() and assume that it won't get
encoded twice.  In other words, the argument is that encoding should
be idempotent:

writeURIAttribute(writeURIAttribute(value)) === writeURIAttribute(value)

The code I've highlighted does what we're talking about, but it
clearly doesn't work with lowercase characters, which we both agree on
(and is the source of their problem; the URLs they pass in use
lowercase hex characters).

My first point, on the other hand, amounts to something I think is a
bug in the code (rather than just a missing check).  The code decides
whether or not to leave a % alone if the next two characters are
alphanumeric.  I think this is wrong; it should be checking for the
next two characters to be _hex only_.  After all, if my value is
"%XX", then the result should be "%25XX" since "XX" isn't valid
hexadecimal.  But, I can understand if changing the character range
has greater implications.

>> 2. The problem our colleagues saw: we aren't handling lowercase characters!
>>
>> So I just want to make sure everyone is in agreement with my
>> assessment that we need to change the character range from A..Z to
>> A..F and also check for a..f.  If so, I'll make the change.
>
> I checked it, and RFC 2141 specifies that lower case chars are valid
> in this case, so this change is valid. See:
>
> http://www.ietf.org/rfc/rfc2141.txt
>
> regards,
>
> Leonardo Uribe
>
>>
>> Thanks,
>>
>>
>> Curtiss Howard
>>
>

Re: [Shared] Bug in HTMLEncoder.encodeURIAttribute()

Posted by Leonardo Uribe <lu...@gmail.com>.
Hi

2011/8/1 Curtiss Howard <cu...@gmail.com>:
> Hi,
>
> One of the teams we work with was recently complaining that whenever
> they pass an already-encoded URL to <h:graphicImage> that they get a
> double encoding, which never happened before in our previous release
> (which corresponded to the 1.2 timeframe).  It appears that this is
> the result of a bug in encodeURIAttribute(), introduced after 1.2 and
> before 2.0:
>
>                    else if (c == '%')
>                    {
>                        if (i + 2 < string.length())
>                        {
>                            char c1 = string.charAt(i+1);
>                            char c2 = string.charAt(i+2);
>                            if ((( c1 >= '0' && c1 <='9') || (c1 >='A' && c1 <='Z')) &&
>                                (( c2 >= '0' && c2 <='9') || (c2 >='A' && c2 <='Z')))
>                            {
>                                // do not percent encode, because it could be already encoded
>                                // and we don't want encode it twice
>                            }
>                            else
>                            {
>                                app = percentEncode(c, UTF8);
>                            }
>                        }
>                        else
>                        {
>                            app = percentEncode(c, UTF8);
>                        }
>                    }
>
> (Forgive me if the formatting doesn't come out right...)

Just for reference, the issue you are talking about is MYFACES-1841:
"HtmlResponseWriterImpl.writeURIAttribute does not perform proper URLs
encoding ( ex: & should be encoded in &amp )"

>
> So whenever we find a %, we try to look ahead and see if the next two
> characters constitute two hex characters.  If so, we leave the % alone
> since this is clearly a portion of the string that has already been
> encoded.
>
> There are two problems with that code however:
>
> 1. Checking for '0' <= c <= '9' || 'A' <= c <= 'Z' means that "%XX"
> would remain unchanged, when really it needs to be "%25XX".  We should
> be checking for 'A' <= c <= 'F' since we're talking about hex
> characters.

Are you mixing some javascript here? In theory, if the percent encoding
was already done, there is no reason to do it again, because doing so
could break the generated URI. The algorithm checks that condition, and
this behavior is intentional. Maybe the problem could be solved by
detecting this special condition when h:graphicImage does its job.

> 2. The problem our colleagues saw: we aren't handling lowercase characters!
>
> So I just want to make sure everyone is in agreement with my
> assessment that we need to change the character range from A..Z to
> A..F and also check for a..f.  If so, I'll make the change.

I checked it, and RFC 2141 specifies that lower case chars are valid
in this case, so this change is valid. See:

http://www.ietf.org/rfc/rfc2141.txt

regards,

Leonardo Uribe

>
> Thanks,
>
>
> Curtiss Howard
>