Posted to dev@sling.apache.org by 付 尧 <gu...@gmail.com> on 2008/11/14 03:42:12 UTC

A bug in parameter encoding

Hi guys,

I think there is a bug in Sling's parameter encoding handling. It fails
even on a simple example like this:

<form action="/contents/hello" method="post">
	<input name="test" value="测试"/>
	<input type="hidden" name="_charset_" value="utf-8" />
</form>

The cause of this problem is simple. When Sling gets the request from
the OSGi HTTP service, it wraps the request in its own request object,
and if it can't guess the encoding of the request, it uses ISO-8859-1 as
the default.

Then it uses the _charset_ parameter, if present, to fix the encoding.

But something goes wrong when fixing the encoding: in
ContainerRequestParameter's setEncoding method, the value is first
decoded using the default charset, ISO-8859-1,
so "测试" becomes "??".
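
To illustrate one way such a loss can arise, here is a minimal Java sketch. It assumes the value has already been decoded correctly into a Java string before its bytes are taken as ISO-8859-1. The class and method names are hypothetical and are not Sling's actual code.

```java
import java.nio.charset.StandardCharsets;

class LossyReencodeDemo {
    // Hypothetical helper: take the string's bytes as ISO-8859-1,
    // then decode those bytes as UTF-8.
    static String reencode(String value) {
        // Characters that ISO-8859-1 cannot represent are replaced by '?' (0x3F),
        // so the original bytes are gone at this point.
        byte[] bytes = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u6d4b\u8bd5" is the test value from the form above.
        System.out.println(reencode("\u6d4b\u8bd5")); // prints "??"
    }
}
```

Once the two characters have been replaced by '?', no later charset fix can recover them.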

My solution is to change ContainerRequestParameter's setEncoding method
so that it doesn't decode the string, and only gets the bytes of its
value and encodes them.

Is there a better way to fix this bug?
-----
Jack




Re: A bug in parameter encoding

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Fri, Nov 14, 2008 at 9:26 AM, 付 尧 <gu...@gmail.com> wrote:
>> Felix wrote:
>> ...For best (and proven) results:
>>
>>  * Send everything as UTF-8
>>  * Use enctype="multipart/form-data" for your forms
>>  * Keep the _charset_ parameter set to UTF-8 (or utf-8)
>
> Yes, that's what I do. It doesn't work when using a dojo ajax post in IE
> (when I do the ajax post with dojo in IE, Sling doesn't get the encoding,
> so it uses ISO-8859-1 as the default)....

There should be a way to set the encoding of a POST done by dojo,
maybe the following resources help?

http://dojotoolkit.org/search/node/encoding
http://dojotoolkit.org/forum/dojo-core-dojo-0-9/dojo-core-support/character-encoding-xhrpost

What I'd try is to create a plain form like Felix suggests, verify that
it works, inspect the HTTP traffic to see what the browser sends, and
tweak the dojo post so that it sends the exact same POST request.

-Bertrand

Re: A bug in parameter encoding

Posted by 付 尧 <gu...@gmail.com>.
Hi Felix,

I know the cause of the problem now.

IE doesn't support URL encoding in UTF-8, so I have to encode every
POST URL myself.
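
As a rough illustration of encoding a POST URL explicitly as UTF-8, here is a small Java sketch. The thread itself concerns dojo in the browser, so this only shows the percent-encoding that should end up on the wire. The class and method names are hypothetical, Java 10 or later is assumed for the URLEncoder overload that takes a Charset, and the path is the one used in the forms in this thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PostUrlEncodingDemo {
    // Hypothetical helper: append one query parameter, percent-encoded as UTF-8.
    // URLEncoder applies form encoding rules, so a space becomes '+'.
    static String withQueryParam(String path, String name, String value) {
        return path + "?"
                + URLEncoder.encode(name, StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints /test/encoding?name=%E6%B5%8B%E8%AF%95
        System.out.println(withQueryParam("/test/encoding", "name", "\u6d4b\u8bd5"));
    }
}
```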

Thanks for your help. :)
-----
Jack





Re: A bug in parameter encoding

Posted by Felix Meschberger <fm...@gmail.com>.
Hi,

付 尧 wrote:
> Hi Felix,
> 
> I can't find a way to make the simple form post work, nor the ajax
> post. :(
> 
> How should I write the form?
> 
> like this?
>   <form id="form" method="post" action="/test/encoding"
> enctype="text/html; charset=utf-8">
>     <input name="name" value="测试" />
>     <input name="message" value="" />
>     <input type="hidden" name="_charset_" value="utf-8" />
>     <input type="submit" value="submit" />
>   </form>
> 
> It won't work.
> 
> And this?
> <form id="form" method="post" action="/test/encoding"
> enctype="application/x-www-form-urlencoded; charset=utf-8">
> won't work.
> 
> And this?
> <form id="form" method="post" action="/test/encoding"
> enctype="multipart/form-data; charset=utf-8">
> won't work either.

The value of the enctype should just be "multipart/form-data" as in:

   <form id="form" method="post" action="/test/encoding"
         enctype="multipart/form-data">

> 
> Ajax Post is like that too.

Depending on how you do the Ajax Post, you might want to set the
Content-Type header with the charset on the request.
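
A minimal sketch of what an explicit charset on the request can look like, written in Java rather than dojo. It assumes Java 11 or later for java.net.http. The host and port are placeholders, the class and method names are hypothetical, and the body is expected to be percent-encoded already. Nothing is sent; the request is only built and inspected.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetPostDemo {
    // Hypothetical helper: build (but do not send) a form POST whose
    // Content-Type header states the charset explicitly.
    static HttpRequest buildPost(String url, String encodedBody) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type",
                        "application/x-www-form-urlencoded; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(
                        encodedBody, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // localhost:8080 is an assumed address; /test/encoding is from this thread.
        HttpRequest request = buildPost(
                "http://localhost:8080/test/encoding",
                "name=%E6%B5%8B%E8%AF%95&_charset_=utf-8");
        System.out.println(request.headers().firstValue("Content-Type").orElse(""));
    }
}
```

With the charset stated in the header, the server no longer has to guess the encoding of the request body.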

> 
> My page's encoding is utf-8, what should I do?

This should work.

If you still can't get it to work, I suggest you file a JIRA issue with a
sample script/servlet generating the form and expecting the response,
so that we can better reproduce your issue.

Regards
Felix



Re: A bug in parameter encoding

Posted by 付 尧 <gu...@gmail.com>.
Hi Felix,

I can't find a way to make the simple form post work, nor the ajax
post. :(

How should I write the form?

like this?
   <form id="form" method="post" action="/test/encoding"
         enctype="text/html; charset=utf-8">
     <input name="name" value="测试" />
     <input name="message" value="" />
     <input type="hidden" name="_charset_" value="utf-8" />
     <input type="submit" value="submit" />
   </form>

It won't work.

And this?
<form id="form" method="post" action="/test/encoding"  
enctype="application/x-www-form-urlencoded; charset=utf-8">
won't work.

And this?
<form id="form" method="post" action="/test/encoding"  
enctype="multipart/form-data; charset=utf-8">
won't work either.

Ajax Post is like that too.

My page's encoding is utf-8, what should I do?

-----
Jack





Re: A bug in parameter encoding

Posted by Felix Meschberger <fm...@gmail.com>.
Hi Jack,

付 尧 wrote:
> Hi Felix,
> 
>> The core problem is that browsers generally do not send the
>> Content-Type header with the proper charset parameter. Instead they tend
>> to encode the form post using the same encoding as was used for the HTML
>> on which the form resides.
> 
> Yes, that's the browser's problem.
> 
>> So if you send the form with
>>
>>   Content-Type: text/html; charset=UTF-8
> 
> My form really can't be that content type, especially in an ajax request.

So you are not using a normal form post but an AJAX request to send the
data?

In this case you might want to consider setting the Content-Type header
explicitly.

> 
>> the POST is sent back encoded in UTF-8 and Sling will happily decode as
>> UTF-8 as instructed by the _charset_ parameter.
>>
>> For best (and proven) results:
>>
>>  * Send everything as UTF-8
>>  * Use enctype="multipart/form-data" for your forms
>>  * Keep the _charset_ parameter set to UTF-8 (or utf-8)
> 
> Yes, that's what I do. It doesn't work when using a dojo ajax post in
> IE (when I do the ajax post with dojo in IE, Sling doesn't get the
> encoding, so it uses ISO-8859-1 as the default).

I do not know enough about dojo and how dojo sends requests, but can't you
set the character encoding (or content type) when sending a POST request
with dojo?

> 
> Besides, if Sling fixes the encoding for the request parameter, then
> ContainerRequestParameter's encoding and decoding seem
> unnecessary. Even when everything is right, decoding every parameter value
> in UTF-8 and then encoding it again with the same encoding is an
> unnecessary waste.

Re-encoding does not always take place, but sometimes it is required.
For example, when reading multipart/form-data, Sling must assume an
encoding up front in case the client does not send one. The safest
bet is ISO-8859-1, which can be used as a transient encoding before
the value is actually decoded as UTF-8.

This works because ISO-8859-1 takes any byte value from the input
stream and treats it as the character whose character code is that
value, in the 0..255 range, so no byte is lost along the way.
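
The transient use of ISO-8859-1 described above can be sketched in Java as follows. This is an illustration under the assumption that the raw request bytes were first decoded as ISO-8859-1. The class and method names are hypothetical and this is not Sling's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TransientLatin1Demo {
    // Hypothetical helper: recover the real text from a string that was
    // provisionally decoded as ISO-8859-1.
    static String recover(String transientValue) {
        // Every char is in 0..255, so this yields the original bytes exactly.
        byte[] original = transientValue.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Bytes as a browser would send them for a UTF-8 page.
        byte[] wire = "\u6d4b\u8bd5".getBytes(StandardCharsets.UTF_8);
        // Provisional decoding: one char per byte, nothing dropped.
        String transientValue = new String(wire, StandardCharsets.ISO_8859_1);
        byte[] back = transientValue.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(wire, back));                        // prints "true"
        System.out.println(recover(transientValue).equals("\u6d4b\u8bd5"));  // prints "true"
    }
}
```

The round trip is only lossless in this direction, from raw bytes to ISO-8859-1 characters and back; a string that already holds characters above 255 cannot be pushed through ISO-8859-1 without loss.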

Regards
Felix


Re: A bug in parameter encoding

Posted by 付 尧 <gu...@gmail.com>.
Hi Felix,

> The core problem is that browsers generally do not send the
> Content-Type header with the proper charset parameter. Instead they
> tend to encode the form post using the same encoding as was used for the
> HTML on which the form resides.

Yes, that's the browser's problem.

> So if you send the form with
>
>   Content-Type: text/html; charset=UTF-8

My form really can't be that content type, especially in an ajax  
request.

> the POST is sent back encoded in UTF-8 and Sling will happily decode
> as UTF-8 as instructed by the _charset_ parameter.
>
> For best (and proven) results:
>
>  * Send everything as UTF-8
>  * Use enctype="multipart/form-data" for your forms
>  * Keep the _charset_ parameter set to UTF-8 (or utf-8)

Yes, that's what I do. It doesn't work when using a dojo ajax post in
IE (when I do the ajax post with dojo in IE, Sling doesn't get the
encoding, so it uses ISO-8859-1 as the default).

Besides, if Sling fixes the encoding for the request parameter, then
ContainerRequestParameter's encoding and decoding seem
unnecessary. Even when everything is right, decoding every parameter value
in UTF-8 and then encoding it again with the same encoding is an
unnecessary waste.

Thanks for your reply.

-----
Jack




Re: A bug in parameter encoding

Posted by Felix Meschberger <fm...@gmail.com>.
Hi Jack,

The core problem is that browsers generally do not send the
Content-Type header with the proper charset parameter. Instead they tend
to encode the form post using the same encoding as was used for the HTML
on which the form resides.

So if you send the form with

   Content-Type: text/html; charset=UTF-8

the POST is sent back encoded in UTF-8 and Sling will happily decode as
UTF-8 as instructed by the _charset_ parameter.

If you send the form without setting any specific charset, the default
is assumed to be ISO-8859-1. Hence your POST comes back as ISO-8859-1
but Sling is instructed to use UTF-8.
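
The mismatch described above can be reproduced with a short Java sketch. The sample text "café" and the names are hypothetical; the point is only what happens when bytes written as ISO-8859-1 are read as UTF-8.

```java
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Hypothetical helper: encode as the browser would for an ISO-8859-1 page,
    // then decode as a server that was told to use UTF-8.
    static String sendLatin1ReadUtf8(String text) {
        byte[] sent = text.getBytes(StandardCharsets.ISO_8859_1);
        // Bytes above 0x7F are not valid UTF-8 on their own, so the decoder
        // substitutes the replacement character U+FFFD for them.
        return new String(sent, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String received = sendLatin1ReadUtf8("caf\u00e9");
        System.out.println(received.equals("caf\u00e9"));        // prints "false"
        System.out.println(received.indexOf('\uFFFD') >= 0);     // prints "true"
    }
}
```

Pure ASCII text survives such a mismatch, which is why the problem often goes unnoticed until non-ASCII input arrives.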

In addition, your sample implies URL encoding
(application/x-www-form-urlencoded, the default enctype of a form),
which has its own little niche of problems.

For best (and proven) results:

  * Send everything as UTF-8
  * Use enctype="multipart/form-data" for your forms
  * Keep the _charset_ parameter set to UTF-8 (or utf-8)

Hope this helps.

Regards
Felix
