Posted to dev@tomcat.apache.org by bu...@apache.org on 2003/12/31 22:38:37 UTC

DO NOT REPLY [Bug 25848] New: - ServletRequest.setCharacterEncoding() does not appear to work

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25848>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25848

ServletRequest.setCharacterEncoding() does not appear to work

           Summary: ServletRequest.setCharacterEncoding() does not appear to
                    work
           Product: Tomcat 5
           Version: 5.0.16
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Connector:Coyote
        AssignedTo: tomcat-dev@jakarta.apache.org
        ReportedBy: jessh@ptc.com


We have pages that process non-ASCII characters in request parameter values 
just fine with Tomcat 4.1.24 by encoding the URL request string (this is a GET 
request) and then using ServletRequest.setCharacterEncoding("UTF-8") when 
processing the resulting request.

This is failing in Tomcat 5.0.16 -- all else being held equal.
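The symptom can be reproduced without a servlet container. The sketch below is illustrative only (the class name and sample value are invented): it uses java.net.URLDecoder to decode the same percent-encoded query value twice, once as ISO-8859-1, which Jess describes later in the thread as the standard default for servlet parameters, and once as UTF-8, which is what the page intended.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryDecodingDemo {

    // Decode one percent-encoded query parameter value with the given charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, as a browser sends it: UTF-8 bytes C3 A9, percent-encoded.
        String raw = "caf%C3%A9";

        String asLatin1 = decode(raw, "ISO-8859-1"); // each byte becomes its own char
        String asUtf8 = decode(raw, "UTF-8");        // bytes C3 A9 form one char

        System.out.println(asLatin1.length()); // 5 chars: c, a, f, U+00C3, U+00A9
        System.out.println(asUtf8.length());   // 4 chars: c, a, f, U+00E9
    }
}
```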

I've checked the bug database and see various issues with
setCharacterEncoding() and interactions with request dump valves, Jasper
log levels, etc.  I checked for each of these in my Tomcat 5 configuration,
and none of them are set.

I have been poking around in the debugger looking for anything amiss
(e.g. a tell-tale call to parseRequestParameters() before
setCharacterEncoding()), but I'm coming up empty thus far...
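The ordering issue mentioned above (parameters parsed before setCharacterEncoding() is called) can be illustrated with a small, hypothetical model. It is not Tomcat code: it assumes the request encoding applies to query-string parameters (the Tomcat 4.1 behaviour this report relies on) and follows the servlet API rule that setCharacterEncoding() has no effect once parameters have been read.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a request that parses its parameters lazily.
public class LazyRequestModel {
    private final String queryString;
    private String encoding = null;            // null means "use the default"
    private Map<String, String> params = null; // filled on first parameter access

    LazyRequestModel(String queryString) { this.queryString = queryString; }

    void setCharacterEncoding(String enc) {
        if (params == null) {
            encoding = enc; // still in time: parameters not parsed yet
        }
        // otherwise the call has no effect
    }

    String getParameter(String name) {
        if (params == null) {
            params = parse(encoding != null ? encoding : "ISO-8859-1");
        }
        return params.get(name);
    }

    private Map<String, String> parse(String enc) {
        Map<String, String> result = new HashMap<>();
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue; // this sketch skips keys without values
            try {
                result.put(URLDecoder.decode(pair.substring(0, eq), enc),
                           URLDecoder.decode(pair.substring(eq + 1), enc));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LazyRequestModel early = new LazyRequestModel("q=caf%C3%A9");
        early.setCharacterEncoding("UTF-8");
        System.out.println(early.getParameter("q").length()); // 4: decoded as UTF-8

        LazyRequestModel late = new LazyRequestModel("q=caf%C3%A9");
        late.getParameter("q");             // parameters parsed with the default
        late.setCharacterEncoding("UTF-8"); // too late: no effect
        System.out.println(late.getParameter("q").length());  // 5: still ISO-8859-1
    }
}
```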

---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org


Re: Bug 23929: ServletRequest.setCharacterEncoding()

Posted by Jess Holle <je...@ptc.com>.
> From a developer's point of view, however, applying the above two points:
> a) breaks expected behaviour (setCharacterEncoding() method does not 
> work the same as before)
> b) does not give an acceptable alternative (if all parameter passing 
> could be solved with POST method, then the GET method would not be 
> needed, would it?)
> c) a lot of web apps stopped working when an upgrade of the Tomcat 
> version was performed
>
> So I think it is legitimate to be upset when first confronted with 
> this change of behaviour.

I will not claim that I was reasonable when originally confronted with 
the change.

I will say that:

   1. Our existing (4.1.x) usage of setCharacterEncoding() works across
      all recent servlet engines tested [including 2 commercial servlet
      engines] -- and is thus some indication of a de facto standard.
   2. It would seem from examples provided with setCharacterEncoding()
      by Sun that the intent is to include request parameters and that
      thus this should be the default operation of this API rather than
      requiring additional configuration to obtain this behavior.

> As for how easy it is to NOT file duplicate bugs on this issue, having 
> followed this debate, I have collected the following list of somehow 
> related bugs

I searched again after being scolded by Remy.  I must admit that I 
crossed wires somewhere between searching and filing bugs and somehow 
skipped this search (which I normally make a habit of doing).

> Speaking for myself and having reread these messages:
> Assuming I've been working for some time with the old behaviour and 
> experienced the new one, I would not be able to understand why this 
> change was made, EVEN if someone gave me the above list of bugs.

Agreed.  Without a short summary attached to the bugs I would still have 
filed a new bug and argued to high hell...

--
Jess Holle


Re: Bug 23929: ServletRequest.setCharacterEncoding()

Posted by Remy Maucherat <re...@apache.org>.
Stefanos Karasavvidis wrote:
> If not already done, port the useBodyEncodingForURI parameter to the 
> next 4.1.x release.

This new flag was ported last month.

Rémy




Re: Bug 23929: ServletRequest.setCharacterEncoding()

Posted by Stefanos Karasavvidis <st...@msc.gr>.
Remy Maucherat wrote:
> Jess Holle wrote:
> 
>> Remy Maucherat wrote:
>>
>> This is a good question -- but one which only applies to POST.  My bug 
>> case was explicitly with GET.
>>
>> If there is an entity body encoding specified in the request, then I 
>> am not sure which should override.  If there is not, then I would 
>> presume setCharacterEncoding() should win out.  If the only issue is 
>> when these differ, then I believe that site designers should simply 
>> ensure they don't.
> 
> 
> I think you should read the HTTP RFC. content-type does not apply to the 
> URI or the HTTP header. The fact that setCharacterEncoding would apply 
> to (part of) the URI and/or the header violates the RFC on URIs.
> 
> Anyway, to put it simply: in the next release, add 
> useBodyEncodingForURI="true" on the connector, and you're done.
> Please don't complain that it won't do what you want before trying it.
> You can also use the URIEncoding attribute to specify the path encoding.
> 
> Rémy
> 
My 2 cents on this issue:

Remy is certainly right in stating that
a) the HTTP RFC does not cover variable character encoding for query 
parameters for different requests
b) it is (sounds?) logical to assume that the whole URI (including 
paths, query parameters etc.) should be considered as being encoded with 
the same character encoding

From a developer's point of view, however, applying the above two points:
a) breaks expected behaviour (setCharacterEncoding() method does not 
work the same as before)
b) does not give an acceptable alternative (if all parameter passing 
could be solved with POST method, then the GET method would not be 
needed, would it?)
c) a lot of web apps stopped working when an upgrade of the Tomcat 
version was performed

So I think it is legitimate to be upset when first confronted with this 
change of behaviour.

As for how easy it is to NOT file duplicate bugs on this issue, having 
followed this debate, I have collected the following list of somehow 
related bugs
bug 25360
bug 25231
bug 25235
bug 22666
bug 24557
bug 24345
bug 23929
bug 25848
and of course a bunch of messages in the developer list

Speaking for myself and having reread these messages:
Assuming I've been working for some time with the old behaviour and 
experienced the new one, I would not be able to understand why this 
change was made, EVEN if someone gave me the above list of bugs.

I propose the following:
write a short summary of why this change was necessary and include the 
above list of bugs, as well as links to the related developer list 
threads. Then submit a link to this summary to all the above bugs.
If not already done, port the useBodyEncodingForURI parameter to the 
next 4.1.x release.

I volunteer to write the summary if the list thinks that the proposal is 
reasonable.

Regards

Stefanos



Re: Bug 23929: ServletRequest.setCharacterEncoding()

Posted by Remy Maucherat <re...@apache.org>.
Jess Holle wrote:
> Remy Maucherat wrote:
> 
> This is a good question -- but one which only applies to POST.  My bug 
> case was explicitly with GET.
> 
> If there is an entity body encoding specified in the request, then I am 
> not sure which should override.  If there is not, then I would presume 
> setCharacterEncoding() should win out.  If the only issue is when these 
> differ, then I believe that site designers should simply ensure they don't.

I think you should read the HTTP RFC. content-type does not apply to the 
URI or the HTTP header. The fact that setCharacterEncoding would apply 
to (part of) the URI and/or the header violates the RFC on URIs.

Anyway, to put it simply: in the next release, add 
useBodyEncodingForURI="true" on the connector, and you're done.
Please don't complain that it won't do what you want before trying it.
You can also use the URIEncoding attribute to specify the path encoding.
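As a rough sketch of the behaviour described here, the hypothetical helper below models how the two connector attributes could interact when choosing the charset for query-string parameters. The attribute names come from this thread; the method, its fallback to ISO-8859-1, and its handling of corner cases (for example, useBodyEncodingForURI="true" with no body encoding set) are simplifying assumptions and may not match any particular Tomcat release.

```java
// Hypothetical helper modelling, in simplified form, how a connector might pick
// the charset for query-string parameters. Not Tomcat's actual code.
public class QueryEncodingChooser {

    static final String DEFAULT_URI_ENCODING = "ISO-8859-1";

    /**
     * @param useBodyEncodingForURI value of the connector's useBodyEncodingForURI attribute
     * @param uriEncoding           value of the connector's URIEncoding attribute, or null
     * @param bodyEncoding          encoding from setCharacterEncoding() or the
     *                              Content-Type charset, or null if neither is set
     */
    static String chooseQueryEncoding(boolean useBodyEncodingForURI,
                                      String uriEncoding,
                                      String bodyEncoding) {
        if (useBodyEncodingForURI && bodyEncoding != null) {
            // Compatibility mode: the per-request body encoding also governs the URI.
            return bodyEncoding;
        }
        // Default mode: one connector-wide encoding for every URI.
        return uriEncoding != null ? uriEncoding : DEFAULT_URI_ENCODING;
    }

    public static void main(String[] args) {
        // Defaults: setCharacterEncoding("UTF-8") does not reach the query string.
        System.out.println(chooseQueryEncoding(false, null, "UTF-8"));    // ISO-8859-1
        // Connector-wide URIEncoding="UTF-8".
        System.out.println(chooseQueryEncoding(false, "UTF-8", null));    // UTF-8
        // useBodyEncodingForURI="true": the per-request setting wins.
        System.out.println(chooseQueryEncoding(true, null, "Shift_JIS")); // Shift_JIS
    }
}
```

In this model only the useBodyEncodingForURI path lets a per-request setCharacterEncoding() call affect GET parameters, which is the per-request control Jess asks for; URIEncoding fixes one charset for every request on the connector.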

Rémy





Re: Bug 23929: ServletRequest.setCharacterEncoding()

Posted by Jess Holle <je...@ptc.com>.
Remy Maucherat wrote:

> Jess Holle wrote:
>
>> Remy Maucherat wrote:
>>
>>> For example:
>>> remm        2003/12/10 14:26:28
>>>
>>>   Modified:    catalina/src/share/org/apache/coyote/tomcat5
>>>                         CoyoteConnector.java CoyoteRequest.java
>>>                         mbeans-descriptors.xml
>>>   Log:
>>>   - Add a flag to allow using the encoding specified in the contentType for
>>>     the URI parameters. This is disabled by default, not compliant with the
>>>     standards, but present for compatibility.
>>
>> But as per my previous message I /cannot/ change this on a connector 
>> basis.  I /must/ make this determination on a per-request basis -- 
>> /and the servlet spec specifically allows me to do this via the 
>> setCharacterEncoding() API as I read it/.
>
> The content-type header and your setCharacterEncoding call both 
> control the request entity body character encoding. So if using the 
> entity body encoding *also* for URI parameters, what do you think it 
> would do?

This is a good question -- but one which only applies to POST.  My bug 
case was explicitly with GET.

If there is an entity body encoding specified in the request, then I am 
not sure which should override.  If there is not, then I would presume 
setCharacterEncoding() should win out.  If the only issue is when these 
differ, then I believe that site designers should simply ensure they don't.

>>> There's a query page in BZ, also, and as I said, many threads on 
>>> tomcat-dev (use the archives).
>>
>> I queried both at some length -- especially BZ.  I'll query the 
>> tomcat-dev archives further, but again a simple synopsis, attached to 
>> the bug, of how Tomcat's behavior satisfies the spec (and is thus not 
>> a bug) would save everyone a lot of trouble in cases like this.  In 
>> other words, where a bug that from all indications appears to be a 
>> spec violation is closed as "INVALID", an explanation attached to the 
>> bug itself would be a *very* good idea.
>
> Sorry, I'm not a broken record, and I will not go on repeating the 
> same stuff over and over 20 times.

Just once, on one of the bug reports in the duplicate chain, would 
suffice.  [At least in my handling of our internal bug system, it is 
commonplace to copy/paste the final status from e-mail threads and/or 
lists into the bug's attachments when closing the bug.]

--
Jess Holle





Re: Bug 23929: ServletRequest.setCharacterEncoding()

Posted by Remy Maucherat <re...@apache.org>.
Jess Holle wrote:
> Remy Maucherat wrote:
>>
>> For example:
>> remm        2003/12/10 14:26:28
>>
>>   Modified:    catalina/src/share/org/apache/coyote/tomcat5
>>                         CoyoteConnector.java CoyoteRequest.java
>>                         mbeans-descriptors.xml
>>   Log:
>>   - Add a flag to allow using the encoding specified in the contentType for
>>     the URI parameters. This is disabled by default, not compliant with the
>>     standards, but present for compatibility.
> 
> But as per my previous message I /cannot/ change this on a connector 
> basis.  I /must/ make this determination on a per-request basis -- /and 
> the servlet spec specifically allows me to do this via the 
> setCharacterEncoding() API as I read it/.

The content-type header and your setCharacterEncoding call both control 
the request entity body character encoding. So if using the entity body 
encoding *also* for URI parameters, what do you think it would do?
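The point about the body encoding can be made concrete with a hypothetical model of how it is resolved: per the ServletRequest javadoc, setCharacterEncoding() overrides the encoding used for the request body, so an explicit call wins over a charset parameter in Content-Type. The class below is invented for illustration and its header parsing is deliberately minimal. It also reflects Jess's point that a typical GET carries no Content-Type, leaving the explicit call as the only source of an encoding.

```java
import java.util.Locale;

// Hypothetical model: an explicit setCharacterEncoding() call overrides any
// charset in the Content-Type header. No quoted-string handling in the parser.
public class BodyEncodingModel {
    private final String contentType; // raw Content-Type header value, may be null
    private String explicitEncoding;  // set via setCharacterEncoding(), may be null

    BodyEncodingModel(String contentType) { this.contentType = contentType; }

    void setCharacterEncoding(String enc) { explicitEncoding = enc; }

    // Like ServletRequest.getCharacterEncoding(): null when nothing is specified.
    String getCharacterEncoding() {
        if (explicitEncoding != null) {
            return explicitEncoding;
        }
        return charsetFromContentType(contentType);
    }

    static String charsetFromContentType(String contentType) {
        if (contentType == null) return null;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return p.substring("charset=".length()).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A typical GET has no Content-Type, so only the explicit call counts.
        BodyEncodingModel get = new BodyEncodingModel(null);
        System.out.println(get.getCharacterEncoding()); // null
        get.setCharacterEncoding("UTF-8");
        System.out.println(get.getCharacterEncoding()); // UTF-8

        // A form POST declaring a charset; the explicit call overrides it.
        BodyEncodingModel post =
            new BodyEncodingModel("application/x-www-form-urlencoded; charset=ISO-8859-1");
        System.out.println(post.getCharacterEncoding()); // ISO-8859-1
        post.setCharacterEncoding("UTF-8");
        System.out.println(post.getCharacterEncoding()); // UTF-8
    }
}
```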

>> There's a query page in BZ, also, and as I said, many threads on 
>> tomcat-dev (use the archives).
> 
> I queried both at some length -- especially BZ.  I'll query the 
> tomcat-dev archives further, but again a simple synopsis, attached to the 
> bug, of how Tomcat's behavior satisfies the spec (and is thus not a bug) 
> would save everyone a lot of trouble in cases like this.  In other 
> words, where a bug that from all indications appears to be a spec 
> violation is closed as "INVALID", an explanation attached to the bug 
> itself would be a *very* good idea.

Sorry, I'm not a broken record, and I will not go on repeating the same 
stuff over and over 20 times.

Rémy




Re: Bug 23929: ServletRequest.setCharacterEncoding()

Posted by Jess Holle <je...@ptc.com>.
Remy Maucherat wrote:

> Jess Holle wrote:
>
>>> - There are big threads, commit messages (incl recent ones), and bugs 
>>> on this issue. How about reading that before writing an email about 
>>> how bad things are.
>>
>>
>>
>> I did search the archives for such threads before even filing my 
>> duplicate bug, so apparently my searching is inept.  I'll look again, 
>> but pointers would be appreciated.
>
>
> For example:
> remm        2003/12/10 14:26:28
>
>   Modified:    catalina/src/share/org/apache/coyote/tomcat5
>                         CoyoteConnector.java CoyoteRequest.java
>                         mbeans-descriptors.xml
>   Log:
>   - Add a flag to allow using the encoding specified in the contentType for
>     the URI parameters. This is disabled by default, not compliant with the
>     standards, but present for compatibility.

But as per my previous message I /cannot/ change this on a connector 
basis.  I /must/ make this determination on a per-request basis -- /and 
the servlet spec specifically allows me to do this via the 
setCharacterEncoding() API as I read it/.

> There's a query page in BZ, also, and as I said, many threads on 
> tomcat-dev (use the archives).

I queried both at some length -- especially BZ.  I'll query the 
tomcat-dev archives further, but again a simple synopsis, attached to the 
bug, of how Tomcat's behavior satisfies the spec (and is thus not a bug) 
would save everyone a lot of trouble in cases like this.  In other 
words, where a bug that from all indications appears to be a spec 
violation is closed as "INVALID", an explanation attached to the bug 
itself would be a *very* good idea.

--
Jess Holle


Re: Bug 23929: ServletRequest.setCharacterEncoding()

Posted by Remy Maucherat <re...@apache.org>.
Jess Holle wrote:
>> - There are big threads, commit messages (incl recent ones), and bugs on 
>> this issue. How about reading that before writing an email about how 
>> bad things are.
> 
> 
> I did search the archives for such threads before even filing my 
> duplicate bug, so apparently my searching is inept.  I'll look again, 
> but pointers would be appreciated.

For example:
remm        2003/12/10 14:26:28

   Modified:    catalina/src/share/org/apache/coyote/tomcat5
                         CoyoteConnector.java CoyoteRequest.java
                         mbeans-descriptors.xml
   Log:
   - Add a flag to allow using the encoding specified in the contentType for
     the URI parameters. This is disabled by default, not compliant with the
     standards, but present for compatibility.

There's a query page in BZ, also, and as I said, many threads on 
tomcat-dev (use the archives).

Rémy





Re: Bug 23929: ServletRequest.setCharacterEncoding()

Posted by Jess Holle <je...@ptc.com>.
Remy Maucherat wrote:

> Jess Holle wrote:
>
>> Remy, et al:
>>
>> The API is *not* optional.  It is a required part of the servlet spec.
>
> Great. I didn't know that ;-)
>
> How about:
> - Not CCing me. I'm subscribed to tomcat-dev already. Thanks.

Sorry.

> - There are big threads, commit messages (incl recent ones), and bugs on 
> this issue. How about reading that before writing an email about how 
> bad things are.

I did search the archives for such threads before even filing my 
duplicate bug, so apparently my searching is inept.  I'll look again, 
but pointers would be appreciated.

> BTW, there's no bug.

It would be nice if the bug comments described why it is not a bug.  I 
understand Bugzilla is not a discussion forum, but it would really help 
future reporters of an issue not to resurrect old issues if the bug 
comments contained a final summary as to why the bug was closed as 
"INVALID".

Did the other reporter and I misuse the API?  The API presumably must 
work, so how are we misusing it such that it does not?  If it does not 
work, then how does this meet the spec?

--
Jess Holle





Re: Bug 23929: ServletRequest.setCharacterEncoding()

Posted by Remy Maucherat <re...@apache.org>.
Jess Holle wrote:

> Remy, et al:
> 
> The API is *not* optional.  It is a required part of the servlet spec.

Great. I didn't know that ;-)

How about:
- Not CCing me. I'm subscribed to tomcat-dev already. Thanks.
- There are big threads, commit messages (incl recent ones), and bugs on 
this issue. How about reading that before writing an email about how bad 
things are.

BTW, there's no bug.

Rémy

> It works just great in Tomcat 4.1 and is not an acceptable regression in 
> Tomcat 5.  I am thus one step away from re-opening this bug 
> (http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23929)
> 
> I cannot use the encoding setting on the connector as the standard 
> handling of servlet parameters is ISO-8859-1 decoding unless 
> setCharacterEncoding() is used to specify something else.  All of our 
> other code thus follows this standard carefully (and works across all 
> servlet engines tested).  [This includes handling multi-byte data in 
> servlet parameters.]  This does require some careful shuffling to 
> work around the fact that the wrong encoding was used by the servlet 
> engine and to use the correct one (UTF-8 in most, but not all, cases).
> 
> We do, however, have some code which leverages this new API to 
> setCharacterEncoding("UTF-8") -- which is, in fact, very nice to have.  
> I can see that it can be obnoxious to implement -- but users of 
> the API do not and should not care.
> 
> Tomcat 5 has a lot of promising things over Tomcat 4.1 -- don't let spec 
> non-compliance force those of us who must care about rigorous i18n to 
> tell our customers to use Tomcat 4.1 or pay for a commercial servlet 
> engine if they want later spec compliance.
> 
> -- 
> Jess Holle





Bug 23929: ServletRequest.setCharacterEncoding()

Posted by Jess Holle <je...@ptc.com>.
Remy, et al:

The API is *not* optional.  It is a required part of the servlet spec.

It works just great in Tomcat 4.1 and is not an acceptable regression in 
Tomcat 5.  I am thus one step away from re-opening this bug 
(http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23929)

I cannot use the encoding setting on the connector as the standard 
handling of servlet parameters is ISO-8859-1 decoding unless 
setCharacterEncoding() is used to specify something else.  All of our 
other code thus follows this standard carefully (and works across all 
servlet engines tested).  [This includes handling multi-byte data in 
servlet parameters.]  This does require some careful shuffling to 
work around the fact that the wrong encoding was used by the servlet 
engine and to use the correct one (UTF-8 in most, but not all, cases).
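The "careful shuffling" is not shown in the message; a common form of it is sketched below with hypothetical names. It assumes the container decoded the parameter as ISO-8859-1: because ISO-8859-1 maps each byte 0x00-0xFF to exactly one character, re-encoding the mis-decoded string as ISO-8859-1 recovers the original bytes, which can then be decoded with the real charset. Applying it to a value that was already decoded correctly would corrupt that value.

```java
import java.io.UnsupportedEncodingException;

// Recovers a parameter value that the container decoded as ISO-8859-1
// although the client actually sent bytes in another charset.
public class Latin1Repair {

    static String reinterpret(String value, String actualCharset) {
        if (value == null) return null;
        try {
            // ISO-8859-1 is a 1:1 byte/char mapping, so this restores the raw bytes.
            byte[] originalBytes = value.getBytes("ISO-8859-1");
            return new String(originalBytes, actualCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + actualCharset, e);
        }
    }

    public static void main(String[] args) {
        // What getParameter() returns when UTF-8 bytes C3 A9 were decoded as ISO-8859-1.
        String misDecoded = "caf\u00C3\u00A9";
        String repaired = reinterpret(misDecoded, "UTF-8");
        System.out.println(repaired.equals("caf\u00E9")); // true
    }
}
```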

We do, however, have some code which leverages this new API to 
setCharacterEncoding("UTF-8") -- which is, in fact, very nice to have.  
I can see that it can be obnoxious to implement -- but users of 
the API do not and should not care.

Tomcat 5 has a lot of promising things over Tomcat 4.1 -- don't let spec 
non-compliance force those of us who must care about rigorous i18n to 
tell our customers to use Tomcat 4.1 or pay for a commercial servlet 
engine if they want later spec compliance.

--
Jess Holle


