Posted to dev@tomcat.apache.org by bu...@apache.org on 2003/12/31 22:38:37 UTC
DO NOT REPLY [Bug 25848] New: -
ServletRequest.setCharacterEncoding() does not appear to work
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25848>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
Summary: ServletRequest.setCharacterEncoding() does not appear to
work
Product: Tomcat 5
Version: 5.0.16
Platform: All
OS/Version: All
Status: NEW
Severity: Major
Priority: Other
Component: Connector:Coyote
AssignedTo: tomcat-dev@jakarta.apache.org
ReportedBy: jessh@ptc.com
We have pages that process non-ASCII characters in request parameter values
just fine with Tomcat 4.1.24 by encoding the URL request string (this is a GET
request) and then using ServletRequest.setCharacterEncoding("UTF-8") when
processing the resulting request.
This is failing in Tomcat 5.0.16 -- all else being held equal.
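The symptom described above can be reproduced outside any servlet engine. The following is a minimal sketch, assuming the client percent-encodes the parameter value as UTF-8 bytes and the container then decodes the query string as ISO-8859-1; the class and method names are hypothetical and this is not Tomcat code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class QueryDecodingDemo {

    // Percent-encode a parameter value as UTF-8 bytes, as a client building a GET URL might.
    public static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Decode a percent-encoded value, interpreting the decoded bytes in the given charset.
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        String encoded = encodeUtf8(original);
        System.out.println(encoded); // caf%C3%A9
        // Decoding with the charset the client used recovers the value.
        System.out.println(decode(encoded, "UTF-8").equals(original)); // true
        // Decoding as ISO-8859-1 turns the two UTF-8 bytes into two separate characters.
        System.out.println(decode(encoded, "ISO-8859-1").equals(original)); // false
    }
}
```

The same bytes arrive in both cases; only the charset used to interpret them differs.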
I've checked the bug database and see various issues with setCharacterEncoding()
and interactions with request dump valves, Jasper log-levels, etc. I
checked each one of these in my Tomcat 5 configuration and none of these are
set.
I have been poking around in the debugger and can't seem to find anything amiss
(e.g. tell-tale use of parseRequestParameters() prior to use of
setCharacterEncoding()), but I'm coming up empty thus far...
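For readers unfamiliar with the ordering issue mentioned above: the servlet API documents that setCharacterEncoding() has no effect once request parameters have been read. A minimal sketch of why that ordering matters, using a hypothetical LazyParameterModel class that caches the decoded value on first access (an illustration only, not Tomcat's implementation):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class LazyParameterModel {
    private final String rawValue;          // percent-encoded value as received
    private String encoding = "ISO-8859-1"; // default assumed when nothing is set
    private String parsed;                  // cached result of the first parse

    public LazyParameterModel(String rawValue) {
        this.rawValue = rawValue;
    }

    // Records the requested encoding; it only matters if parsing has not happened yet.
    public void setCharacterEncoding(String enc) {
        this.encoding = enc;
    }

    // Decodes on first call and returns the cached value afterwards.
    public String getParameter() {
        if (parsed == null) {
            try {
                parsed = URLDecoder.decode(rawValue, encoding);
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException(e);
            }
        }
        return parsed;
    }
}
```

In this model, anything that reads a parameter before the application sets the encoding (a valve or a logging call, for example) fixes the result for the rest of the request.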
---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Posted by Jess Holle <je...@ptc.com>.
> From a developer's point of view however, applying the above two points
> a) breaks expected behaviour (setCharacterEncoding() method does not
> work the same as before)
> b) does not give an acceptable alternative (if all parameter passing
> could be solved with POST method, then the GET method would not be
> needed, would it?)
> c) a lot of web apps stopped working when an upgrade of the tomcat
> version was performed
>
> So I think it is legitimate to be upset when first confronted with
> this change of behaviour.
I will not claim that I was reasonable when originally confronted with
the change.
I will say that:
1. Our existing (4.1.x) usage of setCharacterEncoding() works across
all recent servlet engines tested [including 2 commercial servlet
engines] -- and is thus some indication of a de facto standard.
2. It would seem from examples provided with setCharacterEncoding()
by Sun that the intent is to include request parameters and that
thus this should be the default operation of this API rather than
requiring additional configuration to obtain this behavior.
> As for how easy it is to NOT file duplicate bugs on this issue, having
> followed this debate, I have collected the following list of somehow
> related bugs
I did searches again after being scolded by Remy. I must admit that I
must have crossed wires when doing searches and filing bugs and somehow
managed to miss this search (which it is my habit to do).
> Speaking for myself and having reread these messages:
> Assuming I've been working for some time with the old behaviour and
> experienced the new one, I would not be able to understand why this
> change was made, EVEN if someone gave me the above list of bugs.
Agreed. Without a short summary attached to the bugs I would still have
filed a new bug and argued to high hell...
--
Jess Holle
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Posted by Remy Maucherat <re...@apache.org>.
Stefanos Karasavvidis wrote:
> If not already done, port the useBodyEncodingForURI parameter to the
> next 4.1.x release.
This new flag was ported last month.
Rémy
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Posted by Stefanos Karasavvidis <st...@msc.gr>.
Remy Maucherat wrote:
> Jess Holle wrote:
>
>> Remy Maucherat wrote:
>>
>> This is a good question -- but one which only applies to POST. My bug
>> case was explicitly with GET.
>>
>> If there is an entity body encoding specified in the request, then I
>> am not sure which should override. If there is not, then I would
>> presume setCharacterEncoding() should win out. If the only issue is
>> when these differ, then I believe that site designers should simply
>> ensure they don't.
>
>
> I think you should read the HTTP RFC. content-type does not apply to the
> URI or the HTTP header. The fact that setCharacterEncoding would apply
> to (part of) the URI and/or the header violates the RFC on URIs.
>
> Anyway, to put it simply: in the next release, add
> useBodyEncodingForURI="true" on the connector, and you're done.
> Please don't complain that it won't do what you want before trying it.
> You can also use the URIEncoding attribute to specify the path encoding.
>
> Rémy
>
my 2 cents on this issue,
Remy is for sure right stating that
a) the HTTP RFC does not cover variable character encoding for query
parameters for different requests
b) it is (sounds ?) logical to assume that the whole URI (including
paths, query parameters etc.) should be considered as being encoded with
the same character encoding
From a developer's point of view however, applying the above two points
a) breaks expected behaviour (setCharacterEncoding() method does not
work the same as before)
b) does not give an acceptable alternative (if all parameter passing
could be solved with POST method, then the GET method would not be
needed, would it?)
c) a lot of web apps stopped working when an upgrade of the tomcat
version was performed
So I think it is legitimate to be upset when first confronted with this
change of behaviour.
As for how easy it is to NOT file duplicate bugs on this issue, having
followed this debate, I have collected the following list of somehow
related bugs
bug 25360
bug 25231
bug 25235
bug 22666
bug 24557
bug 24345
bug 23929
bug 25848
and of course a bunch of messages in the developer list
Speaking for myself and having reread these messages:
Assuming I've been working for some time with the old behaviour and
experienced the new one, I would not be able to understand why this
change was made, EVEN if someone gave me the above list of bugs.
I propose the following:
write a short summary of why this change was necessary and include the
above list of bugs, as well as links to the related developer list
threads. Then submit a link to this summary to all the above bugs.
If not already done, port the useBodyEncodingForURI parameter to the
next 4.1.x release.
I volunteer to write the summary if the list thinks that the proposal is
reasonable.
Regards
Stefanos
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Posted by Remy Maucherat <re...@apache.org>.
Jess Holle wrote:
> Remy Maucherat wrote:
>
> This is a good question -- but one which only applies to POST. My bug
> case was explicitly with GET.
>
> If there is an entity body encoding specified in the request, then I am
> not sure which should override. If there is not, then I would presume
> setCharacterEncoding() should win out. If the only issue is when these
> differ, then I believe that site designers should simply ensure they don't.
I think you should read the HTTP RFC. content-type does not apply to the
URI or the HTTP header. The fact that setCharacterEncoding would apply
to (part of) the URI and/or the header violates the RFC on URIs.
Anyway, to put it simply: in the next release, add
useBodyEncodingForURI="true" on the connector, and you're done.
Please don't complain that it won't do what you want before trying it.
You can also use the URIEncoding attribute to specify the path encoding.
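As a rough illustration of how the two connector attributes named above could interact, here is a minimal sketch. The class and method are hypothetical and model only what this thread describes: setCharacterEncoding() and the content-type charset set the body encoding, useBodyEncodingForURI reuses that encoding for URI parameters, URIEncoding applies otherwise, and ISO-8859-1 is the default. The fallback to ISO-8859-1 when the flag is on and no body encoding is known is an assumption.

```java
public class QueryEncodingModel {

    // Picks the charset used to decode query-string parameters under the stated assumptions.
    //   useBodyEncodingForURI: value of the connector flag
    //   uriEncoding:           value of the URIEncoding attribute, or null if unset
    //   bodyEncoding:          encoding from setCharacterEncoding() or content-type, or null
    public static String queryEncoding(boolean useBodyEncodingForURI,
                                       String uriEncoding,
                                       String bodyEncoding) {
        String chosen = useBodyEncodingForURI ? bodyEncoding : uriEncoding;
        return chosen != null ? chosen : "ISO-8859-1";
    }

    public static void main(String[] args) {
        // Flag off: setCharacterEncoding("UTF-8") does not reach the query string.
        System.out.println(queryEncoding(false, null, "UTF-8")); // ISO-8859-1
        // Flag on: the body encoding is also applied to URI parameters.
        System.out.println(queryEncoding(true, null, "UTF-8"));  // UTF-8
        // Flag off with URIEncoding set: one fixed encoding for every request.
        System.out.println(queryEncoding(false, "UTF-8", null)); // UTF-8
    }
}
```

Under this model only the flag-on case lets the application choose the encoding per request, which is the per-request control discussed elsewhere in the thread.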
Rémy
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Posted by Jess Holle <je...@ptc.com>.
Remy Maucherat wrote:
> Jess Holle wrote:
>
>> Remy Maucherat wrote:
>>
>>> For example:
>>> remm 2003/12/10 14:26:28
>>>
>>> Modified: catalina/src/share/org/apache/coyote/tomcat5
>>> CoyoteConnector.java CoyoteRequest.java
>>> mbeans-descriptors.xml
>>> Log:
>>> - Add a flag to allow using the encoding specified in the
>>> contentType for
>>> the URI parameters. This is disabled by default, not compliant
>>> with the standards,
>>> but present for compatibility.
>>
>> But as per my previous message I /cannot /change this on a connector
>> basis. I /must /make this determination on a per-request basis --
>> /and the servlet spec specifically allows me to do this via the
>> setCharacterEncoding() API as I read it/.
>
> The content-type header and your setCharacterEncoding call both
> control the request entity body character encoding. So if using the
> entity body encoding *also* for URI parameters, what would you think
> it would do ?
This is a good question -- but one which only applies to POST. My bug
case was explicitly with GET.
If there is an entity body encoding specified in the request, then I am
not sure which should override. If there is not, then I would presume
setCharacterEncoding() should win out. If the only issue is when these
differ, then I believe that site designers should simply ensure they don't.
>>> There's a query page in BZ, also, and as I said, many threads on
>>> tomcat-dev (use the archives).
>>
>> I queried both at some length -- especially BZ. I'll query the
>> tomcat-dev archives further, but again a simple synopsis of how
>> Tomcat's behavior satisfies the spec and is thus not a bug attached
>> to the bug would save everyone a lot of trouble in cases like this.
>> In other words, where a bug that from all indications appears to be a
>> spec violation is closed as "INVALID" an explanation attached to the
>> bug itself would be a *very* good idea.
>
> Sorry, I'm not a broken record, and I will not go on repeating the
> same stuff over and over 20 times.
Just once on one of the bug reports in the duplicate chain would
suffice. [At least in my handling of our internal bug system it is
commonplace to copy/paste the final status from e-mail threads and/or
lists into the bug's attachments when closing the bug.]
--
Jess Holle
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Posted by Remy Maucherat <re...@apache.org>.
Jess Holle wrote:
> Remy Maucherat wrote:
>>
>> For example:
>> remm 2003/12/10 14:26:28
>>
>> Modified: catalina/src/share/org/apache/coyote/tomcat5
>> CoyoteConnector.java CoyoteRequest.java
>> mbeans-descriptors.xml
>> Log:
>> - Add a flag to allow using the encoding specified in the
>> contentType for
>> the URI parameters. This is disabled by default, not compliant with
>> the standards,
>> but present for compatibility.
>
> But as per my previous message I /cannot /change this on a connector
> basis. I /must /make this determination on a per-request basis -- /and
> the servlet spec specifically allows me to do this via the
> setCharacterEncoding() API as I read it/.
The content-type header and your setCharacterEncoding call both control
the request entity body character encoding. So if using the entity body
encoding *also* for URI parameters, what would you think it would do ?
>> There's a query page in BZ, also, and as I said, many threads on
>> tomcat-dev (use the archives).
>
> I queried both at some length -- especially BZ. I'll query the
> tomcat-dev archives further, but again a simple synopsis of how Tomcat's
> behavior satisfies the spec and is thus not a bug attached to the bug
> would save everyone a lot of trouble in cases like this. In other
> words, where a bug that from all indications appears to be a spec
> violation is closed as "INVALID" an explanation attached to the bug
> itself would be a *very* good idea.
Sorry, I'm not a broken record, and I will not go on repeating the same
stuff over and over 20 times.
Rémy
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Posted by Jess Holle <je...@ptc.com>.
Remy Maucherat wrote:
> Jess Holle wrote:
>
>>> - There's big threads, commit messages (incl recent ones), and bugs
>>> on this issue. How about reading that before writing an email about
>>> how bad things are.
>>
>>
>>
>> I did search the archives for such threads before even filing my
>> duplicate bug, so apparently my searching is inept. I'll look again,
>> but pointers would be appreciated.
>
>
> For example:
> remm 2003/12/10 14:26:28
>
> Modified: catalina/src/share/org/apache/coyote/tomcat5
> CoyoteConnector.java CoyoteRequest.java
> mbeans-descriptors.xml
> Log:
> - Add a flag to allow using the encoding specified in the
> contentType for
> the URI parameters. This is disabled by default, not compliant with
> the standards,
> but present for compatibility.
But as per my previous message I /cannot /change this on a connector
basis. I /must /make this determination on a per-request basis -- /and
the servlet spec specifically allows me to do this via the
setCharacterEncoding() API as I read it/.
> There's a query page in BZ, also, and as I said, many threads on
> tomcat-dev (use the archives).
I queried both at some length -- especially BZ. I'll query the
tomcat-dev archives further, but again a simple synopsis of how Tomcat's
behavior satisfies the spec and is thus not a bug attached to the bug
would save everyone a lot of trouble in cases like this. In other
words, where a bug that from all indications appears to be a spec
violation is closed as "INVALID" an explanation attached to the bug
itself would be a *very* good idea.
--
Jess Holle
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Posted by Remy Maucherat <re...@apache.org>.
Jess Holle wrote:
>> - There's big threads, commit messages (incl recent ones), and bugs on
>> this issue. How about reading that before writing an email about how
>> bad things are.
>
>
> I did search the archives for such threads before even filing my
> duplicate bug, so apparently my searching is inept. I'll look again,
> but pointers would be appreciated.
For example:
remm 2003/12/10 14:26:28
Modified: catalina/src/share/org/apache/coyote/tomcat5
CoyoteConnector.java CoyoteRequest.java
mbeans-descriptors.xml
Log:
- Add a flag to allow using the encoding specified in the contentType for
the URI parameters. This is disabled by default, not compliant with
the standards,
but present for compatibility.
There's a query page in BZ, also, and as I said, many threads on
tomcat-dev (use the archives).
Rémy
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Posted by Jess Holle <je...@ptc.com>.
Remy Maucherat wrote:
> Jess Holle wrote:
>
>> Remy, et al:
>>
>> The API is *not* optional. It is a required part of the servlet spec.
>
> Great. I didn't know that ;-)
>
> How about:
> - Not CCing me. I'm subscribed to tomcat-dev already. thanks.
Sorry.
> - There's big threads, commit messages (incl recent ones), and bugs on
> this issue. How about reading that before writing an email about how
> bad things are.
I did search the archives for such threads before even filing my
duplicate bug, so apparently my searching is inept. I'll look again,
but pointers would be appreciated.
> BTW, there's no bug.
It would be nice if the bug comments described why it is not a bug. I
understand Bugzilla is not a discussion forum, but it would really help
future reporters of an issue not to resurrect old issues if the bug
comments contained a final summary as to why the bug was closed as
"INVALID".
Did I and the other reporter misuse the API? The API presumably must
work, so how are we misusing it so that it does not? If it does not
work, then how does this meet the spec?
--
Jess Holle
Re: Bug 23929: ServletRequest.setCharacterEncoding()
Posted by Remy Maucherat <re...@apache.org>.
Jess Holle wrote:
> Remy, et al:
>
> The API is *not* optional. It is a required part of the servlet spec.
Great. I didn't know that ;-)
How about:
- Not CCing me. I'm subscribed to tomcat-dev already. thanks.
- There's big threads, commit messages (incl recent ones), and bugs on
this issue. How about reading that before writing an email about how bad
things are.
BTW, there's no bug.
Rémy
> It works just great in Tomcat 4.1 and is not an acceptable regression in
> Tomcat 5. I am thus one step away from re-opening this bug
> (http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23929)
>
> I cannot use the encoding setting on the connector as the standard
> handling of servlet parameters is ISO-8859-1 decoding unless
> setCharacterEncoding() is used to specify something else. All of our
> other code thus follows this standard carefully (and works across all
> servlet engines tested). [This includes handling multi-byte data in
> servlet parameters.] This does require some careful shuffling to
> work around the fact that the wrong encoding was used by the servlet
> engine and to use the correct one (UTF-8 in most, but not all, cases).
>
> We do, however, have some code which leverages this new API to
> setCharacterEncoding("UTF-8") -- which is, in fact, very nice to have.
> I can see that it can be obnoxious for implementation -- but users of
> the API do not and should not care.
>
> Tomcat 5 has a lot of promising things over Tomcat 4.1 -- don't let spec
> non-compliance force those who are forced to care about rigorous i18n to
> tell our customers to use Tomcat 4.1 or pay for a commercial servlet
> engine if they want later spec compliance.
>
> --
> Jess Holle
Bug 23929: ServletRequest.setCharacterEncoding()
Posted by Jess Holle <je...@ptc.com>.
Remy, et al:
The API is *not* optional. It is a required part of the servlet spec.
It works just great in Tomcat 4.1 and is not an acceptable regression in
Tomcat 5. I am thus one step away from re-opening this bug
(http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23929)
I cannot use the encoding setting on the connector as the standard
handling of servlet parameters is ISO-8859-1 decoding unless
setCharacterEncoding() is used to specify something else. All of our
other code thus follows this standard carefully (and works across all
servlet engines tested). [This includes handling multi-byte data in
servlet parameters.] This does require some careful shuffling to
work around the fact that the wrong encoding was used by the servlet
engine and to use the correct one (UTF-8 in most, but not all, cases).
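The "careful shuffling" described above is commonly done by turning the mis-decoded string back into its original bytes and decoding those bytes again. A minimal sketch, assuming the engine decoded the parameter as ISO-8859-1 and the client actually sent UTF-8; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class ParameterRecoder {

    // Recovers a UTF-8 value from a string the container decoded as ISO-8859-1.
    // ISO-8859-1 maps each character 0..255 to exactly one byte, so getBytes()
    // reproduces the bytes that were originally sent.
    public static String recodeLatin1AsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00C3 followed by U+00A9 is what the UTF-8 bytes of U+00E9 look like as ISO-8859-1.
        String fromContainer = "caf\u00c3\u00a9";
        System.out.println(recodeLatin1AsUtf8(fromContainer).equals("caf\u00e9")); // true
    }
}
```

This is only safe when the first decoding really was ISO-8859-1; if the engine already applied another charset, the round trip corrupts the value.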
We do, however, have some code which leverages this new API to
setCharacterEncoding("UTF-8") -- which is, in fact, very nice to have.
I can see that it can be obnoxious for implementation -- but users of
the API do not and should not care.
Tomcat 5 has a lot of promising things over Tomcat 4.1 -- don't let spec
non-compliance force those who are forced to care about rigorous i18n to
tell our customers to use Tomcat 4.1 or pay for a commercial servlet
engine if they want later spec compliance.
--
Jess Holle