Posted to dev@tomcat.apache.org by bu...@apache.org on 2010/01/14 22:52:05 UTC

DO NOT REPLY [Bug 48550] New: Update examples and default server.xml to use UTF-8

https://issues.apache.org/bugzilla/show_bug.cgi?id=48550

           Summary: Update examples and default server.xml to use UTF-8
           Product: Tomcat 7
           Version: trunk
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Examples
        AssignedTo: dev@tomcat.apache.org
        ReportedBy: knst.kolinko@gmail.com


It is just an idea, but I think that with Tomcat 7 we can update our server.xml
and our examples to use UTF-8.

That is:

1) add URIEncoding="UTF-8" to HTTP and AJP <Connector> elements in the default
server.xml

2) configure SetCharacterEncodingFilter in the examples webapp

3) update Servlet and JSP examples to allow UTF-8 input (1) and 2) will provide
that) and to use UTF-8 as their output character encoding

4) the servlet/JSP sources will probably stay as ISO-8859-1, as they are now

Please add, if I missed anything.


For reference:
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

I think we are a bit busy right now, so I am filing this issue, supposing that
a more detailed discussion can be raised later on dev@ or users@.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 48550] Update examples and default server.xml to use UTF-8

--- Comment #9 from Mark Thomas <ma...@apache.org> ---
Part 1 of the 4 tasks in the description has been completed for trunk (a.k.a
8.0.x)



DO NOT REPLY [Bug 48550] Update examples and default server.xml to use UTF-8

--- Comment #3 from Attila Király <ki...@gmail.com> 2010-12-17 13:23:50 EST ---
As a user of Tomcat and a webapp developer, I would really like to see 1)
added to the default server.xml. I mostly develop apps using UTF-8 encoding, and
if the customer is using Tomcat, extra care is needed either to avoid
non-ISO-8859-1 characters in query parameters or to convince them to modify the
Tomcat configuration (of these options, the former is always the easier).

Some test results:
- Glassfish 3.0.1 documentation contains a similar, optional (default value
"UTF-8") attribute called "uri-encoding" on the "http" element in the
"domain.xml" (mentioned here:
http://docs.sun.com/app/docs/doc/821-1753/girlq?l=en&a=view#indexterm-246 ).
Unfortunately it does not have any effect on query decoding (I tried it with
different values, but ISO-8859-1 was always used to decode query parameters).
This might be a bug in GF, but the intention is there.
- On client side FF 3.6, Chrome 8, Opera 11 and IE9 Beta (and as I found on the
web older versions too) use the character encoding of the page to encode the
query parameters. So if the html is served with utf-8 encoding the query
parameters are encoded with utf-8.
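
To make the mismatch concrete, here is a minimal Java sketch (not part of the
original comment). It assumes that java.net.URLEncoder's form encoding is a
close enough stand-in for what a browser does with query parameters on a UTF-8
page; the class and method names are invented for the example. It shows that a
value encoded as UTF-8 only survives if the server decodes it as UTF-8, and
turns into two wrong characters when decoded as ISO-8859-1.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; it does not touch Tomcat or any servlet API.
class QueryEncodingMismatch {

    // Percent-encode a value roughly the way a browser would for a page
    // served in the given charset (form encoding, an approximation).
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Percent-decode a value the way a container would when configured
    // with the given charset for query decoding.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        String onTheWire = encode(original, "UTF-8");
        System.out.println(onTheWire); // prints %C3%A9

        // Same charset on both sides: the value round-trips.
        System.out.println(decode(onTheWire, "UTF-8").equals(original));

        // Server decodes as ISO-8859-1: each UTF-8 byte becomes its own
        // character (U+00C3 followed by U+00A9).
        System.out.println(decode(onTheWire, "ISO-8859-1").equals("\u00c3\u00a9"));
    }
}
```

Both comparisons in main evaluate to true, which is the situation described
above: UTF-8 pages send UTF-8 query bytes, and an ISO-8859-1 default on the
server side garbles them.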



DO NOT REPLY [Bug 48550] Update examples and default server.xml to use UTF-8

--- Comment #5 from Christopher Schultz <ch...@christopherschultz.net> 2010-12-17 14:35:27 EST ---
(In reply to comment #3)
> - On client side FF 3.6, Chrome 8, Opera 11 and IE9 Beta (and as I found on the
> web older versions too) use the character encoding of the page to encode the
> query parameters. So if the html is served with utf-8 encoding the query
> parameters are encoded with utf-8.

Could you provide references to the above? I had trouble finding official
default values for the URL character encoding used by browsers.

There's also the trouble of users being able to override the default and revert
back to (most likely) ISO-8859-1 encoding.

Right now, I'm -1 for making URIEncoding="UTF-8" by default since it might
break a lot of servers, but I'm willing to be convinced. For the record, I
always set URIEncoding="UTF-8" on my projects but we don't want an
out-of-the-box server configuration to surprise anyone.



DO NOT REPLY [Bug 48550] Update examples and default server.xml to use UTF-8

--- Comment #6 from Attila Király <ki...@gmail.com> 2010-12-18 05:54:48 EST ---
(In reply to comment #5)
> (In reply to comment #3)
> > - On client side FF 3.6, Chrome 8, Opera 11 and IE9 Beta (and as I found on the
> > web older versions too) use the character encoding of the page to encode the
> > query parameters. So if the html is served with utf-8 encoding the query
> > parameters are encoded with utf-8.
> 
> Could you provide references to the above? I had trouble finding official
> default values for the URL character encoding used by browsers.

I am afraid I cannot give official references. I tested the exact browser
versions mentioned above myself (with UTF-8 and ISO-8859-1 encoded pages and
links) and they behave as I described. But it is also mentioned in:
- Tomcat wiki: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q9
"Many browsers are starting to offer (default) options of encoding URIs using
UTF-8 instead of ISO-8859-1. Some browsers appear to use the encoding of the
current page to encode URIs for links (see the note above regarding browser
behavior for POST encoding)."
- MozillaZine KB about the Firefox "network.standard-url.encode-query-utf8"
config property: 
http://kb.mozillazine.org/Network.standard-url.encode-query-utf8
"For compatibility with these websites, as well as parity with IE and Opera,
Mozilla now treats the query portion of a URI (the part following the ?)
differently than the rest.[...]
Encode the query portion of IRIs using the same encoding as the current page.
(Default)"

Additionally Jetty is also using UTF-8 by default:
Jetty wiki:
http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings#InternationalCharactersandCharacterEncodings-InternationalcharactersinURLs
"The W3C organization's HTML standard now recommends the use of UTF-8:
http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars and accordingly
jetty-6 series uses a default of UTF-8."

> 
> There's also the trouble of users being able to override the default and revert
> back to (most likely) ISO-8859-1 encoding.
> 
> Right now, I'm -1 for making URIEncoding="UTF-8" by default since it might
> break a lot of servers, but I'm willing to be convinced. For the record, I
> always set URIEncoding="UTF-8" on my projects but we don't want an
> out-of-the-box server configuration to surprise anyone.

This is true. However, it seems to me that the web is moving in a UTF-8 based
direction, so I think the default encoding in Tomcat should be changed at some
point. That is a backward compatibility issue, so it should be done in a
major release. 7.0 could be that. If it is not done now, the next
possibility is 8.0 in the future. I am not saying developers can't live without
this change; I can cope with it as I always have (I only mentioned my reasons
here because this issue was already open).

Probably my real problem is that query parameter decoding is inconsistent
between servlet containers and there is no way to control it on a per-webapp
basis (instead of a server-wide option) in Tomcat (I could use the
useBodyEncodingForURI="true" attribute, but that is still a modification of
server.xml).

I would also be happy with a Jetty-like solution. In Jetty 7.2, UTF-8 is the
default for query decoding, but it can be overridden on a per-request basis
with
request.setAttribute("org.eclipse.jetty.server.Request.queryEncoding",
"ISO-8859-1");
Tomcat could have something like that. Then in a filter I could call:

  // for Glassfish 3 query decoding; already done anyway, as it is needed
  // for POSTs too in all servlet containers
  request.setCharacterEncoding("UTF-8");
  // for Jetty, just to be sure
  request.setAttribute("org.eclipse.jetty.server.Request.queryEncoding",
      "UTF-8");
  // for Tomcat 7 and up
  request.setAttribute("org.apache.tomcat.Request.queryEncoding or similar",
      "UTF-8");

and get a safe, portable way for at least 3 servlet containers.
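
The per-request idea can be sketched without any container. The following is
an illustration only, not Tomcat, Jetty or Glassfish code: QueryDecoder and
its parse method are hypothetical names, and the parser is deliberately
simplified (the last value wins for repeated names). The point is that the
charset is an argument chosen per call rather than a server-wide setting.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; real containers parse parameters internally.
class QueryDecoder {

    // Split a raw (still percent-encoded) query string into name/value
    // pairs and decode both with the charset chosen for this request.
    static Map<String, String> parse(String rawQuery, String charset) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        try {
            for (String pair : rawQuery.split("&")) {
                if (pair.isEmpty()) {
                    continue; // tolerate "a=1&&b=2"
                }
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                // Simplification: a repeated name overwrites the earlier value.
                params.put(URLDecoder.decode(name, charset),
                           URLDecoder.decode(value, charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
        return params;
    }

    public static void main(String[] args) {
        String raw = "name=K%C3%A1roly&page=1";
        // Same bytes, two different per-request decisions.
        System.out.println(parse(raw, "UTF-8").get("name").length());      // 6 characters
        System.out.println(parse(raw, "ISO-8859-1").get("name").length()); // 7 characters
    }
}
```

With an override like this, a webapp that knows its pages are UTF-8 could ask
for UTF-8 decoding itself instead of depending on how server.xml happens to be
configured.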



DO NOT REPLY [Bug 48550] Update examples and default server.xml to use UTF-8

--- Comment #7 from Peter Flynn <pe...@silmaril.ie> 2010-12-22 05:16:10 EST ---
I had some problems forcing Cocoon output to be UTF-8 (using Tomcat5 and Cocoon
2.1.11) because I didn't realise the default was ISO-8859-1 (everything else in
my sitemaps and XSLT was set to UTF-8, which is what puzzled me).

Our internal controls insist on UTF-8 for everything, so this was only exposed
when we accessed external resources (which could of course be anything,
including Windows-1252).

My gut feeling is that if we are to continue the general move towards
end-to-end XML in the business process (or at least, XML-as-early-as-possible),
then making the character repertoire uniform is A Good Idea, so a default of
UTF-8 would seem very sensible.



DO NOT REPLY [Bug 48550] Update examples and default server.xml to use UTF-8

unnivm@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #1 from unnivm@gmail.com 2010-11-26 07:00:25 EST ---
Please clarify what is the actual difference between these two statements:

3) update Servlet and JSP examples to allow UTF-8 input (1) and 2) will provide
that) and to use UTF-8 as their output character encoding

4) the servlet/JSP sources will probably stay as ISO-8859-1, as they are now



DO NOT REPLY [Bug 48550] Update examples and default server.xml to use UTF-8

--- Comment #2 from Christopher Schultz <ch...@christopherschultz.net> 2010-12-03 11:31:00 EST ---
(In reply to comment #1)
> Please clarify what is the actual difference between these two statements:
> 
> 3) update Servlet and JSP examples to allow UTF-8 input (1) and 2) will provide
> that) and to use UTF-8 as their output character encoding
> 
> 4) the servlet/JSP sources will probably stay as ISO-8859-1, as they are now

#3 means changing the examples webapp to accept UTF-8 input (shouldn't be a big
deal, as #1 and #2 provide that, as mentioned) and to add the <%@page
pageEncoding="UTF-8" %> directive in order to set the output encoding.

#4 means that we won't bother re-encoding all of the JSP files as UTF-8 because
a) such a change would be surprising to users and b) it is not necessary as
those pages are probably all using pure ASCII at this point anyway
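
Point b) rests on the fact that pure-ASCII text has the same bytes in
ISO-8859-1 and UTF-8, so re-encoding such a file would not change it. A small
Java sketch of that check follows; the class and method names are made up for
illustration and nothing here inspects the actual example sources.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical check; it works on strings, not on the real JSP files.
class AsciiSourceCheck {

    // True when the text encodes to identical bytes in ISO-8859-1 and UTF-8,
    // which holds exactly when every character is in the ASCII range.
    static boolean sameBytesInBothEncodings(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(latin1, utf8);
    }

    public static void main(String[] args) {
        // ASCII-only markup: identical in both encodings.
        System.out.println(sameBytesInBothEncodings("<h1>Hello World</h1>")); // true
        // One non-ASCII character is enough to make the encodings differ.
        System.out.println(sameBytesInBothEncodings("caf\u00e9"));            // false
    }
}
```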



DO NOT REPLY [Bug 48550] Update examples and default server.xml to use UTF-8

Michael <mi...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michael.sonnleitner@gmail.com



[Bug 48550] Update examples and default server.xml to use UTF-8

Konstantin Preißer <pr...@web.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #8 from Konstantin Preißer <pr...@web.de> ---
Hi,

as this has not been applied to Tomcat 7, what about Tomcat 8?



DO NOT REPLY [Bug 48550] Update examples and default server.xml to use UTF-8

Attila Király <ki...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kiralyattila.hu@gmail.com

--- Comment #4 from Attila Király <ki...@gmail.com> 2010-12-17 13:33:51 EST ---
One more piece of information: further testing revealed that Glassfish 3.0.1
actually uses the request encoding for query parameter decoding. Calling
request.setCharacterEncoding("UTF-8"); triggered UTF-8 based decoding for
parameters.



[Bug 48550] Update examples and default server.xml to use UTF-8

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Mark Thomas <ma...@apache.org> ---
This has been fixed for trunk a.k.a 8.0.x and will be included in 8.0.0-RC2
onwards.
