Posted to docs@cocoon.apache.org by Apache Wiki <wi...@apache.org> on 2007/05/10 12:58:53 UTC

[Cocoon Wiki] Update of "RequestParameterEncoding" by AlexanderKlimetschek

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cocoon Wiki" for change notification.

The following page has been changed by AlexanderKlimetschek:
http://wiki.apache.org/cocoon/RequestParameterEncoding

------------------------------------------------------------------------------
  = Request parameter encoding =
  
+ == How to set everything to UTF-8 with Cocoon and CForms (with Ajax and Dojo) ==
+ 
+ For internationalization it is best to handle everything in UTF-8, since it can represent every Unicode character. "Everything" means the server side (backend, XML), HTTP requests and responses, and the client side with forms and dojo.io.bind.
+ 
+ === 1. Sending all pages in UTF-8 ===
+ 
+ You need to configure Cocoon's serializers for UTF-8, both the XML serializer ({{{<serialize type="xml" />}}}) and the HTML serializer ({{{<serialize type="html" />}}}). To support all browsers, you must set the encoding used for the response body and also include a meta tag in the HTML: {{{<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">}}}. This is very important, since the browser will then send form requests encoded in UTF-8. Browsers normally don't mention the encoding in the request, so you have to assume they are doing it right. Here is the configuration of the serializer components for your sitemaps that does this:
+ 
+ {{{
+ <serializer name="xml" mime-type="text/xml"
+   src="org.apache.cocoon.serialization.XMLSerializer">
+   <encoding>UTF-8</encoding>
+ </serializer>
+ 
+ <serializer name="html" mime-type="text/html; charset=UTF-8"
+   src="org.apache.cocoon.serialization.HTMLSerializer">
+   <encoding>UTF-8</encoding>
+ 
+   <!-- the following common doctype is only included for completeness, it has no impact on encoding -->
+   <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
+   <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
+ </serializer>
+ }}}
+ 
+ === 2. AJAX Requests with CForms/Dojo ===
+ 
+ If you use CForms with Ajax enabled, Cocoon uses dojo.io.bind() under the hood, which creates XMLHttpRequests that POST the form data to the server. Here Dojo chooses the encoding itself by default, which does not match the browser's behaviour of using the charset defined in the META tag. You can easily tell Dojo which encoding to use for all dojo.io.bind() calls: include the following at the top of your HTML pages, before dojo.js is included:
+ 
+ {{{
+ <script>djConfig = { bindEncoding: "utf-8" };</script>
+ }}}
+ 
+ If you already have other djConfig options, simply add the {{{bindEncoding}}} property to the existing object.
+ 
+ === 3. Decoding incoming requests: Servlet Container ===
+ 
+ When the browser sends data to your server, e.g. form data, your servlet container creates the {{{ServletRequest}}} and has to decode the parameters correctly into Java Strings. If the encoding is specified in the HTTP request header, the container will use it, but unfortunately this is typically not the case. When the browser sends a form post, it will only say {{{application/x-www-form-urlencoded}}} in the header. So the encoding has to be assumed, and the right thing to assume is the encoding of the page you originally sent to the browser.
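+ 
+ To see why the assumed encoding matters, the following minimal Java sketch decodes the same URL-encoded form value once as UTF-8 and once as ISO-8859-1. It only uses {{{java.net.URLDecoder}}} from the JDK to imitate what a container roughly does. The class name {{{FormDecodingDemo}}} and the sample value are made up for illustration and are not part of Cocoon or of any servlet container.
+ 

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of Cocoon or the servlet API.
class FormDecodingDemo {

    // Decodes one application/x-www-form-urlencoded value with the given charset.
    static String decode(String rawValue, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(rawValue, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The German word "fuer" written with a u-umlaut, as a browser sends it
        // from a UTF-8 page: the umlaut becomes the two bytes C3 BC.
        String raw = "f%C3%BCr";

        String right = decode(raw, "UTF-8");       // three characters, umlaut intact
        String wrong = decode(raw, "ISO-8859-1");  // four characters, umlaut split in two

        System.out.println("UTF-8 length:      " + right.length());
        System.out.println("ISO-8859-1 length: " + wrong.length());
    }
}
```

+ 
+ Decoded as ISO-8859-1, the single umlaut arrives as two wrong characters (U+00C3 and U+00BC), which is the typical symptom of parameters being decoded with the wrong charset.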
+ 
+ The servlet specification says that the default encoding for incoming requests is ISO-8859-1 (Jetty deviates from the specification here and assumes UTF-8 by default). So to make sure UTF-8 is used for parameter decoding, you have to set that encoding explicitly on the request by calling {{{ServletRequest.setCharacterEncoding()}}} before any parameter is read. To do that for all requests, you can use a servlet filter like this one: SetCharacterEncodingFilter.
+ 
+ Then you add the filter to the web.xml:
+ 
+ {{{
+ <filter>
+   <filter-name>Set Character Encoding</filter-name>
+   <filter-class>filters.SetCharacterEncodingFilter</filter-class>
+   <init-param>
+     <param-name>encoding</param-name>
+     <param-value>UTF-8</param-value>
+   </init-param>
+ </filter>
+ 
+ <!-- either mapping to URL pattern -->
+ 
+ <filter-mapping>
+   <filter-name>Set Character Encoding</filter-name>
+   <url-pattern>/*</url-pattern>
+ </filter-mapping>
+ 
+ <!-- or mapping to your Cocoon servlet (the servlet-name might be different) -->
+ 
+ <filter-mapping>
+   <filter-name>Set Character Encoding</filter-name>
+   <servlet-name>CocoonBlocksDispatcherServlet</servlet-name>
+ </filter-mapping>
+ 
+ }}}
+ 
+ Since the filter element was added in the servlet 2.3 specification, you need at least version 2.3 in your web.xml. Using the current 2.4 version is better, as it is the standard for Cocoon web applications. For 2.4 you use an XSD schema:
+ 
+ {{{
+ <web-app version="2.4"
+          xmlns="http://java.sun.com/xml/ns/j2ee"
+          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+          xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
+ }}}
+ 
+ For 2.3 you need to modify the DOCTYPE declaration in the web.xml:
+ 
+ {{{
+ <!DOCTYPE web-app
+     PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
+     "http://java.sun.com/dtd/web-app_2_3.dtd">
+ }}}
+ 
+ === 4. Setting Cocoon's encoding (especially CForms) ===
+ 
+ To tell Cocoon to use UTF-8 internally, you have to set two properties:
+ 
+ {{{
+ org.apache.cocoon.containerencoding=utf-8
+ org.apache.cocoon.formencoding=utf-8
+ }}}
+ 
+ They need to be in some {{{*.properties}}} file under {{{META-INF/cocoon/properties}}} in one of your blocks.
+ 
+ === 5. XML Files ===
+ 
+ This is normally not a problem, since the default encoding for XML files is UTF-8. However, they should always start with the following XML declaration, which should also make your XML editor save them in UTF-8 (most editors appear to do so, so there should not be a problem here).
+ 
+ {{{
+ <?xml version="1.0" encoding="UTF-8"?>
+ }}}
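+ 
+ If you are unsure whether an editor really saved a file as UTF-8, you can check its bytes. The following is only a sketch: the class name {{{Utf8Check}}} is hypothetical, and it relies on the JDK's strict {{{CharsetDecoder}}} rather than on anything in Cocoon. Note that a pure ASCII file passes the check whatever the editor's setting, because ASCII is a subset of UTF-8.
+ 

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Cocoon.
class Utf8Check {

    // Returns true if the bytes form a valid UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // Fail on bad input instead of silently replacing it.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Usage: java Utf8Check file1.xml file2.xml ...
    public static void main(String[] args) throws Exception {
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + (isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

+ 
+ A file with umlauts that was saved as ISO-8859-1 fails this check, because a lone byte such as 0xFC is not a valid UTF-8 sequence.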
+ 
+ === 6. Special Transformers ===
+ 
+ The standard XSLT transformers and most others work on SAX events, which are not serialized, so encoding is not a problem there. But some special transformers pass data on to another library that does serialize it and might need a hint to use the correct encoding. One example is the NekoHTMLTransformer: https://issues.apache.org/jira/browse/COCOON-2063.
+ 
+ If you suspect that a transformer in your pipeline is doing things wrong, add a {{{TeeTransformer}}} between each step, writing the XML between the transformers to temp1.xml, temp2.xml and so on, and look for the place where your umlauts and other special characters get messed up.
+ 
+ === 7. Your own XML serializing Sources ===
+ 
+ If you have your own Source implementation that needs to serialize XML, make sure it does that in UTF-8 as well. A good idea is to use Cocoon's XML serializer, since it was already configured for UTF-8 above. Sample code that does that is here: ["UseCocoonXMLSerializerCode"]
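+ 
+ If you cannot use Cocoon's serializer in your Source, the same rule applies to any other serialization code. The sketch below does not use Cocoon at all: it relies on the plain JAXP {{{Transformer}}} from the JDK to show the two points that matter, which are setting the output encoding explicitly and writing to an {{{OutputStream}}} rather than a {{{Writer}}}. The class and method names are made up for illustration.
+ 

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper based on JAXP, not on Cocoon's XMLSerializer.
class Utf8XmlWriter {

    // Serializes the document to bytes, forcing UTF-8 regardless of the platform default.
    static byte[] toUtf8Bytes(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Pass an OutputStream so the serializer itself turns characters into bytes.
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Builds a tiny sample document with one element containing the given text.
    static Document sample(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.setTextContent(text);
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // A German greeting with u-umlaut and sharp s, written with Unicode escapes.
        byte[] bytes = toUtf8Bytes(sample("Gr\u00fc\u00dfe"));
        System.out.println(bytes.length + " bytes written");
    }
}
```

+ 
+ Passing a {{{Writer}}} created with the platform default encoding instead would let the declared encoding and the actual bytes disagree, which is exactly the kind of mismatch this page tries to avoid.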
+ 
+ 
+ == Older documentation ==
+ 
- == Basics ==
+ === Basics ===
  
  If your Cocoon application needs to read request parameters that could contain ''special'' characters, i.e. characters outside of the first 128 ASCII characters, you'll need to pay attention to what encoding is used.