Posted to docs@cocoon.apache.org by st...@outerthought.org on 2003/03/13 19:00:03 UTC

[WIKI-UPDATE] RequestParameterEncoding BrunoDumon Thu Mar 13 19:00:02 2003

Page: http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding , version: 1 on Thu Mar 13 17:13:46 2003 by 157.193.121.51

New page created:
+ !!!Request parameter encoding
+ 
+ !!Basics
+ 
+ If your Cocoon application needs to read request parameters that could contain "special" characters, i.e. characters outside the 128-character ASCII range, you'll need to pay attention to which encoding is used.
+ 
+ Normally a browser will send data to the server using the same encoding as the page containing the submitted form. So if the pages are serialized using UTF-8, the browser will submit form data using UTF-8. Users can change the encoding, but it's quite safe to assume they won't (have you ever done it?).
+ 
+ After doing some tests with popular browsers, I've noticed that browsers usually do not tell the server which encoding they used to encode the parameters, so we need to make sure ourselves that the encoding used when serializing pages corresponds to the encoding used when decoding request parameters.
+ 
+ First of all, check in the sitemap what encoding is used when serializing HTML pages:
+ 
+ {{{
+ <map:serializer logger="sitemap.serializer.html" mime-type="text/html"
+        name="html" pool-grow="4" pool-max="32" pool-min="4"
+        src="org.apache.cocoon.serialization.HTMLSerializer">
+   <buffer-size>1024</buffer-size>
+   <encoding>UTF-8</encoding>
+ </map:serializer>
+ }}}
+ 
+ In the example above, UTF-8 is the encoding used. This is a widely supported Unicode encoding, so it is often a good choice.
+ 
+ The HTML serializer will automatically insert a <meta> tag into the HTML page's HEAD element specifying the encoding. Most browsers apparently require this. The HTML serializer will however only do this if your page already contains a HEAD (or head) element, so make sure it has one. The <meta> element inserted by the serializer will then look as follows:
+ 
+ {{{
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ }}}
+ 
+ By default, if the browser doesn't explicitly mention the encoding, a servlet container will decode request parameters using the ISO-8859-1 encoding (independent of the platform on which the container is running). So in the above case, where UTF-8 was used when serializing, non-ASCII characters in the parameters would be decoded incorrectly.
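+ 
+ The mismatch described above can be reproduced outside a servlet container. The following is only a minimal sketch: it uses java.net.URLEncoder and java.net.URLDecoder as stand-ins for the browser and the container, and the class and method names are made up for this illustration.
+ 
```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class DecodeMismatchDemo {

    // Stand-in for a browser submitting a form from a page served in pageCharset.
    static String browserEncode(String value, String pageCharset) {
        try {
            return URLEncoder.encode(value, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for a container decoding the raw parameter with the given charset.
    static String containerDecode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String typed = "caf\u00e9"; // "cafe" with an acute accent on the e
        String onTheWire = browserEncode(typed, "UTF-8");
        System.out.println(onTheWire); // caf%C3%A9

        // Decoding with ISO-8859-1 turns each UTF-8 byte into a separate character.
        System.out.println(containerDecode(onTheWire, "ISO-8859-1").equals(typed)); // false

        // Decoding with the encoding the page was serialized in gives the typed text.
        System.out.println(containerDecode(onTheWire, "UTF-8").equals(typed)); // true
    }
}
```
+ 
+ The two bytes of the UTF-8 encoded accented character end up as two separate ISO-8859-1 characters, which is the typical symptom of this problem.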
+ 
+ The encoding to use when decoding request parameters can be configured in the web.xml by supplying init parameters called "form-encoding" and "container-encoding" to the Cocoon servlet. The container-encoding parameter indicates which encoding the container used when it decoded the request parameters (normally ISO-8859-1), and the form-encoding parameter indicates the encoding the parameters were actually submitted in. Here's an example of how to specify the parameters in the web.xml:
+ 
+ {{{
+ <init-param>
+   <param-name>container-encoding</param-name>
+   <param-value>ISO-8859-1</param-value>
+ </init-param>
+ <init-param>
+   <param-name>form-encoding</param-name>
+   <param-value>UTF-8</param-value>
+ </init-param>
+ }}}
+ 
+ For Java insiders: internally, Cocoon applies the following trick to get a parameter correctly decoded. Suppose "value" is a string containing a request parameter, then with the configuration above Cocoon will do:
+ 
+ {{{
+ value = new String(value.getBytes("ISO-8859-1"), "UTF-8");
+ }}}
+ 
+ So it converts the incorrectly decoded string back to the original bytes and then decodes those bytes using the correct encoding.
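+ 
+ As a hedged illustration of the trick above, here is a small self-contained sketch with the two encodings passed in as arguments. The class and method names are invented for this example and are not Cocoon's actual code.
+ 
```java
import java.nio.charset.Charset;

public class RecodeDemo {

    // Undo a decode done with containerEncoding and redo it with formEncoding.
    static String recode(String value, String containerEncoding, String formEncoding) {
        // Step 1: recover the bytes the browser originally sent.
        byte[] originalBytes = value.getBytes(Charset.forName(containerEncoding));
        // Step 2: decode those bytes with the encoding the form actually used.
        return new String(originalBytes, Charset.forName(formEncoding));
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, sent as UTF-8 but decoded as ISO-8859-1.
        String misdecoded = "caf\u00c3\u00a9";
        String fixed = recode(misdecoded, "ISO-8859-1", "UTF-8");
        System.out.println(fixed.equals("caf\u00e9")); // true
    }
}
```
+ 
+ The round trip is lossless with ISO-8859-1 as the container encoding because that encoding maps every possible byte value to a character, so no bytes are lost in the first (wrong) decode.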
+ 
+ !!Locally overriding the form-encoding
+ 
+ Cocoon is ideally suited for publishing to different kinds of devices, and some devices may require a different encoding.  In this case, you can redefine the form-encoding for specific pipelines using the SetCharacterEncodingAction.
+ 
+ To use it, first of all make sure the action is declared in the map:actions element of the sitemap:
+ {{{
+ <map:action name="set-encoding" src="org.apache.cocoon.acting.SetCharacterEncodingAction"/>
+ }}}
+ 
+ and then call the action at the required location as follows:
+ {{{
+ <map:act type="set-encoding">
+   <map:parameter name="form-encoding" value="some-other-encoding"/>
+ </map:act>
+ }}}
+ 
+ !!Problems with components using the original HttpServletRequest (JSPGenerator, ...)
+ 
+ Some components, such as the JSPGenerator, use the original HttpServletRequest object instead of the Cocoon Request object. In that case, the request parameters will not be decoded correctly (this matters, for example, when the JSP page itself reads request parameters).
+ 
+ One possible solution would be to patch these components to use a wrapper class that delegates all calls to the HttpServletRequest object, except for the getParameter or getParameterValues methods, which should be delegated to Cocoon's Request object.
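+ 
+ The delegation idea can be sketched as follows. This is only an illustration of the pattern: RawRequest and DecodedRequest are hypothetical stand-ins for HttpServletRequest and Cocoon's Request, reduced to two methods so the example runs without the servlet or Cocoon jars.
+ 
```java
import java.util.HashMap;
import java.util.Map;

public class ParameterDelegationSketch {

    // Hypothetical stand-in for the parts of HttpServletRequest used here.
    interface RawRequest {
        String getParameter(String name);
        String getHeader(String name);
    }

    // Hypothetical stand-in for Cocoon's Request, which decodes parameters correctly.
    interface DecodedRequest {
        String getParameter(String name);
    }

    // Wrapper: parameter lookups go to the decoded request, everything else to the raw one.
    static class DelegatingRequest implements RawRequest {
        private final RawRequest raw;
        private final DecodedRequest decoded;

        DelegatingRequest(RawRequest raw, DecodedRequest decoded) {
            this.raw = raw;
            this.decoded = decoded;
        }

        public String getParameter(String name) {
            return decoded.getParameter(name);
        }

        public String getHeader(String name) {
            return raw.getHeader(name);
        }
    }

    // Helper for the demo: a raw request backed by simple maps.
    static RawRequest fromMaps(final Map<String, String> params, final Map<String, String> headers) {
        return new RawRequest() {
            public String getParameter(String name) { return params.get(name); }
            public String getHeader(String name) { return headers.get(name); }
        };
    }

    public static void main(String[] args) {
        Map<String, String> rawParams = new HashMap<String, String>();
        rawParams.put("q", "caf\u00c3\u00a9"); // wrongly decoded by the container
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("Host", "localhost");
        final Map<String, String> goodParams = new HashMap<String, String>();
        goodParams.put("q", "caf\u00e9"); // correctly decoded value

        RawRequest wrapped = new DelegatingRequest(fromMaps(rawParams, headers), new DecodedRequest() {
            public String getParameter(String name) { return goodParams.get(name); }
        });
        System.out.println(wrapped.getParameter("q").equals("caf\u00e9")); // true
        System.out.println(wrapped.getHeader("Host")); // localhost
    }
}
```
+ 
+ A real wrapper would have to cover the complete HttpServletRequest interface, including getParameterValues.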
+ 
+ There's an easier solution that can be applied right away if your servlet container supports the Servlet 2.3 specification. Starting from 2.3, the Servlet specification allows you to explicitly set the encoding to be used for decoding request parameters, though this has to happen before any request data is read. Since Cocoon reads request parameters itself (such as cocoon-reload), this would require modifying the CocoonServlet. But it can also be done using a servlet filter.  Tomcat 4 contains just such a filter in its "examples" webapp. Look for the file jakarta-tomcat/webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java. Compile it (with servlet.jar in the classpath), put it in a jar (preserving its package directory structure) and put the jar in your webapp's WEB-INF/lib directory.
+ 
+ Now modify your webapp's web.xml file to include the following (after the display-name and description elements, but before the servlet element):
+ 
+ {{{
+ <filter>
+   <filter-name>Set Character Encoding</filter-name>
+   <filter-class>filters.SetCharacterEncodingFilter</filter-class>
+   <init-param>
+     <param-name>encoding</param-name>
+     <param-value>UTF-8</param-value>
+   </init-param>
+ </filter>
+ 
+ <filter-mapping>
+   <filter-name>Set Character Encoding</filter-name>
+   <url-pattern>/*</url-pattern>
+ </filter-mapping>
+ }}}
+ 
+ Since the filter element is new in the Servlet 2.3 specification, you might need to modify the DOCTYPE declaration in the web.xml:
+ 
+ {{{
+ <!DOCTYPE web-app
+     PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
+     "http://java.sun.com/dtd/web-app_2_3.dtd">
+ }}}
+ 
+ Of course, when using a servlet filter to set the encoding, you should no longer supply the form-encoding init parameter in the web.xml. You could still supply the container-encoding parameter, though its value will now have to be the same as the encoding supplied to the filter. This will allow you to override the form-encoding using the SetCharacterEncodingAction, though only for the Cocoon Request object.
+ 
+ Using a servlet filter also has the advantage that it will work for any servlet.  Suppose your webapp consists of multiple servlets, with Cocoon being only one of them.  Sometimes the processing could start in another servlet (which sets the character encoding correctly) and then be forwarded to Cocoon, while other times the processing could start immediately in the Cocoon servlet. It would then be impossible to know in Cocoon whether the request parameter encoding needs to be corrected or not.
+ 


Page: http://wiki.cocoondev.org/Wiki.jsp?page=BrunoDumon , version: 2 on Thu Mar 13 17:17:08 2003 by 157.193.121.51

- * [ImplementingTransformers]
+ * [DevelopingComponents] and [ImplementingTransformers]
+ * [RequestParameterEncoding]