Posted to users@cocoon.apache.org by Bram Bouwens <br...@fredhopper.com> on 2002/11/05 10:00:06 UTC

Orion converts the XML passed to Cocoon into ISO-8859-1 where this is not wanted

Versions: Orion 1.5.2/1.6.0, Cocoon 2.0.3, JDK 1.3.1_06, Red Hat 7.3.

We have a web application that used to produce HTML directly from the JSP
pages in the UTF-8 encoding, so there were no problems with most languages.

We have now separated the functional part from the visual design: the
JSP pages produce XML, and Cocoon renders this into HTML.

The sitemap.xmap contains this:

...
            <map:match pattern="demo/**.fh">
              <map:generate type="jsp" src="/xmlout/{1}.jsp">
                <map:parameter name="use-request-parameters" value="true"/>
              </map:generate>
              <map:transform src="layout/demo/{1}.xsl"/>
              <map:serialize type="html"/>
            </map:match>
...

and cocoon.xconf contains
...
   <jsp-engine logger="core.jsp-engine">
     <parameter name="servlet-class" value="com.evermind.server.http.JSPServlet"/>
     <parameter name="servlet-name" value="*.jsp"/>
   </jsp-engine>
...

When I request the entry page in its XML-form /xmlout/index.jsp with the
browser (any browser) it all looks fine. It starts with

<?xml version="1.0" encoding="UTF-8"?>

and somewhere it has `België' (Belgium in Dutch) where the ë is encoded
as hex c3 ab, the correct UTF-8 encoding.
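
As a quick cross-check of those byte values, here is a minimal sketch that prints the encodings of ë in both character sets. The class and method names are made up for illustration, and it assumes a current JDK (java.nio.charset.StandardCharsets does not exist in JDK 1.3.1):

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    /** Formats bytes as lowercase hex pairs separated by spaces. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as ffffffxx.
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00EB is written as an escape so the source file encoding does not matter.
        String e = "\u00eb";
        System.out.println(hex(e.getBytes(StandardCharsets.UTF_8)));      // c3 ab
        System.out.println(hex(e.getBytes(StandardCharsets.ISO_8859_1))); // eb
    }
}
```

So a correct UTF-8 stream carries two bytes for this character, while ISO-8859-1 carries the single byte eb (235).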

Characters like that are garbled when I look at /demo/index.fh.

I added debug output to
org/apache/cocoon/components/jsp/JSPEngineImpl.java,
using a class MyPrintWriter (extending PrintWriter) as the writer for
MyServletOutputStream, and forced the following traceback:

     at org.apache.cocoon.components.jsp.JSPEngineImpl$MyPrintWriter.write(JSPEngineImpl.java:342)
     at org.apache.cocoon.components.jsp.JSPEngineImpl$MyServletOutputStream.write(JSPEngineImpl.java:322)
     at java.io.OutputStream.write(OutputStream.java:97)
     at com.evermind.server.http.EvermindJSPWriter._vr(Unknown Source)
     at com.evermind.server.http.EvermindJSPWriter.flush(Unknown Source)
     at com.evermind.server.http.EvermindJSPWriter.close(Unknown Source)
     at __jspPage6_xmlout_index_jsp._jspService(__jspPage6_xmlout_index_jsp.java:2145)
     at com.orionserver.http.OrionHttpJspPage.service(Unknown Source)
     at com.evermind._ah._rad(Unknown Source)
     at com.evermind.server.http.JSPServlet.service(Unknown Source)
     at org.apache.cocoon.components.jsp.JSPEngineImpl.executeJSP(JSPEngineImpl.java:134)


The actual output is produced by calling void write(int b) on
MyServletOutputStream once for each character, with the ISO-8859-1 encoding
of the character, sign-extended, as the parameter: the ë mentioned above,
which is 235 in ISO-8859-1, is sent as -21. This appears quite silly
and inefficient to me.
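
The -21 is what a Java byte holding 0xEB turns into when it is widened to int. The java.io.OutputStream.write frame in the traceback fits this: the default write(byte[], int, int) in java.io.OutputStream passes each byte to write(int), and that byte-to-int promotion sign-extends. A minimal sketch of the arithmetic (the class and method names are only for illustration):

```java
public class SignExtensionDemo {
    /** Recovers the unsigned byte value (0..255) from a sign-extended int. */
    public static int unsigned(int b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xEB; // ë in ISO-8859-1
        int widened = raw;      // implicit byte-to-int widening sign-extends

        System.out.println(widened);           // -21
        System.out.println(unsigned(widened)); // 235

        // Narrowing the sign-extended value to char keeps the low 16 bits,
        // which gives U+FFEB rather than U+00EB.
        System.out.println(Integer.toHexString((char) widened)); // ffeb
    }
}
```

A writer that receives -21 through write(int) therefore emits the character U+FFEB, which would explain the garbling.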

Most likely this can be fixed simply by a setting in some config file,
which would probably be evident from the source of
com.evermind.server.http.EvermindJSPWriter. Unfortunately we don't have
that source.

So the obvious question is: how do we fix this?

Bram Bouwens @ Fredhopper.com


---------------------------------------------------------------------
Please check that your question  has not already been answered in the
FAQ before posting.     <http://xml.apache.org/cocoon/faq/index.html>

To unsubscribe, e-mail:     <co...@xml.apache.org>
For additional commands, e-mail:   <co...@xml.apache.org>


Re: Orion converts the XML passed to Cocoon into ISO-8859-1 where this is not wanted

Posted by Bram Bouwens <br...@fredhopper.com>.
Bram Bouwens wrote:

> 
> The actual output is produced by calling void write(int b) from
> MyServletOutputStream for each character, with the ISO-8859-1 encoding
> of the character, sign extended, as a parameter: the ë mentioned above,
> which is 235 in ISO-8859-1, is sent as -21 . This appears quite silly
> and inefficient to me.
> 
I just noticed that changing this -21 to the proper 235 already makes
the application work for now, as the current implementations happen
to be satisfied with ISO-8859-1. So as a temporary workaround I have
this patch:


--- old/cocoon-2.0.3/src/java/org/apache/cocoon/components/jsp/JSPEngineImpl.java  Mon Jul 15 10:56:05 2002
+++ project/cocoon-2.0.3/src/java/org/apache/cocoon/components/jsp/JSPEngineImpl.java  Tue Nov  5 14:45:31 2002
@@ -286,8 +286,11 @@
              return this.writer;
          }
          public void write(int b) throws IOException  {
-            // This method is not used but have to be implemented
-            this.writer.write(b);
+            // This method is how Orion passes ALL the content, one char at a time
+            // very ugly: it isn't properly specified in which encoding b is
+            // but it turns out to be iso-8859-1 with the sign bit extended,
+            // so and with 255 to get proper 1-byte codes
+            this.writer.write(b & 255);
          }
          public byte[] toByteArray() {
              this.writer.flush();
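
To illustrate what the mask changes, here is a small self-contained sketch. ForwardingStream is a hypothetical stand-in for MyServletOutputStream, not the Cocoon class, and the loop in render() only mimics the one-call-per-byte behaviour that the traceback suggests. It assumes, as observed above, that the bytes arriving are ISO-8859-1:

```java
import java.io.OutputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

public class MaskingDemo {
    /** Hypothetical stand-in for MyServletOutputStream: forwards each byte to a Writer. */
    static class ForwardingStream extends OutputStream {
        private final StringWriter writer = new StringWriter();
        private final boolean mask;

        ForwardingStream(boolean mask) {
            this.mask = mask;
        }

        @Override
        public void write(int b) {
            // Writer.write(int) uses the 16 low-order bits, so a negative b
            // becomes a char in the range U+FF80..U+FFFF unless it is masked.
            writer.write(mask ? (b & 255) : b);
        }

        String text() {
            return writer.toString();
        }
    }

    /** Sends the ISO-8859-1 bytes of s through the stream, one sign-extended byte per call. */
    public static String render(String s, boolean mask) {
        ForwardingStream out = new ForwardingStream(mask);
        for (byte b : s.getBytes(StandardCharsets.ISO_8859_1)) {
            out.write(b); // byte is widened to int here, so 0xEB arrives as -21
        }
        return out.text();
    }

    public static void main(String[] args) {
        String word = "Belgi\u00eb";
        System.out.println(render(word, true).equals(word));  // true
        System.out.println(render(word, false).equals(word)); // false
    }
}
```

The mask only helps for characters that fit in ISO-8859-1. Anything outside that range is already lost before write(int) is reached, so this remains a workaround rather than a fix for the encoding itself.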

