Posted to users@cocoon.apache.org by Magnus Haraldsen Amundsen <Ma...@computas.com> on 2008/04/06 19:16:03 UTC

Cocoon and UTF-8: Invalid byte 2 of 3-byte UTF-8 sequence

Hi,

I'm still having problems with Cocoon and UTF-8 on Windows XP/Vista.
Every time a search result, page content etc. contains the Norwegian characters "æ ø å" I get an org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence. This problem does not occur on Linux.
I've created the smallest possible code example that reproduces the exception. This code (zipped) can be found here: https://submarine.computas.com/sublima/trunk/temp/Cocoontest.zip

The basic flow of the code example is:

1. Request a URL
2. Sitemap matches the URL and calls a StatelessAppleController
3. The StatelessAppleController adds a String containing the special characters to a Map, and forwards it using res.sendPage("xml/test", bizData);
4. Sitemap matches xml/test and runs this pipeline:

<map:match pattern="xml/*">
  <map:generate src="templates/{1}.jx.xml" type="jx"/>
  <map:transform src="transforms/test.xslt"/>
  <map:serialize type="xml"/>
</map:match>

The jx.xml reads the String that the StatelessAppleController put in the Map, using <jx:out value="#{testresults}" xmlize="true"/>

I've followed the steps in "How to configure consistent encoding in Cocoon", but it still doesn't work.

Could anyone take a look at the code and see if they can spot the problem or a solution?

- Magnus



Re: Cocoon and UTF-8: Invalid byte 2 of 3-byte UTF-8 sequence

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Sun, Apr 6, 2008 at 7:16 PM, Magnus Haraldsen Amundsen
> ... Every time a searchresult/page content etc. contains the norwegian
> characters "æ ø å" I get a org.xml.sax.SAXParseException: Invalid byte 2 of
> 3-byte UTF-8 sequence. This problem does not occur with Linux....

This most probably means at least one of your XML files has a wrong or
missing encoding declaration. Try opening those files in an XML-aware
editor, for example, to find out which ones cause problems.
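
To illustrate the kind of mismatch meant here, below is a minimal standalone
sketch in plain Java (JAXP only, no Cocoon). It is not taken from the zipped
test case: the class name EncodingMismatchDemo, the helper parseError and the
<result> element are all made up for illustration. It assumes the text was
written as ISO-8859-1 bytes, as could happen on a platform whose default
encoding is not UTF-8, while the document carries no encoding declaration, so
the parser assumes UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class EncodingMismatchDemo {

    // Hypothetical sample content: "æ ø å" written with Unicode escapes so the
    // source file compiles the same way regardless of the compiler's encoding.
    static final String BODY = "<result>\u00e6 \u00f8 \u00e5</result>";

    // Declaration that tells the parser the bytes are ISO-8859-1.
    static final String LATIN1_DECL = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>";

    /** Returns null if the bytes parse as XML, otherwise the parser's error message. */
    static String parseError(byte[] bytes) throws ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(bytes));
            return null;
        } catch (SAXException | IOException e) {
            // Depending on the parser version the failure may surface as a
            // SAXParseException or as an IOException subclass, so catch both.
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: ISO-8859-1 bytes, no declaration. The parser assumes UTF-8,
        // reads 0xE6 as the start of a 3-byte sequence and rejects the next byte.
        byte[] latin1NoDecl = BODY.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1, no declaration:   " + parseError(latin1NoDecl));

        // Case 2: the same characters as UTF-8 bytes, no declaration: consistent.
        byte[] utf8NoDecl = BODY.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8, no declaration:        " + parseError(utf8NoDecl));

        // Case 3: ISO-8859-1 bytes with a matching declaration: consistent again.
        byte[] latin1WithDecl = (LATIN1_DECL + BODY).getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1, with declaration: " + parseError(latin1WithDecl));
    }
}
```

Only the first case should fail; the other two should print null. The point of
the sketch is that the declared (or assumed) encoding has to match the bytes
actually written, which is what an XML-aware editor would let you check file
by file.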

-Bertrand