Posted to users@cocoon.apache.org by Egor <f_...@inbox.ru> on 2003/02/20 11:27:40 UTC

Broken encoding UTF-8 in XSP

Hello Cocoon gurus,

I have a problem with encoding. I use a simple input XML file (2.xml)
containing 4 international (Russian) characters:

<?xml version="1.0" encoding="UTF-8"?>
<xsp:page language="java" xmlns:xsp="http://apache.org/xsp">
  <Test>Ñ'Ð÷Ñ_Ñ'123</Test>
</xsp:page>

And I have two entries in the sitemap (sitemap.xmap):

<!-- Just generate and serialize -->
<map:match pattern="2.xml">
 <map:generate src="docs/2.xml"/>
 <map:serialize type="xml"/>
</map:match>

<!-- Generate the serverpage .java class and serialize -->
<map:match pattern="2.xsp">
 <map:generate type="serverpages" src="docs/2.xml"/>
 <map:serialize type="xml"/>
</map:match>

When I request it via the URL http://..../2.xml, I get the same content as the input file.
The hex dump of the 4 international chars in UTF-8 is "D1 27  D0 F7  D1 5F  D1 27"
So everything is correct.

But when I request it via the URL http://.../2.xsp, I get a broken
output XML file:
<?xml version="1.0" encoding="UTF-8"?>
<page xmlns:xsp="http://apache.org/xsp">
  <Test>Ã_Ã_Ã+Ã_123</Test>
</page>
The hex dump of the 4 international chars in UTF-8 is "C3 5F  C3 5F  C3 2B  C3 5F"
So the character codes were changed along the way, and the result is Western European characters.
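
The "Western European" look of the broken output can be illustrated with a minimal Java sketch. It does not touch Cocoon at all. The class name Utf8LeadByte and the helper utf8Hex are made up for this illustration, and StandardCharsets needs a much newer JDK than 1.3. It only shows that every character from U+00C0 to U+00FF (the accented Latin-1 letters) is encoded in UTF-8 as two bytes with the lead byte C3, the same lead byte as in the dump above:

```java
import java.nio.charset.StandardCharsets;

public class Utf8LeadByte {

    // Hypothetical helper: hex dump of the UTF-8 encoding of a string.
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First and last character of the accented Latin-1 range.
        System.out.println(utf8Hex("\u00C0")); // prints "C3 80"
        System.out.println(utf8Hex("\u00FF")); // prints "C3 BF"
    }
}
```

So a run of byte pairs that all start with C3 suggests that Latin-1 characters, not Cyrillic ones, were serialized as UTF-8.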

If we look at the generated _2_xml.java class, we see:
...
this.characters("òåñò123");
...
These are the same 4 chars in Cp1251 (the default Windows encoding for
the Russian locale). That is correct, because the Java property
file.encoding=Cp1251.
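
A minimal sketch of why Cp1251 text shows up as accented Latin letters, assuming the four characters are the Russian word written here with \u escapes and that a viewer reads the Cp1251 bytes as ISO-8859-1. The class name Cp1251View and its helpers are hypothetical, java.nio.charset needs a newer JDK than 1.3, and whether Cocoon or the compiler does the same thing internally is only a guess:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1251View {

    // The four Cyrillic letters, written as escapes so that the source
    // file encoding cannot change them.
    public static final String WORD = "\u0442\u0435\u0441\u0442";

    // Encode a string as Cp1251 (registered in Java as windows-1251).
    public static byte[] cp1251Bytes(String s) {
        return s.getBytes(Charset.forName("windows-1251"));
    }

    // Hex dump of a byte array, e.g. "F2 E5".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Read the same bytes as if they were ISO-8859-1 (the wrong charset).
    public static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = cp1251Bytes(WORD);
        System.out.println(hex(bytes)); // prints "F2 E5 F1 F2"
        // The misread text is the accented string U+00F2 U+00E5 U+00F1 U+00F2.
        System.out.println(asLatin1(bytes).equals("\u00F2\u00E5\u00F1\u00F2")); // prints "true"
    }
}
```

The bytes themselves are fine as Cp1251. They only become Western European letters when something later interprets them with a different single-byte charset.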

Please help me understand what the problem is.
Where is the encoding being changed?
Sorry for my bad English :)


I'm using Windows 2000, Tomcat-4.0.4, Cocoon-2.0.2

My java properties:
java.runtime.name=Java(TM) 2 Runtime Environment, Standard Edition
vendor-url=http\://xml.apache.org/xalan-j
sun.boot.library.path=C\:\\bin\\jdk\\jre\\bin
java.vm.version=1.3.1_03-b03
java.vm.vendor=Sun Microsystems Inc.
java.vendor.url=http\://java.sun.com/
path.separator=;
java.vm.name=Java HotSpot(TM) Client VM
file.encoding.pkg=sun.io
org.xml.sax.driver=org.apache.xerces.parsers.SAXParser
java.vm.specification.name=Java Virtual Machine Specification
user.dir=C\:\\bin\\tomcat
java.runtime.version=1.3.1_03-b03
java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment
os.arch=x86
java.io.tmpdir=c\:\\tmp\\
line.separator=\r\n
java.vm.specification.vendor=Sun Microsystems Inc.
java.naming.factory.url.pkgs=org.apache.naming
java.awt.fonts=
os.name=Windows 2000
vendor=Apache Software Foundation
java.library.path=C\:\\bin\\jdk\\bin;.;C\:\\WINNT\\System32;C\:\\WINNT;C\:\\bin\\j2sdk1.4\\bin;C\:\\oracle\\ora90\\bin;C\:\\WINNT\\system32;C\:\\WINNT;C\:\\WINNT\\System32\\Wbem;c\:\\bin\\cygwin\\bin;c\:\\bin;c\:\\bin\\jdk\\bin;C\:\\bin\\jdk\\jre\\bin;C\:\\oracle\\ora90\\bin;C\:\\WINNT\\system32;C\:\\WINNT;C\:\\WINNT\\System32\\Wbem;c\:\\bin\\cygwin\\bin;c\:\\bin
java.specification.name=Java Platform API Specification
java.class.version=47.0
os.version=5.0
user.home=C\:\\Documents and Settings\\egor
catalina.useNaming=true
user.timezone=Europe/Moscow
java.awt.printerjob=sun.awt.windows.WPrinterJob
file.encoding=Cp1251
java.specification.version=1.3
catalina.home=C\:\\bin\\tomcat
java.class.path=C\:\\bin\\tomcat\\bin\\bootstrap.jar
user.name=egor
java.naming.factory.initial=org.apache.naming.java.javaURLContextFactory
java.vm.specification.version=1.0
java.home=C\:\\bin\\jdk\\jre
user.language=ru
java.specification.vendor=Sun Microsystems Inc.
awt.toolkit=sun.awt.windows.WToolkit
java.vm.info=mixed mode
java.version=1.3.1_03
java.ext.dirs=C\:\\bin\\jdk\\jre\\lib\\ext
sun.boot.class.path=C\:\\bin\\jdk\\jre\\lib\\rt.jar;C\:\\bin\\jdk\\jre\\lib\\i18n.jar;C\:\\bin\\jdk\\jre\\lib\\sunrsasign.jar;C\:\\bin\\jdk\\jre\\classes
java.vendor=Sun Microsystems Inc.
catalina.base=C\:\\bin\\tomcat
file.separator=\\
java.vendor.url.bug=http\://java.sun.com/cgi-bin/bugreport.cgi
version=2.3.1
sun.io.unicode.encoding=UnicodeLittle
sun.cpu.endian=little
user.region=RU
sun.cpu.isalist=pentium i486 i386

-- 
Best regards,
 Egor                          mailto:f_egor1@inbox.ru

