Posted to j-users@xerces.apache.org by SAXESS - Hussayn Dabbous <da...@saxess.com> on 2002/09/15 22:42:55 UTC
JAVA: trouble with UTF8 encoding and org.w3c.dom.CharacterData.getData()
Hi, Java programmers,
I want to read UTF-8 characters from an XML file using a DOMParser, but
all I get is a sequence of single bytes. This is probably a beginner's mistake,
but I can't see what it is. Maybe someone can help me?
I did the following:
1.) I wrote a simple XML file containing UTF-8 encoded characters:
+++ begin of file +++++++++++++++++++++++++++++++++++++++++++
<?xml version="1.0" encoding="UTF-8"?>
<myxml w="150" h="200" color="FFCCDDEE">
<text font="Cyberbit Cyberspace" size="13">???</text>
</myxml>
+++ end of file +++++++++++++++++++++++++++++++++++++++++++++
The three characters enclosed in the <text> tag are in fact three UTF-8 characters.
When I look at the file with XMLSpy, I can see the three characters.
When I look at the file with a Unix text editor, I see 9 bytes in total there, which
I have verified to be the correct UTF-8 encoding. In this mail they may show up
only as three question marks ("???").
2.) I read the file using a DOMParser as follows:
* I create a DOMParser() instance.
* I create an InputSource(FileReader) instance.
* I create a Document with DOMParser.parse(InputSource).
* Then I step through the resulting Document instance,
  retrieve the Elements, find the Text nodes, and finally
  call Text.getData() to retrieve the text string.
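
The steps above can be sketched roughly as follows. This is a reconstruction for illustration, not the original code: the class name DomReadSketch and the helper names firstTextData and findText are made up, and the sketch goes through the JAXP DocumentBuilder that ships with the JDK instead of the Xerces DOMParser class, so that it runs without the Xerces jar on the classpath.

```java
import java.io.File;
import java.io.FileReader;
import java.io.Reader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical sketch of step 2; all names here are illustrative.
class DomReadSketch {

    // Parse the file through a FileReader and return the data of the
    // first non-blank Text node found below the root element.
    public static String firstTextData(File xmlFile) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Reader reader = new FileReader(xmlFile)) {
            // Character stream: the parser receives chars that the
            // FileReader has already decoded from the file's bytes.
            Document doc = builder.parse(new InputSource(reader));
            return findText(doc.getDocumentElement());
        }
    }

    // Depth-first walk over the children, skipping whitespace-only text.
    public static String findText(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Text) {
                String data = ((Text) child).getData();
                if (!data.trim().isEmpty()) {
                    return data;
                }
            } else {
                String found = findText(child);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java DomReadSketch <file.xml>");
            return;
        }
        String data = firstTextData(new File(args[0]));
        System.out.println(data == null ? "no text found" : "chars: " + data.length());
    }
}
```

The point to notice is that the FileReader, not the parser, turns the file's bytes into characters here.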
3.) Now I expect the text string to contain 3 characters, each of them
a Unicode character.
But all I get is 9 characters, each containing one byte of the raw UTF-8 string.
I tried encoding="UTF8", but that didn't help.
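
The 9-versus-3 count can be reproduced without any parser. The sketch below assumes that the bytes are being decoded with a one-byte-per-character encoding such as ISO-8859-1, which is only a guess at what happens in the setup described here. The three sample characters are arbitrary CJK characters chosen for illustration, not the ones from the file above, and the class name DecodeCountSketch is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: the same 9 bytes decoded two ways.
class DecodeCountSketch {

    // Three characters (U+65E5, U+672C, U+8A9E) that each need
    // 3 bytes in UTF-8. Illustrative sample only.
    public static final String SAMPLE = "\u65E5\u672C\u8A9E";

    // Number of Java chars produced when the bytes are decoded with cs.
    public static int charsWhenDecodedAs(byte[] bytes, Charset cs) {
        return new String(bytes, cs).length();
    }

    public static void main(String[] args) {
        byte[] utf8 = SAMPLE.getBytes(StandardCharsets.UTF_8);
        // 9 bytes on disk
        System.out.println("bytes: " + utf8.length);
        // decoded as ISO-8859-1: one char per byte, so 9 chars
        System.out.println("latin1 chars: "
                + charsWhenDecodedAs(utf8, StandardCharsets.ISO_8859_1));
        // decoded as UTF-8: the 3 original chars
        System.out.println("utf8 chars: "
                + charsWhenDecodedAs(utf8, StandardCharsets.UTF_8));
    }
}
```

If the observed string matches the first decoding, the bytes were turned into characters with the wrong charset before the parser ever saw them.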
What's going wrong?
Maybe I should use an InputStream(filename,"UTF-8") instead of a
FileReader instance? (That doesn't sound right to me ...)
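
On the InputStream idea: java.io has no InputStream(filename, "UTF-8") constructor, but two nearby calls do exist. One is to hand the parser a raw FileInputStream so that it picks the charset from the XML declaration itself; the other is to wrap the stream in an InputStreamReader with an explicit "UTF-8" charset. A FileReader built from just a file or file name, by contrast, decodes with the platform default encoding. The following is a sketch under those assumptions, not a confirmed fix: the class and method names are made up, the sample characters are arbitrary, and it uses the JDK's JAXP DocumentBuilder rather than the Xerces DOMParser class.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Reader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch; all names are illustrative.
class Utf8SourceSketch {

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Variant A: byte stream. The parser decodes the bytes itself,
    // guided by the encoding named in the XML declaration.
    public static String textViaByteStream(File f) throws Exception {
        try (InputStream in = new FileInputStream(f)) {
            Document doc = newBuilder().parse(new InputSource(in));
            return doc.getDocumentElement().getTextContent().trim();
        }
    }

    // Variant B: character stream with the charset stated explicitly.
    public static String textViaUtf8Reader(File f) throws Exception {
        try (Reader r = new InputStreamReader(new FileInputStream(f), "UTF-8")) {
            Document doc = newBuilder().parse(new InputSource(r));
            return doc.getDocumentElement().getTextContent().trim();
        }
    }

    // Writes a small UTF-8 test file with three 3-byte characters.
    public static File writeSample() throws Exception {
        File f = File.createTempFile("utf8-sketch", ".xml");
        f.deleteOnExit();
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<myxml><text>\u65E5\u672C\u8A9E</text></myxml>\n";
        try (OutputStream out = new FileOutputStream(f)) {
            out.write(xml.getBytes("UTF-8"));
        }
        return f;
    }

    public static void main(String[] args) throws Exception {
        File f = writeSample();
        System.out.println(textViaByteStream(f).length());  // 3
        System.out.println(textViaUtf8Reader(f).length());  // 3
    }
}
```

Variant A keeps the file's own encoding declaration as the single source of truth, which is why it is usually the less fragile of the two.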
Any hint would help.
Regards, Hussayn
--
Dr. Hussayn Dabbous
SAXESS Software Design GmbH
Neuenhöfer Allee 125
50935 Köln
Telefon: +49-221-56011-0
Fax: +49-221-56011-20
E-Mail: dabbous@saxess.com