Posted to j-users@xerces.apache.org by "Jesus (John) Salvo Jr." <jo...@softgame.com.au> on 2001/11/26 08:05:26 UTC
Unicode characters written as ? in XML file
Xerces J 1.4.4.
I am trying to convert an existing Unicode text file ( sampleutf16.txt )
into XML.
Attached is my sample program UnicodeTest.java ( Set the first parameter as
the name of the input text file, the second parameter the name of the output
XML file. )
The output ( sample.xml ) that I get is:
<?xml version="1.0" encoding="UTF-8"?>
<trivia-questions>
<question ask="??2001??????????????????????????"/>
</trivia-questions>
What I was expecting was something like ( for sampleutf16.txt ):
<?xml version="1.0" encoding="UTF-8"?>
<trivia-questions>
<question ask="&#x622A;&#x6B62;2001&#x5E74;.........."/>
</trivia-questions>
( See section 1.1 of http://www.unicode.org/unicode/reports/tr20/ )
I got the "&#x" values using "watch expression [ and show as Hex ]" in
JBuilder while debugging, and also compared them with a hex editor. The sample
program reads the Unicode text file into the variable "line" correctly.
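For anyone who wants to reproduce the "&#x" values without a debugger, here is a minimal sketch that prints them from a string. The class and method names are hypothetical (they are not part of UnicodeTest.java), the sample string is just the first few characters of the expected output, and the code assumes a JDK recent enough to have String.codePointAt.

```java
public class NumericCharRefs {
    /** Replaces every non-ASCII code point with an XML numeric character reference. */
    public static String toNumericCharRefs(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                // Plain ASCII is copied through unchanged.
                out.append((char) cp);
            } else {
                // Everything else becomes &#xHHHH; with upper-case hex digits.
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            // Advance by one or two chars, depending on surrogate pairs.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: the first characters of the expected "ask" value.
        System.out.println(toNumericCharRefs("\u622A\u6B622001\u5E74"));
        // prints &#x622A;&#x6B62;2001&#x5E74;
    }
}
```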
I have also tried reading in a UTF-8 file ( sampleutf8.txt ) by changing
the following line in UnicodeTest.java from:
InputStreamReader isr = new InputStreamReader( new FileInputStream(
inputFile ), "UTF-16" ); // You can't use this with sampleutf8.txt
to:
InputStreamReader isr = new InputStreamReader( new FileInputStream(
inputFile ), "UTF-8" ); // You can't use this with sampleutf16.txt
...with the same results.
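Since the input side reads correctly with either encoding, one thing that may be worth ruling out is the encoding used on the output side. The sketch below is only an illustration under that assumption, not a diagnosis of UnicodeTest.java (the class and method names are hypothetical). It shows that a Java OutputStreamWriter substitutes '?' for characters its charset cannot represent, while UTF-8 round-trips them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class EncodingDemo {
    /** Encodes text through an OutputStreamWriter using the named charset. */
    public static byte[] encode(String text, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(bytes, charsetName);
        writer.write(text);
        writer.close(); // closing flushes the encoder into the byte buffer
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: the first characters of the expected "ask" value.
        String ask = "\u622A\u6B622001\u5E74";

        // US-ASCII cannot represent the CJK characters, so each becomes '?'.
        System.out.println(new String(encode(ask, "US-ASCII"), "US-ASCII")); // ??2001?

        // UTF-8 can represent them, so decoding gives back the original string.
        System.out.println(ask.equals(new String(encode(ask, "UTF-8"), "UTF-8"))); // true
    }
}
```

The same substitution happens with any Writer or serializer that falls back to a platform default encoding that lacks these characters, so passing the output encoding explicitly is the safer choice.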
What am I doing wrong?
John