Posted to j-dev@xerces.apache.org by Sally Nemes <sj...@us.ibm.com> on 2001/09/10 21:56:54 UTC
Handling Kanji characters
Does Xerces handle double-byte characters when it parses a document? I have
UTF-8 specified for the encoding, but when I parse a document containing
Kanji characters in element and attribute values, I get the following error:
日付 (Date): 01/09/10 15:14
クラス (Class): com.ibm.emms.cptk.Validator
メソッド (Method): validate
MSG#SAXParseError Xerces reports a parsing problem.
org.xml.sax.SAXParseException: The element type "source" must be terminated by the matching end-tag "</source>".
    at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1196)
    at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:635)
    at org.apache.xerces.framework.XMLDocumentScanner.abortMarkup(XMLDocumentScanner.java:684)
    at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1192)
    at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
    at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
    at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1122)
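An end-tag error like this one can be a symptom of a file whose bytes are not
really UTF-8 even though the XML declaration says so. That is only a possible
cause and is not confirmed by anything in this message. A minimal sketch of a
strict check with the standard java.nio.charset API follows; the class name
Utf8Check and the method isValidUtf8 are hypothetical, and the code assumes a
Java runtime newer than the one in use when this was posted.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Xerces.
final class Utf8Check {

    /** Returns true if the bytes decode as well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently
        // substituting U+FFFD for malformed input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

If the check fails for the input document, the declared encoding and the
actual bytes disagree, and either the declaration or the file would need to
change.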
Here is an example of where the Kanji characters appear in the document:
<title>·‚Ó‚Ÿ‚ª‘?</title>
<sortTitle>‚½‚Í‚¢‚ ‚³‚ ‚ª</sortTitle>
<creatorString>‚ª‚͉æ‰Æ‚┃‚¤Œá‚ª‚Í‚ª‚Í‚</creatorString>
<description>‚Ó‚Ÿ‰æ‰Æ‚Í</description>
<basicPublishingMetadata>
<source>ŠGƒ_ƒCƒA‰ß‘ÓÂŒŠ‚ ‚ ‚ç‚™</source>
</basicPublishingMetadata>
<itemMetadataList count="1">
<itemMetadata>
<identifier scheme="doi">kj</identifier>
<title>All fields in congi chars</title>
<sortTitle>‚ ‚Ÿ‚¢‚ ‚â‚ç</sortTitle>
<creatorString>‚ ‚Ó‚Ÿ‚ª‚Í</creatorString>
<description>‚瑼’ƒ‚Ó‚¥‚Ó‚Ÿ’ƒ</description>
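If a document like this turns out to be stored in an encoding other than the
one it declares, one way to parse it is to decode the bytes yourself and hand
the parser a character stream. The SAX InputSource documentation says a parser
reads a supplied character stream directly and disregards any encoding
declaration found in it. The sketch below uses the standard JAXP API rather
than the Xerces classes shown in the trace; the class ExplicitCharsetParse,
the method parse, and the choice of charset name are assumptions for
illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Xerces.
final class ExplicitCharsetParse {

    /**
     * Decodes the bytes with the named charset and parses the result.
     * The parser sees characters, not bytes, so it does not decode
     * anything according to the encoding declaration.
     */
    static Document parse(byte[] xmlBytes, String charsetName) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xmlBytes), charsetName)) {
            return builder.parse(new InputSource(reader));
        }
    }
}
```

For example, ExplicitCharsetParse.parse(bytes, "Shift_JIS") would apply if the
file were actually Shift_JIS, provided the runtime supports that charset.
Re-saving the file as real UTF-8, or correcting the encoding declaration, is
usually the cleaner fix.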
Thanks.
Sally Nemes
EMMS Subsystem Development, IBM Software Group
Internet Mail: sjsnemes@us.ibm.com
T/L 975-2872, External (561) 862-2872
IMAD 4181
8051 Congress Avenue
Boca Raton, Florida 33487