Posted to j-dev@xerces.apache.org by Sally Nemes <sj...@us.ibm.com> on 2001/09/10 21:56:54 UTC

Handling Kanji characters

Does Xerces handle double-byte characters when it parses a document? I have
UTF-8 specified as the encoding, but when I parse a document containing Kanji
characters in element and attribute values, I get the following error:
 日付:       01/09/10  15:14
クラス:      com.ibm.emms.cptk.Validator
メソッド:     validate
  MSG#SAXParseError Xerces reports a parsing problem.
  org.xml.sax.SAXParseException: The element type "source" must be terminated by the matching end-tag "</source>".
     at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1196)
     at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:635)
     at org.apache.xerces.framework.XMLDocumentScanner.abortMarkup(XMLDocumentScanner.java:684)
     at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1192)
     at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1122)

An example of where the Kanji characters are:
            <title>·‚Ó‚Ÿ‚ª‘?</title>
            <sortTitle>‚½‚Í‚¢‚ ‚³‚ ‚ª</sortTitle>
            <creatorString>‚ª‚͉æ‰Æ‚┃‚¤Œá‚ª‚Í‚ª‚͍‚</creatorString>
            <description>‚Ó‚Ÿ‰æ‰Æ‚Í</description>
            <basicPublishingMetadata>
                <source>ŠGƒ_ƒCƒA‰ß‘ӐŒŠ‚ ‚ ‚ç‚™</source>
            </basicPublishingMetadata>
            <itemMetadataList count="1">
                <itemMetadata>
                    <identifier scheme="doi">kj</identifier>
                    <title>All fields in congi chars</title>
                    <sortTitle>‚ ‚Ÿ‚¢‚ ‚â‚ç</sortTitle>
                    <creatorString>‚ ‚Ó‚Ÿ‚ª‚Í</creatorString>
                    <description>‚瑼’ƒ‚Ó‚¥‚Ó‚Ÿ’ƒ</description>
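
One possible cause of an error like this (an assumption, not something
confirmed for this document) is that the file's bytes are not actually UTF-8,
for example because it was saved as Shift_JIS, even though the declaration
says UTF-8. A minimal sketch of a check for that, using only the standard
java.nio.charset classes; the class name Utf8Check and the method isValidUtf8
are hypothetical names made up for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Xerces: reports whether a byte array
// decodes cleanly as UTF-8.
final class Utf8Check {

    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw on bad input instead of silently
        // substituting a replacement character.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // At least one byte sequence is not legal UTF-8.
            return false;
        }
    }
}
```

Passing the raw bytes of the XML file to isValidUtf8 would show whether the
declared encoding matches the stored bytes. If it returns false, the
declaration and the file's real encoding disagree, and the parser may be
misreading the multi-byte characters.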
Thanks.
Sally Nemes

EMMS Subsystem Development, IBM Software Group
Internet Mail: sjsnemes@us.ibm.com
T/L 975-2872,  External (561) 862-2872
IMAD 4181
8051 Congress Avenue
Boca Raton, Florida 33487