Posted to j-dev@xerces.apache.org by John Ward - XML Core Development <Jo...@ireland.sun.com> on 2000/08/01 17:50:07 UTC

Possible bug in Xerces?

Hi,

I have encountered a problem with the Xerces-J parser on a particular file. I
checked out the Apache Xerces source from CVS and built it. However, when I
run one of the sample applications on the following file:

<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
]>
<doc>£</doc>

I get the following error:


[Fatal Error] test.xml:5:1: The element type "doc" must be terminated by the matching end-tag "</doc>".
org.xml.sax.SAXException: Stopping after fatal error: The element type "doc" must be terminated by the matching end-tag "</doc>".
        at java.lang.Throwable.fillInStackTrace(Native Method)
        at java.lang.Throwable.fillInStackTrace(Compiled Code)
        at java.lang.Throwable.<init>(Compiled Code)
        at java.lang.Exception.<init>(Exception.java:42)
        at org.xml.sax.SAXException.<init>(SAXException.java:45)
        at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1001)
        at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:634)
        at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.endOfInput(XMLDocumentScanner.java:1457)
        at org.apache.xerces.framework.XMLDocumentScanner.endOfInput(XMLDocumentScanner.java:417)
        at org.apache.xerces.validators.common.XMLValidator.sendEndOfInputNotifications(XMLValidator.java:448)
        at org.apache.xerces.readers.DefaultEntityHandler.changeReaders(DefaultEntityHandler.java:1006)
        at org.apache.xerces.readers.XMLEntityReader.changeReaders(XMLEntityReader.java:168)
        at org.apache.xerces.readers.UTF8Reader.changeReaders(UTF8Reader.java:182)
        at org.apache.xerces.readers.UTF8Reader.scanContent(Compiled Code)
        at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(Compiled Code)
        at org.apache.xerces.framework.XMLDocumentScanner.parseSome(Compiled Code)
        at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:861)
        at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:900)
        at sax.SAXCount.print(SAXCount.java:152)
        at sax.SAXCount.main(Compiled Code)


According to the XML conformance test suite, this file should be valid: the £
character should be allowed as element content.

Is this a bug, or is there some character-encoding setting that I am missing
from the Xerces build?

Thanks

John Ward.


Re: Possible bug in Xerces?

Posted by Andy Clark <an...@apache.org>.
Eric Ye wrote:
> Since the character £ in your sample file is outside the ASCII range
> (0-127), you need to specify the encoding as "ISO-8859-1". The default
> encoding is UTF-8, and in UTF-8 every code point above 127 is encoded
> as a sequence of 2, 3 or 4 bytes.

...which is why the parser, reading the £ byte as the start of a
multi-byte UTF-8 sequence, "eats" the characters that follow it and
loses the start of the closing end-tag.
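Andy's point can be checked from the other direction: text saved by a
Latin-1 editor is simply not well-formed UTF-8. The sketch below assumes
the file was saved as ISO-8859-1 (the thread does not say which editor
produced it) and uses the JDK's java.nio.charset decoder in strict mode;
the class name is hypothetical. Note that a modern strict decoder rejects
the lone 0xA3 byte outright instead of swallowing the following
characters the way the 2000-era Xerces reader apparently did.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether a byte array is well-formed UTF-8.
public class StrictUtf8Check {
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String content = "<doc>\u00A3</doc>";
        // Assumed Latin-1 save: the pound sign is the single byte 0xA3.
        byte[] latin1 = content.getBytes(StandardCharsets.ISO_8859_1);
        // The same text saved as UTF-8: the pound sign becomes 0xC2 0xA3.
        byte[] utf8 = content.getBytes(StandardCharsets.UTF_8);

        System.out.println("Latin-1 bytes valid as UTF-8? " + isValidUtf8(latin1)); // false
        System.out.println("UTF-8 bytes valid as UTF-8?   " + isValidUtf8(utf8));   // true
    }
}
```

0xA3 has the bit pattern 10xxxxxx, which UTF-8 reserves for continuation
bytes, so it can never legally appear on its own.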

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Possible bug in Xerces?

Posted by Eric Ye <er...@locus.apache.org>.
Since the character

£

in your sample file is outside the ASCII range (0-127), you need to specify
the encoding as "ISO-8859-1" in the XML declaration. The default encoding is
UTF-8, and in UTF-8 every code point above 127 is encoded as a sequence of
2, 3 or 4 bytes.
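A minimal sketch of the suggested fix, assuming the file really is saved
as ISO-8859-1: prepend an encoding declaration to the test document and
parse it. It uses the JDK's bundled JAXP SAX parser (itself derived from
Xerces) rather than the Xerces 1.x build from the thread, and the class
and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: parse the test file once it declares ISO-8859-1.
public class Latin1Parse {
    // Returns the character data reported inside the document element.
    static String parseText(byte[] xml) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The test file from the thread, with an encoding declaration added.
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
                + "<!DOCTYPE doc [\n"
                + "<!ELEMENT doc (#PCDATA)>\n"
                + "]>\n"
                + "<doc>\u00A3</doc>\n";
        // Encode it as a Latin-1 editor would: the pound sign is one byte.
        byte[] latin1 = doc.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(parseText(latin1).equals("\u00A3")); // true
        // U+00A3 needs two bytes (0xC2 0xA3) once encoded as UTF-8.
        System.out.println("\u00A3".getBytes(StandardCharsets.UTF_8).length); // 2
    }
}
```

The alternative is to leave the file without a declaration and save it
as UTF-8, in which case the pound sign is written as the two-byte
sequence the parser expects by default.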

____


Eric Ye * IBM, JTC - Silicon Valley * ericye@locus.apache.org
