Posted to j-dev@xerces.apache.org by bu...@apache.org on 2004/03/10 21:40:40 UTC
DO NOT REPLY [Bug 27583] New: -
Xerces throws IOExceptions that should be SAXExceptions for bad UTF-8 and similar
http://issues.apache.org/bugzilla/show_bug.cgi?id=27583
Summary: Xerces throws IOExceptions that should be SAXExceptions
for bad UTF-8 and similar
Product: Xerces2-J
Version: 2.6.2
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: Other
Component: SAX
AssignedTo: xerces-j-dev@xml.apache.org
ReportedBy: elharo@metalab.unc.edu
When Xerces (XMLReader.parse()) encounters malformed Unicode data, such as an
invalid UTF-8 sequence, it throws an IOException, more specifically a
UTFDataFormatException or a CharConversionException. However, according to the
SAX and XML specifications, this should be a SAXException that is reported to
the ErrorHandler's fatalError() method.
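A minimal sketch of how an application can tell the two reporting paths apart, assuming the JAXP SAXParserFactory that ships with the JDK. That parser is derived from Xerces but is not Xerces2-J 2.6.2, so what it prints may differ from the behavior reported here; EncodingErrorProbe and its methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical probe class: not part of Xerces or the SAX API.
public class EncodingErrorProbe {

    // Records whether the parser routed the error through fatalError().
    static final class RecordingHandler extends DefaultHandler {
        boolean fatalErrorCalled = false;

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            fatalErrorCalled = true;
            throw e; // same as DefaultHandler: stop parsing on a fatal error
        }
    }

    // Parses the bytes and describes how a failure, if any, surfaced.
    static String probe(byte[] document) {
        RecordingHandler handler = new RecordingHandler();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new ByteArrayInputStream(document)));
            return "no error";
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (SAXException e) {
            // The path the report argues for
            return handler.fatalErrorCalled ? "SAXException via fatalError()" : "SAXException";
        } catch (IOException e) {
            // The path the report observed: UTFDataFormatException or CharConversionException
            return "IOException (" + e.getClass().getSimpleName() + ")";
        }
    }

    // A UTF-8-declared document containing the byte 0xFF, which never occurs in legal UTF-8.
    static byte[] sampleWithInvalidUtf8() {
        byte[] head = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>".getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "</root>".getBytes(StandardCharsets.US_ASCII);
        byte[] doc = new byte[head.length + 1 + tail.length];
        System.arraycopy(head, 0, doc, 0, head.length);
        doc[head.length] = (byte) 0xFF;
        System.arraycopy(tail, 0, doc, head.length + 1, tail.length);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(probe(sampleWithInvalidUtf8()));
    }
}
```

Running the same probe against different parser versions shows whether an encoding error arrives as a plain IOException or as a SAXException delivered through fatalError().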
Note first that the XML spec states, in section 4.3.3:
It is a fatal error when an XML processor encounters an entity with an encoding
that it is unable to process. It is a fatal error if an XML entity is determined
(via default, encoding declaration, or higher-level protocol) to be in a certain
encoding but contains byte sequences that are not legal in that encoding.
Specifically, it is a fatal error if an entity encoded in UTF-8 contains any
irregular code unit sequences, as defined in Unicode 3.1 [Unicode3]. Unless an
encoding is determined by a higher-level protocol, it is also a fatal error if
an XML entity contains no encoding declaration and its content is not legal
UTF-8 or UTF-16.
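To make "byte sequences that are not legal in that encoding" concrete, here is a minimal sketch using the JDK's strict UTF-8 decoder from java.nio rather than anything in Xerces. It is an approximation of the Unicode 3.1 rules the spec cites, not a restatement of them, and Utf8Check is a hypothetical name. Note that the JDK signals such failures with CharacterCodingException, itself a subclass of IOException, which is the same convention the report objects to for XML parsing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks bytes against the JDK's UTF-8 decoder, not Xerces.
public final class Utf8Check {
    private Utf8Check() {}

    // Returns true only if the whole array decodes as UTF-8 without errors.
    public static boolean isLegalUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) { // an IOException subclass
            return false;
        }
    }

    public static void main(String[] args) {
        byte[][] samples = {
            "caf\u00e9".getBytes(StandardCharsets.UTF_8), // legal two-byte sequence for U+00E9
            {(byte) 0xFF},                              // 0xFF never appears in UTF-8
            {(byte) 0x80},                              // continuation byte with no lead byte
            {(byte) 0xC0, (byte) 0x80}                  // overlong encoding of U+0000
        };
        for (byte[] s : samples) {
            System.out.println(isLegalUtf8(s));
        }
    }
}
```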
The SAX spec says of the fatalError() method: "This corresponds to the
definition of 'fatal error' in section 1.2 of the W3C XML 1.0 Recommendation.
For example, a parser would use this callback to report the violation of a
well-formedness constraint." At one point I thought it was OK to report
encoding errors as an IOException. However, since the XML spec is unambiguous
that character encoding errors are fatal errors, and since the SAX spec does not
limit fatal errors to well-formedness errors, I think character encoding errors
should be reported as SAXExceptions rather than IOExceptions, and should be
passed to the fatalError() method.
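Until the parser itself does this, an application could approximate the requested behavior with a wrapper like the following. This is a hypothetical workaround sketch (EncodingErrorAdapter is an illustrative name, not a Xerces API); it assumes, as described in the report, that encoding errors arrive as CharConversionException or UTFDataFormatException, and because it cannot recover the error location it passes -1 for line and column, which SAX uses to mean "not available".

```java
import java.io.CharConversionException;
import java.io.IOException;
import java.io.UTFDataFormatException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical workaround: not part of Xerces or the SAX API.
public final class EncodingErrorAdapter {
    private EncodingErrorAdapter() {}

    // Parses normally, but re-reports encoding IOExceptions as SAX fatal errors.
    public static void parse(XMLReader reader, InputSource input)
            throws IOException, SAXException {
        try {
            reader.parse(input);
        } catch (CharConversionException | UTFDataFormatException e) {
            // Line and column are unknown at this level, so pass -1.
            SAXParseException spe = new SAXParseException(
                    "Character encoding error: " + e.getMessage(),
                    input.getPublicId(), input.getSystemId(), -1, -1, e);
            ErrorHandler handler = reader.getErrorHandler();
            if (handler != null) {
                handler.fatalError(spe); // usually rethrows; if not, we throw below
            }
            throw spe; // a fatal error must still end the parse
        }
    }
}
```

Other IOExceptions, such as network failures while fetching the document, still propagate unchanged. One limitation of doing this outside the parser: a CharConversionException raised by the application's own input stream would be reclassified as well.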