Posted to j-dev@xerces.apache.org by bu...@apache.org on 2003/11/10 21:03:18 UTC
DO NOT REPLY [Bug 24579] New: - [XML 1.0] - E27: Must reject non-shortest forms in UTF-8
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=24579>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
Summary: [XML 1.0] - E27: Must reject non-shortest forms in UTF-8
Product: Xerces2-J
Version: 2.5.0
Platform: All
URL: http://www.w3.org/XML/xml-V10-2e-errata#E27
OS/Version: All
Status: NEW
Severity: Normal
Priority: Other
Component: Other
AssignedTo: xerces-j-dev@xml.apache.org
ReportedBy: mrglavas@ca.ibm.com
E27 [1] states that "it is a fatal error if an entity encoded in UTF-8 contains
any irregular code unit sequences, as defined in Unicode 3.1". I had a look at
this erratum some time ago, and in addition to treating irregular code unit
sequences as a fatal error, we should also reject non-shortest forms. These
non-shortest forms (such as C0 80 or E0 80 80, both of which decode to code
point U+0000) are not legal in Unicode 3.1. See "UTF-8 Corrigendum" and
"Table 3.1B. Legal UTF-8 Byte Sequences" of Unicode 3.1 [3].
[1] http://www.w3.org/XML/xml-V10-2e-errata#E27
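To make the non-shortest-form check concrete, here is a minimal sketch in Java.
It is not the Xerces reader code; the class and method names are hypothetical,
and it assumes the whole input is already in a byte array. It decodes each
multi-byte sequence and reports it if the resulting code point could have been
encoded in fewer bytes. It also flags malformed or truncated sequences and
values above U+10FFFF, but it makes no attempt to detect the irregular
(surrogate) sequences that E27 also covers.

```java
final class Utf8ShortestForm {

    /**
     * Returns the offset of the first non-shortest-form or malformed
     * UTF-8 sequence in the input, or -1 if none is found.
     */
    static int firstIllegalSequence(byte[] in) {
        int i = 0;
        while (i < in.length) {
            int b0 = in[i] & 0xFF;
            int len; // total number of bytes in this sequence
            int min; // smallest code point that needs this many bytes
            int cp;  // payload bits taken from the lead byte
            if (b0 < 0x80) {
                i++; // single-byte (ASCII) form is always shortest
                continue;
            } else if ((b0 & 0xE0) == 0xC0) {
                len = 2; min = 0x80; cp = b0 & 0x1F;
            } else if ((b0 & 0xF0) == 0xE0) {
                len = 3; min = 0x800; cp = b0 & 0x0F;
            } else if ((b0 & 0xF8) == 0xF0) {
                len = 4; min = 0x10000; cp = b0 & 0x07;
            } else {
                return i; // stray continuation byte or invalid lead byte
            }
            if (i + len > in.length) {
                return i; // sequence is truncated
            }
            for (int k = 1; k < len; k++) {
                int b = in[i + k] & 0xFF;
                if ((b & 0xC0) != 0x80) {
                    return i; // expected a continuation byte (10xxxxxx)
                }
                cp = (cp << 6) | (b & 0x3F);
            }
            // Non-shortest form: the value fits in a shorter encoding.
            if (cp < min || cp > 0x10FFFF) {
                return i;
            }
            i += len;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] twoByteNul = { (byte) 0xC0, (byte) 0x80 };
        byte[] threeByteNul = { (byte) 0xE0, (byte) 0x80, (byte) 0x80 };
        System.out.println(firstIllegalSequence(twoByteNul));   // prints 0
        System.out.println(firstIllegalSequence(threeByteNul)); // prints 0
    }
}
```

A parser following this approach would raise a fatal error whenever the method
returns anything other than -1. A streaming reader would need the same
minimum-value comparison applied per sequence as bytes arrive.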