Posted to j-dev@xerces.apache.org by bu...@apache.org on 2001/03/21 20:25:53 UTC
[Bug 1070] New - SAX parser not working on UTF-16 coding.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=1070
*** shadow/1070 Wed Mar 21 11:25:52 2001
--- shadow/1070.tmp.21552 Wed Mar 21 11:25:52 2001
***************
*** 0 ****
--- 1,69 ----
+ +============================================================================+
+ | SAX parser not working on UTF-16 coding. |
+ +----------------------------------------------------------------------------+
+ | Bug #: 1070 Product: Xerces-J |
+ | Status: NEW Version: 1.3.0 |
+ | Resolution: Platform: PC |
+ | Severity: Normal OS/Version: |
+ | Priority: High Component: SAX |
+ +----------------------------------------------------------------------------+
+ | Assigned To: xerces-j-dev@xml.apache.org |
+ | Reported By: Peter.Qi@hummingbird.com |
+ +----------------------------------------------------------------------------+
+ | URL: |
+ +============================================================================+
+ | DESCRIPTION |
+ Hi,
+
+ In the FAQ of Xerces-J, it says that the parser supports a lot
+ of encodings, including UTF-16. But after I used versions 1.3.0
+ and 1.3.1 of the parser, I found that the encoding was not
+ supported. Could someone please check that? Thanks a lot.
+
+ Peter Qi
+
+ PS: The following is the reply from Andy Clark <an...@apache.org>
+ about the bug:
+
+ Peter Qi wrote:
+ > After I parsed the attached two files, the following error
+ > messages were generated:
+
+ Okay, now I can reproduce your error. I think that this is just
+ a missing mapping in the MIME2Java table used by the parser to
+ translate IANA encoding names into Java encoding names. Please
+ open a bug to this effect using Bugzilla at:
+
+ http://nagoya.apache.org/bugzilla/
+
+ In fact, you should put in the bug report that all of the
+ defined IANA mappings should be added to the mapping table --
+ at least the ones where decoders are present in Java. The URL
+ to the list of encodings (and their aliases) is at:
+
+ http://ww.isi.edu/in-notes/iana/assignments/character-sets
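
A minimal sketch of the kind of IANA-to-Java name mapping described above. This is not the parser's actual MIME2Java table: the class name `EncodingNames`, the method `toJavaName`, and the handful of table entries are assumptions made for illustration. It also assumes a JVM with `java.nio.charset.Charset`, which is newer than the Xerces-J releases discussed here. Names missing from the table fall back to whatever decoders the JVM reports as present.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration; not the real MIME2Java class. */
final class EncodingNames {
    // A few illustrative entries: IANA name (upper case) -> Java encoding name.
    private static final Map<String, String> IANA_TO_JAVA = new HashMap<>();
    static {
        IANA_TO_JAVA.put("UTF-8", "UTF8");
        IANA_TO_JAVA.put("UTF-16", "UTF-16");
        IANA_TO_JAVA.put("US-ASCII", "ASCII");
        IANA_TO_JAVA.put("ISO-8859-1", "ISO8859_1");
    }

    /** Returns a Java encoding name for the IANA name, or null if none is known. */
    static String toJavaName(String ianaName) {
        if (ianaName == null) {
            return null;
        }
        // IANA names are case-insensitive, so normalize before the lookup.
        String mapped = IANA_TO_JAVA.get(ianaName.toUpperCase(Locale.ROOT));
        if (mapped != null) {
            return mapped;
        }
        // Not in the table: accept the name only if the JVM has a decoder for it.
        try {
            if (Charset.isSupported(ianaName)) {
                return Charset.forName(ianaName).name();
            }
        } catch (IllegalArgumentException e) {
            // Syntactically illegal charset name: treat as unknown.
        }
        return null;
    }
}
```

With a fallback like this, a missing table entry for a name such as "UTF-16LE" no longer matters as long as the JVM itself ships a decoder for it.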
+
+ Please realize that bugs get fixed faster when *you* fix the
+ bug and post the patch to the mailing list (as a file
+ attachment and not inline).
+
+ Incidentally, your attached XML document wasn't even encoded
+ in UTF-16. It was just straight ASCII which would produce an
+ error separate from the one that you saw. Please make sure
+ that you generate a truly UTF-16 file if you're going to set
+ the encoding in the XMLDecl line.
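
One way to produce a file that really is UTF-16, sketched under the assumption of a modern JVM: Java's "UTF-16" encoder writes a big-endian byte-order mark (FE FF) followed by big-endian code units. The class name `Utf16Writer` and its methods are hypothetical names chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration. */
final class Utf16Writer {
    /** Encodes the text as UTF-16; the encoder emits the BOM FE FF first. */
    static byte[] encode(String xml) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_16)) {
            w.write(xml);
        }
        return out.toByteArray();
    }

    /** Writes the UTF-16 encoded text to the given file. */
    static void writeFile(Path target, String xml) throws IOException {
        Files.write(target, encode(xml));
    }
}
```

Saving the same text from an ASCII-only editor while leaving encoding="UTF-16" in the XMLDecl produces exactly the mismatch described above.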
+
+ Some other points:
+
+ 1) Instead of being so specific about the endian-ness of your
+ document (because the parser will determine that by either
+ the BOM or the first few bytes in the file), just use
+ "UTF-16" as your encoding name. (Although, I made this
+ change and still get the same error. Strange...)
+ 2) Never put in a link to your DTD like that. Always use either
+ a relative or absolute URI and use an EntityResolver, if
+ needed, to locate the DTD. Otherwise your documents are not
+ portable.
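
Both points can be sketched together, assuming a JAXP-capable JVM rather than the Xerces-J 1.3.x API itself: the document declares plain "UTF-16" and leaves byte order to the BOM, and the DTD is referenced by a relative URI that an EntityResolver maps to a local copy. The class name `Utf16SaxDemo`, the `greeting.dtd` file name, and the base URI are all made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class for illustration. */
final class Utf16SaxDemo {
    // Stand-in for a DTD that would normally live next to the document.
    static final String DTD = "<!ELEMENT greeting (#PCDATA)>";

    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n"
        + "<!DOCTYPE greeting SYSTEM \"greeting.dtd\">\n"
        + "<greeting>hello</greeting>";

    /** Parses the UTF-16 document and returns its character content. */
    static String parse() throws Exception {
        // getBytes with UTF_16 yields a BOM followed by big-endian code units.
        byte[] bytes = XML.getBytes(StandardCharsets.UTF_16);
        final StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Map the relative DTD reference to a local copy.
                if (systemId != null && systemId.endsWith("greeting.dtd")) {
                    return new InputSource(new StringReader(DTD));
                }
                return null; // let the parser use its default resolution
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setEntityResolver(handler);
        reader.setErrorHandler(handler);

        InputSource in = new InputSource(new ByteArrayInputStream(bytes));
        in.setSystemId("file:///example/greeting.xml"); // hypothetical base URI
        reader.parse(in);
        return text.toString();
    }
}
```

Because the resolver supplies the DTD, the document carries no machine-specific path and stays portable.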
+
+ --
+ Andy Clark * IBM, TRL - Japan * andyc@apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org