Posted to j-dev@xerces.apache.org by bu...@apache.org on 2001/03/21 20:25:53 UTC

[Bug 1070] New - SAX parser not working on UTF-16 coding.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=1070

*** shadow/1070	Wed Mar 21 11:25:52 2001
--- shadow/1070.tmp.21552	Wed Mar 21 11:25:52 2001
***************
*** 0 ****
--- 1,69 ----
+ +============================================================================+
+ | SAX parser not working on UTF-16 coding.                                   |
+ +----------------------------------------------------------------------------+
+ |        Bug #: 1070                        Product: Xerces-J                |
+ |       Status: NEW                         Version: 1.3.0                   |
+ |   Resolution:                            Platform: PC                      |
+ |     Severity: Normal                   OS/Version:                         |
+ |     Priority: High                      Component: SAX                     |
+ +----------------------------------------------------------------------------+
+ |  Assigned To: xerces-j-dev@xml.apache.org                                  |
+ |  Reported By: Peter.Qi@hummingbird.com                                     |
+ +----------------------------------------------------------------------------+
+ |          URL:                                                              |
+ +============================================================================+
+ |                              DESCRIPTION                                   |
+ Hi,
+ 
+ The Xerces-J FAQ says that the parser supports many encodings,
+ including UTF-16.  But when I tried versions 1.3.0 and 1.3.1 of
+ the parser, I found that this encoding was not supported.  Could
+ someone please check that?  Thanks a lot.
+ 
+ Peter Qi
+ 
+ PS:  The following is the reply from Andy Clark <an...@apache.org> 
+ about the bug:
+ 
+ Peter Qi wrote:
+ > After I passed the two attached files to the parser, the
+ > following error messages were generated:
+ 
+ Okay, now I can reproduce your error. I think that this is just
+ a missing mapping in the MIME2Java table used by the parser to
+ translate IANA encoding names into Java encoding names. Please
+ open a bug to this effect using Bugzilla at:
+ 
+   http://nagoya.apache.org/bugzilla/
+ 
+ In fact, you should put in the bug report that all of the
+ defined IANA mappings should be added to the mapping table --
+ at least the ones where decoders are present in Java. The URL
+ to the list of encodings (and their aliases) is at:
+ 
+   http://ww.isi.edu/in-notes/iana/assignments/character-sets
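
To make the idea of a name-mapping table concrete, here is a minimal sketch of an IANA-to-Java lookup. It is not the parser's actual MIME2Java class: the class name EncodingNames, the method toJavaName, and the handful of entries are assumptions chosen for illustration, and the fallback relies on java.nio.charset.Charset, which postdates the JDKs this report was filed against.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical IANA-to-Java encoding name table (illustration only). */
final class EncodingNames {
    private static final Map<String, String> IANA_TO_JAVA = new HashMap<>();

    static {
        // A few illustrative entries; a real table would cover the IANA list.
        IANA_TO_JAVA.put("UTF-8", "UTF8");
        IANA_TO_JAVA.put("UTF-16", "UTF-16");
        IANA_TO_JAVA.put("UTF-16BE", "UTF-16BE");
        IANA_TO_JAVA.put("UTF-16LE", "UTF-16LE");
        IANA_TO_JAVA.put("ISO-8859-1", "ISO8859_1");
        IANA_TO_JAVA.put("US-ASCII", "ASCII");
    }

    /** Returns the Java encoding name for an IANA name, or null if unknown. */
    static String toJavaName(String ianaName) {
        if (ianaName == null) {
            return null;
        }
        // IANA names are case-insensitive, so normalize before the lookup.
        String javaName = IANA_TO_JAVA.get(ianaName.toUpperCase(Locale.ROOT));
        if (javaName != null) {
            return javaName;
        }
        // Fall back to any charset the running JVM can actually decode.
        try {
            if (Charset.isSupported(ianaName)) {
                return Charset.forName(ianaName).name();
            }
        } catch (IllegalArgumentException e) {
            // Syntactically illegal charset name: treat it as unknown.
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(toJavaName("utf-16le"));
        System.out.println(toJavaName("no-such-encoding"));
    }
}
```

With a table like this, a missing entry shows up as a null result for a name that the XMLDecl is allowed to use, which is the kind of gap described above.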
+ 
+ Please realize that bugs get fixed faster when *you* fix the
+ bug and post the patch to the mailing list (as a file
+ attachment and not inline).
+ 
+ Incidentally, your attached XML document wasn't even encoded
+ in UTF-16. It was just straight ASCII which would produce an
+ error separate from the one that you saw. Please make sure
+ that you generate a truly UTF-16 file if you're going to set
+ the encoding in the XMLDecl line.
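
One way to check whether a file is really UTF-16 is to look at its first two bytes. The sketch below is an illustration under assumptions, not part of the original report: the class name Utf16Sample and both helper methods are hypothetical. It encodes a document with Java's "UTF-16" charset, which writes a big-endian byte-order mark first, and compares it with the same text encoded as plain ASCII.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for producing and checking UTF-16 encoded XML. */
final class Utf16Sample {

    /** Encodes the text as UTF-16; Java prepends a big-endian BOM (FE FF). */
    static byte[] encodeUtf16(String xml) {
        return xml.getBytes(StandardCharsets.UTF_16);
    }

    /** Returns true if the data begins with a UTF-16 BOM in either byte order. */
    static boolean startsWithUtf16Bom(byte[] data) {
        if (data.length < 2) {
            return false;
        }
        int b0 = data[0] & 0xFF;
        int b1 = data[1] & 0xFF;
        return (b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><doc/>";
        byte[] utf16 = encodeUtf16(xml);
        byte[] ascii = xml.getBytes(StandardCharsets.US_ASCII);
        System.out.println("UTF-16 bytes start with BOM: " + startsWithUtf16Bom(utf16));
        System.out.println("ASCII bytes start with BOM:  " + startsWithUtf16Bom(ascii));
    }
}
```

A file that declares encoding="UTF-16" but fails this check is the straight-ASCII case described above. Note that a UTF-16 file without a BOM would also fail it, so the check is only a quick sanity test.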
+ 
+ Some other points:
+ 
+ 1) Instead of being so specific about the endian-ness of your
+    document (because the parser will determine that by either
+    the BOM or the first few bytes in the file), just use
+    "UTF-16" as your encoding name. (Although, I made this
+    change and still get the same error. Strange...)
+ 2) Never put in a link to your DTD like that. Always use either
+    a relative or absolute URI and use an EntityResolver, if
+    needed, to locate the DTD. Otherwise your documents are not
+    portable.
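
Both points can be sketched together. The example below is a hedged illustration rather than the reporter's code: it uses the standard JAXP and SAX interfaces, and the class name Utf16SaxDemo, the document, the DTD text, and the base URI file:///example/note.xml are all made up for the example. The document declares plain "UTF-16" and refers to its DTD by a relative URI, and an EntityResolver supplies the DTD from memory.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo: parse a UTF-16 document whose DTD is located by a resolver. */
final class Utf16SaxDemo {
    // The document names "UTF-16" only; byte order is left to the BOM.
    static final String DOC =
          "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n"
        + "<!DOCTYPE note SYSTEM \"note.dtd\">\n"
        + "<note>hello</note>";

    /** Parses the bytes and returns the concatenated character data. */
    static String parseText(byte[] xmlBytes) throws Exception {
        final StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        // Supply the DTD from memory instead of fetching it from a fixed location.
        reader.setEntityResolver((publicId, systemId) -> {
            if (systemId != null && systemId.endsWith("note.dtd")) {
                return new InputSource(new StringReader("<!ELEMENT note (#PCDATA)>"));
            }
            return null; // let the parser use its default resolution
        });
        InputSource in = new InputSource(new ByteArrayInputStream(xmlBytes));
        in.setSystemId("file:///example/note.xml"); // hypothetical base URI
        reader.parse(in);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Java's UTF-16 encoder writes a BOM, so the bytes match the declaration.
        byte[] bytes = DOC.getBytes(StandardCharsets.UTF_16);
        System.out.println(parseText(bytes));
    }
}
```

Because the resolver decides where the DTD comes from, the same document can be moved to another machine without editing its DOCTYPE line.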
+ 
+ -- 
+ Andy Clark * IBM, TRL - Japan * andyc@apache.org
