Posted to j-dev@xerces.apache.org by bu...@apache.org on 2002/07/17 21:11:02 UTC

DO NOT REPLY [Bug 10918] New: - copyright symbol causing UTF-8 error.

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10918>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

           Summary: copyright symbol causing UTF-8 error.
           Product: Xerces2-J
           Version: 2.0.2
          Platform: PC
               URL: ftp://ftp.bind.ca/BIND/spec/xmldtd/BIND.dtd
        OS/Version: Linux
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: SAX
        AssignedTo: xerces-j-dev@xml.apache.org
        ReportedBy: aarenson@iupui.edu


The URL above leads to a file which, when I attempt to parse it with SAX, gives:

java.io.UTFDataFormatException: invalid byte 1 of 1-byte UTF-8 sequence (0xa9)

The culprit is a 'copyright' symbol. Using od -hc, I get:

0006260 7279 6769 7468 a920 3032 3130 4d20 756f
          y   r   i   g   h   t       ©   2   0   0   1       M   o   u

If I delete the 'copyright' character, I don't get the error. Shouldn't SAX be
able to handle this character?
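
For reference, here is a minimal sketch of why the byte in the dump trips a
UTF-8 decoder. It assumes the file is actually encoded as ISO-8859-1 (Latin-1),
where the copyright sign is the single byte 0xa9; that is a guess based on the
od output, not something the file declares. In UTF-8 the same character is the
two-byte sequence 0xc2 0xa9, and a lone 0xa9 has the bit pattern of a
continuation byte, so it cannot start a sequence. An XML entity with no
encoding declaration (and no byte order mark) is read as UTF-8 by default. The
class and method names below are made up for illustration and are not part of
Xerces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CopyrightByteDemo {
    // Bytes taken from the od dump above: "right", space, 0xa9, "2001".
    static final byte[] SAMPLE = {
        'r', 'i', 'g', 'h', 't', ' ', (byte) 0xa9, '2', '0', '0', '1'
    };

    /** Returns true only if the bytes form well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Re-encodes bytes assumed to be ISO-8859-1 as UTF-8. */
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The raw bytes are not valid UTF-8 because of the lone 0xa9.
        System.out.println(isValidUtf8(SAMPLE));      // false

        // Read as Latin-1, the byte at index 6 is U+00A9, the copyright sign.
        String latin1 = new String(SAMPLE, StandardCharsets.ISO_8859_1);
        System.out.println(latin1.indexOf('\u00a9')); // 6

        // After re-encoding, 0xa9 becomes 0xc2 0xa9 and the data is valid.
        byte[] utf8 = latin1ToUtf8(SAMPLE);
        System.out.println(utf8.length);              // 12
        System.out.println(isValidUtf8(utf8));        // true
    }
}
```

If the Latin-1 assumption holds, either re-encoding the file as above or
adding an encoding declaration naming ISO-8859-1 at the top of the file would
be the usual ways to make the bytes and the declared encoding agree.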

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org