Posted to c-dev@xerces.apache.org by xe...@xml.apache.org on 2004/10/15 10:35:51 UTC

[jira] Created: (XERCESC-1288) Wrong line/column number in UTFDataFormatException

Message:

  A new issue has been created in JIRA.

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESC-1288

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESC-1288
    Summary: Wrong line/column number in UTFDataFormatException
       Type: Bug

     Status: Unassigned
   Priority: Minor

    Project: Xerces-C++
 Components: 
             DOM
             Non-Validating Parser
             SAX/SAX2
   Versions:
             2.5.0
             2.6.0

   Assignee: 
   Reporter: Valerio Gionco

    Created: Fri, 15 Oct 2004 1:34 AM
    Updated: Fri, 15 Oct 2004 1:34 AM
Environment: Linux (SUSE 9.1, Fedora Core 2, Red Hat 9) on Intel, Solaris 7 on SPARC, various gcc versions.

Description:
I have the following (bad) XML file:
--------------- bad.xml ----------------------------
<?xml version="1.0" encoding="UTF-8"?>
<block>
        <field>Blah blah</field>
        <field>Blah blah ò blah blah</field>
        <field>Blah blah</field>
</block>
----------------------------------------------------
(Note the accented 'o' in the second "field" line; I hope it survives
transmission.)
The file is bad because the accented 'o' is represented with a single
byte, 0xf2, which is not valid UTF-8 on its own. This is the hex dump:

3e 42 6c 61 68 20 62 6c  61 68 20 f2 20 62 6c 61  |>Blah blah . bla|
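
For comparison, here is a minimal sketch of what a correct UTF-8 encoding
of that character would look like. It assumes the byte 0xf2 was meant as
U+00F2 (the file was presumably saved in a single-byte encoding such as
ISO-8859-1; that is my guess, nothing above confirms it). The function
name encodeUtf8 is made up for illustration and is not part of Xerces-C.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not a Xerces-C API): encode one Unicode code point
// as UTF-8. Returns an empty string for surrogates and out-of-range values.
std::string encodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return out;                                   // not encodable
    }
    if (cp < 0x80) {                                  // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                          // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                        // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                          // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Under that assumption, encodeUtf8(0xF2) yields the two bytes 0xc3 0xb2,
which is what a well-formed UTF-8 file would contain in place of the
lone 0xf2.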

The problem is that when I run "SAXPrint bad.xml" I get the following error:
Fatal Error at file /users/valerio/tmp/bad.xml, line 1, char 39
  Message: An exception occurred! Type:UTFDataFormatException, Message:invalid byte 2 ( ) of a 4-byte sequence.

The line and column reported by SAXParseException::getLineNumber()
and SAXParseException::getColumnNumber() are wrong: the error is
reported at line 1, but the offending byte is on line 4. I seem to
recall this was not the case with older (2.0 or 2.2?) versions of
Xerces-C, but I'm not sure.

I noticed the issue with 2.5, then tried 2.6 and saw no difference.
Can somebody take care of this? We often have big XML files to
parse, and not knowing where the error actually is makes them
painful to debug.
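
As a stopgap, the offending byte can be located outside the parser.
The following is a rough sketch of such a pre-check, not Xerces-C code:
the names Utf8Error, sequenceLength and findFirstBadUtf8 are made up, and
the counting convention (1-based lines and columns, columns counted in
characters, position of the lead byte) is my assumption and may not match
what Xerces-C is supposed to report. It only checks lead and continuation
bytes; it does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type (not a Xerces-C API).
struct Utf8Error {
    bool found;          // true if a malformed sequence was found
    std::size_t line;    // 1-based line of the sequence's lead byte
    std::size_t column;  // 1-based column of the lead byte, in characters
    int badByte;         // 1-based index of the offending byte in the sequence
    int seqLength;       // length announced by the lead byte (0 if lead is invalid)
};

// Sequence length announced by a lead byte, or 0 if it cannot start a sequence.
static int sequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;             // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;   // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;   // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;   // 11110xxx
    return 0;                              // stray continuation or invalid lead
}

// Scan a buffer and report where the first malformed UTF-8 sequence starts.
Utf8Error findFirstBadUtf8(const std::string& data)
{
    std::size_t line = 1, column = 1, i = 0;
    while (i < data.size()) {
        const unsigned char lead = static_cast<unsigned char>(data[i]);
        const int len = sequenceLength(lead);
        if (len == 0) {
            return {true, line, column, 1, 0};
        }
        // Every byte after the lead must look like 10xxxxxx.
        for (int k = 1; k < len; ++k) {
            if (i + k >= data.size() ||
                (static_cast<unsigned char>(data[i + k]) & 0xC0) != 0x80) {
                return {true, line, column, k + 1, len};
            }
        }
        if (lead == '\n') { ++line; column = 1; } else { ++column; }
        i += static_cast<std::size_t>(len);
    }
    return {false, 0, 0, 0, 0};
}
```

This also shows why the message talks about "byte 2 of a 4-byte
sequence": 0xf2 is 11110010 in binary, which announces a 4-byte sequence,
and the space (0x20) that follows is not a continuation byte. Reading
bad.xml into a std::string and passing it to findFirstBadUtf8 should
point at the fourth line rather than the first.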


