Posted to c-dev@xerces.apache.org by "Boris Kolpackov (JIRA)" <xe...@xml.apache.org> on 2009/09/22 20:26:19 UTC

[jira] Commented: (XERCESC-1288) Wrong line/column number in UTFDataFormatException

    [ https://issues.apache.org/jira/browse/XERCESC-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758339#action_12758339 ] 

Boris Kolpackov commented on XERCESC-1288:
------------------------------------------

From Alberto Massari:

"[T]he invalid byte is detected by the transcoder when reading a new chunk of data, but before the good data is processed. So the last known position is distant from the error location. The fix could be returning the data that is valid, and report[ing] the error only when the bad data [is] at the beginning of the chunk."
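A minimal C++ sketch of the proposed fix, assuming a simplified transcoder that decodes UTF-8 into code points. `ChunkResult`, `transcodeChunk`, `Position` and `locateError` are hypothetical names, not the Xerces-C++ API. The transcoder hands back the valid prefix of a chunk without an error and raises the error only when the bad sequence is the first thing it sees, so the caller has already advanced line and column over the good data by the time the error is reported.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical transcoder result; these names are illustrative, not Xerces-C++ API.
struct ChunkResult {
    std::u32string chars;   // code points decoded from the valid prefix
    std::size_t consumed;   // bytes of input consumed
    bool error;             // set only when the bad sequence is at the start
};

// Sequence length implied by a UTF-8 lead byte, or 0 if it cannot start one.
static int seqLength(std::uint8_t b) {
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

// Decode from `start` up to the first bad sequence. Simplified: only lead and
// continuation bytes are checked (no overlong or surrogate validation).
ChunkResult transcodeChunk(const std::vector<std::uint8_t>& buf, std::size_t start) {
    ChunkResult r{{}, 0, false};
    std::size_t i = start;
    while (i < buf.size()) {
        int len = seqLength(buf[i]);
        bool ok = len != 0;
        std::size_t avail = buf.size() - i;
        for (int k = 1; ok && k < len && static_cast<std::size_t>(k) < avail; ++k)
            ok = (buf[i + k] & 0xC0) == 0x80;  // continuation bytes are 10xxxxxx
        if (ok && static_cast<std::size_t>(len) > avail) break;  // valid so far, needs more data
        if (!ok) {
            // The proposed fix: return the good data first, and report the
            // error only once the bad sequence is at the beginning of the chunk.
            r.error = (i == start);
            break;
        }
        std::uint32_t cp = (len == 1) ? buf[i] : (buf[i] & (0xFF >> (len + 1)));
        for (int k = 1; k < len; ++k)
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        r.chars.push_back(static_cast<char32_t>(cp));
        i += static_cast<std::size_t>(len);
    }
    r.consumed = i - start;
    return r;
}

struct Position {
    unsigned line = 1, column = 1;
    bool found = false;
};

// Caller: advance line/column over each batch of good data, then call again.
// The retry sees the bad sequence at offset 0, so the reported position is exact.
Position locateError(const std::vector<std::uint8_t>& buf) {
    Position pos;
    std::size_t offset = 0;
    while (offset < buf.size()) {
        ChunkResult r = transcodeChunk(buf, offset);
        if (r.error) { pos.found = true; break; }
        if (r.consumed == 0) break;  // truncated sequence at end of input (not handled here)
        for (char32_t c : r.chars) {
            if (c == U'\n') { ++pos.line; pos.column = 1; }
            else { ++pos.column; }
        }
        offset += r.consumed;
    }
    return pos;
}
```

Fed the bad.xml contents from the report (assuming eight-space indentation), `locateError` stops at line 4, column 26, which is where the 0xf2 byte sits, rather than at line 1, char 39.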

> Wrong line/column number in UTFDataFormatException
> --------------------------------------------------
>
>                 Key: XERCESC-1288
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1288
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: DOM, Non-Validating Parser, SAX/SAX2
>    Affects Versions: 2.5.0, 2.6.0
>         Environment: Linux (SUSE 9.1, Fedora core 2, Redhat 9) on Intel, Solaris 7 on SPARC,  various gcc versions.
>            Reporter: Valerio Gionco
>            Priority: Minor
>
> I have the following (bad) XML file:
> --------------- bad.xml ----------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <block>
>         <field>Blah blah</field>
>         <field>Blah blah ò blah blah</field>
>         <field>Blah blah</field>
> </block>
> ----------------------------------------------------
> (note the accented 'o' in the 2nd "field" line - hope it won't be
> destroyed...)
> The file is bad because the accented 'o' is represented with a single
> byte, 0xf2. This is the hex dump:
> 3e 42 6c 61 68 20 62 6c  61 68 20 f2 20 62 6c 61  |>Blah blah . bla|
> Problem is, when I run "SAXPrint bad.xml" I get the following error:
> Fatal Error at file /users/valerio/tmp/bad.xml, line 1, char 39
>   Message: An exception occurred! Type:UTFDataFormatException, Message:invalid byte 2 ( ) of a 4-byte sequence.
> The row and column reported by SAXParseException::getColumnNumber()
> and SAXParseException::getLineNumber() are wrong. I seem to recall
> this was not the case with older (2.0 or 2.2?) versions of Xerces-C,
> but I'm not sure.
> I noticed the issue with 2.5, then tried with 2.6 but there was
> no apparent difference. Can somebody take care of this? We often
> have big XML files to parse, and not knowing where the error
> really lies is a real pain.
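The error text in the report can be unpacked byte by byte. In UTF-8, 0xf2 (binary 11110010) is a lead byte announcing a four-byte sequence, so the decoder expects the next three bytes to be continuation bytes of the form 10xxxxxx; the byte after it is 0x20, a space, hence "invalid byte 2 ( ) of a 4-byte sequence". Encoded correctly, 'ò' (U+00F2) is the two bytes C3 B2. The C++ sketch below reproduces that diagnosis on the hex dump from the report; `Utf8Error` and `firstUtf8Error` are hypothetical names, not Xerces-C++ functions, and the checks are simplified (no overlong or surrogate validation).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical diagnostic (not part of Xerces-C++): where and why UTF-8 decoding fails.
struct Utf8Error {
    std::size_t offset;   // byte offset of the sequence's lead byte
    int sequenceLength;   // length announced by the lead byte (0 if not a valid lead)
    int badByte;          // 1-based index within the sequence of the offending byte
};

std::optional<Utf8Error> firstUtf8Error(const std::vector<std::uint8_t>& buf) {
    std::size_t i = 0;
    while (i < buf.size()) {
        std::uint8_t lead = buf[i];
        int len = lead < 0x80 ? 1
                : (lead >= 0xC2 && lead <= 0xDF) ? 2
                : (lead >= 0xE0 && lead <= 0xEF) ? 3
                : (lead >= 0xF0 && lead <= 0xF4) ? 4 : 0;
        if (len == 0) return Utf8Error{i, 0, 1};  // byte cannot start a sequence
        for (int k = 1; k < len; ++k) {
            // Truncated input, or a byte that is not 10xxxxxx, breaks the sequence.
            if (i + k >= buf.size() || (buf[i + k] & 0xC0) != 0x80)
                return Utf8Error{i, len, k + 1};
        }
        i += static_cast<std::size_t>(len);
    }
    return std::nullopt;  // the whole buffer is well-formed (under these checks)
}
```

On the 16 bytes of the hex dump it reports offset 11, a 4-byte sequence, and bad byte 2, matching the message SAXPrint printed.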

-- 
This message is automatically generated by JIRA.