Posted to dev@uima.apache.org by "Marshall Schor (JIRA)" <ui...@incubator.apache.org> on 2007/03/01 00:15:51 UTC

[jira] Commented: (UIMA-194) Tools highlight incorrect annotation offsets due to XML serialization bug in Sun Java 1.4.2

    [ https://issues.apache.org/jira/browse/UIMA-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476757 ] 

Marshall Schor commented on UIMA-194:
-------------------------------------

The XSLT 2.0 and XQuery 1.0 serialization working draft at http://www.w3.org/TR/2003/WD-xslt-xquery-serialization-20030502/#xml-output says that "CR characters in text nodes should be written as &#xD; or an equivalent; while CR, NL, and TAB characters in attribute nodes should be output respectively as &#xD;, &#xA;, and &#x9;, or their equivalents."

This applies to XSLT 2.0, but perhaps not to XSLT 1.0.  Note also that the spec uses the word "should" rather than "must", so it reads as a recommendation and not a strict requirement.

I wonder if we could run a small test at startup and, if CR characters are not being preserved, warn the user and suggest a fix such as changing the Java level.
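
A rough sketch of what such a startup check might look like, assuming the standard JAXP identity transformer is a fair stand-in for the serializer the tools actually use (that is an assumption, as are the class and method names below, which are made up for illustration).  It writes a text node containing a CR, parses the result back, and compares.  If the serializer drops the CR, or writes it raw so that the parser normalizes it to LF, the comparison fails.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of UIMA.
final class CrRoundTripCheck {

    /** Serializes the text as element content, parses it back, and returns what survived. */
    static String roundTrip(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element root = doc.createElement("t");
            root.appendChild(doc.createTextNode(text));
            doc.appendChild(root);

            // Identity transform: DOM in, XML text out.
            StringWriter out = new StringWriter();
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new DOMSource(doc), new StreamResult(out));

            // Parse the serialized form and concatenate the child text nodes.
            Document parsed = builder.parse(new InputSource(new StringReader(out.toString())));
            StringBuffer sb = new StringBuffer();
            for (Node n = parsed.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeValue() != null) {
                    sb.append(n.getNodeValue());
                }
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML round trip failed: " + e);
        }
    }

    /** True if a CR inside a text node survives serialization and re-parsing. */
    static boolean preservesCarriageReturn() {
        String probe = "a\rb";
        return probe.equals(roundTrip(probe));
    }

    /** Prints a warning to stderr if CR characters are lost. */
    static void warnIfCarriageReturnsLost() {
        if (!preservesCarriageReturn()) {
            System.err.println("Warning: this Java XML serializer does not preserve CR characters; "
                + "annotation offsets may be wrong for documents containing CRs. "
                + "Consider running on a different Java level.");
        }
    }
}
```

A tool could call CrRoundTripCheck.warnIfCarriageReturnsLost() once at startup.  The check only exercises text nodes; attribute values would need a similar probe.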

> Tools highlight incorrect annotation offsets due to XML serialization bug in Sun Java 1.4.2
> -------------------------------------------------------------------------------------------
>
>                 Key: UIMA-194
>                 URL: https://issues.apache.org/jira/browse/UIMA-194
>             Project: UIMA
>          Issue Type: Bug
>          Components: Documentation, Tools
>         Environment: Sun Java 1.4.2_12
>            Reporter: Adam Lally
>            Priority: Minor
>             Fix For: 2.2
>
>
> The XML serialization support in Sun Java 1.4.2_12 doesn't serialize CR characters to XML.  As a result, if the document text contains CR characters, XCAS or XMI serialization will cause them to be lost, resulting in incorrect annotation offsets.  This is exposed in the DocumentAnalyzer, with the highlighting being incorrect if the input document contains CR characters.
> A unit test failure occurred in XCasToCasDataHandlerTest, but the test was modified so that it passes.  In that test an assertEquals of two strings failed, even though the strings appeared identical in the compare viewer.
> The problem does not occur in Sun Java 1.5 or later, or with IBM Java.
> Probably a documentation update is an appropriate way to address this.
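
To illustrate the offset problem described in the issue, here is a small self-contained sketch.  It does not use any UIMA API; the class name, the helper methods, and the sample text are all invented for the example, and dropCarriageReturns only simulates a serializer that loses CRs.  Offsets computed against the original text point at the wrong characters once the CRs are gone.

```java
// Hypothetical demo, not UIMA code.
final class OffsetDriftDemo {

    /** Simulates a serializer that silently drops CR characters. */
    static String dropCarriageReturns(String text) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\r') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns the text an annotation with the given offsets would cover. */
    static String coveredText(String text, int begin, int end) {
        return text.substring(begin, end);
    }

    public static void main(String[] args) {
        String original = "one\r\ntwo\r\nthree";
        int begin = original.indexOf("two");   // 5
        int end = begin + "two".length();      // 8

        String afterRoundTrip = dropCarriageReturns(original);

        // Against the original text the offsets cover "two".
        System.out.println("[" + coveredText(original, begin, end) + "]");
        // Against the CR-less text the same offsets are shifted by one.
        System.out.println("[" + coveredText(afterRoundTrip, begin, end) + "]");
    }
}
```

Each CR lost before an annotation shifts its highlighted span by one more character, which matches the symptom seen in the DocumentAnalyzer.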
