Posted to dev@uima.apache.org by "Thilo Goetz (JIRA)" <ui...@incubator.apache.org> on 2007/05/28 16:49:15 UTC
[jira] Resolved: (UIMA-387) XMI Serializer can write invalid control characters
[ https://issues.apache.org/jira/browse/UIMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thilo Goetz resolved UIMA-387.
------------------------------
Resolution: Fixed
Assignee: Adam Lally (was: Thilo Goetz)
I have "fixed" this by having the XMI serializer throw a SAXParseException when encountering an illegal character. Adam, please review. Thanks.
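For reviewers, a check of this kind could look roughly like the sketch below. This is not the actual UIMA code: the class name Xml10CharCheck and both method names are hypothetical, and the character ranges are taken from the Char production of the XML 1.0 specification. Only the SAXParseException(String, Locator) constructor is a real API; a null Locator is passed because a serializer has no parse position to report.

```java
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of UIMA: rejects text that XML 1.0 cannot represent.
final class Xml10CharCheck {

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Throws a SAXParseException at the first character that is illegal in XML 1.0. */
    static void check(String text) throws SAXParseException {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isValidXml10(cp)) {
                // An unpaired surrogate also ends up here, because codePointAt
                // returns the surrogate value itself in that case.
                throw new SAXParseException(
                    "Illegal XML 1.0 character 0x" + Integer.toHexString(cp)
                        + " at offset " + i,
                    null);
            }
            i += Character.charCount(cp);
        }
    }
}
```

Failing fast like this keeps the serializer from producing a file that no XML 1.0 parser can read back, at the cost of refusing to serialize such a CAS at all.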
> XMI Serializer can write invalid control characters
> ---------------------------------------------------
>
> Key: UIMA-387
> URL: https://issues.apache.org/jira/browse/UIMA-387
> Project: UIMA
> Issue Type: Bug
> Components: Core Java Framework
> Affects Versions: 2.1
> Reporter: Adam Lally
> Assignee: Adam Lally
> Fix For: 2.2
>
>
> On 5/1/07, Leo Ferres <lf...@ccs.carleton.ca> wrote:
> > Hello,
> >
> > While trying to open an XMI file in XML view after processing, an
> > error pops up telling me that there is an invalid XML character.
> > The error comes from the SAX parser. Below is the stack trace. Thanks
> > very much for your help,
> >
> Most control characters are not allowed in XML 1.0, even if they are
> escaped as numeric character references (for example &#x1;). If your
> input document contains such characters, the XMI CAS serializer writes
> them to the output XMI document, making it unreadable.
> I checked that if you edit the XMI document and change the first line to:
> <?xml version="1.1" encoding="UTF-8"?>
> The problem goes away, because XML version 1.1 does allow escaped
> control characters.
> So one possibility for us to fix this in UIMA is to have the XMI CAS
> Serializer generate an XML version 1.1 declaration by default. (I think
> we considered that before and decided not to for some reason; maybe we
> were worried that other applications might not be able to consume XML
> 1.1? I can't remember. :)
> Another possibility would be to have the XMI serializer automatically
> replace these characters with spaces. The XCAS (not XMI) serializer
> does that, but only for the document text, not for feature values. We
> could also serialize the XMI using XML version 1.1, which allows
> escaped control characters (but still not the 0x00 character).
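The replace-with-spaces option mentioned in the quoted description could be sketched as follows. This is an illustration only and not the XCAS serializer's actual code; the class name Xml10Sanitizer and its method names are hypothetical, and the valid ranges are again those of the XML 1.0 Char production.

```java
// Hypothetical helper, not part of UIMA: makes text safe for an XML 1.0 document.
final class Xml10Sanitizer {

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every character that is illegal in XML 1.0 with a single space. */
    static String replaceIllegal(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isValidXml10(cp)) {
                sb.appendCodePoint(cp);
            } else {
                // Every illegal code point occupies exactly one Java char,
                // so a single space keeps the string length unchanged.
                sb.append(' ');
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

Because the replacement preserves the string length, annotation begin and end offsets into the document text would stay valid. The trade-off is that the original characters are silently lost, which may matter more for feature values than for document text.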