Posted to dev@uima.apache.org by "Marshall Schor (JIRA)" <de...@uima.apache.org> on 2013/07/01 21:44:20 UTC
[jira] [Commented] (UIMA-2849) XMLSerializer is not robust to ascii control characters
[ https://issues.apache.org/jira/browse/UIMA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697097#comment-13697097 ]
Marshall Schor commented on UIMA-2849:
--------------------------------------
See also
http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.xmi_emf.xml_character_issues
which discusses this issue and mentions this class:
org.apache.uima.internal.util.XMLUtils
which has some methods to check strings for invalid characters. It also has some normalization methods, but these do not cover the wider range of characters that are valid in XML 1.1 but not in XML 1.0.
Patches for this class that enhance this functionality are most welcome!
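For illustration, here is a rough sketch of the kind of check involved. It is not the XMLUtils code; the class and method names (Xml10Check, isXml10Char, firstInvalidXml10Char) are made up for this example. It only assumes the Char production of the XML 1.0 specification: #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF.

```java
// Hypothetical helper, not part of UIMA: finds characters that XML 1.0 cannot represent.
final class Xml10Check {
    private Xml10Check() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char index of the first invalid character, or -1 if there is none. */
    static int firstInvalidXml10Char(String s) {
        for (int i = 0; i < s.length(); ) {
            // codePointAt joins valid surrogate pairs; an unpaired surrogate
            // comes back as itself and is rejected by isXml10Char.
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

A caller could run firstInvalidXml10Char on a string before storing it in the CAS and decide what to do when the result is not -1. Accepting the additional characters that XML 1.1 allows would need a different predicate, which this sketch does not attempt.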
> XMLSerializer is not robust to ascii control characters
> --------------------------------------------------------
>
> Key: UIMA-2849
> URL: https://issues.apache.org/jira/browse/UIMA-2849
> Project: UIMA
> Issue Type: Bug
> Components: Core Java Framework
> Affects Versions: 2.4.0SDK
> Reporter: Matthew Hatem
>
> If any string in the CAS contains an ASCII control character, the XMLSerializer fails with the exception below. XMLSerializer appears to escape other invalid XML characters like '&' and '<'. Perhaps it would be appropriate to remove control characters (or, in the case of XML 1.1, to escape them as well).
> The workaround is to ensure that no string stored in the CAS contains ASCII control characters.
> org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: , 0x1c
> at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)
> at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:174)
> at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.startElement(XmiCasSerializer.java:1003)
> at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeFS(XmiCasSerializer.java:755)
> at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeIndexed(XmiCasSerializer.java:700)
> at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.serialize(XmiCasSerializer.java:268)
> at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.access$700(XmiCasSerializer.java:108)
> at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1516)
> at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1496)
> at bugs.UimaXMIBug.writeXmi(UimaXMIBug.java:68)
> at bugs.UimaXMIBug.main(UimaXMIBug.java:38)
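
The workaround described in the issue, keeping such characters out of strings before they reach the CAS, might look roughly like the sketch below. This is an assumption about how one could do it, not code from UIMA or from the reporter; Xml10Sanitizer and replaceInvalid are hypothetical names. It replaces each character outside the XML 1.0 Char production with a caller-chosen substitute rather than deleting it.

```java
// Hypothetical helper, not part of UIMA: makes a string safe for XML 1.0 output.
final class Xml10Sanitizer {
    private Xml10Sanitizer() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    private static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Replaces every character that is not valid in XML 1.0 with the given
     * replacement. The result always has the same length as the input.
     */
    static String replaceInvalid(String s, char replacement) {
        StringBuilder sb = null; // allocated only if something must change
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                // Every invalid code point occupies exactly one Java char
                // (control characters, lone surrogates, 0xFFFE, 0xFFFF).
                if (sb == null) {
                    sb = new StringBuilder(s);
                }
                sb.setCharAt(i, replacement);
            }
            i += Character.charCount(cp);
        }
        return sb == null ? s : sb.toString();
    }
}
```

For example, replaceInvalid(text, ' ') could be applied to document text before it is set on the CAS. Replacing instead of removing keeps the string length unchanged, so character offsets computed against the original text still line up.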