Posted to dev@uima.apache.org by "Petr Baudis (JIRA)" <de...@uima.apache.org> on 2015/08/08 02:33:46 UTC

[jira] [Commented] (UIMA-3818) Unsuported XML entity by XmiCas(De)serializer

    [ https://issues.apache.org/jira/browse/UIMA-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662704#comment-14662704 ] 

Petr Baudis commented on UIMA-3818:
-----------------------------------

I ran into this issue with 2.6.0 as well. It seems to be just a bad interaction with "rogue" versions of the XML libraries that Stanford NLP brings onto the classpath. Disabling Xerces was not enough for me; in a DKPro + Gradle setup I had to do

  compile("de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl:$dkproVersion") {
    exclude group: "com.io7m.xom", module: "xom"  // this dependency breaks UTF-8 XMI serialization, cf. UIMA-3818
  }

(I suspect the actual culprit is xml-apis or something similar pulled in along with it, but I didn't investigate further; the exclusion above fixes the issue for me.)
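
To narrow down which library is actually involved, it may help to print which JAXP implementations win on the classpath. The following is only a diagnostic sketch, not part of UIMA or DKPro: the class name XmlImplCheck and the helper describe are made up for illustration, and it assumes the code runs with the same classpath as the failing pipeline. Note that the stack trace in the quoted report goes through org.apache.xerces.parsers, i.e. a standalone Xerces jar rather than the JDK's bundled parser.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical diagnostic helper; not part of UIMA or DKPro.
public class XmlImplCheck {

    /** Returns the class name of the object and the location it was loaded from. */
    public static String describe(Object o) {
        Class<?> c = o.getClass();
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader usually have no code source.
        String origin = (src == null || src.getLocation() == null)
                ? "no code source (probably JDK built-in)"
                : src.getLocation().toString();
        return c.getName() + " from " + origin;
    }

    public static void main(String[] args) {
        // The factories JAXP resolves here depend on what is on the classpath.
        System.out.println("SAX parser factory:  " + describe(SAXParserFactory.newInstance()));
        System.out.println("Transformer factory: " + describe(TransformerFactory.newInstance()));
    }
}
```

Running this before and after adding the exclude should show whether the resolved factories change; if they do not, the problem presumably lies elsewhere.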

> Unsuported XML entity by XmiCas(De)serializer
> ---------------------------------------------
>
>                 Key: UIMA-3818
>                 URL: https://issues.apache.org/jira/browse/UIMA-3818
>             Project: UIMA
>          Issue Type: Bug
>          Components: Collection Processing
>    Affects Versions: 2.4.2SDK
>            Reporter: Gregoire Jadi
>             Fix For: 2.6.0SDK
>
>
> The UTF-8 character '𝒪' (U+1D4AA) cannot be deserialized by `XmiCasDeserializer.deserialize`.
> Here is a way to reproduce this:
> {code:java}
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.InputStream;
> import java.io.OutputStream;
> import org.apache.uima.cas.impl.XmiCasDeserializer;
> import org.apache.uima.cas.impl.XmiCasSerializer;
> import org.apache.uima.fit.factory.JCasFactory;
> import org.apache.uima.jcas.JCas;
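> public class Test {
>     public static void main(String[] args) throws Exception {
>         JCas jCas = JCasFactory.createJCas();
>         // This character lies outside the BMP, so Java stores it as a surrogate pair.
>         jCas.setDocumentText("𝒪");
>         File file = new File("/tmp/test.xmi");
>         // Serialize the CAS to XMI and close the stream before reading the file back.
>         try (OutputStream outputStream = new FileOutputStream(file)) {
>             XmiCasSerializer.serialize(jCas.getCas(), outputStream);
>         }
>         // Deserializing the same file triggers the SAXParseException shown below.
>         try (InputStream inputStream = new FileInputStream(file)) {
>             XmiCasDeserializer.deserialize(inputStream, jCas.getCas());
>         }
>     }
> }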
> {code}
> And here is the stacktrace:
> {code}
> [Fatal Error] :1:350: Character reference "&#56490" is an invalid XML character.
> Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 350; Character reference "&#56490" is an invalid XML character.
> 	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> 	at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1955)
> 	at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1872)
> 	at Test.main(Test.java:24)
>      [java] Java Result: 1
> {code}
> Please tell me if you need more information.
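
A note on the number in the error message above: 56490 is exactly the low UTF-16 surrogate of '𝒪' (U+1D4AA). XML 1.0 does not allow surrogate code points as characters, so a reference to one half of the pair is rejected by the parser. This suggests, though I have not verified it in the serializer code, that the two UTF-16 halves were written out as separate character references instead of a single reference to the full code point. The small check below only illustrates the arithmetic; the class name SurrogateCheck and its helper are made up for this example.

```java
// Illustrative only; not part of UIMA.
public class SurrogateCheck {

    /** Returns the UTF-16 code units of the first code point of s as decimal values. */
    public static int[] utf16Units(String s) {
        char[] units = Character.toChars(s.codePointAt(0));
        int[] out = new int[units.length];
        for (int i = 0; i < units.length; i++) {
            out[i] = units[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // U+1D4AA (mathematical script capital O), written with escapes to avoid encoding issues.
        String text = "\uD835\uDCAA";
        int codePoint = text.codePointAt(0);
        int[] units = utf16Units(text);

        System.out.printf("code point: U+%X (decimal %d)%n", codePoint, codePoint);
        // prints: code point: U+1D4AA (decimal 119978)
        System.out.printf("high surrogate: %d, low surrogate: %d%n", units[0], units[1]);
        // prints: high surrogate: 55349, low surrogate: 56490

        // A well-formed character reference names the full code point, not a surrogate.
        System.out.println("valid reference: &#" + codePoint + ";");
        // prints: valid reference: &#119978;
    }
}
```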


