Posted to dev@uima.apache.org by "Richard Eckart de Castilho (Jira)" <de...@uima.apache.org> on 2019/12/18 16:59:00 UTC

[jira] [Updated] (UIMA-6162) Concurrent binary serialization produces corrupt output

     [ https://issues.apache.org/jira/browse/UIMA-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Eckart de Castilho updated UIMA-6162:
---------------------------------------------
    Summary: Concurrent binary serialization produces corrupt output  (was: Sofa not found when deserializing CAS)

> Concurrent binary serialization produces corrupt output
> -------------------------------------------------------
>
>                 Key: UIMA-6162
>                 URL: https://issues.apache.org/jira/browse/UIMA-6162
>             Project: UIMA
>          Issue Type: Bug
>          Components: UIMA
>    Affects Versions: 3.1.1SDK
>            Reporter: Richard Eckart de Castilho
>            Priority: Major
>         Attachments: admin.ser
>
>
> I suspect there could be an issue in `BinaryCasSerDes`.
> When deserializing the attached file `admin.ser`, I get this stack trace:
> {code:java}
> Caused by: java.lang.ClassCastException: class org.apache.uima.jcas.tcas.Annotation cannot be cast to class org.apache.uima.jcas.cas.Sofa (org.apache.uima.jcas.tcas.Annotation and org.apache.uima.jcas.cas.Sofa are in unnamed module of loader org.apache.catalina.loader.ParallelWebappClassLoader @4593ff34)
>     at org.apache.uima.cas.impl.BinaryCasSerDes.makeSofaFromHeap(BinaryCasSerDes.java:1823) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.getSofaFromAnnotBase(BinaryCasSerDes.java:1817) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.createFSsFromHeaps(BinaryCasSerDes.java:1701) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.reinit(BinaryCasSerDes.java:259) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.reinit(BinaryCasSerDes.java:328) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.Serialization.deserializeCASComplete(Serialization.java:129) ~[uimaj-core-3.1.1.jar:3.1.1]
> {code}
> The code used to read the file before deserializing is as follows:
> {code:java}
>     public static void readSerializedCas(CAS aCas, File aFile)
>         throws IOException
>     {
>         try (ObjectInputStream is = new ObjectInputStream(new FileInputStream(aFile))) {
>             CASCompleteSerializer serializer = (CASCompleteSerializer) is.readObject();
>             deserializeCASComplete(serializer, (CASImpl) aCas);
>         }
>         catch (ClassNotFoundException e) {
>             throw new IOException(e);
>         }
>     }
> {code}
> I set a breakpoint at BinaryCasSerDes:1608, which is a for loop iterating over the heap. Apparently, the first feature structure encountered is an annotation, NOT the SOFA. Then, in line 1700, the deserializer tries to resolve the SOFA for this annotation but fails because the SOFA has not yet been deserialized. Eventually makeSofaFromHeap is called and checks whether a SOFA needs to be created. It looks up the SOFA's ID (1) via csds.addr2fs.get(sofaAddr) (BinaryCasSerDes:1821) and generates a new SOFA. However, when the SECOND annotation is read, csds.addr2fs.get(sofaAddr) (BinaryCasSerDes:1821) is called again to resolve the SOFA at addr 1, and it returns the previously deserialized annotation instead of the SOFA that had been created.
> The SOFA that has been implicitly created is added to the csds.addr2fs map at key 1. However, later, at BinaryCasSerDes:1723, key 1 is overwritten by the deserialized annotation:
> {code}
>         if (!isSofa) { // if it was a sofa, other code added or pended it
>           csds.addFS(fs, heapIndex); // this overwrites the SOFA that was created at key 1 because heapIndex is also 1
>         }
> {code}
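
The overwrite described above can be replayed with a plain map. This is only a sketch of the sequence as reported, not the UIMA implementation: the `Sofa` and `Annotation` classes, the `replay` method, and the `HashMap` standing in for `csds.addr2fs` are all hypothetical stand-ins, and the assumption is that the SOFA reference and the first annotation's heap index are both 1, as in the report.

```java
import java.util.HashMap;
import java.util.Map;

public class Addr2FsCollisionSketch {
    // Hypothetical stand-ins for feature structures; NOT the real UIMA classes.
    public static class Sofa {}
    public static class Annotation {}

    /** Replays the reported lookup sequence and returns what the second lookup finds. */
    public static Object replay() {
        // Stand-in for csds.addr2fs (assumption: behaves like an address-to-object map).
        Map<Integer, Object> addr2fs = new HashMap<>();
        int sofaAddr = 1;  // SOFA reference stored in the first annotation
        int heapIndex = 1; // heap address of that same annotation

        // First annotation: nothing is registered at sofaAddr, so a SOFA is created.
        if (addr2fs.get(sofaAddr) == null) {
            addr2fs.put(sofaAddr, new Sofa());
        }

        // The annotation is not a SOFA, so it is registered at its own heap index.
        // Because heapIndex == sofaAddr, this replaces the SOFA stored a moment ago.
        addr2fs.put(heapIndex, new Annotation());

        // Second annotation: the same lookup now returns the annotation.
        return addr2fs.get(sofaAddr);
    }

    public static void main(String[] args) {
        Object found = replay();
        System.out.println("Lookup at SOFA address is a Sofa: " + (found instanceof Sofa));
    }
}
```

In this sketch, casting the result of `replay()` to `Sofa` would fail with a `ClassCastException`, which matches the kind of failure shown in the stack trace.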
> The heap looks something like this:
> {code}
> [0, 187, 1, 33, 46, 199, 200, 201, 44, 202, 187, 1, 33, 46, 203, 204, 205, 45, 206, 187, 1, 33, 46, 207, 208, 209, 46, 210, 187, 1, 33, 46, 211, 212, 213, 47, 214, 187, 1, 33, 46, 215, 216, 217, 48, 1, 187, 1,...
> {code}
> I guess that 187 is the type code of the first annotation and we can see it repeats a couple of times. The 1 seems to be the SOFA ID - the first feature of the feature structures. However, instead of 1 referring to the address of the SOFA, it points at the first annotation which is NOT a SOFA.
> Could this be a bug in the serialization code, which assumes that the SOFA is always in the first position?
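
The reading of the heap dump above can be checked mechanically. The sketch below is not based on the UIMA heap format documentation. It assumes the layout guessed in the report (type code in the first slot of a record, SOFA reference in the next slot) and a record length of 9 slots, which is inferred only from where 187 repeats in the quoted prefix. The class and method names are hypothetical.

```java
public class HeapDumpSketch {
    // Prefix of the heap quoted in the report; the original dump is truncated after this.
    public static final int[] HEAP = {
        0,
        187, 1, 33, 46, 199, 200, 201, 44, 202,
        187, 1, 33, 46, 203, 204, 205, 45, 206
    };

    // Assumption: each record in this prefix spans 9 slots (187 repeats at 1 and 10).
    public static final int RECORD_SIZE = 9;

    /**
     * Follows the SOFA reference of the record starting at fsAddr and returns the
     * type code found at the referenced address.
     * Assumed layout: heap[fsAddr] = type code, heap[fsAddr + 1] = SOFA reference.
     */
    public static int typeCodeAtSofaRef(int[] heap, int fsAddr) {
        int sofaAddr = heap[fsAddr + 1];
        return heap[sofaAddr];
    }

    public static void main(String[] args) {
        // Walk the records in the prefix, starting at address 1 (slot 0 is skipped).
        for (int addr = 1; addr + RECORD_SIZE <= HEAP.length; addr += RECORD_SIZE) {
            System.out.println("record at " + addr + ": type " + HEAP[addr]
                + ", SOFA reference resolves to type " + typeCodeAtSofaRef(HEAP, addr));
        }
    }
}
```

Under these assumptions, the SOFA reference of both records in the prefix resolves to address 1, whose type code is 187, the same type code as the annotations themselves.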


