Posted to dev@uima.apache.org by "Marshall Schor (JIRA)" <de...@uima.apache.org> on 2016/03/03 17:48:18 UTC

[jira] [Updated] (UIMA-4820) uv3 Supporting Delta deserialization requires preserving simulated heap addresses

     [ https://issues.apache.org/jira/browse/UIMA-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall Schor updated UIMA-4820:
---------------------------------
    Summary: uv3 Supporting Delta deserialization requires preserving simulated heap addresses  (was: uv3 Supporting Delta deserialization requires holding on to FSs serialized)

> uv3 Supporting Delta deserialization requires preserving simulated heap addresses
> ---------------------------------------------------------------------------------
>
>                 Key: UIMA-4820
>                 URL: https://issues.apache.org/jira/browse/UIMA-4820
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>             Fix For: 3.0.0SDKexp
>
>
> UIMA supports several formats of delta deserialization. A CAS is serialized (for example, to a remote service), and a later delta serialization returns just the changes, which are applied back to the original CAS.
> There are two approaches used to get the set of FSs to serialize.  
> * One way, used for plain binary and form4 compressed, scans the "heap" sequentially and sends every FS it finds, potentially including FSs that are not "reachable".
> * The other way uses the indexes plus reference chains to locate all "reachable" FSs and sends only those. It is used for XCAS, XMI, JSON, and form6 compressed.
> In V3, plain and form4 serialization must preserve the simulated heap "addresses" (per CAS) of the FSs sent, so that future delta deserializations can use the proper "heap" addresses. These addresses cannot be recalculated from the CAS FS contents, because intervening GCs may have garbage collected some unreachable FSs, which would shift the recalculated addresses of the FSs that follow them.
> Likewise, when a plain or form4 non-delta deserialization will be followed by a delta serialization, it must preserve these simulated heap addresses (per CAS) for all deserialized FSs.
> This preservation ensures that the simulated "addresses" of FSs stay constant, even if unreachable FSs are reclaimed. In practice, the various maps involving simulated heap "addresses" must be retained rather than recreated.
> Because these maps are retained, their storage must be released when no longer needed: at CAS reset time, after a service's delta deserializer has finished deserializing (potentially multiple) delta CASes, or when a new non-delta serialization is started (which re-creates this storage). For services, we may add a new API to release this storage. The service would call it after all delta deserializations for this CAS have been received. This use case supports multiple remotes working on a common CAS, with their delta results merged back into the original CAS.
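
The two ways of selecting which FSs to serialize, as described in the issue, can be contrasted with a minimal Java sketch. It is not UIMA's implementation. The Fs class, its refs list, and the FsSelection methods are hypothetical stand-ins. "Allocation order" plays the role of the heap, and the "indexed" list plays the role of the index contents.

```java
import java.util.*;

// Hypothetical stand-in for a CAS feature structure: an id plus outgoing FS references.
final class Fs {
    final String id;
    final List<Fs> refs = new ArrayList<>();
    Fs(String id) { this.id = id; }
}

final class FsSelection {
    // Approach 1 (plain binary, form4): take every FS in allocation ("heap")
    // order, whether or not anything still refers to it.
    static List<Fs> sequentialScan(List<Fs> allocationOrder) {
        return new ArrayList<>(allocationOrder);
    }

    // Approach 2 (XCAS, XMI, JSON, form6): start from the indexed FSs and
    // follow reference chains breadth-first, collecting only reachable FSs.
    static List<Fs> reachable(List<Fs> indexed) {
        Set<Fs> seen = new LinkedHashSet<>();
        Deque<Fs> todo = new ArrayDeque<>(indexed);
        while (!todo.isEmpty()) {
            Fs fs = todo.pop();
            if (seen.add(fs)) {   // first visit: queue its references
                todo.addAll(fs.refs);
            }
        }
        return new ArrayList<>(seen);
    }
}
```

Suppose a refers to c and nothing refers to b. Then sequentialScan over [a, b, c] returns all three FSs, while reachable starting from [a] returns only a and c.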

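A toy model shows why the simulated addresses cannot simply be recalculated after a GC. It rests on an assumption not stated in the issue: addresses are assigned by laying FSs out end to end in allocation order, starting at 1 so that 0 can stand for a null reference. SimFs and SimAddresses are hypothetical names.

```java
import java.util.*;

// Hypothetical FS model: an id and the number of simulated heap cells it would occupy.
final class SimFs {
    final String id;
    final int size;
    SimFs(String id, int size) { this.id = id; this.size = size; }
}

final class SimAddresses {
    // Assign addresses by laying the given FSs out end to end, starting at 1
    // (0 is left free to mean "null", an assumption of this sketch).
    static Map<SimFs, Integer> compute(List<SimFs> live) {
        Map<SimFs, Integer> addr = new IdentityHashMap<>();
        int next = 1;
        for (SimFs fs : live) {
            addr.put(fs, next);
            next += fs.size;
        }
        return addr;
    }
}
```

Take sizes a=3, b=2, c=4. The addresses computed at serialization time are a=1, b=4, c=6. If b is unreachable and gets garbage collected, recomputing over [a, c] yields c=4. A returning delta that refers to c as address 6 would then resolve incorrectly. Only the map retained from the original serialization still maps c to 6.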

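The retained maps and their release points can be sketched as a per-CAS table. This is a hypothetical class, not an existing UIMA API. In this sketch, record is called with each FS's simulated address as a plain or form4 non-delta deserializer reads it. A later delta deserializer calls resolve. release is called at the points listed in the issue: CAS reset, after the last expected delta, or before a new non-delta serialization rebuilds the table.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical per-CAS table of simulated heap addresses. It is kept alive
// between a non-delta (de)serialization and the delta deserializations that
// follow, instead of being recreated from the current CAS contents.
final class SimAddrTable {
    private final Map<Integer, Object> addr2fs = new HashMap<>();
    private final Map<Object, Integer> fs2addr = new IdentityHashMap<>();

    // Remember the address an FS had in the serialized form.
    void record(int addr, Object fs) {
        addr2fs.put(addr, fs);
        fs2addr.put(fs, addr);
    }

    // Map an address from an incoming delta back to the original FS,
    // or null if no FS was recorded at that address.
    Object resolve(int addr) {
        return addr2fs.get(addr);
    }

    // Address to use when referring to an existing FS, or null if unknown.
    Integer addressOf(Object fs) {
        return fs2addr.get(fs);
    }

    // Drop the storage once no further delta is expected for this CAS.
    void release() {
        addr2fs.clear();
        fs2addr.clear();
    }

    int size() {
        return addr2fs.size();
    }
}
```

In this sketch the maps hold strong references, so while the table is alive the recorded FSs cannot be garbage collected even if they become unreachable in the CAS. That is one more reason to release the table promptly once no further deltas are expected.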

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)