Posted to dev@uima.apache.org by "Marshall Schor (JIRA)" <de...@uima.apache.org> on 2017/12/08 15:17:00 UTC

[jira] [Comment Edited] (UIMA-5662) uv3 support CAS deserialization subsequent low level access

    [ https://issues.apache.org/jira/browse/UIMA-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283707#comment-16283707 ] 

Marshall Schor edited comment on UIMA-5662 at 12/8/17 3:16 PM:
---------------------------------------------------------------

I'm thinking about an alternative approach.  Another Jira (UIMA-5664) adds a semi-built-in new type, a map from int -> FS.  Another approach to deserialization could be:

1) change most deserializers to set the fsId to the v2-fs-id value imputed from the serialized form, either explicitly or implicitly.  
* This allows a subsequent serialization to reuse these, keeping them more-or-less stable. 
* This would be independent of supporting low-level-access, remembering the fs-id -> fs relation in a map, etc.

2) Allow users calling deserializers to specify a map from int -> FS (map of their choice, for instance a HashMap or LinkedHashMap).  If such a parameter is provided, the deserializers would add all the deserialized FSs to the map, with key being the v2-fs-id.

3) The deserializers would **not** add the FSs to the internal hidden map that the low-level CAS API uses to get an FS from a ref.
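
A rough sketch of steps 1) to 3), to make the proposal concrete. Everything here is a made-up stand-in: Fs, SerializedFs and deserialize(...) are hypothetical names, not UIMA classes or APIs, and the "serialized form" is just a list of (v2 id, type name) pairs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeserializeWithIdMapSketch {

    /** Hypothetical stand-in for a feature structure; not the UIMA class. */
    public static final class Fs {
        public final int id;      // step 1: taken from the v2-fs-id in the serialized form
        public final String type;
        public Fs(int id, String type) { this.id = id; this.type = type; }
    }

    /** Hypothetical serialized item: a v2 id plus a type name. */
    public static final class SerializedFs {
        public final int v2Id;
        public final String type;
        public SerializedFs(int v2Id, String type) { this.v2Id = v2Id; this.type = type; }
    }

    /**
     * Hypothetical deserializer entry point.
     * idMap may be null. When supplied, every deserialized FS is added to it,
     * keyed by its v2 id (step 2). No internal id table is touched (step 3).
     */
    public static List<Fs> deserialize(List<SerializedFs> input, Map<Integer, Fs> idMap) {
        List<Fs> result = new ArrayList<>();
        for (SerializedFs s : input) {
            Fs fs = new Fs(s.v2Id, s.type);   // fsId reuses the v2 id
            result.add(fs);
            if (idMap != null) {
                idMap.put(s.v2Id, fs);        // only the caller's map remembers id -> FS
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<SerializedFs> input = new ArrayList<>();
        input.add(new SerializedFs(12, "Token"));
        input.add(new SerializedFs(20, "Sentence"));

        // The caller picks the map implementation.
        Map<Integer, Fs> idMap = new LinkedHashMap<>();
        deserialize(input, idMap);
        System.out.println(idMap.get(20).type);
    }
}
```

A caller that doesn't care about ids passes null and pays nothing for the id -> FS bookkeeping.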

The pluses and minuses of this seem to be:  
* 3a) minus: low-level CAS access doesn't work.  Code using it needs to be upgraded.
* 3b) plus: the internal low-level CAS access map is not used, which allows FSs to be garbage collected.
* 3c) plus: users have more explicit control of what stays in the map.  Things not in the map could be GCd. Depending on their use case, they could
** clear the map after use
** remove selected entries
* 3d) Going forward, they could convert their application to save the map in the CAS (using the new semi-built-in type); they could support multiple maps, etc.
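
To illustrate 3c), here is a minimal sketch of what "explicit control" could look like on the user side. It uses only java.util; the class and method names (UserOwnedIdMapSketch, retainIdRange) and the String values standing in for FSs are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class UserOwnedIdMapSketch {

    /**
     * Keeps only the entries whose id lies in [lo, hi].
     * Removed values become collectable once nothing else references them.
     */
    public static <V> void retainIdRange(Map<Integer, V> idMap, int lo, int hi) {
        idMap.keySet().removeIf(id -> id < lo || id > hi);
    }

    public static void main(String[] args) {
        // Stand-in for a map filled by a deserializer; values would be FSs.
        Map<Integer, String> idMap = new HashMap<>();
        idMap.put(5, "fs-5");
        idMap.put(12, "fs-12");
        idMap.put(20, "fs-20");

        retainIdRange(idMap, 10, 20);   // "remove selected entries"
        System.out.println(idMap.keySet());

        idMap.clear();                  // "clear the map after use"
        System.out.println(idMap.isEmpty());
    }
}
```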

Does this sound like a good general direction?  Other thoughts?


> uv3 support CAS deserialization subsequent low level access
> -----------------------------------------------------------
>
>                 Key: UIMA-5662
>                 URL: https://issues.apache.org/jira/browse/UIMA-5662
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 3.0.0SDK-beta
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDK
>
>
> Some users depend on 1) constant v2-ids for FSs preserved in deserialization and serialization, and 2) low level cas API access to these.
> V3 normally doesn't maintain tables linking ids to FSs, as these (unless weak refs are used) prevent GC of unreachable FSs.
> Based on a mode, set by -Duima.deserialize_preserve_ids and also controllable by a new config option per deserialize call, alter the deserialization for those deserializers which know about v2 ids: put the FSs into the map used for low-level CAS access, using the actual v2 ids, and change the v3 next available id for future new FSs to be 1 beyond the end.
> The -Duima.deserialize-preserve_ids global setting is needed to handle the use case of some annotators using low-level APIs, when part of a pipeline is "remoted". 
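
For the mode described in the quoted issue text, a small sketch of the two mechanical pieces: deciding whether ids are preserved, and moving the next available id past the deserialized ones. The names here are hypothetical. The property string is copied from the last line of the description (the description spells it two ways, so treat it as unverified), and both "a per-call option overrides the global setting" and "the property counts as set when it is defined at all" are assumptions.

```java
import java.util.Arrays;
import java.util.Collection;

public class PreserveIdsModeSketch {

    // Unverified spelling, copied from the issue description.
    static final String PROP = "uima.deserialize-preserve_ids";

    /**
     * perCallOption == null means "not specified on this call", so the
     * global -D setting decides. A non-null value wins (assumption).
     */
    public static boolean preserveIds(Boolean perCallOption) {
        if (perCallOption != null) {
            return perCallOption;
        }
        // -Dname with no value defines the property as an empty string.
        return System.getProperty(PROP) != null;
    }

    /**
     * Next id for new FSs: 1 beyond the largest deserialized v2 id,
     * and never lower than the id that was next before (assumption).
     */
    public static int nextAvailableId(Collection<Integer> v2Ids, int currentNext) {
        int max = 0;
        for (int id : v2Ids) {
            max = Math.max(max, id);
        }
        return Math.max(currentNext, max + 1);
    }

    public static void main(String[] args) {
        System.out.println(preserveIds(Boolean.TRUE));
        System.out.println(nextAvailableId(Arrays.asList(3, 17, 9), 1));
    }
}
```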


