Posted to dev@uima.apache.org by "Marshall Schor (JIRA)" <de...@uima.apache.org> on 2015/05/05 05:42:06 UTC

[jira] [Resolved] (UIMA-4357) create auxiliary flattened version of index and its subtypes, automatically managed

     [ https://issues.apache.org/jira/browse/UIMA-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall Schor resolved UIMA-4357.
----------------------------------
    Resolution: Fixed

> create auxiliary flattened version of index and its subtypes, automatically managed
> -----------------------------------------------------------------------------------
>
>                 Key: UIMA-4357
>                 URL: https://issues.apache.org/jira/browse/UIMA-4357
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>            Reporter: Marshall Schor
>            Priority: Minor
>             Fix For: 2.7.1SDK
>
>
> UIMA indexes allow retrieving items from the CAS, trading space (for the indexes) for time (speed of finding items in the CAS, speed of iterating). For a sorted index over a type with subtypes, if the index isn't being modified, it is possible to do a one-time extraction of the items in sorted order, save them in an array, and iterate much more rapidly over that array. I've seen many UIMA flows where some annotators create and index a type (and its subtypes), after which the indexes for these types are not updated again, but downstream annotators iterate over them. Lazy creation of this kind of flattened index would likely work well in many such cases.
> It is important, I think, to preserve the same kind of ConcurrentModificationException detection that iteration has today. To make this additional space-time trade-off automatic and reasonable, make the flattened index reachable via a SoftReference, so the GC can reclaim the space if needed.
> Delay the creation of a flattened version until there's evidence that the index will remain unmodified for some time. As the measure that motivates its creation, count the number of times iterators over the index run the "heapifyUp/Down" code that manages the ordering of the subiterators to preserve sort order. A basic indicator may be that count, accumulated without an intervening update to the indexes, relative to the size of the index.
> The flattened array can save a bit more time by holding references to the Java cover class instance (JCas or non-JCas) for each item.
> CAS reset needs to clear out this flattened state.
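
The scheme described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the UIMA implementation: the class FlattenedIndexSketch, its methods, the use of Integer items in place of feature structures, and the threshold of two reorder steps per indexed item are all hypothetical. A linear scan over the sub-index heads stands in for the heapifyUp/Down merge, and each merge step is counted. Once enough steps accumulate with no intervening update, the next iterator builds a flattened array held via a SoftReference. Any update or reset bumps a modification counter, discards the array, and makes outstanding iterators throw ConcurrentModificationException.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a sorted index over a type and its subtypes. */
public class FlattenedIndexSketch {
    /** Assumed threshold: flatten after this many merge steps per indexed item. */
    private static final int REORDERS_PER_ITEM = 2;

    private final List<List<Integer>> subIndexes = new ArrayList<>(); // one sorted list per subtype
    private int modCount = 0;      // bumped on every update or reset
    private int reorderCount = 0;  // merge steps since the last update
    private SoftReference<Integer[]> flattened = new SoftReference<>(null);

    public FlattenedIndexSketch(int subtypeCount) {
        for (int i = 0; i < subtypeCount; i++) subIndexes.add(new ArrayList<>());
    }

    /** Inserts an item into one subtype's sorted sub-index. */
    public void add(int subtype, int item) {
        List<Integer> sub = subIndexes.get(subtype);
        int pos = Collections.binarySearch(sub, item);
        sub.add(pos < 0 ? -pos - 1 : pos, item);
        invalidate();
    }

    /** Analogue of a CAS reset: empties the index and drops the flattened state. */
    public void reset() {
        for (List<Integer> sub : subIndexes) sub.clear();
        invalidate();
    }

    private void invalidate() {
        modCount++;
        reorderCount = 0;
        flattened.clear();
    }

    public int size() {
        int n = 0;
        for (List<Integer> sub : subIndexes) n += sub.size();
        return n;
    }

    public boolean isFlattened() { return flattened.get() != null; }

    /** Returns a fast array-backed iterator once the index looks stable, else a merging one. */
    public Iterator<Integer> iterator() {
        Integer[] flat = flattened.get();
        int n = size();
        if (flat == null && n > 0 && reorderCount >= REORDERS_PER_ITEM * n) {
            List<Integer> all = new ArrayList<>(n);
            for (List<Integer> sub : subIndexes) all.addAll(sub);
            Collections.sort(all);               // one-time extraction in sorted order
            flat = all.toArray(new Integer[0]);
            flattened = new SoftReference<>(flat); // GC may reclaim it under memory pressure
        }
        return new Iter(flat);
    }

    private final class Iter implements Iterator<Integer> {
        private final int expectedModCount = modCount;
        private final Integer[] flat;  // non-null on the fast path
        private int flatPos = 0;
        private final int[] pos = new int[subIndexes.size()];

        Iter(Integer[] flat) { this.flat = flat; }

        private void checkForModification() {
            if (modCount != expectedModCount) throw new ConcurrentModificationException();
        }

        @Override public boolean hasNext() {
            checkForModification();
            if (flat != null) return flatPos < flat.length;
            for (int i = 0; i < pos.length; i++) {
                if (pos[i] < subIndexes.get(i).size()) return true;
            }
            return false;
        }

        @Override public Integer next() {
            checkForModification();
            if (flat != null) {
                if (flatPos >= flat.length) throw new NoSuchElementException();
                return flat[flatPos++];
            }
            int best = -1; // sub-index whose head is smallest (stand-in for the heap)
            for (int i = 0; i < pos.length; i++) {
                List<Integer> sub = subIndexes.get(i);
                if (pos[i] < sub.size() && (best < 0
                        || sub.get(pos[i]).compareTo(subIndexes.get(best).get(pos[best])) < 0)) {
                    best = i;
                }
            }
            if (best < 0) throw new NoSuchElementException();
            reorderCount++; // evidence that a flattened copy would pay off
            return subIndexes.get(best).get(pos[best]++);
        }
    }
}
```

With four items spread over two subtypes, two complete merged passes accumulate eight reorder steps, so the third call to iterator() takes the array path. A later add() discards the array, and an iterator obtained before that add() fails on its next use.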



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)