You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@uima.apache.org by "Marshall Schor (JIRA)" <de...@uima.apache.org> on 2014/11/21 19:35:33 UTC

[jira] [Resolved] (UIMA-4003) add alternative int - int maps, and int sets for better space/time performance

     [ https://issues.apache.org/jira/browse/UIMA-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall Schor resolved UIMA-4003.
----------------------------------
    Resolution: Fixed

Switched ListUtils to use PositiveIntSet

> add alternative int - int maps, and int sets for better space/time performance
> ------------------------------------------------------------------------------
>
>                 Key: UIMA-4003
>                 URL: https://issues.apache.org/jira/browse/UIMA-4003
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.6.0SDK
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 2.7.0SDK
>
>
> int - int maps and int sets are implemented in UIMA as either red-black trees (size = 5 words (4 words for sets) + 1 bit per item, search time = log 2 size (binary search), insert /removal can cause rebalancing tree), or as intVectors (like ArrayList<Integers> but doesn't wrap ints as Integers).
> For int - int maps, add a hash version (loses key "ordering"), which takes 3 - 6 words per item (avg 4.5 words - slightly smaller), and has O(1) performance (based on existing JCasHashMap impl, but without concurrency support).
> For int sets, add 2 styles: one based on the same hash map (space = 1.5 to 3 words per item vs 4 words), the other based on BitSets.  The BitSet one would be faster, would have a conventional key ordering, and would be smaller, if the max int stored < 128 * number of items stored.  (This is often the case in uses where it is used to track Feature Structure addresses).
> Finally, add one adaptable int set that is usually is implemented as a BitSet, but if the items being stored are such that the size would be significantly larger than the IntHashSet implementation, switch to that alternative representation.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)