Posted to dev@uima.apache.org by "Marshall Schor (JIRA)" <de...@uima.apache.org> on 2013/11/07 18:05:17 UTC

[jira] [Commented] (UIMA-3413) improve remove-from-index performance

    [ https://issues.apache.org/jira/browse/UIMA-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816124#comment-13816124 ] 

Marshall Schor commented on UIMA-3413:
--------------------------------------

I ran a test that removed 40,000 items from the indexes. The items were indexed only in a bag index (that is, the FSs were not subtypes of Annotation; otherwise they would also have been indexed in the default annotation index). Results:

With the current implementation, removing the items in the same order as they were created (FIFO) took about 150 ms.
After the optimization, it still took about 150 ms.  (This is expected, because the optimization applies only when the items are removed in LIFO order, not FIFO order.)

When the removes are done in LIFO order, the current implementation took about 700 ms.  (This is because each remove has to scan all the way through the list to find the item to remove.)
After the optimization, this dropped to about 5 ms, a 140x improvement :-).
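
To make the LIFO numbers concrete, here is a minimal sketch of why scan direction matters. It is not the UIMA bag index code: BagIndexSketch and all of its methods are hypothetical names, and scanning from the end is only one assumed way to exploit the ordering. It counts how many elements each remove strategy examines.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a bag index; NOT UIMA's implementation. */
final class BagIndexSketch {
    private int[] items = new int[16];
    private int size = 0;
    long comparisons = 0; // elements examined by all remove calls so far

    void add(int fsAddr) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2);
        }
        items[size++] = fsAddr;
    }

    /** Scans from the front: a LIFO remove has to walk the whole list. */
    boolean removeScanningForward(int fsAddr) {
        for (int i = 0; i < size; i++) {
            comparisons++;
            if (items[i] == fsAddr) { removeAt(i); return true; }
        }
        return false;
    }

    /** Scans from the back (assumed strategy): a LIFO remove hits at once. */
    boolean removeScanningBackward(int fsAddr) {
        for (int i = size - 1; i >= 0; i--) {
            comparisons++;
            if (items[i] == fsAddr) { removeAt(i); return true; }
        }
        return false;
    }

    /** Closes the gap at position i by shifting the tail left by one. */
    private void removeAt(int i) {
        System.arraycopy(items, i + 1, items, i, size - i - 1);
        size--;
    }

    /** Adds 0..n-1, removes them in LIFO order, returns elements examined. */
    static long lifoCost(int n, boolean backward) {
        BagIndexSketch bag = new BagIndexSketch();
        for (int i = 0; i < n; i++) bag.add(i);
        for (int i = n - 1; i >= 0; i--) {
            if (backward) bag.removeScanningBackward(i);
            else bag.removeScanningForward(i);
        }
        return bag.comparisons;
    }

    public static void main(String[] args) {
        int n = 40_000;
        System.out.println("forward scan, LIFO removes:  " + lifoCost(n, false));
        System.out.println("backward scan, LIFO removes: " + lifoCost(n, true));
    }
}
```

In this sketch, LIFO removal of n items examines n(n+1)/2 elements with the forward scan and n elements with the backward scan. The counts only illustrate the shape of the cost; they are not a reproduction of the timings above.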

While writing the test cases for this, I found some edge cases where the sorted index threw array-index-out-of-bounds exceptions.  I also noticed that repeatedly adding lots of items to a set index and then removing them all caused the space used by the set index to keep growing.  (The current implementation of the set index uses the int red-black tree, and deletes from that tree do not reclaim space.)
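
The space growth can be illustrated with a small sketch of an array-backed node pool. This is an assumption-laden illustration, not the int red-black tree used by UIMA: NodePoolSketch, its methods, and the free-list variant are all hypothetical, and the free list is shown only as a contrast, not as the fix that was or will be applied.

```java
import java.util.Arrays;

/** Hypothetical array-backed node pool; NOT UIMA's int red-black tree. */
final class NodePoolSketch {
    private int[] keys = new int[8];
    private int next = 0;       // next never-used slot
    private int freeHead = -1;  // head of the free list, -1 when empty
    private final boolean reuseFreed;

    NodePoolSketch(boolean reuseFreed) { this.reuseFreed = reuseFreed; }

    /** Stores key in a slot and returns the slot number. */
    int allocate(int key) {
        int slot;
        if (reuseFreed && freeHead != -1) {
            slot = freeHead;
            freeHead = keys[slot]; // a freed slot holds the next free slot
        } else {
            if (next == keys.length) keys = Arrays.copyOf(keys, next * 2);
            slot = next++;
        }
        keys[slot] = key;
        return slot;
    }

    /** Releases a slot; without reuse the slot is simply abandoned. */
    void free(int slot) {
        if (reuseFreed) {
            keys[slot] = freeHead;
            freeHead = slot;
        }
    }

    /** Number of slots ever taken from the backing array. */
    int slotsUsed() { return next; }

    /** Repeats "add perCycle items, then remove them all" for the given cycles. */
    static int slotsAfterCycles(boolean reuse, int cycles, int perCycle) {
        NodePoolSketch pool = new NodePoolSketch(reuse);
        int[] slots = new int[perCycle];
        for (int c = 0; c < cycles; c++) {
            for (int i = 0; i < perCycle; i++) slots[i] = pool.allocate(i);
            for (int i = 0; i < perCycle; i++) pool.free(slots[i]);
        }
        return pool.slotsUsed();
    }
}
```

With 10 cycles of 100 items, the pool without reuse ends up having consumed 1000 slots, while the free-list variant stays at 100. The first case mirrors the symptom described above: each add/remove cycle consumes fresh space even though the index is empty at the end of the cycle.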

> improve remove-from-index performance
> -------------------------------------
>
>                 Key: UIMA-3413
>                 URL: https://issues.apache.org/jira/browse/UIMA-3413
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.4.2SDK
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 2.5.0SDK
>
>
> Although UIMA-2434 improved the time to remove from a sorted index, removal time from bag indexes is still likely to be on the order of the number of elements in the index, because a sequential scan is done.  The elements in bag indexes are somewhat likely to be ordered in ascending FS heap index order.  This can be exploited to improve the performance of remove-from-index.


