Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2016/09/14 14:11:15 UTC

uv3 iterators

Version 2 had snapshot iterators, used for two purposes:

a) allowing underlying index modifications while iterating (over the snapshot).
Note that this includes even simple things like changing begin/end values in an
annotation (which could cause a remove/add-back to indexes action while those
features are changed).

b) performance, in some edge cases (although creating the snapshot has an
up-front performance cost).
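Purpose (a) can be illustrated with a minimal Java sketch of a snapshot iterator. It is not UIMA code. `Ann`, `BY_SPAN`, `shift` and `shiftAllViaSnapshot` are hypothetical names, and a `TreeSet` stands in for a sorted index. Changing a key feature such as begin is modeled as remove / update / add-back. Because the loop walks a copy taken up front, those index updates cannot disturb the iteration:

```java
import java.util.Comparator;
import java.util.TreeSet;

public class SnapshotDemo {
    // Hypothetical stand-in for an annotation whose begin/end are index keys.
    static final class Ann {
        final int id;
        int begin, end;
        Ann(int id, int begin, int end) { this.id = id; this.begin = begin; this.end = end; }
    }

    // Sorted "index" keyed on begin, then end, then id (to keep entries distinct).
    static final Comparator<Ann> BY_SPAN = Comparator
            .comparingInt((Ann a) -> a.begin)
            .thenComparingInt(a -> a.end)
            .thenComparingInt(a -> a.id);

    // Changing key features means remove, update, then add back.
    static void shift(TreeSet<Ann> index, Ann a, int delta) {
        index.remove(a);
        a.begin += delta;
        a.end += delta;
        index.add(a);
    }

    // Iterates over a snapshot copy, so index updates inside the loop are safe.
    static int shiftAllViaSnapshot(TreeSet<Ann> index, int delta) {
        Ann[] snapshot = index.toArray(new Ann[0]); // the up-front copying cost
        int visited = 0;
        for (Ann a : snapshot) {
            shift(index, a, delta);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        TreeSet<Ann> index = new TreeSet<>(BY_SPAN);
        index.add(new Ann(1, 0, 5));
        index.add(new Ann(2, 6, 10));
        index.add(new Ann(3, 11, 20));
        int visited = shiftAllViaSnapshot(index, 100);
        System.out.println(visited + " " + index.first().begin); // prints "3 100"
    }
}
```

The `toArray` call is where the initial cost mentioned in (b) is paid.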

It might be reasonable to support case (a) more automatically.  One approach
might be to use a "copy on write" style for the index parts.  Java has, for
instance, CopyOnWriteArrayList and CopyOnWriteArraySet.  This could add one more
level of indirection in using UIMA indexes; the details need to be worked out and
could be complex (indexes need to be performant and thread-safe for reading).
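The JDK behavior referred to here can be seen in a short example. It uses only the standard `java.util.concurrent.CopyOnWriteArrayList` API (plus `List.of`, Java 9+); `appendWhileIterating` is a hypothetical helper. An iterator over a `CopyOnWriteArrayList` reads the array that existed when it was created, and each write copies the whole backing array. So adding inside the loop neither throws nor becomes visible to the running iterator:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowListDemo {
    // Appends one element per element seen, while iterating the same list.
    // Each add copies the backing array; the running iterator keeps reading
    // the array it started with, so no ConcurrentModificationException occurs.
    static List<Integer> appendWhileIterating(CopyOnWriteArrayList<Integer> list) {
        List<Integer> seen = new ArrayList<>();
        for (Integer v : list) {
            seen.add(v);
            list.add(v * 10); // not visible to the running iterator
        }
        return seen;
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<Integer> list = new CopyOnWriteArrayList<>(List.of(1, 2, 3));
        List<Integer> seen = appendWhileIterating(list);
        System.out.println(seen + " " + list); // prints "[1, 2, 3] [1, 2, 3, 10, 20, 30]"
    }
}
```

The same loop over a plain ArrayList would throw ConcurrentModificationException on the second call to next(). The JDK class copies the whole array on every write. Any index version would presumably want to copy far less.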

Does this seem like a good thing to try?

-Marshall


Re: uv3 iterators - success in avoiding all concurrent modification exceptions

Posted by Marshall Schor <ms...@schor.com>.
re: remove the need for the snapshot iterators then?

Yes, mostly.  There's one other use for those iterators, I think: in unusual
circumstances they can speed things up (but mostly, they slow things down a
little).  The speedup happens if you're using a fully sorted index with lots of
interleaved subtypes and do multiple moves forwards and backwards.  The snapshot
"flattens" the interleaving (if I remember correctly), so forward and backward
movement is more efficient, without "rattling" the multiple iterators (one per
type) as they interleave.
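The flattening idea can be sketched roughly, keeping in mind the hedge above ("if I remember correctly"). This is an illustration, not UIMA's implementation. It assumes each type's part of the index is a sorted `int[]`. The hypothetical `flatten` helper merges them once, using a heap-based k-way merge chosen here for brevity. After that, moving forward or backward is plain index arithmetic, with no per-type sub-iterators to re-synchronize:

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class FlattenDemo {
    // perType: one sorted array per type (hypothetical stand-in for per-type index parts).
    // Produces a single sorted "snapshot" array; this merge is paid for once, up front.
    static int[] flatten(int[][] perType) {
        int total = 0;
        for (int[] part : perType) total += part.length;
        int[] out = new int[total];
        // Heap entries are {value, typeIndex, positionInType}.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int t = 0; t < perType.length; t++) {
            if (perType[t].length > 0) heap.add(new int[] {perType[t][0], t, 0});
        }
        int n = 0;
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            out[n++] = e[0];
            int next = e[2] + 1;
            if (next < perType[e[1]].length) heap.add(new int[] {perType[e[1]][next], e[1], next});
        }
        return out;
    }

    // Over the flattened array, each move is a single increment or decrement.
    static final class SnapshotCursor {
        private final int[] items;
        private int pos;
        SnapshotCursor(int[] items) { this.items = items; }
        boolean isValid() { return pos >= 0 && pos < items.length; }
        int get() { return items[pos]; } // caller should check isValid() first
        void moveToNext() { pos++; }
        void moveToPrevious() { pos--; }
    }

    public static void main(String[] args) {
        int[][] perType = { {1, 4, 7}, {2, 5}, {3, 6, 8} };
        int[] flat = flatten(perType);
        System.out.println(Arrays.toString(flat)); // prints "[1, 2, 3, 4, 5, 6, 7, 8]"
        SnapshotCursor c = new SnapshotCursor(flat);
        c.moveToNext(); c.moveToNext(); c.moveToPrevious();
        System.out.println(c.get()); // prints "2"
    }
}
```

Without the snapshot, each backward step over interleaved types would mean repositioning several per-type iterators and re-comparing their current elements.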

-Marshall


On 9/16/2016 4:20 PM, Richard Eckart de Castilho wrote:
> Definitely sounds promising. So that would remove the need for the snapshot iterators then?


Re: uv3 iterators - success in avoiding all concurrent modification exceptions

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 16.09.2016, at 22:06, Marshall Schor <ms...@schor.com> wrote:
> 
>>> Does this seem like a good thing to try?

Definitely sounds promising. So that would remove the need for the snapshot iterators then?

Cheers,

-- Richard

Re: uv3 iterators - success in avoiding all concurrent modification exceptions

Posted by Marshall Schor <ms...@schor.com>.
One other benefit: UIMA may automatically, "under the covers", remove and add back
some FSs (feature structures) if you update features that are used as keys in
indexes.  This could cause a ConcurrentModificationException if you had loops that
did this, even though no index operations were coded explicitly in the loop.
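The hidden remove/add-back problem can be reproduced with a sketch in plain Java. It is not UIMA code. `Ann`, `setBegin`, `newIndex` and `shiftAllInPlace` are hypothetical, and a `TreeSet` plays the role of the sorted index. The setter silently re-indexes the object. Java's TreeSet iterator is fail-fast (on a best-effort basis, per its documentation), so the loop fails even though its body calls only a setter:

```java
import java.util.ConcurrentModificationException;
import java.util.Comparator;
import java.util.TreeSet;

public class HiddenReindexDemo {
    // Hypothetical annotation whose setter silently re-indexes it, mimicking
    // the "under the covers" remove / add-back described above.
    static final class Ann {
        final int id;
        private int begin;
        private final TreeSet<Ann> index; // the index this annotation lives in

        Ann(int id, int begin, TreeSet<Ann> index) {
            this.id = id;
            this.begin = begin;
            this.index = index;
        }

        int getBegin() { return begin; }

        void setBegin(int newBegin) {
            index.remove(this); // hidden index operation
            begin = newBegin;
            index.add(this);    // hidden index operation
        }
    }

    static TreeSet<Ann> newIndex() {
        return new TreeSet<>(Comparator.comparingInt(Ann::getBegin).thenComparingInt(a -> a.id));
    }

    // Returns true if the loop hit a ConcurrentModificationException.
    static boolean shiftAllInPlace(TreeSet<Ann> index) {
        try {
            for (Ann a : index) {
                a.setBegin(a.getBegin() + 100); // no explicit index call in the loop body
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TreeSet<Ann> index = newIndex();
        for (int i = 0; i < 3; i++) index.add(new Ann(i, i * 10, index));
        System.out.println(shiftAllInPlace(index)); // prints "true"
    }
}
```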

-Marshall Schor


On 9/16/2016 3:59 PM, Marshall Schor wrote:
> As an experiment, I implemented a copy-on-write style of concurrent modification
> exception prevention in UV3.


Re: uv3 iterators - success in avoiding all concurrent modification exceptions

Posted by Marshall Schor <ms...@schor.com>.
As an experiment, I implemented a copy-on-write style of concurrent modification
exception prevention in UV3.

It does minimal copying, only copying part of the index related to the
particular type being updated; if no iterators are in use, there's no copying
(but see below).

The copy is done just once, even for multiple iterators, unless a subsequent
iterator is created after another update has happened to that part of the index.

With this, you get a trade-off: no more concurrent modification exceptions, and you
can modify indexes within loops, but copies of index parts are made incrementally,
as needed.  So it can take more space and time, because copies are sometimes
made.

In the following case, no copies will be made:

  a) modify the indexes

  b) create an iterator, iterate, then drop all references to the iterator and let
the garbage collector reclaim it.

  c) repeat a and b as much as you like.

If you're through with an iterator but it hasn't been GC'd yet, then the
modification code can't tell that you're through with it, and has to make a copy.
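The copying rules above can be made concrete with a small sketch for one type's part of an index. It is not the UV3 implementation. `CowIndexPart`, `beforeWrite`, `copies()` and `snapshot()` are hypothetical, and tracking iterators with `java.lang.ref.WeakReference` is an assumption consistent with the GC remark above. A write copies the list only if some not-yet-collected iterator still shares it. One copy serves every iterator created before the write, and an iterator created after that write forces a fresh copy on the next one:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class CowIndexPart<T extends Comparable<? super T>> {
    // Sorted contents of this (single-type) part of the index.
    private List<T> items = new ArrayList<>();
    // Weak references to iterators that may still be reading `items`;
    // once an iterator is garbage collected it no longer forces a copy.
    private final List<WeakReference<Iterator<T>>> readers = new ArrayList<>();
    private int copies = 0;

    public int copies() { return copies; }

    public List<T> snapshot() { return new ArrayList<>(items); }

    public Iterator<T> iterator() {
        Iterator<T> it = Collections.unmodifiableList(items).iterator();
        readers.add(new WeakReference<>(it));
        return it;
    }

    // Copy-on-write step: copy only if a live iterator still shares `items`.
    private void beforeWrite() {
        readers.removeIf(r -> r.get() == null);
        if (!readers.isEmpty()) {
            items = new ArrayList<>(items); // one copy serves all current readers
            readers.clear();                // the new list has no readers yet
            copies++;
        }
    }

    public void add(T x) {
        beforeWrite();
        int i = Collections.binarySearch(items, x);
        items.add(i < 0 ? -i - 1 : i, x);
    }

    public boolean remove(T x) {
        beforeWrite();
        return items.remove(x);
    }

    public static void main(String[] args) {
        CowIndexPart<Integer> part = new CowIndexPart<>();
        part.add(3); part.add(1); part.add(2); // no iterators yet: no copies
        Iterator<Integer> it = part.iterator();
        List<Integer> seen = new ArrayList<>();
        while (it.hasNext()) {
            int v = it.next();
            seen.add(v);
            part.add(v + 10); // first add copies once; later adds do not
        }
        System.out.println(seen + " copies=" + part.copies()); // prints "[1, 2, 3] copies=1"
    }
}
```

In this sketch an old iterator keeps the list it started on, so it never sees later updates. Whether a copy is needed depends on GC timing, which matches the space/time trade-off described above.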

Is this a good trade-off to make?  Should we have two modes of running pipelines,
with and without this feature?

-Marshall

P.S. There's an edge case caught by the test cases.  In today's world, if you:
   a) modify the indexes
   b) start iterating
   c) modify the indexes
   d) do one of moveToFirst, moveToLast, or moveTo(fs)
then step (d) "resets" the concurrent modification state and allows continued use
of the iterator, this time over the updated indexes.  I had to add some more
details to the implementation to make this work the same way.
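The existing behavior in steps (a) to (d) can be modeled with a modification counter, the way fail-fast iterators in java.util work. That mechanism is an assumption for illustration, not taken from UIMA's code. `Index` and `Iter` are hypothetical classes; only the method names moveToFirst, moveToLast, moveToNext, isValid and get are borrowed from UIMA's FSIterator. Ordinary moves throw after a modification, while repositioning re-synchronizes the iterator with the updated index:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.NoSuchElementException;

public class ResettableIteratorDemo {
    // Hypothetical sorted index with a modification counter.
    static final class Index {
        final List<Integer> items = new ArrayList<>();
        int modCount = 0;

        void add(int x) {
            int i = Collections.binarySearch(items, x);
            items.add(i < 0 ? -i - 1 : i, x);
            modCount++;
        }
    }

    // Fail-fast iterator: moves throw if the index changed since the iterator
    // was last positioned, but moveToFirst / moveToLast re-sync it.
    static final class Iter {
        private final Index index;
        private int pos;
        private int expectedModCount;

        Iter(Index index) { this.index = index; moveToFirst(); }

        private void checkForComodification() {
            if (expectedModCount != index.modCount) throw new ConcurrentModificationException();
        }

        boolean isValid() { checkForComodification(); return pos >= 0 && pos < index.items.size(); }

        int get() {
            if (!isValid()) throw new NoSuchElementException();
            return index.items.get(pos);
        }

        void moveToNext() { checkForComodification(); pos++; }

        // Repositioning resets the concurrent modification state (step d).
        void moveToFirst() { expectedModCount = index.modCount; pos = 0; }
        void moveToLast() { expectedModCount = index.modCount; pos = index.items.size() - 1; }
    }

    public static void main(String[] args) {
        Index index = new Index();
        index.add(2); index.add(1);   // a) modify the indexes
        Iter it = new Iter(index);    // b) start iterating
        System.out.println(it.get()); // prints "1"
        index.add(0);                 // c) modify the indexes
        try { it.moveToNext(); } catch (ConcurrentModificationException e) { System.out.println("CME"); }
        it.moveToFirst();             // d) reset: now over the updated index
        System.out.println(it.get()); // prints "0"
    }
}
```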
