Posted to user@uima.apache.org by Timo Boehme <ti...@ontochem.com> on 2009/10/07 11:17:50 UTC

iterator positioning on same region annotations

Hi,

in my scenario there may be multiple annotations of the same type for the
same region. Before I add an annotation I would like to check whether such
an annotation already exists.

To accomplish this I use FSIndex.iterator(newAnnotation) to get an
iterator which starts at the position of my new (but not yet added)
annotation. According to the method description, the iterator should be
positioned so that all previous annotations are less than newAnnotation.

However, if I call moveToPrevious() directly after creating the iterator,
get() sometimes returns an annotation of the same type with the same
region as newAnnotation, which in my opinion is not less.

Thus I would like to know whether annotations of the same type for the
same region trigger some 'unspecified' behavior, whether my understanding
of the iterator is wrong, or whether I stumbled upon a bug.


Kind regards
Timo


-- 

 Timo Boehme
 OntoChem GmbH
 H.-Damerow-Str. 4
 06120 Halle/Saale
 T: +49 345 4780472
 F: +49 345 4780471
 timo.boehme@ontochem.com

_____________________________________________________________________

 OntoChem GmbH
 Geschäftsführer: Dr. Lutz Weber
 Sitz: Halle / Saale
 Registergericht: Stendal
 Registernummer: HRB 215461
_____________________________________________________________________


Re: iterator positioning on same region annotations

Posted by Timo Boehme <ti...@ontochem.com>.
Thanks Thilo and Matthias for the explanations which confirmed my
assumptions.

Thilo Goetz wrote:
> Just to clarify: the moveTo() method tries to move the
> iterator to a position such that the annotation at the
> position is equal to the one you're looking for.  If
> there is more than one such annotation, it is undefined
> which one the iterator will point to.
> 
> Is this consistent with the API docs?  I would say it
> isn't.  We say "move the iterator to the first FS...".
> Intuitively, that should mean to the "leftmost" FS, but
> in fact it is implemented to mean the first one that
> the algorithm finds, which will generally _not_ be the
> leftmost one.

I think the API docs for FSIndex.iterator(FeatureStructure fs) are even
more inconsistent:
 "...The position of the iterator will be set such that the feature
structure returned by a call to the iterator's  get() method is greater
than or equal to fs, and any previous FS is less than FS..."
Thus here it really does mean the "leftmost" FS, which, as I pointed out,
is not where the iterator actually ends up.
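
To make the documented contract concrete, here is a minimal sketch in plain Java (not UIMA code) of a lower-bound search that satisfies it: the element at the returned position is greater than or equal to the key, and everything before it is less. The Span class, the ORDER comparator (begin ascending, end descending) and the lowerBound() helper are hypothetical stand-ins, not part of the UIMA API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class LowerBoundSketch {

    /** Hypothetical stand-in for an annotation; only begin/end matter here. */
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /** Assumed index order: begin ascending, then end descending. */
    static final Comparator<Span> ORDER = (a, b) ->
        a.begin != b.begin ? Integer.compare(a.begin, b.begin)
                           : Integer.compare(b.end, a.end);

    /** Index of the first element that is >= key, or sorted.size() if none. */
    static int lowerBound(List<Span> sorted, Span key) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ORDER.compare(sorted.get(mid), key) < 0) {
                lo = mid + 1;   // mid is less than key: answer lies to the right
            } else {
                hi = mid;       // mid is >= key: keep it as a candidate, go left
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<Span> index = Arrays.asList(
            new Span(0, 5),
            new Span(10, 20), new Span(10, 20), new Span(10, 20),
            new Span(30, 40));
        // Three spans are equal to the key; the lower bound is the leftmost.
        System.out.println(lowerBound(index, new Span(10, 20))); // prints 1
    }
}
```

The only difference from an ordinary binary search is that an equal element does not end the search; it just narrows the range to the left.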

I will open a bug report on this.


--Timo



Re: iterator positioning on same region annotations

Posted by Timo Boehme <ti...@ontochem.com>.
Thilo Goetz wrote:
> As a workaround, you can moveToPrevious() until you hit
> an annotation that is not equal to the one you're looking
> for.

I tried this workaround but found it to be very slow (moveToPrevious()
seems to be a costly operation). Since I didn't want to add dummy types
and type priorities as Matthias suggested, I found a simple and fast
workaround:
- create a dummy annotation D of the same type as the annotation A to be
  tested for
- set D.begin=A.begin and D.end=A.end+1
  (this makes sure D sorts before all annotations we are interested in,
  since for equal begin the annotation with the larger end comes first)
- call moveTo(D); in case it really found an annotation with D's range,
  use next() until the iterator reaches the desired range (cheap operation)
Now the iterator points to the first of the annotations with
boundaries identical to A.
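
The steps above can be sketched in plain Java (not UIMA code), with a sorted list standing in for the annotation index and Collections.binarySearch standing in for moveTo(). Span, ORDER and firstEqual() are hypothetical names, and the begin-ascending, end-descending order is an assumption about how the index sorts.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class ProbeWorkaroundSketch {

    /** Hypothetical stand-in for an annotation; only begin/end matter here. */
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /** Assumed index order: begin ascending, then end descending. */
    static final Comparator<Span> ORDER = (a, b) ->
        a.begin != b.begin ? Integer.compare(a.begin, b.begin)
                           : Integer.compare(b.end, a.end);

    /** Index of the first span equal to a, or -1 if there is none. */
    static int firstEqual(List<Span> sorted, Span a) {
        // Probe D: same begin, end + 1, so D sorts directly before a's group.
        Span d = new Span(a.begin, a.end + 1);
        int pos = Collections.binarySearch(sorted, d, ORDER);
        if (pos < 0) {
            pos = -pos - 1; // no span equal to D: use the insertion point
        }
        // Step forward over spans that still sort before a (those equal to D).
        while (pos < sorted.size() && ORDER.compare(sorted.get(pos), a) < 0) {
            pos++;
        }
        boolean found = pos < sorted.size()
            && ORDER.compare(sorted.get(pos), a) == 0;
        return found ? pos : -1;
    }

    public static void main(String[] args) {
        List<Span> index = Arrays.asList(
            new Span(0, 5), new Span(10, 21),
            new Span(10, 20), new Span(10, 20), new Span(10, 20),
            new Span(30, 40));
        System.out.println(firstEqual(index, new Span(10, 20))); // prints 2
        System.out.println(firstEqual(index, new Span(11, 12))); // prints -1
    }
}
```

The forward loop is only cheap when few annotations share D's range; if many do, it degrades to the same kind of linear walk as stepping backwards.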

Timo





Re: iterator positioning on same region annotations

Posted by Thilo Goetz <tw...@gmx.de>.
Just to clarify: the moveTo() method tries to move the
iterator to a position such that the annotation at the
position is equal to the one you're looking for.  If
there is more than one such annotation, it is undefined
which one the iterator will point to.

Is this consistent with the API docs?  I would say it
isn't.  We say "move the iterator to the first FS...".
Intuitively, that should mean to the "leftmost" FS, but
in fact it is implemented to mean the first one that
the algorithm finds, which will generally _not_ be the
leftmost one.
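
The same effect can be seen outside UIMA. A minimal sketch in plain Java: Collections.binarySearch documents that when several elements equal the key there is no guarantee which one is found, which is the behaviour described above for moveTo(). Span, ORDER and anyEqual() are hypothetical stand-ins, and nothing here claims to reproduce UIMA's actual implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class EqualKeySearchDemo {

    /** Hypothetical stand-in for an annotation; label only tells equals apart. */
    static final class Span {
        final int begin;
        final int end;
        final String label;
        Span(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }
    }

    /** Assumed index order: begin ascending, then end descending. */
    static final Comparator<Span> ORDER = (a, b) ->
        a.begin != b.begin ? Integer.compare(a.begin, b.begin)
                           : Integer.compare(b.end, a.end);

    /** Some index whose span equals key under ORDER, or negative if none. */
    static int anyEqual(List<Span> sorted, Span key) {
        return Collections.binarySearch(sorted, key, ORDER);
    }

    public static void main(String[] args) {
        List<Span> index = Arrays.asList(
            new Span(0, 5, "x"),
            new Span(10, 20, "first"),
            new Span(10, 20, "second"),
            new Span(10, 20, "third"),
            new Span(30, 40, "y"));
        int pos = anyEqual(index, new Span(10, 20, "probe"));
        // Prints one of first/second/third; which one is not guaranteed.
        System.out.println(index.get(pos).label);
    }
}
```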

As a workaround, you can moveToPrevious() until you hit
an annotation that is not equal to the one you're looking
for.
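
A minimal sketch of that workaround in plain Java (not UIMA code): search for any equal element, then step backwards while the predecessor is still equal. Span, ORDER and leftmostEqual() are hypothetical names; the backward loop plays the role of repeated moveToPrevious() calls.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class WalkBackSketch {

    /** Hypothetical stand-in for an annotation; only begin/end matter here. */
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /** Assumed index order: begin ascending, then end descending. */
    static final Comparator<Span> ORDER = (a, b) ->
        a.begin != b.begin ? Integer.compare(a.begin, b.begin)
                           : Integer.compare(b.end, a.end);

    /** Index of the leftmost span equal to key, or -1 if there is none. */
    static int leftmostEqual(List<Span> sorted, Span key) {
        int pos = Collections.binarySearch(sorted, key, ORDER);
        if (pos < 0) {
            return -1; // nothing equal to key in the list
        }
        // Walk left while the previous element is still equal to key.
        while (pos > 0 && ORDER.compare(sorted.get(pos - 1), key) == 0) {
            pos--;
        }
        return pos;
    }

    public static void main(String[] args) {
        List<Span> index = Arrays.asList(
            new Span(0, 5),
            new Span(10, 20), new Span(10, 20), new Span(10, 20),
            new Span(30, 40));
        System.out.println(leftmostEqual(index, new Span(10, 20))); // prints 1
    }
}
```

The walk is linear in the number of equal elements, so it gets slower as more annotations share the same span.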

I think we should fix this; you're not the first one to
complain.  Please open a Jira issue, and we can look
into it after the upcoming release.

--Thilo


Re: iterator positioning on same region annotations

Posted by Matthias Wendt <ma...@neofonie.de>.
Hi Timo,

the order relation of the feature structures is defined by the index
definition. Have a look at the index definition of the (built-in)
annotation index. You can see it if you open any annotator descriptor in
the component editor in Eclipse. This helped me a lot in understanding the
behaviour of the iterators.

In short, if two annotations of the same type have exactly the
same boundaries, the behaviour is indeed unspecified. However, you can
avoid this nondeterminism by adding a second type and assigning it a
higher priority. If you don't need a second type, you can use it as a
helper, shifting an instance across the CAS as needed ;) - at least, I
don't know of any more elegant method.
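
As a rough illustration in plain Java (not UIMA code), the ordering can be modelled as a comparator with three keys: begin ascending, end descending, then type priority. This assumes the built-in annotation index is defined that way; the Ann class, the type names "Helper" and "Token", and the PRIORITIES list are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class AnnotationOrderSketch {

    /** Hypothetical stand-in for an annotation with a type name and a span. */
    static final class Ann {
        final String type;
        final int begin;
        final int end;
        Ann(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Hypothetical priority list: earlier entries sort first on equal spans. */
    static final List<String> PRIORITIES = Arrays.asList("Helper", "Token");

    /** Assumed order: begin ascending, end descending, then type priority. */
    static final Comparator<Ann> ORDER = (a, b) -> {
        if (a.begin != b.begin) {
            return Integer.compare(a.begin, b.begin);
        }
        if (a.end != b.end) {
            return Integer.compare(b.end, a.end);
        }
        return Integer.compare(PRIORITIES.indexOf(a.type),
                               PRIORITIES.indexOf(b.type));
    };

    public static void main(String[] args) {
        Ann token1 = new Ann("Token", 10, 20);
        Ann token2 = new Ann("Token", 10, 20);
        Ann helper = new Ann("Helper", 10, 20);
        // Same type, same span: the comparator cannot tell them apart.
        System.out.println(ORDER.compare(token1, token2));     // prints 0
        // The higher-priority helper sorts before every such Token.
        System.out.println(ORDER.compare(helper, token1) < 0); // prints true
    }
}
```

Because the helper compares strictly less than every Token with the same span, positioning on it and stepping forward once lands on the first of the equal Tokens.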

-- Hope this helps

Matthias



