Posted to user@uima.apache.org by Jörn Kottmann <ko...@gmail.com> on 2011/09/07 14:27:30 UTC

Iterate over annotations with multiple types

Hello all,

what is the best way to iterate over annotations which have
different types?

I have documents which usually have the following structure:

One Headline Annotation
A couple of Sentence Annotations
A Sub-Headline Annotation
A couple of Sentence Annotations
...

Now I would like to iterate over the annotations above in the
order they occur.

In my current implementation I do the following:
- Retrieve all the annotations
- Put them in a List
- Sort the list
- Iterate over the annotations in the list
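
For reference, the four steps above could look roughly like the following sketch. It is plain Java with no UIMA dependency: `Span` is a hypothetical stand-in for an annotation (type name plus begin/end offsets), and the comparator assumes that "in the order they occur" means ascending begin offset, with the longer span first on ties. Type priorities are ignored here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CollectAndSort {

    /** Hypothetical stand-in for an annotation: type name plus character offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    /** Assumed document order: ascending begin, then descending end. */
    static final Comparator<Span> DOCUMENT_ORDER = (a, b) -> {
        if (a.begin != b.begin) {
            return Integer.compare(a.begin, b.begin);
        }
        return Integer.compare(b.end, a.end);
    };

    /** Steps 1-3: gather the per-type lists into one list and sort it. */
    static List<Span> inDocumentOrder(List<List<Span>> perType) {
        List<Span> all = new ArrayList<>();
        for (List<Span> spans : perType) {
            all.addAll(spans);
        }
        all.sort(DOCUMENT_ORDER);
        return all;
    }

    public static void main(String[] args) {
        List<Span> headlines = Arrays.asList(
                new Span("Headline", 0, 10), new Span("Headline", 40, 52));
        List<Span> sentences = Arrays.asList(
                new Span("Sentence", 11, 25), new Span("Sentence", 26, 39),
                new Span("Sentence", 53, 70));

        // Step 4: iterate over the sorted list.
        for (Span span : inDocumentOrder(Arrays.asList(headlines, sentences))) {
            System.out.println(span);
        }
    }
}
```

The cost of this approach is the extra list plus an O(n log n) sort, even though each per-type list is already ordered.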

Is there a better way to do this?

Thanks,
Jörn

Re: Iterate over annotations with multiple types

Posted by Thilo Götz <tw...@gmx.de>.
On 07/09/11 15:21, Richard Eckart de Castilho wrote:
> It really depends on the data in your CAS. As far as I know, there is typically only one big annotation index - if you get an iterator for a specific type, a filtered iterator is created internally and returned. The only thing to speed up iteration is the offsets. If the annotations you are looking for are more or less evenly distributed throughout your text, it's probably faster to use a single filtered iterator than iterating separately for each type.
> 
> That is my understanding and experience so far. Any of the UIMA maintainers, please correct me if I am wrong.

Correcting.  It's actually the other way round.  There's an index for
each annotation type, and if you iterate over all annotations, the
iterators are merged at runtime.

If speed is of the essence, it's best to create an iterator
for each of the annotation types you're interested in, and
do the weaving manually.  Having said that, iterating in general
is quite fast, and unless your operations are really cheap, this
is not likely to buy you a lot.
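
A minimal sketch of what "weaving manually" could look like, written in plain Java so that it runs without UIMA. The names `WeavingIterator`, `Span` and `weave` are made up for this illustration. In real code the sources would be the per-type iterators obtained from the CAS, and the comparator would have to match the order those iterators already deliver; the sketch assumes ascending begin offset with the longer span first on ties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Merges several individually sorted iterators into one sorted stream. */
final class WeavingIterator<T> implements Iterator<T> {

    /** Hypothetical stand-in for an annotation: type name plus offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    /** A source iterator plus the element it is currently positioned on. */
    private static final class Head<E> {
        final E value;
        final Iterator<E> rest;

        Head(E value, Iterator<E> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    private final PriorityQueue<Head<T>> heads;

    WeavingIterator(List<Iterator<T>> sources, Comparator<? super T> order) {
        // The queue keeps the source with the smallest current element on top.
        Comparator<Head<T>> byCurrentElement = (a, b) -> order.compare(a.value, b.value);
        this.heads = new PriorityQueue<>(byCurrentElement);
        for (Iterator<T> source : sources) {
            if (source.hasNext()) {
                heads.add(new Head<>(source.next(), source));
            }
        }
    }

    @Override
    public boolean hasNext() {
        return !heads.isEmpty();
    }

    @Override
    public T next() {
        Head<T> smallest = heads.poll();
        if (smallest == null) {
            throw new NoSuchElementException();
        }
        // Re-queue the source on its next element, if it has one.
        if (smallest.rest.hasNext()) {
            heads.add(new Head<>(smallest.rest.next(), smallest.rest));
        }
        return smallest.value;
    }

    /** Convenience helper: weave the sources and collect the result. */
    static <E> List<E> weave(List<Iterator<E>> sources, Comparator<? super E> order) {
        List<E> result = new ArrayList<>();
        Iterator<E> woven = new WeavingIterator<>(sources, order);
        while (woven.hasNext()) {
            result.add(woven.next());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> headlines = Arrays.asList(
                new Span("Headline", 0, 10), new Span("Headline", 40, 52));
        List<Span> sentences = Arrays.asList(
                new Span("Sentence", 11, 25), new Span("Sentence", 26, 39),
                new Span("Sentence", 53, 70));
        Comparator<Span> documentOrder = (a, b) -> a.begin != b.begin
                ? Integer.compare(a.begin, b.begin)
                : Integer.compare(b.end, a.end);

        List<Iterator<Span>> sources = new ArrayList<>();
        sources.add(headlines.iterator());
        sources.add(sentences.iterator());
        for (Span span : weave(sources, documentOrder)) {
            System.out.println(span);
        }
    }
}
```

With only two or three types, a hand-written comparison of the current heads would do as well; the priority queue just keeps the sketch general for any number of sources.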

--Thilo


Re: Iterate over annotations with multiple types

Posted by Richard Eckart de Castilho <ec...@tk.informatik.tu-darmstadt.de>.
It really depends on the data in your CAS. As far as I know, there is typically only one big annotation index - if you get an iterator for a specific type, a filtered iterator is created internally and returned. The only thing to speed up iteration is the offsets. If the annotations you are looking for are more or less evenly distributed throughout your text, it's probably faster to use a single filtered iterator than iterating separately for each type.

That is my understanding and experience so far. Any of the UIMA maintainers, please correct me if I am wrong.

Cheers,

Richard

On 07.09.2011 at 15:16, Jörn Kottmann wrote:

> Isn't this slow? Because it then needs to iterate over every
> single AnnotationFS inside my CAS.
> 
> Jörn

Richard Eckart de Castilho

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
eckartde@tk.informatik.tu-darmstadt.de 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
------------------------------------------------------------------- 





Re: Iterate over annotations with multiple types

Posted by Jörn Kottmann <ko...@gmail.com>.
Isn't this slow? Because it then needs to iterate over every
single AnnotationFS inside my CAS.
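
To make the concern concrete, here is a plain-Java analogy, not UIMA's actual implementation: a filtering wrapper around a single iterator has to test every element of the underlying sequence, including the ones it ends up skipping. `CountingFilterIterator` and its `inspected()` counter are hypothetical names used only for this sketch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Filters another iterator and counts how many elements it had to look at. */
final class CountingFilterIterator<T> implements Iterator<T> {

    private final Iterator<T> source;
    private final Predicate<? super T> keep;
    private T pending;
    private boolean hasPending;
    private int inspected;

    CountingFilterIterator(Iterator<T> source, Predicate<? super T> keep) {
        this.source = source;
        this.keep = keep;
    }

    /** Pulls from the source until an element passes the filter or the source ends. */
    private void advance() {
        while (!hasPending && source.hasNext()) {
            T candidate = source.next();
            inspected++;
            if (keep.test(candidate)) {
                pending = candidate;
                hasPending = true;
            }
        }
    }

    @Override
    public boolean hasNext() {
        advance();
        return hasPending;
    }

    @Override
    public T next() {
        advance();
        if (!hasPending) {
            throw new NoSuchElementException();
        }
        T result = pending;
        pending = null;
        hasPending = false;
        return result;
    }

    /** Number of source elements examined so far, kept or skipped. */
    int inspected() {
        return inspected;
    }

    public static void main(String[] args) {
        // Type names only; a real CAS would hold annotations of these types.
        List<String> types = Arrays.asList(
                "Headline", "Token", "Token", "Sentence", "Token", "Token", "Sentence");
        Predicate<String> headlineOrSentence =
                t -> t.equals("Headline") || t.equals("Sentence");

        CountingFilterIterator<String> filtered =
                new CountingFilterIterator<>(types.iterator(), headlineOrSentence);
        int returned = 0;
        while (filtered.hasNext()) {
            filtered.next();
            returned++;
        }
        System.out.println("returned " + returned + ", inspected " + filtered.inspected());
    }
}
```

Whether this overhead matters depends on how many annotations of other types the CAS holds relative to the ones being selected.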

Jörn




Re: Iterate over annotations with multiple types

Posted by Richard Eckart de Castilho <ec...@tk.informatik.tu-darmstadt.de>.
Hi Jörn,

> what is the best way to iterate over annotations which have
> different types?

You can use a filtered iterator, more or less like this:

		import org.apache.uima.cas.CAS;
		import org.apache.uima.cas.ConstraintFactory;
		import org.apache.uima.cas.FSIterator;
		import org.apache.uima.cas.FSMatchConstraint;
		import org.apache.uima.cas.FSTypeConstraint;
		import org.apache.uima.cas.Type;
		import org.apache.uima.jcas.tcas.Annotation;

		// Token and Sentence are JCas cover classes from your own type system.
		CAS cas = jcas.getCas();
		ConstraintFactory cf = ConstraintFactory.instance();
		FSIterator<Annotation> iterator = jcas.getAnnotationIndex().iterator();
		Type tokenType = jcas.getCasType(Token.type);
		Type sentenceType = jcas.getCasType(Sentence.type);

		// Restrict to Tokens
		FSTypeConstraint typeConstraint1 = cf.createTypeConstraint();
		typeConstraint1.add(tokenType);

		// Restrict to Sentences
		FSTypeConstraint typeConstraint2 = cf.createTypeConstraint();
		typeConstraint2.add(sentenceType);

		// Combine both constraints using "or"
		FSMatchConstraint disjunction = cf.or(typeConstraint1, typeConstraint2);

		// Create and use the filtered iterator
		FSIterator<Annotation> filteredIterator = cas.createFilteredIterator(iterator, disjunction);
		while (filteredIterator.hasNext()) {
			System.out.println(filteredIterator.next().getCoveredText());
		}

Cheers,

Richard
