Posted to user@uima.apache.org by "Александър Л. Димитров" <ad...@sfs.uni-tuebingen.de> on 2008/10/09 16:13:09 UTC

.addToIndexes() on subtype while iterating over supertype

Hello,

I have the following design issue with one of my AnalysisEngines. I searched the
documentation, but not exhaustively, so pardon me if I'm doing something
fundamentally wrong or there is an easy obvious solution.

I have a type T1 and its subtype T2. They would both span the same text in the
CAS, and while T1 represents a general data structure, T2 represents a more
specific one. Say, T1 represents a sentence and T2 a sentence of a certain kind.

In order to find out about all T2's in the text, I first have to find all T1's,
then declare some T1's as T2's. Currently, I first mark up all T1's in an AE,
then, in the next step and another AE, iterate over all T1's, look at their
features and decide whether or not a T1 is a T2. In my particular example, I
have to first do sentence boundary detection, then, after a few other AE's have
done additional work, decide whether a particular sentence contains a trigger.

So, I iterate over T1:

final AnnotationIndex ai = cas.getAnnotationIndex(T1.class);
for (final Iterator<T1> i = ai.iterator(); i.hasNext(); ) {
    final T1 t1 = i.next(); // throws ConcurrentModificationException …
    if (matchesDescription(t1)) {
        final T2 t2 = new T2(cas);
        doStuff(t2);
        t2.addToIndexes(); // … because we modified T1's indexes by adding a T2
                           // to them
    }
}

As you can see, this code won't work: adding the T2 changes the collection the
Iterator is traversing, because the subclass shares an index 'pool' (or so) with
the superclass. Does this mean that the AnnotationIndex returned by
cas.getAnnotationIndex(Foo.class) will always contain all instances of Foo.class
in the CAS, *and* all instances of all of its subclasses?

Apart from just caching all T2's I want to add to the indexes in an ArrayList
and then adding them after the iteration of T1's is finished, are there any
other solutions? I wouldn't like to break up the semantic tie of inheritance
between T1 and T2.

Thanks in advance,
Aleks

Re: .addToIndexes() on subtype while iterating over supertype

Posted by "Александър Л. Димитров" <al...@gmx.de>.
> No, that is the recommended solution to this issue.  I don't
> see anything wrong with it.

Well, I thought it was quite hacky. I usually detest 'outsourcing' resources this
way.

> This is not specific to the CAS,
> btw.  You always get into these kinds of issues when you try
> to modify a collection that you're currently iterating over.
>
> And you also may want to remove the old T1s from the index
> as well, since they'll be replaced by the new T2s.  You also
> need to do this in a separate step...

OK, thanks very much. I did the removal, too; it seems cleaner this way.

Best,
Aleks

PS: very sorry for my second mail, but my university forbids outgoing mail to be
relayed by outside SMTP servers, and the UIMA list initially rejected my mail for
address forgery, so I re-sent it from my uni account (but forgot that I did so),
without noticing Thilo had already forwarded my question. Just ignore the second
mail :-)

Re: .addToIndexes() on subtype while iterating over supertype

Posted by Thilo Goetz <tw...@gmx.de>.

Александър Л. Димитров wrote:
> Hello,
> 
> I have the following design issue with one of my AnalysisEngines. I searched the
> documentation, but not exhaustively, so pardon me if I'm doing something
> fundamentally wrong or there is an easy obvious solution.
> 
> I have a type T1 and its subtype T2. They would both span the same text in the
> CAS, and while T1 represents a general data structure, T2 represents a more
> specific one. Say, T1 represents a sentence and T2 a sentence of a certain kind.
> 
> In order to find out about all T2's in the text, I first have to find all T1's,
> then declare some T1's as T2's. Currently, I first mark up all T1's in an AE,
> then, in the next step and another AE, iterate over all T1's, look at their
> features and decide whether or not a T1 is a T2. In my particular example, I
> have to first do sentence boundary detection, then, after a few other AE's have
> done additional work, decide whether a particular sentence contains a trigger.
> 
> So, I iterate over T1:
> 
> final AnnotationIndex ai = cas.getAnnotationIndex(T1.class);
> for (final Iterator<T1> i = ai.iterator(); i.hasNext(); ) {
>     final T1 t1 = i.next(); // throws ConcurrentModificationException …
>     if (matchesDescription(t1)) {
>         final T2 t2 = new T2(cas);
>         doStuff(t2);
>         t2.addToIndexes(); // … because we modified T1's indexes by adding a T2
>                            // to them
>     }
> }
> 
> As you can see, this code won't work: adding the T2 changes the collection the
> Iterator is traversing, because the subclass shares an index 'pool' (or so) with
> the superclass. Does this mean that the AnnotationIndex returned by
> cas.getAnnotationIndex(Foo.class) will always contain all instances of Foo.class
> in the CAS, *and* all instances of all of its subclasses?

Yes.

> 
> Apart from just caching all T2's I want to add to the indexes in an ArrayList
> and then adding them after the iteration of T1's is finished, are there any
> other solutions? I wouldn't like to break up the semantic tie of inheritance
> between T1 and T2.

No, that is the recommended solution to this issue.  I don't
see anything wrong with it.  This is not specific to the CAS,
btw.  You always get into these kinds of issues when you try
to modify a collection that you're currently iterating over.

And you also may want to remove the old T1s from the index
as well, since they'll be replaced by the new T2s.  You also
need to do this in a separate step...

--Thilo

> 
> Thanks in advance,
> Aleks
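
For readers of the archive, the approach agreed on in this thread (buffer the
new T2s and the T1s they replace during the iteration, then update the index in
a separate step) can be sketched roughly as follows. This is only an
illustration under stated assumptions, not code from the thread: a plain
java.util.List stands in for the CAS annotation index, the T1 and T2 classes
here are hypothetical stand-ins for the JCas types, and matchesDescription() is
a made-up predicate. In a real AnalysisEngine the two final steps would
presumably be calls to removeFromIndexes() and addToIndexes() on the
annotations instead of list operations.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a List plays the role of the annotation index.
class DeferredIndexUpdate {

    // Hypothetical stand-in for the general type (e.g. a sentence).
    static class T1 {
        final String text;
        T1(String text) { this.text = text; }
    }

    // Hypothetical stand-in for the more specific subtype.
    static class T2 extends T1 {
        T2(String text) { super(text); }
    }

    // Made-up predicate: does this sentence contain a trigger?
    static boolean matchesDescription(T1 t1) {
        return t1.text.contains("trigger");
    }

    // Replaces every matching T1 with a T2 without touching the
    // index while it is being iterated over.
    static void promote(List<T1> index) {
        List<T2> toAdd = new ArrayList<>();
        List<T1> toRemove = new ArrayList<>();

        // Step 1: read-only pass. An index over T1 also yields T2s,
        // so skip the ones that were already promoted.
        for (T1 t1 : index) {
            if (!(t1 instanceof T2) && matchesDescription(t1)) {
                toAdd.add(new T2(t1.text));
                toRemove.add(t1);
            }
        }

        // Step 2: modify the index only after the iteration is finished.
        index.removeAll(toRemove);
        index.addAll(toAdd);
    }

    public static void main(String[] args) {
        List<T1> index = new ArrayList<>();
        index.add(new T1("plain sentence"));
        index.add(new T1("sentence with trigger"));
        promote(index);
        for (T1 t : index) {
            System.out.println(t.getClass().getSimpleName() + ": " + t.text);
        }
    }
}
```

Keeping the two buffers separate makes the removal of the old T1s an explicit
second step, which is what Thilo suggests above; whether the old T1s should be
removed at all depends on whether later AEs still need them.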