Posted to user@uima.apache.org by "SAITO, Isao Isaac" <13...@1995.sfc.ne.jp> on 2008/01/25 13:33:42 UTC

Question: How to get different Annotations at exactly the same position?

Hi,

I wonder whether the UIMA framework provides any method that
applies to my scenario below.

My scenario:
 - Regions annotated as Person are needed
 - IF multiple annotations including Person are applied to a region
with the same start and end position, THEN remove the Person
annotation for that region from the index


Though I know I can write ad-hoc code for this,
I would like to use the best method to avoid 1) degrading system
performance and 2) the cost of writing ad-hoc code in the future.

Thanks,
 Isaac

Re: Question: How to get different Annotations at exactly the same position?

Posted by Thilo Goetz <tw...@gmx.de>.

Eddie Epstein wrote:
> Hi Isaac,
> 
> If I understand your scenario, you want to ignore duplicate Person 
> annotations. The set index type is useful for just this purpose.
> 
> The javadocs for this index type say:
> Indexing strategy: set index. A set index contains no duplicates of the 
> same type, where a duplicate is defined by the indexing comparator. A 
> set index is not guaranteed to be sorted.
> 
> A simple test shows an iterator for a set index to respect sort order, 
> so I'm not sure what the documentation means about "not guaranteed to 
> be sorted". We'll have to wait for Thilo to clarify this.

The actual implementation does use a sorted set (as opposed
to, say, a hash set).  So annotations do come out sorted on
iteration.  I don't see that changing in the near future.

Let me know if Eddie's comments answered your question or not.


--Thilo

Re: Question: How to get different Annotations at exactly the same position?

Posted by "SAITO, Isao Isaac" <13...@1995.sfc.ne.jp>.

Eddie,

The code you showed really helped me!
Using it along with UIMA's parameter architecture, I no longer need to
hard-code the class names of the types I want to remove in my scenario.

Thank you very much.
Isaac

On 2/5/08, Eddie Epstein <ea...@gmail.com> wrote:
> One screwup in my example code was in getting the Types needed for the
> filtered iterator. It is bad to create dummy FS in the CAS just to get a
> type object :(
>
> Instead of
>        constraint.add((new Person(jcas)).getType());
> use
>        constraint.add(jcas.getTypeSystem().getType("test.Person"));
>
> Eddie

Re: Question: How to get different Annotations at exactly the same position?

Posted by Eddie Epstein <ea...@gmail.com>.

One screwup in my example code was in getting the Types needed for the
filtered iterator. It is bad to create dummy FS in the CAS just to get a
type object :(

Instead of
        constraint.add((new Person(jcas)).getType());
use
        constraint.add(jcas.getTypeSystem().getType("test.Person"));

Eddie

On Feb 4, 2008 7:30 PM, Eddie Epstein <ea...@gmail.com> wrote:

> Hi Isaac,
>
> There is a simpler way to do this. Using FilteredIterators you can limit
> an iterator to just returning types of interest. Using "type priorities" you
> can guarantee the order that types are returned for annotations with the
> same begin and end positions.

Re: Question: How to get different Annotations at exactly the same position?

Posted by Eddie Epstein <ea...@gmail.com>.

I guess Thilo is pointing out the second screwup:

        while (it.hasNext()) {
            a = (Annotation) it.get();
            it.moveToNext();

should be

        while (it.hasNext()) {
            a = (Annotation) it.next();

Eddie

On Feb 6, 2008 3:52 AM, Thilo Goetz <tw...@gmx.de> wrote:

> This is not an idle warning.  Mixing the iterator paradigms *will*
> result in unexpected and incorrect behavior.
>
> --Thilo
>
>
>

Re: Question: How to get different Annotations at exactly the same position?

Posted by Thilo Goetz <tw...@gmx.de>.

Eddie Epstein wrote:
...
> Some code below.
> 
>         FSIterator it = jcas.getAnnotationIndex().iterator();
>         FSTypeConstraint constraint = cas.getConstraintFactory().createTypeConstraint();
>         constraint.add((new Person(jcas)).getType());
>         constraint.add((new Organization(jcas)).getType());
>         constraint.add((new Company(jcas)).getType());
>         it = jcas.createFilteredIterator(it, constraint);
> 
>         it.moveToFirst();
>         Annotation a = null;
>         int pB = -1;
>         int pE = -1;
>         LinkedList<Annotation> toDel = new LinkedList<Annotation>();
>         while (it.hasNext()) {
>             a = (Annotation) it.get();
>             it.moveToNext();
>             if (a instanceof Person) {
>                 // grab position for testing subsequent FS
>                 pB = a.getBegin();
>                 pE = a.getEnd();
>             }
>             else {
>                 // not a Person; see if it has same position as the last Person
>                 if (pB == a.getBegin() && pE == a.getEnd()) {
>                     toDel.add(a);
>                 }
>             }
>         }
> 
>         // must modify index outside iterator loop
>         for (int i = 0; i < toDel.size(); i++) {
>             toDel.get(i).removeFromIndexes();
>         }
> 
> 
> Regards,
> Eddie
> 

This has nothing to do with the problem at hand, but let me quote from
our javadocs:

> public interface FSIterator
> extends Iterator
> 
> Iterator over feature structures.
> 
> This iterator interface extends java.util.Iterator, and supports the standard hasNext and next methods. If finer control, including reverse iteration, is needed, see below.
> 
> Note: do not use the APIs described below *together* with the standard Java iterator methods next() and hasNext(). On any given iterator, use either the one or the other, but not both together. Otherwise, next/hasNext may exhibit incorrect behavior.
> 

This is not an idle warning.  Mixing the iterator paradigms *will*
result in unexpected and incorrect behavior.

--Thilo
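
A rough sketch of the two iteration styles the javadoc describes, each used
on its own as it recommends. ListCursor is a hypothetical stand-in written
for this note: it is not UIMA's FSIterator and does not reproduce whatever
UIMA does when the styles are mixed. Only the method names mirror the ones
discussed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class IterationStylesSketch {

    /** Hypothetical stand-in cursor offering both method families. */
    static final class ListCursor<T> implements Iterator<T> {
        private final List<T> items;
        private int pos = 0;

        ListCursor(List<T> items) { this.items = items; }

        // Cursor-style methods.
        void moveToFirst() { pos = 0; }
        boolean isValid() { return pos >= 0 && pos < items.size(); }
        T get() {
            if (!isValid()) throw new NoSuchElementException();
            return items.get(pos);
        }
        void moveToNext() { pos++; }

        // java.util.Iterator-style methods.
        @Override public boolean hasNext() { return isValid(); }
        @Override public T next() { T item = get(); pos++; return item; }
    }

    /** Style 1: only hasNext() and next(). */
    static <T> List<T> viaIteratorStyle(ListCursor<T> it) {
        List<T> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    /** Style 2: only moveToFirst(), isValid(), get() and moveToNext(). */
    static <T> List<T> viaCursorStyle(ListCursor<T> it) {
        List<T> seen = new ArrayList<>();
        for (it.moveToFirst(); it.isValid(); it.moveToNext()) {
            seen.add(it.get());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");
        System.out.println(viaIteratorStyle(new ListCursor<>(data)));
        System.out.println(viaCursorStyle(new ListCursor<>(data)));
    }
}
```

Both loops visit every element exactly once. The point is simply to pick
one style per iterator and stay with it.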

Re: Question: How to get different Annotations at exactly the same position?

Posted by Eddie Epstein <ea...@gmail.com>.

Hi Isaac,

There is a simpler way to do this. Using FilteredIterators you can limit an
iterator to just returning types of interest. Using "type priorities" you
can guarantee the order that types are returned for annotations with the
same begin and end positions.

Using both of these would allow a single pass iteration to eliminate
unwanted type instances.

In the CDE for your component, on the Indexes tab, add a priority list that
puts Person in front of the other types. The resultant component descriptor
would look like this:

    <typePriorities>
      <priorityList>
        <type>yourtype.Person</type>
        <type>yourtype.Organization</type>
        <type>yourtype.Company</type>
        ...
      </priorityList>
    </typePriorities>


Some code below.

        FSIterator it = jcas.getAnnotationIndex().iterator();
        FSTypeConstraint constraint = cas.getConstraintFactory().createTypeConstraint();
        constraint.add((new Person(jcas)).getType());
        constraint.add((new Organization(jcas)).getType());
        constraint.add((new Company(jcas)).getType());
        it = jcas.createFilteredIterator(it, constraint);

        it.moveToFirst();
        Annotation a = null;
        int pB = -1;
        int pE = -1;
        LinkedList<Annotation> toDel = new LinkedList<Annotation>();
        while (it.hasNext()) {
            a = (Annotation) it.get();
            it.moveToNext();
            if (a instanceof Person) {
                // grab position for testing subsequent FS
                pB = a.getBegin();
                pE = a.getEnd();
            }
            else {
                // not a Person; see if it has same position as the last Person
                if (pB == a.getBegin() && pE == a.getEnd()) {
                    toDel.add(a);
                }
            }
        }

        // must modify index outside iterator loop
        for (int i = 0; i < toDel.size(); i++) {
            toDel.get(i).removeFromIndexes();
        }


Regards,
Eddie
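
For readers without a UIMA setup at hand, the single-pass idea above can be
tried out with plain Java collections. This is only a model under stated
assumptions: Ann is a hypothetical stand-in for an annotation, the priority
list plays the role of both the type constraint and the type priorities,
and the sort order (begin ascending, end descending, then priority) is
assumed to match what the annotation index would deliver. The offsets are
illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SinglePassSketch {

    /** Hypothetical stand-in for an annotation: a type name plus a span. */
    static final class Ann {
        final String type;
        final int begin;
        final int end;

        Ann(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override public String toString() { return type + "[" + begin + "," + end + "]"; }
    }

    /** Assumed ordering: begin ascending, end descending, then position in the priority list. */
    static Comparator<Ann> order(List<String> priorities) {
        return Comparator.comparingInt((Ann a) -> a.begin)
                .thenComparing(Comparator.comparingInt((Ann a) -> a.end).reversed())
                .thenComparingInt(a -> priorities.indexOf(a.type));
    }

    /** Returns the non-Person entries that share their span with a Person. */
    static List<Ann> findCoveredByPerson(List<Ann> anns, List<String> priorities) {
        // Keep only the types of interest, like the filtered iterator does.
        List<Ann> sorted = new ArrayList<>();
        for (Ann a : anns) {
            if (priorities.contains(a.type)) sorted.add(a);
        }
        sorted.sort(order(priorities));

        List<Ann> toDelete = new ArrayList<>();
        int pB = -1;
        int pE = -1;
        for (Ann a : sorted) {
            if ("Person".equals(a.type)) {
                // Person sorts first within a span, so remember its position.
                pB = a.begin;
                pE = a.end;
            } else if (a.begin == pB && a.end == pE) {
                toDelete.add(a);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Ann> anns = Arrays.asList(
                new Ann("Organization", 40, 56),
                new Ann("Person", 40, 56),
                new Ann("Person", 16, 21));
        List<String> priorities = Arrays.asList("Person", "Organization", "Company");
        System.out.println(findCoveredByPerson(anns, priorities));
    }
}
```

The single pass only works because Person is listed first: within a group
of equal spans it is then seen before any candidate for removal.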

Re: Question: How to get different Annotations at exactly the same position?

Posted by "SAITO, Isao Isaac" <13...@1995.sfc.ne.jp>.

 Eddie, Thilo,

Sorry for not responding sooner.
And sorry again that I did not describe my scenario properly and
caused you to misunderstand it...
The idea Eddie gave will be useful for my further development, but
it does not seem to fit my case.
I have already achieved what I wanted by writing ad-hoc code. But
because I still have the same question, I will explain the case again.

2 sections below
 A: Example of What I wanted this time
 B: The ad-hoc source code I made this time

From the source code shown in B, you can see how my code is
redundant and lacks extensibility (even ignoring my general Java
skills).
Simple and extensible code would be appreciated.

Thanks in advance,
 Isaac


A: Example of What I wanted this time

I input text into an aggregate AE, which consists of AE-1 and AE-2
running in that order.
AE-1 adds <Person> and 20 other kinds of annotations.
AE-2 removes other annotations if their positions are the same as
those of a <Person> annotation.

i) Input text
 "The story of Mr.Saito is similar to the Isaac Foundation's mission."

 #NOTE: actually we're handling mainly Japanese, but it doesn't matter here.

ii) Annotation Result of AE-1:
 "The story of Mr.<Person>Saito</Person> is similar to the
<Organization><Person>Isaac Foundation</Person></Organization>'s
mission."

iii) Annotation Result of AE-2(What I wanted this time):
 "The story of Mr.<Person>Saito</Person> is similar to the Isaac
Foundation's mission."


B: The ad-hoc source code I made this time

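import java.util.LinkedList;

import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.jcas.JCas;
// Person and NamedEntity are JCas cover classes generated from the type
// system; import them from your own package.

public class PersonAnnotator extends JCasAnnotator_ImplBase {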
	private static final String CLASSNAME_ROOT = "com.ibm.omnifind.ne.types";
	private static final String CLASSNAME_ORG = CLASSNAME_ROOT + ".Org";
	private static final String CLASSNAME_COMPANY = CLASSNAME_ROOT + ".Company";
	private static final String CLASSNAME_PLACE = CLASSNAME_ROOT + ".Place";
	private static final String CLASSNAME_COUNTRY = CLASSNAME_ROOT + ".Country";
	private static final String CLASSNAME_AREA = CLASSNAME_ROOT + ".Area";
	private static final String CLASSNAME_ORDINAL = CLASSNAME_ROOT + ".Ordinal";

	@Override
	public void process(JCas jcas) throws AnalysisEngineProcessException {
		this.removeMultiplyAssignedNe(jcas);
	}

	private void removeMultiplyAssignedNe(JCas jcas) {
		FSIterator personIter = jcas.getJFSIndexRepository()
				.getAnnotationIndex(Person.type).iterator();
		LinkedList<Person> persons = new LinkedList<Person>();
		for (; personIter.isValid(); personIter.moveToNext()) {
			// use get() here: next() would also advance the iterator
			persons.add((Person) personIter.get());
		}

		FSIterator annotItr = jcas.getAnnotationIndex().iterator();
		LinkedList<NamedEntity> removalCandidates = new LinkedList<NamedEntity>();
		for (; annotItr.isValid(); annotItr.moveToNext()) {
			String typename = annotItr.get().getType().getName();
			if (PersonAnnotator.CLASSNAME_ORG.equals(typename)
					|| PersonAnnotator.CLASSNAME_COMPANY.equals(typename)
					|| PersonAnnotator.CLASSNAME_PLACE.equals(typename)
					|| PersonAnnotator.CLASSNAME_COUNTRY.equals(typename)
					|| PersonAnnotator.CLASSNAME_AREA.equals(typename)
					|| PersonAnnotator.CLASSNAME_ORDINAL.equals(typename)) {
				NamedEntity ne = (NamedEntity) annotItr.get();
				removalCandidates.add(ne);
			}
		}
		for (int i = 0; i < removalCandidates.size(); i++) {
			boolean tobeRemoved = false;
			NamedEntity rn = removalCandidates.get(i);
			int startPos = rn.getBegin();
			int endPos = rn.getEnd();
			for (int j = 0; j < persons.size(); j++) {
				Person p = persons.get(j);
				int p_startPos = p.getBegin();
				int p_endPos = p.getEnd();
				if ((p_startPos == startPos) && (p_endPos == endPos)) {
					// removalCandidates.remove(rn);
					tobeRemoved = true;
				}
			}
			if (tobeRemoved) {
				System.out.println(super.getClass().getName()
						+ "#removeMultiplyAssignedNe: " + rn.getLex()
						+ " removed.");
				rn.removeFromIndexes();
			}
		}
	}
}
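
One possible way to make code like the above less repetitive, sketched with
plain Java collections rather than the UIMA API. The idea is to record the
Person spans in a hash set first, then test each candidate against that set
instead of looping over all Persons, and to pass the removable type names
in as data instead of as constants. Ann, key and selectForRemoval are
hypothetical names made up for this sketch. In a real annotator the set of
type names might come from a configuration parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SpanLookupSketch {

    /** Hypothetical stand-in for an annotation: a type name plus a span. */
    static final class Ann {
        final String type;
        final int begin;
        final int end;

        Ann(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override public String toString() { return type + "[" + begin + "," + end + "]"; }
    }

    /** Packs a span into one long so it can serve as a hash key. */
    static long key(int begin, int end) {
        return ((long) begin << 32) | (end & 0xFFFFFFFFL);
    }

    /** Returns entries of a removable type whose span equals the span of a keepType entry. */
    static List<Ann> selectForRemoval(List<Ann> anns, String keepType, Set<String> removableTypes) {
        // First pass: remember every span covered by the type to keep.
        Set<Long> keptSpans = new HashSet<>();
        for (Ann a : anns) {
            if (keepType.equals(a.type)) keptSpans.add(key(a.begin, a.end));
        }
        // Second pass: collect removable entries sitting on one of those spans.
        List<Ann> toRemove = new ArrayList<>();
        for (Ann a : anns) {
            if (removableTypes.contains(a.type) && keptSpans.contains(key(a.begin, a.end))) {
                toRemove.add(a);
            }
        }
        return toRemove;
    }

    public static void main(String[] args) {
        List<Ann> anns = Arrays.asList(
                new Ann("Person", 16, 21),
                new Ann("Org", 40, 56),
                new Ann("Person", 40, 56),
                new Ann("Place", 60, 65));
        Set<String> removable = new HashSet<>(Arrays.asList("Org", "Company", "Place"));
        System.out.println(selectForRemoval(anns, "Person", removable));
    }
}
```

As in the original code, the actual removal would happen after the loops,
on the collected list, so that no index is modified while it is being
iterated.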



On Jan 27, 2008 5:06 AM, Eddie Epstein <ea...@gmail.com> wrote:
> Hi Isaac,
>
> If I understand your scenario, you want to ignore duplicate Person
> annotations. The set index type is useful for just this purpose.

Re: Question: How to get different Annotations at exactly the same position?

Posted by Eddie Epstein <ea...@gmail.com>.

Hi Isaac,

If I understand your scenario, you want to ignore duplicate Person
annotations. The set index type is useful for just this purpose.

The javadocs for this index type say:
Indexing strategy: set index. A set index contains no duplicates of the same
type, where a duplicate is defined by the indexing comparator. A set index
is not guaranteed to be sorted.

A simple test shows an iterator for a set index to respect sort order, so
I'm not sure what the documentation means about "not guaranteed to be
sorted". We'll have to wait for Thilo to clarify this.

The attached files are intended to be placed into
$UIMA_HOME/examples/descriptors/analysis_engine/SetIndexTest.xml
$UIMA_HOME/examples/src/org/apache/uima/examples/SetIndexTest.java

The test prints the following:
Set index contents:
annotation at begin=0 end=3
annotation at begin=10 end=13
annotation at begin=20 end=23

Annotation index contents:
annotation at begin=0 end=3
annotation at begin=10 end=15
annotation at begin=10 end=13
annotation at begin=20 end=23

Note that the Person at (10,15) counts as a duplicate of the one at (10,13)
because the set index is defined with only one key, the begin feature.

Regards,
Eddie
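
The effect of a single-key comparator can be reproduced with an ordinary
java.util.TreeSet. This is a stdlib-only model of the idea, not the UIMA
set index itself: Span is a hypothetical stand-in for an annotation, and
in this model the first span added for a given begin is the one that
stays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class SetIndexSketch {

    /** Hypothetical stand-in for an annotation: just a span. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        @Override public String toString() { return "begin=" + begin + " end=" + end; }
    }

    /** Adds the spans to a sorted set whose comparator looks only at begin. */
    static List<String> setIndexContents(List<Span> spans) {
        TreeSet<Span> index = new TreeSet<>(Comparator.comparingInt((Span s) -> s.begin));
        // A span that compares equal to one already present is not added.
        index.addAll(spans);
        List<String> out = new ArrayList<>();
        for (Span s : index) {
            out.add(s.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
                new Span(0, 3), new Span(10, 13), new Span(10, 15), new Span(20, 23));
        for (String line : setIndexContents(spans)) {
            System.out.println(line);
        }
    }
}
```

With the spans added in that order, (10,15) is dropped because its begin
equals that of (10,13), and iteration comes out sorted by begin.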
