Posted to user@uima.apache.org by James Kitching <ja...@hemseye.org> on 2014/11/16 13:08:41 UTC

UIMA pipeline output persistence and multiple layer web based visualisation tools? Suggestions?

Hi

(First of all a BIG THANKS to ALL open source developers at UIMA and the 
other projects I mention below whom I am now relying on  :-) ).

I am looking at researching a particular knowledge base extraction task 
using UIMA components as part of the solution.  To do this work I need 
UIMA output persistence and to be able to visualise this output as 
multiple annotation layers on the same text.  Ultimately I want my 
automated annotations and visualisations to be web based and allow me to 
make additional manual annotations if required.  Once I have my multiple 
annotations made on a text I will then be able to apply my new knowledge 
extraction logic.

I have looked at webanno (which incorporates Brat for its UI) and 
U-Compare as well as Argo (See https://code.google.com/p/webanno/, 
http://brat.nlplab.org/, http://u-compare.org/, 
http://nactem.ac.u/ucompare/downloads/, 
http://argo.nactem.ac.uk/about-argo/).  I had hoped that I could use 
webanno for this task however webanno does not allow the direct import 
of UIMA components or UIMA output.  I found that I could get U-Compare 
to work as I wanted and it shows promise; however, if I get any 
configuration wrong between UIMA components it crashes out.  I got 
the software to work for me after I spent more time reading the manual.  
I found I needed to manually configure the input types for each 
component in the pipeline.  The software recognises subsequent pipeline 
component compatibility when a new component is added to a work flow.  
My initial errors came as I had initially expected subsequent U-Compare 
components to automatically pick up their input from the output from 
previous workflow components.  Whilst the U-compare software does 
support the saving of previous session data the software is not fully 
open source so I do not have easy access to this data.  I have not 
looked at the webservice pipeline generation features of U-Compare as 
yet;  this might hold promise if it gives me a download configuration 
rather than a hosted solution.  When I looked at the argo tool I had 
similar problems with a lack of output.  I would assume for the same 
reasons.  Again Argo is not fully open source so I cannot work on 
modifying this tool to my own ends.  Are there any other better tools 
available that support web based UIMA layered visualisation and output 
persistence?

Currently I plan to continue to experiment with UIMA components using 
U-compare however I am looking to implement persistence and 
visualisation in a production tool.  If someone already has a good open 
source implementation of this need I would prefer not to spend time 
reinventing this particular wheel.

I would be very happy if the U-compare and webanno teams would work 
together and get their software integrated.  I will pass this mail onto 
these teams as a suggestion.

The particular data extraction task I am interested in is different to 
the current popular research shared task 
(http://www.nist.gov/tac/2014/KBP/) and is one which I plan to share once I 
have made some progress.

Thanks in advance.

Further information about me and my project can be found at www.hemseye.org

James Kitching

Re: can't remove duplicate Annotations with Java Set Collection

Posted by Kameron Cole <ka...@us.ibm.com>.
Having trouble with the Comparator.  If I compare Object, no issue:



If I compare Annotation, it doesn't recognize the method





                                                                               
                                                                               
                                                                               
Kameron Arthur Cole
Watson Content Analytics Applications and Support
email: kameroncole@us.ibm.com | Tel: 305-389-8512






From:	Richard Eckart de Castilho <re...@apache.org>
To:	user@uima.apache.org
Date:	11/18/2014 02:34 AM
Subject:	Re: can't remove duplicate Annotations with Java Set Collection



On 17.11.2014, at 20:59, Kameron Cole <ka...@us.ibm.com> wrote:

> I am trying to get rid of duplicates in the FSIndex.  I thought a very
> clever way to do this would be to just push them into a Set Collection in
> Java, which does not allow duplicates. This is very (very) standard Java:
>
> ArrayList al = new ArrayList();
> // add elements to al, including duplicates
> HashSet hs = new HashSet();
> hs.addAll(al);
> al.clear();
> al.addAll(hs);

There is no universal definition of equality other than object equality.
And this is what Java defaults to unless equals() and hashCode() are
implemented.
Since each UIMA user might have a different opinion on what is equal, UIMA
defers this decision to its indexing mechanism instead of hard-baking it
into equals()/hashcode() methods.

I suggest you do the following:

- implement a Comparator<FeatureStructure> or Comparator<AnnotationFS>
according to your definition of equality

- create a TreeSet based on your comparator

- drop all your annotations into this TreeSet

- "duplicates" according to your definition are dropped. The rest is sorted
(or not) depending on what your comparator returns in a non-equality case
(return value != 0).
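
The steps above can be sketched as follows. This is only an illustration under stated assumptions: Span is a hypothetical stand-in for a UIMA annotation (so the snippet runs without UIMA on the classpath), and "equal" is assumed to mean "same covered text". With real annotations the comparator would be a Comparator<AnnotationFS> calling getCoveredText().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class TreeSetDedupe {

    // Hypothetical stand-in for a UIMA annotation: offsets plus covered text.
    static final class Span {
        final int begin;
        final int end;
        final String text;

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // "Equal" is assumed to mean "same covered text"; offsets are ignored on purpose.
    static final Comparator<Span> BY_TEXT = new Comparator<Span>() {
        @Override
        public int compare(Span a, Span b) {
            return a.text.compareTo(b.text);
        }
    };

    // Returns one span per distinct text, ordered by that text.
    static List<Span> dedupe(List<Span> spans) {
        TreeSet<Span> unique = new TreeSet<Span>(BY_TEXT);
        unique.addAll(spans); // an element comparing as 0 to an existing one is skipped
        return new ArrayList<Span>(unique);
    }

    // Convenience for inspection: the covered texts of the de-duplicated spans.
    static List<String> dedupedTexts(List<Span> spans) {
        List<String> texts = new ArrayList<String>();
        for (Span s : dedupe(spans)) {
            texts.add(s.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        // Offsets for the thread's sample input "bird, cat, bush, cat".
        List<Span> spans = Arrays.asList(
                new Span(0, 4, "bird"),
                new Span(6, 9, "cat"),
                new Span(11, 15, "bush"),
                new Span(17, 20, "cat"));
        System.out.println(dedupedTexts(spans)); // [bird, bush, cat]
    }
}
```

Because the comparator returns a real ordering (negative, zero, positive) the result comes out sorted by text, and the first "cat" is the one that survives.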

Cheers,

-- Richard

Re: can't remove duplicate Annotations with Java Set Collection

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 17.11.2014, at 20:59, Kameron Cole <ka...@us.ibm.com> wrote:

> I am trying to get rid of duplicates in the FSIndex.  I thought a very
> clever way to do this would be to just push them into a Set Collection in
> Java, which does not allow duplicates. This is very (very) standard Java:
> 
> ArrayList al = new ArrayList();
> // add elements to al, including duplicates
> HashSet hs = new HashSet();
> hs.addAll(al);
> al.clear();
> al.addAll(hs);

There is no universal definition of equality other than object equality. And this is what Java defaults to unless equals() and hashCode() are implemented.
Since each UIMA user might have a different opinion on what is equal, UIMA defers this decision to its indexing mechanism instead of hard-baking it into equals()/hashcode() methods.

I suggest you do the following:

- implement a Comparator<FeatureStructure> or Comparator<AnnotationFS> according to your definition of equality

- create a TreeSet based on your comparator

- drop all your annotations into this TreeSet

- "duplicates" according to your definition are dropped. The rest is sorted (or not) depending on what your comparator returns in a non-equality case (return value != 0). 

Cheers,

-- Richard

Re: can't remove duplicate Annotations with Java Set Collection

Posted by Marshall Schor <ms...@schor.com>.
Sorry, the pictures/images don't come through this email list...  If you want to
include them, please post them on a well-known clip-site, and include a link to
them in your email.

I think the issue you're having is that you wrote:

...
@Override
public int compare(Annotation o1, Annotation o2) {
...

The @Override indicates an error if the method signature you're defining can't
be matched to a method in the supertype.

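The supertype here is the raw type "Comparator" (no type argument), and its only
compare signature takes 2 args which are both "Object"s.

Removing the @Override gets rid of this check, but the anonymous class would then
still lack the required compare(Object, Object) method.  Declaring the supertype as
Comparator<Annotation> makes your typed signature match.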
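
To make the difference concrete, here is a minimal sketch of the raw and the parameterized form side by side. Span is a hypothetical stand-in for Annotation so that the snippet compiles without UIMA; with the real class the type argument would be Annotation. The sketch also assumes the covered text should be compared with compareTo rather than ==, since == on Strings only tests whether both references point to the same object.

```java
import java.util.Comparator;

final class TypedComparatorSketch {

    // Hypothetical stand-in for org.apache.uima.jcas.tcas.Annotation.
    static final class Span {
        private final String text;

        Span(String text) {
            this.text = text;
        }

        String getCoveredText() {
            return text;
        }
    }

    // Raw Comparator: the inherited method is compare(Object, Object),
    // so the parameters have to be Object and must be cast by hand.
    @SuppressWarnings("rawtypes")
    static final Comparator RAW = new Comparator() {
        @Override
        public int compare(Object o1, Object o2) {
            return ((Span) o1).getCoveredText().compareTo(((Span) o2).getCoveredText());
        }
    };

    // Comparator<Span>: the inherited method is compare(Span, Span),
    // so @Override is accepted with typed parameters.
    static final Comparator<Span> TYPED = new Comparator<Span>() {
        @Override
        public int compare(Span o1, Span o2) {
            // compareTo looks at the characters and yields a consistent ordering.
            return o1.getCoveredText().compareTo(o2.getCoveredText());
        }
    };

    public static void main(String[] args) {
        Span cat1 = new Span(new String("cat")); // two distinct String objects
        Span cat2 = new Span(new String("cat"));
        System.out.println(TYPED.compare(cat1, cat2) == 0);                 // true
        System.out.println(cat1.getCoveredText() == cat2.getCoveredText()); // false
    }
}
```

A comparator that returns a constant 1 for every unequal pair does not satisfy the Comparator contract (compare(a, b) and compare(b, a) should have opposite signs), so a TreeSet built on it may fail to find an existing duplicate; returning the compareTo result avoids that.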

-Marshall

On 11/18/2014 2:06 PM, Kameron Cole wrote:
>
> Awesome.  Your change will work.  And I will try it, thank you!
>
> But maybe you can help me to get this to work?   As I posted, if I use Object
> as the parameter in the compare method signature, Eclipse is ok; but when I
> change it to Annotation, it says I must override the methods - as though
> something about Annotation confuses Eclipse.  Here's the code I really want to
> work:
>
>
> -----------------------------------
>
> public static ArrayList<Annotation> dedupe(AnnotationIndex<Annotation> idx2) {
>
>     ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx2.size());
>     FSIterator<Annotation> it2 = idx2.iterator();
>     while (it2.hasNext()) {
>         tempList.add((Annotation) it2.next());
>     }
>
>     Set set = new TreeSet(new Comparator() {
>         @Override
>         public int compare(Annotation o1, Annotation o2) {
>             if (o1.getCoveredText() == o2.getCoveredText()) {
>                 return 0;
>             }
>             return 1;
>         }
>     });
>
>     set.addAll(tempList);
>
>     tempList.clear();
>     tempList.addAll(set);
>     System.out.println("templist length: " + tempList.size());
>     return tempList;
> }
>
> -----------------------------
>
> But look at what Eclipse gives me:
>
>
>
>
>
>
>
>     Kameron Arthur Cole
>     Watson Content Analytics Applications and Support
>     email: kameroncole@us.ibm.com | Tel: 305-389-8512
>     upload logs here <http://www.ecurep.ibm.com/app/upload>
>
>
>
>
> From: Marshall Schor <ms...@schor.com>
> To: user@uima.apache.org
> Date: 11/18/2014 11:54 AM
> Subject: Re: can't remove duplicate Annotations with Java Set Collection
>
> --------------------------------------------------------------------------------
>
>
>
> An even simpler approach:
>
> Use a HashMap, where the key is the annotation.getCoveredText() and the value is
> the annotation, instead of a HashSet.
>
> replace this (in your original):
>
> // push tempList into HashSet
> HashSet<Annotation> hs = new HashSet<Annotation>();
> hs.addAll(tempList);
>
>
> with
>
> // push tempList into HashMap
> HashMap<String, Annotation> hm = new HashMap<String, Annotation>();
> for (Annotation a : tempList) {
>  hm.put(a.getCoveredText(), a);
> }
>
> -Marshall
>
> On 11/18/2014 9:45 AM, Marshall Schor wrote:
> > Eclipse pointed out a bug in my code, fix is below
> > On 11/18/2014 9:37 AM, Marshall Schor wrote:
> >> Hi Kameron,
> >>
> >> Based on this code snip, the two "cat" annotations you create are "different"
> >> using the HashSet definition, because they correspond to two distinct UIMA
> >> Annotations.  You could, for instance, update one of them, and not the other;
> >> that is the sense in which they are distinct.  In the case below, the two "cat"
> >> annotations would have different begin and end offsets.
> >>
> >> I'm guessing that your goal was to have one of the two cat annotations be
> >> dropped.
> >>
> >> You could do that by using your hash set approach, if you defined equal to mean
> >> that just the covered text of the annotation was equal.
> >>
> >> Here's one way to do this:  Create a "cover object" for your annotations, that
> >> contains a reference to the annotation and defines equals and hashcode (you
> have
> >> to define these together).  The easy way to do this is using Eclipse - define a
> >> new class: e.g.
> >>
> >> public class MyAnnotationWithSpecialEquals {
> >>   final public Annotation annotation;   // the covered annotation
> >>  
> >>   public MyAnnotationWithSpecialEquals(Annotation annotation) {
> >>     this.annotation = annotation;
> >>   }
> >> }
> >>
> >> and then use Eclipse to define the equals and hashcode:  go to Menu ->
> Source ->
> >> Generate hashcode() and equals()
> >> and have it generate one based on just "annotation".  This will not (yet) be
> >> correct - it should add two methods like this:
> >>
> >>   @Override
> >>   public int hashCode() {
> >>     final int prime = 31;
> >>     int result = 1;
> >>     result = prime * result + ((annotation == null) ? 0 :
> annotation.hashCode());
> >>     return result;
> >>   }
> >>
> >>   @Override
> >>   public boolean equals(Object obj) {
> >>     if (this == obj)
> >>       return true;
> >>     if (obj == null)
> >>       return false;
> >>     if (getClass() != obj.getClass())
> >>       return false;
> >>     MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
> >             // buggy lines
> >>     if (annotation == null) {
> >>       if (other.annotation != null)
> >>         return false;
> >             //  replace above with
> >       if (annotation == null && other.annotation != null)
> >         return false;
> >>     } else if (!annotation.equals(other.annotation))
> >>       return false;
> >>     return true;
> >>   }
> >>
> >> Now, to get these to be the definitions you want, which depend only on the
> >> covered text, modify these as follows:
> >>
> >> First, for hashCode, use only the string covered text:
> >>
> >>   @Override
> >>   public int hashCode() {
> >>     final int prime = 31;
> >>     int result = 1;
> >>     result = prime * result + ((annotation == null) ? 0 :
> >> annotation.getCoveredText().hashCode());
> >>     return result;
> >>   }
> >>
> >> and for equals: replace test for annotation being "equal" with
> >> annotation.getCoveredText() being "equal",
> >> with some additional edge case testing in case of nulls:
> >>
> >> @Override
> >>   public boolean equals(Object obj) {
> >>     if (this == obj)
> >>       return true;
> >>     if (obj == null)
> >>       return false;
> >>     if (getClass() != obj.getClass())
> >>       return false;
> >>     MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
> >>     if (annotation == null || other.annotation == null)
> >>       return annotation == other.annotation;  // equal only if both are null
> >>     String coveredText = annotation.getCoveredText();
> >>     String otherText = other.annotation.getCoveredText();
> >>     if (coveredText == null)
> >>       return otherText == null;  // handle special case if covered text is null
> >>     return coveredText.equals(otherText);
> >>   }
> >>
> >> HTH.  -Marshall
> >>
> >>
> >> On 11/17/2014 4:49 PM, Kameron Cole wrote:
> >>> Input text:
> >>>
> >>> ------------------------------
> >>>
> >>> bird, cat, bush, cat
> >>>
> >>> ----------------------------
> >>>
> >>> Create the Annotations:
> >>>
> >>> -------------------------------
> >>> docText = aJCas.getDocumentText();
> >>>
> >>> int index = docText.indexOf("cat");
> >>> while (index >= 0) {
> >>>    int begin = index;
> >>>    int end = begin + 3;
> >>>    Animal animal = new Animal(aJCas);
> >>>    animal.setBegin(begin);
> >>>    animal.setEnd(end);
> >>>    animal.addToIndexes();
> >>>
> >>>    index = docText.indexOf("cat", index + 1);
> >>> }
> >>>
> >>> index = docText.indexOf("bird");
> >>> while (index >= 0) {
> >>>    int begin = index;
> >>>    int end = begin + 4;
> >>>    Animal animal = new Animal(aJCas);
> >>>    animal.setBegin(begin);
> >>>    animal.setEnd(end);
> >>>    animal.addToIndexes();
> >>>
> >>>    index = docText.indexOf("bird", index + 1);
> >>> }
> >>>
> >>> index = docText.indexOf("bush");
> >>> while (index >= 0) {
> >>>    int begin = index;
> >>>    int end = begin + 4;
> >>>    Vegetable vegetable = new Vegetable(aJCas);
> >>>    vegetable.setBegin(begin);
> >>>    vegetable.setEnd(end);
> >>>    vegetable.addToIndexes();
> >>>
> >>>    index = docText.indexOf("bush", index + 1);
> >>> }
> >>> ------------------------------------------------------
> >>>
> >>>     Kameron Arthur Cole
> >>>     Watson Content Analytics Applications and Support
> >>>     email: kameroncole@us.ibm.com | Tel: 305-389-8512
> >>>     upload logs here <http://www.ecurep.ibm.com/app/upload>
> >>>
> >>>
> >>>
> >>>
> >>> From: Marshall Schor <ms...@schor.com>
> >>> To: user@uima.apache.org
> >>> Date: 11/17/2014 04:35 PM
> >>> Subject: Re: can't remove duplicate Annotations with Java Set Collection
> >>>
> >>>
> --------------------------------------------------------------------------------
> >>>
> >>>
> >>>
> >>> Hi,
> >>>
> >>> Two Feature Structures are considered "equal" in the sense used by HashSet, if
> >>> fs1.equals(fs2).   The definition of "equals" for feature structures is: they
> >>> are equal if they refer to the same underlying CAS, and the same "spot" in the
> >>> CAS Heap.
> >>>
> >>> How did you create the Annotations that you think are "equal" in the HashSet
> >>> sense?
> >>>
> >>> Here's an example of two annotations which are "equal" in the UIMA sorted
> index
> >>> sense, but unequal in the HashSet sense.
> >>>
> >>>    Annotation fs1 = new Annotation(myJCas, 0, 4); // create an instance of
> >>> Annotation in myJCas, with a begin = 0, and end = 4.
> >>>    Annotation fs2 = new Annotation(myJCas, 0, 4); // create an instance of
> >>> Annotation in myJCas, with a begin = 0, and end = 4.
> >>>
> >>> These will be "equal" in the UIMA sense - the same kind of annotation, in the
> >>> same CAS, with the same feature values, but will be two distinct feature
> >>> structures, so HashSet will consider them to be unequal.
> >>>
> >>> Could this be what is happening in your case?  Please respond so we can see if
> >>> there's another straight-forward solution that does what you're looking for.
> >>>
> >>> -Marshall
> >>> on 11/17/2014 2:59 PM, Kameron Cole wrote:
> >>>> Hello,
> >>>>
> >>>> I am trying to get rid of duplicates in the FSIndex.  I thought a very
> >>>> clever way to do this would be to just push them into a Set Collection in
> >>>> Java, which does not allow duplicates. This is very (very) standard Java:
> >>>>
> >>>> ArrayList al = new ArrayList();
> >>>> // add elements to al, including duplicates
> >>>> HashSet hs = new HashSet();
> >>>> hs.addAll(al);
> >>>> al.clear();
> >>>> al.addAll(hs);
> >>>>
> >>>> This list will contain no duplicates.
> >>>>
> >>>> However, I am not getting this to work in my UIMA code:
> >>>>
> >>>>
> >>>> AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
> >>>>
> >>>> System.out.println("Index size is: "+idx.size());
> >>>>
> >>>> ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
> >>>>
> >>>> FSIterator it  = idx.iterator();
> >>>>
> >>>> //load the Annotations into a temporary list.  includes duplicates
> >>>>
> >>>> while(it.hasNext())
> >>>> {
> >>>>
> >>>> tempList.add((Annotation) it.next());
> >>>>
> >>>> }
> >>>>
> >>>> Iterator tempIt = tempList.iterator();
> >>>>
> >>>> // remove all Annotations from the index.  this works fine
> >>>>
> >>>> while(tempIt.hasNext()){
> >>>> ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
> >>>> }
> >>>>
> >>>> // push tempList into HashSet
> >>>>
> >>>> HashSet<Annotation> hs = new HashSet<Annotation>();
> >>>>
> >>>> hs.addAll(tempList);
> >>>>
> >>>> // this should not allow duplicates
> >>>>
> >>>> System.out.println("HS length: "+hs.size()); // size should be less the
> >>>> size of the FSIndex by the number of duplicates.  it is not. This is the
> >>>> main problem
> >>>>
> >>>> tempList.clear();
> >>>>
> >>>> tempList.addAll(hs);
> >>>>
> >>>> System.out.println("templist length: "+tempList.size());
> >>>>
> >>>>
> >>>> Iterator<Annotation> it2 = tempList.iterator(); // this should now be the
> >>>> clean list
> >>>>
> >>>>
> >>>> while(it2.hasNext()){
> >>>> it2.next().addToIndexes(aJCas);
> >>>> }
> >
> >
>
>


Re: can't remove duplicate Annotations with Java Set Collection

Posted by Kameron Cole <ka...@us.ibm.com>.
Awesome.  Your change will work.  And I will try it, thank you!

But maybe you can help me to get this to work?   As I posted, if I use
Object as the parameter in the compare method signature, Eclipse is ok; but
when I change it to Annotation, it says I must override the methods - as
though something about Annotation confuses Eclipse.  Here's the code I
really want to work:


-----------------------------------

public static ArrayList<Annotation> dedupe(AnnotationIndex<Annotation> idx2) {

    ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx2.size());
    FSIterator<Annotation> it2 = idx2.iterator();
    while (it2.hasNext()) {
        tempList.add((Annotation) it2.next());
    }

    Set set = new TreeSet(new Comparator() {
        @Override
        public int compare(Annotation o1, Annotation o2) {
            if (o1.getCoveredText() == o2.getCoveredText()) {
                return 0;
            }
            return 1;
        }
    });

    set.addAll(tempList);

    tempList.clear();
    tempList.addAll(set);
    System.out.println("templist length: " + tempList.size());
    return tempList;
}

-----------------------------

But look at what Eclipse gives me:






                                                                               
                                                                               
                                                                               
Kameron Arthur Cole
Watson Content Analytics Applications and Support
email: kameroncole@us.ibm.com | Tel: 305-389-8512






From:	Marshall Schor <ms...@schor.com>
To:	user@uima.apache.org
Date:	11/18/2014 11:54 AM
Subject:	Re: can't remove duplicate Annotations with Java Set Collection



An even simpler approach:

Use a HashMap, where the key is the annotation.getCoveredText() and the
value is
the annotation, instead of a HashSet.

replace this (in your original):

// push tempList into HashSet
HashSet<Annotation> hs = new HashSet<Annotation>();
hs.addAll(tempList);


with

// push tempList into HashMap
HashMap<String, Annotation> hm = new HashMap<String, Annotation>();
for (Annotation a : tempList) {
  hm.put(a.getCoveredText(), a);
}
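
A self-contained sketch of the map-based approach, for illustration only. Span is a hypothetical stand-in for Annotation so the snippet runs without UIMA. Two choices here are assumptions rather than part of the suggestion above: a LinkedHashMap is used so that first-seen order is preserved, and the first annotation for each text is kept (a plain put() would keep the last one).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MapDedupe {

    // Hypothetical stand-in for a UIMA annotation: begin offset plus covered text.
    static final class Span {
        final int begin;
        final String text;

        Span(int begin, String text) {
            this.begin = begin;
            this.text = text;
        }
    }

    // Keeps the first span seen for each distinct covered text, in input order.
    static List<Span> dedupe(List<Span> spans) {
        Map<String, Span> byText = new LinkedHashMap<String, Span>();
        for (Span s : spans) {
            if (!byText.containsKey(s.text)) { // put() alone would overwrite with the last span
                byText.put(s.text, s);
            }
        }
        return new ArrayList<Span>(byText.values());
    }

    // Convenience for inspection: the covered texts of the de-duplicated spans.
    static List<String> dedupedTexts(List<Span> spans) {
        List<String> texts = new ArrayList<String>();
        for (Span s : dedupe(spans)) {
            texts.add(s.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        // Begin offsets for the sample input "bird, cat, bush, cat".
        List<Span> spans = Arrays.asList(
                new Span(0, "bird"),
                new Span(6, "cat"),
                new Span(11, "bush"),
                new Span(17, "cat"));
        System.out.println(dedupedTexts(spans)); // [bird, cat, bush]
    }
}
```

The values of the map are the de-duplicated annotations; with real annotations those are what would be added back to the indexes.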

-Marshall

On 11/18/2014 9:45 AM, Marshall Schor wrote:
> Eclipse pointed out a bug in my code, fix is below
> On 11/18/2014 9:37 AM, Marshall Schor wrote:
>> Hi Kameron,
>>
>> Based on this code snip, the two "cat" annotations you create are
"different"
>> using the HashSet definition, because they correspond to two distinct
UIMA
>> Annotations.  You could, for instance, update one of them, and not the
other;
>> that is the sense in which they are distinct.  In the case below, the
two "cat"
>> annotations would have different begin and end offsets.
>>
>> I'm guessing that your goal was to have one of the two cat
annotations be
>> dropped.
>>
>> You could do that by using your hash set approach, if you defined equal
to mean
>> that just the covered text of the annotation was equal.
>>
>> Here's one way to do this:  Create a "cover object" for your
annotations, that
>> contains a reference to the annotation and defines equals and hashcode
(you have
>> to define these together).  The easy way to do this is using Eclipse -
define a
>> new class: e.g.
>>
>> public class MyAnnotationWithSpecialEquals {
>>   final public Annotation annotation;   // the covered annotation
>>
>>   public MyAnnotationWithSpecialEquals(Annotation annotation) {
>>     this.annotation = annotation;
>>   }
>> }
>>
>> and then use Eclipse to define the equals and hashcode:  go to Menu ->
Source ->
>> Generate hashcode() and equals()
>> and have it generate one based on just "annotation".  This will not
(yet) be
>> correct - it should add two methods like this:
>>
>>   @Override
>>   public int hashCode() {
>>     final int prime = 31;
>>     int result = 1;
>>     result = prime * result + ((annotation == null) ? 0 :
annotation.hashCode());
>>     return result;
>>   }
>>
>>   @Override
>>   public boolean equals(Object obj) {
>>     if (this == obj)
>>       return true;
>>     if (obj == null)
>>       return false;
>>     if (getClass() != obj.getClass())
>>       return false;
>>     MyAnnotationWithSpecialEquals other =
(MyAnnotationWithSpecialEquals) obj;
>             // buggy lines
>>     if (annotation == null) {
>>       if (other.annotation != null)
>>         return false;
>             //  replace above with
>       if (annotation == null && other.annotation != null)
>         return false;
>>     } else if (!annotation.equals(other.annotation))
>>       return false;
>>     return true;
>>   }
>>
>> Now, to get these to be the definitions you want, which depend only on
the
>> covered text, modify these as follows:
>>
>> First, for hashCode, use only the string covered text:
>>
>>   @Override
>>   public int hashCode() {
>>     final int prime = 31;
>>     int result = 1;
>>     result = prime * result + ((annotation == null) ? 0 :
>> annotation.getCoveredText().hashCode());
>>     return result;
>>   }
>>
>> and for equals: replace test for annotation being "equal" with
>> annotation.getCoveredText() being "equal",
>> with some additional edge case testing in case of nulls:
>>
>> @Override
>>   public boolean equals(Object obj) {
>>     if (this == obj)
>>       return true;
>>     if (obj == null)
>>       return false;
>>     if (getClass() != obj.getClass())
>>       return false;
>>     MyAnnotationWithSpecialEquals other =
(MyAnnotationWithSpecialEquals) obj;
>>     if (annotation == null || other.annotation == null)
>>       return annotation == other.annotation;  // equal only if both are null
>>     String coveredText = annotation.getCoveredText();
>>     String otherText = other.annotation.getCoveredText();
>>     if (coveredText == null)
>>       return otherText == null;  // handle special case if covered text is null
>>     return coveredText.equals(otherText);
>>   }
>>
>> HTH.  -Marshall
>>
>>
>> On 11/17/2014 4:49 PM, Kameron Cole wrote:
>>> Input text:
>>>
>>> ------------------------------
>>>
>>> bird, cat, bush, cat
>>>
>>> ----------------------------
>>>
>>> Create the Annotations:
>>>
>>> -------------------------------
>>> docText = aJCas.getDocumentText();
>>>
>>> int index = docText.indexOf("cat");
>>> while (index >= 0) {
>>>    int begin = index;
>>>    int end = begin + 3;
>>>    Animal animal = new Animal(aJCas);
>>>    animal.setBegin(begin);
>>>    animal.setEnd(end);
>>>    animal.addToIndexes();
>>>
>>>    index = docText.indexOf("cat", index + 1);
>>> }
>>>
>>> index = docText.indexOf("bird");
>>> while (index >= 0) {
>>>    int begin = index;
>>>    int end = begin + 4;
>>>    Animal animal = new Animal(aJCas);
>>>    animal.setBegin(begin);
>>>    animal.setEnd(end);
>>>    animal.addToIndexes();
>>>
>>>    index = docText.indexOf("bird", index + 1);
>>> }
>>>
>>> index = docText.indexOf("bush");
>>> while (index >= 0) {
>>>    int begin = index;
>>>    int end = begin + 4;
>>>    Vegetable vegetable = new Vegetable(aJCas);
>>>    vegetable.setBegin(begin);
>>>    vegetable.setEnd(end);
>>>    vegetable.addToIndexes();
>>>
>>>    index = docText.indexOf("bush", index + 1);
>>> }
>>> ------------------------------------------------------
>>>
>>>
>>>     Kameron Arthur Cole
>>>     Watson Content Analytics Applications and Support
>>>     email: kameroncole@us.ibm.com | Tel: 305-389-8512
>>>     upload logs here <http://www.ecurep.ibm.com/app/upload>

>>>
>>>
>>>
>>>
>>> From: Marshall Schor <ms...@schor.com>
>>> To: user@uima.apache.org
>>> Date: 11/17/2014 04:35 PM
>>> Subject: Re: can't remove duplicate Annotations with Java Set
Collection
>>>
>>>
--------------------------------------------------------------------------------

>>>
>>>
>>>
>>> Hi,
>>>
>>> Two Feature Structures are considered "equal" in the sense used by
HashSet, if
>>> fs1.equals(fs2).   The definition of "equals" for feature structures
is: they
>>> are equal if they refer to the same underlying CAS, and the same "spot"
in the
>>> CAS Heap.
>>>
>>> How did you create the Annotations that you think are "equal" in the
HashSet
>>> sense?
>>>
>>> Here's an example of two annotations which are "equal" in the UIMA
sorted index
>>> sense, but unequal in the HashSet sense.
>>>
>>>    Annotation fs1 = new Annotation(myJCas, 0, 4); // create an instance
of
>>> Annotation in myJCas, with a begin = 0, and end = 4.
>>>    Annotation fs2 = new Annotation(myJCas, 0, 4); // create an instance
of
>>> Annotation in myJCas, with a begin = 0, and end = 4.
>>>
>>> These will be "equal" in the UIMA sense - the same kind of annotation,
in the
>>> same CAS, with the same feature values, but will be two distinct
feature
>>> structures, so HashSet will consider them to be unequal.
>>>
>>> Could this be what is happening in your case?  Please respond so we can
see if
>>> there's another straight-forward solution that does what you're looking
for.
>>>
>>> -Marshall
>>> on 11/17/2014 2:59 PM, Kameron Cole wrote:
>>>> Hello,
>>>>
>>>> I am trying to get rid of duplicates in the FSIndex.  I thought a very
>>>> clever way to do this would be to just push them into a Set Collection in
>>>> Java, which does not allow duplicates. This is very (very) standard Java:
>>>>
>>>> ArrayList al = new ArrayList();
>>>> // add elements to al, including duplicates
>>>> HashSet hs = new HashSet();
>>>> hs.addAll(al);
>>>> al.clear();
>>>> al.addAll(hs);
>>>>
>>>> This list will contain no duplicates.
>>>>
>>>> However, I am not getting this to work in my UIMA code:
>>>>
>>>>
>>>> System.out.println("Index size is: "+idx.size());
>>>>
>>>> AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
>>>>
>>>> ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
>>>>
>>>> FSIterator it  = idx.iterator();
>>>>
>>>> //load the Annotations into a temporary list.  includes duplicates
>>>>
>>>> while(it.hasNext())
>>>> {
>>>>
>>>> tempList.add((Annotation) it.next());
>>>>
>>>> }
>>>>
>>>> Iterator tempIt = tempList.iterator();
>>>>
>>>> // remove all Annotations from the index.  this works fine
>>>>
>>>> while(tempIt.hasNext()){
>>>> ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
>>>> }
>>>>
>>>> // push tempList into HashSet
>>>>
>>>> HashSet<Annotation> hs = new HashSet<Annotation>();
>>>>
>>>> hs.addAll(tempList);
>>>>
>>>> // this should not allow duplicates
>>>>
>>>> System.out.println("HS length: "+hs.size()); // size should be less the
>>>> size of the FSIndex by the number of duplicates.  it is not. This is the
>>>> main problem
>>>>
>>>> tempList.clear();
>>>>
>>>> tempList.addAll(hs);
>>>>
>>>> System.out.println("templist length: "+tempList.size());
>>>>
>>>>
>>>> Iterator<Annotation> it2 = tempList.iterator(); // this should now be the
>>>> clean list
>>>>
>>>>
>>>> while(it2.hasNext()){
>>>> it2.next().addToIndexes(aJCas);
>>>> }
>
>


Re: can't remove duplicate Annotations with Java Set Collection

Posted by Marshall Schor <ms...@schor.com>.
An even simpler approach:

Use a HashMap, where the key is the annotation.getCoveredText() and the value is
the annotation, instead of a HashSet.

replace this (in your original):

// push tempList into HashSet
HashSet<Annotation> hs = new HashSet<Annotation>();
hs.addAll(tempList);


with

// push tempList into HashMap
HashMap<String, Annotation> hm = new HashMap<String, Annotation>();
for (Annotation a : tempList) {
  hm.put(a.getCoveredText(), a);
}
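
A minimal, self-contained sketch of the map-based idea, runnable without UIMA on the classpath. The Span class and the findAll and dedupe helpers are hypothetical stand-ins (not UIMA API); Span only mimics an annotation's begin/end offsets and getCoveredText(). The sketch uses a LinkedHashMap with putIfAbsent so that the first annotation seen for each covered text is the one kept; a plain put(), as in the snippet above, would keep the last one instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupByCoveredText {

  // Hypothetical stand-in for a UIMA Annotation: offsets plus the document text.
  static final class Span {
    final String doc;
    final int begin;
    final int end;

    Span(String doc, int begin, int end) {
      this.doc = doc;
      this.begin = begin;
      this.end = end;
    }

    String getCoveredText() {
      return doc.substring(begin, end);
    }
  }

  // Find every occurrence of each word, the way the indexOf loops in this thread do.
  static List<Span> findAll(String doc, String... words) {
    List<Span> spans = new ArrayList<>();
    for (String word : words) {
      for (int i = doc.indexOf(word); i >= 0; i = doc.indexOf(word, i + 1)) {
        spans.add(new Span(doc, i, i + word.length()));
      }
    }
    return spans;
  }

  // Keep one span per distinct covered text; putIfAbsent keeps the first one seen.
  static List<Span> dedupe(List<Span> spans) {
    Map<String, Span> byText = new LinkedHashMap<>();
    for (Span s : spans) {
      byText.putIfAbsent(s.getCoveredText(), s);
    }
    return new ArrayList<>(byText.values());
  }

  public static void main(String[] args) {
    List<Span> all = findAll("bird, cat, bush, cat", "cat", "bird", "bush");
    List<Span> unique = dedupe(all);
    // prints "4 spans, 3 after de-duplication"
    System.out.println(all.size() + " spans, " + unique.size() + " after de-duplication");
  }
}
```

In real UIMA code the values of the map would be the annotations to pass back to addToIndexes().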

-Marshall

On 11/18/2014 9:45 AM, Marshall Schor wrote:
> Eclipse pointed out a bug in my code, fix is below


Re: can't remove duplicate Annotations with Java Set Collection

Posted by Marshall Schor <ms...@schor.com>.
Eclipse pointed out a bug in my code, fix is below
On 11/18/2014 9:37 AM, Marshall Schor wrote:
> Hi Kameron,
>
> Based on this code snip, the two "cat" annotations you create are "different"
> using the HashSet definition, because they correspond to two distinct UIMA
> Annotations.  You could, for instance, update one of them, and not the other;
> that is the sense in which they are distinct.  In the case below, the two "cat"
> annotations would have different begin and end offsets.
>
> I'm guessing that your goal was to have one of the two cat annotations be
> dropped.
>
> You could do that by using your hash set approach, if you defined equal to mean
> that just the covered text of the annotation was equal.
>
> Here's one way to do this:  Create a "cover object" for your annotations, that
> contains a reference to the annotation and defines equals and hashcode (you have
> to define these together).  The easy way to do this is using Eclipse - define a
> new class: e.g.
>
> public class MyAnnotationWithSpecialEquals {
>   final public Annotation annotation;   // the covered annotation
>  
>   public MyAnnotationWithSpecialEquals(Annotation annotation) {
>     this.annotation = annotation;
>   }
> }
>
> and then use Eclipse to define the equals and hashcode:  go to Menu -> Source ->
> Generate hashcode() and equals()
> and have it generate one based on just "annotation".  This will not (yet) be
> correct - it should add two methods like this:
>
>   @Override
>   public int hashCode() {
>     final int prime = 31;
>     int result = 1;
>     result = prime * result + ((annotation == null) ? 0 : annotation.hashCode());
>     return result;
>   }
>
>   @Override
>   public boolean equals(Object obj) {
>     if (this == obj)
>       return true;
>     if (obj == null)
>       return false;
>     if (getClass() != obj.getClass())
>       return false;
>     MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
            // buggy lines
>     if (annotation == null) {
>       if (other.annotation != null)
>         return false;
            //  replace above with
      if (annotation == null && other.annotation != null)
        return false;
>     } else if (!annotation.equals(other.annotation))
>       return false;
>     return true;
>   }
>
> Now, to get these to be the definitions you want, which depend only on the
> covered text, modify these as follows:
>
> First, for hashCode, use only the string covered text:
>
>   @Override
>   public int hashCode() {
>     final int prime = 31;
>     int result = 1;
>     result = prime * result + ((annotation == null) ? 0 :
> annotation.getCoveredText().hashCode());
>     return result;
>   }
>
> and for equals: replace test for annotation being "equal" with
> annotation.getCoveredText() being "equal",
> with some additional edge case testing in case of nulls:
>
>   @Override
>   public boolean equals(Object obj) {
>     if (this == obj)
>       return true;
>     if (obj == null)
>       return false;
>     if (getClass() != obj.getClass())
>       return false;
>     MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
>     if (annotation == null || other.annotation == null)
>       return annotation == other.annotation;  // equal only if both are null
>     String coveredText = annotation.getCoveredText();
>     if (coveredText == null)
>       return other.annotation.getCoveredText() == null;  // handle special case if covered text is null
>     // coveredText is not null
>     return coveredText.equals(other.annotation.getCoveredText());
>   }
>
> HTH.  -Marshall
>
>


Re: can't remove duplicate Annotations with Java Set Collection

Posted by Marshall Schor <ms...@schor.com>.
Hi Kameron,

Based on this code snip, the two "cat" annotations you create are "different"
using the HashSet definition, because they correspond to two distinct UIMA
Annotations.  You could, for instance, update one of them, and not the other;
that is the sense in which they are distinct.  In the case below, the two "cat"
annotations would have different begin and end offsets.

I'm guessing that your goal was to have one of the two cat annotations be
dropped.

You could do that by using your hash set approach, if you defined equal to mean
that just the covered text of the annotation was equal.

Here's one way to do this:  Create a "cover object" for your annotations, that
contains a reference to the annotation and defines equals and hashCode (you have
to define these together).  The easy way to do this is using Eclipse - define a
new class: e.g.

public class MyAnnotationWithSpecialEquals {
  final public Annotation annotation;   // the covered annotation
 
  public MyAnnotationWithSpecialEquals(Annotation annotation) {
    this.annotation = annotation;
  }
}

and then use Eclipse to define the equals and hashCode:  go to Menu -> Source ->
Generate hashCode() and equals()
and have it generate one based on just "annotation".  This will not (yet) do
what you want, because it still compares the annotation objects themselves - it
should add two methods like this:

  @Override
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((annotation == null) ? 0 : annotation.hashCode());
    return result;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj)
      return true;
    if (obj == null)
      return false;
    if (getClass() != obj.getClass())
      return false;
    MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
    if (annotation == null) {
      if (other.annotation != null)
        return false;
    } else if (!annotation.equals(other.annotation))
      return false;
    return true;
  }

Now, to get these to be the definitions you want, which depend only on the
covered text, modify these as follows:

First, for hashCode, use only the string covered text:

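  @Override
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    // guard against a null annotation or null covered text, matching equals() below
    String coveredText = (annotation == null) ? null : annotation.getCoveredText();
    result = prime * result + ((coveredText == null) ? 0 : coveredText.hashCode());
    return result;
  }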

and for equals: replace test for annotation being "equal" with
annotation.getCoveredText() being "equal",
with some additional edge case testing in case of nulls:

  @Override
  public boolean equals(Object obj) {
    if (this == obj)
      return true;
    if (obj == null)
      return false;
    if (getClass() != obj.getClass())
      return false;
    MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
    if (annotation == null || other.annotation == null)
      return annotation == other.annotation;  // equal only if both are null
    String coveredText = annotation.getCoveredText();
    if (coveredText == null)
      return other.annotation.getCoveredText() == null;  // handle special case if covered text is null
    // coveredText is not null
    return coveredText.equals(other.annotation.getCoveredText());
  }
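
For illustration, here is a compact, self-contained sketch of the same cover-object idea that runs without UIMA. Span is a hypothetical stand-in for an Annotation, and ByCoveredText and dedupe are names made up for this sketch; it also assumes the wrapped object is never null, which lets java.util.Objects handle the remaining null cases.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class CoverObjectSketch {

  // Hypothetical stand-in for a UIMA Annotation; it does not override equals,
  // so two Spans are equal only if they are the same object.
  static final class Span {
    final String doc;
    final int begin;
    final int end;

    Span(String doc, int begin, int end) {
      this.doc = doc;
      this.begin = begin;
      this.end = end;
    }

    String getCoveredText() {
      return doc.substring(begin, end);
    }
  }

  // Cover object: equality and hash code depend only on the covered text.
  static final class ByCoveredText {
    final Span span;

    ByCoveredText(Span span) {
      this.span = Objects.requireNonNull(span); // assumption: never wrap null
    }

    @Override
    public int hashCode() {
      return Objects.hashCode(span.getCoveredText());
    }

    @Override
    public boolean equals(Object obj) {
      if (this == obj) return true;
      if (!(obj instanceof ByCoveredText)) return false;
      ByCoveredText other = (ByCoveredText) obj;
      return Objects.equals(span.getCoveredText(), other.span.getCoveredText());
    }
  }

  // Wrap, let the set drop duplicates, then unwrap in first-seen order.
  static List<Span> dedupe(List<Span> spans) {
    Set<ByCoveredText> seen = new LinkedHashSet<>();
    for (Span s : spans) {
      seen.add(new ByCoveredText(s));
    }
    List<Span> result = new ArrayList<>();
    for (ByCoveredText cover : seen) {
      result.add(cover.span);
    }
    return result;
  }

  public static void main(String[] args) {
    String doc = "bird, cat, bush, cat";
    List<Span> spans = new ArrayList<>();
    spans.add(new Span(doc, 0, 4));   // bird
    spans.add(new Span(doc, 6, 9));   // cat
    spans.add(new Span(doc, 11, 15)); // bush
    spans.add(new Span(doc, 17, 20)); // cat
    System.out.println(dedupe(spans).size()); // prints 3
  }
}
```

The set holds the cover objects, not the annotations, so the annotations have to be unwrapped again before they are added back to the indexes.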

HTH.  -Marshall


On 11/17/2014 4:49 PM, Kameron Cole wrote:
>
> Input text:
>
> ------------------------------
>
> bird, cat, bush, cat
>
> ----------------------------
>
> Create the Annotations:
>
> -------------------------------
> docText = aJCas.getDocumentText();
>
> *int* index = docText.indexOf("cat");
> *while*(index >= 0) {
> *int* begin = index;
> *int* end = begin+3;
> Animal animal = *new* Animal(aJCas);
> animal.setBegin(begin);
> animal.setEnd(end);
> animal.addToIndexes();
>  
>    index = docText.indexOf("cat", index+1);
> }
>
> index = docText.indexOf("bird");
> *while*(index >= 0) {
> *int* begin = index;
> *int* end = begin+4;
> Animal animal = *new* Animal(aJCas);
> animal.setBegin(begin);
> animal.setEnd(end);
> animal.addToIndexes();
>  
>    index = docText.indexOf("bird", index+1);
> }
>
> index = docText.indexOf("bush");
> *while*(index >= 0) {
> *int* begin = index;
> *int* end = begin+4;
> Vegetable animal = *new* Vegetable(aJCas);
> animal.setBegin(begin);
> animal.setEnd(end);
> animal.addToIndexes();
>  
>    index = docText.indexOf("bird", index+1);
> }
> ------------------------------------------------------
>
>     --------------------------------------------------------------------------------
>
>     *Kameron Arthur Cole
>     Watson Content Analytics Applications and Support
>     email: **kameroncole@us.ibm.com* <ma...@us.ibm.com>* | Tel:
>     305-389-8512**
>     **upload logs here* <http://www.ecurep.ibm.com/app/upload>  
>
> 	
>
> 	
>
>     <ht...@ibmwatson><http://www.youtube.com/user/IBMWatsonSolutions/videos>
>
>
>     --------------------------------------------------------------------------------
>
>
>
>
> From: Marshall Schor <ms...@schor.com>
> To: user@uima.apache.org
> Date: 11/17/2014 04:35 PM
> Subject: Re: can't remove duplicate Annotations with Java Set Collection
>
> --------------------------------------------------------------------------------
>
>
>
> Hi,
>
> Two Feature Structures are considered "equal" in the sense used by HashSet, if
> fs1.equals(fs2).   The definition of "equals" for feature structures is: they
> are equal if they refer to the same underlying CAS, and the same "spot" in
> the CAS Heap.
>
> How did you create the Annotations that you think are "equal" in the HashSet
> sense?
>
> Here's an example of two annotations which are "equal" in the UIMA sorted index
> sense, but unequal in the HashSet sense.
>
>    Annotation fs1 = new Annotation(myJCas, 0, 4); // create an instance of
> Annotation in myJCas, with a begin = 0, and end = 4.
>    Annotation fs2 = new Annotation(myJCas, 0, 4); // create an instance of
> Annotation in myJCas, with a begin = 0, and end = 4.
>
> These will be "equal" in the UIMA sense - the same kind of annotation, in the
> same CAS, with the same feature values, but will be two distinct feature
> structures, so HashSet will consider them to be unequal.
>
> Could this be what is happening in your case?  Please respond so we can see if
> there's another straight-forward solution that does what you're looking for.
>
> -Marshall
> on 11/17/2014 2:59 PM, Kameron Cole wrote:
> > Hello,
> >
> > I am trying to get rid of duplicates in the FSIndex.  I thought a very
> > clever way to do this would be to just push them into a Set Collection in
> > Java, which does not allow duplicates. This is very (very) standard Java:
> >
> > ArrayList al = new ArrayList();
> > // add elements to al, including duplicates
> > HashSet hs = new HashSet();
> > hs.addAll(al);
> > al.clear();
> > al.addAll(hs);
> >
> > This list will contain no duplicates.
> >
> > However, I am not getting this to work in my UIMA code:
> >
> >
> > System.out.println("Index size is: "+idx.size());
> >
> > AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
> >
> > ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
> >
> > FSIterator it  = idx.iterator();
> >
> > //load the Annotations into a temporary list.  includes duplicates
> >
> > while(it.hasNext())
> > {
> >
> > tempList.add((Annotation) it.next());
> >
> > }
> >
> > Iterator tempIt = tempList.iterator();
> >
> > // remove all Annotations from the index.  this works fine
> >
> > while(tempIt.hasNext()){
> > ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
> > }
> >
> > // push tempList into HashSet
> >
> > HashSet<Annotation> hs = new HashSet<Annotation>();
> >
> > hs.addAll(tempList);
> >
> > // this should not allow duplicates
> >
> > System.out.println("HS length: "+hs.size()); // size should be less the
> > size of the FSIndex by the number of duplicates.  it is not. This is the
> > main problem
> >
> > tempList.clear();
> >
> > tempList.addAll(hs);
> >
> > System.out.println("templist length: "+tempList.size());
> >
> >
> > Iterator<Annotation> it2 = tempList.iterator(); // this should now be the
> > clean list
> >
> >
> > while(it2.hasNext()){
> > it2.next().addToIndexes(aJCas);
> > }
>
>


Re: can't remove duplicate Annotations with Java Set Collection

Posted by Kameron Cole <ka...@us.ibm.com>.
Input text:

------------------------------

bird, cat, bush, cat

----------------------------

Create the Annotations:

-------------------------------
docText = aJCas.getDocumentText();

// annotate every occurrence of "cat" as an Animal
int index = docText.indexOf("cat");
while (index >= 0) {
    int begin = index;
    int end = begin + 3;
    Animal animal = new Animal(aJCas);
    animal.setBegin(begin);
    animal.setEnd(end);
    animal.addToIndexes();

    index = docText.indexOf("cat", index + 1);
}

// annotate every occurrence of "bird" as an Animal
index = docText.indexOf("bird");
while (index >= 0) {
    int begin = index;
    int end = begin + 4;
    Animal animal = new Animal(aJCas);
    animal.setBegin(begin);
    animal.setEnd(end);
    animal.addToIndexes();

    index = docText.indexOf("bird", index + 1);
}

// annotate every occurrence of "bush" as a Vegetable
index = docText.indexOf("bush");
while (index >= 0) {
    int begin = index;
    int end = begin + 4;
    Vegetable vegetable = new Vegetable(aJCas);
    vegetable.setBegin(begin);
    vegetable.setEnd(end);
    vegetable.addToIndexes();

    index = docText.indexOf("bush", index + 1);
}
------------------------------------------------------

Kameron Arthur Cole
Watson Content Analytics Applications and Support
email: kameroncole@us.ibm.com | Tel: 305-389-8512

From:	Marshall Schor <ms...@schor.com>
To:	user@uima.apache.org
Date:	11/17/2014 04:35 PM
Subject:	Re: can't remove duplicate Annotations with Java Set Collection



Hi,

Two Feature Structures are considered "equal" in the sense used by HashSet, if
fs1.equals(fs2).   The definition of "equals" for feature structures is: they
are equal if they refer to the same underlying CAS, and the same "spot" in the
CAS Heap.

How did you create the Annotations that you think are "equal" in the
HashSet sense?

Here's an example of two annotations which are "equal" in the UIMA sorted index
sense, but unequal in the HashSet sense.

    // create an instance of Annotation in myJCas, with a begin = 0, and end = 4.
    Annotation fs1 = new Annotation(myJCas, 0, 4);
    // create another instance of Annotation in myJCas, with a begin = 0, and end = 4.
    Annotation fs2 = new Annotation(myJCas, 0, 4);

These will be "equal" in the UIMA sense - the same kind of annotation, in the
same CAS, with the same feature values, but will be two distinct feature
structures, so HashSet will consider them to be unequal.

Could this be what is happening in your case?  Please respond so we can see if
there's another straight-forward solution that does what you're looking for.

-Marshall


Re: can't remove duplicate Annotations with Java Set Collection

Posted by Marshall Schor <ms...@schor.com>.
Hi,

Two Feature Structures are considered "equal" in the sense used by HashSet, if
fs1.equals(fs2).   The definition of "equals" for feature structures is: they
are equal if they refer to the same underlying CAS, and the same "spot" in the
CAS Heap.

How did you create the Annotations that you think are "equal" in the HashSet sense?

Here's an example of two annotations which are "equal" in the UIMA sorted index
sense, but unequal in the HashSet sense.

    // create an instance of Annotation in myJCas, with a begin = 0, and end = 4.
    Annotation fs1 = new Annotation(myJCas, 0, 4);
    // create another instance of Annotation in myJCas, with a begin = 0, and end = 4.
    Annotation fs2 = new Annotation(myJCas, 0, 4);

These will be "equal" in the UIMA sense - the same kind of annotation, in the
same CAS, with the same feature values, but will be two distinct feature
structures, so HashSet will consider them to be unequal.

Could this be what is happening in your case?  Please respond so we can see if
there's another straight-forward solution that does what you're looking for.
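
One possible direction, as a rough sketch only: since equals() on feature
structures is identity-based, duplicates have to be found by comparing the
values that make two annotations "the same" for your purposes, for example
type name, begin and end. The sketch below does not use the UIMA API. Span is
a hypothetical stand-in for Annotation that, like it, does not override
equals()/hashCode(), and dedupe is just an illustrative helper name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DedupSketch {

    // Hypothetical stand-in for a UIMA Annotation: it does NOT override
    // equals()/hashCode(), so two instances are equal only if they are
    // the very same object.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Keeps the first Span seen for each (type, begin, end) combination.
    static List<Span> dedupe(List<Span> spans) {
        Map<String, Span> firstByKey = new LinkedHashMap<>();
        for (Span s : spans) {
            String key = s.type + "|" + s.begin + "|" + s.end;
            firstByKey.putIfAbsent(key, s);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>();
        spans.add(new Span("Animal", 0, 4));
        spans.add(new Span("Animal", 0, 4));      // same values, different object
        spans.add(new Span("Vegetable", 10, 14));

        // Identity-based equality: the HashSet drops nothing.
        Set<Span> bySet = new HashSet<>(spans);
        System.out.println("HashSet size: " + bySet.size());

        // Value-based key: the second "Animal" at 0..4 is dropped.
        System.out.println("Deduped size: " + dedupe(spans).size());
    }
}
```

In real UIMA code the same idea would presumably build the key from the
annotation's type name, getBegin() and getEnd(), and then call
removeFromIndexes() only on the extra copies.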

-Marshall
on 11/17/2014 2:59 PM, Kameron Cole wrote:
> Hello,
>
> I am trying to get rid of duplicates in the FSIndex.  I thought a very
> clever way to do this would be to just push them into a Set Collection in
> Java, which does not allow duplicates. This is very (very) standard Java:
>
> ArrayList al = new ArrayList();
> // add elements to al, including duplicates
> HashSet hs = new HashSet();
> hs.addAll(al);
> al.clear();
> al.addAll(hs);
>
> This list will contain no duplicates.
>
> However, I am not getting this to work in my UIMA code:
>
>
> System.out.println("Index size is: "+idx.size());
>
> AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
>
> ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
>
> 	FSIterator it  = idx.iterator();
>
> //load the Annotations into a temporary list.  includes duplicates
>
> 	while(it.hasNext())
> 	{
>
> 		tempList.add((Annotation) it.next());
>
> 	}
>
> Iterator tempIt = tempList.iterator();
>
> // remove all Annotations from the index.  this works fine
>
> 		while(tempIt.hasNext()){
> 			((Annotation) tempIt.next()).removeFromIndexes(aJCas);
> 		}
>
> // push tempList into HashSet
>
> 	HashSet<Annotation> hs = new HashSet<Annotation>();
>
> 	hs.addAll(tempList);
>
> // this should not allow duplicates
>
> System.out.println("HS length: "+hs.size()); // size should be less the
> size of the FSIndex by the number of duplicates.  it is not. This is the
> main problem
>
> tempList.clear();
>
> 	tempList.addAll(hs);
>
> 	System.out.println("templist length: "+tempList.size());
>
>
> Iterator<Annotation> it2 = tempList.iterator(); // this should now be the
> clean list
>
>
> 		while(it2.hasNext()){
> 			it2.next().addToIndexes(aJCas);
> 		}


can't remove duplicate Annotations with Java Set Collection

Posted by Kameron Cole <ka...@us.ibm.com>.
Hello,

I am trying to get rid of duplicates in the FSIndex.  I thought a very
clever way to do this would be to just push them into a Set Collection in
Java, which does not allow duplicates. This is very (very) standard Java:

ArrayList al = new ArrayList();
// add elements to al, including duplicates
HashSet hs = new HashSet();
hs.addAll(al);
al.clear();
al.addAll(hs);

This list will contain no duplicates.

However, I am not getting this to work in my UIMA code:


AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();

System.out.println("Index size is: "+idx.size());

ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());

FSIterator it  = idx.iterator();

// load the Annotations into a temporary list.  includes duplicates
while(it.hasNext()) {
	tempList.add((Annotation) it.next());
}

Iterator tempIt = tempList.iterator();

// remove all Annotations from the index.  this works fine
while(tempIt.hasNext()) {
	((Annotation) tempIt.next()).removeFromIndexes(aJCas);
}

// push tempList into HashSet
HashSet<Annotation> hs = new HashSet<Annotation>();
hs.addAll(tempList);

// this should not allow duplicates.
// size should be less than the size of the FSIndex by the number of
// duplicates.  it is not. This is the main problem
System.out.println("HS length: "+hs.size());

tempList.clear();
tempList.addAll(hs);
System.out.println("templist length: "+tempList.size());

// this should now be the clean list
Iterator<Annotation> it2 = tempList.iterator();

while(it2.hasNext()) {
	it2.next().addToIndexes(aJCas);
}

Re: UIMA pipeline output persistence and multiple layer web based visualisation tools? Suggestions?

Posted by James Kitching <ja...@hemseye.org>.
Hi Richard,

Thanks very much for your replies (UIMA and webanno groups).  I have 
updated my research blog with your response and will hopefully soon get 
to a point where I can follow the advice you have given.

Regards

James Kitching

On 16/11/2014 13:08, Richard Eckart de Castilho wrote:
> Hi James,
>
> <taking Apache UIMA hat off, putting UKP Lab hat on>
>
> I'm working on the WebAnno (and DKPro Core) project. Thanks for checking it out and providing feedback!
>
> On 16.11.2014, at 13:08, James Kitching <ja...@hemseye.org> wrote:
>
>> I had hoped that I could use webanno for this task however webanno does not allow the direct import of UIMA components or UIMA output.
> WebAnno [1] is an annotation tool. Its scope is not the building or running of pipelines.
>
> WebAnno can quite immediately consume XMIs created with the DKPro Core [2] collection of UIMA components, since the built-in annotation types of WebAnno are modelled after the DKPro Core types. Actually, all import/export filters in WebAnno are UIMA components from DKPro Core.
>
> WebAnno is not meant to be a universal XMI/CAS editor. It is meant to be a user-friendly annotation tool. However, we internally use the UIMA CAS to represent annotations.
>
> To visualize UIMA annotations in WebAnno, they need to be mapped to WebAnno's (cf. brat's) interaction paradigms. To this end, WebAnno supports three specific type-system design patterns (aka "layer types"): span, relation, and chain.
> A "span" is basically a UIMA "Annotation". A "relation" is an annotation with two features pointing to a "span" type. A "chain" is basically a variation of a linked list. Additional primitive features are also supported.
>
> If you want to use WebAnno with existing UIMA data, you can try this:
>
> - define custom annotation layers in WebAnno that closely resemble the data you wish to interface with
> - export the layer definition as JSON
> - edit the JSON file and change the type names "webanno.custom.XXX" into whatever these types are called in your existing UIMA type system
> - create a new project
> - import the modified JSON layer configuration
>
> For basic type system designs, this should work ok.
>
> Cheers,
>
> -- Richard
>
> [1] https://code.google.com/p/webanno/
> [2] https://code.google.com/p/dkpro-core-asl/


Re: UIMA pipeline output persistence and multiple layer web based visualisation tools? Suggestions?

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi James,

<taking Apache UIMA hat off, putting UKP Lab hat on>

I'm working on the WebAnno (and DKPro Core) project. Thanks for checking it out and providing feedback!

On 16.11.2014, at 13:08, James Kitching <ja...@hemseye.org> wrote:

> I had hoped that I could use webanno for this task however webanno does not allow the direct import of UIMA components or UIMA output.

WebAnno [1] is an annotation tool. Its scope is not the building or running of pipelines.

WebAnno can quite immediately consume XMIs created with the DKPro Core [2] collection of UIMA components, since the built-in annotation types of WebAnno are modelled after the DKPro Core types. Actually, all import/export filters in WebAnno are UIMA components from DKPro Core.

WebAnno is not meant to be a universal XMI/CAS editor. It is meant to be a user-friendly annotation tool. However, we internally use the UIMA CAS to represent annotations.

To visualize UIMA annotations in WebAnno, they need to be mapped to WebAnno's (cf. brat's) interaction paradigms. To this end, WebAnno supports three specific type-system design patterns (aka "layer types"): span, relation, and chain.
A "span" is basically a UIMA "Annotation". A "relation" is an annotation with two features pointing to a "span" type. A "chain" is basically a variation of a linked list. Additional primitive features are also supported.
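
To make the three patterns a bit more concrete, here is a rough sketch in plain Java. It is not WebAnno or UIMA code: the class names, the "source"/"target" feature names, and the way the relation is anchored are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LayerTypesSketch {

    // Common base: anything anchored to a character range, like a UIMA Annotation.
    static class Anno {
        final int begin;
        final int end;

        Anno(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // "span": an annotation over a text range, here with one primitive feature.
    static class Span extends Anno {
        final String label;

        Span(int begin, int end, String label) {
            super(begin, end);
            this.label = label;
        }
    }

    // "relation": an annotation with two features that point to spans.
    // The feature names "source" and "target" are assumptions.
    static class Relation extends Anno {
        final Span source;
        final Span target;

        Relation(Span source, Span target) {
            super(target.begin, target.end); // anchoring on the target is an arbitrary choice
            this.source = source;
            this.target = target;
        }
    }

    // "chain": a linked list of annotations, each pointing to the next link.
    static class ChainLink extends Anno {
        ChainLink next; // null at the end of the chain

        ChainLink(int begin, int end) {
            super(begin, end);
        }
    }

    // Follows the "next" pointers and collects the links in order.
    static List<ChainLink> walk(ChainLink first) {
        List<ChainLink> links = new ArrayList<>();
        for (ChainLink link = first; link != null; link = link.next) {
            links.add(link);
        }
        return links;
    }

    public static void main(String[] args) {
        Span bird = new Span(4, 8, "Animal");
        Span bush = new Span(16, 20, "Vegetable");
        Relation inBush = new Relation(bird, bush);

        ChainLink first = new ChainLink(4, 8);
        ChainLink second = new ChainLink(30, 32);
        first.next = second;

        System.out.println("relation: " + inBush.source.label + " -> " + inBush.target.label);
        System.out.println("chain length: " + walk(first).size());
    }
}
```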

If you want to use WebAnno with existing UIMA data, you can try this:

- define custom annotation layers in WebAnno that closely resemble the data you wish to interface with
- export the layer definition as JSON
- edit the JSON file and change the type names "webanno.custom.XXX" into whatever these types are called in your existing UIMA type system
- create a new project
- import the modified JSON layer configuration

For basic type system designs, this should work ok.
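
The "edit the JSON file" step above could be done by hand or with a few lines of code. A minimal sketch, under stated assumptions: the JSON fragment below is made up for the demo and is NOT the real WebAnno export structure, and both the custom type name and its replacement are hypothetical. The helper works on the raw text and only swaps complete quoted names, so a longer name that merely starts with the same characters is left alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LayerRenameSketch {

    // Replaces each quoted old type name with its new name and leaves
    // everything else in the text untouched.
    static String renameTypes(String json, Map<String, String> renames) {
        String result = json;
        for (Map.Entry<String, String> e : renames.entrySet()) {
            result = result.replace("\"" + e.getKey() + "\"", "\"" + e.getValue() + "\"");
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up fragment for illustration only; the real export will look different.
        String exported = "{\"name\":\"webanno.custom.Animal\"}";

        // Hypothetical mapping from the WebAnno custom name to an existing UIMA type name.
        Map<String, String> renames = new LinkedHashMap<>();
        renames.put("webanno.custom.Animal", "org.example.types.Animal");

        System.out.println(renameTypes(exported, renames));
    }
}
```

For a real file, the text could be loaded and saved with java.nio.file.Files.readString and Files.writeString (Java 11 or later) around the call to renameTypes.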

Cheers,

-- Richard

[1] https://code.google.com/p/webanno/
[2] https://code.google.com/p/dkpro-core-asl/