Posted to user@uima.apache.org by James Kitching <ja...@hemseye.org> on 2014/11/16 13:08:41 UTC
UIMA pipeline output persistence and multiple layer web based visualisation
tools? Suggestions?
Hi
(First of all a BIG THANKS to ALL open source developers at UIMA and the
other projects I mention below whom I am now relying on :-) ).
I am looking at researching a particular knowledge base extraction task
using UIMA components as part of the solution. To do this work I need
UIMA output persistence and to be able to visualise this output as
multiple annotation layers on the same text. Ultimately I want my
automated annotations and visualisations to be web based and allow me to
make additional manual annotations if required. Once I have my multiple
annotations made on a text I will then be able to apply my new knowledge
extraction logic.
I have looked at webanno (which incorporates Brat for its UI) and
U-Compare as well as Argo (See https://code.google.com/p/webanno/,
http://brat.nlplab.org/, http://u-compare.org/,
http://nactem.ac.u/ucompare/downloads/,
http://argo.nactem.ac.uk/about-argo/). I had hoped that I could use
webanno for this task however webanno does not allow the direct import
of UIMA components or UIMA output. I found that I could get U-Compare
to work as I wanted and it shows promise; however, if I get any
configuration wrong between UIMA components it crashes out. I got
the software to work for me after I spent more time reading the manual.
I found I needed to manually configure the input types for each
component in the pipeline. The software recognises subsequent pipeline
component compatibility when a new component is added to a work flow.
My initial errors came as I had initially expected subsequent U-Compare
components to automatically pick up their input from the output from
previous workflow components. Whilst the U-compare software does
support the saving of previous session data the software is not fully
open source so I do not have easy access to this data. I have not
looked at the web service pipeline generation features of U-Compare as
yet; this might hold promise if it gives me a download configuration
rather than a hosted solution. When I looked at the argo tool I had
similar problems with a lack of output. I would assume for the same
reasons. Again Argo is not fully open source so I cannot work on
modifying this tool to my own ends. Are there any other better tools
available that support web based UIMA layered visualisation and output
persistence?
Currently I plan to continue to experiment with UIMA components using
U-compare however I am looking to implement persistence and
visualisation in a production tool. If someone already has a good open
source implementation of this need I would prefer not to spend time
reinventing this particular wheel.
I would be very happy if the U-compare and webanno teams would work
together and get their software integrated. I will pass this mail onto
these teams as a suggestion.
The particular data extraction task I am interested in is different to
the current popular research shared task
(http://www.nist.gov/tac/2014/KBP/) and one which I plan to share once
I have made some progress.
Thanks in advance.
Further information about me and my project can be found at www.hemseye.org
James Kitching
Re: can't remove duplicate Annotations with Java Set Collection
Posted by Kameron Cole <ka...@us.ibm.com>.
Having trouble with the Comparator. If I compare Object, no issue:
If I compare Annotation, it doesn't recognize the method
Kameron Arthur Cole
Watson Content Analytics Applications and Support
email: kameroncole@us.ibm.com | Tel: 305-389-8512
Re: can't remove duplicate Annotations with Java Set Collection
Posted by Richard Eckart de Castilho <re...@apache.org>.
On 17.11.2014, at 20:59, Kameron Cole <ka...@us.ibm.com> wrote:
> I am trying to get rid of duplicates in the FSIndex. I thought a very
> clever way to do this would be to just push them into a Set Collection in
> Java, which does not allow duplicates. This is very (very) standard Java:
>
> ArrayList al = new ArrayList();
> // add elements to al, including duplicates
> HashSet hs = new HashSet();
> hs.addAll(al);
> al.clear();
> al.addAll(hs);
There is no universal definition of equality other than object equality. And this is what Java defaults to unless equals() and hashCode() are implemented.
Since each UIMA user might have a different opinion on what is equal, UIMA defers this decision to its indexing mechanism instead of hard-baking it into equals()/hashCode() methods.
I suggest you do the following:
- implement a Comparator<FeatureStructure> or Comparator<AnnotationFS> according to your definition of equality
- create a TreeSet based on your comparator
- drop all your annotations into this TreeSet
- "duplicates" according to your definition are dropped. The rest is sorted (or not) depending on what your comparator returns in a non-equality case (return value != 0).
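The steps above might be sketched as follows. This is only an illustration under assumptions: Span is a hypothetical stand-in for a UIMA annotation (so the example runs without UIMA on the classpath), and "equal" is taken to mean "same covered text", which is just one possible definition. With real annotations the comparator would call getCoveredText() instead of reading a field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class TreeSetDedupeSketch {

    // Hypothetical stand-in for a UIMA annotation: offsets plus covered text.
    static final class Span {
        final int begin;
        final int end;
        final String coveredText;

        Span(int begin, int end, String coveredText) {
            this.begin = begin;
            this.end = end;
            this.coveredText = coveredText;
        }
    }

    // Assumed definition of equality: two spans are "equal" when their
    // covered text matches. A null coveredText would throw here.
    static final Comparator<Span> BY_COVERED_TEXT =
            Comparator.comparing((Span s) -> s.coveredText);

    static List<Span> dedupe(List<Span> spans) {
        TreeSet<Span> unique = new TreeSet<>(BY_COVERED_TEXT);
        // An element that compares as 0 to one already in the set is not added,
        // so the first span seen for each text is the one that survives.
        unique.addAll(spans);
        return new ArrayList<>(unique); // ordered by covered text
    }

    public static void main(String[] args) {
        // Offsets follow the sample text "bird, cat, bush, cat".
        List<Span> spans = new ArrayList<>();
        spans.add(new Span(0, 4, "bird"));
        spans.add(new Span(6, 9, "cat"));
        spans.add(new Span(11, 15, "bush"));
        spans.add(new Span(17, 20, "cat"));

        for (Span s : dedupe(spans)) {
            System.out.println(s.coveredText + " [" + s.begin + "," + s.end + ")");
        }
    }
}
```

Note that the comparator returns a consistent ordering for unequal elements (via String.compareTo). A comparator that always returns 1 for unequal elements breaks the TreeSet contract and can let duplicates through.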
Cheers,
-- Richard
Re: can't remove duplicate Annotations with Java Set Collection
Posted by Marshall Schor <ms...@schor.com>.
Sorry, the pictures/images don't come through this email list... If you want to
include them, please post them on a well-known clip site, and include a link to
them in your email.
I think the issue you're having is that you wrote:
...
@Override
public int compare(Annotation o1, Annotation o2) {
...
The @Override indicates an error if the method signature you're defining can't
be matched to a method in the supertype.
The supertype here is "Comparator" and it only has a signature for compare with
2 args which are both "Object"s.
You can remove the @Override to get rid of this check.
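An alternative that keeps the @Override, sketched under assumptions: give the Comparator a type argument. A raw Comparator declares only compare(Object, Object), and an anonymous class still has to implement that method, whereas Comparator<T> declares compare(T, T). Item below is a hypothetical stand-in for Annotation so that the example compiles without UIMA. The sketch also compares the text with compareTo rather than ==, because == on String compares references, not contents.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TypedComparatorSketch {

    // Hypothetical stand-in for org.apache.uima.jcas.tcas.Annotation.
    static final class Item {
        private final String coveredText;

        Item(String coveredText) {
            this.coveredText = coveredText;
        }

        String getCoveredText() {
            return coveredText;
        }
    }

    static Set<Item> uniqueByText(Iterable<Item> items) {
        // Comparator<Item> (not the raw type) makes compare(Item, Item) the
        // method being overridden, so the compiler accepts @Override.
        Set<Item> set = new TreeSet<Item>(new Comparator<Item>() {
            @Override
            public int compare(Item o1, Item o2) {
                // 0 means "duplicate"; otherwise a consistent ordering.
                return o1.getCoveredText().compareTo(o2.getCoveredText());
            }
        });
        for (Item item : items) {
            set.add(item);
        }
        return set;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("bird"), new Item("cat"), new Item("bush"), new Item("cat"));
        System.out.println(uniqueByText(items).size()); // prints 3
    }
}
```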
-Marshall
On 11/18/2014 2:06 PM, Kameron Cole wrote:
>
> Awesome. Your change will work. And I will try it, thank you!
>
> But maybe you can help me to get this to work? As I posted, if I use Object
> as the parameter in the compare method signature, Eclipse is ok; but when I
> change it to Annotation, it says I must override the methods - as though
> something about Annotator confuses Eclipse. Here's the code I really want to
> work:
>
>
> -----------------------------------
>
> public static ArrayList<Annotation> dedupe (AnnotationIndex<Annotation> idx2){
>
>     ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx2.size());
>     FSIterator<Annotation> it2 = idx2.iterator();
>     while(it2.hasNext())
>     {
>         tempList.add((Annotation) it2.next());
>     }
>
>     Set set = new TreeSet(new Comparator() {
>         @Override
>         public int compare(Annotation o1, Annotation o2) {
>             if(o1.getCoveredText()==o2.getCoveredText()){
>                 return 0;
>             }
>             return 1;
>         }
>     });
>
>     set.addAll(tempList);
>
>     tempList.clear();
>     tempList.addAll(set);
>     System.out.println("templist length: "+tempList.size());
>     return tempList;
>
> -----------------------------
>
> But look at what Eclipse gives me:
>
>
> From: Marshall Schor <ms...@schor.com>
> To: user@uima.apache.org
> Date: 11/18/2014 11:54 AM
> Subject: Re: can't remove duplicate Annotations with Java Set Collection
>
> --------------------------------------------------------------------------------
>
>
>
> An even simpler approach:
>
> Use a HashMap, where the key is the annotation.getCoveredText() and the value is
> the annotation, instead of a HashSet.
>
> replace this (in your original):
>
> // push tempList into HashSet
> HashSet<Annotation> hs = new HashSet<Annotation>();
> hs.addAll(tempList);
>
>
> with
>
> // push tempList into HashMap
> HashMap<String, Annotation> hm = new HashMap<String, Annotation>();
> for (Annotation a : tempList) {
> hm.put(a.getCoveredText(), a);
> }
>
> -Marshall
>
> On 11/18/2014 9:45 AM, Marshall Schor wrote:
> > Eclipse pointed out a bug in my code, fix is below
> > On 11/18/2014 9:37 AM, Marshall Schor wrote:
> >> Hi Kameron,
> >>
> >> Based on this code snip, the two "cat" annotations you create are "different"
> >> using the HashSet definition, because they correspond to two distinct UIMA
> >> Annotations. You could, for instance, update one of them, and not the other;
> >> that is the sense in which they are distinct. In the case below, the two "cat"
> >> annotations would have different begin and end offsets.
> >>
> >> I'm guessing that your goal was to have one of the two cat annotations be
> >> dropped.
> >>
> >> You could do that by using your hash set approach, if you defined equal to mean
> >> that just the covered text of the annotation was equal.
> >>
> >> Here's one way to do this: Create a "cover object" for your annotations, that
> >> contains a reference to the annotation and defines equals and hashcode (you have
> >> to define these together). The easy way to do this is using Eclipse - define a
> >> new class: e.g.
> >>
> >> public class MyAnnotationWithSpecialEquals {
> >>     final public Annotation annotation; // the covered annotation
> >>
> >>     public MyAnnotationWithSpecialEquals(Annotation annotation) {
> >>         this.annotation = annotation;
> >>     }
> >> }
> >>
> >> and then use Eclipse to define the equals and hashcode: go to Menu -> Source ->
> >> Generate hashcode() and equals()
> >> and have it generate one based on just "annotation". This will not (yet) be
> >> correct - it should add two methods like this:
> >>
> >> @Override
> >> public int hashCode() {
> >> final int prime = 31;
> >> int result = 1;
> >> result = prime * result + ((annotation == null) ? 0 : annotation.hashCode());
> >> return result;
> >> }
> >>
> >> @Override
> >> public boolean equals(Object obj) {
> >> if (this == obj)
> >> return true;
> >> if (obj == null)
> >> return false;
> >> if (getClass() != obj.getClass())
> >> return false;
> >> MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
> > // buggy lines
> >> if (annotation == null) {
> >> if (other.annotation != null)
> >> return false;
> > // replace above with
> > if (annotation == null && other.annotation != null)
> > return false;
> >> } else if (!annotation.equals(other.annotation))
> >> return false;
> >> return true;
> >> }
> >>
> >> Now, to get these to be the definitions you want, which depend only on the
> >> covered text, modify these as follows:
> >>
> >> First, for hashCode, use only the string covered text:
> >>
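> >> @Override
> >> public int hashCode() {
> >>     final int prime = 31;
> >>     int result = 1;
> >>     // guard against a null annotation and a null covered text, matching equals()
> >>     String coveredText = (annotation == null) ? null : annotation.getCoveredText();
> >>     result = prime * result + ((coveredText == null) ? 0 : coveredText.hashCode());
> >>     return result;
> >> }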
> >>
> >> and for equals: replace test for annotation being "equal" with
> >> annotation.getCoveredText() being "equal",
> >> with some additional edge case testing in case of nulls:
> >>
> >> @Override
> >> public boolean equals(Object obj) {
> >>     if (this == obj)
> >>         return true;
> >>     if (obj == null)
> >>         return false;
> >>     if (getClass() != obj.getClass())
> >>         return false;
> >>     MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
> >>     if (annotation == null || other.annotation == null)
> >>         return annotation == other.annotation; // equal only if both are null
> >>     String coveredText = annotation.getCoveredText();
> >>     if (coveredText == null)
> >>         return other.annotation.getCoveredText() == null; // handle special case if covered text is null
> >>     // coveredText is not null
> >>     return coveredText.equals(other.annotation.getCoveredText());
> >> }
> >>
> >> HTH. -Marshall
> >>
> >>
> >> On 11/17/2014 4:49 PM, Kameron Cole wrote:
> >>> Input text:
> >>>
> >>> ------------------------------
> >>>
> >>> bird, cat, bush, cat
> >>>
> >>> ----------------------------
> >>>
> >>> Create the Annotations:
> >>>
> >>> -------------------------------
> >>> docText = aJCas.getDocumentText();
> >>>
> >>> *int* index = docText.indexOf("cat");
> >>> *while*(index >= 0) {
> >>> *int* begin = index;
> >>> *int* end = begin+3;
> >>> Animal animal = *new* Animal(aJCas);
> >>> animal.setBegin(begin);
> >>> animal.setEnd(end);
> >>> animal.addToIndexes();
> >>>
> >>> index = docText.indexOf("cat", index+1);
> >>> }
> >>>
> >>> index = docText.indexOf("bird");
> >>> *while*(index >= 0) {
> >>> *int* begin = index;
> >>> *int* end = begin+4;
> >>> Animal animal = *new* Animal(aJCas);
> >>> animal.setBegin(begin);
> >>> animal.setEnd(end);
> >>> animal.addToIndexes();
> >>>
> >>> index = docText.indexOf("bird", index+1);
> >>> }
> >>>
> >>> index = docText.indexOf("bush");
> >>> *while*(index >= 0) {
> >>> *int* begin = index;
> >>> *int* end = begin+4;
> >>> Vegetable animal = *new* Vegetable(aJCas);
> >>> animal.setBegin(begin);
> >>> animal.setEnd(end);
> >>> animal.addToIndexes();
> >>>
> >>> index = docText.indexOf("bird", index+1);
> >>> }
> >>> ------------------------------------------------------
> >>>
> >>> From: Marshall Schor <ms...@schor.com>
> >>> To: user@uima.apache.org
> >>> Date: 11/17/2014 04:35 PM
> >>> Subject: Re: can't remove duplicate Annotations with Java Set Collection
> >>>
> >>>
> --------------------------------------------------------------------------------
> >>>
> >>>
> >>>
> >>> Hi,
> >>>
> >>> Two Feature Structures are considered "equal" in the sense used by HashSet, if
> >>> fs1.equals(fs2). The definition of "equals" for feature structures is: they
> >>> are equal if they refer to the same underlying CAS, and the same "spot" in the
> >>> CAS Heap.
> >>>
> >>> How did you create the Annotations that you think are "equal" in the HashSet
> >>> sense?
> >>>
> >>> Here's an example of two annotations which are "equal" in the UIMA sorted index
> >>> sense, but unequal in the HashSet sense.
> >>>
> >>> Annotation fs1 = new Annotation(myJCas, 0, 4); // create an instance of
> >>> Annotation in myJCas, with a begin = 0, and end = 4.
> >>> Annotation fs2 = new Annotation(myJCas, 0, 4); // create an instance of
> >>> Annotation in myJCas, with a begin = 0, and end = 4.
> >>>
> >>> These will be "equal" in the UIMA sense - the same kind of annotation, in the
> >>> same CAS, with the same feature values, but will be two distinct feature
> >>> structures, so HashSet will consider them to be unequal.
> >>>
> >>> Could this be what is happening in your case? Please respond so we can see if
> >>> there's another straight-forward solution that does what you're looking for.
> >>>
> >>> -Marshall
> >>> on 11/17/2014 2:59 PM, Kameron Cole wrote:
> >>>> Hello,
> >>>>
> >>>> I am trying to get rid of duplicates in the FSIndex. I thought a very
> >>>> clever way to do this would be to just push them into a Set Collection in
> >>>> Java, which does not allow duplicates. This is very (very) standard Java:
> >>>>
> >>>> ArrayList al = new ArrayList();
> >>>> // add elements to al, including duplicates
> >>>> HashSet hs = new HashSet();
> >>>> hs.addAll(al);
> >>>> al.clear();
> >>>> al.addAll(hs);
> >>>>
> >>>> This list will contain no duplicates.
> >>>>
> >>>> However, I am not getting this to work in my UIMA code:
> >>>>
> >>>>
> >>>> System.out.println("Index size is: "+idx.size());
> >>>>
> >>>> AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
> >>>>
> >>>> ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
> >>>>
> >>>> FSIterator it = idx.iterator();
> >>>>
> >>>> //load the Annotations into a temporary list. includes duplicates
> >>>>
> >>>> while(it.hasNext())
> >>>> {
> >>>>
> >>>> tempList.add((Annotation) it.next());
> >>>>
> >>>> }
> >>>>
> >>>> Iterator tempIt = tempList.iterator();
> >>>>
> >>>> // remove all Annotations from the index. this works fine
> >>>>
> >>>> while(tempIt.hasNext()){
> >>>> ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
> >>>> }
> >>>>
> >>>> // push tempList into HashSet
> >>>>
> >>>> HashSet<Annotation> hs = new HashSet<Annotation>();
> >>>>
> >>>> hs.addAll(tempList);
> >>>>
> >>>> // this should not allow duplicates
> >>>>
> >>>> System.out.println("HS length: "+hs.size()); // size should be less the
> >>>> size of the FSIndex by the number of duplicates. it is not. This is the
> >>>> main problem
> >>>>
> >>>> tempList.clear();
> >>>>
> >>>> tempList.addAll(hs);
> >>>>
> >>>> System.out.println("templist length: "+tempList.size());
> >>>>
> >>>>
> >>>> Iterator<Annotation> it2 = tempList.iterator(); // this should now be the
> >>>> clean list
> >>>>
> >>>>
> >>>> while(it2.hasNext()){
> >>>> it2.next().addToIndexes(aJCas);
> >>>> }
> >
> >
>
>
Re: can't remove duplicate Annotations with Java Set Collection
Posted by Kameron Cole <ka...@us.ibm.com>.
Awesome. Your change will work. And I will try it, thank you!
But maybe you can help me to get this to work? As I posted, if I use
Object as the parameter in the compare method signature, Eclipse is ok; but
when I change it to Annotation, it says I must override the methods - as
though something about Annotator confuses Eclipse. Here's the code I
really want to work:
-----------------------------------
public static ArrayList<Annotation> dedupe (AnnotationIndex<Annotation> idx2){
    ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx2.size());
    FSIterator<Annotation> it2 = idx2.iterator();
    while(it2.hasNext())
    {
        tempList.add((Annotation) it2.next());
    }
    Set set = new TreeSet(new Comparator() {
        @Override
        public int compare(Annotation o1, Annotation o2) {
            if(o1.getCoveredText()==o2.getCoveredText()){
                return 0;
            }
            return 1;
        }
    });
    set.addAll(tempList);
    tempList.clear();
    tempList.addAll(set);
    System.out.println("templist length: "+tempList.size());
    return tempList;
-----------------------------
But look at what Eclipse gives me:
Re: can't remove duplicate Annotations with Java Set Collection
Posted by Marshall Schor <ms...@schor.com>.
An even simpler approach:
Use a HashMap, where the key is the annotation.getCoveredText() and the value is
the annotation, instead of a HashSet.
replace this (in your original):
// push tempList into HashSet
HashSet<Annotation> hs = new HashSet<Annotation>();
hs.addAll(tempList);
with
// push tempList into HashMap
HashMap<String, Annotation> hm = new HashMap<String, Annotation>();
for (Annotation a : tempList) {
hm.put(a.getCoveredText(), a);
}
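A self-contained sketch of the map approach, under assumptions: Span is a hypothetical stand-in for Annotation so the example runs without UIMA, and LinkedHashMap is swapped in for HashMap only to keep the first-seen order of the keys. It shows both variants: put() keeps the last annotation seen for each covered text, while putIfAbsent() keeps the first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapDedupeSketch {

    // Hypothetical stand-in for a UIMA annotation: offsets plus covered text.
    static final class Span {
        final int begin;
        final int end;
        final String coveredText;

        Span(int begin, int end, String coveredText) {
            this.begin = begin;
            this.end = end;
            this.coveredText = coveredText;
        }
    }

    // put() replaces the value for an existing key, so the last span wins.
    static List<Span> dedupeKeepLast(List<Span> spans) {
        Map<String, Span> byText = new LinkedHashMap<>();
        for (Span s : spans) {
            byText.put(s.coveredText, s);
        }
        return new ArrayList<>(byText.values());
    }

    // putIfAbsent() leaves an existing mapping alone, so the first span wins.
    static List<Span> dedupeKeepFirst(List<Span> spans) {
        Map<String, Span> byText = new LinkedHashMap<>();
        for (Span s : spans) {
            byText.putIfAbsent(s.coveredText, s);
        }
        return new ArrayList<>(byText.values());
    }

    public static void main(String[] args) {
        // Offsets follow the sample text "bird, cat, bush, cat".
        List<Span> spans = new ArrayList<>();
        spans.add(new Span(0, 4, "bird"));
        spans.add(new Span(6, 9, "cat"));
        spans.add(new Span(11, 15, "bush"));
        spans.add(new Span(17, 20, "cat"));

        System.out.println("keep last:  " + dedupeKeepLast(spans).size() + " spans");
        System.out.println("keep first: " + dedupeKeepFirst(spans).size() + " spans");
    }
}
```

The values of the map, not the map itself, are what would go back into the index afterwards.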
-Marshall
>> if (getClass() != obj.getClass())
>> return false;
>> MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
>> if (annotation == null) {
>> if (other.annotation != null)
>> return false;
>> } else {
>> String coveredText = annotation.getCoveredText();
>> if (coveredText == null) {
>> if (other.annotation.getCoveredText() == null)
>> return true; // handle special case if covered text is null
>> else return false;
>> }
>> // coveredText is not null
>> if (!coveredText.equals(other.annotation.getCoveredText()))
>> return false;
>> return true;
>> }
>> }
>>
>> HTH. -Marshall
>>
>>
>> On 11/17/2014 4:49 PM, Kameron Cole wrote:
>>> Input text:
>>>
>>> ------------------------------
>>>
>>> bird, cat, bush, cat
>>>
>>> ----------------------------
>>>
>>> Create the Annotations:
>>>
>>> -------------------------------
>>> docText = aJCas.getDocumentText();
>>>
>>> *int* index = docText.indexOf("cat");
>>> *while*(index >= 0) {
>>> *int* begin = index;
>>> *int* end = begin+3;
>>> Animal animal = *new* Animal(aJCas);
>>> animal.setBegin(begin);
>>> animal.setEnd(end);
>>> animal.addToIndexes();
>>>
>>> index = docText.indexOf("cat", index+1);
>>> }
>>>
>>> index = docText.indexOf("bird");
>>> *while*(index >= 0) {
>>> *int* begin = index;
>>> *int* end = begin+4;
>>> Animal animal = *new* Animal(aJCas);
>>> animal.setBegin(begin);
>>> animal.setEnd(end);
>>> animal.addToIndexes();
>>>
>>> index = docText.indexOf("bird", index+1);
>>> }
>>>
>>> index = docText.indexOf("bush");
>>> *while*(index >= 0) {
>>> *int* begin = index;
>>> *int* end = begin+4;
>>> Vegetable animal = *new* Vegetable(aJCas);
>>> animal.setBegin(begin);
>>> animal.setEnd(end);
>>> animal.addToIndexes();
>>>
>>> index = docText.indexOf("bird", index+1);
>>> }
>>> ------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------------
>>>
>>> *Kameron Arthur Cole
>>> Watson Content Analytics Applications and Support
>>> email: **kameroncole@us.ibm.com* <ma...@us.ibm.com>* | Tel:
>>> 305-389-8512**
>>> **upload logs here* <http://www.ecurep.ibm.com/app/upload>
>>>
>>> From: Marshall Schor <ms...@schor.com>
>>> To: user@uima.apache.org
>>> Date: 11/17/2014 04:35 PM
>>> Subject: Re: can't remove duplicate Annotations with Java Set Collection
>>>
>>> --------------------------------------------------------------------------------
>>>
>>>
>>>
>>> Hi,
>>>
>>> Two Feature Structures are considered "equal" in the sense used by HashSet, if
>>> fs1.equals(fs2). The definition of "equals" for feature structures is: they
>>> are equal if they refer to the same underlying CAS, and the same "spot" in the
>>> the CAS Heap.
>>>
>>> How did you create the Annotations that you think are "equal" in the HashSet
>>> sense?
>>>
>>> Here's an example of two annotations which are "equal" in the UIMA sorted index
>>> sense, but unequal in the HashSet sense.
>>>
>>> Annotation fs1 = new Annotation(myJCas, 0, 4); // create an instance of
>>> Annotation in myJCas, with a begin = 0, and end = 4.
>>> Annotation fs2 = new Annotation(myJCas, 0, 4); // create an instance of
>>> Annotation in myJCas, with a begin = 0, and end = 4.
>>>
>>> These will be "equal" in the UIMA sense - the same kind of annotation, in the
>>> same CAS, with the same feature values, but will be two distinct feature
>>> structures, so HashSet will consider them to be unequal.
>>>
>>> Could this be what is happening in your case? Please respond so we can see if
>>> there's another straight-forward solution that does what you're looking for.
>>>
>>> -Marshall
>>> on 11/17/2014 2:59 PM, Kameron Cole wrote:
>>>> Hello,
>>>>
>>>> I am trying to get rid of duplicates in the FSIndex. I thought a very
>>>> clever way to do this would be to just push them into a Set Collection in
>>>> Java, which does not allow duplicates. This is very (very) standard Java:
>>>>
>>>> ArrayList al = new ArrayList();
>>>> // add elements to al, including duplicates
>>>> HashSet hs = new HashSet();
>>>> hs.addAll(al);
>>>> al.clear();
>>>> al.addAll(hs);
>>>>
>>>> This list will contain no duplicates.
>>>>
>>>> However, I am not getting this to work in my UIMA code:
>>>>
>>>>
>>>> System.out.println("Index size is: "+idx.size());
>>>>
>>>> AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
>>>>
>>>> ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
>>>>
>>>> FSIterator it = idx.iterator();
>>>>
>>>> //load the Annotations into a temporary list. includes duplicates
>>>>
>>>> while(it.hasNext())
>>>> {
>>>>
>>>> tempList.add((Annotation) it.next());
>>>>
>>>> }
>>>>
>>>> Iterator tempIt = tempList.iterator();
>>>>
>>>> // remove all Annotations from the index. this works fine
>>>>
>>>> while(tempIt.hasNext()){
>>>> ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
>>>> }
>>>>
>>>> // push tempList into HashSet
>>>>
>>>> HashSet<Annotation> hs = new HashSet<Annotation>();
>>>>
>>>> hs.addAll(tempList);
>>>>
>>>> // this should not allow duplicates
>>>>
>>>> System.out.println("HS length: "+hs.size()); // size should be less the
>>>> size of the FSIndex by the number of duplicates. it is not. This is the
>>>> main problem
>>>>
>>>> tempList.clear();
>>>>
>>>> tempList.addAll(hs);
>>>>
>>>> System.out.println("templist length: "+tempList.size());
>>>>
>>>>
>>>> Iterator<Annotation> it2 = tempList.iterator(); // this should now be the
>>>> clean list
>>>>
>>>>
>>>> while(it2.hasNext()){
>>>> it2.next().addToIndexes(aJCas);
>>>> }
>
>
Re: can't remove duplicate Annotations with Java Set Collection
Posted by Marshall Schor <ms...@schor.com>.
Eclipse pointed out a bug in my code, fix is below
On 11/18/2014 9:37 AM, Marshall Schor wrote:
> Hi Kameron,
>
> Based on this code snip, the two "cat" annotations you create are "different"
> using the HashSet definition, because they correspond to two distinct UIMA
> Annotations. You could, for instance, update one of them, and not the other;
> that it the sense in which they are distinct. In the case below, the two "cat"
> annotations would have different begin and end offsets.
>
> I'm guessing that your goal was to to have one of the two cat annotations be
> dropped.
>
> You could do that by using your hash set approach, if you defined equal to mean
> that just the covered text of the annotation was equal.
>
> Here's one way to do this: Create a "cover object" for your annotations, that
> contains a reference to the annotation and defines equals and hashcode (you have
> to define these together). The easy way to do this is using Eclipse - define a
> new class: e.g.
>
> public class MyAnnotationWithSpecialEquals {
> final public Annotation annotation; // the covered annotation
>
> public MyAnnotationWithSpecialEquals(Annotation annotation) {
> this.annotation = annotation;
> }
> }
>
> and then use Eclipse to define the equals and hashcode: go to Menu -> Source ->
> Generate hashcode() and equals()
> and have it generate one based on just "annotation". This will not (yet) be
> correct - it should add two methods like this:
>
> @Override
> public int hashCode() {
> final int prime = 31;
> int result = 1;
> result = prime * result + ((annotation == null) ? 0 : annotation.hashCode());
> return result;
> }
>
> @Override
> public boolean equals(Object obj) {
> if (this == obj)
> return true;
> if (obj == null)
> return false;
> if (getClass() != obj.getClass())
> return false;
> MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
// buggy lines
> if (annotation == null) {
> if (other.annotation != null)
> return false;
// replace above with
if (annotation == null) {
return other.annotation == null;
> } else if (!annotation.equals(other.annotation))
> return false;
> return true;
> }
>
> Now, to get these to be the definitions you want, which depend only on the
> covered text, modify these as follows:
>
> First, for hashCode, use only the string covered text:
>
> @Override
> public int hashCode() {
> final int prime = 31;
> int result = 1;
> result = prime * result + ((annotation == null) ? 0 :
> annotation.getCoveredText().hashCode());
> return result;
> }
>
> and for equals: replace test for annotation being "equal" with
> annotation.getCoveredText() being "equal",
> with some additional edge case testing in case of nulls:
>
> @Override
> public boolean equals(Object obj) {
> if (this == obj)
> return true;
> if (obj == null)
> return false;
> if (getClass() != obj.getClass())
> return false;
> MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
> if (annotation == null) {
> if (other.annotation != null)
> return false;
> } else {
> String coveredText = annotation.getCoveredText();
> if (coveredText == null) {
> if (other.annotation.getCoveredText() == null)
> return true; // handle special case if covered text is null
> else return false;
> }
> // coveredText is not null
> if (!coveredText.equals(other.annotation.getCoveredText()))
> return false;
> return true;
> }
> }
>
> HTH. -Marshall
>
>
> On 11/17/2014 4:49 PM, Kameron Cole wrote:
>> Input text:
>>
>> ------------------------------
>>
>> bird, cat, bush, cat
>>
>> ----------------------------
>>
>> Create the Annotations:
>>
>> -------------------------------
>> docText = aJCas.getDocumentText();
>>
>> *int* index = docText.indexOf("cat");
>> *while*(index >= 0) {
>> *int* begin = index;
>> *int* end = begin+3;
>> Animal animal = *new* Animal(aJCas);
>> animal.setBegin(begin);
>> animal.setEnd(end);
>> animal.addToIndexes();
>>
>> index = docText.indexOf("cat", index+1);
>> }
>>
>> index = docText.indexOf("bird");
>> *while*(index >= 0) {
>> *int* begin = index;
>> *int* end = begin+4;
>> Animal animal = *new* Animal(aJCas);
>> animal.setBegin(begin);
>> animal.setEnd(end);
>> animal.addToIndexes();
>>
>> index = docText.indexOf("bird", index+1);
>> }
>>
>> index = docText.indexOf("bush");
>> *while*(index >= 0) {
>> *int* begin = index;
>> *int* end = begin+4;
>> Vegetable animal = *new* Vegetable(aJCas);
>> animal.setBegin(begin);
>> animal.setEnd(end);
>> animal.addToIndexes();
>>
>> index = docText.indexOf("bird", index+1);
>> }
>> ------------------------------------------------------
>>
>> --------------------------------------------------------------------------------
>>
>> *Kameron Arthur Cole
>> Watson Content Analytics Applications and Support
>> email: **kameroncole@us.ibm.com* <ma...@us.ibm.com>* | Tel:
>> 305-389-8512**
>> **upload logs here* <http://www.ecurep.ibm.com/app/upload>
>>
>> From: Marshall Schor <ms...@schor.com>
>> To: user@uima.apache.org
>> Date: 11/17/2014 04:35 PM
>> Subject: Re: can't remove duplicate Annotations with Java Set Collection
>>
>> --------------------------------------------------------------------------------
>>
>>
>>
>> Hi,
>>
>> Two Feature Structures are considered "equal" in the sense used by HashSet, if
>> fs1.equals(fs2). The definition of "equals" for feature structures is: they
>> are equal if they refer to the same underlying CAS, and the same "spot" in the
>> the CAS Heap.
>>
>> How did you create the Annotations that you think are "equal" in the HashSet
>> sense?
>>
>> Here's an example of two annotations which are "equal" in the UIMA sorted index
>> sense, but unequal in the HashSet sense.
>>
>> Annotation fs1 = new Annotation(myJCas, 0, 4); // create an instance of
>> Annotation in myJCas, with a begin = 0, and end = 4.
>> Annotation fs2 = new Annotation(myJCas, 0, 4); // create an instance of
>> Annotation in myJCas, with a begin = 0, and end = 4.
>>
>> These will be "equal" in the UIMA sense - the same kind of annotation, in the
>> same CAS, with the same feature values, but will be two distinct feature
>> structures, so HashSet will consider them to be unequal.
>>
>> Could this be what is happening in your case? Please respond so we can see if
>> there's another straight-forward solution that does what you're looking for.
>>
>> -Marshall
>> on 11/17/2014 2:59 PM, Kameron Cole wrote:
>>> Hello,
>>>
>>> I am trying to get rid of duplicates in the FSIndex. I thought a very
>>> clever way to do this would be to just push them into a Set Collection in
>>> Java, which does not allow duplicates. This is very (very) standard Java:
>>>
>>> ArrayList al = new ArrayList();
>>> // add elements to al, including duplicates
>>> HashSet hs = new HashSet();
>>> hs.addAll(al);
>>> al.clear();
>>> al.addAll(hs);
>>>
>>> This list will contain no duplicates.
>>>
>>> However, I am not getting this to work in my UIMA code:
>>>
>>>
>>> System.out.println("Index size is: "+idx.size());
>>>
>>> AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
>>>
>>> ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
>>>
>>> FSIterator it = idx.iterator();
>>>
>>> //load the Annotations into a temporary list. includes duplicates
>>>
>>> while(it.hasNext())
>>> {
>>>
>>> tempList.add((Annotation) it.next());
>>>
>>> }
>>>
>>> Iterator tempIt = tempList.iterator();
>>>
>>> // remove all Annotations from the index. this works fine
>>>
>>> while(tempIt.hasNext()){
>>> ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
>>> }
>>>
>>> // push tempList into HashSet
>>>
>>> HashSet<Annotation> hs = new HashSet<Annotation>();
>>>
>>> hs.addAll(tempList);
>>>
>>> // this should not allow duplicates
>>>
>>> System.out.println("HS length: "+hs.size()); // size should be less the
>>> size of the FSIndex by the number of duplicates. it is not. This is the
>>> main problem
>>>
>>> tempList.clear();
>>>
>>> tempList.addAll(hs);
>>>
>>> System.out.println("templist length: "+tempList.size());
>>>
>>>
>>> Iterator<Annotation> it2 = tempList.iterator(); // this should now be the
>>> clean list
>>>
>>>
>>> while(it2.hasNext()){
>>> it2.next().addToIndexes(aJCas);
>>> }
>>
>
Re: can't remove duplicate Annotations with Java Set Collection
Posted by Marshall Schor <ms...@schor.com>.
Hi Kameron,
Based on this code snip, the two "cat" annotations you create are "different"
using the HashSet definition, because they correspond to two distinct UIMA
Annotations. You could, for instance, update one of them, and not the other;
that is the sense in which they are distinct. In the case below, the two "cat"
annotations would have different begin and end offsets.
I'm guessing that your goal was to have one of the two cat annotations be
dropped.
You could do that by using your hash set approach, if you defined equal to mean
that just the covered text of the annotation was equal.
Here's one way to do this: Create a "cover object" for your annotations, that
contains a reference to the annotation and defines equals and hashcode (you have
to define these together). The easy way to do this is using Eclipse - define a
new class: e.g.
public class MyAnnotationWithSpecialEquals {
final public Annotation annotation; // the covered annotation
public MyAnnotationWithSpecialEquals(Annotation annotation) {
this.annotation = annotation;
}
}
and then use Eclipse to define equals and hashCode: go to Menu -> Source ->
Generate hashCode() and equals()
and have it generate one based on just "annotation". This will not (yet) be
correct - it should add two methods like this:
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((annotation == null) ? 0 : annotation.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
if (annotation == null) {
if (other.annotation != null)
return false;
} else if (!annotation.equals(other.annotation))
return false;
return true;
}
Now, to get these to be the definitions you want, which depend only on the
covered text, modify these as follows:
First, for hashCode, use only the string covered text:
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
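// null-safe: the covered text itself may be null, which equals() also allows for
String coveredText = (annotation == null) ? null : annotation.getCoveredText();
result = prime * result + ((coveredText == null) ? 0 : coveredText.hashCode());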
return result;
}
and for equals: replace test for annotation being "equal" with
annotation.getCoveredText() being "equal",
with some additional edge case testing in case of nulls:
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
if (annotation == null || other.annotation == null)
return annotation == other.annotation; // equal only if both are null
String coveredText = annotation.getCoveredText();
String otherText = other.annotation.getCoveredText();
if (coveredText == null)
return otherText == null; // handle special case if covered text is null
// coveredText is not null
return coveredText.equals(otherText);
}
HTH. -Marshall
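For reference, here is a compact, runnable sketch of the same cover-object idea. It is an illustration under assumptions, not the code from this thread: TextSpan is a hypothetical stand-in for a UIMA annotation so the example compiles without UIMA, ByCoveredText plays the role of MyAnnotationWithSpecialEquals, and java.util.Objects is used to shorten the null handling. LinkedHashSet is chosen so the first occurrence is the one that is kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a UIMA Annotation.
final class TextSpan {
    final String doc;
    final int begin;
    final int end;

    TextSpan(String doc, int begin, int end) {
        this.doc = doc;
        this.begin = begin;
        this.end = end;
    }

    String getCoveredText() {
        return doc.substring(begin, end);
    }
}

// Cover object: equality and hash depend only on the covered text.
final class ByCoveredText {
    final TextSpan span;

    ByCoveredText(TextSpan span) {
        this.span = Objects.requireNonNull(span);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(span.getCoveredText());
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof ByCoveredText)) return false;
        ByCoveredText other = (ByCoveredText) obj;
        return Objects.equals(span.getCoveredText(), other.span.getCoveredText());
    }
}

final class CoverObjectDemo {
    // Wrap each span, let the set drop equal wrappers, then unwrap.
    static List<TextSpan> dedup(List<TextSpan> spans) {
        Set<ByCoveredText> seen = new LinkedHashSet<>();
        for (TextSpan s : spans) {
            seen.add(new ByCoveredText(s));
        }
        List<TextSpan> out = new ArrayList<>();
        for (ByCoveredText w : seen) {
            out.add(w.span);
        }
        return out;
    }

    static List<TextSpan> sample() {
        String doc = "bird, cat, bush, cat";
        List<TextSpan> spans = new ArrayList<>();
        spans.add(new TextSpan(doc, 0, 4));   // bird
        spans.add(new TextSpan(doc, 6, 9));   // cat
        spans.add(new TextSpan(doc, 11, 15)); // bush
        spans.add(new TextSpan(doc, 17, 20)); // cat (second occurrence)
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(dedup(sample()).size()); // prints 3
    }
}
```

The wrapper leaves the annotation class itself untouched, so the notion of "equal" can differ from one use to the next.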
On 11/17/2014 4:49 PM, Kameron Cole wrote:
>
> Input text:
>
> ------------------------------
>
> bird, cat, bush, cat
>
> ----------------------------
>
> Create the Annotations:
>
> -------------------------------
> docText = aJCas.getDocumentText();
>
> *int* index = docText.indexOf("cat");
> *while*(index >= 0) {
> *int* begin = index;
> *int* end = begin+3;
> Animal animal = *new* Animal(aJCas);
> animal.setBegin(begin);
> animal.setEnd(end);
> animal.addToIndexes();
>
> index = docText.indexOf("cat", index+1);
> }
>
> index = docText.indexOf("bird");
> *while*(index >= 0) {
> *int* begin = index;
> *int* end = begin+4;
> Animal animal = *new* Animal(aJCas);
> animal.setBegin(begin);
> animal.setEnd(end);
> animal.addToIndexes();
>
> index = docText.indexOf("bird", index+1);
> }
>
> index = docText.indexOf("bush");
> *while*(index >= 0) {
> *int* begin = index;
> *int* end = begin+4;
> Vegetable animal = *new* Vegetable(aJCas);
> animal.setBegin(begin);
> animal.setEnd(end);
> animal.addToIndexes();
>
> index = docText.indexOf("bird", index+1);
> }
> ------------------------------------------------------
>
> --------------------------------------------------------------------------------
>
> *Kameron Arthur Cole
> Watson Content Analytics Applications and Support
> email: **kameroncole@us.ibm.com* <ma...@us.ibm.com>* | Tel:
> 305-389-8512**
> **upload logs here* <http://www.ecurep.ibm.com/app/upload>
>
> From: Marshall Schor <ms...@schor.com>
> To: user@uima.apache.org
> Date: 11/17/2014 04:35 PM
> Subject: Re: can't remove duplicate Annotations with Java Set Collection
>
> --------------------------------------------------------------------------------
>
>
>
> Hi,
>
> Two Feature Structures are considered "equal" in the sense used by HashSet, if
> fs1.equals(fs2). The definition of "equals" for feature structures is: they
> are equal if they refer to the same underlying CAS, and the same "spot" in the
> the CAS Heap.
>
> How did you create the Annotations that you think are "equal" in the HashSet
> sense?
>
> Here's an example of two annotations which are "equal" in the UIMA sorted index
> sense, but unequal in the HashSet sense.
>
> Annotation fs1 = new Annotation(myJCas, 0, 4); // create an instance of
> Annotation in myJCas, with a begin = 0, and end = 4.
> Annotation fs2 = new Annotation(myJCas, 0, 4); // create an instance of
> Annotation in myJCas, with a begin = 0, and end = 4.
>
> These will be "equal" in the UIMA sense - the same kind of annotation, in the
> same CAS, with the same feature values, but will be two distinct feature
> structures, so HashSet will consider them to be unequal.
>
> Could this be what is happening in your case? Please respond so we can see if
> there's another straight-forward solution that does what you're looking for.
>
> -Marshall
> on 11/17/2014 2:59 PM, Kameron Cole wrote:
> > Hello,
> >
> > I am trying to get rid of duplicates in the FSIndex. I thought a very
> > clever way to do this would be to just push them into a Set Collection in
> > Java, which does not allow duplicates. This is very (very) standard Java:
> >
> > ArrayList al = new ArrayList();
> > // add elements to al, including duplicates
> > HashSet hs = new HashSet();
> > hs.addAll(al);
> > al.clear();
> > al.addAll(hs);
> >
> > This list will contain no duplicates.
> >
> > However, I am not getting this to work in my UIMA code:
> >
> >
> > System.out.println("Index size is: "+idx.size());
> >
> > AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
> >
> > ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
> >
> > FSIterator it = idx.iterator();
> >
> > //load the Annotations into a temporary list. includes duplicates
> >
> > while(it.hasNext())
> > {
> >
> > tempList.add((Annotation) it.next());
> >
> > }
> >
> > Iterator tempIt = tempList.iterator();
> >
> > // remove all Annotations from the index. this works fine
> >
> > while(tempIt.hasNext()){
> > ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
> > }
> >
> > // push tempList into HashSet
> >
> > HashSet<Annotation> hs = new HashSet<Annotation>();
> >
> > hs.addAll(tempList);
> >
> > // this should not allow duplicates
> >
> > System.out.println("HS length: "+hs.size()); // size should be less the
> > size of the FSIndex by the number of duplicates. it is not. This is the
> > main problem
> >
> > tempList.clear();
> >
> > tempList.addAll(hs);
> >
> > System.out.println("templist length: "+tempList.size());
> >
> >
> > Iterator<Annotation> it2 = tempList.iterator(); // this should now be the
> > clean list
> >
> >
> > while(it2.hasNext()){
> > it2.next().addToIndexes(aJCas);
> > }
>
>
Re: can't remove duplicate Annotations with Java Set Collection
Posted by Kameron Cole <ka...@us.ibm.com>.
Input text:
------------------------------
bird, cat, bush, cat
----------------------------
Create the Annotations:
-------------------------------
String docText = aJCas.getDocumentText();
int index = docText.indexOf("cat");
while(index >= 0) {
int begin = index;
int end = begin+3;
Animal animal = new Animal(aJCas);
animal.setBegin(begin);
animal.setEnd(end);
animal.addToIndexes();
index = docText.indexOf("cat", index+1);
}
index = docText.indexOf("bird");
while(index >= 0) {
int begin = index;
int end = begin+4;
Animal animal = new Animal(aJCas);
animal.setBegin(begin);
animal.setEnd(end);
animal.addToIndexes();
index = docText.indexOf("bird", index+1);
}
index = docText.indexOf("bush");
while(index >= 0) {
int begin = index;
int end = begin+4;
Vegetable vegetable = new Vegetable(aJCas);
vegetable.setBegin(begin);
vegetable.setEnd(end);
vegetable.addToIndexes();
index = docText.indexOf("bush", index+1); // continue searching for "bush", not "bird"
}
------------------------------------------------------
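The three loops above share one pattern: repeatedly call indexOf from just past the previous hit. A small sketch of that pattern as a reusable helper follows; the class name Occurrences and the method findAll are hypothetical names introduced only for this illustration, and the helper returns begin offsets instead of creating annotations so that it runs without UIMA.

```java
import java.util.ArrayList;
import java.util.List;

final class Occurrences {
    // Returns the begin offset of every occurrence of term in text.
    static List<Integer> findAll(String text, String term) {
        List<Integer> begins = new ArrayList<>();
        if (term.isEmpty()) {
            return begins; // an empty term would match at every position
        }
        int index = text.indexOf(term);
        while (index >= 0) {
            begins.add(index);
            index = text.indexOf(term, index + 1);
        }
        return begins;
    }

    public static void main(String[] args) {
        String docText = "bird, cat, bush, cat";
        System.out.println(findAll(docText, "cat"));  // prints [6, 17]
        System.out.println(findAll(docText, "bush")); // prints [11]
    }
}
```

Each returned offset would be the begin of one annotation, with end = begin + term.length(), which avoids the hard-coded 3 and 4.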
Kameron Arthur Cole
Watson Content Analytics Applications and Support
email: kameroncole@us.ibm.com | Tel: 305-389-8512
From: Marshall Schor <ms...@schor.com>
To: user@uima.apache.org
Date: 11/17/2014 04:35 PM
Subject: Re: can't remove duplicate Annotations with Java Set Collection
Hi,
Two Feature Structures are considered "equal" in the sense used by HashSet, if
fs1.equals(fs2). The definition of "equals" for feature structures is: they
are equal if they refer to the same underlying CAS, and the same "spot" in
the CAS Heap.
How did you create the Annotations that you think are "equal" in the
HashSet sense?
Here's an example of two annotations which are "equal" in the UIMA sorted
index sense, but unequal in the HashSet sense.
Annotation fs1 = new Annotation(myJCas, 0, 4); // create an instance of
Annotation in myJCas, with a begin = 0, and end = 4.
Annotation fs2 = new Annotation(myJCas, 0, 4); // create an instance of
Annotation in myJCas, with a begin = 0, and end = 4.
These will be "equal" in the UIMA sense - the same kind of annotation, in
the same CAS, with the same feature values, but will be two distinct feature
structures, so HashSet will consider them to be unequal.
Could this be what is happening in your case? Please respond so we can see
if there's another straight-forward solution that does what you're looking
for.
-Marshall
on 11/17/2014 2:59 PM, Kameron Cole wrote:
> Hello,
>
> I am trying to get rid of duplicates in the FSIndex. I thought a very
> clever way to do this would be to just push them into a Set Collection in
> Java, which does not allow duplicates. This is very (very) standard Java:
>
> ArrayList al = new ArrayList();
> // add elements to al, including duplicates
> HashSet hs = new HashSet();
> hs.addAll(al);
> al.clear();
> al.addAll(hs);
>
> This list will contain no duplicates.
>
> However, I am not getting this to work in my UIMA code:
>
>
> System.out.println("Index size is: "+idx.size());
>
> AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
>
> ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
>
> FSIterator it = idx.iterator();
>
> //load the Annotations into a temporary list. includes duplicates
>
> while(it.hasNext())
> {
>
> tempList.add((Annotation) it.next());
>
> }
>
> Iterator tempIt = tempList.iterator();
>
> // remove all Annotations from the index. this works fine
>
> while(tempIt.hasNext()){
> ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
> }
>
> // push tempList into HashSet
>
> HashSet<Annotation> hs = new HashSet<Annotation>();
>
> hs.addAll(tempList);
>
> // this should not allow duplicates
>
> System.out.println("HS length: "+hs.size()); // size should be less the
> size of the FSIndex by the number of duplicates. it is not. This is the
> main problem
>
> tempList.clear();
>
> tempList.addAll(hs);
>
> System.out.println("templist length: "+tempList.size());
>
>
> Iterator<Annotation> it2 = tempList.iterator(); // this should now be the
> clean list
>
>
> while(it2.hasNext()){
> it2.next().addToIndexes(aJCas);
> }
Re: can't remove duplicate Annotations with Java Set Collection
Posted by Marshall Schor <ms...@schor.com>.
Hi,
Two Feature Structures are considered "equal" in the sense used by HashSet, if
fs1.equals(fs2). The definition of "equals" for feature structures is: they
are equal if they refer to the same underlying CAS, and the same "spot" in the
CAS Heap.
How did you create the Annotations that you think are "equal" in the HashSet sense?
Here's an example of two annotations which are "equal" in the UIMA sorted index
sense, but unequal in the HashSet sense.
Annotation fs1 = new Annotation(myJCas, 0, 4); // create an instance of
Annotation in myJCas, with a begin = 0, and end = 4.
Annotation fs2 = new Annotation(myJCas, 0, 4); // create an instance of
Annotation in myJCas, with a begin = 0, and end = 4.
These will be "equal" in the UIMA sense - the same kind of annotation, in the
same CAS, with the same feature values, but will be two distinct feature
structures, so HashSet will consider them to be unequal.
Could this be what is happening in your case? Please respond so we can see if
there's another straight-forward solution that does what you're looking for.
-Marshall
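The distinction can be reproduced without UIMA. In the sketch below, Fs is a hypothetical stand-in (not a UIMA class) that simply inherits Object's identity-based equals and hashCode; for the purpose of this illustration that behaves like the "same spot" definition described above, so two objects with identical values still count as two set elements.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in: no equals/hashCode override, so identity is used.
final class Fs {
    final int begin;
    final int end;

    Fs(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

final class IdentityDemo {
    static int distinctCount() {
        Set<Fs> set = new HashSet<>();
        Fs fs1 = new Fs(0, 4);
        Fs fs2 = new Fs(0, 4); // same values, but a different object
        set.add(fs1);
        set.add(fs2);
        set.add(fs1); // the very same object again: ignored by the set
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // prints 2
    }
}
```

Only the third add is dropped, because it passes the same object; the set never compares begin and end.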
on 11/17/2014 2:59 PM, Kameron Cole wrote:
> Hello,
>
> I am trying to get rid of duplicates in the FSIndex. I thought a very
> clever way to do this would be to just push them into a Set Collection in
> Java, which does not allow duplicates. This is very (very) standard Java:
>
> ArrayList al = new ArrayList();
> // add elements to al, including duplicates
> HashSet hs = new HashSet();
> hs.addAll(al);
> al.clear();
> al.addAll(hs);
>
> This list will contain no duplicates.
>
> However, I am not getting this to work in my UIMA code:
>
>
> System.out.println("Index size is: "+idx.size());
>
> AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
>
> ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
>
> FSIterator it = idx.iterator();
>
> //load the Annotations into a temporary list. includes duplicates
>
> while(it.hasNext())
> {
>
> tempList.add((Annotation) it.next());
>
> }
>
> Iterator tempIt = tempList.iterator();
>
> // remove all Annotations from the index. this works fine
>
> while(tempIt.hasNext()){
> ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
> }
>
> // push tempList into HashSet
>
> HashSet<Annotation> hs = new HashSet<Annotation>();
>
> hs.addAll(tempList);
>
> // this should not allow duplicates
>
> System.out.println("HS length: "+hs.size()); // size should be less the
> size of the FSIndex by the number of duplicates. it is not. This is the
> main problem
>
> tempList.clear();
>
> tempList.addAll(hs);
>
> System.out.println("templist length: "+tempList.size());
>
>
> Iterator<Annotation> it2 = tempList.iterator(); // this should now be the
> clean list
>
>
> while(it2.hasNext()){
> it2.next().addToIndexes(aJCas);
> }
can't remove duplicate Annotations with Java Set Collection
Posted by Kameron Cole <ka...@us.ibm.com>.
Hello,
I am trying to get rid of duplicates in the FSIndex. I thought a very
clever way to do this would be to just push them into a Set Collection in
Java, which does not allow duplicates. This is very (very) standard Java:
ArrayList al = new ArrayList();
// add elements to al, including duplicates
HashSet hs = new HashSet();
hs.addAll(al);
al.clear();
al.addAll(hs);
This list will contain no duplicates.
However, I am not getting this to work in my UIMA code:
AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
System.out.println("Index size is: "+idx.size());
ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
FSIterator it = idx.iterator();
//load the Annotations into a temporary list. includes duplicates
while(it.hasNext())
{
tempList.add((Annotation) it.next());
}
Iterator tempIt = tempList.iterator();
// remove all Annotations from the index. this works fine
while(tempIt.hasNext()){
((Annotation) tempIt.next()).removeFromIndexes(aJCas);
}
// push tempList into HashSet
HashSet<Annotation> hs = new HashSet<Annotation>();
hs.addAll(tempList);
// this should not allow duplicates
// size should be less than the size of the FSIndex by the number of
// duplicates. It is not. This is the main problem.
System.out.println("HS length: "+hs.size());
tempList.clear();
tempList.addAll(hs);
System.out.println("templist length: "+tempList.size());
// this should now be the clean list
Iterator<Annotation> it2 = tempList.iterator();
while(it2.hasNext()){
it2.next().addToIndexes(aJCas);
}
Re: UIMA pipeline output persistence and multiple layer web based
visualisation tools? Suggestions?
Posted by James Kitching <ja...@hemseye.org>.
Hi Richard,
Thanks very much for your replies (UIMA and webanno groups). I have
updated my research blog with your response and will hopefully soon get
to a point where I can follow the advice you have given.
Regards
James Kitching
On 16/11/2014 13:08, Richard Eckart de Castilho wrote:
> Hi James,
>
> <taking Apache UIMA hat off, putting UKP Lab hat on>
>
> I'm working on the WebAnno (and DKPro Core) project. Thanks for checking it out and providing feedback!
>
> On 16.11.2014, at 13:08, James Kitching <ja...@hemseye.org> wrote:
>
>> I had hoped that I could use webanno for this task however webanno does not allow the direct import of UIMA components or UIMA output.
> WebAnno [1] is an annotation tool. Its scope is not the building or running of pipelines.
>
> WebAnno can quite immediately consume XMIs created with the DKPro Core [2] collection of UIMA components, since the built-in annotation types of WebAnno are modelled after the DKPro Core types. Actually, all import/export filters in WebAnno are UIMA components from DKPro Core.
>
> WebAnno is not meant to be a universal XMI/CAS editor. It is meant to be a user-friendly annotation tool. However, we internally use the UIMA CAS to represent annotations.
>
> To visualize UIMA annotations in WebAnno, they need to be mapped to WebAnno's (cf. brat's) interaction paradigms. To this end, WebAnno supports three specific type-system design patterns (aka "layer types"): span, relation, and chain.
> A "span" is basically a UIMA "Annotation". A "relation" is an annotation with two features pointing to a "span" type. A "chain" is basically a variation of a linked list. Additional primitive features are also supported.
>
> If you want to use WebAnno with existing UIMA data, you can try this:
>
> - define custom annotation layers in WebAnno that closely resemble the data you wish to interface with
> - export the layer definition as JSON
> - edit the JSON file and change the type names "webanno.custom.XXX" into whatever these types are called in your existing UIMA type system
> - create a new project
> - import the modified JSON layer configuration
>
> For basic type system designs, this should work ok.
>
> Cheers,
>
> -- Richard
>
> [1] https://code.google.com/p/webanno/
> [2] https://code.google.com/p/dkpro-core-asl/
Re: UIMA pipeline output persistence and multiple layer web based visualisation tools? Suggestions?
Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi James,
<taking Apache UIMA hat off, putting UKP Lab hat on>
I'm working on the WebAnno (and DKPro Core) project. Thanks for checking it out and providing feedback!
On 16.11.2014, at 13:08, James Kitching <ja...@hemseye.org> wrote:
> I had hoped that I could use webanno for this task however webanno does not allow the direct import of UIMA components or UIMA output.
WebAnno [1] is an annotation tool. Its scope is not the building or running of pipelines.
WebAnno can quite immediately consume XMIs created with the DKPro Core [2] collection of UIMA components, since the built-in annotation types of WebAnno are modelled after the DKPro Core types. Actually, all import/export filters in WebAnno are UIMA components from DKPro Core.
WebAnno is not meant to be a universal XMI/CAS editor. It is meant to be a user-friendly annotation tool. However, we internally use the UIMA CAS to represent annotations.
To visualize UIMA annotations in WebAnno, they need to be mapped to WebAnno's (cf. brat's) interaction paradigms. To this end, WebAnno supports three specific type-system design patterns (aka "layer types"): span, relation, and chain.
A "span" is basically a UIMA "Annotation". A "relation" is an annotation with two features pointing to a "span" type. A "chain" is basically a variation of a linked list. Additional primitive features are also supported.
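As a rough illustration of the three patterns just described, here is a sketch using plain Java classes. These are hypothetical models, not WebAnno or UIMA classes: the names SpanAnno, RelationAnno, ChainLink, source, target and label are all assumptions made for the example, and the chain is shown as one possible reading of "a variation of a linked list".

```java
// Hypothetical plain-Java models of the three layer patterns.
// Illustration only; these are not WebAnno or UIMA classes.
final class SpanAnno {
    final int begin;
    final int end;
    final String label; // an additional primitive feature

    SpanAnno(int begin, int end, String label) {
        this.begin = begin;
        this.end = end;
        this.label = label;
    }
}

final class RelationAnno {
    final SpanAnno source; // first feature pointing to a span
    final SpanAnno target; // second feature pointing to a span
    final String label;

    RelationAnno(SpanAnno source, SpanAnno target, String label) {
        this.source = source;
        this.target = target;
        this.label = label;
    }
}

final class ChainLink {
    final SpanAnno span;
    ChainLink next; // null at the end of the chain

    ChainLink(SpanAnno span) {
        this.span = span;
    }
}

final class LayerPatternsSketch {
    // Walks the linked list and counts its links.
    static int chainLength(ChainLink head) {
        int n = 0;
        for (ChainLink link = head; link != null; link = link.next) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        SpanAnno first = new SpanAnno(0, 4, "PER");
        SpanAnno second = new SpanAnno(12, 14, "PER");

        RelationAnno rel = new RelationAnno(second, first, "refersTo");

        ChainLink head = new ChainLink(first);
        head.next = new ChainLink(second);

        System.out.println(rel.label + ": " + rel.source.begin + " -> " + rel.target.begin);
        System.out.println("chain length: " + chainLength(head));
    }
}
```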
If you want to use WebAnno with existing UIMA data, you can try this:
- define custom annotation layers in WebAnno that closely resemble the data you wish to interface with
- export the layer definition as JSON
- edit the JSON file and change the type names "webanno.custom.XXX" into whatever these types are called in your existing UIMA type system
- create a new project
- import the modified JSON layer configuration
For basic type system designs, this should work ok.
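The JSON editing step in the list above could be scripted along these lines. This is only a sketch: the sample JSON string is made up for the example and is not the actual WebAnno export format, and the type names webanno.custom.Gene and org.example.types.Gene are hypothetical. The helper does a plain text replacement of the quoted type name rather than parsing the JSON, which may be enough for simple layer definitions.

```java
// Hypothetical helper for renaming a custom type in an exported layer file.
final class LayerTypeRename {
    // Replaces every exact, double-quoted occurrence of oldType with newType.
    // Works on the raw text and makes no assumptions about the JSON layout.
    static String renameType(String json, String oldType, String newType) {
        return json.replace("\"" + oldType + "\"", "\"" + newType + "\"");
    }

    public static void main(String[] args) {
        // Made-up sample; a real exported layer definition will look different.
        String exported = "{\"name\":\"webanno.custom.Gene\",\"type\":\"span\"}";
        String renamed = renameType(exported, "webanno.custom.Gene", "org.example.types.Gene");
        System.out.println(renamed);
    }
}
```

For a real export you would read the file into a string, apply renameType once per custom type, and write the result back before importing it into the new project.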
Cheers,
-- Richard
[1] https://code.google.com/p/webanno/
[2] https://code.google.com/p/dkpro-core-asl/