
Posted to user@uima.apache.org by James Kitching <ja...@hemseye.org> on 2014/11/16 13:08:41 UTC

UIMA pipeline output persistence and multiple layer web based visualisation tools? Suggestions?

Hi

(First of all, a BIG THANKS to ALL the open source developers at UIMA and the
other projects I mention below, on whom I am now relying :-) .)

I am researching a particular knowledge base extraction task, using
UIMA components as part of the solution. To do this work I need
UIMA output persistence and the ability to visualise this output as
multiple annotation layers on the same text. Ultimately I want my
automated annotations and visualisations to be web based and to allow me
to make additional manual annotations if required. Once I have multiple
annotations on a text, I will be able to apply my new knowledge
extraction logic.

I have looked at webanno (which incorporates Brat for its UI),
U-Compare and Argo (see https://code.google.com/p/webanno/,
http://brat.nlplab.org/, http://u-compare.org/,
http://nactem.ac.u/ucompare/downloads/,
http://argo.nactem.ac.uk/about-argo/). I had hoped to use webanno
for this task; however, webanno does not allow the direct import
of UIMA components or UIMA output.

I found that I could get U-Compare to work as I wanted, and it shows
promise; however, if I get any configuration wrong between UIMA
components, it crashes. I got the software to work after spending
more time reading the manual: I needed to configure the input types
for each component in the pipeline manually. The software checks
whether subsequent pipeline components are compatible when a new
component is added to a workflow. My initial errors came because I had
expected subsequent U-Compare components to pick up their input
automatically from the output of previous workflow components.
Although U-Compare does support saving previous session data, the
software is not fully open source, so I do not have easy access to this
data. I have not yet looked at the web service pipeline generation
features of U-Compare; these might hold promise if they give me a
downloadable configuration rather than a hosted solution.

When I looked at the Argo tool, I had similar problems with a lack of
output, I would assume for the same reasons. Again, Argo is not fully
open source, so I cannot work on modifying it to my own ends. Are
there any better tools available that support web-based, layered
visualisation of UIMA output together with output persistence?

Currently I plan to continue experimenting with UIMA components using
U-Compare; however, I am looking to implement persistence and
visualisation in a production tool. If someone already has a good open
source implementation of this, I would prefer not to spend time
reinventing this particular wheel.

I would be very happy if the U-Compare and webanno teams worked
together to integrate their software. I will pass this mail on to
these teams as a suggestion.

The particular data extraction task I am interested in differs from
the currently popular research shared task
(http://www.nist.gov/tac/2014/KBP/), and I plan to share it once I
have made some progress.

Thanks in advance.

Further information about me and my project can be found at www.hemseye.org

James Kitching

Re: can't remove duplicate Annotations with Java Set Collection

Posted by Kameron Cole <ka...@us.ibm.com>.

Having trouble with the Comparator.  If I compare Object, no issue:



If I compare Annotation, it doesn't recognize the method


Kameron Arthur Cole
Watson Content Analytics Applications and Support
email: kameroncole@us.ibm.com | Tel: 305-389-8512

From:	Richard Eckart de Castilho <re...@apache.org>
To:	user@uima.apache.org
Date:	11/18/2014 02:34 AM
Subject:	Re: can't remove duplicate Annotations with Java Set Collection



On 17.11.2014, at 20:59, Kameron Cole <ka...@us.ibm.com> wrote:

> I am trying to get rid of duplicates in the FSIndex.  I thought a very
> clever way to do this would be to just push them into a Set Collection in
> Java, which does not allow duplicates. This is very (very) standard Java:
>
> ArrayList al = new ArrayList();
> // add elements to al, including duplicates
> HashSet hs = new HashSet();
> hs.addAll(al);
> al.clear();
> al.addAll(hs);

There is no universal definition of equality other than object identity,
and this is what Java defaults to unless equals() and hashCode() are
overridden.
Since each UIMA user might have a different opinion on what is equal, UIMA
defers this decision to its indexing mechanism instead of hard-baking it
into equals()/hashCode() methods.
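A minimal plain-Java illustration of the point above, with no UIMA dependency. Both classes are hypothetical stand-ins for annotations: `PlainSpan` does not override equals()/hashCode(), so a HashSet keeps two objects with identical offsets, while `ValueSpan` is a record (Java 16+), which generates value-based equals()/hashCode(), so the HashSet collapses them.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdentityVsValueEquality {

    // Hypothetical stand-in that does NOT override equals()/hashCode(),
    // so it inherits identity-based equality from java.lang.Object.
    static final class PlainSpan {
        final int begin;
        final int end;

        PlainSpan(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Hypothetical stand-in with value-based equality: a record (Java 16+)
    // generates equals()/hashCode() from its components.
    record ValueSpan(int begin, int end) { }

    static int distinctPlain() {
        // Two distinct objects with identical offsets.
        List<PlainSpan> spans = List.of(new PlainSpan(0, 5), new PlainSpan(0, 5));
        Set<PlainSpan> set = new HashSet<>(spans);
        return set.size(); // both survive: they are different objects
    }

    static int distinctValue() {
        List<ValueSpan> spans = List.of(new ValueSpan(0, 5), new ValueSpan(0, 5));
        Set<ValueSpan> set = new HashSet<>(spans);
        return set.size(); // one survives: the two are equal by value
    }

    public static void main(String[] args) {
        System.out.println("identity equality: " + distinctPlain()); // 2
        System.out.println("value equality:    " + distinctValue()); // 1
    }
}
```

Going by the explanation above, annotations in the quoted HashSet approach behave like `PlainSpan`: two annotations covering the same text are still distinct objects, so nothing gets removed.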

I suggest you do the following:

- implement a Comparator<FeatureStructure> or Comparator<AnnotationFS>
according to your definition of equality

- create a TreeSet based on your comparator

- drop all your annotations into this TreeSet

- "duplicates" according to your definition are dropped. The rest is sorted
(or not) depending on what your comparator returns in a non-equality case
(return value != 0).
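The steps above can be sketched in Java roughly as follows. This assumes one particular definition of a duplicate (same begin, same end, same type name) and uses a hypothetical `Span` class in place of `AnnotationFS`, so the example runs without UIMA on the classpath. Like UIMA annotations, `Span` does not override equals()/hashCode(). In real UIMA code the same comparator would be written against `AnnotationFS` using `getBegin()`, `getEnd()` and `getType().getName()`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class ComparatorDedup {

    // Hypothetical stand-in for AnnotationFS. It does not override
    // equals()/hashCode(), so a HashSet would keep every instance.
    static final class Span {
        private final String type;
        private final int begin;
        private final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        String getType() { return type; }
        int getBegin() { return begin; }
        int getEnd() { return end; }

        @Override
        public String toString() { return type + "[" + begin + "," + end + ")"; }
    }

    // One possible definition of "duplicate": same begin, same end, same type.
    // compare() returns 0 only in that case; otherwise it orders by begin,
    // then end, then type name, which gives a consistent total order.
    static final Comparator<Span> SAME_OFFSETS_AND_TYPE =
            Comparator.comparingInt(Span::getBegin)
                      .thenComparingInt(Span::getEnd)
                      .thenComparing(Span::getType);

    // The TreeSet consults only the comparator: any element comparing as 0
    // to one already present is not added.
    static List<Span> dedup(List<Span> annotations) {
        TreeSet<Span> unique = new TreeSet<>(SAME_OFFSETS_AND_TYPE);
        unique.addAll(annotations);
        return new ArrayList<>(unique);
    }

    static List<Span> sampleInput() {
        return List.of(
                new Span("Person", 10, 15),
                new Span("Person", 0, 5),
                new Span("Person", 0, 5),    // duplicate under this definition
                new Span("Location", 0, 5)); // same offsets, different type: kept
    }

    public static void main(String[] args) {
        // Prints Location[0,5), Person[0,5) and Person[10,15) on separate lines.
        for (Span s : dedup(sampleInput())) {
            System.out.println(s);
        }
    }
}
```

Declaring the comparator with a type parameter (`Comparator<Span>`, or `Comparator<AnnotationFS>` in UIMA code) is what lets `compare()` receive the annotation type directly. A class implementing the raw `Comparator` type must provide `compare(Object, Object)`, and a method written as `compare(Annotation, Annotation)` would then not override it. That may or may not be the cause of the "doesn't recognize the method" problem mentioned above, since the original code is not shown.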

Cheers,

-- Richard