Posted to user@uima.apache.org by Andrew Borthwick <an...@corp.spock.com> on 2007/05/31 21:22:35 UTC

Human annotation tool for UIMA

Hi,

I am new to UIMA and am trying to find the best tool for doing human
document annotation.  For instance, if I am building a machine-learning
based named entity tagger and I want to tag some text with named entities to
train my recognizers, what would be the best way to do that?  Also, is there
some UIMA tool for getting an F-Measure or some other metrics on the
accuracy of my NE tagger relative to the human taggings?
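
For reference, the F-measure asked about here can be computed by hand once
both the human and the machine annotations are available as spans.  The
following is only a minimal sketch, not a UIMA tool: it assumes each
annotation is reduced to a (begin, end, type) triple and counts a prediction
as correct only on an exact match of all three.  The function name
score_spans and the sample spans are hypothetical.

```python
from typing import Iterable, Tuple

# Assumed representation: (begin offset, end offset, entity type).
Span = Tuple[int, int, str]


def score_spans(gold: Iterable[Span],
                predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match scoring."""
    gold_set, predicted_set = set(gold), set(predicted)
    # A prediction is a true positive only if offsets and type all match.
    true_positives = len(gold_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical human (gold) and machine (predicted) taggings of one document.
gold = [(0, 16, "PERSON"), (32, 46, "ORG")]
predicted = [(0, 16, "PERSON"), (32, 46, "PERSON"), (50, 55, "LOC")]

precision, recall, f1 = score_spans(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Exact matching is the strictest choice; a more lenient scorer could give
partial credit for overlapping spans or for a correct span with the wrong
type.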

Thanks,

----------------------------------
Andrew Borthwick, Ph.D.
Principal Scientist
Spock Networks

Re: Human annotation tool for UIMA

Posted by "J. William Murdock" <bi...@murdocks.org>.
Joe Andrieu wrote:
> Are there any open source annotators people would recommend for integrating with UIMA?
>   

One manual annotation tool that is open source is Knowtator (which is 
licensed under MPL 1.1).  As I understand it, Knowtator is intended for 
manual annotation of entities and relationships in text.  It is a layer on 
top of the Protégé open source ontology editor.  I'm not really familiar 
enough with Knowtator to explicitly recommend it.  Considering its 
stated goals and the framework that it was developed on, it seems like 
it might be particularly well suited to enabling manual annotations for 
relatively elaborate type systems that have a lot of structure and many 
common relation annotation types.  The flip side is that it may be 
overkill for the (more common) task of marking up instances of a flat 
list of named-entity types.  In any event, my point here is just that 
anyone who is thinking of building a mapping from an open source manual 
annotation tool to UIMA may want to consider Knowtator, especially if 
they are interested in a lot of expressive power.



Re: Human annotation tool for UIMA

Posted by Andrew Borthwick <an...@corp.spock.com>.
Thanks for your help, everyone.  I think that I will first explore using
GATE as Julien suggests below.  However, if anyone has a native UIMA tool
for doing manual annotation, I would much appreciate hearing about it.
Thilo, would your tool work as a temporary solution?

It seems to me that having some sort of solution here would be an important
part of offering a complete UIMA toolset.  In our organization, we are
planning on working with both existing corpora and corpora which are more
specific to the domain on which we are working.  There is also the problem
of testing our NLP solution on documents of interest to us.  So there will
be many scenarios in which it won't be sufficient to simply use standard
corpora and we will need to do some annotation ourselves.

Thanks again,
Andrew Borthwick



On 6/1/07, Julien Nioche <J....@dcs.shef.ac.uk> wrote:
>
>  GATE (http://gate.ac.uk) is open source and allows you to create annotations
> manually. The interface is tightly bound to the GATE API so porting it to
> UIMA would be a relatively costly operation. It would certainly be easier to
> write a new annotation tool from scratch. However GATE could be used in the
> meantime to annotate documents and save them as XML, which could be loaded
> by UIMA at a later stage.
>
> There is also a UIMA plugin for GATE which allows calling UIMA processes
> from GATE and vice versa, but I am not sure it works with the Apache version
> of UIMA. That could help in using existing UIMA resources for pre-annotating
> documents.
>
> Hope that helps
>
> Julien
>
>  Thilo Goetz wrote:
>
>  BTW, I have recently hacked UIMA's CAS Visual Debugger for a
> colleague to allow creating manual annotations.  That was a
> one-off, though, and I haven't fed it back into the main code
> base.  If people are interested in that kind of
> functionality, let me know.  We wouldn't want to compete with
> a dedicated annotation tool, though.
>
>  I would like to second Andrew Borthwick's original request for a UIMA-savvy annotation tool.
>
> Adding it to a full-featured annotator would probably be great, but having an open source option would offer the most potential
> upside for UIMA. Alembic and its replacement Callisto are free, but not open source, so I believe MITRE would have to add support
> for UIMA themselves.
>
> Are there any open source annotators people would recommend for integrating with UIMA?
>
> -j
>
> --
> Joe Andrieu
> SwitchBook Software
> http://www.switchbook.com
> joe@switchbook.com
> +1 (805) 705-8651

Re: Human annotation tool for UIMA

Posted by Julien Nioche <J....@dcs.shef.ac.uk>.
Hi Thilo

GATE is licensed under the LGPL. Besides the page I mentioned earlier, 
http://sourceforge.net/projects/gate also contains a lot of information 
about GATE (mailing lists, forums, feature requests, etc.).

J.
>
>> GATE (http://gate.ac.uk) is open source and allows you to create 
>> annotations manually. The interface is tightly bound to the GATE API 
>> so porting it to UIMA would be a relatively costly operation. It 
>> would certainly be easier to write a new annotation tool from 
>> scratch. However GATE could be used in the meantime to annotate 
>> documents and save them as XML, which could be loaded by UIMA at a 
>> later stage.
>>
>> There is also a UIMA plugin for GATE which allows calling UIMA 
>> processes from GATE and vice versa, but I am not sure it works with 
>> the Apache version of UIMA. That could help in using existing UIMA 
>> resources for pre-annotating documents.
>>
>> Hope that helps
>>
>> Julien
>
> Hi Julien,
>
> what open source license is GATE under?  I looked at the documentation,
> but couldn't find it.
>
> --Thilo
>
>


Re: Human annotation tool for UIMA

Posted by Thilo Goetz <tw...@gmx.de>.
Julien Nioche wrote:
> GATE (http://gate.ac.uk) is open source and allows you to create annotations 
> manually. The interface is tightly bound to the GATE API so porting it 
> to UIMA would be a relatively costly operation. It would certainly be 
> easier to write a new annotation tool from scratch. However GATE could 
> be used in the meantime to annotate documents and save them as XML, 
> which could be loaded by UIMA at a later stage.
> 
> There is also a UIMA plugin for GATE which allows calling UIMA processes 
> from GATE and vice versa, but I am not sure it works with the Apache 
> version of UIMA. That could help in using existing UIMA resources for 
> pre-annotating documents.
> 
> Hope that helps
> 
> Julien

Hi Julien,

what open source license is GATE under?  I looked at the documentation,
but couldn't find it.

--Thilo



RE: Human annotation tool for UIMA

Posted by Joe Andrieu <jo...@switchbook.com>.
Thilo Goetz wrote:
> BTW, I have recently hacked UIMA's CAS Visual Debugger for a 
> colleague to allow creating manual annotations.  That was a 
> one-off, though, and I haven't fed it back into the main code 
> base.  If people are interested in that kind of 
> functionality, let me know.  We wouldn't want to compete with 
> a dedicated annotation tool, though.

I would like to second Andrew Borthwick's original request for a UIMA-savvy annotation tool. 

Adding it to a full-featured annotator would probably be great, but having an open source option would offer the most potential
upside for UIMA. Alembic and its replacement Callisto are free, but not open source, so I believe MITRE would have to add support
for UIMA themselves.

Are there any open source annotators people would recommend for integrating with UIMA?

-j

--
Joe Andrieu
SwitchBook Software
http://www.switchbook.com
joe@switchbook.com
+1 (805) 705-8651 



Re: Human annotation tool for UIMA

Posted by Thilo Goetz <tw...@gmx.de>.
Katrin Tomanek wrote:
> Dear Andrew,
> 
>> I am new to UIMA and am trying to find the best tool for doing 
>> human
>> document annotation.  For instance, if I am building a machine-learning
>> based named entity tagger and I want to tag some text with named 
>> entities to
>> train my recognizers, what would be the best way to do that? 
> I think that's a matter of human/manual annotation. Generating training 
> material for ML is a laborious task which is not an issue of UIMA (as 
> far as I understand). Depending on the entities and the domain and 
> language you are interested in you might find annotated corpora (you 
> might check http://torvald.aksis.uib.no/corpora/ for existing corpora).
> 
> regards,
> Katrin
> 
> 
> 

Also check http://registry.dfki.de/ for software tools to manually
annotate text.  I have no personal experience with any of the tools
there, but I have heard Alembic mentioned favorably.  It looks
like it is freely available.  It should be relatively easy to transform
the resulting XML to UIMA, either via XSLT, or with a custom XML
parser that reads the annotated data and feeds it into UIMA APIs.
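
As a rough illustration of the custom-parser route, the sketch below turns
inline XML markup into plain text plus standoff spans, which is the shape
UIMA annotations take (a begin and an end offset into the document text).
It is a sketch under assumptions only: the inline format, the tag names and
the function name inline_to_standoff are made up for the example and are
not the actual output format of Alembic or any other tool, and the final
step of creating annotations through the UIMA APIs is left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def inline_to_standoff(xml_string: str) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Strip inline tags and return (text, [(begin, end, tag), ...]).

    Every element below the root becomes one span over the text it
    encloses.  Nested elements are emitted before their parents.
    """
    root = ET.fromstring(xml_string)
    text_parts: List[str] = []
    spans: List[Tuple[int, int, str]] = []
    offset = 0  # number of text characters emitted so far

    def walk(element: ET.Element, is_root: bool) -> None:
        nonlocal offset
        begin = offset
        if element.text:
            text_parts.append(element.text)
            offset += len(element.text)
        for child in element:
            walk(child, False)
            # Tail text follows the child's closing tag but still belongs
            # to the current element.
            if child.tail:
                text_parts.append(child.tail)
                offset += len(child.tail)
        if not is_root:
            spans.append((begin, offset, element.tag))

    walk(root, True)
    return "".join(text_parts), spans


# Hypothetical annotated document with two entity types.
sample = ("<doc>Hello <PERSON>Andrew Borthwick</PERSON> of "
          "<ORG>Spock Networks</ORG>.</doc>")
text, spans = inline_to_standoff(sample)
for begin, end, tag in spans:
    print(tag, begin, end, repr(text[begin:end]))
```

Each resulting (begin, end, tag) triple would then be mapped to a type in
the UIMA type system and added to the CAS over the recovered text.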

BTW, I have recently hacked UIMA's CAS Visual Debugger for a colleague
to allow creating manual annotations.  That was a one-off, though, and
I haven't fed it back into the main code base.  If people are interested
in that kind of functionality, let me know.  We wouldn't want to compete
with a dedicated annotation tool, though.

--Thilo


Re: Human annotation tool for UIMA

Posted by Katrin Tomanek <to...@coling-uni-jena.de>.
Dear Andrew,

> I am new to UIMA and am trying to find the best tool for doing human
> document annotation.  For instance, if I am building a machine-learning
> based named entity tagger and I want to tag some text with named 
> entities to
> train my recognizers, what would be the best way to do that? 
I think that's a matter of human/manual annotation. Generating training 
material for ML is a laborious task which is not an issue of UIMA (as 
far as I understand). Depending on the entities and the domain and 
language you are interested in you might find annotated corpora (you 
might check http://torvald.aksis.uib.no/corpora/ for existing corpora).

regards,
Katrin



-- 
Katrin Tomanek
Jena University Language and Information Engineering (JULIE) Lab
Phone: +49-3641-944307
Fax:   +49-3641-944321
email: tomanek@coling-uni-jena.de
URL:   http://www.coling.uni-jena.de