Posted to user@uima.apache.org by Andrew Serff <li...@serff.net> on 2007/07/02 21:47:21 UTC

UIMA Beginners Help?

Hello.  I'm very new to the whole world of data mining and have stumbled
upon UIMA within the last week or so.  I'm trying to go through all the
documentation and just create a simple application, but I'm hitting some
roadblocks and was wondering where I can find some newbie help.  I
realize this is sort of long, so I appreciate any help anyone can give.

First I have a question: What is the difference between a CAS and a JCas 
and why would I want to use one over the other?  Is this determined by 
the AEs I'm using (i.e. if they are implemented by extending a 
JCas_*_impl) or is there some other reason?  It seems the CAS is more 
developed and has things like CasPools, ability to make CASes with 
multiple AEs, Consumers, etc.  Should I just be using the CAS interface 
and forget about JCas?

My main issue right now is that I can't figure out how to set inputs for 
an AE.  I can't find any examples of how to do it.  See the description 
below of what I'm trying to do:

I'm trying to use some prebundled AEs to parse some text.  I basically
want to do Named Entity Extraction on text.  So I wrote a simple
application that first does Sentence Boundary detection and prints out
the sentences that it finds.  That was easy enough.  So now I would like
to take those sentences and feed them into the named entity AE.  Both the
Sentence Boundary AE and the NE AE I'm using are from the JULIE lab
(http://www.julielab.de).  The documentation for the NE AE says that it
requires Sentences (the output of the Sentence Boundary AE) as input.  I
cannot figure out how to set those inputs and am stuck at this point.
Once I figure that out, I think I'll be getting NEs out of the CAS.

All that being said, I'm also not sure I'm coding this process the way
I'm supposed to.  I eventually want to build all this into a distributed
architecture with many threads running constantly, processing documents
using a pool of extractors.  I want to be able to submit documents to
the named entity extractor, then persist the named entities in a
database.  I would like to have multiple entry points into the
extractor: either ad hoc (here is a doc, extract it now) or using a
collection reader to pull multiple docs in at once and parse them all.
Right now, my simple application has 2 CASes and 2 AnalysisEngines (one
for Sentence Detection and one for NE Extraction).  It seems like I
would just want to make one AE that does the Sentence Detection and
passes its output on to the NE extractor, but I don't get how you do
this.  Do I need to make a new AE and define these things in the XML
that describes it?  Or is this a CPE?

If anyone has a simple NE example application that could point me in the 
right direction, that would be great.  

Thanks!
Andrew

Re: UIMA Beginners Help?

Posted by Katrin Tomanek <to...@coling-uni-jena.de>.

Hi,

> All this said, it appears to me that UIMA-JNET doesn't actually require
> any prior annotations on the CAS that it gets but generates its own
> sentences internally.  Are you sure you aren't doing unnecessary work?

No, UIMA-JNET requires sentence annotations as it will search on a 
sentence level for entities. When there are no sentences, it will not 
throw an error, but also not find anything.

best regards,

Katrin

-- 
Katrin Tomanek
Jena University Language and Information Engineering (JULIE) Lab
Phone: +49-3641-944307
Fax:   +49-3641-944321
email: tomanek@coling-uni-jena.de
URL:   http://www.coling.uni-jena.de

RE: UIMA Beginners Help?

Posted by "LeHouillier, Frank D." <Fr...@gd-ais.com>.

Hopefully other people will correct and add to this:

JCas is essentially a Java wrapper for a CAS with a particular type
system.  For example, if you create an Analysis Engine that produces
Sentence annotations, JCasGen in the Eclipse plugin will generate a
Sentence class whose instances can easily be added to a JCas.  The CAS
itself is easily accessed through the JCas.  Essentially, if you are
working in Java and you know ahead of time which type system the AE will
be given, it is better to use the JCas.

You can create a pipeline of Analysis Engines through an Aggregate
Analysis Engine.  So in the case of Sentences going to NE extraction,
you'd have an Aggregate Analysis Engine consisting of the Sentence
Extractor AE, which adds Sentence annotations to the CAS, and an NE
Extractor AE, which uses the already present Sentence annotations in the
course of adding Named Entity annotations (or PERSON annotations,
ORGANIZATION annotations, GPE annotations etc.) to the CAS.  I'm not
sure what you mean when you say your application has two CASes.  In the
default case there should be only one CAS, and it should essentially
represent the entire document.  If you have some code that requires some
small bit of text such as a sentence, then this should probably be
handled inside the Analysis Engine code rather than by generating new
CASes.
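
To make the "one CAS, several engines" idea concrete, here is a minimal
sketch in plain Java.  It is a toy model, not the UIMA API: ToyCas,
ToyEngine, SentenceSplitter and EntityTagger are all hypothetical names,
and the "named entity" rule (any capitalized word that is not first in
its sentence) is only a stand-in.  The point it tries to show is that the
second engine has no separate input; it simply reads the Sentence
annotations the first engine left in the shared CAS, and finds nothing,
without any error, when they are missing.

```java
import java.util.ArrayList;
import java.util.List;

class SharedCasSketch {

    /** A typed span over the document text, loosely like a UIMA annotation. */
    static class Annotation {
        final String type;
        final int begin;
        final int end;
        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Toy stand-in for a CAS: the document text plus all annotations added so far. */
    static class ToyCas {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();
        ToyCas(String text) { this.text = text; }

        /** Returns a copy of the annotations of one type, so engines can add while iterating. */
        List<Annotation> select(String type) {
            List<Annotation> out = new ArrayList<>();
            for (Annotation a : annotations) {
                if (a.type.equals(type)) out.add(a);
            }
            return out;
        }
    }

    interface ToyEngine { void process(ToyCas cas); }

    /** Adds a "Sentence" annotation for each stretch of text ending in '.', '!' or '?'. */
    static class SentenceSplitter implements ToyEngine {
        public void process(ToyCas cas) {
            int start = 0;
            for (int i = 0; i < cas.text.length(); i++) {
                char c = cas.text.charAt(i);
                if (c == '.' || c == '!' || c == '?') {
                    while (start < i && Character.isWhitespace(cas.text.charAt(start))) start++;
                    cas.annotations.add(new Annotation("Sentence", start, i + 1));
                    start = i + 1;
                }
            }
        }
    }

    /** Reads existing "Sentence" annotations and adds "Entity" annotations inside them. */
    static class EntityTagger implements ToyEngine {
        public void process(ToyCas cas) {
            for (Annotation s : cas.select("Sentence")) {
                boolean firstWord = true;
                int i = s.begin;
                while (i < s.end) {
                    while (i < s.end && !Character.isLetter(cas.text.charAt(i))) i++;
                    int j = i;
                    while (j < s.end && Character.isLetter(cas.text.charAt(j))) j++;
                    if (j > i) {
                        if (!firstWord && Character.isUpperCase(cas.text.charAt(i))) {
                            cas.annotations.add(new Annotation("Entity", i, j));
                        }
                        firstWord = false;
                    }
                    i = j;
                }
            }
        }
    }

    /** Plays the role of the aggregate: one CAS passed through each engine in order. */
    static List<String> run(String text, ToyEngine... engines) {
        ToyCas cas = new ToyCas(text);
        for (ToyEngine e : engines) e.process(cas);
        List<String> found = new ArrayList<>();
        for (Annotation a : cas.select("Entity")) found.add(text.substring(a.begin, a.end));
        return found;
    }

    public static void main(String[] args) {
        String text = "Yesterday Andrew wrote to Katrin. She works in Jena.";
        System.out.println(run(text, new SentenceSplitter(), new EntityTagger()));
        System.out.println(run(text, new EntityTagger())); // no sentences, so nothing found
    }
}
```

In real UIMA the ordering would come from the aggregate's descriptor
rather than from a hand-written loop like the one in run().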

My best piece of advice about this is that you should go through the
UIMA documentation and do the "Dave Extractor" example.  

All this said, it appears to me that UIMA-JNET doesn't actually require
any prior annotations on the CAS that it gets but generates its own
sentences internally.  Are you sure you aren't doing unnecessary work?


Re: UIMA Beginners Help?

Posted by Katrin Tomanek <to...@coling-uni-jena.de>.

Hi,
> One other question I had:  Is there a model for the NE detector that is 
> trained off of news articles (like the one lingpipe has)?  Or can I use 
> the one lingpipe has (I assumed it was in the wrong format...)?  I'm not 
> really in the gene/bio business...;)  I'm looking to discover People, 
> Locations, and Organizations.
You might train your own model with UIMA-JNET on, e.g., CoNLL or ACE data
for this domain.  We don't actually provide such a model ourselves.

regards,
Katrin

Re: UIMA Beginners Help?

Posted by Andrew Serff <li...@serff.net>.

Thanks to both of you for your help!  I think I'm understanding things a
little better now.  I got your example to work, Katrin; however, I do
agree with Frank in regard to the Aggregate AE.  I do want to be able to
use the process outside a CPE, so that is my next step.  For some
reason, I still can't get the example to work in Java.  I now use one
CAS, send it through the sentence detector, then send it through the NE
parser, and I get nothing out.  Maybe I'll have more luck when I try to
make my own AE.

I think a light clicked on reading both your emails: the inputs are the
annotations put in the CAS by the previous AE.  So I don't have to set
any "input" per se (i.e. there is no "setInput" equivalent on the CAS
like I was looking for).  I'll try out the next few steps later this
week, hopefully.  Thanks again for all your help, and happy Independence
Day (if you're in the US...)!

One other question I had:  Is there a model for the NE detector that is
trained on news articles (like the one LingPipe has)?  Or can I use the
one LingPipe has (I assumed it was in the wrong format...)?  I'm not
really in the gene/bio business...;)  I'm looking to discover People,
Locations, and Organizations.

Andrew

RE: UIMA Beginners Help?

Posted by "LeHouillier, Frank D." <Fr...@gd-ais.com>.

Are other people exploiting the ordering of the CPE the same way?

-----Original Message-----
From: Katrin Tomanek [mailto:tomanek@coling-uni-jena.de] 
Sent: Wednesday, July 04, 2007 2:15 AM
To: uima-user@incubator.apache.org
Subject: Re: UIMA Beginners Help?

Re: UIMA Beginners Help?

Posted by Katrin Tomanek <to...@coling-uni-jena.de>.

LeHouillier, Frank D. wrote:
> I should have looked at the documentation more closely:
> 
> UIMA Tutorial and Developers' Guides
> 3.6.1. Deploying a UIMA Component as a SOAP Service
> 
> This at least tells me how to include services in an Aggregate.  That
> seems to be a conclusive argument in favor of using Aggregate Analysis
> Engines when there are dependencies between primitive AEs, doesn't it?

Hi Frank,

well, setting up a CPE was just the easiest, or let's say most
straightforward, way I found to use our tools (because there is a GUI,
so it's easy to handle for beginners). Sure, an AE is also an option, and
so is an application in Java... it depends on your needs.

Best regards,
Katrin

RE: UIMA Beginners Help?

Posted by "LeHouillier, Frank D." <Fr...@gd-ais.com>.

I should have looked at the documentation more closely:

UIMA Tutorial and Developers' Guides
3.6.1. Deploying a UIMA Component as a SOAP Service

This at least tells me how to include services in an Aggregate.  That
seems to be a conclusive argument in favor of using Aggregate Analysis
Engines when there are dependencies between primitive AEs, doesn't it?

Frank

-----Original Message-----
From: LeHouillier, Frank D. [mailto:Frank.LeHouillier@gd-ais.com] 
Sent: Tuesday, July 03, 2007 8:56 AM
To: uima-user@incubator.apache.org
Subject: RE: UIMA Beginners Help?

RE: UIMA Beginners Help?

Posted by "LeHouillier, Frank D." <Fr...@gd-ais.com>.

So this brings up a legitimate question I have.  Even if you can specify
a pipeline in a CPE, wouldn't it be better practice to use an Aggregate
Analysis Engine in cases where there are actual input dependencies
between two or more analysis engines?  I had always understood the
purpose of the CPE to be getting collections of data to and from a set
of independent Analysis Engines, rather than specifically defining
dependencies.  Is it part of the UIMA spec, for example, that the CPE
defines an order on the Analysis Engines?  The CPE specifically excludes
the inclusion of a Flow Controller description.

Another argument I have for using Aggregate AEs in a situation where
there are dependencies is that someone might want to use the pipeline
without using the CPE at all.  For example, somebody might already have
an application that handles most of the CPE stuff and just want to plug
in UIMA-JNET; they would have to create an Aggregate of the Sentence
Annotator and NE extractor anyway, right?

On the other hand, is there a way to create an Aggregate Analysis
Engine, specifying the primitive Analysis Engines as separate services?
If I want the Sentence Annotator to be a service and the NE extractor to
be a service, how do I make sure that the CAS hits these in the right
order?  How do I do this if the flow is not simply linear but dynamic
(e.g. the output of the language identifier sends the CAS to the correct
Sentence annotator)?
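
For illustration only, the dynamic case might look like the toy below.
It is plain Java, not UIMA: ToyCas, identifyLanguage, sentenceAnnotators
and route are hypothetical names, and the language check is deliberately
crude.  It sketches the kind of decision a Flow Controller would have to
make: look at what an earlier step wrote into the CAS, then pick the
next step from it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class DynamicFlowSketch {

    /** Toy CAS: the text, a language slot, and a record of which steps have run. */
    static class ToyCas {
        final String text;
        String language;
        final List<String> visited = new ArrayList<>();
        ToyCas(String text) { this.text = text; }
    }

    /** First step: writes a language code into the CAS (crude rule, for the sketch only). */
    static void identifyLanguage(ToyCas cas) {
        cas.language = cas.text.contains(" und ") ? "de" : "en";
        cas.visited.add("LanguageIdentifier");
    }

    /** One hypothetical sentence annotator per language code. */
    static Map<String, Consumer<ToyCas>> sentenceAnnotators() {
        Map<String, Consumer<ToyCas>> byLanguage = new HashMap<>();
        byLanguage.put("en", cas -> cas.visited.add("EnglishSentenceAnnotator"));
        byLanguage.put("de", cas -> cas.visited.add("GermanSentenceAnnotator"));
        return byLanguage;
    }

    /** The routing decision: the next step depends on what the previous step wrote. */
    static List<String> route(String text) {
        ToyCas cas = new ToyCas(text);
        identifyLanguage(cas);
        Consumer<ToyCas> annotator = sentenceAnnotators().get(cas.language);
        if (annotator != null) annotator.accept(cas);
        cas.visited.add("EntityTagger");
        return cas.visited;
    }

    public static void main(String[] args) {
        System.out.println(route("Andrew und Katrin schreiben."));
        System.out.println(route("Andrew and Katrin write."));
    }
}
```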


-----Original Message-----
From: Katrin Tomanek [mailto:tomanek@coling-uni-jena.de] 
Sent: Tuesday, July 03, 2007 1:35 AM
To: uima-user@incubator.apache.org
Subject: Re: UIMA Beginners Help?

Re: UIMA Beginners Help?

Posted by Katrin Tomanek <to...@coling-uni-jena.de>.

Dear Andrew,

I have added some brief documentation to our website, explaining how to
set up a Collection Processing Engine (CPE) from PEAR packages. See here:

https://watchtower.coling.uni-jena.de/~tomanek/UIMA/

There is also a small demo CPE.

If you would rather use the components programmatically, i.e. in an
application, you might refer to the "UIMA Tutorial and Developers'
Guides" (UIMA Version 2.1), section 3.2 (Using Analysis Engines). Do it
as explained there, i.e. create an AE from the sentence splitter and one
from the named entity tagger. Create a CAS (important: as explained in
3.2.6!) and then just run the process method of both AEs, sentence
splitter first, then the NE tagger, on the CAS you created. Hope that works.
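
A rough illustration of why the way the CAS is created matters, as far
as I understand that section: a CAS shared by two engines needs a type
system that covers the types of both.  The sketch below is plain Java
and not the UIMA API; EngineMetaData, ToyCas and createSharedCas are
hypothetical names, and the exception is only how this toy signals the
problem (real UIMA may fail differently).  In UIMA itself the matching
step is, I believe, CasCreationUtils.createCas called with the metadata
of both engines.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MergedTypeSystemSketch {

    /** Hypothetical engine metadata: only the annotation type names the engine declares. */
    static class EngineMetaData {
        final Set<String> declaredTypes = new LinkedHashSet<>();
        EngineMetaData(String... types) {
            for (String t : types) declaredTypes.add(t);
        }
    }

    /** Toy CAS that only accepts annotation types it was created with. */
    static class ToyCas {
        final Set<String> typeSystem;
        final List<String> annotations = new ArrayList<>();
        ToyCas(Set<String> typeSystem) { this.typeSystem = typeSystem; }

        void addAnnotation(String type) {
            if (!typeSystem.contains(type)) {
                throw new IllegalStateException("type not in this CAS: " + type);
            }
            annotations.add(type);
        }
    }

    /** Builds one CAS whose type system is the union of every engine's declared types. */
    static ToyCas createSharedCas(EngineMetaData... engines) {
        Set<String> merged = new LinkedHashSet<>();
        for (EngineMetaData e : engines) merged.addAll(e.declaredTypes);
        return new ToyCas(merged);
    }

    public static void main(String[] args) {
        EngineMetaData splitter = new EngineMetaData("Sentence");
        EngineMetaData tagger = new EngineMetaData("Sentence", "Entity");

        // A CAS built from both engines accepts the types of both.
        ToyCas shared = createSharedCas(splitter, tagger);
        shared.addAnnotation("Sentence");
        shared.addAnnotation("Entity");
        System.out.println(shared.annotations);

        // A CAS built from the splitter alone has no "Entity" type.
        ToyCas splitterOnly = createSharedCas(splitter);
        try {
            splitterOnly.addAnnotation("Entity");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```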

Best wishes,
Katrin

-- 
Katrin Tomanek
Jena University Language and Information Engineering (JULIE) Lab
Phone: +49-3641-944307
Fax:   +49-3641-944321
email: tomanek@coling-uni-jena.de
URL:   http://www.julielab.de