Posted to user@uima.apache.org by Jeffery <yu...@gmail.com> on 2014/05/14 20:21:42 UTC

Is there a way to tell a UIMA component to extract only certain kinds of entities when running opennlp.pear?

For example, a user dynamically specifies which kinds of entities they are
interested in. If the user is only interested in person entities, we run
opennlp.pear, but it extracts all of its entity types:
person, Organization, Location, Date, Time, Money, Percentage, Parse, Chunk, Token.

This makes extraction unnecessarily slow.

The same problem happens with RegExAnnotator.pear: it can extract ISBNs,
email addresses, etc., and we may add our own regexes to extract US phone
numbers and so on. But at any one time, we may only want to extract email
addresses or phone numbers.


Re: Is there a way to tell a UIMA component to extract only certain kinds of entities when running opennlp.pear?

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Jeffery,

According to the information at
http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aae.result_specification_setting

   "The default Result Specification is taken from the Engine's output
Capability Specification."

So it should be possible to deploy the UIMA-AS service with a particular
ResultSpecification, if a static configuration is all that is needed.

Using the ResultSpecification to control annotator behavior is quite limited;
consider, for example, wanting a speed vs. accuracy knob. A more general static
solution would be based on configuration parameters, and a dynamic solution
would put control information into the CAS.
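The following plain-Java sketch shows the two alternatives Eddie describes: a static default that a per-document override can replace. It does not need UIMA on the classpath. Everything in it is hypothetical. `Settings` stands in for values a real annotator would read from its configuration parameters in `initialize()`, presumably through `UimaContext.getConfigParameterValue`. The `control` field of `Doc` stands in for a control feature structure that a client would add to the CAS before calling `sendCAS`, and you would have to define that type in your own type system. The speed/accuracy flag is there because a ResultSpecification, which lists output types and features, has no way to express it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: static configuration defaults vs. per-document control info. */
class ControlInfoSketch {

    /** Per-run settings: which entity types to produce, plus a speed/accuracy knob. */
    static final class Settings {
        final Set<String> types;
        final boolean fast;

        Settings(Set<String> types, boolean fast) {
            this.types = types;
            this.fast = fast;
        }
    }

    /** Stand-in for a CAS: document text plus optional control information. */
    static final class Doc {
        final String text;
        final Settings control; // null when the client sent no control information

        Doc(String text, Settings control) {
            this.text = text;
            this.control = control;
        }
    }

    // Plays the role of configuration parameters read once at initialize() time.
    private final Settings staticDefaults;

    ControlInfoSketch(Settings staticDefaults) {
        this.staticDefaults = staticDefaults;
    }

    /** Returns the extraction steps that would run for this document, e.g. "Person/fast". */
    List<String> plan(Doc doc) {
        // Dynamic control information carried with the document overrides the static defaults.
        Settings s = (doc.control != null) ? doc.control : staticDefaults;
        List<String> steps = new ArrayList<>();
        // Hypothetical set of types this annotator can produce, iterated in a fixed order.
        for (String type : List.of("Person", "Organization", "Location")) {
            if (s.types.contains(type)) {
                steps.add(type + (s.fast ? "/fast" : "/accurate"));
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        ControlInfoSketch annotator = new ControlInfoSketch(
                new Settings(Set.of("Person", "Organization", "Location"), false));
        // No control info: the static defaults apply.
        System.out.println(annotator.plan(new Doc("some text", null)));
        // Control info asks for persons only, favoring speed.
        System.out.println(annotator.plan(
                new Doc("some text", new Settings(Set.of("Person"), true))));
    }
}
```

A document without control information gets the static defaults, so all three types run in accurate mode. A client that wants persons only, and quickly, attaches `new Settings(Set.of("Person"), true)`, and the plan shrinks to `[Person/fast]`.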

Eddie




On Thu, Jun 5, 2014 at 5:20 PM, Jeffery <yu...@gmail.com> wrote:

> Thanks, Marshall.
>
> I tried your suggestions, and they work very well.
> Recently I have been looking into UIMA-AS, and I am wondering whether we can
> do the same thing there. But it seems UIMA-AS doesn't use the
> ResultSpecification: the sendCAS method doesn't accept one.
>
>   String casId = asAE.sendCAS(cas);
>
> Thanks again for your great help, Marshall.
> -- It seems my last thank-you post somehow got lost.
>
>

Re: Is there a way to tell a UIMA component to extract only certain kinds of entities when running opennlp.pear?

Posted by Jeffery <yu...@gmail.com>.

Thanks, Marshall.

I tried your suggestions, and they work very well.
Recently I have been looking into UIMA-AS, and I am wondering whether we can
do the same thing there. But it seems UIMA-AS doesn't use the
ResultSpecification: the sendCAS method doesn't accept one.

  String casId = asAE.sendCAS(cas);

Thanks again for your great help, Marshall.
-- It seems my last thank-you post somehow got lost.


Re: Is there a way to tell a UIMA component to extract only certain kinds of entities when running opennlp.pear?

Posted by Marshall Schor <ms...@schor.com>.
UIMA's descriptors include a section under the XML capabilities element where
the descriptor may specify inputs and outputs.  These end up informing the
ResultSpecification which is provided to the annotator.  The ResultSpecification
can be queried by the annotator code to see what the annotator ought to produce.

This is used, for example, by sample annotators in the examples project:
   TutorialDateTime
   RegExAnnotator
   PersonTitleAnnotator

to control what the annotators produce.

This behavior, on the part of annotators, is "optional" - that is, an annotator
might be written to ignore the ResultSpecification. 

So the key may be to update the annotators to take account of the
ResultSpecification.
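Here is a rough, stand-alone illustration of an annotator that honors its result specification. It is modeled loosely on the RegExAnnotator case from the original question (email addresses and US phone numbers). A plain `Set<String>` of type names stands in for the ResultSpecification. Inside a real UIMA annotator, the corresponding check would be something along the lines of `getResultSpecification().containsType(typeName)` in `process()`. The type names `example.EmailAddress` and `example.UsPhoneNumber` are hypothetical, and both regular expressions are deliberately simplistic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of an extractor that only does work for requested output types. */
class ResultSpecSketch {

    // Hypothetical output type names mapped to deliberately simple patterns.
    private static final Map<String, Pattern> EXTRACTORS = new LinkedHashMap<>();
    static {
        EXTRACTORS.put("example.EmailAddress",
                Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+"));
        EXTRACTORS.put("example.UsPhoneNumber",
                Pattern.compile("\\(?\\d{3}\\)?[-. ]?\\d{3}[-.]\\d{4}"));
    }

    /**
     * Returns "type=matchedText" entries (stand-ins for annotations added to the CAS).
     * The requestedTypes set stands in for the annotator's ResultSpecification.
     */
    static List<String> process(String text, Set<String> requestedTypes) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : EXTRACTORS.entrySet()) {
            String type = entry.getKey();
            if (!requestedTypes.contains(type)) {
                continue; // not requested: skip this extractor entirely
            }
            Matcher m = entry.getValue().matcher(text);
            while (m.find()) {
                found.add(type + "=" + m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String doc = "Mail jane@example.com or call 555-123-4567.";
        // Only email addresses are wanted for this run.
        System.out.println(process(doc, Set.of("example.EmailAddress")));
        // Both types requested.
        System.out.println(process(doc, Set.of("example.EmailAddress", "example.UsPhoneNumber")));
    }
}
```

The design point is that an unrequested extractor is skipped before any matching work is done, which is where the speed-up comes from. An annotator that ignores the result specification would still run every pattern.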

For more background, see
http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aae.result_specification_setting

which discusses the ResultSpecification further.

-Marshall