Posted to user@uima.apache.org by Charles Bearden <Ch...@uth.tmc.edu> on 2011/08/15 23:29:44 UTC

Basic questions about UIMA AS deployment

We have used UIMA as a CPE to run several fairly simple pipelines, including 
some using cTAKES components [1]. UIMA AS is billed as "the next generation 
scalability replacement for the Collection Processing Manager (CPM)", and I'm 
trying to wrap my head around it by using it for some of the tasks we did 
previously with CPEs and the CPM.

Neither the Getting Started [2] nor the UIMA AS user manual [3] covers the
practicalities of deploying asynchronous pipelines, so I'm relying on the README 
that comes with uima-as-2.3.1-bin.tar.gz. If there is a better document to work 
from, please let me know :-) UIMA is my first exposure to a Big Java Framework, 
so my knowledge & intuitions about it are not deep.

It looks to me as if there are two basic patterns:
(1) start the broker ('startBroker.sh'), and then
(2) use 'runRemoteAsyncAE.sh' to both connect the CR with the queue via the '-c' 
argument and to deploy the AS AEs via the '-d' flag; or

(1) start the broker ('startBroker.sh');
(2) deploy one or more instances of the AS AE with 'deployAsyncService.sh', and then
(3) use 'runRemoteAsyncAE.sh' to connect the CR with the queue via the '-c'
argument.

Do I have this right?

One challenge we face is that some essential third-party components are not
thread-safe, and so it looks to me as if I'll have to scale out instances of 
those components by deploying them in their own JVMs and not by means of a 
single deployment with

   <scaleout numberOfInstances="20"/>

in the deployment descriptor.

Thanks for any pointers; I have more questions to follow up with :-)

[1] <https://cabig-kc.nci.nih.gov/Vocab/KC/index.php/OHNLP_Documentation_and_Downloads>
[2] <http://uima.apache.org/doc-uimaas-what.html>
[3] <http://uima.apache.org/d/uima-as-2.3.1/uima_async_scaleout.html>

-- 
Chuck Bearden
Programmer Analyst IV
The University of Texas Health Science Center at Houston
School of Biomedical Informatics
Email: Charles.F.Bearden@uth.tmc.edu
Phone: 713.500.9672


Re: Basic questions about UIMA AS deployment

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
Either of the two approaches will work. You can also embed the client side
of UIMA AS in your own application. There is a description and example of how
to do this on page 30 of uima_async_scaleout.pdf. The basic building blocks are:

1) Broker
2) UIMA AS service (deployed from a deployment descriptor
via deployAsyncService.sh)
3) UIMA AS client

Jerry C


Re: Basic questions about UIMA AS deployment

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Chuck,

Great questions. The main thing that makes UIMA AS somewhat hard to
understand is that, although it is advertised as a scale-out
framework, it lacks life cycle management for processes. So far it
has focused on the details of interconnecting UIMA-compliant
components in multi-threaded and multi-process configurations and on
error handling.

On Mon, Aug 15, 2011 at 5:29 PM, Charles Bearden
<Ch...@uth.tmc.edu> wrote:
> We have used UIMA as a CPE to run several fairly simple pipelines, including
> some using cTAKES components [1]. UIMA AS is billed as "the next generation
> scalability replacement for the Collection Processing Manager (CPM)", and
> I'm trying to wrap my head around it by using it for some of the tasks we
> did previously with CPEs and the CPM.
>
> Neither the Getting Started [2] nor the UIMA AS user manual [3] covers the
> practicalities of deploying asynchronous pipelines, so I'm relying on the
> README that comes with uima-as-2.3.1-bin.tar.gz. If there is a better
> document to work from, please let me know :-) UIMA is my first exposure to a
> Big Java Framework, so my knowledge & intuitions about it are not deep.
>
> It looks to me as if there are two basic patterns:
> (1) start the broker ('startBroker.sh'), and then
> (2) use 'runRemoteAsyncAE.sh' to both connect the CR with the queue via the
> '-c' argument and to deploy the AS AEs via the '-d' flag; or
>
> (1) start the broker ('startBroker.sh');
> (2) deploy one or more instances of the AS AE with 'deployAsyncService.sh',
> and then
> (3) use 'runRemoteAsyncAE.sh' to connect the CR with the queue via the
> '-c' argument.
>
> Do I have this right?

The first pattern is basically a "getting started" example, and the
second is typical for larger deployments.

RunRemoteAsyncAE.java is sample application code and a useful tool for
exercising services. UIMA_Service.java, the program called by
deployAsyncService, is likewise a useful tool and sample code for
deploying services; for example, it can easily be adapted to run
inside a servlet container.
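
To make the two patterns concrete, here is a minimal sketch that writes
each one down as an ordered list of commands. It only builds and prints
the command lines; it does not run anything. The broker URL, queue name
and descriptor file names are placeholders, and the positional
broker-URL and queue arguments to runRemoteAsyncAE.sh are an assumption
that should be checked against the README for your release. Only the
script names and the '-c' and '-d' options come from this thread.

```python
import shlex

# Placeholder values for illustration only; none of them come from the thread.
BROKER_URL = "tcp://localhost:61616"   # assumed broker URL
QUEUE = "myServiceQueue"               # hypothetical service input queue
READER = "MyCollectionReader.xml"      # hypothetical CR descriptor
DEPLOYMENT = "MyDeployment.xml"        # hypothetical deployment descriptor


def getting_started_plan(broker_url, queue, reader, deployment):
    """Pattern 1: the client deploys the service (-d) and feeds it (-c)."""
    return [
        ["startBroker.sh"],
        ["runRemoteAsyncAE.sh", broker_url, queue, "-d", deployment, "-c", reader],
    ]


def separate_services_plan(broker_url, queue, reader, deployment, instances=1):
    """Pattern 2: services are deployed on their own, then a client feeds them."""
    plan = [["startBroker.sh"]]
    plan += [["deployAsyncService.sh", deployment] for _ in range(instances)]
    plan.append(["runRemoteAsyncAE.sh", broker_url, queue, "-c", reader])
    return plan


def render(plan):
    """Format a plan as numbered shell command lines."""
    return [f"({i}) {shlex.join(cmd)}" for i, cmd in enumerate(plan, start=1)]


if __name__ == "__main__":
    plan = separate_services_plan(BROKER_URL, QUEUE, READER, DEPLOYMENT, instances=2)
    for line in render(plan):
        print(line)
```

The broker and the services are long-running, so in practice each
command would be started in its own terminal or in the background
rather than one after the other.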

>
> One challenge we face is that some essential third-party components are not
> thread-safe, and so it looks to me as if I'll have to scale out instances of
> those components by deploying them in their own JVMs and not by means of a
> single deployment with
>
>  <scaleout numberOfInstances="20"/>
>
> in the deployment descriptor.

Right, components that are not thread-safe are simply scaled out as
multiple processes, all pulling from the same queue. Multi-threaded
scaling matters more for vertical scale out of analytics that share
large in-memory objects.
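
Since process life cycle is left to the user, a small launcher can help
with the multi-process case. The following is a rough sketch, not part
of UIMA AS: it starts the same command several times as separate OS
processes and stops them again on Ctrl-C. It assumes that
deployAsyncService.sh takes the deployment descriptor path as its
argument; the script and descriptor paths are supplied by the caller,
and the function names are made up for this example.

```python
import subprocess
import sys


def build_commands(deploy_script, descriptor, instances):
    """One command per JVM; every process gets the same deployment descriptor."""
    return [[deploy_script, descriptor] for _ in range(instances)]


def launch(commands):
    """Start every command as its own OS process and return the handles."""
    return [subprocess.Popen(cmd) for cmd in commands]


def stop(processes, timeout=10):
    """Ask each process to terminate, then wait; kill any that do not exit."""
    for proc in processes:
        if proc.poll() is None:
            proc.terminate()
    for proc in processes:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


# Usage (hypothetical): python launcher.py /path/to/deployAsyncService.sh MyDeployment.xml 20
if __name__ == "__main__" and len(sys.argv) == 4:
    script, descriptor, count = sys.argv[1], sys.argv[2], int(sys.argv[3])
    services = launch(build_commands(script, descriptor, count))
    try:
        for proc in services:
            proc.wait()   # block until the services exit or Ctrl-C arrives
    except KeyboardInterrupt:
        stop(services)
```

With this approach the deployment descriptor would presumably keep a
single instance per process, so that the scaling comes from the number
of processes. Depending on how the script starts Java, terminating the
script may not stop the JVM it launched, so that is worth checking on
your system.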

Eddie