Posted to user@uima.apache.org by Spico Florin <sp...@gmail.com> on 2012/03/26 22:06:12 UTC

Administration of UIMA AS pipeline challenge. Please advice. Architectural aspects

Hello!
  I'm currently working on a project that:
1. collects news from the Internet;
2. passes the news to a UIMA pipeline that should identify entities
(organizations, locations, etc.); here we are using 4 annotators;
3. determines the accuracy of a given subject category (using
prediction models), with 9 annotators, and the number will increase;
4. should support more types of annotators in the pipelines;
5. should stop the entire annotation process (all the pipelines) when
one analysis engine fails with an exception or dies;
6. has all of the remote analysis engines located on the same machine
as the parallel flow controller (one server machine is used for all
annotators).
  Based on the above, I've created 2 pipelines that work in parallel
(listening to a JMS topic) and that were built with separation of
concerns in mind. One pipeline is for entity recognition and the other
one is for prediction. Both of them use an Aggregate Analysis Engine
based on the Parallel Flow Controller. In order to benefit from
parallel annotation, both pipelines are built using remote clients
(approximately one remote client per annotator).
  Given the above scenario, I have the following concerns and questions:
1. Is my approach correct with regard to the requirements? Is there a
better way to design the pipeline (perhaps as one single pipeline)?
2. With so many remote analysis engines started (via scripts using
deployAsyncService), I find it difficult to manage them, especially
concerning the 5th requirement. Is there any support for monitoring
(viewing alive statuses) and operating (such as start and stop) these
services, besides JMX with jconsole? Perhaps you can recommend a good
application (Nagios?).
3. Regarding the 5th requirement, I've observed that using the
allowContinueOnFailure feature of the parallel flow stops the pipeline
processing when one component is down, but only when the AEs are
collocated. What about remote analysis engines? Is there any way to
detect that one AE is down and then alert all the other AE processes
(remote AEs) of the pipeline so that they are killed?
  Based on the above, I would appreciate any help, advice, or
suggestions from the UIMA community.
  Thank you for your patience and support.
Regards,
 Florin

Re: Administration of UIMA AS pipeline challenge. Please advice. Architectural aspects

Posted by Eddie Epstein <ea...@gmail.com>.
Florin,

Not clear exactly what your goals are for parallel processing. Are all
annotators supposed to run in parallel? Let's assume not, and that the
annotators are grouped, and each group runs in parallel. For example,
two groups, entity recognition and prediction, will run in parallel,
but within a group the annotators run sequentially. Consider the
following implementation:

 service 1: entity service
 service 2: prediction service
 service 3: top-level service
 driver:

The driver program first launches the three services, creates a
UIMA-AS client to the top-level service, and waits for all three to
initialize. If any of the three services dies, the driver will kill
the other two.
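
The "kill the rest when one dies" part of the driver could be sketched
roughly as below. This is only an illustration under assumptions: it
uses plain java.lang.Process handles rather than any UIMA-AS API, the
class name ServiceSupervisor and the three launch script names are
hypothetical, and it needs Java 9 or later for Process.onExit().

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Stops every supervised service process as soon as any one of them exits. */
final class ServiceSupervisor {
    private final List<Process> services;

    ServiceSupervisor(List<Process> services) {
        this.services = List.copyOf(services);
    }

    /** Completes after the first service has exited and the others have been destroyed. */
    CompletableFuture<Void> watch() {
        CompletableFuture<?>[] exits = services.stream()
                .map(Process::onExit)
                .toArray(CompletableFuture[]::new);
        // anyOf completes as soon as one process exits, for whatever reason.
        return CompletableFuture.anyOf(exits).thenRun(this::killAll);
    }

    /** Asks every service that is still running to terminate. */
    void killAll() {
        for (Process p : services) {
            if (p.isAlive()) {
                p.destroy();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical launch scripts: replace with whatever starts each service.
        List<Process> started = List.of(
                new ProcessBuilder("./start-entity-service.sh").inheritIO().start(),
                new ProcessBuilder("./start-prediction-service.sh").inheritIO().start(),
                new ProcessBuilder("./start-top-level-service.sh").inheritIO().start());
        new ServiceSupervisor(started).watch().join();
    }
}
```

Watching process exit only covers the "service is dead" case; an
exception that a service reports while it stays alive would still have
to be handled by the client side of the driver.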

The driver feeds each news item to the top-level service, which calls
its two delegates in parallel. Merging of results is done
automatically in the top-level service, which returns all the results
in one CAS to the driver.
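
To make the fan-out and merge pattern concrete, here is a minimal
sketch in plain Java. It is not UIMA code: in UIMA AS the top-level
aggregate does the parallel calls and the merging itself. The class
ParallelFanOut, the method annotate, and the use of string lists in
place of a CAS are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class ParallelFanOut {
    /** Runs both delegates on the same input concurrently and concatenates their results. */
    static List<String> annotate(String newsItem,
                                 Function<String, List<String>> entityDelegate,
                                 Function<String, List<String>> predictionDelegate) {
        CompletableFuture<List<String>> entities =
                CompletableFuture.supplyAsync(() -> entityDelegate.apply(newsItem));
        CompletableFuture<List<String>> predictions =
                CompletableFuture.supplyAsync(() -> predictionDelegate.apply(newsItem));
        // join() rethrows a failure in either delegate as a CompletionException,
        // so one failing group fails the whole request.
        return entities.thenCombine(predictions, (e, p) -> {
            List<String> merged = new ArrayList<>(e);
            merged.addAll(p);
            return merged;
        }).join();
    }
}
```

Each delegate here stands in for one group of annotators that runs
sequentially inside its own service.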

Set error handling for s1 and s2 so that if any annotator throws an
exception, the service terminates. The exception would be passed back
to s3, which would return the exception to the client driver. When any
service dies, the driver program would kill the rest.
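
On the driver side, the fail-fast loop might look roughly like this.
It is a sketch under assumptions: TopLevelClient is a hypothetical
interface (in a real driver it might wrap the UIMA-AS client's
synchronous send call, such as sendAndReceiveCAS), and killAllServices
is whatever action the driver uses to stop the three services.

```java
final class FailFastDriver {
    /** Hypothetical stand-in for a client of the top-level service. */
    interface TopLevelClient {
        void process(String newsItem) throws Exception;
    }

    /**
     * Feeds items one by one; on the first exception it stops feeding and
     * runs the kill action. Returns how many items were processed successfully.
     */
    static int feed(Iterable<String> newsItems, TopLevelClient client, Runnable killAllServices) {
        int processed = 0;
        try {
            for (String item : newsItems) {
                client.process(item);
                processed++;
            }
        } catch (Exception e) {
            // Any exception returned by the top-level service ends the whole run.
            killAllServices.run();
        }
        return processed;
    }
}
```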

Eddie

