Posted to user@uima.apache.org by Yosi Mass <YO...@il.ibm.com> on 2008/12/31 10:53:38 UTC

P2P UIMA

Hi,

I would like to suggest a scale-out of UIMA by enabling it to run in a P2P
environment.

From my understanding, the CPE is a 1st generation scaleout. It can run a
distributed pipeline using Vinci/SOAP, but the machines involved in the
pipeline are predefined in the UIMA descriptors.

The 2nd generation scaleout is called UIMA-AS (AS = Asynchronous Scaleout),
and is based on Java and web standards such as JMS (Java Message Service).
It is now officially released at Apache UIMA. It allows users to
selectively choose which parts of their pipeline to run in this mode,
which in turn allows scaling out individual parts of the pipeline as
needed. Again, there is no dynamic discovery of resources after startup.

I would like to suggest a 3rd generation scaleout using a fully
decentralized P2P network. Assuming that each peer can publish its
capabilities (namely, which annotators it can run) and its current
availability, we could extend the UIMA/UIMA-AS pipeline to discover an
available and capable peer for running an annotator, and thus achieve
better load balancing and better performance than the previous generations.
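
To make the idea concrete, here is a minimal sketch in Java of the publish
and discover steps. Everything in it is an assumption for illustration: the
names PeerDirectorySketch, PeerAd, publish and discover are hypothetical and
are not part of UIMA or UIMA-AS, the load measure (queued CASes) is made up,
and a single in-memory directory stands in for what would really be a
distributed lookup across the peers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Illustrative only: these class and method names are hypothetical, not UIMA APIs.
class PeerDirectorySketch {

    // What a peer publishes: the annotators it can run and its current load.
    static final class PeerAd {
        final String peerId;
        final Set<String> annotators;
        final int queuedCases;    // assumed load measure: CASes waiting at the peer
        final boolean available;

        PeerAd(String peerId, Set<String> annotators, int queuedCases, boolean available) {
            this.peerId = peerId;
            this.annotators = annotators;
            this.queuedCases = queuedCases;
            this.available = available;
        }
    }

    private final List<PeerAd> ads = new ArrayList<>();

    // A peer publishes a new advertisement, replacing its previous one.
    void publish(PeerAd ad) {
        ads.removeIf(old -> old.peerId.equals(ad.peerId));
        ads.add(ad);
    }

    // Find the least loaded available peer that can run the given annotator.
    Optional<PeerAd> discover(String annotator) {
        return ads.stream()
                .filter(ad -> ad.available && ad.annotators.contains(annotator))
                .min(Comparator.comparingInt((PeerAd ad) -> ad.queuedCases));
    }

    public static void main(String[] args) {
        PeerDirectorySketch dir = new PeerDirectorySketch();
        dir.publish(new PeerAd("peerA", Set.of("Tokenizer", "NamedEntity"), 5, true));
        dir.publish(new PeerAd("peerB", Set.of("NamedEntity"), 2, true));
        dir.publish(new PeerAd("peerC", Set.of("NamedEntity"), 0, false));

        // peerC is idle but unavailable, so the least loaded usable peer is chosen.
        System.out.println(dir.discover("NamedEntity").map(ad -> ad.peerId).orElse("none"));
        // No peer advertises a parser.
        System.out.println(dir.discover("Parser").map(ad -> ad.peerId).orElse("none"));
    }
}
```

In this sketch the pipeline would call discover once per annotator step. How
often peers refresh their advertisements, and how stale ones expire, is left
open.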

What do people on the list think about this?

Thanks, Yosi




Re: P2P UIMA

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Yosi,

Given the capability provided by UIMA-1127 to efficiently deal with missing
delegates during run time, UIMA AS may be close to providing the functionality
you describe below. A new issue, UIMA-1295, has been created to close one
required function. Can you review the description in 1295 and comment on
what other aspects of peer-to-peer are still missing?

Thanks and regards,
Eddie


On Wed, Dec 31, 2008 at 4:53 AM, Yosi Mass <YO...@il.ibm.com> wrote:
>
> Hi,
>
> I would like to suggest a scale-out of UIMA by enabling it to run in a P2P
> environment.
>
> From my understanding, the CPE is a 1st generation scaleout. It can run a
> distributed pipeline using Vinci/SOAP, but the machines involved in the
> pipeline are predefined in the UIMA descriptors.
>
> The 2nd generation scaleout is called UIMA-AS (AS = Asynchronous Scaleout),
> and is based on Java and web standards such as JMS (Java Message Service).
> It is now officially released at Apache UIMA. It allows users to
> selectively choose which parts of their pipeline to run in this mode,
> which in turn allows scaling out individual parts of the pipeline as
> needed. Again, there is no dynamic discovery of resources after startup.
>
> I would like to suggest a 3rd generation scaleout using a fully
> decentralized P2P network. Assuming that each peer can publish its
> capabilities (namely, which annotators it can run) and its current
> availability, we could extend the UIMA/UIMA-AS pipeline to discover an
> available and capable peer for running an annotator, and thus achieve
> better load balancing and better performance than the previous generations.
>
> What do people on the list think about this?
>
> Thanks, Yosi

Re: P2P UIMA

Posted by Marshall Schor <ms...@schor.com>.

Yosi Mass wrote:
> Hi,
>
> I would like to suggest a scale-out of UIMA by enabling it to run in a P2P
> environment.
>
> From my understanding, the CPE is a 1st generation scaleout. It can run a
> distributed pipeline using Vinci/SOAP, but the machines involved in the
> pipeline are predefined in the UIMA descriptors.
>
> The 2nd generation scaleout is called UIMA-AS (AS = Asynchronous Scaleout),
> and is based on Java and web standards such as JMS (Java Message Service).
> It is now officially released at Apache UIMA. It allows users to
> selectively choose which parts of their pipeline to run in this mode,
> which in turn allows scaling out individual parts of the pipeline as
> needed. Again, there is no dynamic discovery of resources after startup.
>   
Hmm, I think this may not be quite accurate.  In UIMA-AS, connections
are made using a JMS infrastructure, such as ActiveMQ.  Each service has
an associated "address" in the network space, made up of a Broker URL
and Port.

The actual service implementation is done by 1 or more servers that
register themselves with the Broker URL and Port.  During a run, servers
can be dynamically added or removed; this changes the "capacity" of the
service.  Of course, if all of the servers for a particular service are
removed, then the service "fails". 
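
A toy model of that behaviour, in Java, might look like the sketch below. It
is not the JMS, ActiveMQ or UIMA-AS API: the class ServiceCapacitySketch and
its methods are hypothetical, the address string is a made-up example, and
the round-robin dispatch is only a stand-in for servers pulling work from a
shared queue.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: this is not the JMS, ActiveMQ, or UIMA-AS API.
class ServiceCapacitySketch {
    private final String address;   // stands in for the service's address at the broker
    private final Set<String> servers = new LinkedHashSet<>();
    private int nextIndex = 0;

    ServiceCapacitySketch(String address) {
        this.address = address;
    }

    // Servers may join or leave while requests are flowing.
    void register(String serverId) { servers.add(serverId); }
    void unregister(String serverId) { servers.remove(serverId); }

    // Capacity is simply the number of servers currently registered.
    int capacity() { return servers.size(); }

    // Hand one request to a registered server, round-robin; fail if none is left.
    String dispatch(String casId) {
        if (servers.isEmpty()) {
            throw new IllegalStateException(
                    "service at " + address + " has no servers for " + casId);
        }
        List<String> current = new ArrayList<>(servers);
        int index = nextIndex % current.size();
        nextIndex = (index + 1) % current.size();
        return current.get(index);
    }

    public static void main(String[] args) {
        ServiceCapacitySketch service = new ServiceCapacitySketch("tcp://broker.example:61616");
        service.register("server1");
        service.register("server2");
        System.out.println(service.capacity() + " servers, cas1 goes to " + service.dispatch("cas1"));

        service.unregister("server1");   // capacity shrinks, the service still works
        System.out.println(service.capacity() + " server, cas2 goes to " + service.dispatch("cas2"));

        service.unregister("server2");   // no servers left: the service "fails"
        try {
            service.dispatch("cas3");
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```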

But maybe what is meant is, rather, the ability of the system to
recognize when a service becomes available, not merely when its
capacity changes.  In the UIMA-AS case, this could mean several kinds
of things:

1) allowing a service to be configured with 0 servers available at startup

2) having the flow controller "know" more explicitly about service
"availability", for instance, the number of "servers" there might be for
a particular service.  Here, the idea would be that a flow controller
could dynamically decide, based on the service levels of the different
steps in the pipeline, how to "route" a CAS for a particular aggregate.
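
As a rough illustration of option 2, the routing decision could be as simple
as the Java sketch below. It does not use the UIMA flow controller
interfaces. The class AvailabilityRoutingSketch, the method chooseDelegate
and the delegate names are all hypothetical, and the map of live server
counts is assumed to be supplied by some monitoring facility that is not
shown.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical routing helper; it does not implement any UIMA interface.
class AvailabilityRoutingSketch {

    // Among interchangeable delegates for one pipeline step, pick the one with
    // the most live servers. Returns empty if no candidate has any server.
    static Optional<String> chooseDelegate(List<String> candidates,
                                           Map<String, Integer> liveServers) {
        return candidates.stream()
                .filter(d -> liveServers.getOrDefault(d, 0) > 0)
                .max(Comparator.comparingInt((String d) -> liveServers.getOrDefault(d, 0)));
    }

    public static void main(String[] args) {
        // Assumed snapshot of how many servers currently back each service.
        Map<String, Integer> liveServers = Map.of(
                "FastTagger", 1,
                "DeepTagger", 4,
                "LegacyTagger", 0);

        List<String> taggers = List.of("FastTagger", "DeepTagger", "LegacyTagger");
        System.out.println(chooseDelegate(taggers, liveServers).orElse("skip step"));

        // A step whose only delegate has no servers cannot be routed.
        System.out.println(chooseDelegate(List.of("LegacyTagger"), liveServers).orElse("skip step"));
    }
}
```

Whether an unroutable step should be skipped, retried later, or treated as a
failure is a policy choice this sketch leaves open.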

Are these the kinds of functions that are desired?
> I would like to suggest a 3rd generation scaleout using a fully
> decentralized P2P network. Assuming that each peer can publish its
> capabilities (namely, which annotators it can run) and its current
> availability, we could extend the UIMA/UIMA-AS pipeline to discover an
> available and capable peer for running an annotator, and thus achieve
> better load balancing and better performance than the previous generations.
>   
The "publication" would need to include the type system of the
annotators, and some notion of which annotators would ever "want" to be
run together in a pipeline, because a key part of the UIMA design is the
"merging" of type systems to allow interoperability among the parts.
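
To show why the type system matters for publication, here is a deliberately
simplified Java sketch of a merge check. It is not the UIMA merge
implementation: a type system is reduced to a map from type name to its
features and their range types, supertypes are ignored, and the class
TypeSystemMergeSketch and the example types are hypothetical. The only rule
modelled is that two definitions of the same feature must agree on its range
type.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model, not the UIMA implementation: supertypes are ignored.
class TypeSystemMergeSketch {

    // A type system here is: type name -> (feature name -> range type name).
    static Map<String, Map<String, String>> merge(Map<String, Map<String, String>> a,
                                                  Map<String, Map<String, String>> b) {
        Map<String, Map<String, String>> merged = new HashMap<>();
        for (Map<String, Map<String, String>> typeSystem : List.of(a, b)) {
            for (Map.Entry<String, Map<String, String>> type : typeSystem.entrySet()) {
                Map<String, String> features =
                        merged.computeIfAbsent(type.getKey(), k -> new HashMap<>());
                for (Map.Entry<String, String> feature : type.getValue().entrySet()) {
                    // Features are unioned; a clash in range type makes the merge fail.
                    String previous = features.putIfAbsent(feature.getKey(), feature.getValue());
                    if (previous != null && !previous.equals(feature.getValue())) {
                        throw new IllegalArgumentException("incompatible range for "
                                + type.getKey() + ":" + feature.getKey());
                    }
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> peerA = Map.of("Person", Map.of("name", "String"));
        Map<String, Map<String, String>> peerB = Map.of("Person", Map.of("age", "Integer"));
        Map<String, Map<String, String>> peerC = Map.of("Person", Map.of("name", "Integer"));

        // Compatible: Person ends up with both features.
        System.out.println(merge(peerA, peerB).get("Person").keySet());

        // Incompatible: the two peers disagree on the range of Person:name.
        try {
            merge(peerA, peerC);
        } catch (IllegalArgumentException e) {
            System.out.println("cannot combine: " + e.getMessage());
        }
    }
}
```

A discovery step could run a check of this kind before offering a peer as a
candidate, so that only annotators with mergeable type systems are combined
in one pipeline.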

Is there a "reservation" idea here too?  For instance, in an open
environment, where there are lots of clients and services and servers
for those services, a particular client might want to reserve some
amount of processing capability for itself (not necessarily all of the
capability).

Finally, I wonder -- are there systems / infrastructure / middleware
already out there that do this kind of thing that we could perhaps
easily adapt / adopt for these purposes?

-Marshall
> What do people on the list think about this?
>
> Thanks, Yosi