Posted to user@uima.apache.org by Marshall Schor <ms...@schor.com> on 2009/01/05 21:56:15 UTC
Re: P2P UIMA


Yosi Mass wrote:
> Hi,
>
> I would like to suggest a scale-out of UIMA by enabling it to run in a P2P
> environment.
>
> From my understanding, the CPE is a 1st generation scaleout, and it can run
> a distributed pipeline using vinci/soap but the machines involved in the
> pipeline are predefined in the UIMA descriptors.
>
> The 2nd generation scaleout is called UIMA-AS (AS = Asynchronous Scaleout),
> and is based on some Java and web standards, such as JMS (Java Message
> Service).  It is now officially released on Apache UIMA.  This allows users
> to selectively choose which parts of their pipeline to run in this mode,
> which in turn allows scaling out individual parts of the pipeline, as
> needed. Again there is no dynamic discovery of resources after startup.
>   
Hmm, I think this may not be quite accurate.  In UIMA-AS, connections
are made using a JMS infrastructure, such as ActiveMQ.  Each service has
an associated "address" in the network space, made up of a Broker URL
and Port.

The actual service implementation is provided by one or more servers that
register themselves with the Broker URL and Port.  During a run, servers
can be dynamically added or removed; this changes the "capacity" of the
service.  Of course, if all of the servers for a particular service are
removed, then the service "fails". 
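To make the "capacity" point concrete, here is a minimal in-memory model of it. It is a hypothetical sketch, not the UIMA-AS API: the service address is treated as an opaque string (the name "Tagger" is made up), and capacity is simply the number of servers currently attached to that address.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a UIMA-AS-style service whose capacity is the number
// of attached servers. Not the real UIMA-AS API.
public class ServiceCapacityModel {

    // service address -> ids of the servers currently attached to it
    private final Map<String, Set<String>> serversByService = new HashMap<>();

    // A server attaches itself to the service address during a run: capacity goes up.
    public void addServer(String serviceAddress, String serverId) {
        serversByService.computeIfAbsent(serviceAddress, k -> new HashSet<>()).add(serverId);
    }

    // A server detaches during a run: capacity goes down, possibly to zero.
    public void removeServer(String serviceAddress, String serverId) {
        Set<String> servers = serversByService.get(serviceAddress);
        if (servers != null) {
            servers.remove(serverId);
        }
    }

    public int capacity(String serviceAddress) {
        return serversByService.getOrDefault(serviceAddress, Set.of()).size();
    }

    // With no servers left, nothing can process requests: the service "fails".
    public boolean isFailed(String serviceAddress) {
        return capacity(serviceAddress) == 0;
    }

    public static void main(String[] args) {
        ServiceCapacityModel model = new ServiceCapacityModel();
        model.addServer("Tagger", "server-1");
        model.addServer("Tagger", "server-2");
        System.out.println(model.capacity("Tagger")); // 2
        model.removeServer("Tagger", "server-1");
        model.removeServer("Tagger", "server-2");
        System.out.println(model.isFailed("Tagger")); // true
    }
}
```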

But perhaps what is meant is, rather, the ability of the system to
recognize when a service becomes available, not merely when its
capacity changes.  In the UIMA-AS case, for instance, this could mean
several kinds of things:

1) allowing a service to be configured with 0 servers available at startup

2) having the flow controller "know" more explicitly about service
"availability", for instance, the number of "servers" there might be for
a particular service.  Here, the idea would be that a flow controller
could dynamically decide, based on the service levels of the different
steps in the pipeline, how to "route" a CAS for a particular aggregate.
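Option 2 could look roughly like the following routing rule. This is a hypothetical sketch, not the UIMA FlowController interface: it assumes the flow controller can see a per-service count of available servers and has several interchangeable services (made-up names) for one step. It also covers option 1, since a service with 0 servers is simply skipped until servers appear.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical availability-aware routing for one pipeline step.
// Not the UIMA FlowController API.
public class AvailabilityRouter {

    // Among interchangeable candidate services, pick the one with the most
    // available servers. Services with zero servers are skipped; if none are
    // available, return empty so the caller can wait or fail the CAS.
    public static Optional<String> route(List<String> candidates,
                                         Map<String, Integer> availableServers) {
        return candidates.stream()
                .filter(s -> availableServers.getOrDefault(s, 0) > 0)
                .max(Comparator.comparingInt(s -> availableServers.getOrDefault(s, 0)));
    }

    public static void main(String[] args) {
        Map<String, Integer> available = Map.of("TaggerA", 0, "TaggerB", 3, "TaggerC", 1);
        System.out.println(route(List.of("TaggerA", "TaggerB", "TaggerC"), available)); // Optional[TaggerB]
        System.out.println(route(List.of("TaggerA"), available)); // Optional.empty
    }
}
```

Choosing the service with the most servers is only one possible policy; a real flow controller might also weigh queue depth or latency.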

Are these the kinds of functions that are desired?
> I would like to suggest a 3rd generation scaleout using a fully
> decentralized P2P network. Assume that each peer can publish its
> capabilities (namely which annotators it can run) and its current
> availability, then we may extend UIMA/UIMA-AS pipeline to discover an
> available and capable peer for running an annotator and thus achieve better
> load balancing and thus better performance than previous generations.
>   
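The proposed discovery step can be sketched as follows. Everything here is hypothetical: no such UIMA API exists. It assumes each peer advertises a set of annotator names and a single availability number between 0.0 and 1.0, and that discovery picks the most available peer that can run the requested annotator.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the proposed P2P peer discovery. Not a UIMA API.
public class PeerDiscovery {

    // What a peer publishes: the annotators it can run and its spare capacity (0.0 to 1.0).
    record PeerAd(String peerId, Set<String> annotators, double availability) {}

    // Keep only peers that are capable (can run the annotator) and available
    // (some spare capacity), then take the one with the most spare capacity.
    static Optional<PeerAd> discover(List<PeerAd> ads, String annotator) {
        return ads.stream()
                .filter(ad -> ad.annotators().contains(annotator))
                .filter(ad -> ad.availability() > 0.0)
                .max(Comparator.comparingDouble(PeerAd::availability));
    }

    public static void main(String[] args) {
        List<PeerAd> ads = List.of(
                new PeerAd("peer-1", Set.of("Tokenizer", "Tagger"), 0.2),
                new PeerAd("peer-2", Set.of("Tagger"), 0.7),
                new PeerAd("peer-3", Set.of("Parser"), 0.9));
        System.out.println(discover(ads, "Tagger").map(PeerAd::peerId).orElse("none"));      // peer-2
        System.out.println(discover(ads, "NamedEntity").map(PeerAd::peerId).orElse("none")); // none
    }
}
```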
The "publication" would need to include the type system of the
annotators, and some notion of which annotators would ever "want" to be
run together in a pipeline, because a key part of the UIMA design is the
"merging" of type systems to allow interoperability among the parts.

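To illustrate why the type systems need to be part of what is published, here is a deliberately simplified merge check. It is a hypothetical sketch and not UIMA's own type-system merging logic: a type system is modeled as type name to (feature name to range type), features of same-named types are unioned, and the merge is rejected when the same feature is declared with two different ranges. The type and feature names are made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical, simplified type-system merge for deciding whether annotators
// from different peers can share one pipeline. Not UIMA's merge algorithm.
public class TypeSystemMerge {

    // type name -> (feature name -> range type name)
    static Map<String, Map<String, String>> merge(List<Map<String, Map<String, String>>> typeSystems) {
        Map<String, Map<String, String>> merged = new HashMap<>();
        for (Map<String, Map<String, String>> ts : typeSystems) {
            for (Map.Entry<String, Map<String, String>> type : ts.entrySet()) {
                // Same-named types from different annotators are merged into one.
                Map<String, String> features = merged.computeIfAbsent(type.getKey(), k -> new HashMap<>());
                for (Map.Entry<String, String> feature : type.getValue().entrySet()) {
                    String existing = features.putIfAbsent(feature.getKey(), feature.getValue());
                    // The same feature with a different range cannot be reconciled.
                    if (existing != null && !existing.equals(feature.getValue())) {
                        throw new IllegalStateException("Incompatible range for "
                                + type.getKey() + ":" + feature.getKey());
                    }
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> tokenizer = Map.of("Token", Map.of("text", "String"));
        Map<String, Map<String, String>> tagger = Map.of("Token", Map.of("pos", "String"));
        Map<String, Map<String, String>> otherTagger = Map.of("Token", Map.of("pos", "Integer"));

        System.out.println(new TreeSet<>(merge(List.of(tokenizer, tagger)).get("Token").keySet())); // [pos, text]
        try {
            merge(List.of(tagger, otherTagger));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Incompatible range for Token:pos
        }
    }
}
```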
Is there a "reservation" idea here too?  For instance, in an open
environment, where there are lots of clients and services and servers
for those services, a particular client might want to reserve some
amount of processing capability for itself (not necessarily all of the
capability).
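A "reservation" could be as simple as a ledger over a service's total capacity. This is a hypothetical sketch, not part of UIMA or UIMA-AS: it assumes capacity can be counted in interchangeable "slots" (for example, concurrent CASes) and grants a request only if enough unreserved slots remain, so one client can hold part of the capacity without taking all of it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical capacity-reservation ledger for one shared service.
// Not part of UIMA or UIMA-AS.
public class CapacityReservations {

    private final int totalSlots;                          // assumed unit: concurrent CASes
    private final Map<String, Integer> reserved = new HashMap<>();

    CapacityReservations(int totalSlots) {
        this.totalSlots = totalSlots;
    }

    // Slots not yet held by any client.
    synchronized int unreserved() {
        int used = reserved.values().stream().mapToInt(Integer::intValue).sum();
        return totalSlots - used;
    }

    // Grant the request only if enough unreserved capacity remains.
    synchronized boolean reserve(String clientId, int slots) {
        if (slots <= 0 || slots > unreserved()) {
            return false;
        }
        reserved.merge(clientId, slots, Integer::sum);
        return true;
    }

    // Give back everything the client holds.
    synchronized void release(String clientId) {
        reserved.remove(clientId);
    }

    public static void main(String[] args) {
        CapacityReservations ledger = new CapacityReservations(10);
        System.out.println(ledger.reserve("clientA", 6)); // true
        System.out.println(ledger.reserve("clientB", 5)); // false: only 4 slots left
        System.out.println(ledger.reserve("clientB", 4)); // true
        ledger.release("clientA");
        System.out.println(ledger.unreserved());          // 6
    }
}
```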

Finally, I wonder -- are there systems / infrastructure / middleware
already out there that do this kind of thing that we could perhaps
easily adapt / adopt for these purposes?

-Marshall
> What do people on the list think about this?
>
> Thanks, Yosi