Posted to user@uima.apache.org by Petr Baudis <pa...@ucw.cz> on 2015/06/15 02:20:57 UTC

[ANN] Multi-threaded UIMA ASB

  Hi!

  I have created an extension of UIMA that replaces its default ASB
with a multi-threaded one, so that if you have a CAS multiplier in your
pipeline, multiple generated CASes may be processed in parallel in
different threads.  It has a few warts, but should be generally much
simpler to use than UIMA-AS if you do not need fancy things like cluster
deployment.
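
As a rough illustration of the general idea (several items produced by
one multiplier step being processed in parallel threads of a single
process), here is a minimal plain-Java sketch.  It does not use UIMA and
is not the code from the package above; ParallelFanOut, multiply() and
annotate() are hypothetical stand-ins for a CAS multiplier and a
downstream annotator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration only; none of these names come from UIMA or YodaQA. */
class ParallelFanOut {

    /** Stand-in for a CAS multiplier: turns one input into several work items. */
    static List<String> multiply(String input) {
        List<String> items = new ArrayList<>();
        for (String word : input.split(" ")) {
            items.add(word);
        }
        return items;
    }

    /** Stand-in for a downstream annotator that handles one item at a time. */
    static String annotate(String item) {
        return item.toUpperCase(Locale.ROOT) + ":" + item.length();
    }

    /** Processes every generated item on a worker thread; results keep input order. */
    static List<String> process(String input, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String item : multiply(input)) {
                pending.add(pool.submit(() -> annotate(item)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // blocks until that item has been processed
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(process("who wrote hamlet", 4));
    }
}
```

Each item is touched by only one thread at a time, which is the easy
case; nothing here attempts to let several workers share one item.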

  It even has some documentation now.  Find it at:

	https://github.com/brmson/yodaqa/tree/master/src/main/java/cz/brmlab/yodaqa/flow/asb

  (Right now, it just lives as part of my YodaQA software; simply copy
that directory to your project.  I can spin off the package properly
if there is enough interest in it.  It shares the YodaQA licence
statement, i.e. ASL2.)


On Wed, May 20, 2015 at 03:27:20AM +0200, Petr Baudis wrote:
>   I'm looking into ways to run a part of my pipeline multi-threaded:
..snip..
>   (i) I'm using UIMAfit heavily, and multiple CAS multipliers and
> mergers (even within the parallel branches).  So I can't use CPE.
> 
>   (ii) I need multi-threading, not separate processes.  (I have just
> a meager 24G RAM (sigh) and one Java process with all the linguistic
> models and stuff loaded takes 3GB RAM.  So I really need to load these
> resources to memory only once.)
..snip..
>   However, (before actually trying) it still seems to me to be much
> easier to rewrite a piece of the stock ASB than to use UIMA-AS with a
> complex pipeline constructed by UIMAfit...  So I think I will try that
> first (and report back).

  Whew, this was not so easy!  It took a good few days (and a few
start-overs) to do and debug, and I learnt more about UIMAj internals
than I ever cared to. ;-)  But I think I'm still happier with the result
than if I had used UIMA-AS, and it doesn't seem to deadlock or crash
anymore even on (IMHO) a fairly massive pipeline.

  (What I'm bothered by the most at this point is the fixed-size CAS
pool, though there are a few more issues; I tried to document them all
as well.)
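
For readers unfamiliar with why a fixed-size pool can be a concern in
multi-threaded code, the sketch below shows the general behaviour: once
every pooled object is handed out, further requests have to wait until
something is released.  This is a generic plain-Java illustration under
my own assumptions, not UIMA's CAS pool and not a description of the
specific issues documented in the package; FixedPool and its methods
are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical fixed-size object pool; a simplified stand-in, not UIMA's CAS pool. */
class FixedPool {
    private final BlockingQueue<StringBuilder> free;

    FixedPool(int size) {
        free = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            free.add(new StringBuilder());
        }
    }

    /** Returns a pooled object, or null if none is released within the timeout. */
    StringBuilder acquire(long timeoutMillis) {
        try {
            return free.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Resets the object and puts it back so that another caller can use it. */
    void release(StringBuilder item) {
        item.setLength(0);
        free.add(item);
    }

    public static void main(String[] args) {
        FixedPool pool = new FixedPool(2);
        StringBuilder a = pool.acquire(100);
        StringBuilder b = pool.acquire(100);
        // The pool is now empty: a third request times out instead of succeeding.
        System.out.println("third acquire succeeded: " + (pool.acquire(100) != null));
        pool.release(a);
        System.out.println("acquire after release succeeded: " + (pool.acquire(100) != null));
        pool.release(b);
    }
}
```

If every worker thread holds one pooled object while waiting for
another, a pool like this can stall completely, which is why the pool
size has to be chosen with the pipeline shape in mind.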

  P.S.: Would there be any interest in merging this to UIMA proper,
or at least cleaning up some UIMA API bits to simplify and future-proof
the external package?  I admit up-front that I probably won't have time
to do all that work myself, but I'd be happy to cooperate with someone.

-- 
				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton

Re: [ANN] Multi-threaded UIMA ASB

Posted by Petr Baudis <pa...@ucw.cz>.
On Thu, Jul 09, 2015 at 04:17:44PM -0400, Marshall Schor wrote:
> Hi, just saw this ...
> 
> I'll take a look.  This kind of thing is "on the list" for uima v3; see
> https://cwiki.apache.org/confluence/display/UIMA/Ideas+for+UIMAJ+v3

  Thanks, I was not aware of that page.

  However, it seems to concern a much harder case of annotators
working in parallel on the same CAS.  I'm solving an easy case where
each CAS is processed by just a single annotator at once.  For this,
there are thankfully no large changes in current UIMA needed,
apparently, if one accepts a few rough corners (as documented).

-- 
				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton

Re: UIMAj3 ideas

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 16.07.2015, at 23:10, Jaroslaw Cwiklik <ui...@gmail.com> wrote:

> The UIMA-AS *does* have an API to generate deployment descriptors although
> it's not documented. It's an internal API for now and most likely will be
> documented in the next release of UIMA-AS. The API is implemented by
> DeploymentDescriptorFactory.java in the uimaj-as-core project.

Cool :) *thumbs up*

-- Richard

Re: Creating UIMA-AS deployment descriptors programmatically

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
Yes, I forgot about this. It's minimal documentation that describes
primitive deployment. More complex deployments are supported but not
documented. I think more work is needed to clean up the API, and when that
is done more documentation will be necessary. This is a work in progress.

-jerry

On Fri, Aug 12, 2016 at 10:30 AM, Richard Eckart de Castilho <rec@apache.org> wrote:

> It's in the documentation. That's how I stumbled over it again and tried
> to remember why back in the day I had written my own factory.
>
> https://uima.apache.org/d/uima-as-2.8.1/uima_async_scaleout.html#ref.async.api.descriptor.generation
>
> Cheers,
>
> -- Richard
>
> > On 12.08.2016, at 16:28, Jaroslaw Cwiklik <ui...@gmail.com> wrote:
> >
> > I think this is documented in the code only for now and not in the UIMA-AS
> > documentation. This API still needs work. I was thinking of changing this
> > to use Builder pattern to configure deployment using a series of set/add
> > calls instead of passing many parameters.
> > I can enhance the code to support your suggestion. I will create a new JIRA
> > to capture this requirement.
> > Thanks
> >
> > -jerry
> >
> >
> >
> > On Thu, Aug 11, 2016 at 1:26 PM, Richard Eckart de Castilho <rec@apache.org>
> > wrote:
> >
> >> On 16.07.2015, at 23:10, Jaroslaw Cwiklik <ui...@gmail.com> wrote:
> >>>
> >>> The UIMA-AS *does* have an API to generate deployment descriptors although
> >>> it's not documented. It's an internal API for now and most likely will be
> >>> documented in the next release of UIMA-AS. The API is implemented by
> >>> DeploymentDescriptorFactory.java in the uimaj-as-core project.
> >>
> >> I see this is documented now.
> >>
> >> Would be nice if one could directly set an AnalysisEngineDescription in the
> >> ServiceContextImpl instead of having to first serialize the AED to a file.
> >>
> >> Cheers,
> >>
> >> -- Richard
> >>
>
>

Re: Creating UIMA-AS deployment descriptors programmatically

Posted by Richard Eckart de Castilho <re...@apache.org>.
It's in the documentation. That's how I stumbled over it again and tried
to remember why back in the day I had written my own factory.

https://uima.apache.org/d/uima-as-2.8.1/uima_async_scaleout.html#ref.async.api.descriptor.generation

Cheers,

-- Richard

> On 12.08.2016, at 16:28, Jaroslaw Cwiklik <ui...@gmail.com> wrote:
> 
> I think this is documented in the code only for now and not in the UIMA-AS
> documentation. This API still needs work. I was thinking of changing this
> to use Builder pattern to configure deployment using a series of set/add
> calls instead of passing many parameters.
> I can enhance the code to support your suggestion. I will create a new JIRA
> to capture this requirement.
> Thanks
> 
> -jerry
> 
> 
> 
> On Thu, Aug 11, 2016 at 1:26 PM, Richard Eckart de Castilho <re...@apache.org>
> wrote:
> 
>> On 16.07.2015, at 23:10, Jaroslaw Cwiklik <ui...@gmail.com> wrote:
>>> 
>>> The UIMA-AS *does* have an API to generate deployment descriptors although
>>> it's not documented. It's an internal API for now and most likely will be
>>> documented in the next release of UIMA-AS. The API is implemented by
>>> DeploymentDescriptorFactory.java in the uimaj-as-core project.
>> 
>> I see this is documented now.
>> 
>> Would be nice if one could directly set an AnalysisEngineDescription in the
>> ServiceContextImpl instead of having to first serialize the AED to a file.
>> 
>> Cheers,
>> 
>> -- Richard
>> 


Re: Creating UIMA-AS deployment descriptors programmatically

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
I think this is documented in the code only for now and not in the UIMA-AS
documentation. This API still needs work. I was thinking of changing this
to use Builder pattern to configure deployment using a series of set/add
calls instead of passing many parameters.
I can enhance the code to support your suggestion. I will create a new JIRA
to capture this requirement.
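
To make the builder idea concrete, a configuration object assembled
through a series of set/add calls could look roughly like the sketch
below.  All names here (DeploymentConfig, setBrokerUrl, setScaleout,
addDelegate) and the default broker URL are hypothetical and chosen
only for illustration; this is not the UIMA-AS API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical deployment settings; names are illustrative, not the UIMA-AS API. */
final class DeploymentConfig {
    final String name;
    final String brokerUrl;
    final int scaleout;
    final List<String> delegates;

    private DeploymentConfig(Builder b) {
        this.name = b.name;
        this.brokerUrl = b.brokerUrl;
        this.scaleout = b.scaleout;
        this.delegates = Collections.unmodifiableList(new ArrayList<>(b.delegates));
    }

    static Builder builder(String name) {
        return new Builder(name);
    }

    /** Collects optional settings one call at a time instead of one long parameter list. */
    static final class Builder {
        private final String name;
        private String brokerUrl = "tcp://localhost:61616"; // assumed default, for illustration
        private int scaleout = 1;
        private final List<String> delegates = new ArrayList<>();

        private Builder(String name) {
            this.name = name;
        }

        Builder setBrokerUrl(String url) {
            this.brokerUrl = url;
            return this;
        }

        Builder setScaleout(int n) {
            if (n < 1) {
                throw new IllegalArgumentException("scaleout must be >= 1");
            }
            this.scaleout = n;
            return this;
        }

        Builder addDelegate(String key) {
            this.delegates.add(key);
            return this;
        }

        DeploymentConfig build() {
            return new DeploymentConfig(this);
        }
    }

    public static void main(String[] args) {
        DeploymentConfig cfg = DeploymentConfig.builder("QuestionAnalysis")
                .setScaleout(4)
                .addDelegate("Tokenizer")
                .addDelegate("Parser")
                .build();
        System.out.println(cfg.name + " x" + cfg.scaleout + " " + cfg.delegates);
    }
}
```

Settings that are not mentioned keep their defaults, so simple
deployments stay short while complex ones add only the calls they need.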
Thanks

-jerry



On Thu, Aug 11, 2016 at 1:26 PM, Richard Eckart de Castilho <re...@apache.org>
wrote:

> On 16.07.2015, at 23:10, Jaroslaw Cwiklik <ui...@gmail.com> wrote:
> >
> > The UIMA-AS *does* have an API to generate deployment descriptors although
> > it's not documented. It's an internal API for now and most likely will be
> > documented in the next release of UIMA-AS. The API is implemented by
> > DeploymentDescriptorFactory.java in the uimaj-as-core project.
>
> I see this is documented now.
>
> Would be nice if one could directly set an AnalysisEngineDescription in the
> ServiceContextImpl instead of having to first serialize the AED to a file.
>
> Cheers,
>
> -- Richard
>

Creating UIMA-AS deployment descriptors programmatically

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 16.07.2015, at 23:10, Jaroslaw Cwiklik <ui...@gmail.com> wrote:
> 
> The UIMA-AS *does* have an API to generate deployment descriptors although
> it's not documented. It's an internal API for now and most likely will be
> documented in the next release of UIMA-AS. The API is implemented by
> DeploymentDescriptorFactory.java in the uimaj-as-core project.

I see this is documented now.

Would be nice if one could directly set an AnalysisEngineDescription in the
ServiceContextImpl instead of having to first serialize the AED to a file.

Cheers,

-- Richard

Re: UIMAj3 ideas

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
The UIMA-AS *does* have an API to generate deployment descriptors although
it's not documented. It's an internal API for now and most likely will be
documented in the next release of UIMA-AS. The API is implemented by
DeploymentDescriptorFactory.java in the uimaj-as-core project.

Jerry

On Thu, Jul 16, 2015 at 4:56 PM, Thomas Ginter <th...@utah.edu>
wrote:

> Richard,
>
> There is an API in UIMA for generating Analysis Engine Descriptors as well
> as Aggregates and Type System descriptions.  I use that API to generate the
> xml descriptor at runtime after the configuration has been completed.  I
> wrote my own logic to track the delegates of an Aggregate descriptor in
> order to propagate updates to/from delegates to allow the user to
> dynamically specify Analysis Engine parameters.  I also merged the scale
> out parameters for UIMA-AS into the Analysis Engine object for ease of
> configuration.
>
> In addition I wrote my own code to generate the deployment descriptor from
> the programmatic parameters provided.  The resulting XML is what the
> framework uses to generate the Spring Bean file you mentioned.
>
> That being said the existing API definitely has a learning curve which was
> part of the motivation for creating Leo.
>
> Thanks,
>
> Thomas Ginter
> 801-448-7676
> thomas.ginter@utah.edu
>
>
>
>
> > On Jul 16, 2015, at 1:51 PM, Richard Eckart de Castilho <re...@apache.org> wrote:
> >
> > Hi Thomas,
> >
> > On 16.07.2015, at 21:42, Thomas Ginter <th...@utah.edu> wrote:
> >
> >> Have you looked into using Leo?  It allows you to programmatically create Analysis Engines, Aggregates, the type system, and launch everything in UIMA-AS without having to manage any XML descriptors at all.  Furthermore it is available via Maven so your code can compile and run.
> >
> > Did you find an API in UIMA AS to handle the programmatic generation of descriptors, or did you implement that yourself in Leo (as I had tried to in DKPro Lab)?
> >
> > If I remember correctly, UIMA AS loaded plain XML descriptor files, transformed them to a Spring Bean file using XSLT and then used Spring to instantiate it. But I may have missed something.
> >
> > Cheers,
> >
> > -- Richard
>
>

Re: UIMAj3 ideas

Posted by Richard Eckart de Castilho <re...@apache.org>.
Thomas,

On 16.07.2015, at 22:56, Thomas Ginter <th...@utah.edu> wrote:

> There is an API in UIMA for generating Analysis Engine Descriptors as well as Aggregates and Type System descriptions.  I use that API to generate the xml descriptor at runtime after the configuration has been completed.  I wrote my own logic to track the delegates of an Aggregate descriptor in order to propagate updates to/from delegates to allow the user to dynamically specify Analysis Engine parameters.  I also merged the scale out parameters for UIMA-AS into the Analysis Engine object for ease of configuration.  

we're using the plain UIMA APIs for AED and friends in uimaFIT too - those APIs being not too user-friendly and XML being a pain was the major motivation to come up with uimaFIT. However, uimaFIT doesn't aspire to drive UIMA AS, just to make the core UIMA descriptors easier to handle.

> In addition I wrote my own code to generate the deployment descriptor from the programmatic parameters provided.  The resulting XML is what the framework uses to generate the Spring Bean file you mentioned.


So what you say confirms my findings. I never found a corresponding API for UIMA deployment descriptors in UIMA AS. It would have been great if UIMA AS had provided at least some basic API for deployment descriptors parallel to what UIMA offers for engines and aggregates.

> That being said the existing API definitely has a learning curve which was part of the motivation for creating Leo.

Same for uimaFIT ;) 

Cheers,

-- Richard

Re: UIMAj3 ideas

Posted by Thomas Ginter <th...@utah.edu>.
Richard,

There is an API in UIMA for generating Analysis Engine Descriptors as well as Aggregates and Type System descriptions.  I use that API to generate the xml descriptor at runtime after the configuration has been completed.  I wrote my own logic to track the delegates of an Aggregate descriptor in order to propagate updates to/from delegates to allow the user to dynamically specify Analysis Engine parameters.  I also merged the scale out parameters for UIMA-AS into the Analysis Engine object for ease of configuration.  

In addition I wrote my own code to generate the deployment descriptor from the programmatic parameters provided.  The resulting XML is what the framework uses to generate the Spring Bean file you mentioned.

That being said the existing API definitely has a learning curve which was part of the motivation for creating Leo.

Thanks,

Thomas Ginter
801-448-7676
thomas.ginter@utah.edu




> On Jul 16, 2015, at 1:51 PM, Richard Eckart de Castilho <re...@apache.org> wrote:
> 
> Hi Thomas,
> 
> On 16.07.2015, at 21:42, Thomas Ginter <th...@utah.edu> wrote:
> 
>> Have you looked into using Leo?  It allows you to programmatically create Analysis Engines, Aggregates, the type system, and launch everything in UIMA-AS without having to manage any XML descriptors at all.  Furthermore it is available via Maven so your code can compile and run.
> 
> Did you find an API in UIMA AS to handle the programmatic generation of descriptors, or did you implement that yourself in Leo (as I had tried to in DKPro Lab)? 
> 
> If I remember correctly, UIMA AS loaded plain XML descriptor files, transformed them to a Spring Bean file using XSLT and then used Spring to instantiate it. But I may have missed something.
> 
> Cheers,
> 
> -- Richard


Re: UIMAj3 ideas

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi Thomas,

On 16.07.2015, at 21:42, Thomas Ginter <th...@utah.edu> wrote:

> Have you looked into using Leo?  It allows you to programmatically create Analysis Engines, Aggregates, the type system, and launch everything in UIMA-AS without having to manage any XML descriptors at all.  Furthermore it is available via Maven so your code can compile and run.

Did you find an API in UIMA AS to handle the programmatic generation of descriptors, or did you implement that yourself in Leo (as I had tried to in DKPro Lab)? 

If I remember correctly, UIMA AS loaded plain XML descriptor files, transformed them to a Spring Bean file using XSLT and then used Spring to instantiate it. But I may have missed something.

Cheers,

-- Richard 

Re: UIMAj3 ideas

Posted by Petr Baudis <pa...@ucw.cz>.
  Hi!

On Thu, Jul 16, 2015 at 07:42:58PM +0000, Thomas Ginter wrote:
> Have you looked into using Leo?  It allows you to programmatically create Analysis Engines, Aggregates, the type system, and launch everything in UIMA-AS without having to manage any XML descriptors at all.  Furthermore it is available via Maven so your code can compile and run.
> 
> http://department-of-veterans-affairs.github.io/Leo/userguide.html

  I had a look, but got the impression that I'd have to rewrite most
of my pipeline generation code, and it's not small code.  Also, it's
not clear to me from Leo's docs whether and/or how it supports CAS
multipliers and mergers, there seem to be no references to that.

  This impression might have been wrong, but overall I'd just welcome
it if I could stick with stock UIMA for scaleout, at least in the form
of multi-threading without cluster scaleout (which I think many UIMA
users would welcome, while a much smaller percentage wants to deploy to
a cluster); that's what I was trying to say originally.

-- 
				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton

Re: UIMAj3 ideas

Posted by Thomas Ginter <th...@utah.edu>.
Hi Petr,

Have you looked into using Leo?  It allows you to programmatically create Analysis Engines, Aggregates, the type system, and launch everything in UIMA-AS without having to manage any XML descriptors at all.  Furthermore it is available via Maven so your code can compile and run.

http://department-of-veterans-affairs.github.io/Leo/userguide.html

The only catch to running UIMA-AS is making sure the broker is running.  A manual step that we have not yet automated.  Other than that it can scale most pipelines with the notable exception of pipelines that have really large resources.

As for ideas for UIMA 3 I would love to see a much simpler CAS system that didn’t require a pre-definition of types before execution, such as a very simple abstract base class that defines an “annotation” and is then extended in order to create/use a new type.  It seems like the basic location-based indexes could still be provided that way, as well as the option of extending to provide custom indexes.  If the CAS was implemented as a base set of very simple Java objects we would also have more serialization options, possibly even making it possible for the user to plug in a different serializer if required, such as protobuf.  Just a thought.
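
A minimal sketch of what such a design might look like is given below, under my own assumptions: an abstract base class carrying only begin/end offsets, a subclass that introduces a new type without any up-front type declaration, a simple location-ordered index, and a serializer interface that callers can replace.  The names Annotation, Token, SimpleCas and AnnotationSerializer are hypothetical and do not correspond to the actual UIMA CAS API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical container with a location-based index; this is not the UIMA CAS API. */
class SimpleCas {
    private final List<Annotation> annotations = new ArrayList<>();

    /** Adds an annotation and keeps the list ordered by begin, then by end. */
    void add(Annotation a) {
        annotations.add(a);
        annotations.sort(Comparator.comparingInt((Annotation x) -> x.begin)
                .thenComparingInt(x -> x.end));
    }

    /** Returns all annotations of the given type, in index order. */
    <T extends Annotation> List<T> select(Class<T> type) {
        List<T> out = new ArrayList<>();
        for (Annotation a : annotations) {
            if (type.isInstance(a)) {
                out.add(type.cast(a));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleCas cas = new SimpleCas();
        cas.add(new Token(4, 9, "VBD"));
        cas.add(new Token(0, 3, "WP"));
        // A trivial text serializer; another implementation could be plugged in instead.
        AnnotationSerializer asText =
                a -> a.getClass().getSimpleName() + "[" + a.begin + "," + a.end + ")";
        for (Token t : cas.select(Token.class)) {
            System.out.println(asText.serialize(t) + " " + t.pos);
        }
    }
}

/** Base class that defines what an annotation is: a span with begin and end offsets. */
abstract class Annotation {
    final int begin;
    final int end;

    Annotation(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

/** A new type is just a subclass; no type system descriptor is declared up front. */
class Token extends Annotation {
    final String pos;

    Token(int begin, int end, String pos) {
        super(begin, end);
        this.pos = pos;
    }
}

/** Pluggable serialization hook. */
interface AnnotationSerializer {
    String serialize(Annotation a);
}
```

Because the annotations are ordinary Java objects, any serializer that can handle plain objects could be dropped in behind the AnnotationSerializer interface.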

Thanks,

Thomas Ginter
801-448-7676
thomas.ginter@utah.edu




> On Jul 16, 2015, at 10:25 AM, Petr Baudis <pa...@ucw.cz> wrote:
> 
>  Hi!
> 
> On Fri, Jul 10, 2015 at 10:28:08AM -0400, Eddie Epstein wrote:
>> Good comments which will likely generate lots of responses.
>> For now please see comments on scaleout below.
>> 
>> On Thu, Jul 9, 2015 at 6:52 PM, Petr Baudis <pa...@ucw.cz> wrote:
>> 
>>>  * UIMAfit is not part of core UIMA and UIMA-AS is not part of core
>>>    UIMA.  It seems to me that UIMA-AS is doing things a bit differently
>>>    than what the original UIMA idea of doing scaleout was.  The two
>>>    things don't play well together.  I'd love a way to easily take
>>>    my plain UIMA pipeline and scale it out, ideally without any code
>>>    changes, *and* avoid the terrible XML config files.
>>> 
>>> 
>> Not clear what you are referring to as the "original UIMA idea of doing
>> scaleout", the CPE? Core UIMA is a single-threaded, embeddable framework.
>> UIMA-AS is also an embeddable framework that offers flexible vertical
>> (multi-threading) and horizontal (multi-process) options for deploying an
>> arbitrary pipeline. Admittedly scaleout with UIMA-AS is complicated and the
>> minimal support for process management makes it difficult to do scaleout
>> simply. In what ways do you think UIMA-AS is inconsistent with UIMA or UIMA
>> scaleout?
> 
>  Well, my impression after delving into some UIMA internals was that
> the original idea was to use the Analysis Structure Broker to control
> the pipeline flow and it would seem natural that when doing scale-out,
> one would simply provide a different ASB.  Its javadoc even reads
> 
>> The Analysis Structure Broker (<code>ASB</code>) is the component
>> responsible for the details of communicating with Analysis Engines
>> that may potentially be distributed across different physical
>> machines.
> 
> Of course, maybe I got it wrong.
> 
>> DUCC is a full cluster management application that will scale out a plain
>> UIMA pipeline with no code changes, assuming that the application code is
>> threadsafe. But a typical pipeline with a single collection reader creating
>> input CASes and a single CAS consumer will limit scaleout performance pretty
>> quickly. DUCC makes it easy to eliminate the input data bottleneck. DUCC
>> sample apps show one approach to eliminating the output bottleneck. Have you
>> looked at DUCC?
> 
>  I use a UIMA pipeline for question answering, where each question
> currently takes ~30s (single-threaded) to process (a lot of it spent
> waiting on databases), so I don't think I'd hit such a bottleneck.
> I did spend a few tens of minutes looking at DUCC, but I got the
> impression that it's not really trivial to set up.
> 
>  One of my goals is to minimize setup hassles for anyone who wants to
> run my software - ideally, they should be able to just compile and run.
> If I started to use DUCC, I'm not sure to what degree I could preserve
> this, but at least it's another element in the already steep learning
> curve for anyone who wants to tinker with the system.
> 
>  (Then there's this whole issue of UIMA-AS vs. UIMAfit and in-memory
> resource sharing - though from one of your previous emails, I got the
> impression that I could run multiple AEs in threads of a single java
> process; but I guess at that point I had already decided that I wanted
> to try something less complex.)
> 
> -- 
> 				Petr Baudis
> 	If you have good ideas, good data and fast computers,
> 	you can do almost anything. -- Geoffrey Hinton


Re: UIMAj3 ideas

Posted by Petr Baudis <pa...@ucw.cz>.
  Hi!

On Fri, Jul 10, 2015 at 10:28:08AM -0400, Eddie Epstein wrote:
> Good comments which will likely generate lots of responses.
> For now please see comments on scaleout below.
> 
> On Thu, Jul 9, 2015 at 6:52 PM, Petr Baudis <pa...@ucw.cz> wrote:
> 
> >   * UIMAfit is not part of core UIMA and UIMA-AS is not part of core
> >     UIMA.  It seems to me that UIMA-AS is doing things a bit differently
> >     than what the original UIMA idea of doing scaleout was.  The two
> >     things don't play well together.  I'd love a way to easily take
> >     my plain UIMA pipeline and scale it out, ideally without any code
> >     changes, *and* avoid the terrible XML config files.
> >
> >
> Not clear what you are referring to as the "original UIMA idea of doing
> scaleout", the CPE? Core UIMA is a single-threaded, embeddable framework.
> UIMA-AS is also an embeddable framework that offers flexible vertical
> (multi-threading) and horizontal (multi-process) options for deploying an
> arbitrary pipeline. Admittedly scaleout with UIMA-AS is complicated and the
> minimal support for process management makes it difficult to do scaleout
> simply. In what ways do you think UIMA-AS is inconsistent with UIMA or UIMA
> scaleout?

  Well, my impression after delving into some UIMA internals was that
the original idea was to use the Analysis Structure Broker to control
the pipeline flow and it would seem natural that when doing scale-out,
one would simply provide a different ASB.  Its javadoc even reads

> The Analysis Structure Broker (<code>ASB</code>) is the component
> responsible for the details of communicating with Analysis Engines
> that may potentially be distributed across different physical
> machines.

Of course, maybe I got it wrong.

> DUCC is a full cluster management application that will scale out a plain
> UIMA pipeline with no code changes, assuming that the application code is
> threadsafe. But a typical pipeline with a single collection reader creating
> input CASes and a single CAS consumer will limit scaleout performance pretty
> quickly. DUCC makes it easy to eliminate the input data bottleneck. DUCC
> sample apps show one approach to eliminating the output bottleneck. Have you
> looked at DUCC?

  I use a UIMA pipeline for question answering, where each question
currently takes ~30s (single-threaded) to process (a lot of it spent
waiting on databases), so I don't think I'd hit such a bottleneck.
I did spend a few tens of minutes looking at DUCC, but I got the
impression that it's not really trivial to set up.

  One of my goals is to minimize setup hassles for anyone who wants to
run my software - ideally, they should be able to just compile and run.
If I started to use DUCC, I'm not sure to what degree I could preserve
this, but at least it's another element in the already steep learning
curve for anyone who wants to tinker with the system.

  (Then there's this whole issue of UIMA-AS vs. UIMAfit and in-memory
resource sharing - though from one of your previous emails, I got the
impression that I could run multiple AEs in threads of a single java
process; but I guess at that point I had already decided that I wanted
to try something less complex.)
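
The in-memory resource sharing point can be illustrated with a small
plain-Java sketch: several threads in one process use a single instance
of an expensive read-only resource, so it is loaded only once.  This is
a generic illustration under my own assumptions and does not use
UIMA's resource manager; SharedModelDemo, Model and loadsAfter() are
hypothetical names, and the load counter exists only to make the
behaviour observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo of one in-memory resource shared by many threads. */
class SharedModelDemo {
    /** Counts how often the expensive resource was constructed. */
    static final AtomicInteger LOADS = new AtomicInteger();

    /** Stand-in for a large, read-only linguistic model. */
    static final class Model {
        Model() {
            LOADS.incrementAndGet();
        }

        int score(String text) {
            return text.length();
        }
    }

    /** Initialization-on-demand holder: the JVM creates the instance once, thread-safely. */
    private static final class Holder {
        static final Model INSTANCE = new Model();
    }

    static Model model() {
        return Holder.INSTANCE;
    }

    /** Starts the given number of threads that all use the model; returns the load count. */
    static int loadsAfter(int threads) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> model().score("some question"));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return LOADS.get();
    }

    public static void main(String[] args) {
        System.out.println("model loaded " + loadsAfter(8) + " time(s)");
    }
}
```

With separate processes each JVM would construct its own copy, which is
the memory cost that multi-threading within one process avoids.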

-- 
				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton

Re: UIMAj3 ideas

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Petr,

Good comments which will likely generate lots of responses.
For now please see comments on scaleout below.

On Thu, Jul 9, 2015 at 6:52 PM, Petr Baudis <pa...@ucw.cz> wrote:

>   * UIMAfit is not part of core UIMA and UIMA-AS is not part of core
>     UIMA.  It seems to me that UIMA-AS is doing things a bit differently
>     than what the original UIMA idea of doing scaleout was.  The two
>     things don't play well together.  I'd love a way to easily take
>     my plain UIMA pipeline and scale it out, ideally without any code
>     changes, *and* avoid the terrible XML config files.
>
>
Not clear what you are referring to as the "original UIMA idea of doing
scaleout", the CPE? Core UIMA is a single-threaded, embeddable framework.
UIMA-AS is also an embeddable framework that offers flexible vertical
(multi-threading) and horizontal (multi-process) options for deploying an
arbitrary pipeline. Admittedly scaleout with UIMA-AS is complicated and the
minimal support for process management makes it difficult to do scaleout
simply. In what ways do you think UIMA-AS is inconsistent with UIMA or UIMA
scaleout?

DUCC is a full cluster management application that will scale out a plain
UIMA pipeline with no code changes, assuming that the application code is
threadsafe. But a typical pipeline with a single collection reader creating
input CASes and a single CAS consumer will limit scaleout performance pretty
quickly. DUCC makes it easy to eliminate the input data bottleneck. DUCC
sample apps show one approach to eliminating the output bottleneck. Have you
looked at DUCC?
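
One common way to reason about this kind of bottleneck is Amdahl's law:
if a fraction of the total work stays serial (for example a single
reader and a single consumer), the achievable speedup is capped no
matter how many workers are added.  The sketch below just evaluates
that formula; the 5% serial share is an assumed number chosen for
illustration, not a measurement of any UIMA or DUCC pipeline.

```java
/** Evaluates Amdahl's law for an assumed serial fraction; illustrative only. */
class ScaleoutEstimate {

    /**
     * Speedup with the given number of workers when the fraction `serial`
     * of the work cannot be parallelized: 1 / (serial + (1 - serial) / workers).
     */
    static double speedup(double serial, int workers) {
        return 1.0 / (serial + (1.0 - serial) / workers);
    }

    public static void main(String[] args) {
        double serial = 0.05; // assumed share of time spent in the single reader and consumer
        for (int n : new int[] {1, 4, 16, 64}) {
            System.out.printf("%d workers -> %.2fx%n", n, speedup(serial, n));
        }
    }
}
```

With a 5% serial share the speedup can never exceed 1 / 0.05 = 20x,
which is why removing the input and output bottlenecks matters once
the parallel part has been scaled out.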

Regards,
Eddie

Re: [ANN] Multi-threaded UIMA ASB

Posted by Marshall Schor <ms...@schor.com>.
Hi, just saw this ...

I'll take a look.  This kind of thing is "on the list" for uima v3; see
https://cwiki.apache.org/confluence/display/UIMA/Ideas+for+UIMAJ+v3

-Marshall

On 6/14/2015 8:20 PM, Petr Baudis wrote:
>   Hi!
>
>   I have created an extension of UIMA that replaces its default ASB
> with a multi-threaded one, so that if you have a CAS multiplier in your
> pipeline, multiple generated CASes may be processed in parallel in
> different threads.  It has a few warts, but should be generally much
> simpler to use than UIMA-AS if you do not need fancy things like cluster
> deployment.
>
>   It even has some documentation now.  Find it at:
>
> 	https://github.com/brmson/yodaqa/tree/master/src/main/java/cz/brmlab/yodaqa/flow/asb
>
>   (Right now, it just lives as part of my YodaQA software; simply copy
> that directory to your project.  I can spin off the package properly
> if there is enough interest in it.  It shares the YodaQA licence
> statement, i.e. ASL2.)
>
>
> On Wed, May 20, 2015 at 03:27:20AM +0200, Petr Baudis wrote:
>>   I'm looking into ways to run a part of my pipeline multi-threaded:
> ..snip..
>>   (i) I'm using UIMAfit heavily, and multiple CAS multipliers and
>> mergers (even within the parallel branches).  So I can't use CPE.
>>
>>   (ii) I need multi-threading, not separate processes.  (I have just
>> a meager 24G RAM (sigh) and one Java process with all the linguistic
>> models and stuff loaded takes 3GB RAM.  So I really need to load these
>> resources to memory only once.)
> ..snip..
>>   However, (before actually trying) it still seems to me to be much
>> easier to rewrite a piece of the stock ASB than to use UIMA-AS with a
>> complex pipeline constructed by UIMAfit...  So I think I will try that
>> first (and report back).
>   Whew, this was not so easy!  It took a good few days (and a few
> start-overs) to do and debug, and I learnt more about UIMAj internals
> than I ever cared to. ;-)  But I think I'm still happier with the result
> than if I used UIMA-AS and it doesn't seem to deadlock or crash anymore
> even on (IMHO) a fairly massive pipeline.
>
>   (What I'm bothered by the most at this point is the fixed-size CAS
> pool, though there are a few more issues; I tried to document them all
> as well.)
>
>   P.S.: Would there be any interest in merging this to UIMA proper,
> or at least cleaning up some UIMA API bits to simplify and future-proof
> the external package?  I admit up-front that I probably won't have time
> to do all that work myself, but I'd be happy to cooperate with someone.
>