Posted to user@uima.apache.org by Timo Boehme <ti...@ontochem.com> on 2012/10/10 12:00:52 UTC

Parallel CAS consumer

Hi,

is there any way, without using UIMA-AS, to run the different CAS
consumer components of a pipeline in parallel?
The standard behavior is that the consumers are called in sequence, but
since in my case they don't depend on each other, it would be more
efficient to run them in parallel. Can I use a CAS multiplier plus flow
control to achieve this?


Thanks,
Timo

-- 

  Timo Boehme
  OntoChem GmbH
  H.-Damerow-Str. 4
  06120 Halle/Saale
  T: +49 345 4780474
  F: +49 345 4780471
  timo.boehme@ontochem.com

_____________________________________________________________________

  OntoChem GmbH
  Managing Director: Dr. Lutz Weber
  Registered office: Halle / Saale
  Court of registration: Stendal
  Registration number: HRB 215461
_____________________________________________________________________

Re: Parallel CAS consumer

Posted by Eddie Epstein <ea...@gmail.com>.

On Fri, Oct 12, 2012 at 9:34 AM, Timo Boehme <ti...@ontochem.com> wrote:
>
> Thank you for the answer. I will give it a try.
> I wasn't sure whether using a CAS multiplier will run the following annotators
> for the different CASes in different threads, or whether it is the flow controller
> that has to do the threading. I will dig a bit deeper into the API to
> find out how this works.
> Thanks again.
>
> Timo
>

UIMA-AS will put every asynchronous component in a separate thread.
Using the ComponentDescriptorEditor on a UIMA-AS deployment
descriptor, marking an aggregate with "Run as AS aggregate" will make
every delegate in *that* aggregate an asynchronous component.

The flow controller doesn't "run" anything; it just returns the name
of the next delegate that should process its CAS. Every CAS has its own
flow controller object. It is always best to debug an aggregate in
single-threaded mode before deploying it with UIMA-AS.

Eddie
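
A plain-Java sketch of the point above, not the UIMA FlowController API: the hypothetical Flow class only names the next delegate, one Flow object exists per work item, and a separate loop (standing in for the framework) is what would actually run the delegates. All class and delegate names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy model (not UIMA API): the flow names the next delegate, the runtime runs it. */
final class FlowSketch {

    /** One instance per work item, mirroring "every CAS has its own flow controller object". */
    static final class Flow {
        private final Iterator<String> remaining;

        Flow(List<String> delegateNames) {
            this.remaining = delegateNames.iterator();
        }

        /** Returns the name of the next delegate, or null when the flow is finished. */
        String next() {
            return remaining.hasNext() ? remaining.next() : null;
        }
    }

    /** Stand-in for the framework: asks the flow where to go and records the visits. */
    static List<String> route(List<String> delegateNames) {
        Flow flow = new Flow(delegateNames); // a fresh flow object for this work item
        List<String> visited = new ArrayList<>();
        for (String name = flow.next(); name != null; name = flow.next()) {
            visited.add(name); // a real runtime would invoke the named delegate here
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(route(List.of("tokenizer", "dbConsumer", "fileConsumer")));
    }
}
```

Because the flow object holds no threads of its own, any parallelism has to come from whatever executes the delegates, which is the role UIMA-AS plays in the description above.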

Re: Parallel CAS consumer

Posted by Timo Boehme <ti...@ontochem.com>.

On 12.10.2012 15:26, Eddie Epstein wrote:
> On Wed, Oct 10, 2012 at 10:30 AM, Timo Boehme <ti...@ontochem.com> wrote:
>> This is exactly the solution I was thinking about. However I haven't used
>> UIMA-AS so far since we do not split processing over multiple machines but
>> use a multi-core server. Thus I don't want to set up a resource consuming
>> service infrastructure with slow socket communication but would like to run
>> it in the same Java VM.
>> Now it's not clear to me if this is nevertheless possible with UIMA-AS ("...
>> in the same process")?
>
> The second alternative described earlier would run the two annotators
> in parallel in the same process. Again, the idea is to use a Cas
> multiplier to make a copy of the CAS and then use a flow controller to
> send the two CASes in parallel to the two annotators.

Thank you for the answer. I will give it a try.
I wasn't sure whether using a CAS multiplier will run the following
annotators for the different CASes in different threads, or whether it is
the flow controller that has to do the threading. I will dig a bit
deeper into the API to find out how this works.
Thanks again.

Timo

Re: Parallel CAS consumer

Posted by Eddie Epstein <ea...@gmail.com>.

On Wed, Oct 10, 2012 at 10:30 AM, Timo Boehme <ti...@ontochem.com> wrote:
> This is exactly the solution I was thinking about. However I haven't used
> UIMA-AS so far since we do not split processing over multiple machines but
> use a multi-core server. Thus I don't want to set up a resource consuming
> service infrastructure with slow socket communication but would like to run
> it in the same Java VM.
> Now it's not clear to me if this is nevertheless possible with UIMA-AS ("...
> in the same process")?

The second alternative described earlier would run the two annotators
in parallel in the same process. Again, the idea is to use a Cas
multiplier to make a copy of the CAS and then use a flow controller to
send the two CASes in parallel to the two annotators.

Eddie

Re: Parallel CAS consumer

Posted by Timo Boehme <ti...@ontochem.com>.

Hi,

thank you very much for all the feedback.

On 10.10.2012 15:24, Eddie Epstein wrote:
> ...
> Another approach with all in the same process, also using UIMA-AS,
> would be to use a CAS multiplier to replicate a CAS and have the
> flow controller send a copy to each of the two delegates, in parallel.
> The results could then be merged back into one CAS with another
> CAS multiplier. Both CMs and the two delegates could be implemented
> in an aggregate, so that the child CASes would only exist in the
> aggregate and all results would be returned in the original input CAS.

This is exactly the solution I was thinking about. However I haven't 
used UIMA-AS so far since we do not split processing over multiple 
machines but use a multi-core server. Thus I don't want to set up a 
resource-consuming service infrastructure with slow socket communication
but would like to run it in the same Java VM.
Now it's not clear to me if this is nevertheless possible with UIMA-AS 
("... in the same process")?


Thanks,
Timo


> On Wed, Oct 10, 2012 at 8:56 AM, Jens Grivolla <j+...@grivolla.net> wrote:
>> Hi all,
>>
>> from what I understand this does not involve CAS multipliers at all, but
>> simply a flow where all CAS consumers are done in one "parallel step".
>>
>> Apparently this can't be done in a CPE so you would need an aggregate of all
>> the CAS consumers, and have a parallel flow controller for that aggregate.
>>
>> However, that wouldn't really do any good according to the documentation:
>> "ParallelStep, which specifies that multiple Analysis Engines should receive
>> the CAS next, and that the relative order in which these Analysis Engines
>> execute does not matter. Logically, they can run in parallel. The runtime is
>> not obligated to actually execute them in parallel, however, and the current
>> implementation will execute them serially in an arbitrary order."
>>
>> Best,
>> Jens
>>
>>
>> On 10/10/2012 12:39 PM, Richard Eckart de Castilho wrote:
>>>
>>> Hi,
>>>
>>> I see. I think this is not possible. To my knowledge CPE (which you
>>> probably use) does not support CAS multipliers. I'm not too familiar with
>>> UIMA-AS, are you sure that it supports such a scenario?
>>>
>>> If you manage to realize the scenario as you described, it would be
>>> great to hear how you did it.
>>>
>>> Best,
>>>
>>> -- Richard
>>>
>>> On 10.10.2012 at 12:15, Timo Boehme <ti...@ontochem.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 10.10.2012 12:05, Richard Eckart de Castilho wrote:
>>>>>
>>>>> the main difference between CAS consumers and analysis engines is
>>>>> that the former by default run only a single instance and the latter
>>>>> can be multiplied. If your consumer code can be run in parallel, just
>>>>> try inheriting from AnalysisEngine_ImplBase (or something like that)
>>>>> instead.
>>>>
>>>>
>>>> Thanks for your answer. However, each consumer must run as a single
>>>> instance (e.g. one database consumer, one consumer writing to a file; each
>>>> of them needs to run as a single instance). Thus I would like to have a single
>>>> instance per consumer, but have the different consumers run in parallel.
>>>>
>>>>
>>>> Kind regards,
>>>> Timo
>>>>
>>>>> On 10.10.2012 at 12:00, Timo Boehme <ti...@ontochem.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> is there any way, without using UIMA-AS, to run the different CAS
>>>>>> consumer components of a pipeline in parallel?
>>>>>> The standard behavior is that the consumers are called in sequence, but
>>>>>> since in my case they don't depend on each other, it would be more efficient
>>>>>> to run them in parallel. Can I use a CAS multiplier plus flow control to
>>>>>> achieve this?
>>>
>>>
>>>
>>
>>


Re: Parallel CAS consumer

Posted by Burn Lewis <bu...@gmail.com>.

CAS access is not thread-safe in UIMA, hence the need to copy the CAS so
that each consumer operates on its own instance of the same data. Since
the consumers are presumably at the end of the pipeline and probably add
nothing to the CASes, there would be no need to merge the child copies with
a second CM.

~Burn
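
To illustrate the copy-per-consumer idea without depending on UIMA, here is a small sketch in which a list of strings stands in for the CAS. The method fanOut and the two toy consumers are hypothetical names; the only point is that each consumer works on its own copy, so the original is never touched concurrently and nothing has to be merged afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Toy model (not UIMA API): one private copy of the data per consumer, no merge step. */
final class CopyPerConsumerSketch {

    /** Runs every consumer concurrently on its own copy and returns the processed copies. */
    static List<List<String>> fanOut(List<String> original, List<Consumer<List<String>>> consumers) {
        List<List<String>> copies = new ArrayList<>();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (Consumer<List<String>> consumer : consumers) {
            List<String> copy = new ArrayList<>(original); // no shared mutable state
            copies.add(copy);
            pending.add(CompletableFuture.runAsync(() -> consumer.accept(copy)));
        }
        pending.forEach(CompletableFuture::join); // wait until every consumer is done
        return copies;
    }

    public static void main(String[] args) {
        List<String> original = List.of("b", "c", "a");
        Consumer<List<String>> sorter = Collections::sort;      // mutates only its copy
        Consumer<List<String>> reverser = Collections::reverse; // mutates only its copy
        System.out.println(fanOut(original, List.of(sorter, reverser)));
        System.out.println(original); // still in its original order
    }
}
```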

Re: Parallel CAS consumer

Posted by Eddie Epstein <ea...@gmail.com>.

The documentation for parallel step quoted here is for core UIMA,
which is single threaded. UIMA-AS implements parallel step
processing for delegates that are running as UIMA-AS services;
it implements CAS merging in the CAS-deserialization code.

Another approach with all in the same process, also using UIMA-AS,
would be to use a CAS multiplier to replicate a CAS and have the
flow controller send a copy to each of the two delegates, in parallel.
The results could then be merged back into one CAS with another
CAS multiplier. Both CMs and the two delegates could be implemented
in an aggregate, so that the child CASes would only exist in the
aggregate and all results would be returned in the original input CAS.

Eddie
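
The replicate, process in parallel, then merge pattern described above can be modelled in plain Java roughly as follows. This is only a sketch under assumptions: Doc is a toy stand-in for a CAS, process plays the role of the aggregate, and none of it is UIMA or UIMA-AS API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Toy model (not UIMA API): replicate, run delegates in parallel, merge into the parent. */
final class FanOutMergeSketch {

    /** Hypothetical stand-in for a CAS: a text plus a list of annotations. */
    static final class Doc {
        final String text;
        final List<String> annotations = new ArrayList<>();

        Doc(String text) {
            this.text = text;
        }

        Doc copy() {
            Doc c = new Doc(text);
            c.annotations.addAll(annotations);
            return c;
        }
    }

    /** Plays the role of the aggregate: child copies never leave this method. */
    static Doc process(Doc parent, List<Consumer<Doc>> delegates) {
        int before = parent.annotations.size();
        List<Doc> children = new ArrayList<>();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (Consumer<Doc> delegate : delegates) {
            Doc child = parent.copy(); // replication step (first multiplier)
            children.add(child);
            pending.add(CompletableFuture.runAsync(() -> delegate.accept(child)));
        }
        pending.forEach(CompletableFuture::join); // wait for all parallel branches
        for (Doc child : children) { // merge step (second multiplier)
            parent.annotations.addAll(
                    child.annotations.subList(before, child.annotations.size()));
        }
        return parent;
    }

    public static void main(String[] args) {
        Doc parent = new Doc("some text");
        Consumer<Doc> first = d -> d.annotations.add("fromFirst");
        Consumer<Doc> second = d -> d.annotations.add("fromSecond");
        System.out.println(process(parent, List.of(first, second)).annotations);
    }
}
```

Merging in delegate order keeps the result deterministic even though the delegates themselves finish in an arbitrary order.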

On Wed, Oct 10, 2012 at 8:56 AM, Jens Grivolla <j+...@grivolla.net> wrote:
> Hi all,
>
> from what I understand this does not involve CAS multipliers at all, but
> simply a flow where all CAS consumers are done in one "parallel step".
>
> Apparently this can't be done in a CPE so you would need an aggregate of all
> the CAS consumers, and have a parallel flow controller for that aggregate.
>
> However, that wouldn't really do any good according to the documentation:
> "ParallelStep, which specifies that multiple Analysis Engines should receive
> the CAS next, and that the relative order in which these Analysis Engines
> execute does not matter. Logically, they can run in parallel. The runtime is
> not obligated to actually execute them in parallel, however, and the current
> implementation will execute them serially in an arbitrary order."
>
> Best,
> Jens
>
>
> On 10/10/2012 12:39 PM, Richard Eckart de Castilho wrote:
>>
>> Hi,
>>
>> I see. I think this is not possible. To my knowledge CPE (which you
>> probably use) does not support CAS multipliers. I'm not too familiar with
>> UIMA-AS, are you sure that it supports such a scenario?
>>
>> If you manage to realize the scenario as you described, it would be
>> great to hear how you did it.
>>
>> Best,
>>
>> -- Richard
>>
>> On 10.10.2012 at 12:15, Timo Boehme <ti...@ontochem.com> wrote:
>>
>>> Hi,
>>>
>>> On 10.10.2012 12:05, Richard Eckart de Castilho wrote:
>>>>
>>>> the main difference between CAS consumers and analysis engines is
>>>> that the former by default run only a single instance and the latter
>>>> can be multiplied. If your consumer code can be run in parallel, just
>>>> try inheriting from AnalysisEngine_ImplBase (or something like that)
>>>> instead.
>>>
>>>
>>> Thanks for your answer. However, each consumer must run as a single
>>> instance (e.g. one database consumer, one consumer writing to a file; each
>>> of them needs to run as a single instance). Thus I would like to have a single
>>> instance per consumer, but have the different consumers run in parallel.
>>>
>>>
>>> Kind regards,
>>> Timo
>>>
>>>> On 10.10.2012 at 12:00, Timo Boehme <ti...@ontochem.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> is there any way, without using UIMA-AS, to run the different CAS
>>>>> consumer components of a pipeline in parallel?
>>>>> The standard behavior is that the consumers are called in sequence, but
>>>>> since in my case they don't depend on each other, it would be more efficient
>>>>> to run them in parallel. Can I use a CAS multiplier plus flow control to
>>>>> achieve this?
>>
>>
>>
>
>

Re: Parallel CAS consumer

Posted by Jens Grivolla <j+...@grivolla.net>.

Hi all,

from what I understand this does not involve CAS multipliers at all, but 
simply a flow where all CAS consumers are done in one "parallel step".

Apparently this can't be done in a CPE so you would need an aggregate of 
all the CAS consumers, and have a parallel flow controller for that 
aggregate.

However, that wouldn't really do any good according to the 
documentation: "ParallelStep, which specifies that multiple Analysis 
Engines should receive the CAS next, and that the relative order in 
which these Analysis Engines execute does not matter. Logically, they 
can run in parallel. The runtime is not obligated to actually execute 
them in parallel, however, and the current implementation will execute 
them serially in an arbitrary order."

Best,
Jens

On 10/10/2012 12:39 PM, Richard Eckart de Castilho wrote:
> Hi,
>
> I see. I think this is not possible. To my knowledge CPE (which you probably use) does not support CAS multipliers. I'm not too familiar with UIMA-AS, are you sure that it supports such a scenario?
>
> If you manage to realize the scenario as you described, it would be great to hear how you did it.
>
> Best,
>
> -- Richard
>
> On 10.10.2012 at 12:15, Timo Boehme <ti...@ontochem.com> wrote:
>
>> Hi,
>>
>> On 10.10.2012 12:05, Richard Eckart de Castilho wrote:
>>> the main difference between CAS consumers and analysis engines is
>>> that the former by default run only a single instance and the latter
>>> can be multiplied. If your consumer code can be run in parallel, just
>>> try inheriting from AnalysisEngine_ImplBase (or something like that)
>>> instead.
>>
>> Thanks for your answer. However, each consumer must run as a single instance (e.g. one database consumer, one consumer writing to a file; each of them needs to run as a single instance). Thus I would like to have a single instance per consumer, but have the different consumers run in parallel.
>>
>>
>> Kind regards,
>> Timo
>>
>>> On 10.10.2012 at 12:00, Timo Boehme <ti...@ontochem.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> is there any way, without using UIMA-AS, to run the different CAS consumer components of a pipeline in parallel?
>>>> The standard behavior is that the consumers are called in sequence, but since in my case they don't depend on each other, it would be more efficient to run them in parallel. Can I use a CAS multiplier plus flow control to achieve this?
>
>

Re: Parallel CAS consumer

Posted by Richard Eckart de Castilho <ec...@ukp.informatik.tu-darmstadt.de>.

Hi,

I see. I think this is not possible. To my knowledge CPE (which you probably use) does not support CAS multipliers. I'm not too familiar with UIMA-AS, are you sure that it supports such a scenario?

If you manage to realize the scenario as you described, it would be great to hear how you did it.

Best,

-- Richard

On 10.10.2012 at 12:15, Timo Boehme <ti...@ontochem.com> wrote:

> Hi,
>
> On 10.10.2012 12:05, Richard Eckart de Castilho wrote:
>> the main difference between CAS consumers and analysis engines is
>> that the former by default run only a single instance and the latter
>> can be multiplied. If your consumer code can be run in parallel, just
>> try inheriting from AnalysisEngine_ImplBase (or something like that)
>> instead.
>
> Thanks for your answer. However, each consumer must run as a single instance (e.g. one database consumer, one consumer writing to a file; each of them needs to run as a single instance). Thus I would like to have a single instance per consumer, but have the different consumers run in parallel.
> 
> 
> Kind regards,
> Timo
> 
>> On 10.10.2012 at 12:00, Timo Boehme <ti...@ontochem.com> wrote:
>>
>>> Hi,
>>>
>>> is there any way, without using UIMA-AS, to run the different CAS consumer components of a pipeline in parallel?
>>> The standard behavior is that the consumers are called in sequence, but since in my case they don't depend on each other, it would be more efficient to run them in parallel. Can I use a CAS multiplier plus flow control to achieve this?


-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
eckart@ukp.informatik.tu-darmstadt.de 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
-------------------------------------------------------------------

Re: Parallel CAS consumer

Posted by Timo Boehme <ti...@ontochem.com>.

Hi,

On 10.10.2012 12:05, Richard Eckart de Castilho wrote:
> the main difference between CAS consumers and analysis engines is
> that the former by default run only a single instance and the latter
> can be multiplied. If your consumer code can be run in parallel, just
> try inheriting from AnalysisEngine_ImplBase (or something like that)
> instead.

Thanks for your answer. However, each consumer must run as a single
instance (e.g. one database consumer, one consumer writing to a file;
each of them needs to run as a single instance). Thus I would like to have
a single instance per consumer, but have the different consumers run in
parallel.


Kind regards,
Timo
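
One way to picture this requirement outside of UIMA is to give each consumer its own single-threaded executor: each consumer then sees the items one at a time and in submission order, while different consumers proceed in parallel. The sketch below assumes immutable items (plain strings, so no copying is needed) and uses hypothetical names throughout; it is not how core UIMA or UIMA-AS schedules components.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Toy model (not UIMA API): one dedicated thread per consumer, consumers run side by side. */
final class SingleInstanceConsumersSketch {

    /** Feeds every item to every consumer; a consumer is never called concurrently with itself. */
    static void runAll(List<String> items, List<Consumer<String>> consumers) {
        List<ExecutorService> lanes = new ArrayList<>();
        for (Consumer<String> consumer : consumers) {
            ExecutorService lane = Executors.newSingleThreadExecutor(); // one thread per consumer
            lanes.add(lane);
            for (String item : items) {
                lane.execute(() -> consumer.accept(item)); // queued in order on that lane
            }
            lane.shutdown(); // accept no new work, but finish what is queued
        }
        for (ExecutorService lane : lanes) {
            try {
                if (!lane.awaitTermination(1, TimeUnit.MINUTES)) {
                    throw new IllegalStateException("consumer did not finish in time");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        // Two stand-ins for "one database consumer, one consumer writing to a file".
        List<String> dbRows = Collections.synchronizedList(new ArrayList<>());
        List<String> fileLines = Collections.synchronizedList(new ArrayList<>());
        Consumer<String> dbConsumer = dbRows::add;
        Consumer<String> fileConsumer = s -> fileLines.add(s + "\n");
        runAll(List.of("doc1", "doc2", "doc3"), List.of(dbConsumer, fileConsumer));
        System.out.println(dbRows);
        System.out.println(fileLines.size());
    }
}
```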

> On 10.10.2012 at 12:00, Timo Boehme <ti...@ontochem.com> wrote:
>
>> Hi,
>>
>> is there any way, without using UIMA-AS, to run the different CAS consumer components of a pipeline in parallel?
>> The standard behavior is that the consumers are called in sequence, but since in my case they don't depend on each other, it would be more efficient to run them in parallel. Can I use a CAS multiplier plus flow control to achieve this?
>>
>>
>> Thanks,
>> Timo
>>
>> --
>>
>> Timo Boehme
>> OntoChem GmbH
>> H.-Damerow-Str. 4
>> 06120 Halle/Saale
>> T: +49 345 4780474
>> F: +49 345 4780471
>> timo.boehme@ontochem.com
>
>


Re: Parallel CAS consumer

Posted by Richard Eckart de Castilho <ec...@ukp.informatik.tu-darmstadt.de>.

Hi,

the main difference between CAS consumers and analysis engines is that the former by default run only a single instance and the latter can be multiplied. If your consumer code can be run in parallel, just try inheriting from AnalysisEngine_ImplBase (or something like that) instead.

Cheers,

-- Richard

On 10.10.2012 at 12:00, Timo Boehme <ti...@ontochem.com> wrote:

> Hi,
>
> is there any way, without using UIMA-AS, to run the different CAS consumer components of a pipeline in parallel?
> The standard behavior is that the consumers are called in sequence, but since in my case they don't depend on each other, it would be more efficient to run them in parallel. Can I use a CAS multiplier plus flow control to achieve this?
> 
> 
> Thanks,
> Timo
> 
> -- 
> 
> Timo Boehme
> OntoChem GmbH
> H.-Damerow-Str. 4
> 06120 Halle/Saale
> T: +49 345 4780474
> F: +49 345 4780471
> timo.boehme@ontochem.com

