Posted to user@uima.apache.org by Arun Tewatia <ar...@orkash.com> on 2011/09/23 06:05:21 UTC

remoteAnalysisEngine services not scaling to effect

Hi all, 
I have been using UIMA-AS for some time now, and I have been trying to divide my UIMA pipeline and scale it out in a cluster environment. 

Here's my setup : 

UIMA-AS 2.3.1 
Machine 1 : Broker, 1 instance of aggregate X on Queue-X (containing entries for 2 remoteAnalysisEngines and 1 remote CasConsumer, all on different queues - a, b, c respectively). 
Machine 2 : analysisEngine A ( on Queue-a 5 instances of service ) 
Machine 3 : analysisEngine B ( on Queue-b 5 instances of service ) 
Machine 4 : CasConsumer C ( on Queue-c 5 instances of service ) 

The CasConsumer puts the output into a database. analysisEngines A & B are primitives. 

Problem : The output in the database is not synchronized, and the processing time did not drop when I increased the number of instances on any of the queues a, b, c, or x. 

When I run 1 instance of the CasConsumer the output produced in the database is correct, but the processing time still shows no improvement when I increase the number of deployed analysisEngine services. 
I have also tried the remoteReplyQueueScaleout parameter, but with no effect. 

Am I going wrong somewhere in my understanding of UIMA-AS? Please help me with this. 

Arun Tewatia 

Re: Scaling using Hadoop

Posted by Julien Nioche <li...@gmail.com>.
I would just like to mention that Behemoth (
https://github.com/jnioche/behemoth) can be used to run UIMA on Hadoop.
There is also a new project, S4, in incubation at Apache, which could be used
to do on-the-fly processing, but I haven't looked at the details yet.

Julien

On 6 October 2011 07:01, Thilo Götz <tw...@gmx.de> wrote:

> Forgot to mention performance.  Latency is pretty bad, but
> once it gets going, it's pretty fast in my experience.  We
> get near-linear scale out on multiple nodes.  I have less
> experience with using larger, multi-core machines.
>
> So use hadoop when you have thousands of documents you need
> to process in batch mode, and you can easily replicate your
> processing pipeline multiple times.  For those scenarios,
> it works well.  That is not to say it couldn't work in other
> setups as well, I simply never tried it.  It may well be
> that you can bring latency down by being a bit cleverer in
> your hadoop setup, but for the batch scenarios, it's not
> worth the trouble.
>
> --Thilo
>
> On 06/10/11 07:43, Thilo Götz wrote:
> > On 05/10/11 22:43, Marshall Schor wrote:
> >> We use hadoop with UIMA.  Here's the "fit", in one case:
> >>
> >> 1) UIMA runs as the map step; we put the uima pipeline into the mapper.  Hadoop
> >> has a configure (?) method where you can stick the creation and set up of the
> >> uima pipeline, similar to UIMA's initialize.
> >>
> >> 2) Write a hadoop record reader that reads input from hadoop's "splits", and
> >> creates things that would go into individual CASes.  These are the input to the
> >> Map step.
> >>
> >> 3) The map takes the input (a string, say), and puts it into a CAS, and then
> >> calls the process() method on the engine it set up and initialized in step 1.
> >>
> >> 4) When the process method returns, the CAS has all the results - iterate thru
> >> it and extract whatever you want, and stick those values into your hadoop output
> >> object, and output it.
> >>
> >> 5) The reduce step can take all of these output objects (which can be sorted as
> >> you wish) and do whatever you want with them.
> >
> > That basically sums it up.  We (and that's a different we than Marshall's we)
> > use hadoop only for batch processing, but since that's the only processing
> > we're currently doing, that works out well.  We use hdfs as the underlying
> > storage normally.
> >
> > --Thilo
> >
> >>
> >> We usually replicate our data 2x in Hadoop Distributed File System, so that big
> >> runs don't fail due to single failures of disk drives.
> >>
> >> HTH. -Marshall
> >>
> >> On 10/5/2011 2:24 PM, Greg Holmberg wrote:
> >>> On Tue, 27 Sep 2011 01:06:02 -0700, Thilo Götz <tw...@gmx.de> wrote:
> >>>
> >>>> On 26/09/11 22:31, Greg Holmberg wrote:
> >>>>>
> >>>>> This is what I'm doing.  I use JavaSpaces (producer/consumer queue), but I'm
> >>>>> sure you can get the same effect with UIMA AS and ActiveMQ.
> >>>>
> >>>> Or Hadoop.
> >>>
> >>> Thilo, could you expand on this?  Exactly how do you use Hadoop to scale UIMA?
> >>>
> >>> What storage do you use under Hadoop (HDFS, Hbase, Hive, etc), and what is
> >>> your final storage destination for the CAS data?
> >>>
> >>> Are you doing on-demand, streaming, or batch processing of documents?
> >>>
> >>> What are your key/value pairs?  URLs?  What's your map step, what's your
> >>> reduce step?
> >>>
> >>> How do you partition?  Do you find the system is load balanced?  What level of
> >>> efficiency do you get?  What level of CPU utilization?
> >>>
> >>> Do you do just document (UIMA) analysis in Hadoop, or also collection
> >>> (multi-doc) analytics?
> >>>
> >>> The fit between UIMA and Hadoop isn't obvious to me.  Just trying to figure it
> >>> out.
> >>>
> >>> Thanks,
> >>>
> >>>
> >>> Greg Holmberg
> >>>
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

Re: Scaling using Hadoop

Posted by Thilo Götz <tw...@gmx.de>.
Forgot to mention performance.  Latency is pretty bad, but
once it gets going, it's pretty fast in my experience.  We
get near-linear scale out on multiple nodes.  I have less
experience with using larger, multi-core machines.

So use hadoop when you have thousands of documents you need
to process in batch mode, and you can easily replicate your
processing pipeline multiple times.  For those scenarios,
it works well.  That is not to say it couldn't work in other
setups as well, I simply never tried it.  It may well be
that you can bring latency down by being a bit cleverer in
your hadoop setup, but for the batch scenarios, it's not
worth the trouble.

--Thilo

On 06/10/11 07:43, Thilo Götz wrote:
> On 05/10/11 22:43, Marshall Schor wrote:
>> We use hadoop with UIMA.  Here's the "fit", in one case:
>>
>> 1) UIMA runs as the map step; we put the uima pipeline into the mapper.  Hadoop
>> has a configure (?) method where you can stick the creation and set up of the
>> uima pipeline, similar to UIMA's initialize.
>>
>> 2) Write a hadoop record reader that reads input from hadoop's "splits", and
>> creates things that would go into individual CASes.  These are the input to the
>> Map step.
>>
>> 3) The map takes the input (a string, say), and puts it into a CAS, and then
>> calls the process() method on the engine it set up and initialized in step 1.
>>
>> 4) When the process method returns, the CAS has all the results - iterate thru
>> it and extract whatever you want, and stick those values into your hadoop output
>> object, and output it.
>>
>> 5) The reduce step can take all of these output objects (which can be sorted as
>> you wish) and do whatever you want with them. 
> 
> That basically sums it up.  We (and that's a different we than Marshall's we)
> use hadoop only for batch processing, but since that's the only processing
> we're currently doing, that works out well.  We use hdfs as the underlying
> storage normally.
> 
> --Thilo
> 
>>
>> We usually replicate our data 2x in Hadoop Distributed File System, so that big
>> runs don't fail due to single failures of disk drives. 
>>
>> HTH. -Marshall
>>
>> On 10/5/2011 2:24 PM, Greg Holmberg wrote:
>>> On Tue, 27 Sep 2011 01:06:02 -0700, Thilo Götz <tw...@gmx.de> wrote:
>>>
>>>> On 26/09/11 22:31, Greg Holmberg wrote:
>>>>>
>>>>> This is what I'm doing.  I use JavaSpaces (producer/consumer queue), but I'm
>>>>> sure you can get the same effect with UIMA AS and ActiveMQ.
>>>>
>>>> Or Hadoop.
>>>
>>> Thilo, could you expand on this?  Exactly how do you use Hadoop to scale UIMA?
>>>
>>> What storage do you use under Hadoop (HDFS, Hbase, Hive, etc), and what is
>>> your final storage destination for the CAS data?
>>>
>>> Are you doing on-demand, streaming, or batch processing of documents?
>>>
>>> What are your key/value pairs?  URLs?  What's your map step, what's your
>>> reduce step?
>>>
>>> How do you partition?  Do you find the system is load balanced?  What level of
>>> efficiency do you get?  What level of CPU utilization?
>>>
>>> Do you do just document (UIMA) analysis in Hadoop, or also collection
>>> (multi-doc) analytics?
>>>
>>> The fit between UIMA and Hadoop isn't obvious to me.  Just trying to figure it
>>> out.
>>>
>>> Thanks,
>>>
>>>
>>> Greg Holmberg
>>>

Re: Scaling using Hadoop

Posted by Jörn Kottmann <ko...@gmail.com>.
On 10/6/11 7:43 AM, Thilo Götz wrote:
>> We use hadoop with UIMA.  Here's the "fit", in one case:
>>
>> 1) UIMA runs as the map step; we put the uima pipeline into the mapper.  Hadoop
>> has a configure (?) method where you can stick the creation and set up of the
>> uima pipeline, similar to UIMA's initialize.
>>
>> 2) Write a hadoop record reader that reads input from hadoop's "splits", and
>> creates things that would go into individual CASes.  These are the input to the
>> Map step.
>>
>> 3) The map takes the input (a string, say), and puts it into a CAS, and then
>> calls the process() method on the engine it set up and initialized in step 1.
>>
>> 4) When the process method returns, the CAS has all the results - iterate thru
>> it and extract whatever you want, and stick those values into your hadoop output
>> object, and output it.
>>
>> 5) The reduce step can take all of these output objects (which can be sorted as
>> you wish) and do whatever you want with them.
> That basically sums it up.  We (and that's a different we than Marshall's we)
> use hadoop only for batch processing, but since that's the only processing
> we're currently doing, that works out well.  We use hdfs as the underlying
> storage normally.

For low-latency analysis I am using HBase and UIMA-AS: a receiver writes
text items to HBase, then the row key is sent to UIMA-AS, which retrieves
the document from HBase; after the document is analyzed, the results are
written back to HBase.

Such a setup is well suited when you have a huge stream of documents that
must be analyzed in near real time. If you used MapReduce, you would first
have to wait and collect documents before the job could be started.
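
A rough sketch of that round trip inside a UIMA annotator is below. It is only
an illustration: the table name "documents", the column family/qualifier names
and the analyze() stand-in are all made up, and the HBase calls are the
0.90-era client API.

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.uima.UimaContext;
    import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
    import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
    import org.apache.uima.jcas.JCas;
    import org.apache.uima.resource.ResourceInitializationException;

    public class HBaseAnalyzer extends JCasAnnotator_ImplBase {

      private HTable table;

      public void initialize(UimaContext ctx) throws ResourceInitializationException {
        super.initialize(ctx);
        try {
          table = new HTable(HBaseConfiguration.create(), "documents");
        } catch (IOException e) {
          throw new ResourceInitializationException(e);
        }
      }

      public void process(JCas jcas) throws AnalysisEngineProcessException {
        // The CAS carries only the row key; the document itself lives in HBase.
        String rowKey = jcas.getDocumentText();
        try {
          Result row = table.get(new Get(Bytes.toBytes(rowKey)));
          String text = Bytes.toString(
              row.getValue(Bytes.toBytes("d"), Bytes.toBytes("text")));
          String result = analyze(text);
          // Write the analysis result back to the same row.
          Put put = new Put(Bytes.toBytes(rowKey));
          put.add(Bytes.toBytes("d"), Bytes.toBytes("result"), Bytes.toBytes(result));
          table.put(put);
        } catch (IOException e) {
          throw new AnalysisEngineProcessException(e);
        }
      }

      private String analyze(String text) {
        return text == null ? "" : text.trim();  // placeholder for the real analysis
      }
    }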

Jörn

Re: Scaling using Hadoop

Posted by Thilo Götz <tw...@gmx.de>.
On 05/10/11 22:43, Marshall Schor wrote:
> We use hadoop with UIMA.  Here's the "fit", in one case:
> 
> 1) UIMA runs as the map step; we put the uima pipeline into the mapper.  Hadoop
> has a configure (?) method where you can stick the creation and set up of the
> uima pipeline, similar to UIMA's initialize.
> 
> 2) Write a hadoop record reader that reads input from hadoop's "splits", and
> creates things that would go into individual CASes.  These are the input to the
> Map step.
> 
> 3) The map takes the input (a string, say), and puts it into a CAS, and then
> calls the process() method on the engine it set up and initialized in step 1.
> 
> 4) When the process method returns, the CAS has all the results - iterate thru
> it and extract whatever you want, and stick those values into your hadoop output
> object, and output it.
> 
> 5) The reduce step can take all of these output objects (which can be sorted as
> you wish) and do whatever you want with them. 

That basically sums it up.  We (and that's a different we than Marshall's we)
use hadoop only for batch processing, but since that's the only processing
we're currently doing, that works out well.  We use hdfs as the underlying
storage normally.

--Thilo

> 
> We usually replicate our data 2x in Hadoop Distributed File System, so that big
> runs don't fail due to single failures of disk drives. 
> 
> HTH. -Marshall
> 
> On 10/5/2011 2:24 PM, Greg Holmberg wrote:
>> On Tue, 27 Sep 2011 01:06:02 -0700, Thilo Götz <tw...@gmx.de> wrote:
>>
>>> On 26/09/11 22:31, Greg Holmberg wrote:
>>>>
>>>> This is what I'm doing.  I use JavaSpaces (producer/consumer queue), but I'm
>>>> sure you can get the same effect with UIMA AS and ActiveMQ.
>>>
>>> Or Hadoop.
>>
>> Thilo, could you expand on this?  Exactly how do you use Hadoop to scale UIMA?
>>
>> What storage do you use under Hadoop (HDFS, Hbase, Hive, etc), and what is
>> your final storage destination for the CAS data?
>>
>> Are you doing on-demand, streaming, or batch processing of documents?
>>
>> What are your key/value pairs?  URLs?  What's your map step, what's your
>> reduce step?
>>
>> How do you partition?  Do you find the system is load balanced?  What level of
>> efficiency do you get?  What level of CPU utilization?
>>
>> Do you do just document (UIMA) analysis in Hadoop, or also collection
>> (multi-doc) analytics?
>>
>> The fit between UIMA and Hadoop isn't obvious to me.  Just trying to figure it
>> out.
>>
>> Thanks,
>>
>>
>> Greg Holmberg
>>

Re: Scaling using Hadoop

Posted by Marshall Schor <ms...@schor.com>.
We use hadoop with UIMA.  Here's the "fit", in one case:

1) UIMA runs as the map step; we put the uima pipeline into the mapper.  Hadoop
has a configure (?) method where you can stick the creation and set up of the
uima pipeline, similar to UIMA's initialize.

2) Write a hadoop record reader that reads input from hadoop's "splits", and
creates things that would go into individual CASes.  These are the input to the
Map step.

3) The map takes the input (a string, say), and puts it into a CAS, and then
calls the process() method on the engine it set up and initialized in step 1.

4) When the process method returns, the CAS has all the results - iterate thru
it and extract whatever you want, and stick those values into your hadoop output
object, and output it.

5) The reduce step can take all of these output objects (which can be sorted as
you wish) and do whatever you want with them. 

We usually replicate our data 2x in Hadoop Distributed File System, so that big
runs don't fail due to single failures of disk drives. 
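
A minimal sketch of steps 1), 3) and 4), using the old mapred API (the
descriptor name "pipeline.xml" and the key/value choices are illustrative, not
our actual code):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.uima.UIMAFramework;
    import org.apache.uima.analysis_engine.AnalysisEngine;
    import org.apache.uima.cas.CAS;
    import org.apache.uima.cas.FSIterator;
    import org.apache.uima.cas.text.AnnotationFS;
    import org.apache.uima.util.XMLInputSource;

    public class UimaMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private AnalysisEngine ae;
      private CAS cas;

      // Step 1: build the pipeline once per task in configure().
      public void configure(JobConf job) {
        try {
          ae = UIMAFramework.produceAnalysisEngine(
              UIMAFramework.getXMLParser().parseAnalysisEngineDescription(
                  new XMLInputSource("pipeline.xml")));
          cas = ae.newCAS();
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }

      // Steps 3 and 4: fill a CAS, process it, emit whatever you need.
      public void map(LongWritable key, Text value,
          OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
        try {
          cas.reset();
          cas.setDocumentText(value.toString());
          ae.process(cas);
          FSIterator<AnnotationFS> it = cas.getAnnotationIndex().iterator();
          while (it.hasNext()) {
            AnnotationFS a = it.next();
            out.collect(new Text(a.getType().getName()),
                new Text(a.getCoveredText()));
          }
        } catch (Exception e) {
          throw new IOException(e.getMessage());
        }
      }
    }

The record reader of step 2) and the reducer of step 5) plug in around this in
the usual hadoop way.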

HTH. -Marshall

On 10/5/2011 2:24 PM, Greg Holmberg wrote:
> On Tue, 27 Sep 2011 01:06:02 -0700, Thilo Götz <tw...@gmx.de> wrote:
>
>> On 26/09/11 22:31, Greg Holmberg wrote:
>>>
>>> This is what I'm doing.  I use JavaSpaces (producer/consumer queue), but I'm
>>> sure you can get the same effect with UIMA AS and ActiveMQ.
>>
>> Or Hadoop.
>
> Thilo, could you expand on this?  Exactly how do you use Hadoop to scale UIMA?
>
> What storage do you use under Hadoop (HDFS, Hbase, Hive, etc), and what is
> your final storage destination for the CAS data?
>
> Are you doing on-demand, streaming, or batch processing of documents?
>
> What are your key/value pairs?  URLs?  What's your map step, what's your
> reduce step?
>
> How do you partition?  Do you find the system is load balanced?  What level of
> efficiency do you get?  What level of CPU utilization?
>
> Do you do just document (UIMA) analysis in Hadoop, or also collection
> (multi-doc) analytics?
>
> The fit between UIMA and Hadoop isn't obvious to me.  Just trying to figure it
> out.
>
> Thanks,
>
>
> Greg Holmberg
>

Re: Scaling using Hadoop

Posted by Greg Holmberg <ho...@comcast.net>.
On Tue, 27 Sep 2011 01:06:02 -0700, Thilo Götz <tw...@gmx.de> wrote:

> On 26/09/11 22:31, Greg Holmberg wrote:
>>
>> This is what I'm doing.  I use JavaSpaces (producer/consumer queue),  
>> but I'm
>> sure you can get the same effect with UIMA AS and ActiveMQ.
>
> Or Hadoop.

Thilo, could you expand on this?  Exactly how do you use Hadoop to scale  
UIMA?

What storage do you use under Hadoop (HDFS, Hbase, Hive, etc), and what is  
your final storage destination for the CAS data?

Are you doing on-demand, streaming, or batch processing of documents?

What are your key/value pairs?  URLs?  What's your map step, what's your  
reduce step?

How do you partition?  Do you find the system is load balanced?  What  
level of efficiency do you get?  What level of CPU utilization?

Do you do just document (UIMA) analysis in Hadoop, or also collection  
(multi-doc) analytics?

The fit between UIMA and Hadoop isn't obvious to me.  Just trying to  
figure it out.

Thanks,


Greg Holmberg

Re: remoteAnalysisEngine services not scaling to effect

Posted by Burn Lewis <bu...@gmail.com>.
> I observed from the logs that the CASes are divided among the 2 running
> instances of CAS consumers, but some of the CASes seem to be missed out;
> they didn't go to either of the 2 instances. I can't understand why.

This is strange.  UIMA-AS doesn't treat CasConsumers any differently, since
any annotator can write data to a file system or DB.  Can you trace in the
logs the CasReferenceId of the missing CASes through each of the annotators
and see why they don't complete the pipeline?

~Burn

Re: remoteAnalysisEngine services not scaling to effect

Posted by Arun Tewatia <ar...@orkash.com>.
Thanks Greg Holmberg and Burn Lewis for the replies.

You have understood correctly what I am trying to do.

> What you're doing is taking each step in your analysis engine and running it on
> one or more machines.

And yes, this creates the two problems that you mentioned:
network overhead and lumpy behavior.

But then, as Burn Lewis mentioned, that approach has a disadvantage when some of
the annotators in the pipeline consume a lot of memory. Also, almost all of my
documents are of the same size.

That is exactly my case: some annotators in my pipeline consume a lot of memory.
So what I am trying to do is club together a few annotators, i.e. divide the
whole pipeline (about 15 AEs) into 2-3 aggregates.

Now I can maintain the ratios between these aggregates.
In the first stage I am trying to optimize performance by maintaining this
ratio; in the second stage I'll use CAS multipliers to slice the documents.


As for my problem of "asynchronous data in the database", it still persists.
I enabled FINE logging as suggested by Burn Lewis.
I also observed the queue depth of the CasConsumer queue, which didn't budge
from zero. So I understand that there's no point in increasing the number of
CasConsumer instances, but even if I did, the data should still come out
synchronized. Shouldn't it?

I observed from the logs that the CASes are divided among the 2 running
instances of CAS consumers, but some of the CASes seem to be missed out;
they didn't go to either of the 2 instances. I can't understand why.

Thanks
Arun Tewatia

Re: remoteAnalysisEngine services not scaling to effect

Posted by Burn Lewis <bu...@gmail.com>.
Arun,

If you have multiple instances of your CasConsumer then access to your
database must be synchronized.  I suggest you add logging to each instance
to verify that  all the CASes are being processed by your CasConsumers.  You
should also be able to see the CASes flow in and out of the services from
the messages in the uima logs if you set the following in your
logger.properties file:
     org.apache.uima.adapter.jms.activemq.JmsInputChannel.level = FINE
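
On the synchronization point, here is a minimal sketch of a CasConsumer that
commits once per CAS, so that concurrent instances never interleave partial
writes. The JDBC URL, credentials and table are placeholders, not a
recommended schema:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import org.apache.uima.cas.CAS;
    import org.apache.uima.collection.CasConsumer_ImplBase;
    import org.apache.uima.resource.ResourceInitializationException;
    import org.apache.uima.resource.ResourceProcessException;

    public class DbCasConsumer extends CasConsumer_ImplBase {

      private Connection conn;

      public void initialize() throws ResourceInitializationException {
        try {
          conn = DriverManager.getConnection("jdbc:mysql://db-host/results", "user", "pw");
          conn.setAutoCommit(false);  // we commit explicitly, once per CAS
        } catch (SQLException e) {
          throw new ResourceInitializationException(e);
        }
      }

      public void processCas(CAS cas) throws ResourceProcessException {
        try {
          PreparedStatement ps = conn.prepareStatement(
              "INSERT INTO results (doc_text) VALUES (?)");
          try {
            ps.setString(1, cas.getDocumentText());
            ps.executeUpdate();
            conn.commit();  // atomic per CAS: no interleaved partial writes
          } finally {
            ps.close();
          }
        } catch (SQLException e) {
          try { conn.rollback(); } catch (SQLException ignored) { }
          throw new ResourceProcessException(e);
        }
      }
    }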

As Greg said it's sometimes easier to run synchronous pipelines and deploy
multiple instances of them on the same machine or across machines.  The
collection reader would be a UIMA-AS client sending CASes to the shared
input queue for each pipeline to process one at a time.  One disadvantage
would be if one of the annotators required a large amount of memory as this
might limit your scale-out rather than CPU load.
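
Such a client could look roughly like this, using the standard UIMA-AS client
API (the broker URL, queue name and documents are placeholders):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.uima.aae.client.UimaAsynchronousEngine;
    import org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngine_impl;
    import org.apache.uima.cas.CAS;

    public class PipelineFeeder {
      public static void main(String[] args) throws Exception {
        UimaAsynchronousEngine client = new BaseUIMAAsynchronousEngine_impl();
        Map<String, Object> appCtx = new HashMap<String, Object>();
        appCtx.put(UimaAsynchronousEngine.ServerUri, "tcp://broker-host:61616");
        appCtx.put(UimaAsynchronousEngine.Endpoint, "myPipelineQueue");
        appCtx.put(UimaAsynchronousEngine.CasPoolSize, 5);
        client.initialize(appCtx);
        for (String doc : new String[] { "doc one", "doc two" }) {
          CAS cas = client.getCAS();  // blocks until a CAS from the pool is free
          cas.setDocumentText(doc);
          client.sendCAS(cas);        // asynchronous; replies arrive via a listener
        }
        client.collectionProcessingComplete();
        client.stop();
      }
    }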

~Burn

Re: remoteAnalysisEngine services not scaling to effect

Posted by Thilo Götz <tw...@gmx.de>.
On 26/09/11 22:31, Greg Holmberg wrote:
> Arun--
> 
> 
> I don't know what the cause of your specific technical issue is, but in my
> opinion, there's a better way to slice the problem.
> 
> What you're doing is taking each step in your analysis engine and running it on
> one or more machines.  This creates two problems.
> 
> One, it's a lot of network overhead.  You're moving each document across the
> network many times.  You can easily spend more time just moving the data around
> than actually processing.  It also creates a low ceiling to scalability, since
> you chew up a lot of network bandwidth.
> 
> Two, in order to use your hardware efficiently, you have to get the right ratio
> of machines/CPUs for each step.  Some steps use more cycles than others.  For
> example, you might find that for a given configuration and set of documents that
> the ratio of CPU usage for steps A, B, and C are 1:5:2.  Now you need to
> instantiate A, B, and C services to use cores in that ratio.  Then, suppose you
> want to add more machines--how should you allocate them to A, B, and C?  It will
> always be lumpy, with some cores not being used much.  But worse, with a
> different configuration (different dictionaries, for example), or with different
> documents (longer vs. shorter, for example), the ratios will change, and you
> will have to reconfigure your machines again.  It's never-ending, and it's never
> completely right.
> 
> So, it would be much easier to manage and more efficient, more scalable, if you
> just run your analysis engine self-contained in a single process, and then
> replicate the engine over your machines/CPUs.  You slice by document, not by
> service--send each document to a different analysis engine instance.  This makes
> your life easier, always runs the CPUs at 100%, and scales indefinitely.  Just
> add more machines, it goes faster.
> 
> This is what I'm doing.  I use JavaSpaces (producer/consumer queue), but I'm
> sure you can get the same effect with UIMA AS and ActiveMQ.

Or Hadoop.

> 
> 
> Greg

Re: remoteAnalysisEngine services not scaling to effect

Posted by Greg Holmberg <ho...@comcast.net>.
Arun--


I don't know what the cause of your specific technical issue is, but in my  
opinion, there's a better way to slice the problem.

What you're doing is taking each step in your analysis engine and running  
it on one or more machines.  This creates two problems.

One, it's a lot of network overhead.  You're moving each document across  
the network many times.  You can easily spend more time just moving the  
data around than actually processing.  It also creates a low ceiling to  
scalability, since you chew up a lot of network bandwidth.

Two, in order to use your hardware efficiently, you have to get the right  
ratio of machines/CPUs for each step.  Some steps use more cycles than  
others.  For example, you might find that for a given configuration and  
set of documents that the ratio of CPU usage for steps A, B, and C are  
1:5:2.  Now you need to instantiate A, B, and C services to use cores in  
that ratio.  Then, suppose you want to add more machines--how should you  
allocate them to A, B, and C?  It will always be lumpy, with some cores  
not being used much.  But worse, with a different configuration (different  
dictionaries, for example), or with different documents (longer vs.  
shorter, for example), the ratios will change, and you will have to  
reconfigure your machines again.  It's never-ending, and it's never  
completely right.

So, it would be much easier to manage and more efficient, more scalable,  
if you just run your analysis engine self-contained in a single process,  
and then replicate the engine over your machines/CPUs.  You slice by  
document, not by service--send each document to a different analysis  
engine instance.  This makes your life easier, always runs the CPUs at  
100%, and scales indefinitely.  Just add more machines, it goes faster.

This is what I'm doing.  I use JavaSpaces (producer/consumer queue), but  
I'm sure you can get the same effect with UIMA AS and ActiveMQ.
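
For the UIMA AS route, that roughly means one deployment descriptor per machine
that replicates the whole aggregate behind a single shared input queue, along
these lines (the queue, broker and descriptor names are placeholders):

    <analysisEngineDeploymentDescription
        xmlns="http://uima.apache.org/resourceSpecifier">
      <name>replicatedPipeline</name>
      <deployment protocol="jms" provider="activemq">
        <casPool numberOfCASes="5"/>
        <service>
          <inputQueue endpoint="pipelineQueue" brokerURL="tcp://broker-host:61616"/>
          <topDescriptor>
            <import location="WholePipelineAggregate.xml"/>
          </topDescriptor>
          <analysisEngine>
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
        </service>
      </deployment>
    </analysisEngineDeploymentDescription>

Every machine runs the same descriptor against the same broker, so adding a
machine just adds more consumers on the shared queue.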


Greg

Re: remoteAnalysisEngine services not scaling to effect

Posted by Arun Tewatia <ar...@orkash.com>.
Thanks for replying.

I am trying to break up my big pipeline of over 15 AEs for a cluster
environment.

I was initially testing with the first 2 AEs of my pipeline.
When I checked the QueueSize using jconsole, I found that it was zero; when I
checked the same with my whole pipeline aggregate, it turned out to be 12-15.
In that case increasing the number of instances did reduce the processing time.
Thanks.

I also increased the CasPool size in the aggregate from 5 to 15, which showed
an improvement. Thanks to Burn.

But the other issue that I mentioned remains:

When I use a single instance of the CasConsumer the output in the database is
correct, but when I use 2 instances of the CasConsumer the output is incorrect.

My CasConsumer 'c' takes input from both AEs 'a' and 'b'.
The aggregate has the flow defined as 'a' -> 'b' -> 'c'.
Yet the output produced in the database is less for 'a' than for 'b', and both
are less than they should be; the correct counts (equal for 'a' and 'b') appear
only when a single CasConsumer is running.

What else am I missing?


Arun Tewatia

Re: remoteAnalysisEngine services not scaling to effect

Posted by Burn Lewis <bu...@gmail.com>.
You didn't say how big the casPool is in your aggregate ... I think it
should be >= 15 to keep all your remote services busy.

~Burn

Re: remoteAnalysisEngine services not scaling to effect

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
Are you doing any real analysis in your AEs? If the analysis is fast,
replicating AEs makes no sense. You can see that by observing queue depths
in jConsole. Attach it to a broker and check the queue size while you are
processing. You may need to enable broker JMX support in activemq.xml first:

<managementContext>
  <managementContext createConnector="true"/>
</managementContext>

Replicating UIMA AS service makes sense if the queue size is > 0.
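
If you want to check it programmatically instead of in jConsole, here is a
rough sketch with the plain JMX client API; the JMX URL, broker name and queue
are placeholders, and on ActiveMQ 5.x the queue MBean name follows roughly this
pattern:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class QueueDepth {
      public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://broker-host:1099/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
          MBeanServerConnection conn = jmxc.getMBeanServerConnection();
          ObjectName queue = new ObjectName(
              "org.apache.activemq:BrokerName=localhost,Type=Queue,Destination=Queue-a");
          // QueueSize is the number of messages waiting on the queue.
          Long depth = (Long) conn.getAttribute(queue, "QueueSize");
          System.out.println("QueueSize = " + depth);
        } finally {
          jmxc.close();
        }
      }
    }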

JC


On Fri, Sep 23, 2011 at 12:05 AM, Arun Tewatia <ar...@orkash.com>wrote:

> Hi all,
> I have been using UIMA-AS for some time now, and I have been trying to divide
> my UIMA pipeline and scale it out in a cluster environment.
>
> Here's my setup :
>
> UIMA-AS 2.3.1
> Machine 1 : Broker, 1 instance of aggregate X on Queue-X (containing
> entries for 2 remoteAnalysisEngines and 1 remote CasConsumer, all on
> different queues - a, b, c respectively).
> Machine 2 : analysisEngine A ( on Queue-a 5 instances of service )
> Machine 3 : analysisEngine B ( on Queue-b 5 instances of service )
> Machine 4 : CasConsumer C ( on Queue-c 5 instances of service )
>
> The CasConsumer puts the output into a database. analysisEngines A & B are
> primitives.
>
> Problem : The output in the database is not synchronized, and the processing
> time did not drop when I increased the number of instances on any of the
> queues a, b, c, or x.
>
> When I run 1 instance of the CasConsumer the output produced in the database
> is correct, but the processing time still shows no improvement when I
> increase the number of deployed analysisEngine services.
> I have also tried the remoteReplyQueueScaleout parameter, but with no
> effect.
>
> Am I going wrong somewhere in my understanding of UIMA-AS? Please help me
> with this.
>
> Arun Tewatia
>