Posted to user@uima.apache.org by LinTong <pc...@gmail.com> on 2010/05/18 13:46:21 UTC

the performance of UIMA AS

Hello everybody

I am currently investigating UIMA AS, and I'm very confused by its poor
performance. I ran the example AS descriptor MeetingDetectorTAE. Whether
I use Deploy_MeetingDetectorTAE_3MeetingAnnotators.xml or
Deploy_MeetingDetectorTAE_Sync_3Instances.xml, there is no speedup at
all. I also tried Deploy_MeetingDetectorTAE_RemoteRoomNumber.xml and
deployed several instances of the RemoteRoomNumber service, but there is
still no speedup. My sample includes 450 documents. MeetingDetectorTAE
takes approx. 1000 ms in a CPE, whereas Deploy_MeetingDetectorTAE.xml
takes 5000 ms in UIMA AS with all components on the same machine. If I
run Deploy_MeetingDetectorTAE_RemoteRoomNumber.xml with the
RemoteRoomNumber service on a different computer, it takes almost
20000 ms. I know there is overhead, including (de)serialization, but I
see no reason for the performance to be this poor. Does anybody have an
idea about my problem? Did I make a stupid mistake?

BTW, when I enable the flag named async, the system prints the following
debug information. The analysis time and idle time seem quite strange.
Does my AE really take only ca. 280 ms? (The collection reader I use
takes ca. 2000 ms.)


INFO: Controller: [Meeting Detector TAE] Delegate <<Meeting Detector
TAE>> Stats:
	 Total Number CASes Processed: 257
	 Total CAS Deserialization Time: 327,602 ms
	 Total CAS Serialization Time: 93,601 ms
	 Total Time Spent In Analysis: 280,802 ms
	 Max Serialization Time: 15,6 ms
	 Max Deserialization Time: 15,6 ms
	 Max Analysis Time: 202,801 ms
	 Total Idle Time: 1.625,275 ms
Completed 451 documents; 593984 characters
Time Elapsed : 4808 ms


Thank you so much if somebody can help me!

-- 
Best Regards
LinTong(Pierre)

Re: the performance of UIMA AS

Posted by Eddie Epstein <ea...@gmail.com>.
There is an intro to UIMA AS at

http://uima.apache.org/doc-uimaas-what.html#UIMA%20AS%20-%20Example%20Application%20Scenarios

It includes the comment that scaleout efficiency is determined by the
ratio of the processing done by the scaled-out analysis engines to the
serialization overhead in the services. That said, there are different
UIMA AS configurations that minimize overhead; see the last
example on that web page.
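To make that ratio concrete for the stats in the original post, here is a small Java sketch. It assumes the log uses German number formatting (comma as decimal separator, period for thousands), which the "1.625,275 ms" idle figure strongly suggests. Under that reading the AE spent about 280.8 ms in analysis across 257 CASes, against about 421.2 ms of (de)serialization. The class and method names are illustrative only.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Illustrative helper (hypothetical names) for reading the delegate stats
// from the original post, ASSUMING German number formatting in the log.
class DelegateStatsRatio {

    // Parses e.g. "1.625,275" as 1625.275 ("." = thousands, "," = decimal).
    static double parseGerman(String text) {
        try {
            return NumberFormat.getInstance(Locale.GERMANY).parse(text.trim()).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a German-formatted number: " + text, e);
        }
    }

    // Fraction of the delegate's busy time spent analyzing rather than (de)serializing.
    static double analysisShare(double analysisMs, double serMs, double deserMs) {
        return analysisMs / (analysisMs + serMs + deserMs);
    }

    public static void main(String[] args) {
        double deser = parseGerman("327,602");
        double ser = parseGerman("93,601");
        double analysis = parseGerman("280,802");
        double idle = parseGerman("1.625,275");
        System.out.printf("analysis %.3f ms, ser+deser %.3f ms, idle %.3f ms%n",
                analysis, ser + deser, idle);
        System.out.printf("analysis share of busy time: %.2f%n",
                analysisShare(analysis, ser, deser));
    }
}
```

With these numbers the analysis share comes out to about 0.4, so roughly 60% of the delegate's busy time went to (de)serialization rather than to the annotators.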

Eddie


Re: the performance of UIMA AS

Posted by Eddie Epstein <ea...@gmail.com>.
LinTong,

When you say "similar to Fig 4", are you actually running a CPM with
the MeetingDetectorTAE configured as a remote component? The CPM
calls remote components synchronously, meaning only one CAS is
sent at a time. With a synchronous client, the remote annotator
does no processing during the entire time each CAS is being serialized,
deserialized, and transmitted via the broker to the remote service and back.

Since the MeetingDetectorTAE has 3 delegates, if deployed
asynchronously it should be able to process 3 CASes concurrently, so
the client sending CASes to the MeetingDetectorTAE queue should
keep at least 3 CASes outstanding at a time. The RunRemoteAsyncAE
application can keep multiple CASes outstanding to a queue.
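The effect of keeping several CASes outstanding can be illustrated with a toy Java simulation. It does not use the UIMA AS client API: a fixed pool of 3 threads stands in for the 3 delegates, each "CAS" is just a sleep covering serialization, transport and analysis, and a Semaphore caps how many CASes the client has in flight. All names here are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy simulation only; it does not use the UIMA AS client API.
class OutstandingCasSimulation {

    static final class Result {
        final long elapsedMs;
        final int peakInFlight;
        final int processed;

        Result(long elapsedMs, int peakInFlight, int processed) {
            this.elapsedMs = elapsedMs;
            this.peakInFlight = peakInFlight;
            this.processed = processed;
        }

        @Override
        public String toString() {
            return processed + " CASes in " + elapsedMs + " ms, peak in flight " + peakInFlight;
        }
    }

    // Sends casCount simulated CASes to `delegates` worker threads, keeping
    // at most `window` CASes outstanding; each CAS takes workMs to "process".
    static Result run(int casCount, int window, int delegates, long workMs) {
        ExecutorService service = Executors.newFixedThreadPool(delegates);
        Semaphore slots = new Semaphore(window);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger processed = new AtomicInteger();
        long start = System.nanoTime();
        try {
            for (int i = 0; i < casCount; i++) {
                slots.acquire(); // client blocks until a reply frees a slot
                service.submit(() -> {
                    peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    try {
                        Thread.sleep(workMs); // stand-in for (de)serialize + analyze
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        processed.incrementAndGet();
                        slots.release(); // "reply received"
                    }
                });
            }
            service.shutdown();
            service.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } finally {
            service.shutdownNow();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return new Result(elapsedMs, peak.get(), processed.get());
    }

    public static void main(String[] args) {
        System.out.println("window 1 (synchronous):  " + run(30, 1, 3, 10));
        System.out.println("window 3 (asynchronous): " + run(30, 3, 3, 10));
    }
}
```

With a window of 1 (the synchronous case) at most one CAS is ever being processed, so two of the three simulated delegates sit idle; with a window of 3 all of them can be busy at once.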

Of course, for this application with dummy analytics, virtually all the CPU
cycles go to serialization. Using a separate RemoteRoomNumber
service compounds the problem by doubling the serialization work. Note
that the configuration in Fig 5 avoids serializing each document at all.

There is also a serialization optimization that can reduce that overhead
by 2-3x or more, depending on the data: binary serialization to UIMA AS
services. The drawback of this approach is that client and service must
have identical type systems.
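A back-of-envelope Java model shows how these two effects interact. It assumes per-CAS busy time is the analysis time plus the (de)serialization overhead for each remote hop, divided by whatever reduction binary serialization achieves. The per-CAS inputs are derived from the stats in the original post (read with a comma as decimal separator), and the 2.5x factor is an assumed value inside the 2-3x range mentioned above, not a measurement.

```java
// Back-of-envelope model (an assumption, not a UIMA API or measurement):
// per-CAS busy time = analysis + hops * overheadPerHop / reduction.
class SerializationCostModel {

    static double perCasMs(double analysisMs, double overheadPerHopMs,
                           int hops, double reductionFactor) {
        return analysisMs + hops * overheadPerHopMs / reductionFactor;
    }

    public static void main(String[] args) {
        int cases = 257;
        double analysis = 280.802 / cases;            // ms per CAS, from the posted stats
        double overhead = (93.601 + 327.602) / cases; // ser + deser ms per CAS

        double oneHop = perCasMs(analysis, overhead, 1, 1.0);
        double twoHops = perCasMs(analysis, overhead, 2, 1.0); // extra RemoteRoomNumber hop
        double binary = perCasMs(analysis, overhead, 1, 2.5);  // assumed 2.5x reduction
        System.out.printf("one hop %.2f ms, two hops %.2f ms, one hop binary %.2f ms%n",
                oneHop, twoHops, binary);
    }
}
```

In this model no reduction factor can bring the per-CAS time below the analysis time itself, so binary serialization helps most when the analytics are as light as they are here.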

Regards,
Eddie

On Wed, May 19, 2010 at 4:36 AM, LinTong <pc...@gmail.com> wrote:
> To Eddie
> I have read all the documentation about UIMA AS I can find, including
> the handbook "UIMA Asynchronous Scaleout", the readme, and the website.
> I also tried the different performance parameters mentioned in the
> handbook for tuning. Unfortunately, they do not help. Could you please
> point me to other documents or tutorials I can follow? BTW, my approach
> is similar to Fig. 4 on the website. I know there might be a bottleneck
> if services are remote, but this is just a test and I don't know why
> the performance is so poor.
>

Re: the performance of UIMA AS

Posted by LinTong <pc...@gmail.com>.
Hi

Thanks very much for help!

To Jörn
Following your suggestion, I enlarged my sample. It now has approx.
10000 documents (about 10 MB). Regarding overhead, I observed some
strange behavior, which you can see at the bottom.

To Eddie
I have read all the documentation about UIMA AS I can find, including
the handbook "UIMA Asynchronous Scaleout", the readme, and the website.
I also tried the different performance parameters mentioned in the
handbook for tuning. Unfortunately, they do not help. Could you please
point me to other documents or tutorials I can follow? BTW, my approach
is similar to Fig. 4 on the website. I know there might be a bottleneck
if services are remote, but this is just a test and I don't know why
the performance is so poor.

To Marshall
About CPU usage:
My CPU has two cores. No matter which approach I use
(Deploy_MeetingDetectorTAE_3MeetingAnnotators,
Deploy_MeetingDetectorTAE or
Deploy_MeetingDetectorTAE_Sync_3Instances), the usage is always 100%.
Then I tried to scale it out. The behavior is really strange:

Computer A is local.
runRemoteAsyncAE with Deploy_MeetingDetectorTAE_RemoteRoomNumber and the
collection reader run on A.

Computer B is remote.
The service Deploy_RoomNumberAnnotator and the broker run on B.

Result: The process runs, but the performance is even worse than with
everything on the same computer. The CPU usage on both A and B is almost
zero. Is this because of a network bottleneck? (A and B are connected by LAN.)

Another strange thing:
If the broker is on A, the service on B is able to connect to the
broker, but the analysis process cannot run. It seems the client cannot
find the service.
*******************************
I know my questions are quite long and may sound stupid, but as a
newbie I have been confused by them for several weeks. Thanks so much
for all your help.


On Tue, May 18, 2010 at 3:57 PM, Marshall Schor <ms...@schor.com> wrote:
> Hi,
>
> Here are just a few general observations.
>
> A generally useful check: while the tests are running, examine the CPU %
> busy on the various machines being used. If it is not 100%, look for a
> bottleneck somewhere.
>
> If you're running on one machine, you will probably only see speedups
> if that machine is multi-core, or if the annotators do a lot of I/O.
> In your case the annotators do no I/O, so you would need a multi-core
> machine. Once you scale past the number of cores, I think no further
> speedup is possible for the main pipeline.
>
> I believe the timing measurements in your post are wall-clock measures,
> not CPU time.
>
> If you do manage to get scaleout, the overall performance in this case
> is probably going to be dictated by the rate at which your collection
> reader can send CASes into the pipeline.  In many of our tests, where
> we're deploying on a network of machines, we find that to load up the
> pipeline, we have to pre-read all the test CASes into memory, ahead of
> time, and then have the driver program send those as fast as it can, in
> order to create a reasonable load.
>
> HTH.   -Marshall

-- 
Best Regards
LinTong(Pierre)

Re: the performance of UIMA AS

Posted by Marshall Schor <ms...@schor.com>.
Hi,

Here are just a few general observations.

A generally useful check: while the tests are running, examine the CPU %
busy on the various machines being used. If it is not 100%, look for a
bottleneck somewhere.

If you're running on one machine, you will probably only see speedups
if that machine is multi-core, or if the annotators do a lot of I/O.
In your case the annotators do no I/O, so you would need a multi-core
machine. Once you scale past the number of cores, I think no further
speedup is possible for the main pipeline.
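The core-count argument can be written as an idealized upper bound. This Java sketch assumes CPU-bound annotators with no I/O and ignores serialization entirely, so real speedups will be lower; on a two-core machine like LinTong's, three instances would be capped at 2x. The class name is hypothetical.

```java
// Idealized bound (assumption): CPU-bound annotators, no I/O, no serialization cost.
class CoreBound {

    // N parallel instances on C cores cannot beat min(N, C) times one instance.
    static int maxSpeedup(int instances, int cores) {
        if (instances < 1 || cores < 1) {
            throw new IllegalArgumentException("instances and cores must be >= 1");
        }
        return Math.min(instances, cores);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        for (int instances = 1; instances <= 4; instances++) {
            System.out.println(instances + " instance(s) on " + cores
                    + " core(s): at most " + maxSpeedup(instances, cores) + "x");
        }
    }
}
```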

I believe the timing measurements in your post are wall-clock measures,
not CPU time.

If you do manage to get scaleout, the overall performance in this case
is probably going to be dictated by the rate at which your collection
reader can send CASes into the pipeline.  In many of our tests, where
we're deploying on a network of machines, we find that to load up the
pipeline, we have to pre-read all the test CASes into memory, ahead of
time, and then have the driver program send those as fast as it can, in
order to create a reasonable load.
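The load-driving approach described above can be sketched in Java under stated assumptions: the documents are generated placeholders, and the sink is a plain queue standing in for whatever actually submits CASes (for example an asynchronous client). The sketch only shows the separation between an untimed read phase and a timed send phase; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Sketch of a load driver: do all reading up front, then send as fast as possible.
// The `sink` is a hypothetical stand-in for whatever submits a CAS to the pipeline.
class PreloadingDriver {

    // Reads every document into memory before any timing starts.
    static List<String> preload(Iterable<String> source) {
        List<String> docs = new ArrayList<>();
        for (String doc : source) {
            docs.add(doc);
        }
        return docs;
    }

    // Pushes all preloaded documents to the sink; returns elapsed nanoseconds.
    static long drive(List<String> docs, Consumer<String> sink) {
        long start = System.nanoTime();
        for (String doc : docs) {
            sink.accept(doc);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            source.add("document text " + i); // placeholder documents
        }
        List<String> docs = preload(source);
        BlockingQueue<String> pipelineInput = new ArrayBlockingQueue<>(docs.size());
        long nanos = drive(docs, pipelineInput::add);
        System.out.printf("sent %d documents in %.3f ms%n", pipelineInput.size(), nanos / 1e6);
    }
}
```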

HTH.   -Marshall 


Re: the performance of UIMA AS

Posted by Eddie Epstein <ea...@gmail.com>.
The scaling parameters of the CPM are the number of pipeline threads
and the number of CASes available to feed those threads. Normally
the number of CASes should be at least the number of pipelines
plus 2: one extra for the final CasConsumer thread and another
for the CollectionReader thread. Additional CASes might help
if the CollectionReader is slow.
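The rule of thumb above, as a tiny Java helper. The casPoolSize and processingUnitThreadCount attribute names in the printed fragment are, to my knowledge, where a CPE descriptor sets these values on its casProcessors element, but treat them as an assumption and check the CPE descriptor reference. The extra headroom for a slow reader is a hypothetical value.

```java
// Rule of thumb from the thread: CAS pool >= pipeline threads + 2
// (one CAS for the CasConsumer thread, one for the CollectionReader thread).
class CpmPoolSizing {

    static int minCasPoolSize(int pipelineThreads) {
        if (pipelineThreads < 1) {
            throw new IllegalArgumentException("need at least one pipeline thread");
        }
        return pipelineThreads + 2;
    }

    public static void main(String[] args) {
        int threads = 3;
        int slowReaderHeadroom = 2; // hypothetical extra CASes if the reader is slow
        int pool = minCasPoolSize(threads) + slowReaderHeadroom;
        // Attribute names assumed from the CPE descriptor format; verify before use.
        System.out.printf("<casProcessors casPoolSize=\"%d\" processingUnitThreadCount=\"%d\">%n",
                pool, threads);
    }
}
```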

Is your CPM running entirely on the same computer, or does it use
remote Vinci services for some components?

Eddie

On Tue, May 25, 2010 at 1:53 PM, Sean <mu...@mayo.edu> wrote:
>
> Do you happen to know if this scaling would apply to the CPM version of UIMA?
> This difference may be inherent to AS vs. CPM, but I wanted to know if there
> is an equivalent of front loading in the latter case. If not, do you have any
> other suggestions for potentially increasing the CPM throughput?
>      Thanks,
>                     ~Sean
>
>
>

Re: the performance of UIMA AS

Posted by Sean <mu...@mayo.edu>.
Do you happen to know if this scaling would apply to the CPM version of UIMA?
This difference may be inherent to AS vs. CPM, but I wanted to know if there
is an equivalent of front loading in the latter case. If not, do you have any
other suggestions for potentially increasing the CPM throughput?
      Thanks,
                     ~Sean



Re: the performance of UIMA AS

Posted by Jörn Kottmann <ko...@gmail.com>.

My performance tests showed that UIMA AS scales nicely with the
number of additional services. In our test system each service
runs on a dedicated medium-sized server, where it reaches almost
full CPU utilization.
Compared to your tests, I see two differences: we usually run
load tests with at least a few hundred thousand documents, up to a few
million, and our AEs are much more compute-intensive than the
Meeting Detector or Room Number AE. That's why I never noticed the
overhead caused by UIMA AS.
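This observation fits a simple idealized model, sketched here in Java. It assumes each service runs on its own machine and the only shared limit is the client feed rate. The overhead and feed-rate values are hypothetical, chosen only to contrast a light AE (1 ms per CAS) with a heavy one (100 ms per CAS): both scale with the number of services until the feed rate caps them, but the share of each service's time lost to overhead differs enormously.

```java
// Idealized model (assumption): each service runs on its own machine and
// handles one CAS per (analysis + overhead) ms; the only shared limit is
// how fast the client can feed CASes in.
class ServiceScaling {

    static double casesPerSecond(int services, double analysisMs, double overheadMs,
                                 double feedRatePerSec) {
        double perService = 1000.0 / (analysisMs + overheadMs);
        return Math.min(services * perService, feedRatePerSec);
    }

    public static void main(String[] args) {
        double overheadMs = 2.0;   // hypothetical per-CAS serialization overhead
        double feedRate = 5000.0;  // hypothetical client feed rate, CASes per second
        for (double analysisMs : new double[] {1.0, 100.0}) { // light vs. heavy AE
            for (int services = 1; services <= 8; services *= 2) {
                System.out.printf("analysis %5.1f ms, %d service(s): %7.1f CAS/s, overhead share %.0f%%%n",
                        analysisMs, services,
                        casesPerSecond(services, analysisMs, overheadMs, feedRate),
                        100.0 * overheadMs / (analysisMs + overheadMs));
            }
        }
    }
}
```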

Jörn