
Posted to user@uima.apache.org by Julien Nioche <li...@gmail.com> on 2008/08/13 16:03:50 UTC

running aggregate engine within CPE and client code

Hi,

I am slightly puzzled by the following case. I have integrated an aggregate
engine into my code in a very straightforward way:

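 // reset the tcas for the next document
 tcas.reset();

 // read the whole file (uses java.io.DataInputStream); a single read()
 // call may return fewer bytes than requested, readFully() does not
 byte[] contents = new byte[(int) target.length()];
 DataInputStream in = new DataInputStream(
     new BufferedInputStream(new FileInputStream(target)));
 try {
   in.readFully(contents);
 } finally {
   in.close();
 }

 // decoded with the platform default charset
 String document = new String(contents);

 tcas.setDocumentText(document);
 tcas.setDocumentLanguage("en");

 // run the aggregate engine on this document
 controller.process(tcas);
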
Using the aggregate engine from the CPM is more than 10x faster than my
client code; both are running in a single thread. I profiled my application
and found that the slowest part is

87.9% - 50,781 ms
org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process

i.e. the time is not spent in other parts of my code but in the process()
method.

I get a similar difference even when setting casPoolSize="1" in my CPE
descriptor. Needless to say, I'd like to get the same type of
performance in both cases. Any idea what might be the cause?

Thanks

Julien

-- 
DigitalPebble Ltd
http://www.digitalpebble.com

Re: running aggregate engine within CPE and client code

Posted by Julien Nioche <li...@gmail.com>.

Hi guys,

As is often the case in situations like this, I suspect that the source of
the problem is in my code. The piece of code that I gave earlier, which
runs the engine, is actually embedded in a third-party library. I tried using
it outside that library in a very simple class and the results I get are
very close to those of the CPE, which is reassuring. What confuses me is
that the profiler gives very similar information for both scenarios, so I
still don't know why it is slower when I use it through that library. There
is no visible reason why it should be.

I know that the library uses the Java Plugin Framework and detects the jars
to use by itself (i.e. they are not explicitly set on the classpath). Maybe
it has something to do with the classloaders? I have no idea, but I have
contacted the author of that third-party library; we'll see if there is a
good explanation. The difference seems to be larger when processing small
documents; with larger docs there is almost no difference.
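
One way to check the classloader hypothesis is to print where a class was
loaded from and which loaders sit above it, once in the standalone class and
once inside the library, and then compare the two. The following is only a
minimal sketch: ClassLoaderSketch, loaderChain and codeLocation are
hypothetical names, not part of UIMA or the Java Plugin Framework, and the
class inspected in main() is a placeholder for whichever engine or annotator
class is of interest.

```java
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

/** Diagnostic helper (hypothetical): shows which classloaders are responsible for a class. */
final class ClassLoaderSketch {

    /** Returns the loader chain of a class, from its defining loader up to the bootstrap loader. */
    static List<String> loaderChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = type.getClassLoader(); cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap"); // the bootstrap loader is represented by null in the Java API
        return chain;
    }

    /** Returns the jar or directory a class was loaded from, or "unknown" when not available. */
    static String codeLocation(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "unknown";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Assumption: replace this placeholder with the engine or annotator class to inspect.
        Class<?> type = ClassLoaderSketch.class;
        System.out.println(type.getName() + " loaded from " + codeLocation(type));
        System.out.println("loader chain: " + loaderChain(type));
    }
}
```

A different chain or a different location in the two environments would not
prove that classloading causes the slowdown, but it would narrow down where
to look.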

Thanks for your help

Julien

2008/8/14 Julien Nioche <li...@gmail.com>

> they use the same uima-core.jar and the same version of the JRE
> (java-1.5.0-sun-1.5.0.15/), same JVM options, same aggregate engine.
>
> I see that there is an object called performanceTuningSettings but it
> does not seem to be used by the CPE in a special way, and even if it were,
> would that explain such a difference (10 times!)?
>
> J.

-- 
DigitalPebble Ltd
http://www.digitalpebble.com

Re: running aggregate engine within CPE and client code

Posted by Julien Nioche <li...@gmail.com>.

They use the same uima-core.jar and the same version of the JRE
(java-1.5.0-sun-1.5.0.15/), same JVM options, same aggregate engine.

I see that there is an object called performanceTuningSettings but it does
not seem to be used by the CPE in a special way, and even if it were, would
that explain such a difference (10 times!)?

J.


2008/8/14 Eddie Epstein <ea...@gmail.com>

> Julien,
>
> A UIMA aggregate is a single-threaded animal. Deploying an aggregate
> under UIMA AS offers the opportunity to deploy the delegates in
> separate threads.
>
> Are the classpaths different between the two scenarios? Different JRE?
>
> Eddie

-- 
DigitalPebble Ltd
http://www.digitalpebble.com

Re: running aggregate engine within CPE and client code

Posted by Eddie Epstein <ea...@gmail.com>.

Julien,

A UIMA aggregate is a single-threaded animal. Deploying an aggregate
under UIMA AS offers the opportunity to deploy the delegates in
separate threads.

Are the classpaths different between the two scenarios? Different JRE?

Eddie

On Thu, Aug 14, 2008 at 10:14 AM, Julien Nioche
<li...@gmail.com> wrote:
> Hi Eddie,
>
> Thank you for your message. Yes, the profiling includes everything in my
> client code, including the I/O.
>
> I checked that casPoolSize="1" in my CPM config file. Setting
> casPoolSize="3" in the config file makes virtually no difference, which
> means that (a) loading my 2000 documents in the same thread or in a separate
> one makes no difference or (b) this parameter is not taken into account at
> all.
>
> With an aggregate engine: is each primitive engine executed in a separate
> thread or is the whole aggregate done in the same thread?
>
> Thank you for your help
>
> Julien
>

Re: running aggregate engine within CPE and client code

Posted by Julien Nioche <li...@gmail.com>.

Hi Eddie,

Thank you for your message. Yes, the profiling includes everything in my
client code, including the I/O.
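
To answer the I/O question with numbers rather than profiler percentages,
the read step and the process step can be timed separately. This is a rough
sketch under assumptions: StageTimer is a hypothetical helper, and the reader
and processor lambdas in main() are stand-ins for the file-reading code and
the controller.process(tcas) call from the first message.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Accumulates the time spent reading documents and the time spent processing them. */
final class StageTimer {
    long readNanos;
    long processNanos;
    int documents;

    /** Runs reader then processor on every input, timing the two steps separately. */
    <I> void run(List<I> inputs, Function<I, String> reader, Consumer<String> processor) {
        for (I input : inputs) {
            long start = System.nanoTime();
            String text = reader.apply(input);
            long afterRead = System.nanoTime();
            processor.accept(text);
            long afterProcess = System.nanoTime();

            readNanos += afterRead - start;
            processNanos += afterProcess - afterRead;
            documents++;
        }
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        timer.run(Arrays.asList("a.txt", "b.txt"),
                  name -> "contents of " + name,   // stand-in for reading the file
                  text -> text.length());          // stand-in for controller.process(tcas)
        System.out.printf("documents: %d%n", timer.documents);
        System.out.printf("read:      %.3f ms%n", timer.readNanos / 1e6);
        System.out.printf("process:   %.3f ms%n", timer.processNanos / 1e6);
    }
}
```

Running the same timer around the client code and around a standalone class
would show whether the extra time is a fixed cost per document or grows with
document size.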

I checked that casPoolSize="1" in my CPM config file. Setting
casPoolSize="3" in the config file makes virtually no difference, which
means that (a) loading my 2000 documents in the same thread or in a separate
one makes no difference or (b) this parameter is not taken into account at
all.

With an aggregate engine: is each primitive engine executed in a separate
thread or is the whole aggregate done in the same thread?

Thank you for your help

Julien

2008/8/14 Eddie Epstein <ea...@gmail.com>

> Hi Julien,
>
> Using default settings, the CPM will run the collection reader in one
> thread, each processing pipeline in another, and finally another
> thread for the Cas consumers. These threads can only run concurrently
> if there are enough CASes. A Cas pool size of 1 limits all work to one
> thread at a time.
>
> Does your profile take into account the I/O time reading the documents?
>
> Eddie

-- 
DigitalPebble Ltd
http://www.digitalpebble.com

Re: running aggregate engine within CPE and client code

Posted by Eddie Epstein <ea...@gmail.com>.

Hi Julien,

Using default settings, the CPM will run the collection reader in one
thread, each processing pipeline in another, and finally another
thread for the CAS consumers. These threads can only run concurrently
if there are enough CASes. A CAS pool size of 1 limits all work to one
thread at a time.
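
The effect of the pool size can be pictured with a toy model that does not
use any UIMA classes. In the sketch below, FakeCas, CasPoolSketch and run()
are invented names; a reader thread must take a buffer from a fixed pool
before it can hand a document to the processing thread, and the buffer is
only returned once processing has finished. With a pool of one, the two
threads are forced to take turns.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model: a reader thread and a processing thread sharing a fixed pool of buffers. */
final class CasPoolSketch {

    /** Stand-in for a CAS; this is not a UIMA class. */
    static final class FakeCas {
        String text;
    }

    /**
     * Pushes docCount documents through the two stages and returns the largest
     * number of buffers that were checked out of the pool at the same time.
     */
    static int run(int poolSize, int docCount) {
        BlockingQueue<FakeCas> pool = new ArrayBlockingQueue<>(poolSize);
        BlockingQueue<FakeCas> work = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.add(new FakeCas());
        }
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();

        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < docCount; i++) {
                    FakeCas cas = pool.take();          // blocks while every buffer is in use
                    maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    cas.text = "document " + i;
                    work.put(cas);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread processor = new Thread(() -> {
            try {
                for (int i = 0; i < docCount; i++) {
                    FakeCas cas = work.take();
                    cas.text = cas.text.toUpperCase();  // pretend analysis
                    inFlight.decrementAndGet();
                    pool.put(cas);                      // release the buffer for the next document
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        processor.start();
        try {
            reader.join();
            processor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        }
        return maxInFlight.get();
    }

    public static void main(String[] args) {
        System.out.println("pool of 1, max buffers in use: " + run(1, 100));
        System.out.println("pool of 3, max buffers in use: " + run(3, 100));
    }
}
```

With a pool of one, the reported maximum is always 1; with a pool of three it
can be anything from 1 to 3 depending on how the threads are scheduled.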

Does your profile take into account the I/O time reading the documents?

Eddie
