You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cayenne.apache.org by Wernke zur Borg <w....@rheagroup.com> on 2012/01/24 17:11:59 UTC

Performance question

Dear list,

I have another question. My application converts a huge MS Access 
database to a set of XML files by traversing some really big tables with 
Select queries and also querying related tables to collect the 
information needed for the output. Performance is horribly bad, so I did 
some profiling and found that it spends 75% of its CPU time in 
org.apache.cayenne.access.ExternalTransaction.close() and another 21% in 
org.apache.cayenne.conn.DriverDataSource.getConnection(). Only 2% are 
spent in SelectAction.performAction().

So it appears that it opens and closes a new connection for every single 
select query. I must be doing something wrong here! Where can I tell it 
to re-use connections? Or make one big transaction as it is read-only 
anyway? Any other hints? I hope I can reduce the processing time to 4% 
of what it is now...

Thanks,
Wernke





This email (including any attachments) may contain confidential and/or privileged
information or information otherwise protected from disclosure.  If you are not the
intended recipient, please notify the sender immediately, do not copy this message
or any attachments and do not use it or any purpose or disclose its content to any
person, but delete this message and any attachments from your system.
RHEA System S.A. (RHEA) disclaims any and all liability if this email transmission
was virus corrupted, altered or falsified.

If a Non Disclosure Agreement (NDA) exists between RHEA and the receiving
organization, then all contents shall be considered proprietary and covered under
the NDA.



RE: Performance question

Posted by "Durchholz, Joachim" <Jo...@hennig-fahrzeugteile.de>.
> I have another question. My application converts a huge
> MS Access database to a set of XML files by traversing
> some really big tables with Select queries and also
> querying related tables to collect the information
> needed for the output. Performance is horribly bad,
> so I did some profiling and found that it spends 75%
> of its CPU time in
> org.apache.cayenne.access.ExternalTransaction.close()
> and another 21% in
> org.apache.cayenne.conn.DriverDataSource.getConnection().

Phew, that's bad.
Is the connection pool active?

Re: Performance question

Posted by Wernke zur Borg <w....@rheagroup.com>.
On 2012-01-24 17:17, Durchholz, Joachim wrote:
> Is the connection pool active?
Connection pool settings do not change the behaviour, I tried various 
values for min/max.


On 2012-01-24 21:39, Robert Zeigler wrote:
> Are you using query prefetching?
Adding prefetching does not change the behaviour.

Any further hints will be appreciated, thanks!

> On Jan 24, 2012, at 1/2410:11 AM , Wernke zur Borg wrote:
>
>> Dear list,
>>
>> I have another question. My application converts a huge MS Access
>> database to a set of XML files by traversing some really big tables with
>> Select queries and also querying related tables to collect the
>> information needed for the output. Performance is horribly bad, so I did
>> some profiling and found that it spends 75% of its CPU time in
>> org.apache.cayenne.access.ExternalTransaction.close() and another 21% in
>> org.apache.cayenne.conn.DriverDataSource.getConnection(). Only 2% are
>> spent in SelectAction.performAction().
>>
>> So it appears that it opens and closes a new connection for every single
>> select query. I must be doing something wrong here! Where can I tell it
>> to re-use connections? Or make one big transaction as it is read-only
>> anyway? Any other hints? I hope I can reduce the processing time to 4%
>> of what it is now...
>>
>> Thanks,
>> Wernke
>>



This email (including any attachments) may contain confidential and/or privileged
information or information otherwise protected from disclosure.  If you are not the
intended recipient, please notify the sender immediately, do not copy this message
or any attachments and do not use it or any purpose or disclose its content to any
person, but delete this message and any attachments from your system.
RHEA System S.A. (RHEA) disclaims any and all liability if this email transmission
was virus corrupted, altered or falsified.

If a Non Disclosure Agreement (NDA) exists between RHEA and the receiving
organization, then all contents shall be considered proprietary and covered under
the NDA.



Re: Performance question

Posted by Robert Zeigler <ro...@roxanemy.com>.
Are you using query prefetching?

Robert

On Jan 24, 2012, at 1/2410:11 AM , Wernke zur Borg wrote:

> 
> Dear list,
> 
> I have another question. My application converts a huge MS Access 
> database to a set of XML files by traversing some really big tables with 
> Select queries and also querying related tables to collect the 
> information needed for the output. Performance is horribly bad, so I did 
> some profiling and found that it spends 75% of its CPU time in 
> org.apache.cayenne.access.ExternalTransaction.close() and another 21% in 
> org.apache.cayenne.conn.DriverDataSource.getConnection(). Only 2% are 
> spent in SelectAction.performAction().
> 
> So it appears that it opens and closes a new connection for every single 
> select query. I must be doing something wrong here! Where can I tell it 
> to re-use connections? Or make one big transaction as it is read-only 
> anyway? Any other hints? I hope I can reduce the processing time to 4% 
> of what it is now...
> 
> Thanks,
> Wernke
> 
> 
> 
> 
> 
> This email (including any attachments) may contain confidential and/or privileged
> information or information otherwise protected from disclosure.  If you are not the
> intended recipient, please notify the sender immediately, do not copy this message
> or any attachments and do not use it or any purpose or disclose its content to any
> person, but delete this message and any attachments from your system.
> RHEA System S.A. (RHEA) disclaims any and all liability if this email transmission
> was virus corrupted, altered or falsified.
> 
> If a Non Disclosure Agreement (NDA) exists between RHEA and the receiving
> organization, then all contents shall be considered proprietary and covered under
> the NDA.
> 
> 


Re: Performance question

Posted by Wernke zur Borg <w....@rheagroup.com>.
Hi Jo,

thanks for your valuable explanations. I also found my error w.r.t. 
Connection.close(), that's why I got the idea with the one big 
transaction. I do not see why a single reading application with no 
concurrent writers should use multiple transactions anyway. Of course we 
would have to rethink/redesign this in case concurrent writers would 
eventually be added. But this is a simple offline tool, so there is 
currently no danger.

And I can assure you, I will not reinvent the connection pooling of 
DataSource. Perhaps I'll find the time to investigate the issue why the 
pooling apparently did not work for me. I will let you know.

Thanks again for your contributions to the list.

Wernke


On 2012-01-25 14:21, Durchholz, Joachim wrote:
> It closes the /transaction/, not the /connection/ (it's 
> /ExternalTransaction/.close). You fixed that for now by putting 
> everything into a single transaction, but I suppose it will come back 
> to bite you once business logic dictates many small transactions 
> (maybe in a later project, so this may not be relevant for you right 
> now - it's just something to keep in mind).
> There's no good reason why a getConnection from a connection pool 
> should take up more time than running any SQL statement.
> Either the connection isn't taken from the connection pool. Which 
> makes me suspect the connection pool isn't active - maybe something is 
> misconfigured.
> Or Cayenne's getConnection is unreasonably slow.
> It would be nice to know what actually caused the problem.
> N.B. writing your own DataSource that reuses connections would be just 
> a reinvention of connection pooling. You don't want to do that - 
> connection pooling is far more involved than meets the eye; doing it 
> well will certainly make you go over the time budget, and doing it 
> not-so-well is a waste of time. (The issues I'm aware of are: keeping 
> connections open in advance to reduce latency, which means opening 
> connections in a separate thread and avoiding race conditions; dealing 
> with connection loss due to networking outages; dealing with 
> connection loss due to server-side time-outs after periods of 
> inactivity; avoiding unnecessary work in a transaction after the 
> connection was gone; distinguishing whether to silently reestablish 
> connection or report back a failed connection, that despends on 
> whether you actually had any SQL activity after establishing the 
> connection. There might be more.)
> Regards,
> Jo
>
> ------------------------------------------------------------------------
> *From:* Wernke zur Borg [mailto:w.zurborg@rheagroup.com]
> *Sent:* Wednesday, January 25, 2012 10:02 AM
> *To:* user@cayenne.apache.org
> *Subject:* Re: Performance question
>
>
> To give you a picture here is a screenshot of the profiler showing 
> that one connection is opened and closed for every query.
>
>
>
> The question I do not understand is why does DataNode.performQueries() 
> unconditionally close the current connection?
>
> Otherwise I could probably provide a custom DataSource class 
> overriding getConnection() to re-use connections.
>
> Wernke 


-- 
/Wernke zur Borg
*Rhea System S.A.*
Robert-Bosch-Str. 7
64293 Darmstadt / Germany
Tel. +49-6151-8709591/


This email (including any attachments) may contain confidential and/or privileged
information or information otherwise protected from disclosure.  If you are not the
intended recipient, please notify the sender immediately, do not copy this message
or any attachments and do not use it or any purpose or disclose its content to any
person, but delete this message and any attachments from your system.
RHEA System S.A. (RHEA) disclaims any and all liability if this email transmission
was virus corrupted, altered or falsified.

If a Non Disclosure Agreement (NDA) exists between RHEA and the receiving
organization, then all contents shall be considered proprietary and covered under
the NDA.


RE: Performance question

Posted by "Durchholz, Joachim" <Jo...@hennig-fahrzeugteile.de>.
It closes the transaction, not the connection (it's ExternalTransaction.close). You fixed that for now by putting everything into a single transaction, but I suppose it will come back to bite you once business logic dictates many small transactions (maybe in a later project, so this may not be relevant for you right now - it's just something to keep in mind).
 
There's no good reason why a getConnection from a connection pool should take up more time than running any SQL statement.
Either the connection isn't taken from the connection pool. Which makes me suspect the connection pool isn't active - maybe something is misconfigured.
Or Cayenne's getConnection is unreasonably slow.
It would be nice to know what actually caused the problem.
 
N.B. writing your own DataSource that reuses connections would be just a reinvention of connection pooling. You don't want to do that - connection pooling is far more involved than meets the eye; doing it well will certainly make you go over the time budget, and doing it not-so-well is a waste of time. (The issues I'm aware of are: keeping connections open in advance to reduce latency, which means opening connections in a separate thread and avoiding race conditions; dealing with connection loss due to networking outages; dealing with connection loss due to server-side time-outs after periods of inactivity; avoiding unnecessary work in a transaction after the connection was gone; distinguishing whether to silently reestablish connection or report back a failed connection, that despends on whether you actually had any SQL activity after establishing the connection. There might be more.)
 
Regards,
Jo

________________________________

From: Wernke zur Borg [mailto:w.zurborg@rheagroup.com] 
Sent: Wednesday, January 25, 2012 10:02 AM
To: user@cayenne.apache.org
Subject: Re: Performance question



To give you a picture here is a screenshot of the profiler showing that one connection is opened and closed for every query.

 

The question I do not understand is why does DataNode.performQueries() unconditionally close the current connection?

Otherwise I could probably provide a custom DataSource class overriding getConnection() to re-use connections.

Wernke

Re: Performance question [Problem solved!]

Posted by Wernke zur Borg <w....@rheagroup.com>.
I solved the problem by wrapping the entire processing into one 
transaction, following "User-Defined Transaction Scope" in 
http://cayenne.apache.org/doc/understanding-transactions.html.

         // embed the whole thing in one single transaction
         DataDomain d = getRuntime().getDataDomain();
         Transaction tx = d.createTransaction();
         Transaction.bindThreadTransaction(tx);

         try {
                 // do all processing here
         } finally {
             Transaction.bindThreadTransaction(null);
         }

No need to worry about rollbacks as everything is read-only.

 >>> Processing time has gone down from one hour to one minute !!!

Thanks to everybody,
Wernke


On 2012-01-25 10:01, Wernke zur Borg wrote:
>
> To give you a picture here is a screenshot of the profiler showing 
> that one connection is opened and closed for every query.
>
>
>
> The question I do not understand is why does DataNode.performQueries() 
> unconditionally close the current connection?
>
> Otherwise I could probably provide a custom DataSource class 
> overriding getConnection() to re-use connections.
>
> Wernke
>
>
> On 2012-01-24 17:11, Wernke zur Borg wrote:
>> Dear list,
>>
>> I have another question. My application converts a huge MS Access
>> database to a set of XML files by traversing some really big tables with
>> Select queries and also querying related tables to collect the
>> information needed for the output. Performance is horribly bad, so I did
>> some profiling and found that it spends 75% of its CPU time in
>> org.apache.cayenne.access.ExternalTransaction.close() and another 21% in
>> org.apache.cayenne.conn.DriverDataSource.getConnection(). Only 2% are
>> spent in SelectAction.performAction().
>>
>> So it appears that it opens and closes a new connection for every single
>> select query. I must be doing something wrong here! Where can I tell it
>> to re-use connections? Or make one big transaction as it is read-only
>> anyway? Any other hints? I hope I can reduce the processing time to 4%
>> of what it is now...
>>
>> Thanks,
>> Wernke
>>



This email (including any attachments) may contain confidential and/or privileged
information or information otherwise protected from disclosure.  If you are not the
intended recipient, please notify the sender immediately, do not copy this message
or any attachments and do not use it or any purpose or disclose its content to any
person, but delete this message and any attachments from your system.
RHEA System S.A. (RHEA) disclaims any and all liability if this email transmission
was virus corrupted, altered or falsified.

If a Non Disclosure Agreement (NDA) exists between RHEA and the receiving
organization, then all contents shall be considered proprietary and covered under
the NDA.


Re: Performance question

Posted by Wernke zur Borg <w....@rheagroup.com>.
To give you a picture here is a screenshot of the profiler showing that 
one connection is opened and closed for every query.



The question I do not understand is why does DataNode.performQueries() 
unconditionally close the current connection?

Otherwise I could probably provide a custom DataSource class overriding 
getConnection() to re-use connections.

Wernke


On 2012-01-24 17:11, Wernke zur Borg wrote:
> Dear list,
>
> I have another question. My application converts a huge MS Access
> database to a set of XML files by traversing some really big tables with
> Select queries and also querying related tables to collect the
> information needed for the output. Performance is horribly bad, so I did
> some profiling and found that it spends 75% of its CPU time in
> org.apache.cayenne.access.ExternalTransaction.close() and another 21% in
> org.apache.cayenne.conn.DriverDataSource.getConnection(). Only 2% are
> spent in SelectAction.performAction().
>
> So it appears that it opens and closes a new connection for every single
> select query. I must be doing something wrong here! Where can I tell it
> to re-use connections? Or make one big transaction as it is read-only
> anyway? Any other hints? I hope I can reduce the processing time to 4%
> of what it is now...
>
> Thanks,
> Wernke
>



This email (including any attachments) may contain confidential and/or privileged
information or information otherwise protected from disclosure.  If you are not the
intended recipient, please notify the sender immediately, do not copy this message
or any attachments and do not use it or any purpose or disclose its content to any
person, but delete this message and any attachments from your system.
RHEA System S.A. (RHEA) disclaims any and all liability if this email transmission
was virus corrupted, altered or falsified.

If a Non Disclosure Agreement (NDA) exists between RHEA and the receiving
organization, then all contents shall be considered proprietary and covered under
the NDA.