Posted to user@cayenne.apache.org by Marcin Skladaniec <ma...@ish.com.au> on 2007/06/22 02:35:13 UTC

paged query slow when fetching big lists

Hi
Recently we found that fetching a list of 100,000 records using
ROP with paging and no cache takes a long time, about 50 seconds in
our case. We profiled the CPU usage, and the result shows that
99% of the time is spent in IncrementalFaultList, within the fillIn()
method.

The fillIn method works (in my opinion) in a somewhat strange fashion:
it executes the query at once, stores the query result in a
java.sql.ResultSet, and then iterates through the result, creating
either the whole DataRow or just an ObjectId. If needed, the
DataRows are faulted at the end of the method.
From our testing it emerged that this bit of code:

while (it.hasNextRow()) {
	elements.add(it.nextObjectId(entity));
}

is where all the time is spent. Each iteration of this loop takes
about 0.5 ms, which multiplied by 100,000 comes to almost 50 seconds.
The nextObjectId method consists of two parts: fetching the next result
from the ResultSet and creating an ObjectId, but I was unable to check
which one takes most of the time. In any case I think this approach is
somewhat wrong, since 99% of the records will always be fetched as
ObjectIds and never faulted, so my ideas to enhance this are:
* fetch only PK columns and create all ObjectIds at once, getting rid of
the iterating process if possible
* use the already existing method resolveInterval() to fault the required
range of records
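The first idea, building all the ObjectIds in one pass over a PK-only result, amounts to something like the following standalone sketch. The ObjectId record and the long[] are stand-ins for Cayenne's ObjectId class and a PK-only ResultSet; this is an illustration, not actual Cayenne code:

```java
import java.util.ArrayList;
import java.util.List;

public class PkOnlyFetchSketch {

    // Stand-in for org.apache.cayenne.ObjectId: an entity name plus a PK value.
    public record ObjectId(String entityName, long pk) {}

    // Build the full list of ObjectIds from a PK-only result in one pass,
    // pre-sizing the list so it never has to grow.
    public static List<ObjectId> buildIds(String entityName, long[] pks) {
        List<ObjectId> ids = new ArrayList<>(pks.length);
        for (long pk : pks) {
            ids.add(new ObjectId(entityName, pk));
        }
        return ids;
    }

    public static void main(String[] args) {
        List<ObjectId> ids = buildIds("Student", new long[] {1L, 2L, 3L});
        System.out.println(ids.size());      // 3
        System.out.println(ids.get(0).pk()); // 1
    }
}
```

Since only one narrow column crosses the wire, the per-row cost should drop well below the 0.5 ms measured for full rows.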
If the creation of ObjectIds and the reading of results from the
ResultSet cannot be sped up (because it simply has to happen,
regardless of how it is done), the only choice will be to implement
some more complex solution using the SQL LIMIT statement.
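For reference, a LIMIT-based solution would map a page index straight to SQL, roughly like this (a hypothetical sketch with an assumed STUDENT table; this is not what Cayenne does today):

```java
public class LimitPagingSketch {

    // Translate a zero-based page index into a LIMIT/OFFSET query,
    // so only one page of rows ever crosses the wire.
    public static String pageQuery(String table, int pageIndex, int pageSize) {
        int offset = pageIndex * pageSize;
        return "SELECT * FROM " + table
                + " LIMIT " + pageSize + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // Page 2 with a page size of 10 starts at row 20.
        System.out.println(pageQuery("STUDENT", 2, 10));
    }
}
```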

I would like to mention that we are using some DataContext decorators  
and life-cycle callbacks, but I don't believe those are important  
factors in this case.

Whatever the solution is, I think it is pretty crucial that it be
implemented soon, since the usability of ROP without fast paging
is rather low.

With regards
Marcin
-------------------------->
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001   fax +61 2 9550 4001




Re: paged query slow when fetching big lists

Posted by Andrus Adamchik <an...@objectstyle.org>.
Hi Marcin,

> * fetch only Pk columns and create all ObjectIds at once, get rid  
> of the iterating process if possible
> * use already existing method resolveInterval() to fault the  
> required range of records

This strategy was discussed in the May thread with Ari (the one that  
Michael Gentry mentioned). My vote is +0, meaning that before we make  
this change, I want to confirm first that it has a visible impact on  
performance. Could you possibly make such change locally and see if  
it helps? (Look at SelectQuery.addCustomDbAttribute() to only include  
PK; if you have problems making this change, ping me via the dev list  
- I'll try my best to help).


> If the creation of ObjectId and getting the results from ResultSet  
> cannot be speed up (because it simply has to happen, and it does  
> not depend on the way it is done), the only choice will be to  
> implement some more complex solution using sql LIMIT statement.

I'd love to avoid that, as the data you get may well be different
the next time you resolve a page, so you may end up with duplicates
or skipped records. If we ever go this way, we'll probably need to
make it a user choice (use LIMIT vs. IncrementalFaultList).
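The duplicate/skip risk Andrus describes can be shown with a tiny standalone simulation, nothing Cayenne-specific: if another client inserts a row between two page fetches, the second LIMIT/OFFSET page re-reads a row from the first.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitDriftSketch {

    // Simulate "SELECT ... LIMIT pageSize OFFSET offset" against a live table.
    public static List<Integer> fetchPage(List<Integer> table, int offset, int pageSize) {
        int end = Math.min(offset + pageSize, table.size());
        return new ArrayList<>(table.subList(Math.min(offset, end), end));
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>(List.of(10, 20, 30, 40));

        List<Integer> page0 = fetchPage(table, 0, 2); // [10, 20]
        table.add(0, 5);                              // another client inserts a row
        List<Integer> page1 = fetchPage(table, 2, 2); // [20, 30]: 20 is seen twice

        System.out.println(page1.get(0)); // 20, a duplicate of page0's last row
    }
}
```

An IncrementalFaultList avoids this by pinning the full id list up front, which is exactly why it pays the one-time fillIn cost.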

Andrus





Re: paged query slow when fetching big lists

Posted by Michael Gentry <bl...@gmail.com>.
Not to pick on your iMac, which is a nice machine, but the PowerPC
line was notoriously bad at running Java (G5 better than the G4,
obviously).  If you tried this on an Intel box, I suspect your speed
improvement would be even better (I'm guessing a good portion of your
10 seconds is in the Java GUI).  When I went from a G4 PowerBook to a
MacBook Pro, I saw over an order of magnitude of improvement.  Eclipse
actually became usable instead of a lurching monster.  (Really,
Andrus, I don't know how you got so much Java work done on your old
PowerBook!)  I probably shouldn't ramble before I wake up in the
morning ... :-)

/dev/mrg


On 6/28/07, Aristedes Maniatis <ar...@maniatis.org> wrote:
> A final followup on these changes. We've done testing both single
> user and three-tier and the speed improvement is now consistent
> throughout. With the following setup:
>
> * 1.6Ghz iMac G5
> * Derby database
> * 120,000 records
> * all Cayenne caching is off

Re: paged query slow when fetching big lists

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 28/06/2007, at 6:10 PM, Andrus Adamchik wrote:

> Fixed. Now the things are fast on the client as well.


A final followup on these changes. We've done testing both single  
user and three-tier and the speed improvement is now consistent  
throughout. With the following setup:

* 1.6Ghz iMac G5
* Derby database
* 120,000 records
* all Cayenne caching is off

We perform a query which returns 120,000 records and get back a list  
in three-tier, which is then used as the table model of a JTable and  
displayed on screen. Timings are for the list to be drawn onscreen  
(only the first page of 30 records is faulted), so many other  
factors (Swing, network speed, Hessian serialising, etc.) are in play.

* before changes: 18-26 seconds
* after changes: 10 seconds (very consistent)


Naturally we can improve the application speed by not showing the  
user 120,000 records in one list, but it is nice to know that a user  
choosing "find all" will not hang the application for very long.

Also, it is worth noting that in our preliminary tests, moving the  
client to another similar-spec machine on a 10/100 network resulted  
in exactly the same 10-second timing. We will continue to profile our  
app and see where the next most important bottleneck is, but I think  
these changes have been extremely effective. Thanks Andrus.


Ari Maniatis


-------------------------->
Aristedes Maniatis
phone +61 2 9660 9700
PGP fingerprint 08 57 20 4B 80 69 59 E2  A9 BF 2D 48 C2 20 0C C8



Re: AW: paged query slow when fetching big lists

Posted by Andrus Adamchik <an...@objectstyle.org>.
To be fair, this depends on the kind of issue and on the user's effort  
to prove to the developers that a given fix is actually going to make  
things better.

Andrus


On Jun 28, 2007, at 11:18 AM, Peter Schröder wrote:
> the speed of fixing things is amazing... I wish that Tapestry  
> had such support!
>
> -----Original Message-----
> From: Andrus Adamchik [mailto:andrus@objectstyle.org]
> Sent: Thursday, 28 June 2007 10:11
> To: user@cayenne.apache.org
> Subject: Re: paged query slow when fetching big lists
>
> Fixed. Now the things are fast on the client as well.
>
> Andrus
>
> On Jun 28, 2007, at 12:18 AM, Andrus Adamchik wrote:
>
>>
>> On Jun 27, 2007, at 6:38 AM, Marcin Skladaniec wrote:
>>
>>> There is one but: fix does work only for queries executed on
>>> server, when I executed the query on (ROP) client, the query takes
>>> the same amount of time ! Is it possible that the remote calls are
>>> using a different constructor ? or maybe the
>>> isFetchingCustomAttributes() returns true for 'remote'
>>> SelectQueries, and therefore the constructor works as before ?
>>
>> Doh! Yeah, I found the reason - client wraps paginated SelectQuery
>> in a IncrementalQuery wrapper (needed for server-side caching), and
>> the optimization code inside IncrementalFaultList is hardcoded to
>> look for SelectQuery. I don't have an immediate solution - I have
>> to think about it a bit, and test a few possibilities. But at least
>> we've identified the problem.
>>
>> Andrus
>>
>>
>
>


AW: paged query slow when fetching big lists

Posted by Peter Schröder <Pe...@freenet-ag.de>.
the speed of fixing things is amazing... I wish that Tapestry had such support!

-----Original Message-----
From: Andrus Adamchik [mailto:andrus@objectstyle.org]
Sent: Thursday, 28 June 2007 10:11
To: user@cayenne.apache.org
Subject: Re: paged query slow when fetching big lists

Fixed. Now the things are fast on the client as well.

Andrus

On Jun 28, 2007, at 12:18 AM, Andrus Adamchik wrote:

>
> On Jun 27, 2007, at 6:38 AM, Marcin Skladaniec wrote:
>
>> There is one but: fix does work only for queries executed on  
>> server, when I executed the query on (ROP) client, the query takes  
>> the same amount of time ! Is it possible that the remote calls are  
>> using a different constructor ? or maybe the  
>> isFetchingCustomAttributes() returns true for 'remote'  
>> SelectQueries, and therefore the constructor works as before ?
>
> Doh! Yeah, I found the reason - client wraps paginated SelectQuery  
> in a IncrementalQuery wrapper (needed for server-side caching), and  
> the optimization code inside IncrementalFaultList is hardcoded to  
> look for SelectQuery. I don't have an immediate solution - I have  
> to think about it a bit, and test a few possibilities. But at least  
> we've identified the problem.
>
> Andrus
>
>


Re: paged query slow when fetching big lists

Posted by Andrus Adamchik <an...@objectstyle.org>.
Fixed. Now things are fast on the client as well.

Andrus

On Jun 28, 2007, at 12:18 AM, Andrus Adamchik wrote:

>
> On Jun 27, 2007, at 6:38 AM, Marcin Skladaniec wrote:
>
>> There is one but: fix does work only for queries executed on  
>> server, when I executed the query on (ROP) client, the query takes  
>> the same amount of time ! Is it possible that the remote calls are  
>> using a different constructor ? or maybe the  
>> isFetchingCustomAttributes() returns true for 'remote'  
>> SelectQueries, and therefore the constructor works as before ?
>
> Doh! Yeah, I found the reason - client wraps paginated SelectQuery  
> in a IncrementalQuery wrapper (needed for server-side caching), and  
> the optimization code inside IncrementalFaultList is hardcoded to  
> look for SelectQuery. I don't have an immediate solution - I have  
> to think about it a bit, and test a few possibilities. But at least  
> we've identified the problem.
>
> Andrus
>
>


Re: paged query slow when fetching big lists

Posted by Andrus Adamchik <an...@objectstyle.org>.
On Jun 27, 2007, at 6:38 AM, Marcin Skladaniec wrote:

> There is one but: fix does work only for queries executed on  
> server, when I executed the query on (ROP) client, the query takes  
> the same amount of time ! Is it possible that the remote calls are  
> using a different constructor ? or maybe the  
> isFetchingCustomAttributes() returns true for 'remote'  
> SelectQueries, and therefore the constructor works as before ?

Doh! Yeah, I found the reason - the client wraps a paginated SelectQuery  
in an IncrementalQuery wrapper (needed for server-side caching), and the  
optimization code inside IncrementalFaultList is hardcoded to look  
for SelectQuery. I don't have an immediate solution - I have to think  
about it a bit and test a few possibilities. But at least we've  
identified the problem.
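Presumably the fix needs to look through the decorator before the instanceof check; schematically, with stand-in classes rather than the real Cayenne types:

```java
public class UnwrapSketch {

    public interface Query {}

    public static class SelectQuery implements Query {}

    // Stand-in for the client-side wrapper that hides the SelectQuery.
    public static class IncrementalQuery implements Query {
        final Query wrapped;
        public IncrementalQuery(Query wrapped) { this.wrapped = wrapped; }
    }

    // Unwrap any IncrementalQuery layers before testing for SelectQuery,
    // so the PK-only optimization would also trigger for ROP clients.
    public static boolean isOptimizable(Query q) {
        while (q instanceof IncrementalQuery iq) {
            q = iq.wrapped;
        }
        return q instanceof SelectQuery;
    }

    public static void main(String[] args) {
        Query wrapped = new IncrementalQuery(new SelectQuery());
        System.out.println(isOptimizable(wrapped)); // true
    }
}
```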

Andrus


Re: paged query slow when fetching big lists

Posted by Marcin Skladaniec <ma...@ish.com.au>.
Hi Andrus !

The fix you committed today works great; I can confirm that in our  
case the 30-50 second SelectQuery now executes in 5-7 seconds!

There is one but: the fix works only for queries executed on the server;  
when I executed the query on the (ROP) client, it took the same  
amount of time! Is it possible that the remote calls are using a  
different constructor? Or maybe isFetchingCustomAttributes()  
returns true for 'remote' SelectQueries, and therefore the  
constructor works as before?
Can I ask you to check that out? I could have debugged it myself, but  
from previous experience I know that you spot those things straight  
away, and I would only waste my time.

Marcin

On 27/06/2007, at 4:32 AM, Andrus Adamchik wrote:

> Hi Marcin,
>
> I have good news (I think). Fetching just id columns inside the  
> IncrementalFaultList indeed speeds things up significantly. I just  
> committed the change to optimize SelectQueries to do just that.  
> Please let me know how does it work for you.
>
> Now the profiling details...
>
> * I profiled on Derby and MySQL. In both cases fetching a table  
> with 25 columns and 100000 rows took between 3-4 seconds (not as  
> long as in your case, but what's important is relative times I guess)
> * YourKit clearly showed the bottleneck: ~95% of the 'fillIn'  
> method is spent in the driver code, rewinding the result set (i.e.  
> brining the data from db to the client).
> * After my change the query time went down to 0.2-0.5 second (0.8  
> if you consider the second query needed to fault the first page).  
> Not bad!
> * ResultSet reading still remained a bottleneck, but it became  
> faster in absolute terms. And now finally Cayenne-related code  
> (such as DataRow creation) started to show up on the radar (e.g.  
> DataRow constructor taking 3% of the 'fillIn' method time).
>
> Andrus
>
>
>
> On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:
>
>> Hi Andrus !
>> Many thanks for that !
>> Marcin
>>
>> On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
>>
>>> Hi Marcin,
>>>
>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50  
>>>> seconds.
>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
>>>> STUDENT") without paging takes 100 sec.
>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
>>>> STUDENT") with page size 10 takes 5 sec.
>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
>>>> STUDENT") with page size 10 and fetching row data rows takes 4.5  
>>>> sec.
>>>
>>> I see you still didn't use profiler, but the data you provided  
>>> seems to finally confirm that at least on Derby not fetching all  
>>> columns does result in significant speedup (#1 vs. #3). So now it  
>>> is a question of implementing the right algorithm for the  
>>> IncrementalFaultList.
>>>
>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only  
>>>> part of the data. I tried to use addCustomDbAttribute("id") on  
>>>> client, it resulted in returning the raw dataRows, is there  
>>>> something I can do to fetch faulted objects ?
>>>
>>> We should encapsulate this logic inside IncrementalFaultList on  
>>> the server.
>>>
>>>> Our application was designed to use the SelectQuery. If we have  
>>>> to change that and use the SQLTemplate instead, there is a lot  
>>>> of work for us, including:
>>>
>>> Same thing - the right thing to do is to fix it on the server.
>>>
>>> Let me try to find a spare minute later tonight and implement id- 
>>> only fetch. I have some large tables in a MySQL5 so I can test  
>>> the performance in a slightly different environment.
>>>
>>> Andrus
>>>
>>>
>>>
>>> On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
>>>> Hi
>>>>
>>>> I have done some more profiling and testing.
>>>>
>>>> executing queries on table with >100000 records, directly on  
>>>> server (not on client) gave results as listed below:
>>>>
>>>> 1) SelectQuery(Student.class) with page size 10 takes 30-50  
>>>> seconds.
>>>> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
>>>> STUDENT") without paging takes 100 sec.
>>>> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
>>>> STUDENT") with page size 10 takes 5 sec.
>>>> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
>>>> STUDENT") with page size 10 and fetching row data rows takes 4.5  
>>>> sec.
>>>>
>>>> what more, I found that executing the SQLTemplate does allow to  
>>>> fault the objects (I sometimes discover the simplest things  
>>>> last), so I did try to check how long it takes for the objects  
>>>> to be faulted:
>>>> 1) first object on every page (except first) 30-200ms, rest = 0ms
>>>> 2) objects is faulted in 20ms (average)
>>>> 3) on first page first object faulted in 200ms, rest  ~20 ms,
>>>>      on any following page first object faulted in 30-200ms,  
>>>> rest 0ms (interesting that the first page does not seem to be  
>>>> faulted at all)
>>>> 4) no point testing.
>>>>
>>>> Also I did check if the resizing of the ArrayList which is  
>>>> keeping the results does affect the speed, and it does not.  
>>>> (Tried to make the ArrayList initial size = 150,000).
>>>>
>>>> My conclusion is that SelectQuery with paging is usable only for  
>>>> fetching less than, say 10,000 records, otherwise the  
>>>> performance is to low. With SQLTemplate the performance is much  
>>>> greater. It applies to both ROP and 'normal' cayenne, since I  
>>>> made those tests on server.
>>>>
>>>> Andrus, you mentioned using addCustomDbAttribute to fetch only  
>>>> part of the data. I tried to use addCustomDbAttribute("id") on  
>>>> client, it resulted in returning the raw dataRows, is there  
>>>> something I can do to fetch faulted objects ?
>>>>
>>>> Our application was designed to use the SelectQuery. If we have  
>>>> to change that and use the SQLTemplate instead, there is a lot  
>>>> of work for us, including:
>>>> - dealing with adding and concatenating Expressions to the  
>>>> SQLTemplate (is there an easy way ?)
>>>> - dealing with declared qualifier (the one set in modeller)
>>>> - possibly more...
>>>> i would really like to avoid all of that, so if you have any  
>>>> ideas on how to improve the performance without too much hassle  
>>>> I would really appreciate.
>>>>
>>>> Marcin
>>>>
>>>> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
>>>>
>>>>> Hi Andrus
>>>>> I had not much time to check, but with the fix the 100k records  
>>>>> load in 30 instead of 50 seconds. It is some improvement, but  
>>>>> not enough. I'll do some more profiling tomorrow and let you know.
>>>>>
>>>>> By the way,  we are using netbeans for profiling, the benefit :  
>>>>> it is free. I will evaluate the yourkit as we are moving away  
>>>>> from netbeans as a development platform.
>>>>>
>>>>> Marcin
>>>>>
>>>>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
>>>>>
>>>>>> Ari, Marcin --
>>>>>>
>>>>>> going through the code I noticed one inefficiency - the  
>>>>>> elements array access is synchronized in 'fillIn' method.  
>>>>>> Since 'fillIn' is called from constructor, such  
>>>>>> synchronization is unneeded and only slows things down. I just  
>>>>>> checked a fixed version to trunk. Could you try it out?
>>>>>>
>>>>>> Andrus
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
>>>>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
>>>>>>>
>>>>>>>> Marcin, this thread might be of interest to you ...
>>>>>>>>
>>>>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/ 
>>>>>>>> 200705.mbox/browser
>>>>>>>>
>>>>>>>> Look at the "Paging and SQL queries" thread on May 25.
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is the same project we are working on. I started  
>>>>>>> some performance profiling and Marcin has been able now to  
>>>>>>> take it much further. What is it about:
>>>>>>>
>>>>>>>> elements.add(it.nextObjectId(entity));
>>>>>>>
>>>>>>> which is so slow? The code gets a little complex at that  
>>>>>>> point and we are having difficulty tracing it through to the  
>>>>>>> exact performance problem in the underlying code. Is it the  
>>>>>>> speed of adding the object id to the Collection or the speed  
>>>>>>> of creating an object id itself? 0.5ms doesn't sound slow,  
>>>>>>> but it doesn't scale well.
>>>>>>>
>>>>>>> Andrus, I got the impression from the previous thread that  
>>>>>>> you suspected something slightly different. That the  
>>>>>>> performance problem might be in the fat query itself, but  
>>>>>>> from our tests this isn't the case. If I've got this right,  
>>>>>>> the way it works is:
>>>>>>>
>>>>>>> 1. perform regular query to get all columns but return result  
>>>>>>> in iterator
>>>>>>> 2. iterate through first page and build full objects
>>>>>>> 3. iterate through other pages and build just objectids (this  
>>>>>>> is the slow part for us)
>>>>>>> 4. when another page is fetched perform another query and  
>>>>>>> fetch those objects from the DB
>>>>>>>
>>>>>>> So, getting just primary keys from the DB may not be any  
>>>>>>> faster if the performance problem is simply in the  
>>>>>>> construction of objectIds. If the performance problem is in  
>>>>>>> the numerous resizings of the Collection (each time it runs  
>>>>>>> out of space, then it is increased by 50% or 100% in size),  
>>>>>>> then the solution could be as simple as figuring out the size  
>>>>>>> of the iterator and sizing the collection appropriately from  
>>>>>>> the beginning.
>>>>>>>
>>>>>>> Any ideas on how to discover the exact cause of the  
>>>>>>> performance hit?
>>>>>>>
>>>>>>>
>>>>>>> Ari Maniatis
>>>>>>>
>>>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>

Marcin




Re: paged query slow when fetching big lists

Posted by Andrus Adamchik <an...@objectstyle.org>.
3.0 (SVN trunk), which will hopefully be released fairly soon as 3.0M1

Andrus


On Jul 5, 2007, at 3:06 PM, Borut Bolčina wrote:
> Hello,
>
> which version includes this corrections? 2.0.3 or 3.0?
>
> Thanks,
> Borut
>


Re: paged query slow when fetching big lists

Posted by Borut Bolčina <bo...@najdi.si>.
Hello,

which version includes these corrections? 2.0.3 or 3.0?

Thanks,
Borut

On 26.6.2007 20:32, Andrus Adamchik wrote:
> [quoted message appears in full below]

-- 
bbLOG <http://borutb.tuditi.delo.si/>

Re: paged query slow when fetching big lists

Posted by Andrus Adamchik <an...@objectstyle.org>.
Hi Marcin,

I have good news (I think). Fetching just id columns inside the 
IncrementalFaultList indeed speeds things up significantly. I just 
committed the change to optimize SelectQueries to do just that. 
Please let me know how it works for you.

Now the profiling details...

* I profiled on Derby and MySQL. In both cases fetching a table with 
25 columns and 100000 rows took 3-4 seconds (not as long as in your 
case, but I guess what's important is the relative times)
* YourKit clearly showed the bottleneck: ~95% of the 'fillIn' method 
is spent in the driver code, rewinding the result set (i.e. bringing 
the data from the DB to the client).
* After my change the query time went down to 0.2-0.5 seconds (0.8 if 
you consider the second query needed to fault the first page). Not bad!
* ResultSet reading still remained a bottleneck, but it became faster 
in absolute terms. And now Cayenne-related code (such as DataRow 
creation) finally started to show up on the radar (e.g. the DataRow 
constructor taking 3% of the 'fillIn' method time).

Andrus
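[Editor's note: the id-only approach described above can be sketched outside of Cayenne. The class and method names below are invented for illustration and are not Cayenne API; the sketch only mirrors the idea of holding primary keys up front and faulting full rows one page at a time, along the lines of IncrementalFaultList and its resolveInterval().]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not Cayenne's actual classes: a paged list that
// fetches only primary keys up front (one narrow column, cheap to read)
// and faults full rows lazily, one page per query.
class IdPagingSketch {

    private final List<Long> ids;            // all ids, fetched once
    private final int pageSize;
    private final Function<List<Long>, List<String>> rowLoader; // "second query"
    private final List<String> resolved;     // null = not yet faulted

    IdPagingSketch(List<Long> ids, int pageSize,
                   Function<List<Long>, List<String>> rowLoader) {
        this.ids = ids;
        this.pageSize = pageSize;
        this.rowLoader = rowLoader;
        this.resolved = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            resolved.add(null);
        }
    }

    // Fault the whole page containing 'index' on first access, then return
    // the element (cf. resolveInterval in the real IncrementalFaultList).
    String get(int index) {
        if (resolved.get(index) == null) {
            int start = (index / pageSize) * pageSize;
            int end = Math.min(start + pageSize, ids.size());
            List<String> rows = rowLoader.apply(ids.subList(start, end));
            for (int i = start; i < end; i++) {
                resolved.set(i, rows.get(i - start));
            }
        }
        return resolved.get(index);
    }
}
```

Accessing any element faults its whole page with a single loader call; later accesses on the same page cost nothing.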



On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:
> [quoted message appears in full below]


Re: paged query slow when fetching big lists

Posted by Marcin Skladaniec <ma...@ish.com.au>.
Hi Andrus !
Many thanks for that !
Marcin

On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
> [quoted message appears in full below]




Re: paged query slow when fetching big lists

Posted by Andrus Adamchik <an...@objectstyle.org>.
Hi Marcin,

> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
> STUDENT") without paging takes 100 sec.
> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
> STUDENT") with page size 10 takes 5 sec.
> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
> STUDENT") with page size 10 and fetching raw DataRows takes 4.5 sec.

I see you still didn't use a profiler, but the data you provided seems  
to finally confirm that, at least on Derby, not fetching all columns  
does result in a significant speedup (#1 vs. #3). So now it is a  
question of implementing the right algorithm for the  
IncrementalFaultList.

> Andrus, you mentioned using addCustomDbAttribute to fetch only part  
> of the data. I tried to use addCustomDbAttribute("id") on client,  
> it resulted in returning the raw dataRows, is there something I can  
> do to fetch faulted objects ?

We should encapsulate this logic inside IncrementalFaultList on the  
server.

> Our application was designed to use the SelectQuery. If we have to  
> change that and use the SQLTemplate instead, there is a lot of work  
> for us, including:

Same thing - the right thing to do is to fix it on the server.

Let me try to find a spare minute later tonight and implement the  
id-only fetch. I have some large tables in MySQL 5, so I can test the  
performance in a slightly different environment.

Andrus



On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
> Hi
>
> I have done some more profiling and testing.
>
> Executing queries on a table with >100000 records, directly on the  
> server (not on the client), gave the results listed below:
>
> 1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
> 2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
> STUDENT") without paging takes 100 sec.
> 3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
> STUDENT") with page size 10 takes 5 sec.
> 4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
> STUDENT") with page size 10 and fetching raw DataRows takes 4.5 sec.
>
> What's more, I found that executing the SQLTemplate does allow  
> faulting the objects (I sometimes discover the simplest things last),  
> so I tried to check how long it takes for the objects to be faulted:
> 1) first object on every page (except the first) 30-200ms, rest = 0ms
> 2) objects are faulted in 20ms (average)
> 3) on the first page the first object faulted in 200ms, rest ~20ms;
>      on any following page the first object faulted in 30-200ms, rest  
> 0ms (interestingly, the first page does not seem to need faulting at  
> all)
> 4) no point testing.
>
> I also checked whether the resizing of the ArrayList that keeps the  
> results affects the speed, and it does not (I tried setting the  
> ArrayList's initial size to 150,000).
>
> My conclusion is that SelectQuery with paging is usable only for  
> fetching fewer than, say, 10,000 records; otherwise the performance  
> is too low. With SQLTemplate the performance is much better. This  
> applies to both ROP and 'normal' Cayenne, since I ran those tests  
> on the server.
>
> Andrus, you mentioned using addCustomDbAttribute to fetch only part  
> of the data. I tried using addCustomDbAttribute("id") on the client;  
> it resulted in the raw DataRows being returned. Is there something I  
> can do to fetch faulted objects?
>
> Our application was designed to use the SelectQuery. If we have to  
> change that and use the SQLTemplate instead, there is a lot of work  
> for us, including:
> - dealing with adding and concatenating Expressions to the  
> SQLTemplate (is there an easy way?)
> - dealing with the declared qualifier (the one set in the Modeler)
> - possibly more...
> I would really like to avoid all of that, so if you have any ideas  
> on how to improve the performance without too much hassle I would  
> really appreciate it.
>
> Marcin
>
> On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
>
>> Hi Andrus
>> I had not much time to check, but with the fix the 100k records  
>> load in 30 instead of 50 seconds. It is some improvement, but not  
>> enough. I'll do some more profiling tomorrow and let you know.
>>
>> By the way, we are using NetBeans for profiling; the benefit: it  
>> is free. I will evaluate YourKit as we are moving away from  
>> NetBeans as a development platform.
>>
>> Marcin
>>
>> On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
>>
>>> Ari, Marcin --
>>>
>>> going through the code I noticed one inefficiency - the elements  
>>> array access is synchronized in the 'fillIn' method. Since 'fillIn'  
>>> is called from the constructor, such synchronization is unneeded and  
>>> only slows things down. I just checked in a fixed version to trunk.  
>>> Could you try it out?
>>>
>>> Andrus
>>>
>>>
>>>
>>> On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
>>>> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
>>>>
>>>>> Marcin, this thread might be of interest to you ...
>>>>>
>>>>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/ 
>>>>> 200705.mbox/browser
>>>>>
>>>>> Look at the "Paging and SQL queries" thread on May 25.
>>>>
>>>>
>>>> Yes, this is the same project we are working on. I started some  
>>>> performance profiling and Marcin has now been able to take it  
>>>> much further. What is it about:
>>>>
>>>>> elements.add(it.nextObjectId(entity));
>>>>
>>>> which is so slow? The code gets a little complex at that point  
>>>> and we are having difficulty tracing it through to the exact  
>>>> performance problem in the underlying code. Is it the speed of  
>>>> adding the object id to the Collection or the speed of creating  
>>>> an object id itself? 0.5ms doesn't sound slow, but it doesn't  
>>>> scale well.
>>>>
>>>> Andrus, I got the impression from the previous thread that you  
>>>> suspected something slightly different. That the performance  
>>>> problem might be in the fat query itself, but from our tests  
>>>> this isn't the case. If I've got this right, the way it works is:
>>>>
>>>> 1. perform regular query to get all columns but return result in  
>>>> iterator
>>>> 2. iterate through first page and build full objects
>>>> 3. iterate through other pages and build just objectids (this is  
>>>> the slow part for us)
>>>> 4. when another page is fetched perform another query and fetch  
>>>> those objects from the DB
>>>>
>>>> So, getting just primary keys from the DB may not be any faster  
>>>> if the performance problem is simply in the construction of  
>>>> objectIds. If the performance problem is in the numerous  
>>>> resizings of the Collection (each time it runs out of space,  
>>>> then it is increased by 50% or 100% in size), then the solution  
>>>> could be as simple as figuring out the size of the iterator and  
>>>> sizing the collection appropriately from the beginning.
>>>>
>>>> Any ideas on how to discover the exact cause of the performance  
>>>> hit?
>>>>
>>>>
>>>> Ari Maniatis
>>>>
>>>>
>
>
>


Re: paged query slow when fetching big lists

Posted by Marcin Skladaniec <ma...@ish.com.au>.
Hi

I have done some more profiling and testing.

Executing queries on a table with >100,000 records, directly on the server  
(not on the client), gave the results listed below:

1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
STUDENT") without paging takes 100 sec.
3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
STUDENT") with page size 10 takes 5 sec.
4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from  
STUDENT") with page size 10 and fetching raw data rows takes 4.5 sec.

What's more, I found that executing the SQLTemplate does allow the  
objects to be faulted (I sometimes discover the simplest things last),  
so I checked how long it takes for the objects to be faulted:
1) the first object on every page (except the first) in 30-200ms, the rest in 0ms
2) each object is faulted in 20ms (average)
3) on the first page the first object is faulted in 200ms, the rest in ~20ms;  
on any following page the first object is faulted in 30-200ms, the rest in  
0ms (interesting that the first page does not seem to be faulted at all)
4) no point testing.

I also checked whether the resizing of the ArrayList that holds the  
results affects the speed, and it does not (I tried setting the  
ArrayList's initial size to 150,000).
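That matches how ArrayList grows: each resize copies the backing array,  
but the cost is amortized across all adds, so pre-sizing mainly saves a  
few array copies. A minimal plain-Java sketch (not Cayenne code) of the  
comparison described above:

```java
import java.util.ArrayList;
import java.util.List;

public class PreSizeDemo {
    public static void main(String[] args) {
        int rows = 150_000;

        // Default capacity: the backing array grows by ~50% each time it
        // fills, which is amortized O(1) per add -- resizing is cheap.
        List<Integer> grown = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            grown.add(i);
        }

        // Pre-sized: no intermediate copies, but as measured above the
        // saving is negligible next to the per-row ObjectId cost.
        List<Integer> preSized = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            preSized.add(i);
        }

        System.out.println(grown.size() == preSized.size()); // true
    }
}
```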

My conclusion is that SelectQuery with paging is usable only for  
fetching fewer than, say, 10,000 records; otherwise the performance is  
too low. With SQLTemplate the performance is much better. This applies  
to both ROP and 'normal' Cayenne, since I ran these tests on the server.

Andrus, you mentioned using addCustomDbAttribute to fetch only part  
of the data. I tried to use addCustomDbAttribute("id") on the client;  
it resulted in returning the raw DataRows. Is there something I can do  
to fetch faulted objects?

Our application was designed to use SelectQuery. If we have to  
change that and use SQLTemplate instead, there is a lot of work  
for us, including:
- dealing with adding and concatenating Expressions to the  
SQLTemplate (is there an easy way?)
- dealing with the declared qualifier (the one set in the Modeler)
- possibly more...
I would really like to avoid all of that, so if you have any ideas on  
how to improve the performance without too much hassle I would really  
appreciate it.

Marcin

On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:

> Hi Andrus
> I had not much time to check, but with the fix the 100k records  
> load in 30 instead of 50 seconds. It is some improvement, but not  
> enough. I'll do some more profiling tomorrow and let you know.
>
> By the way,  we are using netbeans for profiling, the benefit : it  
> is free. I will evaluate the yourkit as we are moving away from  
> netbeans as a development platform.
>
> Marcin



Re: paged query slow when fetching big lists

Posted by Marcin Skladaniec <ma...@ish.com.au>.
Hi Andrus
I had not much time to check, but with the fix the 100k records load  
in 30 seconds instead of 50. It is some improvement, but not enough.  
I'll do some more profiling tomorrow and let you know.

By the way, we are using NetBeans for profiling; the benefit: it is  
free. I will evaluate YourKit as we are moving away from NetBeans as a  
development platform.

Marcin

On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:

> Ari, Marcin --
>
> going through the code I noticed one inefficiency - the elements  
> array access is synchronized in 'fillIn' method. Since 'fillIn' is  
> called from constructor, such synchronization is unneeded and only  
> slows things down. I just checked a fixed version to trunk. Could  
> you try it out?
>
> Andrus





Re: paged query slow when fetching big lists

Posted by Andrus Adamchik <an...@objectstyle.org>.
You need to use a profiler. You can try an evaluation version of  
YourKit for example. It has a nice Eclipse plugin, so you can profile  
straight from Eclipse:

    http://www.yourkit.com/

(I wonder if the ASF has a free open-source license for YourKit?  
ObjectStyle does - it was granted to the Cayenne project in the past.)


I think we can probably optimize ObjectId map creation by storing the  
values in an Object[] instead of a Map, but we need profile data to  
tell whether it is even a bottleneck (and not the ResultSet reading,  
for instance).
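As a rough illustration of that idea: a per-row snapshot kept in a  
HashMap pays for hashing and entry allocation on every row, while an  
Object[] keyed by a fixed PK column order does not. This is only a  
sketch of the trade-off; the class and method names are hypothetical,  
not Cayenne's actual ObjectId internals:

```java
import java.util.HashMap;
import java.util.Map;

public class IdSnapshotDemo {
    // Map-based snapshot: one HashMap per row; allocation and hashing
    // dominate when repeated 100,000 times.
    static Map<String, Object> mapSnapshot(Object pkValue) {
        Map<String, Object> snapshot = new HashMap<>();
        snapshot.put("ID", pkValue);
        return snapshot;
    }

    // Array-based snapshot: the PK column order is fixed per entity, so a
    // plain Object[] avoids per-row hashing and Map.Entry allocation.
    static Object[] arraySnapshot(Object pkValue) {
        return new Object[] { pkValue };
    }

    public static void main(String[] args) {
        Map<String, Object> m = mapSnapshot(42);
        Object[] a = arraySnapshot(42);
        System.out.println(m.get("ID").equals(a[0])); // true
    }
}
```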

Andrus


On Jun 23, 2007, at 9:38 AM, Andrus Adamchik wrote:
>>
>> Any ideas on how to discover the exact cause of the performance hit?


Re: paged query slow when fetching big lists

Posted by Andrus Adamchik <an...@objectstyle.org>.
Ari, Marcin --

going through the code I noticed one inefficiency - the elements  
array access is synchronized in 'fillIn' method. Since 'fillIn' is  
called from constructor, such synchronization is unneeded and only  
slows things down. I just checked a fixed version to trunk. Could you  
try it out?

Andrus



On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
> On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
>
>> Marcin, this thread might be of interest to you ...
>>
>> http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/ 
>> browser
>>
>> Look at the "Paging and SQL queries" thread on May 25.
>
>
> Yes, this is the same project we are working on. I started some  
> performance profiling and Marcin has been able now to take it much  
> further. What is it about:
>
>> elements.add(it.nextObjectId(entity));
>
> which is so slow? The code gets a little complex at that point and  
> we are having difficulty tracing it through to the exact  
> performance problem in the underlying code. Is it the speed of  
> adding the object id to the Collection or the speed of creating an  
> object id itself? 0.5ms doesn't sound slow, but it doesn't scale well.
>
> Andrus, I got the impression from the previous thread that you  
> suspected something slightly different. That the performance  
> problem might be in the fat query itself, but from our tests this  
> isn't the case. If I've got this right, the way it works is:
>
> 1. perform regular query to get all columns but return result in  
> iterator
> 2. iterate through first page and build full objects
> 3. iterate through other pages and build just objectids (this is  
> the slow part for us)
> 4. when another page is fetched perform another query and fetch  
> those objects from the DB
>
> So, getting just primary keys from the DB may not be any faster if  
> the performance problem is simply in the construction of objectIds.  
> If the performance problem is in the numerous resizings of the  
> Collection (each time it runs out of space, then it is increased by  
> 50% or 100% in size), then the solution could be as simple as  
> figuring out the size of the iterator and sizing the collection  
> appropriately from the beginning.
>
> Any ideas on how to discover the exact cause of the performance hit?
>
>
> Ari Maniatis
>
>
>


Re: paged query slow when fetching big lists

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 22/06/2007, at 11:10 PM, Michael Gentry wrote:

> Marcin, this thread might be of interest to you ...
>
> http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/ 
> browser
>
> Look at the "Paging and SQL queries" thread on May 25.


Yes, this is the same project we are working on. I started some  
performance profiling and Marcin has now been able to take it much  
further. What is it about:

> elements.add(it.nextObjectId(entity));

which is so slow? The code gets a little complex at that point and we  
are having difficulty tracing it through to the exact performance  
problem in the underlying code. Is it the speed of adding the object  
id to the Collection or the speed of creating an object id itself?  
0.5ms doesn't sound slow, but it doesn't scale well.

Andrus, I got the impression from the previous thread that you  
suspected something slightly different. That the performance problem  
might be in the fat query itself, but from our tests this isn't the  
case. If I've got this right, the way it works is:

1. perform regular query to get all columns but return result in  
iterator
2. iterate through first page and build full objects
3. iterate through other pages and build just objectids (this is the  
slow part for us)
4. when another page is fetched perform another query and fetch those  
objects from the DB
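The four steps above can be sketched in plain Java; the class and  
method names here are illustrative mocks, not Cayenne's actual  
IncrementalFaultList API:

```java
import java.util.ArrayList;
import java.util.List;

public class PagedListSketch {
    final List<Object> elements = new ArrayList<>();
    final int pageSize;

    PagedListSketch(int pageSize, List<Integer> resultRows) {
        this.pageSize = pageSize;
        // Steps 1-3: a single query returns every row; the first page
        // becomes "full objects", the rest are stored as bare ids.
        for (int i = 0; i < resultRows.size(); i++) {
            int pk = resultRows.get(i);
            Object value = (i < pageSize) ? "object-" + pk : Integer.valueOf(pk);
            elements.add(value);
        }
    }

    // Step 4: resolving a later page would normally run a second query
    // (WHERE pk IN (...)); here we just promote the stored ids in place.
    void resolvePage(int page) {
        int start = page * pageSize;
        int end = Math.min(start + pageSize, elements.size());
        for (int i = start; i < end; i++) {
            Object e = elements.get(i);
            if (e instanceof Integer) {
                elements.set(i, "object-" + e);
            }
        }
    }

    public static void main(String[] args) {
        PagedListSketch list = new PagedListSketch(2, List.of(10, 11, 12, 13, 14));
        System.out.println(list.elements.get(0)); // object-10 (first page, full)
        System.out.println(list.elements.get(3)); // 13        (later page, id only)
        list.resolvePage(1);
        System.out.println(list.elements.get(3)); // object-13 (resolved on demand)
    }
}
```

Under this model, step 3 runs once per row of the entire result, which  
is exactly where the 0.5ms-per-row cost multiplies out.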

So, getting just primary keys from the DB may not be any faster if  
the performance problem is simply in the construction of objectIds.  
If the performance problem is in the numerous resizings of the  
Collection (each time it runs out of space, then it is increased by  
50% or 100% in size), then the solution could be as simple as  
figuring out the size of the iterator and sizing the collection  
appropriately from the beginning.

Any ideas on how to discover the exact cause of the performance hit?


Ari Maniatis




-------------------------->
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001   fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A



Re: paged query slow when fetching big lists

Posted by Michael Gentry <bl...@gmail.com>.
Marcin, this thread might be of interest to you ...

http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/browser

Look at the "Paging and SQL queries" thread on May 25.

/dev/mrg