You are viewing a plain text version of this content. The canonical link for it is here.
Posted to ojb-dev@db.apache.org by Lance Eason <la...@whisperwire.com> on 2003/05/13 20:01:04 UTC

Idea for further OJB performance improvements

Thomas, Armin, et al.,

I have another idea to bounce off you.  In my recent tuning exercise I got my biggest performance improvement from actually taking management of certain relationships away from OJB which makes me kind of sad.  Whereas the changes I posted to the list account for about a 20% improvement this bought me a 600% improvement (from 24 seconds down to 4 seconds).  What I did is this...

Let's say we have three tables, PURCHASE_ORDER, LINE_ITEM and LINE_ITEM_NOTE, where LINE_ITEMs are children of PURCHASE_ORDERs and LINE_ITEM_NOTEs are children of LINE_ITEMs.  Now say OJB is defined to manage those relationships, a PurchaseOrder object has a collection of LineItem objects and each LineItem has a collection of LineItemNote objects.  I do a query for PurchaseOrders where customer_id = 20 and let's say I get back 10 PurchaseOrders each of which contains 5 LineItems.

The way OJB will handle this is:

   - Do a select 'SELECT * FROM PURCHASE_ORDER WHERE customer_id = 20' to retrieve PurchaseOrders
   - For each of the 10 PurchaseOrders do an additional select 'SELECT * FROM LINE_ITEM WHERE purchase_order_id = X' for a total of 10 more queries
   - For each of the 50 LineItems do an additional select 'SELECT * FROM LINE_ITEM_NOTE WHERE line_item_id = X' for a total of 50 more queries

The point is that the number of queries rapidly explodes as relationships are followed and each of those queries has overhead associated with it (getting a broker to use, creating the statement, and the actual network hop and processing time of the query itself).  I can use proxies to delay doing the queries until the information is actually asked for but if I really do need all the information the net amount of work is the same.

What I did instead was:

   - Do a select to retrieve PurchaseOrders, 'SELECT * FROM PURCHASE_ORDER WHERE customer_id = 20'
   - Do a single select to retrieve all associated line items 
       'SELECT li.* 
          FROM PURCHASE_ORDER po, LINE_ITEM li
         WHERE li.purchase_order_id = po.purchase_order_id
           AND po.customer_id = 20'
   - Use the LineItem's foreign key back to the PurchaseOrder to attach it to the right PurchaseOrder from the original select
   - Do a single select to retrieve all associated line item notes
       'SELECT lin.*
          FROM PURCHASE_ORDER po, LINE_ITEM li, LINE_ITEM_NOTE lin
         WHERE lin.line_item_id = li.line_item_id
           AND li.purchase_order_id = po.purchase_order_id
           AND po.customer_id = 20'
   - Use the LineItemNote's foreign key back to the LineItem to attach it to the right LineItem

So the queries have gone down from 61 to 3 and the processing time has improved dramatically.  This type of thing can be done in general where you use the same criteria that was used to generate the top level query and just add the appropriate join information to get the data for the related table you're interested in.

Of course given caching, doing this doesn't always make sense.  For instance if 9 of the 10 PurchaseOrders were already populated and in my cache then OJB's current strategy is likely better than retrieving the LineItem and LineItemNotes for all 10 when all I need is the 1 uncached one.  The way I handled this was to define a threshold for the ratio between items returned by the query and the one's not found in the cache.  If that ratio was high enough I used this approach otherwise I fell back on OJB's approach (except that I took advantage of the fact that I use single column integer primary keys and just generated an IN clause with all the keys to still do it in a single query).

I'm pretty sure OJB has all the information it needs to do this same type of thing and I think it would be great if I could just let OJB handle it for me rather than pulling responsibility away from OJB.  Any thoughts? 

Re: Idea for further OJB performance improvements

Posted by Jakob Braeuchi <jb...@gmx.ch>.
hi lance,

to reduce the number of queries used for relationship retrieval you can 
already use prefetched queries.

hth
jakob

Lance Eason wrote:

>Thomas, Armin, et al.,
>
>I have another idea to bounce off you.  In my recent tuning exercise I got my biggest performance improvement from actually taking management of certain relationships away from OJB which makes me kind of sad.  Whereas the changes I posted to the list account for about a 20% improvement this bought me a 600% improvement (from 24 seconds down to 4 seconds).  What I did is this...
>
>Let's say we have three tables, PURCHASE_ORDER, LINE_ITEM and LINE_ITEM_NOTE, where LINE_ITEMs are children of PURCHASE_ORDERs and LINE_ITEM_NOTEs are children of LINE_ITEMs.  Now say OJB is defined to manage those relationships, a PurchaseOrder object has a collection of LineItem objects and each LineItem has a collection of LineItemNote objects.  I do a query for PurchaseOrders where customer_id = 20 and let's say I get back 10 PurchaseOrders each of which contains 5 LineItems.
>
>The way OJB will handle this is:
>
>   - Do a select 'SELECT * FROM PURCHASE_ORDER WHERE customer_id = 20' to retrieve PurchaseOrders
>   - For each of the 10 PurchaseOrders do an additional select 'SELECT * FROM LINE_ITEM WHERE purchase_order_id = X' for a total of 10 more queries
>   - For each of the 50 LineItems do an additional select 'SELECT * FROM LINE_ITEM_NOTE WHERE line_item_id = X' for a total of 50 more queries
>
>The point is that the number of queries rapidly explodes as relationships are followed and each of those queries has overhead associated with it (getting a broker to use, creating the statement, and the actual network hop and processing time of the query itself).  I can use proxies to delay doing the queries until the information is actually asked for but if I really do need all the information the net amount of work is the same.
>
>What I did instead was:
>
>   - Do a select to retrieve PurchaseOrders, 'SELECT * FROM PURCHASE_ORDER WHERE customer_id = 20'
>   - Do a single select to retrieve all associated line items 
>       'SELECT li.* 
>          FROM PURCHASE_ORDER po, LINE_ITEM li
>         WHERE li.purchase_order_id = po.purchase_order_id
>           AND po.customer_id = 20'
>   - Use the LineItem's foreign key back to the PurchaseOrder to attach it to the right PurchaseOrder from the original select
>   - Do a single select to retrieve all associated line item notes
>       'SELECT lin.*
>          FROM PURCHASE_ORDER po, LINE_ITEM li, LINE_ITEM_NOTE lin
>         WHERE lin.line_item_id = li.line_item_id
>           AND li.purchase_order_id = po.purchase_order_id
>           AND po.customer_id = 20'
>   - Use the LineItemNote's foreign key back to the LineItem to attach it to the right LineItem
>
>So the queries have gone down from 61 to 3 and the processing time has improved dramatically.  This type of thing can be done in general where you use the same criteria that was used to generate the top level query and just add the appropriate join information to get the data for the related table you're interested in.
>
>Of course given caching, doing this doesn't always make sense.  For instance if 9 of the 10 PurchaseOrders were already populated and in my cache then OJB's current strategy is likely better than retrieving the LineItem and LineItemNotes for all 10 when all I need is the 1 uncached one.  The way I handled this was to define a threshold for the ratio between items returned by the query and the one's not found in the cache.  If that ratio was high enough I used this approach otherwise I fell back on OJB's approach (except that I took advantage of the fact that I use single column integer primary keys and just generated an IN clause with all the keys to still do it in a single query).
>
>I'm pretty sure OJB has all the information it needs to do this same type of thing and I think it would be great if I could just let OJB handle it for me rather than pulling responsibility away from OJB.  Any thoughts? 
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: ojb-dev-unsubscribe@db.apache.org
>For additional commands, e-mail: ojb-dev-help@db.apache.org
>
>
>  
>