Posted to ojb-user@db.apache.org by John M <in...@yahoo.com> on 2003/06/27 23:57:23 UTC

question about performance - cache, pre-fetch, other

I'm using rc4 from cvs with the PB kernel only. 
During testing of a piece of the application I had
some pretty serious performance problems as a whole
lot of data is being read at once.  Most of the object
model I have has auto-retrieve on and everything is
proxied via interfaces.  

The first pass at the code had no optimization, so
running through a list of, say, 1000 objects
potentially caused 1000 * 3 single queries as
referenced objects were forced to load by accessing
the proxy.  Obviously not good.  One way to fix this
might be to enable the global cache and hope that all
the referenced objects are already in it when the
references are resolved, so resolving them is faster.
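
To make the 1000 * 3 figure concrete, here is a minimal sketch of the access pattern in plain Java.  LazyRef and NPlusOneDemo are hypothetical stand-ins invented for this illustration, not OJB classes; each first access to a reference is counted as one simulated single-row query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazily loading reference proxy (not OJB's proxy class).
class LazyRef<T> {
    static int queriesIssued = 0;      // counts simulated single-row SELECTs
    private final Supplier<T> loader;
    private T target;
    private boolean loaded;

    LazyRef(Supplier<T> loader) { this.loader = loader; }

    T get() {
        if (!loaded) {                 // only the first access triggers a load
            queriesIssued++;
            target = loader.get();
            loaded = true;
        }
        return target;
    }
}

class NPlusOneDemo {
    // Touch every reference of every object and return how many loads that caused.
    static int countQueries(int objects, int refsPerObject) {
        LazyRef.queriesIssued = 0;
        List<List<LazyRef<String>>> rows = new ArrayList<>();
        for (int i = 0; i < objects; i++) {
            List<LazyRef<String>> refs = new ArrayList<>();
            for (int r = 0; r < refsPerObject; r++) {
                final String value = "ref-" + i + "-" + r;
                refs.add(new LazyRef<>(() -> value));
            }
            rows.add(refs);
        }
        for (List<LazyRef<String>> refs : rows) {
            for (LazyRef<String> ref : refs) {
                ref.get();
                ref.get();             // second access is free, already materialized
            }
        }
        return LazyRef.queriesIssued;
    }

    public static void main(String[] args) {
        System.out.println(countQueries(1000, 3)); // prints 3000
    }
}
```

A shared cache only reduces this count when many of the references point at objects that have already been loaded.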

What seems to be the OJB way is to use the prefetched
relationships in a Query.  However, this didn't seem
to work great for my problem for a couple of reasons. 
One is that it selects with an IN clause, which is
limited in size.  So really for a lot of results you
may have several queries.  I also don't know how fast
MySQL can be expected to return a query when you pass
it 100 ids in an IN list.  A big one, however,
is that you can only prefetch one reference deep as
far as I can tell.  So if you have A->B->C, you can 
prefetch B when querying on A, but you can't get C.
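
The effect of the IN limit can be sketched as follows.  This is only an illustration of the batching idea, not OJB's prefetch code: InClauseBatcher, the table and column names, and the limit of 100 are all assumptions, and real code would use bind parameters rather than inlined ids.

```java
import java.util.ArrayList;
import java.util.List;

class InClauseBatcher {
    // Split ids into chunks of at most maxInSize and render one SELECT per chunk.
    static List<String> buildQueries(String table, String column,
                                     List<Integer> ids, int maxInSize) {
        if (maxInSize <= 0) {
            throw new IllegalArgumentException("maxInSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += maxInSize) {
            int end = Math.min(start + maxInSize, ids.size());
            StringBuilder sql = new StringBuilder("SELECT * FROM ")
                    .append(table).append(" WHERE ").append(column).append(" IN (");
            for (int i = start; i < end; i++) {
                if (i > start) {
                    sql.append(", ");
                }
                sql.append(ids.get(i));
            }
            queries.add(sql.append(")").toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 250; i++) {
            ids.add(i);
        }
        // 250 ids with an assumed limit of 100 per IN list
        System.out.println(buildQueries("B", "ID", ids, 100).size()); // prints 3
    }
}
```

So 1000 result rows with a limit of 100 would still mean about ten extra queries per prefetched relationship, which is far better than 1000 but not a single round trip.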

My solution was to change OJB to allow the following:
select the fields from all the tables (objects) you
want loaded and retrieve the data in one query,
building the objects from there.  My question is this:
 Does this sound valid and useful, or am I missing
something or totally off my rocker?  For the way we
are using OJB this seems to fit the bill perfectly
when used appropriately.

Here's how I changed OJB to support this:
  Added a query type that holds the reference names to
get (and allows OJB to know you want to do it that
way)
  In SqlSelectStatement it appends all the columns
from each table specified.  This is a little different
from what normally happens in that fields from
ClassDescriptor.getFieldDescriptors are used directly
instead of getFieldDescriptorsForMultiMappedTable
because the order is important.  This is the part most
likely to screw something up, but we don't use fancy
mapping.
  Changed SqlQueryStatement to forcibly add table
aliases for the desired join tables in case they
weren't specified in the Criteria.
  Modified RsIterator, RowReader, and
RowReaderDefaultImpl to allow reading of a row into an
object given an offset and FieldDescriptor array. 
This should be a little faster anyway as it reads
based on the index instead of name, but being able to
use the offset is the most important thing.
  Modified RsIterator to read and cache the extra
objects from the joins as it iterates through the main
object list.  These are read before the main object so
the references in the main object resolve to the
cached objects.
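
The offset-based reading described above can be sketched roughly like this.  It is a simplified stand-in, not the actual patch: JoinedRowReader, the A/B/C field arrays, and the map-based "objects" are hypothetical, and null handling for outer joins is left out.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class JoinedRowReader {
    // Hypothetical stand-ins for per-class field descriptor arrays,
    // listed in the same order as the columns in the SELECT.
    static final String[] A_FIELDS = {"id", "name", "bId"};
    static final String[] B_FIELDS = {"id", "label", "cId"};
    static final String[] C_FIELDS = {"id", "title"};

    // Plays the role of the object cache, keyed by "<type>:<primary key>".
    final Map<String, Map<String, Object>> cache = new HashMap<>();

    // Copy fields.length columns starting at offset, by index rather than by name.
    static Map<String, Object> readAt(Object[] row, int offset, String[] fields) {
        Map<String, Object> obj = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i++) {
            obj.put(fields[i], row[offset + i]);
        }
        return obj;
    }

    // Build C, then B, then A from one joined row. Referenced objects are read
    // and cached first so the main object's references resolve from the cache.
    Map<String, Object> readRow(Object[] row) {
        int bOffset = A_FIELDS.length;
        int cOffset = bOffset + B_FIELDS.length;

        cached("C", readAt(row, cOffset, C_FIELDS));
        Map<String, Object> b = cached("B", readAt(row, bOffset, B_FIELDS));
        b.put("c", cache.get("C:" + b.get("cId")));

        Map<String, Object> a = readAt(row, 0, A_FIELDS);
        a.put("b", cache.get("B:" + a.get("bId")));
        return a;
    }

    // Return the already cached instance for this key, or cache and return obj.
    private Map<String, Object> cached(String type, Map<String, Object> obj) {
        Map<String, Object> existing = cache.putIfAbsent(type + ":" + obj.get("id"), obj);
        return existing != null ? existing : obj;
    }

    public static void main(String[] args) {
        JoinedRowReader reader = new JoinedRowReader();
        // Column layout: A.id, A.name, A.bId, B.id, B.label, B.cId, C.id, C.title
        Map<String, Object> a = reader.readRow(
                new Object[] {1, "a1", 10, 10, "b10", 100, 100, "c100"});
        System.out.println(a.get("name"));
    }
}
```

Two rows that join to the same B end up sharing one B instance, and no further query is needed to walk from A through B to C.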

I just submitted a couple of other patches to the dev
list that I found while looking at this stuff, but
they seemed more generic and thus could stand alone. 
I can submit patches for these changes if anyone wants
to see them.  Hopefully this or something like it will
be worked into OJB so I don't have to keep track of
these changes!

Thanks,
John Marshall
Connectria


Re: question about performance - cache, pre-fetch, other

Posted by Thomas Mahler <th...@web.de>.

Hi Michael,

Michael Mogley wrote:
> I've been aware of this aspect of Ojb for a while and have always
> wondered why it chose not to implement the join solution to batch-reads.
> The join, as John points out, in addition to guaranteeing reconstruction
> of the object graph in one read, also handles cases of multiple
> indirection.  This is the approach taken by major vendors in the past
> such as Toplink, and is also how I handled it in my own layer.

It's only a question of available resources.
When we started we kept things as simple as possible, that is, without 
any mechanisms to retrieve large object graphs via complex joins.

Of course it would be extremely cool to have this feature.
Until now nobody has urgently requested it, so we have been busy 
implementing other things.

I think implementing such a feature is a lot of work, and I don't see how 
we can achieve it with the current number of contributors.

If someone offers to implement this beast we'd be happy to integrate it!

cheers,
Thomas



RE: question about performance - cache, pre-fetch, other

Posted by Michael Mogley <mm...@adelphia.net>.

I've been aware of this aspect of Ojb for a while and have always
wondered why it chose not to implement the join solution to batch-reads.
The join, as John points out, in addition to guaranteeing reconstruction
of the object graph in one read, also handles cases of multiple
indirection.  This is the approach taken by major vendors in the past
such as Toplink, and is also how I handled it in my own layer.

Thanks John.

Michael


Re: question about performance - cache, pre-fetch, other

Posted by Jakob Braeuchi <jb...@gmx.ch>.

hi john,

the prefetching was my first try to improve performance. it is quite 
simple but has its drawbacks.
i'm very interested in your joined approach (does it handle extents 
correctly?), so please send me your patches (and the complete files as 
well).

jakob
