You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@geronimo.apache.org by Jeremy Boynes <jb...@apache.org> on 2005/09/08 20:19:33 UTC

Re: CMP FBPK Yields an individual SELECT per column

Matt asked a couple of questions on IRC related to this:

 > SELECT Q.symbol FROM QuoteEJB Q WHERE Q.symbol = s:153

This query is checking whether a entity exists - I thought for that we 
actually issued
SELECT 1 FROM QuoteEJB Q WHERE Q.symbol = s:153
but the effect is similar.

The columns are probably being fetched individually because there is no 
pre-fetch information defined. The challenge here is to pick the right 
default set for the work that is being performed in the transaction - 
not too much, not too little.

If we simply fetch all, then we may load way more information than is 
needed and that gets problematic when the table has a couple hundred 
columns or large objects like attachments or images.

If we fetch too little then we see this behaviour and do way more trips 
to the database than desirable.

Complicating things further is that there is often little correlation 
between the finder invoked and the fields that get accessed during the 
transaction. For example, when an application is displaying data, it may 
find a bean and then need to read every field so adding all fields to 
the query is desirable; when updating data, it may use the same finder 
but then overwrite every field and so optimal behaviour would be to add 
no fields to the query.

Also, a lot of this behaviour is influenced by the caching strategy in 
place. This lack of prefetch information does not cause problems if all 
the data being accessed is held in a local cache; it will be faulted in 
once and then reused. Of course, if caching is disabled then this won't 
perform well.

Where the appropriate hook points are also depends on the front-end 
persistence model. For example:
* CMP1 allows access to fields and has no relationships so you really
   need to load all fields for the bean and can't prefetch children
* Hibernate allows access to fields but does have relationships so
   you need to load all fields but can prefetch (this is without field
   access interception, if you do that you can lazy load fields)
* CMP2 intercepts all access so you can choose which fields to load
   and can prefetch relationships

TranQL supports all these different models through the concept of query 
events (e.g. when a finder runs) and though cache-miss events. The basic 
strategy is "do-something-when-a-cache-miss-occurs" where "something" is 
defined by the front-end depending on the access model it supports; the 
"something" may have side effects such as loading other values into the 
cache (which is how prefetch works).

So, the simple fix here is to set up prefetch associated with the finder 
or with the ejbLoad event which will load all the columns for the bean.

In the longer term, OpenEJB should be extended to associate cache 
pre-load operations with transaction initiation so that the entire data 
graph can be loaded up front in one query.

--
Jeremy

Matt Hogstrom (JIRA) wrote:
> CMP FBPK Yields an individual SELECT per column
> -----------------------------------------------
> 
>          Key: GERONIMO-985
>          URL: http://issues.apache.org/jira/browse/GERONIMO-985
>      Project: Geronimo
>         Type: Bug
>     Versions: 1.0-M5    
>  Environment: Geronimo w/Derby
>     Reporter: Matt Hogstrom
> 
> 
> I'm testing the DayTrader Application and it appears that FBPK finds are executing a single SELECT per field in the CMP field.  Here are the SELECTs making up a single OrderEJB.findByPrimaryKey().  This should be broken down into a single SELECT for the entity.
> 
> SELECT Q.symbol FROM QuoteEJB Q WHERE Q.symbol = s:153
> SELECT Q.symbol FROM QuoteEJB Q WHERE Q.symbol = s:153
> SELECT Q.companyName FROM QuoteEJB Q WHERE Q.symbol = s:153
> SELECT Q.volume FROM QuoteEJB Q WHERE Q.symbol = s:153
> SELECT Q.price FROM QuoteEJB Q WHERE Q.symbol = s:153
> SELECT Q.open1 FROM QuoteEJB Q WHERE Q.symbol = s:153
> SELECT Q.low FROM QuoteEJB Q WHERE Q.symbol = s:153
> SELECT Q.high FROM QuoteEJB Q WHERE Q.symbol = s:153
> SELECT Q.change1 FROM QuoteEJB Q WHERE Q.symbol = s:153
> 
> 
> 
>

Re: CMP FBPK Yields an individual SELECT per column

Posted by Jeremy Boynes <jb...@apache.org>.

Aaron Mulder wrote:
> 	I think the prefetch default should be "all columns" not "only 1
> column".  I see a lot more small/normal tables than tables with giant
> BLOBs or 100s of columns, so it makes much more sense to me to force the
> user to optimize for giant tables (default to load all CMP) than to force
> the user to optimize for every table (default to load 1 at a time).
> 

I agree - what you're seeing is TranQL's default when the front-end 
doesn't tell it what to do. I would suggest OpenEJB set this up as follows:

* CMP1 - load all
* CMP2 - load everything that isn't a LOB and let those fault in

--
Jeremy

Re: CMP FBPK Yields an individual SELECT per column

Posted by Aaron Mulder <am...@alumni.princeton.edu>.

	I think the prefetch default should be "all columns" not "only 1
column".  I see a lot more small/normal tables than tables with giant
BLOBs or 100s of columns, so it makes much more sense to me to force the
user to optimize for giant tables (default to load all CMP) than to force
the user to optimize for every table (default to load 1 at a time).

Aaron

On Thu, 8 Sep 2005, Jeremy Boynes wrote:

> Matt asked a couple of questions on IRC related to this:
> 
>  > SELECT Q.symbol FROM QuoteEJB Q WHERE Q.symbol = s:153
> 
> This query is checking whether a entity exists - I thought for that we 
> actually issued
> SELECT 1 FROM QuoteEJB Q WHERE Q.symbol = s:153
> but the effect is similar.
> 
> The columns are probably being fetched individually because there is no 
> pre-fetch information defined. The challenge here is to pick the right 
> default set for the work that is being performed in the transaction - 
> not too much, not too little.
> 
> If we simply fetch all, then we may load way more information than is 
> needed and that gets problematic when the table has a couple hundred 
> columns or large objects like attachments or images.
> 
> If we fetch too little then we see this behaviour and do way more trips 
> to the database than desirable.
> 
> Complicating things further is that there is often little correlation 
> between the finder invoked and the fields that get accessed during the 
> transaction. For example, when an application is displaying data, it may 
> find a bean and then need to read every field so adding all fields to 
> the query is desirable; when updating data, it may use the same finder 
> but then overwrite every field and so optimal behaviour would be to add 
> no fields to the query.
> 
> Also, a lot of this behaviour is influenced by the caching strategy in 
> place. This lack of prefetch information does not cause problems if all 
> the data being accessed is held in a local cache; it will be faulted in 
> once and then reused. Of course, if caching is disabled then this won't 
> perform well.
> 
> Where the appropriate hook points are also depends on the front-end 
> persistence model. For example:
> * CMP1 allows access to fields and has no relationships so you really
>    need to load all fields for the bean and can't prefetch children
> * Hibernate allows access to fields but does have relationships so
>    you need to load all fields but can prefetch (this is without field
>    access interception, if you do that you can lazy load fields)
> * CMP2 intercepts all access so you can choose which fields to load
>    and can prefetch relationships
> 
> TranQL supports all these different models through the concept of query 
> events (e.g. when a finder runs) and though cache-miss events. The basic 
> strategy is "do-something-when-a-cache-miss-occurs" where "something" is 
> defined by the front-end depending on the access model it supports; the 
> "something" may have side effects such as loading other values into the 
> cache (which is how prefetch works).
> 
> So, the simple fix here is to set up prefetch associated with the finder 
> or with the ejbLoad event which will load all the columns for the bean.
> 
> In the longer term, OpenEJB should be extended to associate cache 
> pre-load operations with transaction initiation so that the entire data 
> graph can be loaded up front in one query.
> 
> --
> Jeremy
> 
> Matt Hogstrom (JIRA) wrote:
> > CMP FBPK Yields an individual SELECT per column
> > -----------------------------------------------
> > 
> >          Key: GERONIMO-985
> >          URL: http://issues.apache.org/jira/browse/GERONIMO-985
> >      Project: Geronimo
> >         Type: Bug
> >     Versions: 1.0-M5    
> >  Environment: Geronimo w/Derby
> >     Reporter: Matt Hogstrom
> > 
> > 
> > I'm testing the DayTrader Application and it appears that FBPK finds are executing a single SELECT per field in the CMP field.  Here are the SELECTs making up a single OrderEJB.findByPrimaryKey().  This should be broken down into a single SELECT for the entity.
> > 
> > SELECT Q.symbol FROM QuoteEJB Q WHERE Q.symbol = s:153
> > SELECT Q.symbol FROM QuoteEJB Q WHERE Q.symbol = s:153
> > SELECT Q.companyName FROM QuoteEJB Q WHERE Q.symbol = s:153
> > SELECT Q.volume FROM QuoteEJB Q WHERE Q.symbol = s:153
> > SELECT Q.price FROM QuoteEJB Q WHERE Q.symbol = s:153
> > SELECT Q.open1 FROM QuoteEJB Q WHERE Q.symbol = s:153
> > SELECT Q.low FROM QuoteEJB Q WHERE Q.symbol = s:153
> > SELECT Q.high FROM QuoteEJB Q WHERE Q.symbol = s:153
> > SELECT Q.change1 FROM QuoteEJB Q WHERE Q.symbol = s:153
> > 
> > 
> > 
> > 
> 
>

Re: CMP FBPK Yields an individual SELECT per column

Posted by Matt Hogstrom <ma...@hogstrom.org>.

Sounds like a good strategy.  Is there some documentation that goes 
along with this?  My opinion is that the default behaviour should be 
that all columns are faulted in when first referenced.  My concern is 
that for databases like Oracle that run at READ-COMMITTED multiple trips 
to the database has the possibility of getting a fragmented view of the 
data as other Txs could have updated the row in between gets.  As a 
consequence a transaction may act inappropriately based on the 
fragmented data.

This isn't a problem for other databases like DB2 which runs at Read 
Stability by default (this has its own set of annoying issues but at 
least data corruption isn't one).  Firebird would probably work ok with 
this configuration but given the popularity of Oracle a conservative 
approach makes sense.

Can you provide the configuration informaiton and I'll put a 
recommendation on the list for a change in the default behaviour.

I'm still chasing the CMR problem but I'm confirming its not a 
configuration problem.

- Matt

Jeremy Boynes wrote:
> Matt asked a couple of questions on IRC related to this:
> 
>  > SELECT Q.symbol FROM QuoteEJB Q WHERE Q.symbol = s:153
> 
> This query is checking whether a entity exists - I thought for that we 
> actually issued
> SELECT 1 FROM QuoteEJB Q WHERE Q.symbol = s:153
> but the effect is similar.
> 
> The columns are probably being fetched individually because there is no 
> pre-fetch information defined. The challenge here is to pick the right 
> default set for the work that is being performed in the transaction - 
> not too much, not too little.
> 
> If we simply fetch all, then we may load way more information than is 
> needed and that gets problematic when the table has a couple hundred 
> columns or large objects like attachments or images.
> 
> If we fetch too little then we see this behaviour and do way more trips 
> to the database than desirable.
> 
> Complicating things further is that there is often little correlation 
> between the finder invoked and the fields that get accessed during the 
> transaction. For example, when an application is displaying data, it may 
> find a bean and then need to read every field so adding all fields to 
> the query is desirable; when updating data, it may use the same finder 
> but then overwrite every field and so optimal behaviour would be to add 
> no fields to the query.
> 
> Also, a lot of this behaviour is influenced by the caching strategy in 
> place. This lack of prefetch information does not cause problems if all 
> the data being accessed is held in a local cache; it will be faulted in 
> once and then reused. Of course, if caching is disabled then this won't 
> perform well.
> 
> Where the appropriate hook points are also depends on the front-end 
> persistence model. For example:
> * CMP1 allows access to fields and has no relationships so you really
>   need to load all fields for the bean and can't prefetch children
> * Hibernate allows access to fields but does have relationships so
>   you need to load all fields but can prefetch (this is without field
>   access interception, if you do that you can lazy load fields)
> * CMP2 intercepts all access so you can choose which fields to load
>   and can prefetch relationships
> 
> TranQL supports all these different models through the concept of query 
> events (e.g. when a finder runs) and though cache-miss events. The basic 
> strategy is "do-something-when-a-cache-miss-occurs" where "something" is 
> defined by the front-end depending on the access model it supports; the 
> "something" may have side effects such as loading other values into the 
> cache (which is how prefetch works).
> 
> So, the simple fix here is to set up prefetch associated with the finder 
> or with the ejbLoad event which will load all the columns for the bean.
> 
> In the longer term, OpenEJB should be extended to associate cache 
> pre-load operations with transaction initiation so that the entire data 
> graph can be loaded up front in one query.
> 
> -- 
> Jeremy
> 
> Matt Hogstrom (JIRA) wrote:
> 
>> CMP FBPK Yields an individual SELECT per column
>> -----------------------------------------------
>>
>>          Key: GERONIMO-985
>>          URL: http://issues.apache.org/jira/browse/GERONIMO-985
>>      Project: Geronimo
>>         Type: Bug
>>     Versions: 1.0-M5     Environment: Geronimo w/Derby
>>     Reporter: Matt Hogstrom
>>
>>
>> I'm testing the DayTrader Application and it appears that FBPK finds 
>> are executing a single SELECT per field in the CMP field.  Here are 
>> the SELECTs making up a single OrderEJB.findByPrimaryKey().  This 
>> should be broken down into a single SELECT for the entity.
>>
>> SELECT Q.symbol FROM QuoteEJB Q WHERE Q.symbol = s:153
>> SELECT Q.symbol FROM QuoteEJB Q WHERE Q.symbol = s:153
>> SELECT Q.companyName FROM QuoteEJB Q WHERE Q.symbol = s:153
>> SELECT Q.volume FROM QuoteEJB Q WHERE Q.symbol = s:153
>> SELECT Q.price FROM QuoteEJB Q WHERE Q.symbol = s:153
>> SELECT Q.open1 FROM QuoteEJB Q WHERE Q.symbol = s:153
>> SELECT Q.low FROM QuoteEJB Q WHERE Q.symbol = s:153
>> SELECT Q.high FROM QuoteEJB Q WHERE Q.symbol = s:153
>> SELECT Q.change1 FROM QuoteEJB Q WHERE Q.symbol = s:153
>>
>>
>>
>>
> 
> 
> 
>