Posted to user@geronimo.apache.org by ApolloX <sc...@selikoff.net> on 2008/02/27 03:54:56 UTC

CMP2 on G2 - Delayed Database Flush

Is there a way to configure when commands are flushed to the database for
EJB2 CMP beans in G2?  I noticed something that may be related to the severe
caching/performance slowdown from trying to migrate CMP2 beans from G1 to
G2.

Here's a concrete example of the behavior:

MovieLocalHome movieHome = (MovieLocalHome)
context.lookup("java:comp/env/ejb/MovieLocal");
MovieLocal newMovie = movieHome.create(someId);
newMovie.setTitle("The Matrix");

In G1, this code worked fine because the database INSERT was delayed until
after the setTitle() was called.  In G2, the INSERT happens immediately
after the call to create() leading to a database insertion error since title
is a required field in the database.
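
To make the failure mode concrete, here is a minimal sketch in plain Java. It is not Geronimo or OpenEJB code: MovieTable, EagerMovieHome, and the two-argument create() overload are hypothetical stand-ins that only simulate a NOT NULL column combined with an INSERT issued inside create(). It also shows one possible workaround, assuming the home interface can be changed: pass the required field to create() so the row is complete when the INSERT runs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a table whose "title" column is NOT NULL.
class MovieTable {
    private final Map<Integer, String> rows = new HashMap<>();

    void insert(int id, String title) {
        if (title == null) {
            // Simulates the database rejecting the row.
            throw new IllegalStateException("title must not be null");
        }
        rows.put(id, title);
    }

    String titleOf(int id) {
        return rows.get(id);
    }
}

// Hypothetical stand-in for a home that issues the INSERT inside create().
class EagerMovieHome {
    private final MovieTable table;

    EagerMovieHome(MovieTable table) {
        this.table = table;
    }

    // Mirrors create(someId): the INSERT runs before any setter can be called.
    void create(int id) {
        table.insert(id, null);
    }

    // Assumed overload: the required field is supplied up front,
    // so the row is complete at INSERT time.
    void create(int id, String title) {
        table.insert(id, title);
    }
}
```

With these stand-ins, create(1) fails on the simulated constraint, while create(2, "The Matrix") succeeds.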

Could someone provide me with a solution to delay the database flushing
until later on?  As I've said, I don't get the impression anything is being
cached for CMP2 beans in G2 based on the severe performance slowdown I've
seen.

ApolloX

-- 
View this message in context: http://www.nabble.com/CMP2-on-G2---Delayed-Database-Flush-tp15704963s134p15704963.html
Sent from the Apache Geronimo - Users mailing list archive at Nabble.com.

Re: CMP2 on G2 - Delayed Database Flush

Posted by David Blevins <da...@visi.com>.

On Apr 3, 2008, at 10:43 PM, Kevan Miller wrote:
> A Geronimo user had reported a performance problem w/ CMP as  
> described below. Any thoughts?
>
> I'd tried to forward to dev@openejb a while back, but looks like I  
> sent to a bad email address, instead...

No, it made it.  I didn't respond as I'm not much of an expert in the  
CMP code.

Rick, I know at one point you were interested in digging into the CMP  
code.  You interested in tackling this as an introductory step toward  
CMP/JPA?  I don't think there's any immediate time pressure, so even  
if you're busy right this second, feel free to still say yes :)

-David


> On Feb 26, 2008, at 9:54 PM, ApolloX wrote:
>
>>
>> Is there a way to configure when commands are flushed to the  
>> database for
>> EJB2 CMP beans in G2?  I noticed something that may be related to  
>> the severe
>> caching/performance slowdown from trying to migrate CMP2 beans from  
>> G1 to
>> G2.
>>
>> Here's a concrete example of the behavior:
>>
>> MovieLocalHome movieHome = (MovieLocalHome)
>> context.lookup("java:comp/env/ejb/MovieLocal");
>> MovieLocal newMovie = movieHome.create(someId);
>> newMovie.setTitle("The Matrix");
>>
>> In G1, this code worked fine because the database INSERT was  
>> delayed until
>> after the setTitle() was called.  In G2, the INSERT happens  
>> immediately
>> after the call to create() leading to a database insertion error  
>> since title
>> is a required field in the database.
>>
>> Could someone provide me with a solution to delay the database  
>> flushing
>> until later on?  As I've said, I don't get the impression anything  
>> is being
>> cached for CMP2 beans in G2 based on the severe performance  
>> slowdown I've
>> seen.
>>
>> ApolloX
>>
>

Re: CMP2 on G2 - Delayed Database Flush

Posted by Dain Sundstrom <da...@iq80.com>.

One other thing occurred to me... while you are looking at the  
flushing code, you should double check that I implemented the optional  
flush-before-query behavior.  I'm not sure I got around to doing that,  
and not letting users disable flushing before queries are executed can  
have a serious performance impact.

-dain

On Apr 14, 2008, at 1:04 PM, Dain Sundstrom wrote:
> On Apr 14, 2008, at 6:08 AM, Rick McGuire wrote:
>> I've not come up with any clever way of implementing the cache so  
>> far, other than just keeping a list of objects whose primary keys  
>> have not been calculated, and then, if all other lookups fail,  
>> start resolving the primary keys looking for the given target.  Not  
>> elegant, but I think this will work.
>> I do wonder if another approach might work better.  If I understand  
>> the reasoning behind the flush, it is necessary because it's  
>> possible that some of the information needed to calculate the  
>> primary key only becomes available after the JPA flush()/merge()  
>> sequence.  I suspect for many objects, this is not needed because a  
>> simple primary key is used.  Would it be feasible to detect the  
>> situation where a flush is needed to "crystalize" the object to  
>> calculate the primary key?  This way, simple object instances where  
>> the primary key is provided in the create() operation would not  
>> experience the performance hit.
>
> Hummm... this is becoming a much more interesting problem.
>
> In the cmp system, the JPA persistence.xml is the master source of  
> mapping information.  In CmpJpaConversion we convert the CMP  
> declarations in the ejb-jar.xml file to a persistence.xml (JAXB  
> objects), but a user can provide the persistence.xml file directly,  
> effectively bypassing this code.  It should be easy to add a step to  
> CmpJpaConversion (or a new DynamicDeployer) that walks the  
> persistence.xml JAXB objects and notes which beans have generated  
> primary key fields.
>
> Hopefully knowing which CMP beans use generated primary keys will  
> make the cache work easier, since you will know that a user can't  
> possibly ask for an object by primary key when you haven't resolved  
> the primary key for the user yet.
>
> -dain

Re: CMP2 on G2 - Delayed Database Flush

Posted by Dain Sundstrom <da...@iq80.com>.

On Apr 14, 2008, at 6:08 AM, Rick McGuire wrote:
> I've not come up with any clever way of implementing the cache so  
> far, other than just keeping a list of objects whose primary keys  
> have not been calculated, and then, if all other lookups fail, start  
> resolving the primary keys looking for the given target.  Not  
> elegant, but I think this will work.
> I do wonder if another approach might work better.  If I understand  
> the reasoning behind the flush, it is necessary because it's  
> possible that some of the information needed to calculate the  
> primary key only becomes available after the JPA flush()/merge()  
> sequence.  I suspect for many objects, this is not needed because a  
> simple primary key is used.  Would it be feasible to detect the  
> situation where a flush is needed to "crystalize" the object to  
> calculate the primary key?  This way, simple object instances where  
> the primary key is provided in the create() operation would not  
> experience the performance hit.

Hummm... this is becoming a much more interesting problem.

In the cmp system, the JPA persistence.xml is the master source of  
mapping information.  In CmpJpaConversion we convert the CMP  
declarations in the ejb-jar.xml file to a persistence.xml (JAXB  
objects), but a user can provide the persistence.xml file directly,  
effectively bypassing this code.  It should be easy to add a step to  
CmpJpaConversion (or a new DynamicDeployer) that walks the  
persistence.xml JAXB objects and notes which beans have generated  
primary key fields.

Hopefully knowing which CMP beans use generated primary keys will make  
the cache work easier, since you will know that a user can't possibly  
ask for an object by primary key when you haven't resolved the primary  
key for the user yet.
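
A rough sketch of how that knowledge might be used, in plain Java. GeneratedKeyRegistry and its methods are hypothetical names, not existing OpenEJB classes; the assumption is that a deployer step fills the registry while walking the mapping, and that the create path consults it to decide whether an early flush is needed at all.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical registry of beans whose primary key is generated by the database.
class GeneratedKeyRegistry {
    private final Set<String> beansWithGeneratedKeys = new HashSet<>();

    // Assumed to be called by a deployer step for each bean with a generated pk field.
    void markGenerated(String ejbName) {
        beansWithGeneratedKeys.add(ejbName);
    }

    // Only beans with generated keys need the flush to learn their pk;
    // beans given their pk in create() could skip it.
    boolean needsFlushOnCreate(String ejbName) {
        return beansWithGeneratedKeys.contains(ejbName);
    }
}
```

Under this assumption, a bean such as the Movie example, whose id is passed to create(), would not pay for the early flush.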

-dain

Re: CMP2 on G2 - Delayed Database Flush

Posted by Rick McGuire <ri...@gmail.com>.

Dain Sundstrom wrote:
> On Apr 7, 2008, at 7:06 AM, Rick McGuire wrote:
>> Dain Sundstrom wrote:
>>> I've been sucked into another project and haven't been paying much 
>>> attention to the lists...
>>>
>>> The problem is we flush before returning the created object to the 
>>> caller.  The reason we do this is because database generated fields 
>>> are not filled in until the flush statement which means the primary 
>>> key is not guaranteed to be available until flush.  The current code 
>>> requires the primary key to create the cmp proxy we return to the 
>>> caller.  The code will have to be changed to allow for late primary 
>>> key resolution either when the code calls getPrimaryKey or at the 
>>> end of the transaction.
>>>
>>> I don't have the time to look at this, but I can help you if you 
>>> want to work on it.
>> I've started poking around in the code trying to understand what 
>> needs to change.  Is the JpaCmpEngine.createBean() method where the 
>> flushing takes place?  It appears at that point in time that the 
>> primaryKey is used for 1) creating the ThreadContext instance, 2) 
>> storing the bean in the transaction cache, and 3) creating the 
>> ProxyInfo instance.  Am I looking in the correct location for this?
>
> Yes.
>
>> The ThreadContext primary key bit looks easily changed to a lazy 
>> resolution, and probably the ProxyInfo as well, but the transaction 
>> cache does not appear to be as easily changed, since the primary key 
>> is the main lookup method for the transaction cache.  I guess the 
>> transaction cache step could be bypassed until the primary key is 
>> actually generated, but I'm concerned that this could result in some 
>> resolution failures where an object would be expected to be located 
>> in the cache.
>
> The transaction cache was introduced as a workaround for the 
> new-delete-new bug in OpenJPA (see JpaTestObject.newDeleteNew()).  If 
> you create, remove and recreate a bean with the same pk, OpenJPA 
> internally leaves the pk as "deleted" so calls to find(Class,Object) 
> return null.  We work around this by using a private cache to 
> track the objects created during the transaction.
>
> To implement delayed flush, you will have to add another way to track 
> the JPA instance object (since we won't have the pk to "find" the 
> object in the entity manager).  When the pk is not available, you use 
> the new, alternate, method to find the object, and when the pk is 
> finally resolved, you would add it to the transaction cache.
I've not come up with any clever way of implementing the cache so far, 
other than just keeping a list of objects whose primary keys have not 
been calculated, and then, if all other lookups fail, start resolving 
the primary keys looking for the given target.  Not elegant, but I think 
this will work. 

I do wonder if another approach might work better.  If I understand the 
reasoning behind the flush, it is necessary because it's possible that 
some of the information needed to calculate the primary key only becomes 
available after the JPA flush()/merge() sequence.  I suspect for many 
objects, this is not needed because a simple primary key is used.  Would 
it be feasible to detect the situation where a flush is needed to 
"crystalize" the object to calculate the primary key?  This way, simple 
object instances where the primary key is provided in the create() 
operation would not experience the performance hit.
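
The list-based fallback described above could look roughly like this in plain Java. PendingAwareCache and keyResolver are hypothetical names, not the actual OpenEJB transaction cache; keyResolver stands in for whatever step (for example a flush) makes the primary key available.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache: entries keyed by pk, plus a list of entities
// whose pk has not been calculated yet.
class PendingAwareCache<E> {
    private final Map<Object, E> byKey = new HashMap<>();
    private final List<E> pending = new ArrayList<>();
    private final Function<E, Object> keyResolver;

    PendingAwareCache(Function<E, Object> keyResolver) {
        this.keyResolver = keyResolver;
    }

    void add(Object pk, E entity) {
        byKey.put(pk, entity);
    }

    void addPending(E entity) {
        pending.add(entity);
    }

    // Normal lookup first; only if that fails, resolve pending keys one by one.
    E find(Object pk) {
        E hit = byKey.get(pk);
        if (hit != null) {
            return hit;
        }
        Iterator<E> it = pending.iterator();
        while (it.hasNext()) {
            E candidate = it.next();
            Object resolved = keyResolver.apply(candidate);
            it.remove();
            byKey.put(resolved, candidate); // now findable by its real pk
            if (resolved.equals(pk)) {
                return candidate;
            }
        }
        return null;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

The cost of resolving keys is paid only on a cache miss, and each pending entity is resolved at most once.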

Rick
>
> Off the top of my head, it may be possible to use a stand-in pk object 
> which wraps the JPA object itself (using identity based hashcode and 
> equals) until the real pk is resolved.  This pk object would then be 
> the alternate tx cache.
>
>> Any pointers on where the end of transaction processing would need to 
>> be performed?
>
> CmpContainer.ejbLoad(EntityBean) uses 
> TransactionSynchronizationRegistry.registerInterposedSynchronization 
> to store entities at the end of the transaction.  You'll want to 
> expand that logic to handle pk resolution in addition to ejbStore 
> callbacks.  The registerInterposedSynchronization doesn't really 
> handle ordering well, so I suggest you use a single Synchronization 
> object to handle processing of the pks and the ejb store callbacks.
>
> One other thing to keep in mind is that before a CMP is passed to a 
> remote vm, you'll need to make sure the pk has been resolved.
>
> -dain
>
>

Re: CMP2 on G2 - Delayed Database Flush

Posted by Dain Sundstrom <da...@iq80.com>.

On Apr 7, 2008, at 7:06 AM, Rick McGuire wrote:
> Dain Sundstrom wrote:
>> I've been sucked into another project and haven't been paying much  
>> attention to the lists...
>>
>> The problem is we flush before returning the created object to the  
>> caller.  The reason we do this is because database generated fields  
>> are not filled in until the flush statement which means the primary  
>> key is not guaranteed to be available until flush.  The current  
>> code requires the primary key to create the cmp proxy we return to  
>> the caller.  The code will have to be changed to allow for late  
>> primary key resolution either when the code calls getPrimaryKey or  
>> at the end of the transaction.
>>
>> I don't have the time to look at this, but I can help you if you  
>> want to work on it.
> I've started poking around in the code trying to understand what  
> needs to change.  Is the JpaCmpEngine.createBean() method where the  
> flushing takes place?  It appears at that point in time that the  
> primaryKey is used for 1) creating the ThreadContext instance, 2)  
> storing the bean in the transaction cache, and 3) creating  
> the ProxyInfo instance.  Am I looking in the correct location for  
> this?

Yes.

> The ThreadContext primary key bit looks easily changed to a lazy  
> resolution, and probably the ProxyInfo as well, but the transaction  
> cache does not appear to be as easily changed, since the primary key  
> is the main lookup method for the transaction cache.  I guess the  
> transaction cache step could be bypassed until the primary key is  
> actually generated, but I'm concerned that this could result in some  
> resolution failures where an object would be expected to be located  
> in the cache.

The transaction cache was introduced as a workaround for the  
new-delete-new bug in OpenJPA (see JpaTestObject.newDeleteNew()).  If you  
create, remove and recreate a bean with the same pk, OpenJPA  
internally leaves the pk as "deleted" so calls to find(Class,Object)  
return null.  We work around this by using a private cache to  
track the objects created during the transaction.

To implement delayed flush, you will have to add another way to track  
the JPA instance object (since we won't have the pk to "find" the  
object in the entity manager).  When the pk is not available, you use  
the new, alternate, method to find the object, and when the pk is  
finally resolved, you would add it to the transaction cache.

Off the top of my head, it may be possible to use a stand-in pk object  
which wraps the JPA object itself (using identity based hashcode and  
equals) until the real pk is resolved.  This pk object would then be  
the alternate tx cache.
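
A minimal sketch of that stand-in idea, in plain Java. StandInKey and TransactionCache are hypothetical names used for illustration only, and the real transaction cache in OpenEJB may be structured differently. The stand-in compares by object identity, so two entities that happen to be equal() still get distinct keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in pk: wraps the entity and uses identity-based
// equals/hashCode until the real pk is known.
final class StandInKey {
    private final Object entity;

    StandInKey(Object entity) {
        this.entity = Objects.requireNonNull(entity);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof StandInKey && ((StandInKey) other).entity == entity;
    }

    @Override
    public int hashCode() {
        return System.identityHashCode(entity);
    }
}

// Hypothetical per-transaction cache that accepts stand-in or real keys.
class TransactionCache {
    private final Map<Object, Object> entries = new HashMap<>();

    // Track a new entity whose real pk is not available yet.
    void trackUnresolved(Object entity) {
        entries.put(new StandInKey(entity), entity);
    }

    // Once the real pk is resolved, replace the stand-in entry.
    void resolve(Object entity, Object realPk) {
        entries.remove(new StandInKey(entity));
        entries.put(realPk, entity);
    }

    Object find(Object key) {
        return entries.get(key);
    }

    boolean isTracked(Object entity) {
        return entries.containsKey(new StandInKey(entity));
    }
}
```

Because the stand-in and the real pk live in the same map, lookups by real pk keep working unchanged once resolve() has been called.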

> Any pointers on where the end of transaction processing would need  
> to be performed?

CmpContainer.ejbLoad(EntityBean) uses  
TransactionSynchronizationRegistry.registerInterposedSynchronization  
to store entities at the end of the transaction.  You'll want to  
expand that logic to handle pk resolution in addition to ejbStore  
callbacks.  The registerInterposedSynchronization doesn't really  
handle ordering well, so I suggest you use a single Synchronization  
object to handle processing of the pks and the ejb store callbacks.
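
A sketch of the single-Synchronization approach, in plain Java. TxSynchronization is a local stand-in that mirrors the two callbacks of javax.transaction.Synchronization so the example compiles without a JTA jar; EntitySynchronization and its methods are hypothetical. The order used here (ejbStore callbacks first, then pk resolution) is an assumption for illustration and is not something settled in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in mirroring javax.transaction.Synchronization.
interface TxSynchronization {
    void beforeCompletion();

    void afterCompletion(int status);
}

// Hypothetical: one synchronization per transaction owns both kinds of
// work, so their relative order is explicit rather than left to the registry.
class EntitySynchronization implements TxSynchronization {
    private final List<Runnable> storeCallbacks = new ArrayList<>();
    private final List<Runnable> pkResolvers = new ArrayList<>();

    void addStoreCallback(Runnable callback) {
        storeCallbacks.add(callback);
    }

    void addPkResolver(Runnable resolver) {
        pkResolvers.add(resolver);
    }

    @Override
    public void beforeCompletion() {
        // Assumed order: let beans write their state first...
        for (Runnable callback : storeCallbacks) {
            callback.run();
        }
        // ...then resolve any primary keys that are still pending.
        for (Runnable resolver : pkResolvers) {
            resolver.run();
        }
    }

    @Override
    public void afterCompletion(int status) {
        // The transaction is over; drop per-transaction state.
        storeCallbacks.clear();
        pkResolvers.clear();
    }
}
```

In a real container, one instance of such a class would presumably be registered once per transaction through registerInterposedSynchronization, with entities added to it as they are created or loaded.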

One other thing to keep in mind is that before a CMP is passed to a  
remote vm, you'll need to make sure the pk has been resolved.

-dain

Re: CMP2 on G2 - Delayed Database Flush

Posted by Rick McGuire <ri...@gmail.com>.

Dain Sundstrom wrote:
> I've been sucked into another project and haven't been paying much 
> attention to the lists...
>
> The problem is we flush before returning the created object to the 
> caller.  The reason we do this is because database generated fields 
> are not filled in until the flush statement which means the primary 
> key is not guaranteed to be available until flush.  The current code 
> requires the primary key to create the cmp proxy we return to the 
> caller.  The code will have to be changed to allow for late primary 
> key resolution either when the code calls getPrimaryKey or at the end 
> of the transaction.
>
> I don't have the time to look at this, but I can help you if you want 
> to work on it.
I've started poking around in the code trying to understand what needs 
to change.  Is the JpaCmpEngine.createBean() method where the flushing 
takes place?  It appears at that point in time that the primaryKey is 
used for 1) creating the ThreadContext instance, 2) storing the bean 
in the transaction cache, and 3) creating the ProxyInfo instance.  
Am I looking in the correct location for this?

The ThreadContext primary key bit looks easily changed to a lazy 
resolution, and probably the ProxyInfo as well, but the transaction 
cache does not appear to be as easily changed, since the primary key is 
the main lookup method for the transaction cache.  I guess the 
transaction cache step could be bypassed until the primary key is 
actually generated, but I'm concerned that this could result in some 
resolution failures where an object would be expected to be located in 
the cache. 

Any pointers on where the end of transaction processing would need to be 
performed?

Rick

>
> -dain
>
> On Apr 3, 2008, at 10:43 PM, Kevan Miller wrote:
>> A Geronimo user had reported a performance problem w/ CMP as 
>> described below. Any thoughts?
>>
>> I'd tried to forward to dev@openejb a while back, but looks like I 
>> sent to a bad email address, instead...
>>
>> --kevan
>>
>> On Feb 26, 2008, at 9:54 PM, ApolloX wrote:
>>
>>>
>>> Is there a way to configure when commands are flushed to the 
>>> database for
>>> EJB2 CMP beans in G2?  I noticed something that may be related to 
>>> the severe
>>> caching/performance slowdown from trying to migrate CMP2 beans from 
>>> G1 to
>>> G2.
>>>
>>> Here's a concrete example of the behavior:
>>>
>>> MovieLocalHome movieHome = (MovieLocalHome)
>>> context.lookup("java:comp/env/ejb/MovieLocal");
>>> MovieLocal newMovie = movieHome.create(someId);
>>> newMovie.setTitle("The Matrix");
>>>
>>> In G1, this code worked fine because the database INSERT was delayed 
>>> until
>>> after the setTitle() was called.  In G2, the INSERT happens immediately
>>> after the call to create() leading to a database insertion error 
>>> since title
>>> is a required field in the database.
>>>
>>> Could someone provide me with a solution to delay the database flushing
>>> until later on?  As I've said, I don't get the impression anything 
>>> is being
>>> cached for CMP2 beans in G2 based on the severe performance slowdown 
>>> I've
>>> seen.
>>>
>>> ApolloX
>>>
>>
>
>

Re: CMP2 on G2 - Delayed Database Flush

Posted by Kevan Miller <ke...@gmail.com>.

On Apr 4, 2008, at 2:20 PM, Dain Sundstrom wrote:

> I've been sucked into another project and haven't been paying much  
> attention to the lists...
>
> The problem is we flush before returning the created object to the  
> caller.  The reason we do this is because database generated fields  
> are not filled in until the flush statement which means the primary  
> key is not guaranteed to be available until flush.  The current code  
> requires the primary key to create the cmp proxy we return to the  
> caller.  The code will have to be changed to allow for late primary  
> key resolution either when the code calls getPrimaryKey or at the  
> end of the transaction.
>
> I don't have the time to look at this, but I can help you if you  
> want to work on it.

Hi Dain,
Thanks a lot for the info. Makes sense. Totally understand lack of  
time... Unlikely that I'm going to have much time to spend on this  
either. Not exactly my cup-o-tea, anyway.  As David mentioned, nice  
little project to start understanding some of this code. Hopefully,  
somebody will be interested...

--kevan

Re: CMP2 on G2 - Delayed Database Flush

Posted by Dain Sundstrom <da...@iq80.com>.

I've been sucked into another project and haven't been paying much  
attention to the lists...

The problem is we flush before returning the created object to the  
caller.  The reason we do this is because database generated fields  
are not filled in until the flush statement which means the primary  
key is not guaranteed to be available until flush.  The current code  
requires the primary key to create the cmp proxy we return to the  
caller.  The code will have to be changed to allow for late primary  
key resolution either when the code calls getPrimaryKey or at the end  
of the transaction.
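
One way to picture late primary key resolution is a holder that defers the work until the key is first requested. This is a sketch in plain Java under stated assumptions: LazyPrimaryKey is a hypothetical name, and the Supplier stands in for whatever would actually flush and read the generated key.

```java
import java.util.function.Supplier;

// Hypothetical holder that resolves the primary key on first use.
class LazyPrimaryKey {
    private final Supplier<Object> resolver; // e.g. flush, then read the generated id
    private Object value;
    private boolean resolved;

    LazyPrimaryKey(Supplier<Object> resolver) {
        this.resolver = resolver;
    }

    // Would be called from getPrimaryKey() or from end-of-transaction processing.
    synchronized Object get() {
        if (!resolved) {
            value = resolver.get();
            resolved = true;
        }
        return value;
    }

    synchronized boolean isResolved() {
        return resolved;
    }
}
```

The resolver runs at most once, so a caller that never asks for the key before the transaction ends does not trigger an early flush.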

I don't have the time to look at this, but I can help you if you want  
to work on it.

-dain

On Apr 3, 2008, at 10:43 PM, Kevan Miller wrote:
> A Geronimo user had reported a performance problem w/ CMP as  
> described below. Any thoughts?
>
> I'd tried to forward to dev@openejb a while back, but looks like I  
> sent to a bad email address, instead...
>
> --kevan
>
> On Feb 26, 2008, at 9:54 PM, ApolloX wrote:
>
>>
>> Is there a way to configure when commands are flushed to the  
>> database for
>> EJB2 CMP beans in G2?  I noticed something that may be related to  
>> the severe
>> caching/performance slowdown from trying to migrate CMP2 beans from  
>> G1 to
>> G2.
>>
>> Here's a concrete example of the behavior:
>>
>> MovieLocalHome movieHome = (MovieLocalHome)
>> context.lookup("java:comp/env/ejb/MovieLocal");
>> MovieLocal newMovie = movieHome.create(someId);
>> newMovie.setTitle("The Matrix");
>>
>> In G1, this code worked fine because the database INSERT was  
>> delayed until
>> after the setTitle() was called.  In G2, the INSERT happens  
>> immediately
>> after the call to create() leading to a database insertion error  
>> since title
>> is a required field in the database.
>>
>> Could someone provide me with a solution to delay the database  
>> flushing
>> until later on?  As I've said, I don't get the impression anything  
>> is being
>> cached for CMP2 beans in G2 based on the severe performance  
>> slowdown I've
>> seen.
>>
>> ApolloX
>>
>

Re: CMP2 on G2 - Delayed Database Flush

Posted by Kevan Miller <ke...@gmail.com>.

A Geronimo user had reported a performance problem w/ CMP as described  
below. Any thoughts?

I'd tried to forward to dev@openejb a while back, but looks like I  
sent to a bad email address, instead...

--kevan

On Feb 26, 2008, at 9:54 PM, ApolloX wrote:

>
> Is there a way to configure when commands are flushed to the  
> database for
> EJB2 CMP beans in G2?  I noticed something that may be related to  
> the severe
> caching/performance slowdown from trying to migrate CMP2 beans from  
> G1 to
> G2.
>
> Here's a concrete example of the behavior:
>
> MovieLocalHome movieHome = (MovieLocalHome)
> context.lookup("java:comp/env/ejb/MovieLocal");
> MovieLocal newMovie = movieHome.create(someId);
> newMovie.setTitle("The Matrix");
>
> In G1, this code worked fine because the database INSERT was delayed  
> until
> after the setTitle() was called.  In G2, the INSERT happens  
> immediately
> after the call to create() leading to a database insertion error  
> since title
> is a required field in the database.
>
> Could someone provide me with a solution to delay the database  
> flushing
> until later on?  As I've said, I don't get the impression anything  
> is being
> cached for CMP2 beans in G2 based on the severe performance slowdown  
> I've
> seen.
>
> ApolloX
>

Re: CMP2 on G2 - Delayed Database Flush

Posted by Kevan Miller <ke...@gmail.com>.

Any ideas?
--kevan

On Feb 26, 2008, at 9:54 PM, ApolloX wrote:

>
> Is there a way to configure when commands are flushed to the  
> database for
> EJB2 CMP beans in G2?  I noticed something that may be related to  
> the severe
> caching/performance slowdown from trying to migrate CMP2 beans from  
> G1 to
> G2.
>
> Here's a concrete example of the behavior:
>
> MovieLocalHome movieHome = (MovieLocalHome)
> context.lookup("java:comp/env/ejb/MovieLocal");
> MovieLocal newMovie = movieHome.create(someId);
> newMovie.setTitle("The Matrix");
>
> In G1, this code worked fine because the database INSERT was delayed  
> until
> after the setTitle() was called.  In G2, the INSERT happens  
> immediately
> after the call to create() leading to a database insertion error  
> since title
> is a required field in the database.
>
> Could someone provide me with a solution to delay the database  
> flushing
> until later on?  As I've said, I don't get the impression anything  
> is being
> cached for CMP2 beans in G2 based on the severe performance slowdown  
> I've
> seen.
>
> ApolloX
>