Posted to users@tomee.apache.org by Alexander Saint Croix <sa...@gmail.com> on 2008/04/20 17:57:30 UTC

Best practice for cascading persist of existing entity records

Howdy, all.

I'm wondering what the best practice is for the following use case:

I have a handful of entity classes, one of which is a Unit (such as
"kilogram") which has an auto-generated UID field.  Each individual instance
of Unit ("kilogram", "second", "ampere", etc) can and will be used by
multiple instances of the other entity classes.

The problem arises after I persist one of the unit instances (such as
"kilogram") and it is assigned a primary key field generated by the
container.  From that point forward, each other object that refers to that
specific Unit instance gives me trouble when I try to persist it, because it
already has a nonzero UID field value.

Ideally, I do not want to have more than one "kilogram" record in the "Unit"
table.  Is there a common practice to tell the persistence container to
"apply the persist cascade of the holding object to this field, unless the
data represented by the field already exists in the database, in which case
don't cascade--just reference the UID of that instance"?

If the question isn't clear, I can provide sample code.  If you want to punt
to the OpenJPA list, that's also acceptable.

Cheers,
--
Alex

Re: Best practice for cascading persist of existing entity records

Posted by Alexander Saint Croix <sa...@gmail.com>.
This sounds more in line with what I was seeing.  I'll chip away at
it--coming up on the end of my testing cycle here.  The code patterns that
David showed me in Jan are paying big dividends now.

If I figure out a really solid way to do this (such as front-loading the
units ahead of time, which I think is a great idea), I'll see about drafting
up an example.

Cheers,
--
Alex





On Tue, Apr 22, 2008 at 12:33 PM, Dain Sundstrom <da...@iq80.com> wrote:

> I ran into a strange behavior (strange in my mind) a while back that may
> be affecting your code.  Say you have the following code for Cheese:
>
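> import javax.persistence.*;
>
> @Entity
> public class Cheese {
>  @Id @GeneratedValue
>  private Long id; // auto generated pk
>  private String name;
>  protected Cheese() {} // no-arg constructor required by JPA
>  public Cheese(String name) { this.name = name; }
>  public Long getId() { return id; }
>  public String getName() { return name; }
> }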
>
>
> and a simple test
>
> public void test() {
>  Cheese wiz = new Cheese("wiz");
>  mgr.persist(wiz);
>  assertNotNull(wiz.getId());  // fails
> }
>
> The test will fail, because the id is not filled in until you flush and
> merge the instance.  Now the really, really annoying thing is that the
> merge returns a new instance of your object.  For example:
>
> public void test() {
>  Cheese wiz = new Cheese("wiz");
>  mgr.persist(wiz);
>  mgr.flush();
>  Cheese persistentWiz = mgr.merge(wiz);
>  assertNotNull(persistentWiz.getId());
>  assertSame(wiz, persistentWiz); // fails
> }
>
> Anyway, unless you are updating your cache with the "merged" instance, the
> JPA system thinks you are attempting to create a new Cheese with the same
> name.  Assuming you changed your code to update the cache, you will still
> run into problems when you have a multithreaded system.  The two
> transactions will likely want to use the same Unit (or Cheese in this
> case); the first one will persist it and update the cache, and the
> concurrent transaction will get a failure.  The easiest way to avoid these
> types of problems is to fill your Unit table ahead of time.  This is
> normally a reasonable requirement, as a business will only operate using
> one set of units (English vs. metric or a fixed set of allowed currencies),
> and typically the choice of allowed units is made by a business person way
> ahead of time.  Alternatively, at runtime you can add units in a separate
> transaction.  You suspend the current tx, start a new one, create and
> persist the new unit, commit the tx, and resume the original tx.  The only
> drawback to that is that you don't get automatic rollback of the new unit
> if the original tx rolls back.
>
> -dain

Re: Best practice for cascading persist of existing entity records

Posted by Dain Sundstrom <da...@iq80.com>.
I ran into a strange behavior (strange in my mind) a while back that
may be affecting your code.  Say you have the following code for Cheese:

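import javax.persistence.*;

@Entity
public class Cheese {
   @Id @GeneratedValue
   private Long id; // auto generated pk
   private String name;
   protected Cheese() {} // no-arg constructor required by JPA
   public Cheese(String name) { this.name = name; }
   public Long getId() { return id; }
   public String getName() { return name; }
}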


and a simple test

public void test() {
   Cheese wiz = new Cheese("wiz");
   mgr.persist(wiz);
   assertNotNull(wiz.getId());  // fails
}

The test will fail, because the id is not filled in until you flush
and merge the instance.  Now the really, really annoying thing is that
the merge returns a new instance of your object.  For example:

public void test() {
   Cheese wiz = new Cheese("wiz");
   mgr.persist(wiz);
   mgr.flush();
   Cheese persistentWiz = mgr.merge(wiz);
   assertNotNull(persistentWiz.getId());
   assertSame(wiz, persistentWiz); // fails
}

Anyway, unless you are updating your cache with the "merged" instance,
the JPA system thinks you are attempting to create a new Cheese with
the same name.  Assuming you changed your code to update the cache, you
will still run into problems when you have a multithreaded system.  The
two transactions will likely want to use the same Unit (or Cheese in
this case); the first one will persist it and update the cache, and the
concurrent transaction will get a failure.  The easiest way to avoid
these types of problems is to fill your Unit table ahead of time.  This
is normally a reasonable requirement, as a business will only operate
using one set of units (English vs. metric or a fixed set of allowed
currencies), and typically the choice of allowed units is made by a
business person way ahead of time.  Alternatively, at runtime you can
add units in a separate transaction.  You suspend the current tx,
start a new one, create and persist the new unit, commit the tx, and
resume the original tx.  The only drawback to that is that you don't
get automatic rollback of the new unit if the original tx rolls back.
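The "fill your Unit table ahead of time" approach can be sketched in plain Java as follows. `UnitCatalog` is a hypothetical class standing in for a Unit table that is populated once in a setup step; JPA is deliberately left out so the sketch runs on a plain JDK. After setup, runtime code only looks units up by name and rejects anything that was not approved in advance, so no unit is ever cascade-persisted at runtime.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical read-only catalog, filled once during a setup step.
final class UnitCatalog {
    private final Map<String, Long> idsByName;

    // Setup: assign a key to each approved unit once, ignoring duplicates.
    UnitCatalog(List<String> approvedUnits) {
        Map<String, Long> ids = new LinkedHashMap<>();
        long nextId = 1;
        for (String name : approvedUnits) {
            if (ids.putIfAbsent(name, nextId) == null) {
                nextId++;
            }
        }
        this.idsByName = Collections.unmodifiableMap(ids);
    }

    // Runtime: look up an existing unit; never create a new one.
    long idOf(String name) {
        Long id = idsByName.get(name);
        if (id == null) {
            throw new IllegalArgumentException("unit not approved: " + name);
        }
        return id;
    }
}
```

In JPA code, the looked-up key could then be turned into a reference to the existing row with `EntityManager.getReference(Unit.class, id)` instead of constructing a new `Unit`.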

-dain


On Apr 21, 2008, at 6:39 PM, Alexander Saint Croix wrote:
> Thanks for the help, man.  I appreciate it.  The good news is that I don't
> have multiple instances of units with the same values.  I use a static
> factory to build units (such as kilogram), and make them available via
> public static references from an SI class.
>
> Forgetting units for a moment, let's say I have two entities.  Person and
> Cheese.  I provide a static reference "Cheeses.SWISS" to a pre-built
> instance of Cheese with the "name" field of the instance set to "swiss".
> Then, I create Person bob = new Person("Bob", Cheeses.SWISS) and call
> mgr.persist(bob).
>
> In my case, I've got a cascading PERSIST relationship between Person and
> Cheese.  So, after the transaction, Cheeses.SWISS has an ID value.
>
> Now, if I create Person alex = new Person("Alex", Cheeses.SWISS) and pass it
> to mgr.persist(alex), I get a funny error about the primary key field
> already having a value (sorry, I don't have the exact error text, but it's
> easy to reproduce if it comes to that).
>
> I gather from this that this isn't the best way to introduce new objects to
> the persistence store when the objects reference data that is already in the
> store.  I'm guessing that instead of this, there should be some kind of
> transaction that gets the Cheese object from the datastore and then adds a
> reference to it to the new Person object.  I think that pre-loading the SI
> units into the datastore at the beginning makes a lot of sense.  Now I just
> need some way to make sure that new entities that reference data already in
> the datastore are correctly wired.  I'm wondering if there are examples of
> this sort of thing?
>
> Cheers,
> --
> Alex


Re: Best practice for cascading persist of existing entity records

Posted by Alexander Saint Croix <sa...@gmail.com>.
Thanks for the help, man.  I appreciate it.  The good news is that I don't
have multiple instances of units with the same values.  I use a static
factory to build units (such as kilogram), and make them available via
public static references from an SI class.

Forgetting units for a moment, let's say I have two entities.  Person and
Cheese.  I provide a static reference "Cheeses.SWISS" to a pre-built
instance of Cheese with the "name" field of the instance set to "swiss".
Then, I create Person bob = new Person("Bob", Cheeses.SWISS) and call
mgr.persist(bob).

In my case, I've got a cascading PERSIST relationship between Person and
Cheese.  So, after the transaction, Cheeses.SWISS has an ID value.

Now, if I create Person alex = new Person("Alex", Cheeses.SWISS) and pass it
to mgr.persist(alex), I get a funny error about the primary key field
already having a value (sorry, I don't have the exact error text, but it's
easy to reproduce if it comes to that).

I gather from this that this isn't the best way to introduce new objects to
the persistence store when the objects reference data that is already in the
store.  I'm guessing that instead of this, there should be some kind of
transaction that gets the Cheese object from the datastore and then adds a
reference to it to the new Person object.  I think that pre-loading the SI
units into the datastore at the beginning makes a lot of sense.  Now I just
need some way to make sure that new entities that reference data already in
the datastore are correctly wired.  I'm wondering if there are examples of
this sort of thing?
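One common shape for this wiring is find-or-create: look the shared row up by its natural key (the cheese or unit name) and reference the stored instance, creating it only when the lookup comes back empty. The error described above is consistent with persist being cascaded onto `Cheeses.SWISS` after it has become detached; the JPA specification allows persist on a detached entity to throw `EntityExistsException`. The sketch below is plain Java, and `CheeseStore` is a hypothetical in-memory stand-in for the database. In JPA the lookup would be a query such as `SELECT c FROM Cheese c WHERE c.name = :name` run in the current transaction, and the insert would be `EntityManager.persist`. A unique constraint on the name column would still be needed to catch two transactions creating the same row at once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for the Cheese and Person entities (no JPA annotations).
final class Cheese {
    Long id; // assigned by the store, like a generated primary key
    final String name;

    Cheese(String name) {
        this.name = name;
    }
}

final class Person {
    final String name;
    final Cheese cheese;

    Person(String name, Cheese cheese) {
        this.name = name;
        this.cheese = cheese;
    }
}

// Hypothetical in-memory stand-in for the Cheese table.
final class CheeseStore {
    private final Map<String, Cheese> byName = new HashMap<>();
    private long nextId = 1;

    // Lookup by natural key; in JPA this would be a query on the name column.
    Optional<Cheese> findByName(String name) {
        return Optional.ofNullable(byName.get(name));
    }

    // Insert a new row; refuses objects that already carry a key,
    // much like persisting an already-stored (detached) entity fails.
    Cheese insert(Cheese cheese) {
        if (cheese.id != null) {
            throw new IllegalStateException("already has a key: " + cheese.name);
        }
        cheese.id = nextId++;
        byName.put(cheese.name, cheese);
        return cheese;
    }

    // Reuse the stored instance when present; create it only when absent.
    Cheese findOrCreate(String name) {
        return findByName(name).orElseGet(() -> insert(new Cheese(name)));
    }
}
```

With this shape, `new Person("Alex", store.findOrCreate("swiss"))` reuses the row created for Bob instead of trying to insert it a second time.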

Cheers,
--
Alex




On Mon, Apr 21, 2008 at 12:21 PM, Dain Sundstrom <da...@iq80.com> wrote:

> I'm not sure I fully understand what you are running into, but I'll take a
> stab at it.
>
> Normally, I suggest people avoid storing unit tables in the db, but in
> your specific case, I know you have a persistence model for a truly
> generic system.  BUT, for anyone else reading this email in the archives, be
> careful with unit tables as you can get nasty locking problems.
>
> In your case, I would suggest, if at all possible, that you create and
> cache your unit objects in a separate "setup" transaction.  After the setup
> transaction, no one would add new units (in a production system valid units
> would be added by hand by an admin), so you would never have a cascade
> persist problem.
>
> As for your exact situation, I think the root cause of the problem is that
> you have multiple instances with the same value, say kilogram, in your
> object graph.  If instead all references referred to the same object
> instance, the key would only be generated once.  To test my theory,
> replace the Unit constructor with a static factory, and cache the instances
> you return.
>
> -dain

Re: Best practice for cascading persist of existing entity records

Posted by Dain Sundstrom <da...@iq80.com>.
I'm not sure I fully understand what you are running into, but I'll  
take a stab at it.

Normally, I suggest people avoid storing unit tables in the db, but in
your specific case, I know you have a persistence model for a truly
generic system.  BUT, for anyone else reading this email in the
archives, be careful with unit tables as you can get nasty locking
problems.

In your case, I would suggest, if at all possible, that you create and  
cache your unit objects in a separate "setup" transaction.  After the  
setup transaction, no one would add new units (in a production system  
valid units would be added by hand by an admin), so you would never  
have a cascade persist problem.

As for your exact situation, I think the root cause of the problem is
that you have multiple instances with the same value, say kilogram, in
your object graph.  If instead all references referred to the same
object instance, the key would only be generated once.  To test my
theory, replace the Unit constructor with a static factory, and cache
the instances you return.
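The static-factory suggestion can be sketched in plain Java as follows, assuming a hypothetical `Unit` class keyed by its name. The JPA annotations, and the no-arg constructor JPA requires, are omitted so the sketch compiles on a plain JDK. Because `Unit.of("kilogram")` always returns the same object, every reference in an object graph points at one instance, so a cascade within a single persistence context generates its key only once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical Unit class with a caching static factory.
final class Unit {
    // One canonical instance per unit name, shared by every caller.
    private static final Map<String, Unit> CACHE = new ConcurrentHashMap<>();

    private final String name;

    // Private constructor: callers must go through the factory.
    private Unit(String name) {
        this.name = name;
    }

    // Returns the cached instance for this name, creating it on first request.
    static Unit of(String name) {
        return CACHE.computeIfAbsent(name, Unit::new);
    }

    String getName() {
        return name;
    }
}
```

The cache only guarantees object identity within one JVM. It does not keep the instance managed: once the transaction that persisted it ends, a cached instance is typically detached, so later transactions still need to reference the stored row rather than cascade persist onto the cached object again.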

-dain

On Apr 20, 2008, at 8:57 AM, Alexander Saint Croix wrote:
> Howdy, all.
>
> I'm wondering what the best practice is for the following use case:
>
> I have a handful of entity classes, one of which is a Unit (such as
> "kilogram") which has an auto-generated UID field.  Each individual instance
> of Unit ("kilogram", "second", "ampere", etc) can and will be used by
> multiple instances of the other entity classes.
>
> The problem arises after I persist one of the unit instances (such as
> "kilogram") and it is assigned a primary key field generated by the
> container.  From that point forward, each other object that refers to that
> specific Unit instance gives me trouble when I try to persist it, because it
> already has a nonzero UID field value.
>
> Ideally, I do not want to have more than one "kilogram" record in the "Unit"
> table.  Is there a common practice to tell the persistence container to
> "apply the persist cascade of the holding object to this field, unless the
> data represented by the field already exists in the database, in which case
> don't cascade--just reference the UID of that instance"?
>
> If the question isn't clear, I can provide sample code.  If you want to punt
> to the OpenJPA list, that's also acceptable.
>
> Cheers,
> --
> Alex