Posted to user@cayenne.apache.org by Philip Copeland <pc...@avoka.com> on 2007/09/23 00:26:45 UTC

Transactions , Primary Keys

Hi,

 

We've used Cayenne extensively over the last 2 years - with great
success on many projects. 

 

I'm in the process of designing & implementing a generic import/export
feature for some of our projects - and it's raising some difficult issues
that I don't know how to resolve. We're currently using the 1.2 code
base and generally use the database-generated primary key option. The
import/export raises challenges because it's potentially dealing with
very large amounts of data - and the hard part is that I need to "remap"
the foreign key relationships as part of the process. We've thought of
several ways of doing this but they all involve being able to query
objects that have not yet been written to the database. At the same time
the import needs to happen as a "transaction" so that it can be rolled
back if something fails.

 

The main problem I'm facing is that objects don't get written to the
database (nor Primary Keys generated) unless we commit our transaction.
What I'm wishing for is a mode where all changes are made to the
underlying database - but not committed until I have completed all the
work I need to do. The central issue is that I need to be able to
perform database queries on objects that are uncommitted (and part of my
current transaction) - and I don't have a way to do this at present with
Cayenne. The thought of having to go back to JDBC for this is not
attractive. 

 

The more I think about this - the more I wonder why the "write or save"
function is not handled separately from a "commit". 

 

I'd be very interested if anyone has suggestions.

 

Thanks

 

Philip

 


Re: Transactions , Primary Keys

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 23/09/2007, at 2:23 PM, Philip Copeland wrote:

> Did you have an issue committing every "several thousand records".
> What would you do if it failed at that stage - did you figure out a
> way to resume again at a known point? Handling large imports is never
> easy.

Initially we had some memory issues, but improvements by Andrus in  
the builds of Cayenne 3 around April 2007 fixed that nicely. Then we  
experimented with different batch sizes and their effect on speed. We  
found that 1000 worked well for us, so I think that's what we left it  
at in the end.

We prevent failure during commit by running validateForSave() on each  
record. If it fails we remove it from the context and write out a log  
entry. In our case we regularly see hundreds of errors in the import  
data, but thankfully they mostly aren't our problem to fix. That way  
we continue right to the end, even if some records fail. This allows  
the customer to fix the errors and either import them alone, or
return to a backup database and rerun the whole thing.
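
A minimal sketch of the validate-and-skip step described above, written in
plain Java so it runs without Cayenne. The names ValidatingImport,
filterValid and Result are hypothetical, and the validator function is only
a stand-in for the per-record validateForSave() check: it returns a list of
failure messages, empty when the record is acceptable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Cayenne.
final class ValidatingImport {

    /** Records that may be committed, plus one log line per rejected record. */
    static final class Result<T> {
        final List<T> accepted = new ArrayList<>();
        final List<String> log = new ArrayList<>();
    }

    /** Keeps records whose validator reports no failures; logs and drops the rest. */
    static <T> Result<T> filterValid(List<T> records,
                                     Function<? super T, List<String>> validator) {
        Result<T> result = new Result<>();
        for (T record : records) {
            List<String> failures = validator.apply(record);
            if (failures.isEmpty()) {
                result.accepted.add(record);
            } else {
                // With Cayenne, this is where the object would be removed from the context.
                result.log.add("Skipped '" + record + "': " + String.join("; ", failures));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed rule for the demo: a name must not be blank.
        Function<String, List<String>> notBlank = name ->
                name.trim().isEmpty() ? List.of("name is empty") : List.<String>of();

        Result<String> result = filterValid(List.of("Alice", " ", "Bob"), notBlank);
        System.out.println("accepted: " + result.accepted);
        System.out.println("log: " + result.log);
    }
}
```

Because rejected records never reach the commit, one bad row costs a log
entry rather than the whole batch.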

Of course, your goals may be quite different.

Ari Maniatis


-------------------------->
Aristedes Maniatis
phone +61 2 9660 9700
PGP fingerprint 08 57 20 4B 80 69 59 E2  A9 BF 2D 48 C2 20 0C C8



RE: Transactions , Primary Keys

Posted by Philip Copeland <pc...@avoka.com>.
Hi,

I was planning to use a similar approach - but was thinking of using a
temporary table - but you are right - if I can do this via a map it
could work - as long as the number of objects in the import set is
reasonable.

Did you have an issue committing every "several thousand records"? What
would you do if it failed at that stage - did you figure out a way to
resume again at a known point? Handling large imports is never easy.

Phil





Re: Transactions , Primary Keys

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 23/09/2007, at 8:26 AM, Philip Copeland wrote:

> The main problem I'm facing is that objects don't get written to the
> database (nor Primary Keys generated) unless we commit our  
> transaction.
> What I'm wishing for is a mode where all changes are made to the
> underlying database - but not committed until I have completed all the
> work I need to do.

Marcin and I have also spent a fair bit of time importing data into a  
Cayenne driven system. In our case we had an XML data source with  
(usually) about 150,000 objects which required all the keys and  
relationships to be remapped.

I believe that if your solution requires extensive manipulation and  
searching of primary and foreign keys through Cayenne, you are  
missing a large part of the advantages of an ORM driven approach and  
using direct JDBC might be simpler. However, think about whether you  
are really approaching this in the best way.


In our case we read in the source XML and create not only the Cayenne  
objects, but also a Map<Integer,PersistentEntity> which links those  
objects back to the PKs found in the original import data. That Map  
can then be used to look up the new object from the old PK/FK as  
required, without you needing to know anything about what new PK will  
be assigned upon commit.
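
The lookup-map idea above can be sketched in plain Java as two passes: first
create every object and register it under its old PK, then resolve each old
FK through the map. SourceRow, Node and importRows are hypothetical names
for illustration; Node stands in for a persistent object whose new PK is not
known until commit, and its parent field stands in for a relationship.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo classes, not part of Cayenne.
final class KeyRemapDemo {

    /** One row of the import data, still carrying the source system's keys. */
    static final class SourceRow {
        final int oldId;
        final String name;
        final Integer oldParentId; // null when the row has no parent

        SourceRow(int oldId, String name, Integer oldParentId) {
            this.oldId = oldId;
            this.name = name;
            this.oldParentId = oldParentId;
        }
    }

    /** Stand-in for a persistent object; it holds no primary key at all. */
    static final class Node {
        final String name;
        Node parent;

        Node(String name) {
            this.name = name;
        }
    }

    static Map<Integer, Node> importRows(List<SourceRow> rows) {
        Map<Integer, Node> byOldId = new HashMap<>();
        // Pass 1: create each object and remember it under its old PK.
        for (SourceRow row : rows) {
            byOldId.put(row.oldId, new Node(row.name));
        }
        // Pass 2: turn each old FK into an object reference via the map.
        for (SourceRow row : rows) {
            if (row.oldParentId == null) {
                continue;
            }
            Node parent = byOldId.get(row.oldParentId);
            if (parent == null) {
                throw new IllegalStateException(
                        "Row " + row.oldId + " references missing key " + row.oldParentId);
            }
            byOldId.get(row.oldId).parent = parent;
        }
        return byOldId;
    }

    public static void main(String[] args) {
        // The child is listed before its parent to show that row order does not matter.
        List<SourceRow> rows = List.of(
                new SourceRow(20, "child", 10),
                new SourceRow(10, "root", null));
        Map<Integer, Node> imported = importRows(rows);
        System.out.println(imported.get(20).parent.name); // prints "root"
    }
}
```

Doing the wiring in a second pass means a row may refer to a parent that
appears later in the file. The cost is that the whole map stays in memory,
so this only suits import sets of a reasonable size.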

In our case we committed the context every few thousand records,
or at certain stages in the process.
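
A rough sketch of committing in fixed-size batches, again in plain Java.
BatchCommitter and importInBatches are hypothetical names; the register
callback stands in for creating the object in the context, and the commit
callback stands in for the context commit (with Cayenne, presumably
commitChanges() on the DataContext).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Cayenne.
final class BatchCommitter {

    /**
     * Registers every record and commits after each full batch, plus once
     * more for a trailing partial batch. Returns the number of commits.
     */
    static <T> int importInBatches(Iterable<T> records, int batchSize,
                                   Consumer<? super T> register, Runnable commit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;
        int commits = 0;
        for (T record : records) {
            register.accept(record);
            pending++;
            if (pending == batchSize) {
                commit.run();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            commit.run(); // flush the final partial batch
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            records.add(i);
        }
        List<Integer> registered = new ArrayList<>();
        int commits = importInBatches(records, 1000, registered::add,
                () -> System.out.println("commit at " + registered.size() + " records"));
        System.out.println("commits: " + commits); // prints "commits: 3"
    }
}
```

Each batch is its own transaction here, so a failure partway through leaves
the earlier batches in the database. That is a different guarantee from the
single all-or-nothing transaction asked for in the original question.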

Does this help?

Ari Maniatis


