You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cayenne.apache.org by Michael Gentry <mg...@masslight.net> on 2015/12/08 12:53:36 UTC

Re: using database generated PK

Switched to dev...

There are two basic types of database-generated keys:

1) Auto-increment (such as MySQL and H2)
2) Sequences (such as PostgreSQL and Oracle)

#1 has to be done row-by-row per insert.
#2 can handle "batches" (you define how many PKs to cache) and is similar
in operation to AUTO_PK_SUPPORT.

In theory, #2 should be faster since the PKs/FKs can be assigned in advance
before contacting the DB instead of having to wait on a reply back from the
DB.  Remember, Cayenne can't insert a record with a FK to a PK until that
PK is known, so it has to get the auto-increment PK value from the DB and
fill in the FK before it can do another insert.

One limitation we currently have is handling cyclic graphs:

A:FK -> B:PK
B:FK -> C:PK
C:FK -> A:FK

With auto-increment, there isn't a good solution to this that I can think
of, especially if DB constraints are in play (if an FK is required, you
can't insert a NULL FK, then do an update in the same transaction after you
know it).  With sequences (or AUTO_PK_SUPPORT), this would actually be
possible, because we generate the PKs prior to the inserts, therefore we
can assign all the FKs prior to inserts, too.  However the last time I
looked at the code, I don't believe it is smart enough to have two
different paths of execution, so we still can't handle cyclic graphs (plus
Ashwood fails).

Any thoughts on the value of investigating changing this?

Thanks,

mrg


On Wed, Dec 2, 2015 at 10:08 PM, John Huss <jo...@gmail.com> wrote:

> The AUTO_PK_SUPPORT table exists only to support PK generation for DBs that
> do not have native support for it built in.  However, in this day and age
> any database worth using has this built in.  I've run production apps with
> Postgres, MySQL, FrontBase, and Interbase / Firebird, and all of them have
> native PK generation.  Heck, even Derby, H2, and HSQLDB support it.
>
> The main advantage of a native solution is that you know it works.  If you
> ever used native PK generation the AUTO_PK_SUPPORT table feels very
> kludgy.  It works (I guess?), but it's not ideal.  A native solution is
> faster and simpler.  It handles transactions better.  Most DBs (all?) do PK
> generation outside of transactions, so incrementing the sequence always
> persists regardless of transaction rollbacks or commits.
>
> Performance is better because a native DB implementation is able to
> implement it with as little locking as necessary, which is certain to be
> much faster that the equivalent set of SQL commands.
>
> Another advantage is being able to use the same PK generators to insert
> data directly with SQL, which is sometimes necessary.  Also, maintenance is
> easier since there are simple built in functions to query or manipulate the
> sequence value.
>
> MySQL's implementation is different than the others I've used.  There's is
> more magical.  The component parts aren't accessible to developers.  The
> engine just magically chooses the next highest PK value when rows are
> inserted without giving any idea where this number comes from or how it is
> chosen.  This has some benefits since the PKs don't have to be explicitly
> generated ahead of time.  I don't know if it's better, but it's different
> anyhow.
>
> On Wed, Dec 2, 2015 at 3:30 AM Aristedes Maniatis <am...@apache.org>
> wrote:
>
> > I've always used the AUTO_PK_SUPPORT approach to PK generation, all the
> > way back from WebObject EOF days many many years ago.
> >
> > However a recently discovered bug in this approach [1] and how it
> > intersects with transactions may require use to think about going over to
> > database generated primary keys with a auto-increment column. For
> > developers not using an ORM this is pretty common I understand.
> >
> > What are the downsides to that approach? Is performance, reliability and
> > everything else just the same? What problem was AUTO_PK_SUPPORT
> originally
> > designed to solve?
> >
> > Cheers
> > Ari
> >
> >
> > [1] https://issues.apache.org/jira/browse/CAY-2040
> >
> >
> > --
> > -------------------------->
> > Aristedes Maniatis
> > GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
> >
>

Re: using database generated PK

Posted by Michael Gentry <mg...@masslight.net>.
It has been a few years since I last looked at Ashwood, but I believe you
are right about #1.  If I recall, the graph is pre-generated once and
reused after that, therefore no PK considerations are involved, only Entity
relationships.

I hadn't thought of having an option to turn off Ashwood completely.  That
would require the DB to only use deferrable constraints, but could be a
quite useful feature.


On Thu, Dec 10, 2015 at 7:26 AM, Andrus Adamchik <an...@objectstyle.org>
wrote:

>
> > On Dec 10, 2015, at 3:13 PM, Michael Gentry <mg...@masslight.net>
> wrote:
> >
> > I didn't understand your #1 example.
>
> "A1 depends on B1 depends on A2" <- no cycle
>
>    when mapped to entities becomes this:
>
> "EntityA depends on EntityB depends on EntityA" <- IIRC that will be
> treated as an unresolvable cycle by Ashwood.
>
> > #3 I'm pretty sure can be done with deferrable constraints (such as
> > PostgreSQL supports) if you assign the keys up-front.  The mandatory FK
> > won't be checked until the transaction is committed.  I know in the past
> > I've had to make several schema design changes due to limitations in
> MySQL
> > (and sometimes Cayenne) because of constraints/etc.  There will simply be
> > some cases that aren't easily solved by Cayenne due to other external
> > choices, and that's OK -- we shouldn't try to solve those.  But
> > pre-assigning the PKs if using sequences or auto-PK support might be
> > trivial enough for us to do and, if your DB supports deferrable
> > constraints, solve a few of those issues.
>
> This won't work with auto-incremented PK of course. But any other current
> Cayenne PK generation strategy (sequences, PK table lookup) should work
> with deferred constraints  as Cayenne does generate all PKs before commit.
> The only thing we need is to turn on deferred constraints (and maybe turn
> off Ashwood, as in this case Cayenne-side operation ordering will be a
> waste).
>
> Andrus
>
>

Re: using database generated PK

Posted by Andrus Adamchik <an...@objectstyle.org>.
> On Dec 10, 2015, at 3:13 PM, Michael Gentry <mg...@masslight.net> wrote:
> 
> I didn't understand your #1 example.

"A1 depends on B1 depends on A2" <- no cycle

   when mapped to entities becomes this:

"EntityA depends on EntityB depends on EntityA" <- IIRC that will be treated as an unresolvable cycle by Ashwood.

> #3 I'm pretty sure can be done with deferrable constraints (such as
> PostgreSQL supports) if you assign the keys up-front.  The mandatory FK
> won't be checked until the transaction is committed.  I know in the past
> I've had to make several schema design changes due to limitations in MySQL
> (and sometimes Cayenne) because of constraints/etc.  There will simply be
> some cases that aren't easily solved by Cayenne due to other external
> choices, and that's OK -- we shouldn't try to solve those.  But
> pre-assigning the PKs if using sequences or auto-PK support might be
> trivial enough for us to do and, if your DB supports deferrable
> constraints, solve a few of those issues.

This won't work with auto-incremented PK of course. But any other current Cayenne PK generation strategy (sequences, PK table lookup) should work with deferred constraints  as Cayenne does generate all PKs before commit. The only thing we need is to turn on deferred constraints (and maybe turn off Ashwood, as in this case Cayenne-side operation ordering will be a waste).

Andrus


Re: using database generated PK

Posted by Michael Gentry <mg...@masslight.net>.
Hi Andrus,

I didn't understand your #1 example.

#2 is doable, but would make the commit code more complex and I'm not sure
it is worth the added burden to Cayenne.

#3 I'm pretty sure can be done with deferrable constraints (such as
PostgreSQL supports) if you assign the keys up-front.  The mandatory FK
won't be checked until the transaction is committed.  I know in the past
I've had to make several schema design changes due to limitations in MySQL
(and sometimes Cayenne) because of constraints/etc.  There will simply be
some cases that aren't easily solved by Cayenne due to other external
choices, and that's OK -- we shouldn't try to solve those.  But
pre-assigning the PKs if using sequences or auto-PK support might be
trivial enough for us to do and, if your DB supports deferrable
constraints, solve a few of those issues.

Thanks,

mrg


On Thu, Dec 10, 2015 at 1:43 AM, Andrus Adamchik <an...@objectstyle.org>
wrote:

> Assigning a PK before commit is a good feature on its own. Still not sure
> it will solve the cyclic graphs issue. Consider the following cases:
>
> 1. Cycles between entities, but not between the objects. Can be handled by
> making Ashwood smarter.
>
> 2. Optional FK, which can be handled via INSERT then UPDATE:
>
> * insert department with null manager
> * insert a person as a department member
> * then update department to set the person as a manager
>
> 3. Same as #2, but department.manager_id is mandatory. There's no valid
> operation ordering.
>
> So #1 and #2 do not require us to know the PK upfront. And #3 can't be
> solved even if we do.
>
> Andrus
>
> > On Dec 8, 2015, at 2:53 PM, Michael Gentry <mg...@masslight.net>
> wrote:
> >
> > Switched to dev...
> >
> > There are two basic types of database-generated keys:
> >
> > 1) Auto-increment (such as MySQL and H2)
> > 2) Sequences (such as PostgreSQL and Oracle)
> >
> > #1 has to be done row-by-row per insert.
> > #2 can handle "batches" (you define how many PKs to cache) and is similar
> > in operation to AUTO_PK_SUPPORT.
> >
> > In theory, #2 should be faster since the PKs/FKs can be assigned in
> advance
> > before contacting the DB instead of having to wait on a reply back from
> the
> > DB.  Remember, Cayenne can't insert a record with a FK to a PK until that
> > PK is known, so it has to get the auto-increment PK value from the DB and
> > fill in the FK before it can do another insert.
> >
> > One limitation we currently have is handling cyclic graphs:
> >
> > A:FK -> B:PK
> > B:FK -> C:PK
> > C:FK -> A:FK
> >
> > With auto-increment, there isn't a good solution to this that I can think
> > of, especially if DB constraints are in play (if an FK is required, you
> > can't insert a NULL FK, then do an update in the same transaction after
> you
> > know it).  With sequences (or AUTO_PK_SUPPORT), this would actually be
> > possible, because we generate the PKs prior to the inserts, therefore we
> > can assign all the FKs prior to inserts, too.  However the last time I
> > looked at the code, I don't believe it is smart enough to have two
> > different paths of execution, so we still can't handle cyclic graphs
> (plus
> > Ashwood fails).
> >
> > Any thoughts on the value of investigating changing this?
> >
> > Thanks,
> >
> > mrg
> >
> >
> > On Wed, Dec 2, 2015 at 10:08 PM, John Huss <jo...@gmail.com> wrote:
> >
> >> The AUTO_PK_SUPPORT table exists only to support PK generation for DBs
> that
> >> do not have native support for it built in.  However, in this day and
> age
> >> any database worth using has this built in.  I've run production apps
> with
> >> Postgres, MySQL, FrontBase, and Interbase / Firebird, and all of them
> have
> >> native PK generation.  Heck, even Derby, H2, and HSQLDB support it.
> >>
> >> The main advantage of a native solution is that you know it works.  If
> you
> >> ever used native PK generation the AUTO_PK_SUPPORT table feels very
> >> kludgy.  It works (I guess?), but it's not ideal.  A native solution is
> >> faster and simpler.  It handles transactions better.  Most DBs (all?)
> do PK
> >> generation outside of transactions, so incrementing the sequence always
> >> persists regardless of transaction rollbacks or commits.
> >>
> >> Performance is better because a native DB implementation is able to
> >> implement it with as little locking as necessary, which is certain to be
> >> much faster that the equivalent set of SQL commands.
> >>
> >> Another advantage is being able to use the same PK generators to insert
> >> data directly with SQL, which is sometimes necessary.  Also,
> maintenance is
> >> easier since there are simple built in functions to query or manipulate
> the
> >> sequence value.
> >>
> >> MySQL's implementation is different than the others I've used.  There's
> is
> >> more magical.  The component parts aren't accessible to developers.  The
> >> engine just magically chooses the next highest PK value when rows are
> >> inserted without giving any idea where this number comes from or how it
> is
> >> chosen.  This has some benefits since the PKs don't have to be
> explicitly
> >> generated ahead of time.  I don't know if it's better, but it's
> different
> >> anyhow.
> >>
> >> On Wed, Dec 2, 2015 at 3:30 AM Aristedes Maniatis <amaniatis@apache.org
> >
> >> wrote:
> >>
> >>> I've always used the AUTO_PK_SUPPORT approach to PK generation, all the
> >>> way back from WebObject EOF days many many years ago.
> >>>
> >>> However a recently discovered bug in this approach [1] and how it
> >>> intersects with transactions may require use to think about going over
> to
> >>> database generated primary keys with a auto-increment column. For
> >>> developers not using an ORM this is pretty common I understand.
> >>>
> >>> What are the downsides to that approach? Is performance, reliability
> and
> >>> everything else just the same? What problem was AUTO_PK_SUPPORT
> >> originally
> >>> designed to solve?
> >>>
> >>> Cheers
> >>> Ari
> >>>
> >>>
> >>> [1] https://issues.apache.org/jira/browse/CAY-2040
> >>>
> >>>
> >>> --
> >>> -------------------------->
> >>> Aristedes Maniatis
> >>> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
> >>>
> >>
>
>

Re: using database generated PK

Posted by Andrus Adamchik <an...@objectstyle.org>.
Assigning a PK before commit is a good feature on its own. Still not sure it will solve the cyclic graphs issue. Consider the following cases:

1. Cycles between entities, but not between the objects. Can be handled by making Ashwood smarter.

2. Optional FK, which can be handled via INSERT then UPDATE:

* insert department with null manager
* insert a person as a department member
* then update department to set the person as a manager

3. Same as #2, but department.manager_id is mandatory. There's no valid operation ordering.

So #1 and #2 do not require us to know the PK upfront. And #3 can't be solved even if we do.

Andrus

> On Dec 8, 2015, at 2:53 PM, Michael Gentry <mg...@masslight.net> wrote:
> 
> Switched to dev...
> 
> There are two basic types of database-generated keys:
> 
> 1) Auto-increment (such as MySQL and H2)
> 2) Sequences (such as PostgreSQL and Oracle)
> 
> #1 has to be done row-by-row per insert.
> #2 can handle "batches" (you define how many PKs to cache) and is similar
> in operation to AUTO_PK_SUPPORT.
> 
> In theory, #2 should be faster since the PKs/FKs can be assigned in advance
> before contacting the DB instead of having to wait on a reply back from the
> DB.  Remember, Cayenne can't insert a record with a FK to a PK until that
> PK is known, so it has to get the auto-increment PK value from the DB and
> fill in the FK before it can do another insert.
> 
> One limitation we currently have is handling cyclic graphs:
> 
> A:FK -> B:PK
> B:FK -> C:PK
> C:FK -> A:FK
> 
> With auto-increment, there isn't a good solution to this that I can think
> of, especially if DB constraints are in play (if an FK is required, you
> can't insert a NULL FK, then do an update in the same transaction after you
> know it).  With sequences (or AUTO_PK_SUPPORT), this would actually be
> possible, because we generate the PKs prior to the inserts, therefore we
> can assign all the FKs prior to inserts, too.  However the last time I
> looked at the code, I don't believe it is smart enough to have two
> different paths of execution, so we still can't handle cyclic graphs (plus
> Ashwood fails).
> 
> Any thoughts on the value of investigating changing this?
> 
> Thanks,
> 
> mrg
> 
> 
> On Wed, Dec 2, 2015 at 10:08 PM, John Huss <jo...@gmail.com> wrote:
> 
>> The AUTO_PK_SUPPORT table exists only to support PK generation for DBs that
>> do not have native support for it built in.  However, in this day and age
>> any database worth using has this built in.  I've run production apps with
>> Postgres, MySQL, FrontBase, and Interbase / Firebird, and all of them have
>> native PK generation.  Heck, even Derby, H2, and HSQLDB support it.
>> 
>> The main advantage of a native solution is that you know it works.  If you
>> ever used native PK generation the AUTO_PK_SUPPORT table feels very
>> kludgy.  It works (I guess?), but it's not ideal.  A native solution is
>> faster and simpler.  It handles transactions better.  Most DBs (all?) do PK
>> generation outside of transactions, so incrementing the sequence always
>> persists regardless of transaction rollbacks or commits.
>> 
>> Performance is better because a native DB implementation is able to
>> implement it with as little locking as necessary, which is certain to be
>> much faster that the equivalent set of SQL commands.
>> 
>> Another advantage is being able to use the same PK generators to insert
>> data directly with SQL, which is sometimes necessary.  Also, maintenance is
>> easier since there are simple built in functions to query or manipulate the
>> sequence value.
>> 
>> MySQL's implementation is different than the others I've used.  There's is
>> more magical.  The component parts aren't accessible to developers.  The
>> engine just magically chooses the next highest PK value when rows are
>> inserted without giving any idea where this number comes from or how it is
>> chosen.  This has some benefits since the PKs don't have to be explicitly
>> generated ahead of time.  I don't know if it's better, but it's different
>> anyhow.
>> 
>> On Wed, Dec 2, 2015 at 3:30 AM Aristedes Maniatis <am...@apache.org>
>> wrote:
>> 
>>> I've always used the AUTO_PK_SUPPORT approach to PK generation, all the
>>> way back from WebObject EOF days many many years ago.
>>> 
>>> However a recently discovered bug in this approach [1] and how it
>>> intersects with transactions may require use to think about going over to
>>> database generated primary keys with a auto-increment column. For
>>> developers not using an ORM this is pretty common I understand.
>>> 
>>> What are the downsides to that approach? Is performance, reliability and
>>> everything else just the same? What problem was AUTO_PK_SUPPORT
>> originally
>>> designed to solve?
>>> 
>>> Cheers
>>> Ari
>>> 
>>> 
>>> [1] https://issues.apache.org/jira/browse/CAY-2040
>>> 
>>> 
>>> --
>>> -------------------------->
>>> Aristedes Maniatis
>>> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>>> 
>>