You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cayenne.apache.org by Andrus Adamchik <an...@objectstyle.org> on 2016/07/01 00:43:21 UTC

Re: Order of operations?

There are cases when INSERT -> UPDATE -> DELETE is the only order that would satisfy the constraints. Such as when UPDATE nullifies FKs to the record that is later deleted. So this was a conscious decision. Too simplistic of course, as we've seen here. So a smarter algorithm is in order. But just reversing it to DELETE -> UPDATE -> INSERT (or picking any other fixed order for that matter) is going to break apps that work currently.

Andrus


> On Jun 22, 2016, at 9:32 PM, Michael Gentry <bl...@gmail.com> wrote:
> 
> I believe I inquired about the ordering years ago and I think Andrus gave a
> reason why, but it escapes me at the moment.  Perhaps he will remember and
> chime in here.
> 
> mrg
> 
> 
> On Wed, Jun 22, 2016 at 6:58 PM, Aristedes Maniatis <ar...@maniatis.org>
> wrote:
> 
>> On 23/06/2016 5:25am, Lon Varscsak wrote:
>>> Okay, I’ve found where it’s at (DataDomainFlushAction.preprocess).  I
>> don’t
>>> see an easy way to override this, without just forking (which is totally
>>> doable).  Does anyone know why the default order of operations is INSERT
>> ->
>>> UPDATE -> DELETE?  Because if there’s no specific reason, it seems like
>> we
>>> could change this to support DBs that don’t have deferred constraints.
>> (or
>>> provide a hook to reorder these)
>> 
>> 
>> That's a pretty old piece of code, probably before my time. The history is
>> unfortunately broken by a move back in 2013 [1] but it would be interesting
>> to go back to the origins of that file and see if any commit message sheds
>> light on why that ordering was chosen. It does seem an odd choice, but
>> perhaps there was a reason.
>> 
>> Before you fork Cayenne let's see if we can improve this behaviour for the
>> entire community.
>> 
>> 
>> [1]
>> https://github.com/apache/cayenne/commits/b0631deb251f036840d1ca3aee6d4ae50f2441bf/cayenne-server/src/main/java/org/apache/cayenne/access/DataDomainFlushAction.java
>> 
>> --
>> -------------------------->
>> Aristedes Maniatis
>> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>> 


Re: Order of operations?

Posted by Michael Gentry <bl...@gmail.com>.
I looked at AshwoodEntitySorter a few years ago and also concluded it would
be non-trivial and likely would be more than just the graph library that
needed to be changed, such as extra logic to handle dependency cycles.

On Thu, Jun 30, 2016 at 9:01 PM, Andrus Adamchik <an...@objectstyle.org>
wrote:

> It may be time to revisit current AshwoodEntitySorter and start
> building/sorting DB operations based on the object graph change log.
> Relationships between objects participating in a given transaction will
> likely provide us with a better insight about sorting then analyzing the
> underlying entities by themselves (something that we do now). The we can
> execute a mix of INSERT/UPDATE/DELETE in an order defined dynamically based
> on a commit set at hand.
>
> Perhaps it will also help us in resolving dependency cycles, where there's
> no single valid operation order for simple operations (e.g. creating both
> department and its manager who is also a department employee). This can be
> solved by running an INSERT with null FK on one side of the cycle followed
> by UPDATE with the right PK.
>
> Also current AshwoodEntitySorter approach is geared towards merging all
> semantically identical operations in batched PreparedStatements, thus
> improving performance of large commits. If we are to do something new,
> we'll need to take this into account.
>
> So there are tradeoffs and challenges and this won't be an easy task. But
> if anyone feels like applying their graph theory knowledge to a non-trivial
> problem, please speak up :)
>
> Andrus
>
> > On Jun 30, 2016, at 8:43 PM, Andrus Adamchik <an...@objectstyle.org>
> wrote:
> >
> > There are cases when INSERT -> UPDATE -> DELETE is the only order that
> would satisfy the constraints. Such as when UPDATE nullifies FKs to the
> record that is later deleted. So this was a conscious decision. Too
> simplistic of course, as we've seen here. So a smarter algorithm is in
> order. But just reversing it to DELETE -> UPDATE -> INSERT (or picking any
> other fixed order for that matter) is going to break apps that work
> currently.
> >
> > Andrus
> >
> >
> >> On Jun 22, 2016, at 9:32 PM, Michael Gentry <bl...@gmail.com>
> wrote:
> >>
> >> I believe I inquired about the ordering years ago and I think Andrus
> gave a
> >> reason why, but it escapes me at the moment.  Perhaps he will remember
> and
> >> chime in here.
> >>
> >> mrg
> >>
> >>
> >> On Wed, Jun 22, 2016 at 6:58 PM, Aristedes Maniatis <ar...@maniatis.org>
> >> wrote:
> >>
> >>> On 23/06/2016 5:25am, Lon Varscsak wrote:
> >>>> Okay, I’ve found where it’s at (DataDomainFlushAction.preprocess).  I
> >>> don’t
> >>>> see an easy way to override this, without just forking (which is
> totally
> >>>> doable).  Does anyone know why the default order of operations is
> INSERT
> >>> ->
> >>>> UPDATE -> DELETE?  Because if there’s no specific reason, it seems
> like
> >>> we
> >>>> could change this to support DBs that don’t have deferred constraints.
> >>> (or
> >>>> provide a hook to reorder these)
> >>>
> >>>
> >>> That's a pretty old piece of code, probably before my time. The
> history is
> >>> unfortunately broken by a move back in 2013 [1] but it would be
> interesting
> >>> to go back to the origins of that file and see if any commit message
> sheds
> >>> light on why that ordering was chosen. It does seem an odd choice, but
> >>> perhaps there was a reason.
> >>>
> >>> Before you fork Cayenne let's see if we can improve this behaviour for
> the
> >>> entire community.
> >>>
> >>>
> >>> [1]
> >>>
> https://github.com/apache/cayenne/commits/b0631deb251f036840d1ca3aee6d4ae50f2441bf/cayenne-server/src/main/java/org/apache/cayenne/access/DataDomainFlushAction.java
> >>>
> >>> --
> >>> -------------------------->
> >>> Aristedes Maniatis
> >>> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
> >>>
> >
>
>

Re: Order of operations?

Posted by Andrus Adamchik <an...@objectstyle.org>.

> On Jul 1, 2016, at 9:09 AM, Aristedes Maniatis <ar...@maniatis.org> wrote:
> 
> On 1/07/2016 8:57pm, Andrus Adamchik wrote:
>> 
>>> On Jun 30, 2016, at 11:04 PM, Aristedes Maniatis <ar...@maniatis.org> wrote:
>>> 
>>> 1. The Cayenne model will need to have additional metadata about database indexes and constraints. I've always liked that idea anyway but lacked the time to work on it personally. It could open up interesting interactions with projects like liquibase [1]
>> 
>> We already have information about relationships (that are essentially FK constraints). Why do we need to know about indexes for ordering purposes?
> 
> I was just thinking about db constraints really. For example, a unique index on a field (other than the PK) might need to be taken into consideration if you were deleting and creating records. The delete has to happen before the create, or they need to be coalesced into a single update.

Ah ok. Yeah, once you start digging, there's always an extra layer of complexity :)

Andrus

Re: Order of operations?

Posted by Michael Gentry <bl...@gmail.com>.
Constraints can get even more complex, like Oracle and PostgreSQL (perhaps
others) allow deferrable constraints, which are nice, but MySQL will blow
up immediately if a constraint fails even if it'll be satisfied by the end
of the transaction.


On Fri, Jul 1, 2016 at 9:09 AM, Aristedes Maniatis <ar...@maniatis.org> wrote:

> On 1/07/2016 8:57pm, Andrus Adamchik wrote:
> >
> >> On Jun 30, 2016, at 11:04 PM, Aristedes Maniatis <ar...@maniatis.org>
> wrote:
> >>
> >> 1. The Cayenne model will need to have additional metadata about
> database indexes and constraints. I've always liked that idea anyway but
> lacked the time to work on it personally. It could open up interesting
> interactions with projects like liquibase [1]
> >
> > We already have information about relationships (that are essentially FK
> constraints). Why do we need to know about indexes for ordering purposes?
>
> I was just thinking about db constraints really. For example, a unique
> index on a field (other than the PK) might need to be taken into
> consideration if you were deleting and creating records. The delete has to
> happen before the create, or they need to be coalesced into a single update.
>
> Ari
>
>
>
> --
> -------------------------->
> Aristedes Maniatis
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>

Re: Order of operations?

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 1/07/2016 8:57pm, Andrus Adamchik wrote:
> 
>> On Jun 30, 2016, at 11:04 PM, Aristedes Maniatis <ar...@maniatis.org> wrote:
>>
>> 1. The Cayenne model will need to have additional metadata about database indexes and constraints. I've always liked that idea anyway but lacked the time to work on it personally. It could open up interesting interactions with projects like liquibase [1]
> 
> We already have information about relationships (that are essentially FK constraints). Why do we need to know about indexes for ordering purposes?

I was just thinking about db constraints really. For example, a unique index on a field (other than the PK) might need to be taken into consideration if you were deleting and creating records. The delete has to happen before the create, or they need to be coalesced into a single update.

Ari



-- 
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: Order of operations?

Posted by Andrus Adamchik <an...@objectstyle.org>.
> On Jun 30, 2016, at 11:04 PM, Aristedes Maniatis <ar...@maniatis.org> wrote:
> 
> 1. The Cayenne model will need to have additional metadata about database indexes and constraints. I've always liked that idea anyway but lacked the time to work on it personally. It could open up interesting interactions with projects like liquibase [1]

We already have information about relationships (that are essentially FK constraints). Why do we need to know about indexes for ordering purposes?

Andrus

Re: Order of operations?

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 1/07/2016 11:01am, Andrus Adamchik wrote:
> So there are tradeoffs and challenges and this won't be an easy task. But if anyone feels like applying their graph theory knowledge to a non-trivial problem, please speak up :)

Although I don't have a solution, I have three observations/questions:

1. The Cayenne model will need to have additional metadata about database indexes and constraints. I've always liked that idea anyway but lacked the time to work on it personally. It could open up interesting interactions with projects like liquibase [1]

2. Has Hibernate or any other ORM solved this problem in an interesting way? [2] Hibernate appears to split "collection deletes/inserts" which I presume means "to-many" joined records.

3. Keeping the order that changes are added to the context is another simple approach that leaves it up to the developer to get it right in code. It doesn't quite tell us what to do with two updates to the same object that are collapsed into one, but otherwise it seems sensible as an option.


[1] https://github.com/liquibase/liquibase-hibernate for a simplistic approach, but we could go much further.

[2] http://stackoverflow.com/questions/12616336/how-is-hibernate-deciding-order-of-update-insert-delete

-- 
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: Order of operations?

Posted by Andrus Adamchik <an...@objectstyle.org>.
It may be time to revisit current AshwoodEntitySorter and start building/sorting DB operations based on the object graph change log. Relationships between objects participating in a given transaction will likely provide us with a better insight about sorting then analyzing the underlying entities by themselves (something that we do now). The we can execute a mix of INSERT/UPDATE/DELETE in an order defined dynamically based on a commit set at hand.

Perhaps it will also help us in resolving dependency cycles, where there's no single valid operation order for simple operations (e.g. creating both department and its manager who is also a department employee). This can be solved by running an INSERT with null FK on one side of the cycle followed by UPDATE with the right PK. 

Also current AshwoodEntitySorter approach is geared towards merging all semantically identical operations in batched PreparedStatements, thus improving performance of large commits. If we are to do something new, we'll need to take this into account.

So there are tradeoffs and challenges and this won't be an easy task. But if anyone feels like applying their graph theory knowledge to a non-trivial problem, please speak up :)

Andrus

> On Jun 30, 2016, at 8:43 PM, Andrus Adamchik <an...@objectstyle.org> wrote:
> 
> There are cases when INSERT -> UPDATE -> DELETE is the only order that would satisfy the constraints. Such as when UPDATE nullifies FKs to the record that is later deleted. So this was a conscious decision. Too simplistic of course, as we've seen here. So a smarter algorithm is in order. But just reversing it to DELETE -> UPDATE -> INSERT (or picking any other fixed order for that matter) is going to break apps that work currently.
> 
> Andrus
> 
> 
>> On Jun 22, 2016, at 9:32 PM, Michael Gentry <bl...@gmail.com> wrote:
>> 
>> I believe I inquired about the ordering years ago and I think Andrus gave a
>> reason why, but it escapes me at the moment.  Perhaps he will remember and
>> chime in here.
>> 
>> mrg
>> 
>> 
>> On Wed, Jun 22, 2016 at 6:58 PM, Aristedes Maniatis <ar...@maniatis.org>
>> wrote:
>> 
>>> On 23/06/2016 5:25am, Lon Varscsak wrote:
>>>> Okay, I’ve found where it’s at (DataDomainFlushAction.preprocess).  I
>>> don’t
>>>> see an easy way to override this, without just forking (which is totally
>>>> doable).  Does anyone know why the default order of operations is INSERT
>>> ->
>>>> UPDATE -> DELETE?  Because if there’s no specific reason, it seems like
>>> we
>>>> could change this to support DBs that don’t have deferred constraints.
>>> (or
>>>> provide a hook to reorder these)
>>> 
>>> 
>>> That's a pretty old piece of code, probably before my time. The history is
>>> unfortunately broken by a move back in 2013 [1] but it would be interesting
>>> to go back to the origins of that file and see if any commit message sheds
>>> light on why that ordering was chosen. It does seem an odd choice, but
>>> perhaps there was a reason.
>>> 
>>> Before you fork Cayenne let's see if we can improve this behaviour for the
>>> entire community.
>>> 
>>> 
>>> [1]
>>> https://github.com/apache/cayenne/commits/b0631deb251f036840d1ca3aee6d4ae50f2441bf/cayenne-server/src/main/java/org/apache/cayenne/access/DataDomainFlushAction.java
>>> 
>>> --
>>> -------------------------->
>>> Aristedes Maniatis
>>> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>>> 
>