Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@gmail.com> on 2009/01/20 13:22:08 UTC

Re: [Replication] Handling Triggers (was: Re: [Mitosis] random thoughts ...)

On Tue, Jan 20, 2009 at 4:16 AM, Alex Karasulu <ak...@gmail.com> wrote:
> Hi Emmanuel,
>
> On Sat, Jan 17, 2009 at 7:15 PM, Emmanuel Lecharny <el...@gmail.com>
> wrote:
>>
>> Last but not least: the triggers. If a modification can trigger others
>> (because integrity constraints are activated), then those should be
>> logged in the change log. When replicating, the triggers _must_ be disabled,
>> as the merged operations will contain all the triggered operations.
>
> This is one way to handle it but it could be very expensive.  If the trigger
> firing impacts many entries or results in a cascade of firings, then the
> cost of replicating the changes could be very large.

Even if you fire the trigger on the consumer, it has the same number of
changes to apply. Replicating the triggered changes just spares the
checks and the cost of the trigger logic.

>
> Triggers are modeled as entries.  As entries they will themselves be
> replicated.  It would be nice if the trigger on a consumer could fire and do
> all the work so we could avoid unnecessary network traffic.  This is all
> nice but it gets really complicated really fast.

Right, it would spare a hell of a lot of network traffic if triggers could
be fired instead of deactivated. To do so, we would have to add a
special attribute to each entry modified by the trigger or, even
better, use a special user (a Trigger user) and put it into the
creatorsName or modifiersName AT.
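
One way to read the "Trigger user" idea is that a supplier could skip the changes made by that user, since the consumer's own triggers would recreate them. The sketch below is only an illustration in Python, not ApacheDS code: the `Change` record, the `changes_to_send` helper and the trigger user's DN are all hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical DN for the dedicated "Trigger user" (an assumption, not a real ApacheDS name).
TRIGGER_USER_DN = "uid=trigger,ou=system"


@dataclass(frozen=True)
class Change:
    """A change log record, reduced to what this sketch needs (hypothetical type)."""
    entry_dn: str
    modifiers_name: str


def changes_to_send(change_log):
    """Keep only the changes made by real clients.

    Changes whose modifiersName is the trigger user are dropped, on the
    assumption that the consumer fires the same triggers itself.
    """
    return [c for c in change_log if c.modifiers_name != TRIGGER_USER_DN]


log = [
    Change("cn=alice,ou=users", "uid=admin,ou=system"),
    Change("cn=staff,ou=groups", TRIGGER_USER_DN),
]
sent = changes_to_send(log)
```

Only the first change would travel over the network here; the second is expected to be produced again by the trigger on the consumer.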


> Before going on to talk about triggers let's stop for a second and talk
> about how replication events must be handled by a consumer.  The consumer
> must make sure that whatever change is applied to the DIT (except for
> delete operations) has the proper operational attributes applied.
> More specifically the following basic operational attributes need the proper
> values:
>
> createTimestamp
> creatorsName
> modifyTimestamp
> modifiersName
>
> So the replication event should contain the who and the time at which the
> operation actually occurred rather than the current time for example.

yep.
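
As a rough sketch of that point, the consumer could derive the operational attributes from the replication event itself instead of from its own clock and session. The `ReplicationEvent` type and the `operational_attributes` helper are hypothetical names used for illustration only; the timestamp is rendered in LDAP GeneralizedTime form.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReplicationEvent:
    """What the supplier sends, reduced to the 'who' and 'when' (hypothetical type)."""
    entry_dn: str
    author_dn: str         # identity of the client on the supplier
    occurred_at: datetime  # time of the original operation (timezone-aware)
    is_add: bool           # True for an add, False for a modify


def operational_attributes(event):
    """Build the operational attributes from the event, not from the current time."""
    stamp = event.occurred_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")
    if event.is_add:
        return {"creatorsName": event.author_dn, "createTimestamp": stamp}
    return {"modifiersName": event.author_dn, "modifyTimestamp": stamp}


event = ReplicationEvent(
    entry_dn="cn=alice,ou=users",
    author_dn="uid=admin,ou=system",
    occurred_at=datetime(2009, 1, 20, 13, 22, 8, tzinfo=timezone.utc),
    is_add=False,
)
attrs = operational_attributes(event)
```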

> Hence
> replication event processing must perform operations against the DIT with
> the identity of the client making the change at that time on the supplier.
> So unlike a regular operation, an operation to apply replication deltas
> must use different values for these attributes. In a way this kind of
> operation is not a direct operation against the consumer, but an indirect
> operation.
>
> Direct operations by clients may raise triggers, which may perform
> additional operations against the DIT.  These triggered operations can
> themselves raise triggers that cause more changes.  A cascade may result,
> although it should be constrained through various means.  The server is
> designed to track the fact that a triggered change is occurring because of
> another change.  This is tracked through a linked list where at the head
> you'll find the operation that started it all.  All the triggered operations
> are treated as indirect operations caused by the operation at the head.
>
> The point I want to make is we already have some machinery here for tracking
> direct and indirect operations.  Although presently triggers don't work and
> the tracking mechanism lacks a way to put the same timestamp on all changed
> entries as if they happened at the same time, it should have this.  The
> server must treat replication operations at the consumer in a similar
> fashion and apply timestamps properly.  It can also do the same with respect
> to the changes due to triggers, whether the operation in question is
> replicated or not.
>
> This is the main worry with triggers and if we can properly solve this
> problem in a simple and easy to maintain way then we're golden.
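
The linked list of direct and indirect operations described above, together with the wished-for shared timestamp, could look roughly like the following. This is a minimal sketch under assumptions, not the server's actual tracking code: the `Operation` class and its method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Operation:
    """One node in the chain of operations (hypothetical type)."""
    description: str
    cause: Optional["Operation"] = None   # operation whose trigger caused this one
    timestamp: Optional[datetime] = None  # only meaningful on the head operation

    def is_direct(self):
        """A direct operation has no cause: it came straight from a client."""
        return self.cause is None

    def head(self):
        """Walk back to the operation that started it all."""
        op = self
        while op.cause is not None:
            op = op.cause
        return op

    def effective_timestamp(self):
        """Every operation in the chain reports the head's timestamp."""
        return self.head().timestamp


root = Operation("delete cn=alice", timestamp=datetime(2009, 1, 20, 13, 22, 8, tzinfo=timezone.utc))
first = Operation("remove alice from cn=staff", cause=root)
second = Operation("update member count", cause=first)
```

With this shape, `second.effective_timestamp()` returns the timestamp of `root`, so all entries changed by the cascade appear to have been modified at the same time.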

Right now, I think the first step should be to get replication
working, triggers or not. More specifically, if implementing a first
working version of replication breaks triggers, then I'm
ready to pay the price, simply because a server with the best possible
triggers implementation is worth nothing without working replication.

And I'm pretty sure it will be easier to whip up the triggers
implementation on top of working replication than to try to catch all
the balls at the same time. We have to learn how to juggle with one ball
before trying the ten-ball challenge!



-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: [Replication] Handling Triggers (was: Re: [Mitosis] random thoughts ...)

Posted by Alex Karasulu <ak...@gmail.com>.

Oh yes, I agree completely with your approach.  I started this thread for
some background discussion on this specific topic while we focus on
getting replication working, period.  It's obvious we just want to get
something working and then iron out the details.
Having these discussions now might help us avoid certain
pitfalls.

Alex