Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@gmail.com> on 2007/09/26 18:02:39 UTC

[ChangeLog] Basic design

Hi,

As I muse over the ChangeLog interceptor as a way to make unit tests
faster, I have a load of questions.

Basically, what we want is a way to run tests that leaves the server
in its original state, whatever modifications the tests made. We need
a rollback mechanism that can be activated on demand.

Suppose we want to play for a while on the data, here is the sequence
of operations I'm thinking about :
1) start a 'Mark' (I prefer using 'Mark' instead of 'Transaction', for
semantic reasons)
3) inject whatever we want into the server (add, del, modify...)
4) do a rollback (the anti-operations are committed)

At the end, the server will be in a consistent state.
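
A minimal sketch of that sequence, assuming a toy in-memory store in
place of the server. MarkedStore and its methods are hypothetical names
used only for illustration, not existing ApacheDS classes. Each
operation performed after the mark records its anti-operation, and the
rollback replays them newest first:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the server; not ApacheDS code. */
class MarkedStore {
    private final Map<String, String> entries = new HashMap<>();
    // Anti-operations recorded since the mark, most recent on top.
    private final Deque<Runnable> undo = new ArrayDeque<>();
    private boolean marked = false;

    void beginMark() {
        undo.clear();
        marked = true;
    }

    void add(String dn, String value) {
        if (entries.containsKey(dn)) {
            throw new IllegalStateException("entry exists: " + dn);
        }
        entries.put(dn, value);
        // The anti-operation of an add is a delete.
        if (marked) undo.push(() -> entries.remove(dn));
    }

    void delete(String dn) {
        String old = entries.remove(dn);
        if (old == null) {
            throw new IllegalStateException("no such entry: " + dn);
        }
        // The anti-operation of a delete re-adds the old entry.
        if (marked) undo.push(() -> entries.put(dn, old));
    }

    void modify(String dn, String value) {
        String old = entries.get(dn);
        if (old == null) {
            throw new IllegalStateException("no such entry: " + dn);
        }
        entries.put(dn, value);
        // The anti-operation of a modify restores the previous value.
        if (marked) undo.push(() -> entries.put(dn, old));
    }

    /** Applies the recorded anti-operations, newest first, then drops the mark. */
    void rollback() {
        while (!undo.isEmpty()) {
            undo.pop().run();
        }
        marked = false;
    }

    /** Copy of the current content, handy for comparing states in a test. */
    Map<String, String> snapshot() {
        return new HashMap<>(entries);
    }
}
```

A test would take a snapshot, call beginMark(), do its adds, deletes
and modifies, call rollback(), and then compare against the snapshot.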

This leads to some tricky points :
1) Some partitions may be treated differently (schema ?) : we may need
a level of protection
2) The logger will become a bottleneck, as we will have to synchronize
concurrent access to the storage
3) if we want to rollback the operations, the server should not be
able to process any other operations until the rollback is done

If we seriously consider using this mechanism for something more
critical, like storing a journal we can replay on crash (useful with
the deferred write mechanism we have), then other elements come into
play :
1) we must flush the data to disk as soon as it arrives
2) we have to think about a recovery mechanism, which should compare
the current database state and the current journal
2-1) this recovery mechanism will have to know which data has been
flushed to the backend, otherwise we may have a difference between the
journal and the backend. Namely, the Sync thread should be driven by
the ChangeLog interceptor (when the commit is submitted, the sync
thread is woken up and flushes data to the backend, marking the
entries in the log when they are written)

In any case, we also need a mechanism to activate the ChangeLog operations :
- startLog
- beginMark
- commit
- rollback
- stopLog

What would be the best solution ? We can do that with a specific
control, an extendedOperation or a standard modifyRequest of a
specific entry in the ou=system partition (remember the 'configuration
in the DIT' thing ?) coupled with a trigger and a stored procedure (SP).

Thoughts ?


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: [ChangeLog] Basic design

Posted by Alex Karasulu <ak...@apache.org>.
Hi Emmanuel,

On 9/26/07, Emmanuel Lecharny <el...@gmail.com> wrote:
>
> Suppose we want to play for a while on the data, here is the sequence
> of operations I'm thinking about :
> 1) start a 'Mark' (I prefer using 'Mark' instead of 'Transaction', for
> semantic reasons)


Call it a revision or a tag. I want to use SVN-like names here instead
of making up our own, since people understand SVN concepts and this
mechanism will be very similar.

> 3) inject whatever we want into the server (add, del, modify...)
> 4) do a rollback (the anti-operations are committed)
>
> At the end, the server will be in a consistent state.


Well, perhaps this is not a matter of consistency. You will roll back
to an earlier state, but the revision number will increase, much as in
SVN you merge back to an earlier state and then commit forward. Example:

    revisions: 0 1 2 3 4 5 6 7 8
    At rev 8 you want to roll back to the state at rev 5, then commit
    forward to revision 9

In this case SVN applies a series of diffs from the commit backwards.
This is very similar to what we will be doing, except that we apply
LDIFs, and our revisions will increase by 3 in the example since there
are 3 reverse LDIFs to apply.
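
To make the arithmetic concrete, here is a rough sketch, assuming one
logged change per revision and that every reverse LDIF applied during
a revert is itself logged as a new revision. RevisionLog and its
methods are invented for illustration only. Reverting from rev 8 to
the state at rev 5 replays the reverses of revisions 8, 7 and 6, so
the head ends up at 11:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical revision log: one change per revision, stored with its reverse. */
class RevisionLog {
    // Index i holds the change that produced revision i + 1.
    private final List<String> forward = new ArrayList<>();
    private final List<String> reverse = new ArrayList<>();

    /** Revision 0 is the empty log; every logged change adds one. */
    int currentRevision() {
        return forward.size();
    }

    void log(String forwardLdif, String reverseLdif) {
        forward.add(forwardLdif);
        reverse.add(reverseLdif);
    }

    /**
     * Restores the state of the target revision by replaying reverse
     * changes newest first. Each replayed change is logged as a new
     * revision, so the revision number only ever increases.
     */
    List<String> revertTo(int target) {
        int head = currentRevision();
        if (target < 0 || target > head) {
            throw new IllegalArgumentException("no such revision: " + target);
        }
        List<String> applied = new ArrayList<>();
        for (int rev = head; rev > target; rev--) {
            String undo = reverse.get(rev - 1);
            String redo = forward.get(rev - 1);
            applied.add(undo);
            // The reverse of a reverse change is the original forward change.
            log(undo, redo);
        }
        return applied;
    }
}
```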

> This leads to some tricky points :
> 1) Some partitions may be treated differently (schema ?) : we may need
> a level of protection


OK, you want to scope out different changes here so you can apply only
those that you want. Again, this is easy to do once we have query
capabilities on the log. We will be able to ask it things like "give me
all the changes that took place under this DN on the following
attributes". This way you can pick the things you want to revert in a
particular subtree. This is going to be very powerful.
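
As an illustration only, such a query could look like the sketch
below. LoggedChange and ChangeLogQuery are made-up names, and the
subtree test is a naive suffix match on DN strings that are assumed to
be already normalized; real DN comparison needs proper parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical log record: one changed attribute on one entry. */
class LoggedChange {
    final long revision;
    final String dn;
    final String attribute;

    LoggedChange(long revision, String dn, String attribute) {
        this.revision = revision;
        this.dn = dn;
        this.attribute = attribute;
    }
}

class ChangeLogQuery {
    /** Naive subtree test on normalized DN strings (an assumption of this sketch). */
    static boolean isUnder(String dn, String base) {
        String d = dn.toLowerCase(Locale.ROOT);
        String b = base.toLowerCase(Locale.ROOT);
        return d.equals(b) || d.endsWith("," + b);
    }

    /** Returns the changes at or below the base DN that touch the given attribute. */
    static List<LoggedChange> find(List<LoggedChange> log, String base, String attribute) {
        List<LoggedChange> hits = new ArrayList<>();
        for (LoggedChange change : log) {
            if (isUnder(change.dn, base) && change.attribute.equalsIgnoreCase(attribute)) {
                hits.add(change);
            }
        }
        return hits;
    }
}
```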

Does this answer the question?

> 2) The logger will become a bottleneck, as we will have to synchronize
> concurrent access to the storage


Well, yes, this is true. I don't know how to avoid this safely. You can
send messages to the log asynchronously and let it write them out, but
this is dangerous since you can lose data.

> 3) if we want to rollback the operations, the server should not be
> able to process any other operations until the rollback is done


Don't know if this is absolutely true. As long as the operation does
not conflict with the changes we should be OK. We can also quickly
determine which subtrees or entries are affected by a rollback and
quickly check whether an operation is going to be in conflict.
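
A rough sketch of such a check, under two assumptions of mine: DNs are
compared as normalized strings, and an operation counts as conflicting
when its target lies inside an affected subtree or is an ancestor of an
affected entry (a deliberately conservative rule). RollbackGuard is a
hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical guard consulted for each incoming operation while a rollback runs. */
class RollbackGuard {
    // DNs of the entries or subtree roots touched by the rollback in progress.
    private final List<String> affected;

    RollbackGuard(List<String> affectedDns) {
        this.affected = new ArrayList<>(affectedDns);
    }

    /** Naive subtree test on normalized DN strings (an assumption of this sketch). */
    private static boolean isUnder(String dn, String base) {
        String d = dn.toLowerCase(Locale.ROOT);
        String b = base.toLowerCase(Locale.ROOT);
        return d.equals(b) || d.endsWith("," + b);
    }

    /**
     * True if the target is inside an affected subtree, or is an
     * ancestor of an affected entry (e.g. deleting or moving a parent).
     */
    boolean conflicts(String targetDn) {
        for (String dn : affected) {
            if (isUnder(targetDn, dn) || isUnder(dn, targetDn)) {
                return true;
            }
        }
        return false;
    }
}
```

Operations for which conflicts() returns false could proceed while the
rollback runs; the others would have to wait or be rejected.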

> If we seriously consider using this mechanism for something more
> critical, like storing a journal we can replay on crash (useful with
> the deferred write mechanism we have), then other elements come into
> play :


This overloads the purpose beyond revision control and tries to use the
change log as a transaction log. That is a different function with
different requirements. Perhaps we can do both, but we should not
overload it at this point or else we cannot get anywhere with this
feature.

> 1) we must flush the data to disk as soon as it arrives


Yep, this is key for a txn log, or else you lose the data.

> 2) we have to think about a recovery mechanism, which should compare
> the current database state and the current journal


With revisions this is easy to do.

> 2-1) this recovery mechanism will have to know which data has been
> flushed to the backend, otherwise we may have a difference between the
> journal and the backend. Namely, the Sync thread should be driven by
> the ChangeLog interceptor (when the commit is submitted, the sync
> thread is woken up and flushes data to the backend, marking the
> entries in the log when they are written)


Again, I think it is an incredibly bad move to mix these two concerns
together. We need a separate transaction log, or need to leverage the
one that exists in the backing stores. I prefer our own transaction log,
but this should be a separate subsystem altogether, maybe based on HOWL.

Then the two subsystems can complement each other. Keep it simple,
without overloading functionality on one subsystem, so we can actually
get work done rapidly and maintain them better.

> In any case, we also need a mechanism to activate the ChangeLog operations :
> - startLog


The log can be started and stopped, but not if it's also a transaction
log, so let's not mix these concerns and mess up both of them.

> - beginMark


OK, using SVN language, this is a tag. You can tag several revisions
that are of significance, so it's not one tag (mark). For the testing
situation you have to deal with, though, yes: you get the current
revision of the server before starting a test, then roll back to that
revision after the test is complete. But I recommend just turning on
the change log at the end of setUp, turning it off in tearDown, and
then applying the reverse.ldif.

> - commit


Right now we don't commit several operations in one.  We do not have
transactions.

> - rollback
> - stopLog


OK, we are mixing lots of things here. I think we're going to get lost
in the woods.

> What would be the best solution ?


I would take things one step at a time because I get confused easily.
The first step should just be to build the simplest implementation that
captures the changes and produces a forward and a reverse diff log. We
can use that immediately for the test rollback requirements we have.
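
For what it's worth, a very small sketch of what capturing a forward
and a reverse change could look like for add and delete, using LDIF
change records. DiffLog and LdifPair are hypothetical names, and the
sketch assumes single-valued attributes with plain values (no base64
encoding, no line folding):

```java
import java.util.Map;

/** Hypothetical pair of LDIF change records: the change and its inverse. */
class LdifPair {
    final String forward;
    final String reverse;

    LdifPair(String forward, String reverse) {
        this.forward = forward;
        this.reverse = reverse;
    }
}

class DiffLog {
    /** Builds an LDIF "add" change record from single-valued attributes. */
    static String addLdif(String dn, Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder();
        sb.append("dn: ").append(dn).append('\n');
        sb.append("changetype: add\n");
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Builds an LDIF "delete" change record. */
    static String deleteLdif(String dn) {
        return "dn: " + dn + "\nchangetype: delete\n";
    }

    /** An add is reversed by deleting the same DN. */
    static LdifPair forAdd(String dn, Map<String, String> attrs) {
        return new LdifPair(addLdif(dn, attrs), deleteLdif(dn));
    }

    /**
     * A delete is reversed by re-adding the entry as it was, so the
     * caller must supply the attributes read before the deletion.
     */
    static LdifPair forDelete(String dn, Map<String, String> previousAttrs) {
        return new LdifPair(deleteLdif(dn), addLdif(dn, previousAttrs));
    }
}
```

Appending each forward record to one file and each reverse record to
another would give the two logs; the reverse records have to be applied
newest first to undo a test run.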

> We can do that with a specific
> control, an extendedOperation or a standard modifyRequest of a
> specific entry in the ou=system partition (remember the 'configuration
> in the DIT' thing ?) coupled with a trigger and a stored procedure (SP).
>

Oh man, this is a separate thread on its own. For now I suggest we keep
things simple, make progress, and flesh out the issues over time while
solving some immediate concerns.

Let's divide and conquer the problems so they're not so overwhelming.
If we try to solve every problem all at once, then we cannot start and
finish something that can give us value.

Alex