Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@apache.org> on 2009/03/01 00:23:19 UTC

Journal first draft

Hi guys,

I'm currently working on a very preliminary implementation of the 
journal. It is an interceptor added at the very end of the chain, just 
before the partition. The way it works is very simple :
- it logs the LDIF for every modification in the journal, plus some 
extra information (the LDAP principal, a timestamp and a revision 
number). This is done before calling the partition's add method
- when the entry has been added to the partition, an ACK is logged 
in the journal, as a comment, containing the revision of the ACKed 
operation
- if the addition failed, a NACK is logged in the same way
- the written information is immediately flushed to disk, and the write 
operation is synchronized in order to avoid interleaving operations in 
the file.
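
A minimal sketch of the four steps above, written in Python only for brevity. The class and function names (`Journal`, `journaled`) and the `# nack-rev:` marker are assumptions made for illustration; the message only shows the `# ack-rev:` form, and nothing here is the actual interceptor code.

```python
import io
import os
import threading
import time


class Journal:
    """Append-only journal: the operation is logged first, its ACK/NACK later."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()  # one writer at a time, so records never interleave
        # Assumption: revisions start from the clock, as the sample values suggest.
        self._rev = int(time.time() * 1000)

    def _write(self, text):
        self._stream.write(text)
        self._stream.flush()
        try:
            os.fsync(self._stream.fileno())  # force the bytes to disk
        except OSError:
            pass  # in-memory streams (used in the demo) have no file descriptor

    def log(self, principal, ldif):
        """Write the operation with its principal, timestamp and revision."""
        with self._lock:
            self._rev += 1
            rev = self._rev
            ts = int(time.time() * 1000)
            self._write(f"# {principal}\n# ts: {ts}\n# rev: {rev}\n{ldif.strip()}\n\n")
        return rev

    def ack(self, rev):
        with self._lock:
            self._write(f"# ack-rev: {rev}\n\n")

    def nack(self, rev):
        with self._lock:
            self._write(f"# nack-rev: {rev}\n\n")  # assumed marker name


def journaled(journal, principal, ldif, apply_to_partition):
    """Log the operation, apply it, then record the outcome."""
    rev = journal.log(principal, ldif)
    try:
        apply_to_partition(ldif)
    except Exception:
        journal.nack(rev)
        raise
    journal.ack(rev)
    return rev


# Demo with an in-memory stream and a partition that accepts everything.
buffer = io.StringIO()
journal = Journal(buffer)
rev = journaled(
    journal,
    "uid=admin,ou=system",
    "dn: cn=test,ou=system\nchangeType: Add\ncn: test",
    lambda ldif: None,
)
print(buffer.getvalue())
```

If the process dies between `log` and `ack`/`nack`, the record is left without any outcome marker, which is exactly the pending state the journal is meant to expose.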

Here is an example of what we get :

# 0.9.2342.19200300.100.1.1=admin,2.5.4.11=system
# ts: 1235862542574
# rev: 1235862541942
dn: 2.5.4.3=kate#bush,2.5.4.11=system
changeType: Add
createtimestamp: 20090228230902Z
sn: Bush
entryuuid:: OWM5MDU5YjYtNmJkNC00ZmQzLTgwODMtMzE4MGJhOGQyMGY0
cn: Kate#Bush
entrycsn:: MjAwOTAzMDEwMDA5MDIuMDAwNTM0WiMwIzAjMDAwMDAw
objectclass: person
objectclass: top
creatorsname: 0.9.2342.19200300.100.1.1=admin,2.5.4.11=system

# ack-rev: 1235862541942

# 0.9.2342.19200300.100.1.1=admin,2.5.4.11=system
# ts: 1235862542618
# rev: 1235862541943
dn: 2.5.4.3=kate#bush,2.5.4.11=system
changeType: Delete

# ack-rev: 1235862541943
...

The revision number is used to associate an ACK/NACK with an operation : 
if the server crashes in the middle of a partition update, we won't have 
an ACK/NACK for the pending operation.
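
To illustrate how the revision ties an outcome to its operation, here is a small Python sketch that scans a journal and reports the revisions left without an ACK or NACK. The sample is an abridged copy of the one above with the last ACK cut off to simulate a crash; the `# nack-rev:` marker and the function names are assumptions.

```python
import re

REV = re.compile(r"^# rev: (\d+)$")
ACK = re.compile(r"^# ack-rev: (\d+)$")
NACK = re.compile(r"^# nack-rev: (\d+)$")  # assumed marker, not shown in the sample

SAMPLE = """\
# 0.9.2342.19200300.100.1.1=admin,2.5.4.11=system
# ts: 1235862542574
# rev: 1235862541942
dn: 2.5.4.3=kate#bush,2.5.4.11=system
changeType: Add
sn: Bush
cn: Kate#Bush
objectclass: person
objectclass: top

# ack-rev: 1235862541942

# 0.9.2342.19200300.100.1.1=admin,2.5.4.11=system
# ts: 1235862542618
# rev: 1235862541943
dn: 2.5.4.3=kate#bush,2.5.4.11=system
changeType: Delete
"""


def classify(journal_text):
    """Map each revision to 'pending', 'acked' or 'nacked'."""
    state = {}
    for line in journal_text.splitlines():
        line = line.strip()
        if m := REV.match(line):
            state[int(m.group(1))] = "pending"
        elif m := ACK.match(line):
            state[int(m.group(1))] = "acked"
        elif m := NACK.match(line):
            state[int(m.group(1))] = "nacked"
    return state


def pending(journal_text):
    """Revisions that were logged but never acknowledged either way."""
    return sorted(r for r, s in classify(journal_text).items() if s == "pending")


print(pending(SAMPLE))  # [1235862541943]
```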

As you can see, this is a very basic implementation atm. I haven't dealt 
with all the intricacies of journal rotation, journal cleanup, and such. 
It's a growing file, and it can grow fast. One immediate improvement 
could be to use more than one file to store the changes, with an 
executor to deal with the writes, allowing parallel updates of the 
journal (it would be a bit more complicated to restore the journal, but 
at least we would avoid a contention problem).

So wdyt about this first version ? What's missing ?

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: Journal first draft

Posted by Emmanuel Lecharny <el...@apache.org>.
Alex Karasulu wrote:
> OK thanks for the clarifications.
>   

This was the easy part :)

I'm now dealing with the hard part :
- how to keep the journal from growing indefinitely
- how to use the journal with the coming replication system
- how to build the recovery system

I have some ideas about #1 and #3 : considering that the journal is just 
a representation of the pending operations, we can rotate the file every 
N operations, or each time we sync() (if of course we have had some 
modifications), or every T seconds. Once we have rotated the file, we 
can check if all the operations have been acked, and if so, we can ditch 
them. If the server works well, we won't have operations pending 
forever.
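
A rough Python sketch of that rotation idea, under a few assumptions of mine: segments are plain strings ordered oldest to newest with the last one still active, a NACK counts as resolved just like an ACK, and an outcome marker may land in a later segment than its operation if the rotation happens in between. The thresholds and function names are hypothetical.

```python
import re

# Matches "# rev: N", "# ack-rev: N" and the assumed "# nack-rev: N" marker.
MARK = re.compile(r"^# (rev|ack-rev|nack-rev): (\d+)[ \t]*$", re.MULTILINE)


def should_rotate(ops, age_seconds, max_ops=1000, max_age=60.0):
    """Rotate after N operations or T seconds, but only if something was written."""
    return ops > 0 and (ops >= max_ops or age_seconds >= max_age)


def discardable(segments):
    """Indexes of rotated segments whose operations all have a known outcome."""
    resolved = set()
    for text in segments:
        for kind, rev in MARK.findall(text):
            if kind != "rev":
                resolved.add(rev)
    result = []
    for index, text in enumerate(segments[:-1]):  # never drop the active segment
        revs = {rev for kind, rev in MARK.findall(text) if kind == "rev"}
        if revs <= resolved:
            result.append(index)
    return result


# Made-up segments: revision 2 is acked only after the first rotation,
# revision 3 is still pending.
seg0 = "# rev: 1\ndn: cn=a\nchangeType: Add\n\n# ack-rev: 1\n\n# rev: 2\ndn: cn=b\nchangeType: Delete\n\n"
seg1 = "# ack-rev: 2\n\n# rev: 3\ndn: cn=c\nchangeType: Delete\n\n"
seg2 = ""

print(discardable([seg0, seg1, seg2]))  # [0]
```

Looking for outcome markers across all segments avoids keeping a rotated file forever just because its last ACK was written a moment after the rotation.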

When the server crashes, or when we stop it and restart it (it makes no 
difference), we just have to check in the journal whether we have 
pending operations. If so, we will check if the pending operations have 
been applied or not (it's easy for add or del operations, a bit more 
complicated for modifications, definitely more complex for move and 
rename, as the operation may have been partially applied).
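
For the easy cases, the check could look like the following Python sketch. Everything here is hypothetical: the partition is modelled as a plain set of DNs, `entry_exists` and `redo` stand in for real partition calls, and anything other than add or delete is simply reported as undecided.

```python
def was_applied(change_type, dn, entry_exists):
    """True/False for add and delete; None when a presence check cannot tell."""
    kind = change_type.lower()
    if kind == "add":
        return entry_exists(dn)       # an applied add leaves the entry behind
    if kind == "delete":
        return not entry_exists(dn)   # an applied delete leaves nothing behind
    return None                       # modify, move, rename: may be partial


def recover(pending_ops, entry_exists, redo):
    """Re-apply pending operations in revision order; list the undecided ones."""
    replayed, undecided = [], []
    for rev, change_type, dn in sorted(pending_ops):
        applied = was_applied(change_type, dn, entry_exists)
        if applied is None:
            undecided.append(rev)
        elif not applied:
            redo(rev, change_type, dn)
            replayed.append(rev)
    return replayed, undecided


# Demo: a toy partition holding a single entry, and four pending operations.
partition = {"cn=a,ou=system"}


def redo(rev, change_type, dn):
    if change_type.lower() == "add":
        partition.add(dn)
    else:
        partition.discard(dn)


ops = [
    (1, "Add", "cn=a,ou=system"),     # already there: applied before the crash
    (2, "Add", "cn=b,ou=system"),     # missing: must be replayed
    (3, "Delete", "cn=a,ou=system"),  # still there: must be replayed
    (4, "Modify", "cn=c,ou=system"),  # cannot be decided by presence alone
]
replayed, undecided = recover(ops, lambda dn: dn in partition, redo)
print(replayed, undecided)  # [2, 3] [4]
```

A presence check is only a heuristic: an add followed by other changes to the same DN would need a stronger test, such as comparing the entryUUID or entryCSN recorded in the journal.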

What if the base is screwed ? (i.e., we can't restart the server because 
the backend files are FU). Then we have to start from a recent backup, 
and apply the whole journal to this backup.

So next question : how do we get this backup ? We have to build it : the 
idea is to have a local server working as a slave, and periodically 
applying the journal to its base. When done, the server is stopped, the 
partitions are saved, and we just wait for the next backup to start. So 
if the machine crashes, even during a backup, we can still start from 
the previous saved file and reapply the journal to it. Then, when the 
journal has been applied, we just have to copy the backup partition to 
the real server, and we are done.
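
As an illustration of reapplying the journal on a saved backup, here is a simplified Python sketch. It makes several assumptions: the backup is a dict mapping a DN to its raw attribute lines, only add and delete are handled, operations marked with the assumed `# nack-rev:` comment are skipped, and operations without any outcome marker are replayed. The journal content and names are made up.

```python
import re


def replay_journal(backup, journal_text):
    """Return a copy of the backup with the journal's operations applied."""
    ops, nacked = [], set()
    for block in re.split(r"\n\s*\n", journal_text.strip()):
        fields, attrs = {}, []
        for line in (l.strip() for l in block.splitlines() if l.strip()):
            if line.startswith("#"):
                key, _, value = line[1:].partition(":")
                fields[key.strip()] = value.strip()
            else:
                name, _, value = line.partition(":")
                if name.lower() in ("dn", "changetype"):
                    fields[name.lower()] = value.strip()
                else:
                    attrs.append(line)  # kept raw, base64 values included
        if "nack-rev" in fields:
            nacked.add(int(fields["nack-rev"]))
        elif "rev" in fields:
            ops.append((int(fields["rev"]), fields["dn"], fields["changetype"].lower(), attrs))

    restored = dict(backup)  # the saved backup itself stays untouched
    for rev, dn, kind, attrs in sorted(ops, key=lambda op: op[0]):
        if rev in nacked:
            continue  # known not to have been stored
        if kind == "add":
            restored[dn] = attrs
        elif kind == "delete":
            restored.pop(dn, None)
        else:
            raise NotImplementedError(f"change type not handled in this sketch: {kind}")
    return restored


JOURNAL = """\
# uid=admin,ou=system
# ts: 1
# rev: 1
dn: cn=a,ou=system
changeType: Add
cn: a

# ack-rev: 1

# uid=admin,ou=system
# ts: 2
# rev: 2
dn: cn=b,ou=system
changeType: Add
cn: b

# nack-rev: 2

# uid=admin,ou=system
# ts: 3
# rev: 3
dn: cn=old,ou=system
changeType: Delete
"""

backup = {"cn=old,ou=system": ["cn: old"]}
restored = replay_journal(backup, JOURNAL)
print(restored)  # {'cn=a,ou=system': ['cn: a']}
```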

Seems a bit complex, but don't worry : it is complex :)
> Alex





Re: Journal first draft

Posted by Alex Karasulu <ak...@gmail.com>.
OK thanks for the clarifications.

Alex

On Mon, Mar 2, 2009 at 2:20 PM, Emmanuel Lecharny <el...@apache.org>wrote:

> Alex Karasulu wrote:
>
>> So wdyt about this first version ? What's missing ?
>>>
>>>
>>>
>>
>> (1) Why not make these comment attribute actual attributes in the LDIF
>> even
>> though they are special?
>>
>>
> I was considering this as an option, but the problem is that you need to
> clone the entry in order to add those special attributes, or add them and
> remove them immediately, which is not really elegant.
>
>> (2) Why is rev not a regular attribute?
>>
>>
> Same reason. And we can't use the ChangeLog rev because it is not added if
> the changeLog is not activated.
>
> In any case, we don't need to store those elements in the partition within
> the entry, because they are volatile (each time you restart the server, the
> revision is set to a new value)
>
>> (3) What exactly is the difference between ACK and NACK and why do we need
>> a
>> NACK?
>>
>>
> We will have ACK for entries which have been stored in the partition. NACK
> is for entries that we _know_ haven't been stored for any reason. If the
> entry has not been stored yet, or not been rejected yet, then we won't have
> any ACK or NACK.
>
> It's interesting to have a NACK to distinguish between those two cases.
>
>
> --
> --
> cordialement, regards,
> Emmanuel Lécharny
> www.iktek.com
> directory.apache.org
>
>
>

Re: Journal first draft

Posted by Emmanuel Lecharny <el...@apache.org>.
Alex Karasulu wrote:
>> So wdyt about this first version ? What's missing ?
>>
>>     
>
> (1) Why not make these comment attributes actual attributes in the LDIF even
> though they are special?
>   
I was considering this as an option, but the problem is that you need to 
clone the entry in order to add those special attributes, or add them 
and remove them immediately, which is not really elegant.
> (2) Why is rev not a regular attribute?
>   
Same reason. And we can't use the ChangeLog rev because it is not added 
if the changeLog is not activated.

In any case, we don't need to store those elements in the partition 
within the entry, because they are volatile (each time you restart the 
server, the revision is set to a new value)
> (3) What exactly is the difference between ACK and NACK and why do we need a
> NACK?
>   
We will have ACK for entries which have been stored in the partition. 
NACK is for entries that we _know_ haven't been stored for any reason. 
If the entry has not been stored yet, or not been rejected yet, then we 
won't have any ACK or NACK.

It's interesting to have a NACK to distinguish between those two cases.




Re: Journal first draft

Posted by Alex Karasulu <ak...@gmail.com>.
Hi Emm,

On Sat, Feb 28, 2009 at 6:23 PM, Emmanuel Lecharny <el...@apache.org>wrote:

> Hi guys,
>
> I'm currently working on a very preliminary implementation of the journal.
> It is an interceptor added at the very end of the chain, just before the
> partition. The way it works is very simple :
> - it logs the LDIF for every modification in the journal, plus some extra
> information (the LDAP principal, a timestamp and a revision number). This
> is done before calling the partition's add method
> - when the entry has been added to the partition, an ACK is logged in
> the journal, as a comment, containing the revision of the ACKed operation
> - if the addition failed, a NACK is logged in the same way
> - the written information is immediately flushed to disk, and the write
> operation is synchronized in order to avoid interleaving operations in the
> file.
>
> Here is an example of what we get :
>
> # 0.9.2342.19200300.100.1.1=admin,2.5.4.11=system
> # ts: 1235862542574
> # rev: 1235862541942
> dn: 2.5.4.3=kate#bush,2.5.4.11=system
> changeType: Add
> createtimestamp: 20090228230902Z
> sn: Bush
> entryuuid:: OWM5MDU5YjYtNmJkNC00ZmQzLTgwODMtMzE4MGJhOGQyMGY0
> cn: Kate#Bush
> entrycsn:: MjAwOTAzMDEwMDA5MDIuMDAwNTM0WiMwIzAjMDAwMDAw
> objectclass: person
> objectclass: top
> creatorsname: 0.9.2342.19200300.100.1.1=admin,2.5.4.11=system
>
> # ack-rev: 1235862541942
>
> # 0.9.2342.19200300.100.1.1=admin,2.5.4.11=system
> # ts: 1235862542618
> # rev: 1235862541943
> dn: 2.5.4.3=kate#bush,2.5.4.11=system
> changeType: Delete
>
> # ack-rev: 1235862541943
> ...
>
> The revision number is used to associate an ACK/NACK with an operation : if
> the server crashes in the middle of a partition update, we won't have an
> ACK/NACK for the pending operation.
>
> As you can see, this is a very basic implementation atm. I haven't dealt
> with all the intricacies of journal rotation, journal cleanup, and such.
> It's a growing file, and it can grow fast. One immediate improvement could
> be to use more than one file to store the changes, with an executor to deal
> with the writes, allowing parallel updates of the journal (it would be a bit
> more complicated to restore the journal, but at least we would avoid a
> contention problem).
>
> So wdyt about this first version ? What's missing ?
>

(1) Why not make these comment attributes actual attributes in the LDIF even
though they are special?

(2) Why is rev not a regular attribute?

(3) What exactly is the difference between ACK and NACK and why do we need a
NACK?

Thanks,
Alex