You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@gmail.com> on 2009/01/20 23:07:32 UTC

[Mitosis] Random thoughts (3)

As I can't really code atm, I will continue to forward my public 
transportations thoughts to the ML :)

So my last two mails presented some theorical replication system where 
servers replicates by reverting back to a previous solid state, and 
re-applied the ordered modifications. So far, so good.

What about performances ? They are obviously terrible... If you think 
about a server A on which you have injected 1 million entries, trying to 
replicate with a server B on which a unique entry has been modified. We 
will have to :
- revert the million entries on A
- revert the modification on B
- transmit the million entry from A to B
- transmit the entry from B to A
- order the million and 1 entry on both server
- inject one million and one entry on A and B.

Obviously, we will have to transmit the million entry to B and to apply 
them, but why should we revert and re-apply them on A? This is just a 
waste...

How can we improve this ?

We should think about the possible operations on a server. We have three 
kind of operations :
- add an entry
- delete an entry
- modify an entry

The rename/move operation can be seen as a combinaison :
rename = delete + (modification) + add
move = delete + add
move and rename = delete + (modification) + add

(The modification represent the RDN AT which has to be updated)

A modification is a special case, we will deal with it later. So it left 
only 2 operations : add and delete. Let's analyse what can happen when 
two servers are disconnected.

1) some entry are added on A and on B.
If any of those entries are 'colliding' (ie the intersection is not 
empty), we have to manage some conflict. We have two different cases :
- both entries are the same : that's just ok, we just keep the older one.
- entries share the same DN, but have different values : either we keep 
the older one, or we keep the one on the server with a higher priority - 
assuming we have set up some priority -

2) some entries are deleted from A and B.
As both servers were in sync when the deletion started, we just have to 
apply the deletion as they arrive, discarding any error (ie, if we rtry 
to delete an entry which does not exist, that means it has already been 
deleted)

3) We have a mix of added and deleted entries from A and B.
This is the most complex case. Let's try to clarify it. We will consider 
only two operations done on A and B, op(a) and op(b). We don't really 
care if op(a) is a add or a delete, and about op(b) either.

Suppose that op(a) is done before op(b). What are the different cases ?
a) op(a) is an add, op(b) is an add : see case (1)
b) op(a) is a delete, op(b) is a delete : see case (2)
c) op(a) is an add, op(b) is a delete : the fact that both operations 
has been transmitted for replication means that they were valid. If 
op(a) and op(b) aren't pointing on the same DN, we just apply both 
modification, they will success. If  they are pointing to the same DN, 
then we will :
* on B : do nothing. The entry has already been deleted, as we have 
deleted it ...
* on A : delete the entry.
d) op(a) is a delete, op(b) is an add : This is exactly the same case as 
(c), reverted. Do the same thing, but inverting the servers.


And that's it. Well, pretty much. The only little problem is that we may 
have many other operations impacting the modified entry between op(a) 
and op(b), but we don't care : any other operation don't interact, 
otherwise they will have been treated the way we just dealt with op(a) 
and op(b). What I mean is that we have selected op(a) and op(b) exactly 
as if they were the first conflicting operations.

I'm not 100% sure, but I think this should be the way to deal with 
conflits for add and delete operations.

What about a modification ? I will let it for some more random thoughts, 
as I have around 2h and a half public transportation tomorrow :)

Ok, it's up to you know to tell me if I'm again high in the sky :)

Thanks !

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org

Re: [Mitosis] Random thoughts (3)

Posted by Emmanuel Lecharny <el...@gmail.com>.

Alex Karasulu wrote:
> On Tue, Jan 20, 2009 at 5:07 PM, Emmanuel Lecharny <el...@gmail.com>wrote:
> ...
>
> What about performances ? They are obviously terrible... If you think about
>   
>> a server A on which you have injected 1 million entries, trying to replicate
>> with a server B on which a unique entry has been modified. We will have to :
>> - revert the million entries on A
>> - revert the modification on B
>> - transmit the million entry from A to B
>> - transmit the entry from B to A
>> - order the million and 1 entry on both server
>> - inject one million and one entry on A and B.
>>
>> Obviously, we will have to transmit the million entry to B and to apply
>> them, but why should we revert and re-apply them on A? This is just a
>> waste...
>>
>> How can we improve this ?
>>
>>     
>
> One idea might be to send batches of replication events instead of one event
> at a time after the servers connect.
>   
Yeah, that would help a bit at the margin. What I wanted to stress out 
is that the first proposal, though guaranted to success, is just 
impracticable. Using the second algorithm, we most certainly want to use 
standard LDAP requests, up to a point. If we have millions of entries to 
replicate, then it would also be a case we would want to switch to 
another method. I'm specifically thinking about a full init of a replica 
(a blank one) : it would be better to send everything as a big LDIF, zipped.
> ...
>
> 1) some entry are added on A and on B.
>   
>> If any of those entries are 'colliding' (ie the intersection is not empty),
>> we have to manage some conflict. We have two different cases :
>> - both entries are the same : that's just ok, we just keep the older one.
>>     
>
>
> You should keep the newer one I think because the second add would have
> failed and you want the timestamps and creator information to match the
> first add.
>   
Right. We need to delete the old entry, and replace it with the new one. 
what would be interesting is to switch to a modif operation in this 
case, to avoid many index to be changed (delete => some index 
modifications, and add => some more index modifications, when lot of 
them will remain unchanged if we simply compute a delta and update the 
previous entry by modifying only the necessary attributes).

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org

Re: [Mitosis] Random thoughts (3)

Posted by Alex Karasulu <ak...@gmail.com>.

On Tue, Jan 20, 2009 at 5:07 PM, Emmanuel Lecharny <el...@gmail.com>wrote:
...

What about performances ? They are obviously terrible... If you think about
> a server A on which you have injected 1 million entries, trying to replicate
> with a server B on which a unique entry has been modified. We will have to :
> - revert the million entries on A
> - revert the modification on B
> - transmit the million entry from A to B
> - transmit the entry from B to A
> - order the million and 1 entry on both server
> - inject one million and one entry on A and B.
>
> Obviously, we will have to transmit the million entry to B and to apply
> them, but why should we revert and re-apply them on A? This is just a
> waste...
>
> How can we improve this ?
>

One idea might be to send batches of replication events instead of one event
at a time after the servers connect.

...

1) some entry are added on A and on B.
> If any of those entries are 'colliding' (ie the intersection is not empty),
> we have to manage some conflict. We have two different cases :
> - both entries are the same : that's just ok, we just keep the older one.


You should keep the newer one I think because the second add would have
failed and you want the timestamps and creator information to match the
first add.

Alex