Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@apache.org> on 2009/03/13 15:34:59 UTC

Replication configuration

Hi,

on the replication branch, we are now able to connect to an OpenLDAP 
server, and subscribe as a slave with the refreshOnly or 
refreshAndPersist operations. This is very experimental atm, and we need 
more than the current configuration in order to implement this in the 
server.

So far, here is the information we need:
- a replicaId (or RID), uniquely identifying the server
- a replication type : RefreshOnly or RefreshAndPersist
- an interval for a RefreshOnly replication
- a search base, which will be the part of the tree to replicate
- a principal, used to connect on the master server
- a password

Currently, the missing pieces are:
- the replication type
- the search base
- the principal
- the password

We have a Replica class holding similar information, namely a 
SocketAddress, as we were relying on a proprietary protocol to handle 
replication in the previous version (Mitosis). As the new replication 
model will be based on RFC 4533, we need to change this.

So the ReplicationInterceptor configuration will change. Currently, it 
looks like this:

    <replicationInterceptor>
      <configuration>
        <replicationConfiguration logMaxAge="5"
                                  replicaId="instance_a"
                                  replicationInterval="2"
                                  responseTimeout="10"
                                  serverPort="10390">
          <s:property name="peerReplicas">
            <s:set>
              <s:value>instance_b@localhost:1234</s:value>
              <s:value>instance_c@localhost:1234</s:value>
            </s:set>
          </s:property>
        </replicationConfiguration>
      </configuration>
    </replicationInterceptor>


We will remove the logMaxAge, responseTimeout and serverPort parameters. 
The peerReplicas property will contain LDAP URLs, one for each server we 
want to replicate from. Those replicas will look like:

ldap://[<principalDN>:<password>]@<server>[:<port>]/<baseDN>
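To make the proposed URL layout concrete, here is a minimal Python sketch of a parser for it. This is only an illustration, not ApacheDS code: the names ReplicaUrl and parse_replica_url are hypothetical, the default port of 389 when <port> is omitted is an assumption, and the parser assumes that the password contains no ':' and that neither the server nor the base DN contains '@'.

```python
from dataclasses import dataclass
from typing import Optional

SCHEME = "ldap://"
DEFAULT_LDAP_PORT = 389  # assumption: standard LDAP port when <port> is omitted


@dataclass(frozen=True)
class ReplicaUrl:  # hypothetical holder, not an ApacheDS class
    principal_dn: Optional[str]
    password: Optional[str]
    server: str
    port: int
    base_dn: str


def parse_replica_url(url: str) -> ReplicaUrl:
    """Split ldap://[<principalDN>:<password>]@<server>[:<port>]/<baseDN>."""
    if not url.startswith(SCHEME):
        raise ValueError("expected an ldap:// URL: " + url)
    rest = url[len(SCHEME):]
    # Credentials are optional. Splitting on the last '@' lets the DN keep
    # its '=' and ',' characters untouched.
    userinfo, _, remainder = rest.rpartition("@")
    hostport, slash, base_dn = remainder.partition("/")
    if not slash or not base_dn:
        raise ValueError("missing base DN: " + url)
    principal_dn = password = None
    if userinfo:
        # Assumption: the password itself contains no ':'.
        principal_dn, colon, password = userinfo.rpartition(":")
        if not colon:
            raise ValueError("expected <principalDN>:<password>: " + url)
    server, colon, port_text = hostport.partition(":")
    if not server:
        raise ValueError("missing server: " + url)
    port = int(port_text) if colon else DEFAULT_LDAP_PORT
    return ReplicaUrl(principal_dn, password, server, port, base_dn)
```

For instance, parse_replica_url("ldap://uid=admin,ou=system:secret@ldap2.apache.org:10389/ou=people,dc=apache,dc=org") yields the principal uid=admin,ou=system, the server ldap2.apache.org, the port 10389 and the base DN ou=people,dc=apache,dc=org. Note that a URL of this shape carries the password in clear text.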

We will end up with a configuration like:

    <replicationInterceptor>
      <configuration>
        <!-- sync is "RefreshOnly" or "RefreshAndPersist";
             replicationInterval is hh:mm:ss (here, every 5 minutes) -->
        <replicationConfiguration sync="RefreshOnly"
                                  replicaId="001"
                                  replicationInterval="00:05:00">
          <s:property name="peerReplicas">
            <s:set>
              <s:value>ldap://uid=admin,ou=system:secret@ldap2.apache.org:10389/ou=people,dc=apache,dc=org</s:value>
              <s:value>ldap://uid=admin,ou=system:secret@ldap3.apache.org:10389/ou=projects,dc=apache,dc=org</s:value>
            </s:set>
          </s:property>
        </replicationConfiguration>
      </configuration>
    </replicationInterceptor>

(the replicaId is now a 3-digit value, as the OpenLDAP cookie looks like 
rid=000,sid=000,csn=20090311230920.705931Z#000000#001#000000).
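As an illustration of what such a value contains, the following Python sketch splits it into its parts. It is a rough sketch under stated assumptions, not the ApacheDS or OpenLDAP implementation: the names Csn, parse_csn and parse_cookie are hypothetical, the rid is read as decimal and the sid as hex (as clarified elsewhere in this thread), and the two counter fields of the CSN are assumed to be hex as well.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Csn:  # hypothetical holder for the four '#'-separated CSN fields
    timestamp: datetime
    change_count: int
    server_id: int
    mod_number: int


def parse_csn(text: str) -> Csn:
    """Parse e.g. 20090311230920.705931Z#000000#001#000000."""
    parts = text.split("#")
    if len(parts) != 4:
        raise ValueError("expected 4 '#'-separated fields: " + text)
    stamp, count, sid, mod = parts
    # Timestamp is UTC with microseconds, terminated by a literal 'Z'.
    timestamp = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(
        tzinfo=timezone.utc
    )
    # Assumption: the counters are hex, like the sid field.
    return Csn(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def parse_cookie(cookie: str) -> dict:
    """Parse rid=<decimal>,sid=<hex>,csn=<csn> into a dict."""
    fields = dict(item.split("=", 1) for item in cookie.split(","))
    return {
        "rid": int(fields["rid"]),      # decimal
        "sid": int(fields["sid"], 16),  # hex
        "csn": parse_csn(fields["csn"]),
    }
```

Applied to the value above, this gives rid 0, sid 0, and a CSN stamped 2009-03-11 23:09:20.705931 UTC whose own sid field is 1.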

We may want to be more specific with the peerReplicas, for instance by 
defining a different replication interval for each search base. That could 
be done using a configuration such as:

    <replicationInterceptor>
      <configuration>
        <replicationConfiguration replicaId="001">
          <s:property name="peerReplicas">
            <s:set>
              <replica>
                <type>refreshAndPersist</type>
                <principalDn>uid=admin,ou=system</principalDn>
                <password>secret</password>
                <server>ldap1.apache.org</server>
                <port>10389</port>
                <baseDN>ou=people,dc=apache,dc=org</baseDN>
              </replica>
              <replica>
                <type>refreshOnly</type>
                <principalDn>uid=admin,ou=system</principalDn>
                <password>secret</password>
                <server>ldap1.apache.org</server>
                <port>10389</port>
                <baseDN>cn=config,ou=system</baseDN>
                <interval>01:00:00</interval>
              </replica>
            </s:set>
          </s:property>
        </replicationConfiguration>
      </configuration>
    </replicationInterceptor>
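The per-replica settings above could be modelled along these lines. This is a hedged Python sketch, not the actual configuration bean: the Replica dataclass and parse_interval helper here are hypothetical, and the rule that a refreshOnly replica must carry an interval is inferred from the list at the top of this mail ("an interval for a RefreshOnly replication").

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

SYNC_TYPES = ("refreshOnly", "refreshAndPersist")


def parse_interval(text: str) -> timedelta:
    """Parse an hh:mm:ss interval such as 01:00:00."""
    # Unpacking raises ValueError if there are not exactly three parts.
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    if hours < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError("invalid hh:mm:ss interval: " + text)
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


@dataclass(frozen=True)
class Replica:  # hypothetical model of one <replica> element
    sync_type: str
    principal_dn: str
    password: str
    server: str
    port: int
    base_dn: str
    interval: Optional[timedelta] = None

    def __post_init__(self) -> None:
        if self.sync_type not in SYNC_TYPES:
            raise ValueError("unknown replication type: " + self.sync_type)
        # Assumption: polling only makes sense with an explicit interval.
        if self.sync_type == "refreshOnly" and self.interval is None:
            raise ValueError("refreshOnly requires an interval")
```

With this model, the second replica above would be built as Replica("refreshOnly", "uid=admin,ou=system", "secret", "ldap1.apache.org", 10389, "cn=config,ou=system", parse_interval("01:00:00")), while omitting the interval would be rejected at construction time.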


This is a very preliminary proposal. Feel free to comment on it.

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: Replication configuration : second thought

Posted by Alex Karasulu <ak...@gmail.com>.
On Mon, Mar 16, 2009 at 3:42 PM, Emmanuel Lecharny <el...@apache.org> wrote:

>>> We should consider three cases :
>>> - no replication at all, nor as a consumer neither as a producer. This is
>>> what you describe.
>>
>> Yes this is what the core DirectoryService should be.  If replication is
>> needed then LdapService needs to be used.
>
> Ok, let's start from here then. Assuming that the core is free of network
> is a good base.
>
>>> - embedded server with a consumer
>>> - embedded server which is also a producer
>>
>> By embedded you don't mean just the core? Let's be exacting with this term.
>
> I mean, the core. Not the network part.
>
>> The following embedded configurations of ApacheDS are possible:
>>
>>    (1) Only the core without any network services enabled.
>>    (2) The core and the LDAP server together with potentially other network
>> services enabled.
>
> Right. This is from this starting point I'm trying to figure out the best
> possible place to put the replication layer.
>
>> Now in the 1st configuration I don't see replication as available.  It
>> should not be available.  Trustin broke this order by developing mitosis in
>> a vacuum.  I intend to correct this mistake.
>
> I agree.
>
>> Now that we're using syncrepl which sits on top of the LDAP line protocol,
>> it makes sense to have it enabled with the frontend in the 2nd configuration
>> I wrote above.
>
> Ok. There is one little things remaining then (see later).
>
>>> In case 3, then better start the LdapService, as it has everything
>>>
>>> In case 2, i don't see why we should also get all the LDAP machinery when
>>> just a client would be enough ?
>>
>> Consumer or actual LDAP client don't understand.
>
> Consumer. Sorry for the confusion.
>
>>> Maybe we need an intermediate machinery, not in the core but not in the
>>> service ?
>>
>> These are bad choices IMHO. It's yet another new way things can be setup.  I
>> think you should keep the LDAP protocol stack and replication together as
>> part of the frontend since it is inherently part of the frontend.
>
> Hmmm, you may be right.
>
>> Just remember we cannot please everyone.  There will be people who want just
>> the core but then find they will want replication too.  Those folks can just
>> enable the frontend and get replication.  I don't see any other way.
>
> The only thing we can do is to allow a use who want to embed the server
> _and_ benefit from the replication as a consumer (not a producer). That
> means we must allow users to disable the incoming LDAP part in the
> LdapService. Should be easy.

Yeah.  But we need not do this now.  Let's just get this working and
organized properly wrt the configuration.  Anyway, users can use ACI and
other techniques to constrain what can be done through the LDAP service if
they want replication but do not want to expose LDAP.  This is definitely
icing on the configuration cake.

Alex

Re: Replication configuration : second thought

Posted by Emmanuel Lecharny <el...@apache.org>.
>> We should consider three cases :
>> - no replication at all, nor as a consumer neither as a producer. This is
>> what you describe.
>>     
>
>
> Yes this is what the core DirectoryService should be.  If replication is
> needed then LdapService needs to be used.
>   
Ok, let's start from here then. Assuming that the core is free of 
network is a good base.

>   
>> - embedded server with a consumer
>> - embedded server which is also a producer
>>
>>     
>
> By embedded you don't mean just the core? Let's be exacting with this term.
>   
I mean, the core. Not the network part.
> The following embedded configurations of ApacheDS are possible:
>
>    (1) Only the core without any network services enabled.
>   
>    (2) The core and the LDAP server together with potentially other network
> services enabled.
>   
Right. It is from this starting point that I'm trying to figure out the 
best possible place to put the replication layer.

> Now in the 1st configuration I don't see replication as available.  It
> should not be available.  Trustin broke this order by developing mitosis in
> a vacuum.  I intend to correct this mistake.
>   
I agree.
> Now that we're using syncrepl which sits on top of the LDAP line protocol,
> it makes sense to have it enabled with the frontend in the 2nd configuration
> I wrote above.
>   
Ok. There is one little thing remaining then (see later).
>   
>> In case 3, then better start the LdapService, as it has everything
>>
>> In case 2, i don't see why we should also get all the LDAP machinery when
>> just a client would be enough ?
>>
>>     
>
> Consumer or actual LDAP client don't understand.
>   
Consumer. Sorry for the confusion.
>
>   
>> Maybe we need an intermediate machinery, not in the core but not in the
>> service ?
>>
>>     
>
> These are bad choices IMHO. It's yet another new way things can be setup.  I
> think you should keep the LDAP protocol stack and replication together as
> part of the frontend since it is inherently part of the frontend.
>   
Hmmm, you may be right.
> Just remember we cannot please everyone.  There will be people who want just
> the core but then find they will want replication too.  Those folks can just
> enable the frontend and get replication.  I don't see any other way.
>   
The only thing we can do is to allow a user who wants to embed the server 
_and_ benefit from the replication as a consumer (not a producer). That 
means we must allow users to disable the incoming LDAP part in the 
LdapService. Should be easy.

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: Replication configuration : second thought

Posted by Alex Karasulu <ak...@gmail.com>.
On Mon, Mar 16, 2009 at 3:10 PM, Emmanuel Lecharny <el...@apache.org> wrote:

>>> what about having an embedded server with replication capability ?
>>
>> Embedded does not mean the protocol is not exposed.  You can embed with the
>> front end or without.  Without the frontend you will not get replication
>> either since our replication sits on top of the LDAP protocol now.  So it
>> makes more sense to do this in the frontend IMHO.
>
> If the front end is LdapService, then  the problem is that you can't start
> it without starting the network part of it... You can't have an embedded
> server which is a replica of another server, but without all the bells and
> whistle of a full server. Sometime we may need just the outgoing part
> without the ingoing...
>
>>> that's a good question... I would tend to think that injecting the
>>> replication subsystem into the core server is probably a better way, as it
>>> will be available not only to the standalone server, but also for an
>>> embedded server (all in all, the networked server just embedded the core...)
>>
>> I would respectfully disagree here.  Those who just want to embed the core
>> without the LDAP protocol and socket servers do so because they do not want
>> to expose anything.  We would still have to expose a network protocol even
>> if not the entire LDAP protocol, to do replication.  It makes less sense to
>> me to put the network machinery into the core.
>
> We should consider three cases :
> - no replication at all, nor as a consumer neither as a producer. This is
> what you describe.


Yes this is what the core DirectoryService should be.  If replication is
needed then LdapService needs to be used.


>
> - embedded server with a consumer
> - embedded server which is also a producer
>

By embedded you don't mean just the core? Let's be exact about this term.

The following embedded configurations of ApacheDS are possible:

   (1) Only the core without any network services enabled.
   (2) The core and the LDAP server together with potentially other network
services enabled.

Now in the 1st configuration I don't see replication as available.  It
should not be available.  Trustin broke this order by developing mitosis in
a vacuum.  I intend to correct this mistake.

Now that we're using syncrepl which sits on top of the LDAP line protocol,
it makes sense to have it enabled with the frontend in the 2nd configuration
I wrote above.


> In case 3, then better start the LdapService, as it has everything
>
> In case 2, i don't see why we should also get all the LDAP machinery when
> just a client would be enough ?
>

Consumer or actual LDAP client? I don't understand.


>
> Maybe we need an intermediate machinery, not in the core but not in the
> service ?
>

These are bad choices IMHO. It's yet another new way things can be set up.  I
think you should keep the LDAP protocol stack and replication together as
part of the frontend since it is inherently part of the frontend.

Just remember we cannot please everyone.  There will be people who want just
the core but then find they will want replication too.  Those folks can just
enable the frontend and get replication.  I don't see any other way.

Frankly I cannot discuss this ad infinitum since life is crazy at the moment
for me. But the call is yours since I'm not involved in this effort. I just
know that you'll tell me 6-8 months later that I was right :-).  This however
will increase entropy and require more work to correct and this is what I
don't want for anyone.  It will be a waste.

Alex

Re: Replication configuration : second thought

Posted by Emmanuel Lecharny <el...@apache.org>.
>> what about having an embedded server with replication capability ?
>>     
>
>
> Embedded does not mean the protocol is not exposed.  You can embed with the
> front end or without.  Without the frontend you will not get replication
> either since our replication sits on top of the LDAP protocol now.  So it
> makes more sense to do this in the frontend IMHO.
>   
If the front end is LdapService, then the problem is that you can't 
start it without starting the network part of it... You can't have an 
embedded server which is a replica of another server, but without all 
the bells and whistles of a full server. Sometimes we may need just the 
outgoing part without the incoming one...
>>
>> that's a good question... I would tend to think that injecting the
>> replication subsystem into the core server is probably a better way, as it
>> will be available not only to the standalone server, but also for an
>> embedded server (all in all, the networked server just embedded the core...)
>>
>>
>>     
> I would respectfully disagree here.  Those who just want to embed the core
> without the LDAP protocol and socket servers do so because they do not want
> to expose anything.  We would still have to expose a network protocol even
> if not the entire LDAP protocol, to do replication.  It makes less sense to
> me to put the network machinery into the core.
>   
We should consider three cases:
- no replication at all, neither as a consumer nor as a producer. This 
is what you describe.
- embedded server with a consumer
- embedded server which is also a producer

In case 3, we had better start the LdapService, as it has everything.

In case 2, I don't see why we should also get all the LDAP machinery 
when just a client would be enough.

Maybe we need some intermediate machinery, neither in the core nor in the 
service?

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: Replication configuration : second thought

Posted by Alex Karasulu <ak...@gmail.com>.
On Mon, Mar 16, 2009 at 12:10 PM, Emmanuel Lecharny <el...@apache.org> wrote:

>> Regarding Repl Terminology
>> ----------------------------------------
>>
>> OK let's get religious about using replication specific terminology.
>> Instead of talking about client, server, master, slave, let's use the terms
>> supplier and consumer.  The reason why I like these terms besides their
>> common use is because it clearly denotes directionality and breaks down
>> elements of the replication agreements to their atomic components. So I'd
>> like to think in terms of a consumer configuration (with replication
>> agreement) and a supplier configuration (also with a repl-agmt).
>
> Totally +1. This is why I added them in (). So let's use them exclusively.
>
>> Regarding Configuration
>> -----------------------------------
>>
>> On a separate note, I've been torn between two ways of thinking about
>> configuration for some time.  Regardless of whether we're talking about
>> replication subsystem or not, we could apply this general discussion to any
>> feature/facet of the server which also contains an interceptor.  So this
>> discussion applies across the board as a configuration issue.
>
> probably
>
>> So where do we configure a subsystem? In the interceptor configuration? Or
>> as a separate component under the directory service.  Over the years I've
>> made many mistakes with this stuff.  I like the idea of having a high level
>> subsystem bean that contains all the configuration for the feature in one
>> place instead of being distributed all over the configuration tree.
>> Localization is always good because then the user and developer only needs
>> to goto one place to get this information or modify the code for it.  It's
>> more manageable.
>
> That could lead to a centralized configuration in the DiT...

Exactly, this is one of the benefits of keeping the configuration for a
subsystem in a single configuration bean rather than spreading the
configuration all over the place in some beans and interceptors etc.  This
is the luxury we have to gain from better organization.


>> I was always uncomfortable with bloating these interceptors with code.
>> Instead I just wanted to leave them as simple hooks that funneled events
>> (calls) into the subsystem.  The interceptor is then a simple listener that
>> belongs to the subsystem in question. So the listener detects events and
>> shuffles them into the subsystem to properly respond.  This allows us to
>> keep the configuration in the top level subsystem facade while properly
>> designing the various parts of the subsystem underneath with clarity instead
>> of jamming all the subsystem handling logic tightly into interceptors which
>> leads to code bloat in the interceptor modules leading to a jumbled up mess
>> that is hard to manage.
>
> In fact, there are _very few_ interceptors that need a configuration :
> - Authentication (takes a list of authenticators, something we may want to
> be global too)
> - Journal ( but this is a new interceptor, and the configuration is not yet
> stabilized)
> - Replication, but this is exactly what we are discussing about
> and that's it.
>
> So far, there is nothing that prevent us to get rid of any parameter in all
> the interceptors...


This is good since it gives us the chance to do this right and not have to
clean anything up.


>> So I recommend the following:
>>
>>   (1) Implement a subsystem for replication with a top level facade bean
>> that can be configured via XBean.
>
> done
>
>>   (2) Setup a means to setup a set of replication agreements.  The same DSA
>> can be both a consumer and a supplier to many other DSAs.  So we'll have a
>> set of consumer agreements and a set of supplier agreements.
>
> +1
>
>>   (3) Build the interceptor out to be just a simple hook into other Classes
>> in the subsystem.  Make the subsystem return values for LDAP methods in the
>> chain that require return values.
>
> okie
>
>>   (4) Don't worry about adding the replication interceptor to the chain via
>> XBean, just have the subsystem inject the interceptor programmatically.  The
>> user does not need to know this interceptor even exists right?
>
> this can be discussed further, as this question is a bit too wide right
> now. Let's use the mechanism we have right now.


OK just as long as the interceptor does not have any configuration to it we
should be fine.
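
To illustrate the "interceptor as a simple hook" idea, here is a minimal Python sketch. It is only a sketch of the pattern, not ApacheDS code: both classes merely borrow the names used in this thread, and every method and callback signature is hypothetical. The facade owns all the configuration (the consumer and supplier agreements), while the interceptor holds nothing but a reference back to the facade.

```python
class ReplicationSubsystem:
    """Hypothetical top-level facade: all configuration lives here."""

    def __init__(self, consumer_agreements, supplier_agreements):
        self.consumer_agreements = list(consumer_agreements)
        self.supplier_agreements = list(supplier_agreements)
        self.events = []  # stand-in for the real change handling

    def on_change(self, operation, dn):
        # The subsystem decides what to do with the event.
        self.events.append((operation, dn))

    def create_interceptor(self):
        # The subsystem builds its own hook, so the user never has to
        # declare or configure the interceptor.
        return ReplicationInterceptor(self)


class ReplicationInterceptor:
    """Thin hook with no configuration of its own; it only forwards calls."""

    def __init__(self, subsystem):
        self._subsystem = subsystem

    def add(self, dn, next_in_chain):
        result = next_in_chain(dn)  # let the rest of the chain run first
        self._subsystem.on_change("add", dn)
        return result

    def delete(self, dn, next_in_chain):
        result = next_in_chain(dn)
        self._subsystem.on_change("delete", dn)
        return result
```

In this shape, injecting the interceptor programmatically amounts to calling subsystem.create_interceptor() and appending the result to the chain.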


>> DirectoryService vs. LdapServer
>> ----------------------------------------------
>>
>> Now I don't know the answer to this but something itches me about putting
>> the replication subsystem into the DirectoryService.  For some time now the
>> DS has corresponded to the top level facade representing what we always referred
>> to as the core.  The core was never supposed to be networked because then
>> that would pull dependencies like MINA.  It was the frontend that had
>> network capability.  I know mitosis broke from this but seems the moron who
>> wrote it broke all the rules.
>
> what about having an embedded server with replication capability ?


Embedded does not mean the protocol is not exposed.  You can embed with the
front end or without.  Without the frontend you will not get replication
either since our replication sits on top of the LDAP protocol now.  So it
makes more sense to do this in the frontend IMHO.


>> Question is do we want replication to be a high level system in the frontend
>> since it leverages the protocol? Or do we want it in the DS which was
>> traditionally core services without networking?
>
> that's a good question... I would tend to think that injecting the
> replication subsystem into the core server is probably a better way, as it
> will be available not only to the standalone server, but also for an
> embedded server (all in all, the networked server just embedded the core...)
>
I would respectfully disagree here.  Those who just want to embed the core
without the LDAP protocol and socket servers do so because they do not want
to expose anything.  We would still have to expose a network protocol even
if not the entire LDAP protocol, to do replication.  It makes less sense to
me to put the network machinery into the core.

Alex

Re: Replication configuration : second thought

Posted by Emmanuel Lecharny <el...@apache.org>.
> Regarding Repl Terminology
> ----------------------------------------
>
> OK let's get religious about using replication specific terminology.
> Instead of talking about client, server, master, slave, let's use the terms
> supplier and consumer.  The reason why I like these terms besides their
> common use is because it clearly denotes directionality and breaks down
> elements of the replication agreements to their atomic components. So I'd
> like to think in terms of a consumer configuration (with replication
> agreement) and a supplier configuration (also with a repl-agmt).
>   
Totally +1. This is why I added them in (). So let's use them exclusively.
>
> Regarding Configuration
> -----------------------------------
>
> On a separate note, I've been torn between two ways of thinking about
> configuration for some time.  Regardless of whether we're talking about
> replication subsystem or not, we could apply this general discussion to any
> feature/facet of the server which also contains an interceptor.  So this
> discussion applies across the board as a configuration issue.
>   
probably
> So where do we configure a subsystem? In the interceptor configuration? Or
> as a separate component under the directory service.  Over the years I've
> made many mistakes with this stuff.  I like the idea of having a high level
> subsystem bean that contains all the configuration for the feature in one
> place instead of being distributed all over the configuration tree.
> Localization is always good because then the user and developer only needs
> to goto one place to get this information or modify the code for it.  It's
> more manageable.
>   
That could lead to a centralized configuration in the DiT...
> I was always uncomfortable with bloating these interceptors with code.
> Instead I just wanted to leave them as simple hooks that funneled events
> (calls) into the subsystem.  The interceptor is then a simple listener that
> belongs to the subsystem in question. So the listener detects events and
> shuffles them into the subsystem to properly respond.  This allows us to
> keep the configuration in the top level subsystem facade while properly
> designing the various parts of the subsystem underneath with clarity instead
> of jamming all the subsystem handling logic tightly into interceptors which
> leads to code bloat in the interceptor modules leading to a jumbled up mess
> that is hard to manage.
>   
In fact, there are _very few_ interceptors that need a configuration:
 - Authentication (takes a list of authenticators, something we may want 
to be global too)
 - Journal (but this is a new interceptor, and the configuration is not 
yet stabilized)
 - Replication, but this is exactly what we are discussing
and that's it.

So far, there is nothing that prevents us from getting rid of every 
parameter in all the interceptors...
> So I recommend the following:
>
>   (1) Implement a subsystem for replication with a top level facade bean
> that can be configured via XBean.
>   
done
>   (2) Setup a means to setup a set of replication agreements.  The same DSA
> can be both a consumer and a supplier to many other DSAs.  So we'll have a
> set of consumer agreements and a set of supplier agreements.
>   
+1
>   (3) Build the interceptor out to be just a simple hook into other Classes
> in the subsystem.  Make the subsystem return values for LDAP methods in the
> chain that require return values.
>   
okie
>   (4) Don't worry about adding the replication interceptor to the chain via
> XBean, just have the subsystem inject the interceptor programmatically.  The
> user does not need to know this interceptor even exists right?
>   
this can be discussed further, as this question is a bit too wide right 
now. Let's use the mechanism we have right now.
>
> DirectoryService vs. LdapServer
> ----------------------------------------------
>
> Now I don't know the answer to this but something itches me about putting
> the replication subsystem into the DirectoryService.  For some time now the
> DS has corresponded to the top level facade representing what we always referred
> to as the core.  The core was never supposed to be networked because then
> that would pull dependencies like MINA.  It was the frontend that had
> network capability.  I know mitosis broke from this but seems the moron who
> wrote it broke all the rules.
>   
what about having an embedded server with replication capability ?
> Question is do we want replication to be a high level system in the frontend
> since it leverages the protocol? Or do we want it in the DS which was
> traditionally core services without networking?
>   
That's a good question... I would tend to think that injecting the 
replication subsystem into the core server is probably a better way, as 
it will be available not only to the standalone server, but also to an 
embedded server (all in all, the networked server just embeds the core...)

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: Replication configuration : second thought

Posted by Howard Chu <hy...@symas.com>.
Emmanuel Lecharny wrote:
> Thanks Howard for all the points.
> - I think I was a bit confused about the rid/sid, because there are many
> documents all over the internet describing them as integer, or hex.
> After having checked, we already implemented it as a hex value

Unfortunately things are quite confusing here: the rid is decimal, the sid is hex.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Re: Replication configuration : second thought

Posted by Emmanuel Lecharny <el...@apache.org>.
Howard Chu wrote:
> Emmanuel Lecharny wrote:
>> Based on the previous elements, let's define what can be the
>> configuration, assuming we have one replication subsystem.
>>
>> consumer :
>> - replicaId, an int between [0,999] : identify uniquely this consumer
>
> Before you get too far, I should point out that replicaID is only 
> significant within a single server. It's just a means of labeling each 
> individual consumer config within one server. There is no requirement 
> for uniqueness across multiple servers.
>
> For OpenLDAP's MMR, we had to introduce the serverID parameter which 
> *is* required to be unique across all masters. This goes into the sid 
> field of the CSN, so it's a hex value [0-fff]. And also note that 0 is 
> only valid for single-master replication, for MMR the sid must be 
> non-zero.
>
>> - for each replication peer :
>>    o type, the replication type (RefreshAndPersist or RefreshOnly).
>> Default to RefreshAndPersist.
>>    o interval, if type is RefreshOnly : a hh/mm/ss interval between each
>> content polling
>>    o search base, the base DN to start the search on the producer
>>    o principal, the principal to use in order to connect on the producer
>>    o credential, the password to use to connect on the producer
>
> I don't know if your network layer already handles it, so you haven't 
> mentioned it here, but parameters for timeout / detecting a failed 
> network connection and scheduling retries are also necessary.
>
>> producer :
>> currently, I see no specific information to set, but it's really a
>> preliminary proposal
>
> One of the points to RFC4533 is that no producer-side configuration is 
> required, although some optimization is possible.
>
>> Regarding the principal/credential information, as the passwords are
>> stored crypted using a one-way algorithm, we have to store it in clear.
>> This is not very safe. We can think about a better algorithm, like
>> encrypting the password using a 2 ways algorithm, but that means we have
>> to store a key on the server, which is a breach too. There is no free
>> lunch ...
>
> Also provide the option of using SASL mechanisms. All of my production 
> syncrepl configs use TLS and certificate-based authentication. You 
> still get stuck needing the plaintext private key, but at least it's 
> not a text string that can be easily spotted and memorized in passing.
>
>> I'm going to implement this configuration on the replication branch.
>>
>> feel free to comment, and don't really care about the breakage that can
>> introduce in the code I will commit  : it's just there as a starting
>> point, nothing more.
>>
>> Thanks !
>>
>
>
Thanks Howard for all the points.
- I think I was a bit confused about the rid/sid, because there are many 
documents all over the internet describing them as integer, or hex. 
After having checked, we already implemented it as a hex value.
- We haven't yet added all the network timeout/failure handling. But we 
will :)
- The same goes for SASL/SSL. In any case, as this is the plain LDAP 
protocol, we will benefit from the existing stack.

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: Replication configuration : second thought

Posted by Howard Chu <hy...@symas.com>.
Emmanuel Lecharny wrote:
> Based on the previous elements, let's define what can be the
> configuration, assuming we have one replication subsystem.
>
> consumer :
> - replicaId, an int between [0,999] : identify uniquely this consumer

Before you get too far, I should point out that replicaID is only significant 
within a single server. It's just a means of labeling each individual consumer 
config within one server. There is no requirement for uniqueness across 
multiple servers.

For OpenLDAP's MMR, we had to introduce the serverID parameter which *is* 
required to be unique across all masters. This goes into the sid field of the 
CSN, so it's a hex value [0-fff]. And also note that 0 is only valid for 
single-master replication, for MMR the sid must be non-zero.
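
To make the distinction concrete, here is a rough sketch in the XML style 
used earlier in this thread. The element and attribute names (replication, 
serverId, consumer) are hypothetical and only illustrate the scoping: 
replicaId labels one consumer config inside this server, while serverId 
must be unique across all masters.

```xml
<!-- Hypothetical sketch: none of these names exist in any server config -->
<!-- serverId: hex value in [0-fff]; must be non-zero when using MMR -->
<replication serverId="00a">
  <!-- replicaId values only need to be distinct within this server -->
  <consumer replicaId="001"/>
  <consumer replicaId="002"/>
</replication>
```

Another server could reuse replicaId 001 and 002 for its own consumers, 
but in an MMR setup it would need a serverId other than 00a.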

> - for each replication peer :
>    o type, the replication type (RefreshAndPersist or RefreshOnly).
> Default to RefreshAndPersist.
>    o interval, if type is RefreshOnly : a hh/mm/ss interval between each
> content polling
>    o search base, the base DN to start the search on the producer
>    o principal, the principal to use in order to connect on the producer
>    o credential, the password to use to connect on the producer

I don't know if your network layer already handles it, so you haven't 
mentioned it here, but parameters for timeout / detecting a failed network 
connection and scheduling retries are also necessary.
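
As an illustration only, such parameters could sit next to the other 
per-peer settings. The attribute names below (connectTimeout, 
retryInterval, maxRetries) and their values are assumptions made up for 
this sketch, not existing options.

```xml
<!-- Hypothetical sketch: attribute names and values are assumptions -->
<!-- connectTimeout: give up on a connection attempt after 30 seconds -->
<!-- retryInterval:  wait one minute before trying to reconnect -->
<!-- maxRetries:     stop retrying after 10 failed attempts -->
<replica connectTimeout="00:00:30"
         retryInterval="00:01:00"
         maxRetries="10">
  <server>ldap1.apache.org</server>
  <port>10389</port>
  <baseDN>ou=people,dc=apache,dc=org</baseDN>
</replica>
```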

> producer :
> currently, I see no specific information to set, but it's really a
> preliminary proposal

One of the points to RFC4533 is that no producer-side configuration is 
required, although some optimization is possible.

> Regarding the principal/credential information, as the passwords are
> stored crypted using a one-way algorithm, we have to store it in clear.
> This is not very safe. We can think about a better algorithm, like
> encrypting the password using a 2 ways algorithm, but that means we have
> to store a key on the server, which is a breach too. There is no free
> lunch ...

Also provide the option of using SASL mechanisms. All of my production 
syncrepl configs use TLS and certificate-based authentication. You still get 
stuck needing the plaintext private key, but at least it's not a text string 
that can be easily spotted and memorized in passing.
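
A configuration could expose this as an alternative to the 
principal/credential pair. In the sketch below the authentication element, 
its attributes and the keystore path are all hypothetical; EXTERNAL is the 
SASL mechanism name used when the identity comes from the TLS client 
certificate.

```xml
<!-- Hypothetical sketch: element names, attributes and path are assumptions -->
<replica>
  <server>ldap1.apache.org</server>
  <port>10389</port>
  <baseDN>ou=people,dc=apache,dc=org</baseDN>
  <!-- Instead of principalDn/password: bind with SASL EXTERNAL over TLS, -->
  <!-- using the client certificate and private key from a keystore -->
  <authentication mechanism="EXTERNAL"
                  useTls="true"
                  keyStore="conf/replication.ks"/>
</replica>
```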

> I'm going to implement this configuration on the replication branch.
>
> feel free to comment, and don't really care about the breakage that can
> introduce in the code I will commit  : it's just there as a starting
> point, nothing more.
>
> Thanks !
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Re: Replication configuration : second thought

Posted by Emmanuel Lecharny <el...@apache.org>.
Based on the previous elements, let's define what can be the 
configuration, assuming we have one replication subsystem.

consumer :
- replicaId, an int between [0,999] : uniquely identifies this consumer
- for each replication peer :
  o type, the replication type (RefreshAndPersist or RefreshOnly). 
Defaults to RefreshAndPersist.
  o interval, if type is RefreshOnly : a hh/mm/ss interval between each 
content polling
  o search base, the base DN to start the search from on the producer
  o principal, the principal to use in order to connect to the producer
  o credential, the password to use to connect to the producer
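
As a rough sketch only, the consumer settings listed above could be 
written in the same style as the per-replica proposal from the first 
message. The consumer and peer element names are hypothetical; the server, 
port, DNs and password are simply reused from that earlier example.

```xml
<!-- Hypothetical sketch: element names are assumptions, not an existing schema -->
<consumer replicaId="1">
  <peer>
    <!-- No type given: defaults to refreshAndPersist, so no interval -->
    <server>ldap1.apache.org</server>
    <port>10389</port>
    <searchBase>ou=people,dc=apache,dc=org</searchBase>
    <principal>uid=admin,ou=system</principal>
    <credential>secret</credential>
  </peer>
  <peer>
    <type>refreshOnly</type>
    <!-- Poll the producer once per hour -->
    <interval>01:00:00</interval>
    <server>ldap1.apache.org</server>
    <port>10389</port>
    <searchBase>cn=config,ou=system</searchBase>
    <principal>uid=admin,ou=system</principal>
    <credential>secret</credential>
  </peer>
</consumer>
```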

producer :
currently, I see no specific information to set, but it's really a 
preliminary proposal

Regarding the principal/credential information, as the passwords are 
stored hashed using a one-way algorithm, we cannot recover them, so we 
have to store the credential in clear text. This is not very safe. We can 
think about a better approach, like encrypting the password using a 
two-way algorithm, but that means we have to store a key on the server, 
which is a weakness too. There is no free lunch ...

I'm going to implement this configuration on the replication branch.

Feel free to comment, and don't worry too much about the breakage that 
the code I will commit can introduce : it's just there as a starting 
point, nothing more.

Thanks !

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: Replication configuration : second thought

Posted by Alex Karasulu <ak...@gmail.com>.
Hi Emmanuel,

On Mon, Mar 16, 2009 at 7:16 AM, Emmanuel Lecharny <el...@apache.org> wrote:

> Hi guys,
>
> after having thought about the configuration a bit more, I'm not sure that
> it should be associated with a replication interceptor. I'm not even sure
> that such an interceptor makes sense.
>
> If we consider both side, a server can be either a client, a server, or
> both. But we will consider each case as distinct.
>
> If the server is a client (a replica, or a consumer), then  it connects to
> a master server using a RefreshAndPersist control associated to a search
> request. This connection can be managed in a specific interceptor, but in
> any case, the searchResults will have to go through the whole chain.  As
> it's also a persistent connection, it would be better to deal with it at the
> DirectoryService level.
>

> If the server is a master (a producer), then we will consider that a client
> search request has been received, and persists. All the resulting entries
> are filtered by the chain, and AFAICT, we don't need to add an interceptor
> to filter the results. We just have to consider that the client is a
> standard client.
>

Regarding Repl Terminology
----------------------------------------

OK let's get religious about using replication specific terminology.
Instead of talking about client, server, master, slave, let's use the terms
supplier and consumer.  The reason why I like these terms, besides their
common use, is that they clearly denote directionality and break down
elements of the replication agreements to their atomic components. So I'd
like to think in terms of a consumer configuration (with replication
agreement) and a supplier configuration (also with a repl-agmt).


Regarding Configuration
-----------------------------------

On a separate note, I've been torn between two ways of thinking about
configuration for some time.  Regardless of whether we're talking about
replication subsystem or not, we could apply this general discussion to any
feature/facet of the server which also contains an interceptor.  So this
discussion applies across the board as a configuration issue.

So where do we configure a subsystem? In the interceptor configuration? Or
as a separate component under the directory service?  Over the years I've
made many mistakes with this stuff.  I like the idea of having a high level
subsystem bean that contains all the configuration for the feature in one
place instead of being distributed all over the configuration tree.
Localization is always good because then the user and developer only need
to go to one place to get this information or modify the code for it.  It's
more manageable.

I was always uncomfortable with bloating these interceptors with code.
Instead I just wanted to leave them as simple hooks that funneled events
(calls) into the subsystem.  The interceptor is then a simple listener that
belongs to the subsystem in question. So the listener detects events and
shuffles them into the subsystem to properly respond.  This allows us to
keep the configuration in the top level subsystem facade while designing
the various parts of the subsystem underneath with clarity.  The
alternative is jamming all the subsystem handling logic tightly into the
interceptors, which bloats the interceptor modules and leads to a jumbled
up mess that is hard to manage.

So I recommend the following:

  (1) Implement a subsystem for replication with a top level facade bean
that can be configured via XBean.
  (2) Set up a means to define a set of replication agreements.  The same DSA
can be both a consumer and a supplier to many other DSAs.  So we'll have a
set of consumer agreements and a set of supplier agreements.
  (3) Build the interceptor out to be just a simple hook into other classes
in the subsystem.  Make the subsystem return values for LDAP methods in the
chain that require return values.
  (4) Don't worry about adding the replication interceptor to the chain via
XBean, just have the subsystem inject the interceptor programmatically.  The
user does not need to know this interceptor even exists, right?
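
A rough sketch of what points (1), (2) and (4) could look like in the
configuration. All names (replicationSubsystem, consumerAgreements,
supplierAgreements, agreement) are hypothetical; the point is only that
everything lives under one top level bean and that no interceptor entry
appears anywhere.

```xml
<!-- Hypothetical sketch: bean and element names are assumptions -->
<replicationSubsystem>
  <!-- Agreements where this DSA pulls changes from another DSA -->
  <consumerAgreements>
    <agreement type="refreshAndPersist"
               server="ldap1.apache.org" port="10389"
               baseDN="ou=people,dc=apache,dc=org"/>
  </consumerAgreements>
  <!-- Agreements where this DSA provides changes to other DSAs; -->
  <!-- what they would need to contain is still an open question -->
  <supplierAgreements/>
  <!-- No interceptor is declared: the subsystem would add it to the chain itself -->
</replicationSubsystem>
```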


DirectoryService vs. LdapServer
----------------------------------------------

Now I don't know the answer to this but something bothers me about putting
the replication subsystem into the DirectoryService.  For some time now the
DS has corresponded to the top level facade representing what we always
referred to as the core.  The core was never supposed to be networked because
that would pull in dependencies like MINA.  It was the frontend that had
network capability.  I know Mitosis broke from this, but it seems the moron who
wrote it broke all the rules.

Question is do we want replication to be a high level system in the frontend
since it leverages the protocol? Or do we want it in the DS which was
traditionally core services without networking?

As I write this I am leaning towards the LdapServer.  What we have to ask
ourselves is: does the replication subsystem need access to protocol layer
structures and information? Another question is: does anything other than the
interceptor need access to the ReplicationManager facade (presuming this is
what we name it)?

This is a very important question.



>
> So I would suggest to move the replication configuration at a higher level
> (DirectoryService), and remove the ReplicationInterceptor.
>
> This is not definitive, I'm waiting for any comment before moving on this
> direction regarding to the interceptor removal (I'm not 100% sure that we
> don't need some replication specific filtering, for instance).
>
> wdyt ?
>
> --
> --
> cordialement, regards,
> Emmanuel Lécharny
> www.iktek.com
> directory.apache.org
>
>
>

Replication configuration : second thought

Posted by Emmanuel Lecharny <el...@apache.org>.
Hi guys,

after having thought about the configuration a bit more, I'm not sure 
that it should be associated with a replication interceptor. I'm not 
even sure that such an interceptor makes sense.

If we consider both sides, a server can be either a client, a server, or 
both. But we will consider each case as distinct.

If the server is a client (a replica, or a consumer), then  it connects 
to a master server using a RefreshAndPersist control associated to a 
search request. This connection can be managed in a specific 
interceptor, but in any case, the searchResults will have to go through 
the whole chain.  As it's also a persistent connection, it would be 
better to deal with it at the DirectoryService level.

If the server is a master (a producer), then we will consider that a 
client search request has been received, and persists. All the resulting 
entries are filtered by the chain, and AFAICT, we don't need to add an 
interceptor to filter the results. We just have to consider that the 
client is a standard client.

So I would suggest moving the replication configuration to a higher 
level (DirectoryService), and removing the ReplicationInterceptor.

This is not definitive, I'm waiting for any comment before moving in 
this direction regarding the interceptor removal (I'm not 100% sure 
that we don't need some replication specific filtering, for instance).

wdyt ?

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: Replication configuration

Posted by Emmanuel Lecharny <el...@apache.org>.
Kiran Ayyagari wrote:
>
>>    <replicationInterceptor>
>>      <configuration>
>>        <replicationConfiguration sync="RefreshOnly"   (or 
>> "RefreshAndPersist")
>
> how about replacing the parameter 'sync' with 'refreshOnly' and 
> setting the value like
>
> <replicationConfiguration refreshOnly="true" />
>
> The mode 'refreshAndPersist' will be used(assumed) if the attribute 
> 'refreshOnly' is set
> to 'false' or is not present.
Makes sense...


-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: Replication configuration

Posted by Kiran Ayyagari <ay...@gmail.com>.
>    <replicationInterceptor>
>      <configuration>
>        <replicationConfiguration sync="RefreshOnly"   (or 
> "RefreshAndPersist")

How about replacing the parameter 'sync' with 'refreshOnly' and setting the value like

<replicationConfiguration refreshOnly="true" />

The mode 'refreshAndPersist' will be used (assumed) if the attribute 'refreshOnly' is set
to 'false' or is not present.

-- 
Kiran Ayyagari

Re: Replication configuration

Posted by Emmanuel Lecharny <el...@apache.org>.
> However, on the last snippet of XML, I can't see the interval on the first
> replica. Does this mean there's a default value?
>   
For refreshAndPersist, the interval does not make any sense, as you will 
be informed every time an update is done on the master server.
> All this "complex" (more verbose than complex actually) configuration makes
> me think a really cool UI could be drawn for this in Studio. :)
>   
Yeah, definitely !

> We could even use GEF (Graphical Editing Framework) to build a graphical UI
> for handling the replication.
>   
Why not... But we have to get replication working first :)

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: Replication configuration

Posted by Pierre-Arnaud Marcelot <pa...@marcelot.net>.
Hi Emmanuel,
Sorry for flooding the mailing list with my JIRA cleaning. Your message is
now hidden among dozens of JIRA notifications.

On Fri, Mar 13, 2009 at 3:34 PM, Emmanuel Lecharny <el...@apache.org> wrote:

> Hi,
>
> on the replication branch, we are now able to connect to an OpenLDAP
> server, and subscribe as a slave with the refreshOnly or refreshAndPersist
> operations. This is very experimental atm, and we need more than the current
> configuration in order to implement this in the server.
>

That's awesome! I'd love to see that in action.
Maybe at ApacheCon...


> So far, here are the needed informations :
> - a replicaId (or RID), uniquely identifying the server
> - a replication type : RefreshOnly or RefreshAndPersist
> - an interval for a RefreshOnly replication
> - a search base, which will be the part of the tree to replicate
> - a principal, used to connect on the master server
> - a password
>
> Currently, the lacking informations are :
> - the replication type
> - the search base
> - the principal
> - the password
>
> We have a Replica class holding similar informations, namely a
> SocketAddress, as we where based on a proprietary protocol to handle the
> replication in the previous version (Mitosis). As the new replication model
> will be based on RFC 4533, we need to change this.
>
> So the ReplicationInterceptor configuration will change. Currently, it
> looks like that :
>
>   <replicationInterceptor>
>     <configuration>
>       <replicationConfiguration logMaxAge="5"
>                                 replicaId="instance_a"
>                                 replicationInterval="2"
>                                 responseTimeout="10"
>                                 serverPort="10390">
>         <s:property name="peerReplicas">
>           <s:set>
>             <s:value>instance_b@localhost:1234</s:value>
>             <s:value>instance_c@localhost:1234</s:value>
>           </s:set>
>         </s:property>
>       </replicationConfiguration>
>     </configuration>
>   </replicationInterceptor>
>
>
> We will remove the logMaxAge, responseTimeout and serverPort parameters.
> The peerReplicas will contain an LdapURL with the list of server we want to
> replicate from. Those replicas will look like :
>
> ldap://[<principalDN>:<password>]@<server>[:<port>]/<baseDN>
>
> We will end with a configuration like :
>
>   <replicationInterceptor>
>     <configuration>
>       <replicationConfiguration sync="RefreshOnly"   (or
> "RefreshAndPersist")
>                                 replicaId="001"
>                                 replicationInterval="00:05:00">  (every 5
> minutes)
>         <s:property name="peerReplicas">
>           <s:set>
>             <s:value>ldap://uid=admin,ou=system:secret@ldap2.apache.org:10389/ou=people,dc=apache,dc=org</s:value>
>             <s:value>ldap://uid=admin,ou=system:secret@ldap3.apache.org:10389/ou=projects,dc=apache,dc=org</s:value>
>           </s:set>
>         </s:property>
>       </replicationConfiguration>
>     </configuration>
>   </replicationInterceptor>
>
> (the replicaId is now a 3 digits value, as the OpenLDAP looks like
> rid=000,sid=000,csn=20090311230920.705931Z#000000#001#000000).
>
> We may want to be more specific with the peerReplicas, like for instance
> define a different replication Interval for each search base. That could be
> done using such a configuration :
>
>   <replicationInterceptor>
>     <configuration>
>       <replicationConfiguration replicaId="001">
>         <s:property name="peerReplicas">
>           <s:set>
>             <replica>
>               <type>refreshAndPersist</type>
>               <principalDn>uid=admin,ou=system</principalDn>
>               <password>secret</password>
>               <server>ldap1.apache.org</server>
>               <port>10389</port>
>               <baseDN>ou=people,dc=apache,dc=org</baseDN>
>             </replica>
>             <replica>
>               <type>refreshOnly</type>
>               <principalDn>uid=admin,ou=system</principalDn>
>               <password>secret</password>
>               <server>ldap1.apache.org</server>
>               <port>10389</port>
>               <baseDN>cn=config,ou=system</baseDN>
>               <interval>01:00:00</interval>
>             </replica>
>           </s:set>
>         </s:property>
>       </replicationConfiguration>
>     </configuration>
>   </replicationInterceptor>
>
>
> This is a very preliminary proposal. Feel free to comment it.


This looks good.

However, on the last snippet of XML, I can't see the interval on the first
replica. Does this mean there's a default value?

All this "complex" (more verbose than complex actually) configuration makes
me think a really cool UI could be drawn for this in Studio. :)
We could even use GEF (Graphical Editing Framework) to build a graphical UI
for handling the replication.

Regards,
P-A