Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@gmail.com> on 2011/06/29 14:10:37 UTC

Partition and Backend confusion

Hi guys,

I'd like to raise some issues I have with the current Partition
interface and the classes that implement it. Let me explain.

<side note>
First, please excuse me if some of the terms I'm using may not be 
totally semantically correct. I may use the wrong ones in some cases, 
but what I have in mind is pretty clear, it's just that I may use the 
incorrect name for the concept I'm discussing. So bear with me, and feel 
free to correct me.
</side note>

<side note>
I'm not trying to blame anyone here; I'm just trying to get a grasp of
a part of the code I'm discovering. The questions I raise are mostly
meant to give me a better view of why and how the existing code was
layered, and my perception might be iconoclastic. I don't want to hurt
anyone's feelings; that is certainly not my intention.
Remember, I'm quite a virgin in the partition/store code :)
</side note>

We currently have a common Partition interface, which is the base on 
which all the backend implementations are built. It's also used as an 
interface for the Nexus.

In fact, we can split the Partition implementations into two categories:
1) those which manipulate an operation context (AddContext,
DeleteOperationContext, etc.)
2) those which interact with the underlying store

The current hierarchy is (<XXX> : interface, [YYY] : abstract class) :
<Partition>
   [AbstractPartition]
     [BTreePartition<ID>]
       [AbstractLdifPartition]
         LdifPartition
         ReadOnlyConfigurationPartition
         SingleFileLdifPartition
       [AbstractXdbmPartition<ID>]
         AvlPartition
         JdbmPartition
     DefaultPartitionNexus (also implements <PartitionNexus>)
     NullPartition
     SchemaPartition

A few remarks:
- the BTreePartition<ID> should be renamed AbstractBTreePartition
- we should have a BTreePartition interface
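
A minimal sketch of what these two remarks would look like in code. It
is illustrative only: the method names (getSuffix, getEntryId), the
register helper and the toy InMemoryPartition are assumptions, not the
actual ApacheDS signatures, and the intermediate abstract classes of the
real hierarchy are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified types: not the actual ApacheDS signatures.
interface Partition {
    String getSuffix();
}

// Proposed new interface: isolates the behaviour specific to BTree-backed partitions.
interface BTreePartition<ID> extends Partition {
    // Illustrative BTree-specific method (an assumption): map a DN to its entry ID.
    ID getEntryId(String dn);
}

abstract class AbstractPartition implements Partition {
    private final String suffix;

    protected AbstractPartition(String suffix) {
        this.suffix = suffix;
    }

    @Override
    public String getSuffix() {
        return suffix;
    }
}

// Today's abstract BTreePartition<ID>, renamed to carry the Abstract prefix.
abstract class AbstractBTreePartition<ID> extends AbstractPartition implements BTreePartition<ID> {
    protected AbstractBTreePartition(String suffix) {
        super(suffix);
    }
}

// Toy concrete subclass standing in for something like JdbmPartition or AvlPartition.
final class InMemoryPartition extends AbstractBTreePartition<Long> {
    private final Map<String, Long> idByDn = new HashMap<>();
    private long nextId = 1;

    InMemoryPartition(String suffix) {
        super(suffix);
    }

    // Hypothetical helper: assign the next partition-local ID to a DN.
    Long register(String dn) {
        Long id = nextId++;
        idByDn.put(dn, id);
        return id;
    }

    @Override
    public Long getEntryId(String dn) {
        return idByDn.get(dn);
    }
}
```

With this split, code that needs the BTree-specific methods can depend
on the BTreePartition interface instead of on an abstract class.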

I'm also wondering if we should not make a clearer distinction between
what is backed by a store (i.e., BTreePartition and SchemaPartition) and
what is not (i.e., PartitionNexus). Moreover, why should the PartitionNexus
extend the Partition interface? Does it make sense?

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Partition and Backend confusion

Posted by Alex Karasulu <ak...@apache.org>.
FYI: I've read this and will read it again. I'm under heavy load, but I am
not giving up on this thread ... it's a good one :-). Just wanted to give
you a heads up. More to come ...

Thanks for understanding,
Alex

On Fri, Jul 8, 2011 at 9:23 AM, Emmanuel Lécharny <el...@apache.org> wrote:
> On 7/8/11 12:10 AM, Alex Karasulu wrote:
>>
>> On Wed, Jun 29, 2011 at 3:10 PM, Emmanuel Lecharny<el...@gmail.com>
>>  wrote:
>>
>> SNIP ...
>>
>>> We currently have a common Partition interface, which is the base on
>>> which
>>> all the backend implementations are built. It's also used as an interface
>>> for the Nexus.
>>
>> Yes.
>>
>>> In fact, we can split the Partition implementations into two categories:
>>> 1) those which manipulate an operation context (AddContext,
>>> DeleteOperationContext, etc.)
>>> 2) those which interact with the underlying store
>>
>> This does not make any sense to me at all. I can't see these as being
>> two distinct categories. I must not be understanding you, can you
>> elaborate?
>
> Sure. What I'm saying is that we have one layer which takes methods with
> OperationContext parameters and transforms them into what the layer
> underneath expects. To me, those two layers are two different things.
>>>
>>> The current hierarchy is (<XXX>  : interface, [YYY] : abstract class) :
>>> <Partition>
>>>  [AbstractPartition]
>>>    [BTreePartition<ID>]
>>>      [AbstractLdifPartition]
>>>        LdifPartition
>>>        ReadOnlyConfigurationPartition
>>>        SingleFileLdifPartition
>>>      [AbstractXdbmPartition<ID>]
>>>        AvlPartition
>>>        JdbmPartition
>>>    DefaultPartitionNexus (also implements <PartitionNexus>)
>>>    NullPartition
>>>    SchemaPartition
>>>
>>> A few remarks:
>>> - the BTreePartition<ID>  should be renamed AbstractBTreePartition
>>> - we should have a BTreePartition interface
>>
>> Why?
>
> All the abstract classes we have in ADS are prefixed with Abstract. For
> consistency, I do think that we should rename BTreePartition to
> AbstractBTreePartition.
>
> Also, as it exposes methods which are specific to BTrees, an interface would
> be a good way to isolate the BTree behaviors.
>
> Nothing big here, just clarification.
>>>
>>> I'm also wondering if we should not make a clearer distinction between
>>> what is backed by a store (i.e., BTreePartition and SchemaPartition) and
>>> what is not (i.e., PartitionNexus). Moreover, why should the
>>> PartitionNexus extend the Partition interface? Does it make sense?
>>
>> The PartitionNexus is a proxy to partitions so it implements the
>> interface. It's a single point to apply operations and have them routed
>> to the appropriate partition.
>
> Makes sense.
>>
>> There's work to be done in this area for sure. First off I'd like to
>> see partitions that hash entries across other partitions and some that
>> contain entries and still can nest other partitions: acting both as
>> entry stores and routers of operations. For example I've wanted a root
>> partition that could also mount (nest) other partitions while still
>> storing entries so the root DSE can be mastered in it and we can
>> manage other subentries for the server in it instead of at the
>> namingContext level.
>
> With you.
>>
>> Incidentally, we might be able to get rid of the store interface.
>
> Hmmm, can you elaborate ?
>>
>> The key to several things we're going to do down the line around
>> partitions rests on having entry IDs be globally unique rather than
>> unique within just the partition. Once this is done, it opens the door
>> to several solutions ... including solutions to a couple of recent
>> problems:
>>
>>   (1) aliases referring to entry targets across partitions
>>   (2) moddn operations across partitions
>>   (3) virtualization, via views, and other constructs need it
>
> Just wondering how badly we need to get rid of those IDs. They are not
> globally unique, as each partition has its own, but AFAICT, if we move an
> entry from one partition to another (moddn), we don't care too much about
> the ID.
>
> Regarding aliases, I'm not sure (yet) that we have to deal with them at
> this layer. I still have to think about it.
>
> Moddn ops can be handled across partitions even if we keep the ID around.
> One partition does not have to know anything about the other partition's
> IDs. We are just moving full entries (and all the associated indexes) from
> one partition to the other, as if it were a delete on one side and an add
> on the other.
>
> Virtualization is most certainly handled at an upper layer, and probably
> shouldn't have to know anything about the storage.
>>
>> ....
>>
>> There's more. But first we need a globally unique UUID for entries as
>> the PK and we need to get rid of using long partition-specific entry
>> IDs as the PK.
>
> Ok. I'm not sure we need to get rid of IDs right now, but I may be missing
> some elements in the big picture atm. It needs some serious consideration
> anyway. This is not something we should do lightly, and certainly not for
> 2.0. However, if we need to make this move, and we may very well have to,
> then we need a stable base to work on.
>>
>> I would not change around interfaces right now. It's just going to shift
>> things around without a clear direction, and as you said yourself, you're
>> new to this code. Class renames and a few interface changes just to get
>> familiar and comfortable with the code base are not going to help down
>> the line.
>
> I don't think we need to change a hell of a lot of things ATM either. It's
> far too dangerous, and probably overkill. As you said, this is a part of the
> code I don't know well, and I'm just pushing some ideas around to see where
> they take me. I already paid the price once by wasting a week on a reverse
> table removal for nothing, and I would certainly like to avoid such a waste
> of time again.
>>
>> Let's go global on the UUID and look at the big partition picture. We
>> can redesign things to best suit small steps to get to our ultimate
>> destination.
>
> Sure. Right now, I'm pushing ideas. I don't want them to be pushed into the
> server: it's way too far-fetched, and I may miss the target by a wide
> margin. In any case, I don't want to jeopardize 2.0, when what we need to
> make it solid is just a couple of features (namely, replication and DSR).
>
> Atm, I'm just trying to get aliases to work smoothly, but if it requires
> some huge refactoring, then I'll drop it for 2.0. We don't need aliases for
> 2.0, we just need replication.
>
> If we have to heavily refactor the backend to get aliases working fine, then
> I'm fine with a 2.1 or a 3.0. In any case, there's no urgency.
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>

Re: Partition and Backend confusion

Posted by Emmanuel Lécharny <el...@apache.org>.
On 7/8/11 12:10 AM, Alex Karasulu wrote:
> On Wed, Jun 29, 2011 at 3:10 PM, Emmanuel Lecharny<el...@gmail.com>  wrote:
>
> SNIP ...
>
>> We currently have a common Partition interface, which is the base on which
>> all the backend implementations are built. It's also used as an interface
>> for the Nexus.
> Yes.
>
>> In fact, we can split the Partition implementations into two categories:
>> 1) those which manipulate an operation context (AddContext,
>> DeleteOperationContext, etc.)
>> 2) those which interact with the underlying store
> This does not make any sense to me at all. I can't see these as being
> two distinct categories. I must not be understanding you, can you
> elaborate?
Sure. What I'm saying is that we have one layer which takes methods with
OperationContext parameters and transforms them into what the layer
underneath expects. To me, those two layers are two different things.
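
The two layers described above can be sketched roughly as follows. This
is only an illustration under simplifying assumptions: the
AddOperationContext, Store, MapStore and ContextPartition types below
are hypothetical stand-ins with made-up signatures, not the real
ApacheDS classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an operation context: the request as the upper layers see it.
final class AddOperationContext {
    final String dn;
    final Map<String, String> attributes;

    AddOperationContext(String dn, Map<String, String> attributes) {
        this.dn = dn;
        this.attributes = attributes;
    }
}

// Lower layer: only knows how to persist and fetch entries by ID.
interface Store<ID> {
    ID add(String dn, Map<String, String> attributes);

    Map<String, String> lookup(ID id);
}

// Toy store backed by a map, with partition-local long IDs.
final class MapStore implements Store<Long> {
    private final Map<Long, Map<String, String>> entries = new HashMap<>();
    private long nextId = 1;

    @Override
    public Long add(String dn, Map<String, String> attributes) {
        Long id = nextId++;
        Map<String, String> stored = new HashMap<>(attributes);
        stored.put("entryDn", dn); // keep the DN alongside the attributes
        entries.put(id, stored);
        return id;
    }

    @Override
    public Map<String, String> lookup(Long id) {
        return entries.get(id);
    }
}

// Upper layer: accepts an operation context and translates it into store calls.
final class ContextPartition {
    private final Store<Long> store;

    ContextPartition(Store<Long> store) {
        this.store = store;
    }

    Long add(AddOperationContext context) {
        return store.add(context.dn, context.attributes);
    }
}
```

The point of the split is that ContextPartition never touches the
storage details, and the store never sees an operation context.
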
>> The current hierarchy is (<XXX>  : interface, [YYY] : abstract class) :
>> <Partition>
>>   [AbstractPartition]
>>     [BTreePartition<ID>]
>>       [AbstractLdifPartition]
>>         LdifPartition
>>         ReadOnlyConfigurationPartition
>>         SingleFileLdifPartition
>>       [AbstractXdbmPartition<ID>]
>>         AvlPartition
>>         JdbmPartition
>>     DefaultPartitionNexus (also implements <PartitionNexus>)
>>     NullPartition
>>     SchemaPartition
>>
>> A few remarks:
>> - the BTreePartition<ID>  should be renamed AbstractBTreePartition
>> - we should have a BTreePartition interface
> Why?
All the abstract classes we have in ADS are prefixed with Abstract. For
consistency, I do think that we should rename BTreePartition to
AbstractBTreePartition.

Also, as it exposes methods which are specific to BTrees, an interface
would be a good way to isolate the BTree behaviors.

Nothing big here, just clarification.
>> I'm also wondering if we should not make a clearer distinction between
>> what is backed by a store (i.e., BTreePartition and SchemaPartition) and
>> what is not (i.e., PartitionNexus). Moreover, why should the
>> PartitionNexus extend the Partition interface? Does it make sense?
> The PartitionNexus is a proxy to partitions so it implements the
> interface. It's a single point to apply operations and have them routed
> to the appropriate partition.
Makes sense.
> There's work to be done in this area for sure. First off I'd like to
> see partitions that hash entries across other partitions and some that
> contain entries and still can nest other partitions: acting both as
> entry stores and routers of operations. For example I've wanted a root
> partition that could also mount (nest) other partitions while still
> storing entries so the root DSE can be mastered in it and we can
> manage other subentries for the server in it instead of at the
> namingContext level.
With you.
> Incidentally, we might be able to get rid of the store interface.
Hmmm, can you elaborate ?
> The key to several things we're going to do down the line around
> partitions rests on having entry IDs be globally unique rather than
> unique within just the partition. Once this is done, it opens the door
> to several solutions ... including solutions to a couple of recent
> problems:
>
>    (1) aliases referring to entry targets across partitions
>    (2) moddn operations across partitions
>    (3) virtualization, via views, and other constructs need it
Just wondering how badly we need to get rid of those IDs. They are not
globally unique, as each partition has its own, but AFAICT, if we move
an entry from one partition to another (moddn), we don't care too much
about the ID.

Regarding aliases, I'm not sure (yet) that we have to deal with them at
this layer. I still have to think about it.

Moddn ops can be handled across partitions even if we keep the ID
around. One partition does not have to know anything about the other
partition's IDs. We are just moving full entries (and all the associated
indexes) from one partition to the other, as if it were a delete on one
side and an add on the other.
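
As a rough sketch of that idea: the LocalPartition and
CrossPartitionMove classes below are hypothetical and heavily
simplified (no indexes, no transactions, string DNs). The sketch adds
to the target before deleting from the source, so that a failed add
does not lose the entry; that ordering is a choice made for this
example, not something the current code is known to do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical partition with its own, partition-local long IDs.
final class LocalPartition {
    private final Map<String, Long> idByDn = new HashMap<>();
    private final Map<Long, Map<String, String>> entryById = new HashMap<>();
    private long nextId = 1;

    long add(String dn, Map<String, String> attributes) {
        if (idByDn.containsKey(dn)) {
            throw new IllegalStateException("Entry already exists: " + dn);
        }
        long id = nextId++;
        idByDn.put(dn, id);
        entryById.put(id, new HashMap<>(attributes));
        return id;
    }

    Map<String, String> get(String dn) {
        Long id = idByDn.get(dn);
        if (id == null) {
            throw new IllegalArgumentException("No such entry: " + dn);
        }
        return entryById.get(id);
    }

    void delete(String dn) {
        Long id = idByDn.remove(dn);
        if (id == null) {
            throw new IllegalArgumentException("No such entry: " + dn);
        }
        entryById.remove(id);
    }

    Long idOf(String dn) {
        return idByDn.get(dn);
    }
}

final class CrossPartitionMove {
    // Moves an entry as an add on the target followed by a delete on the source.
    // The source ID is never handed to the target: the target assigns its own.
    static long move(LocalPartition source, String oldDn, LocalPartition target, String newDn) {
        Map<String, String> attributes = source.get(oldDn);
        long newId = target.add(newDn, attributes);
        source.delete(oldDn);
        return newId;
    }
}
```

Neither partition needs to know how the other one numbers its entries;
the moved entry simply gets a fresh ID on arrival.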

Virtualization is most certainly handled at an upper layer, and probably
shouldn't have to know anything about the storage.
> ....
>
> There's more. But first we need a globally unique UUID for entries as
> the PK and we need to get rid of using long partition-specific entry
> IDs as the PK.
Ok. I'm not sure we need to get rid of IDs right now, but I may be
missing some elements in the big picture atm. It needs some serious
consideration anyway. This is not something we should do lightly, and
certainly not for 2.0. However, if we need to make this move, and we may
very well have to, then we need a stable base to work on.
> I would not change around interfaces right now. It's just going to shift
> things around without a clear direction, and as you said yourself, you're
> new to this code. Class renames and a few interface changes just to get
> familiar and comfortable with the code base are not going to help down
> the line.
I don't think we need to change a hell of a lot of things ATM either.
It's far too dangerous, and probably overkill. As you said, this is a
part of the code I don't know well, and I'm just pushing some ideas
around to see where they take me. I already paid the price once by
wasting a week on a reverse table removal for nothing, and I would
certainly like to avoid such a waste of time again.
> Let's go global on the UUID and look at the big partition picture. We
> can redesign things to best suit small steps to get to our ultimate
> destination.
Sure. Right now, I'm pushing ideas. I don't want them to be pushed into
the server: it's way too far-fetched, and I may miss the target by a
wide margin. In any case, I don't want to jeopardize 2.0, when what we
need to make it solid is just a couple of features (namely, replication
and DSR).

Atm, I'm just trying to get aliases to work smoothly, but if it requires
some huge refactoring, then I'll drop it for 2.0. We don't need
aliases for 2.0, we just need replication.

If we have to heavily refactor the backend to get aliases working fine,
then I'm fine with a 2.1 or a 3.0. In any case, there's no urgency.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: Partition and Backend confusion

Posted by Alex Karasulu <ak...@apache.org>.
On Wed, Jun 29, 2011 at 3:10 PM, Emmanuel Lecharny <el...@gmail.com> wrote:

SNIP ...

> We currently have a common Partition interface, which is the base on which
> all the backend implementations are built. It's also used as an interface
> for the Nexus.

Yes.

> In fact, we can split the Partition implementations into two categories:
> 1) those which manipulate an operation context (AddContext,
> DeleteOperationContext, etc.)
> 2) those which interact with the underlying store

This does not make any sense to me at all. I can't see these as being
two distinct categories. I must not be understanding you, can you
elaborate?

> The current hierarchy is (<XXX> : interface, [YYY] : abstract class) :
> <Partition>
>  [AbstractPartition]
>    [BTreePartition<ID>]
>      [AbstractLdifPartition]
>        LdifPartition
>        ReadOnlyConfigurationPartition
>        SingleFileLdifPartition
>      [AbstractXdbmPartition<ID>]
>        AvlPartition
>        JdbmPartition
>    DefaultPartitionNexus (also implements <PartitionNexus>)
>    NullPartition
>    SchemaPartition
>
> A few remarks:
> - the BTreePartition<ID> should be renamed AbstractBTreePartition
> - we should have a BTreePartition interface

Why?

> I'm also wondering if we should not make a clearer distinction between
> what is backed by a store (i.e., BTreePartition and SchemaPartition) and
> what is not (i.e., PartitionNexus). Moreover, why should the
> PartitionNexus extend the Partition interface? Does it make sense?

The PartitionNexus is a proxy to partitions so it implements the
interface. It's a single point to apply operations and have them routed
to the appropriate partition.
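
A minimal sketch of that proxy idea, for illustration only. The
Partition interface here is cut down to two made-up methods, DNs are
treated as plain, already normalized strings, and SimplePartition and
SimpleNexus are hypothetical names; the real DefaultPartitionNexus is
considerably more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified Partition: DNs are plain strings, already normalized.
interface Partition {
    String getSuffix();

    String lookup(String dn);
}

final class SimplePartition implements Partition {
    private final String suffix;
    private final Map<String, String> entries = new HashMap<>();

    SimplePartition(String suffix) {
        this.suffix = suffix;
    }

    void put(String dn, String entry) {
        entries.put(dn, entry);
    }

    @Override
    public String getSuffix() {
        return suffix;
    }

    @Override
    public String lookup(String dn) {
        return entries.get(dn);
    }
}

// The nexus implements Partition too, so callers have a single entry point.
final class SimpleNexus implements Partition {
    private final List<Partition> partitions = new ArrayList<>();

    void addPartition(Partition partition) {
        partitions.add(partition);
    }

    @Override
    public String getSuffix() {
        return ""; // the nexus sits at the root
    }

    @Override
    public String lookup(String dn) {
        return route(dn).lookup(dn);
    }

    // Pick the partition whose suffix is the longest match for the DN.
    Partition route(String dn) {
        Partition best = null;
        for (Partition candidate : partitions) {
            String suffix = candidate.getSuffix();
            boolean matches = dn.equals(suffix) || dn.endsWith("," + suffix);
            if (matches && (best == null || suffix.length() > best.getSuffix().length())) {
                best = candidate;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No partition for " + dn);
        }
        return best;
    }
}
```

Because the nexus exposes the same interface as the partitions it
fronts, callers do not need to know which partition holds an entry.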

There's work to be done in this area for sure. First off I'd like to
see partitions that hash entries across other partitions and some that
contain entries and still can nest other partitions: acting both as
entry stores and routers of operations. For example I've wanted a root
partition that could also mount (nest) other partitions while still
storing entries so the root DSE can be mastered in it and we can
manage other subentries for the server in it instead of at the
namingContext level.

Incidentally, we might be able to get rid of the store interface.

The key to several things we're going to do down the line around
partitions rests on having entry IDs be globally unique rather than
unique within just the partition. Once this is done, it opens the door
to several solutions ... including solutions to a couple of recent
problems:

  (1) aliases referring to entry targets across partitions
  (2) moddn operations across partitions
  (3) virtualization, via views, and other constructs need it
....

There's more. But first we need a globally unique UUID for entries as
the PK and we need to get rid of using long partition-specific entry
IDs as the PK.
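
To make the idea concrete, here is one possible reading of it as a
sketch, not a design. UuidPartition, GlobalResolver and their methods
are hypothetical names invented for this example; only java.util.UUID
is a real API. The sketch shows a reference held by ID (as an alias
might hold its target) that stays valid after the target entry moves
to another partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical partition whose primary key is a UUID rather than a partition-local long.
final class UuidPartition {
    private final Map<UUID, String> dnById = new HashMap<>();

    UUID add(String dn) {
        UUID id = UUID.randomUUID();
        dnById.put(id, dn);
        return id;
    }

    // Take over an entry coming from another partition, keeping its ID.
    void adopt(UUID id, String dn) {
        dnById.put(id, dn);
    }

    void remove(UUID id) {
        dnById.remove(id);
    }

    String dnOf(UUID id) {
        return dnById.get(id);
    }
}

// Hypothetical helper that resolves an ID without knowing which partition owns it.
final class GlobalResolver {
    private final List<UuidPartition> partitions = new ArrayList<>();

    void register(UuidPartition partition) {
        partitions.add(partition);
    }

    // Because IDs are global, the same ID can be looked up in any partition.
    String resolve(UUID id) {
        for (UuidPartition partition : partitions) {
            String dn = partition.dnOf(id);
            if (dn != null) {
                return dn;
            }
        }
        return null;
    }
}
```

With partition-local long IDs, the same number can mean different
entries in different partitions, so a reference by ID alone would be
ambiguous once it crosses a partition boundary.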

I would not change around interfaces right now. It's just going to shift
things around without a clear direction, and as you said yourself, you're
new to this code. Class renames and a few interface changes just to get
familiar and comfortable with the code base are not going to help down
the line.

Let's go global on the UUID and look at the big partition picture. We
can redesign things to best suit small steps to get to our ultimate
destination.

Regards,
Alex