You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@directory.apache.org by Alex Karasulu <ak...@apache.org> on 2009/03/28 02:57:23 UTC

[ApacheDS] A solution for the subentry association performance problem

Hi all,

<note>
Not a dig at anyone for supporting DN presence in entries.
</note>

Similar Problem We Are Familiar With
--------------------------------------------------------------

During this ApacheCon we had discussions around the problem of embedding the
DN in an entry.  Presently the DN is maintained in an entry and in the DN
index.  This makes it very expensive to perform modifyDn operations.  The
classic scenario of this is when a parent entry containing millions of
descendants has it's DN modified.  The result is devastating:

(1) millions of entries are looked up from the master table
     (a) millions of disk blocks are accessed
     (b) millions of entries are deserialized
(2) the entry cache turns over loosing it's cache hit memory
(3) the operation can take minutes if not hours


The Problem With Subentry Changes
------------------------------------------------------------

Almost the same exact problem is causing the subentry performance issue we
have been observing.  When a subentry is created, all entries selected by
the subentry's subtree specification are altered to point to that subentry
for it's administrative purpose.  For example, if the subentry is for access
control then the accessControlSubentries multi-valued attribute has the DN
of the subentry added to any entry selected by the subentry's
subtreeSpecification.  This attribute is persisted in the entry within the
master table.  When the subentry is removed, renamed, or it's subtree
specification is altered, all the old entries selected previously are
recalled from the master table to remove the old reference.  Then the new
reference is added to all entries selected by the new filter.

This process is very expensive as is the modifyDN issue because of the
massive number of master table read-write operations.  As with the modifyDN
problem a specialized index could minimize the impact of these kinds of
alterations on large sets of entries.


The Solution
--------------------

First a special [subentry ID<->entry ID] index will be maintained.  This
maps the ID of a subentry to the ID of the entry the subentry selects via
it's subtree specification.  Note we're not differentiating between the
different kinds of subentries.  When an entry is looked up specifically
asking for a subentry association attribute like accessControlSubentries
then we can inject this attribute into the entry before returning it.  To
compute the contents of the accessControlSubentries attribute all we have to
do is walk this index in the opposite direction for the entry ID.  For each
subentry all we have to do is check the objectClass index to see if it is an
accessControlSubentry.  If it is then the DN of that entry is added to the
accessControlSubentries attribute.  Since subentry reference attributes are
all operational and normally are not returned unless explicitly asked for,
we seldom need to add these attributes nor do we need to access this index.

Presently the authorization/subtree subsystem will inject these attributes
into entries before pumping them into the partition.  This means that we can
maintain this index by finding these subentry association attributes and
using their information before deleting them.

Issues
----------

I don't like this solution since it puts more responsibility on the
partition implementation to consistently adhere to these semantics.  It also
presumes every partition will have the ability to create this index. This
makes me uncomfortable.  However this would make these server faster in it's
ability to respond to subentry changes.  All the changes would occur on this
index instead of on the master table.

Thoughts?

Alex

Re: [ApacheDS] A solution for the subentry association performance problem

Posted by Alex Karasulu <ak...@gmail.com>.
On Sat, Mar 28, 2009 at 7:46 AM, Ersin ER <er...@gmail.com> wrote:

> Hi,
>
> The problem you mentioned and many more (collective attrs, dynamic groups)
> of that type are related to computed attributes. The solution is to develop
> a Virtual Attribute Computation layer on top of partitions.


Exactly! I was thinking of some mechanisms to enable this.  Namely the idea
of some kind of virtual subsystem came  to mind which would be passed into
partitions for their utilization in search and lookup operations.  First
virtual attributes must be injected into entries when they are looked up.
Second some kind of virtual Index needs to be presented/exposed to allow a
search engine implementation to be able to correctly apply a filter on these
non-physical attributes.


> Of course it's not that easy to implement as it's said but that's something
> we strongly need for many other features we want to implement. I had
> reported this issue here:
>
> http://issues.apache.org/jira/browse/DIRSERVER-1067
>
> Although I do not have a detailed solution proposal what I want to say is
> that it's a more general problem which blocks us implementing other nice
> features.


In this specific case where a subentry index is used I was thinking it's an
implementation detail that I call it an index.  You just need to expose
mappings and allow partitions to access them.  A btree works well in this
case to store precomputed values of subtree specification evaluation.  In
this specific example you are just caching computed attributes, but
essentially these are just as virtual and computed as other kinds of virtual
attributes we've envisioned.

Alex


>
>
> On Sat, Mar 28, 2009 at 03:57, Alex Karasulu <ak...@apache.org> wrote:
>
>> Hi all,
>>
>> <note>
>> Not a dig at anyone for supporting DN presence in entries.
>> </note>
>>
>> Similar Problem We Are Familiar With
>> --------------------------------------------------------------
>>
>> During this ApacheCon we had discussions around the problem of embedding
>> the DN in an entry.  Presently the DN is maintained in an entry and in the
>> DN index.  This makes it very expensive to perform modifyDn operations.  The
>> classic scenario of this is when a parent entry containing millions of
>> descendants has it's DN modified.  The result is devastating:
>>
>> (1) millions of entries are looked up from the master table
>>      (a) millions of disk blocks are accessed
>>      (b) millions of entries are deserialized
>> (2) the entry cache turns over loosing it's cache hit memory
>> (3) the operation can take minutes if not hours
>>
>>
>> The Problem With Subentry Changes
>> ------------------------------------------------------------
>>
>> Almost the same exact problem is causing the subentry performance issue we
>> have been observing.  When a subentry is created, all entries selected by
>> the subentry's subtree specification are altered to point to that subentry
>> for it's administrative purpose.  For example, if the subentry is for access
>> control then the accessControlSubentries multi-valued attribute has the DN
>> of the subentry added to any entry selected by the subentry's
>> subtreeSpecification.  This attribute is persisted in the entry within the
>> master table.  When the subentry is removed, renamed, or it's subtree
>> specification is altered, all the old entries selected previously are
>> recalled from the master table to remove the old reference.  Then the new
>> reference is added to all entries selected by the new filter.
>>
>> This process is very expensive as is the modifyDN issue because of the
>> massive number of master table read-write operations.  As with the modifyDN
>> problem a specialized index could minimize the impact of these kinds of
>> alterations on large sets of entries.
>>
>>
>> The Solution
>> --------------------
>>
>> First a special [subentry ID<->entry ID] index will be maintained.  This
>> maps the ID of a subentry to the ID of the entry the subentry selects via
>> it's subtree specification.  Note we're not differentiating between the
>> different kinds of subentries.  When an entry is looked up specifically
>> asking for a subentry association attribute like accessControlSubentries
>> then we can inject this attribute into the entry before returning it.  To
>> compute the contents of the accessControlSubentries attribute all we have to
>> do is walk this index in the opposite direction for the entry ID.  For each
>> subentry all we have to do is check the objectClass index to see if it is an
>> accessControlSubentry.  If it is then the DN of that entry is added to the
>> accessControlSubentries attribute.  Since subentry reference attributes are
>> all operational and normally are not returned unless explicitly asked for,
>> we seldom need to add these attributes nor do we need to access this index.
>>
>> Presently the authorization/subtree subsystem will inject these attributes
>> into entries before pumping them into the partition.  This means that we can
>> maintain this index by finding these subentry association attributes and
>> using their information before deleting them.
>>
>> Issues
>> ----------
>>
>> I don't like this solution since it puts more responsibility on the
>> partition implementation to consistently adhere to these semantics.  It also
>> presumes every partition will have the ability to create this index. This
>> makes me uncomfortable.  However this would make these server faster in it's
>> ability to respond to subentry changes.  All the changes would occur on this
>> index instead of on the master table.
>>
>> Thoughts?
>>
>> Alex
>>
>>
>>
>>
>
>
> --
> Ersin ER
> http://www.ersiner.net
>

Re: [ApacheDS] A solution for the subentry association performance problem

Posted by Ersin ER <er...@gmail.com>.
The problem with this "filter" approach is that it needs to be built into
the core above partitions, not  at the level of interceptors. So that we can
see it as a regular attribute and use in any operation. (But this makes it
really hard to implement.) A related problem we have due to
as-interceptors-implementation of virtual attributes is here:

http://issues.apache.org/jira/browse/DIRSERVER-824

We need to have virtual attributes just above partitions (or below the
searcn engine).

On Sat, Mar 28, 2009 at 11:48, Emmanuel Lecharny <el...@apache.org>wrote:

> Ersin ER wrote:
>
>> Hi,
>>
>> The problem you mentioned and many more (collective attrs, dynamic groups)
>> of that type are related to computed attributes. The solution is to
>> develop
>> a Virtual Attribute Computation layer on top of partitions.
>>
> I totally agree. Subentry should be seen as filters, instead of implemented
> the way it is. I like to see them as 'layers' applied on top of the Dit.
> They have the same structure than the Dit, but they restrict the number of
> element you can "see" (in this case, "restrict" means applies a set of
> ACIs). SO when processing an operation, you add the subentry filters on the
> result, or even better, you use the subentry as a filter to limit the number
> of returned elements. Then you don't need to modify all the entries the way
> we do. I don't know if I explain corectly what I have in mind, but anyway...
>
> --
> --
> cordialement, regards,
> Emmanuel Lécharny
> www.iktek.com
> directory.apache.org
>
>
>


-- 
Ersin ER
http://www.ersiner.net

Re: [ApacheDS] A solution for the subentry association performance problem

Posted by Emmanuel Lecharny <el...@apache.org>.
Ersin ER wrote:
> Hi,
>
> The problem you mentioned and many more (collective attrs, dynamic groups)
> of that type are related to computed attributes. The solution is to develop
> a Virtual Attribute Computation layer on top of partitions. 
I totally agree. Subentry should be seen as filters, instead of 
implemented the way it is. I like to see them as 'layers' applied on top 
of the Dit. They have the same structure than the Dit, but they restrict 
the number of element you can "see" (in this case, "restrict" means 
applies a set of ACIs). SO when processing an operation, you add the 
subentry filters on the result, or even better, you use the subentry as 
a filter to limit the number of returned elements. Then you don't need 
to modify all the entries the way we do. I don't know if I explain 
corectly what I have in mind, but anyway...

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Re: [ApacheDS] A solution for the subentry association performance problem

Posted by Ersin ER <er...@gmail.com>.
Hi,

The problem you mentioned and many more (collective attrs, dynamic groups)
of that type are related to computed attributes. The solution is to develop
a Virtual Attribute Computation layer on top of partitions. Of course it's
not that easy to implement as it's said but that's something we strongly
need for many other features we want to implement. I had reported this issue
here:

http://issues.apache.org/jira/browse/DIRSERVER-1067

Although I do not have a detailed solution proposal what I want to say is
that it's a more general problem which blocks us implementing other nice
features.

On Sat, Mar 28, 2009 at 03:57, Alex Karasulu <ak...@apache.org> wrote:

> Hi all,
>
> <note>
> Not a dig at anyone for supporting DN presence in entries.
> </note>
>
> Similar Problem We Are Familiar With
> --------------------------------------------------------------
>
> During this ApacheCon we had discussions around the problem of embedding
> the DN in an entry.  Presently the DN is maintained in an entry and in the
> DN index.  This makes it very expensive to perform modifyDn operations.  The
> classic scenario of this is when a parent entry containing millions of
> descendants has it's DN modified.  The result is devastating:
>
> (1) millions of entries are looked up from the master table
>      (a) millions of disk blocks are accessed
>      (b) millions of entries are deserialized
> (2) the entry cache turns over loosing it's cache hit memory
> (3) the operation can take minutes if not hours
>
>
> The Problem With Subentry Changes
> ------------------------------------------------------------
>
> Almost the same exact problem is causing the subentry performance issue we
> have been observing.  When a subentry is created, all entries selected by
> the subentry's subtree specification are altered to point to that subentry
> for it's administrative purpose.  For example, if the subentry is for access
> control then the accessControlSubentries multi-valued attribute has the DN
> of the subentry added to any entry selected by the subentry's
> subtreeSpecification.  This attribute is persisted in the entry within the
> master table.  When the subentry is removed, renamed, or it's subtree
> specification is altered, all the old entries selected previously are
> recalled from the master table to remove the old reference.  Then the new
> reference is added to all entries selected by the new filter.
>
> This process is very expensive as is the modifyDN issue because of the
> massive number of master table read-write operations.  As with the modifyDN
> problem a specialized index could minimize the impact of these kinds of
> alterations on large sets of entries.
>
>
> The Solution
> --------------------
>
> First a special [subentry ID<->entry ID] index will be maintained.  This
> maps the ID of a subentry to the ID of the entry the subentry selects via
> it's subtree specification.  Note we're not differentiating between the
> different kinds of subentries.  When an entry is looked up specifically
> asking for a subentry association attribute like accessControlSubentries
> then we can inject this attribute into the entry before returning it.  To
> compute the contents of the accessControlSubentries attribute all we have to
> do is walk this index in the opposite direction for the entry ID.  For each
> subentry all we have to do is check the objectClass index to see if it is an
> accessControlSubentry.  If it is then the DN of that entry is added to the
> accessControlSubentries attribute.  Since subentry reference attributes are
> all operational and normally are not returned unless explicitly asked for,
> we seldom need to add these attributes nor do we need to access this index.
>
> Presently the authorization/subtree subsystem will inject these attributes
> into entries before pumping them into the partition.  This means that we can
> maintain this index by finding these subentry association attributes and
> using their information before deleting them.
>
> Issues
> ----------
>
> I don't like this solution since it puts more responsibility on the
> partition implementation to consistently adhere to these semantics.  It also
> presumes every partition will have the ability to create this index. This
> makes me uncomfortable.  However this would make these server faster in it's
> ability to respond to subentry changes.  All the changes would occur on this
> index instead of on the master table.
>
> Thoughts?
>
> Alex
>
>
>
>


-- 
Ersin ER
http://www.ersiner.net