Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@gmail.com> on 2007/09/21 16:43:39 UTC

Convo about collective attributes

Hi guys,

we had an interesting convo yesterday (a debug session) with Alex.
Here is what we talked about.

While debugging the collective attribute (CA) interceptor, I faced a
strange error. When processing a search, CAs are handled the
following way:
1) do a lookup for the entry
2) filter the result and add the CA if the returned entry contains a
collectiveAttributeSubentries (CAS) attribute
  3-1) return the entry augmented with the CA if the CAS is found
  3-2) or just return the entry as is.

This is OK, but step (2) did another lookup in the backend, so
there were two lookups for one single entry (overkill...). So I
removed this second lookup, as we had already fetched the entry
previously.

Everything went right in standard searches, but when you ask for
specific attributes to be returned, like a CA, the entry comes
back without anything... Puzzling!

After some debugging, I found that the search does a lookup, and in
the partition, the lookup semantics is: get all the attributes for
the specified entry, unless some attributes are requested, in which
case return only the requested attributes that are found. Of course,
as CAs are _not_ stored within the entry, if you ask for this CA only,
you won't get it. And as the lookup returns only the requested
attributes, it does not return the CAS anymore. So the collective
attribute interceptor won't find this CAS, and won't add the CA
value... This is a dead end, and now I understand why there is a
second lookup: to work around this problem. The second lookup is
done without any requested attribute, so you get the whole entry, and
then you can find the CAS, and add the CA.
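
To make the dead end concrete, here is a minimal sketch of the two
behaviours described above. It is only an illustration under stated
assumptions: LookupSketch, lookup() and addCollective() are
hypothetical names, entries are reduced to single-valued maps, and
the "c-ou" value and the subentry DN are made-up sample data. It is
not the ApacheDS code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the behaviour described above; none of these
// names come from the ApacheDS code base.
class LookupSketch {
    static final String CAS = "collectiveAttributeSubentries";

    // Partition-style lookup: the whole entry when nothing is requested,
    // otherwise only the requested attributes that are actually stored.
    // (Attribute ids are compared as exact strings to keep the sketch short.)
    static Map<String, String> lookup(Map<String, String> stored, String... requested) {
        if (requested.length == 0) {
            return new LinkedHashMap<>(stored);
        }
        Set<String> wanted = new HashSet<>(Arrays.asList(requested));
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> attr : stored.entrySet()) {
            if (wanted.contains(attr.getKey())) {
                result.put(attr.getKey(), attr.getValue());
            }
        }
        return result;
    }

    // Interceptor-style filter: the CA is added only if the entry it was
    // handed still carries the CAS attribute.
    static Map<String, String> addCollective(Map<String, String> entry, String caId, String caValue) {
        Map<String, String> augmented = new LinkedHashMap<>(entry);
        if (entry.containsKey(CAS)) {
            augmented.put(caId, caValue);
        }
        return augmented;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("cn", "test");
        stored.put(CAS, "cn=caSubentry,ou=system");

        // No requested attribute: the CAS is present, so the CA is added.
        System.out.println(addCollective(lookup(stored), "c-ou", "Sales"));
        // Only the CA requested: it is not stored, the CAS is filtered out,
        // and the interceptor has nothing to work with.
        System.out.println(addCollective(lookup(stored, "c-ou"), "c-ou", "Sales"));
    }
}
```

The second call is the failing case: the filter step never sees the
CAS, which is why a second lookup without requested attributes was
there in the first place.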

How to fix this bad workaround? There is one solution:
- modify the lookup semantics so that it no longer deals with
requested attributes

Sadly, doing that has such extensive implications that it's not
possible in the current version (1.5). So this is not an
easy solution.

There is another way to fix the problem, which is a hack, but this
hack avoids the double lookup when the user doesn't specify an
attribute (this hack is not really interesting...)

Up to this point, we were discussing how to fix this problem,
but then we switched to the reason why we have this semantics for the
lookup method. Alex pointed out that the fixes made in the filter
handling this week also modified the Partition semantics. Here are the
important points:

1) Lookup should have another semantics: it should simply return
entries, with all their attributes.
2) The LeafEvaluator which has been modified in the Partition is really
specific to the BTree implementation. If we have to change the
backend, then the server won't work anymore, unless a new
LeafEvaluator class is written (and that's not really the easiest
part!). The idea is to get this evaluation done _before_ the backend.
For that, we need to modify the lookup method and split it into three
methods:
 - lookupWithDN( DN ), which will return a single entry, the one
associated with the DN (or null if there is no entry)
 - lookupWithAttribute( attrId ), which will return an enumeration of
entries, using the indexed attribute. This lookup should only be used
if the attribute is indexed.
 - lookupAll(), which will return all the entries. It's a full scan.
Then you can build the evaluator on top of the Partition, and decouple
its logic from the backend implementation.
Another advantage is that we will be able to build an entry cache
on top of the backend, as we will simply have to implement it as a
Partition. The 3 lookup methods will be mapped to return an object if
it is cached, and otherwise ask the real partition for the object.
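
The split into three methods, and the cache implemented as a
Partition, could look roughly like the sketch below. Only the three
method names come from the discussion; the signatures, the
SketchEntry type, the use of Iterator for the "enumeration", and the
MemoryPartition and CachingPartition classes are assumptions made
for illustration. The cache has no invalidation, and DNs are
compared as plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the ApacheDS Partition interface.
class PartitionSketch {
    // A DN plus single-valued attributes, to keep the example small.
    static final class SketchEntry {
        final String dn;
        final Map<String, String> attributes;

        SketchEntry(String dn, Map<String, String> attributes) {
            this.dn = dn;
            this.attributes = attributes;
        }
    }

    // The three lookups named above; the signatures are guesses.
    interface Partition {
        SketchEntry lookupWithDN(String dn);                 // null if no such entry
        Iterator<SketchEntry> lookupWithAttribute(String attrId);
        Iterator<SketchEntry> lookupAll();                   // full scan
    }

    // Stand-in for a real backend; counts how often it is asked for a DN.
    static class MemoryPartition implements Partition {
        private final Map<String, SketchEntry> byDn = new LinkedHashMap<>();
        int dnLookups = 0;

        void add(SketchEntry entry) { byDn.put(entry.dn, entry); }

        public SketchEntry lookupWithDN(String dn) {
            dnLookups++;
            return byDn.get(dn);
        }

        // A real backend would walk an index here; the sketch scans instead.
        public Iterator<SketchEntry> lookupWithAttribute(String attrId) {
            List<SketchEntry> hits = new ArrayList<>();
            for (SketchEntry entry : byDn.values()) {
                if (entry.attributes.containsKey(attrId)) hits.add(entry);
            }
            return hits.iterator();
        }

        public Iterator<SketchEntry> lookupAll() {
            return new ArrayList<>(byDn.values()).iterator();
        }
    }

    // The cache is itself a Partition wrapping the real one.
    static class CachingPartition implements Partition {
        private final Partition delegate;
        private final Map<String, SketchEntry> cache = new HashMap<>();

        CachingPartition(Partition delegate) { this.delegate = delegate; }

        public SketchEntry lookupWithDN(String dn) {
            SketchEntry cached = cache.get(dn);
            if (cached != null) return cached;
            SketchEntry found = delegate.lookupWithDN(dn);
            if (found != null) cache.put(dn, found);
            return found;
        }

        // Enumerations are passed straight through to the real partition.
        public Iterator<SketchEntry> lookupWithAttribute(String attrId) {
            return delegate.lookupWithAttribute(attrId);
        }

        public Iterator<SketchEntry> lookupAll() { return delegate.lookupAll(); }
    }

    public static void main(String[] args) {
        MemoryPartition backend = new MemoryPartition();
        Map<String, String> attrs = new HashMap<>();
        attrs.put("cn", "test");
        backend.add(new SketchEntry("cn=test,ou=system", attrs));

        Partition cached = new CachingPartition(backend);
        cached.lookupWithDN("cn=test,ou=system");
        cached.lookupWithDN("cn=test,ou=system");   // served from the cache
        System.out.println("backend DN lookups: " + backend.dnLookups);
    }
}
```

Because callers only see the Partition interface, an evaluator or an
interceptor written against it would not need to know whether it is
talking to the cache or to the backend.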

This is what we discussed yesterday. If anyone wants to comment on
this mail, you are welcome. Alex, feel free to comment if you feel
that I have missed something.



-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: Convo about collective attributes

Posted by Alex Karasulu <ak...@apache.org>.
Let me add to this with my thoughts ...

Partitions are supposed to act as dumb stores that you just add entries
to and remove entries from, since we don't know how they are implemented.
We try to expose as little as possible on this interface to allow them to
utilize backing store features.  The idea is to keep the partition as
simple as possible so the higher levels of the server can centralize
logic for handling various LDAP-specific features, semantics and
operations, so that each Partition need not reimplement these
capabilities (for example, dealing with OperationalAttributes).  These
interfaces also should not expose implementation-specific details.

Now this is a balancing act, since Partitions will have some capabilities
that are inherent to their backing store.  Some features are best left to
a Partition, and others should, and have to, be handled higher up.

One sticky point is how to delegate search to a Partition.  Presently
Partitions are supposed to conduct search with some cues from the server.
Some partitions, like btree-based partitions (JDBM or JE), will need to
implement a search engine, whereas SQL-backed partitions will not and can
leverage the underlying SQL query engine for various reasons.  Then there
are odd things like virtual partitions or proxying partitions.

We are finding that certain optimizations can be done deep inside a
partition to improve performance.  However, this requires putting more
LDAP-specific logic inside them, which means the Partition must now be
more aware and less dumb.  This is the class of problems behind the
issues Emmanuel describes.
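
As a rough illustration of the "dumb store" end of this trade-off,
the sketch below keeps filter evaluation above the store, so that a
new backend only has to supply a scan. Every name in it (DumbStore,
scan(), search(), the sample entries) is hypothetical, and it
deliberately ignores the case raised above where a backend such as
an SQL store could evaluate the filter itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical names throughout; this is not the ApacheDS Partition API.
class DumbStoreSketch {
    // All the store exposes for search is a full scan.
    interface DumbStore {
        Iterator<Map<String, String>> scan();
    }

    // Filter evaluation lives above the store, so a new backend only
    // has to provide scan().
    static List<Map<String, String>> search(DumbStore store, Predicate<Map<String, String>> filter) {
        List<Map<String, String>> matches = new ArrayList<>();
        Iterator<Map<String, String>> all = store.scan();
        while (all.hasNext()) {
            Map<String, String> entry = all.next();
            if (filter.test(entry)) matches.add(entry);
        }
        return matches;
    }

    // Builds a tiny sample entry with two single-valued attributes.
    static Map<String, String> entry(String cn, String ou) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("cn", cn);
        attributes.put("ou", ou);
        return attributes;
    }

    public static void main(String[] args) {
        List<Map<String, String>> data =
                Arrays.asList(entry("alice", "people"), entry("admins", "groups"));
        DumbStore store = data::iterator;
        // Equivalent in spirit to the LDAP filter (ou=people).
        System.out.println(search(store, e -> "people".equals(e.get("ou"))));
    }
}
```

The cost of this simplicity is visible in search(): every query is a
full scan, which is exactly the kind of case where pushing some logic
down into the partition becomes tempting.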

Alex
