You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@directory.apache.org by Alex Karasulu <ak...@apache.org> on 2007/08/19 22:31:35 UTC

[ApacheDS] Using indices while evaluating filters with negation operators (was: Re: svn commit: r567310 - ... )

Hi Martin,

On 8/18/07, Martin Alderson <eq...@planetquake.com> wrote:
>
> Hi Alex,
>
> We need both the entries that don't pass the specified filter for the
> attribute, and those that don't have the attribute at all.  We would
> need a way to get all entries that are not contained in the index for
> the attribute and a way to join that enumeration to the one we had in
> the old code (which should be easy enough with a JoinedEnumeration class).
>
> I think the current attribute indices need to be modified to store a
> list of entries that do not have the attribute.  Each index will then
> take up a bit more space.


Yeah this could get very large.  If for example theres an index on dc and
there are 4 entries in the server that have a dc attribute when there are
millions of entries in the server then the index as it is now will be tiny.
Now
if we applied this idea where we included all the entries that do not have
this attribute at all then we'd millions - 4 entries in that btree of the dc
index.

All existing ApacheDS installations would
> also need their attribute indices rebuilding too.
>
> What do you think?


I think we can logically infer that a value is not present if the entry id
is not in this attribute
index and is not in the existence system index.  You see where I'm going
with this?

Alex

Re: [ApacheDS] Using indices while evaluating filters with negation operators (was: Re: svn commit: r567310 - ... )

Posted by Martin Alderson <eq...@planetquake.com>.
Hi Alex,

OK, so we would just need a way to enumerate over all entries from the 
master table, throwing away all those found in the specified index.

I guess that's more efficient than the way I'm doing it now - where we 
enumerate over all entries in the master table, then (I think!) read 
each and skip it if it doesn't meet the condition in the sub filter. 
For the entries that are skipped we basically have an improvement of 
just requiring a hash lookup rather than a database read and some check 
(usually a string comparison).

Couldn't we use this same optimisation for AND and OR filters - i.e. 
leverage multiple indices?  As an example, for the filter 
(&(objectClass=inetOrgPerson)(roomNumber=52)), we could use indices for 
both objectClass and roomNumber (if they existed) to just read those 
entries from the database that match both filters.  I've been thinking 
about this kind of optimisation for a few days now - am I right in 
thinking this is possible?

I don't fully understand the JDBM API yet, but will keep looking at it 
until it clicks into place.  If the optimisation I am talking about 
above is possible I would be very interested in implementing it.

Martin



Alex Karasulu wrote:
> Hi Martin,
> 
> On 8/18/07, Martin Alderson <eq...@planetquake.com> wrote:
>> Hi Alex,
>>
>> We need both the entries that don't pass the specified filter for the
>> attribute, and those that don't have the attribute at all.  We would
>> need a way to get all entries that are not contained in the index for
>> the attribute and a way to join that enumeration to the one we had in
>> the old code (which should be easy enough with a JoinedEnumeration class).
>>
>> I think the current attribute indices need to be modified to store a
>> list of entries that do not have the attribute.  Each index will then
>> take up a bit more space.
> 
> 
> Yeah this could get very large.  If for example theres an index on dc and
> there are 4 entries in the server that have a dc attribute when there are
> millions of entries in the server then the index as it is now will be tiny.
> Now
> if we applied this idea where we included all the entries that do not have
> this attribute at all then we'd millions - 4 entries in that btree of the dc
> index.
> 
> All existing ApacheDS installations would
>> also need their attribute indices rebuilding too.
>>
>> What do you think?
> 
> 
> I think we can logically infer that a value is not present if the entry id
> is not in this attribute
> index and is not in the existence system index.  You see where I'm going
> with this?
> 
> Alex
>