Posted to dev@directory.apache.org by Alex Karasulu <ak...@gmail.com> on 2009/03/28 12:56:48 UTC

[ApacheDS] Virtualization (was: Re: [ApacheDS] A solution for the subentry association performance problem)

Creating a separate thread on the primary topic of virtualization.

To achieve virtualization I have a specific idea in mind, although this will
most likely not be implemented before 2.0.


(1) Server Wide Entry ID (not UUID)

First, I'd like to see the server create the entry IDs that partitions use,
and expose access to them.  Right now entry IDs are specific to a partition
implementation and are not exposed to higher levels.  I would like to see
these IDs exposed so they can be leveraged within the server above the
partitions.  Note there's a Confluence page about making server-wide
composite IDs which use some bits to associate the entry with its
partition.
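
As a rough illustration of the composite ID idea, here is a minimal sketch
that packs a partition number and a partition-local number into one long.
The class name, the method names and the 16/48 bit split are all assumptions
made for this example; they are not taken from the Confluence page or from
the ApacheDS code base.

```java
// Hypothetical sketch only: these names do not exist in ApacheDS.
final class CompositeEntryId {
    // Assumption: the top 16 bits identify the partition and the
    // low 48 bits identify the entry within that partition.
    static final int PARTITION_BITS = 16;
    static final int LOCAL_BITS = Long.SIZE - PARTITION_BITS;
    static final long LOCAL_MASK = (1L << LOCAL_BITS) - 1;

    private CompositeEntryId() {
    }

    // Combine a partition number and a partition-local number into one ID.
    static long compose(int partitionId, long localId) {
        if (partitionId < 0 || partitionId >= (1 << PARTITION_BITS)) {
            throw new IllegalArgumentException("partition id out of range: " + partitionId);
        }
        if (localId < 0 || localId > LOCAL_MASK) {
            throw new IllegalArgumentException("local id out of range: " + localId);
        }
        return ((long) partitionId << LOCAL_BITS) | localId;
    }

    // Recover the partition number; the unsigned shift keeps large values positive.
    static int partitionOf(long entryId) {
        return (int) (entryId >>> LOCAL_BITS);
    }

    // Recover the partition-local part of the ID.
    static long localIdOf(long entryId) {
        return entryId & LOCAL_MASK;
    }
}
```

With a layout like this, code above the partitions could route an ID to its
owning partition without asking every partition whether it knows the entry.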

Even if a partition is not index-based, as JDBM is and LDIF will be, it can
still take a server-provided ID for the entries it creates.  The partition
remains in charge of what is created, as long as it associates the provided
identifier with the entry.  This allows the virtual subsystem to build
indices itself if needed, both for caching and for precomputing virtual
values.

This change will also enable other kinds of features such as:

   (a) partition nesting
   (b) hashed entry partitioning under a single parent, where multiple
partitions together hold, for example, 500 million user entries
   (c) A root (default) partition to store things like the RootDSE and DIT
wide subentries
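
To make (b) a bit more concrete, here is one way the choice of partition
could be made: hash the normalized RDN of the new entry and take the result
modulo the number of partitions under the parent.  This is only a sketch
under assumptions; the class name, the use of CRC32 and the idea of hashing
the RDN are mine for illustration and nothing here is an existing ApacheDS
API.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch only: picks one of N sibling partitions for an entry.
final class HashedPartitionSelector {
    private final int partitionCount;

    HashedPartitionSelector(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    // Assumption: the caller passes an already normalized RDN, so that
    // equivalent names always hash to the same partition.
    int select(String normalizedRdn) {
        CRC32 crc = new CRC32();
        crc.update(normalizedRdn.getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is a non-negative long, so the remainder is in range.
        return (int) (crc.getValue() % partitionCount);
    }
}
```

A fixed modulus like this would force entries to move if the number of
partitions changed, so a real design would need to decide how to handle
that.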


(2) Schema Extension for Virtual Attributes

A new schema extension will be created to mark attributeTypes as
virtual.  This marker simply lets the search engine know that it must
consult the virtual subsystem when conducting searches whose filters contain
virtual attributes.  VIRTUAL might be the best keyword for it.


(3) Virtual Subsystem Interface

An interface is needed to ask specific questions about sets of entries or
specific entry candidates using IDs.  This interface kind of resembles an
Index in the BTree based partitions.  You can ask for the value of a virtual
attribute for an entry and get the result which is similar to an index
lookup.  Behind the scenes the virtual subsystem may actually compute the
value, look up cached values, read the value from disk, or access some
external store.  In this process, the partition containing the entry may be
accessed to acquire more information.  Since this will most likely be built
into the Xdbm search engine all partitions involved will most likely have
indices and will be able to expose them.
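
A minimal sketch of what such an interface might look like follows.  All
names are hypothetical, values are simplified to Strings, and a LongStream
of entry IDs stands in for a real Cursor; the in-memory map only plays the
role of a partition so the example is self-contained.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.LongStream;

// Hypothetical interface: index-like access to one virtual attribute.
interface VirtualAttribute {
    // Value of the virtual attribute for one entry, if the entry exists.
    Optional<String> lookup(long entryId);

    // IDs of all entries whose virtual value satisfies the assertion.
    LongStream cursor(Predicate<String> assertion);
}

// One possible implementation: compute the value from a stored value.
final class ComputedVirtualAttribute implements VirtualAttribute {
    // Stand-in for the partition: entry ID -> stored attribute value.
    private final Map<Long, String> storedValues;
    private final Function<String, String> compute;

    ComputedVirtualAttribute(Map<Long, String> storedValues,
                             Function<String, String> compute) {
        this.storedValues = new TreeMap<>(storedValues);
        this.compute = compute;
    }

    @Override
    public Optional<String> lookup(long entryId) {
        return Optional.ofNullable(storedValues.get(entryId)).map(compute);
    }

    @Override
    public LongStream cursor(Predicate<String> assertion) {
        // Full scan: this is why virtual assertions are expensive.
        return storedValues.entrySet().stream()
                .filter(e -> assertion.test(compute.apply(e.getValue())))
                .mapToLong(e -> e.getKey());
    }
}
```

A caching or remote implementation could sit behind the same two methods
without the search engine needing to know which one it is talking to.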

In addition to lookup requests for specific entries, Cursors can be acquired
to access sets of entries satisfying some assertion on the virtual
attribute.  This virtual Cursor can be used by the search engine to build a
Cursor system incorporating the virtual assertions into the search result.
Of course since virtualization can be expensive (computing or going over the
network), these assertions on virtual attributes will have the lowest
priority.  The idea is to constrain the amount of computation we need to do
by keeping the search space as small as possible.  The presence of
the schema extension to designate attributeTypes as virtual will allow us to
do this.
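
To show what "lowest priority" could mean in practice, here is a small
sketch that orders the terms of an AND filter so that assertions on virtual
attributeTypes are always evaluated last, after the cheaper indexed
assertions have narrowed the candidates.  The class, the candidate-count
estimate and the set of virtual type names are assumptions for this example
and not how the Xdbm search engine is actually written.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: orders AND-ed assertions for evaluation.
final class AssertionOrdering {
    // One leaf of the filter with an estimate of how many entries match it.
    static final class Assertion {
        final String attributeType;
        final long estimatedCandidates;

        Assertion(String attributeType, long estimatedCandidates) {
            this.attributeType = attributeType;
            this.estimatedCandidates = estimatedCandidates;
        }
    }

    private AssertionOrdering() {
    }

    // Assumption: virtualTypes holds the normalized names of all
    // attributeTypes marked virtual in the schema.
    static List<Assertion> order(List<Assertion> conjuncts, Set<String> virtualTypes) {
        List<Assertion> ordered = new ArrayList<>(conjuncts);
        ordered.sort(Comparator
                // false sorts before true, so non-virtual assertions come first
                .comparing((Assertion a) -> virtualTypes.contains(a.attributeType))
                // within each group, the most selective assertion comes first
                .thenComparingLong(a -> a.estimatedCandidates));
        return ordered;
    }
}
```

Note that a virtual assertion is placed last even when its estimate is
small, since the estimate says nothing about the cost of computing each
value.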

----------------

With this configuration, the search engine will compose a system of cursors
built to reflect the search filter while consulting the virtual subsystem to
perform lookups and request virtual cursors.  Once incorporated into the
system of cursors, the product is bubbled back up the system to be used as
before.  Other layers of the server are not impacted, and they no longer
need to worry about injecting virtual attributes (e.g. the collective system
injects them manually today).

I know this might not be perfect and several tweaks and optimizations will
be required.  However I think this is a design we can work on to achieve a
solid means to deal with virtual attributes in search.  This does not,
however, explain how we are going to allow for user-specified virtualization,
or manage it in general.  This is just a first step to get lookups and
search on virtual attributeTypes working.

Comments?

Alex