Posted to dev@directory.apache.org by Alex Karasulu <ak...@apache.org> on 2010/05/08 09:43:23 UTC

[ApacheDS] [XDBM Partition] Using a global UUID instead of partition specific Long ID PK

Hi all,

Any thoughts about using the globally visible UUID as the primary key for
entries in the XDBM partition design, instead of a partition-specific Long
ID?

I think we will need this one day to implement certain features. Let me list
them and point out why using the globally unique UUID might be advantageous:

(1) System wide DN and Entry Cache

      Rather than having each partition manage its own cache, a central DN
and Entry cache makes sense. In this case a global identifier for an entry
might come in handy as the key for hashing cached values, since keys coming
from different partitions could never collide.
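To make point (1) concrete, here is a minimal Java sketch of a server-wide
cache keyed by entry UUID. GlobalEntryCache and its methods are hypothetical
names, not ApacheDS APIs, and the LRU eviction policy (via LinkedHashMap's
access order) is an assumption chosen for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical server-wide cache keyed by entry UUID (not an ApacheDS API).
 * UUIDs are unique across partitions, so entries from every partition can
 * share one cache without key collisions.
 */
class GlobalEntryCache<E> {
    private final Map<UUID, E> entries;

    GlobalEntryCache(final int maxEntries) {
        // Access-ordered LinkedHashMap: the least recently used entry is
        // evicted once the cache grows beyond maxEntries (assumed LRU policy).
        this.entries = new LinkedHashMap<UUID, E>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<UUID, E> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized void put(UUID id, E entry) { entries.put(id, entry); }

    synchronized E get(UUID id) { return entries.get(id); }

    /** Called when an entry is modified or deleted, whatever its partition. */
    synchronized void invalidate(UUID id) { entries.remove(id); }

    synchronized int size() { return entries.size(); }
}
```

A partition would call put(entryUuid, entry) after reading an entry and
invalidate(entryUuid) on modify or delete; no partition name is needed in
the key.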

(2) Nested Partitions, Default Root Partition, Hash Partitioning and Range
Partitioning

      At some point we will want nestable partitions. This means we can
mount one ADS Partition under another ADS Partition, with operations routed
properly to the nested partition where appropriate.

      Nested partitions will also allow us to have a default root
partition under which we can mount other partitions. The default root
partition is nice to have since it allows us to add administrative areas,
with their administrative points and subentries, at the root (empty string)
DN. It also means the RootDSE would be stored persistently in this
partition. Right now the RootDSE is generated and not mutable.
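A minimal sketch, under stated assumptions, of how operations might be
routed once partitions can nest: pick the mounted partition whose suffix is
the longest match for the target DN, with a default root partition mounted
at the empty DN as the fallback. PartitionRouter is a hypothetical class,
and DNs are modelled as lists of already-normalized RDN strings (leaf first)
rather than real DN objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical router (not an ApacheDS API): chooses the partition whose
 * suffix is the longest match for a DN. A root partition mounted at the
 * empty DN matches every DN, so it acts as the default.
 */
class PartitionRouter {
    private final List<List<String>> suffixes = new ArrayList<>();
    private final List<String> names = new ArrayList<>();

    /** Mount a partition; suffix RDNs are given leaf first, none for root. */
    void mount(String partitionName, String... suffixRdns) {
        suffixes.add(Arrays.asList(suffixRdns));
        names.add(partitionName);
    }

    /** True if suffix is the trailing part of dn, e.g. [dc=example, dc=com]. */
    private static boolean endsWith(List<String> dn, List<String> suffix) {
        if (suffix.size() > dn.size()) {
            return false;
        }
        return dn.subList(dn.size() - suffix.size(), dn.size()).equals(suffix);
    }

    /** Returns the deepest partition containing the DN (null if no root). */
    String route(String... dnRdns) {
        List<String> dn = Arrays.asList(dnRdns);
        String best = null;
        int bestLength = -1;
        for (int i = 0; i < suffixes.size(); i++) {
            List<String> suffix = suffixes.get(i);
            if (endsWith(dn, suffix) && suffix.size() > bestLength) {
                best = names.get(i);
                bestLength = suffix.size();
            }
        }
        return best;
    }
}
```

Because the root partition always matches with length zero, any nested
partition with a matching suffix wins over it.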

      Hash partitioning and range partitioning both distribute entries
across partitions under some container entry based on some value. Hash
partitioning uses the value's hash to distribute entries, whereas range
partitioning uses ranges of values. So it is not really the DN that
determines which partition an entry goes into, but this hash or range value.
This lets us scale to very large numbers of entries in the DIT while also
distributing the disk access load across several disk spindles, as Oracle's
RDBMS does in these kinds of configurations.
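The two distribution schemes can be sketched as follows. PartitionSelector
is hypothetical; CRC32 is just one stable hash chosen for illustration, and
the range table (lower bound mapped to partition name) is an assumed
configuration format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical helpers choosing a sub-partition from a value, not a DN. */
final class PartitionSelector {
    private PartitionSelector() {}

    /** Hash partitioning: a stable hash of the value modulo the partition count. */
    static int byHash(String value, int partitionCount) {
        CRC32 crc = new CRC32();
        crc.update(value.getBytes(StandardCharsets.UTF_8));
        // getValue() is in [0, 2^32), so the remainder is never negative
        return (int) (crc.getValue() % partitionCount);
    }

    /**
     * Range partitioning: each partition owns values from its lower bound
     * (inclusive) up to the next partition's lower bound (exclusive).
     * Returns null when the value sorts below every lower bound.
     */
    static String byRange(TreeMap<String, String> lowerBoundToPartition, String value) {
        Map.Entry<String, String> owner = lowerBoundToPartition.floorEntry(value);
        return owner == null ? null : owner.getValue();
    }
}
```

Note that the DN plays no part in either choice; only the value does.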

(3) Global Indices

      If we use a globally unique UUID instead of a partition-specific Long
ID, then we can expose the index segments managed by partitions to higher
layers to construct global indices (with partition-local Long IDs, the same
ID would appear in several partitions and the merged segments would clash).
These global indices can then be used to conduct searches one level above
the partitions. This makes it possible for us to implement certain virtual
directory strategies regardless of the partition implementations used in a
server's configuration. The XDBM search algorithm can leverage these global
indices, or delegate the search of a sub-partition to that partition if it
uses its own search mechanism. There is a lot to be said here, but this is
neither the time nor the place to expand on this topic. Still, global
indices are a key factor for several things, including virtualization.
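A minimal sketch of the global index idea, assuming each partition can hand
its index segment up as a map from attribute value to entry IDs. GlobalIndex
is hypothetical. The merge only works because the IDs are UUIDs: with
partition-local Long IDs, ID 1 in one partition and ID 1 in another would
collapse into the same key.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical global index built one level above the partitions. */
final class GlobalIndex {
    private final Map<String, Set<UUID>> valueToIds = new HashMap<>();

    /** Merge one partition's segment (attribute value -> entry IDs). */
    void addSegment(Map<String, Set<UUID>> segment) {
        for (Map.Entry<String, Set<UUID>> e : segment.entrySet()) {
            valueToIds.computeIfAbsent(e.getKey(), k -> new HashSet<>())
                      .addAll(e.getValue());
        }
    }

    /** Equality lookup across all partitions at once. */
    Set<UUID> lookup(String value) {
        return Collections.unmodifiableSet(
                valueToIds.getOrDefault(value, Collections.emptySet()));
    }
}
```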

Thoughts?

-- 
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me: http://tungle.me/AlexKarasulu

Re: [ApacheDS] [XDBM Partition] Using a global UUID instead of partition specific Long ID PK

Posted by Alex Karasulu <ak...@apache.org>.
On Sun, May 9, 2010 at 11:54 PM, Stefan Seelmann <se...@apache.org> wrote:

> No objection at all.
>
> I updated XDBM to use an <ID> type parameter to be flexible for
> different ID types. The reason was that I wanted to use UUID for the
> HBase partition. If we would use UUID in general for all partitions we
> can remove that type parameter again.
>
>
Hmmm, I guess you already tried this out with the HBase partition? I was
wondering how it worked, since the Long is incremented to update the counter
value stored on disk. I would have thought that the ID type parameter
extended Number or something similar.
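How ID creation can work with a generic <ID> is easiest to see in a small
sketch: if the store asks a pluggable generator for new IDs instead of
incrementing a Long itself, the type parameter does not need to extend
Number. IdGenerator, LongIdGenerator, UuidIdGenerator and SimpleStore are
hypothetical names, not the actual XDBM interfaces.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical: the store only needs "give me a fresh ID". */
interface IdGenerator<ID> {
    ID next();
}

/** Partition-local counter; a real store would also persist the last value. */
final class LongIdGenerator implements IdGenerator<Long> {
    private final AtomicLong last;

    LongIdGenerator(long lastPersisted) { this.last = new AtomicLong(lastPersisted); }

    @Override
    public Long next() { return last.incrementAndGet(); }
}

/** Globally unique IDs with no shared counter to persist. */
final class UuidIdGenerator implements IdGenerator<UUID> {
    @Override
    public UUID next() { return UUID.randomUUID(); }
}

/** Minimal store parameterized by ID type, in the spirit of the <ID> parameter. */
final class SimpleStore<ID, E> {
    private final IdGenerator<ID> ids;
    private final Map<ID, E> master = new HashMap<>();

    SimpleStore(IdGenerator<ID> ids) { this.ids = ids; }

    ID add(E entry) {
        ID id = ids.next();
        master.put(id, entry);
        return id;
    }

    E lookup(ID id) { return master.get(id); }
}
```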

Alex



Re: [ApacheDS] [XDBM Partition] Using a global UUID instead of partition specific Long ID PK

Posted by Kiran Ayyagari <ka...@apache.org>.
On Mon, May 10, 2010 at 12:21 AM, Emmanuel Lecharny <el...@gmail.com> wrote:
> On 5/9/10 10:54 PM, Stefan Seelmann wrote:
>>
>> No objection at all.
>>
>> I updated XDBM to use an <ID> type parameter to be flexible for
>> different ID types. The reason was that I wanted to use UUID for the
>> HBase partition. If we would use UUID in general for all partitions we
>> can remove that type parameter again.
>>
>
> We would also have an extra benefit : we won't need anymore the entryUUID
> index, if I'm not wrong.

Yep, provided we use the entryUUID value as the ID (or vice versa).
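To illustrate, a hypothetical comparison of the two layouts, with plain maps
standing in for on-disk tables: with Long IDs, a lookup by entryUUID first
goes through an entryUUID index to find the Long; with the entryUUID as the
ID, the master table is read directly. EntryUuidLookup and its fields are
illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical contrast of entryUUID lookups under both ID schemes. */
final class EntryUuidLookup {
    // Long-ID layout: an extra entryUUID index plus a master table keyed by Long
    final Map<UUID, Long> entryUuidIndex = new HashMap<>();
    final Map<Long, String> masterByLong = new HashMap<>();

    // UUID-ID layout: the master table is keyed directly by entryUUID
    final Map<UUID, String> masterByUuid = new HashMap<>();

    String findWithLongIds(UUID entryUuid) {
        Long id = entryUuidIndex.get(entryUuid);   // extra index read
        return id == null ? null : masterByLong.get(id);
    }

    String findWithUuidIds(UUID entryUuid) {
        return masterByUuid.get(entryUuid);        // single read, no extra index
    }
}
```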

Kiran Ayyagari

Re: [ApacheDS] [XDBM Partition] Using a global UUID instead of partition specific Long ID PK

Posted by Emmanuel Lecharny <el...@gmail.com>.
On 5/9/10 10:54 PM, Stefan Seelmann wrote:
> No objection at all.
>
> I updated XDBM to use an <ID> type parameter to be flexible for
> different ID types. The reason was that I wanted to use UUID for the
> HBase partition. If we would use UUID in general for all partitions we
> can remove that type parameter again.
>    

We would also get an extra benefit: if I'm not mistaken, we would no longer
need the entryUUID index.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.nextury.com



Re: [ApacheDS] [XDBM Partition] Using a global UUID instead of partition specific Long ID PK

Posted by Stefan Seelmann <se...@apache.org>.
No objection at all.

I updated XDBM to use an <ID> type parameter so that it is flexible about ID
types. The reason was that I wanted to use UUIDs for the HBase partition. If
we used UUIDs for all partitions, we could remove that type parameter again.

Kind Regards,
Stefan




Re: [ApacheDS] [XDBM Partition] Using a global UUID instead of partition specific Long ID PK

Posted by Alex Karasulu <ak...@apache.org>.
On Sat, May 8, 2010 at 12:36 PM, Kiran Ayyagari <ka...@apache.org> wrote:

> On Sat, May 8, 2010 at 11:09 AM, Emmanuel Lecharny <el...@gmail.com>
> wrote:
> > One other advantage will be that we won't need anymore to store an
> > increment on the disk. Atm, each time we add an element in the backend,
> > we have to ask for a Long, which has to be unique. This is potentially a
> > bottleneck, and it's costly, as this unique Long has to be stored on disk.
> besides this I see some more advantages
>
> *if* we keep the entryUUID of entry also as the ID of the entry then,
> building the DN using the RDN index will be a lot easier (cause finding
> the parent of an entry requires now a full DN construction which can be
> avoided by doing a reverse lookup in RDN idex if we know the entry's ID)
>
> >
> > I don't yet see any other negative impact we can get by using UUID
> > instead of Long, except that it will requires more disk space (slightly).
> yeap, and RDN index also takes more disk space now
>
>
Yeah, but this disk space is negligible. The UUID is 16 bytes and a Java
long is 8 bytes (on every architecture, not just Intel), so we're talking
about 8 extra bytes per stored ID. No need to even worry about it. If this is
the only disadvantage we can find, the benefits will outweigh it.
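The size difference can be checked with a short sketch: a UUID serializes to
two longs (16 bytes), and a Java long is always 8 bytes. IdSizes is a
hypothetical helper, and the big-endian ByteBuffer layout is an assumption,
not necessarily the on-disk format XDBM uses.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical helper comparing serialized key sizes. */
final class IdSizes {
    private IdSizes() {}

    /** A long ID: always Long.BYTES (8) bytes. */
    static byte[] toBytes(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    /** A UUID ID: most and least significant halves, 16 bytes in total. */
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    static UUID uuidFromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        // Arguments are evaluated left to right: high half, then low half
        return new UUID(buf.getLong(), buf.getLong());
    }

    /** Extra bytes when storedIds long IDs are replaced by UUIDs. */
    static long extraBytes(long storedIds) {
        return storedIds * (2L * Long.BYTES - Long.BYTES);
    }
}
```

For example, one million stored IDs cost 8,000,000 extra bytes (about 8 MB),
counted once for the master table and again for every index that stores the
ID.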


Regards,

Re: [ApacheDS] [XDBM Partition] Using a global UUID instead of partition specific Long ID PK

Posted by Kiran Ayyagari <ka...@apache.org>.
On Sat, May 8, 2010 at 11:09 AM, Emmanuel Lecharny <el...@gmail.com> wrote:
> One other advantage will be that we won't need anymore to store an increment
> on the disk. Atm, each time we add an element in the backend, we have to ask
> for a Long, which has to be unique. This is potentially a bottleneck, and
> it's costly, as this unique Long has to be stored on disk.
Besides this, I see some more advantages.

*If* we also use the entry's entryUUID as its ID, then building the DN using
the RDN index will be a lot easier (because finding the parent of an entry
currently requires constructing the full DN, which can be avoided by doing a
reverse lookup in the RDN index if we know the entry's ID).
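A minimal sketch of the reverse lookup, assuming an RDN index that maps each
entry ID to its RDN and its parent's ID. RdnIndexSketch is hypothetical and
much simpler than the real RDN index; the partition suffix is stored as a
single element for brevity, and the parent chain is assumed to be acyclic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical RDN index keyed by entry ID (not the ApacheDS RDN index API). */
final class RdnIndexSketch {
    /** RDN of an entry plus its parent's ID (null for the suffix entry). */
    static final class ParentAndRdn {
        final UUID parentId;
        final String rdn;

        ParentAndRdn(UUID parentId, String rdn) {
            this.parentId = parentId;
            this.rdn = rdn;
        }
    }

    private final Map<UUID, ParentAndRdn> reverse = new HashMap<>();

    void add(UUID id, UUID parentId, String rdn) {
        reverse.put(id, new ParentAndRdn(parentId, rdn));
    }

    /** Parent lookup is a single reverse read; no DN has to be built. */
    UUID parentOf(UUID id) {
        ParentAndRdn p = reverse.get(id);
        return p == null ? null : p.parentId;
    }

    /** Builds the DN by walking parent IDs up to the suffix entry. */
    String buildDn(UUID id) {
        List<String> rdns = new ArrayList<>();
        for (ParentAndRdn p = reverse.get(id); p != null;
             p = p.parentId == null ? null : reverse.get(p.parentId)) {
            rdns.add(p.rdn);
        }
        return String.join(",", rdns);
    }
}
```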

>
> I don't yet see any other negative impact we can get by using UUID instead
> of Long, except that it will requires more disk space (slightly).
Yep, and the RDN index also takes more disk space now.

Kiran Ayyagari

Re: [ApacheDS] [XDBM Partition] Using a global UUID instead of partition specific Long ID PK

Posted by Alex Karasulu <ak...@apache.org>.
On Sat, May 8, 2010 at 11:09 AM, Emmanuel Lecharny <el...@gmail.com> wrote:

> One other advantage will be that we won't need anymore to store an
> increment on the disk. Atm, each time we add an element in the backend, we
> have to ask for a Long, which has to be unique. This is potentially a
> bottleneck, and it's costly, as this unique Long has to be stored on disk.
>
> I don't yet see any other negative impact we can get by using UUID instead
> of Long, except that it will requires more disk space (slightly).
>
>
>
Aye, very good point. I forgot that incrementing the Long ID is a bottleneck
for add operations on disk. With UUIDs this goes away, which should improve
adds significantly.


Re: [ApacheDS] [XDBM Partition] Using a global UUID instead of partition specific Long ID PK

Posted by Emmanuel Lecharny <el...@gmail.com>.
One other advantage is that we will no longer need to store an incremented
counter on disk. At the moment, each time we add an entry to the backend, we
have to ask for a Long, which has to be unique. This is potentially a
bottleneck, and it is costly, because this unique Long has to be stored on
disk.
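The bottleneck can be illustrated with a hypothetical sketch contrasting the
two strategies: a persisted counter that every add must lock, write and force
to disk, versus UUID.randomUUID(), which needs no shared counter and nothing
to persist. PersistentCounter and UuidIds are illustrative names, not XDBM
classes, and forcing the file on every increment is an assumed worst case.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

/** Hypothetical persisted counter: every add serializes through next(). */
final class PersistentCounter implements AutoCloseable {
    private final FileChannel channel;
    private long last;

    PersistentCounter(Path file) {
        try {
            channel = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
            // An empty (new) file yields -1, so the counter starts at 0
            last = channel.read(buf, 0) == Long.BYTES ? buf.getLong(0) : 0L;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Every caller contends for this lock, and every call hits the disk. */
    synchronized long next() {
        last++;
        try {
            channel.write(ByteBuffer.allocate(Long.BYTES).putLong(0, last), 0);
            channel.force(false);  // make the new counter value durable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return last;
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** UUID-based IDs: no shared counter, no disk write per add. */
final class UuidIds {
    private UuidIds() {}

    static UUID next() { return UUID.randomUUID(); }
}
```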

I don't yet see any negative impact from using UUID instead of Long, except
that it will require (slightly) more disk space.

