You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@gmail.com> on 2012/01/20 18:47:11 UTC

Heads up

Hi guys,

I was a bit silent last week and this week, let me update you about what 
I was working on.

First of all, I have had to deal with some familly issues, which ate 
half of my time.

Regarding the Txn branch I was working on until last week, I stopped 
because I was not able to fix the code without a serious help from 
Selcuk. As he is busy, I preferred to wait for him to be available 
again, instead of bullying into the code and break it seriously. I 
believe that there are some improvement since the moment I started to 
work on the branch, but it's not working fully yet.

So I switched to something we wanted to do a long time ago : designing a 
new version of JDBM. JDBM is a BTree implementation, with locks to 
protect concurrent access. The idea was to implement a MVCC solution on 
top of a BTree :
- each search can be done concurrently with any other operation, because 
it asks for a specific existing revision from the btree
- each modification is done on a new revision
- two modifications can't be done at the same time (so modifications are 
queued and executed one after the other)

The consequence is that searches will be very fast. It comes to a price 
though : we keep a track on every revision, until it's not used anymore. 
This is done by copying every modified pages when applying some 
modification.

As of today, the addition operation and the find operation is working 
just fine. I conducted some benchmark on additions, and it seems that 
the system is pretty decent.

A *lot* remains to be done :
- deletion must be implemented
- browsing the tree is not yet implemented
- it's all in memory atm
- we must add some semaphore for concurrent modifications
- a GC must be added to discard unused pages

But most of all, as it's a in-memory btree atm, I must add the disk 
layer. It will be based on Memory Mapped files.

Once those preliminary works will be done, the idea is to use this 
implementation to replace JDBM. That would make the server consistent, 
and we may then use it without the in-memory txn layer.

Not to say that this txn layer is useless; using a MVCC btree based 
backend is *not* enough : we have no way to guarantee the atomicity of 
move operation across partitions.

This work has been done in my sandbox, where you can follow the work in 
progress : 
http://svn.apache.org/viewvc/directory/sandbox/elecharny/shared-mvbt

At the same time, thanks to Pierre-Arnaud, a first milstone of Studio 
2.0 has been released, and it exposed some nasty bugs in the LDAP API. 
Which is actually a good thing : we can fix them !

So keep tuned, a lot of new things are coming soon !

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: Heads up

Posted by Selcuk AYA <ay...@gmail.com>.

On Fri, Jan 20, 2012 at 9:47 AM, Emmanuel Lecharny <el...@gmail.com> wrote:
> Hi guys,
>
> I was a bit silent last week and this week, let me update you about what I
> was working on.
>
> First of all, I have had to deal with some familly issues, which ate half of
> my time.
>
> Regarding the Txn branch I was working on until last week, I stopped because
> I was not able to fix the code without a serious help from Selcuk. As he is
> busy, I preferred to wait for him to be available again, instead of bullying
> into the code and break it seriously. I believe that there are some
> improvement since the moment I started to work on the branch, but it's not
> working fully yet.

I am starting to look into this again. I did an svn up and txn manager
tests in core.shared are not working. Do you have the same problem?
Also what was the test you thought was returning incorrect results? I
couldnt find the relevant email.


>
> So I switched to something we wanted to do a long time ago : designing a new
> version of JDBM. JDBM is a BTree implementation, with locks to protect
> concurrent access. The idea was to implement a MVCC solution on top of a
> BTree :
> - each search can be done concurrently with any other operation, because it
> asks for a specific existing revision from the btree
> - each modification is done on a new revision
> - two modifications can't be done at the same time (so modifications are
> queued and executed one after the other)
>
> The consequence is that searches will be very fast. It comes to a price
> though : we keep a track on every revision, until it's not used anymore.
> This is done by copying every modified pages when applying some
> modification.
>
> As of today, the addition operation and the find operation is working just
> fine. I conducted some benchmark on additions, and it seems that the system
> is pretty decent.
>
> A *lot* remains to be done :
> - deletion must be implemented
> - browsing the tree is not yet implemented
> - it's all in memory atm
> - we must add some semaphore for concurrent modifications
> - a GC must be added to discard unused pages
>
> But most of all, as it's a in-memory btree atm, I must add the disk layer.
> It will be based on Memory Mapped files.
>
> Once those preliminary works will be done, the idea is to use this
> implementation to replace JDBM. That would make the server consistent, and
> we may then use it without the in-memory txn layer.
>
> Not to say that this txn layer is useless; using a MVCC btree based backend
> is *not* enough : we have no way to guarantee the atomicity of move
> operation across partitions.
>
> This work has been done in my sandbox, where you can follow the work in
> progress :
> http://svn.apache.org/viewvc/directory/sandbox/elecharny/shared-mvbt
>
> At the same time, thanks to Pierre-Arnaud, a first milstone of Studio 2.0
> has been released, and it exposed some nasty bugs in the LDAP API. Which is
> actually a good thing : we can fix them !
>
> So keep tuned, a lot of new things are coming soon !
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>

thanks
Selcuk

Re: Heads up

Posted by Howard Chu <hy...@symas.com>.

Emmanuel Lecharny wrote:
> Hi guys,
>
> I was a bit silent last week and this week, let me update you about what
> I was working on.

Sounds very cool!
>
> First of all, I have had to deal with some familly issues, which ate
> half of my time.
>
> Regarding the Txn branch I was working on until last week, I stopped
> because I was not able to fix the code without a serious help from
> Selcuk. As he is busy, I preferred to wait for him to be available
> again, instead of bullying into the code and break it seriously. I
> believe that there are some improvement since the moment I started to
> work on the branch, but it's not working fully yet.
>
> So I switched to something we wanted to do a long time ago : designing a
> new version of JDBM. JDBM is a BTree implementation, with locks to
> protect concurrent access. The idea was to implement a MVCC solution on
> top of a BTree :
> - each search can be done concurrently with any other operation, because
> it asks for a specific existing revision from the btree
> - each modification is done on a new revision
> - two modifications can't be done at the same time (so modifications are
> queued and executed one after the other)
>
> The consequence is that searches will be very fast. It comes to a price
> though : we keep a track on every revision, until it's not used anymore.
> This is done by copying every modified pages when applying some
> modification.
>
> As of today, the addition operation and the find operation is working
> just fine. I conducted some benchmark on additions, and it seems that
> the system is pretty decent.
>
> A *lot* remains to be done :
> - deletion must be implemented
> - browsing the tree is not yet implemented
> - it's all in memory atm
> - we must add some semaphore for concurrent modifications
> - a GC must be added to discard unused pages
>
> But most of all, as it's a in-memory btree atm, I must add the disk
> layer. It will be based on Memory Mapped files.
>
> Once those preliminary works will be done, the idea is to use this
> implementation to replace JDBM. That would make the server consistent,
> and we may then use it without the in-memory txn layer.
>
> Not to say that this txn layer is useless; using a MVCC btree based
> backend is *not* enough : we have no way to guarantee the atomicity of
> move operation across partitions.
>
> This work has been done in my sandbox, where you can follow the work in
> progress :
> http://svn.apache.org/viewvc/directory/sandbox/elecharny/shared-mvbt
>
> At the same time, thanks to Pierre-Arnaud, a first milstone of Studio
> 2.0 has been released, and it exposed some nasty bugs in the LDAP API.
> Which is actually a good thing : we can fix them !
>
> So keep tuned, a lot of new things are coming soon !
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Re: Heads up

Posted by Selcuk AYA <ay...@gmail.com>.

On Fri, Jan 20, 2012 at 4:48 PM, Emmanuel Lecharny <el...@gmail.com> wrote:
> Just FYI, I will continue to play around the BTree I've pushed in my branch.
> It's funny, and it helps me to understand the concepts without having to go
> deep into books and existing algorithm. Call it an exercise, this is exactly
> what it is.
>
> As far as I can tell, it also seems that JDBM has a few sub-optimal methods
> : to insert N nodes into a BTree, JDBM do 10% more comparison than I do in
> my implementation. We can probably shave those extra 10% from JDBM code
> later (I tried, but it had some bad side effect when done brutally).
>
> Otherwise, I do think that if your modified JDBM already implements the MVCC
> Btree, then we should use it in trunk.
>
> I'm also wondering if there is any added GC in the modified JDBM code ? Just
> because if we don't have one, the JDBM files will grow very quickly.

there is GC in the added code.

>
>
>
> On 1/20/12 10:07 PM, Selcuk AYA wrote:
>>
>> On Fri, Jan 20, 2012 at 9:47 AM, Emmanuel Lecharny<el...@gmail.com>
>>  wrote:
>>>
>>> Hi guys,
>>>
>>> I was a bit silent last week and this week, let me update you about what
>>> I
>>> was working on.
>>>
>>> First of all, I have had to deal with some familly issues, which ate half
>>> of
>>> my time.
>>>
>>> Regarding the Txn branch I was working on until last week, I stopped
>>> because
>>> I was not able to fix the code without a serious help from Selcuk. As he
>>> is
>>> busy, I preferred to wait for him to be available again, instead of
>>> bullying
>>> into the code and break it seriously. I believe that there are some
>>> improvement since the moment I started to work on the branch, but it's
>>> not
>>> working fully yet.
>>>
>>> So I switched to something we wanted to do a long time ago : designing a
>>> new
>>> version of JDBM. JDBM is a BTree implementation, with locks to protect
>>> concurrent access. The idea was to implement a MVCC solution on top of a
>>> BTree :
>>> - each search can be done concurrently with any other operation, because
>>> it
>>> asks for a specific existing revision from the btree
>>> - each modification is done on a new revision
>>> - two modifications can't be done at the same time (so modifications are
>>> queued and executed one after the other)
>>
>>
>> Exactly what you are saying here is already implemented in JDBM thanks
>> to  the latest changes. How is what you are implementing is different?
>>
>>> The consequence is that searches will be very fast. It comes to a price
>>> though : we keep a track on every revision, until it's not used anymore.
>>> This is done by copying every modified pages when applying some
>>> modification.
>>>
>>> As of today, the addition operation and the find operation is working
>>> just
>>> fine. I conducted some benchmark on additions, and it seems that the
>>> system
>>> is pretty decent.
>>>
>>> A *lot* remains to be done :
>>> - deletion must be implemented
>>> - browsing the tree is not yet implemented
>>> - it's all in memory atm
>>> - we must add some semaphore for concurrent modifications
>>> - a GC must be added to discard unused pages
>>>
>>> But most of all, as it's a in-memory btree atm, I must add the disk
>>> layer.
>>> It will be based on Memory Mapped files.
>>>
>>> Once those preliminary works will be done, the idea is to use this
>>> implementation to replace JDBM. That would make the server consistent,
>>> and
>>> we may then use it without the in-memory txn layer.
>>>
>>> Not to say that this txn layer is useless; using a MVCC btree based
>>> backend
>>> is *not* enough : we have no way to guarantee the atomicity of move
>>> operation across partitions.
>>
>> Txn layer enables us to group multiple atomic atomic changes and
>> consistent search into a single atomic unit. For example, when you
>> change an entry, change to the entry and indices happen as an atomic
>> unit. Do you have code to provide this at your new implementation?
>>
>>> This work has been done in my sandbox, where you can follow the work in
>>> progress :
>>> http://svn.apache.org/viewvc/directory/sandbox/elecharny/shared-mvbt
>>>
>>> At the same time, thanks to Pierre-Arnaud, a first milstone of Studio 2.0
>>> has been released, and it exposed some nasty bugs in the LDAP API. Which
>>> is
>>> actually a good thing : we can fix them !
>>>
>>> So keep tuned, a lot of new things are coming soon !
>>>
>>> --
>>> Regards,
>>> Cordialement,
>>> Emmanuel Lécharny
>>> www.iktek.com
>>>
>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>

Re: Heads up

Posted by Emmanuel Lecharny <el...@gmail.com>.

Just FYI, I will continue to play around the BTree I've pushed in my 
branch. It's funny, and it helps me to understand the concepts without 
having to go deep into books and existing algorithm. Call it an 
exercise, this is exactly what it is.

As far as I can tell, it also seems that JDBM has a few sub-optimal 
methods : to insert N nodes into a BTree, JDBM do 10% more comparison 
than I do in my implementation. We can probably shave those extra 10% 
from JDBM code later (I tried, but it had some bad side effect when done 
brutally).

Otherwise, I do think that if your modified JDBM already implements the 
MVCC Btree, then we should use it in trunk.

I'm also wondering if there is any added GC in the modified JDBM code ? 
Just because if we don't have one, the JDBM files will grow very quickly.


On 1/20/12 10:07 PM, Selcuk AYA wrote:
> On Fri, Jan 20, 2012 at 9:47 AM, Emmanuel Lecharny<el...@gmail.com>  wrote:
>> Hi guys,
>>
>> I was a bit silent last week and this week, let me update you about what I
>> was working on.
>>
>> First of all, I have had to deal with some familly issues, which ate half of
>> my time.
>>
>> Regarding the Txn branch I was working on until last week, I stopped because
>> I was not able to fix the code without a serious help from Selcuk. As he is
>> busy, I preferred to wait for him to be available again, instead of bullying
>> into the code and break it seriously. I believe that there are some
>> improvement since the moment I started to work on the branch, but it's not
>> working fully yet.
>>
>> So I switched to something we wanted to do a long time ago : designing a new
>> version of JDBM. JDBM is a BTree implementation, with locks to protect
>> concurrent access. The idea was to implement a MVCC solution on top of a
>> BTree :
>> - each search can be done concurrently with any other operation, because it
>> asks for a specific existing revision from the btree
>> - each modification is done on a new revision
>> - two modifications can't be done at the same time (so modifications are
>> queued and executed one after the other)
>
> Exactly what you are saying here is already implemented in JDBM thanks
> to  the latest changes. How is what you are implementing is different?
>
>> The consequence is that searches will be very fast. It comes to a price
>> though : we keep a track on every revision, until it's not used anymore.
>> This is done by copying every modified pages when applying some
>> modification.
>>
>> As of today, the addition operation and the find operation is working just
>> fine. I conducted some benchmark on additions, and it seems that the system
>> is pretty decent.
>>
>> A *lot* remains to be done :
>> - deletion must be implemented
>> - browsing the tree is not yet implemented
>> - it's all in memory atm
>> - we must add some semaphore for concurrent modifications
>> - a GC must be added to discard unused pages
>>
>> But most of all, as it's a in-memory btree atm, I must add the disk layer.
>> It will be based on Memory Mapped files.
>>
>> Once those preliminary works will be done, the idea is to use this
>> implementation to replace JDBM. That would make the server consistent, and
>> we may then use it without the in-memory txn layer.
>>
>> Not to say that this txn layer is useless; using a MVCC btree based backend
>> is *not* enough : we have no way to guarantee the atomicity of move
>> operation across partitions.
> Txn layer enables us to group multiple atomic atomic changes and
> consistent search into a single atomic unit. For example, when you
> change an entry, change to the entry and indices happen as an atomic
> unit. Do you have code to provide this at your new implementation?
>
>> This work has been done in my sandbox, where you can follow the work in
>> progress :
>> http://svn.apache.org/viewvc/directory/sandbox/elecharny/shared-mvbt
>>
>> At the same time, thanks to Pierre-Arnaud, a first milstone of Studio 2.0
>> has been released, and it exposed some nasty bugs in the LDAP API. Which is
>> actually a good thing : we can fix them !
>>
>> So keep tuned, a lot of new things are coming soon !
>>
>> --
>> Regards,
>> Cordialement,
>> Emmanuel Lécharny
>> www.iktek.com
>>


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: Heads up

Posted by Emmanuel Lecharny <el...@gmail.com>.

On 1/20/12 10:07 PM, Selcuk AYA wrote:
> On Fri, Jan 20, 2012 at 9:47 AM, Emmanuel Lecharny<el...@gmail.com>  wrote:
>> Hi guys,
>>
>> I was a bit silent last week and this week, let me update you about what I
>> was working on.
>>
>> First of all, I have had to deal with some familly issues, which ate half of
>> my time.
>>
>> Regarding the Txn branch I was working on until last week, I stopped because
>> I was not able to fix the code without a serious help from Selcuk. As he is
>> busy, I preferred to wait for him to be available again, instead of bullying
>> into the code and break it seriously. I believe that there are some
>> improvement since the moment I started to work on the branch, but it's not
>> working fully yet.
>>
>> So I switched to something we wanted to do a long time ago : designing a new
>> version of JDBM. JDBM is a BTree implementation, with locks to protect
>> concurrent access. The idea was to implement a MVCC solution on top of a
>> BTree :
>> - each search can be done concurrently with any other operation, because it
>> asks for a specific existing revision from the btree
>> - each modification is done on a new revision
>> - two modifications can't be done at the same time (so modifications are
>> queued and executed one after the other)
>
> Exactly what you are saying here is already implemented in JDBM thanks
> to  the latest changes. How is what you are implementing is different?

Damn it... I had to dig back into last august mails to realize that...

Call it a complete duplication of effort. The sad thing is that I didn't 
start from the JDBM code in the branch, otherwise I would have 
immediately realize that it was already implemented, but from the trunk 
code.

We have had a quick convo last week with Alex, and I told him that I was 
going to play around this MVCC-btree idea a bit, waiting for you to come 
back, and it seems none of us remembered that it was already a done work.

>
>> The consequence is that searches will be very fast. It comes to a price
>> though : we keep a track on every revision, until it's not used anymore.
>> This is done by copying every modified pages when applying some
>> modification.
>>
>> As of today, the addition operation and the find operation is working just
>> fine. I conducted some benchmark on additions, and it seems that the system
>> is pretty decent.
>>
>> A *lot* remains to be done :
>> - deletion must be implemented
>> - browsing the tree is not yet implemented
>> - it's all in memory atm
>> - we must add some semaphore for concurrent modifications
>> - a GC must be added to discard unused pages
>>
>> But most of all, as it's a in-memory btree atm, I must add the disk layer.
>> It will be based on Memory Mapped files.
>>
>> Once those preliminary works will be done, the idea is to use this
>> implementation to replace JDBM. That would make the server consistent, and
>> we may then use it without the in-memory txn layer.
>>
>> Not to say that this txn layer is useless; using a MVCC btree based backend
>> is *not* enough : we have no way to guarantee the atomicity of move
>> operation across partitions.
> Txn layer enables us to group multiple atomic atomic changes and
> consistent search into a single atomic unit. For example, when you
> change an entry, change to the entry and indices happen as an atomic
> unit. Do you have code to provide this at your new implementation?
No. In my mind, and this is what I tried to explain, the best way to 
offer this grouped changes atomicity is to use the txn layer you are 
working on.

Do you have any idea about when you'll be back with us ? I'm afraid that 
I'm losing my time on things you either have already done, or you can do 
better than me, and I'm musing around trying to get those things done 
when I might have spent my time in other -less urgent- area...

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: Heads up

Posted by Selcuk AYA <ay...@gmail.com>.

On Fri, Jan 20, 2012 at 9:47 AM, Emmanuel Lecharny <el...@gmail.com> wrote:
> Hi guys,
>
> I was a bit silent last week and this week, let me update you about what I
> was working on.
>
> First of all, I have had to deal with some familly issues, which ate half of
> my time.
>
> Regarding the Txn branch I was working on until last week, I stopped because
> I was not able to fix the code without a serious help from Selcuk. As he is
> busy, I preferred to wait for him to be available again, instead of bullying
> into the code and break it seriously. I believe that there are some
> improvement since the moment I started to work on the branch, but it's not
> working fully yet.
>
> So I switched to something we wanted to do a long time ago : designing a new
> version of JDBM. JDBM is a BTree implementation, with locks to protect
> concurrent access. The idea was to implement a MVCC solution on top of a
> BTree :
> - each search can be done concurrently with any other operation, because it
> asks for a specific existing revision from the btree
> - each modification is done on a new revision
> - two modifications can't be done at the same time (so modifications are
> queued and executed one after the other)


Exactly what you are saying here is already implemented in JDBM thanks
to  the latest changes. How is what you are implementing is different?

>
> The consequence is that searches will be very fast. It comes to a price
> though : we keep a track on every revision, until it's not used anymore.
> This is done by copying every modified pages when applying some
> modification.
>
> As of today, the addition operation and the find operation is working just
> fine. I conducted some benchmark on additions, and it seems that the system
> is pretty decent.
>
> A *lot* remains to be done :
> - deletion must be implemented
> - browsing the tree is not yet implemented
> - it's all in memory atm
> - we must add some semaphore for concurrent modifications
> - a GC must be added to discard unused pages
>
> But most of all, as it's a in-memory btree atm, I must add the disk layer.
> It will be based on Memory Mapped files.
>
> Once those preliminary works will be done, the idea is to use this
> implementation to replace JDBM. That would make the server consistent, and
> we may then use it without the in-memory txn layer.
>
> Not to say that this txn layer is useless; using a MVCC btree based backend
> is *not* enough : we have no way to guarantee the atomicity of move
> operation across partitions.

Txn layer enables us to group multiple atomic atomic changes and
consistent search into a single atomic unit. For example, when you
change an entry, change to the entry and indices happen as an atomic
unit. Do you have code to provide this at your new implementation?

>
> This work has been done in my sandbox, where you can follow the work in
> progress :
> http://svn.apache.org/viewvc/directory/sandbox/elecharny/shared-mvbt
>
> At the same time, thanks to Pierre-Arnaud, a first milstone of Studio 2.0
> has been released, and it exposed some nasty bugs in the LDAP API. Which is
> actually a good thing : we can fix them !
>
> So keep tuned, a lot of new things are coming soon !
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>