Posted to dev@directory.apache.org by Selcuk AYA <ay...@gmail.com> on 2012/05/07 11:04:45 UTC

merging txn branch back into the trunk

Hi All,

I think we have a good checkpoint for the transaction branch and we
can merge it back into the trunk. There are significant changes to the
server code, and it would be good to review them and understand the
impact as much as possible. Below is a summary of the general
motivation, what has been done so far, and its effect on the current
code.


****CHANGES AFFECTING THE CURRENT  CODE*********

*** The general motive is to implement a logical transaction layer
above partitions. This transaction layer keeps a log of logical
changes. It works above partitions and expects partitions to conform
to a certain data model and give some consistency guarantees:

    1) Partitions should expose a MasterTable which stores Entry
       objects. They should also expose the expected system indices
       and can expose user indices. Master and Index tables are
       basically (key, value) tables.
    2) Supported modification operations on Master and Index tables
       should be atomic.
    3) Lookup on Master and Index tables by their key should return
       consistent data.
    4) Scan on Master and Index tables should return committed data
       (this does not necessarily mean scan will return data with
       snapshot consistency; it might return data committed since
       it began scanning).
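
To make the four guarantees concrete, here is a minimal Java sketch of a (key, value) table that satisfies them. SketchTable and its method names are hypothetical and are not the MasterTable or Index API in the branch; a String stands in for an Entry. It relies on java.util.concurrent.ConcurrentSkipListMap, whose iterators are weakly consistent, which roughly matches guarantee 4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical (key, value) table; NOT the ApacheDS MasterTable or Index API.
class SketchTable<K extends Comparable<K>, V> {
    private final ConcurrentSkipListMap<K, V> rows = new ConcurrentSkipListMap<>();

    // Guarantee 2: a single put or remove is atomic for concurrent readers.
    void put(K key, V value) {
        rows.put(key, value);
    }

    void remove(K key) {
        rows.remove(key);
    }

    // Guarantee 3: a lookup by key never observes a half-applied update.
    V get(K key) {
        return rows.get(key);
    }

    // Guarantee 4: the scan is weakly consistent. It is not a snapshot and
    // may include rows written after the scan started.
    List<V> scan() {
        return new ArrayList<>(rows.values());
    }
}

class SketchTableDemo {
    public static void main(String[] args) {
        // A String stands in for an Entry object in the master table.
        SketchTable<UUID, String> master = new SketchTable<>();
        UUID id = UUID.randomUUID();
        master.put(id, "cn=test,dc=example,dc=com");
        System.out.println(master.get(id));
        System.out.println(master.scan().size());
    }
}
```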



Towards this goal, the following changes have been made:

    1) JDBM has been changed to provide the above consistency
       guarantees (done and merged separately).
    2) The AVL partition has been rewritten to provide the above
       consistency guarantees.
    3) The Partition interface has been changed to expose its
       MasterTable, the necessary indices, and user indices.


***Another goal in this effort was to move the operation execution
logic above partitions so that transactions could be implemented
independent of partitions with the above consistency guarantees.
Towards this goal, DefaultOperationExecutionManager has been
introduced. This mostly copies the logic in AbstractBTreePartition and
executes updates on master tables and indices using log edits rather
than direct updates on them. These log edits are then applied to
partitions using the atomic modifications on Master and Index tables
exposed by partitions. This is the class we really need to review
carefully to ensure things are OK.
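
As a rough illustration of "log edits rather than direct updates", the Java sketch below records edits in a per-operation log and only later replays them against a table. All names here (SketchEdit, AddEntryEdit, DeleteEntryEdit, SketchTxnLog) are made up for the example and do not correspond to classes in the branch; a plain Map stands in for the partition's master table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical logical edit; the real log edit classes live in the branch.
interface SketchEdit {
    void applyTo(Map<UUID, String> masterTable);
}

class AddEntryEdit implements SketchEdit {
    private final UUID id;
    private final String entry;

    AddEntryEdit(UUID id, String entry) {
        this.id = id;
        this.entry = entry;
    }

    public void applyTo(Map<UUID, String> masterTable) {
        masterTable.put(id, entry);
    }
}

class DeleteEntryEdit implements SketchEdit {
    private final UUID id;

    DeleteEntryEdit(UUID id) {
        this.id = id;
    }

    public void applyTo(Map<UUID, String> masterTable) {
        masterTable.remove(id);
    }
}

// Collects the edits of one operation instead of touching the table directly.
class SketchTxnLog {
    private final List<SketchEdit> edits = new ArrayList<>();

    void record(SketchEdit edit) {
        edits.add(edit);
    }

    // Replays the recorded edits, in order, against the partition's table.
    void applyTo(Map<UUID, String> masterTable) {
        for (SketchEdit edit : edits) {
            edit.applyTo(masterTable);
        }
    }
}

class SketchTxnLogDemo {
    public static void main(String[] args) {
        Map<UUID, String> masterTable = new HashMap<>();
        SketchTxnLog log = new SketchTxnLog();
        UUID id = UUID.randomUUID();
        log.record(new AddEntryEdit(id, "cn=test,dc=example,dc=com"));
        // The table is untouched until the log is applied.
        System.out.println(masterTable.containsKey(id));
        log.applyTo(masterTable);
        System.out.println(masterTable.get(id));
    }
}
```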

***Partitions still can use any search engine they want and get
transactionally consistent data using the methods exposed by
TxnManager. For DefaultSearchEngine, this is done using Index and
MasterTable wrappers which merge what is read from the underlying
partitions with what is in the txn logs. So the search engine code
works pretty much without any significant change (except the changes
to remove generics and use UUID as described below).
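
The merging idea can be sketched as follows, assuming a read first consults the transaction's pending changes and falls through to the partition otherwise. MergedMasterView, logPut and logDelete are hypothetical names, not the wrapper classes in the branch, and a String again stands in for an Entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical read wrapper; not the wrapper classes used in the branch.
class MergedMasterView {
    // Committed data as stored by the partition.
    private final Map<UUID, String> partition;
    // Pending changes of the current txn; an empty Optional marks a delete.
    private final Map<UUID, Optional<String>> pending = new HashMap<>();

    MergedMasterView(Map<UUID, String> partition) {
        this.partition = partition;
    }

    void logPut(UUID id, String entry) {
        pending.put(id, Optional.of(entry));
    }

    void logDelete(UUID id) {
        pending.put(id, Optional.empty());
    }

    // The txn log wins over the partition; otherwise read the partition.
    String lookup(UUID id) {
        Optional<String> logged = pending.get(id);
        if (logged != null) {
            return logged.orElse(null);
        }
        return partition.get(id);
    }
}

class MergedMasterViewDemo {
    public static void main(String[] args) {
        Map<UUID, String> partition = new HashMap<>();
        UUID committed = UUID.randomUUID();
        partition.put(committed, "cn=old,dc=example,dc=com");

        MergedMasterView view = new MergedMasterView(partition);
        UUID added = UUID.randomUUID();
        view.logPut(added, "cn=new,dc=example,dc=com");
        view.logDelete(committed);

        System.out.println(view.lookup(added));
        System.out.println(view.lookup(committed));
    }
}
```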

*** To make implementing the transactional layer easier, UUID is used as
the key for all partitions. A side effect of this is that generics are
mostly removed from the code. So, for example, the Index interface is
changed to:
         Index<K>:
          UUID forwardLookup( K attrVal ) throws Exception;
          K reverseLookup( UUID id ) throws Exception;
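
For illustration, a single-valued in-memory implementation of these two methods might look like the sketch below. SketchIndex, InMemoryIndex and the add() helper are hypothetical; the real Index interface has more methods and may map one key to several entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Mirrors the two lookup methods quoted above; add() is a hypothetical helper.
interface SketchIndex<K> {
    UUID forwardLookup(K attrVal) throws Exception;

    K reverseLookup(UUID id) throws Exception;

    void add(K attrVal, UUID id) throws Exception;
}

// Single-valued in-memory index, for illustration only.
class InMemoryIndex<K> implements SketchIndex<K> {
    private final Map<K, UUID> forward = new HashMap<>();
    private final Map<UUID, K> reverse = new HashMap<>();

    public UUID forwardLookup(K attrVal) {
        return forward.get(attrVal);
    }

    public K reverseLookup(UUID id) {
        return reverse.get(id);
    }

    // Keeps both directions in step so the two lookups agree.
    public void add(K attrVal, UUID id) {
        forward.put(attrVal, id);
        reverse.put(id, attrVal);
    }
}

class InMemoryIndexDemo {
    public static void main(String[] args) {
        InMemoryIndex<String> cnIndex = new InMemoryIndex<>();
        UUID id = UUID.randomUUID();
        cnIndex.add("akarasulu", id);
        System.out.println(cnIndex.forwardLookup("akarasulu").equals(id));
        System.out.println(cnIndex.reverseLookup(id));
    }
}
```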

*** There was a sync method which we called on partitions every x ms.
This has been removed (flushing of partitions handles it).


*** All logical data changes (including schema registries) are
implemented using a single lock. Locks for individual caches are
removed (for example, the referral management lock is no longer used).


********OTHER CHANGES*******************

The rest of the changes have less impact on the existing code. The guts
of the transaction management system are implemented in the core.txn
package. It is OCC+MVCC (except for the gross hack we use for logical data
handling). It uses a logging system to do write-ahead logging (WAL).
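
For readers less familiar with optimistic concurrency control, the validate-at-commit idea can be sketched in Java as below. This is a generic textbook-style illustration, not the code in core.txn: OccSketchStore and its methods are hypothetical, the store keeps a single version per key, and the multi-version read side and the WAL are not shown.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical single-version store with optimistic validation at commit.
class OccSketchStore {
    static class Txn {
        final long startVersion;
        final Set<String> readSet = new HashSet<>();
        final Map<String, String> writes = new HashMap<>();

        Txn(long startVersion) {
            this.startVersion = startVersion;
        }
    }

    private final Map<String, String> values = new HashMap<>();
    // Commit version of the last txn that wrote each key.
    private final Map<String, Long> lastWriter = new HashMap<>();
    private long version = 0;

    synchronized Txn begin() {
        return new Txn(version);
    }

    // Reads see the txn's own buffered writes first and are tracked for validation.
    synchronized String read(Txn txn, String key) {
        if (txn.writes.containsKey(key)) {
            return txn.writes.get(key);
        }
        txn.readSet.add(key);
        return values.get(key);
    }

    // Writes are buffered; nothing is visible to other txns before commit.
    void write(Txn txn, String key, String value) {
        txn.writes.put(key, value);
    }

    // Aborts (returns false) if a key this txn read was committed after it began.
    synchronized boolean commit(Txn txn) {
        for (String key : txn.readSet) {
            if (lastWriter.getOrDefault(key, 0L) > txn.startVersion) {
                return false;
            }
        }
        version++;
        for (Map.Entry<String, String> w : txn.writes.entrySet()) {
            values.put(w.getKey(), w.getValue());
            lastWriter.put(w.getKey(), version);
        }
        return true;
    }
}
```

In this sketch a transaction never blocks while it runs; conflicts are only detected when it tries to commit, and the caller is expected to retry an aborted transaction.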


******OPEN ISSUES***********

Will send another email about these.

******MERGING********

- One and Sub level index removal will probably impact
DefaultOperationExecutionManager.
- From emails, I understand the Index interface assuming UUID might be
a problem with the recent changes. Maybe this needs to be changed too.


Either before or both before and after the merge, we should run a test
with concurrent threads (read+write) and clear out any remaining
issues.
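
The shape of such a test could look like the Java sketch below. It is only a skeleton under stated assumptions: a ConcurrentHashMap stands in for the server, and ConcurrentReadWriteSketch and its parameters are hypothetical. A real test would drive add/modify/search operations through the server and check stronger invariants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrentReadWriteSketch {
    // Runs writers and readers together; returns the number of inconsistent reads.
    static int run(int writers, int readers, int keysPerWriter) {
        Map<Integer, String> table = new ConcurrentHashMap<>();
        AtomicInteger violations = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(writers + readers);
        List<Future<?>> tasks = new ArrayList<>();
        int totalKeys = writers * keysPerWriter;

        for (int w = 0; w < writers; w++) {
            final int base = w * keysPerWriter;
            // Each writer owns a disjoint key range.
            tasks.add(pool.submit(() -> {
                for (int i = 0; i < keysPerWriter; i++) {
                    table.put(base + i, "entry-" + (base + i));
                }
            }));
        }
        for (int r = 0; r < readers; r++) {
            tasks.add(pool.submit(() -> {
                for (int key = 0; key < totalKeys; key++) {
                    String value = table.get(key);
                    // A key may be absent, but a present value must be the right one.
                    if (value != null && !value.equals("entry-" + key)) {
                        violations.incrementAndGet();
                    }
                }
            }));
        }
        try {
            for (Future<?> task : tasks) {
                task.get();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return violations.get();
    }

    public static void main(String[] args) {
        System.out.println("violations: " + run(4, 4, 1000));
    }
}
```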


*******TODO********

After the merge, I will work on implementing the crash recovery part,
which will complete the transactional layer changes.

thanks
Selcuk

Re: merging txn branch back into the trunk

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/7/12 11:04 AM, Selcuk AYA a écrit :
> Hi All,
>
> I think we have a good checkpoint for the transaction branch and we
> can merge it back into the trunk. There are significant changes to the
> server code and it would be good to review this and understand what
> the impact will be as much as possible. Below is a summary of the
> general motivation, what has been done so far, and its effect on
> the current code.
Ok, let me just cut the release (well, as soon as I can get through the
tests without having some failure with the cache :/), then we can merge
the branch back into the trunk.

I have to run now, I haven't yet read the full mail, will do that later 
today.


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: merging txn branch back into the trunk

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 5/8/12 7:18 AM, Selcuk AYA a écrit :
> On Mon, May 7, 2012 at 3:50 PM, Emmanuel Lécharny<el...@gmail.com>  wrote:
>> Hi,
>>
>> A few comments inline...
>>
>>
>> Le 5/7/12 11:04 AM, Selcuk AYA a écrit :
>>> Hi All,
>>>
>>> ****CHANGES AFFECTING THE CURRENT  CODE*********
>>>
>>> *** The general motive is to implement a logical transaction layer
>>> above partitions. This transaction layer keeps a log of logical
>>> changes. It works above partitions and expects partitions to  conform
>>> to a certain data model and give some consistency guarantees:
>>>
>>>                         1) Partitions should expose a MasterTable which
>>> stores Entry objects. They also should expose the expected system
>>> indices and can expose user indices. Master and Index tables are
>>> basically (key, value) tables.
>>>                         2) Supported modification operations on Master
>>> and Index tables should be atomic
>> RdnIndex may be updated more than once when adding an entry, due to the
>> 1/SubLevel indexes. If this breaks atomicity, then we have to deal with that.
>> (see below)
> Multiple updates should not be an issue.
>>>
>>> ***Another goal in this effort was to move the operation execution
>>> logic above partitions so that transactions could be implemented
>>> independent of partitions with the above consistency guarantees.
>>> Towards this goal, DefaultOperationExecutionManager has been
>>> introduced. This mostly copies the logic in AbstractBTreePartition and
>>> executes updates on master tables and indices using log edits rather
>>> than direct updates on them. These log edits are then applied to
>>> partitions using the atomic modifications on Master and Index tables
>>> exposed by partitions. This class is where we really need to review
>>> well to ensure things are OK.
>> Ok, will review this.
>>
>>> ******MERGING********
>>>
>>> - One and Sub level index removal will probably impact
>>> DefaultOperationExecutionManager.
>> The question is to what extent these changes will impact the code.
>>
>>> - From emails, I understand the Index interface assuming UUID might be
>>> a problem with the recent changes. Maybe this needs to be changed too.
>> Hmm, not sure. Getting rid of any other type and using UUID instead is
>> the right choice. All the indexes contain relations between a key and some
>> Entry's UUIDs (forward index), and between a UUID and some keys (reverse
>> index). The reverse index is used by the search engine.
>>
>> Do you see any issue with that ?
> What you described is what is done in the branch right now. I thought
> you were going to store nbdescendants and other auxiliary information
> along with UUID. That is why I thought the change would be necessary.
> How is this auxiliary information stored now?
Currently, the RdnIndex has 2 internal indexes (forward and reverse),
which contain <ParentIdAndRdn, UUID> and <UUID, ParentIdAndRdn> tuples.

The ParentIdAndRdn is a data structure containing:
- the RDN's parent ID (which will become a UUID)
- the entry's RDN
- the number of children
- the number of descendants

For instance, suppose you have to store
cn=akarasulu,dc=apache,dc=example,dc=com in the RDN index. Assume
that the RootDSE UUID is 0 (I'm using integers for clarity's sake; the
real ID would be 00000000-0000-0000-0000-000000000000), that
dc=example,dc=com is the partition suffix with a UUID of 1, that
dc=apache is an entry whose UUID is 2, and that cn=akarasulu is an
entry whose UUID is 3. The RdnIndex table will then contain:

Forward index:
<0, 'dc=example,dc=com', 1, 2> --> 1 (this entry - the suffix - has one
child and 2 descendants, and its ParentID is 0)
<1, 'dc=apache', 1, 1> --> 2 (this entry has one child and 1
descendant, and its ParentID is 1)
<2, 'cn=akarasulu', 0, 0> --> 3 (this entry has no child and no
descendant, and its ParentID is 2)

Reverse index:
1 --> <0, 'dc=example,dc=com', 1, 2>
2 --> <1, 'dc=apache', 1, 1>
3 --> <2, 'cn=akarasulu', 0, 0>

If we add a new entry under dc=apache, with UUID 4, then the RdnIndex
becomes:

Forward index:
<0, 'dc=example,dc=com', 1, 3> --> 1 (this entry - the suffix - has one
child and 3 descendants, and its ParentID is 0)
<1, 'dc=apache', 2, 2>         --> 2 (this entry has two children and
2 descendants, and its ParentID is 1)
<2, 'cn=akarasulu', 0, 0>      --> 3 (this entry has no child and no
descendant, and its ParentID is 2)
<2, 'cn=saya', 0, 0>           --> 4 (this entry has no child and no
descendant, and its ParentID is 2 too)

Reverse index:
1 --> <0, 'dc=example,dc=com', 1, 3>
2 --> <1, 'dc=apache', 2, 2>
3 --> <2, 'cn=akarasulu', 0, 0>
4 --> <2, 'cn=saya', 0, 0>

I don't think that switching from a Long to a UUID here makes any
difference.
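
The bookkeeping in the example above can be sketched in Java as follows. This is an illustration only: SketchParentIdAndRdn and SketchRdnIndex are hypothetical names, long IDs are used as in the example, and the forward key is assumed to be just the parent ID plus the RDN, with the counters carried alongside rather than taking part in lookups.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ParentIdAndRdn; long IDs are used as in the example.
class SketchParentIdAndRdn {
    final long parentId;
    final String rdn;
    int nbChildren;
    int nbDescendants;

    SketchParentIdAndRdn(long parentId, String rdn) {
        this.parentId = parentId;
        this.rdn = rdn;
    }
}

class SketchRdnIndex {
    // Forward key is "parentId/rdn": an assumption that lookups ignore the counters.
    private final Map<String, Long> forward = new HashMap<>();
    private final Map<Long, SketchParentIdAndRdn> reverse = new HashMap<>();

    void add(long id, long parentId, String rdn) {
        forward.put(parentId + "/" + rdn, id);
        reverse.put(id, new SketchParentIdAndRdn(parentId, rdn));

        // The direct parent (if it is in this index) gains one child.
        SketchParentIdAndRdn parent = reverse.get(parentId);
        if (parent != null) {
            parent.nbChildren++;
        }

        // Every ancestor up to the suffix gains one descendant.
        SketchParentIdAndRdn ancestor = parent;
        while (ancestor != null) {
            ancestor.nbDescendants++;
            ancestor = reverse.get(ancestor.parentId);
        }
    }

    Long forwardLookup(long parentId, String rdn) {
        return forward.get(parentId + "/" + rdn);
    }

    SketchParentIdAndRdn reverseLookup(long id) {
        return reverse.get(id);
    }
}

class SketchRdnIndexDemo {
    public static void main(String[] args) {
        SketchRdnIndex index = new SketchRdnIndex();
        index.add(1, 0, "dc=example,dc=com");
        index.add(2, 1, "dc=apache");
        index.add(3, 2, "cn=akarasulu");
        index.add(4, 2, "cn=saya");
        SketchParentIdAndRdn suffix = index.reverseLookup(1);
        System.out.println(suffix.nbChildren + " " + suffix.nbDescendants);
    }
}
```

Note that a single add touches several rows of the index (the new entry plus each ancestor), which is the multiple-update case raised earlier in this thread.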


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Re: merging txn branch back into the trunk

Posted by Selcuk AYA <ay...@gmail.com>.
On Mon, May 7, 2012 at 3:50 PM, Emmanuel Lécharny <el...@gmail.com> wrote:
> Hi,
>
> A few comments inline...
>
>
> Le 5/7/12 11:04 AM, Selcuk AYA a écrit :
>>
>> Hi All,
>>
>> ****CHANGES AFFECTING THE CURRENT  CODE*********
>>
>> *** The general motive is to implement a logical transaction layer
>> above partitions. This transaction layer keeps a log of logical
>> changes. It works above partitions and expects partitions to  conform
>> to a certain data model and give some consistency guarantees:
>>
>>                        1) Partitions should expose a MasterTable which
>> stores Entry objects. They also should expose the expected system
>> indices and can expose user indices. Master and Index tables are
>> basically (key, value) tables.
>>                        2) Supported modification operations on Master
>> and Index tables should be atomic
>
> RdnIndex may be updated more than once when adding an entry, due to the
> 1/SubLevel indexes. If this breaks atomicity, then we have to deal with that.
> (see below)

Multiple updates should not be an issue.
>
>>
>>
>> ***Another goal in this effort was to move the operation execution
>> logic above partitions so that transactions could be implemented
>> independent of partitions with the above consistency guarantees.
>> Towards this goal, DefaultOperationExecutionManager has been
>> introduced. This mostly copies the logic in AbstractBTreePartition and
>> executes updates on master tables and indices using log edits rather
>> than direct updates on them. These log edits are then applied to
>> partitions using the atomic modifications on Master and Index tables
>> exposed by partitions. This class is where we really need to review
>> well to ensure things are OK.
>
> Ok, will review this.
>
>> ******MERGING********
>>
>> - One and Sub level index removal will probably impact
>> DefaultOperationExecutionManager.
>
> The question is to what extent these changes will impact the code.
>
>> - From emails, I understand the Index interface assuming UUID might be
>> a problem with the recent changes. Maybe this needs to be changed too.
>
> Hmm, not sure. Getting rid of any other type and using UUID instead is
> the right choice. All the indexes contain relations between a key and some
> Entry's UUIDs (forward index), and between a UUID and some keys (reverse
> index). The reverse index is used by the search engine.
>
> Do you see any issue with that ?

What you described is what is done in the branch right now. I thought
you were going to store nbdescendants and other auxiliary information
along with UUID. That is why I thought the change would be necessary.
How is this auxiliary information stored now?

>
>>
>>
>> Either before or both before and after the merge, we should run a test
>> with concurrent threads(read+write) and clear out any remaining
>> issues.
>
> Ok.
>
>>
>>
>> *******TODO********
>>
>> After the merge, I will work on implementing crash recovery part which
>> will complete the transactional layer changes.
>
> Fine !
>
> Great job !
>
>
>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>

Re: merging txn branch back into the trunk

Posted by Emmanuel Lécharny <el...@gmail.com>.
Hi,

A few comments inline...

Le 5/7/12 11:04 AM, Selcuk AYA a écrit :
> Hi All,
>
> ****CHANGES AFFECTING THE CURRENT  CODE*********
>
> *** The general motive is to implement a logical transaction layer
> above partitions. This transaction layer keeps a log of logical
> changes. It works above partitions and expects partitions to  conform
> to a certain data model and give some consistency guarantees:
>
>                         1) Partitions should expose a MasterTable which
> stores Entry objects. They also should expose the expected system
> indices and can expose user indices. Master and Index tables are
> basically (key, value) tables.
>                         2) Supported modification operations on Master
> and Index tables should be atomic
RdnIndex may be updated more than once when adding an entry, due to the
1/SubLevel indexes. If this breaks atomicity, then we have to deal with
that. (see below)
>
>
> ***Another goal in this effort was to move the operation execution
> logic above partitions so that transactions could be implemented
> independent of partitions with the above consistency guarantees.
> Towards this goal, DefaultOperationExecutionManager has been
> introduced. This mostly copies the logic in AbstractBTreePartition and
> executes updates on master tables and indices using log edits rather
> than direct updates on them. These log edits are then applied to
> partitions using the atomic modifications on Master and Index tables
> exposed by partitions. This class is where we really need to review
> well to ensure things are OK.
Ok, will review this.
> ******MERGING********
>
> - One and Sub level index removal will probably impact
> DefaultOperationExecutionManager.
The question is to what extent these changes will impact the code.
> - From emails, I understand the Index interface assuming UUID might be
> a problem with the recent changes. Maybe this needs to be changed too.
Hmm, not sure. Getting rid of any other type and using UUID instead
is the right choice. All the indexes contain relations between a key and
some Entry's UUIDs (forward index), and between a UUID and some keys
(reverse index). The reverse index is used by the search engine.

Do you see any issue with that ?
>
>
> Either before or both before and after the merge, we should run a test
> with concurrent threads(read+write) and clear out any remaining
> issues.
Ok.
>
>
> *******TODO********
>
> After the merge, I will work on implementing crash recovery part which
> will complete the transactional layer changes.
Fine !

Great job !



-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com