Posted to kerby@directory.apache.org by "Zheng, Kai" <ka...@intel.com> on 2015/09/21 13:29:37 UTC

Transaction support for Kerby backend

Hi all,

This is a proposal to add a transaction support API to the Kerby backend for efficiency. Kerby provides various backends, some of which are file based, like the Json one. I'm attempting to add another one based on the Google Flatbuffers format. Backends built on a simple file would benefit from transaction support. In the existing code, every call to addIdentity/updateIdentity/deleteIdentity has to write the in-memory buffer/state to the disk file, which is quite inefficient.

To keep it simple, the following would be good enough:

1.       A backend instance allows only one transaction at a time;

2.       While it is in a transaction, any mutation operation through the existing non-transaction API is denied;

3.       In a transaction, multiple mutation operations can be made via the new transaction API, and the state is updated only in memory, with no store/save/flush to the disk file;

4.       When the transaction ends, the in-memory state is persisted/synced to the disk file; the updated content then becomes visible to other backend instances when they reload;

5.       For backends that use a system already supporting transactions, like Mavibot, LDAP and Zookeeper, the new transaction API will have a default implementation that is a no-op.
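
The five points above could be sketched roughly as follows. This is only an illustration, not Kerby's actual API: the names TransactionalBackend, BatchingBackend, beginTransaction, commitTransaction and addIdentityInTransaction are hypothetical, identities are reduced to strings, and the disk write is replaced by a counter. Rollback is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface (not an existing Kerby type). */
interface TransactionalBackend {
    // Default no-ops, for backends whose underlying system
    // already supports transactions (point 5).
    default void beginTransaction() { }
    default void commitTransaction() { }
}

/** Hypothetical file-style backend that batches writes in memory. */
class BatchingBackend implements TransactionalBackend {
    private final Map<String, String> identities = new HashMap<>();
    private boolean inTransaction = false;
    private int flushCount = 0; // stands in for real writes to the disk file

    @Override
    public synchronized void beginTransaction() {
        // Point 1: only one transaction at a time per backend instance.
        if (inTransaction) {
            throw new IllegalStateException("A transaction is already active");
        }
        inTransaction = true;
    }

    /** Existing-style mutation: flushes on every call, denied in a transaction (point 2). */
    public synchronized void addIdentity(String principal, String data) {
        if (inTransaction) {
            throw new IllegalStateException("Denied: a transaction is active");
        }
        identities.put(principal, data);
        flush();
    }

    /** Transaction mutation: updates memory only, no flush (point 3). */
    public synchronized void addIdentityInTransaction(String principal, String data) {
        if (!inTransaction) {
            throw new IllegalStateException("No active transaction");
        }
        identities.put(principal, data);
    }

    @Override
    public synchronized void commitTransaction() {
        if (!inTransaction) {
            throw new IllegalStateException("No active transaction");
        }
        flush(); // Point 4: persist once, when the transaction ends.
        inTransaction = false;
    }

    private void flush() {
        flushCount++; // placeholder for serializing the map to the file
    }

    public synchronized int getFlushCount() {
        return flushCount;
    }
}
```

With this shape, adding many identities inside one transaction costs a single flush at commit instead of one flush per call.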

I'd like to wait a while for feedback before proceeding, since I'm not an expert in such system designs. I want it to be practical, simple, efficient, and easy to do.
Thanks.

Regards,
Kai

Re: Transaction support for Kerby backend

Posted by Emmanuel Lécharny <el...@gmail.com>.
On 25/09/15 10:02, Zheng, Kai wrote:
> Thanks Emmanuel for the good thoughts. You explained clearly what a transaction means. I will look into the existing Kerby backends and consider how to define/refine the transaction semantics. Generally, we would delegate transaction support to the underlying system if the backend is just a thin wrapper around it; otherwise, as with flat-file-based backends, we need to come up with a simple approach that supports transactions and operations efficiently and reliably. I will come back to this later. Thanks.

Ideally speaking, you should be the one defining the transaction
boundaries (i.e. start and end). If the underlying backend does not support
that, then it's not the right backend, and you will have to work around it.


RE: Transaction support for Kerby backend

Posted by "Zheng, Kai" <ka...@intel.com>.
Thanks Emmanuel for the good thoughts. You explained clearly what a transaction means. I will look into the existing Kerby backends and consider how to define/refine the transaction semantics. Generally, we would delegate transaction support to the underlying system if the backend is just a thin wrapper around it; otherwise, as with flat-file-based backends, we need to come up with a simple approach that supports transactions and operations efficiently and reliably. I will come back to this later. Thanks.

Regards,
Kai

-----Original Message-----
From: Emmanuel Lécharny [mailto:elecharny@gmail.com] 
Sent: Thursday, September 24, 2015 10:22 PM
To: kerby@directory.apache.org
Subject: Re: Transaction support for Kerby backend

So, let's talk about what a transaction is in the Kerby context.

Let's first talk about what a transaction is in ApacheDS, just for the record. In ApacheDS, when we update an entry, we impact many B-trees (the Master table, which holds the entries, and the various indexes). We currently don't have a cross-B-tree transaction, but we are working on it. The idea is that either all the B-trees are updated as a whole, or none of them are. That keeps the LDAP server consistent. It also implies that reads aren't impacted by writes.
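
As a rough illustration of "all updated as a whole, or none", here is a minimal Java sketch. It is not how ApacheDS or Mavibot are implemented: two plain maps stand in for the Master table and one index, and the names Snapshot and SnapshotStore are made up for the example. A writer builds new copies of both maps and publishes them with a single reference swap, so a reader sees either the old pair or the new pair, never a mix.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Immutable view of two related structures (stand-ins for a master table and an index). */
final class Snapshot {
    final Map<String, String> master; // id -> entry name
    final Map<String, String> index;  // entry name -> id

    Snapshot(Map<String, String> master, Map<String, String> index) {
        this.master = Collections.unmodifiableMap(master);
        this.index = Collections.unmodifiableMap(index);
    }
}

/** Hypothetical store: every update replaces both structures at once. */
class SnapshotStore {
    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(new HashMap<>(), new HashMap<>()));

    /** Readers take the current snapshot; later writes do not change it. */
    Snapshot read() {
        return current.get();
    }

    /** Updates both structures on private copies, then publishes them in one step. */
    synchronized void addEntry(String id, String name) {
        Snapshot old = current.get();
        Map<String, String> master = new HashMap<>(old.master);
        Map<String, String> index = new HashMap<>(old.index);
        master.put(id, name);
        if (index.putIfAbsent(name, id) != null) {
            // Failure before publishing: the copies are discarded, nothing changes.
            throw new IllegalArgumentException("Duplicate name: " + name);
        }
        current.set(new Snapshot(master, index));
    }
}
```

Copying whole maps on every write is only acceptable for a small example; the point is the single publication step, not the copying strategy.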

In a Kerberos context, the problem is exactly the same, as many elements might be modified when a new element is injected.

How I see it:
- If the underlying repository (or backend) does not support transactions, then you are in a bit of trouble. Typically, if the backend uses flat files, you have to copy the files when updating them, which might be a costly operation, then swap the files when done, and let new readers use the new files while the existing readers keep going with the old files... Not simple!
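
The copy-then-swap step for a single flat file can be sketched in Java with the standard java.nio.file API. This is a minimal sketch under assumptions: the class name FileSwapper and its methods are hypothetical, the whole content is rewritten on each update, and coordination between concurrent writers is not handled. Whether a reader that already has the old file open keeps seeing the old content depends on the operating system and file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: writes a new copy of the file, then swaps it into place. */
class FileSwapper {

    /** Writes content to a temporary file in the same directory, then renames it over target. */
    static void replaceAtomically(Path target, String content) {
        try {
            Path dir = target.toAbsolutePath().getParent();
            // Same directory as the target, so the rename stays on one file system.
            Path tmp = Files.createTempFile(dir, "backend-", ".tmp");
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            // With ATOMIC_MOVE, replacing an existing target is implementation specific;
            // on typical POSIX file systems the rename replaces it.
            Files.move(tmp, target,
                    StandardCopyOption.ATOMIC_MOVE,
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole file as UTF-8 text. */
    static String read(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

New readers that open the target after the rename get the new content in full; they never observe a half-written file, because the partial writes only ever touch the temporary file.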

JDBM has the exact same problem: it supports transactions at the B-tree level, but not across B-trees, which makes it the wrong backend (as all of us realized 3 years ago :/), and this is the reason we started Mavibot!

So I think Kerby *NEEDS* to define a transaction layer on top of the backends, but it also has to ensure that the backends support transactions.

Thoughts?

