Posted to user@hbase.apache.org by Daniel Gómez Ferro <da...@yahoo-inc.com> on 2011/11/04 12:24:59 UTC

Omid: Transactional Support for HBase

(I apologize for resending but I forgot to add the user list.)

Hi all,

It is my pleasure to announce the open-source release of Omid, a project whose goal is to add lock-free transactional support on top of HBase. The current release includes CrSO, a client-replicated status oracle that detects write-write conflicts to provide Snapshot Isolation. CrSO has the following appealing properties:

1) It requires no modification to the HBase code or to the table schema.
2) Its overhead on HBase DataNodes is negligible (incurred only after an abort).
3) It scales up to 50,000 write transactions per second (TPS) and a thousand client connections.
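
As a rough illustration of what detecting write-write conflicts for snapshot isolation involves, here is a minimal in-memory sketch. It is not Omid's code: the class name StatusOracle, the methods begin() and try_commit(), and the choice of tracking conflicts per row key are all assumptions made for this example.

```python
class StatusOracle:
    """Toy, in-memory stand-in for a status oracle (hypothetical API)."""

    def __init__(self):
        self._clock = 0         # single counter for start and commit timestamps
        self._last_commit = {}  # row key -> commit timestamp of its last committed write

    def begin(self):
        """Hand out a start timestamp for a new transaction."""
        self._clock += 1
        return self._clock

    def try_commit(self, start_ts, written_rows):
        """Return a commit timestamp, or None if the transaction must abort.

        A transaction aborts when any row it wrote was committed by another
        transaction after this one started (a write-write conflict).
        """
        for row in written_rows:
            if self._last_commit.get(row, 0) > start_ts:
                return None
        self._clock += 1
        commit_ts = self._clock
        for row in written_rows:
            self._last_commit[row] = commit_ts
        return commit_ts


oracle = StatusOracle()
t1 = oracle.begin()
t2 = oracle.begin()
c1 = oracle.try_commit(t1, {"row-a", "row-b"})  # commits: nothing conflicts yet
c2 = oracle.try_commit(t2, {"row-b", "row-c"})  # aborts: row-b changed after t2 started
```

Note that no locks are taken: conflicts are only checked at commit time, and a transaction may write any number of rows.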

We have set up a GitHub project: https://github.com/dgomezferro/omid

More information is available at the wiki: https://github.com/dgomezferro/omid/wiki

If you are interested, installation and running instructions are available in the README: https://github.com/dgomezferro/omid/blob/master/README.md

Please do not hesitate to contact us if you have any questions.

Best Regards,
Daniel Gómez Ferro

Re: Omid: Transactional Support for HBase

Posted by Daniel Gómez Ferro <da...@yahoo-inc.com>.

On Nov 8, 2011, at 10:48 , Daniel Gómez Ferro wrote:

> Hi Jignesh
> 
> On Nov 7, 2011, at 21:44 , Jignesh Patel wrote:
> 
>> Looks like this transaction is limited for one row. Is that correct?
>> 
> 
> No, it's not. Transactions can span multiple rows.
> 
>> Another thing I don't have zookeepr installed as I am running in
>> pseudo distibuted mode. The document doesn't say anything about
>> integrating in pseudo distributed mode.
>> 
> 
> Currently Omid requires both ZooKeeper and BookKeeper to operate, but we provide some scripts to launch them locally if you just want to try it. I've just pushed a change so you don't need to install anything manually, just download/checkout Omid, run 'mvn package' and follow the instructions to run the benchmark locally.

Please remember that the repository we are using now is https://github.com/yahoo/omid/ 

> 
> If people still find cumbersome or difficult to run ZK/BK we could provide an option to disable the replication to the WAL.
> 
> Daniel
> 
>> -Jignesh
>> 
>> 2011/11/7 Daniel Gómez Ferro <da...@yahoo-inc.com>:
>>> 
>>> On Nov 6, 2011, at 21:53 , lars hofhansl wrote:
>>> 
>>>> Another question: I assume this will not work out of the box with deletes?
>>> 
>>> Hi,
>>> 
>>> Our current approach does support deletes (i.e., user requested deletes). Right now we use empty values as delete marks: when the user calls TransactionalTable.delete() we insert empty values at the specified timestamp. At the filtering time, we keep track of these delete marks and we can discard the ones that are uncommitted or fall outside our time range of interest. When a transaction aborts, the cleanup procedure deletes the specific values inserted by the transactions (in contrast to all versions). This way we don't insert delete tombstones that mask previous values.
>>> 
>>> The drawbacks of this approach are that (i) we give a special meaning to the empty values, and (ii) to delete the whole column family (in contrast with a column) we have to perform a get beforehand to obtain the column qualifiers.
>>> 
>>>> 
>>>> Deletes always cover all key values in the past (from their timestamps on backwards), so once a delete marker is placed there is no way to get back any of a puts it affects.
>>>> 
>>>> HBase trunk has HBASE-4536 to allow time-range scans to work with deleted rows (but needs to be enabled for a column family - I still think it should be the default, but anyway).
>>>> 
>>> 
>>> I think this feature would be very useful, and enables a cleaner implementation. It would be great if the flag was enabled by default, we want the user to change as little as possible his setup, but it's not a big deal.
>>> 
>>>> -- Lars
>>>> 
>>>> ________________________________
>>>> From: Flavio Junqueira <fp...@yahoo-inc.com>
>>>> To: Daniel Gómez Ferro <da...@yahoo-inc.com>
>>>> Cc: "user@hbase.apache.org" <us...@hbase.apache.org>; lars hofhansl <lh...@yahoo.com>; "dev@hbase.apache.org" <de...@hbase.apache.org>; Maysam Yabandeh <ma...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
>>>> Sent: Sunday, November 6, 2011 7:14 AM
>>>> Subject: Re: Omid: Transactional Support for HBase
>>>> 
>>>> 
>>>> A quick note on Omid for the ones following on github: the repository we will be working with is the fork under the Yahoo! account:
>>>> 
>>>> 
>>>> https://github.com/yahoo/omid/
>>>> 
>>>> -Flavio
>>>> 
>>>> 
>>>> On Nov 5, 2011, at 9:36 PM, Daniel Gómez Ferro wrote:
>>>> 
>>>> 
>>>>> 
>>>>> On Nov 5, 2011, at 05:37 , lars hofhansl wrote:
>>>>> 
>>>>> Cool stuff Daniel,
>>>>>> 
>>>>> 
>>>>> Hi Lars,
>>>>> 
>>>>> Thanks for the good points.
>>>>> 
>>>>> 
>>>>> 
>>>>>> Was looking through the code a bit. Seems like you make a best effort to push as much of
>>>>>> the filtering of KVs of uncommitted transactions to HBase and then do some filtering on the client
>>>>>> not a bad approach. (I hope I didn't misunderstand the approach, only looked through the code for
>>>>>> 1/2 hour or so).
>>>>>> 
>>>>> 
>>>>> Putting it more accurately, the uncommitted KVs are stored at HBase, but it is the client's job to filter them using the commit information that it has received from the status oracle. According to snapshot isolation guarantee, all the versions that are inserted with a timestamp larger than the transaction start timestamp must be ignored, which is done by setting the time range on the client's get request sent to HBase. Since the uncommitted changes of the aborted transactions are eventually removed from HBase, the client rarely needs to fetch more than a version to reach a KV that is committed before the transaction starts (the first property of snapshot isolation).
>>>>> 
>>>>> 
>>>>>> 
>>>>>> One thing I was wondering: Why bookkeeper? Why not store the WAL itself in HBase? That way
>>>>>> you might not even need a separate server.
>>>>>> 
>>>>>> Did you see: HBaseSI (http://www.cs.uwaterloo.ca/~c15zhang/HBaseSI.pdf), they also do MVCC
>>>>>> on top of unaltered HBase/schema, although from reading that paper I get the impression that it
>>>>>> would not scale to scans touching many rows (which is where your client side filtering comes in).
>>>>>> 
>>>>> 
>>>>> 
>>>>> Thanks for the link. We had seen the other paper of the same authors (Grid2010) that shares the same bottlenecks with the recent work.
>>>>> As you pointed out correctly, the question is about performance. You could see the scalability bottleneck of 400 TPS in the evaluation section of this paper. Our approach, however, provides snapshot isolation with a negligible overhead on region servers, and could scale up to tens of thousands write transactions per second. If you are interested, a summary of techniques that we used to achieve this performance is published at SOSP'11, poster section.
>>>>> http://sigops.org/sosp/sosp11/posters/summaries/sosp11-final12.pdf
>>>>> 
>>>>> 
>>>>>> -- Lars
>>>>>> 
>>>>>> 
>>>>>> ----- Original Message -----
>>>>>> From: Daniel Gómez Ferro <da...@yahoo-inc.com>
>>>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
>>>>>> Cc: Maysam Yabandeh <ma...@yahoo-inc.com>; Flavio Junqueira <fp...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
>>>>>> Sent: Friday, November 4, 2011 4:24 AM
>>>>>> Subject: Omid: Transactional Support for HBase
>>>>>> 
>>>>>> (I apologize for resending but I forgot to add the user list.)
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> It is my pleasure to announce the open source release of Omid, a project whose goal is to add lock-free transactional support on top of HBase. The current release includes CrSO, a client-replicated status oracle that detects the write-write conflicts to provide Snapshot Isolation. CrSO has the following appealing properties:
>>>>>> 
>>>>>> 1) It does not need any modification into the HBase code nor the table scheme.
>>>>>> 2) The overhead on HBase DataNodes is negligible (only after an abort)
>>>>>> 3) It scales up to 50,000 write transactions per second (TPS) and a thousand of client connections.
>>>>>> 
>>>>>> We have setup a github project: https://github.com/dgomezferro/omid
>>>>>> 
>>>>>> More information is available at the wiki: https://github.com/dgomezferro/omid/wiki
>>>>>> 
>>>>>> If you are interested, installation and running instructions are available on the README: https://github.com/dgomezferro/omid/blob/master/README.md
>>>>>> 
>>>>>> Please do not hesitate to contact us in the case of any question.
>>>>>> 
>>>>>> Best Regards,
>>>>>> Daniel Gómez Ferro
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> flavio
>>>> junqueira
>>>> 
>>>> research scientist
>>>> 
>>>> fpj@yahoo-inc.com
>>>> direct +34 93-183-8828
>>>> 
>>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>>>> phone (408) 349 3300    fax (408) 349 3301
>>> 
>>> 
>

Re: Omid: Transactional Support for HBase

Posted by Daniel Gómez Ferro <da...@yahoo-inc.com>.

Hi Jignesh

On Nov 7, 2011, at 21:44 , Jignesh Patel wrote:

> Looks like this transaction is limited for one row. Is that correct?
> 

No, it's not. Transactions can span multiple rows.

> Another thing I don't have zookeepr installed as I am running in
> pseudo distibuted mode. The document doesn't say anything about
> integrating in pseudo distributed mode.
> 

Currently Omid requires both ZooKeeper and BookKeeper to operate, but we provide some scripts to launch them locally if you just want to try it. I've just pushed a change so you don't need to install anything manually: just download or check out Omid, run 'mvn package', and follow the instructions to run the benchmark locally.

If people still find it cumbersome or difficult to run ZK/BK, we could provide an option to disable the replication to the WAL.

Daniel

> -Jignesh
> 
> 2011/11/7 Daniel Gómez Ferro <da...@yahoo-inc.com>:
>> 
>> On Nov 6, 2011, at 21:53 , lars hofhansl wrote:
>> 
>>> Another question: I assume this will not work out of the box with deletes?
>> 
>> Hi,
>> 
>> Our current approach does support deletes (i.e., user requested deletes). Right now we use empty values as delete marks: when the user calls TransactionalTable.delete() we insert empty values at the specified timestamp. At the filtering time, we keep track of these delete marks and we can discard the ones that are uncommitted or fall outside our time range of interest. When a transaction aborts, the cleanup procedure deletes the specific values inserted by the transactions (in contrast to all versions). This way we don't insert delete tombstones that mask previous values.
>> 
>> The drawbacks of this approach are that (i) we give a special meaning to the empty values, and (ii) to delete the whole column family (in contrast with a column) we have to perform a get beforehand to obtain the column qualifiers.
>> 
>>> 
>>> Deletes always cover all key values in the past (from their timestamps on backwards), so once a delete marker is placed there is no way to get back any of a puts it affects.
>>> 
>>> HBase trunk has HBASE-4536 to allow time-range scans to work with deleted rows (but needs to be enabled for a column family - I still think it should be the default, but anyway).
>>> 
>> 
>> I think this feature would be very useful, and enables a cleaner implementation. It would be great if the flag was enabled by default, we want the user to change as little as possible his setup, but it's not a big deal.
>> 
>>> -- Lars
>>> 
>>> ________________________________
>>> From: Flavio Junqueira <fp...@yahoo-inc.com>
>>> To: Daniel Gómez Ferro <da...@yahoo-inc.com>
>>> Cc: "user@hbase.apache.org" <us...@hbase.apache.org>; lars hofhansl <lh...@yahoo.com>; "dev@hbase.apache.org" <de...@hbase.apache.org>; Maysam Yabandeh <ma...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
>>> Sent: Sunday, November 6, 2011 7:14 AM
>>> Subject: Re: Omid: Transactional Support for HBase
>>> 
>>> 
>>> A quick note on Omid for the ones following on github: the repository we will be working with is the fork under the Yahoo! account:
>>> 
>>> 
>>> https://github.com/yahoo/omid/
>>> 
>>> -Flavio
>>> 
>>> 
>>> On Nov 5, 2011, at 9:36 PM, Daniel Gómez Ferro wrote:
>>> 
>>> 
>>>> 
>>>> On Nov 5, 2011, at 05:37 , lars hofhansl wrote:
>>>> 
>>>> Cool stuff Daniel,
>>>>> 
>>>> 
>>>> Hi Lars,
>>>> 
>>>> Thanks for the good points.
>>>> 
>>>> 
>>>> 
>>>>> Was looking through the code a bit. Seems like you make a best effort to push as much of
>>>>> the filtering of KVs of uncommitted transactions to HBase and then do some filtering on the client
>>>>> not a bad approach. (I hope I didn't misunderstand the approach, only looked through the code for
>>>>> 1/2 hour or so).
>>>>> 
>>>> 
>>>> Putting it more accurately, the uncommitted KVs are stored at HBase, but it is the client's job to filter them using the commit information that it has received from the status oracle. According to snapshot isolation guarantee, all the versions that are inserted with a timestamp larger than the transaction start timestamp must be ignored, which is done by setting the time range on the client's get request sent to HBase. Since the uncommitted changes of the aborted transactions are eventually removed from HBase, the client rarely needs to fetch more than a version to reach a KV that is committed before the transaction starts (the first property of snapshot isolation).
>>>> 
>>>> 
>>>>> 
>>>>> One thing I was wondering: Why bookkeeper? Why not store the WAL itself in HBase? That way
>>>>> you might not even need a separate server.
>>>>> 
>>>>> Did you see: HBaseSI (http://www.cs.uwaterloo.ca/~c15zhang/HBaseSI.pdf), they also do MVCC
>>>>> on top of unaltered HBase/schema, although from reading that paper I get the impression that it
>>>>> would not scale to scans touching many rows (which is where your client side filtering comes in).
>>>>> 
>>>> 
>>>> 
>>>> Thanks for the link. We had seen the other paper of the same authors (Grid2010) that shares the same bottlenecks with the recent work.
>>>> As you pointed out correctly, the question is about performance. You could see the scalability bottleneck of 400 TPS in the evaluation section of this paper. Our approach, however, provides snapshot isolation with a negligible overhead on region servers, and could scale up to tens of thousands write transactions per second. If you are interested, a summary of techniques that we used to achieve this performance is published at SOSP'11, poster section.
>>>> http://sigops.org/sosp/sosp11/posters/summaries/sosp11-final12.pdf
>>>> 
>>>> 
>>>>> -- Lars
>>>>> 
>>>>> 
>>>>> ----- Original Message -----
>>>>> From: Daniel Gómez Ferro <da...@yahoo-inc.com>
>>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
>>>>> Cc: Maysam Yabandeh <ma...@yahoo-inc.com>; Flavio Junqueira <fp...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
>>>>> Sent: Friday, November 4, 2011 4:24 AM
>>>>> Subject: Omid: Transactional Support for HBase
>>>>> 
>>>>> (I apologize for resending but I forgot to add the user list.)
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> It is my pleasure to announce the open source release of Omid, a project whose goal is to add lock-free transactional support on top of HBase. The current release includes CrSO, a client-replicated status oracle that detects the write-write conflicts to provide Snapshot Isolation. CrSO has the following appealing properties:
>>>>> 
>>>>> 1) It does not need any modification into the HBase code nor the table scheme.
>>>>> 2) The overhead on HBase DataNodes is negligible (only after an abort)
>>>>> 3) It scales up to 50,000 write transactions per second (TPS) and a thousand of client connections.
>>>>> 
>>>>> We have setup a github project: https://github.com/dgomezferro/omid
>>>>> 
>>>>> More information is available at the wiki: https://github.com/dgomezferro/omid/wiki
>>>>> 
>>>>> If you are interested, installation and running instructions are available on the README: https://github.com/dgomezferro/omid/blob/master/README.md
>>>>> 
>>>>> Please do not hesitate to contact us in the case of any question.
>>>>> 
>>>>> Best Regards,
>>>>> Daniel Gómez Ferro
>>>>> 
>>>>> 
>>>> 
>>> 
>>> flavio
>>> junqueira
>>> 
>>> research scientist
>>> 
>>> fpj@yahoo-inc.com
>>> direct +34 93-183-8828
>>> 
>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>>> phone (408) 349 3300    fax (408) 349 3301
>> 
>>

Re: Omid: Transactional Support for HBase

Posted by Jignesh Patel <ji...@gmail.com>.

Looks like this transaction is limited to one row. Is that correct?

Another thing: I don't have ZooKeeper installed, as I am running in
pseudo-distributed mode. The documentation doesn't say anything about
integrating in pseudo-distributed mode.

-Jignesh

2011/11/7 Daniel Gómez Ferro <da...@yahoo-inc.com>:
>
> On Nov 6, 2011, at 21:53 , lars hofhansl wrote:
>
>> Another question: I assume this will not work out of the box with deletes?
>
> Hi,
>
> Our current approach does support deletes (i.e., user requested deletes). Right now we use empty values as delete marks: when the user calls TransactionalTable.delete() we insert empty values at the specified timestamp. At the filtering time, we keep track of these delete marks and we can discard the ones that are uncommitted or fall outside our time range of interest. When a transaction aborts, the cleanup procedure deletes the specific values inserted by the transactions (in contrast to all versions). This way we don't insert delete tombstones that mask previous values.
>
> The drawbacks of this approach are that (i) we give a special meaning to the empty values, and (ii) to delete the whole column family (in contrast with a column) we have to perform a get beforehand to obtain the column qualifiers.
>
>>
>> Deletes always cover all key values in the past (from their timestamps on backwards), so once a delete marker is placed there is no way to get back any of a puts it affects.
>>
>> HBase trunk has HBASE-4536 to allow time-range scans to work with deleted rows (but needs to be enabled for a column family - I still think it should be the default, but anyway).
>>
>
> I think this feature would be very useful, and enables a cleaner implementation. It would be great if the flag was enabled by default, we want the user to change as little as possible his setup, but it's not a big deal.
>
>> -- Lars
>>
>> ________________________________
>> From: Flavio Junqueira <fp...@yahoo-inc.com>
>> To: Daniel Gómez Ferro <da...@yahoo-inc.com>
>> Cc: "user@hbase.apache.org" <us...@hbase.apache.org>; lars hofhansl <lh...@yahoo.com>; "dev@hbase.apache.org" <de...@hbase.apache.org>; Maysam Yabandeh <ma...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
>> Sent: Sunday, November 6, 2011 7:14 AM
>> Subject: Re: Omid: Transactional Support for HBase
>>
>>
>> A quick note on Omid for the ones following on github: the repository we will be working with is the fork under the Yahoo! account:
>>
>>
>> https://github.com/yahoo/omid/
>>
>> -Flavio
>>
>>
>> On Nov 5, 2011, at 9:36 PM, Daniel Gómez Ferro wrote:
>>
>>
>>>
>>> On Nov 5, 2011, at 05:37 , lars hofhansl wrote:
>>>
>>> Cool stuff Daniel,
>>>>
>>>
>>> Hi Lars,
>>>
>>> Thanks for the good points.
>>>
>>>
>>>
>>>> Was looking through the code a bit. Seems like you make a best effort to push as much of
>>>> the filtering of KVs of uncommitted transactions to HBase and then do some filtering on the client
>>>> not a bad approach. (I hope I didn't misunderstand the approach, only looked through the code for
>>>> 1/2 hour or so).
>>>>
>>>
>>> Putting it more accurately, the uncommitted KVs are stored at HBase, but it is the client's job to filter them using the commit information that it has received from the status oracle. According to snapshot isolation guarantee, all the versions that are inserted with a timestamp larger than the transaction start timestamp must be ignored, which is done by setting the time range on the client's get request sent to HBase. Since the uncommitted changes of the aborted transactions are eventually removed from HBase, the client rarely needs to fetch more than a version to reach a KV that is committed before the transaction starts (the first property of snapshot isolation).
>>>
>>>
>>>>
>>>> One thing I was wondering: Why bookkeeper? Why not store the WAL itself in HBase? That way
>>>> you might not even need a separate server.
>>>>
>>>> Did you see: HBaseSI (http://www.cs.uwaterloo.ca/~c15zhang/HBaseSI.pdf), they also do MVCC
>>>> on top of unaltered HBase/schema, although from reading that paper I get the impression that it
>>>> would not scale to scans touching many rows (which is where your client side filtering comes in).
>>>>
>>>
>>>
>>> Thanks for the link. We had seen the other paper of the same authors (Grid2010) that shares the same bottlenecks with the recent work.
>>> As you pointed out correctly, the question is about performance. You could see the scalability bottleneck of 400 TPS in the evaluation section of this paper. Our approach, however, provides snapshot isolation with a negligible overhead on region servers, and could scale up to tens of thousands write transactions per second. If you are interested, a summary of techniques that we used to achieve this performance is published at SOSP'11, poster section.
>>> http://sigops.org/sosp/sosp11/posters/summaries/sosp11-final12.pdf
>>>
>>>
>>>> -- Lars
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: Daniel Gómez Ferro <da...@yahoo-inc.com>
>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
>>>> Cc: Maysam Yabandeh <ma...@yahoo-inc.com>; Flavio Junqueira <fp...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
>>>> Sent: Friday, November 4, 2011 4:24 AM
>>>> Subject: Omid: Transactional Support for HBase
>>>>
>>>> (I apologize for resending but I forgot to add the user list.)
>>>>
>>>> Hi all,
>>>>
>>>> It is my pleasure to announce the open source release of Omid, a project whose goal is to add lock-free transactional support on top of HBase. The current release includes CrSO, a client-replicated status oracle that detects the write-write conflicts to provide Snapshot Isolation. CrSO has the following appealing properties:
>>>>
>>>> 1) It does not need any modification into the HBase code nor the table scheme.
>>>> 2) The overhead on HBase DataNodes is negligible (only after an abort)
>>>> 3) It scales up to 50,000 write transactions per second (TPS) and a thousand of client connections.
>>>>
>>>> We have setup a github project: https://github.com/dgomezferro/omid
>>>>
>>>> More information is available at the wiki: https://github.com/dgomezferro/omid/wiki
>>>>
>>>> If you are interested, installation and running instructions are available on the README: https://github.com/dgomezferro/omid/blob/master/README.md
>>>>
>>>> Please do not hesitate to contact us in the case of any question.
>>>>
>>>> Best Regards,
>>>> Daniel Gómez Ferro
>>>>
>>>>
>>>
>>
>> flavio
>> junqueira
>>
>> research scientist
>>
>> fpj@yahoo-inc.com
>> direct +34 93-183-8828
>>
>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>> phone (408) 349 3300    fax (408) 349 3301
>
>

Re: Omid: Transactional Support for HBase

Posted by Daniel Gómez Ferro <da...@yahoo-inc.com>.

On Nov 6, 2011, at 21:53 , lars hofhansl wrote:

> Another question: I assume this will not work out of the box with deletes?

Hi,

Our current approach does support deletes (i.e., user-requested deletes). Right now we use empty values as delete marks: when the user calls TransactionalTable.delete(), we insert empty values at the specified timestamp. At filtering time, we keep track of these delete marks and discard the ones that are uncommitted or fall outside our time range of interest. When a transaction aborts, the cleanup procedure deletes only the specific values inserted by that transaction (as opposed to all versions). This way we don't insert delete tombstones that mask previous values.

The drawbacks of this approach are that (i) we give a special meaning to the empty values, and (ii) to delete the whole column family (in contrast with a column) we have to perform a get beforehand to obtain the column qualifiers.
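
The delete-mark scheme described above can be sketched with a toy in-memory model of a single cell. This is only an illustration under stated assumptions, not Omid's implementation: VersionedCell, its methods, and the commit_ts_of mapping (standing in for the commit information a client receives from the status oracle) are hypothetical names, and reads of a transaction's own writes are left out.

```python
EMPTY = b""  # assumption: an empty value acts as the delete mark


class VersionedCell:
    """Toy model of one cell: a map from write timestamp to value."""

    def __init__(self):
        self.versions = {}  # writer's start timestamp -> value

    def put(self, ts, value):
        self.versions[ts] = value

    def delete(self, ts):
        # A transactional delete writes an empty value, not a tombstone,
        # so older versions stay readable for older snapshots.
        self.versions[ts] = EMPTY

    def cleanup(self, ts):
        # After an abort, remove only the version written by that transaction.
        self.versions.pop(ts, None)

    def read(self, start_ts, commit_ts_of):
        """Return the value visible to a reader that started at start_ts.

        commit_ts_of maps a writer's start timestamp to its commit timestamp;
        uncommitted or aborted writers have no entry.
        """
        for ts in sorted(self.versions, reverse=True):
            if ts >= start_ts:
                continue  # outside the time range of interest
            commit_ts = commit_ts_of.get(ts)
            if commit_ts is None or commit_ts >= start_ts:
                continue  # uncommitted, or committed after the reader started
            value = self.versions[ts]
            return None if value == EMPTY else value  # delete mark hides the cell
        return None


cell = VersionedCell()
commits = {}
cell.put(1, b"v1")
commits[1] = 2       # transaction 1 committed at timestamp 2
cell.delete(3)
commits[3] = 4       # transaction 3 deleted the cell, committed at 4
cell.put(5, b"v2")   # transaction 5 has not committed

old_snapshot = cell.read(3, commits)  # still sees b"v1"
new_snapshot = cell.read(6, commits)  # sees the committed delete: None
cell.cleanup(5)                       # transaction 5 aborts; only its version goes
```

Because the delete is an ordinary versioned value, a reader with an older snapshot still finds the earlier put, which a regular tombstone would have masked.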

> 
> Deletes always cover all key values in the past (from their timestamps on backwards), so once a delete marker is placed there is no way to get back any of a puts it affects.
> 
> HBase trunk has HBASE-4536 to allow time-range scans to work with deleted rows (but needs to be enabled for a column family - I still think it should be the default, but anyway).
> 

I think this feature would be very useful, and it would enable a cleaner implementation. It would be great if the flag were enabled by default, since we want users to change their setup as little as possible, but it's not a big deal.

> -- Lars
> 
> ________________________________
> From: Flavio Junqueira <fp...@yahoo-inc.com>
> To: Daniel Gómez Ferro <da...@yahoo-inc.com>
> Cc: "user@hbase.apache.org" <us...@hbase.apache.org>; lars hofhansl <lh...@yahoo.com>; "dev@hbase.apache.org" <de...@hbase.apache.org>; Maysam Yabandeh <ma...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
> Sent: Sunday, November 6, 2011 7:14 AM
> Subject: Re: Omid: Transactional Support for HBase
> 
> 
> A quick note on Omid for the ones following on github: the repository we will be working with is the fork under the Yahoo! account:
> 
> 
> https://github.com/yahoo/omid/
> 
> -Flavio
> 
> 
> On Nov 5, 2011, at 9:36 PM, Daniel Gómez Ferro wrote:
> 
> 
>> 
>> On Nov 5, 2011, at 05:37 , lars hofhansl wrote:
>> 
>> Cool stuff Daniel,
>>> 
>> 
>> Hi Lars,
>> 
>> Thanks for the good points.
>> 
>> 
>> 
>>> Was looking through the code a bit. Seems like you make a best effort to push as much of
>>> the filtering of KVs of uncommitted transactions to HBase and then do some filtering on the client
>>> not a bad approach. (I hope I didn't misunderstand the approach, only looked through the code for
>>> 1/2 hour or so).
>>> 
>> 
>> Putting it more accurately, the uncommitted KVs are stored at HBase, but it is the client's job to filter them using the commit information that it has received from the status oracle. According to snapshot isolation guarantee, all the versions that are inserted with a timestamp larger than the transaction start timestamp must be ignored, which is done by setting the time range on the client's get request sent to HBase. Since the uncommitted changes of the aborted transactions are eventually removed from HBase, the client rarely needs to fetch more than a version to reach a KV that is committed before the transaction starts (the first property of snapshot isolation).
>> 
>> 
>>> 
>>> One thing I was wondering: Why bookkeeper? Why not store the WAL itself in HBase? That way
>>> you might not even need a separate server.
>>> 
>>> Did you see: HBaseSI (http://www.cs.uwaterloo.ca/~c15zhang/HBaseSI.pdf), they also do MVCC
>>> on top of unaltered HBase/schema, although from reading that paper I get the impression that it
>>> would not scale to scans touching many rows (which is where your client side filtering comes in).
>>> 
>> 
>> 
>> Thanks for the link. We had seen the other paper of the same authors (Grid2010) that shares the same bottlenecks with the recent work.
>> As you pointed out correctly, the question is about performance. You could see the scalability bottleneck of 400 TPS in the evaluation section of this paper. Our approach, however, provides snapshot isolation with a negligible overhead on region servers, and could scale up to tens of thousands write transactions per second. If you are interested, a summary of techniques that we used to achieve this performance is published at SOSP'11, poster section.
>> http://sigops.org/sosp/sosp11/posters/summaries/sosp11-final12.pdf
>> 
>> 
>>> -- Lars
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Daniel Gómez Ferro <da...@yahoo-inc.com>
>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
>>> Cc: Maysam Yabandeh <ma...@yahoo-inc.com>; Flavio Junqueira <fp...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
>>> Sent: Friday, November 4, 2011 4:24 AM
>>> Subject: Omid: Transactional Support for HBase
>>> 
>>> (I apologize for resending but I forgot to add the user list.)
>>> 
>>> Hi all,
>>> 
>>> It is my pleasure to announce the open source release of Omid, a project whose goal is to add lock-free transactional support on top of HBase. The current release includes CrSO, a client-replicated status oracle that detects the write-write conflicts to provide Snapshot Isolation. CrSO has the following appealing properties:
>>> 
>>> 1) It does not need any modification into the HBase code nor the table scheme.
>>> 2) The overhead on HBase DataNodes is negligible (only after an abort)
>>> 3) It scales up to 50,000 write transactions per second (TPS) and a thousand of client connections.
>>> 
>>> We have setup a github project: https://github.com/dgomezferro/omid
>>> 
>>> More information is available at the wiki: https://github.com/dgomezferro/omid/wiki
>>> 
>>> If you are interested, installation and running instructions are available on the README: https://github.com/dgomezferro/omid/blob/master/README.md
>>> 
>>> Please do not hesitate to contact us in the case of any question.
>>> 
>>> Best Regards,
>>> Daniel Gómez Ferro
>>> 
>>> 
>> 
> 
> flavio
> junqueira
> 
> research scientist
> 
> fpj@yahoo-inc.com
> direct +34 93-183-8828
> 
> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> phone (408) 349 3300    fax (408) 349 3301

Re: Omid: Transactional Support for HBase

Posted by Daniel Gómez Ferro <da...@yahoo-inc.com>.

On Nov 6, 2011, at 21:53 , lars hofhansl wrote:

> Another question: I assume this will not work out of the box with deletes?

Hi,

Our current approach does support deletes (i.e., user-requested deletes). Right now we use empty values as delete marks: when the user calls TransactionalTable.delete(), we insert empty values at the specified timestamp. At filtering time, we keep track of these delete marks and discard the ones that are uncommitted or fall outside our time range of interest. When a transaction aborts, the cleanup procedure deletes only the specific values inserted by that transaction (as opposed to all versions). This way we don't insert delete tombstones that mask previous values.

The drawbacks of this approach are that (i) we give a special meaning to the empty values, and (ii) to delete the whole column family (in contrast with a column) we have to perform a get beforehand to obtain the column qualifiers.

> 
> Deletes always cover all key values in the past (from their timestamps on backwards), so once a delete marker is placed there is no way to get back any of a puts it affects.
> 
> HBase trunk has HBASE-4536 to allow time-range scans to work with deleted rows (but needs to be enabled for a column family - I still think it should be the default, but anyway).
> 

I think this feature would be very useful, and it would enable a cleaner implementation. It would be great if the flag were enabled by default, since we want users to change their setup as little as possible, but it's not a big deal.

> -- Lars
> 
> ________________________________
> From: Flavio Junqueira <fp...@yahoo-inc.com>
> To: Daniel Gómez Ferro <da...@yahoo-inc.com>
> Cc: "user@hbase.apache.org" <us...@hbase.apache.org>; lars hofhansl <lh...@yahoo.com>; "dev@hbase.apache.org" <de...@hbase.apache.org>; Maysam Yabandeh <ma...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
> Sent: Sunday, November 6, 2011 7:14 AM
> Subject: Re: Omid: Transactional Support for HBase
> 
> 
> A quick note on Omid for the ones following on github: the repository we will be working with is the fork under the Yahoo! account:
> 
> 
> https://github.com/yahoo/omid/
> 
> -Flavio
> 
> 
> On Nov 5, 2011, at 9:36 PM, Daniel Gómez Ferro wrote:
> 
> 
>> 
>> On Nov 5, 2011, at 05:37 , lars hofhansl wrote:
>> 
>>> Cool stuff Daniel,
>>
>> 
>> Hi Lars,
>> 
>> Thanks for the good points.
>> 
>> 
>> 
>>> Was looking through the code a bit. Seems like you make a best effort to push as much of
>>> the filtering of KVs of uncommitted transactions to HBase and then do some filtering on the client
>>> not a bad approach. (I hope I didn't misunderstand the approach, only looked through the code for
>>> 1/2 hour or so).
>>> 
>> 
>> Putting it more accurately, the uncommitted KVs are stored at HBase, but it is the client's job to filter them using the commit information that it has received from the status oracle. According to snapshot isolation guarantee, all the versions that are inserted with a timestamp larger than the transaction start timestamp must be ignored, which is done by setting the time range on the client's get request sent to HBase. Since the uncommitted changes of the aborted transactions are eventually removed from HBase, the client rarely needs to fetch more than a version to reach a KV that is committed before the transaction starts (the first property of snapshot isolation).
>> 
>> 
>>> 
>>> One thing I was wondering: Why bookkeeper? Why not store the WAL itself in HBase? That way
>>> you might not even need a separate server.
>>> 
>>> Did you see: HBaseSI (http://www.cs.uwaterloo.ca/~c15zhang/HBaseSI.pdf), they also do MVCC
>>> on top of unaltered HBase/schema, although from reading that paper I get the impression that it
>>> would not scale to scans touching many rows (which is where your client side filtering comes in).
>>> 
>> 
>> 
>> Thanks for the link. We had seen the other paper of the same authors (Grid2010) that shares the same bottlenecks with the recent work.
>> As you pointed out correctly, the question is about performance. You could see the scalability bottleneck of 400 TPS in the evaluation section of this paper. Our approach, however, provides snapshot isolation with a negligible overhead on region servers, and could scale up to tens of thousands write transactions per second. If you are interested, a summary of techniques that we used to achieve this performance is published at SOSP'11, poster section.
>> http://sigops.org/sosp/sosp11/posters/summaries/sosp11-final12.pdf
>> 
>> 
>>> -- Lars
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Daniel Gómez Ferro <da...@yahoo-inc.com>
>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
>>> Cc: Maysam Yabandeh <ma...@yahoo-inc.com>; Flavio Junqueira <fp...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
>>> Sent: Friday, November 4, 2011 4:24 AM
>>> Subject: Omid: Transactional Support for HBase
>>> 
>>> (I apologize for resending but I forgot to add the user list.)
>>> 
>>> Hi all,
>>> 
>>> It is my pleasure to announce the open source release of Omid, a project whose goal is to add lock-free transactional support on top of HBase. The current release includes CrSO, a client-replicated status oracle that detects the write-write conflicts to provide Snapshot Isolation. CrSO has the following appealing properties:
>>> 
>>> 1) It does not need any modification into the HBase code nor the table scheme.
>>> 2) The overhead on HBase DataNodes is negligible (only after an abort)
>>> 3) It scales up to 50,000 write transactions per second (TPS) and a thousand of client connections.
>>> 
>>> We have setup a github project: https://github.com/dgomezferro/omid
>>> 
>>> More information is available at the wiki: https://github.com/dgomezferro/omid/wiki
>>> 
>>> If you are interested, installation and running instructions are available on the README: https://github.com/dgomezferro/omid/blob/master/README.md
>>> 
>>> Please do not hesitate to contact us in the case of any question.
>>> 
>>> Best Regards,
>>> Daniel Gómez Ferro
>>> 
>>> 
>> 
> 
> flavio
> junqueira
> 
> research scientist
> 
> fpj@yahoo-inc.com
> direct +34 93-183-8828
> 
> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> phone (408) 349 3300    fax (408) 349 3301

Re: Omid: Transactional Support for HBase

Posted by lars hofhansl <lh...@yahoo.com>.

Another question: I assume this will not work out of the box with deletes?

Deletes always cover all key values in the past (from their timestamps on backwards), so once a delete marker is placed there is no way to get back any of the puts it affects.

HBase trunk has HBASE-4536 to allow time-range scans to work with deleted rows (but needs to be enabled for a column family - I still think it should be the default, but anyway).

-- Lars

________________________________
From: Flavio Junqueira <fp...@yahoo-inc.com>
To: Daniel Gómez Ferro <da...@yahoo-inc.com>
Cc: "user@hbase.apache.org" <us...@hbase.apache.org>; lars hofhansl <lh...@yahoo.com>; "dev@hbase.apache.org" <de...@hbase.apache.org>; Maysam Yabandeh <ma...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
Sent: Sunday, November 6, 2011 7:14 AM
Subject: Re: Omid: Transactional Support for HBase


A quick note on Omid for the ones following on github: the repository we will be working with is the fork under the Yahoo! account:


https://github.com/yahoo/omid/

-Flavio


On Nov 5, 2011, at 9:36 PM, Daniel Gómez Ferro wrote:


>
>On Nov 5, 2011, at 05:37 , lars hofhansl wrote:
>
>>Cool stuff Daniel,
>
>
>Hi Lars,
>
>Thanks for the good points.
>
>
>
>>Was looking through the code a bit. Seems like you make a best effort to push as much of
>>the filtering of KVs of uncommitted transactions to HBase and then do some filtering on the client
>>not a bad approach. (I hope I didn't misunderstand the approach, only looked through the code for
>>1/2 hour or so).
>>
>
>Putting it more accurately, the uncommitted KVs are stored at HBase, but it is the client's job to filter them using the commit information that it has received from the status oracle. According to snapshot isolation guarantee, all the versions that are inserted with a timestamp larger than the transaction start timestamp must be ignored, which is done by setting the time range on the client's get request sent to HBase. Since the uncommitted changes of the aborted transactions are eventually removed from HBase, the client rarely needs to fetch more than a version to reach a KV that is committed before the transaction starts (the first property of snapshot isolation).
>
>
>>
>>One thing I was wondering: Why bookkeeper? Why not store the WAL itself in HBase? That way
>>you might not even need a separate server.
>>
>>Did you see: HBaseSI (http://www.cs.uwaterloo.ca/~c15zhang/HBaseSI.pdf), they also do MVCC
>>on top of unaltered HBase/schema, although from reading that paper I get the impression that it
>>would not scale to scans touching many rows (which is where your client side filtering comes in).
>>
>
>
>Thanks for the link. We had seen the other paper of the same authors (Grid2010) that shares the same bottlenecks with the recent work.
>As you pointed out correctly, the question is about performance. You could see the scalability bottleneck of 400 TPS in the evaluation section of this paper. Our approach, however, provides snapshot isolation with a negligible overhead on region servers, and could scale up to tens of thousands write transactions per second. If you are interested, a summary of techniques that we used to achieve this performance is published at SOSP'11, poster section.
>http://sigops.org/sosp/sosp11/posters/summaries/sosp11-final12.pdf
>
>
>>-- Lars
>>
>>
>>----- Original Message -----
>>From: Daniel Gómez Ferro <da...@yahoo-inc.com>
>>To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
>>Cc: Maysam Yabandeh <ma...@yahoo-inc.com>; Flavio Junqueira <fp...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
>>Sent: Friday, November 4, 2011 4:24 AM
>>Subject: Omid: Transactional Support for HBase
>>
>>(I apologize for resending but I forgot to add the user list.)
>>
>>Hi all,
>>
>>It is my pleasure to announce the open source release of Omid, a project whose goal is to add lock-free transactional support on top of HBase. The current release includes CrSO, a client-replicated status oracle that detects the write-write conflicts to provide Snapshot Isolation. CrSO has the following appealing properties:
>>
>>1) It does not need any modification into the HBase code nor the table scheme.
>>2) The overhead on HBase DataNodes is negligible (only after an abort)
>>3) It scales up to 50,000 write transactions per second (TPS) and a thousand of client connections.
>>
>>We have setup a github project: https://github.com/dgomezferro/omid
>>
>>More information is available at the wiki: https://github.com/dgomezferro/omid/wiki
>>
>>If you are interested, installation and running instructions are available on the README: https://github.com/dgomezferro/omid/blob/master/README.md
>>
>>Please do not hesitate to contact us in the case of any question.
>>
>>Best Regards,
>>Daniel Gómez Ferro
>>
>>
>

flavio
junqueira

research scientist

fpj@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301

Re: Omid: Transactional Support for HBase

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.

A quick note on Omid for those following on GitHub: the repository
we will be working with is the fork under the Yahoo! account:

https://github.com/yahoo/omid/

-Flavio

On Nov 5, 2011, at 9:36 PM, Daniel Gómez Ferro wrote:

>
> On Nov 5, 2011, at 05:37 , lars hofhansl wrote:
>
>> Cool stuff Daniel,
>
> Hi Lars,
>
> Thanks for the good points.
>
>>
>> Was looking through the code a bit. Seems like you make a best effort to push as much of
>> the filtering of KVs of uncommitted transactions to HBase and then do some filtering on the client
>> not a bad approach. (I hope I didn't misunderstand the approach, only looked through the code for
>> 1/2 hour or so).
>
> Putting it more accurately, the uncommitted KVs are stored at HBase,  
> but it is the client's job to filter them using the commit  
> information that it has received from the status oracle. According  
> to snapshot isolation guarantee, all the versions that are inserted  
> with a timestamp larger than the transaction start timestamp must be  
> ignored, which is done by setting the time range on the client's get  
> request sent to HBase. Since the uncommitted changes of the aborted  
> transactions are eventually removed from HBase, the client rarely  
> needs to fetch more than a version to reach a KV that is committed  
> before the transaction starts (the first property of snapshot  
> isolation).
>
>>
>>
>> One thing I was wondering: Why bookkeeper? Why not store the WAL itself in HBase? That way
>> you might not even need a separate server.
>>
>> Did you see: HBaseSI (http://www.cs.uwaterloo.ca/~c15zhang/HBaseSI.pdf), they also do MVCC
>> on top of unaltered HBase/schema, although from reading that paper I get the impression that it
>> would not scale to scans touching many rows (which is where your client side filtering comes in).
>
> Thanks for the link. We had seen the other paper of the same authors  
> (Grid2010) that shares the same bottlenecks with the recent work.
> As you pointed out correctly, the question is about performance. You  
> could see the scalability bottleneck of 400 TPS in the evaluation  
> section of this paper. Our approach, however, provides snapshot  
> isolation with a negligible overhead on region servers, and could  
> scale up to tens of thousands write transactions per second. If you  
> are interested, a summary of techniques that we used to achieve this  
> performance is published at SOSP'11, poster section.
> http://sigops.org/sosp/sosp11/posters/summaries/sosp11-final12.pdf
>
>>
>> -- Lars
>>
>>
>> ----- Original Message -----
>> From: Daniel Gómez Ferro <da...@yahoo-inc.com>
>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
>> Cc: Maysam Yabandeh <ma...@yahoo-inc.com>; Flavio Junqueira <fpj@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <ivank@yahoo-inc.com>
>> Sent: Friday, November 4, 2011 4:24 AM
>> Subject: Omid: Transactional Support for HBase
>>
>> (I apologize for resending but I forgot to add the user list.)
>>
>> Hi all,
>>
>> It is my pleasure to announce the open source release of Omid, a  
>> project whose goal is to add lock-free transactional support on top  
>> of HBase. The current release includes CrSO, a client-replicated  
>> status oracle that detects the write-write conflicts to provide  
>> Snapshot Isolation. CrSO has the following appealing properties:
>>
>> 1) It does not need any modification into the HBase code nor the  
>> table scheme.
>> 2) The overhead on HBase DataNodes is negligible (only after an  
>> abort)
>> 3) It scales up to 50,000 write transactions per second (TPS) and a  
>> thousand of client connections.
>>
>> We have setup a github project: https://github.com/dgomezferro/omid
>>
>> More information is available at the wiki: https://github.com/dgomezferro/omid/wiki
>>
>> If you are interested, installation and running instructions are  
>> available on the README: https://github.com/dgomezferro/omid/blob/master/README.md
>>
>> Please do not hesitate to contact us in the case of any question.
>>
>> Best Regards,
>> Daniel Gómez Ferro
>>
>

flavio
junqueira

research scientist

fpj@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301

Re: Omid: Transactional Support for HBase

Posted by Daniel Gómez Ferro <da...@yahoo-inc.com>.

On Nov 5, 2011, at 05:37 , lars hofhansl wrote:

> Cool stuff Daniel,

Hi Lars,

Thanks for the good points.

> Was looking through the code a bit. Seems like you make a best effort to push as much of
> the filtering of KVs of uncommitted transactions to HBase and then do some filtering on the client
> not a bad approach. (I hope I didn't misunderstand the approach, only looked through the code for
> 1/2 hour or so).

Putting it more accurately, the uncommitted KVs are stored in HBase, but it is the client's job to filter them using the commit information that it has received from the status oracle. According to the snapshot isolation guarantee, all versions inserted with a timestamp larger than the transaction's start timestamp must be ignored, which is done by setting the time range on the client's get request sent to HBase. Since the uncommitted changes of aborted transactions are eventually removed from HBase, the client rarely needs to fetch more than one version to reach a KV that was committed before the transaction started (the first property of snapshot isolation).
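
The read path described above can be sketched roughly as follows. This is a simplified Python model rather than the actual client code: server_get() stands in for a time-ranged get against HBase (in the Java client the range would be set with something like Get.setTimeRange), and snapshot_get() and commit_ts_of are hypothetical names for the client-side filter and the commit information received from the status oracle.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, bytes]  # (writer's start timestamp, value)


def server_get(versions: List[Version], max_ts: int, limit: int) -> List[Version]:
    """Stand-in for a time-ranged get: the newest `limit` versions with ts < max_ts."""
    in_range = [v for v in versions if v[0] < max_ts]
    return sorted(in_range, reverse=True)[:limit]


def snapshot_get(
    versions: List[Version],
    start_ts: int,
    commit_ts_of: Dict[int, int],
) -> Tuple[Optional[bytes], int]:
    """Return (visible value, number of versions fetched in the last request)."""
    limit = 1  # optimistic: the newest version in range is usually committed
    while True:
        fetched = server_get(versions, start_ts, limit)
        for ts, value in fetched:
            commit_ts = commit_ts_of.get(ts)
            # Visible only if the writer committed before this reader started.
            if commit_ts is not None and commit_ts < start_ts:
                return value, len(fetched)
        if len(fetched) < limit:
            return None, len(fetched)  # no more versions in range
        limit *= 2  # rare case: ask for more versions and filter again
```

In the common case the first version returned is already committed and a single fetch is enough; the loop only widens the request when the newest version in range belongs to an uncommitted or later-committed transaction.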



> One thing I was wondering: Why bookkeeper? Why not store the WAL itself in HBase? That way
> you might not even need a separate server.
>
> Did you see: HBaseSI (http://www.cs.uwaterloo.ca/~c15zhang/HBaseSI.pdf), they also do MVCC
> on top of unaltered HBase/schema, although from reading that paper I get the impression that it
> would not scale to scans touching many rows (which is where your client side filtering comes in).

Thanks for the link. We had seen the other paper by the same authors (Grid2010), which shares the same bottlenecks as the more recent work.
As you correctly pointed out, the question is about performance. You can see the scalability bottleneck of 400 TPS in the evaluation section of this paper. Our approach, however, provides snapshot isolation with negligible overhead on region servers, and can scale up to tens of thousands of write transactions per second. If you are interested, a summary of the techniques that we used to achieve this performance was published in the SOSP'11 poster session.
http://sigops.org/sosp/sosp11/posters/summaries/sosp11-final12.pdf


> -- Lars
>
>
> ----- Original Message -----
> From: Daniel Gómez Ferro <da...@yahoo-inc.com>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
> Cc: Maysam Yabandeh <ma...@yahoo-inc.com>; Flavio Junqueira <fp...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
> Sent: Friday, November 4, 2011 4:24 AM
> Subject: Omid: Transactional Support for HBase
>
> (I apologize for resending but I forgot to add the user list.)
>
> Hi all,
>
> It is my pleasure to announce the open source release of Omid, a project whose goal is to add lock-free transactional support on top of HBase. The current release includes CrSO, a client-replicated status oracle that detects the write-write conflicts to provide Snapshot Isolation. CrSO has the following appealing properties:
>
> 1) It does not need any modification into the HBase code nor the table scheme.
> 2) The overhead on HBase DataNodes is negligible (only after an abort)
> 3) It scales up to 50,000 write transactions per second (TPS) and a thousand of client connections.
>
> We have setup a github project: https://github.com/dgomezferro/omid
>
> More information is available at the wiki: https://github.com/dgomezferro/omid/wiki
>
> If you are interested, installation and running instructions are available on the README: https://github.com/dgomezferro/omid/blob/master/README.md
>
> Please do not hesitate to contact us in the case of any question.
>
> Best Regards,
> Daniel Gómez Ferro

Re: Omid: Transactional Support for HBase

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.

Hi Stack, it is an old JIRA: HBASE-2315.

-Flavio

On Nov 5, 2011, at 7:40 PM, Stack wrote:

> On Sat, Nov 5, 2011 at 12:53 AM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
>>
>> On Nov 5, 2011, at 5:37 AM, lars hofhansl wrote:
>>
>>> One thing I was wondering: Why bookkeeper? Why not store the WAL itself in HBase? That way
>>> you might not even need a separate server.
>>>
>>
>> Lars, The key reasons are performance and scalability. We also would like to
>> have region servers at some point using BookKeeper for WAL for our
>> applications, even though the figure doesn't show it. In that case, we have
>> a common platform for WAL. We have a jira open about it, but we haven't made
>> progress in a while due to other priorities and some issues with the WAL
>> interface, but we have been planning on getting back to it.
>>
>> Btw, I'm assuming you know about the project:
>> http://zookeeper.apache.org/bookkeeper/
>>
>
> Please keep us posted on project.  It sounds like an interesting
> experiment.  What is the JIRA so we can follow.   What are the WAL
> Interface issues (same as for NN?).
>
> Thanks Flavio,
> St.Ack

flavio
junqueira

research scientist

fpj@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301

Re: Omid: Transactional Support for HBase

Posted by Stack <st...@duboce.net>.

On Sat, Nov 5, 2011 at 12:53 AM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:
>
> On Nov 5, 2011, at 5:37 AM, lars hofhansl wrote:
>
>> One thing I was wondering: Why bookkeeper? Why not store the WAL itself in
>> HBase? That way
>> you might not even need a separate server.
>>
>
> Lars, The key reasons are performance and scalability. We also would like to
> have region servers at some point using BookKeeper for WAL for our
> applications, even though the figure doesn't show it. In that case, we have
> a common platform for WAL. We have a jira open about it, but we haven't made
> progress in a while due to other priorities and some issues with the WAL
> interface, but we have been planning on getting back to it.
>
> Btw, I'm assuming you know about the project:
> http://zookeeper.apache.org/bookkeeper/
>

Please keep us posted on the project. It sounds like an interesting
experiment. What is the JIRA, so we can follow it? What are the WAL
interface issues (the same as for the NN)?

Thanks Flavio,
St.Ack

Re: Omid: Transactional Support for HBase

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.

On Nov 5, 2011, at 5:37 AM, lars hofhansl wrote:

> One thing I was wondering: Why bookkeeper? Why not store the WAL  
> itself in HBase? That way
> you might not even need a separate server.
>

Lars, The key reasons are performance and scalability. We also would  
like to have region servers at some point using BookKeeper for WAL for  
our applications, even though the figure doesn't show it. In that  
case, we have a common platform for WAL. We have a jira open about it,  
but we haven't made progress in a while due to other priorities and  
some issues with the WAL interface, but we have been planning on  
getting back to it.

Btw, I'm assuming you know about the project: http://zookeeper.apache.org/bookkeeper/

-Flavio

> ----- Original Message -----
> From: Daniel Gómez Ferro <da...@yahoo-inc.com>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
> Cc: Maysam Yabandeh <ma...@yahoo-inc.com>; Flavio Junqueira <fpj@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <ivank@yahoo-inc.com>
> Sent: Friday, November 4, 2011 4:24 AM
> Subject: Omid: Transactional Support for HBase
>
> (I apologize for resending but I forgot to add the user list.)
>
> Hi all,
>
> It is my pleasure to announce the open source release of Omid, a  
> project whose goal is to add lock-free transactional support on top  
> of HBase. The current release includes CrSO, a client-replicated  
> status oracle that detects the write-write conflicts to provide  
> Snapshot Isolation. CrSO has the following appealing properties:
>
> 1) It does not need any modification into the HBase code nor the  
> table scheme.
> 2) The overhead on HBase DataNodes is negligible (only after an abort)
> 3) It scales up to 50,000 write transactions per second (TPS) and a  
> thousand of client connections.
>
> We have setup a github project: https://github.com/dgomezferro/omid
>
> More information is available at the wiki: https://github.com/dgomezferro/omid/wiki
>
> If you are interested, installation and running instructions are  
> available on the README: https://github.com/dgomezferro/omid/blob/master/README.md
>
> Please do not hesitate to contact us in the case of any question.
>
> Best Regards,
> Daniel Gómez Ferro
>

flavio
junqueira

research scientist

fpj@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301

Re: Omid: Transactional Support for HBase

Posted by lars hofhansl <lh...@yahoo.com>.

Cool stuff Daniel,

I was looking through the code a bit. It seems like you make a best effort to push as much of
the filtering of KVs of uncommitted transactions to HBase as possible and then do some filtering
on the client. Not a bad approach. (I hope I didn't misunderstand the approach; I only looked
through the code for half an hour or so.)


One thing I was wondering: Why bookkeeper? Why not store the WAL itself in HBase? That way
you might not even need a separate server.

Did you see HBaseSI (http://www.cs.uwaterloo.ca/~c15zhang/HBaseSI.pdf)? They also do MVCC
on top of unaltered HBase/schema, although from reading that paper I get the impression that it
would not scale to scans touching many rows (which is where your client side filtering comes in).
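
To make the idea concrete, here is a minimal, single-process sketch of snapshot-isolation conflict detection plus client-side filtering of uncommitted versions. It is only an illustration under stated assumptions: the names StatusOracle, try_commit, is_visible and filter_versions are hypothetical and are not Omid's API, and the real system is distributed and replicates commit state to clients.

```python
import itertools


class StatusOracle:
    """Toy status oracle (hypothetical names; not Omid's actual API)."""

    def __init__(self):
        self._clock = itertools.count(1)  # monotonically increasing timestamps
        self._last_commit = {}            # row key -> commit ts of the last committed writer
        self._commit_ts = {}              # start ts -> commit ts for committed transactions

    def begin(self):
        """Hand out a start timestamp, which also defines the snapshot."""
        return next(self._clock)

    def try_commit(self, start_ts, write_set):
        """Return a commit timestamp, or None on a write-write conflict."""
        for row in write_set:
            # Conflict: someone committed a write to this row after our snapshot.
            if self._last_commit.get(row, 0) > start_ts:
                return None
        commit_ts = next(self._clock)
        for row in write_set:
            self._last_commit[row] = commit_ts
        self._commit_ts[start_ts] = commit_ts
        return commit_ts

    def is_visible(self, writer_start_ts, reader_start_ts):
        """A version is visible if its writer committed before the reader started."""
        commit_ts = self._commit_ts.get(writer_start_ts)
        return commit_ts is not None and commit_ts < reader_start_ts


def filter_versions(oracle, versions, reader_start_ts):
    """Client-side filter: newest version of a cell visible in the reader's snapshot.

    `versions` is a list of (writer_start_ts, value) pairs for one cell.
    """
    for writer_ts, value in sorted(versions, key=lambda v: v[0], reverse=True):
        if oracle.is_visible(writer_ts, reader_start_ts):
            return value
    return None
```

In this sketch each cell version is tagged with its writer's start timestamp, so a reader skips versions whose writer has not committed, or committed after the reader's snapshot was taken.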

-- Lars


----- Original Message -----
From: Daniel Gómez Ferro <da...@yahoo-inc.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>; "user@hbase.apache.org" <us...@hbase.apache.org>
Cc: Maysam Yabandeh <ma...@yahoo-inc.com>; Flavio Junqueira <fp...@yahoo-inc.com>; Benjamin Reed <br...@yahoo-inc.com>; Ivan Kelly <iv...@yahoo-inc.com>
Sent: Friday, November 4, 2011 4:24 AM
Subject: Omid: Transactional Support for HBase

(I apologize for resending but I forgot to add the user list.)

Hi all,

It is my pleasure to announce the open source release of Omid, a project whose goal is to add lock-free transactional support on top of HBase. The current release includes CrSO, a client-replicated status oracle that detects the write-write conflicts to provide Snapshot Isolation. CrSO has the following appealing properties:

1) It does not need any modification into the HBase code nor the table scheme.
2) The overhead on HBase DataNodes is negligible (only after an abort)
3) It scales up to 50,000 write transactions per second (TPS) and a thousand of client connections.

We have setup a github project: https://github.com/dgomezferro/omid

More information is available at the wiki: https://github.com/dgomezferro/omid/wiki

If you are interested, installation and running instructions are available on the README: https://github.com/dgomezferro/omid/blob/master/README.md

Please do not hesitate to contact us in the case of any question.

Best Regards,
Daniel Gómez Ferro
