Posted to user@hbase.apache.org by Andrew Purtell <ap...@apache.org> on 2010/09/02 21:36:10 UTC

response to "The problems with ACID, and how to fix them without going NoSQL"

I've tried to post the below comment twice at

The problems with ACID, and how to fix them without going NoSQL
http://dbmsmusings.blogspot.com/2010/08/problems-with-acid-and-how-to-fix-them.html

For whatever reason, each time it appeared briefly in the comments section (from my perspective) and then disappeared, so I will just post it here, because HBase is mentioned in the article a few times, and ... well, just read. :-)

>>>

Many earlier comments have covered much of what I would say. However, nobody to date has raised an objection to the mildly offensive contention that "the NoSQL decision to give up on ACID is the lazy solution to these scalability and replication issues." Possibly this was not meant in the pejorative sense, but it reads that way. I would argue the correct term of art here is pragmatism, not laziness.

I am a contributor to the HBase project. HBase is an open source implementation of the BigTable architecture. Indeed, our system does scale out by substantially relaxing the scope of ACID guarantees. But it is a gross generalization to suggest that "NoSQL" means "NoACID" and is somehow lazy in the pejorative sense, and this mars the authors' argument. HBase in particular provides durability and row-level atomicity (I agree that the row is a nice, convenient partition), and favors strong consistency in its design choices. In this regard, I would also like to bring to your attention that the authors made an error in describing the scope of transactional atomicity available in BigTable -- the scope is actually the row, not each individual key-value pair (KV).

Also, HBase in particular is a big project with several interesting design and research directions, and so does not reduce to a convenient stereotype: a transactional layer that provides global ACID properties at user option (it does not scale out like the underlying system but is nonetheless available), exploration of notions of referential integrity, and, in the other direction, consideration of optional relaxed consistency (read replicas).

Back to the matter of pragmatism: While most structured data store users are likely not building systems on the scale of a globally distributed search engine, that is actually not too far off the mark for the design targets of some HBase installations. We do need to work with very large mutating data sets today, and nothing in the manner of a traditional relational database system is up to the task. The discussion here, while intriguing, is also rendered fairly academic by the "horrible" performance if spinning media is used. Flash will not be competitive with spinning media at high tera- or peta-scale for at least several years yet. Other commenters have also noticed apparent bottlenecks in the presented design, which suggest a high-scale implementation will be problematic.

Anyway, it is my belief that we are attacking the same set of problems but starting from opposite ends of a continuum and, ultimately, we shall meet up somewhere in the middle.

September 2, 2010 10:55 AM

<<<

- Andy
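
To make the row-versus-KV distinction in the comment above concrete, here is a toy Python sketch of row-scoped atomicity. It is an illustration only and is not the HBase or BigTable API: the class ToyRowStore, its methods, and the column names are all hypothetical. The assumption it models is simply that every cell written to one row in a single put becomes visible together, so a reader never sees a half-applied row.

```python
import threading
from collections import defaultdict


class ToyRowStore:
    """Toy in-memory table. NOT the HBase API; every name here is hypothetical."""

    def __init__(self):
        self._rows = defaultdict(dict)             # row key -> {column: value}
        self._locks = defaultdict(threading.Lock)  # one lock per row
        self._guard = threading.Lock()             # protects the two maps above

    def _entry(self, row):
        # Look up (or create) the lock and cell map for a row.
        with self._guard:
            return self._locks[row], self._rows[row]

    def put(self, row, cells):
        """Apply all cells of one row as a single unit under the row lock."""
        lock, stored = self._entry(row)
        with lock:
            stored.update(cells)

    def get(self, row):
        """Return a snapshot of one row, never a half-applied put."""
        lock, stored = self._entry(row)
        with lock:
            return dict(stored)


store = ToyRowStore()
store.put("user1", {"info:name": "Ada", "info:email": "ada@example.org"})
snapshot = store.get("user1")
```

Nothing in this sketch coordinates writes across different rows, which is the point: the unit of atomicity is the row, and anything wider needs a separate transactional layer.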

Re: response to "The problems with ACID, and how to fix them without going NoSQL"

Posted by Edward Capriolo <ed...@gmail.com>.

On Thu, Sep 2, 2010 at 3:39 PM, Ryan Rawson <ry...@gmail.com> wrote:
> The flaws with the paper are insanely obvious if you look at them:
>
> - their solution doesn't run on disk.  Many things get faster when you
> restrict yourself to RAM/flash
> - their solution doesn't scale!  Looks like shared-nothing sharding
> with global transaction ordering and no internal locks.
>
> Or am I missing something big here?
>
> I generally find it tiresome when people bash on BigTable, yet their
> "awesome" thing doesn't scale to multi-PB databases.  Reminds me of
> that "time for an architectural rewrite" which was essentially "if you
> do everything in 1 thread/CPU you don't need locks and are faster".
> This was just the same thing as far as I can tell from skimming the
> paper.
>

Wait! I thought all the problems were solved 25 years ago? :)


http://cloud.pubs.dbs.uni-leipzig.de/node/27

Re: response to "The problems with ACID, and how to fix them without going NoSQL"

Posted by Ryan Rawson <ry...@gmail.com>.

The flaws with the paper are insanely obvious if you look at them:

- their solution doesn't run on disk.  Many things get faster when you
restrict yourself to RAM/flash
- their solution doesn't scale!  Looks like shared-nothing sharding
with global transaction ordering and no internal locks.

Or am I missing something big here?

I generally find it tiresome when people bash on BigTable, yet their
"awesome" thing doesn't scale to multi-PB databases.  Reminds me of
that "time for an architectural rewrite" which was essentially "if you
do everything in 1 thread/CPU you don't need locks and are faster".
This was just the same thing as far as I can tell from skimming the
paper.
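
For readers who want the "global transaction ordering and no internal locks" idea spelled out, here is a toy Python sketch. It illustrates only the characterization given in this message and is not taken from the paper; Replica, transfer, and the sample data are hypothetical names. It assumes the order of transactions has already been agreed on, and that each replica then applies that order on a single thread.

```python
class Replica:
    """Toy replica that applies a pre-agreed transaction order serially."""

    def __init__(self, initial):
        self.data = dict(initial)

    def apply_log(self, log):
        # One thread, one transaction at a time: nothing to lock.
        for txn in log:
            txn(self.data)
        return self.data


def transfer(src, dst, amount):
    """Build a transaction that moves `amount` if the source can cover it."""
    def txn(data):
        if data.get(src, 0) >= amount:
            data[src] = data[src] - amount
            data[dst] = data.get(dst, 0) + amount
    return txn


initial = {"a": 100, "b": 0}
log = [transfer("a", "b", 70), transfer("a", "b", 50), transfer("b", "a", 20)]

# Two replicas replaying the same ordered log end in the same state.
state_1 = Replica(initial).apply_log(log)
state_2 = Replica(initial).apply_log(log)
```

The sketch also shows where the scaling concern comes from: every transaction has to pass through the one agreed order before any replica can apply it.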
