Posted to user@cassandra.apache.org by dir dir <si...@gmail.com> on 2010/04/24 19:20:49 UTC

The Difference Between Cassandra and HBase

In general, what is the difference between Cassandra and HBase?

Thanks.

Re: The Difference Between Cassandra and HBase

Posted by Masood Mortazavi <ma...@gmail.com>.
On Sat, Apr 24, 2010 at 10:20 AM, dir dir <si...@gmail.com> wrote:

> In general what is the difference between Cassandra and HBase??
>
> Thanks.
>


Others have already said it ...

Cassandra has a peer architecture, with all peers being essentially
equivalent (minus the concept of a "seed," as far as I can tell).

This is a great architectural advantage of Cassandra and Cassandra-like
systems. Practical systems like this weren't really feasible earlier,
because limits on memory, CPU, and disk made the characteristic times
(expected response, recovery, replication, and so on) and the overall system
dynamics almost impossible to manage. The problem persists, but it has become
far more manageable because expectations for response time have not tightened
any faster than computing capabilities have grown.

HBase, on the other hand, is already a layered system. It relies on the
underlying HDFS, over and above the OS. As a more layered system, it has a
better service architecture, in a sense, but it relies on, and is limited by,
the capabilities of those "services," such as the distributed file service.

Cassandra rolls its own partitioning and replication mechanisms at the level
of its peers. It does not rely on an underlying system service for these
capabilities. Cassandra is definitely easier to provision and operate, which
is a great advantage, although installations that need range scans (through
ordered partitioning) become more involved.
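
A minimal Python sketch of what "partitioning and replication at the level of
the peers" can look like. This is a toy consistent-hash ring in the spirit of
the Dynamo paper, not Cassandra's actual code; the names Ring, token, and the
node names are made up for illustration, and MD5 is only an assumed hash.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (illustrative hash choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A toy ring of equal peers; any peer can compute replica placement."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.replication_factor = replication_factor
        # Sort peers by token so the ring can be walked clockwise.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str):
        """First peer at or after the key's token, then its successors."""
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")  # three distinct peers holding this key
```

Because placement is a pure function of the key and the node list, no master
or external file service has to be consulted to find the replicas.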

(As suggested by others, reading the BigTable and Dynamo papers will help you
establish the difference between HBase and Cassandra in clearer,
architectural terms.)

- m.

Re: The Difference Between Cassandra and HBase

Posted by Joe Stump <jo...@joestump.net>.
On Apr 25, 2010, at 5:18 PM, Eric Hauser wrote:

> Out of curiosity, are you planning on copying the data you store in HBase/Hive into a separate Hadoop cluster in a different data center, or backing up HDFS in some other manner?  Redundancy isn't an issue within the cluster; it's more a concern of storing all your HDFS data in one physical location.

We'll eventually move to this. For the near term, we'll route the traffic that uses HBase to a single data center.

--Joe


Re: The Difference Between Cassandra and HBase

Posted by Eric Hauser <ew...@gmail.com>.
Out of curiosity, are you planning on copying the data you store in
HBase/Hive into a separate Hadoop cluster in a different data center, or
backing up HDFS in some other manner?  Redundancy isn't an issue within the
cluster; it's more a concern of storing all your HDFS data in one physical
location.


On Sun, Apr 25, 2010 at 8:04 AM, Joe Stump <jo...@joestump.net> wrote:

>
> On Apr 25, 2010, at 11:40 AM, Mark Robson wrote:
>
> > For me an important difference is that Cassandra is operationally much
> > more straightforward - there is only one type of node, and it is fully
> > redundant (depending what consistency level you're using).
> >
> > This seems to be an advantage in Cassandra vs most other distributed
> > storage systems, which almost all seem to require some "master" nodes which
> > have different operational requirements (e.g. cannot fail, need to be failed
> > over manually or have another HA solution installed for them)
>
> These two remain the #1 and #2 reasons I recommend Cassandra over HBase. At
> the end of the day, Cassandra is an *absolute* dream to manage across
> multiple data centers. I could go on and on about the voodoo that is
> expanding, contracting, and rebalancing a Cassandra cluster. It's pretty
> awesome.
>
> That being said, we're getting ready to spin up an HBase cluster. If you're
> wanting increment/decrement, more complex range scans, etc. then HBase is a
> great candidate. Especially if you don't need it to span multiple data
> centers. We're using Cassandra for our main things, and then HBase+Hive for
> analytics.
>
> There's room for both. Especially if you're using Hadoop with Cassandra.
>
> --Joe
>
>

Re: The Difference Between Cassandra and HBase

Posted by Joseph Stein <cr...@gmail.com>.
It is kind of the classic distinction between OLTP and OLAP.

Cassandra is to OLTP as HBase is to OLAP (for those SAT nutz).

Both are useful and valuable in their own right, agreed.

On Sun, Apr 25, 2010 at 12:20 PM, Jeff Hodges <jh...@twitter.com> wrote:
> HBase is awesome when you need high throughput and don't care so much
> about latency. Cassandra is generally the opposite. They are
> wonderfully complementary.
> --
> Jeff
>



-- 
/*
Joe Stein
http://www.linkedin.com/in/charmalloc
*/

Re: The Difference Between Cassandra and HBase

Posted by Jeff Hodges <jh...@twitter.com>.
HBase is awesome when you need high throughput and don't care so much
about latency. Cassandra is generally the opposite. They are
wonderfully complementary.
--
Jeff


Re: The Difference Between Cassandra and HBase

Posted by Lenin Gali <ga...@gmail.com>.
I second Joe.

Lenin

Re: The Difference Between Cassandra and HBase

Posted by Joe Stump <jo...@joestump.net>.
On Apr 25, 2010, at 11:40 AM, Mark Robson wrote:

> For me an important difference is that Cassandra is operationally much more straightforward - there is only one type of node, and it is fully redundant (depending what consistency level you're using).
> 
> This seems to be an advantage in Cassandra vs most other distributed storage systems, which almost all seem to require some "master" nodes which have different operational requirements (e.g. cannot fail, need to be failed over manually or have another HA solution installed for them)

These two remain the #1 and #2 reasons I recommend Cassandra over HBase. At the end of the day, Cassandra is an *absolute* dream to manage across multiple data centers. I could go on and on about the voodoo that is expanding, contracting, and rebalancing a Cassandra cluster. It's pretty awesome.

That being said, we're getting ready to spin up an HBase cluster. If you're wanting increment/decrement, more complex range scans, etc. then HBase is a great candidate. Especially if you don't need it to span multiple data centers. We're using Cassandra for our main things, and then HBase+Hive for analytics. 

There's room for both. Especially if you're using Hadoop with Cassandra. 

--Joe


Re: The Difference Between Cassandra and HBase

Posted by Mark Robson <ma...@gmail.com>.
For me an important difference is that Cassandra is operationally much more
straightforward: there is only one type of node, and it is fully redundant
(depending on what consistency level you're using).

This seems to be an advantage of Cassandra over most other distributed storage
systems, almost all of which seem to require some "master" nodes that have
different operational requirements (e.g. they cannot fail, need to be failed
over manually, or need another HA solution installed for them).
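
To make the "depending on consistency level" caveat concrete, here is a small
Python sketch of the usual quorum arithmetic. It is an illustration, not
Cassandra's API: the function names are hypothetical, and ONE, QUORUM, and ALL
are modelled simply as the number of replicas that must answer.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set overlaps every write set (R + W > N)."""
    return r + w > n


def tolerated_failures(n: int, required: int) -> int:
    """Replicas that may be down while `required` of `n` can still answer."""
    return n - required


N = 3  # assumed replication factor
ONE, QUORUM, ALL = 1, quorum(N), N

for name, level in (("ONE", ONE), ("QUORUM", QUORUM), ("ALL", ALL)):
    print(name, "needs", level, "replicas, tolerates",
          tolerated_failures(N, level), "down")
```

With three replicas, writing and reading at QUORUM overlap on at least one
replica and still survive one node being down; at ALL, a single failed node
blocks the operation, which is the sense in which redundancy depends on the
level chosen.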

Mark

Re: The Difference Between Cassandra and HBase

Posted by Erik Holstad <er...@gmail.com>.
I would say that HBase is a little more focused on reads and Cassandra on
writes. HBase has better scans, and Cassandra has better multi-datacenter
functionality.
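
A rough Python sketch of why key ordering matters for scans. It assumes a
three-node cluster and two made-up placement functions: one hashes the key
(as a random partitioner does), the other splits the key space into sorted
ranges (as a store that keeps rows sorted by key does). The node names and
split points are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]


def hashed_owner(key: str) -> str:
    """Place a key by its hash: adjacent keys scatter across nodes."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return NODES[digest % len(NODES)]


def ordered_owner(key: str) -> str:
    """Place a key by sorted range: adjacent keys stay together."""
    first = key[0].lower()
    if first < "i":
        return NODES[0]
    if first < "r":
        return NODES[1]
    return NODES[2]


# A contiguous key range, as a range scan would read it.
keys = [f"user:{i:03d}" for i in range(100)]

nodes_hit_ordered = {ordered_owner(k) for k in keys}
nodes_hit_hashed = {hashed_owner(k) for k in keys}
print(len(nodes_hit_ordered), "node(s) with ordered placement")
print(len(nodes_hit_hashed), "node(s) with hashed placement")
```

With ordered placement the whole range lives on one node and can be read
sequentially; with hashed placement the same range is spread over the
cluster, which is good for load balance but makes range scans harder.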

Erik

Re: The Difference Between Cassandra and HBase

Posted by Jonathan Ellis <jb...@gmail.com>.
On Sat, Apr 24, 2010 at 9:29 PM, dir dir <si...@gmail.com> wrote:
> I have already read Jonathan Ellis's Blog today
> (http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html)
>
> in this blog, Jonathan tried to explain the difference between Cassandra and
> Hbase.
> But I have several questions. In this blog Jonathan said:
>
> 1. Hbase Follows the bigtable model, so it's more complicated than it needs
>    to be (300+kloc vs 50 for Cassandra; many more components). This means
>    it's that much harder for me to troubleshoot.
>
> But in same blog Jonathan also said that Cassandra is a sort of hybrid
> between dynamo and big table. Therefore I can conclude Cassandra also
> follows the Big Table model.

No, I said Cassandra provides a data model similar to the one
described in BigTable, i.e. ColumnFamilies with sparse rows.  The
architecture of Cassandra (a fully distributed cluster with each node
managing local storage) is completely different from BigTable (lock
server + master server + tablet servers on top of GFS).
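
As a rough illustration of "ColumnFamilies with sparse rows," here is a toy
Python model: a column family is a map from row key to that row's own set of
columns, and rows need not share any column names. This is only a sketch of
the data model, not of either system's storage or API; insert and get_slice
are hypothetical helper names.

```python
from collections import defaultdict

# One column family: row key -> {column name: value}.
column_family = defaultdict(dict)


def insert(row_key, column, value):
    """Store a single column in a row, creating the row if needed."""
    column_family[row_key][column] = value


def get_slice(row_key, start="", finish=None):
    """Return the row's columns with start <= name <= finish, sorted by name."""
    row = column_family.get(row_key, {})
    return [
        (name, row[name])
        for name in sorted(row)
        if name >= start and (finish is None or name <= finish)
    ]


# Sparse rows: each row stores only the columns it actually has.
insert("alice", "email", "alice@example.com")
insert("alice", "city", "Oslo")
insert("bob", "phone", "555-0100")

alice_columns = get_slice("alice")
bob_columns = get_slice("bob", start="a", finish="z")
```

Both systems expose roughly this shape of data; the difference discussed in
this thread is in how the cluster underneath is organized.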

> I also do not understand what is the meaning of  "so it's more complicated
> than it needs to be"??

Read the BigTable paper, then read the Dynamo paper, and this will
make more sense.

> To run Cassandra, should I check out the code from
> (http://svn.apache.org/repos/asf/incubator/cassandra/trunk)?? or as an User,
> I just enough download the binary software of Cassandra 0.6.1  from
> http://cassandra.apache.org??

You should be using 0.6.1.

Re: The Difference Between Cassandra and HBase

Posted by dir dir <si...@gmail.com>.
Hi Paul,

I have already read Jonathan Ellis's blog post today
(http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html).

In this post, Jonathan tries to explain the difference between Cassandra and
HBase, but I have several questions. In the post Jonathan said:

1. Hbase Follows the bigtable model, so it's more complicated than it needs
   to be (300+kloc vs 50 for Cassandra; many more components). This means
   it's that much harder for me to troubleshoot.

But in the same post Jonathan also said that Cassandra is a sort of hybrid
between Dynamo and BigTable, so I conclude that Cassandra also follows the
BigTable model. My question is: what is the difference between Cassandra and
HBase, since both of them follow the BigTable model?

I also do not understand what "so it's more complicated than it needs to be"
means. Since Cassandra also follows the BigTable model, why is HBase more
complicated than it needs to be?

Jonathan also said "(300+kloc vs 50 for Cassandra; many more components)".
Does that statement mean that the HBase code base is much larger than
Cassandra's (and therefore much harder for him to troubleshoot)?

To run Cassandra, should I check out the code from
http://svn.apache.org/repos/asf/incubator/cassandra/trunk, or, as a user,
is it enough to download the Cassandra 0.6.1 binary release from
http://cassandra.apache.org?

Thank You.



On Sun, Apr 25, 2010 at 2:05 AM, Paul Prescod <pa...@prescod.net> wrote:

> http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/
>
> http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html
>
> On Sat, Apr 24, 2010 at 10:20 AM, dir dir <si...@gmail.com> wrote:
> > In general what is the difference between Cassandra and HBase??
> >
> > Thanks.
> >
>

Re: The Difference Between Cassandra and HBase

Posted by Paul Prescod <pa...@prescod.net>.
http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/

http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html

On Sat, Apr 24, 2010 at 10:20 AM, dir dir <si...@gmail.com> wrote:
> In general what is the difference between Cassandra and HBase??
>
> Thanks.
>