Posted to user@hbase.apache.org by Time Less <ti...@gmail.com> on 2010/08/18 20:11:08 UTC

Re: major differences with Cassandra

HBase is run by persons who understand (or are willing to hear) the
operational requirements of distributed databases in high-volume
environments, whereas the Cassandra project isn't.

Talks about technical differences are really noise, because they're entirely
theoretical. When viewed with this knowledge, a lot of the disagreements,
flamewars, and shoutfests begin to make sense.

As of today, I'm unaware of any major feature Cassandra claims that it
actually delivers outside of installations run by the developers themselves.
Specifically: multi-DC, hinted handoff, compaction, dynamic cluster resizing
are all fail. The developers will adamantly claim all such features work
just fine. Good luck getting any of it to work in YOUR environment.

In stark contrast, I am intimately familiar with at least one large HBase
installation run by non-developers (at Mozilla).

Disclaimers: I am very familiar with the Cassandra product internals,
developers, history, and community. I am less familiar with HBase. I might
therefore have a rosy view of the HBase community based on ignorance. Also,
in a low-volume environment, pretty much anything works. Including
Cassandra. Or anything else. Any NoSQL. Any SQL. Pick whatever you want and
run with it.


On Fri, Jul 30, 2010 at 9:03 PM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> I don't have the URL handy, but just the other day I read a Cassandra/HBase
> blog post where Cassandra was described as having no SPOF, but somebody left
> some very "strong comments" calling out that and a few other claims as false.
> Ah, I remember, here is the URL:
>
> http://blog.mozilla.com/data/2010/05/18/riak-and-cassandra-and-hbase-oh-my/
>
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
> ----- Original Message ----
> > From: Jeff Zhang <zj...@gmail.com>
> > To: user@hbase.apache.org
> > Sent: Thu, July 8, 2010 1:34:18 AM
> > Subject: Re: major differences with Cassandra
> >
> > HBase does not have super column families.
> >
> > I can list the following major differences between HBase and Cassandra
> > (additions welcome):
> >
> > 1. HBase has a master-slave architecture, while Cassandra has no master;
> > you can think of it as a peer-to-peer structure with no single point of
> > failure.
> > 2. HBase is strongly consistent, while Cassandra is eventually consistent
> > (although you can tune it to be strongly consistent).
> >
> >
> > On Thu, Jul 8, 2010 at 1:26 PM, S Ahmed <sa...@gmail.com> wrote:
> >
> > > Hello!
> > >
> > > I was hoping someone has experience with both Cassandra and HBase.
> > >
> > > What are the major differences between Cassandra and HBase?
> > >
> > > Does HBase have the concept of ColumnFamilies and SuperColumnFamilies
> > > like Cassandra?
> > >
> > > Where in the wiki does it go over designing a data model?
> > >
> > >
> > > thanks!
> > >
> >
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>



-- 
timeless(ness)
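Jeff Zhang's second point, that Cassandra's eventual consistency can be tuned toward strong consistency, usually comes down to quorum overlap: with N replicas, a read that contacts R of them is guaranteed to see a write acknowledged by W of them when R + W > N (for example, QUORUM reads plus QUORUM writes). The sketch below checks that rule by brute force over every possible read set and write set. It is a simplified model that ignores failures, timestamps, and hinted writes, and the function names are hypothetical, not part of any Cassandra API.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Smallest majority of n replicas (e.g. 2 of 3)."""
    return n // 2 + 1


def overlap_rule(n: int, r: int, w: int) -> bool:
    """The usual condition for a read to see the latest write: R + W > N."""
    return r + w > n


def always_overlaps(n: int, r: int, w: int) -> bool:
    """Brute force: does every size-r read set share at least one replica
    with every size-w write set, out of n replicas?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # empty set (no overlap) is falsy
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


if __name__ == "__main__":
    n = 3
    for label, r, w in [("ONE/ONE", 1, 1),
                        ("QUORUM/QUORUM", quorum(n), quorum(n)),
                        ("ONE read / ALL write", 1, n)]:
        print(label, overlap_rule(n, r, w), always_overlaps(n, r, w))
    # ONE/ONE False False
    # QUORUM/QUORUM True True
    # ONE read / ALL write True True
```

The brute-force check agrees with R + W > N by the pigeonhole principle: two subsets of N replicas whose sizes sum to more than N must share a member.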

Re: major differences with Cassandra

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Aug 18, 2010 at 2:17 PM, Ryan Rawson <ry...@gmail.com> wrote:
> Thanks for that bit of feedback.
>
> StumbleUpon has been operating a cluster that handles 20,000 requests a
> second, 24/7, for about a year now. Even though we have HBase developers,
> I don't think there is any special sauce; anyone could replicate the
> successes we've had. Mozilla is one candidate. There are others who are
> quieter about it.

You said:
As of today, I'm unaware of any major feature Cassandra claims that it
actually delivers outside of installations run by the developers themselves.
Specifically: multi-DC, hinted handoff, compaction, dynamic cluster resizing
are all fail. The developers will adamantly claim all such features work
just fine. Good luck getting any of it to work in YOUR environment.

Where to start with this statement:
Multi-DC support:
You are saying Cassandra is bad at X, but HBase does not even do X.
https://issues.apache.org/jira/browse/HBASE-1295

Hinted Handoff:
If I take down a Cassandra node, hints for the writes it misses are stored
on other nodes. When the failed node comes back online, the hints are
delivered to it.
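The hinted-handoff behavior described above can be sketched as a toy in-memory model: when a replica is down, the write is kept as a hint on a live node and replayed once the replica returns. The `Node` and `Cluster` classes are hypothetical, using the first live replica as the hint holder is a simplification, and the model omits persistence, failure detection, and the many other details of real Cassandra.

```python
from collections import defaultdict


class Node:
    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}
        # Hints this node holds on behalf of other nodes:
        # target node name -> list of (key, value) writes the target missed.
        self.hints = defaultdict(list)


class Cluster:
    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}

    def write(self, key, value, replicas):
        live = [self.nodes[r] for r in replicas if self.nodes[r].up]
        if not live:
            raise RuntimeError("no live replica to accept the write")
        for r in replicas:
            node = self.nodes[r]
            if node.up:
                node.data[key] = value
            else:
                # Simplification: the first live replica holds the hint.
                live[0].hints[r].append((key, value))

    def recover(self, name):
        """Bring a node back up and replay every hint held for it."""
        target = self.nodes[name]
        target.up = True
        for holder in self.nodes.values():
            for key, value in holder.hints.pop(name, []):
                target.data[key] = value


if __name__ == "__main__":
    cluster = Cluster(["A", "B", "C"])
    cluster.nodes["C"].up = False             # take C down
    cluster.write("row1", "v1", ["A", "B", "C"])
    print("row1" in cluster.nodes["C"].data)  # False: C missed the write
    cluster.recover("C")
    print(cluster.nodes["C"].data["row1"])    # v1: delivered from the hint
```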

Compaction:
Compaction works. My tables compact at user-defined intervals.
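At its core, compaction merges several immutable sorted files (SSTables) into one and discards superseded versions of each row. A minimal sketch, assuming each input is a list of `(key, timestamp, value)` cells sorted by key with one cell per key, and that the newest timestamp wins; `compact` is a hypothetical helper, and the sketch ignores deletes (tombstones), on-disk formats, and whatever schedule or threshold triggers compaction.

```python
import heapq


def compact(sstables):
    """Merge sorted runs of (key, timestamp, value) cells into one sorted run,
    keeping only the newest version of each key."""
    # Order by key, then newest timestamp first, so the first cell seen
    # for each key is the one to keep.
    merged = heapq.merge(*sstables, key=lambda cell: (cell[0], -cell[1]))
    result = []
    for key, ts, value in merged:
        if result and result[-1][0] == key:
            continue  # an older version of a key already kept
        result.append((key, ts, value))
    return result


if __name__ == "__main__":
    older = [("apple", 1, "red"), ("kiwi", 1, "green")]
    newer = [("apple", 2, "yellow"), ("plum", 2, "purple")]
    print(compact([older, newer]))
    # [('apple', 2, 'yellow'), ('kiwi', 1, 'green'), ('plum', 2, 'purple')]
```

Because every input is already sorted, `heapq.merge` streams the merge without loading everything into memory at once, which is the same reason real compactions can process files larger than RAM.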

Dynamic Cluster Resizing:
Joining a new node is more intensive in Cassandra, as data has to
physically move from one node to another. Yet I regularly add,
replace, and move nodes.
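Why joining a node means moving data can be seen on a token ring: each node owns the range of hash tokens up to its own token, so a new node takes over part of exactly one existing node's range, and only those keys have to be streamed. A minimal sketch, assuming a 2**127 token space with MD5-derived tokens (loosely modeled on Cassandra's RandomPartitioner) and ignoring replication; the node names, tokens, and helper functions are hypothetical.

```python
import bisect
import hashlib

# Assumption: 2**127 token space, MD5-derived tokens, one owner per key.
RING_SIZE = 2 ** 127


def token(key: str) -> int:
    """Hash a key onto the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


def owner(ring: dict, key: str) -> str:
    """A node owns the range (previous token, its token]; wrap past the end."""
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, token(key))
    return ring[tokens[i % len(tokens)]]


def keys_that_move(ring: dict, new_token: int, new_node: str, keys):
    """Keys whose owner changes when new_node joins at new_token."""
    grown = dict(ring)
    grown[new_token] = new_node
    return [k for k in keys if owner(ring, k) != owner(grown, k)]


if __name__ == "__main__":
    ring = {RING_SIZE // 4: "A", RING_SIZE // 2: "B", 3 * RING_SIZE // 4: "C"}
    keys = [f"row{i}" for i in range(1000)]
    moved = keys_that_move(ring, RING_SIZE // 8, "D", keys)
    print(f"{len(moved)} of {len(keys)} keys must be streamed to D")
```

In this example every moved key comes from A, because D's token splits A's (wrapping) range; B and C are untouched. That locality is what makes adding nodes possible without a master, while still costing a bulk data transfer.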

You said:
Talks about technical differences are really noise, because they're
entirely theoretical.

This statement is contradictory. You are saying technical differences
are theoretical. Small technical differences have profound
implications.

You said:
In stark contrast, I am intimately familiar with at least one large HBase
installation run by non-developers (at Mozilla).
Then later:
I am very familiar with the Cassandra product internals,
developers, history, and community. I am less familiar with HBase.

I give up. Are you "intimately familiar" or "less familiar" ?

Where can I check out the source for this "Any SQL" you mention?
Sounds like it has far fewer problems than these damned NoSQL
solutions.

Re: major differences with Cassandra

Posted by Ryan Rawson <ry...@gmail.com>.
Thanks for that bit of feedback.

StumbleUpon has been operating a cluster that handles 20,000 requests a
second, 24/7, for about a year now. Even though we have HBase developers,
I don't think there is any special sauce; anyone could replicate the
successes we've had. Mozilla is one candidate. There are others who are
quieter about it.
