Posted to user@cassandra.apache.org by Bill Hastings <bl...@gmail.com> on 2009/12/06 03:41:19 UTC

Cassandra vs HBase

We are seriously evaluating both Cassandra and HBase in our organization. We
are very pleased with the performance and ease of use of Cassandra and will
start the same exercise with HBase soon. However, some people in the
organization feel that HBase has wider community adoption. I understand
that Cassandra has been deployed within Facebook and is also at Digg and
Rackspace. I also went through the survey that Jonathan had conducted a few
weeks back.

I am interested in hearing from those of you, if there are any, who have
ditched HBase for Cassandra. If so, why? Also, though perhaps this is not a
question for this list: is HBase used for real-time-ish applications, and if
so, any ideas what the largest deployment is? It would help me in swaying the
folks towards Cassandra. Any help would be appreciated.

Cheers
Bill

Re: Cassandra vs HBase

Posted by Jonathan Ellis <jb...@gmail.com>.
hop is to traditional hadoop what something like truviso is to
postgresql.  in other words, hop is still poorly suited for the sorts
of things people would use cassandra for.

On Tue, Dec 8, 2009 at 12:07 AM, Ian Holsman <ia...@holsman.net> wrote:
> This is slightly off-topic
> There is a recent project called hadoop online (hop) on google-code that
> promises a online/continuous query ability on top of hadoop which should
> allow for near real time activities instead of the batch stuff that mapred
> does
>
> ---
> Sent from my phone
> Ian Holsman - 703 879-3128
> On 06/12/2009, at 3:12 PM, Joseph Bowman <bo...@gmail.com> wrote:
>
> When I wrote my Why Cassandra article, I didn't get into the why I didn't
> choose x platform because I didn't want to start a flame war by doing
> comparisons. For HBase, the primary reason I didn't choose it is that while
> there were benchmarks of what it could theoretically do, there wasn't any
> real real world deployments proving it. My experience as a systems
> administrator is that it's best to go with a product that's been proven over
> time in real world scenarios.
>
> I'll add to this though, that nothing nosql, even Cassandra, has reached the
> point where I feel it's no-brainer to choose it over anything, including sql
> based solutions like mysql and oracle. It really comes down to your
> requirements.
>
> On Sat, Dec 5, 2009 at 11:04 PM, Matt Revelle <mr...@gmail.com> wrote:
>>
>> On Dec 5, 2009, at 21:45, Joe Stump <jo...@joestump.net> wrote:
>>
>>>
>>> On Dec 5, 2009, at 7:41 PM, Bill Hastings wrote:
>>>
>>>> [Is] HBase used for real timish applications and if so any ideas what
>>>> the largest deployment is.
>>>
>>> I don't know of anyone off the top of my head who's using anything built
>>> on top of Hadoop for a real-time environment. Hadoop just wasn't built for
>>> that. It was built, like MapReduce, for crunching absurd amounts of data
>>> across hundreds of nodes in a "reasonable" amount of time.
>>>
>>> Just my $0.02.
>>>
>>> --Joe
>>>
>>
>> While Hadoop MapReduce isn't meant for realtime use, HBase can handle it.
>>
>> Over last summer there were some benchmarks included in HBase/Hadoop
>> presentations that showed, IIRC, performance comparable to Cassandra.
>>
>
>

Re: Cassandra vs HBase

Posted by Ian Holsman <ia...@holsman.net>.
This is slightly off-topic

There is a recent project called Hadoop Online (HOP) on Google Code
that promises an online/continuous query ability on top of Hadoop, which
should allow for near-real-time activities instead of the batch stuff
that mapred does.

---
Sent from my phone
Ian Holsman - 703 879-3128

On 06/12/2009, at 3:12 PM, Joseph Bowman <bo...@gmail.com>  
wrote:

> When I wrote my Why Cassandra article, I didn't get into the why I  
> didn't choose x platform because I didn't want to start a flame war  
> by doing comparisons. For HBase, the primary reason I didn't choose  
> it is that while there were benchmarks of what it could  
> theoretically do, there wasn't any real real world deployments  
> proving it. My experience as a systems administrator is that it's  
> best to go with a product that's been proven over time in real world  
> scenarios.
>
> I'll add to this though, that nothing nosql, even Cassandra, has  
> reached the point where I feel it's no-brainer to choose it over  
> anything, including sql based solutions like mysql and oracle. It  
> really comes down to your requirements.
>
> On Sat, Dec 5, 2009 at 11:04 PM, Matt Revelle <mr...@gmail.com> wrote:
>> On Dec 5, 2009, at 21:45, Joe Stump <jo...@joestump.net> wrote:
>>
>>> On Dec 5, 2009, at 7:41 PM, Bill Hastings wrote:
>>>
>>>> [Is] HBase used for real timish applications and if so any ideas
>>>> what the largest deployment is.
>>>
>>> I don't know of anyone off the top of my head who's using anything
>>> built on top of Hadoop for a real-time environment. Hadoop just
>>> wasn't built for that. It was built, like MapReduce, for crunching
>>> absurd amounts of data across hundreds of nodes in a "reasonable"
>>> amount of time.
>>>
>>> Just my $0.02.
>>>
>>> --Joe
>>
>> While Hadoop MapReduce isn't meant for realtime use, HBase can
>> handle it.
>>
>> Over last summer there were some benchmarks included in HBase/Hadoop
>> presentations that showed, IIRC, performance comparable to Cassandra.
>
>

Re: Cassandra vs HBase

Posted by Joseph Bowman <bo...@gmail.com>.
When I wrote my Why Cassandra article, I didn't get into why I didn't
choose platform X, because I didn't want to start a flame war by doing
comparisons. For HBase, the primary reason I didn't choose it is that while
there were benchmarks of what it could theoretically do, there weren't any
real-world deployments proving it. My experience as a systems
administrator is that it's best to go with a product that's been proven over
time in real-world scenarios.

I'll add to this, though, that nothing NoSQL, even Cassandra, has reached the
point where I feel it's a no-brainer to choose it over anything, including
SQL-based solutions like MySQL and Oracle. It really comes down to your
requirements.

On Sat, Dec 5, 2009 at 11:04 PM, Matt Revelle <mr...@gmail.com> wrote:

> On Dec 5, 2009, at 21:45, Joe Stump <jo...@joestump.net> wrote:
>
>
>> On Dec 5, 2009, at 7:41 PM, Bill Hastings wrote:
>>
>>> [Is] HBase used for real timish applications and if so any ideas what the
>>> largest deployment is.
>>>
>>
>> I don't know of anyone off the top of my head who's using anything built
>> on top of Hadoop for a real-time environment. Hadoop just wasn't built for
>> that. It was built, like MapReduce, for crunching absurd amounts of data
>> across hundreds of nodes in a "reasonable" amount of time.
>>
>> Just my $0.02.
>>
>> --Joe
>>
>>
> While Hadoop MapReduce isn't meant for realtime use, HBase can handle it.
>
> Over last summer there were some benchmarks included in HBase/Hadoop
> presentations that showed, IIRC, performance comparable to Cassandra.
>
>

Re: Cassandra vs HBase

Posted by Tim Estes <ti...@digitalreasoning.com>.
Thanks. That is interesting and what I was looking for.

I knew HBase V0.20 was closing the gap. Probably good to compare with V0.5B1
on the Cassandra side. I'd think that fast multi-get and batch insert/update
would be interesting to compare and benchmark. I know we are taxing Cassandra
now and working on some auxiliary means (outside of Thrift) to see what the
per-node limits really are...

Sent from my iPhone

On Dec 6, 2009, at 12:35 AM, "Matt Revelle" <mr...@gmail.com> wrote:

> Cassandra performance likely still beats HBase, but according to the
> "Powered By" page on the HBase wiki it is being used to handle
> realtime requests by StumbleUpon, Meetup, and Streamy
> (http://wiki.apache.org/hadoop/Hbase/PoweredBy).
>
> These two documents contain some performance numbers:
> http://static.last.fm/johan/nosql-20090611/hbase_nosql.pdf  (skip to  
> page 22)
> http://www.slideshare.net/schubertzhang/hbase-0200-performance-evaluation
>
> Both Cassandra and HBase are useful tech, I just wanted to point out  
> that HBase performance has improved over the past year and it can  
> handle realtime requests.
>
> On Dec 5, 2009, at 11:08 PM, Tim Estes wrote:
>
>> Can you link/reference those? I haven't seen random read or write  
>> performance numbers published around V0.20 Hbase that are within 5x  
>> of Cassandra. I'm very curious about this...
>>
>> Sent from my iPhone
>>
>> On Dec 5, 2009, at 11:05 PM, "Matt Revelle" <mr...@gmail.com>  
>> wrote:
>>
>>> On Dec 5, 2009, at 21:45, Joe Stump <jo...@joestump.net> wrote:
>>>
>>>>
>>>> On Dec 5, 2009, at 7:41 PM, Bill Hastings wrote:
>>>>
>>>>> [Is] HBase used for real timish applications and if so any ideas  
>>>>> what the largest deployment is.
>>>>
>>>> I don't know of anyone off the top of my head who's using  
>>>> anything built on top of Hadoop for a real-time environment.  
>>>> Hadoop just wasn't built for that. It was built, like MapReduce,  
>>>> for crunching absurd amounts of data across hundreds of nodes in  
>>>> a "reasonable" amount of time.
>>>>
>>>> Just my $0.02.
>>>>
>>>> --Joe
>>>>
>>>
>>> While Hadoop MapReduce isn't meant for realtime use, HBase can  
>>> handle it.
>>>
>>> Over last summer there were some benchmarks included in
>>> HBase/Hadoop presentations that showed, IIRC, performance comparable to
>>> Cassandra.
>>>
>

Re: Cassandra vs HBase

Posted by Joseph Bowman <bo...@gmail.com>.
Just to play devil's advocate... when was the last time someone benchmarked
Cassandra? There have been a lot of changes and a couple of releases since the
last version I saw benchmarks for.

On Sun, Dec 6, 2009 at 12:34 AM, Matt Revelle <mr...@gmail.com> wrote:

> Cassandra performance likely still beats HBase, but according to the
> "Powered By" page on the HBase wiki it is being used to handle realtime
> requests by StumbleUpon, Meetup, and Streamy (
> http://wiki.apache.org/hadoop/Hbase/PoweredBy).
>
> These two documents contain some performance numbers:
> http://static.last.fm/johan/nosql-20090611/hbase_nosql.pdf  (skip to page
> 22)
> http://www.slideshare.net/schubertzhang/hbase-0200-performance-evaluation
>
> Both Cassandra and HBase are useful tech, I just wanted to point out that
> HBase performance has improved over the past year and it can handle realtime
> requests.
>
> On Dec 5, 2009, at 11:08 PM, Tim Estes wrote:
>
> > Can you link/reference those? I haven't seen random read or write
> > performance numbers published around V0.20 Hbase that are within 5x of
> > Cassandra. I'm very curious about this...
> >
> > Sent from my iPhone
> >
> > On Dec 5, 2009, at 11:05 PM, "Matt Revelle" <mr...@gmail.com> wrote:
> >
> >> On Dec 5, 2009, at 21:45, Joe Stump <jo...@joestump.net> wrote:
> >>
> >>>
> >>> On Dec 5, 2009, at 7:41 PM, Bill Hastings wrote:
> >>>
> >>>> [Is] HBase used for real timish applications and if so any ideas what
> >>>> the largest deployment is.
> >>>
> >>> I don't know of anyone off the top of my head who's using anything
> >>> built on top of Hadoop for a real-time environment. Hadoop just wasn't
> >>> built for that. It was built, like MapReduce, for crunching absurd
> >>> amounts of data across hundreds of nodes in a "reasonable" amount of
> >>> time.
> >>>
> >>> Just my $0.02.
> >>>
> >>> --Joe
> >>>
> >>
> >> While Hadoop MapReduce isn't meant for realtime use, HBase can handle
> >> it.
> >>
> >> Over last summer there were some benchmarks included in HBase/Hadoop
> >> presentations that showed, IIRC, performance comparable to Cassandra.
> >>
>
>

Re: Cassandra vs HBase

Posted by Matt Revelle <mr...@gmail.com>.
Cassandra performance likely still beats HBase, but according to the "Powered By" page on the HBase wiki it is being used to handle realtime requests by StumbleUpon, Meetup, and Streamy (http://wiki.apache.org/hadoop/Hbase/PoweredBy).

These two documents contain some performance numbers:
http://static.last.fm/johan/nosql-20090611/hbase_nosql.pdf  (skip to page 22)
http://www.slideshare.net/schubertzhang/hbase-0200-performance-evaluation

Both Cassandra and HBase are useful tech; I just wanted to point out that HBase performance has improved over the past year and it can handle realtime requests.

On Dec 5, 2009, at 11:08 PM, Tim Estes wrote:

> Can you link/reference those? I haven't seen random read or write performance numbers published around V0.20 Hbase that are within 5x of Cassandra. I'm very curious about this...
> 
> Sent from my iPhone
> 
> On Dec 5, 2009, at 11:05 PM, "Matt Revelle" <mr...@gmail.com> wrote:
> 
>> On Dec 5, 2009, at 21:45, Joe Stump <jo...@joestump.net> wrote:
>> 
>>> 
>>> On Dec 5, 2009, at 7:41 PM, Bill Hastings wrote:
>>> 
>>>> [Is] HBase used for real timish applications and if so any ideas what the largest deployment is.
>>> 
>>> I don't know of anyone off the top of my head who's using anything built on top of Hadoop for a real-time environment. Hadoop just wasn't built for that. It was built, like MapReduce, for crunching absurd amounts of data across hundreds of nodes in a "reasonable" amount of time.
>>> 
>>> Just my $0.02.
>>> 
>>> --Joe
>>> 
>> 
>> While Hadoop MapReduce isn't meant for realtime use, HBase can handle it.
>> 
>> Over last summer there were some benchmarks included in HBase/Hadoop presentations that showed, IIRC, performance comparable to Cassandra.
>> 


Re: Cassandra vs HBase

Posted by Tim Estes <ti...@digitalreasoning.com>.
Can you link/reference those? I haven't seen random read or write
performance numbers published around V0.20 HBase that are within 5x of
Cassandra. I'm very curious about this...

Sent from my iPhone

On Dec 5, 2009, at 11:05 PM, "Matt Revelle" <mr...@gmail.com> wrote:

> On Dec 5, 2009, at 21:45, Joe Stump <jo...@joestump.net> wrote:
>
>>
>> On Dec 5, 2009, at 7:41 PM, Bill Hastings wrote:
>>
>>> [Is] HBase used for real timish applications and if so any ideas  
>>> what the largest deployment is.
>>
>> I don't know of anyone off the top of my head who's using anything  
>> built on top of Hadoop for a real-time environment. Hadoop just  
>> wasn't built for that. It was built, like MapReduce, for crunching  
>> absurd amounts of data across hundreds of nodes in a "reasonable"  
>> amount of time.
>>
>> Just my $0.02.
>>
>> --Joe
>>
>
> While Hadoop MapReduce isn't meant for realtime use, HBase can  
> handle it.
>
> Over last summer there were some benchmarks included in HBase/Hadoop  
> presentations that showed, IIRC, performance comparable to Cassandra.
>

Re: Cassandra vs HBase

Posted by Matt Revelle <mr...@gmail.com>.
On Dec 5, 2009, at 21:45, Joe Stump <jo...@joestump.net> wrote:

>
> On Dec 5, 2009, at 7:41 PM, Bill Hastings wrote:
>
>> [Is] HBase used for real timish applications and if so any ideas  
>> what the largest deployment is.
>
> I don't know of anyone off the top of my head who's using anything  
> built on top of Hadoop for a real-time environment. Hadoop just  
> wasn't built for that. It was built, like MapReduce, for crunching  
> absurd amounts of data across hundreds of nodes in a "reasonable"  
> amount of time.
>
> Just my $0.02.
>
> --Joe
>

While Hadoop MapReduce isn't meant for realtime use, HBase can handle
it.

Over the last summer there were some benchmarks included in HBase/Hadoop
presentations that showed, IIRC, performance comparable to Cassandra.


Re: Cassandra vs HBase

Posted by Joe Stump <jo...@joestump.net>.
On Dec 5, 2009, at 7:41 PM, Bill Hastings wrote:

> [Is] HBase used for real timish applications and if so any ideas what the largest deployment is.

I don't know of anyone off the top of my head who's using anything built on top of Hadoop for a real-time environment. Hadoop just wasn't built for that. It was built, like MapReduce, for crunching absurd amounts of data across hundreds of nodes in a "reasonable" amount of time.

Just my $0.02.

--Joe