
Posted to user@hbase.apache.org by Ajay <aj...@gmail.com> on 2015/05/29 21:12:28 UTC

Hbase vs Cassandra

Hi,

I need some info on Hbase vs Cassandra as a data store (in general plus
specific to time series data).

A comparison on the following points would help:
1: features
2: deployment and monitoring
3: performance
4: anything else

Thanks
Ajay

Re: Hbase vs Cassandra

Posted by Serega Sheypak <se...@gmail.com>.

http://blog.parsely.com/post/1928/cass/
Here is a cool blog post. I've used HBase for years and once had a project
with Cassandra. It is an overcomplicated system with bugs declared as features.
Really, there is no reason to use Cassandra.
Describe your task and I can tell you how to solve it using HBase.

On Friday, May 29, 2015, Ajay wrote:

> Hi,
>
> I need some info on Hbase vs Cassandra as a data store (in general plus
> specific to time series data).
>
> The comparison in the following helps:
> 1: features
> 2: deployment and monitoring
> 3: performance
> 4: anything else
>
> Thanks
> Ajay
>

Re: Hbase vs Cassandra

Posted by john guthrie <gr...@gmail.com>.

Funny, I was just on a conference call with a Hortonworks engineer. His take
was that if you need or want to be part of the wider Hadoop ecosystem, pick
HBase. Otherwise it was pretty much a wash.

john

On Fri, May 29, 2015 at 3:12 PM, Ajay <aj...@gmail.com> wrote:

> Hi,
>
> I need some info on Hbase vs Cassandra as a data store (in general plus
> specific to time series data).
>
> The comparison in the following helps:
> 1: features
> 2: deployment and monitoring
> 3: performance
> 4: anything else
>
> Thanks
> Ajay
>

Re: Hbase vs Cassandra

Posted by Serega Sheypak <se...@gmail.com>.

You can use Apache Cassandra rather than the DataStax distro. Apache Cassandra is open source.

On Saturday, May 30, 2015, jongchul seon wrote:

> I have not tried cassandra, because it is not fully open source.......  I
> personally prefer HBase which always shows expected result for my code.
>
>
> 2015-05-30 11:40 GMT+00:00 Serega Sheypak <serega.sheypak@gmail.com>:
>
> > 1. No killer features comparing to hbase
> > 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
> for
> > Cassandra but it doesn't support vnodes.
> > 3. Rumors say it fast when it works;) the reason- it can silently drop
> data
> > you try to write.
> > 4. Timeseries is a nightmare. The easiest approach is just replicate data
> > to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
> >
> > On Friday, May 29, 2015, Ajay wrote:
> >
> > > Hi,
> > >
> > > I need some info on Hbase vs Cassandra as a data store (in general plus
> > > specific to time series data).
> > >
> > > The comparison in the following helps:
> > > 1: features
> > > 2: deployment and monitoring
> > > 3: performance
> > > 4: anything else
> > >
> > > Thanks
> > > Ajay
> > >
> >
>

Re: Hbase vs Cassandra

Posted by jongchul seon <jo...@gmail.com>.

I have not tried Cassandra, because it is not fully open source... I
personally prefer HBase, which always gives the expected results for my code.


2015-05-30 11:40 GMT+00:00 Serega Sheypak <se...@gmail.com>:

> 1. No killer features comparing to hbase
> 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool for
> Cassandra but it doesn't support vnodes.
> 3. Rumors say it fast when it works;) the reason- it can silently drop data
> you try to write.
> 4. Timeseries is a nightmare. The easiest approach is just replicate data
> to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
>
> On Friday, May 29, 2015, Ajay wrote:
>
> > Hi,
> >
> > I need some info on Hbase vs Cassandra as a data store (in general plus
> > specific to time series data).
> >
> > The comparison in the following helps:
> > 1: features
> > 2: deployment and monitoring
> > 3: performance
> > 4: anything else
> >
> > Thanks
> > Ajay
> >
>

Re: Hbase vs Cassandra

Posted by Andrew Purtell <an...@gmail.com>.

You are both making correct points, but FWIW HBase does not require use of Hadoop YARN or MapReduce. We do require HDFS of course. Some of the tools we ship are MapReduce applications but these are not core functions. We know of several large production use cases where the HBase(+HDFS) clusters are used as a data store backing online applications without colocated computation.


On Jun 2, 2015, at 7:29 AM, Vladimir Rodionov <vl...@gmail.com> wrote:

>>> The key issue is that unless you need or want to use Hadoop, you
> shouldn’t be using HBase. Its not a stand alone product or system.
> Hello, what is use case of a big data application w/o Hadoop?
> 
> -Vlad
> 
> On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel <mi...@hotmail.com>
> wrote:
> 
>> Saying Ambari rules is like saying that you like to drink MD 20/20 and
>> calling it a fine wine.
>> 
>> Sorry to all the Hortonworks guys but Amabari has a long way to go…. very
>> immature.
>> 
>> What that has to do with Cassandra vs HBase? I haven’t a clue.
>> 
>> The key issue is that unless you need or want to use Hadoop, you shouldn’t
>> be using HBase. Its not a stand alone product or system.
>> 
>> 
>> 
>> 
>>>> On May 30, 2015, at 7:40 AM, Serega Sheypak <se...@gmail.com>
>>> wrote:
>>> 
>>> 1. No killer features comparing to hbase
>>> 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
>> for
>>> Cassandra but it doesn't support vnodes.
>>> 3. Rumors say it fast when it works;) the reason- it can silently drop
>> data
>>> you try to write.
>>> 4. Timeseries is a nightmare. The easiest approach is just replicate data
>>> to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
>>> 
>>> On Friday, May 29, 2015, Ajay wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I need some info on Hbase vs Cassandra as a data store (in general plus
>>>> specific to time series data).
>>>> 
>>>> The comparison in the following helps:
>>>> 1: features
>>>> 2: deployment and monitoring
>>>> 3: performance
>>>> 4: anything else
>>>> 
>>>> Thanks
>>>> Ajay
>> 
>>

Re: Hbase vs Cassandra

Posted by lars hofhansl <la...@apache.org>.

HBase is a distributed, consistent, sorted key value store. The "sorted" bit allows for range scans in addition to the point gets that all K/V stores support. Nothing more, nothing less.

It happens to store its data in HDFS by default, and we provide convenient input and output formats for map reduce.
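
To make the "sorted" point concrete, here is a minimal single-process sketch in Python. It is not the HBase API: the class name SortedKV and the "metric#timestamp" key layout are made up for illustration. Because keys are kept in sorted order, a range scan is one binary search followed by a sequential read, which is what makes time-series reads cheap when the timestamp is part of the key.

```python
import bisect


class SortedKV:
    """Toy in-memory sorted key-value store (illustration only, not HBase)."""

    def __init__(self):
        self._keys = []    # row keys, kept sorted
        self._values = {}  # row key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        """Point get: what every key-value store supports."""
        return self._values.get(key)

    def scan(self, start, stop):
        """Range scan: yield (key, value) for start <= key < stop, in key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, self._values[key]
            i += 1


def row_key(metric, epoch_seconds):
    # Hypothetical key layout: zero-padded timestamp so string order == time order.
    return f"{metric}#{epoch_seconds:010d}"


store = SortedKV()
for ts, v in [(1432933948, 0.71), (1432933888, 0.42), (1432934008, 0.93)]:
    store.put(row_key("cpu.load", ts), v)
store.put(row_key("mem.used", 1432933900), 512)

# All cpu.load samples with 1432933888 <= timestamp < 1432934000, oldest first.
rows = list(store.scan(row_key("cpu.load", 1432933888),
                       row_key("cpu.load", 1432934000)))
```

A hash-partitioned store can answer the get() call just as well, but it has no cheap equivalent of scan(), since neighbouring keys end up in unrelated places.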

From: Michael Segel <mi...@hotmail.com>
To: user@hbase.apache.org
Sent: Monday, June 1, 2015 5:32 PM
Subject: Re: Hbase vs Cassandra
   
The point is that HBase is part of the Hadoop ecosystem. Not a stand alone database like Cassandra. 

This is one thing that gets lost when people want to compare NoSQL databases / data stores. 

As to Big Data without Hadoop? Well, there’s spark on mesos … :-P
And there are other Big Data systems out there but are not as well known. 
Lexus/Nexus had their proprietary system that they’ve been trying to sell … 




> On Jun 1, 2015, at 5:29 PM, Vladimir Rodionov <vl...@gmail.com> wrote:
> 
>>> The key issue is that unless you need or want to use Hadoop, you
> shouldn’t be using HBase. Its not a stand alone product or system.
> 
> Hello, what is use case of a big data application w/o Hadoop?
> 
> -Vlad
> 
> On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel <mi...@hotmail.com>
> wrote:
> 
>> Saying Ambari rules is like saying that you like to drink MD 20/20 and
>> calling it a fine wine.
>> 
>> Sorry to all the Hortonworks guys but Amabari has a long way to go…. very
>> immature.
>> 
>> What that has to do with Cassandra vs HBase? I haven’t a clue.
>> 
>> The key issue is that unless you need or want to use Hadoop, you shouldn’t
>> be using HBase. Its not a stand alone product or system.
>> 
>> 
>> 
>> 
>>> On May 30, 2015, at 7:40 AM, Serega Sheypak <se...@gmail.com>
>> wrote:
>>> 
>>> 1. No killer features comparing to hbase
>>> 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
>> for
>>> Cassandra but it doesn't support vnodes.
>>> 3. Rumors say it fast when it works;) the reason- it can silently drop
>> data
>>> you try to write.
>>> 4. Timeseries is a nightmare. The easiest approach is just replicate data
>>> to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
>>> 
>>> On Friday, May 29, 2015, Ajay wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I need some info on Hbase vs Cassandra as a data store (in general plus
>>>> specific to time series data).
>>>> 
>>>> The comparison in the following helps:
>>>> 1: features
>>>> 2: deployment and monitoring
>>>> 3: performance
>>>> 4: anything else
>>>> 
>>>> Thanks
>>>> Ajay
>>>> 
>> 
>>

Re: Hbase vs Cassandra

Posted by Russell Jurney <ru...@gmail.com>.

Hbase can do range scans, and one can attack many problems with range
scans. Cassandra can't do range scans.

Hbase has a master. Cassandra does not.

Those are the two main differences.

On Monday, June 1, 2015, Andrew Purtell <an...@gmail.com> wrote:

> HBase can very well be a standalone database, but we are debating
> semantics not technology I suspect. HBase uses some Hadoop ecosystem
> technologies but is absolutely a first class data store. I need to look no
> further than my employer for an example of a rather large production deploy
> of HBase* as a (internal) service, a high scale data storage platform.
>
> * - Strictly speaking HBase accessed with Apache Phoenix's JDBC driver.
>
>
> > On Jun 2, 2015, at 10:32 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> >
> > The point is that HBase is part of the Hadoop ecosystem. Not a stand
> alone database like Cassandra.
> >
> > This is one thing that gets lost when people want to compare NoSQL
> databases / data stores.
> >
> > As to Big Data without Hadoop? Well, there’s spark on mesos … :-P
> > And there are other Big Data systems out there but are not as well known.
> > Lexus/Nexus had their proprietary system that they’ve been trying to
> sell …
> >
> >
> >> On Jun 1, 2015, at 5:29 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> >>
> >>>> The key issue is that unless you need or want to use Hadoop, you
> >> shouldn’t be using HBase. Its not a stand alone product or system.
> >>
> >> Hello, what is use case of a big data application w/o Hadoop?
> >>
> >> -Vlad
> >>
> >> On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel <michael_segel@hotmail.com>
> >> wrote:
> >>
> >>> Saying Ambari rules is like saying that you like to drink MD 20/20 and
> >>> calling it a fine wine.
> >>>
> >>> Sorry to all the Hortonworks guys but Amabari has a long way to go….
> very
> >>> immature.
> >>>
> >>> What that has to do with Cassandra vs HBase? I haven’t a clue.
> >>>
> >>> The key issue is that unless you need or want to use Hadoop, you
> shouldn’t
> >>> be using HBase. Its not a stand alone product or system.
> >>>
> >>>
> >>>
> >>>
> >>>> On May 30, 2015, at 7:40 AM, Serega Sheypak <serega.sheypak@gmail.com>
> >>> wrote:
> >>>>
> >>>> 1. No killer features comparing to hbase
> >>>> 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own
> tool
> >>> for
> >>>> Cassandra but it doesn't support vnodes.
> >>>> 3. Rumors say it fast when it works;) the reason- it can silently drop
> >>> data
> >>>> you try to write.
> >>>> 4. Timeseries is a nightmare. The easiest approach is just replicate
> data
> >>>> to hdfs, partition it by hour/day and run
> spark/scalding/pig/hive/Impala
> >>>>
> >>>> On Friday, May 29, 2015, Ajay wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I need some info on Hbase vs Cassandra as a data store (in general
> plus
> >>>>> specific to time series data).
> >>>>>
> >>>>> The comparison in the following helps:
> >>>>> 1: features
> >>>>> 2: deployment and monitoring
> >>>>> 3: performance
> >>>>> 4: anything else
> >>>>>
> >>>>> Thanks
> >>>>> Ajay
> >>>>>
> >>>
> >>>
> >
>


-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Hbase vs Cassandra

Posted by Andrew Purtell <an...@gmail.com>.

HBase can very well be a standalone database, but we are debating semantics not technology I suspect. HBase uses some Hadoop ecosystem technologies but is absolutely a first class data store. I need to look no further than my employer for an example of a rather large production deploy of HBase* as a (internal) service, a high scale data storage platform. 

* - Strictly speaking HBase accessed with Apache Phoenix's JDBC driver. 


> On Jun 2, 2015, at 10:32 AM, Michael Segel <mi...@hotmail.com> wrote:
> 
> The point is that HBase is part of the Hadoop ecosystem. Not a stand alone database like Cassandra. 
> 
> This is one thing that gets lost when people want to compare NoSQL databases / data stores. 
> 
> As to Big Data without Hadoop? Well, there’s spark on mesos … :-P
> And there are other Big Data systems out there but are not as well known. 
> Lexus/Nexus had their proprietary system that they’ve been trying to sell … 
> 
> 
>> On Jun 1, 2015, at 5:29 PM, Vladimir Rodionov <vl...@gmail.com> wrote:
>> 
>>>> The key issue is that unless you need or want to use Hadoop, you
>> shouldn’t be using HBase. Its not a stand alone product or system.
>> 
>> Hello, what is use case of a big data application w/o Hadoop?
>> 
>> -Vlad
>> 
>> On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel <mi...@hotmail.com>
>> wrote:
>> 
>>> Saying Ambari rules is like saying that you like to drink MD 20/20 and
>>> calling it a fine wine.
>>> 
>>> Sorry to all the Hortonworks guys but Amabari has a long way to go…. very
>>> immature.
>>> 
>>> What that has to do with Cassandra vs HBase? I haven’t a clue.
>>> 
>>> The key issue is that unless you need or want to use Hadoop, you shouldn’t
>>> be using HBase. Its not a stand alone product or system.
>>> 
>>> 
>>> 
>>> 
>>>> On May 30, 2015, at 7:40 AM, Serega Sheypak <se...@gmail.com>
>>> wrote:
>>>> 
>>>> 1. No killer features comparing to hbase
>>>> 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
>>> for
>>>> Cassandra but it doesn't support vnodes.
>>>> 3. Rumors say it fast when it works;) the reason- it can silently drop
>>> data
>>>> you try to write.
>>>> 4. Timeseries is a nightmare. The easiest approach is just replicate data
>>>> to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
>>>> 
>>>> On Friday, May 29, 2015, Ajay wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I need some info on Hbase vs Cassandra as a data store (in general plus
>>>>> specific to time series data).
>>>>> 
>>>>> The comparison in the following helps:
>>>>> 1: features
>>>>> 2: deployment and monitoring
>>>>> 3: performance
>>>>> 4: anything else
>>>>> 
>>>>> Thanks
>>>>> Ajay
>>>>> 
>>> 
>>> 
>

Re: Hbase vs Cassandra

Posted by Michael Segel <mi...@hotmail.com>.

The point is that HBase is part of the Hadoop ecosystem, not a standalone database like Cassandra.

This is one thing that gets lost when people want to compare NoSQL databases / data stores. 

As to Big Data without Hadoop? Well, there’s Spark on Mesos … :-P
And there are other Big Data systems out there that are not as well known.
LexisNexis had their proprietary system that they’ve been trying to sell …


> On Jun 1, 2015, at 5:29 PM, Vladimir Rodionov <vl...@gmail.com> wrote:
> 
>>> The key issue is that unless you need or want to use Hadoop, you
> shouldn’t be using HBase. Its not a stand alone product or system.
> 
> Hello, what is use case of a big data application w/o Hadoop?
> 
> -Vlad
> 
> On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel <mi...@hotmail.com>
> wrote:
> 
>> Saying Ambari rules is like saying that you like to drink MD 20/20 and
>> calling it a fine wine.
>> 
>> Sorry to all the Hortonworks guys but Amabari has a long way to go…. very
>> immature.
>> 
>> What that has to do with Cassandra vs HBase? I haven’t a clue.
>> 
>> The key issue is that unless you need or want to use Hadoop, you shouldn’t
>> be using HBase. Its not a stand alone product or system.
>> 
>> 
>> 
>> 
>>> On May 30, 2015, at 7:40 AM, Serega Sheypak <se...@gmail.com>
>> wrote:
>>> 
>>> 1. No killer features comparing to hbase
>>> 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
>> for
>>> Cassandra but it doesn't support vnodes.
>>> 3. Rumors say it fast when it works;) the reason- it can silently drop
>> data
>>> you try to write.
>>> 4. Timeseries is a nightmare. The easiest approach is just replicate data
>>> to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
>>> 
>>> On Friday, May 29, 2015, Ajay wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I need some info on Hbase vs Cassandra as a data store (in general plus
>>>> specific to time series data).
>>>> 
>>>> The comparison in the following helps:
>>>> 1: features
>>>> 2: deployment and monitoring
>>>> 3: performance
>>>> 4: anything else
>>>> 
>>>> Thanks
>>>> Ajay
>>>> 
>> 
>>

Re: Hbase vs Cassandra

Posted by Vladimir Rodionov <vl...@gmail.com>.

>> The key issue is that unless you need or want to use Hadoop, you
>> shouldn’t be using HBase. Its not a stand alone product or system.

Hello, what is the use case for a big data application w/o Hadoop?

-Vlad

On Mon, Jun 1, 2015 at 2:26 PM, Michael Segel <mi...@hotmail.com>
wrote:

> Saying Ambari rules is like saying that you like to drink MD 20/20 and
> calling it a fine wine.
>
> Sorry to all the Hortonworks guys but Amabari has a long way to go…. very
> immature.
>
> What that has to do with Cassandra vs HBase? I haven’t a clue.
>
> The key issue is that unless you need or want to use Hadoop, you shouldn’t
> be using HBase. Its not a stand alone product or system.
>
>
>
>
> > On May 30, 2015, at 7:40 AM, Serega Sheypak <se...@gmail.com>
> wrote:
> >
> > 1. No killer features comparing to hbase
> > 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
> for
> > Cassandra but it doesn't support vnodes.
> > 3. Rumors say it fast when it works;) the reason- it can silently drop
> data
> > you try to write.
> > 4. Timeseries is a nightmare. The easiest approach is just replicate data
> > to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
> >
> > On Friday, May 29, 2015, Ajay wrote:
> >
> >> Hi,
> >>
> >> I need some info on Hbase vs Cassandra as a data store (in general plus
> >> specific to time series data).
> >>
> >> The comparison in the following helps:
> >> 1: features
> >> 2: deployment and monitoring
> >> 3: performance
> >> 4: anything else
> >>
> >> Thanks
> >> Ajay
> >>
>
>

Re: Hbase vs Cassandra

Posted by Michael Segel <mi...@hotmail.com>.

Saying Ambari rules is like saying that you like to drink MD 20/20 and calling it a fine wine.

Sorry to all the Hortonworks guys, but Ambari has a long way to go… it is very immature.

What does that have to do with Cassandra vs HBase? I haven’t a clue.

The key issue is that unless you need or want to use Hadoop, you shouldn’t be using HBase. It’s not a standalone product or system.

> On May 30, 2015, at 7:40 AM, Serega Sheypak <se...@gmail.com> wrote:
> 
> 1. No killer features comparing to hbase
> 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool for
> Cassandra but it doesn't support vnodes.
> 3. Rumors say it fast when it works;) the reason- it can silently drop data
> you try to write.
> 4. Timeseries is a nightmare. The easiest approach is just replicate data
> to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala
> 
> On Friday, May 29, 2015, Ajay wrote:
> 
>> Hi,
>> 
>> I need some info on Hbase vs Cassandra as a data store (in general plus
>> specific to time series data).
>> 
>> The comparison in the following helps:
>> 1: features
>> 2: deployment and monitoring
>> 3: performance
>> 4: anything else
>> 
>> Thanks
>> Ajay
>>

Re: Hbase vs Cassandra

Posted by Serega Sheypak <se...@gmail.com>.

1. No killer features compared to HBase.
2. Terrible!!! Ambari/Cloudera Manager rule. Netflix has its own tool for
Cassandra, but it doesn't support vnodes.
3. Rumors say it's fast when it works ;) The reason: it can silently drop data
you try to write.
4. Time series is a nightmare. The easiest approach is to just replicate the data
to HDFS, partition it by hour/day, and run Spark/Scalding/Pig/Hive/Impala.
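
As a rough sketch of the "partition it by hour/day" step, the Python below maps an event timestamp to a directory-style partition. The base path /data/events and the dt=/hour= naming are assumptions for illustration (a Hive-style layout), not taken from any specific setup.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_path(epoch_seconds: int, base: str = "/data/events") -> str:
    """Return an hourly partition directory for a UTC timestamp."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{base}/dt={t:%Y-%m-%d}/hour={t:%H}"


def group_by_partition(events):
    """Group (epoch_seconds, payload) pairs by their partition directory."""
    groups = defaultdict(list)
    for ts, payload in events:
        groups[partition_path(ts)].append(payload)
    return dict(groups)


events = [
    (1432933948, "a"),  # 2015-05-29 21:12:28 UTC
    (1432934100, "b"),  # 2015-05-29 21:15:00 UTC
    (1432944000, "c"),  # 2015-05-30 00:00:00 UTC
]
groups = group_by_partition(events)
```

A batch job over one day then only has to read the directories under that day's dt= prefix instead of scanning everything.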

On Friday, May 29, 2015, Ajay wrote:

> Hi,
>
> I need some info on Hbase vs Cassandra as a data store (in general plus
> specific to time series data).
>
> The comparison in the following helps:
> 1: features
> 2: deployment and monitoring
> 3: performance
> 4: anything else
>
> Thanks
> Ajay
>

Re: Hbase vs Cassandra

Posted by Otis Gospodnetic <ot...@gmail.com>.

Hi Ajay,

You won't be able to get an unbiased opinion here easily.  You'll need to try
both and see how each works for your use case.  We use HBase for the SPM backend
and it has worked well for us: it's stable, handles billions and billions
of rows (I lost track of the actual number many moons ago), and is fast if you
get your key design right.  I'll answer your Q about monitoring:

I'd say both are equally well "monitorable".  SPM <http://sematext.com/spm>
can monitor both HBase and Cassandra equally well.  Because Cassandra is a
bit simpler (vs. HBase having multiple processes one needs to run), it's a
bit simpler to add monitoring to Cassandra, but the difference is small.

SPM is at http://sematext.com/spm if you want to have a look.  We expose
our own HBase clusters in the live demo, so you can see what metrics HBase
exposes.  We don't run Cassandra, so we can't show its graphs, but you can
see some charts, metrics, and filters for Cassandra at
http://blog.sematext.com/2014/06/02/announcement-cassandra-performance-monitoring-in-spm/

I hope this helps.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On Fri, May 29, 2015 at 3:12 PM, Ajay <aj...@gmail.com> wrote:

> Hi,
>
> I need some info on Hbase vs Cassandra as a data store (in general plus
> specific to time series data).
>
> The comparison in the following helps:
> 1: features
> 2: deployment and monitoring
> 3: performance
> 4: anything else
>
> Thanks
> Ajay
>

Re: Hbase vs Cassandra

Posted by Jerry He <je...@gmail.com>.

Another point to add is the new "HBase read high-availability using
timeline-consistent region replicas" feature, available from HBase 1.0 onward,
which brings HBase closer to Cassandra in terms of read availability during
node failures.  You now have a choice for read availability.

https://issues.apache.org/jira/browse/HBASE-10070
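
A toy model of the trade-off, in Python, may help. It is only a sketch of the idea as described above, not the HBase client API: the names ToyRegion, replicate, and the timeline flag are invented for this example. Strong reads are served only by the primary; a timeline read may fall back to a secondary copy and is then flagged as possibly stale.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.up = True


class ToyRegion:
    """One primary plus read-only secondaries that lag behind the primary."""

    def __init__(self, num_secondaries=1):
        self.primary = Replica()
        self.secondaries = [Replica() for _ in range(num_secondaries)]

    def write(self, key, value):
        # Writes go only through the primary; if it is down, the write fails.
        if not self.primary.up:
            raise RuntimeError("primary unavailable: writes are rejected")
        self.primary.data[key] = value

    def replicate(self):
        # Secondaries catch up asynchronously; here we trigger it by hand.
        for s in self.secondaries:
            s.data = dict(self.primary.data)

    def read(self, key, timeline=False):
        """Return (value, stale). Strong reads only ever touch the primary."""
        if self.primary.up:
            return self.primary.data.get(key), False
        if timeline:
            for s in self.secondaries:
                if s.up:
                    return s.data.get(key), True  # possibly behind the primary
        raise RuntimeError("no replica available for this read")


region = ToyRegion()
region.write("row1", "v1")
region.replicate()
region.write("row1", "v2")   # not yet replicated
region.primary.up = False    # simulate a region server failure

value, stale = region.read("row1", timeline=True)
```

In this model the timeline read succeeds during the outage but returns the older value with the stale flag set, while a strong read of the same row raises until the primary is back. Writes stay unavailable either way.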



On Sun, May 31, 2015 at 12:32 PM, Vladimir Rodionov <vl...@gmail.com>
wrote:

> Couple more + for HBase
>
> * Coprocessor framework (custom code inside Region Server and Master
> Servers), which Cassandra is missing, afaik.
>    Coprocessors have been widely used by hBase users (Phoenix SQL, for
> example) since inception (in 0.92).
> * HBase security model is more mature and align well with Hadoop/HDFS
> security. Cassandra provides just basic authentication/authorization/SSL
> encryption, no Kerberos, no end-to-end data encryption, no cell level
> security.
>
> -Vlad
>
> On Sun, May 31, 2015 at 12:05 PM, lars hofhansl <la...@apache.org> wrote:
>
> > You really have to try out both if you want to be sure.
> >
> > The fundamental differences that come to mind are:
> > * HBase is always consistent. Machine outages lead to inability to read
> or
> > write data on that machine. With Cassandra you can always write.
> >
> > * Cassandra defaults to a random partitioner, so range scans are not
> > possible (by default)
> > * HBase has a range partitioner (if you don't want that the client has to
> > prefix the rowkey with a prefix of a hash of the rowkey). The main
> feature
> > that set HBase apart are range scans.
> >
> > * HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc.
> > You can map reduce directly into HFiles and map those into HBase
> instantly.
> >
> > * Cassandra has a dedicated company supporting (and promoting) it.
> > * Getting started is easier with Cassandra. For HBase you need to run
> HDFS
> > and Zookeeper, etc.
> > * I've heard lots of anecdotes about Cassandra working nicely with small
> > cluster (< 50 nodes) and quick degenerating above that.
> > * HBase does not have a query language (but you can use Phoenix for full
> > SQL support)
> > * HBase does not have secondary indexes (having an eventually consistent
> > index, similar to what Cassandra has, is easy in HBase, but making it as
> > consistent as the rest of HBase is hard)
> >
> > * Everything you'll hear here is biased :)
> >
> >
> >
> > From personal experience... At Salesforce we spent a few months
> > prototyping various stores (including Cassandra) and arrived at HBase.
> Your
> > mileage may vary.
> >
> >
> > -- Lars
> >
> >
> > ----- Original Message -----
> > From: Ajay <aj...@gmail.com>
> > To: user@hbase.apache.org
> > Cc:
> > Sent: Friday, May 29, 2015 12:12 PM
> > Subject: Hbase vs Cassandra
> >
> > Hi,
> >
> > I need some info on Hbase vs Cassandra as a data store (in general plus
> > specific to time series data).
> >
> > The comparison in the following helps:
> > 1: features
> > 2: deployment and monitoring
> > 3: performance
> > 4: anything else
> >
> > Thanks
> > Ajay
> >
>

Re: Hbase vs Cassandra

Posted by Michael Segel <mi...@hotmail.com>.

Well, since you brought up coprocessors… let’s talk about the lack of security and stability that’s been introduced by coprocessors. ;-)

I’m not saying that you don’t want server side extensibility, but you need to recognize the risks introduced by coprocessors. 


> On May 31, 2015, at 3:32 PM, Vladimir Rodionov <vl...@gmail.com> wrote:
> 
> Couple more + for HBase
> 
> * Coprocessor framework (custom code inside Region Server and Master
> Servers), which Cassandra is missing, afaik.
>   Coprocessors have been widely used by hBase users (Phoenix SQL, for
> example) since inception (in 0.92).
> * HBase security model is more mature and align well with Hadoop/HDFS
> security. Cassandra provides just basic authentication/authorization/SSL
> encryption, no Kerberos, no end-to-end data encryption, no cell level
> security.
> 
> -Vlad
> 
> On Sun, May 31, 2015 at 12:05 PM, lars hofhansl <la...@apache.org> wrote:
> 
>> You really have to try out both if you want to be sure.
>> 
>> The fundamental differences that come to mind are:
>> * HBase is always consistent. Machine outages lead to inability to read or
>> write data on that machine. With Cassandra you can always write.
>> 
>> * Cassandra defaults to a random partitioner, so range scans are not
>> possible (by default)
>> * HBase has a range partitioner (if you don't want that the client has to
>> prefix the rowkey with a prefix of a hash of the rowkey). The main feature
>> that set HBase apart are range scans.
>> 
>> * HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc.
>> You can map reduce directly into HFiles and map those into HBase instantly.
>> 
>> * Cassandra has a dedicated company supporting (and promoting) it.
>> * Getting started is easier with Cassandra. For HBase you need to run HDFS
>> and Zookeeper, etc.
>> * I've heard lots of anecdotes about Cassandra working nicely with small
>> cluster (< 50 nodes) and quick degenerating above that.
>> * HBase does not have a query language (but you can use Phoenix for full
>> SQL support)
>> * HBase does not have secondary indexes (having an eventually consistent
>> index, similar to what Cassandra has, is easy in HBase, but making it as
>> consistent as the rest of HBase is hard)
>> 
>> * Everything you'll hear here is biased :)
>> 
>> 
>> 
>> From personal experience... At Salesforce we spent a few months
>> prototyping various stores (including Cassandra) and arrived at HBase. Your
>> mileage may vary.
>> 
>> 
>> -- Lars
>> 
>> 
>> ----- Original Message -----
>> From: Ajay <aj...@gmail.com>
>> To: user@hbase.apache.org
>> Cc:
>> Sent: Friday, May 29, 2015 12:12 PM
>> Subject: Hbase vs Cassandra
>> 
>> Hi,
>> 
>> I need some info on Hbase vs Cassandra as a data store (in general plus
>> specific to time series data).
>> 
>> The comparison in the following helps:
>> 1: features
>> 2: deployment and monitoring
>> 3: performance
>> 4: anything else
>> 
>> Thanks
>> Ajay
>>

Re: Hbase vs Cassandra

Posted by Vladimir Rodionov <vl...@gmail.com>.

A couple more pluses for HBase:

* Coprocessor framework (custom code inside Region Servers and Master
Servers), which Cassandra is missing, afaik.
   Coprocessors have been widely used by HBase users (Phoenix SQL, for
example) since their inception (in 0.92).
* The HBase security model is more mature and aligns well with Hadoop/HDFS
security. Cassandra provides just basic authentication/authorization/SSL
encryption: no Kerberos, no end-to-end data encryption, no cell-level
security.
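
As a loose illustration of what "custom code inside the Region Server" means, here is a toy Python sketch. It does not use the real coprocessor interfaces: ToyRegionServer, register_pre_put, and the two hooks are hypothetical names. User-supplied functions run inside the server's own write path and can rewrite or veto a write.

```python
class ToyRegionServer:
    """Toy server that runs user-supplied hooks inside its own write path."""

    def __init__(self):
        self.rows = {}
        self.pre_put_hooks = []

    def register_pre_put(self, hook):
        # hook(key, value) -> value to store, or raises to veto the write
        self.pre_put_hooks.append(hook)

    def put(self, key, value):
        for hook in self.pre_put_hooks:
            value = hook(key, value)  # runs in-process, inside the server
        self.rows[key] = value


def reject_empty_values(key, value):
    if value == "":
        raise ValueError(f"empty value for {key!r}")
    return value


def normalize_case(key, value):
    return value.lower()


server = ToyRegionServer()
server.register_pre_put(reject_empty_values)
server.register_pre_put(normalize_case)
server.put("row1", "Hello")

try:
    server.put("row2", "")
    vetoed = False
except ValueError:
    vetoed = True
```

Because the hooks execute in the server process, a slow or buggy hook affects every request that passes through it, which is the flip side of this kind of extensibility.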

-Vlad

On Sun, May 31, 2015 at 12:05 PM, lars hofhansl <la...@apache.org> wrote:

> You really have to try out both if you want to be sure.
>
> The fundamental differences that come to mind are:
> * HBase is always consistent. Machine outages lead to inability to read or
> write data on that machine. With Cassandra you can always write.
>
> * Cassandra defaults to a random partitioner, so range scans are not
> possible (by default)
> * HBase has a range partitioner (if you don't want that the client has to
> prefix the rowkey with a prefix of a hash of the rowkey). The main feature
> that set HBase apart are range scans.
>
> * HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc.
> You can map reduce directly into HFiles and map those into HBase instantly.
>
> * Cassandra has a dedicated company supporting (and promoting) it.
> * Getting started is easier with Cassandra. For HBase you need to run HDFS
> and Zookeeper, etc.
> * I've heard lots of anecdotes about Cassandra working nicely with small
> cluster (< 50 nodes) and quick degenerating above that.
> * HBase does not have a query language (but you can use Phoenix for full
> SQL support)
> * HBase does not have secondary indexes (having an eventually consistent
> index, similar to what Cassandra has, is easy in HBase, but making it as
> consistent as the rest of HBase is hard)
>
> * Everything you'll hear here is biased :)
>
>
>
> From personal experience... At Salesforce we spent a few months
> prototyping various stores (including Cassandra) and arrived at HBase. Your
> mileage may vary.
>
>
> -- Lars
>
>
> ----- Original Message -----
> From: Ajay <aj...@gmail.com>
> To: user@hbase.apache.org
> Cc:
> Sent: Friday, May 29, 2015 12:12 PM
> Subject: Hbase vs Cassandra
>
> Hi,
>
> I need some info on Hbase vs Cassandra as a data store (in general plus
> specific to time series data).
>
> The comparison in the following helps:
> 1: features
> 2: deployment and monitoring
> 3: performance
> 4: anything else
>
> Thanks
> Ajay
>

Re: Hbase vs Cassandra

Posted by lars hofhansl <la...@apache.org>.

You really have to try out both if you want to be sure.

The fundamental differences that come to mind are:
* HBase is always consistent. Machine outages lead to inability to read or write data on that machine. With Cassandra you can always write.

* Cassandra defaults to a random partitioner, so range scans are not possible (by default)
* HBase has a range partitioner (if you don't want that, the client has to prefix the rowkey with a prefix of a hash of the rowkey). The main feature that sets HBase apart is range scans.

* HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc. You can map reduce directly into HFiles and map those into HBase instantly.

* Cassandra has a dedicated company supporting (and promoting) it.
* Getting started is easier with Cassandra. For HBase you need to run HDFS and Zookeeper, etc.
* I've heard lots of anecdotes about Cassandra working nicely with small clusters (< 50 nodes) and quickly degrading above that.
* HBase does not have a query language (but you can use Phoenix for full SQL support)
* HBase does not have secondary indexes (having an eventually consistent index, similar to what Cassandra has, is easy in HBase, but making it as consistent as the rest of HBase is hard)

* Everything you'll hear here is biased :)
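
The rowkey-prefix trick mentioned in the list above can be sketched as follows. This is a minimal Python illustration, not HBase client code: the bucket count, the key format, and the helper names are assumptions made for the example. A short hash-derived prefix spreads sequential keys (such as timestamps) across buckets, at the cost that a range scan has to be issued once per bucket and the results merged.

```python
import hashlib

NUM_BUCKETS = 4  # assumed bucket count; a real table would tune this


def salted_key(row_key: str) -> str:
    """Prefix the row key with a bucket derived from a hash of the row key."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    return f"{bucket:02d}|{row_key}"


def scan_ranges(start: str, stop: str):
    """A logical range [start, stop) becomes one physical range per bucket."""
    return [(f"{b:02d}|{start}", f"{b:02d}|{stop}") for b in range(NUM_BUCKETS)]


# Sequential logical keys, e.g. events ordered by timestamp.
keys = [f"event#{ts:010d}" for ts in range(1432933880, 1432933890)]

# Physical table: salted key -> original key.
table = {salted_key(k): k for k in keys}

# Reading the logical range back means scanning every bucket and merging.
merged = sorted(
    original
    for lo, hi in scan_ranges("event#1432933880", "event#1432933890")
    for physical, original in table.items()
    if lo <= physical < hi
)
```

Without the prefix, all ten keys above would land next to each other in one key range, which is good for scans but concentrates sequential writes in one place. With the prefix, writes spread out and the scan fans out into NUM_BUCKETS smaller scans.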



From personal experience... At Salesforce we spent a few months prototyping various stores (including Cassandra) and arrived at HBase. Your mileage may vary.


-- Lars


----- Original Message -----
From: Ajay <aj...@gmail.com>
To: user@hbase.apache.org
Cc: 
Sent: Friday, May 29, 2015 12:12 PM
Subject: Hbase vs Cassandra

Hi,

I need some info on Hbase vs Cassandra as a data store (in general plus
specific to time series data).

The comparison in the following helps:
1: features
2: deployment and monitoring
3: performance
4: anything else

Thanks
Ajay

Re: Hbase vs Cassandra

Posted by Ted Yu <yu...@gmail.com>.

See http://hbase.apache.org/book.html#perf.network.call_me_maybe

Cheers

On Fri, May 29, 2015 at 12:20 PM, Lukáš Vlček <lu...@gmail.com> wrote:

> As for the #4 you might be interested in reading
> https://aphyr.com/posts/294-call-me-maybe-cassandra
> Not sure if there is comparable article about HBase (anybody knows?) but it
> can give you another perspective about what else to keep an eye on
> regarding these systems.
>
> Regards,
> Lukas
>
> On Fri, May 29, 2015 at 9:12 PM, Ajay <aj...@gmail.com> wrote:
>
> > Hi,
> >
> > I need some info on Hbase vs Cassandra as a data store (in general plus
> > specific to time series data).
> >
> > The comparison in the following helps:
> > 1: features
> > 2: deployment and monitoring
> > 3: performance
> > 4: anything else
> >
> > Thanks
> > Ajay
> >
>

Re: Hbase vs Cassandra

Posted by anil gupta <an...@gmail.com>.

Hey Ajay,

Your topic of discussion is too broad.
There are tons of comparisons of HBase vs Cassandra:
https://www.google.com/search?q=hbase+vs+cassandra&ie=utf-8&oe=utf-8

Which one you should use boils down to your use case: strong consistency?
range scans? deeper integration with the Hadoop ecosystem? etc.
Please explain your use case and share your thoughts after doing some
preliminary reading.

Thanks,
Anil Gupta

On Fri, May 29, 2015 at 12:20 PM, Lukáš Vlček <lu...@gmail.com> wrote:

> As for the #4 you might be interested in reading
> https://aphyr.com/posts/294-call-me-maybe-cassandra
> Not sure if there is comparable article about HBase (anybody knows?) but it
> can give you another perspective about what else to keep an eye on
> regarding these systems.
>
> Regards,
> Lukas
>
> On Fri, May 29, 2015 at 9:12 PM, Ajay <aj...@gmail.com> wrote:
>
> > Hi,
> >
> > I need some info on Hbase vs Cassandra as a data store (in general plus
> > specific to time series data).
> >
> > The comparison in the following helps:
> > 1: features
> > 2: deployment and monitoring
> > 3: performance
> > 4: anything else
> >
> > Thanks
> > Ajay
> >
>

-- 
Thanks & Regards,
Anil Gupta

Re: Hbase vs Cassandra

Posted by Lukáš Vlček <lu...@gmail.com>.

As for #4, you might be interested in reading
https://aphyr.com/posts/294-call-me-maybe-cassandra
Not sure if there is a comparable article about HBase (does anybody know?), but it
can give you another perspective on what else to keep an eye on
regarding these systems.

Regards,
Lukas

On Fri, May 29, 2015 at 9:12 PM, Ajay <aj...@gmail.com> wrote:

> Hi,
>
> I need some info on Hbase vs Cassandra as a data store (in general plus
> specific to time series data).
>
> The comparison in the following helps:
> 1: features
> 2: deployment and monitoring
> 3: performance
> 4: anything else
>
> Thanks
> Ajay
>