Posted to user@cassandra.apache.org by Prem Yadav <ip...@gmail.com> on 2014/07/04 16:37:57 UTC

Cassandra use cases/Strengths/Weakness

Hi,
I have seen in a lot of replies that Cassandra is not designed for this
and that. I don't want to sound rude; I just need some info about this so
that I can compare it to technologies like HBase, MongoDB, Elasticsearch,
Solr, etc.

1) What is Cassandra designed for? Heavy writes, yes. But so is HBase, or
Elasticsearch. What are the use cases that suit Cassandra?

2) What kinds of queries are best suited to Cassandra?
I ask this because I have seen people asking about queries and getting
replies that they are not suited to Cassandra. For example: queries where
a large number of rows is requested and a timeout happens, or range
queries, or aggregate queries.

3) Where does Cassandra excel compared to other technologies?

I have been working with Cassandra for some time. I know how it works and
I like it very much.
We are moving towards building a big cluster. But at this point, I am not
sure if it's the right decision.

A lot of people in my company, including me, like Cassandra. But that has
more to do with CQL than with the internals or the use cases. Until now,
there have been small PoCs and people enjoyed them. But for a large-scale
project, we are not so sure.

Please guide us.
Please note that the drawbacks of other technologies do not interest me;
it's the strengths and weaknesses of Cassandra I am interested in.
Thanks

Re: Cassandra use cases/Strengths/Weakness

Posted by Jens Rantil <je...@tink.se>.
Hi,

I think you are asking the wrong first question. You should start with "What are my requirements?". If you are only storing two items that are rarely ever modified, any database is a good approach. We have no idea what your use case is. We could speculate about it, but really it all boils down to the requirements of one or more applications.

Don't use a hammer if you don't know what to use it for.

As for your CTO, your answer to "why database X?" should be "because it fits our requirements". Simple as that.

My five cents,

Jens

On Fri, Jul 4, 2014 at 9:31 PM, Prem Yadav <ip...@gmail.com> wrote:

> Thanks Manoj. Great post for those who already have Cassandra in production.
> However, it brings me back to my original post.
> All the points you have mentioned apply to any big data technology.
> Storage - all of them.
> Query - all of them. In fact, a lot of them perform better. I agree that
> CQL's structure is better, but Hive and Mongo are good too.
> Availability - many of them.
>
> So my question is basically for Cassandra support people, e.g. DataStax,
> or the developers.
> What makes Cassandra special?
> If I have to convince my CTO to spend a million dollars on a cluster and
> support, his first question would be: why Cassandra? Why not this or that?
> So I am still not sure what special thing Cassandra brings to the table.
>
> Sorry about the rant. But in the enterprise world, decisions are made by
> taking into account stability, convincing managers and whatnot. The chosen
> technology has to be stable for years. People should be convinced that the
> engineers are not going to do a lot of firefighting.
> Any input is appreciated.
>
> On Fri, Jul 4, 2014 at 7:07 PM, Manoj Khangaonkar <kh...@gmail.com>
> wrote:
>> These are my personal opinions, based on a few months of using Cassandra.
>> Others may have different opinions.
>>
>>
>>
>> http://khangaonkar.blogspot.com/2014/06/apache-cassandra-things-to-consider.html
>>
>> regards
>>
>> --
>> http://khangaonkar.blogspot.com/
>>

Re: Cassandra use cases/Strengths/Weakness

Posted by Prem Yadav <ip...@gmail.com>.
Jens,
thanks for the response, but your reply doesn't serve any purpose. I asked
about use cases suitable for Cassandra. It is a basic question about what
purpose this technology serves. My use case or requirements do not matter
in that regard. And 'fits our requirements' is not a valid reason anymore.
Until Hadoop came along, RDBMS fit all requirements just fine. It's about
choosing a superior technology, which translates into minimal overhead and
more profit for the company.


*James*, excellent points and very helpful. I have supported multiple
systems as well: Hadoop, HBase, Elasticsearch and Solr, and we also have a
very efficient and fast, horizontally scalable, Scala-based internal system.
Apart from the performance and operational excellence that you mentioned,
is there a use case at which Cassandra excels by design?
Take OpenTSDB, for example. They chose HBase because HBase serves this use
case by design: great scan performance.

Is there any use case like that where Cassandra is the obvious choice?

Thanks



On Fri, Jul 4, 2014 at 8:58 PM, James Horey <jl...@opencore.io> wrote:

> I’ve supported a variety of different “big data” systems and most have
> their own particular set of use cases that make sense. Having said that, I
> believe that Cassandra uniquely excels at the following:
>
> * Low write latency with respect to small to medium write sizes (logs,
> sensor data, etc.)
> * Linear write scalability
> * Fault-tolerance across geographic locations
>
> The first two points make it an excellent candidate for high-throughput
> “transactional” systems. Other systems that play in this space tend to be
> HBase and Riak (there may be others, but I’m most familiar with those two).
> However, the last point is pretty unique to Cassandra.
>
> So if you’re looking for a scale-out, high-throughput transactional
> system, then Cassandra may make sense for you. If you’re looking for
> something more geared towards analytics (so few bulk writes, many reads),
> then something in the Hadoop space may make sense.
>
> Cheers
> James

Re: Cassandra use cases/Strengths/Weakness

Posted by Keith Freeman <8f...@gmail.com>.
We've struggled to get consistent write latency and linear write
scalability with a pretty heavy insert load (thousands of records/second),
and our records are about 1-2 KB of data each (a mix of integer/string
columns and a blob). Do you have any rough numbers from your "small to
medium write sizes" experience?

On 07/04/2014 01:58 PM, James Horey wrote:
> ...
> * Low write latency with respect to small to medium write sizes (logs, 
> sensor data, etc.)
> * Linear write scalability
> * ...


Re: Cassandra use cases/Strengths/Weakness

Posted by James Horey <jl...@opencore.io>.
I’ve supported a variety of different “big data” systems and most have their own particular set of use cases that make sense. Having said that, I believe that Cassandra uniquely excels at the following:

* Low write latency with respect to small to medium write sizes (logs, sensor data, etc.)
* Linear write scalability
* Fault-tolerance across geographic locations

The first two points make it an excellent candidate for high-throughput “transactional” systems. Other systems that play in this space tend to be HBase and Riak (there may be others, but I’m most familiar with those two). However, the last point is pretty unique to Cassandra.

So if you’re looking for a scale-out, high-throughput transactional system, then Cassandra may make sense for you. If you’re looking for something more geared towards analytics (so few bulk writes, many reads), then something in the Hadoop space may make sense.

Cheers
James


Re: Cassandra use cases/Strengths/Weakness

Posted by Robert Stupp <sn...@snazy.de>.
I agree that traditional RDBMSs have good, established admin/management tools and practices.

But C*'s strength is distributed, failure-tolerant operation. And this is exactly where nearly all traditional RDBMSs just fail. I've seen both Oracle and IBM "clusters"/"HA" "solutions" (and a lot of other software) fail regularly, even when running at just moderate load.

> On 09.07.2014 at 02:51, Robert Coli <rc...@eventbrite.com> wrote:
> 
>> On Fri, Jul 4, 2014 at 2:10 PM, DuyHai Doan <do...@gmail.com> wrote:
>>  c. operational simplicity due to master-less architecture. This feature, although quite transparent for developers, is a key selling point. Having suffered when manually installing a Hadoop cluster, I happen to love the deployment simplicity of C*: only one process per node, no moving parts.
> 
> Asserting that Cassandra, as a fully functioning production system, is currently easier to operate than RDBMS is just false. It is still false even if we ignore the availability of experienced RDBMS operators and decades of RDBMS operational best practice.
> 
> The quality of software engineering practice in RDBMS land also most assuredly results in a more easily operable system in many, many use cases. Yes, Cassandra is more tolerant to individual node failures. This turns out to not matter as much in terms of "operability" as non-operators appear to think it does. Very trivial operational activities ("create a new columnfamily" or "replace a failed node") are subject to failure mode edge cases which often are not resolvable without brute force methods.
> 
> I am unable to get my head around the oft-heard marketing assertion that a data-store in which such common activities are not bulletproof is capable of being better to operate than the RDBMS status quo. The production operators I know also do not agree that Cassandra is simple to operate.
> 
> All the above aside, I continue to maintain that Cassandra is the best at being the type of thing that it is. If you have a need to horizontally scale a use case that is well suited for its strength and poorly suited for RDBMS, you should use it. Far fewer people actually have this sort of case than think they do.
> 
> =Rob

Re: Cassandra use cases/Strengths/Weakness

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
I've used various databases in production for over 10 years. Each has
strengths and weaknesses.

I ran Cassandra for just shy of 2 years in production as part of both
development and operations teams, and I only hit one serious problem of
the kind Rob mentioned. Ideally C* would have guarded against it, but it
did not. I did not have any downtime as a result, however. For those
curious, I tried to add 1.2 nodes to a 1.1 cluster. Aside from that,
I actually did find Cassandra simple to operate and manage.

I used Cassandra as more of a general-purpose database. I was willing
to give up some query flexibility in favor of high availability and
multi-DC support. There were times we needed to add more servers to
deal with additional load, and it handled that perfectly.

For me it wasn't such a big problem; there are always optimizations that
need to be made no matter what DB you use.

Disclaimer: I now work for DataStax.


-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade

Re: Cassandra use cases/Strengths/Weakness

Posted by DuyHai Doan <do...@gmail.com>.
Indeed, I did not really compare C*'s operational simplicity to that of
traditional RDBMS. Implicitly, the comparison was with other NoSQL datastores.


Re: Cassandra use cases/Strengths/Weakness

Posted by Robert Coli <rc...@eventbrite.com>.
On Fri, Jul 4, 2014 at 2:10 PM, DuyHai Doan <do...@gmail.com> wrote:

>  c. operational simplicity due to master-less architecture. This feature,
> although quite transparent for developers, is a key selling point. Having
> suffered when manually installing a Hadoop cluster, I happen to love the
> deployment simplicity of C*: only one process per node, no moving parts.
>

Asserting that Cassandra, as a fully functioning production system, is
currently easier to operate than RDBMS is just false. It is still false
even if we ignore the availability of experienced RDBMS operators and
decades of RDBMS operational best practice.

The quality of software engineering practice in RDBMS land also most
assuredly results in a more easily operable system in many, many use cases.
Yes, Cassandra is more tolerant to individual node failures. This turns out
to not matter as much in terms of "operability" as non-operators appear to
think it does. Very trivial operational activities ("create a new
columnfamily" or "replace a failed node") are subject to failure mode edge
cases which often are not resolvable without brute force methods.

I am unable to get my head around the oft-heard marketing assertion that a
data-store in which such common activities are not bulletproof is capable
of being better to operate than the RDBMS status quo. The production
operators I know also do not agree that Cassandra is simple to operate.

All the above aside, I continue to maintain that Cassandra is the best at
being the type of thing that it is. If you have a need to horizontally
scale a use case that is well suited for its strength and poorly suited for
RDBMS, you should use it. Far fewer people actually have this sort of case
than think they do.

=Rob

Re: Cassandra use cases/Strengths/Weakness

Posted by Prem Yadav <ip...@gmail.com>.
Duy,
if you are not already working for DataStax, they should hire you. :)

Great response. You have given me some good points to think about. I will
do the rest of the research.

Thanks.





On Fri, Jul 4, 2014 at 10:10 PM, DuyHai Doan <do...@gmail.com> wrote:

> I would answer your question this way:
>
> 1) Why should I choose C*?
>
> a. linear scalability: throughput scales "almost" linearly with the number
> of nodes
>
> b. almost unbounded extensibility (there is no limit, or at least a huge
> limit, in terms of the number of nodes you can have in a cluster)
>
> c. operational simplicity due to master-less architecture. This feature,
> although quite transparent for developers, is a key selling point. Having
> suffered when manually installing a Hadoop cluster, I happen to love the
> deployment simplicity of C*: only one process per node, no moving parts.
>
> d. high availability. C* clearly trades consistency for availability, so
> you can expect something like 99.99% uptime. A strong selling point for
> critical businesses that need to be up all the time
>
> e. support for multiple data centers out of the box. Again, on the
> operational side, it's a great feature if you plan a worldwide deployment
>
> That's all I can see for now
>
> 2) Why shouldn't I choose C*?
>
> a. need for strong consistency most of the time. Although you can perform
> all requests with consistency level ALL, it's clearly not the best use of
> C*. You'll suffer from higher latency and reduced availability. Even the
> new "lightweight transaction" feature is not meant to be used on a large
> scale
>
> b. very complicated and changing queries. Denormalizing is great when you
> know ahead of time exactly how you'll query your data. Once that is done,
> any new way of querying will require new code and new tables to support it
>
> c. ridiculously small data load. I've seen people in prod using C* for
> only 200 GB because they want to be trendy and use bleeding-edge
> technologies. They'd be better off using a classical RDBMS solution that
> fits their load perfectly
>
> Hope that helps
>
> Duy Hai DOAN

Re: Cassandra use cases/Strengths/Weakness

Posted by DuyHai Doan <do...@gmail.com>.
I would answer your question this way:

1) Why should I choose C*?

a. linear scalability: throughput scales "almost" linearly with the number
of nodes

b. almost unbounded extensibility (there is no limit, or at least a huge
limit, in terms of the number of nodes you can have in a cluster)

c. operational simplicity due to master-less architecture. This feature,
although quite transparent for developers, is a key selling point. Having
suffered when manually installing a Hadoop cluster, I happen to love the
deployment simplicity of C*: only one process per node, no moving parts.

d. high availability. C* clearly trades consistency for availability, so
you can expect something like 99.99% uptime. A strong selling point for
critical businesses that need to be up all the time

e. support for multiple data centers out of the box. Again, on the
operational side, it's a great feature if you plan a worldwide deployment

That's all I can see for now
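
To put the "99.99% uptime" figure in point d into perspective, here is a
small worked calculation of the downtime budget that a given uptime
percentage leaves per year. It is only arithmetic, not a measurement of any
real cluster; the function name and the 365-day year are assumptions made
for illustration.

```python
# Illustrative arithmetic only: converts an uptime percentage into the
# downtime it allows per (non-leap, 365-day) year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the minutes of downtime per year implied by an uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% uptime allows about "
              f"{downtime_minutes_per_year(pct):.1f} minutes of downtime per year")
```

Under these assumptions, 99.99% works out to a little under an hour of
downtime per year.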

2) Why shouldn't I choose C*?

a. need for strong consistency most of the time. Although you can perform
all requests with consistency level ALL, it's clearly not the best use of
C*. You'll suffer from higher latency and reduced availability. Even the
new "lightweight transaction" feature is not meant to be used on a large
scale

b. very complicated and changing queries. Denormalizing is great when you
know ahead of time exactly how you'll query your data. Once that is done,
any new way of querying will require new code and new tables to support it

c. ridiculously small data load. I've seen people in prod using C* for
only 200 GB because they want to be trendy and use bleeding-edge
technologies. They'd be better off using a classical RDBMS solution that
fits their load perfectly
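
The trade-off in point a can be made concrete with a little replica
arithmetic. The sketch below is not from the original post. It models only
three consistency levels (ONE, QUORUM, ALL) in a single data center, uses
the usual definition of a quorum as floor(RF / 2) + 1 replicas, and relies
on the common rule of thumb that a read sees the latest write when
read replicas + write replicas > replication factor. The function names are
made up for illustration and are not a driver API.

```python
# Hypothetical helper functions for illustration; not part of any Cassandra
# driver. RF = replication factor (number of replicas holding each row).

def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond for the given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level in this sketch: {level}")


def tolerated_failures(level: str, rf: int) -> int:
    """Replicas that can be down while requests at this level still succeed."""
    return rf - replicas_required(level, rf)


def is_strongly_consistent(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    return replicas_required(write_level, rf) + replicas_required(read_level, rf) > rf


if __name__ == "__main__":
    rf = 3  # assumed replication factor for the example
    for level in ("ONE", "QUORUM", "ALL"):
        print(f"{level}: needs {replicas_required(level, rf)} of {rf} replicas, "
              f"tolerates {tolerated_failures(level, rf)} down")
    print("QUORUM writes + QUORUM reads overlap:",
          is_strongly_consistent("QUORUM", "QUORUM", rf))
    print("ONE writes + ONE reads overlap:",
          is_strongly_consistent("ONE", "ONE", rf))
```

With RF = 3, ALL cannot tolerate a single replica being down, while QUORUM
tolerates one. That is the reduced availability the point describes.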

Hope that helps

Duy Hai DOAN




Re: Cassandra use cases/Strengths/Weakness

Posted by Prem Yadav <ip...@gmail.com>.
Thanks Manoj. Great post for those who already have Cassandra in production.
However, it brings me back to my original post.
All the points you have mentioned apply to any big data technology.
Storage- All of them
Query- All of them. In fact, a lot of them perform better. I agree that the
CQL structure is better. But Hive and Mongo are good too
Availability- many of them

So my question is basically to Cassandra support people, e.g. DataStax, or
the developers.
What makes Cassandra special?
If I have to convince my CTO to spend a million dollars on a cluster and
support, his first question would be: why Cassandra? Why not this or that?

So I am still not sure what special thing Cassandra brings to the table.

Sorry about the rant. But in the enterprise world, decisions are made by
taking into account stability, convincing managers and whatnot. The chosen
technology has to be stable for years. People should be convinced that the
engineers are not going to do a lot of firefighting.

Any input is appreciated.




Re: Cassandra use cases/Strengths/Weakness

Posted by Manoj Khangaonkar <kh...@gmail.com>.
These are my personal opinions, based on a few months of using Cassandra.
Others may have different opinions.


http://khangaonkar.blogspot.com/2014/06/apache-cassandra-things-to-consider.html

regards





-- 
http://khangaonkar.blogspot.com/

Re: Cassandra use cases/Strengths/Weakness

Posted by Jack Krupansky <ja...@basetechnology.com>.
“Is cassandra only for use cases with data load > 100TB and massive user counts?”

I wouldn’t make that extreme a statement! There are plenty of more moderate use cases for Cassandra. For example, a dozen nodes with 300 GB per node for just a few million users and their interactions and transactions.

I would say, as a rough rule of thumb, that a traditional RDBMS is great for up to the low millions of rows, and Cassandra is clearly needed when you have more than a few hundred million rows. In between, it becomes a more subjective choice.

Tens of millions of rows can probably be dealt with effectively by an RDBMS, but... you’re starting to have to be careful and configure high-end systems and manage them carefully. 100 million rows? Sure, you could still do that on an RDBMS if you are motivated and put in the effort. For example, some relational databases may require manual partitioning when you have more than 25 million rows or so. And then you have to pay attention to query latency as well.

First big question: It may be 100 million rows today, but what growth rate do you anticipate?
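
As a rough illustration of why the growth rate matters, here is a small Python sketch that estimates how many years it takes a table to cross a row-count threshold under compound annual growth. The function name, the 50% growth rate and the 300 million threshold (one possible reading of "a few hundred million") are assumptions for the example, not measured figures.

```python
def years_until(rows_now: float, annual_growth: float, threshold: float) -> int:
    """Whole years until rows_now reaches threshold at a compound growth rate."""
    if rows_now >= threshold:
        return 0
    if annual_growth <= 0:
        raise ValueError("the threshold is never reached without positive growth")
    years = 0
    rows = rows_now
    while rows < threshold:
        rows *= 1 + annual_growth  # compound growth, applied once per year
        years += 1
    return years


if __name__ == "__main__":
    # Assumed inputs: 100 million rows today, growing 50% per year.
    print(years_until(100_000_000, 0.5, 300_000_000))  # 3
```

If the answer is only a few years, it may be worth planning for the larger size from the start.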

-- Jack Krupansky


Re: Cassandra use cases/Strengths/Weakness

Posted by Matthias Hübner <ma...@gmail.com>.
Hi,

I am a bit confused about whether Cassandra is a good choice for my use
case, especially after reading this thread.

Is Cassandra only for use cases with data load > 100TB and massive user
counts?

What about all the other features of Cassandra? Are they not usable to
avoid the limitations of relational databases, even for smaller use cases?

What do you think about my use case:

I need to manage data for around 1000 retail stores to produce a delivery
plan each day (including predictions several weeks into the future) to
refill the stores. For each store I have to collect data about every single
store item. A store has some 10 thousand items. This makes around 100
million items to manage. Each day I have to store some updates for every
single store item. I also receive daily sales predictions for all items.
Every day I have to produce one or more delivery plans. Most data
will replace old data, so it's not increasing that much.

I thought I could handle the data load more easily with Cassandra than with
MariaDB. I wouldn't have to care about locking; I could write all incoming
data and merge it into my tables. And I could use aggregations. So I would
be able to add together all the store-item-related data that I need to
compute my delivery plans. Finally, I would be able to use commodity
hardware and scale more easily.


 Have a nice weekend,

Matthias






Re: Cassandra use cases/Strengths/Weakness

Posted by Jack Krupansky <ja...@basetechnology.com>.
Elasticsearch and Solr are “search platforms”, not “databases”. The best description for Cassandra, especially for a CTO, is its home page:
http://cassandra.apache.org/
Even if you have seen it before, please read it again. There is a lot packed into a few words.

DataStax Enterprise (DSE) combines Cassandra, Hadoop and Spark for analytics, and tightly integrated Solr for rich search of the Cassandra data.

The main benefit of Cassandra is that it is a master-free, distributed, real-time database designed for scale, including support for multiple data centers, so it is ready for managing mission-critical operational data for applications that need low latency and high availability for real-time data access.

And OpsCenter is great for managing a Cassandra or DSE cluster. I’m sure a CTO would appreciate it:
http://www.datastax.com/what-we-offer/products-services/datastax-opscenter

Here’s a feature comparison of some NoSQL databases:
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

-- Jack Krupansky
