Posted to user@cassandra.apache.org by Oliver Ruebenacker <cu...@gmail.com> on 2018/03/12 19:58:02 UTC

Cassandra vs MySQL

     Hello,

  We have a project currently using MySQL single-node with 5-6TB of data
and some performance issues, and we plan to add data up to a total size of
maybe 25-30TB.

  We are thinking of migrating to Cassandra. I have been trying to find
benchmarks or other guidelines to compare MySQL and Cassandra, but most of
them seem to be five years old or older.

  Is there any good, more recent material?

  Thanks!

     Best, Oliver

-- 
Oliver Ruebenacker
Senior Software Engineer, Diabetes Portal
<http://www.type2diabetesgenetics.org/>, Broad Institute
<http://www.broadinstitute.org/>

Re: [EXTERNAL] Cassandra vs MySQL

Posted by Carl Mueller <ca...@smartthings.com>.
Yes, Cassandra's big win is that once you get your data and applications
adapted to the platform, you have a clear path to very large scale and
resiliency. Um, assuming you have the dollars. It scales out on commodity
hardware, but isn't exactly efficient in the use of that hardware. I like
to say that Cassandra makes big data "bigger data" because of the
per-cell timestamp and column-name overhead, multiplied by the
replication factor.
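
To put rough numbers on the "bigger data" point, here is a back-of-the-envelope Python sketch. The per-cell overhead figure is a hypothetical placeholder, not a measured Cassandra number; real on-disk size depends on the Cassandra version's storage format, on compression and on compaction headroom.

```python
# Back-of-the-envelope estimate of how per-cell metadata and the
# replication factor inflate stored size. All sizes are assumptions.


def amplification(avg_value_bytes: int,
                  per_cell_overhead_bytes: int,
                  replication_factor: int) -> float:
    """Ratio of bytes stored across the cluster to raw value bytes."""
    if avg_value_bytes <= 0 or replication_factor <= 0:
        raise ValueError("value size and replication factor must be positive")
    per_cell = avg_value_bytes + per_cell_overhead_bytes
    return per_cell / avg_value_bytes * replication_factor


def cluster_bytes(rows: int, cells_per_row: int, avg_value_bytes: int,
                  per_cell_overhead_bytes: int, replication_factor: int) -> int:
    """Total bytes across all replicas, ignoring compression and compaction."""
    per_row = cells_per_row * (avg_value_bytes + per_cell_overhead_bytes)
    return rows * per_row * replication_factor


if __name__ == "__main__":
    # Hypothetical: 8-byte values, 16 bytes of timestamp/name overhead per
    # cell, replication factor 3.
    print(amplification(8, 16, 3))                 # 9.0
    print(cluster_bytes(1_000_000, 10, 8, 16, 3))  # 720000000
```

Small values suffer the most: the fixed per-cell overhead dominates when the value itself is only a few bytes.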

On Tue, Mar 20, 2018 at 2:54 PM, Jeff Jirsa <jj...@gmail.com> wrote:

> I suspect you're approaching this problem from the wrong side.
>
> The decision of MySQL vs Cassandra isn't usually about performance, it's
> about the other features that may impact/enable that performance.
>
> - Will you have a data set that won't fit on any single MySQL Server?
> - Will you want to write into two different hot datacenters at the same
> time?
> - Do you want to be able to restart any single server without impacting
> the cluster?
>
> If you answer yes to those, then cassandra has an option to do so
> trivially, where you'd have to build tooling with MySQL.
>
> - Do you want to do arbitrary text searches?
> - Do you need JOINs?
> - Do you want to build indices on a lot of the columns and do ad-hoc
> querying?
>
> If you answer yes to those, they're far easier in MySQL than Cassandra.
>
> If you're just looking for "Cassandra can do X writes per second and MySQL
> can do Y writes per second", those types of benchmarks are rarely relevant,
> because in both cases they tend to require expert tuning to get the full
> potential (and very few people are experts in both) and data dependent (and
> your data probably doesn't match the benchmarker's dataset).
>
> If I had a dataset that was ~10-20gb and wanted to do arbitrary reads on
> the data, I'd choose MySQL unless I absolutely positively could not
> tolerate downtime, in which case I'd go with Cassandra spanning multiple
> datacenters. If I had a dataset that was 200TB, or 200PB, I'd choose
> Cassandra, even if I could theoretically make MySQL do it faster, because
> the extra effort in building the tooling to manage that many shards of
> MySQL would be prohibitive to most organizations.

Re: [EXTERNAL] Cassandra vs MySQL

Posted by Jeff Jirsa <jj...@gmail.com>.
I suspect you're approaching this problem from the wrong side.

The decision of MySQL vs Cassandra isn't usually about performance, it's
about the other features that may impact/enable that performance.

- Will you have a data set that won't fit on any single MySQL Server?
- Will you want to write into two different hot datacenters at the same
time?
- Do you want to be able to restart any single server without impacting the
cluster?

If you answer yes to those, then Cassandra can do it more or less
trivially, whereas with MySQL you'd have to build the tooling yourself.

- Do you want to do arbitrary text searches?
- Do you need JOINs?
- Do you want to build indices on a lot of the columns and do ad-hoc
querying?

If you answer yes to those, they're far easier in MySQL than Cassandra.
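
The two checklists above can be read as a rough tally. The Python sketch below is a toy encoding of that reasoning only; the equal weights and the field names are assumptions, and real decisions weigh these questions very differently.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # First list: things Cassandra handles with little extra tooling.
    exceeds_single_server: bool = False
    active_writes_in_two_datacenters: bool = False
    restart_any_node_without_impact: bool = False
    # Second list: things that are far easier in MySQL.
    arbitrary_text_search: bool = False
    needs_joins: bool = False
    many_indexes_ad_hoc_queries: bool = False


def lean(req: Requirements) -> str:
    """Toy tally with equal weights; returns which system the answers favor."""
    cassandra = sum([req.exceeds_single_server,
                     req.active_writes_in_two_datacenters,
                     req.restart_any_node_without_impact])
    mysql = sum([req.arbitrary_text_search,
                 req.needs_joins,
                 req.many_indexes_ad_hoc_queries])
    if cassandra > mysql:
        return "Cassandra"
    if mysql > cassandra:
        return "MySQL"
    return "undecided"


if __name__ == "__main__":
    print(lean(Requirements(exceeds_single_server=True)))  # Cassandra
```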

If you're just looking for "Cassandra can do X writes per second and MySQL
can do Y writes per second", those types of benchmarks are rarely relevant,
because in both cases they tend to require expert tuning to reach their full
potential (and very few people are experts in both), and they are data
dependent (and your data probably doesn't match the benchmarker's dataset).

If I had a dataset that was ~10-20 GB and wanted to do arbitrary reads on
the data, I'd choose MySQL unless I absolutely positively could not
tolerate downtime, in which case I'd go with Cassandra spanning multiple
datacenters. If I had a dataset that was 200TB, or 200PB, I'd choose
Cassandra, even if I could theoretically make MySQL do it faster, because
the extra effort in building the tooling to manage that many shards of
MySQL would be prohibitive to most organizations.

On Tue, Mar 20, 2018 at 11:44 AM, Oliver Ruebenacker <cu...@gmail.com>
wrote:

>
>      Hello,
>
>   Thanks for all the responses.
>
>   I do know some SQL and CQL, so I know the main differences. You can do
> joins in MySQL, but the bigger your data, the less likely you want to do
> that.
>
>   If you are a team that wants to consider migrating from MySQL to
> Cassandra, you need some reason to believe that it is going to be faster.
> What evidence is there?
>
>   Even the Cassandra home page has references to benchmarks to make the
> case for Cassandra. Unfortunately, they seem to be about five to six years
> old. It doesn't make sense to keep them there if you just can't compare.
>
>      Best, Oliver

Re: [EXTERNAL] Cassandra vs MySQL

Posted by Joaquin Casares <jo...@thelastpickle.com>.
Hello Oliver,

The first thing that I check when deciding whether a workload will work
well with Cassandra is its read patterns. Once the read patterns can be
written down on paper, we need to figure out how the write patterns will
populate the required tables. Since you know enough about CQL, it's mainly
about checking how denormalization is going to work out for on-disk read
access requests.

Once the read and write patterns are known, we can see if Cassandra will be
a good fit for denormalizing your workflow and thereby benefiting from a
datastore that can scale out horizontally. If your workload can scale out
horizontally, then Cassandra should be faster than a single-node MySQL
cluster. If your datastore has too many relational requirements, is meant
for a queue-like purpose, or hits other edge cases, then it doesn't matter
how fast Cassandra is if it's not the correct tool for the job.
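
As a rough illustration of how write patterns populate read-specific tables, here is an in-memory Python sketch of denormalization. The Order record and both table names are hypothetical. In CQL the fan-out would be separate INSERT statements, often grouped in a logged BATCH so that both eventually apply; the dictionaries below only mimic the shape of that.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Order:
    # Hypothetical record used only for illustration.
    order_id: str
    customer_id: str
    day: str  # e.g. "2018-03-20"


class OrderTables:
    """One table per read pattern; every write is fanned out to all of them."""

    def __init__(self) -> None:
        self.orders_by_customer: Dict[str, List[Order]] = defaultdict(list)
        self.orders_by_day: Dict[str, List[Order]] = defaultdict(list)

    def write(self, order: Order) -> None:
        # The same logical row is stored once per query it must serve.
        self.orders_by_customer[order.customer_id].append(order)
        self.orders_by_day[order.day].append(order)

    def for_customer(self, customer_id: str) -> List[Order]:
        return list(self.orders_by_customer.get(customer_id, []))

    def for_day(self, day: str) -> List[Order]:
        return list(self.orders_by_day.get(day, []))


if __name__ == "__main__":
    tables = OrderTables()
    tables.write(Order("o1", "c1", "2018-03-20"))
    tables.write(Order("o2", "c2", "2018-03-20"))
    print([o.order_id for o in tables.for_day("2018-03-20")])  # ['o1', 'o2']
    print([o.order_id for o in tables.for_customer("c2")])     # ['o2']
```

The trade-off is the one described above: each read becomes a single lookup, at the cost of extra writes and duplicated storage.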

I hope that helps align your discovery/investigation process. :)

Cheers,

Joaquin Casares
Consultant
Austin, TX

Apache Cassandra Consulting
http://www.thelastpickle.com

On Tue, Mar 20, 2018 at 1:44 PM, Oliver Ruebenacker <cu...@gmail.com>
wrote:

>
>      Hello,
>
>   Thanks for all the responses.
>
>   I do know some SQL and CQL, so I know the main differences. You can do
> joins in MySQL, but the bigger your data, the less likely you want to do
> that.
>
>   If you are a team that wants to consider migrating from MySQL to
> Cassandra, you need some reason to believe that it is going to be faster.
> What evidence is there?
>
>   Even the Cassandra home page has references to benchmarks to make the
> case for Cassandra. Unfortunately, they seem to be about five to six years
> old. It doesn't make sense to keep them there if you just can't compare.
>
>      Best, Oliver

Re: [EXTERNAL] Cassandra vs MySQL

Posted by Oliver Ruebenacker <cu...@gmail.com>.
     Hello,

  Thanks for all the responses.

  I do know some SQL and CQL, so I know the main differences. You can do
joins in MySQL, but the bigger your data, the less likely you are to want to
do that.

  If you are a team that wants to consider migrating from MySQL to
Cassandra, you need some reason to believe that it is going to be faster.
What evidence is there?

  Even the Cassandra home page has references to benchmarks to make the
case for Cassandra. Unfortunately, they seem to be about five to six years
old. It doesn't make sense to keep them there if they no longer allow a fair
comparison.

     Best, Oliver

On Tue, Mar 20, 2018 at 1:13 PM, Durity, Sean R <SEAN_R_DURITY@homedepot.com>
wrote:

> I’m not sure there is a fair comparison. MySQL and Cassandra have
> different ways of solving related (but not necessarily the same) problems
> of storing and retrieving data.
>
>
>
> The data model between MySQL and Cassandra is likely to be very different.
> The key for Cassandra is that you need to model for the queries that will
> be executed. If you cannot know the queries ahead of time, Cassandra is not
> the best choice. If table scans are typically required, Cassandra is not a
> good choice. If you need more than a few hundred tables in a cluster,
> Cassandra is not a good choice.
>
>
>
> If multi-datacenter replication is required, Cassandra is an awesome
> choice. If you are going to always query by a partition key (or primary
> key), Cassandra is a great choice. The nice thing is that the performance
> scales linearly, so additional data is fine (as long as you add nodes) –
> again, if your data model is designed for Cassandra. If you like
> no-downtime upgrades and extreme reliability and availability, Cassandra is
> a great choice.
>
>
>
> Personally, I hope to never have to use/support MySQL again, and I love
> working with Cassandra. But, Cassandra is not the choice for all data
> problems.
>
>
>
>
>
> Sean Durity
>

RE: [EXTERNAL] Cassandra vs MySQL

Posted by "Durity, Sean R" <SE...@homedepot.com>.
I’m not sure there is a fair comparison. MySQL and Cassandra have different ways of solving related (but not necessarily the same) problems of storing and retrieving data.

The data model between MySQL and Cassandra is likely to be very different. The key for Cassandra is that you need to model for the queries that will be executed. If you cannot know the queries ahead of time, Cassandra is not the best choice. If table scans are typically required, Cassandra is not a good choice. If you need more than a few hundred tables in a cluster, Cassandra is not a good choice.
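
A minimal in-memory Python sketch of what "model for the queries" means. The ReadingsBySensor table and its columns are hypothetical. Rows are grouped by a partition key and sorted by a clustering column, so the intended query is a single lookup, while a filter on a non-key column has to touch every partition. In CQL such a filter is generally rejected unless you add ALLOW FILTERING or an index.

```python
from bisect import insort
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy, in-memory stand-in for a query-first table: rows are grouped by a
# partition key (sensor_id) and kept sorted by a clustering column (ts).
# The table and column names are hypothetical.


class ReadingsBySensor:
    def __init__(self) -> None:
        self.partitions: Dict[str, List[Tuple[int, float]]] = defaultdict(list)

    def insert(self, sensor_id: str, ts: int, value: float) -> None:
        # Keep each partition in clustering order (by ts).
        insort(self.partitions[sensor_id], (ts, value))

    def by_sensor(self, sensor_id: str) -> List[Tuple[int, float]]:
        # The query this table was designed for: one partition, already sorted.
        return list(self.partitions.get(sensor_id, []))

    def values_above(self, threshold: float) -> List[Tuple[str, int, float]]:
        # A query it was NOT designed for: every partition has to be read.
        return [(sensor_id, ts, value)
                for sensor_id, rows in self.partitions.items()
                for ts, value in rows
                if value > threshold]


if __name__ == "__main__":
    table = ReadingsBySensor()
    table.insert("s1", 2, 5.0)
    table.insert("s1", 1, 7.0)
    table.insert("s2", 1, 9.0)
    print(table.by_sensor("s1"))    # [(1, 7.0), (2, 5.0)]
    print(table.values_above(6.0))  # [('s1', 1, 7.0), ('s2', 1, 9.0)]
```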

If multi-datacenter replication is required, Cassandra is an awesome choice. If you are going to always query by a partition key (or primary key), Cassandra is a great choice. The nice thing is that the performance scales linearly, so additional data is fine (as long as you add nodes) – again, if your data model is designed for Cassandra. If you like no-downtime upgrades and extreme reliability and availability, Cassandra is a great choice.
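
If performance scales roughly linearly with nodes, first-pass capacity planning is mostly arithmetic. The Python sketch below assumes a hypothetical "usable TB per node" planning figure that already leaves room for compaction and growth. That figure is not a Cassandra limit and should come from your own hardware and testing.

```python
import math


def nodes_needed(raw_data_tb: float, replication_factor: int,
                 usable_tb_per_node: float) -> int:
    """Smallest node count that holds every replica.

    usable_tb_per_node is a hypothetical planning figure, not a Cassandra limit.
    """
    if usable_tb_per_node <= 0 or replication_factor <= 0:
        raise ValueError("capacity and replication factor must be positive")
    by_capacity = math.ceil(raw_data_tb * replication_factor / usable_tb_per_node)
    # Replicas of a partition live on different nodes, so the cluster needs
    # at least replication_factor nodes regardless of data size.
    return max(replication_factor, by_capacity)


if __name__ == "__main__":
    # Upper estimate of 30 TB from the original question, RF 3, and a
    # hypothetical 1.5 TB usable per node.
    print(nodes_needed(30, 3, 1.5))  # 60
```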

Personally, I hope to never have to use/support MySQL again, and I love working with Cassandra. But Cassandra is not the right choice for every data problem.


Sean Durity

From: Oliver Ruebenacker [mailto:curoli@gmail.com]
Sent: Monday, March 12, 2018 3:58 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Cassandra vs MySQL


     Hello,
  We have a project currently using MySQL single-node with 5-6TB of data and some performance issues, and we plan to add data up to a total size of maybe 25-30TB.
  We are thinking of migrating to Cassandra. I have been trying to find benchmarks or other guidelines to compare MySQL and Cassandra, but most of them seem to be five years old or older.
  Is there some good more recent material?
  Thanks!
     Best, Oliver



________________________________

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.

Re: Cassandra vs MySQL

Posted by Satendra <st...@gmail.com>.
As I see it, Cassandra is going to die out in the near future. It is not
serving its purpose; rather, people sometimes run into issues with it,
especially in virtual environments.

We have tried a crdb (CockroachDB) database cluster and migrated a few of
our clusters over to the CockroachDB environment. It seems to be working,
given its relational nature.

Saen

On 3/13/18, Matija Gobec <ma...@gmail.com> wrote:
> Hi Oliver,
>
> Few years back I had a similar problem where there was a lot of data in
> MySQL and it was starting to choke. I migrated data to Cassandra, ran
> benchmarks and blew MySQL out of the water with a small 3 node C* cluster.
> If you have a use case for Cassandra the answer is yes, but keep in mind
> that there are some use cases like relational problems which can be hard to
> solve with Cassandra and I tend to keep them in relational database. That
> being said, I don't think you can benchmark these two head to head since
> they basically solve different problems and Cassandra is distributed by
> design.
>
> Best,
> Matija
>


Re: Cassandra vs MySQL

Posted by Anthony Grasso <an...@gmail.com>.
Hi Oliver,

I was in a similar situation to you and Matija a few years back as well and
can vouch for what Matija has said. Some data sets are more suitable for
Cassandra than others; so the answer to your question depends on the type
of data and how it is modelled in Cassandra. The data model will affect
performance and how the cluster expands over time.

The application(s) connecting to the database will need to be modified to
at least call the cluster and possibly to perform some of the operations
that MySQL performed (e.g. joins). This means that any sort of benchmark
would have to use the full system end-to-end. If you did decide to benchmark
your system with MySQL and then with Cassandra, it is best to use a full
production data load. This is because the data model used in Cassandra will
affect the system performance characteristics as the data grows.
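
For example, a join that MySQL used to perform may move into application code. The Python sketch below is a minimal illustration with hypothetical order and customer records. In a real deployment each lookup in customers_by_id would be a separate partition-key read against the cluster, which is one reason an end-to-end benchmark matters more than a raw database benchmark.

```python
from typing import Dict, List, Tuple


def join_orders_to_customers(orders: List[dict],
                             customers_by_id: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Inner join done in application code: (order_id, customer name) pairs.

    In a real deployment each customers_by_id lookup would be a separate
    read against the cluster, i.e. an extra round trip.
    """
    joined = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is not None:  # drop unmatched orders, like an inner JOIN
            joined.append((order["order_id"], customer["name"]))
    return joined


if __name__ == "__main__":
    orders = [{"order_id": "o1", "customer_id": "c1"},
              {"order_id": "o2", "customer_id": "c9"}]
    customers = {"c1": {"name": "Ada"}}
    print(join_orders_to_customers(orders, customers))  # [('o1', 'Ada')]
```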

Kind regards,
Anthony


On Tue, 13 Mar 2018 at 07:29, Matija Gobec <ma...@gmail.com> wrote:

> Hi Oliver,
>
> Few years back I had a similar problem where there was a lot of data in
> MySQL and it was starting to choke. I migrated data to Cassandra, ran
> benchmarks and blew MySQL out of the water with a small 3 node C* cluster.
> If you have a use case for Cassandra the answer is yes, but keep in mind
> that there are some use cases like relational problems which can be hard to
> solve with Cassandra and I tend to keep them in relational database. That
> being said, I don't think you can benchmark these two head to head since
> they basically solve different problems and Cassandra is distributed by
> design.
>
> Best,
> Matija
>

Re: Cassandra vs MySQL

Posted by Matija Gobec <ma...@gmail.com>.
Hi Oliver,

A few years back I had a similar problem: there was a lot of data in
MySQL and it was starting to choke. I migrated the data to Cassandra, ran
benchmarks, and blew MySQL out of the water with a small 3-node C* cluster.
If you have a use case for Cassandra, the answer is yes, but keep in mind
that some use cases, like relational problems, can be hard to solve with
Cassandra, and I tend to keep those in a relational database. That being
said, I don't think you can benchmark these two head-to-head, since they
basically solve different problems and Cassandra is distributed by design.

Best,
Matija

On Mon, Mar 12, 2018 at 9:27 PM, Gábor Auth <au...@gmail.com> wrote:


Re: Cassandra vs MySQL

Posted by Gábor Auth <au...@gmail.com>.
Hi,

On Mon, Mar 12, 2018 at 8:58 PM Oliver Ruebenacker <cu...@gmail.com> wrote:

> We have a project currently using MySQL single-node with 5-6TB of data and
> some performance issues, and we plan to add data up to a total size of
> maybe 25-30TB.
>

There is no 'silver bullet': Cassandra is not a 'drop-in' replacement for
MySQL. It may be faster, or it may be totally unusable, depending on your
use case and database schema.

> Is there some good more recent material?
>

Are you able to completely redesign your database schema? :)

Bye,
Gábor Auth

Re: Cassandra vs MySQL

Posted by Carl Mueller <ca...@smartthings.com>.
THERE ARE NO JOINS WITH CASSANDRA

CQL != SQL

The same goes for aggregation, subqueries, etc., and multi-table
transactions are effectively out.

If you have simple single-table queries and updates, or can convert the app
to do so, then you're in business.
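
One common way to "convert the app" to single-table queries is to denormalize: write each record into one table per read pattern, copying in any fields a join would otherwise supply. The Python sketch below uses in-memory dicts as stand-ins for two hypothetical tables (orders_by_customer and orders_by_day); it illustrates the pattern only and does not talk to Cassandra.

```python
from collections import defaultdict

# Hypothetical query tables, one per read pattern:
#   orders_by_customer: partition key = customer_id
#   orders_by_day:      partition key = order day
orders_by_customer = defaultdict(list)
orders_by_day = defaultdict(list)

def record_order(order_id, customer_id, customer_name, day, total):
    """Write one order into every table that serves a query (denormalization).

    The customer name is copied into each row so no read needs a join.
    """
    row = {"order_id": order_id, "customer_id": customer_id,
           "customer_name": customer_name, "day": day, "total": total}
    orders_by_customer[customer_id].append(row)
    orders_by_day[day].append(row)

def orders_for_customer(customer_id):
    # Single-table, single-partition read (.get does not create empty keys).
    return orders_by_customer.get(customer_id, [])

def orders_on(day):
    # Same data, different partition key, still one table per query.
    return orders_by_day.get(day, [])

record_order("o1", "c1", "Alice", "2018-03-13", 30)
record_order("o2", "c1", "Alice", "2018-03-14", 12)
record_order("o3", "c2", "Bob", "2018-03-13", 7)
print([r["order_id"] for r in orders_on("2018-03-13")])
```

In Cassandra itself, the two writes would typically be grouped in a logged BATCH so that they eventually all apply; that gives atomicity but not isolation, which is the sense in which multi-table transactions are out.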

On Tue, Mar 13, 2018 at 5:02 AM, Rahul Singh <ra...@gmail.com>
wrote:


Re: Cassandra vs MySQL

Posted by Rahul Singh <ra...@gmail.com>.
Oliver,


Here are the criteria I have for you:

1. Do you need massive concurrency on reads and writes?

If not, you can replicate MySQL using master-slave replication, or consider Galera (MariaDB master-master). I haven’t used it myself, but that doesn’t mean it doesn’t work. If you have time to experiment, please do a comparison of Galera vs. Cassandra. ;)

2. Do you plan on doing both OLTP and OLAP on the same data?

Cassandra can replicate data to different datacenters, so you can do heavy reads and writes in one logical datacenter while another logical datacenter handles analytics.

3. Do you have a ridiculously strict SLA to maintain? And does it need to be global?

If you don’t need to be up and running all the time and don’t need a global platform, don’t bother using Cassandra.
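
Points 2 and 3 both come down to replication settings. The Python sketch below renders a CQL CREATE KEYSPACE statement using NetworkTopologyStrategy, giving a transactional datacenter and an analytics datacenter their own replication factors, and computes how many replicas per datacenter can be down while LOCAL_QUORUM requests still succeed (quorum = floor(RF/2) + 1). The keyspace name, datacenter names, and replication factors are hypothetical; real datacenter names must match what the cluster's snitch reports.

```python
def create_keyspace_cql(keyspace, rf_by_dc):
    """Render a CREATE KEYSPACE statement with per-datacenter replication."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_by_dc.items())
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )

def quorum(rf):
    """Replicas that must respond for a (LOCAL_)QUORUM request."""
    return rf // 2 + 1

def tolerated_down(rf):
    """Replicas per datacenter that may be down while LOCAL_QUORUM succeeds."""
    return rf - quorum(rf)

# Hypothetical layout: RF 3 for the OLTP datacenter, RF 2 for analytics.
print(create_keyspace_cql("app", {"oltp_dc": 3, "analytics_dc": 2}))
for rf in (2, 3, 5):
    print(rf, quorum(rf), tolerated_down(rf))
```

With RF 3 in a datacenter, LOCAL_QUORUM needs 2 replicas, so a single node can be restarted without failing requests; with RF 2 there is no such headroom, which is one reason RF 3 is a common choice for the serving datacenter.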

Exporting a relational schema and importing it into Cassandra will be a box of hurt. In my professional experience with Cassandra (the kind that comes from people paying me to make judgments and decisions), the biggest mistake is people thinking that because CQL is similar to SQL, it is just like SQL. It’s not. The key structure and the literal absence of relationships mean that all tables should be “report tables” or “direct object tables.” That being said, if you don’t do a lot of joins and arbitrary selects on any field, Cassandra can help you achieve massive scale.
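
As a rough illustration of a "report table": instead of computing a total with GROUP BY at read time, the application updates a pre-aggregated row keyed by exactly what the report asks for, so the report becomes a single-key read. This Python sketch uses a dict as a stand-in for a hypothetical table keyed by (customer_id, month); in Cassandra a counter column is one possible way to maintain such a total, but the table design here is an assumption, not a prescribed schema.

```python
from collections import defaultdict

# Hypothetical "report table": totals keyed by (customer_id, month),
# maintained at write time instead of aggregated at read time.
monthly_totals = defaultdict(int)

def record_sale(customer_id, day, amount_cents):
    """Update the pre-aggregated report row for the sale's month."""
    month = day[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
    monthly_totals[(customer_id, month)] += amount_cents

def monthly_report(customer_id, month):
    """Single-key read: no scan and no aggregation at query time."""
    return monthly_totals.get((customer_id, month), 0)

record_sale("c1", "2018-03-12", 1500)
record_sale("c1", "2018-03-20", 250)
record_sale("c1", "2018-04-01", 100)
print(monthly_report("c1", "2018-03"))
```

The cost moves to the write path: every new report shape needs its own table, populated as data arrives, which is exactly why a relational schema cannot simply be exported and imported.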

The statement that “Cassandra is going to die soon” is the same thing people said about Java and .NET, and both are still here many years later. Cassandra has achieved critical mass, so much so that a company built a C++ version of it and Microsoft’s global database-as-a-service, Cosmos DB, offers a Cassandra-compatible API, not to mention that DataStax supports huge global brands on a commercial build of it. It’s not going anywhere.


--
Rahul Singh
rahul.singh@anant.us

Anant Corporation

On Mar 12, 2018, 3:58 PM -0400, Oliver Ruebenacker <cu...@gmail.com>, wrote: