Posted to user@cassandra.apache.org by Jeremy Jongsma <je...@barchart.com> on 2014/08/19 17:56:34 UTC

EC2 SSD cluster costs

The latest consensus around the web for running Cassandra on EC2 seems to
be "use new SSD instances." I've not seen any mention of the elephant in
the room - using the new SSD instances significantly raises the cluster
cost per TB. With Cassandra's strength being linear scalability to many
terabytes of data, it strikes me as odd that everyone is recommending such
a large storage cost hike almost without reservation.

Monthly cost comparison for a 100TB cluster (non-reserved instances):

m1.xlarge (2x420 non-SSD): $30,000 (120 nodes)
m3.xlarge (2x40 SSD): $250,000 (1250 nodes! Clearly not an option)
i2.xlarge (1x800 SSD): $76,000 (125 nodes)

Best case, the cost goes up 150%. How are others approaching these new
instances? Have you migrated and eaten the costs, or are you staying on
previous generation until prices come down?
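
For anyone who wants to sanity-check these numbers, here is a minimal
Python sketch that derives the implied per-node and per-TB monthly costs
from the figures above (the inputs are this post's numbers, not live AWS
pricing):

# Sanity check of the cluster costs quoted above (a sketch; inputs are
# the figures from this post, not current AWS prices).
TARGET_TB = 100

# instance: (raw TB per node, node count from the post, monthly cost)
options = {
    "m1.xlarge (2x420 HDD)": (0.84, 120, 30000),
    "m3.xlarge (2x40 SSD)": (0.08, 1250, 250000),
    "i2.xlarge (1x800 SSD)": (0.80, 125, 76000),
}

for name, (tb_per_node, nodes, monthly) in options.items():
    print(f"{name}: ${monthly / nodes:.0f}/node/month, "
          f"${monthly / TARGET_TB:.0f}/TB/month")

# i2.xlarge vs m1.xlarge: 76000 / 30000 = ~2.53x, i.e. ~150% more per TB.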

Re: EC2 SSD cluster costs

Posted by Aiman Parvaiz <ai...@shift.com>.
I completely agree with others here. It depends on your use case. We were
using hi1.4xlarge boxes and paying a huge amount to Amazon. Lately our
requirements changed: we are not hammering C* as much, and our data size
has gone down too, so given the new conditions we reserved and migrated to
c3.4xlarge instances to save quite a lot of money.



Re: EC2 SSD cluster costs

Posted by Paulo Ricardo Motta Gomes <pa...@chaordicsystems.com>.
Still using good ol' m1.xlarge here + external caching (memcached). Trying
to adapt our architecture to have different clusters for different use
cases, so we can leverage SSD at an acceptable cost in some of them.




-- 
Paulo Motta

Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200

Re: EC2 SSD cluster costs

Posted by Shane Hansen <sh...@gmail.com>.
Again, it depends on your use case.
But we wanted to keep the data per node below 500GB,
and we found RAIDed SSDs to be the best bang for the buck
for our cluster. I think we moved from the i2 to the c3 because
our bottleneck tended to be CPU utilization (from parsing requests).
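
(For illustration, a rough sketch of what a per-node data cap like that
implies for cluster size; the dataset size and replication factor below
are example parameters, not our actual numbers:)

import math

# Nodes implied by a per-node data cap (a sketch; raw_tb and rf are
# example parameters, not measurements from a real cluster).
def nodes_needed(raw_tb, rf, cap_gb_per_node):
    total_gb = raw_tb * 1000 * rf  # total stored data, including replicas
    return math.ceil(total_gb / cap_gb_per_node)

print(nodes_needed(raw_tb=10, rf=3, cap_gb_per_node=500))  # -> 60 nodes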



(Disclaimer: we're not Cassandra veterans, but we're not part of the RF=N=3
club)




Re: EC2 SSD cluster costs

Posted by Russell Bradberry <rb...@gmail.com>.
Short answer, it depends on your use-case.

We migrated to i2.xlarge nodes and saw an immediate increase in performance. If you just need plain old raw disk space and don't have a performance requirement to meet, then the m1 machines would work, or hell, even SSD EBS volumes may work for you. The problem we were having is that we couldn't fill the m1 machines, because we needed to add more nodes for performance. Now we have much more power and just the right amount of disk space.

Basically, these are not apples-to-apples comparisons.




Re: EC2 SSD cluster costs

Posted by Kevin Burton <bu...@spinn3r.com>.
You're pricing it out at $ per GB… that's not the way to look at it.

Price it out at $ per IO… Once you price it that way, SSD makes a LOT more
sense.

Of course, it depends on your workload.  If you're just doing writes, and
they're all sequential, then cost per IO might not make a lot of sense.
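
As a minimal sketch of the $-per-IO framing (the per-node costs below are
the figures implied earlier in this thread; the IOPS numbers are
placeholders you would replace with your own measurements):

# Cost per sustained IOPS instead of per GB (a sketch; node costs are the
# implied figures from this thread, IOPS values are placeholders -- measure
# your own hardware, these are not benchmarks).
nodes = {
    "m1.xlarge (HDD)": (250, 400),     # ($/node/month, assumed sustained IOPS)
    "i2.xlarge (SSD)": (608, 40000),
}

for name, (dollars, iops) in nodes.items():
    print(f"{name}: ${dollars / iops:.4f} per IOPS per month")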

We're VERY IO bound… so for us SSD is a no-brainer.

We were actually all in-memory before because of this, and we just finished
a big SSD migration… (though on MySQL).

But our Cassandra deploy will be on SSD on SoftLayer.

It's a no-brainer, really.

Kevin





-- 
Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile:
https://plus.google.com/102718274791889610666/posts