Posted to user@cassandra.apache.org by Kanwar Sangha <ka...@mavenir.com> on 2013/02/21 23:56:42 UTC

Cassandra with SAN

Hi - Is it a good idea to use Cassandra with a SAN? Say, a SAN which provides me 8 petabytes of storage. Would I not be I/O bound irrespective of the number of Cassandra machines, so that scaling by adding machines won't help?

Thanks
Kanwar
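
One way to picture the concern in the question is a toy throughput model. This is only a sketch: the function name and all IOPS figures are hypothetical, and whether a real SAN caps throughput this way depends on its fabric and controllers.

```python
def cluster_iops(nodes, per_node_iops, shared_backend_iops=None):
    """Aggregate IOPS the cluster can drive in this toy model.

    With local disks, every node brings its own I/O, so the total grows
    with the node count. With a shared backend (for example one SAN),
    the total is capped at whatever the backend can deliver.
    """
    demand = nodes * per_node_iops
    if shared_backend_iops is None:
        return demand
    return min(demand, shared_backend_iops)


# Hypothetical figures, for illustration only.
PER_NODE_IOPS = 5_000
SAN_IOPS = 100_000

for n in (10, 20, 40, 80):
    local = cluster_iops(n, PER_NODE_IOPS)
    san = cluster_iops(n, PER_NODE_IOPS, SAN_IOPS)
    print(f"{n:>3} nodes: local disks {local:>7} IOPS, shared SAN {san:>7} IOPS")
```

In this model, once the nodes together ask for more than the shared backend can serve, adding machines no longer raises the total.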

Re: Cassandra with SAN

Posted by David Schairer <ds...@humbaba.net>.
"Who breaks a butterfly upon a wheel?"

It will work, but you'd have a distributed database running on a single-point-of-failure storage fabric, destroying many of its benefits, unless you have enough discrete SAN units to treat them as racks in your Cassandra topology, so that data is replicated across redundant SAN shelves|controllers|etc.

You also would end up with redundancy at cross purposes in that the SAN will be striping data that Cassandra is already distributing efficiently.

If the SAN is free and unused, it'll be fine as a Cassandra test platform.  But I wouldn't spend a penny on SAN hardware instead of a much larger distributed cluster with commodity hardware.  Derive your redundancy and performance from lots of hardware in lots of places, not expensive hardware in one place.  

--DRS
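
The idea of treating discrete SAN units as racks can be sketched with a simplified rack-aware placement routine. This is an illustration only, not Cassandra's code: the `Node` type, the `place_replicas` function, and the `san-a`/`san-b`/`san-c` names are hypothetical. In Cassandra itself, rack-aware placement is what NetworkTopologyStrategy aims for when the snitch reports rack membership.

```python
from collections import namedtuple

# Hypothetical node record: a name plus the "rack" (here, a SAN unit) it sits on.
Node = namedtuple("Node", ["name", "rack"])


def place_replicas(ring, start, rf):
    """Walk the ring from `start`, preferring nodes on racks not yet used.

    Simplified illustration of rack-aware placement; not Cassandra's code.
    """
    ordered = ring[start:] + ring[:start]
    chosen, used_racks, skipped = [], set(), []
    for node in ordered:
        if len(chosen) == rf:
            break
        if node.rack in used_racks:
            skipped.append(node)  # same rack as an earlier replica
        else:
            chosen.append(node)
            used_racks.add(node.rack)
    # Fewer racks than replicas: fall back to nodes skipped earlier.
    chosen.extend(skipped[: rf - len(chosen)])
    return chosen


# Three discrete SAN units modelled as racks.
multi = [Node("n1", "san-a"), Node("n2", "san-a"),
         Node("n3", "san-b"), Node("n4", "san-b"),
         Node("n5", "san-c"), Node("n6", "san-c")]
# One SAN behind every node.
single = [Node(f"n{i}", "san-a") for i in range(1, 7)]

print([n.rack for n in place_replicas(multi, 0, 3)])
print([n.rack for n in place_replicas(single, 0, 3)])
```

With three SAN units, each of the three replicas lands on a different unit. With a single SAN, all three replicas share the same backing storage, which is the single point of failure described above.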


Re: Cassandra with SAN

Posted by Ben Gambley <be...@intoscience.com>.
On Friday, February 22, 2013, Jared Biel wrote:

> > As a counter argument though, anyone running a C* cluster on the Amazon
> > cloud is going to be using SAN storage (or some kind of proprietary storage
> > array) at the lowest layers...Amazon isn't going to have a bunch of JBOD
> > running their cloud infrastructure.  However, they've invested in the
> > infrastructure to do it right.
>
> This is certainly true when using EBS, however it's generally not
> recommended to use EBS when running Cassandra. EBS has proven to be
> unreliable in the past and it's a bit of a SPOF. Instead, it's recommended
> to use the "instance store" disks that come with most instances (handy
> chart here: http://www.ec2instances.info/). These are the rough
> equivalent of local disks (probably host level RAID 10 storage if I'd have
> to guess.)
>
> -Jared

Re: Cassandra with SAN

Posted by Jared Biel <ja...@bolderthinking.com>.
> As a counter argument though, anyone running a C* cluster on the Amazon
> cloud is going to be using SAN storage (or some kind of proprietary storage
> array) at the lowest layers...Amazon isn't going to have a bunch of JBOD
> running their cloud infrastructure.  However, they've invested in the
> infrastructure to do it right.

This is certainly true when using EBS; however, it's generally not
recommended to use EBS when running Cassandra. EBS has proven to be
unreliable in the past and it's a bit of a SPOF. Instead, it's recommended
to use the "instance store" disks that come with most instances (handy
chart here: http://www.ec2instances.info/). These are the rough equivalent
of local disks (probably host-level RAID 10 storage, if I had to guess).

-Jared


Re: Cassandra with SAN

Posted by Michael Morris <mi...@gmail.com>.
I'm running a 27-node Cassandra cluster on SAN without issue.  To be
perfectly clear, though: the hosts are multi-homed to different
switches/fabrics in the SAN, we have an _expensive_ EMC array, and other
than a datacenter-wide power outage, there's no SPOF for the SAN.  We use
it because it's there, and it's already a sunk cost.

I certainly would not go out of my way to purchase SAN infrastructure for a
C* cluster; it just doesn't make sense (for all the reasons others have
mentioned).  These days, you can load up a single 2U server with multiple
TB of disk, so the aggregate storage capacity of your C* cluster could
potentially be as much as a SAN you would purchase (and a lot less hassle
too).
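
The aggregate-capacity point can be made concrete with some back-of-the-envelope arithmetic. This is only a sketch: the 27 nodes and the 8 PB SAN come from this thread, while the 12 TB per node, the replication factor of 3, and the function names are assumptions for illustration.

```python
import math


def raw_capacity_tb(nodes, tb_per_node):
    """Total raw disk across the cluster, in TB."""
    return nodes * tb_per_node


def usable_capacity_tb(nodes, tb_per_node, replication_factor):
    """Rough unique-data capacity once every row is stored RF times."""
    return raw_capacity_tb(nodes, tb_per_node) / replication_factor


def nodes_for_raw_capacity(target_tb, tb_per_node):
    """Nodes needed for local disks to match a given raw capacity."""
    return math.ceil(target_tb / tb_per_node)


TB_PER_NODE = 12    # assumption: disk fitted to one 2U server
RF = 3              # assumption: replication factor
SAN_TB = 8 * 1000   # the 8 PB SAN from the question, in decimal TB

print(raw_capacity_tb(27, TB_PER_NODE))
print(usable_capacity_tb(27, TB_PER_NODE, RF))
print(nodes_for_raw_capacity(SAN_TB, TB_PER_NODE))
```

The sketch ignores headroom for compaction and operational overhead, so real per-node limits would be lower than the raw figure.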

As a counter argument though, anyone running a C* cluster on the Amazon
cloud is going to be using SAN storage (or some kind of proprietary storage
array) at the lowest layers...Amazon isn't going to have a bunch of JBOD
running their cloud infrastructure.  However, they've invested in the
infrastructure to do it right.

- Mike


Re: Cassandra with SAN

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
I shouldn't have used the word "spinning"... SSDs are a great option as well.

I also agree with all the "expensive SPOF" points others have made.

Sent from my iPhone


Re: Cassandra with SAN

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
Cassandra is designed to write and read data in a way that is optimized for physical spinning disks.

Running C* on a SAN introduces a layer of abstraction that at best negates those optimizations, and at worst introduces additional overhead.

Sent from my iPhone


Re: Cassandra with SAN

Posted by Michael Kjellman <mk...@barracuda.com>.
You'd be adding a single point of failure when you presumably chose a distributed database for a good reason. I'd also think you'd be tempted to have multiple terabytes per node (so you're even more cost-inefficient, because you'll still need to buy the same number of nodes everyone else does even though you have the SAN). Then any operations (repair, cleanup) are going to be unbearable. Also, if you want to be multi-DC, you'll now need two SANs.

I can't think of one good reason to run C* with a SAN.


RE: Cassandra with SAN

Posted by Kanwar Sangha <ka...@mavenir.com>.
OK. What would be the drawbacks? :)


Re: Cassandra with SAN

Posted by Michael Kjellman <mk...@barracuda.com>.
No, this is a really, really bad idea. C* was not designed for this; in fact, it was designed so you don't need to have a large, expensive SAN.

Don't be tempted by the shiny expensive SAN. :)

If money is no object, instead throw SSDs in your nodes and run 10G between racks.


Copy, by Barracuda, helps you store, protect, and share all your amazing
things. Start today: www.copy.com.