Posted to user@cassandra.apache.org by Alexandru Sicoe <ad...@gmail.com> on 2011/10/25 14:23:15 UTC

Cassandra cluster HW spec (commit log directory vs data file directory)

Hi everyone,

I am currently in the process of writing a hardware proposal for a Cassandra
cluster for storing a lot of monitoring time series data. My workload is
write intensive and my data set is extremely varied in types of variables
and insertion rate for these variables (I will have to handle an order of 2
million variables coming in, each at very different rates - the majority of
them will come at very low rates, but there are many that will come at higher,
constant rates, and a few coming in with huge spikes in rates). These
variables correspond to all basic C++ types and arrays of these types. The
highest insertion rates are received for basic types, out of which U32
variables seem to be the most prevalent (e.g. I recorded 2 million U32 vars
inserted in 8 mins of operation, while 600,000 doubles and 170,000
strings were inserted during the same time. Note this measurement was only
for a subset of the total data currently taken in).
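
To put those numbers in perspective, here is a rough back-of-the-envelope
calculation of the sustained insert rates they imply (just a sketch based on
the counts above, not a measurement of Cassandra itself):

# Rough sustained insert rates implied by the 8-minute sample above
# (assumed counts from the measurement quoted in this mail, subset only).
SAMPLE_SECONDS = 8 * 60

inserts = {
    "U32": 2000000,
    "double": 600000,
    "string": 170000,
}

for kind, count in inserts.items():
    print("%-6s %8.0f inserts/s" % (kind, count / float(SAMPLE_SECONDS)))

print("total  %8.0f inserts/s (subset of the full data)" %
      (sum(inserts.values()) / float(SAMPLE_SECONDS)))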

At the moment I am partitioning the data in Cassandra in 75 CFs (each CF
corresponds to a logical partitioning of the set of variables mentioned
before - but this partitioning is not related with the amount of data or
rates...it is somewhat random). These 75 CFs account for ~1 million of the
variables I need to store. I have a 3 node Cassandra 0.8.5 cluster (each
node has 4 real cores and 4 GB RAM, with the commit log directory and data
file directory split between two RAID arrays of HDDs). I can handle the load in
this configuration but the average CPU usage of the Cassandra nodes is
slightly above 50%. As I will need to add 12 more CFs (corresponding to
another ~ 1 million variables) plus potentially other data later, it is
clear that I need better hardware (also for the retrieval part).

I am looking at Dell servers (PowerEdge etc.).

Questions:

1. Is anyone using Dell HW for their Cassandra clusters? How do they behave?
Anybody care to share their configurations or tips for buying, what to avoid
etc?

2. Obviously I am going to keep to the advice on
http://wiki.apache.org/cassandra/CassandraHardware and split the commit log
and data onto separate disks. I was going to use an SSD for the commit log but
then did some more research and found out that it doesn't make sense to use
SSDs for sequential appends, because they won't have a performance advantage
over rotational media. So I am going to use a rotational disk for the
commit log and an SSD for data. Does this make sense?

3. What's the best way to find out how big my commitlog disk and my data
disk have to be? The Cassandra hardware page says the commitlog disk
shouldn't be big, but still I need to choose a size!

4. I also noticed RAID 0 configuration is recommended for the data file
directory. Can anyone explain why?

Sorry for the huge email.....

Cheers,
Alex

Re: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by Chris Goffinet <cg...@chrisgoffinet.com>.
No. We built a pluggable cache provider for memcache.

On Sun, Oct 30, 2011 at 7:31 PM, Mohit Anchlia <mo...@gmail.com>wrote:

> On Sun, Oct 30, 2011 at 6:53 PM, Chris Goffinet <cg...@chrisgoffinet.com>
> wrote:
> >
> >
> > On Sun, Oct 30, 2011 at 3:34 PM, Sorin Julean <so...@gmail.com>
> > wrote:
> >>
> >> Hey Chris,
> >>
> >>  Thanks for sharing all  the info.
> >>  I have few questions:
> >>  1. What are you doing with so much memory :) ? How much of it do you
> >> allocate for heap ?
> >
> > max heap is 12GB. we use the rest for cache. we run memcache on each node
> > and allocate the remaining to that.
>
> Is this using off heap cache of Cassandra?
>

Re: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by Mohit Anchlia <mo...@gmail.com>.
On Sun, Oct 30, 2011 at 6:53 PM, Chris Goffinet <cg...@chrisgoffinet.com> wrote:
>
>
> On Sun, Oct 30, 2011 at 3:34 PM, Sorin Julean <so...@gmail.com>
> wrote:
>>
>> Hey Chris,
>>
>>  Thanks for sharing all  the info.
>>  I have few questions:
>>  1. What are you doing with so much memory :) ? How much of it do you
>> allocate for heap ?
>
> max heap is 12GB. we use the rest for cache. we run memcache on each node
> and allocate the remaining to that.

Is this using off heap cache of Cassandra?


Re: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by Chris Goffinet <cg...@chrisgoffinet.com>.
On Sun, Oct 30, 2011 at 3:34 PM, Sorin Julean <so...@gmail.com>wrote:

> Hey Chris,
>
>  Thanks for sharing all  the info.
>  I have few questions:
>  1. What are you doing with so much memory :) ? How much of it do you
> allocate for heap ?
>

Max heap is 12GB. We use the rest for cache: we run memcache on each node
and allocate the remaining memory to that.


>  2. What your network speed ? Do you use trunks ? Do you have a dedicated
> VLAN for gossip/store traffic ?
>
No dedicated VLAN for gossip. We run at 2Gb/s. We have bonded NICs.




Re: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by Radim Kolar <hs...@sendmail.cz>.
On 30.10.2011 23:34, Sorin Julean wrote:
> Hey Chris,
>
>  Thanks for sharing all  the info.
>  I have few questions:
>  1. What are you doing with so much memory :) ?
Cassandra eats memory like there is no tomorrow on large databases. It
keeps some structures in memory whose size depends on the database size.

  2. What is your network speed? 100 Mbit is a failure.

3. Do you have a dedicated VLAN for gossip/store traffic?
We share one VLAN between Hadoop and Cassandra due to low budget. It is best
to have them separated; Hadoop is very heavy on the network.

Re: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by Sorin Julean <so...@gmail.com>.
Hey Chris,

 Thanks for sharing all the info.
 I have a few questions:
 1. What are you doing with so much memory :)? How much of it do you
allocate for heap?
 2. What is your network speed? Do you use trunks? Do you have a dedicated
VLAN for gossip/store traffic?

Cheers,
Sorin



Re: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by Alexandru Dan Sicoe <si...@googlemail.com>.
Hi Chris,
 Thanks for your post. I can see you guys handle extremely large amounts of
data compared to my system. Yes I will own the racks and the machines but
the problem is I am limited by actual physical space in our data center
(believe it or not) and also by the budget. It would be hard for me to justify
the acquisition of more than 3-4 machines; that's why I will need to find a
system that empties Cassandra and transfers the data to another mass
storage system. Thanks for the RAID10 suggestion...I'll look into that!
I've seen everybody warning me about the number of CFs, so I'll listen to you
guys and reduce the number.
 Yeah, it would be nice to hear about your HW evolution.....I will report
back as well once I finish my proposal!

Cheers,
Alex



-- 
Alexandru Dan Sicoe
MEng, CERN Marie Curie ACEOLE Fellow

Re: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by Chris Goffinet <cg...@chrisgoffinet.com>.
RE: RAID0 Recommendation

Cassandra supports multiple data file directories. Because we do
compactions, it's just much easier to deal with one data file directory
that is striped across all disks as a single volume (RAID0). There are other
ways to accomplish this though. At Twitter we use software RAID (RAID0 &
RAID10).

We own the physical hardware and have found that, even with hardware RAID,
software RAID in Linux is actually faster. The reason is:

http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10

We have found that using far-copies is much faster than near-copies. We set
the i/o scheduler to noop at the moment. We might move back to CFQ with
more tuning in the future.

We use RAID10 for cases where we need better disk performance if we are
hitting the disk often, sacrificing storage. We initially thought RAID0
should be faster than RAID10, until we found out about the near vs far
layouts.

RE: Hardware

This is going to depend on how good your automated infrastructure is, but
we chose the path of finding the cheapest servers we could get from
Dell/HP/etc.: 8/12 cores, 72GB memory per node, 2TB/3TB disks, 2.5".

We are in the process of making changes to our servers; I'll report back
when we have more details to share.

I wouldn't recommend 75 CFs. It could work but just seems too complex.

Another recommendation for clusters, always go big. You will be thankful in
the future for this. Even if you can do this on 3-6 nodes, go much larger
for future expansion. If you own your hardware and racks, I recommend
making sure to size out the rack diversity and the number of nodes per rack.
Also take into account the replication factor when doing this: with RF=3, you
should have a minimum of 3 racks, and the number of nodes per rack should be
divisible by the replication factor. This has worked out pretty well for us.
Our biggest
problems today are adding 100s of nodes to existing clusters at once. I'm
not sure how many other companies are having this problem, but it's
certainly on our radar to improve, if you get to that point :)
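
As a trivial illustration of the rack rule above (a sketch, not a tool we
actually use), a layout sanity check could look like this:

# Sanity check of the rack sizing rule above: at least RF racks, and the
# number of nodes per rack divisible by the replication factor.
def check_layout(replication_factor, racks, nodes_per_rack):
    if racks < replication_factor:
        raise ValueError("need at least RF racks")
    if nodes_per_rack % replication_factor != 0:
        raise ValueError("nodes per rack should be divisible by RF")
    return racks * nodes_per_rack

print(check_layout(replication_factor=3, racks=3, nodes_per_rack=6))  # 18 nodes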



Re: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by David Jeske <da...@gmail.com>.
On Tue, Oct 25, 2011 at 5:23 AM, Alexandru Sicoe <ad...@gmail.com> wrote:

> At the moment I am partitioning the data in Cassandra in 75 CFs


You might consider not using so many column families. I am not a Cassandra
expert, but from what I've seen floated around, there is currently a unique
memtable and sorted-table (SSTable) fileset per column family. As a
result, you can both use less memory (for memtables) and get higher write
throughput by using one column family rather than 75. An alternative to
column families is using key prefixing.

Perhaps someone with more definitive knowledge can chime in.
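
For illustration, a minimal sketch of what key prefixing could look like with
the pycassa client (Cassandra 0.8 era). The keyspace, CF name, and "group"
prefixes below are hypothetical; it assumes a single CF, whose comparator
accepts string column names, instead of 75 CFs:

import time
import pycassa

pool = pycassa.ConnectionPool('MonitoringKS', server_list=['localhost:9160'])
series = pycassa.ColumnFamily(pool, 'TimeSeries')

def insert_sample(group, variable, value):
    # Fold the old per-CF partitioning into the row key: "<group>:<variable>".
    row_key = '%s:%s' % (group, variable)
    # Column name = timestamp of the sample, column value = the sample itself.
    series.insert(row_key, {'%f' % time.time(): str(value)})

insert_sample('dcs_u32', 'pump_pressure_01', 42)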

Re: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by Mohit Anchlia <mo...@gmail.com>.
If you need to have this data available outside the private network,
then why not create the cluster outside it in the first place? It seems
inefficient that you would do bulk transfers. You might think of an alternate
design using queues, subscribers, or exposing Cassandra over HTTP, etc.
You could also look at http://www.stunnel.org/


Re: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by Alexandru Dan Sicoe <si...@googlemail.com>.
Thanks for the detailed answers Dan, what you said makes sense. I think my
biggest worry right now is making the correct predictions of my data storage
space based on measurements from the current cluster. Other than that I
should be fairly comfortable with the rest of the HW specs.

Thanks for the observation Mohit, I'll keep a closer eye on this disk
parameter, which I do see in the specs all the time.

Todd, your link explains questions I have had for quite some time .... I
have found that indeed I am dominated by metadata, like one of the examples
shows.
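
To make those predictions more concrete, here is a very rough sizing sketch.
The per-column and per-row overhead constants are the commonly quoted
0.7/0.8-era figures from posts like the one Todd linked; treat them as
assumptions and re-check them against that post rather than taking them as
exact:

# Very rough on-disk size estimator (a sketch): the 15-byte per-column and
# 23-byte per-row overheads are assumed era figures -- re-check them against
# the blog post above before trusting the result.
COLUMN_OVERHEAD = 15   # flags, timestamp, name/value length fields
ROW_OVERHEAD = 23      # per-row bookkeeping (indexes/bloom filters excluded)

def row_bytes(key_len, n_columns, name_len, value_len):
    return ROW_OVERHEAD + key_len + n_columns * (COLUMN_OVERHEAD + name_len + value_len)

# Example: U32 samples, 13-byte timestamp column names, 10,000 samples per row,
# 40-byte row keys, replication factor 3.
raw = row_bytes(key_len=40, n_columns=10000, name_len=13, value_len=4)
print("per row: %d bytes, with RF=3: %d bytes" % (raw, raw * 3))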

Since we're on this subject, I want to ask you guys another question. I have
my monitoring data sources within an enclosed network, so my Cassandra
cluster will also be in that enclosed network (by enclosed network I mean
any communication or data transfer in and out of the network must go through
a gateway). The problem is I need to make the data available outside!
Do any of you guys have some suggestions for doing that? My first thought
was to have an internal 3 node cluster taking in the insertion load and
then, in periods of low load, do a major compaction and ship the data
out to an external Cassandra node used only for reading. This outside node
would have to have a lot of disk (hold the data for 1 year) and be optimised
for reading - I was thinking of having an SSD caching layer between my bulk
storage and Cassandra. Only hot data will go in this layer....somehow!

So my questions:
1) Is my method unheard of or does it sound reasonable?
2) What is the best way to transfer data from the cluster inside the
enclosed network to the node outside? I heard some time in the past that
there is a tool that does bulk transfers of data but I'm not sure how that
can be done...a script that calls this tool on a certain
trigger..........any ideas?
3) Is this intermediate SSD cache thing doable...or should I just stick to
the normal RAID array of disks and the indexes and in-memory caching of
columns that Cassandra offers?

Cheers,
Alex


Re: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by Todd Burruss <bb...@expedia.com>.
This may help in determining your data storage requirements ...

http://btoddb-cass-storage.blogspot.com/
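
As a rough first cut (this is only a sketch, not the method from the blog
post above; the insert rate, per-column overhead, retention, replication
factor and compaction headroom below are all illustrative assumptions), a
small Python estimate might look like:

    # Back-of-envelope data disk sizing. Every input is an illustrative
    # assumption, not a figure taken from this thread or the blog post.
    def data_footprint_gb(samples_per_sec, payload_bytes=8,
                          overhead_bytes=25, retention_days=365,
                          replication_factor=3, compaction_headroom=2.0):
        raw = (samples_per_sec * 86400 * retention_days
               * (payload_bytes + overhead_bytes))
        return raw * replication_factor * compaction_headroom / 1e9

    # Assumed aggregate insert rate of 5,000 samples/s:
    print("~%.0f GB cluster-wide" % data_footprint_gb(5000))

The compaction_headroom factor stands in for the free space compaction
needs; keeping roughly half the disk free is the rule of thumb commonly
given for size-tiered compaction.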



On 10/25/11 11:22 AM, "Mohit Anchlia" <mo...@gmail.com> wrote:

>On Tue, Oct 25, 2011 at 11:18 AM, Dan Hendry <da...@gmail.com>
>wrote:
>>> 2. ... So I am going to use rotational disk for the commit log and an
>>>SSD
>>> for data. Does this make sense?
>>
>>
>>
>> Yes, just keep in mind that the primary characteristic of SSDs is lower
>> seek times, which translate into faster random access. We have a similar
>> Cassandra use case (time series data and comparable volumes) and decided
>> the random read performance boost (unquantified in our case, to be fair)
>> was not worth the price, so we went with more, larger, cheaper 7.2k HDDs.
>>
>>
>>
>>> 3. What's the best way to find out how big my commitlog disk and my
>>>data
>>> disk has to be? The Cassandra hardware page says the Commitlog disk
>>> shouldn't be big but still I need to choose a size!
>>
>>
>>
>> As of Cassandra 1.0, the commit log has an explicit size bound (defaulting
>> to 4 GB, I believe). In 0.8, I don't think I have ever seen my commit log
>> grow beyond that point, but the limit should be the amount of data you
>> insert within the maximum CF timed flush period (the "memtable_flush_after"
>> parameter; to be safe, take the maximum across all CFs). Any modern drive
>> should be sufficient. As for the size of your data disks, that is largely
>> application dependent, and you should be able to judge best based on your
>> current cluster.
>>
>>
>>
>>> 4. I also noticed RAID 0 configuration is recommended for the data file
>>> directory. Can anyone explain why?
>>
>>
>>
>> In comparison to RAID1/RAID1+0? For any RF > 1, Cassandra already takes
>> care of redundancy by replicating the data across multiple nodes. Your
>> application's choice of replication factor and read/write consistencies
>> should be specified to tolerate a node failing (for any reason: disk
>> failure, network failure, a disgruntled employee taking a sledge hammer
>> to the box, etc.). As such, what is the point of wasting your disks
>> duplicating data on a single machine to minimize the chances of one
>> particular type of failure when it should not matter anyway?
>
>It all boils down to operations cost vs. hardware cost. Also consider
>MTBF and how well equipped you are to handle disk failures, which are
>more common than other kinds of hardware failure.
>>
>>
>>
>> Dan


Re: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by Mohit Anchlia <mo...@gmail.com>.
On Tue, Oct 25, 2011 at 11:18 AM, Dan Hendry <da...@gmail.com> wrote:
>> 2. ... So I am going to use rotational disk for the commit log and an SSD
>> for data. Does this make sense?
>
>
>
> Yes, just keep in mind that the primary characteristic of SSDs is lower
> seek times, which translate into faster random access. We have a similar
> Cassandra use case (time series data and comparable volumes) and decided
> the random read performance boost (unquantified in our case, to be fair)
> was not worth the price, so we went with more, larger, cheaper 7.2k HDDs.
>
>
>
>> 3. What's the best way to find out how big my commitlog disk and my data
>> disk has to be? The Cassandra hardware page says the Commitlog disk
>> shouldn't be big but still I need to choose a size!
>
>
>
> As of Cassandra 1.0, the commit log has an explicit size bound (defaulting
> to 4 GB, I believe). In 0.8, I don't think I have ever seen my commit log
> grow beyond that point, but the limit should be the amount of data you
> insert within the maximum CF timed flush period (the "memtable_flush_after"
> parameter; to be safe, take the maximum across all CFs). Any modern drive
> should be sufficient. As for the size of your data disks, that is largely
> application dependent, and you should be able to judge best based on your
> current cluster.
>
>
>
>> 4. I also noticed RAID 0 configuration is recommended for the data file
>> directory. Can anyone explain why?
>
>
>
> In comparison to RAID1/RAID1+0? For any RF > 1, Cassandra already takes care
> of redundancy by replicating the data across multiple nodes. Your
> application's choice of replication factor and read/write consistencies
> should be specified to tolerate a node failing (for any reason: disk
> failure, network failure, a disgruntled employee taking a sledge hammer to
> the box, etc.). As such, what is the point of wasting your disks duplicating
> data on a single machine to minimize the chances of one particular type of
> failure when it should not matter anyway?

It all boils down to operations cost vs. hardware cost. Also consider
MTBF and how well equipped you are to handle disk failures, which are
more common than other kinds of hardware failure.
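
For a rough sense of scale (the drive count and annualized failure rate
below are assumptions, not vendor figures), a quick sketch:

    # Expected disk failures per year across a fleet of drives.
    def expected_failures_per_year(num_drives, afr=0.03):
        # afr: assumed annualized failure rate per drive
        return num_drives * afr

    # e.g. an assumed 6 nodes x 4 data drives each:
    print(expected_failures_per_year(6 * 4))   # ~0.72 failures/year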
>
>
>
> Dan

RE: Cassandra cluster HW spec (commit log directory vs data file directory)

Posted by Dan Hendry <da...@gmail.com>.
> 2. ... So I am going to use rotational disk for the commit log and an SSD
> for data. Does this make sense?

 

Yes, just keep in mind that the primary characteristic of SSDs is lower
seek times, which translate into faster random access. We have a similar
Cassandra use case (time series data and comparable volumes) and decided
the random read performance boost (unquantified in our case, to be fair)
was not worth the price, so we went with more, larger, cheaper 7.2k HDDs.
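
As a back-of-envelope illustration of that seek-time trade-off (the IOPS
figures below are assumptions for generic drives, not benchmarks of any
particular model):

    # Rough random-read comparison; IOPS figures are illustrative
    # assumptions for generic drives, not benchmarks of any model.
    hdd_iops = 100.0     # assumed random reads/s for a 7.2k SATA HDD
    ssd_iops = 20000.0   # assumed random reads/s for an SSD

    print("HDD: ~%d random reads/s" % hdd_iops)
    print("SSD: ~%d random reads/s" % ssd_iops)
    print("SSD advantage on random reads: ~%.0fx" % (ssd_iops / hdd_iops))

For sequential commit log appends both media stream at comparable rates, so
the advantage mostly applies to random reads of cold data; whether that is
worth the price depends on how much of your read workload looks like that.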

 

> 3. What's the best way to find out how big my commitlog disk and my data
> disk has to be? The Cassandra hardware page says the Commitlog disk
> shouldn't be big but still I need to choose a size!

 

As of Cassandra 1.0, the commit log has an explicit size bound (defaulting
to 4 GB, I believe). In 0.8, I don't think I have ever seen my commit log
grow beyond that point, but the limit should be the amount of data you
insert within the maximum CF timed flush period (the "memtable_flush_after"
parameter; to be safe, take the maximum across all CFs). Any modern drive
should be sufficient. As for the size of your data disks, that is largely
application dependent, and you should be able to judge best based on your
current cluster.
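
As a quick sanity check on that bound (the write rate and flush interval
below are assumptions, not values measured on any cluster in this thread):

    # Upper-bound sketch for commit log disk usage: roughly the data
    # written during the longest memtable flush interval across all CFs.
    def commitlog_upper_bound_gb(write_mb_per_sec, max_flush_interval_min):
        return write_mb_per_sec * max_flush_interval_min * 60 / 1024.0

    # Assumed 20 MB/s sustained writes, 60-minute flush interval:
    print("~%.0f GB upper bound" % commitlog_upper_bound_gb(20, 60))

Even generous assumptions give a bound far below the size of the data set,
which is why the hardware page says the commit log disk does not need to
be big.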

 

> 4. I also noticed RAID 0 configuration is recommended for the data file
> directory. Can anyone explain why?

 

In comparison to RAID1/RAID1+0? For any RF > 1, Cassandra already takes care
of redundancy by replicating the data across multiple nodes. Your
application's choice of replication factor and read/write consistencies
should be specified to tolerate a node failing (for any reason: disk
failure, network failure, a disgruntled employee taking a sledge hammer to
the box, etc.). As such, what is the point of wasting your disks duplicating
data on a single machine to minimize the chances of one particular type of
failure when it should not matter anyway?
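
To put rough numbers on that trade-off (the failure probability, drive size
and drive count below are assumptions, and replica failures are treated as
independent, which real clusters only approximate):

    # Sketch: with RF copies on separate nodes, data is lost only if every
    # replica dies before it can be rebuilt. All inputs are assumptions.
    p_node_lost = 0.01      # assumed chance a node is lost within the
                            # rebuild window
    rf = 3                  # replication factor
    drives_per_node = 4     # assumed number of 1 TB data drives per node
    drive_tb = 1.0

    print("P(all %d replicas lost) ~ %.0e" % (rf, p_node_lost ** rf))
    print("RAID 0 usable per node:  %.1f TB" % (drives_per_node * drive_tb))
    print("RAID 10 usable per node: %.1f TB"
          % (drives_per_node * drive_tb / 2))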

 

Dan

 
