Posted to user@cassandra.apache.org by Anthony Molinaro <an...@alumni.caltech.edu> on 2010/01/13 20:26:28 UTC
Tuning and upgrades
Hi,
So after several days of closer examination, I've discovered
something. EC2 I/O performance is pretty bad. Well okay, we already
all knew that, and I have no choice but to deal with it, as moving
at this time is not an option. But what I've really discovered is
my data is unevenly distributed which I believe is a result of using
random partitioning without specifying tokens. So what I think I can
do to solve this is upgrade to 0.5.0rc3, add more instances, and use
the tools to modify token ranges. Towards that end I had a few
questions about different topics.
Data gathering:
When I run cfstats I get something like this
Keyspace: XXXXXXXX
Read Count: 39287
Read Latency: 14.588 ms.
Write Count: 13930
Write Latency: 0.062 ms.
on a heavily loaded node and
Keyspace: XXXXXXXX
Read Count: 8672
Read Latency: 1.072 ms.
Write Count: 2126
Write Latency: 0.000 ms.
on a lightly loaded node, but my question is: what is the timeframe
of the counts? Does a read count of 8K mean that 8K reads are currently
in progress, or 8K since the last time I checked, or 8K over some interval?
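For comparing nodes side by side, output like the two samples above can be parsed into numbers. This is only a minimal sketch: it handles just the "Name: value" lines shown in this message, and the function name parse_cfstats is made up for illustration, not part of any Cassandra tool.

```python
def parse_cfstats(text):
    """Parse 'Name: value' lines from cfstats-style output into a dict.

    Latencies ending in 'ms.' become floats, plain digits become ints,
    and anything else (such as the keyspace name) stays a string.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        key = key.strip()
        value = value.strip()
        if value.endswith("ms."):
            stats[key] = float(value[:-3].strip())
        elif value.isdigit():
            stats[key] = int(value)
        else:
            stats[key] = value
    return stats


# The two samples quoted in this message.
heavy = parse_cfstats("""Keyspace: XXXXXXXX
Read Count: 39287
Read Latency: 14.588 ms.
Write Count: 13930
Write Latency: 0.062 ms.""")

light = parse_cfstats("""Keyspace: XXXXXXXX
Read Count: 8672
Read Latency: 1.072 ms.
Write Count: 2126
Write Latency: 0.000 ms.""")

if __name__ == "__main__":
    ratio = heavy["Read Count"] / light["Read Count"]
    print(f"read count ratio (heavy/light): {ratio:.2f}")
```

Collecting these dicts from every node makes the imbalance easy to see, though the counts only mean something once the timeframe question is answered.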
Data Striping:
One option I have is to add additional EBS volumes, then either turn
on RAID 0 across several EBS volumes or possibly just add additional
<DataFileDirectory> elements to my config. If I were to add
<DataFileDirectory> entries, can I just move sstables between
directories? If so, I assume I want the Index, Filter and Data files
to be in the same directory? Or is this data movement something
Cassandra will do for me? Also, is this likely to help?
Upgrades:
I understand that to upgrade from 0.4.x to 0.5.x I need to do something
like
1. turn off all writes to a node
2. call 'nodeprobe flush' on that node
3. restart node with version 0.5.x
Is this correct?
Data Repartitioning:
So it seems that if I first upgrade my current nodes to 0.5.0, then
bring up some new nodes with AutoBootstrap on, they should take some
data from the most loaded machines? But let's say I just want to first
even out the load on existing nodes, would the process be something like
1. calculate ideal key ranges (i.e., i * (2**127 / N) for i=1..N)
   (this seems like the ideal candidate for a new tool included
   with cassandra).
2. foreach node
     'nodeprobe move' to ideal range
3. foreach node
     'nodeprobe clean'
Alternatively, it looks like I might be able to use 'nodeprobe loadbalance'
for step 2, and not use step 1?
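Step 1 above can be sketched in a few lines of Python. This follows the formula exactly as written in this message (i from 1 to N over a 2**127 token space). The function name ideal_tokens is hypothetical, and whether a given release would rather start at i=0 is not something this thread settles.

```python
def ideal_tokens(node_count, token_space=2**127):
    """Return evenly spaced tokens, i * (token_space / N) for i = 1..N.

    Integer division keeps the tokens exact; Python ints do not overflow.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = token_space // node_count
    return [i * step for i in range(1, node_count + 1)]


if __name__ == "__main__":
    # Example: a six-node cluster.
    for index, token in enumerate(ideal_tokens(6), start=1):
        print(f"node {index}: {token}")
```

Each printed value would then be the argument for the move in step 2, one node at a time.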
Also, anyone else running in EC2 and have any sort of tuning tips?
Thanks,
-Anthony
--
------------------------------------------------------------------------
Anthony Molinaro <an...@alumni.caltech.edu>
Re: Tuning and upgrades
Posted by Ryan Daum <ry...@thimbleware.com>.
Sounds like you have a similar configuration to us.
We have 6 EC2 small instances, with EBS for storage.
Nothing scientific for benchmarks right now, but typically we can retrieve
60,000 columns scattered across 3600 row keys in about 7-10 seconds.
Writes haven't been a bottleneck at all.
I also have a key distribution issue similar to what you describe. So I will
be attempting the same recipe as you shortly.
I'm very interested in what your experiences are running Cassandra on EC2.
Ryan
Re: Tuning and upgrades
Posted by Jonathan Ellis <jb...@gmail.com>.
Ah, yes, the enum thing changed in trunk too. We upgraded our version of
the Thrift compiler for trunk, after 0.5. So an upgrade from 0.4 to 0.5
does not need to worry about it.
On Thu, Jan 14, 2010 at 8:31 AM, Hernan Badenes <hb...@ar.ibm.com> wrote:
> Yes, ConsistencyLevel was an enum already -- but the thrift generated api,
> at that version, generated methods that received an int where a
> ConsistencyLevel was declared. (I am looking at gen-java/.../Cassandra.java
> from a downloaded 0.4.2). Then one needs to change the client, assuming you
> are using Java. I don't know at which thrift revision this was changed; I
> just found it when migrating 0.4.2 -> trunk this week.
>
> About the constructors, ok -- I did not know this was different in 0.5.
>
> regards,
> Hernan
Re: Tuning and upgrades
Posted by Hernan Badenes <hb...@ar.ibm.com>.
Yes, ConsistencyLevel was an enum already -- but the thrift generated api,
at that version, generated methods that received an int where a
ConsistencyLevel was declared. (I am looking at
gen-java/.../Cassandra.java from a downloaded 0.4.2). Then one needs to
change the client, assuming you are using Java. I don't know at which
thrift revision this was changed; I just found it when migrating 0.4.2 ->
trunk this week.
About the constructors, ok -- I did not know this was different in 0.5.
regards,
Hernan
From: Jonathan Ellis <jb...@gmail.com>
To: cassandra-user@incubator.apache.org
Date: 01/14/2010 10:42 AM
Subject: Re: Tuning and upgrades

> This is not correct. ConsistencyLevel was already an enum in 0.4, and the
> constructors don't change until the release after 0.5.
Re: Tuning and upgrades
Posted by Jonathan Ellis <jb...@gmail.com>.
This is not correct. ConsistencyLevel was already an enum in 0.4, and the
constructors don't change until the release after 0.5.
On Thu, Jan 14, 2010 at 7:10 AM, Hernan Badenes <hb...@ar.ibm.com> wrote:
> I think you also need to upgrade your thrift jar, since the version in 0.5
> is different. And this brings a change in enums, which are no longer plain
> int values but classes. Constructors of most thrift-generated classes also
> change (e.g. new ColumnPath(cf, null, colName) -> new
> ColumnPath(cf).setColumn(colName))...
> In any case, I think you will need to upgrade clients.
>
> Regards,
> Hernan
Re: Tuning and upgrades
Posted by Hernan Badenes <hb...@ar.ibm.com>.
I think you also need to upgrade your thrift jar, since the version in 0.5
is different. And this brings a change in enums, which are no longer plain
int values but classes. Constructors of most thrift-generated classes also
change (e.g. new ColumnPath(cf, null, colName) -> new
ColumnPath(cf).setColumn(colName))...
In any case, I think you will need to upgrade clients.
Regards,
Hernan
From: Jonathan Ellis <jb...@gmail.com>
To: cassandra-user@incubator.apache.org
Date: 01/13/2010 11:47 PM
Subject: Re: Tuning and upgrades

> On Wed, Jan 13, 2010 at 6:02 PM, Anthony Molinaro
> <an...@alumni.caltech.edu> wrote:
> > So is the thrift interface for 0.5.0 compatible with that of 0.4.x or
> > do I need to upgrade clients for that upgrade?
>
> Just exceptions have changed. (And get_range_slice was added.)
>
> -Jonathan
Re: Tuning and upgrades
Posted by Jonathan Ellis <jb...@gmail.com>.
On Wed, Jan 13, 2010 at 6:02 PM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
> So is the thrift interface for 0.5.0 compatible with that of 0.4.x or
> do I need to upgrade clients for that upgrade?
Just exceptions have changed. (And get_range_slice was added.)
-Jonathan
Re: Tuning and upgrades
Posted by Jonathan Ellis <jb...@gmail.com>.
Good question. :)
On Wed, Jan 13, 2010 at 6:19 PM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
> Also, I notice in 0.5.0 cassandra.in.sh you have
>
> -XX:SurvivorRatio=8 \
>
> then further down in the file
>
> -XX:SurvivorRatio=128 \
>
> Does the second end up winning? Or is there some magic here.
>
> -Anthony
Re: Tuning and upgrades
Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
Also, I notice in 0.5.0 cassandra.in.sh you have
-XX:SurvivorRatio=8 \
then further down in the file
-XX:SurvivorRatio=128 \
Does the second end up winning? Or is there some magic here.
-Anthony
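This thread never says which of the two values wins, so the sketch below does not assume an answer. It only finds -XX:Name=value options that are set more than once in a block of shell text, so the duplicate can be resolved by hand. The function name duplicate_jvm_flags is invented for illustration, and the sample text is a stand-in: only the two SurvivorRatio lines come from this message, the line between them is filler.

```python
import re
from collections import defaultdict

# Matches options of the form -XX:Name=value (not -XX:+Flag / -XX:-Flag).
FLAG = re.compile(r"-XX:([A-Za-z0-9_]+)=(\S+)")


def duplicate_jvm_flags(text):
    """Return {flag_name: [(line_number, value), ...]} for repeated flags."""
    seen = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, value in FLAG.findall(line):
            seen[name].append((lineno, value))
    return {name: hits for name, hits in seen.items() if len(hits) > 1}


# Illustrative input, not the real contents of cassandra.in.sh.
sample = r"""
-XX:SurvivorRatio=8 \
-XX:+UseConcMarkSweepGC \
-XX:SurvivorRatio=128 \
"""

if __name__ == "__main__":
    for name, hits in duplicate_jvm_flags(sample).items():
        for lineno, value in hits:
            print(f"{name}={value} at line {lineno}")
```

Running it over the real file would at least show every place the flag is set before deciding which one to keep.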
--
------------------------------------------------------------------------
Anthony Molinaro <an...@alumni.caltech.edu>
Re: Tuning and upgrades
Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
So the answer is Java handles it fine. However, I unfortunately wasn't
able to do a rolling restart; for whatever reason the first node caused
all the other nodes to start throwing exceptions, so I had to take
everything down for a little bit. However, 0.4.2 seems to start faster
than 0.4.1, so that was cool.
So is the thrift interface for 0.5.0 compatible with that of 0.4.x or
do I need to upgrade clients for that upgrade?
-Anthony
On Wed, Jan 13, 2010 at 04:27:32PM -0600, Jonathan Ellis wrote:
> On Wed, Jan 13, 2010 at 4:19 PM, Anthony Molinaro
> <an...@alumni.caltech.edu> wrote:
> > Hi Jonathan,
> >
> > Thanks for all the information.
> >
> > I just noticed one difference in the .thrift file between 0.4.1 and
> > 0.4.2, the call to get_slice had an exception removed. Does this
> > mean I have to have all my clients rebuilt? (I'm not exactly sure
> > of what sorts of things are backwards compatible with thrift).
>
> Not 100% sure -- python will be fine with it, that is the one I am
> most familiar with. Not sure about other clients. Should be easy to
> test.
>
> -Jonathan
--
------------------------------------------------------------------------
Anthony Molinaro <an...@alumni.caltech.edu>
Re: Tuning and upgrades
Posted by Jonathan Ellis <jb...@gmail.com>.
On Wed, Jan 13, 2010 at 4:19 PM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
> Hi Jonathan,
>
> Thanks for all the information.
>
> I just noticed one difference in the .thrift file between 0.4.1 and
> 0.4.2, the call to get_slice had an exception removed. Does this
> mean I have to have all my clients rebuilt? (I'm not exactly sure
> of what sorts of things are backwards compatible with thrift).
Not 100% sure -- python will be fine with it, that is the one I am
most familiar with. Not sure about other clients. Should be easy to
test.
-Jonathan
Re: Tuning and upgrades
Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
Hi Jonathan,
Thanks for all the information.
I just noticed one difference in the .thrift file between 0.4.1 and
0.4.2, the call to get_slice had an exception removed. Does this
mean I have to have all my clients rebuilt? (I'm not exactly sure
of what sorts of things are backwards compatible with thrift).
Also, when transitioning from 0.4.2 to 0.5.0rc3 do I need the clients
to upgrade? Trying to figure out the details of how I'll manage
the upgrade.
-Anthony
--
------------------------------------------------------------------------
Anthony Molinaro <an...@alumni.caltech.edu>
Re: Tuning and upgrades
Posted by Jonathan Ellis <jb...@gmail.com>.
On Wed, Jan 13, 2010 at 1:26 PM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
> When I run cfstats I get something like ...
> on a lightly loaded node, but my question is what is the timeframe
> of the counts?
Operations in the last 60 seconds. So times will roll in and out of
the average gradually, if that makes sense.
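To illustrate what "operations in the last 60 seconds" implies for the counts and latencies, here is a small rolling-window sketch. It is not Cassandra's implementation; the class RollingWindow and its methods are invented for this example, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque


class RollingWindow:
    """Keep (timestamp, latency_ms) samples from the last `window` seconds."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.samples = deque()

    def record(self, timestamp, latency_ms):
        # Samples are assumed to arrive in timestamp order.
        self.samples.append((timestamp, latency_ms))

    def snapshot(self, now):
        """Return (count, average latency) over the window ending at `now`."""
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] <= now - self.window:
            self.samples.popleft()
        count = len(self.samples)
        if count == 0:
            return 0, 0.0
        average = sum(latency for _, latency in self.samples) / count
        return count, average


if __name__ == "__main__":
    window = RollingWindow()
    window.record(0, 10.0)
    window.record(30, 20.0)
    for now in (45, 70, 100):
        print(now, window.snapshot(now))
```

The point of the sketch is that an idle node drifts back toward a count of zero on its own, which fits the low numbers on the lightly loaded node.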
> Data Striping:
>
> One option I have is to add additional ebs volumes, then either turn
> on raid0 across several ebs's or possibly just add additional
> <DataFileDirectory> elements to my config?
Right. You should see slightly better performance w/ raw volumes.
> If I were to add
> <DataFileDirectory> entries, can I just move sstable's between
> directories?
Yes. (But compaction, and flush, will rotate among your DFDs in
round-robin manner so don't rely on them staying there.)
> If so I assume I want the Index, Filter and Data files
> to be in the same directory?
Yes.
> Or is this data movement something
> Cassandra will do for me? Also, is this likely to help?
Depends where your bottleneck is, but probably. :)
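Since the Index, Filter and Data files have to stay together, a move between data directories can be sketched as below. Several things here are assumptions rather than facts from this thread: the file naming pattern <ColumnFamily>-<generation>-<Component>.db, the idea that the node is stopped while files are moved, and the helper names group_sstables and move_sstable, which are hypothetical.

```python
import shutil
from collections import defaultdict
from pathlib import Path

# Components that are assumed to make up one sstable.
COMPONENTS = ("Data", "Index", "Filter")


def group_sstables(directory):
    """Group *.db files by their shared '<ColumnFamily>-<generation>' prefix."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).glob("*.db")):
        prefix, _, component = path.stem.rpartition("-")
        if prefix and component in COMPONENTS:
            groups[prefix].append(path)
    return dict(groups)


def move_sstable(prefix, src_dir, dst_dir):
    """Move every component of one sstable, refusing if any is missing."""
    files = group_sstables(src_dir).get(prefix, [])
    found = {path.stem.rpartition("-")[2] for path in files}
    if found != set(COMPONENTS):
        missing = sorted(set(COMPONENTS) - found)
        raise ValueError(f"{prefix}: missing components {missing}")
    destination = Path(dst_dir)
    destination.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in files:
        target = destination / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

A call such as move_sstable("Standard1-1", "/old/data", "/new/data") uses made-up names and paths. Refusing to move an incomplete set is a design choice to avoid splitting one sstable across directories.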
> Upgrades:
>
> I understand that to upgrade from 0.4.x to 0.5.x I need to do something
> like
>
> 1. turn off all writes to a node
> 2. call 'nodeprobe flush' on that node
> 3. restart node with version 0.5.x
>
> Is this correct?
Yes, remembering that 0.4 and 0.5 gossip are not compatible so you
need to upgrade the whole cluster at once.
> Data Repartitioning:
>
> So it seems that if I first upgrade my current nodes to 0.5.0, then
> bring up some new nodes with AutoBootstrap on, they should take some
> data from the most loaded machines?
Yes.
> But let's say I just want to first
> even out the load on existing nodes, would the process be something like
>
> 1. calculate ideal key ranges (i.e., i * (2**127 / N) for i=1..N)
>    (this seems like the ideal candidate for a new tool included
>    with cassandra).
> 2. foreach node
>      'nodeprobe move' to ideal range
> 3. foreach node
>      'nodeprobe clean'
Yes.
> Alternatively, it looks like I might be able to use 'nodeprobe loadbalance'
> for step 2, and not use step 1?
LB will move the target node to the middle of the most-loaded range,
so it's not likely to achieve "perfect" ranges, but it should achieve
"good enough" with relatively
> Also, anyone else running in EC2 and have any sort of tuning tips?
The SimpleGeo guys are apparently pretty happy w/ EC2 i/o performance:
http://stu.mp/2009/12/disk-io-and-throughput-benchmarks-on-amazons-ec2.html,
maybe they will chime in here.
-Jonathan