Posted to user@cassandra.apache.org by Oren Benjamin <or...@clearspring.com> on 2010/07/17 01:06:50 UTC

Cassandra benchmarking on Rackspace Cloud

I've been doing quite a bit of benchmarking of Cassandra in the cloud using stress.py.  I'm working on a comprehensive spreadsheet of results with a template that others can add to, but for now I thought I'd post some of the basic results here to get some feedback from others.

The first goal was to reproduce the test described on spyced here: http://spyced.blogspot.com/2010/01/cassandra-05.html

Using Cassandra 0.6.3, a 4GB/160GB cloud server (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing) with default storage-conf.xml and cassandra.in.sh, here's what I got:

Reads: 4,800/s
Writes: 9,000/s

Pretty close to the result posted on the blog, with a slightly lower write performance (perhaps due to the availability of only a single disk for both commitlog and data).

That was with 1M keys (the blog used 700K).
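
(For anyone trying to reproduce this, a pair of stress.py runs along the
following lines should be roughly equivalent -- the thread count and node
list are placeholders here; only the 1M key count comes from the numbers
above:)

    ./stress.py -o insert -d $NODELIST -t 50 -n 1000000
    ./stress.py -o read   -d $NODELIST -t 50 -n 1000000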

As the number of keys scales, read performance degrades as would be expected with no caching:
1M 4,800 reads/s
10M 4,600 reads/s
25M 700 reads/s
100M 200 reads/s

Using row cache and an appropriate choice of --stdev to achieve a cache hit rate of >90% restores the read performance to the 4,800 reads/s level in all cases.  Also as expected, write performance is unaffected by writing more data.
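
(As a rough sketch of how the --stdev choice maps to a hit rate, assuming
stress.py's gaussian generator draws keys around the middle of the key range
and the row cache ends up holding the hottest keys around that center:)

    # back-of-envelope cache hit rate for a gaussian key distribution
    import random

    def est_hit_rate(total_keys, stdev, cached_keys, samples=100000):
        mean, sigma = total_keys / 2.0, stdev * total_keys
        lo, hi = mean - cached_keys / 2.0, mean + cached_keys / 2.0
        hits = sum(1 for _ in xrange(samples)
                   if lo <= random.gauss(mean, sigma) <= hi)
        return hits / float(samples)

    # e.g. 10M keys, --stdev 0.1, hottest 4M rows cached -> roughly 0.95
    print est_hit_rate(10 * 1000 * 1000, 0.1, 4 * 1000 * 1000)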

Scaling:
The above was single node testing.  I'd expect to be able to add nodes and scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster.  In order to better understand this, I eliminated the factors of data size, caching, replication etc. and ran read tests on empty clusters (every read a miss - bouncing off the bloom filter and straight back).  1 node gives 24,000 reads/s while 2,3,4... give 21,000 (presumably the bump in single node performance is due to the lack of the extra hop).  With CPU, disk, and RAM all largely unused, I'm at a loss to explain the lack of additional throughput.  I tried increasing the number of clients but that just split the throughput down the middle with each stress.py achieving roughly 10,000 reads/s.  I'm running the clients (stress.py) on separate cloud servers.

I checked the ulimit file count and I'm not limiting connections there.  It seems like there's a problem with my test setup - a clear bottleneck somewhere, but I just don't see what it is.  Any ideas?
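
(For the record, the limit that actually applies to the running Cassandra
process can be confirmed from /proc as well as from the shell's ulimit; the
pid below is a hypothetical example:)

    # show the open-files limit applied to a running process (Linux)
    pid = 12345  # hypothetical; e.g. from `pgrep -f CassandraDaemon`
    for line in open('/proc/%d/limits' % pid):
        if 'open files' in line:
            print line.strip()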

Also: 
The disk performance of the cloud servers has been extremely spotty.  The results I posted above were reproducible whenever the servers were in their "normal" state.  But for periods of as much as several consecutive hours, single servers or groups of servers in the cloud would suddenly have horrendous disk performance as measured by dstat and iostat.  The "% steal" by the hypervisor on these nodes is also quite high (> 30%).   The performance during these "bad" periods drops from 4,800 reads/s in the single node benchmark to just 200 reads/s.  The node is effectively useless.  Is this normal for the cloud?  And if so, what's the solution re Cassandra?  Obviously you can just keep adding more nodes until the likelihood that there is at least one good server with every piece of the data is reasonable.  However, Cassandra routes to the nearest node topologically and not to the best performing one, so "bad" nodes will always result in high latency reads.  How are you guys that are running in the cloud dealing with this?  Are you seeing this at all?
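
(For anyone who wants to watch for these bad periods programmatically rather
than eyeballing dstat/iostat, a small sketch that samples the aggregate
"steal" field from Linux /proc/stat over an interval:)

    import time

    def cpu_times():
        # first /proc/stat line: "cpu user nice system idle iowait irq softirq steal ..."
        with open('/proc/stat') as f:
            return [float(x) for x in f.readline().split()[1:]]

    def steal_percent(interval=5):
        before = cpu_times()
        time.sleep(interval)
        after = cpu_times()
        delta = [a - b for a, b in zip(after, before)]
        return 100.0 * delta[7] / sum(delta)   # index 7 == steal

    print "%%steal over the last 5s: %.1f" % steal_percent()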

Thanks in advance for your feedback and advice,

   -- Oren

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Oren Benjamin <or...@clearspring.com>.
Yes, as the size of the data on disk increases and the OS can no longer avoid disk seeks, the read performance degrades.  You can see this in the results from the original post: as the number of keys in the test goes from 10M to 100M, the reads drop from 4,600/s to 200/s.  10M keys in the stress.py test corresponds to roughly 5GB on disk.  If the frequently used records are small enough to fit into cache, you can restore read performance by lifting them into the row cache and largely avoiding seeks during reads.

On Jul 17, 2010, at 1:05 AM, Schubert Zhang wrote:

I think your read throughput is very high, and it may be unrealistic.

For random reads, the disk seek will always be the bottleneck (100% disk util).
There will be about 3 random disk seeks for a random read, and about 10ms for one seek. So there will be about 30ms for a random read.

If you have only one disk, the read throughput will be about 40 reads/s.

The high measured throughput may be because of the Linux FS cache, if your dataset is small (for example only 1GB).
Try testing the random read throughput on 100GB or 1TB; you may get a different result.


On Sat, Jul 17, 2010 at 7:06 AM, Oren Benjamin <or...@clearspring.com> wrote:
I've been doing quite a bit of benchmarking of Cassandra in the cloud using stress.py.  I'm working on a comprehensive spreadsheet of results with a template that others can add to, but for now I thought I'd post some of the basic results here to get some feedback from others.

The first goal was to reproduce the test described on spyced here: http://spyced.blogspot.com/2010/01/cassandra-05.html

Using Cassandra 0.6.3, a 4GB/160GB cloud server (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing) with default storage-conf.xml and cassandra.in.sh, here's what I got:

Reads: 4,800/s
Writes: 9,000/s

Pretty close to the result posted on the blog, with a slightly lower write performance (perhaps due to the availability of only a single disk for both commitlog and data).

That was with 1M keys (the blog used 700K).

As the number of keys scales, read performance degrades as would be expected with no caching:
1M 4,800 reads/s
10M 4,600 reads/s
25M 700 reads/s
100M 200 reads/s

Using row cache and an appropriate choice of --stdev to achieve a cache hit rate of >90% restores the read performance to the 4,800 reads/s level in all cases.  Also as expected, write performance is unaffected by writing more data.

Scaling:
The above was single node testing.  I'd expect to be able to add nodes and scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster.  In order to better understand this, I eliminated the factors of data size, caching, replication etc. and ran read tests on empty clusters (every read a miss - bouncing off the bloom filter and straight back).  1 node gives 24,000 reads/s while 2,3,4... give 21,000 (presumably the bump in single node performance is due to the lack of the extra hop).  With CPU, disk, and RAM all largely unused, I'm at a loss to explain the lack of additional throughput.  I tried increasing the number of clients but that just split the throughput down the middle with each stress.py achieving roughly 10,000 reads/s.  I'm running the clients (stress.py) on separate cloud servers.

I checked the ulimit file count and I'm not limiting connections there.  It seems like there's a problem with my test setup - a clear bottleneck somewhere, but I just don't see what it is.  Any ideas?

Also:
The disk performance of the cloud servers has been extremely spotty.  The results I posted above were reproducible whenever the servers were in their "normal" state.  But for periods of as much as several consecutive hours, single servers or groups of servers in the cloud would suddenly have horrendous disk performance as measured by dstat and iostat.  The "% steal" by the hypervisor on these nodes is also quite high (> 30%).   The performance during these "bad" periods drops from 4,800 reads/s in the single node benchmark to just 200 reads/s.  The node is effectively useless.  Is this normal for the cloud?  And if so, what's the solution re Cassandra?  Obviously you can just keep adding more nodes until the likelihood that there is at least one good server with every piece of the data is reasonable.  However, Cassandra routes to the nearest node topologically and not to the best performing one, so "bad" nodes will always result in high latency reads.  How are you guys that are running in the cloud dealing with this?  Are you seeing this at all?

Thanks in advance for your feedback and advice,

  -- Oren



Re: Cassandra benchmarking on Rackspace Cloud

Posted by Schubert Zhang <zs...@gmail.com>.
I think your read throughput is very high, and it may be unrealistic.

For random reads, the disk seek will always be the bottleneck (100% disk util).
There will be about 3 random disk seeks for a random read, and about 10ms for
one seek. So there will be about 30ms for a random read.

If you have only one disk, the read throughput will be about 40 reads/s.

The high measured throughput may be because of the Linux FS cache, if your
dataset is small (for example only 1GB).
Try testing the random read throughput on 100GB or 1TB; you may get a
different result.
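
(To make the arithmetic concrete -- a back-of-envelope sketch using the
figures above, which lands in the same ballpark as the ~40 reads/s estimate:)

    seeks_per_read = 3          # per the estimate above
    seek_ms        = 10.0       # typical seek time for a 7200rpm disk
    disks          = 1

    read_ms = seeks_per_read * seek_ms                  # ~30 ms per random read
    print "~%.0f reads/s" % (disks * 1000.0 / read_ms)  # ~33 reads/s per disk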


On Sat, Jul 17, 2010 at 7:06 AM, Oren Benjamin <or...@clearspring.com> wrote:

> I've been doing quite a bit of benchmarking of Cassandra in the cloud using
> stress.py.  I'm working on a comprehensive spreadsheet of results with a
> template that others can add to, but for now I thought I'd post some of the
> basic results here to get some feedback from others.
>
> The first goal was to reproduce the test described on spyced here:
> http://spyced.blogspot.com/2010/01/cassandra-05.html
>
> Using Cassandra 0.6.3, a 4GB/160GB cloud server (
> http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing) with
> default storage-conf.xml and cassandra.in.sh, here's what I got:
>
> Reads: 4,800/s
> Writes: 9,000/s
>
> Pretty close to the result posted on the blog, with a slightly lower write
> performance (perhaps due to the availability of only a single disk for both
> commitlog and data).
>
> That was with 1M keys (the blog used 700K).
>
> As the number of keys scales, read performance degrades as would be expected with
> no caching:
> 1M 4,800 reads/s
> 10M 4,600 reads/s
> 25M 700 reads/s
> 100M 200 reads/s
>
> Using row cache and an appropriate choice of --stdev to achieve a cache hit
> rate of >90% restores the read performance to the 4,800 reads/s level in all
> cases.  Also as expected, write performance is unaffected by writing more
> data.
>
> Scaling:
> The above was single node testing.  I'd expect to be able to add nodes and
> scale throughput.  Unfortunately, I seem to be running into a cap of 21,000
> reads/s regardless of the number of nodes in the cluster.  In order to
> better understand this, I eliminated the factors of data size, caching,
> replication etc. and ran read tests on empty clusters (every read a miss -
> bouncing off the bloom filter and straight back).  1 node gives 24,000
> reads/s while 2,3,4... give 21,000 (presumably the bump in single node
> performance is due to the lack of the extra hop).  With CPU, disk, and RAM
> all largely unused, I'm at a loss to explain the lack of additional
> throughput.  I tried increasing the number of clients but that just split
> the throughput down the middle with each stress.py achieving roughly 10,000
> reads/s.  I'm running the clients (stress.py) on separate cloud servers.
>
> I checked the ulimit file count and I'm not limiting connections there.  It
> seems like there's a problem with my test setup - a clear bottleneck
> somewhere, but I just don't see what it is.  Any ideas?
>
> Also:
> The disk performance of the cloud servers has been extremely spotty.  The
> results I posted above were reproducible whenever the servers were in their
> "normal" state.  But for periods of as much as several consecutive hours,
> single servers or groups of servers in the cloud would suddenly have
> horrendous disk performance as measured by dstat and iostat.  The "% steal"
> by hypervisor on these nodes is also quite high (> 30%).   The performance
> during these "bad" periods drops from 4,800 reads/s in the single node
> benchmark to just 200 reads/s.  The node is effectively useless.  Is this
> normal for the cloud?  And if so, what's the solution re Cassandra?
>  Obviously you can just keep adding more nodes until the likelihood that
> there is at least one good server with every piece of the data is
> reasonable.  However, Cassandra routes to the nearest node topologically and
> not to the best performing one, so "bad" nodes will always result in high
> latency reads.  How are you guys that are running in the cloud dealing with
> this?  Are you seeing this at all?
>
> Thanks in advance for your feedback and advice,
>
>   -- Oren

Re: Cassandra benchmarking on Rackspace Cloud

Posted by David Schoonover <da...@gmail.com>.
> Another thing: Is the py_stress traffic definitely non-deterministic
> such that each client will generate a definitely unique series of
> requests? 

The tests were run both with --random and --std 0.1; in both cases, the key-sequence is non-deterministic.

Cheers,

Dave



On Jul 19, 2010, at 1:30 PM, Peter Schuller wrote:

> Another thing: Is the py_stress traffic definitely non-deterministic
> such that each client will generate a definitely unique series of
> requests? If all clients are deterministically requesting the same
> sequence of keys, it would otherwise be plausible that they end up in
> effective lock-step, if the bottleneck is serialized but the result of
> it goes out in parallel to all clients.
> 
> (This is probably not the case but I'm not sure off hand about whether
> there is anything in Cassandra's implementation that might be
> compatible with this hypothesis, in terms of how concurrent requests
> for the same key will be handled.)
> 
> -- 
> / Peter Schuller


Re: Cassandra benchmarking on Rackspace Cloud

Posted by Peter Schuller <pe...@infidyne.com>.
Another thing: Is the py_stress traffic definitely non-deterministic
such that each client will generate a definitely unique series of
requests? If all clients are deterministically requesting the same
sequence of keys, it would otherwise be plausible that they end up in
effective lock-step, if the bottleneck is serialized but the result of
it goes out in parallel to all clients.

(This is probably not the case but I'm not sure off hand about whether
there is anything in Cassandra's implementation that might be
compatible with this hypothesis, in terms of how concurrent requests
for the same key will be handled.)

-- 
/ Peter Schuller

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Dave Viner <da...@pobox.com>.
This may be too much work... but you might consider building an Amazon EC2
AMI of your nodes.  This would let others quickly boot up your nodes and run
the stress test against it.

I know you mentioned that you're using Rackspace Cloud.  I'm not super
familiar with the internals of RSCloud, but perhaps they have something
similar?

This feels like the kind of problem that might be easier for someone else to
set up and quickly test.  (The beauty of the virtual server - quick setup and
quick tear down)

Dave Viner


On Mon, Jul 19, 2010 at 10:24 AM, Peter Schuller <
peter.schuller@infidyne.com> wrote:

> > I ran this test previously on the cloud, with similar results:
> >
> > nodes   reads/sec
> > 1       24,000
> > 2       21,000
> > 3       21,000
> > 4       21,000
> > 5       21,000
> > 6       21,000
> >
> > In fact, I ran it twice out of disbelief (on different nodes the second
> time) to essentially identical results.
>
> Something other than cassandra just *has* to be fishy here unless
> there is some kind of bug causing communication with nodes that should
> not be involved. It really sounds like there is a hidden bottleneck
> somewhere.
>
> You already mention that you've run multiple test clients so that the
> client is not a bottleneck. What about bandwidth? I could imagine
> bandwidth adding up a bit given those requests rate. Is it possible
> all the nodes are communicating with each other via some bottleneck
> (like 100 mbit)?
>
> What does the load "look like" when you observe the nodes during
> bottlenecking? How much bandwidth is each machine pushing (ifstat,
> nload, etc); is Cassandra obviously CPU bound or does it look idle?
>
> Presumably Cassandra is not perfectly concurrent and you may not
> saturate 8 cores under this load necessarily, but as you add more and
> more nodes while still only reaching 21k/sec, you should eventually pass
> the point where you're not even saturating a single core...
>
> *Something* else is probably going on.
>
> --
> / Peter Schuller
>

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Peter Schuller <pe...@infidyne.com>.
> I ran this test previously on the cloud, with similar results:
>
> nodes   reads/sec
> 1       24,000
> 2       21,000
> 3       21,000
> 4       21,000
> 5       21,000
> 6       21,000
>
> In fact, I ran it twice out of disbelief (on different nodes the second time) to essentially identical results.

Something other than cassandra just *has* to be fishy here unless
there is some kind of bug causing communication with nodes that should
not be involved. It really sounds like there is a hidden bottleneck
somewhere.

You already mention that you've run multiple test clients so that the
client is not a bottleneck. What about bandwidth? I could imagine
bandwidth adding up a bit given those request rates. Is it possible
all the nodes are communicating with each other via some bottleneck
(like 100 mbit)?

What does the load "look like" when you observe the nodes during
bottlenecking? How much bandwidth is each machine pushing (ifstat,
nload, etc); is Cassandra obviously CPU bound or does it look idle?

Presumably Cassandra is not perfectly concurrent and you may not
saturate 8 cores under this load necessarily, but as you add more and
more nodes while still only reaching 21k/sec, you should eventually pass
the point where you're not even saturating a single core...

*Something* else is probably going on.

-- 
/ Peter Schuller

Re: Cassandra benchmarking on Rackspace Cloud

Posted by David Schoonover <da...@gmail.com>.
Thanks a ton, Juho. 

The command was:

	./stress.py -o read -t 50 -d $NODELIST -n 75000000 -k -i 2

I made a few minor modifications to stress.py to count errors instead of logging them, and avoid the pointless try-catch on missing keys. (There are also unrelated edits to restart long runs of inserts.) 

My version is uploaded here:

http://gist.github.com/481966

--
David Schoonover

On Jul 19, 2010, at 4:26 PM, Juho Mäkinen wrote:

> I'm about to extend my two node cluster with four dedicated nodes and
> remove one of the old nodes, leaving a five node cluster. The
> cluster is in production, but I can spare it to do some stress testing
> in the meantime as I'm also interested in my cluster performance. I
> can't dedicate the cluster to the test, but the load during the day
> should be low enough not to screw with the end results too much. The
> results might come in within a few days as we get the nodes up -
> hopefully my tests will produce some meaningful data which can be
> applied to this issue.
> 
> I haven't used stress.py yet, any tips on that? Could you, David, send
> me the stress.py command line which you used?
> 
> - Juho Mäkinen



Re: Cassandra benchmarking on Rackspace Cloud

Posted by Juho Mäkinen <ju...@gmail.com>.
I'm about to extend my two node cluster with four dedicated nodes and
remove one of the old nodes, leaving a five node cluster. The
cluster is in production, but I can spare it to do some stress testing
in the meantime as I'm also interested in my cluster performance. I
can't dedicate the cluster to the test, but the load during the day
should be low enough not to screw with the end results too much. The
results might come in within a few days as we get the nodes up -
hopefully my tests will produce some meaningful data which can be
applied to this issue.

I haven't used stress.py yet, any tips on that? Could you, David, send
me the stress.py command line which you used?

 - Juho Mäkinen

On Mon, Jul 19, 2010 at 10:51 PM, David Schoonover
<da...@gmail.com> wrote:
> Sorry, mixed signals in my response. I was partially replying to suggestions that we were limited by the box's NIC or DC's bandwidth (which is gigabit, no dice there). I also ran the tests with -t50 on multiple tester machines in the cloud with no change in performance; I've now rerun those tests on dedicated hardware.
>
>
>        reads/sec @
> nodes   one client      two clients
> 1       53k             73k
> 2       37k             50k
> 4       37k             50k
>
>
> Notes:
> - All notes from the previous dataset apply here.
> - All clients were reading with 50 processes.
> - Test clients were not co-located with the databases or each other.
> - All machines are in the same DC.
> - Servers showed about 20MB/sec in network i/o for the multi-node clusters, which is well under the max for gigabit.
> - Latency was about 2.5ms/req.
>
>
> At this point, we'd really appreciate it if anyone else could attempt to replicate our results. Ultimately, our goal is to see an increase in throughput given an increase in cluster size.
>
> --
> David Schoonover
>
> On Jul 19, 2010, at 2:25 PM, Stu Hood wrote:
>
>> If you put 25 processes on each of the 2 machines, all you are testing is how fast 50 processes can hit Cassandra... the point of using more machines is that you can use more processes.
>>
>> Presumably, for a single machine, there is some limit (K) to the number of processes that will give you additional gains: above that point, you should use more machines, each running K processes.
>>
>
>

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Peter Schuller <pe...@infidyne.com>.
> Just wanted to follow up on this.
>
> We were never able to achieve throughput scaling in the cloud.  We were able to verify that many of our cluster nodes and test servers were collocated on the same physical hardware (thanks Stu for the tip on the Rackspace REST API), and that performance on collocated nodes rose and fell in concert.  Ultimately we moved to dedicated hardware and throughput scaled as expected with additional nodes.

Thanks for following-up, confirming there is no reason to believe
there was a problem with Cassandra itself. I actually managed to
misunderstand something because I thought you had seen this effect on
real hardware after moving off of the cloud already.

> Thanks for everyone's help on this.  We had to move on in the interest of moving our project along, but I'd still be interested to see benchmarks from successful cloud installations.  Maybe with the node routing in 0.7 and larger cluster sizes, the cloud might become a more viable option for highly available high read throughput applications.

On the cloud side of things I think it would be very useful to be able
to control (at least weakly) the independence of resources. This
includes at least machine instances and things like EBS volumes. This
would be useful not only for scaling purposes but also redundancy
purposes. I wonder which cloud provider will be the first (or is there
one already?) to provide something like that.

-- 
/ Peter Schuller

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Oren Benjamin <or...@clearspring.com>.
Just wanted to follow up on this.

We were never able to achieve throughput scaling in the cloud.  We were able to verify that many of our cluster nodes and test servers were collocated on the same physical hardware (thanks Stu for the tip on the Rackspace REST API), and that performance on collocated nodes rose and fell in concert.  Ultimately we moved to dedicated hardware and throughput scaled as expected with additional nodes.

Thanks for everyone's help on this.  We had to move on in the interest of moving our project along, but I'd still be interested to see benchmarks from successful cloud installations.  Maybe with the node routing in 0.7 and larger cluster sizes, the cloud might become a more viable option for highly available high read throughput applications.

Best,

  -- Oren


On Jul 20, 2010, at 3:18 PM, Peter Schuller wrote:

>> (I'm hoping to have time to run my test on EC2 tonight; will see.)
> 
> Well, I needed three c1.xlarge EC2 instances running py_stress to even
> saturate more than one core on the c1.xlarge instance running a single
> cassandra node (at roughly 21k reqs/second)... Depending on how
> reliable vmstat/top is on EC2 to begin with.
> 
> Would have to revisit this later if I'm to get any sensible numbers
> that are even remotely trustworthy...
> 
> -- 
> / Peter Schuller


Re: Cassandra benchmarking on Rackspace Cloud

Posted by Peter Schuller <pe...@infidyne.com>.
> (I'm hoping to have time to run my test on EC2 tonight; will see.)

Well, I needed three c1.xlarge EC2 instances running py_stress to even
saturate more than one core on the c1.xlarge instance running a single
cassandra node (at roughly 21k reqs/second)... Depending on how
reliable vmstat/top is on EC2 to begin with.

Would have to revisit this later if I'm to get any sensible numbers
that are even remotely trustworthy...

-- 
/ Peter Schuller

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Peter Schuller <pe...@infidyne.com>.
> But what's then the point of adding nodes to the ring? Disk speed!

Well, it may also be cheaper to service an RPC request than service a
full read or write, even in terms of CPU.

But: Even taking into account that requests are distributed randomly,
the cluster should still scale. You will approach the overhead of
taking a level of RPC indirection for 100% of
requests, but it won't become worse than that. That overhead is still
going to be distributed across the entire cluster and you should still
be seeing throughput increasing as nodes are added.

That said, given that the test in this case is probably the cheapest
possible test to make, even in terms of CPU, by hitting non-existent
values, maybe the RPC overhead is simply big enough relative to this
type of request that moving from 1 to 4 nodes doesn't show an
improvement. Suppose for example that the cost of forwarding an RPC
request is comparable to servicing a read request for a non-existent
key. Under those conditions, going from 1 to 2 nodes would not be
expected to affect throughput at all. Going from 2 to 3 should start
to see an improvement, etc. If RPC overhead is higher than servicing
the read, you'd see performance drop from 1 to 2 nodes (but should
still eventually start scaling with node count).
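
As a toy model of that accounting (cost units are arbitrary, and where the
forwarding work is attributed is an assumption, so this is purely
illustrative):

    # toy model: read throughput with RF=1 and coordinators chosen at random
    def throughput(nodes, read_cost=1.0, rpc_cost=1.0):
        p_remote = 1.0 - 1.0 / nodes          # fraction of reads needing a hop
        work_per_req = read_cost + p_remote * rpc_cost
        return nodes / work_per_req           # total capacity / avg cost per req

    for n in (1, 2, 3, 4, 6, 8):
        print n, round(throughput(n), 2)
    # -> 1.0, 1.33, 1.8, 2.29, 3.27, 4.27: roughly flat from 1 to 2 nodes,
    #    then growing steadily -- i.e. the cluster should still scale.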

What seems inconsistent with this hypothesis is that in the numbers
reported by David, there is an initial drop in performance going from
1 to 2 nodes, and then it seems to flatten completely rather than
changing as more nodes are added. Other than at the point of
equilibrium between additional RPC overhead and additional capacity,
I'd expect to either see an increase or a decrease in performance with
each added node.

Additionally, in the absolute beginning of this thread, before the
move to testing non-existent keys, they were hitting the performance
'roof' even with "real" read traffic. Presuming such "real" read
traffic is more expensive to process than key misses on an empty
cluster, that is even more inconsistent with the hypothesis.

(I'm hoping to have time to run my test on EC2 tonight; will see.)

-- 
/ Peter Schuller

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Ryan King <ry...@twitter.com>.
On Tue, Jul 20, 2010 at 6:20 AM, Juho Mäkinen <ju...@gmail.com> wrote:
> I managed to run a few benchmarks.
>
> Servers   r/s
>   1        64.5k
>   2        59.5k
>
> The configuration:
> Client: Machine with four Quad Core Intel Xeon CPU E5520 @ 2.27Ghz
> cpus (total 16 cores), 4530 bogomips per core. 12 GB ECC corrected
> memory. Supermicro mainboard (not sure about exact type).
>
> Cassandra Servers:
>  CPU: Intel Xeon X3450 Quad Core (total 4 cores).
>  Memory: 16 GB DDR SDRAM DIMM 1333 Mhz ECC memory (4 x 4GB DIMMs).
>  Disks: Three 1TB Samsung Spinpoint F3 HD103SJ 7300 rpm disks with
> 32MB internal cache in following configuration: single disk as root
> partition which hosts commitlog and two disks in Linux software RAID-0
> which hosts the data.
>
> - All servers are connected to one switch which has dual 1Gbps trunk
> to another switch which hosts the client. Network utilization was
> monitored during testing and it wasn't even close to saturate any
> network component.
> - All machines are using SUN JDK 1.6.0 Update 21 under RedHat
> Enterprise Linux Server release 5.3 with latest updates. Python
> version is 2.6.3.
>
> During first test with one client and one server:
>  - Test took 1163 seconds, resulting in 64,488 reads per second.
>  - Cassandra server RF is set to one (RF=1)
>  - cassandra java used about 400% CPU resources, which resulted in an
> average load of 8.0:
>  - 24% user, 20% system, 50% idle and ~6.3% software interrupts
>  - I used the David's modified stress.py from
> http://gist.github.com/481966 with args: ./stress.py -o read -t 50 -d
> $NODELIST -n 75000000 -k -i 2
>  - Java was started with following parameters: -XX:+UseCompressedOops
> -Xmx2046m -XX:TargetSurvivorRatio=90 -XX:+AggressiveOpts
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>
> I added another server to the cluster and tried to run the client
> using just the first server. This way I made sure that the server will
> redirect requests to the other server when the key hashes to that
> server, thus making sure that cassandra works as it should. This decreased
> the performance to about 52.6k r/s, which was expected, as some of the
> requests make one more hop before they can be answered.
>
> The third test was with stress.py hammering both nodes.
>
> While I was running the third test I started thinking. With two
> nodes in a perfectly balanced ring with RF=1, an incoming read request
> has a 50% chance to hit the local server and a 50% chance to hit the
> other server, causing an extra network hop. So having two servers
> in the ring with the client hitting only one will be just as fast as
> having two servers with the client hitting both of them. What happens when
> we add a 3rd server to the ring and rebalance? The same, as far as I
> understand. Each incoming read request from the client has just a 33%
> chance to hit the local server and a 66% chance that it needs an extra hop
> (still assuming RF=1), so this doesn't make the cluster any
> faster. If my thinking is correct, we can keep adding servers and it
> will just make the cluster a bit slower, because a request has less
> chance to be served on the server it arrived at. So in this case the
> answer to get more speed is to increase RF.
>
> But what's then the point of adding nodes to the ring? Disk speed!
> This test reads keys which don't exist, thus simulating a
> case where all requests hit the memory cache (cassandra or OS disk
> cache). Increasing the node count adds more disks, so the requests are
> distributed over a larger array of disks, increasing the overall
> available IOPS throughput. So if all this is correct, we should stop
> making benchmarks which don't hit the disks, because they don't
> represent the real-world scenario.

Reading non-existing keys in cassandra is cheap because they can
almost always be satisfied by a bloom filter, so you're not testing
what you think you are.

Also, you need to be very careful when using the word 'faster' when
talking about benchmarks. 'Fast' can mean one of at least two things:
latency or throughput. Make sure you know which you're measuring.

-ryan

> I didn't have time to run the test with three servers, but I'll do it
> later anyway to see what kind of results it will produce. Also doing
> the test with RF=2 should confirm that we can increase the cluster
> throughput by increasing the RF count even if the requests don't hit
> the disks.
>
>  - Juho Mäkinen
>
>
> On Tue, Jul 20, 2010 at 12:58 AM, Dave Viner <da...@pobox.com> wrote:
>> I've put up a bunch of steps to get Cassandra installed on an EC2 instance:
>> http://wiki.apache.org/cassandra/CloudConfig
>> Look at the "step-by-step guide".
>> I haven't AMI-ed the result, since the steps are fairly quick and it would
>> be just one more thing to update with a new release of Cassandra...
>> Dave Viner
>>
>> On Mon, Jul 19, 2010 at 2:35 PM, Peter Schuller
>> <pe...@infidyne.com> wrote:
>>>
>>> > CPU was approximately equal across the cluster; it was around 50%.
>>> >
>>> > stress.py generates keys randomly or using a gaussian distribution, both
>>> > methods showed the same results.
>>> >
>>> > Finally, we're using a random partitioner, so Cassandra will hash the
>>> > keys using md5 to map it to a position on the ring.
>>>
>>> Ok, weird. FWIW I'll try to re-produce too on EC2 (I don't really have
>>> an opportunity to test this on real hardware atm), but no promises on
>>> when since I haven't prepared cassandra bootstrapping for EC2.
>>>
>>> --
>>> / Peter Schuller
>>
>>
>

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Juho Mäkinen <ju...@gmail.com>.
I managed to run a few benchmarks.

Servers   r/s
   1        64.5k
   2        59.5k

The configuration:
Client: Machine with four Quad Core Intel Xeon CPU E5520 @ 2.27Ghz
cpus (total 16 cores), 4530 bogomips per core. 12 GB ECC corrected
memory. Supermicro mainboard (not sure about exact type).

Cassandra Servers:
 CPU: Intel Xeon X3450 Quad Core (total 4 cores).
 Memory: 16 GB DDR SDRAM DIMM 1333 Mhz ECC memory (4 x 4GB DIMMs).
 Disks: Three 1TB Samsung Spinpoint F3 HD103SJ 7300 rpm disks with
32MB internal cache in following configuration: single disk as root
partition which hosts commitlog and two disks in Linux software RAID-0
which hosts the data.

- All servers are connected to one switch which has dual 1Gbps trunk
to another switch which hosts the client. Network utilization was
monitored during testing and it wasn't even close to saturate any
network component.
- All machines are using SUN JDK 1.6.0 Update 21 under RedHat
Enterprise Linux Server release 5.3 with latest updates. Python
version is 2.6.3.

During first test with one client and one server:
 - Test took 1163 seconds, resulting in 64,488 reads per second.
 - Cassandra server RF is set to one (RF=1)
 - cassandra java used about 400% CPU resources, which resulted in an
average load of 8.0:
 - 24% user, 20% system, 50% idle and ~6.3% software interrupts
 - I used the David's modified stress.py from
http://gist.github.com/481966 with args: ./stress.py -o read -t 50 -d
$NODELIST -n 75000000 -k -i 2
 - Java was started with following parameters: -XX:+UseCompressedOops
-Xmx2046m -XX:TargetSurvivorRatio=90 -XX:+AggressiveOpts
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled

I added another server to the cluster and tried to run the client
using just the first server. This way I made sure that the server will
redirect requests to the other server when the key hashes to that
server, thus making sure that cassandra works as it should. This decreased
the performance to about 52.6k r/s, which was expected, as some of the
requests make one more hop before they can be answered.

The third test was with stress.py hammering both nodes.

While I was running the third test I started thinking. With two
nodes in a perfectly balanced ring with RF=1, an incoming read request
has a 50% chance to hit the local server and a 50% chance to hit the
other server, causing an extra network hop. So having two servers
in the ring with the client hitting only one will be just as fast as
having two servers with the client hitting both of them. What happens when
we add a 3rd server to the ring and rebalance? The same, as far as I
understand. Each incoming read request from the client has just a 33%
chance to hit the local server and a 66% chance that it needs an extra hop
(still assuming RF=1), so this doesn't make the cluster any
faster. If my thinking is correct, we can keep adding servers and it
will just make the cluster a bit slower, because a request has less
chance to be served on the server it arrived at. So in this case the
answer to get more speed is to increase RF.

But what's then the point of adding nodes to the ring? Disk speed!
This test reads keys which don't exist, thus simulating a
case where all requests hit the memory cache (cassandra or OS disk
cache). Increasing the node count adds more disks, so the requests are
distributed over a larger array of disks, increasing the overall
available IOPS throughput. So if all this is correct, we should stop
making benchmarks which don't hit the disks, because they don't
represent the real-world scenario.

I didn't have time to run the test with three servers, but I'll do it
later anyway to see what kind of results it will produce. Also doing
the test with RF=2 should confirm that we can increase the cluster
throughput by increasing the RF count even if the requests don't hit
the disks.

 - Juho Mäkinen


On Tue, Jul 20, 2010 at 12:58 AM, Dave Viner <da...@pobox.com> wrote:
> I've put up a bunch of steps to get Cassandra installed on an EC2 instance:
> http://wiki.apache.org/cassandra/CloudConfig
> Look at the "step-by-step guide".
> I haven't AMI-ed the result, since the steps are fairly quick and it would
> be just one more thing to update with a new release of Cassandra...
> Dave Viner
>
> On Mon, Jul 19, 2010 at 2:35 PM, Peter Schuller
> <pe...@infidyne.com> wrote:
>>
>> > CPU was approximately equal across the cluster; it was around 50%.
>> >
>> > stress.py generates keys randomly or using a gaussian distribution, both
>> > methods showed the same results.
>> >
>> > Finally, we're using a random partitioner, so Cassandra will hash the
>> > keys using md5 to map it to a position on the ring.
>>
>> Ok, weird. FWIW I'll try to re-produce too on EC2 (I don't really have
>> an opportunity to test this on real hardware atm), but no promises on
>> when since I haven't prepared cassandra bootstrapping for EC2.
>>
>> --
>> / Peter Schuller
>
>

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Dave Viner <da...@pobox.com>.
I've put up a bunch of steps to get Cassandra installed on an EC2 instance:
http://wiki.apache.org/cassandra/CloudConfig
Look at the "step-by-step guide".

I haven't AMI-ed the result, since the steps are fairly quick and it would
be just one more thing to update with a new release of Cassandra...

Dave Viner


On Mon, Jul 19, 2010 at 2:35 PM, Peter Schuller <peter.schuller@infidyne.com
> wrote:

> > CPU was approximately equal across the cluster; it was around 50%.
> >
> > stress.py generates keys randomly or using a gaussian distribution, both
> methods showed the same results.
> >
> > Finally, we're using a random partitioner, so Cassandra will hash the
> keys using md5 to map it to a position on the ring.
>
> Ok, weird. FWIW I'll try to re-produce too on EC2 (I don't really have
> an opportunity to test this on real hardware atm), but no promises on
> when since I haven't prepared cassandra bootstrapping for EC2.
>
> --
> / Peter Schuller
>

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Peter Schuller <pe...@infidyne.com>.
> CPU was approximately equal across the cluster; it was around 50%.
>
> stress.py generates keys randomly or using a gaussian distribution, both methods showed the same results.
>
> Finally, we're using a random partitioner, so Cassandra will hash the keys using md5 to map it to a position on the ring.

Ok, weird. FWIW I'll try to re-produce too on EC2 (I don't really have
an opportunity to test this on real hardware atm), but no promises on
when since I haven't prepared cassandra bootstrapping for EC2.

-- 
/ Peter Schuller

Re: Cassandra benchmarking on Rackspace Cloud

Posted by David Schoonover <da...@gmail.com>.
> Did you see about equal CPU usage on the cassandra nodes during the
> test? Is it possible that most or all of the keys generated by
> stress.py simply fall on a single node?

CPU was approximately equal across the cluster; it was around 50%.

stress.py generates keys randomly or using a gaussian distribution, both methods showed the same results.

Finally, we're using a random partitioner, so Cassandra will hash the keys using md5 to map it to a position on the ring.

--
David Schoonover

On Jul 19, 2010, at 4:14 PM, Peter Schuller wrote:

> The following is completely irrelevant if you are indeed using the
> default storage-conf.xml as you said. However since I wrote it and it
> remains relevant for anyone testing with the order preserving
> partitioner, I might as well post it rather than discard it...
> 
> Begin probably irrelevant post:
> 
> Another stab in the dark:
> 
> You do specifically mention that you distributed tokens evenly across
> the cluster and independently for each cluster size. However, were the
> tokens distributed evenly *within the range used by the stress test*?
> 
> This is the random key generator in stress.py:
> 
> def key_generator_random():
>    fmt = '%0' + str(len(str(total_keys))) + 'd'
>    return fmt % randint(0, total_keys - 1)
> 
> Unless I am misreading/mis-testing, this will generate keys that are
> essentially ASCII decimal characters in strings of equal length, with
> numerical values distributed in the range [0,total_keys - 1]. However,
> the key prefixes covered by the range '0-9' make up a very limited
> subset of the token spaces into which cluster nodes are placed, for
> both byte strings and UTF-8 strings.
> 
> Did you see about equal CPU usage on the cassandra nodes during the
> test? Is it possible that most or all of the keys generated by
> stress.py simply fall on a single node?
> 
> -- 
> / Peter Schuller


Re: Cassandra benchmarking on Rackspace Cloud

Posted by Peter Schuller <pe...@infidyne.com>.
The following is completely irrelevant if you are indeed using the
default storage-conf.xml as you said. However since I wrote it and it
remains relevant for anyone testing with the order preserving
partitioner, I might as well post it rather than discard it...

Begin probably irrelevant post:

Another stab in the dark:

You do specifically mention that you distributed tokens evenly across
the cluster and independently for each cluster size. However, were the
tokens distributed evenly *within the range used by the stress test*?

This is the random key generator in stress.py:

def key_generator_random():
    fmt = '%0' + str(len(str(total_keys))) + 'd'
    return fmt % randint(0, total_keys - 1)

Unless I am misreading/mis-testing, this will generate keys that are
essentially ASCII decimal characters in strings of equal length, with
numerical values distributed in the range [0,total_keys - 1]. However,
the key prefixes covered by the range '0-9' make up a very limited
subset of the token spaces into which cluster nodes are placed, for
both byte strings and UTF-8 strings.
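
(Concretely, assuming total_keys of one million as in the smallest test
above, the generated keys look like this:)

    from random import randint
    total_keys = 1000000
    fmt = '%0' + str(len(str(total_keys))) + 'd'
    print sorted(fmt % randint(0, total_keys - 1) for _ in xrange(5))
    # e.g. ['0133742', '0201139', '0544216', ...]: every key is a zero-padded
    # decimal string, i.e. a very narrow slice of the byte/UTF-8 token space.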

Did you see about equal CPU usage on the cassandra nodes during the
test? Is it possible that most or all of the keys generated by
stress.py simply fall on a single node?

-- 
/ Peter Schuller

Re: Cassandra benchmarking on Rackspace Cloud

Posted by David Schoonover <da...@gmail.com>.
> Now keep adding clients until it stops making the numbers go up...

Neither adding additional readers nor additional cluster nodes showed performance gains. The numbers, they do not move.


--
David Schoonover

On Jul 19, 2010, at 5:18 PM, Jonathan Ellis wrote:

> Now keep adding clients until it stops making the numbers go up...
> 
> On Mon, Jul 19, 2010 at 2:51 PM, David Schoonover
> <da...@gmail.com> wrote:
>> Sorry, mixed signals in my response. I was partially replying to suggestions that we were limited by the box's NIC or DC's bandwidth (which is gigabit, no dice there). I also ran the tests with -t50 on multiple tester machines in the cloud with no change in performance; I've now rerun those tests on dedicated hardware.
>> 
>> 
>>        reads/sec @
>> nodes   one client      two clients
>> 1       53k             73k
>> 2       37k             50k
>> 4       37k             50k
>> 
>> 
>> Notes:
>> - All notes from the previous dataset apply here.
>> - All clients were reading with 50 processes.
>> - Test clients were not co-located with the databases or each other.
>> - All machines are in the same DC.
>> - Servers showed about 20MB/sec in network i/o for the multi-node clusters, which is well under the max for gigabit.
>> - Latency was about 2.5ms/req.
>> 
>> 
>> At this point, we'd really appreciate it if anyone else could attempt to replicate our results. Ultimately, our goal is to see an increase in throughput given an increase in cluster size.
>> 
>> --
>> David Schoonover
>> 
>> On Jul 19, 2010, at 2:25 PM, Stu Hood wrote:
>> 
>>> If you put 25 processes on each of the 2 machines, all you are testing is how fast 50 processes can hit Cassandra... the point of using more machines is that you can use more processes.
>>> 
>>> Presumably, for a single machine, there is some limit (K) to the number of processes that will give you additional gains: above that point, you should use more machines, each running K processes.
>>> 
>> 
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com


Re: Cassandra benchmarking on Rackspace Cloud

Posted by Jonathan Ellis <jb...@gmail.com>.
Now keep adding clients until it stops making the numbers go up...

On Mon, Jul 19, 2010 at 2:51 PM, David Schoonover
<da...@gmail.com> wrote:
> Sorry, mixed signals in my response. I was partially replying to suggestions that we were limited by the box's NIC or DC's bandwidth (which is gigabit, no dice there). I also ran the tests with -t50 on multiple tester machines in the cloud with no change in performance; I've now rerun those tests on dedicated hardware.
>
>
>        reads/sec @
> nodes   one client      two clients
> 1       53k             73k
> 2       37k             50k
> 4       37k             50k
>
>
> Notes:
> - All notes from the previous dataset apply here.
> - All clients were reading with 50 processes.
> - Test clients were not co-located with the databases or each other.
> - All machines are in the same DC.
> - Servers showed about 20MB/sec in network i/o for the multi-node clusters, which is well under the max for gigabit.
> - Latency was about 2.5ms/req.
>
>
> At this point, we'd really appreciate it if anyone else could attempt to replicate our results. Ultimately, our goal is to see an increase in throughput given an increase in cluster size.
>
> --
> David Schoonover
>
> On Jul 19, 2010, at 2:25 PM, Stu Hood wrote:
>
>> If you put 25 processes on each of the 2 machines, all you are testing is how fast 50 processes can hit Cassandra... the point of using more machines is that you can use more processes.
>>
>> Presumably, for a single machine, there is some limit (K) to the number of processes that will give you additional gains: above that point, you should use more machines, each running K processes.
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Oren Benjamin <or...@clearspring.com>.
I'll just add that CPU usage hovered around 50% during these tests.

On Jul 19, 2010, at 3:51 PM, David Schoonover wrote:

> Sorry, mixed signals in my response. I was partially replying to suggestions that we were limited by the box's NIC or DC's bandwidth (which is gigabit, no dice there). I also ran the tests with -t50 on multiple tester machines in the cloud with no change in performance; I've now rerun those tests on dedicated hardware.
> 
> 
> 	reads/sec @
> nodes	one client	two clients
> 1	53k		73k
> 2	37k		50k
> 4	37k		50k
> 
> 
> Notes:
> - All notes from the previous dataset apply here.
> - All clients were reading with 50 processes.
> - Test clients were not co-located with the databases or each other.
> - All machines are in the same DC.
> - Servers showed about 20MB/sec in network i/o for the multi-node clusters, which is well under the max for gigabit.
> - Latency was about 2.5ms/req.
> 
> 
> At this point, we'd really appreciate it if anyone else could attempt to replicate our results. Ultimately, our goal is to see an increase in throughput given an increase in cluster size.
> 
> --
> David Schoonover
> 
> On Jul 19, 2010, at 2:25 PM, Stu Hood wrote:
> 
>> If you put 25 processes on each of the 2 machines, all you are testing is how fast 50 processes can hit Cassandra... the point of using more machines is that you can use more processes.
>> 
>> Presumably, for a single machine, there is some limit (K) to the number of processes that will give you additional gains: above that point, you should use more machines, each running K processes.
>> 
> 


Re: Cassandra benchmarking on Rackspace Cloud

Posted by David Schoonover <da...@gmail.com>.
Sorry, mixed signals in my response. I was partially replying to suggestions that we were limited by the box's NIC or DC's bandwidth (which is gigabit, no dice there). I also ran the tests with -t50 on multiple tester machines in the cloud with no change in performance; I've now rerun those tests on dedicated hardware.


	reads/sec @
nodes	one client	two clients
1	53k		73k
2	37k		50k
4	37k		50k


Notes:
- All notes from the previous dataset apply here.
- All clients were reading with 50 processes.
- Test clients were not co-located with the databases or each other.
- All machines are in the same DC.
- Servers showed about 20MB/sec in network i/o for the multi-node clusters, which is well under the max for gigabit.
- Latency was about 2.5ms/req.


At this point, we'd really appreciate it if anyone else could attempt to replicate our results. Ultimately, our goal is to see an increase in throughput given an increase in cluster size.

--
David Schoonover

On Jul 19, 2010, at 2:25 PM, Stu Hood wrote:

> If you put 25 processes on each of the 2 machines, all you are testing is how fast 50 processes can hit Cassandra... the point of using more machines is that you can use more processes.
> 
> Presumably, for a single machine, there is some limit (K) to the number of processes that will give you additional gains: above that point, you should use more machines, each running K processes.
> 


Re: Cassandra benchmarking on Rackspace Cloud

Posted by S Ahmed <sa...@gmail.com>.
I'm reading this thread and I am a little lost; what should the
expected behavior be?

Should it maintain 53K regardless of nodes?

nodes   reads/sec
1       53,000
2       37,000
4       37,000

I ran this test previously on the cloud, with similar results:

nodes   reads/sec
1       24,000
2       21,000
3       21,000
4       21,000
5       21,000
6       21,000




On Mon, Jul 19, 2010 at 2:02 PM, David Schoonover <
david.schoonover@gmail.com> wrote:

> > Multiple client processes, or multiple client machines?
>
>
> I ran it with both one and two client machines making requests, and ensured
> the sum of the request threads across the clients was 50. That was on the
> cloud. I am re-running the multi-host test against the 4-node cluster on
> dedicated hardware now to ensure that result was not an artifact of the
> cloud.
>
>
> David Schoonover
>
> On Jul 19, 2010, at 1:38 PM, Jonathan Ellis wrote:
>
> > On Mon, Jul 19, 2010 at 12:30 PM, David Schoonover
> > <da...@gmail.com> wrote:
> >>> How many physical client machines are running stress.py?
> >>
> >> One with 50 threads; it is remote from the cluster but within the same
> >> DC in both cases. I also run the test with multiple clients and saw
> >> similar results when summing the reqs/sec.
> >
> > Multiple client processes, or multiple client machines?
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of Riptano, the source for professional Cassandra support
> > http://riptano.com
>
>

Re: Cassandra benchmarking on Rackspace Cloud

Posted by malcolm smith <ma...@treehousesystems.com>.
Usually a fixed bottleneck results from a limited resource -- you've
eliminated disk from the test and you don't mention that CPU is a serious
issue, or memory for that matter.

So for me that leaves network i/o and switch capacity.  Is it possible that
your test is saturating your local network card or switch infrastructure?

Some rough numbers would be that 1GbE does about 120MBytes/second of i/o in
practice and 100Mbit will do something like 10MB/sec. So 37,000 requests per
second would mean 270 bytes per request (including network encoding and meta
data) on a 100Mbit network, or 3.2K per request if you have a full 1GbE
network, including switch capacity to switch 1GbE per node.

Is it possible that you are moving 3.2K per request?
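
(The same budget arithmetic in code:)

    reads_per_sec = 37000.0
    for name, bytes_per_sec in [('100Mbit', 10e6), ('1GbE', 120e6)]:
        print "%s: ~%d bytes/request" % (name, bytes_per_sec / reads_per_sec)
    # -> 100Mbit: ~270 bytes/request, 1GbE: ~3243 bytes/request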

-malcolm

On Mon, Jul 19, 2010 at 2:27 PM, Ryan King <ry...@twitter.com> wrote:

> On Mon, Jul 19, 2010 at 11:02 AM, David Schoonover
> <da...@gmail.com> wrote:
> >> Multiple client processes, or multiple client machines?
> >
> >
> > I ran it with both one and two client machines making requests, and
> ensured the sum of the request threads across the clients was 50. That was
> on the cloud. I am re-running the multi-host test against the 4-node cluster
> on dedicated hardware now to ensure that result was not an artifact of the
> cloud.
>
> Why would you only use 50 threads total across two hosts?
>
> -ryan
>

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Ryan King <ry...@twitter.com>.
On Mon, Jul 19, 2010 at 11:02 AM, David Schoonover
<da...@gmail.com> wrote:
>> Multiple client processes, or multiple client machines?
>
>
> I ran it with both one and two client machines making requests, and ensured the sum of the request threads across the clients was 50. That was on the cloud. I am re-running the multi-host test against the 4-node cluster on dedicated hardware now to ensure that result was not an artifact of the cloud.

Why would you only use 50 threads total across two hosts?

-ryan

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Stu Hood <st...@rackspace.com>.
If you put 25 processes on each of the 2 machines, all you are testing is how fast 50 processes can hit Cassandra... the point of using more machines is that you can use more processes.

Presumably, for a single machine, there is some limit (K) to the number of processes that will give you additional gains: above that point, you should use more machines, each running K processes.

-----Original Message-----
From: "David Schoonover" <da...@gmail.com>
Sent: Monday, July 19, 2010 1:02pm
To: user@cassandra.apache.org
Subject: Re: Cassandra benchmarking on Rackspace Cloud

> Multiple client processes, or multiple client machines?


I ran it with both one and two client machines making requests, and ensured the sum of the request threads across the clients was 50. That was on the cloud. I am re-running the multi-host test against the 4-node cluster on dedicated hardware now to ensure that result was not an artifact of the cloud.


David Schoonover

On Jul 19, 2010, at 1:38 PM, Jonathan Ellis wrote:

> On Mon, Jul 19, 2010 at 12:30 PM, David Schoonover
> <da...@gmail.com> wrote:
>>> How many physical client machines are running stress.py?
>> 
>> One with 50 threads; it is remote from the cluster but within the same
>> DC in both cases. I also run the test with multiple clients and saw
>> similar results when summing the reqs/sec.
> 
> Multiple client processes, or multiple client machines?
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com




Re: Cassandra benchmarking on Rackspace Cloud

Posted by David Schoonover <da...@gmail.com>.
> Multiple client processes, or multiple client machines?


I ran it with both one and two client machines making requests, and ensured the sum of the request threads across the clients was 50. That was on the cloud. I am re-running the multi-host test against the 4-node cluster on dedicated hardware now to ensure that result was not an artifact of the cloud.


David Schoonover

On Jul 19, 2010, at 1:38 PM, Jonathan Ellis wrote:

> On Mon, Jul 19, 2010 at 12:30 PM, David Schoonover
> <da...@gmail.com> wrote:
>>> How many physical client machines are running stress.py?
>> 
>> One with 50 threads; it is remote from the cluster but within the same
>> DC in both cases. I also run the test with multiple clients and saw
>> similar results when summing the reqs/sec.
> 
> Multiple client processes, or multiple client machines?
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com


Re: Cassandra benchmarking on Rackspace Cloud

Posted by Peter Schuller <pe...@infidyne.com>.
> stress.py uses multiprocessing if it is present, circumventing the GIL; we ran the tests with python 2.6.5.

Ah, sorry about that. I was misremembering because I had to use
threading with pystress, since multiprocessing was broken or
unavailable (can't remember which) on FreeBSD.

I agree with Stu's point that if you're using a fixed concurrency of
50, you'd expect to see a fixed maximum throughput. Even with
infinite capacity in the Cassandra cluster, each request still has
some particular latency associated with it. With exactly 50
internally sequential clients performing reads, that latency yields
some particular maximum throughput which won't be affected by a
larger Cassandra cluster.

The only way to see higher total throughput is to have a higher total
concurrency, or to make each individual request have lower latency.
The former would scale, while the latter wouldn't.
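
As a minimal sketch of that ceiling (the 1.35ms latency below is an
assumption picked only because it matches the observed ~37k reads/s,
not something that was measured directly):

def max_throughput(concurrency, latency_s):
    # With `concurrency` internally sequential clients, each spending
    # `latency_s` per request, throughput cannot exceed concurrency / latency.
    return concurrency / latency_s

print(max_throughput(50, 0.00135))   # ~37,000/s: the multi-node plateau
print(max_throughput(100, 0.00135))  # ~74,000/s: more client concurrency helps
print(max_throughput(50, 0.00095))   # ~52,600/s: so does lower per-request latency
                                     # (roughly the single-node figure, i.e. no extra hop)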

-- 
/ Peter Schuller

Re: Cassandra benchmarking on Rackspace Cloud

Posted by David Schoonover <da...@gmail.com>.
stress.py uses multiprocessing if it is present, circumventing the GIL; we ran the tests with python 2.6.5.

David Schoonover

On Jul 19, 2010, at 1:51 PM, Peter Schuller wrote:

>>> One with 50 threads; it is remote from the cluster but within the same
>>> DC in both cases. I also run the test with multiple clients and saw
>>> similar results when summing the reqs/sec.
>> 
>> Multiple client processes, or multiple client machines?
> 
> In particular, note that the way CPython works, if you're CPU bound
> across many threads, you're constantly hitting the worst possible
> scenario with respect to wasting CPU cycles on multiple cores (due to
> the extremely contended GIL). While I'd still expect to see an
> increase in throughput from running multiple separate processes on the
> same (multi-core) machine, I really wouldn't be too sure. Even with
> supposedly idle CPU you may still be bottlenecking on the client
> depending on scheduling decisions in the kernel.
> 
> -- 
> / Peter Schuller


Re: Cassandra benchmarking on Rackspace Cloud

Posted by Peter Schuller <pe...@infidyne.com>.
>> One with 50 threads; it is remote from the cluster but within the same
>> DC in both cases. I also run the test with multiple clients and saw
>> similar results when summing the reqs/sec.
>
> Multiple client processes, or multiple client machines?

In particular, note that the way CPython works, if you're CPU bound
across many threads, you're constantly hitting the worst possible
scenario with respect to wasting CPU cycles on multiple cores (due to
the extremely contended GIL). While I'd still expect to see an
increase in throughput from running multiple separate processes on the
same (multi-core) machine, I really wouldn't be too sure. Even with
supposedly idle CPU you may still be bottlenecking on the client
depending on scheduling decisions in the kernel.
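
A small, self-contained demonstration of that effect (a CPU-bound toy
workload, nothing to do with stress.py itself):

import time
from threading import Thread
from multiprocessing import Process

def burn(n=3000000):
    # Pure-Python busy loop; holds the GIL the whole time it runs.
    total = 0
    for i in range(n):
        total += i * i

def timed(worker_cls, count=4):
    workers = [worker_cls(target=burn) for _ in range(count)]
    start = time.time()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.time() - start

if __name__ == '__main__':
    print('threads:   %.2fs' % timed(Thread))   # roughly serialized by the GIL
    print('processes: %.2fs' % timed(Process))  # can actually use multiple cores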

-- 
/ Peter Schuller

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Jul 19, 2010 at 12:30 PM, David Schoonover
<da...@gmail.com> wrote:
>> How many physical client machines are running stress.py?
>
> One with 50 threads; it is remote from the cluster but within the same
> DC in both cases. I also run the test with multiple clients and saw
> similar results when summing the reqs/sec.

Multiple client processes, or multiple client machines?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Stu Hood <st...@rackspace.com>.
This is absolutely your bottleneck, as Brandon mentioned before. Your client machine is maxing out at 37K requests per second.

-----Original Message-----
From: "David Schoonover" <da...@gmail.com>
Sent: Monday, July 19, 2010 12:30pm
To: user@cassandra.apache.org
Subject: Re: Cassandra benchmarking on Rackspace Cloud

> How many physical client machines are running stress.py?

One with 50 threads; it is remote from the cluster but within the same
DC in both cases. I also run the test with multiple clients and saw
similar results when summing the reqs/sec.


On Mon, Jul 19, 2010 at 1:22 PM, Stu Hood <st...@rackspace.com> wrote:
> How many physical client machines are running stress.py?
>
> -----Original Message-----
> From: "David Schoonover" <da...@gmail.com>
> Sent: Monday, July 19, 2010 12:11pm
> To: user@cassandra.apache.org
> Subject: Re: Cassandra benchmarking on Rackspace Cloud
>
> Hello all, I'm Oren's partner in crime on all this. I've got a few more numbers to add.
>
> In an effort to eliminate everything but the scaling issue, I set up a cluster on dedicated hardware (non-virtualized; 8-core, 16G RAM).
>
> No data was loaded into Cassandra -- 100% of requests were misses. This is, so far as we can reason about the problem, as fast as the database can perform; disk is out of the picture, and the hardware is certainly more than sufficient.
>
> nodes   reads/sec
> 1       53,000
> 2       37,000
> 4       37,000
>
> I ran this test previously on the cloud, with similar results:
>
> nodes   reads/sec
> 1       24,000
> 2       21,000
> 3       21,000
> 4       21,000
> 5       21,000
> 6       21,000
>
> In fact, I ran it twice out of disbelief (on different nodes the second time) to essentially identical results.
>
> Other Notes:
>  - stress.py was run in both random and gaussian mode; there was no difference.
>  - Runs were 10+ minutes (where the above number represents an average excluding the beginning and the end of the run).
>  - Supplied node lists covered all boxes in the cluster.
>  - Data and commitlog directories were deleted between each run.
>  - Tokens were evenly spaced across the ring, and changed to match cluster size before each run.
>
> If anyone has explanations or suggestions, they would be quite welcome. This is surprising to say the least.
>
> Cheers,
>
> Dave
>
>
>
> On Jul 19, 2010, at 11:42 AM, Stu Hood wrote:
>
>> Hey Oren,
>>
>> The Cloud Servers REST API returns a "hostId" for each server that indicates which physical host you are on: I'm not sure if you can see it from the control panel, but a quick curl session should get you the answer.
>>
>> Thanks,
>> Stu
>>
>> -----Original Message-----
>> From: "Oren Benjamin" <or...@clearspring.com>
>> Sent: Monday, July 19, 2010 10:30am
>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Subject: Re: Cassandra benchmarking on Rackspace Cloud
>>
>> Certainly I'm using multiple cloud servers for the multiple client tests.  Whether or not they are resident on the same physical machine, I just don't know.
>>
>>   -- Oren
>>
>> On Jul 18, 2010, at 11:35 PM, Brandon Williams wrote:
>>
>> On Sun, Jul 18, 2010 at 8:45 PM, Oren Benjamin <or...@clearspring.com>> wrote:
>> Thanks for the info.  Very helpful in validating what I've been seeing.  As for the scaling limit...
>>
>>>> The above was single node testing.  I'd expect to be able to add nodes and scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster.
>>>
>>> This is what I would expect if a single machine is handling all the
>>> Thrift requests.  Are you spreading the client connections to all the
>>> machines?
>>
>> Yes - in all tests I add all nodes in the cluster to the --nodes list.  The client requests are in fact being dispersed among all the nodes as evidenced by the intermittent TimedOutExceptions in the log which show up against the various nodes in the input list.  Could it be a result of all the virtual nodes being hosted on the same physical hardware?  Am I running into some connection limit?  I don't see anything pegged in the JMX stats.
>>
>> It's unclear if you're using multiple client machines for stress.py or not, a limitation of 24k/21k for a single quad-proc machine is normal in my experience.
>>
>> -Brandon
>>
>>
>>
>
>
>
>



-- 
LOVE DAVE



Re: Cassandra benchmarking on Rackspace Cloud

Posted by David Schoonover <da...@gmail.com>.
> How many physical client machines are running stress.py?

One with 50 threads; it is remote from the cluster but within the same
DC in both cases. I also run the test with multiple clients and saw
similar results when summing the reqs/sec.


On Mon, Jul 19, 2010 at 1:22 PM, Stu Hood <st...@rackspace.com> wrote:
> How many physical client machines are running stress.py?
>
> -----Original Message-----
> From: "David Schoonover" <da...@gmail.com>
> Sent: Monday, July 19, 2010 12:11pm
> To: user@cassandra.apache.org
> Subject: Re: Cassandra benchmarking on Rackspace Cloud
>
> Hello all, I'm Oren's partner in crime on all this. I've got a few more numbers to add.
>
> In an effort to eliminate everything but the scaling issue, I set up a cluster on dedicated hardware (non-virtualized; 8-core, 16G RAM).
>
> No data was loaded into Cassandra -- 100% of requests were misses. This is, so far as we can reason about the problem, as fast as the database can perform; disk is out of the picture, and the hardware is certainly more than sufficient.
>
> nodes   reads/sec
> 1       53,000
> 2       37,000
> 4       37,000
>
> I ran this test previously on the cloud, with similar results:
>
> nodes   reads/sec
> 1       24,000
> 2       21,000
> 3       21,000
> 4       21,000
> 5       21,000
> 6       21,000
>
> In fact, I ran it twice out of disbelief (on different nodes the second time) to essentially identical results.
>
> Other Notes:
>  - stress.py was run in both random and gaussian mode; there was no difference.
>  - Runs were 10+ minutes (where the above number represents an average excluding the beginning and the end of the run).
>  - Supplied node lists covered all boxes in the cluster.
>  - Data and commitlog directories were deleted between each run.
>  - Tokens were evenly spaced across the ring, and changed to match cluster size before each run.
>
> If anyone has explanations or suggestions, they would be quite welcome. This is surprising to say the least.
>
> Cheers,
>
> Dave
>
>
>
> On Jul 19, 2010, at 11:42 AM, Stu Hood wrote:
>
>> Hey Oren,
>>
>> The Cloud Servers REST API returns a "hostId" for each server that indicates which physical host you are on: I'm not sure if you can see it from the control panel, but a quick curl session should get you the answer.
>>
>> Thanks,
>> Stu
>>
>> -----Original Message-----
>> From: "Oren Benjamin" <or...@clearspring.com>
>> Sent: Monday, July 19, 2010 10:30am
>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Subject: Re: Cassandra benchmarking on Rackspace Cloud
>>
>> Certainly I'm using multiple cloud servers for the multiple client tests.  Whether or not they are resident on the same physical machine, I just don't know.
>>
>>   -- Oren
>>
>> On Jul 18, 2010, at 11:35 PM, Brandon Williams wrote:
>>
>> On Sun, Jul 18, 2010 at 8:45 PM, Oren Benjamin <or...@clearspring.com>> wrote:
>> Thanks for the info.  Very helpful in validating what I've been seeing.  As for the scaling limit...
>>
>>>> The above was single node testing.  I'd expect to be able to add nodes and scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster.
>>>
>>> This is what I would expect if a single machine is handling all the
>>> Thrift requests.  Are you spreading the client connections to all the
>>> machines?
>>
>> Yes - in all tests I add all nodes in the cluster to the --nodes list.  The client requests are in fact being dispersed among all the nodes as evidenced by the intermittent TimedOutExceptions in the log which show up against the various nodes in the input list.  Could it be a result of all the virtual nodes being hosted on the same physical hardware?  Am I running into some connection limit?  I don't see anything pegged in the JMX stats.
>>
>> It's unclear if you're using multiple client machines for stress.py or not, a limitation of 24k/21k for a single quad-proc machine is normal in my experience.
>>
>> -Brandon
>>
>>
>>
>
>
>
>



-- 
LOVE DAVE

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Stu Hood <st...@rackspace.com>.
How many physical client machines are running stress.py?

-----Original Message-----
From: "David Schoonover" <da...@gmail.com>
Sent: Monday, July 19, 2010 12:11pm
To: user@cassandra.apache.org
Subject: Re: Cassandra benchmarking on Rackspace Cloud

Hello all, I'm Oren's partner in crime on all this. I've got a few more numbers to add.

In an effort to eliminate everything but the scaling issue, I set up a cluster on dedicated hardware (non-virtualized; 8-core, 16G RAM). 

No data was loaded into Cassandra -- 100% of requests were misses. This is, so far as we can reason about the problem, as fast as the database can perform; disk is out of the picture, and the hardware is certainly more than sufficient.

nodes	reads/sec
1	53,000
2	37,000
4	37,000

I ran this test previously on the cloud, with similar results:

nodes	reads/sec
1	24,000
2	21,000
3	21,000
4	21,000
5	21,000
6	21,000

In fact, I ran it twice out of disbelief (on different nodes the second time) to essentially identical results. 

Other Notes:
 - stress.py was run in both random and gaussian mode; there was no difference. 
 - Runs were 10+ minutes (where the above number represents an average excluding the beginning and the end of the run). 
 - Supplied node lists covered all boxes in the cluster. 
 - Data and commitlog directories were deleted between each run.
 - Tokens were evenly spaced across the ring, and changed to match cluster size before each run.

If anyone has explanations or suggestions, they would be quite welcome. This is surprising to say the least.

Cheers,

Dave



On Jul 19, 2010, at 11:42 AM, Stu Hood wrote:

> Hey Oren,
> 
> The Cloud Servers REST API returns a "hostId" for each server that indicates which physical host you are on: I'm not sure if you can see it from the control panel, but a quick curl session should get you the answer.
> 
> Thanks,
> Stu
> 
> -----Original Message-----
> From: "Oren Benjamin" <or...@clearspring.com>
> Sent: Monday, July 19, 2010 10:30am
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Subject: Re: Cassandra benchmarking on Rackspace Cloud
> 
> Certainly I'm using multiple cloud servers for the multiple client tests.  Whether or not they are resident on the same physical machine, I just don't know.
> 
>   -- Oren
> 
> On Jul 18, 2010, at 11:35 PM, Brandon Williams wrote:
> 
> On Sun, Jul 18, 2010 at 8:45 PM, Oren Benjamin <or...@clearspring.com>> wrote:
> Thanks for the info.  Very helpful in validating what I've been seeing.  As for the scaling limit...
> 
>>> The above was single node testing.  I'd expect to be able to add nodes and scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster.
>> 
>> This is what I would expect if a single machine is handling all the
>> Thrift requests.  Are you spreading the client connections to all the
>> machines?
> 
> Yes - in all tests I add all nodes in the cluster to the --nodes list.  The client requests are in fact being dispersed among all the nodes as evidenced by the intermittent TimedOutExceptions in the log which show up against the various nodes in the input list.  Could it be a result of all the virtual nodes being hosted on the same physical hardware?  Am I running into some connection limit?  I don't see anything pegged in the JMX stats.
> 
> It's unclear if you're using multiple client machines for stress.py or not, a limitation of 24k/21k for a single quad-proc machine is normal in my experience.
> 
> -Brandon
> 
> 
> 




Re: Cassandra benchmarking on Rackspace Cloud

Posted by David Schoonover <da...@gmail.com>.
Hello all, I'm Oren's partner in crime on all this. I've got a few more numbers to add.

In an effort to eliminate everything but the scaling issue, I set up a cluster on dedicated hardware (non-virtualized; 8-core, 16G RAM). 

No data was loaded into Cassandra -- 100% of requests were misses. This is, so far as we can reason about the problem, as fast as the database can perform; disk is out of the picture, and the hardware is certainly more than sufficient.

nodes	reads/sec
1	53,000
2	37,000
4	37,000

I ran this test previously on the cloud, with similar results:

nodes	reads/sec
1	24,000
2	21,000
3	21,000
4	21,000
5	21,000
6	21,000

In fact, I ran it twice out of disbelief (on different nodes the second time) to essentially identical results. 

Other Notes:
 - stress.py was run in both random and gaussian mode; there was no difference. 
 - Runs were 10+ minutes (where the above number represents an average excluding the beginning and the end of the run). 
 - Supplied node lists covered all boxes in the cluster. 
 - Data and commitlog directories were deleted between each run.
 - Tokens were evenly spaced across the ring, and changed to match cluster size before each run.
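
(For reference, a quick sketch of that token spacing for the
RandomPartitioner's 0..2**127 space; this is the usual recipe, not
necessarily the exact script used for these runs:)

def initial_tokens(node_count):
    # Evenly spaced initial tokens for the RandomPartitioner.
    return [i * (2 ** 127 // node_count) for i in range(node_count)]

for n in (1, 2, 4):
    print(n, initial_tokens(n))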

If anyone has explanations or suggestions, they would be quite welcome. This is surprising to say the least.

Cheers,

Dave



On Jul 19, 2010, at 11:42 AM, Stu Hood wrote:

> Hey Oren,
> 
> The Cloud Servers REST API returns a "hostId" for each server that indicates which physical host you are on: I'm not sure if you can see it from the control panel, but a quick curl session should get you the answer.
> 
> Thanks,
> Stu
> 
> -----Original Message-----
> From: "Oren Benjamin" <or...@clearspring.com>
> Sent: Monday, July 19, 2010 10:30am
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Subject: Re: Cassandra benchmarking on Rackspace Cloud
> 
> Certainly I'm using multiple cloud servers for the multiple client tests.  Whether or not they are resident on the same physical machine, I just don't know.
> 
>   -- Oren
> 
> On Jul 18, 2010, at 11:35 PM, Brandon Williams wrote:
> 
> On Sun, Jul 18, 2010 at 8:45 PM, Oren Benjamin <or...@clearspring.com>> wrote:
> Thanks for the info.  Very helpful in validating what I've been seeing.  As for the scaling limit...
> 
>>> The above was single node testing.  I'd expect to be able to add nodes and scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster.
>> 
>> This is what I would expect if a single machine is handling all the
>> Thrift requests.  Are you spreading the client connections to all the
>> machines?
> 
> Yes - in all tests I add all nodes in the cluster to the --nodes list.  The client requests are in fact being dispersed among all the nodes as evidenced by the intermittent TimedOutExceptions in the log which show up against the various nodes in the input list.  Could it be a result of all the virtual nodes being hosted on the same physical hardware?  Am I running into some connection limit?  I don't see anything pegged in the JMX stats.
> 
> It's unclear if you're using multiple client machines for stress.py or not, a limitation of 24k/21k for a single quad-proc machine is normal in my experience.
> 
> -Brandon
> 
> 
> 


Re: Cassandra benchmarking on Rackspace Cloud

Posted by Stu Hood <st...@rackspace.com>.
Hey Oren,

The Cloud Servers REST API returns a "hostId" for each server that indicates which physical host you are on: I'm not sure if you can see it from the control panel, but a quick curl session should get you the answer.
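
Something along these lines should do it from Python as well. Note that
the auth endpoint, header names and response fields below are from
memory of the v1.0 API, so treat them as assumptions and check them
against the current docs:

import json
import urllib.request

AUTH_URL = "https://auth.api.rackspacecloud.com/v1.0"  # assumed v1.0 auth endpoint

def host_ids(username, api_key):
    # Step 1: authenticate; the token and management URL come back as headers.
    req = urllib.request.Request(AUTH_URL, headers={"X-Auth-User": username,
                                                    "X-Auth-Key": api_key})
    with urllib.request.urlopen(req) as resp:
        token = resp.headers["X-Auth-Token"]
        mgmt_url = resp.headers["X-Server-Management-Url"]

    # Step 2: list servers in detail; each entry should carry a "hostId", and two
    # cloud servers sharing a hostId are sitting on the same physical box.
    req = urllib.request.Request(mgmt_url + "/servers/detail",
                                 headers={"X-Auth-Token": token,
                                          "Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        servers = json.load(resp)["servers"]
    return dict((s["name"], s.get("hostId")) for s in servers)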

Thanks,
Stu

-----Original Message-----
From: "Oren Benjamin" <or...@clearspring.com>
Sent: Monday, July 19, 2010 10:30am
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Cassandra benchmarking on Rackspace Cloud

Certainly I'm using multiple cloud servers for the multiple client tests.  Whether or not they are resident on the same physical machine, I just don't know.

   -- Oren

On Jul 18, 2010, at 11:35 PM, Brandon Williams wrote:

On Sun, Jul 18, 2010 at 8:45 PM, Oren Benjamin <or...@clearspring.com>> wrote:
Thanks for the info.  Very helpful in validating what I've been seeing.  As for the scaling limit...

>> The above was single node testing.  I'd expect to be able to add nodes and scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster.
>
> This is what I would expect if a single machine is handling all the
> Thrift requests.  Are you spreading the client connections to all the
> machines?

Yes - in all tests I add all nodes in the cluster to the --nodes list.  The client requests are in fact being dispersed among all the nodes as evidenced by the intermittent TimedOutExceptions in the log which show up against the various nodes in the input list.  Could it be a result of all the virtual nodes being hosted on the same physical hardware?  Am I running into some connection limit?  I don't see anything pegged in the JMX stats.

It's unclear if you're using multiple client machines for stress.py or not, a limitation of 24k/21k for a single quad-proc machine is normal in my experience.

-Brandon




Re: Cassandra benchmarking on Rackspace Cloud

Posted by Oren Benjamin <or...@clearspring.com>.
Certainly I'm using multiple cloud servers for the multiple client tests.  Whether or not they are resident on the same physical machine, I just don't know.

   -- Oren

On Jul 18, 2010, at 11:35 PM, Brandon Williams wrote:

On Sun, Jul 18, 2010 at 8:45 PM, Oren Benjamin <or...@clearspring.com>> wrote:
Thanks for the info.  Very helpful in validating what I've been seeing.  As for the scaling limit...

>> The above was single node testing.  I'd expect to be able to add nodes and scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster.
>
> This is what I would expect if a single machine is handling all the
> Thrift requests.  Are you spreading the client connections to all the
> machines?

Yes - in all tests I add all nodes in the cluster to the --nodes list.  The client requests are in fact being dispersed among all the nodes as evidenced by the intermittent TimedOutExceptions in the log which show up against the various nodes in the input list.  Could it be a result of all the virtual nodes being hosted on the same physical hardware?  Am I running into some connection limit?  I don't see anything pegged in the JMX stats.

It's unclear if you're using multiple client machines for stress.py or not, a limitation of 24k/21k for a single quad-proc machine is normal in my experience.

-Brandon


Re: Cassandra benchmarking on Rackspace Cloud

Posted by Brandon Williams <dr...@gmail.com>.
On Sun, Jul 18, 2010 at 8:45 PM, Oren Benjamin <or...@clearspring.com> wrote:

> Thanks for the info.  Very helpful in validating what I've been seeing.  As
> for the scaling limit...
>
> >> The above was single node testing.  I'd expect to be able to add nodes
> and scale throughput.  Unfortunately, I seem to be running into a cap of
> 21,000 reads/s regardless of the number of nodes in the cluster.
> >
> > This is what I would expect if a single machine is handling all the
> > Thrift requests.  Are you spreading the client connections to all the
> > machines?
>
> Yes - in all tests I add all nodes in the cluster to the --nodes list.  The
> client requests are in fact being dispersed among all the nodes as evidenced
> by the intermittent TimedOutExceptions in the log which show up against the
> various nodes in the input list.  Could it be a result of all the virtual
> nodes being hosted on the same physical hardware?  Am I running into some
> connection limit?  I don't see anything pegged in the JMX stats.


It's unclear if you're using multiple client machines for stress.py or
not; a limitation of 24k/21k for a single quad-proc machine is normal
in my experience.

-Brandon

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Oren Benjamin <or...@clearspring.com>.
Thanks for the info.  Very helpful in validating what I've been seeing.  As for the scaling limit...

>> The above was single node testing.  I'd expect to be able to add nodes and scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster.
> 
> This is what I would expect if a single machine is handling all the
> Thrift requests.  Are you spreading the client connections to all the
> machines?

Yes - in all tests I add all nodes in the cluster to the --nodes list.  The client requests are in fact being dispersed among all the nodes as evidenced by the intermittent TimedOutExceptions in the log which show up against the various nodes in the input list.  Could it be a result of all the virtual nodes being hosted on the same physical hardware?  Am I running into some connection limit?  I don't see anything pegged in the JMX stats.



On Jul 17, 2010, at 9:07 AM, Jonathan Ellis wrote:

> On Fri, Jul 16, 2010 at 6:06 PM, Oren Benjamin <or...@clearspring.com> wrote:
>> The first goal was to reproduce the test described on spyced here: http://spyced.blogspot.com/2010/01/cassandra-05.html
>> 
>> Using Cassandra 0.6.3, a 4GB/160GB cloud server (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing) with default storage-conf.xml and cassandra.in.sh, here's what I got:
>> 
>> Reads: 4,800/s
>> Writes: 9,000/s
>> 
>> Pretty close to the result posted on the blog, with a slightly lower write performance (perhaps due to the availability of only a single disk for both commitlog and data).
> 
> You're getting as close as you are because you're comparing 0.6
> numbers with 0.5.  For 0.6 on the test machine used in the blog post
> (quad core, 2 disks, 4GB) we were getting 7k reads and 14k writes.
> 
> In our tests we saw a 5-15% performance penalty from adding a
> virtualization layer.  Things like only having a single disk are going
> to stack on top of that.
> 
>> The above was single node testing.  I'd expect to be able to add nodes and scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster.
> 
> This is what I would expect if a single machine is handling all the
> Thrift requests.  Are you spreading the client connections to all the
> machines?
> 
>> The disk performance of the cloud servers have been extremely spotty... Is this normal for the cloud?
> 
> Yes.
> 
>>  And if so, what's the solution re Cassandra?
> 
> The larger the instance you're using, the closer you are to having the
> entire machine, meaning less other users are competing with you for
> disk i/o.
> 
> Of course when you're renting the entire machine's worth, it can be
> more cost-effective to just use dedicated hardware.
> 
>>  However, Cassandra routes to the nearest node topologically and not to the best performing one, so "bad" nodes will always result in high latency reads.
> 
> Cassandra routes reads around nodes with temporarily poor performance
> in 0.7, btw.
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com


Re: Cassandra benchmarking on Rackspace Cloud

Posted by Schubert Zhang <zs...@gmail.com>.
In fact, with my cassandra-0.6.2 setup, I can only get about 40-50
reads/s with the key/row cache disabled.

On Sun, Jul 18, 2010 at 1:02 AM, Schubert Zhang <zs...@gmail.com> wrote:

> Hi Jonathan,
> The 7k reads/s is very high, could you please make more explain about your
> benchmark?
>
> 7000 reads/s makes average latency of each read operation only talks
> 0.143ms. Consider 2 disks in the benchmark, it may be 0.286ms.
>
> But in most random read applications on very large dataset, OS cache and
> Cassandra Key/Row cache is not so effective. So, I guess, maybe for a test
> on large dataset (such as 1TB) , random reads, the result may not so good.
>
>
> On Sat, Jul 17, 2010 at 9:07 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> On Fri, Jul 16, 2010 at 6:06 PM, Oren Benjamin <or...@clearspring.com>
>> wrote:
>> > The first goal was to reproduce the test described on spyced here:
>> http://spyced.blogspot.com/2010/01/cassandra-05.html
>> >
>> > Using Cassandra 0.6.3, a 4GB/160GB cloud server (
>> http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing)
>> with default storage-conf.xml and cassandra.in.sh, here's what I got:
>> >
>> > Reads: 4,800/s
>> > Writes: 9,000/s
>> >
>> > Pretty close to the result posted on the blog, with a slightly lower
>> write performance (perhaps due to the availability of only a single disk for
>> both commitlog and data).
>>
>> You're getting as close as you are because you're comparing 0.6
>> numbers with 0.5.  For 0.6 on the test machine used in the blog post
>> (quad core, 2 disks, 4GB) we were getting 7k reads and 14k writes.
>>
>> In our tests we saw a 5-15% performance penalty from adding a
>> virtualization layer.  Things like only having a single disk are going
>> to stack on top of that.
>>
>> > The above was single node testing.  I'd expect to be able to add nodes
>> and scale throughput.  Unfortunately, I seem to be running into a cap of
>> 21,000 reads/s regardless of the number of nodes in the cluster.
>>
>> This is what I would expect if a single machine is handling all the
>> Thrift requests.  Are you spreading the client connections to all the
>> machines?
>>
>> > The disk performance of the cloud servers have been extremely spotty...
>> Is this normal for the cloud?
>>
>> Yes.
>>
>> >  And if so, what's the solution re Cassandra?
>>
>> The larger the instance you're using, the closer you are to having the
>> entire machine, meaning less other users are competing with you for
>> disk i/o.
>>
>> Of course when you're renting the entire machine's worth, it can be
>> more cost-effective to just use dedicated hardware.
>>
>> > However, Cassandra routes to the nearest node topologically and not to
>> the best performing one, so "bad" nodes will always result in high latency
>> reads.
>>
>> Cassandra routes reads around nodes with temporarily poor performance
>> in 0.7, btw.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
>

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Jonathan Ellis <jb...@gmail.com>.
On Sat, Jul 17, 2010 at 1:31 PM, Brandon Williams <dr...@gmail.com> wrote:
> Most of the data was hot in either the row/key cache, or in the OS file
> cache.  The point was to benchmark Cassandra, not how fast the disk can
> seek.

IIRC we were running w/ the defaults of 200k key cache and no row cache.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Brandon Williams <dr...@gmail.com>.
On Sat, Jul 17, 2010 at 12:02 PM, Schubert Zhang <zs...@gmail.com> wrote:

> Hi Jonathan,
> The 7k reads/s is very high, could you please make more explain about your
> benchmark?
>
> 7000 reads/s makes average latency of each read operation only talks
> 0.143ms. Consider 2 disks in the benchmark, it may be 0.286ms.
>
> But in most random read applications on very large dataset, OS cache and
> Cassandra Key/Row cache is not so effective. So, I guess, maybe for a test
> on large dataset (such as 1TB) , random reads, the result may not so good.


Most of the data was hot in either the row/key cache, or in the OS file
cache.  The point was to benchmark Cassandra, not how fast the disk can
seek.

-Brandon

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Schubert Zhang <zs...@gmail.com>.
Hi Jonathan,
The 7k reads/s is very high; could you please explain more about your
benchmark?

7,000 reads/s implies an average latency of only 0.143ms per read
operation. Considering the 2 disks in the benchmark, that is about
0.286ms per disk.

But in most random-read applications on a very large dataset, the OS
cache and the Cassandra key/row cache are not so effective. So I guess
that for a random-read test on a large dataset (such as 1TB), the
result may not be so good.
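
Spelling that arithmetic out (a simple back-of-the-envelope which
assumes every read is served serially from disk, with no cache hits):

reads_per_sec = 7000
disks = 2

per_read_ms = 1000.0 / reads_per_sec   # ~0.143 ms implied per read overall
per_disk_ms = disks * per_read_ms      # ~0.286 ms available per read on each disk
print(per_read_ms, per_disk_ms)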


On Sat, Jul 17, 2010 at 9:07 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Fri, Jul 16, 2010 at 6:06 PM, Oren Benjamin <or...@clearspring.com>
> wrote:
> > The first goal was to reproduce the test described on spyced here:
> http://spyced.blogspot.com/2010/01/cassandra-05.html
> >
> > Using Cassandra 0.6.3, a 4GB/160GB cloud server (
> http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing) with
> default storage-conf.xml and cassandra.in.sh, here's what I got:
> >
> > Reads: 4,800/s
> > Writes: 9,000/s
> >
> > Pretty close to the result posted on the blog, with a slightly lower
> write performance (perhaps due to the availability of only a single disk for
> both commitlog and data).
>
> You're getting as close as you are because you're comparing 0.6
> numbers with 0.5.  For 0.6 on the test machine used in the blog post
> (quad core, 2 disks, 4GB) we were getting 7k reads and 14k writes.
>
> In our tests we saw a 5-15% performance penalty from adding a
> virtualization layer.  Things like only having a single disk are going
> to stack on top of that.
>
> > The above was single node testing.  I'd expect to be able to add nodes
> and scale throughput.  Unfortunately, I seem to be running into a cap of
> 21,000 reads/s regardless of the number of nodes in the cluster.
>
> This is what I would expect if a single machine is handling all the
> Thrift requests.  Are you spreading the client connections to all the
> machines?
>
> > The disk performance of the cloud servers have been extremely spotty...
> Is this normal for the cloud?
>
> Yes.
>
> >  And if so, what's the solution re Cassandra?
>
> The larger the instance you're using, the closer you are to having the
> entire machine, meaning less other users are competing with you for
> disk i/o.
>
> Of course when you're renting the entire machine's worth, it can be
> more cost-effective to just use dedicated hardware.
>
> > However, Cassandra routes to the nearest node topologically and not to
> the best performing one, so "bad" nodes will always result in high latency
> reads.
>
> Cassandra routes reads around nodes with temporarily poor performance
> in 0.7, btw.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Re: Cassandra benchmarking on Rackspace Cloud

Posted by Jonathan Ellis <jb...@gmail.com>.
On Fri, Jul 16, 2010 at 6:06 PM, Oren Benjamin <or...@clearspring.com> wrote:
> The first goal was to reproduce the test described on spyced here: http://spyced.blogspot.com/2010/01/cassandra-05.html
>
> Using Cassandra 0.6.3, a 4GB/160GB cloud server (http://www.rackspacecloud.com/cloud_hosting_products/servers/pricing) with default storage-conf.xml and cassandra.in.sh, here's what I got:
>
> Reads: 4,800/s
> Writes: 9,000/s
>
> Pretty close to the result posted on the blog, with a slightly lower write performance (perhaps due to the availability of only a single disk for both commitlog and data).

You're getting as close as you are because you're comparing 0.6
numbers with 0.5.  For 0.6 on the test machine used in the blog post
(quad core, 2 disks, 4GB) we were getting 7k reads and 14k writes.

In our tests we saw a 5-15% performance penalty from adding a
virtualization layer.  Things like only having a single disk are going
to stack on top of that.

> The above was single node testing.  I'd expect to be able to add nodes and scale throughput.  Unfortunately, I seem to be running into a cap of 21,000 reads/s regardless of the number of nodes in the cluster.

This is what I would expect if a single machine is handling all the
Thrift requests.  Are you spreading the client connections to all the
machines?

> The disk performance of the cloud servers have been extremely spotty... Is this normal for the cloud?

Yes.

>  And if so, what's the solution re Cassandra?

The larger the instance you're using, the closer you are to having the
entire machine, meaning fewer other users are competing with you for
disk I/O.

Of course when you're renting the entire machine's worth, it can be
more cost-effective to just use dedicated hardware.

> However, Cassandra routes to the nearest node topologically and not to the best performing one, so "bad" nodes will always result in high latency reads.

Cassandra routes reads around nodes with temporarily poor performance
in 0.7, btw.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com