Posted to user@cassandra.apache.org by Mark Jones <MJ...@imagehawk.com> on 2010/04/21 17:28:41 UTC

At what point does the cluster get faster than the individual nodes?

I'm seeing a cluster of 4 (replication factor=2) that is, overall, barely faster than the slowest node in the group.  When I run the 4 nodes individually, I see:

For inserts:
Two nodes @ 12000/second
1 node @ 9000/second
1 node @ 7000/second

For reads:
Abysmal: less than 1000/second (individual lookups, not range slices), with disk utilization at 88+%


How many nodes are required before you see a net positive gain on inserts and reads (QUORUM consistency on both)?
When I use my 2 fastest nodes as a pair, the throughput is around 9000 inserts/second.

What is a good-to-excellent hardware config for Cassandra?  I have separate drives for data and commit log, and 8GB of RAM in 3 machines (all dual core).  My fastest insert node has 4GB and a triple-core processor.

I've run py_stress, and my C++ code beats it by several thousand inserts/second toward the end of the runs, so I don't think it is my app; I've also removed the super columns per some suggestions yesterday.

When Cassandra is working, it performs well; the problem is that it frequently slows down to less than 50% of its peak and occasionally stalls at 0 inserts/second, which greatly reduces aggregate throughput.

Re: At what point does the cluster get faster than the individual nodes?

Posted by "Jim R. Wilson" <wi...@gmail.com>.
Hi Mark,

I'm a relative newcomer to Cassandra, but I believe the common
experience is that you start seeing gains after 5 nodes in a
column-oriented data store.  It may also depend on your usage pattern.

Others may know better - hope this helps!

-- Jim R. Wilson (jimbojw)


Re: Row deletion and get_range_slices (cassandra 0.6.1)

Posted by Jonathan Ellis <jb...@gmail.com>.
On Fri, Apr 23, 2010 at 3:53 AM, David Harrison
<da...@gmail.com> wrote:
> So I'm guessing that means compaction doesn't include purging of
> tombstone-d keys ?

Incorrect.

http://wiki.apache.org/cassandra/DistributedDeletes
http://wiki.apache.org/cassandra/MemtableSSTable

Re: Row deletion and get_range_slices (cassandra 0.6.1)

Posted by David Harrison <da...@gmail.com>.
So I'm guessing that means compaction doesn't include purging of
tombstoned keys?  Is there any situation or maintenance process that
does?  (Or are keys forever?)

On 23 April 2010 17:44, Ryan King <ry...@twitter.com> wrote:
> The GCGraceSeconds will only apply when you compact data.
>
> -ryan
>

Re: Row deletion and get_range_slices (cassandra 0.6.1)

Posted by Ryan King <ry...@twitter.com>.
On Thu, Apr 22, 2010 at 8:24 PM, David Harrison
<da...@gmail.com> wrote:
> Do those tombstone-d keys ever get purged completely ?  I've tried
> shortening the GCGraceSeconds right down but they still don't get
> cleaned up.

The GCGraceSeconds will only apply when you compact data.
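To make that rule concrete, here is a toy sketch (the names and data model are hypothetical, not Cassandra's actual internals): a tombstone survives every compaction until GCGraceSeconds have elapsed since the deletion, and shortening the grace period does nothing by itself until a compaction actually runs.

```python
GC_GRACE_SECONDS = 864000  # the 0.6.x default: 10 days

def compact(sstable_rows, now, gc_grace=GC_GRACE_SECONDS):
    """Toy compaction over {key: ('live', data) | ('tombstone', deleted_at)}.

    A tombstone is only dropped during compaction, and only once
    gc_grace seconds have passed since it was written; until then it is
    copied forward into the new SSTable like any other row.
    """
    out = {}
    for key, (state, payload) in sstable_rows.items():
        if state == 'tombstone' and now - payload >= gc_grace:
            continue  # grace period expired: the tombstone is finally purged
        out[key] = (state, payload)
    return out
```

So a tombstone deleted just before a compaction is kept, while one older than the grace period disappears only when that compaction runs.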

-ryan

Re: Row deletion and get_range_slices (cassandra 0.6.1)

Posted by David Harrison <da...@gmail.com>.
Do those tombstoned keys ever get purged completely?  I've tried
shortening the GCGraceSeconds right down, but they still don't get
cleaned up.

On 23 April 2010 08:57, Jonathan Ellis <jb...@gmail.com> wrote:
> http://wiki.apache.org/cassandra/FAQ#range_ghosts

Re: Row deletion and get_range_slices (cassandra 0.6.1)

Posted by Jonathan Ellis <jb...@gmail.com>.
http://wiki.apache.org/cassandra/FAQ#range_ghosts
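The FAQ's upshot is that a deleted row can keep showing up in get_range_slices as a key with no columns (a "range ghost") until its tombstone is purged. A common client-side workaround, sketched here over hypothetical (key, columns) pairs rather than real Thrift structs, is simply to skip the empty rows:

```python
def live_rows(key_slices):
    """Filter 'range ghosts' out of a get_range_slices-style result.

    key_slices is assumed to be a list of (key, columns) pairs; a row
    that was deleted but whose tombstone has not yet been purged comes
    back with an empty column list, so dropping empty rows hides ghosts.
    """
    return [(key, cols) for key, cols in key_slices if cols]
```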


Row deletion and get_range_slices (cassandra 0.6.1)

Posted by Carlos Sanchez <ca...@riskmetrics.com>.
I have a curious question..

I am doing some testing where I insert 500 rows into a super column family and then delete one row. I make sure the row was indeed deleted (NotFoundException in the get call), and then I ran a get_range_slices and the row was indeed returned. I then shut down Cassandra and restarted it. I repeated the test (with inserting the rows), and even though I get the NotFoundException for that row, get_range_slices still returns it.  Is this the expected behavior? How long should I wait before I don't see the row in the get_range_slices? Do I have to force a flush or change the consistency level?

Thanks,

Carlos

This email message and any attachments are for the sole use of the intended recipients and may contain proprietary and/or confidential information which may be privileged or otherwise protected from disclosure. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not an intended recipient, please contact the sender by reply email and destroy the original message and any copies of the message as well as any attachments to the original message.

Re: At what point does the cluster get faster than the individual nodes?

Posted by Jonathan Ellis <jb...@gmail.com>.
fyi,

https://issues.apache.org/jira/browse/CASSANDRA-930
https://issues.apache.org/jira/browse/CASSANDRA-982


Re: At what point does the cluster get faster than the individual nodes?

Posted by Mike Malone <mi...@simplegeo.com>.
On Wed, Apr 21, 2010 at 9:50 AM, Mark Greene <gr...@gmail.com> wrote:

> Right it's a similar concept to DB sharding where you spread the write load
> around to different DB servers but won't necessarily increase the throughput
> of an one DB server but rather collectively.


Except with Cassandra, read-repair causes every read to go to every replica
for a piece of data.
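A back-of-envelope sketch of that point (the function and names are illustrative, not Cassandra API): the consistency level only sets how many replicas a read blocks on, while read repair makes all RF replicas do work on every read, so read capacity does not scale with replication the way write sharding does.

```python
def read_cost(rf, level):
    """Return (replicas the read waits for, replicas doing work).

    The caller blocks on the consistency level's replica count, but with
    read repair enabled every one of the rf replicas still services the
    read (via data or digest requests) and gets repaired if stale.
    """
    blocked_on = {'ONE': 1, 'QUORUM': rf // 2 + 1, 'ALL': rf}[level]
    return blocked_on, rf
```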

Mike

Re: At what point does the cluster get faster than the individual nodes?

Posted by Mark Greene <gr...@gmail.com>.
Right, it's a similar concept to DB sharding, where you spread the write load
around to different DB servers: it won't necessarily increase the throughput
of any one DB server, but rather of the collective.
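For the curious, that spreading is what Cassandra's partitioner does: hash each row key to a token on a ring and route the write to the node owning that token range. A minimal sketch in the spirit of RandomPartitioner, which uses MD5 (the ring layout and helper names here are made up for illustration):

```python
import hashlib

def token(key):
    """RandomPartitioner-style token: the MD5 of the key as a big integer."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def owner(key, ring):
    """Find the node owning a key.

    ring is a sorted list of (node_token, node_name); the first node whose
    token is >= the key's token owns the key, wrapping around the ring if
    the key's token is larger than every node token.
    """
    t = token(key)
    for node_token, node in ring:
        if t <= node_token:
            return node
    return ring[0][1]  # wrapped past the highest token
```

Because MD5 spreads keys roughly uniformly, adding nodes splits the write load; it just doesn't make any single node faster.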

On Wed, Apr 21, 2010 at 12:16 PM, Mike Gallamore <
mike.e.gallamore@googlemail.com> wrote:

>  Some people might be able to answer this better than me. However: with
> quorum consistency you have to communicate with n/2 + 1 nodes, where n is
> the replication factor. So unless you are disk bound, your real expense is
> going to be all those extra network latencies. I'd expect you'll see a
> relatively flat throughput per thread once you reach the point where you
> aren't disk or CPU bound. That said, the extra nodes mean you should be
> able to handle more threads/connections at the same throughput on each
> thread/connection. So a bigger cluster doesn't mean a single job goes
> faster necessarily, just that you can handle more jobs at the same time.

Re: At what point does the cluster get faster than the individual nodes?

Posted by Mike Gallamore <mi...@googlemail.com>.
Some people might be able to answer this better than me. However: with 
quorum consistency you have to communicate with n/2 + 1 nodes, where n 
is the replication factor. So unless you are disk bound, your real 
expense is going to be all those extra network latencies. I'd expect 
you'll see a relatively flat throughput per thread once you reach the 
point where you aren't disk or CPU bound. That said, the extra nodes 
mean you should be able to handle more threads/connections at the same 
throughput on each thread/connection. So a bigger cluster doesn't mean 
a single job goes faster necessarily, just that you can handle more 
jobs at the same time.
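Applying that arithmetic to the original numbers: with replication factor 2, a quorum is 2/2 + 1 = 2, so every QUORUM request must wait on both replicas and there is no slow replica to ignore, which helps explain why the 4-node RF=2 cluster tracks its slowest member. A one-liner for the arithmetic:

```python
def quorum(rf):
    """Replicas a QUORUM read or write must wait for: rf // 2 + 1."""
    return rf // 2 + 1

# rf=2 -> quorum of 2: both copies on every request, no slack.
# rf=3 -> quorum of 2: the slowest replica can be ignored.
```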