Posted to user@cassandra.apache.org by Ian Danforth <id...@numenta.com> on 2011/10/21 21:12:03 UTC

Specific Question, General Problem

All,

I have a specific question which I think highlights a general problem.

===Specific Question===

I'm seeing read times of 2-300ms for getting a single row. This seems slow,
but is it unusual?

Stack

5-node cluster
Cassandra 0.8.6
EC2 m1.large machines
EBS drives for all data (I know, I know)

Datamodel

Millions of rows that are at most 1440 columns wide. Each column stores a
single int.
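
As a rough sense of scale for this data model, the on-disk size of one full row can be estimated with a back-of-envelope sketch. Everything numeric here other than the 1440 columns is an assumption: the 15-byte per-column overhead (name length, flags, timestamp, value length) is what the 0.8-era column format is generally understood to use, the 4-byte column names and values are guesses, and `estimate_row_bytes` is a hypothetical helper, not part of any Cassandra tooling.

```python
# Assumed per-column overhead in the 0.8-era storage format:
# 2 (name length) + 1 (flags) + 8 (timestamp) + 4 (value length) bytes.
COLUMN_OVERHEAD_BYTES = 15  # assumption, adjust for your version


def estimate_row_bytes(columns, name_bytes, value_bytes,
                       overhead=COLUMN_OVERHEAD_BYTES):
    """Rough serialized size of one row, ignoring row-level headers and indexes."""
    return columns * (overhead + name_bytes + value_bytes)


# 1440 columns comes from the post; 4-byte names and values are assumptions.
row_bytes = estimate_row_bytes(columns=1440, name_bytes=4, value_bytes=4)
print(f"approx. bytes per full row: {row_bytes}")
```

Under these assumptions a full row is roughly 33 KB before index, bloom filter, and replication overhead; multiplying by the row count and replication factor gives a cluster-wide estimate.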

===General Problem===

I don't know what 'normal' is in Cassandra. The docs use terms like 'large'
or 'wide' rows, but I don't have any absolute numbers to attach to those
adjectives. I don't know if storing millions of rows in 5 nodes is unusual
(maybe people scale out before they get to this size). Etc.

There are plenty of people here who have an idea of what's normal for their
cluster, but only a very few who know what is normal for Cassandra in
general.

I would love, *love*, to have a document that highlighted this.

Heck I'd love to help build a 'performance calculator' in which you could
put in the number of nodes, and it would tell you how much data it would be
reasonable to store. (Yes I know there are a ton of variables involved.)

Thanks for any light that can be shed on my specific question or the general
problem.

Ian

RE: Specific Question, General Problem

Posted by Dan Hendry <da...@gmail.com>.
> I'm seeing read times of 2-300ms for getting a single row. This seems slow,
> but is it unusual?

What do those numbers represent? Is 2 ms the average and 300 ms a 95th/99th
percentile, or is 300 ms a value you saw once? Yes, this *seems* slow given
your row definition, but without knowing what the values represent it's almost
impossible to judge. I would say it's not unthinkable to see that on occasion
with m1.larges, EBS drives, and a request at a high consistency level.
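
To make the distinction concrete, here is a minimal sketch of how an average and tail percentiles can diverge for the same set of reads. The sample latencies are made up for illustration (not measurements from any cluster), and `percentile` and `summarize` are hypothetical helpers using the nearest-rank method.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the mean alongside median and tail percentiles."""
    return {
        "mean": statistics.mean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }


# Made-up sample: 97 fast reads at 2 ms and 3 slow reads at 300 ms.
samples = [2.0] * 97 + [300.0] * 3
summary = summarize(samples)
for name, value in summary.items():
    print(f"{name}: {value:.2f} ms")
```

With this made-up sample the mean is about 11 ms, the median and 95th percentile are 2 ms, and the 99th percentile is 300 ms, so "2-300ms" could describe a mostly healthy cluster with an occasional slow read.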

 

> I don't know if storing millions of rows in 5 nodes is unusual

Generally, I would say that is a pretty reasonable number, but you should
look at other factors to investigate read latency and measure your overall
capacity. What does JVM heap memory/CPU look like (via JMX)? What do your
disk and system utilization look like (good overview, particularly on
iostat: http://spyced.blogspot.com/2010/01/linux-performance-basics.html)?
Although less important as of 0.8, are you having GC problems causing long
application pause times (enable the GC logs in cassandra-env.sh)?

I feel your frustration with not knowing what is 'expected', but a
performance calculator seems ambitious to me because, as you pointed out,
there are a significant number of important and often hard-to-define
parameters.

Dan

 
