Posted to user@cassandra.apache.org by Sylvain Lebresne <sy...@yakaz.com> on 2010/03/09 14:15:13 UTC

Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Hello,

I've done some tests and it seems that, somehow, having many rows with few
columns each is better than having few rows with many columns each, at least
as far as read performance is concerned.
Using stress.py, on a quad-core 2.27GHz machine with 4GB of RAM and the
out-of-the-box Cassandra configuration, I inserted:

  1) 50000000 rows (that's 50 million) with 1 column each
(stress.py -n 50000000 -c 1)
  2) 500000 rows (that's 500 thousand) with 100 columns each
(stress.py -n 500000 -c 100)

That is, it ends up with 50 million columns in both cases (I use such big
numbers so that in case 2 the resulting data is big enough not to fit in
the system caches; when it does fit, the problem I describe below doesn't
show).
Those two 'tests' were done separately, with the data flushed completely
between them. Each time I let Cassandra compact everything, then shut the
server down and started it again (so that no data sits in a memtable).
Then I tried reading columns, one at a time, using the commands below (a
sketch of the equivalent Thrift-level read follows them):
  1) stress.py -t 10 -o read -n 50000000 -c 1 -r
  2) stress.py -t 10 -o read -n 500000 -c 1 -r
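
For reference, a minimal sketch of roughly what such a single-column read
amounts to at the Thrift level (a get_slice capped at one column), assuming
the 0.6-era Python bindings generated into a 'cassandra' package and the
default Keyspace1/Standard1 column family that stress.py targets; the host,
port and key here are placeholders:

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra            # 0.6-era Thrift-generated bindings (assumed layout)
    from cassandra.ttypes import (ColumnParent, SlicePredicate,
                                  SliceRange, ConsistencyLevel)

    # Connect to a local node on the default Thrift port (use TFramedTransport
    # instead if the server is configured for framed transport).
    socket = TSocket.TSocket('localhost', 9160)
    transport = TTransport.TBufferedTransport(socket)
    transport.open()
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))

    # A single-column read of one row: a get_slice capped at one column,
    # which is roughly what a '-c 1' read boils down to.
    predicate = SlicePredicate(slice_range=SliceRange(start='', finish='', count=1))
    columns = client.get_slice('Keyspace1', 'some_key',
                               ColumnParent(column_family='Standard1'),
                               predicate, ConsistencyLevel.ONE)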

In case 1) I get around 200 reads/second and that's pretty stable. The disk
is spinning like crazy (~25% io_wait), very little CPU or memory is used, and
performance is IO-bound, which is expected.
In case 2), however, it starts with reasonable performance (400+
reads/second), but it very quickly drops to an average of 80 reads/second
(after a minute and a half or so). And it doesn't go up significantly after
that. It turns out this seems to be a GC problem. Indeed, the info log (I'm
running trunk from today, but I first saw the problem on an older version of
trunk) shows lines like the following every few seconds:
  GC for ConcurrentMarkSweep: 4599 ms, 57247304 reclaimed leaving
1033481216 used; max is 1211498496
I'm not surprised that performance is bad with such GC pauses; I'm surprised
to be getting such GC pauses at all.

Note that in case 1) the resulting data 'weighs' ~14GB, while in case 2) it
'weighs' only ~2.4GB.

Let me add that I used stress.py to try to identify the problem, but I first
ran into it in an application I'm writing where I had rows with around 1000
columns of 30K each. With about 1000 rows, I had awful performance, like 5
reads/second on average. I tried switching to 1 million rows, each with 1
column of 30K, and ended up with more than 300 reads/second.

Any ideas or insights? Am I doing something utterly wrong?
Thanks in advance.

--
Sylvain

Re: Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Posted by Jesse McConnell <je...@gmail.com>.
In my experience #2 will work well up to the point where it triggers a
limitation of Cassandra (slated to be resolved in 0.7 \o/) whereby all of
the columns under a given key must be able to fit into memory. For things
like indexes of data I have opted to shard the keys for really large data
sets to get around this until it's fixed (see the sketch below).
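
A minimal sketch of that kind of key sharding, assuming you derive a shard
suffix from the column name so that one logical row is spread over a fixed
number of physical rows; the shard count, naming scheme and hash choice
below are arbitrary illustrations, not anything Cassandra provides:

    import hashlib

    NUM_SHARDS = 16  # illustrative: pick so each physical row stays comfortably small

    def shard_key(logical_key, column_name, num_shards=NUM_SHARDS):
        # The same column always maps to the same shard, so point reads and
        # writes only need to compute the shard; reading the whole logical
        # row means one slice per physical row ('bigindex:0' .. 'bigindex:15')
        # followed by a merge on the client side.
        digest = hashlib.md5(column_name.encode('utf-8')).hexdigest()
        return '%s:%d' % (logical_key, int(digest, 16) % num_shards)

    # e.g. a write for ('bigindex', 'user42') goes to the row shard_key('bigindex', 'user42')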

I suspect that if you doubled the test for #2 once or twice you'd start
seeing OOMs.

Also, #2 will end up with a lumpy distribution around the cluster, since all
the data under a given key needs to fit on one machine; #1 will spread out a
bit more evenly.

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com




Re: Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Posted by Jonathan Ellis <jb...@gmail.com>.
For the record, I note that "no row cache" is the default on
user-defined CFs; we include it in the sample configuration file as an
example only.


Re: Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Posted by Sylvain Lebresne <sy...@yakaz.com>.
> So did you disable the row cache entirely?

Yes (and I'm getting back reasonable performance).


RE: Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Posted by David Dabbs <dm...@gmail.com>.
So did you disable the row cache entirely?


Re: Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Posted by Sylvain Lebresne <sy...@yakaz.com>.
Well, I've found the reason.
The default Cassandra configuration uses a 10% row cache, and the row cache
reads the whole row each time. So it was indeed reading the full row on every
request, even though the request was asking for only one column.

My bad (at least I learned something).

--
Sylvain


Re: Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Posted by Brandon Williams <dr...@gmail.com>.
On Tue, Mar 9, 2010 at 2:28 PM, Sylvain Lebresne <sy...@yakaz.com> wrote:

> > A row causes a disk seek while columns are contiguous.  So if the row
> isn't
> > in the cache, you're being impaired by the seeks.  In general, fatter
> rows
> > should be more performant than skinny ones.
>
> Sure, I understand that. Still, I get 400 columns per second (i.e., 400 seeks
> per second) when the rows only have one column each, while I get 10 columns
> per second when the rows have 100 columns, even though I read only the first
> column.
>

Doesn't that imply the disk is having to seek further for the rows with more
columns?

-Brandon

Re: Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Posted by Sylvain Lebresne <sy...@yakaz.com>.
> A row causes a disk seek while columns are contiguous.  So if the row isn't
> in the cache, you're being impaired by the seeks.  In general, fatter rows
> should be more performant than skinny ones.

Sure, I understand that. Still, I get 400 columns per second (i.e., 400 seeks
per second) when the rows only have one column each, while I get 10 columns
per second when the rows have 100 columns, even though I read only the first
column.

--
Sylvain

Re: Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Posted by Brandon Williams <dr...@gmail.com>.
On Tue, Mar 9, 2010 at 1:14 PM, Sylvain Lebresne <sy...@yakaz.com> wrote:

> I've inserted 1000 rows of 100 columns each (python stress.py -t 2 -n
> 1000 -c 100 -i 5).
> When I read, I get roughly the same number of rows per second whether I read
> the whole row (python stress.py -t 10 -n 1000 -o read -r -c 100) or only the
> first column (python stress.py -t 10 -n 1000 -o read -r -c 1), and that's
> less than 10 rows per second.
>
> So sure, when I read the whole row, that's almost 1000 columns per second,
> which is roughly 50MB/s of throughput, which is quite good. But when I read
> only the first column, I get 10 columns per second, that is 500KB/s, which is
> less good. Now, from what I've understood so far, Cassandra doesn't
> deserialize the whole row to read a single column (I'm not using
> supercolumns here), so I don't understand those numbers.
>

A row causes a disk seek while columns are contiguous.  So if the row isn't
in the cache, you're being impaired by the seeks.  In general, fatter rows
should be more performant than skinny ones.

-Brandon

Re: Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Posted by Sylvain Lebresne <sy...@yakaz.com>.
Alright,

What I'm observing shows up better with bigger columns, so I've slightly
modified the stress.py test so that it inserts columns of 50K bytes (I
attach the modified stress.py for reference, but it really just reads 50000
bytes from /dev/null and uses that as the column data; I also added a sleep
between inserts, otherwise Cassandra dies during the insertion :)).
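
A minimal sketch of that kind of loaded insert, assuming the same 0.6-era
Thrift bindings as the read sketch earlier in the thread and a 'client'
connection built the same way; the keyspace/column family names, the
os.urandom payload and the delay are illustrative choices, not what the
attached script does:

    import os
    import time
    from cassandra.ttypes import ColumnPath, ConsistencyLevel

    VALUE = os.urandom(50000)  # any ~50K blob will do; os.urandom is just convenient here

    def insert_fat_rows(client, keyspace='Keyspace1', cf='Standard1',
                        rows=1000, columns=100, delay=0.005):
        # 'rows' rows of 'columns' columns of ~50K each, with a small pause
        # between inserts so the node isn't overwhelmed during the load.
        for r in range(rows):
            key = 'row%d' % r
            for c in range(columns):
                path = ColumnPath(column_family=cf, column='col%09d' % c)
                client.insert(keyspace, key, path, VALUE,
                              int(time.time() * 1000000), ConsistencyLevel.ONE)
                time.sleep(delay)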

I'm also using 0.6-beta2 from the Cassandra website, and I've given 1.5GB
of RAM to Cassandra just in case.

I've inserted 1000 rows of 100 columns each (python stress.py -t 2 -n
1000 -c 100 -i 5).
When I read, I get roughly the same number of rows per second whether I read
the whole row (python stress.py -t 10 -n 1000 -o read -r -c 100) or only the
first column (python stress.py -t 10 -n 1000 -o read -r -c 1), and that's
less than 10 rows per second.

So sure, when I read the whole row, that's almost 1000 columns per second,
which is roughly 50MB/s of throughput, which is quite good. But when I read
only the first column, I get 10 columns per second, that is 500KB/s, which is
less good. Now, from what I've understood so far, Cassandra doesn't
deserialize the whole row to read a single column (I'm not using
supercolumns here), so I don't understand those numbers.

Plus, if I insert the same data but 'inlining' everything, that is 100000
rows of 1 column each, then I get read performance of around 400 columns per
second. Does that mean I should put columns in the same row only if every
request will read at least 40 columns at a time (40 being roughly the 400
columns/second of the one-column case divided by the 10 rows/second of the
fat-row case)?

Just to explain why I'm running these tests, let me quickly describe what
I'm trying to do. I need to store images that are geographically localized.
When I request them, I request 5 to 10 images that are geographically close
to each other. My idea is to have row keys that are the id of a delimited
region and column names that are the actual geographic positions of the
images (the column values are the image data). Each region (row) will have
from 10 to around 10000 images (columns) at most, and getting my 5-10
geographically close images then just amounts to a get_slice (see the
sketch below).
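
A rough sketch of that data model, assuming the same 0.6-era Thrift client
as in the earlier sketches; the 'Images' column family, the region id and
the position encoding are made up for illustration:

    from cassandra.ttypes import (ColumnParent, SlicePredicate,
                                  SliceRange, ConsistencyLevel)

    def nearby_images(client, region_id, start_position, count=10):
        # Columns are named by an encoded geographic position, so the column
        # family's comparator keeps them sorted by position; a slice starting
        # at the query position returns the next 'count' images in that region.
        predicate = SlicePredicate(slice_range=SliceRange(
            start=start_position,  # e.g. a zero-padded / geohash-like encoding
            finish='',             # no upper bound; rely on 'count' instead
            count=count))
        return client.get_slice('Keyspace1', region_id,
                                ColumnParent(column_family='Images'),  # hypothetical CF
                                predicate, ConsistencyLevel.ONE)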

But when I do that, I get bad read performance (4-5 rows/sec, that is 50
images max per second, and less than that on average). I get better
performance by putting one image per row, and that makes me really sad, as
it means using Cassandra as a basic key/value store without the free
sorting. And I want my free sorting :(

Thanks in advance for any explanation/help.

Cheers,
Sylvain


Re: Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Posted by Jonathan Ellis <jb...@gmail.com>.
On Tue, Mar 9, 2010 at 8:31 AM, Sylvain Lebresne <sy...@yakaz.com> wrote:
> Well, unless I'm mistaken, that's the same in my example, as in both cases I
> give stress.py the option '-c 1', which tells it to retrieve only one column
> each time, even in the case where I have 100 columns per row.

Oh.

Why would you do that? :)

Re: Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Posted by Sylvain Lebresne <sy...@yakaz.com>.
On Tue, Mar 9, 2010 at 2:52 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> By "reads" do you mean what stress.py counts (rows) or rows * columns?
>  If it is rows, then you are still actually reading more columns/s in
> case 2.

Well, unless I'm mistaken, that's the same in my example, as in both cases I
give stress.py the option '-c 1', which tells it to retrieve only one column
each time, even in the case where I have 100 columns per row.

>> And it doesn't go up significantly after that. It turns out this seems to be
>> a GC problem. Indeed, the info log (I'm running trunk from today, but I first
>> saw the problem on an older version of trunk) shows lines like the following
>> every few seconds:
>>  GC for ConcurrentMarkSweep: 4599 ms, 57247304 reclaimed leaving
>> 1033481216 used; max is 1211498496
>
> First, use the 0.6 branch, not trunk.  We're breaking stuff over there.

Fair enough, I will do the test with 0.6. But again, I saw this behavior
with a trunk from about 3 weeks ago, so I don't believe it to be something
that broke recently. But I admit I should have tried with 0.6, and I will
do it.

> What happens if you give the jvm 50% more ram?

A quick test doesn't show the problem with 50% more RAM, at least not in a
short time frame. But I'm still not convinced there is no problem; I saw
pretty weird performance with bigger columns. Let me try to come up with a
more compelling test against 0.6. I'll keep you posted, even if I'm wrong :)

> Are you using a 64-bit JVM?

yep

--
Sylvain

Re: Bad read performance: 'few rows of many columns' vs 'many rows of few columns'

Posted by Jonathan Ellis <jb...@gmail.com>.
On Tue, Mar 9, 2010 at 7:15 AM, Sylvain Lebresne <sy...@yakaz.com> wrote:
>  1) stress.py -t 10 -o read -n 50000000 -c 1 -r
>  2) stress.py -t 10 -o read -n 500000 -c 1 -r
>
> In case 1) I get around 200 reads/second and that's pretty stable. The disk
> is spinning like crazy (~25% io_wait), very little CPU or memory is used, and
> performance is IO-bound, which is expected.
> In case 2), however, it starts with reasonable performance (400+
> reads/second), but it very quickly drops to an average of 80 reads/second

By "reads" do you mean what stress.py counts (rows) or rows * columns?
 If it is rows, then you are still actually reading more columns/s in
case 2.

> And it doesn't go up significantly after that. It turns out this seems to be
> a GC problem. Indeed, the info log (I'm running trunk from today, but I first
> saw the problem on an older version of trunk) shows lines like the following
> every few seconds:
>  GC for ConcurrentMarkSweep: 4599 ms, 57247304 reclaimed leaving
> 1033481216 used; max is 1211498496

First, use the 0.6 branch, not trunk.  We're breaking stuff over there.

What happens if you give the jvm 50% more ram?

Are you using a 64-bit JVM?

-Jonathan