Posted to user@cassandra.apache.org by Héctor Izquierdo Seliva <iz...@strands.com> on 2011/02/21 15:56:24 UTC

millions of columns in a row vs millions of rows with one column

Hi Everyone.

I'm testing the performance difference between millions of columns in one
row and millions of rows. So far it seems wide rows perform better for
reads, but a single row could potentially hold hundreds of millions of
columns. Is this going to be a problem? Should I go with individual rows?
I run 6 nodes on 0.7.2 with RF=3.

Thanks for your help!

Re: millions of columns in a row vs millions of rows with one column

Posted by Héctor Izquierdo Seliva <iz...@strands.com>.

On Tue, 22-02-2011 at 08:49 +1300, Aaron Morton wrote:
> My preference is to go with more rows as it distributes load better. But the best design is the one that supports your read patterns.
> 
> See http://wiki.apache.org/cassandra/LargeDataSetConsiderations for background.
> 
> Aaron
> 

Those rows are distributed among the three replicas, so my thinking was
that I could get away with it and still have a more or less balanced
cluster. Anyway, the columns I read are not contiguous, so isn't the
effect on I/O the same as having individual rows? Cassandra still has to
seek to the position of each column within the row.
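A rough model, not Cassandra's actual read path, makes the question concrete. It assumes a wide row's columns are stored sorted by name and grouped into fixed-size index blocks, and that a read fetches only the blocks containing the requested columns. The per-block column count and the column positions are hypothetical parameters. In this model, scattered columns touch about one block each, while contiguous columns share blocks.

```python
# Simplified model (assumption, not Cassandra's real read path): a wide row's
# columns are sorted by name and split into fixed-size index blocks, and a
# read only has to fetch the blocks that contain the requested columns.

def blocks_touched(column_positions, columns_per_block):
    """Return how many distinct index blocks a set of column reads touches.

    column_positions: zero-based positions of the requested columns in the
    row's sorted column order.
    columns_per_block: hypothetical number of columns per index block.
    """
    if columns_per_block <= 0:
        raise ValueError("columns_per_block must be positive")
    return len({pos // columns_per_block for pos in column_positions})


if __name__ == "__main__":
    per_block = 1000                              # hypothetical block size in columns
    contiguous = range(5000, 5100)                # 100 adjacent columns
    scattered = range(0, 100 * 50_000, 50_000)    # 100 columns far apart
    print(blocks_touched(contiguous, per_block))
    print(blocks_touched(scattered, per_block))
```

Under these assumptions, the 100 adjacent columns fall in a single block, while the 100 scattered columns need 100 separate blocks.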

How much space does the key cache use per row? Switching to individual
rows would increase the number of rows by a large factor.
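The trade-off behind the key cache question can be framed as simple arithmetic. This is a minimal sketch that assumes a fixed per-entry overhead and a cache capped at a given number of keys. All numbers are hypothetical placeholders, not measured Cassandra figures. With a few wide rows every key fits. With one row per column, the cache either grows with the row count or covers only a small fraction of the keys.

```python
# Hypothetical inputs throughout: the real per-entry overhead depends on the
# Cassandra version and JVM, and is not stated in this thread.

def key_cache_bytes(num_keys, avg_key_bytes, overhead_bytes_per_entry, max_cached_keys):
    """Approximate memory used once the key cache is full (or holds every key)."""
    cached = min(num_keys, max_cached_keys)
    return cached * (avg_key_bytes + overhead_bytes_per_entry)


def key_cache_coverage(num_keys, max_cached_keys):
    """Fraction of all row keys that can be cached at the same time."""
    return min(1.0, max_cached_keys / num_keys)


if __name__ == "__main__":
    key_bytes, overhead, capacity = 16, 100, 200_000  # hypothetical values
    for label, rows in [("wide rows", 1_000), ("one row per column", 100_000_000)]:
        print(label,
              key_cache_bytes(rows, key_bytes, overhead, capacity),
              key_cache_coverage(rows, capacity))
```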

Re: millions of columns in a row vs millions of rows with one column

Posted by Aaron Morton <aa...@thelastpickle.com>.

My preference is to go with more rows as it distributes load better. But the best design is the one that supports your read patterns.

See http://wiki.apache.org/cassandra/LargeDataSetConsiderations for background.
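The following toy model illustrates why more rows spread load better. It assumes a RandomPartitioner-style ring, where keys are placed by an MD5 hash (simplified here to the digest modulo 2**127), and SimpleStrategy-style replication, where copies go to the owning node and the next RF-1 nodes clockwise. It uses 6 nodes and RF=3, as in the original question. The evenly spaced node tokens and the key names are hypothetical. One wide row keeps all of its columns on 3 nodes, while many small rows land on all 6.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

RING = 2 ** 127  # token space used by this toy ring


def token(key: bytes) -> int:
    # Simplified RandomPartitioner-style token: MD5 digest reduced mod 2**127.
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING


def replicas(key: bytes, node_tokens: list[int], rf: int) -> list[int]:
    # The first node whose token is >= the key's token owns the key (wrapping
    # around the ring); the next rf - 1 nodes clockwise hold the other copies.
    i = bisect_left(node_tokens, token(key)) % len(node_tokens)
    return [(i + k) % len(node_tokens) for k in range(rf)]


def load(keys, node_tokens, rf) -> Counter:
    # Count how many row copies each node index ends up storing.
    counts = Counter()
    for key in keys:
        for node in replicas(key, node_tokens, rf):
            counts[node] += 1
    return counts


if __name__ == "__main__":
    nodes = [RING * n // 6 for n in range(6)]  # 6 evenly spaced, hypothetical tokens
    # The whole wide row, with all of its columns, lives on these nodes.
    wide = load([b"wide-row"], nodes, rf=3)
    skinny = load([f"row-{i}".encode() for i in range(60_000)], nodes, rf=3)
    print(sorted(wide))
    print(sorted(skinny.items()))
```

With many small rows, each node stores a roughly equal share of the copies. A single wide row puts its entire size on the same three replicas, however many columns it grows to.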

Aaron
