You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Josep Blanquer <bl...@rightscale.com> on 2012/01/18 02:44:49 UTC

poor Memtable performance on column slices?

Hi,

 I've been doing some tests using wide rows recently, and I've seen some
odd performance problems that I'd like to understand.

In particular, I've seen that the time it takes for Cassandra to perform a
column slice of a single key, solely in a Memtable, seems to be very
expensive, but most importantly proportional to the ordered position where
the start column of the slice lives.

In other words:
 1- if I start Cassandra fresh (with an empty ColumnFamily with TimeUUID
comparator)
 2- I create a single Row with Key "K"
 3- Then add 200K TimeUUID columns to key "K"
 4- (and make sure nothing is flushed to SSTables...so it's all in the
Memtable)

...I observe the following timings (secondds to perform 1000 reads) while
performing multiget slices on it:  (pardon the pseudo-code, but you'll get
the gist)

a) simply a get of the first column:  GET("K",:count=>1)
  --  2.351226

b) doing a slice get, starting from the first column:  GET("K",:start =>
'144abe16-416c-11e1-9e23-2cbae9ddfe8b' , :count => 1 )
  -- 2.189224   <<- so with or without "start" doesn't seem to make much of
a difference

c) doing a slice get, starting from the middle of the ordered
columns..approx starting at item number 100K:   GET("K",:start =>
'9c13c644-416c-11e1-81dd-4ba530dc83d0' , :count => 1 )
 -- 11.849326  <<- 5 times more expensive if the start of the slice is 100K
positions away

d) doing a slice get, starting from the last of the ordered columns..approx
position 200K:   GET("K",:start => '1c1b9b32-416d-11e1-83ff-dd2796c3abd7' ,
:count => 1 )
  -- 19.889741   <<- Almost twice as expensive than starting the slice at
position 100K, and 10 times more expensive than starting from the first one

This behavior leads me to believe that there's a clear Memtable column scan
for the columns of the key.
If one tries a column name read on those positions (i.e., not a slice), the
performance is constant. I.e., GET("K",
'144abe16-416c-11e1-9e23-2cbae9ddfe8b') . Retrieving the first, middle or
last timeUUID is done in the same amount of time.

Having increasingly worse performance for column slices in Memtables seems
to be a bit of a problem...aren't Memtables backed by a structure that has
some sort of column name indexing?...so that landing on the start column
can be efficient? I'm definitely observing very high CPU utilization on
those scans...By the way, with wide columns like this, slicing SSTables is
quite faster than slicing Memtables...I'm attributing that to the sampled
index of the SSTables, hence that's why I'm wondering if the Memtables do
not have such column indexing builtin and resort to linked lists of sort....

Note, that the actual timings shown are not important, it's in my laptop
and I have a small amount of debugging enabled...what it is important is
the difference between then.

I'm using Cassandra trunk as of Dec 1st, but I believe I've done
experiments with 0.8 series too, leading to the same issue.

 Cheers,

Josep M.

Re: poor Memtable performance on column slices?

Posted by Sylvain Lebresne <sy...@datastax.com>.

On Thu, Jan 19, 2012 at 3:54 AM, Josep Blanquer <bl...@rightscale.com> wrote:
>
>
> On Wed, Jan 18, 2012 at 12:44 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> On Wed, Jan 18, 2012 at 12:31 PM, Josep Blanquer
>> <bl...@rightscale.com> wrote:
>> > If I do a slice without a start (i.e., get me the first column)...it
>> > seems
>> > to fly. GET("K", :count => 1 )
>>
>> Yep, that's a totally different code path (SimpleSliceReader instead
>> of IndexedSliceReader) that we've done to optimize this common case.
>>
>
> Thanks Jonathan, yup, that makes sense. It was surprising to me that
> "avoiding the seek" was that much faster..but I guess if it's a completely
> different code path, there might be many other things in play.
>
>>
>> > The same starting at the last one.  GET("K",:start
>> > => '1c1b9b32-416d-11e1-83ff-dd2796c3abd7' , :count => 1 )
>> > -- 6.489683  -> Much faster than any other slice ... although not quite
>> > as
>> > fast as not using a start column
>>
>> That's not a special code path, but I'd guess that the last column is
>> more likely to be still in memory instead of on disk.
>>
>
> Well, no need to prolong the thread, but my tests are exclusively in
> Memtable reads (data has not flushed)...so there's no SSTable read involved
> here...which is exactly why is felt a bit funny to have that case be
> considerably faster. I just wanted to bring it up to you guys, in case you
> can think of some cause and/or potential issue.

That's not necessarily surprising. Any algorithm that search an
element in a sorted collection (which is exactly what we do here) is
likely to return some elements more quickly than others (which one
exactly depends on the algorithm detail). And since what you do is
basically micro-benchmarking that search algorithm, your results are
not necessarily surprising. For info, trunk internally uses
https://github.com/nbronson/snaptree/ as it's backing map for rows in
memtable. It's basically a balanced binary tree, so the search for the
start is basically a binary search.

--
Sylvain

Re: poor Memtable performance on column slices?

Posted by Josep Blanquer <bl...@rightscale.com>.

On Wed, Jan 18, 2012 at 12:44 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Wed, Jan 18, 2012 at 12:31 PM, Josep Blanquer
> <bl...@rightscale.com> wrote:
> > If I do a slice without a start (i.e., get me the first column)...it
> seems
> > to fly. GET("K", :count => 1 )
>
> Yep, that's a totally different code path (SimpleSliceReader instead
> of IndexedSliceReader) that we've done to optimize this common case.
>
>
Thanks Jonathan, yup, that makes sense. It was surprising to me that
"avoiding the seek" was that much faster..but I guess if it's a completely
different code path, there might be many other things in play.


> > The same starting at the last one.  GET("K",:start
> > => '1c1b9b32-416d-11e1-83ff-dd2796c3abd7' , :count => 1 )
> > -- 6.489683  -> Much faster than any other slice ... although not quite
> as
> > fast as not using a start column
>
> That's not a special code path, but I'd guess that the last column is
> more likely to be still in memory instead of on disk.
>
>
Well, no need to prolong the thread, but my tests are exclusively in
Memtable reads (data has not flushed)...so there's no SSTable read involved
here...which is exactly why is felt a bit funny to have that case be
considerably faster. I just wanted to bring it up to you guys, in case you
can think of some cause and/or potential issue.

Thanks for the responses!

Josep M.

Re: poor Memtable performance on column slices?

Posted by Jonathan Ellis <jb...@gmail.com>.

On Wed, Jan 18, 2012 at 12:31 PM, Josep Blanquer
<bl...@rightscale.com> wrote:
> If I do a slice without a start (i.e., get me the first column)...it seems
> to fly. GET("K", :count => 1 )

Yep, that's a totally different code path (SimpleSliceReader instead
of IndexedSliceReader) that we've done to optimize this common case.

> The same starting at the last one.  GET("K",:start
> => '1c1b9b32-416d-11e1-83ff-dd2796c3abd7' , :count => 1 )
> -- 6.489683  -> Much faster than any other slice ... although not quite as
> fast as not using a start column

That's not a special code path, but I'd guess that the last column is
more likely to be still in memory instead of on disk.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: poor Memtable performance on column slices?

Posted by Josep Blanquer <bl...@rightscale.com>.

Excellent Sylvain! Yes, that seems to remove the linear scan component of
slice read times.

FYI, I still see some interesting difference in some aspects though.

If I do a slice without a start (i.e., get me the first column)...it seems
to fly. GET("K", :count => 1 )
-- 4.832877  -->> very fast, and actually in this case I see the reading
client being the bottleneck, not cassandra (which it is at about 20% CPU
only)

If I do the same, but actually specifying the start column with the first
existing value...GET("K",:start => '144abe16-416c-11e1-9e23-2cbae9ddfe8b' ,
:count => 1 )
-- 11.084275 -->> half as fast, and using twice the CPU...hovering about
50% or more. (again Cassandra is not the bottleneck, but the significant
data is that the initial seeking seems to be doubling the time/cpu

If I do the same, starting by the middle.  GET("K",:start
=> '9c13c644-416c-11e1-81dd-4ba530dc83d0' , :count => 1 )
-- 11.038187  --> as expensive as starting from the beginning

The same starting at the last one.  GET("K",:start
=> '1c1b9b32-416d-11e1-83ff-dd2796c3abd7' , :count => 1 )
-- 6.489683  -> Much faster than any other slice ... although not quite as
fast as not using a start column

I could see that not having to seek into whatever backing "map/structure"
is obviously faster...although I'm surprised that seeking to an initial
value results in half as slow reads. Wouldn't this mostly imply following
some links/pointers in memory to start reading ordered columns? What is the
backing store used for Memtables when column slices are performed?

I am not sure why starting at the end (without reversing or anything)
yields much better performance.

 Cheers,

Josep M.

On Wed, Jan 18, 2012 at 12:57 AM, Sylvain Lebresne <sy...@datastax.com>wrote:

> On Wed, Jan 18, 2012 at 2:44 AM, Josep Blanquer <bl...@rightscale.com>
> wrote:
> > Hi,
> >
> >  I've been doing some tests using wide rows recently, and I've seen some
> odd
> > performance problems that I'd like to understand.
> >
> > In particular, I've seen that the time it takes for Cassandra to perform
> a
> > column slice of a single key, solely in a Memtable, seems to be very
> > expensive, but most importantly proportional to the ordered position
> where
> > the start column of the slice lives.
> >
> > In other words:
> >  1- if I start Cassandra fresh (with an empty ColumnFamily with TimeUUID
> > comparator)
> >  2- I create a single Row with Key "K"
> >  3- Then add 200K TimeUUID columns to key "K"
> >  4- (and make sure nothing is flushed to SSTables...so it's all in the
> > Memtable)
> >
> > ...I observe the following timings (secondds to perform 1000 reads) while
> > performing multiget slices on it:  (pardon the pseudo-code, but you'll
> get
> > the gist)
> >
> > a) simply a get of the first column:  GET("K",:count=>1)
> >   --  2.351226
> >
> > b) doing a slice get, starting from the first column:  GET("K",:start =>
> > '144abe16-416c-11e1-9e23-2cbae9ddfe8b' , :count => 1 )
> >   -- 2.189224   <<- so with or without "start" doesn't seem to make much
> of
> > a difference
> >
> > c) doing a slice get, starting from the middle of the ordered
> > columns..approx starting at item number 100K:   GET("K",:start =>
> > '9c13c644-416c-11e1-81dd-4ba530dc83d0' , :count => 1 )
> >  -- 11.849326  <<- 5 times more expensive if the start of the slice is
> 100K
> > positions away
> >
> > d) doing a slice get, starting from the last of the ordered
> columns..approx
> > position 200K:   GET("K",:start
> => '1c1b9b32-416d-11e1-83ff-dd2796c3abd7' ,
> > :count => 1 )
> >   -- 19.889741   <<- Almost twice as expensive than starting the slice at
> > position 100K, and 10 times more expensive than starting from the first
> one
> >
> > This behavior leads me to believe that there's a clear Memtable column
> scan
> > for the columns of the key.
> > If one tries a column name read on those positions (i.e., not a slice),
> the
> > performance is constant. I.e., GET("K",
> > '144abe16-416c-11e1-9e23-2cbae9ddfe8b') . Retrieving the first, middle or
> > last timeUUID is done in the same amount of time.
> >
> > Having increasingly worse performance for column slices in Memtables
> seems
> > to be a bit of a problem...aren't Memtables backed by a structure that
> has
> > some sort of column name indexing?...so that landing on the start column
> can
> > be efficient? I'm definitely observing very high CPU utilization on those
> > scans...By the way, with wide columns like this, slicing SSTables is
> quite
> > faster than slicing Memtables...I'm attributing that to the sampled
> index of
> > the SSTables, hence that's why I'm wondering if the Memtables do not have
> > such column indexing builtin and resort to linked lists of sort....
> >
> > Note, that the actual timings shown are not important, it's in my laptop
> and
> > I have a small amount of debugging enabled...what it is important is the
> > difference between then.
> >
> > I'm using Cassandra trunk as of Dec 1st, but I believe I've done
> experiments
> > with 0.8 series too, leading to the same issue.
>
> You may want to retry your experiments on current trunk. We do had
> inefficiency
> in our memtable search that was fixed by:
> https://issues.apache.org/jira/browse/CASSANDRA-3545
> (the name of the ticket doesn't make it clear that it's related but it is)
>
> The issue was committed on December 8.
>
> --
> Sylvain
>
> >
> >  Cheers,
> >
> > Josep M.
>

Re: poor Memtable performance on column slices?

Posted by Sylvain Lebresne <sy...@datastax.com>.

On Wed, Jan 18, 2012 at 2:44 AM, Josep Blanquer <bl...@rightscale.com> wrote:
> Hi,
>
>  I've been doing some tests using wide rows recently, and I've seen some odd
> performance problems that I'd like to understand.
>
> In particular, I've seen that the time it takes for Cassandra to perform a
> column slice of a single key, solely in a Memtable, seems to be very
> expensive, but most importantly proportional to the ordered position where
> the start column of the slice lives.
>
> In other words:
>  1- if I start Cassandra fresh (with an empty ColumnFamily with TimeUUID
> comparator)
>  2- I create a single Row with Key "K"
>  3- Then add 200K TimeUUID columns to key "K"
>  4- (and make sure nothing is flushed to SSTables...so it's all in the
> Memtable)
>
> ...I observe the following timings (secondds to perform 1000 reads) while
> performing multiget slices on it:  (pardon the pseudo-code, but you'll get
> the gist)
>
> a) simply a get of the first column:  GET("K",:count=>1)
>   --  2.351226
>
> b) doing a slice get, starting from the first column:  GET("K",:start =>
> '144abe16-416c-11e1-9e23-2cbae9ddfe8b' , :count => 1 )
>   -- 2.189224   <<- so with or without "start" doesn't seem to make much of
> a difference
>
> c) doing a slice get, starting from the middle of the ordered
> columns..approx starting at item number 100K:   GET("K",:start =>
> '9c13c644-416c-11e1-81dd-4ba530dc83d0' , :count => 1 )
>  -- 11.849326  <<- 5 times more expensive if the start of the slice is 100K
> positions away
>
> d) doing a slice get, starting from the last of the ordered columns..approx
> position 200K:   GET("K",:start => '1c1b9b32-416d-11e1-83ff-dd2796c3abd7' ,
> :count => 1 )
>   -- 19.889741   <<- Almost twice as expensive than starting the slice at
> position 100K, and 10 times more expensive than starting from the first one
>
> This behavior leads me to believe that there's a clear Memtable column scan
> for the columns of the key.
> If one tries a column name read on those positions (i.e., not a slice), the
> performance is constant. I.e., GET("K",
> '144abe16-416c-11e1-9e23-2cbae9ddfe8b') . Retrieving the first, middle or
> last timeUUID is done in the same amount of time.
>
> Having increasingly worse performance for column slices in Memtables seems
> to be a bit of a problem...aren't Memtables backed by a structure that has
> some sort of column name indexing?...so that landing on the start column can
> be efficient? I'm definitely observing very high CPU utilization on those
> scans...By the way, with wide columns like this, slicing SSTables is quite
> faster than slicing Memtables...I'm attributing that to the sampled index of
> the SSTables, hence that's why I'm wondering if the Memtables do not have
> such column indexing builtin and resort to linked lists of sort....
>
> Note, that the actual timings shown are not important, it's in my laptop and
> I have a small amount of debugging enabled...what it is important is the
> difference between then.
>
> I'm using Cassandra trunk as of Dec 1st, but I believe I've done experiments
> with 0.8 series too, leading to the same issue.

You may want to retry your experiments on current trunk. We do had inefficiency
in our memtable search that was fixed by:
https://issues.apache.org/jira/browse/CASSANDRA-3545
(the name of the ticket doesn't make it clear that it's related but it is)

The issue was committed on December 8.

--
Sylvain

>
>  Cheers,
>
> Josep M.