Posted to user@cassandra.apache.org by Andy Burgess <an...@rbsworldpay.com> on 2011/01/07 12:21:32 UTC

Question re: the use of multiple ColumnFamilies

Hi,

I have a performance problem related to my use of multiple
ColumnFamilies. Maybe there's a better way to represent my data so
that I don't hit this problem, I don't know, but as things stand, I'm
writing to each ColumnFamily at more-or-less the same rate. This means
that each ColumnFamily crosses the memtable-to-sstable threshold at
approximately the same time - but Cassandra only has one thread writing
memtables to disk, so the observed behaviour is that ColumnFamilies
wait for each other to be flushed, and will not accept new inserts
until that has happened, which leads to time-outs for both reads and
writes.

Can anyone confirm whether or not this behaviour is expected, and
suggest anything that I could do about it? This is on 0.6.6, by the way.
Patched with time-to-live code, if that makes a difference.

Thanks,
Andy.

-- 
Andy Burgess
Principal Development Engineer
Application Delivery
WorldPay Ltd.
270-289 Science Park, Milton Road
Cambridge, CB4 0WE, United Kingdom (Depot Code: 024)
Office: +44 (0)1223 706 779 | Mobile: +44 (0)7909 534 940
andy.burgess@worldpay.com


Re: Question re: the use of multiple ColumnFamilies

Posted by Andy Burgess <an...@rbsworldpay.com>.
Yes, that's exactly so. I have no need for multiple DataFileDirectories, 
except that they were a convenient way to test the hypothesis that the 
bottleneck I was experiencing - only one sstable being written to disk 
at a time - could be relieved by increasing the queue size for the 
writer (the maximum queue size turns out to be equal to the number of 
DataFileDirectories).

I haven't done so yet, but I intend to patch locally to allow direct 
configuration of the queue size rather than relying on this side effect.
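
For illustration, the shape of the patch I have in mind is roughly this 
(identifiers are mine, not the actual 0.6.6 code, and the config option 
is hypothetical - the real change would go wherever the flush writer 
executor is built):

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class FlushWriterFactory {
        // Sketch: size the flush writer's work queue from an explicit
        // setting rather than deriving it from the DataFileDirectory
        // count. Stock 0.6.6 has no such knob.
        public static ThreadPoolExecutor create(int flushQueueSize) {
            return new ThreadPoolExecutor(
                    1, 1,             // a single writer thread, as stock 0.6 uses
                    60, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>(flushQueueSize));
        }
    }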

Regards,
Andy.

On 22/01/11 00:29, Peter Schuller wrote:
>> "A number of people have experienced lose from using multiple
>> DataFileDirectories, and to my knowledge no one has experienced win
>> from doing so."
> I presume that's for disk space reasons.
>
>> Do you have an actual use case for this functionality in which you
>> experience win?
> I understood his use case to be working around
> https://issues.apache.org/jira/browse/CASSANDRA-1955
>



Re: Question re: the use of multiple ColumnFamilies

Posted by Peter Schuller <pe...@infidyne.com>.
> "A number of people have experienced lose from using multiple
> DataFileDirectories, and to my knowledge no one has experienced win
> from doing so."

I presume that's for disk space reasons.

> Do you have an actual use case for this functionality in which you
> experience win?

I understood his use case to be working around
https://issues.apache.org/jira/browse/CASSANDRA-1955

-- 
/ Peter Schuller

Re: Question re: the use of multiple ColumnFamilies

Posted by Robert Coli <rc...@digg.com>.
On 1/18/11, Andy Burgess <an...@rbsworldpay.com> wrote:
> Sorry for the delayed reply, but thanks very much - this pointed me at
> the exact problem. I found that the queue size here was equal to the
> number of configured DataFileDirectories, so a good test was to lie to
> Cassandra and claim that there were more DataFileDirectories than I
> needed.

My standard disclaimer whenever anyone mentions using multiple
DataFileDirectories:

"A number of people have experienced lose from using multiple
DataFileDirectories, and to my knowledge no one has experienced win
from doing so."

Do you have an actual use case for this functionality in which you
experience win?

=Rob

Re: Question re: the use of multiple ColumnFamilies

Posted by Peter Schuller <pe...@infidyne.com>.
> Sorry for the delayed reply, but thanks very much - this pointed me at the
> exact problem. I found that the queue size here was equal to the number of
> configured DataFileDirectories, so a good test was to lie to Cassandra and
> claim that there were more DataFileDirectories than I needed. Interestingly,
> it still only ever wrote to the first configured DataFileDirectory, but it
> certainly eliminated the problem, which I think means that for my use case
> at least, it will be good enough to patch Cassandra to introduce more
> control of the queue size.

Based on your use case as you originally stated it (some CFs written
at a slow pace that just happened to flush at the same time), that
should be enough.

(If you have some CFs being written to faster than they are flushed,
there would still be potential for one CF to hog the flush writers
unfairly.)

-- 
/ Peter Schuller

Re: Question re: the use of multiple ColumnFamilies

Posted by Andy Burgess <an...@rbsworldpay.com>.
Sorry for the delayed reply, but thanks very much - this pointed me at 
the exact problem. I found that the queue size here was equal to the 
number of configured DataFileDirectories, so a good test was to lie to 
Cassandra and claim that there were more DataFileDirectories than I 
needed. Interestingly, it still only ever wrote to the first configured 
DataFileDirectory, but it certainly eliminated the problem, which I 
think means that for my use case at least, it will be good enough to 
patch Cassandra to introduce more control of the queue size.
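
For the record, "lying" just meant padding storage-conf.xml with extra 
entries, along these lines (the paths are examples; interestingly, only 
the first directory ever actually received sstables):

    <DataFileDirectories>
        <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
        <DataFileDirectory>/var/lib/cassandra/pad1</DataFileDirectory>
        <DataFileDirectory>/var/lib/cassandra/pad2</DataFileDirectory>
        <DataFileDirectory>/var/lib/cassandra/pad3</DataFileDirectory>
    </DataFileDirectories>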

On 08/01/11 18:20, Peter Schuller wrote:
> [multiple active CFs, often triggering flush at the same time]
>
>> Can anyone confirm whether or not this behaviour is expected, and
>> suggest anything that I could do about it? This is on 0.6.6, by the way.
>> Patched with time-to-live code, if that makes a difference.
> I looked at the code (trunk though, not 0.6.6) and was a bit
> surprised. There seems to be a single shared (static) executor for each
> of the sorting and writing stages of memtable flushing (so far so
> good). But what I didn't expect was that each has a work queue of a
> size equal to the concurrency.
>
> In the case of the writer, the concurrency is the
> memtable_flush_writers option (not available in 0.6.6). For the
> sorter, it is the number of CPU cores on the system. This makes sense
> for the concurrency aspect.
>
> If my understanding is correct and I am not missing something else,
> this means that with multiple column families you should indeed expect
> this problem. The more column families, the greater the probability.
>
> What I expected to find was that each CF would be guaranteed to have
> at least one memtable in the queue before writes would block for that
> CF.
>
> Assuming the same holds true in your case on 0.6.6 (it looks to be so
> on the 0.6 branch from a quick examination), I would assume that one
> of the following is true:
>
> (1) You have more CFs actively written to than the number of CPU
> cores on your machine, so that you're waiting on flushSorter.
>    or
> (2) Your write speed is overall higher than what can be sustained by
> an sstable writer.
>
> If you are willing to patch Cassandra and do the appropriate testing,
> and are fine with the implications on heap size, you should be able to
> work around this by adjusting the size of the work queues for the
> flushSorter and flushWriter in ColumnFamilyStore.java.
>
> Note that I did not test this, so proceed with caution if you do.
>
> It will definitely mean that you will eat more heap space if you
> submit writes to the cluster faster than they are processed. So in
> particular, if you're relying on backpressure mechanisms to avoid
> causing problems when you do non-rate-limited writes to the cluster,
> the results will probably be negative.
>
> I'll file a bug about this to (1) elicit feedback if I'm wrong, and
> (2) get it fixed.
>



Re: Question re: the use of multiple ColumnFamilies

Posted by Peter Schuller <pe...@infidyne.com>.
Filed: https://issues.apache.org/jira/browse/CASSANDRA-1955

-- 
/ Peter Schuller

Re: Question re: the use of multiple ColumnFamilies

Posted by Peter Schuller <pe...@infidyne.com>.
[multiple active CFs, often triggering flush at the same time]

> Can anyone confirm whether or not this behaviour is expected, and
> suggest anything that I could do about it? This is on 0.6.6, by the way.
> Patched with time-to-live code, if that makes a difference.

I looked at the code (trunk though, not 0.6.6) and was a bit
surprised. There seems to be a single shared (static) executor for each
of the sorting and writing stages of memtable flushing (so far so
good). But what I didn't expect was that each has a work queue of a
size equal to the concurrency.

In the case of the writer, the concurrency is the
memtable_flush_writers option (not available in 0.6.6). For the
sorter, it is the number of CPU cores on the system. This makes sense
for the concurrency aspect.
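
Schematically, the construction looks like this (a paraphrase of what I
read, not the literal trunk code; as I read it, Cassandra also arranges
for a full queue to block the submitter rather than reject the task):

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class FlushSorterSketch {
        public static void main(String[] args) {
            // The work queue's capacity equals the pool's concurrency,
            // so once N sorts are running and N more are queued, any
            // further flush attempt has to wait.
            int sorterThreads = Runtime.getRuntime().availableProcessors();
            ThreadPoolExecutor flushSorter = new ThreadPoolExecutor(
                    sorterThreads, sorterThreads,
                    60, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>(sorterThreads));
            System.out.println("queue capacity = " + sorterThreads);
            flushSorter.shutdown();
        }
    }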

If my understanding is correct and I am not missing something else,
this means that with multiple column families you should indeed expect
this problem. The more column families, the greater the probability.

What I expected to find was that each CF would be guaranteed to have
at least one memtable in the queue before writes would block for that
CF.

Assuming the same holds true in your case on 0.6.6 (it looks to be so
on the 0.6 branch from a quick examination), I would assume that one
of the following is true:

(1) You have more CFs actively written to than the number of CPU
cores on your machine, so that you're waiting on flushSorter.
  or
(2) Your write speed is overall higher than what can be sustained by
an sstable writer.

If you are willing to patch Cassandra and do the appropriate testing,
and are fine with the implications on heap size, you should be able to
work around this by adjusting the size of the work queues for the
flushSorter and flushWriter in ColumnFamilyStore.java.
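
Concretely, the workaround amounts to something like this where the
executors are set up (a sketch only; the thread count and queue bound
are illustrative values, not tested ones):

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class PatchedFlushWriterSketch {
        public static void main(String[] args) {
            // Decouple the queue bound from the thread count. Every
            // queued memtable is held on the heap until written, so a
            // larger bound trades heap for fewer blocked flushes.
            int writerThreads = 1;   // stock 0.6 effectively has one writer
            int queueBound = 8;      // illustrative; pick per your heap budget
            ThreadPoolExecutor flushWriter = new ThreadPoolExecutor(
                    writerThreads, writerThreads,
                    60, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>(queueBound));
            System.out.println("writer queue capacity = " + queueBound);
            flushWriter.shutdown();
        }
    }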

Note that I did not test this, so proceed with caution if you do.

It will definitely mean that you will eat more heap space if you
submit writes to the cluster faster than they are processed. So in
particular, if you're relying on backpressure mechanisms to avoid
causing problems when you do non-rate-limited writes to the cluster,
the results will probably be negative.

I'll file a bug about this to (1) elicit feedback if I'm wrong, and
(2) get it fixed.

-- 
/ Peter Schuller