Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2011/01/19 09:34:48 UTC

[jira] Created: (CASSANDRA-2006) Serverwide caps on memtable thresholds

Serverwide caps on memtable thresholds
--------------------------------------

                 Key: CASSANDRA-2006
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2006
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Stu Hood
             Fix For: 0.8


By storing global operation and throughput thresholds, we could eliminate the "many small memtables" problem caused by having many CFs. The global threshold would be set in the config file, to allow different classes of servers to have different values configured.

Operations occurring in the memtable would add to the global counters, in addition to the memtable-local counters. When a global threshold was violated, the memtable in the system that was using the largest fraction of its local threshold would be flushed. Local thresholds would continue to act as they always have.

The result would be larger sstables, safer operation with multiple CFs, and per-node tuning.
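As a rough illustration of the proposed selection rule (all names here are hypothetical, not from any actual patch), the flush decision could look like:

```java
import java.util.List;

// Hypothetical sketch: when a global threshold is exceeded, flush the
// memtable that is using the largest fraction of its own local threshold.
class FlushSelector {
    static class MemtableStats {
        final String name;
        final long usedBytes;
        final long localThresholdBytes;
        MemtableStats(String name, long usedBytes, long localThresholdBytes) {
            this.name = name;
            this.usedBytes = usedBytes;
            this.localThresholdBytes = localThresholdBytes;
        }
        double fractionUsed() {
            return (double) usedBytes / localThresholdBytes;
        }
    }

    // Returns the memtable to flush, or null if the global cap is not exceeded.
    static MemtableStats pickVictim(List<MemtableStats> memtables, long globalCapBytes) {
        long total = 0;
        for (MemtableStats m : memtables)
            total += m.usedBytes;
        if (total <= globalCapBytes)
            return null;
        MemtableStats victim = null;
        for (MemtableStats m : memtables)
            if (victim == null || m.fractionUsed() > victim.fractionUsed())
                victim = m;
        return victim;
    }
}
```

Local thresholds would still apply independently; this only covers the global-cap path.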

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (CASSANDRA-2006) Serverwide caps on memtable thresholds

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015909#comment-13015909 ] 

Jonathan Ellis commented on CASSANDRA-2006:
-------------------------------------------

The count can take minutes on a large (GB) memtable under load. We probably don't want to block flush for that long.

> Serverwide caps on memtable thresholds
> --------------------------------------
>
>                 Key: CASSANDRA-2006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2006
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 2006-v2.txt, 2006.txt, jamm-0.2.jar
>
>

[jira] [Updated] (CASSANDRA-2006) Serverwide caps on memtable thresholds

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2006:
--------------------------------------

    Attachment: 2006-v3.txt

v3 adds warning messages if the computed ratio falls outside of 1 < r < 64.

> Serverwide caps on memtable thresholds
> --------------------------------------
>
>                 Key: CASSANDRA-2006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2006
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 2006-v2.txt, 2006-v3.txt, 2006.txt, jamm-0.2.jar
>
>

[jira] [Updated] (CASSANDRA-2006) Serverwide caps on memtable thresholds

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2006:
--------------------------------------

    Attachment: 2006.txt

Patch that optionally creates a global heap usage threshold and tries to keep total memtable size under that.

The two main points of interest are Memtable.updateLiveRatio and MeteredFlusher.

MeteredFlusher is what checks memory usage (once per second) and kicks off the flushes. Note that naively flushing when we hit the threshold is wrong, since you can have multiple memtables in-flight during the flush process. To address this, we track inactive but unflushed memtables and include those in our total. We also aggressively flush any memtable that reaches the level of "if my entire flush pipeline were full of memtables of this size, how big could I allow them to be."
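A simplified sketch of the accounting described above (illustrative names and heuristics; the real MeteredFlusher differs in detail):

```java
// Hypothetical sketch of the MeteredFlusher accounting: count active plus
// inactive-but-unflushed memtables, and cap any single memtable so that a
// full flush pipeline of memtables that size stays under the global threshold.
class MeteredFlusherSketch {
    final long totalThresholdBytes;
    final int flushPipelineDepth; // max memtables that can be in flight at once

    MeteredFlusherSketch(long totalThresholdBytes, int flushPipelineDepth) {
        this.totalThresholdBytes = totalThresholdBytes;
        this.flushPipelineDepth = flushPipelineDepth;
    }

    // Inactive-but-unflushed memtables must be counted too, since several
    // can be in flight during the flush process.
    boolean overGlobalThreshold(long activeBytes, long unflushedBytes) {
        return activeBytes + unflushedBytes > totalThresholdBytes;
    }

    // Aggressively flush any single memtable large enough that a pipeline
    // full of memtables this size would exceed the global threshold.
    boolean overPerMemtableCap(long memtableBytes) {
        return memtableBytes > totalThresholdBytes / flushPipelineDepth;
    }
}
```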

Since counting each object's size is far too slow to be useful directly, we compute the ratio of serialized size to memory size in the background and update it periodically; that is what updateLiveRatio does. MeteredFlusher then bases its work on the actual serialized size, multiplied by this ratio.
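The ratio-based estimate might be sketched as follows (hypothetical names; not the actual Memtable code):

```java
// Hypothetical sketch of the live-ratio estimate: memory use is approximated
// as serialized size times a periodically recomputed ratio, rather than
// walking every object on each check.
class LiveRatioSketch {
    private volatile double liveRatio = 10.0; // conservative starting guess

    // Called periodically in the background with a measured heap size
    // (e.g. from jamm) and the memtable's tracked serialized size.
    void updateLiveRatio(long measuredHeapBytes, long serializedBytes) {
        if (serializedBytes > 0)
            liveRatio = (double) measuredHeapBytes / serializedBytes;
    }

    // Cheap estimate used by the flusher on every pass.
    long estimatedLiveSize(long serializedBytes) {
        return (long) (serializedBytes * liveRatio);
    }
}
```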

One last note: the config code is a little messy because we want to leave behavior unchanged (i.e., only use the old per-CF thresholds) if the setting is absent, as it would be for an upgrader. But we also want a setting that means "pick a reasonable default based on heap usage"; hence the distinction between null (off) and -1 (autocompute).
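The null-versus--1 distinction could be resolved along these lines (the method name and the auto-computed default are illustrative assumptions, not the patch's actual values):

```java
// Hypothetical sketch: an absent setting (null) leaves the old per-CF
// behavior untouched, while -1 asks for a default derived from the heap size.
class MemtableSpaceConfig {
    static long resolveTotalSpaceBytes(Integer configuredMb, long maxHeapBytes) {
        if (configuredMb == null)
            return 0;                  // feature off: per-CF thresholds only
        if (configuredMb == -1)
            return maxHeapBytes / 3;   // illustrative auto-computed default
        return configuredMb * 1024L * 1024L;
    }
}
```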

I tested by creating the stress schema, then modifying the per-CF settings to be multiple TB, so only the new global flusher affects things. Then I created half a GB of commitlog files to replay -- CL replay hammers it much harder than even stress.java.

It was successful in preventing OOM (or even the "emergency flushing" at 85% of heap), but heap usage as reported by CMS was consistently about 25% higher than what MeteredFlusher thought it should be. We may be able to add a fudge factor for this; otherwise, tuning by watching CMS vs. the estimated size and adjusting the setting manually to compensate is still much easier than the status quo of per-CF tuning.

To experiment, I recommend also patching the log4j settings as follows:

{noformat}
Index: conf/log4j-server.properties
===================================================================
--- conf/log4j-server.properties	(revision 1085010)
+++ conf/log4j-server.properties	(working copy)
@@ -35,7 +35,8 @@
 log4j.appender.R.File=/var/log/cassandra/system.log
 
 # Application logging options
-#log4j.logger.org.apache.cassandra=DEBUG
+log4j.logger.org.apache.cassandra.service.GCInspector=DEBUG
+log4j.logger.org.apache.cassandra.db.MeteredFlusher=DEBUG
 #log4j.logger.org.apache.cassandra.db=DEBUG
 #log4j.logger.org.apache.cassandra.service.StorageProxy=DEBUG
{noformat}


> Serverwide caps on memtable thresholds
> --------------------------------------
>
>                 Key: CASSANDRA-2006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2006
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 2006.txt
>
>

[jira] [Commented] (CASSANDRA-2006) Serverwide caps on memtable thresholds

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017609#comment-13017609 ] 

Hudson commented on CASSANDRA-2006:
-----------------------------------

Integrated in Cassandra #838 (See [https://hudson.apache.org/hudson/job/Cassandra/838/])
    

> Serverwide caps on memtable thresholds
> --------------------------------------
>
>                 Key: CASSANDRA-2006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2006
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 2006-v2.txt, 2006-v3.txt, 2006.txt, jamm-0.2.jar
>
>

[jira] [Commented] (CASSANDRA-2006) Serverwide caps on memtable thresholds

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016714#comment-13016714 ] 

Stu Hood commented on CASSANDRA-2006:
-------------------------------------

+1
I'm very excited about this change. My last nitpick is that the flush_largest_memtables_at and memtable_total_space_in_mb settings could be made more consistent. At the absolute minimum, they should refer to one another in the config file, but I'm also wondering how we might unify the three or four different reasons for flushing in our monitoring/logging.

Also, we should start making a plan to deprecate the per-CF settings, or convert them into fractions as mentioned above.

> Serverwide caps on memtable thresholds
> --------------------------------------
>
>                 Key: CASSANDRA-2006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2006
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: 2006-v2.txt, 2006-v3.txt, 2006.txt, jamm-0.2.jar
>
>

[jira] [Commented] (CASSANDRA-2006) Serverwide caps on memtable thresholds

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015779#comment-13015779 ] 

Stu Hood commented on CASSANDRA-2006:
-------------------------------------

Since the only consumer of the getLiveSize value is the MeteredFlusher thread, could the live-size update be moved into that thread instead? Adding a new executor and two new tasks seems like overkill, and it looks like it would remove the "potentially a flushed memtable being counted by jamm" fuzziness.

> Serverwide caps on memtable thresholds
> --------------------------------------
>
>                 Key: CASSANDRA-2006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2006
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 2006-v2.txt, 2006.txt, jamm-0.2.jar
>
>

[jira] Commented: (CASSANDRA-2006) Serverwide caps on memtable thresholds

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008087#comment-13008087 ] 

Stu Hood commented on CASSANDRA-2006:
-------------------------------------

HBase accomplishes this by keeping a threshold on heap usage for memtables and flushing the largest one when the threshold is crossed, similar to the safety threshold that jbellis added recently.

> Serverwide caps on memtable thresholds
> --------------------------------------
>
>                 Key: CASSANDRA-2006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2006
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>

[jira] [Updated] (CASSANDRA-2006) Serverwide caps on memtable thresholds

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2006:
--------------------------------------

    Attachment: jamm-0.2.jar

Requires JAMM (see https://github.com/jbellis/jamm/ and CASSANDRA-2203)

> Serverwide caps on memtable thresholds
> --------------------------------------
>
>                 Key: CASSANDRA-2006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2006
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 2006.txt, jamm-0.2.jar
>
>

[jira] Commented: (CASSANDRA-2006) Serverwide caps on memtable thresholds

Posted by "David Boxenhorn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984118#action_12984118 ] 

David Boxenhorn commented on CASSANDRA-2006:
--------------------------------------------

The way I see it (and it becomes even more necessary in a multi-tenant configuration), there should be three completely separate levels of configuration.

- Column family configuration is based on data and usage characteristics of column families in your application.
- Keyspace configuration enables the modularization of your application (and in a multi-tenant environment, you can assign a keyspace to a tenant).
- Server configuration is based on the specific hardware limitations of the server.

Server configuration takes priority over keyspace configuration, which takes priority over column family configuration.

Looking at it from the inverse perspective:

- CF configuration tunes access to your CFs.
- Keyspace configuration protects one module of your application from problems in the other modules.
- Server configuration protects the server as a whole from going beyond the limits of its hardware.
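One way to read this precedence (purely illustrative; not an actual Cassandra API) is that the tightest applicable cap wins, with higher levels acting as hard backstops:

```java
// Hypothetical sketch of three-level caps: each level may be unset (null),
// and the effective memtable cap is the tightest of CF, keyspace, and
// server limits, so the server-level backstop can never be exceeded.
class ConfigLevels {
    static long effectiveCapBytes(Long cfCap, Long keyspaceCap, Long serverCap) {
        long cap = Long.MAX_VALUE;
        if (cfCap != null)       cap = Math.min(cap, cfCap);
        if (keyspaceCap != null) cap = Math.min(cap, keyspaceCap);
        if (serverCap != null)   cap = Math.min(cap, serverCap);
        return cap;
    }
}
```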

> Serverwide caps on memtable thresholds
> --------------------------------------
>
>                 Key: CASSANDRA-2006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2006
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>


[jira] Commented: (CASSANDRA-2006) Serverwide caps on memtable thresholds

Posted by "David Boxenhorn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986286#action_12986286 ] 

David Boxenhorn commented on CASSANDRA-2006:
--------------------------------------------

I guess that what my suggestion means, in practice, is that "the memtable in the system that was using the largest fraction of it's local threshold would be flushed" would be applied when a keyspace threshold is exceeded, rather than when a system threshold is exceeded.

When a server threshold is exceeded, you would first look for the keyspace that is using the largest fraction of its threshold, then flush the memtable in that keyspace that is using the largest fraction of its local threshold.
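A sketch of that two-level selection (hypothetical types and names; not from any patch):

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the two-level selection described above: first pick
// the keyspace using the largest fraction of its threshold, then flush the
// memtable within it using the largest fraction of its own local threshold.
class TwoLevelFlushSelector {
    record Memtable(String name, long used, long threshold) {
        double fraction() { return (double) used / threshold; }
    }
    record Keyspace(String name, long threshold, List<Memtable> memtables) {
        long used() { return memtables.stream().mapToLong(Memtable::used).sum(); }
        double fraction() { return (double) used() / threshold; }
    }

    static Memtable pickVictim(List<Keyspace> keyspaces) {
        Keyspace ks = keyspaces.stream()
                .max(Comparator.comparingDouble(Keyspace::fraction))
                .orElseThrow();
        return ks.memtables().stream()
                .max(Comparator.comparingDouble(Memtable::fraction))
                .orElseThrow();
    }
}
```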

> Serverwide caps on memtable thresholds
> --------------------------------------
>
>                 Key: CASSANDRA-2006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2006
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>


[jira] [Updated] (CASSANDRA-2006) Serverwide caps on memtable thresholds

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2006:
--------------------------------------

    Attachment: 2006-v2.txt

v2 moves to trunk, adds a 25% fudge factor, and adds MeteredFlusherTest to demonstrate handling 100 CFs in a workload that OOMs without this feature. (Note that I disabled emergency GC-based flushing in the test config, so that it doesn't cause a false negative.)

> Serverwide caps on memtable thresholds
> --------------------------------------
>
>                 Key: CASSANDRA-2006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2006
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>         Attachments: 2006-v2.txt, 2006.txt, jamm-0.2.jar
>
>