Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2011/02/08 21:35:57 UTC

[jira] Created: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Add "reduce memory usage because I tuned things poorly" feature
---------------------------------------------------------------

                 Key: CASSANDRA-2142
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2142
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Jonathan Ellis
            Assignee: Jonathan Ellis
            Priority: Minor
             Fix For: 0.7.1


Users frequently create too many columnfamilies, set the memtable thresholds too high (or adjust throughput while ignoring operations), and/or set caching thresholds too high.  Then their server OOMs and they tell their friends Cassandra sucks.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992767#comment-12992767 ] 

Brandon Williams commented on CASSANDRA-2142:
---------------------------------------------

{noformat}

             ,%,_
            %%%/,\
         _.-"%%|//%
      _.' _.-"  /%%%
  _.-'_.-" O)    \%%%
 /.\.'            \%%%
 \ /        _,     |%%%
  `"-----"~`\   _,*'\%%'   _,--""""-,%%,
             )*^     `""~~`          \%%%,
           _/                         \%%%
       _.-`/                           |%%,___
   _.-"   /      ,           ,        ,|%%   .`\
  /\     /      /             `\       \%'   \ /
  \ \ _,/      /`~-._         _,`\      \`""~~`
   `"` /-.,_ /'      `~"----"~    `\     \
       \___,'                       \.-"`/
                                     `--'
{noformat}

> Add "reduce memory usage because I tuned things poorly" feature
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-2142
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2142
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>              Labels: ponies
>             Fix For: 0.7.1
>
>         Attachments: 2142-v2.txt, 2142-v3.txt, 2142.txt
>
>
> Users frequently create too many columnfamilies, set the memtable thresholds too high (or adjust throughput while ignoring operations), and/or set caching thresholds too high.  Then their server OOMs and they tell their friends Cassandra sucks.


        

[jira] Commented: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992194#comment-12992194 ] 

Brandon Williams commented on CASSANDRA-2142:
---------------------------------------------

Part of the problem there is the heap size of 128M is ridiculously small and you can insert data quickly enough that CMS probably never gets a chance before it goes into the death spiral I'm seeing around 90% on various (1G+) heaps.  The threshold is too high though, although this may ultimately depend on how much breathing room is left in the heap (and thus the size of the heap itself.)


        

[jira] Commented: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992167#comment-12992167 ] 

Jonathan Ellis commented on CASSANDRA-2142:
-------------------------------------------

(cache-size adjustment is done only once; memtable flushing is done every time GCInspector hits that threshold.)


        

[jira] Updated: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2142:
--------------------------------------

    Attachment: 2142-v2.txt

That makes sense: 1.0-threshold needs to be enough heap remaining that the flush can complete before you finish running out and dying.

v2 splits the threshold into flush_largest_memtables_at and reduce_cache_sizes_at, and reduces the flush threshold in .yaml to 0.75 (= our default CMSInitiatingOccupancyFraction, which is the lowest it's going to do any good at.)

v2 initializes GCInspector before CL.recover so it can be useful there.

v2 also adjusts log messages to clarify that we don't flush the same MT twice.
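
To put a number on the "1.0-threshold" headroom argument above, here is a minimal Java sketch. It is not code from the patch: the class name, the method name, and the 1 GiB heap are assumptions chosen purely for illustration.

```java
// Hypothetical helper, not taken from the patch: how much heap is still free
// at the moment usage reaches a given fraction of the maximum.
final class FlushHeadroom {
    /** Bytes still free when usage sits exactly at `threshold` of `maxHeapBytes`. */
    static long headroomBytes(long maxHeapBytes, double threshold) {
        if (threshold < 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in [0, 1]");
        }
        return (long) ((1.0 - threshold) * maxHeapBytes);
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30; // assumed heap size, for illustration only
        System.out.println("0.95 -> " + headroomBytes(oneGiB, 0.95) + " bytes free");
        System.out.println("0.75 -> " + headroomBytes(oneGiB, 0.75) + " bytes free");
    }
}
```

On the assumed 1 GiB heap, moving the threshold from 0.95 to 0.75 raises the room left for a flush to complete from roughly 51 MiB to 256 MiB.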


        

[jira] Commented: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992652#comment-12992652 ] 

Brandon Williams commented on CASSANDRA-2142:
---------------------------------------------

reduce_cache_sizes_at probably needs to go to at least 0.85 to be useful, but works really well when cache sizing is the problem.  +1


        

[jira] Commented: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992191#comment-12992191 ] 

Stu Hood commented on CASSANDRA-2142:
-------------------------------------

For a 128 M heap, and a 255 M memtable, I had to tune {{reduce_heap_usage_at}} down to 0.66 to get the flushes to trigger at all.

Once they did trigger, it appeared that they frequently triggered twice, such that we ended up flushing a small memtable, like so:
{noformat}
 INFO 13:13:45,198 GC for ConcurrentMarkSweep: 694 ms, 22626024 reclaimed leaving 264991288 used; max is 403505152
 INFO 13:13:47,021 GC for ConcurrentMarkSweep: 865 ms, 20137408 reclaimed leaving 267475352 used; max is 403505152
 WARN 13:13:47,022 Flushing ColumnFamilyStore(table='org.apache.cassandra.db.Table@68e26d2e', columnFamily='Standard1') and ColumnFamilyStore(table='org.apache.cassandra.db.Table@68e26d2e', columnFamily='Standard1') to relieve memory pressure
 INFO 13:13:47,024 switching in a fresh Memtable for Standard1 at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1297199505461.log', position=122618113)
 INFO 13:13:47,025 Enqueuing flush of Memtable-Standard1@1628970020(3822960 bytes, 74960 operations)
 INFO 13:13:48,824 GC for ConcurrentMarkSweep: 802 ms, 19015584 reclaimed leaving 268597464 used; max is 403505152
 WARN 13:13:48,824 Flushing ColumnFamilyStore(table='org.apache.cassandra.db.Table@68e26d2e', columnFamily='Standard1') and ColumnFamilyStore(table='org.apache.cassandra.db.Table@68e26d2e', columnFamily='Standard1') to relieve memory pressure
 INFO 13:13:48,842 switching in a fresh Memtable for Standard1 at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1297199505461.log', position=122904861)
 INFO 13:13:48,842 Enqueuing flush of Memtable-Standard1@821726121(137445 bytes, 2695 operations)
 INFO 13:13:53,160 Completed flushing /var/lib/cassandra/data/Keyspace1/Standard1-f-8-Data.db (18975528 bytes)
 INFO 13:13:53,161 Writing Memtable-Standard1@1628970020(3822960 bytes, 74960 operations)
{noformat}

Finally, the node ended up going OOM before the end of the stress.py run.
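
As a worked check of the numbers in the log above, the sketch below divides each "used" figure by the reported max and compares it with the 0.66 setting. It assumes the trigger is a strict "used / max > threshold" comparison; the class and method names are hypothetical and not taken from the patch.

```java
// Hypothetical check, assuming the trigger is "used / max > threshold".
// The byte counts are the three post-CMS readings from the log above.
final class HeapRatioCheck {
    static double usedFraction(long usedBytes, long maxBytes) {
        return (double) usedBytes / (double) maxBytes;
    }

    static boolean overThreshold(long usedBytes, long maxBytes, double threshold) {
        return usedFraction(usedBytes, maxBytes) > threshold;
    }

    public static void main(String[] args) {
        long max = 403505152L;
        long[] usedAfterCms = {264991288L, 267475352L, 268597464L};
        double threshold = 0.66; // the value the threshold was tuned down to
        for (long used : usedAfterCms) {
            System.out.printf("used=%d fraction=%.4f over=%b%n",
                    used, usedFraction(used, max), overThreshold(used, max, threshold));
        }
    }
}
```

Under that assumption the three readings come to roughly 0.657, 0.663 and 0.666 of the max, so only the second and third exceed 0.66, which lines up with the two WARN lines that follow them in the log.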


        

[jira] Commented: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992748#comment-12992748 ] 

Matthew F. Dennis commented on CASSANDRA-2142:
----------------------------------------------

does this obviate the need for CASSANDRA-2006?


        

[jira] Updated: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2142:
--------------------------------------

    Attachment: 2142.txt

Adds StorageService.reduceHeapUsage, which flushes the largest memtables and caps cache size when GCInspector notices that heap post-full-collection is within a configurable threshold of completely full.

The threshold defaults to 100% if it is not specified in .yaml (i.e., off for people upgrading) and 95% in the sample .yaml.
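
For readers unfamiliar with the idea, "flush the largest memtable" can be sketched roughly as below. This is a simplified illustration, not the code in 2142.txt: it picks a single column family by live bytes, and the class name, the map-based input, and the column family names in main are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Simplified, hypothetical sketch: choose the column family whose memtable
// currently holds the most bytes, as a candidate for an emergency flush.
final class LargestMemtablePicker {
    /** Returns the column family with the largest live memtable, if any exist. */
    static Optional<String> largest(Map<String, Long> liveBytesByColumnFamily) {
        return liveBytesByColumnFamily.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Made-up sizes for illustration only.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("Standard1", 48_000_000L);
        sizes.put("Super1", 6_000_000L);
        largest(sizes).ifPresent(cf ->
                System.out.println("would flush " + cf + " to relieve memory pressure"));
    }
}
```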


        

[jira] Updated: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2142:
--------------------------------------

    Attachment: 2142-v3.txt

v3 makes the fraction that cache capacities are reduced to configurable, and expands the comments in the .yaml files:

{noformat}
# emergency pressure valve: each time heap usage after a full (CMS)
# garbage collection is above this fraction of the max, Cassandra will
# flush the largest memtables.
#
# Set to 1.0 to disable.  Setting this lower than
# CMSInitiatingOccupancyFraction is not likely to be useful.
#
# RELYING ON THIS AS YOUR PRIMARY TUNING MECHANISM WILL WORK POORLY:
# it is most effective under light to moderate load, or read-heavy
# workloads; under truly massive write load, it will often be too
# little, too late.
flush_largest_memtables_at: 0.75

# emergency pressure valve #2: the first time heap usage after a full
# (CMS) garbage collection is above this fraction of the max,
# Cassandra will reduce cache maximum _capacity_ to the given fraction
# of the current _size_.  Should usually be set substantially above
# flush_largest_memtables_at, since that will have less long-term
# impact on the system.
#
# Set to 1.0 to disable.  Setting this lower than
# CMSInitiatingOccupancyFraction is not likely to be useful.
reduce_cache_sizes_at: 0.9
reduce_cache_capacity_to: 0.6
{noformat}
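
The behaviour those comments describe can be sketched as follows. This is only an illustration built from the .yaml comments (flush every time usage is above the first threshold; shrink caches once, the first time usage is above the second), not the code in 2142-v3.txt. The class and method names are hypothetical.

```java
// Hypothetical sketch of the two "pressure valves" described in the .yaml
// comments. Not the patch itself; names and structure are assumptions.
final class EmergencyValves {
    private final double flushLargestMemtablesAt;
    private final double reduceCacheSizesAt;
    private final double reduceCacheCapacityTo;
    private boolean cachesReduced = false;

    EmergencyValves(double flushAt, double reduceAt, double reduceTo) {
        this.flushLargestMemtablesAt = flushAt;
        this.reduceCacheSizesAt = reduceAt;
        this.reduceCacheCapacityTo = reduceTo;
    }

    /** Valve 1: opens every time post-CMS usage is above the flush threshold. */
    boolean shouldFlush(long usedBytes, long maxBytes) {
        return (double) usedBytes / maxBytes > flushLargestMemtablesAt;
    }

    /** Valve 2: opens only the first time usage is above the cache threshold. */
    boolean shouldReduceCaches(long usedBytes, long maxBytes) {
        if (cachesReduced || (double) usedBytes / maxBytes <= reduceCacheSizesAt) {
            return false;
        }
        cachesReduced = true;
        return true;
    }

    /** New maximum capacity: the configured fraction of the cache's current size. */
    int reducedCapacity(int currentSize) {
        return (int) (currentSize * reduceCacheCapacityTo);
    }

    public static void main(String[] args) {
        EmergencyValves valves = new EmergencyValves(0.75, 0.9, 0.6); // sample .yaml values
        long max = 1L << 30; // assumed 1 GiB heap, for illustration only
        long used = (long) (0.92 * max);
        System.out.println("flush? " + valves.shouldFlush(used, max));
        System.out.println("reduce caches? " + valves.shouldReduceCaches(used, max));
        System.out.println("reduce caches again? " + valves.shouldReduceCaches(used, max));
    }
}
```

With either threshold set to 1.0 the corresponding comparison can never be true, which matches the "Set to 1.0 to disable" note.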


        

[jira] Commented: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992755#comment-12992755 ] 

Jonathan Ellis commented on CASSANDRA-2142:
-------------------------------------------

Unfortunately, no.  By the time it's CMSing you're already in trouble under heavy load.  So you'd really want 2006 too.


        

[jira] Commented: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992763#comment-12992763 ] 

Hudson commented on CASSANDRA-2142:
-----------------------------------

Integrated in Cassandra-0.7 #270 (See [https://hudson.apache.org/hudson/job/Cassandra-0.7/270/])
    add flush_largest_memtables_at and reduce_cache_sizes_at options
patch by jbellis; reviewed by brandonwilliams for CASSANDRA-2142



        

[jira] Issue Comment Edited: (CASSANDRA-2142) Add "reduce memory usage because I tuned things poorly" feature

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992194#comment-12992194 ] 

Brandon Williams edited comment on CASSANDRA-2142 at 2/8/11 9:30 PM:
---------------------------------------------------------------------

Part of the problem there is the heap size of 128M is ridiculously small and you can insert data quickly enough that CMS probably never gets a chance before it goes into the death spiral I'm seeing around 90% on various (1G+) heaps.  The threshold is too high though, although this may ultimately depend on how much breathing room is left in the heap (and thus the size of the heap itself.)

      was (Author: brandon.williams):
    Part of the problem there is the heap size of 128M is ridiculously small and you can insert data quickly enough that CMS probably never gets a chance before it goes into the death spiral I'm seen around 90% on various (1G+) heaps.  The threshold is too high though, although this may ultimately depend on how much breathing room is left in the heap (and thus the size of the heap itself.)
  