Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2010/06/22 18:45:59 UTC
[jira] Issue Comment Edited: (CASSANDRA-1014) GC storming, possible memory leak

    [ https://issues.apache.org/jira/browse/CASSANDRA-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881243#action_12881243 ] 

Jonathan Ellis edited comment on CASSANDRA-1014 at 6/22/10 12:44 PM:
---------------------------------------------------------------------

Jacob Kessler explains:

bq. Without the ExplicitGCInvokesConcurrent option, a manually invoked GC (anything that calls System.gc(), including the mbean) will invoke a full stop-the-world collection. That collection differs from the CMS collector Cassandra usually uses in the garbage it can safely collect: it collects all of it, whereas CMS skips anything that was not yet garbage at the beginning of the sweep.

bq. Off-hand, though, I'd say that those graphs make it look very much like you have a memory leak. I'd wonder if you end up holding stuff for too long in the commitlog (I don't know what that is, but changing what you do with it seems to change your memory behavior =), possibly waiting for a lull in inserts to write it or something like that? I've definitely seen cases where the pause of a full GC causes things in the program to time out and become garbage, which then at least temporarily solves the problem.
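
For anyone who wants to reproduce the manually invoked collection described above, the following is a minimal sketch, not anything taken from Cassandra itself. The class name ExplicitGcDemo and the helper explicitGcIsConcurrent are hypothetical. The flag check is a simple heuristic that only looks for the literal -XX:+ExplicitGCInvokesConcurrent argument on the command line. MemoryMXBean.gc() is documented as effectively equivalent to System.gc(), which is what the "gc" operation on the java.lang:type=Memory mbean runs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.List;

public class ExplicitGcDemo {
    /**
     * Hypothetical helper: true if the literal flag appears in the JVM arguments.
     * This is a heuristic; it does not account for a later "-XX:-" override.
     */
    static boolean explicitGcIsConcurrent(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+ExplicitGCInvokesConcurrent");
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        long before = memory.getHeapMemoryUsage().getUsed();
        // Effectively the same as System.gc(): a manually invoked collection.
        memory.gc();
        long after = memory.getHeapMemoryUsage().getUsed();

        System.out.println("ExplicitGCInvokesConcurrent set: " + explicitGcIsConcurrent(jvmArgs));
        System.out.println("heap used before: " + before + " bytes, after: " + after + " bytes");
    }
}
```

Running this once with and once without the flag, alongside verbose GC output, should show whether the explicit request was served by a stop-the-world collection or by a concurrent cycle.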


> GC storming, possible memory leak
> ---------------------------------
>
>                 Key: CASSANDRA-1014
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1014
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>         Environment: debian lenny amd64 OpenJDK 64-Bit Server VM (build 1.6.0_0-b11, mixed mode)
>            Reporter: Brandon Williams
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.3
>
>         Attachments: 1014-2Gheap.png, 1014-commitlog-v2.tar.gz, 1014-table.diff, 724-0001.png
>
>
> There appears to be a GC issue due to memory pressure in the 0.6 branch.  You can see this by starting the server and performing many inserts.  The JVM quickly consumes most of its heap, and pauses for stop-the-world GC begin.  With verbose GC turned on, this can be observed as follows:
> [GC [ParNew (promotion failed): 79703K->79703K(84544K), 0.0622980 secs][CMS[CMS-concurrent-mark: 3.678/5.031 secs] [Times: user=10.35 sys=4.22, real=5.03 secs]
>  (concurrent mode failure): 944529K->492222K(963392K), 2.8264480 secs] 990745K->492222K(1047936K), 2.8890500 secs] [Times: user=2.90 sys=0.04, real=2.90 secs]
> After enough inserts (around 75-100 million) the server will GC storm and then OOM.
> jbellis and I narrowed this down to patch 0001 in CASSANDRA-724.  Switching LBQ with ABQ made no difference; however, using batch mode instead of periodic for the commitlog does prevent the issue from occurring.  The attached screenshot shows the heap usage in jconsole: first while the issue is occurring, then a restart, and then the same number of inserts when it does not occur.
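
As a complement to the verbose GC log and the jconsole screenshot in the description above, heap occupancy and collector activity can also be read from inside the JVM through the standard java.lang.management beans. This is a minimal sketch under that assumption; the class name GcPressureProbe and the helper heapFraction are hypothetical, and it takes a single snapshot rather than tracking usage over time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcPressureProbe {
    /** Fraction of the maximum heap in use, or -1.0 if the maximum is undefined. */
    static double heapFraction(long usedBytes, long maxBytes) {
        return maxBytes > 0 ? (double) usedBytes / maxBytes : -1.0;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when the maximum heap size is undefined.
        System.out.printf("heap used: %d bytes, max: %d bytes, fraction: %.3f%n",
                heap.getUsed(), heap.getMax(), heapFraction(heap.getUsed(), heap.getMax()));

        // One bean per collector; counts and times are cumulative since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

For comparison, in the log excerpt above the CMS generation held 944529K of 963392K, roughly 98% full, when the concurrent mode failure was reported.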
