Posted to commits@cassandra.apache.org by "Daniel Doubleday (JIRA)" <ji...@apache.org> on 2011/07/07 16:07:16 UTC

[jira] [Created] (CASSANDRA-2868) Native Memory Leak

Native Memory Leak
------------------

                 Key: CASSANDRA-2868
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.7.6
            Reporter: Daniel Doubleday
            Priority: Minor


We have memory issues with long-running servers. Several users on the user list have confirmed the same behaviour, which is why I am filing this report.

The memory consumption of the Cassandra Java process increases steadily until the OS kills it for running out of memory (the machines have no swap).

Our server is started with -Xmx3000M and has been running for around 23 days.

pmap -x shows (values in KB):

Total SST: 1961616 (mem mapped data and index files)
Anon  RSS: 6499640
Total RSS: 8478376

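This shows that more than 3 GB are 'overallocated': anonymous RSS is about 6.5 GB against a 3000 MB heap limit.

We will use BRAF (BufferedRandomAccessFile, i.e. no mmap) on one of our less important nodes to check whether it is related to mmap and report back.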
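
The arithmetic behind the 'overallocated' estimate can be sketched as follows. This is only an illustration: the class and method names are hypothetical, it assumes the pmap figures are in KB (which is what `pmap -x` reports), and it treats the whole -Xmx heap as resident while ignoring other legitimate JVM overhead such as thread stacks and permgen.

```java
class OverallocationEstimate {
    /** Anonymous RSS (KB) minus the configured maximum heap (MB converted to KB). */
    static long excessOverHeapKb(long anonRssKb, long maxHeapMb) {
        return anonRssKb - maxHeapMb * 1024L;
    }

    public static void main(String[] args) {
        long anonRssKb = 6_499_640L;   // "Anon RSS" from the report
        long sstRssKb = 1_961_616L;    // "Total SST" (mmapped data and index files)
        long totalRssKb = 8_478_376L;  // "Total RSS"

        long excessKb = excessOverHeapKb(anonRssKb, 3000L); // -Xmx3000M
        System.out.println("anon RSS beyond max heap: " + excessKb + " KB");
        System.out.println("RSS in other mappings:    " + (totalRssKb - anonRssKb - sstRssKb) + " KB");
    }
}
```

With the numbers from the report the excess comes to 3427640 KB, roughly 3.3 GiB, which is where the "> 3G" figure comes from.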

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Zhu Han (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066009#comment-13066009 ] 

Zhu Han commented on CASSANDRA-2868:
------------------------------------

{We have been running this in production for 3 days now and rss increased only insignificantly by ~5MB a day}

Do you mean that -XX:MaxDirectMemorySize helps to keep RSS from growing?

I have no idea why only some of us hit the problem. I suspect it is a kernel bug.

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png

        

[jira] [Issue Comment Edited] (CASSANDRA-2868) Native Memory Leak

Posted by "Daniel Doubleday (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063280#comment-13063280 ] 

Daniel Doubleday edited comment on CASSANDRA-2868 at 7/11/11 10:45 AM:
-----------------------------------------------------------------------

Hm, after 3 days of checking a node that does not use mmapped files, it looks like this (RSS in KB):

nativelib: 14128
locale-archive: 1492
ffiSwFShY(deleted): 8
javajar: 2292
[anon]: 3609388
[stack]: 132
java: 44
7008: 32
jna534482390478104336.tmp: 92

Total RSS: 3627608
Total SST: 0


Compared to startup, RSS has increased by ~400MB. So it seems that this is not related to memory mapping.

We will deploy CASSANDRA-2654 this week. We will see whether that changes anything, but I suspect not ...
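
A per-mapping breakdown like the one above can be produced by summing the RSS column of `pmap -x` output for each mapping name. The following is a minimal sketch, not the script used for the numbers in this comment: the class name and the sample lines are hypothetical, it assumes the usual procps column layout (Address, Kbytes, RSS, Dirty, Mode, Mapping), and it does not reproduce the coarser grouping (nativelib, javajar, ...) used above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PmapRssByMapping {
    /** Sums the RSS column (KB) of `pmap -x` data rows per mapping name. */
    static Map<String, Long> rssByMapping(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Data rows look like: address, Kbytes, RSS, Dirty, mode, mapping...
            // Header, separator and total lines do not start with a hex address.
            if (f.length < 6 || !f[0].matches("[0-9a-fA-F]+") || !f[2].matches("\\d+")) {
                continue;
            }
            // The mapping name may contain spaces, e.g. "[ anon ]".
            String mapping = String.join(" ", Arrays.copyOfRange(f, 5, f.length));
            totals.merge(mapping, Long.parseLong(f[2]), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample rows, only to show the expected input shape.
        List<String> sample = Arrays.asList(
                "Address           Kbytes     RSS   Dirty Mode  Mapping",
                "0000000000400000      36      32       0 r-x--  java",
                "00007f1c4c000000  131072  120000  120000 rw---    [ anon ]",
                "00007f1c54000000   65536   60000   60000 rw---    [ anon ]",
                "00007fff5a1e0000     132     132     132 rw---    [ stack ]");
        rssByMapping(sample).forEach((name, kb) -> System.out.println(name + ": " + kb));
    }
}
```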

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor

        

[jira] [Updated] (CASSANDRA-2868) Native Memory Leak

Posted by "Chris Burroughs (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Burroughs updated CASSANDRA-2868:
---------------------------------------

    Attachment: 2868-v1.txt

In case it is useful to anyone else this is what I intend to test with.

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: 2868-v1.txt

        

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085850#comment-13085850 ] 

Jonathan Ellis commented on CASSANDRA-2868:
-------------------------------------------

Dirty working directory.  GCInspector (GCI) is the only relevant file.

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.5
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, low-load-36-hours-initial-results.png

        

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Daniel Doubleday (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066835#comment-13066835 ] 

Daniel Doubleday commented on CASSANDRA-2868:
---------------------------------------------

Looks good to me. I guess Cassandra should just disable the inspector for now (and probably make it possible to start it manually via JMX).

Thu Jul 14 09:39:26 CEST 2011: [anon]: 3234068
Thu Jul 14 17:22:45 CEST 2011: [anon]: 3266888
Fri Jul 15 09:33:53 CEST 2011: [anon]: 3269160
Mon Jul 18 09:54:29 CEST 2011: [anon]: 3270188
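
To put a rate on these samples, the growth per day can be computed from any two of them. This is a small sketch with hypothetical names; it assumes the [anon] values are in KB and relies on all timestamps being in the same zone (CEST), so plain local date-times are enough.

```java
import java.time.Duration;
import java.time.LocalDateTime;

class AnonGrowthRate {
    /** Average growth in KB per day between two samples. */
    static double kbPerDay(LocalDateTime t0, long kb0, LocalDateTime t1, long kb1) {
        double days = Duration.between(t0, t1).getSeconds() / 86_400.0;
        return (kb1 - kb0) / days;
    }

    public static void main(String[] args) {
        LocalDateTime first = LocalDateTime.of(2011, 7, 14, 9, 39, 26);
        LocalDateTime nextDay = LocalDateTime.of(2011, 7, 15, 9, 33, 53);
        LocalDateTime last = LocalDateTime.of(2011, 7, 18, 9, 54, 29);

        // Whole observation window, including the initial warm-up.
        System.out.println("overall KB/day:         " + kbPerDay(first, 3_234_068L, last, 3_270_188L));
        // Only the last three days, after the first day's growth.
        System.out.println("after first day KB/day: " + kbPerDay(nextDay, 3_269_160L, last, 3_270_188L));
    }
}
```

Over the whole window this is roughly 9 MB per day, but almost all of that growth falls within the first eight hours; over the last three days it is well under 1 MB per day.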

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png

        

[jira] [Assigned] (CASSANDRA-2868) Native Memory Leak

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-2868:
-----------------------------------------

    Assignee: Brandon Williams

We could try switching to what JConsole does, which is just to log the total number of collections and the time spent for each collection type.  This uses a different API which hopefully does not leak: http://www.java2s.com/Open-Source/Java-Document/6.0-JDK-Modules-sun/tools/sun/tools/jconsole/MemoryTab.java.htm

Logging the lifetime totals there with StatusLogger similar to what we do now for dropped messages would be better than nothing.
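
For reference, reading lifetime totals through the standard java.lang.management API looks roughly like this. It is a minimal sketch rather than the proposed patch: the class and method names are hypothetical, and plain stdout stands in for StatusLogger.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcLifetimeTotals {
    /** One line per collector: name, lifetime collection count, lifetime time in ms. */
    static String describe(GarbageCollectorMXBean gc) {
        // Both getters return -1 if the value is undefined for this collector.
        return String.format("%s: %d collections, %d ms total",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(describe(gc));
        }
    }
}
```

The counters are cumulative since JVM start, so a periodic logger only needs to read them; it does not have to observe each collection as it happens.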

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.7.7, 0.8.2
>
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png

        

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073409#comment-13073409 ] 

Brandon Williams commented on CASSANDRA-2868:
---------------------------------------------

I created three isolated nodes, all with a hack of setting the inspector interval to 1ms applied (not the tightest loop, but good enough and easy.)  One of the nodes had the inspector disabled entirely (the control), one was vanilla, and one had v2 applied.  After starting them up with a 128M heap and letting them run for a few minutes, here are the results:

version    resident
control    72M
patched    72M
vanilla    540M
I think it's safe to say java.lang.management doesn't share the leak.

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, low-load-36-hours-initial-results.png

        

[jira] [Updated] (CASSANDRA-2868) Native Memory Leak

Posted by "Chris Burroughs (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Burroughs updated CASSANDRA-2868:
---------------------------------------

    Attachment: 48hour_RES.png

48 hours under production load after C* had already been running for a few days.  Two on the left have GCInspector enabled.  The two on the right do not.  (Note that the scale on the lower right one reflects a change of only 10s of bytes.)

So it looks like victory to me.

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2868-v1.txt, 48hour_RES.png, low-load-36-hours-initial-results.png

        

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Chris Burroughs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064247#comment-13064247 ] 

Chris Burroughs commented on CASSANDRA-2868:
--------------------------------------------

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129 will be the id once bugs.sun.com gets around to doing its thing.

I confirmed that -XX:MaxDirectMemorySize does not protect you from this (i.e. it is a truly native leak, not some DirectByteBuffer thing).  I'll be able to test this, but not until the end of this week at the earliest (and it will then take at least another week to be sure).

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor

        

[jira] [Updated] (CASSANDRA-2868) Native Memory Leak

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-2868:
----------------------------------------

    Attachment: 2868-v2.txt

v2 switches the GCInspector to use java.lang.management.  I don't know yet whether it leaks too.

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, low-load-36-hours-initial-results.png

        

[jira] [Updated] (CASSANDRA-2868) Native Memory Leak

Posted by "Chris Burroughs (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Burroughs updated CASSANDRA-2868:
---------------------------------------

    Attachment: low-load-36-hours-initial-results.png

Initial results.  Graph of VmRSS from /proc/PID/status at 10 second intervals from my last comment to now.  Box on the left has GCInspector disabled.  These are on two test boxes under trivial load so this is all still *very* tentative.  Will start testing under real load by early next week.
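
Sampling VmRSS in this way needs very little code. The sketch below is an illustration, not the script behind the graph: the class name and command-line arguments are hypothetical, and it only works where /proc is available (Linux).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class VmRssSampler {
    /** Extracts the VmRSS value in kB from the lines of /proc/PID/status, or -1 if absent. */
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Typical form: "VmRSS:" followed by whitespace, the number, then "kB".
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String pid = args.length > 0 ? args[0] : "self";                 // process to watch
        int samples = args.length > 1 ? Integer.parseInt(args[1]) : 1;   // how many readings
        Path status = Paths.get("/proc", pid, "status");
        if (!Files.exists(status)) {
            System.err.println(status + " not found (not Linux, or no such process)");
            return;
        }
        for (int i = 0; i < samples; i++) {
            System.out.println(System.currentTimeMillis() + " " + parseVmRssKb(Files.readAllLines(status)));
            if (i + 1 < samples) {
                Thread.sleep(10_000L); // 10 second interval, as in the graph
            }
        }
    }
}
```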

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png

        

[jira] [Updated] (CASSANDRA-2868) Native Memory Leak

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2868:
--------------------------------------

    Attachment: 2868-v3.txt

{I've never actually been able to get > 1 to happen, but we can add it to the logging}

I'm sure it's possible w/ a small enough heap, especially since GCInspector is paused along w/ everything else for STW collections (including new gen).

v3 attached to accommodate this and add durationPerCollection.
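
The idea of handling more than one collection between polls can be sketched as follows. This is not the v3 patch itself: apart from the name durationPerCollection, all names here are hypothetical, and output goes to stdout instead of the logger.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

class GcPollSketch {
    // Collector name -> {collection count, collection time in ms} seen at the previous poll.
    private final Map<String, long[]> previous = new HashMap<>();

    /** Average duration of the collections seen since the last poll. */
    static long durationPerCollection(long collections, long durationMs) {
        return collections == 0 ? 0 : durationMs / collections;
    }

    /** Reports, per collector, what happened since the previous call. */
    void poll() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long[] now = { gc.getCollectionCount(), gc.getCollectionTime() };
            long[] before = previous.put(gc.getName(), now);
            if (before == null) {
                before = new long[] { 0L, 0L };
            }
            long collections = now[0] - before[0];
            if (collections <= 0) {
                continue; // nothing new since the last poll
            }
            long durationMs = now[1] - before[1];
            System.out.printf("%s: %d collections in %d ms (%d ms per collection)%n",
                    gc.getName(), collections, durationMs,
                    durationPerCollection(collections, durationMs));
        }
    }

    public static void main(String[] args) {
        GcPollSketch sketch = new GcPollSketch();
        sketch.poll();
        System.gc(); // only a hint to the JVM; a collection is not guaranteed
        sketch.poll();
    }
}
```

Because the poller itself is stopped during stop-the-world pauses, several collections can accumulate between two polls, which is why the sketch reports a count and an average rather than assuming exactly one collection per interval.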

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.5
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, low-load-36-hours-initial-results.png

        

[jira] [Updated] (CASSANDRA-2868) Native Memory Leak

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-2868:
----------------------------------------

    Fix Version/s: 0.7.9

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.7.9, 0.8.5
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, low-load-36-hours-initial-results.png

        

[jira] [Updated] (CASSANDRA-2868) Native Memory Leak

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2868:
--------------------------------------

    Affects Version/s:     (was: 0.7.6)
        Fix Version/s: 0.7.7
                       0.8.2

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.7.7, 0.8.2
>
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png

        

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Jeremiah Jordan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086102#comment-13086102 ] 

Jeremiah Jordan commented on CASSANDRA-2868:
--------------------------------------------

Can we get this in 0.7.X as well?

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.5
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, low-load-36-hours-initial-results.png

        

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071895#comment-13071895 ] 

Jonathan Ellis commented on CASSANDRA-2868:
-------------------------------------------

Thanks, Chris.  We'll work on rewriting GCInspector to use the java.lang.management API instead, unless you have time to take a stab at that.

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2868-v1.txt, 48hour_RES.png, low-load-36-hours-initial-results.png

        

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Daniel Doubleday (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066372#comment-13066372 ] 

Daniel Doubleday commented on CASSANDRA-2868:
---------------------------------------------

Yes - we did disable the GCInspector.

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png

        

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Chris Burroughs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066057#comment-13066057 ] 

Chris Burroughs commented on CASSANDRA-2868:
--------------------------------------------

I interpreted Daniel's "this" to be the 2868-v1.txt patch (or something equivalent) with cassandra.enable_gc_inspector=false.  I did not find -XX:MaxDirectMemorySize to be helpful.
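
For readers unfamiliar with this kind of switch, here is a minimal sketch of how a JVM system property such as cassandra.enable_gc_inspector could gate the inspector. Only the property name comes from this thread; the class, the default of "true" and the way the 2868-v1.txt patch actually reads the flag are assumptions.

```java
final class GcInspectorSwitch {
    // Property name as quoted in the comment above.
    static final String PROPERTY = "cassandra.enable_gc_inspector";

    /** Enabled unless the property is set to something other than "true" (assumed default). */
    static boolean enabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "true"));
    }

    public static void main(String[] args) {
        if (enabled())
            System.out.println("GC inspector would be started");
        else
            System.out.println("GC inspector disabled, no GC polling is scheduled");
    }
}
```

Running it with java -Dcassandra.enable_gc_inspector=false GcInspectorSwitch takes the disabled branch.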

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png
>
>

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Chris Burroughs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063463#comment-13063463 ] 

Chris Burroughs commented on CASSANDRA-2868:
--------------------------------------------

At one point I was convinced this was a JVM bug and opened http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7037080. After seeing how totally broken NIO is after CASSANDRA-2654, I'm no longer sure of anything.

I was going to start a survey on the user list after the summit to see if any OS/JVM-level pattern could be found, since clearly it doesn't happen to everyone in all cases.

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Jeremiah Jordan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073006#comment-13073006 ] 

Jeremiah Jordan commented on CASSANDRA-2868:
--------------------------------------------

Depending on how long the rewrite is going to take, can we get the config file option to disable the GC inspector into a new 0.7.X and 0.8.X release?

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2868-v1.txt, 48hour_RES.png, low-load-36-hours-initial-results.png
>
>

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085848#comment-13085848 ] 

Brandon Williams commented on CASSANDRA-2868:
---------------------------------------------

Why is v3 touching compaction?

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.5
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, low-load-36-hours-initial-results.png
>
>

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073636#comment-13073636 ] 

Sylvain Lebresne commented on CASSANDRA-2868:
---------------------------------------------

Comments on v2:
* Couldn't we estimate the reclaimed size by recording the last memory used (that would need to be the first thing we do in logGCResults so that we record it each time)?
* Wouldn't it be worth indicating how many collections have been done since the last log message if it's > 1, since it can be > 1?
* Nit: especially if we decide to keep the last memory used, it may be more efficient (and cleaner imho) to have just one HashMap of String -> GCInfo, where GCInfo would be a small struct with times, counts and usedMemory. Not that it is very performance sensitive...
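
The single-map idea in the last point could look roughly like the sketch below. It is not taken from any of the attached patches: GcSnapshotTracker, update and poll are hypothetical names, and it polls only the standard java.lang.management beans (cumulative count and time per collector) rather than getLastGcInfo.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

final class GcSnapshotTracker {
    /** Small struct holding the totals seen at the previous poll for one collector. */
    static final class GCInfo {
        long count;       // cumulative collections
        long timeMs;      // cumulative collection time
        long usedMemory;  // heap bytes in use
    }

    private final Map<String, GCInfo> last = new HashMap<String, GCInfo>();

    /** Records the new totals and returns {collections, millis, usedBytesChange} since the last call. */
    long[] update(String name, long count, long timeMs, long usedMemory) {
        GCInfo info = last.get(name);
        if (info == null) {
            info = new GCInfo();   // first poll: deltas are measured from zero
            last.put(name, info);
        }
        long[] delta = { count - info.count, timeMs - info.timeMs, usedMemory - info.usedMemory };
        info.count = count;
        info.timeMs = timeMs;
        info.usedMemory = usedMemory;
        return delta;
    }

    /** Polls every collector once and prints those that ran since the previous poll. */
    void poll() {
        long used = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long[] d = update(gc.getName(), gc.getCollectionCount(), gc.getCollectionTime(), used);
            if (d[0] > 0)
                System.out.printf("%s: %d collections, %d ms, heap used changed by %d bytes%n",
                                  gc.getName(), d[0], d[1], d[2]);
        }
    }

    public static void main(String[] args) {
        new GcSnapshotTracker().poll();
    }
}
```

Because the used-memory figure is stored on every call, the change between two polls is always available; whether that change is a fair estimate of what a collection reclaimed is a separate question.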

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, low-load-36-hours-initial-results.png
>
>

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081834#comment-13081834 ] 

Brandon Williams commented on CASSANDRA-2868:
---------------------------------------------

bq. Wouldn't it be worth indicating that how many collection have been done since last log message if it's > 1, since it can (be > 1).

The only reason I added count tracking was to prevent it from firing when there were no GCs (the api is flakey.)  I've never actually been able to get > 1 to happen, but we can add it to the logging.

bq. IMO the duration-based thresholds are hard to reason about here, where we're dealing w/ summaries and not individual GC results.

We are dealing with individual GCs at least 99% of the time in practice.  The worst case is >1 GC inflates the gctime enough that we errantly log when it's not needed, but I imagine to trigger that you would have to be in a gc pressure situation already.

bq. I think I'd rather have something like the dropped messages logger, where every N seconds we log the summary we get from the mbean.

That seems like it could be a lot of noise since GC is constantly happening.

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be removed. 

I think the logic there is still sound ("Did we just do a CMS? Is the heap still 80% full?") and it seems to work as well as it always has.
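
The check described in the last paragraph ("Did we just do a CMS? Is the heap still 80% full?") can be sketched with the standard memory bean. This is an illustration under assumptions, not the GCInspector code: the class and method names are hypothetical, and "ConcurrentMarkSweep" is the collector bean name HotSpot reports for CMS.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapPressureCheck {
    static final double THRESHOLD = 0.8; // "still 80% full"

    /** True when at least one CMS collection just finished and the heap is still mostly full. */
    static boolean underPressure(String collectorName, long newCollections,
                                 long usedBytes, long maxBytes) {
        boolean wasCms = "ConcurrentMarkSweep".equals(collectorName);
        // maxBytes is -1 when the JVM reports no defined maximum, so guard against it.
        return wasCms && newCollections > 0 && maxBytes > 0
               && (double) usedBytes / maxBytes > THRESHOLD;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // Pretend exactly one CMS cycle just completed; a real caller would pass the count delta.
        boolean pressure = underPressure("ConcurrentMarkSweep", 1, heap.getUsed(), heap.getMax());
        System.out.println("would call flushLargestMemtables/reduceCacheSizes: " + pressure);
    }
}
```

Requiring a positive count delta matches the point made earlier in this comment about not firing when no GC has happened.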



> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.4
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, low-load-36-hours-initial-results.png
>
>

[jira] [Issue Comment Edited] (CASSANDRA-2868) Native Memory Leak

Posted by "Zhu Han (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066009#comment-13066009 ] 

Zhu Han edited comment on CASSANDRA-2868 at 7/15/11 3:33 PM:
-------------------------------------------------------------

{quote}
We have been running this in production for 3 days now and rss increased only insignificantly by ~5MB a day
{quote}

Do you mean -XX:MaxDirectMemorySize is very helpful in controlling the RSS increase?

I have no idea why only some of us hit the problem. I suppose it is a kernel bug.

      was (Author: hanzhu):
    {We have been running this in production for 3 days now and rss increased only insignificantly by ~5MB a day}

Do you mean -XX:MaxDirectMemorySize is very helpful to control RSS increasing? 

I have no idea why just some of us meets the problem. I suppose it is a kernel bug.
  
> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png
>
>

[jira] [Issue Comment Edited] (CASSANDRA-2868) Native Memory Leak

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081834#comment-13081834 ] 

Brandon Williams edited comment on CASSANDRA-2868 at 8/9/11 6:43 PM:
---------------------------------------------------------------------

bq. Wouldn't it be worth indicating that how many collection have been done since last log message if it's > 1, since it can (be > 1).

The only reason I added count tracking was to prevent it from firing when there were no GCs (the api is flakey.)  I've never actually been able to get > 1 to happen, but we can add it to the logging.

bq. IMO the duration-based thresholds are hard to reason about here, where we're dealing w/ summaries and not individual GC results.

We are dealing with individual GCs at least 99% of the time in practice.  The worst case is >1 GC inflates the gctime enough that we errantly log when it's not needed, but I imagine to trigger that you would have to be in a gc pressure situation already.

bq. I think I'd rather have something like the dropped messages logger, where every N seconds we log the summary we get from the mbean.

That seems like it could be a lot of noise since GC is constantly happening.

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be removed. 

I think the logic there is still sound ("Did we just do a CMS? Is the heap still 80% full?") and it seems to work as well as it always has.



      was (Author: brandon.williams):
    bq. Wouldn't it be worth indicating that how many collection have been done since last log message if it's > 1, since it can (be > 1).

The only reason I added count tracking was to prevent it from firing when there were no GCs (the api is flakey.)  I've never actually been able to get > 1 to happen, but we can add it to the logging.

bq. IMO the duration-based thresholds are hard to reason about here, where we're dealing w/ summaries and not individual GC results.

We are dealing with individual GCs at least 99% of the time in practice.  The worst case is >1 GC inflates the gctime enough that we errantly log when it's not needed, but I imagine to trigger that you would have to be in a gc pressure situation already.

bq. I think I'd rather have something like the dropped messages logger, where every N seconds we log the summary we get from the mbean.

That seems like it could a lot of noise since GC is constantly happening.

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be removed. 

I think the logic there is still sound ("Did we just do a CMS? Is the heap still 80% full?") and it seems to work as well as it always has.


  
> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.4
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, low-load-36-hours-initial-results.png
>
>

[jira] [Resolved] (CASSANDRA-2868) Native Memory Leak

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams resolved CASSANDRA-2868.
-----------------------------------------

    Resolution: Fixed

Committed to 0.7 in r1160879

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.7.9, 0.8.5
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, low-load-36-hours-initial-results.png
>
>

[jira] [Issue Comment Edited] (CASSANDRA-2868) Native Memory Leak

Posted by "Zhu Han (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066009#comment-13066009 ] 

Zhu Han edited comment on CASSANDRA-2868 at 7/15/11 3:35 PM:
-------------------------------------------------------------

{quote}
We have been running this in production for 3 days now and rss increased only insignificantly by ~5MB a day
{quote}

Do you mean that removing getLastGcInfo() is what keeps the RSS increase under control?

I have no idea why only some of us hit the problem.

      was (Author: hanzhu):
    {quote}
We have been running this in production for 3 days now and rss increased only insignificantly by ~5MB a day
{quote}

Do you mean -XX:MaxDirectMemorySize is very helpful to control RSS increasing? 

I have no idea why just some of us meets the problem. I suppose it is a kernel bug.
  
> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png
>
>

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Zhu Han (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066376#comment-13066376 ] 

Zhu Han commented on CASSANDRA-2868:
------------------------------------

Got it!

Do you have any idea why only some of us report the problem?

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png
>
>

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064128#comment-13064128 ] 

Jonathan Ellis commented on CASSANDRA-2868:
-------------------------------------------

We call getLastGcInfo several times a second.  http://twitter.com/#!/kimchy/status/90861039930970113

You could try turning GCInspector methods into a no-op and see if that makes it go away.

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073661#comment-13073661 ] 

Jonathan Ellis commented on CASSANDRA-2868:
-------------------------------------------

bq. Couldn't we estimate the reclaimed size

Well, not really. What we'd have is "difference in size between the last time it was called and now", which isn't all that close to "amount reclaimed by a specific GC."

bq. Wouldn't it be worth indicating that how many collection have been done since last log message

IMO the duration-based thresholds are hard to reason about here, where we're dealing w/ summaries and not individual GC results.  I think I'd rather have something like the dropped messages logger, where every N seconds we log the summary we get from the mbean.

The flushLargestMemtables/reduceCacheSizes stuff should probably be removed. :(
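
The "every N seconds we log the summary we get from the mbean" alternative could be sketched as follows. This is a guess at the shape of such a logger, not code from the patches: PeriodicGcSummary and summarize are hypothetical names, the interval is fixed at one second for the demo, and output goes to stdout instead of Cassandra's logger.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class PeriodicGcSummary {
    /** One line with the cumulative count and time of every collector. */
    static String summarize() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName()).append(": ")
              .append(gc.getCollectionCount()).append(" collections, ")
              .append(gc.getCollectionTime()).append(" ms total; ");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Log a summary every second; N would be a configuration choice in practice.
        timer.scheduleAtFixedRate(new Runnable() {
            public void run() {
                System.out.println(summarize());
            }
        }, 0, 1, TimeUnit.SECONDS);
        Thread.sleep(3500);   // let a few summaries print, then stop
        timer.shutdownNow();
    }
}
```

A fixed interval sidesteps the question of attributing numbers to an individual collection, at the cost of logging even when nothing noteworthy happened.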

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, low-load-36-hours-initial-results.png
>
>

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086100#comment-13086100 ] 

Hudson commented on CASSANDRA-2868:
-----------------------------------

Integrated in Cassandra-0.8 #282 (See [https://builds.apache.org/job/Cassandra-0.8/282/])
    work around native memory leak in com.sun.management.GarbageCollectorMXBean
patch by brandonwilliams and jbellis for CASSANDRA-2868

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1158490
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/GCInspector.java


> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.5
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, low-load-36-hours-initial-results.png
>
>

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Daniel Doubleday (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065867#comment-13065867 ] 

Daniel Doubleday commented on CASSANDRA-2868:
---------------------------------------------

It's indeed promising. We have been running this in production for 3 days now and rss increased only insignificantly by ~5MB a day. 

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png
>
>

[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086041#comment-13086041 ] 

Brandon Williams commented on CASSANDRA-2868:
---------------------------------------------

+1 to GCI changes.  Also, it is indeed possible to get >1 with a tiny heap.

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.5
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, low-load-36-hours-initial-results.png


[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Daniel Doubleday (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066392#comment-13066392 ] 

Daniel Doubleday commented on CASSANDRA-2868:
---------------------------------------------

Well, either it's environment-specific or (more likely) others didn't notice or care because they have enough memory and/or restart their nodes often enough.

We have 16GB of RAM and run Cassandra with 3GB. Within one month we lose ~3GB (13GB -> 10GB) of file system cache because of the memory leak. Looking at our graphs, I can't really tell a difference performance-wise. So I guess only people with weaker servers (less memory headroom) will really notice. We noticed only because we got the system OOM on a cluster that's not critical and which we didn't really monitor.
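
As a back-of-the-envelope illustration of the figures above, the following sketch turns the 13GB -> 10GB cache loss into a daily rate. It assumes a 30-day month and strictly linear growth, neither of which is stated in the comment, and the function name is made up for the example.

```python
def linear_projection(start_gb, end_gb, days):
    """Return (GB lost per day, days until the remaining amount is gone).

    Assumes the loss continues at a constant rate, which is a simplification.
    """
    rate = (start_gb - end_gb) / days
    days_left = end_gb / rate
    return rate, days_left


# 13GB -> 10GB of file system cache over an assumed 30-day month.
rate_gb_per_day, days_left = linear_projection(13, 10, 30)
print(f"~{rate_gb_per_day * 1024:.0f} MB/day, cache gone in ~{days_left:.0f} more days")
```

Under these assumptions the leak is on the order of 100MB a day, which would explain why nodes with plenty of headroom run for months before anyone notices.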

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png


[jira] [Reopened] (CASSANDRA-2868) Native Memory Leak

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams reopened CASSANDRA-2868:
-----------------------------------------


Reopening to backport to 0.7

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.8.5
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, low-load-36-hours-initial-results.png


[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089721#comment-13089721 ] 

Hudson commented on CASSANDRA-2868:
-----------------------------------

Integrated in Cassandra-0.7 #543 (See [https://builds.apache.org/job/Cassandra-0.7/543/])
    work around native memory leak in com.sun.management.GarbageCollectorMXBean
patch by brandonwilliams and jbellis for CASSANDRA-2868

brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1160879
Files : 
* /cassandra/branches/cassandra-0.7/CHANGES.txt
* /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/GCInspector.java


> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.7.9, 0.8.5
>
>         Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, low-load-36-hours-initial-results.png


[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065247#comment-13065247 ] 

Jonathan Ellis commented on CASSANDRA-2868:
-------------------------------------------

Promising!

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
>         Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png


[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Daniel Doubleday (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063280#comment-13063280 ] 

Daniel Doubleday commented on CASSANDRA-2868:
---------------------------------------------

Hm, after 3 days of checking a node that does not use mmapped files, it looks like this:

nativelib: 14128
locale-archive: 1492
ffiSwFShY(deleted): 8
javajar: 2292
[anon]: 3609388
[stack]: 132
java: 44
7008: 32
jna534482390478104336.tmp: 92

Total RSS: 3627608
Total SST: 0


Compared to startup, RSS has increased by ~400MB. So it seems that this is not related to memory mapping.
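
A per-mapping breakdown like the one above can be produced by summing the RSS column of `pmap -x` output by mapping name. The following is a minimal sketch, not the script actually used here. It assumes the usual procps column layout (address, Kbytes, RSS, Dirty, mode, mapping) and assumes that mapped SSTable data and index files are the mappings ending in ".db". Both function names are hypothetical.

```python
import string
from collections import defaultdict

HEX_DIGITS = set(string.hexdigits)


def summarize_pmap(pmap_output):
    """Sum the RSS column (in KB) of `pmap -x` output per mapping name."""
    per_mapping = defaultdict(int)
    for line in pmap_output.splitlines():
        fields = line.split()
        # Data rows: address, Kbytes, RSS, Dirty, mode, mapping name.
        if len(fields) < 6:
            continue
        address, rss = fields[0], fields[2]
        if not set(address) <= HEX_DIGITS or not rss.isdigit():
            continue  # header, separator, or "total" line
        # Joining without spaces turns "[ anon ]" into "[anon]".
        name = "".join(fields[5:])
        per_mapping[name] += int(rss)
    return dict(per_mapping)


def rss_totals(per_mapping):
    """Split RSS into mapped SSTable files, anonymous memory, and the total."""
    # Assumption: SSTable data and index files are the mappings ending in ".db".
    sst = sum(kb for name, kb in per_mapping.items() if name.endswith(".db"))
    return {
        "sst": sst,
        "anon": per_mapping.get("[anon]", 0),
        "total": sum(per_mapping.values()),
    }
```

Calling `rss_totals(summarize_pmap(text))` on the captured output of `pmap -x <pid>` gives the "Total SST", anonymous, and "Total RSS" figures in KB. On a node without mmapped files the "sst" value should come out as 0, as in the listing above.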

We will deploy CASSANDRA-2654 this week. We'll see if that changes anything, but I suspect not ...

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor


[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063523#comment-13063523 ] 

Jonathan Ellis commented on CASSANDRA-2868:
-------------------------------------------

Is your data size constant? If not, you are probably seeing growth in the index samples and bloom filters.

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor


[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

Posted by "Daniel Doubleday (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063770#comment-13063770 ] 

Daniel Doubleday commented on CASSANDRA-2868:
---------------------------------------------

Next: [anon]: 3675224 (+47616KB in 1 day)

bq. Is your data size constant? If not you are probably seeing growth in the index samples and bloom filters.

Well, no, the data size is increasing. But I thought that index samples and bloom filters live on the good old plain Java heap, no? The JVM heap stats are really relaxed. Yet I think that doesn't really matter, because what we are seeing is ever-increasing RSS memory consumption even though we run with -Xmx3G, -Xms3G and mlockall (pmap shows these 3G as one block). So something seems to be constantly allocating native memory that has nothing to do with the Java heap.
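
To make the argument concrete, the anonymous RSS that the heap cannot account for can be computed directly from the figures in this comment. This is a rough sketch under two assumptions: the pmap values are in KB, and the 3G heap is fully resident because of -Xms3G plus mlockall. The function name is made up for the example, and the remainder also contains legitimate JVM native memory (thread stacks, code cache and so on), so it is an upper bound on the leak rather than the leak itself.

```python
KB_PER_GB = 1024 * 1024


def non_heap_rss_kb(anon_rss_kb, heap_gb):
    """Anonymous RSS (KB) not explained by a fully resident Java heap."""
    return anon_rss_kb - heap_gb * KB_PER_GB


anon_rss_kb = 3675224    # "[anon]" figure reported above
heap_gb = 3              # -Xmx3G / -Xms3G
daily_growth_kb = 47616  # "+47616KB in 1 day"

excess_kb = non_heap_rss_kb(anon_rss_kb, heap_gb)
print(f"anonymous RSS beyond the heap: {excess_kb} KB (~{excess_kb / 1024:.0f} MB)")
print(f"growth: ~{daily_growth_kb / 1024:.1f} MB/day")
```

Since the heap size is fixed, any continued growth in the `[anon]` figure has to come from allocations outside the Java heap.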

> Native Memory Leak
> ------------------
>
>                 Key: CASSANDRA-2868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>            Reporter: Daniel Doubleday
>            Priority: Minor
