
Posted to commits@cassandra.apache.org by "Daniel Norberg (JIRA)" <ji...@apache.org> on 2012/09/24 09:10:10 UTC

[jira] [Created] (CASSANDRA-4708) StorageProxy slow-down and memory leak

Daniel Norberg created CASSANDRA-4708:
-----------------------------------------

             Summary: StorageProxy slow-down and memory leak
                 Key: CASSANDRA-4708
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4708
             Project: Cassandra
          Issue Type: Bug
            Reporter: Daniel Norberg


I am consistently observing slow-downs in StorageProxy caused by the NonBlockingHashMap (NBHM) that MessagingService uses indirectly through its callbacks ExpiringMap.

This seems to be due to NBHM having unbounded memory usage under workloads with high key churn. Because MessagingService uses monotonically increasing integers as callback IDs, the NBHM's backing store eventually grows without bound. This in turn triggers very large and expensive backing-store reallocations and migrations, causing throughput to drop to tens of operations per second for seconds or even minutes at a time.
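
To make the churn pattern concrete, here is a minimal, hypothetical simulation (the class CallbackChurn and method maxLiveEntries are illustrative, not Cassandra's MessagingService code). Each request takes a fresh ID from a monotonically increasing counter and registers a callback under it, and the oldest outstanding callback is removed when its response arrives. The map never holds more than inFlight live entries, yet every key it sees is new, which is the kind of workload the report blames for NBHM's growth.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation of callback-ID churn; not Cassandra's actual code.
public class CallbackChurn {

    // Runs `requests` request/response cycles with `inFlight` callbacks outstanding
    // at steady state, and returns the largest number of live entries observed.
    public static int maxLiveEntries(ConcurrentMap<Integer, String> callbacks,
                                     int requests, int inFlight) {
        AtomicInteger nextId = new AtomicInteger();   // monotonically increasing callback IDs
        int maxLive = 0;
        for (int i = 0; i < requests; i++) {
            int id = nextId.getAndIncrement();
            callbacks.put(id, "callback-" + id);      // every request inserts a brand-new key
            maxLive = Math.max(maxLive, callbacks.size());
            int answered = id - inFlight + 1;         // oldest outstanding request completes
            if (answered >= 0) {
                callbacks.remove(answered);           // removed keys are never reused
            }
        }
        return maxLive;
    }

    public static void main(String[] args) {
        ConcurrentMap<Integer, String> callbacks = new ConcurrentHashMap<>();
        int requests = 1_000_000;
        int maxLive = maxLiveEntries(callbacks, requests, 100);
        System.out.println("distinct keys: " + requests + ", max live: " + maxLive
                + ", live at end: " + callbacks.size());
    }
}
```

Running main should print "distinct keys: 1000000, max live: 100, live at end: 99": the live set stays tiny while the key space keeps advancing.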

This behavior is especially noticeable for high-throughput workloads where the dataset fits entirely in RAM and I'm doing up to a hundred thousand reads per second.

Replacing NBHM in ExpiringMap with the Java standard library's ConcurrentHashMap resolved the issue and allowed me to sustain consistently high throughput.
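
A minimal sketch of the kind of structure involved, assuming callbacks are stored with their insertion time and reaped once they exceed a timeout. SimpleExpiringMap and its methods are hypothetical, and the caller passes the current time explicitly so the example is deterministic. The point is only that the backing store is a plain java.util.concurrent.ConcurrentHashMap, which is the substitution described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical expiring map backed by ConcurrentHashMap; not Cassandra's ExpiringMap.
public class SimpleExpiringMap<K, V> {

    // A stored value together with the time it was inserted.
    private static final class Timed<T> {
        final T value;
        final long createdAtMillis;

        Timed(T value, long createdAtMillis) {
            this.value = value;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final ConcurrentMap<K, Timed<V>> map = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    public SimpleExpiringMap(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    public void put(K key, V value, long nowMillis) {
        map.put(key, new Timed<>(value, nowMillis));
    }

    // Removes and returns the value (e.g. when a response arrives); null if absent.
    public V remove(K key) {
        Timed<V> t = map.remove(key);
        return t == null ? null : t.value;
    }

    public int size() {
        return map.size();
    }

    // Drops entries older than the timeout and returns how many were dropped.
    // ConcurrentHashMap iterators are weakly consistent, so removing while
    // iterating is safe; the two-argument remove skips entries replaced meanwhile.
    public int reap(long nowMillis) {
        int reaped = 0;
        for (Map.Entry<K, Timed<V>> e : map.entrySet()) {
            if (nowMillis - e.getValue().createdAtMillis > timeoutMillis
                    && map.remove(e.getKey(), e.getValue())) {
                reaped++;
            }
        }
        return reaped;
    }

    public static void main(String[] args) {
        SimpleExpiringMap<Integer, String> callbacks = new SimpleExpiringMap<>(1_000);
        callbacks.put(1, "read-1", 0);
        callbacks.put(2, "read-2", 500);
        callbacks.put(3, "read-3", 900);
        callbacks.remove(3);                         // response for 3 arrived in time
        System.out.println(callbacks.reap(1_200));   // only entry 1 is older than 1000 ms
        System.out.println(callbacks.size());
    }
}
```

Running main should print 1 twice: the reap drops the one stale callback and one live entry remains. A real implementation would also need to notify expired callbacks and schedule reap periodically, which this sketch leaves out.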

An open issue on NBHM can be seen here: http://sourceforge.net/tracker/?func=detail&aid=3563980&group_id=194172&atid=948362

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4708) StorageProxy slow-down and memory leak

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462429#comment-13462429 ] 

Peter Schuller commented on CASSANDRA-4708:
-------------------------------------------

Nice catch!

FWIW, ConcurrentSkipListMap should probably be considered to avoid locking (though I believe it's generally slower and less compact in memory).
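
Because both ConcurrentHashMap and ConcurrentSkipListMap implement java.util.concurrent.ConcurrentMap, trying this alternative is a one-line change at the construction site. The hypothetical harness below (MapSwap.churn is illustrative, not from the patch) runs the same multi-threaded put-then-remove churn against each map. It only checks that both behave correctly and makes no claim about their relative performance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness: identical callback churn against two ConcurrentMap implementations.
public class MapSwap {

    // Each worker registers a callback under a fresh ID and then removes it,
    // so every key is used exactly once. Returns the number of distinct keys inserted.
    public static int churn(ConcurrentMap<Integer, String> callbacks, int threads, int opsPerThread) {
        AtomicInteger nextId = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    int id = nextId.getAndIncrement();
                    callbacks.put(id, "callback-" + id);
                    callbacks.remove(id);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();   // preserve the interrupt status
                throw new IllegalStateException("interrupted while waiting for workers", ex);
            }
        }
        return nextId.get();
    }

    public static void main(String[] args) {
        ConcurrentMap<Integer, String> hash = new ConcurrentHashMap<>();
        ConcurrentMap<Integer, String> skipList = new ConcurrentSkipListMap<>();  // the one-line swap
        System.out.println("CHM:  keys=" + churn(hash, 4, 100_000) + " live=" + hash.size());
        System.out.println("CSLM: keys=" + churn(skipList, 4, 100_000) + " live=" + skipList.size());
    }
}
```

Both lines should report keys=400000 and live=0. Timing the two runs would be the natural next step, but any numbers would depend heavily on the JVM and hardware.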
                
> StorageProxy slow-down and memory leak
> --------------------------------------
>
>                 Key: CASSANDRA-4708
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4708
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Daniel Norberg
>            Assignee: Daniel Norberg
>             Fix For: 1.0.12, 1.1.6
>
>         Attachments: 0001-MessagingService-don-t-use-NBHM-in-ExpiringMap.patch

[jira] [Updated] (CASSANDRA-4708) StorageProxy slow-down and memory leak

Posted by "Daniel Norberg (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Norberg updated CASSANDRA-4708:
--------------------------------------

    Attachment: 0001-MessagingService-don-t-use-NBHM-in-ExpiringMap.patch

Attached a patch with the proposed fix: it replaces NBHM with ConcurrentHashMap (CHM) in ExpiringMap.
                
> StorageProxy slow-down and memory leak
> --------------------------------------
>
>                 Key: CASSANDRA-4708
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4708
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Daniel Norberg
>         Attachments: 0001-MessagingService-don-t-use-NBHM-in-ExpiringMap.patch

[jira] [Updated] (CASSANDRA-4708) StorageProxy slow-down and memory leak

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-4708:
--------------------------------------

    Fix Version/s: 1.0.12

also committed to the 1.0 branch
                
> StorageProxy slow-down and memory leak
> --------------------------------------
>
>                 Key: CASSANDRA-4708
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4708
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Daniel Norberg
>            Assignee: Daniel Norberg
>             Fix For: 1.0.12, 1.1.6
>
>         Attachments: 0001-MessagingService-don-t-use-NBHM-in-ExpiringMap.patch

[jira] [Commented] (CASSANDRA-4708) StorageProxy slow-down and memory leak

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461856#comment-13461856 ] 

Jonathan Ellis commented on CASSANDRA-4708:
-------------------------------------------

did a quick audit of NBHM usage elsewhere. think we're okay, everything else has a pretty tiny number of possible keys. the main "large" maps are in the column containers, where NBHM is a non-candidate since we need sorting.
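
To illustrate why a hash map is ruled out there: a sorted concurrent map such as java.util.concurrent.ConcurrentSkipListMap iterates its keys in order and offers range views like subMap, which hash maps such as NBHM or ConcurrentHashMap do not provide. The SortedColumns class below is a hypothetical toy, not Cassandra's column container implementation.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical column container: column names map to values, and readers need
// the names in sorted order (e.g. for slices), so a sorted concurrent map is used.
public class SortedColumns {

    // Builds a container from alternating name/value arguments.
    public static ConcurrentNavigableMap<String, String> build(String... namesAndValues) {
        ConcurrentNavigableMap<String, String> columns = new ConcurrentSkipListMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            columns.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return columns;
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<String, String> columns =
                build("email", "v1", "age", "v2", "name", "v3", "city", "v4");
        System.out.println(columns.keySet());                    // [age, city, email, name]
        System.out.println(columns.subMap("b", "f").keySet());   // [city, email]
    }
}
```

The names come back sorted regardless of insertion order, and subMap("b", "f") yields just the slice of names from "b" (inclusive) to "f" (exclusive).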
                
> StorageProxy slow-down and memory leak
> --------------------------------------
>
>                 Key: CASSANDRA-4708
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4708
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Daniel Norberg
>            Assignee: Daniel Norberg
>             Fix For: 1.1.6
>
>         Attachments: 0001-MessagingService-don-t-use-NBHM-in-ExpiringMap.patch
