Posted to commits@cassandra.apache.org by "graham sanderson (JIRA)" <ji...@apache.org> on 2014/07/22 01:28:40 UTC

[jira] [Comment Edited] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory

    [ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069504#comment-14069504 ] 

graham sanderson edited comment on CASSANDRA-7546 at 7/21/14 11:28 PM:
-----------------------------------------------------------------------

{quote}
I do wonder how much of a problem this is in 2.1, though. I wonder if the largest problem with these racy modifications isn't actually the massive amounts of memtable arena allocations they incur in 2.0 with all their transformation.apply() calls (which reallocate the mutation on the arena), which is most likely what causes the promotion failures, as they cannot be collected. I wonder if we shouldn't simply backport the logic to only allocate these once, or at most twice (the first time we race). It seems much more likely to me that this is where the pain is being felt.
{quote}
I'm not sure which changes you are talking about back-porting, whether the "at most twice" refers to looping once and then locking, or what is reasonable to modify in 2.0.x now. Certainly avoiding any repeated cloning of the cells is good; however, I'm still pretty sure based on PrintFLSStatistics output that the slabs themselves are not the biggest problem (I suspect SnapTreeMap nodes, combined with the high rebalancing cost of huge trees in the hint case, since the keys are almost entirely sorted).
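
For what it's worth, here is a rough sketch of how I read the "allocate once, or at most twice" idea, in the same toy terms as the spin loop in the ticket description below (a self-contained toy where a copy-on-write list stands in for the partition state; none of these names come from the actual source). The incoming cells are cloned once before the loop, so a lost CAS race only repeats the cheap merge rather than the expensive arena allocation:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Toy model: in 2.0 the clone of the incoming mutation effectively happens
// inside the loop (via transformation.apply()); hoisting it out means a
// lost race costs only the re-merge, not another arena allocation.
final class AllocateOnceSketch {
    static final AtomicReference<List<String>> state =
            new AtomicReference<>(new ArrayList<>());

    static void addAll(List<String> incomingCells) {
        // one-time copy, standing in for the single arena allocation
        List<String> arenaCopy = new ArrayList<>(incomingCells);
        List<String> current, modified;
        do {
            current = state.get();
            modified = new ArrayList<>(current);
            modified.addAll(arenaCopy); // reuse the same copy on every retry
        } while (!state.compareAndSet(current, modified));
    }
}
{code}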

Are you suggesting a one-way switch per Atomic*Columns instance that flips after a number of wasted "operations"? That sounds reasonable... I'd expect that a partition for a table is either likely to have high contention or not, based on the schema design/use case. I have no idea how long these instances hang around in practice (presumably not insanely long), at least if they are being actively used, since I assume they get flushed eventually in that case; and if they aren't being actively used, it doesn't really matter anyway.
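
Something like the following is roughly what I'd picture for that one-way switch (completely hypothetical names, nothing from the real code path): after enough wasted CAS attempts the instance permanently falls back to a lock. Note that even under the lock the update has to stay a CAS loop, because stragglers that haven't yet observed the flip may still be spinning optimistically:

{code:java}
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

final class SwitchingUpdater<T> {
    private static final int WASTED_LIMIT = 10; // wasted ops before flipping

    private final AtomicReference<T> ref;
    private volatile boolean locked; // one-way: never flips back
    private int wasted;              // approximate count; races are harmless

    SwitchingUpdater(T initial) { this.ref = new AtomicReference<>(initial); }

    T update(UnaryOperator<T> f) {
        if (!locked) {
            T cur = ref.get();
            T next = f.apply(cur);
            if (ref.compareAndSet(cur, next))
                return next;          // the common un-contended fast path
            if (++wasted >= WASTED_LIMIT)
                locked = true;        // give up on optimism for good
        }
        synchronized (this) {
            // stragglers may still be CAS-ing until they observe 'locked',
            // so the update must remain a CAS loop even under the lock
            T cur, next;
            do {
                cur = ref.get();
                next = f.apply(cur);
            } while (!ref.compareAndSet(cur, next));
            return next;
        }
    }
}
{code}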


was (Author: graham sanderson):
{quote}
I do wonder how much of a problem this is in 2.1, though. I wonder if the largest problem with these racy modifications isn't actually the massive amounts of memtable arena allocations they incur in 2.0 with all their transformation.apply() calls (which reallocate the mutation on the arena), which is most likely what causes the promotion failures, as they cannot be collected. I wonder if we shouldn't simply backport the logic to only allocate these once, or at most twice (the first time we race). It seems much more likely to me that this is where the pain is being felt.
{quote}
I'm not sure which changes you are talking about back-porting, or whether the "at most twice" refers to looping once and then locking. Certainly avoiding any repeated cloning of the cells is good; however, I'm still pretty sure based on PrintFLSStatistics output that the slabs themselves are not the biggest problem (I suspect SnapTreeMap nodes, combined with the high rebalancing cost of huge trees in the hint case, since the keys are almost entirely sorted).

Are you suggesting a one-way switch per Atomic*Columns instance that flips after a number of wasted "operations"? That sounds reasonable... I'd expect that a partition for a table is either likely to have high contention or not, based on the schema design/use case. I have no idea how long these instances hang around in practice (presumably not insanely long)

> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_alt.txt, suggestion1.txt, suggestion1_21.txt
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some fairly staggering memory growth (the more cores on your machine, the worse it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same partition, hinting today does, and in this case wild (order(s) of magnitude more than expected) memory allocation rates can be seen (especially when the updates being hinted are small updates to different partitions, which can happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation whilst not slowing down the very common un-contended case.
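
For illustration, the read/clone/CAS pattern described above boils down to the following (a self-contained toy where a copy-on-write list stands in for the partition's column holder; none of these names are from the Cassandra source). Every failed CAS discards a freshly allocated copy, which is where the staggering allocation rate under contention comes from:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class SpinLoopSketch {
    static final AtomicReference<List<String>> state =
            new AtomicReference<>(new ArrayList<>());

    static void addAll(List<String> cells) {
        List<String> current, modified;
        do {
            current = state.get();
            modified = new ArrayList<>(current); // fresh allocation per spin
            modified.addAll(cells);
        } while (!state.compareAndSet(current, modified)); // lost race => garbage
    }
}
{code}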



--
This message was sent by Atlassian JIRA
(v6.2#6252)