You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "graham sanderson (JIRA)" <ji...@apache.org> on 2014/09/16 00:50:35 UTC

[jira] [Comment Edited] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory

    [ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134635#comment-14134635 ] 

graham sanderson edited comment on CASSANDRA-7546 at 9/15/14 10:50 PM:
-----------------------------------------------------------------------

Finally getting back to this, been doing other things (this slightly lower priority as we have it in production already) as well as keeping breaking myself physically, requiring orthopedic visits! I just realized that the version c6a2c65a75ade being voted on for 2.1.0 that I deployed is not the same as 2.1.0 released. I am now upgrading, since cassandra-stress changes snuck in.

Note, than I plan to stress using 1024, 256, 16, 1 partitions, with all 5 nodes up, and then with 4 nodes up and one down to test effect of hinting, (note repl factor of 3 and cl=LOCAL_QUORUM), as well as with at least memtable_allocation_type = heap_buffers & off_heap_buffers

I want to do one cell insert per batch... I'm upgrading in part because of the new visit/revisit stuff - I'm not 100% sure how to use them correctly, I'll keep playing but you may answer before I have finished upgrading and tried with this. My first attempt on the original 2.1.0 revision, ended up with only one clustering key value per partition which is not what I wanted (because it'll make trees small)

Sample YAML for 1024 partitions
{code}
#
# This is an example YAML profile for cassandra-stress
#
# insert data
# cassandra-stress user profile=/home/jake/stress1.yaml ops(insert=1)
#
# read, using query simple1:
# cassandra-stress profile=/home/jake/stress1.yaml ops(simple1=1)
#
# mixed workload (90/10)
# cassandra-stress user profile=/home/jake/stress1.yaml ops(insert=1,simple1=9)


#
# Keyspace info
#
keyspace: stresscql

#
# The CQL for creating a keyspace (optional if it already exists)
#
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

#
# Table info
#
table: testtable

#
# The CQL for creating a table you wish to stress (optional if it already exists)
#
table_definition: |
  CREATE TABLE testtable (
        p text,
        c text,
        v blob,
        PRIMARY KEY(p, c)
  ) WITH COMPACT STORAGE 
    AND compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='TestTable'

#
# Optional meta information on the generated columns in the above table
# The min and max only apply to text and blob types
# The distribution field represents the total unique population
# distribution of that column across rows.  Supported types are
# 
#      EXP(min..max)                        An exponential distribution over the range [min..max]
#      EXTREME(min..max,shape)              An extreme value (Weibull) distribution over the range [min..max]
#      GAUSSIAN(min..max,stdvrng)           A gaussian/normal distribution, where mean=(min+max)/2, and stdev is (mean-min)/stdvrng
#      GAUSSIAN(min..max,mean,stdev)        A gaussian/normal distribution, with explicitly defined mean and stdev
#      UNIFORM(min..max)                    A uniform distribution over the range [min, max]
#      FIXED(val)                           A fixed distribution, always returning the same value
#      Aliases: extr, gauss, normal, norm, weibull
#
#      If preceded by ~, the distribution is inverted
#
# Defaults for all columns are size: uniform(4..8), population: uniform(1..100B), cluster: fixed(1)
#
columnspec:
  - name: p
    size: fixed(16)
    population: uniform(1..1024)     # the range of unique values to select for the field (default is 100Billion)
  - name: c
    size: fixed(26)
#    cluster: uniform(1..100B)
  - name: v
    size: gaussian(50..250)

insert:
  partitions: fixed(1)            # number of unique partitions to update in a single operation
                                  # if batchcount > 1, multiple batches will be used but all partitions will
                                  # occur in all batches (unless they finish early); only the row counts will vary
  batchtype: LOGGED               # type of batch to use
  visits: fixed(10M)    # not sure about this

queries:
   simple1: select * from testtable where k = ? and v = ? LIMIT 10
{code}

Command-line

{code}
./cassandra-stress user profile=~/cqlstress-1024.yaml ops\(insert=1\) cl=LOCAL_QUORUM -node $NODES -mode native prepared cql3 | tee results/results-2.1.0-p1024-a.txt
{code}


was (Author: graham sanderson):
Finally getting back to this, been doing other things (this slightly lower priority as we have it in production already)... I just realized that the version c6a2c65a75ade being voted on for 2.1.0 that I deployed is not the same as 2.1.0 released. I am now upgrading, since cassandra-stress changes snuck in.

Note, than I plan to stress using 1024, 256, 16, 1 partitions, with all 5 nodes up, and then with 4 nodes up and one down to test effect of hinting, (note repl factor of 3 and cl=LOCAL_QUORUM)

I want to do one cell insert per batch... I'm upgrading in part because of the new visit/revisit stuff - I'm not 100% sure how to use them correctly, I'll keep playing but you may answer before I have finished upgrading and tried with this. My first attempt on the original 2.1.0 revision, ended up with only one clustering key value per partition which is not what I wanted (because it'll make trees small)

Sample YAML for 1024 partitions
{code}
#
# This is an example YAML profile for cassandra-stress
#
# insert data
# cassandra-stress user profile=/home/jake/stress1.yaml ops(insert=1)
#
# read, using query simple1:
# cassandra-stress profile=/home/jake/stress1.yaml ops(simple1=1)
#
# mixed workload (90/10)
# cassandra-stress user profile=/home/jake/stress1.yaml ops(insert=1,simple1=9)


#
# Keyspace info
#
keyspace: stresscql

#
# The CQL for creating a keyspace (optional if it already exists)
#
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

#
# Table info
#
table: testtable

#
# The CQL for creating a table you wish to stress (optional if it already exists)
#
table_definition: |
  CREATE TABLE testtable (
        p text,
        c text,
        v blob,
        PRIMARY KEY(p, c)
  ) WITH COMPACT STORAGE 
    AND compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='TestTable'

#
# Optional meta information on the generated columns in the above table
# The min and max only apply to text and blob types
# The distribution field represents the total unique population
# distribution of that column across rows.  Supported types are
# 
#      EXP(min..max)                        An exponential distribution over the range [min..max]
#      EXTREME(min..max,shape)              An extreme value (Weibull) distribution over the range [min..max]
#      GAUSSIAN(min..max,stdvrng)           A gaussian/normal distribution, where mean=(min+max)/2, and stdev is (mean-min)/stdvrng
#      GAUSSIAN(min..max,mean,stdev)        A gaussian/normal distribution, with explicitly defined mean and stdev
#      UNIFORM(min..max)                    A uniform distribution over the range [min, max]
#      FIXED(val)                           A fixed distribution, always returning the same value
#      Aliases: extr, gauss, normal, norm, weibull
#
#      If preceded by ~, the distribution is inverted
#
# Defaults for all columns are size: uniform(4..8), population: uniform(1..100B), cluster: fixed(1)
#
columnspec:
  - name: p
    size: fixed(16)
    population: uniform(1..1024)     # the range of unique values to select for the field (default is 100Billion)
  - name: c
    size: fixed(26)
#    cluster: uniform(1..100B)
  - name: v
    size: gaussian(50..250)

insert:
  partitions: fixed(1)            # number of unique partitions to update in a single operation
                                  # if batchcount > 1, multiple batches will be used but all partitions will
                                  # occur in all batches (unless they finish early); only the row counts will vary
  batchtype: LOGGED               # type of batch to use
  visits: fixed(10M)    # not sure about this

queries:
   simple1: select * from testtable where k = ? and v = ? LIMIT 10
{code}

Command-line

{code}
./cassandra-stress user profile=~/cqlstress-1024.yaml ops\(insert=1\) cl=LOCAL_QUORUM -node $NODES -mode native prepared cql3 | tee results/results-2.1.0-p1024-a.txt
{code}

> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>             Fix For: 2.1.1
>
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_3.txt, 7546.20_4.txt, 7546.20_5.txt, 7546.20_6.txt, 7546.20_7.txt, 7546.20_7b.txt, 7546.20_alt.txt, 7546.20_async.txt, 7546.21_v1.txt, hint_spikes.png, suggestion1.txt, suggestion1_21.txt, young_gen_gc.png
>
>
> In order to preserve atomicity, this code attempts to read, clone/update, then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some fairly staggering memory growth (the more cores on your machine the worst it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same partition, hinting today, does, and in this case wild (order(s) of magnitude more than expected) memory allocation rates can be seen (especially when the updates being hinted are small updates to different partitions which can happen very fast on their own) - see CASSANDRA-7545
> It would be best to eliminate/reduce/limit the spinning memory allocation whilst not slowing down the very common un-contended case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)