You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Alex Li <al...@gmail.com> on 2010/04/22 09:40:06 UTC

Periodically hiccups

Hello,

We recently deployed a cluster of 5 Cassandra nodes into production, and 
ran into big problems with periodically hiccups (individual node goes 
down, high CPU, client connection timeout). It was terrible with 0.5 
(one hiccups every 5-10 minutes), today we upgraded to 0.6.1, it happens 
less frequently now (likely once every 30 minutes or so). But it is 
still quite frustrating.

We used ReplicationFactor=3 for all column families. 5 nodes are behind 
haproxy. Java client goes through haproxy. The most obvious behavior is: 
as soon as one node goes down, the connections between haproxy and 
Cassandra nodes just shoot up to 1000 (in normal case it is stable at 
40), and the connections don't go down for quite a while. Meanwhile Java 
clients just get all kind of TimeoutException, then kept on retrying. 
Eventually we have to restart haproxy, then things go back to normal.

Each node has 5GB max heap, powerful enough CPU (quad-core), software 
RAID mirror. We are definitely NOT putting lots of load yet, mostly 
20-50 concurrent requests to Cassandra, but it is not holding up! Please 
help, we are on the verge of giving up Cassandra after 5 days of 
periodic "outage".

Couple observations:

- cfstats shows significant read latency on "system" keyspace, almost 5s 
(see below)

- RecentReadLatencyMicros and RecentWriteLatencyMicros are super high 
for StorageProxy, as well as every column family in JMX: up to 43 s and 
9s (see screenshot). However, in cfstats, they are quite small.

- Every second we see 5-10 DigestMismatchException in the log:

 INFO [pool-1-thread-15857] 2010-04-22 00:37:37,887 StorageProxy.java 
(line 499) DigestMismatchException: Mismatch for key 1068022523 
(d41d8cd98f00b204e9800998ecf8427e vs 0dd4cdaeeb1a334ae133c6955e109629)

Please advice. Thank you!

storage-conf, cfstats, tpstats are listed below:

root@cdb-006:/glass/sfw/cassandra# cat conf/storage-conf.xml
<!--
 ~ Licensed to the Apache Software Foundation (ASF) under one
 ~ or more contributor license agreements.  See the NOTICE file
 ~ distributed with this work for additional information
 ~ regarding copyright ownership.  The ASF licenses this file
 ~ to you under the Apache License, Version 2.0 (the
 ~ "License"); you may not use this file except in compliance
 ~ with the License.  You may obtain a copy of the License at
 ~
 ~    http://www.apache.org/licenses/LICENSE-2.0
 ~
 ~ Unless required by applicable law or agreed to in writing,
 ~ software distributed under the License is distributed on an
 ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 ~ KIND, either express or implied.  See the License for the
 ~ specific language governing permissions and limitations
 ~ under the License.
-->
<Storage>
    
<!--======================================================================-->
    <!-- Basic 
Configuration                                                  -->
    
<!--======================================================================-->

    <!--
       ~ The name of this cluster.  This is mainly used to prevent 
machines in
       ~ one logical cluster from joining another.
      -->
    <ClusterName>Test Cluster</ClusterName>

    <!--
       ~ Turn on to make new [non-seed] nodes automatically migrate the 
right data
       ~ to themselves.  (If no InitialToken is specified, they will 
pick one
       ~ such that they will get half the range of the most-loaded node.)
       ~ If a node starts up without bootstrapping, it will mark itself 
bootstrapped
       ~ so that you can't subsequently accidently bootstrap a node with
       ~ data on it.  (You can reset this by wiping your data and commitlog
       ~ directories.)
       ~
       ~ Off by default so that new clusters and upgraders from 0.4 don't
       ~ bootstrap immediately.  You should turn this on when you start 
adding
       ~ new nodes to a cluster that already has data on it.  (If you 
are upgrading
       ~ from 0.4, start your cluster with it off once before changing 
it to true.
       ~ Otherwise, no data will be lost but you will incur a lot of 
unnecessary
       ~ I/O before your cluster starts up.)
      -->
    <AutoBootstrap>false</AutoBootstrap>

    <!--
       ~ Keyspaces and ColumnFamilies:
       ~ A ColumnFamily is the Cassandra concept closest to a relational
       ~ table.  Keyspaces are separate groups of ColumnFamilies.  Except in
       ~ very unusual circumstances you will have one Keyspace per 
application.

       ~ There is an implicit keyspace named 'system' for Cassandra 
internals.
      -->
    <Keyspaces>
        <Keyspace Name="Pandora">
            <ColumnFamily CompareWith="BytesType" Name="Standard1"/>
            <ColumnFamily CompareWith="UTF8Type" Name="Standard2"/>
            <ColumnFamily CompareWith="TimeUUIDType" 
Name="StandardByUUID1"/>
            <ColumnFamily CompareWith="TimeUUIDType" Name="Folder1"/>
            <ColumnFamily CompareWith="UTF8Type" Name="FolderInfo"/>
            <ColumnFamily ColumnType="Super"
                          CompareWith="UTF8Type"
                          CompareSubcolumnsWith="UTF8Type"
                          Name="User"
                          Comment="A column family with supercolumns, 
whose column and subcolumn names are UTF8 strings"/>
            <ColumnFamily ColumnType="Super"
                          CompareWith="UTF8Type"
                          CompareSubcolumnsWith="UTF8Type"
                          Name="Message"
                          Comment="A column family with supercolumns, 
whose column and subcolumn names are UTF8 strings"/>
            <ColumnFamily ColumnType="Super"
                          CompareWith="UTF8Type"
                          CompareSubcolumnsWith="UTF8Type"
                          Name="Folder"
                          Comment="A column family with supercolumns, 
whose column and subcolumn names are UTF8 strings"/>
            <ColumnFamily ColumnType="Super"
                          CompareWith="UTF8Type"
                          CompareSubcolumnsWith="UTF8Type"
                          Name="Attachment"
                          Comment="A column family with supercolumns, 
whose column and subcolumn names are UTF8 strings"/>

            
<ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
            <ReplicationFactor>3</ReplicationFactor>
            
<EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
        </Keyspace>

        <Keyspace Name="Titan">
            <ColumnFamily ColumnType="Super"
                          CompareWith="UTF8Type"
                          CompareSubcolumnsWith="UTF8Type"
                          Name="Club"
                          Comment="A column family with supercolumns, 
whose column and subcolumn names are UTF8 strings"/>
            <ColumnFamily ColumnType="Super"
                          CompareWith="UTF8Type"
                          CompareSubcolumnsWith="UTF8Type"
                          Name="Payment"
                          Comment="A column family with supercolumns, 
whose column and subcolumn names are UTF8 strings"/>
            <ColumnFamily ColumnType="Super"
                          CompareWith="UTF8Type"
                          CompareSubcolumnsWith="UTF8Type"
                          Name="User"
                          Comment="A column family with supercolumns, 
whose column and subcolumn names are UTF8 strings"/>
            <ColumnFamily ColumnType="Super"
                          CompareWith="UTF8Type"
                          CompareSubcolumnsWith="UTF8Type"
                          Name="FbUser"
                          Comment="A column family with supercolumns, 
whose column and subcolumn names are UTF8 strings"/>
            <ColumnFamily CompareWith="UTF8Type" Name="RandomUsers"/>

            
<ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
            <ReplicationFactor>3</ReplicationFactor>
            
<EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
        </Keyspace>
       
    </Keyspaces>

    <!--
               ~ Authenticator: any IAuthenticator may be used, 
including your own as long
               ~ as it is on the classpath.  Out of the box, Cassandra 
provides
               ~ org.apache.cassandra.auth.AllowAllAuthenticator and,
               ~ org.apache.cassandra.auth.SimpleAuthenticator
               ~ (SimpleAuthenticator uses access.properties and 
passwd.properties by
               ~ default).
               ~
               ~ If you don't specify an authenticator, 
AllowAllAuthenticator is used.
              -->
    
<Authenticator>org.apache.cassandra.auth.AllowAllAuthenticator</Authenticator>

    <!--
               ~ Partitioner: any IPartitioner may be used, including 
your own as long
               ~ as it is on the classpath.  Out of the box, Cassandra 
provides
               ~ org.apache.cassandra.dht.RandomPartitioner,
               ~ org.apache.cassandra.dht.OrderPreservingPartitioner, and
               ~ 
org.apache.cassandra.dht.CollatingOrderPreservingPartitioner.
               ~ (CollatingOPP colates according to EN,US rules, not 
naive byte
               ~ ordering.  Use this as an example if you need 
locale-aware collation.)
               ~ Range queries require using an order-preserving 
partitioner.
               ~
               ~ Achtung!  Changing this parameter requires wiping your data
               ~ directories, since the partitioner can modify the 
sstable on-disk
               ~ format.
              -->
    <Partitioner>org.apache.cassandra.dht.RandomPartitioner</Partitioner>

    <!--
               ~ If you are using an order-preserving partitioner and 
you know your key
               ~ distribution, you can specify the token for this node 
to use. (Keys
               ~ are sent to the node with the "closest" token, so 
distributing your
               ~ tokens equally along the key distribution space will 
spread keys
               ~ evenly across your cluster.)  This setting is only 
checked the first
               ~ time a node is started.

               ~ This can also be useful with RandomPartitioner to force 
equal spacing
               ~ of tokens around the hash space, especially for 
clusters with a small
               ~ number of nodes.
              -->
    <InitialToken></InitialToken>

    <!--
               ~ Directories: Specify where Cassandra should store 
different data on
               ~ disk.  Keep the data disks and the CommitLog disks 
separate for best
               ~ performance
              -->
    <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
    <DataFileDirectories>
        <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
    </DataFileDirectories>


    <!--
               ~ Addresses of hosts that are deemed contact points. 
Cassandra nodes
               ~ use this list of hosts to find each other and learn the 
topology of
               ~ the ring. You must change this if you are running 
multiple nodes!
              -->
    <Seeds>
        <Seed>10.10.104.11</Seed>
        <Seed>10.10.104.13</Seed>
        <Seed>10.10.104.15</Seed>
    </Seeds>


    <!-- Miscellaneous -->

    <!-- Time to wait for a reply from other nodes before failing the 
command -->
    <RpcTimeoutInMillis>10000</RpcTimeoutInMillis>
    <!-- Size to allow commitlog to grow to before creating a new 
segment -->
    <CommitLogRotationThresholdInMB>128</CommitLogRotationThresholdInMB>


    <!-- Local hosts and ports -->

    <!--
               ~ Address to bind to and tell other nodes to connect to.  
You _must_
               ~ change this if you want multiple nodes to be able to 
communicate!
               ~
               ~ Leaving it blank leaves it up to 
InetAddress.getLocalHost(). This
               ~ will always do the Right Thing *if* the node is 
properly configured
               ~ (hostname, name resolution, etc), and the Right Thing 
is to use the
               ~ address associated with the hostname (it might not be).
              -->
    <ListenAddress></ListenAddress>
    <!-- internal communications port -->
    <StoragePort>7000</StoragePort>

    <!--
               ~ The address to bind the Thrift RPC service to. Unlike 
ListenAddress
               ~ above, you *can* specify 0.0.0.0 here if you want 
Thrift to listen on
               ~ all interfaces.
               ~
               ~ Leaving this blank has the same effect it does for 
ListenAddress,
               ~ (i.e. it will be based on the configured hostname of 
the node).
              -->
    <ThriftAddress>0.0.0.0</ThriftAddress>
    <!-- Thrift RPC port (the port clients connect to). -->
    <ThriftPort>9160</ThriftPort>
    <!--
               ~ Whether or not to use a framed transport for Thrift. If 
this option
               ~ is set to true then you must also use a framed 
transport on the
               ~ client-side, (framed and non-framed transports are not 
compatible).
              -->
    <ThriftFramedTransport>false</ThriftFramedTransport>


    
<!--======================================================================-->
    <!-- Memory, Disk, and 
Performance                                        -->
    
<!--======================================================================-->

    <!--
               ~ Access mode.  mmapped i/o is substantially faster, but 
only practical on
               ~ a 64bit machine (which notably does not include EC2 
"small" instances)
               ~ or relatively small datasets.  "auto", the safe choice, 
will enable
               ~ mmapping on a 64bit JVM.  Other values are "mmap", 
"mmap_index_only"
               ~ (which may allow you to get part of the benefits of 
mmap on a 32bit
               ~ machine by mmapping only index files) and "standard".
               ~ (The buffer size settings that follow only apply to 
standard,
               ~ non-mmapped i/o.)
               -->
    <DiskAccessMode>auto</DiskAccessMode>

    <!--
               ~ Size of compacted row above which to log a warning.  
(If compacted
               ~ rows do not fit in memory, Cassandra will crash.  This 
is explained
               ~ in 
http://wiki.apache.org/cassandra/CassandraLimitations and is
               ~ scheduled to be fixed in 0.7.)
              -->
    <RowWarningThresholdInMB>512</RowWarningThresholdInMB>

    <!--
               ~ Buffer size to use when performing contiguous column 
slices. Increase
               ~ this to the size of the column slices you typically 
perform.
               ~ (Name-based queries are performed with a buffer size of
               ~ ColumnIndexSizeInKB.)
              -->
    <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>

    <!--
               ~ Buffer size to use when flushing memtables to disk. 
(Only one
               ~ memtable is ever flushed at a time.) Increase 
(decrease) the index
               ~ buffer size relative to the data buffer if you have few 
(many)
               ~ columns per key.  Bigger is only better _if_ your 
memtables get large
               ~ enough to use the space. (Check in your data directory 
after your
               ~ app has been running long enough.) -->
    <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
    <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>

    <!--
               ~ Add column indexes to a row after its contents reach 
this size.
               ~ Increase if your column values are large, or if you 
have a very large
               ~ number of columns.  The competing causes are, Cassandra 
has to
               ~ deserialize this much of the row to read a single 
column, so you want
               ~ it to be small - at least if you do many partial-row 
reads - but all
               ~ the index data is read for each access, so you don't 
want to generate
               ~ that wastefully either.
              -->
    <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>

    <!--
               ~ Flush memtable after this much data has been inserted, 
including
               ~ overwritten data.  There is one memtable per column 
family, and
               ~ this threshold is based solely on the amount of data 
stored, not
               ~ actual heap memory usage (there is some overhead in 
indexing the
               ~ columns).
              -->
    <MemtableThroughputInMB>64</MemtableThroughputInMB>
    <!--
               ~ Throughput setting for Binary Memtables.  Typically 
these are
               ~ used for bulk load so you want them to be larger.
              -->
    <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
    <!--
               ~ The maximum number of columns in millions to store in 
memory per
               ~ ColumnFamily before flushing to disk.  This is also a 
per-memtable
               ~ setting.  Use with MemtableThroughputInMB to tune 
memory usage.
              -->
    <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
    <!--
               ~ The maximum time to leave a dirty memtable unflushed.
               ~ (While any affected columnfamilies have unflushed data 
from a
               ~ commit log segment, that segment cannot be deleted.)
               ~ This needs to be large enough that it won't cause a 
flush storm
               ~ of all your memtables flushing at once because none has hit
               ~ the size or count thresholds yet.  For production, a larger
               ~ value such as 1440 is recommended.
              -->
    <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>

    <!--
               ~ Unlike most systems, in Cassandra writes are faster 
than reads, so
               ~ you can afford more of those in parallel.  A good rule 
of thumb is 2
               ~ concurrent reads per processor core.  Increase 
ConcurrentWrites to
               ~ the number of clients writing at once if you enable 
CommitLogSync +
               ~ CommitLogSyncDelay. -->
    <ConcurrentReads>8</ConcurrentReads>
    <ConcurrentWrites>32</ConcurrentWrites>

    <!--
               ~ CommitLogSync may be either "periodic" or "batch."  
When in batch
               ~ mode, Cassandra won't ack writes until the commit log 
has been
               ~ fsynced to disk.  It will wait up to 
CommitLogSyncBatchWindowInMS
               ~ milliseconds for other writes, before performing the sync.

               ~ This is less necessary in Cassandra than in traditional 
databases
               ~ since replication reduces the odds of losing data from 
a failure
               ~ after writing the log entry but before it actually 
reaches the disk.
               ~ So the other option is "periodic," where writes may be 
acked immediately
               ~ and the CommitLog is simply synced every 
CommitLogSyncPeriodInMS
               ~ milliseconds.
              -->
    <CommitLogSync>periodic</CommitLogSync>
    <!--
               ~ Interval at which to perform syncs of the CommitLog in 
periodic mode.
               ~ Usually the default of 10000ms is fine; increase it if 
your i/o
               ~ load is such that syncs are taking excessively long times.
              -->
    <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
    <!--
               ~ Delay (in milliseconds) during which additional commit 
log entries
               ~ may be written before fsync in batch mode.  This will 
increase
               ~ latency slightly, but can vastly improve throughput 
where there are
               ~ many writers.  Set to zero to disable (each entry will 
be synced
               ~ individually).  Reasonable values range from a minimal 
0.1 to 10 or
               ~ even more if throughput matters more than latency.
              -->
    <!-- <CommitLogSyncBatchWindowInMS>1</CommitLogSyncBatchWindowInMS> -->

    <!--
               ~ Time to wait before garbage-collection deletion 
markers.  Set this to
               ~ a large enough value that you are confident that the 
deletion marker
               ~ will be propagated to all replicas by the time this 
many seconds has
               ~ elapsed, even in the face of hardware failures.  The 
default value is
               ~ ten days.
              -->
    <GCGraceSeconds>864000</GCGraceSeconds>
</Storage>
root@cdb-006:/glass/sfw/cassandra# bin/nodetool -h localhost cfstats
Keyspace: system
    Read Count: 878
    Read Latency: 5752.042634396355 ms.
    Write Count: 2260398
    Write Latency: 0.014567047926957996 ms.
    Pending Tasks: 0
        Column Family: LocationInfo
        SSTable count: 2
        Space used (live): 3569
        Space used (total): 3569
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 1
        Read Count: 1
        Read Latency: NaN ms.
        Write Count: 6
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 2
        Key cache size: 1
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: HintsColumnFamily
        SSTable count: 2
        Space used (live): 70272035
        Space used (total): 70272035
        Memtable Columns Count: 56264
        Memtable Data Size: 486854
        Memtable Switch Count: 21
        Read Count: 877
        Read Latency: 13614.412 ms.
        Write Count: 2260392
        Write Latency: 0.142 ms.
        Pending Tasks: 0
        Key cache capacity: 2
        Key cache size: 2
        Key cache hit rate: 0.25
        Row cache: disabled
        Compacted row minimum size: 78567
        Compacted row maximum size: 39561901
        Compacted row mean size: 27878603

----------------
Keyspace: Titan
    Read Count: 8948702
    Read Latency: 7.949136100185256 ms.
    Write Count: 3393490
    Write Latency: 0.19255415398306758 ms.
    Pending Tasks: 0
        Column Family: FbUser
        SSTable count: 6
        Space used (live): 3675014807
        Space used (total): 3675014807
        Memtable Columns Count: 250055
        Memtable Data Size: 9146339
        Memtable Switch Count: 9
        Read Count: 6591406
        Read Latency: 8.361 ms.
        Write Count: 343030
        Write Latency: 0.078 ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 200000
        Key cache hit rate: 0.6341912478864628
        Row cache: disabled
        Compacted row minimum size: 320
        Compacted row maximum size: 586
        Compacted row mean size: 479

        Column Family: Payment
        SSTable count: 3
        Space used (live): 2473
        Space used (total): 2473
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 27728
        Read Latency: 0.059 ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: Club
        SSTable count: 6
        Space used (live): 363609256
        Space used (total): 477968226
        Memtable Columns Count: 124134
        Memtable Data Size: 38144903
        Memtable Switch Count: 140
        Read Count: 63668
        Read Latency: 29.187 ms.
        Write Count: 1966098
        Write Latency: 0.184 ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 63751
        Key cache hit rate: 0.6543073524771221
        Row cache: disabled
        Compacted row minimum size: 274
        Compacted row maximum size: 49784
        Compacted row mean size: 7711

        Column Family: RandomUsers
        SSTable count: 1
        Space used (live): 1329464
        Space used (total): 1329464
        Memtable Columns Count: 170433
        Memtable Data Size: 5525529
        Memtable Switch Count: 8
        Read Count: 58236
        Read Latency: 6.215 ms.
        Write Count: 143339
        Write Latency: 0.575 ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 21
        Key cache hit rate: 1.0
        Row cache: disabled
        Compacted row minimum size: 1244196
        Compacted row maximum size: 1327409
        Compacted row mean size: 1287929

        Column Family: User
        SSTable count: 4
        Space used (live): 583812728
        Space used (total): 583812728
        Memtable Columns Count: 248833
        Memtable Data Size: 7909687
        Memtable Switch Count: 12
        Read Count: 2207667
        Read Latency: 8.413 ms.
        Write Count: 941029
        Write Latency: 0.107 ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 94940
        Key cache hit rate: 0.9769078238427732
        Row cache: disabled
        Compacted row minimum size: 259
        Compacted row maximum size: 174340
        Compacted row mean size: 3916

----------------
Keyspace: Pandora
    Read Count: 1475530
    Read Latency: 5.249669284257182 ms.
    Write Count: 856550
    Write Latency: 0.16070848053236822 ms.
    Pending Tasks: 0
        Column Family: Folder
        SSTable count: 7
        Space used (live): 942926793
        Space used (total): 942926793
        Memtable Columns Count: 150119
        Memtable Data Size: 814194
        Memtable Switch Count: 21
        Read Count: 604605
        Read Latency: 7.539 ms.
        Write Count: 553015
        Write Latency: 0.173 ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 111884
        Key cache hit rate: 0.9770064886992118
        Row cache: disabled
        Compacted row minimum size: 234
        Compacted row maximum size: 1644541
        Compacted row mean size: 1238

        Column Family: Attachment
        SSTable count: 5
        Space used (live): 754692822
        Space used (total): 754692822
        Memtable Columns Count: 548
        Memtable Data Size: 7110
        Memtable Switch Count: 8
        Read Count: 22708
        Read Latency: 6.950 ms.
        Write Count: 29835
        Write Latency: 0.025 ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 8299
        Key cache hit rate: 0.017857142857142856
        Row cache: disabled
        Compacted row minimum size: 269
        Compacted row maximum size: 274
        Compacted row mean size: 272

        Column Family: Message
        SSTable count: 7
        Space used (live): 3689251440
        Space used (total): 3689251440
        Memtable Columns Count: 50399
        Memtable Data Size: 1361773
        Memtable Switch Count: 10
        Read Count: 163630
        Read Latency: 16.981 ms.
        Write Count: 257843
        Write Latency: 0.113 ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 52427
        Key cache hit rate: 0.5041365046535677
        Row cache: disabled
        Compacted row minimum size: 272
        Compacted row maximum size: 2754
        Compacted row mean size: 752

        Column Family: FolderInfo
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 8704
        Read Latency: 0.008 ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: StandardByUUID1
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 8704
        Read Latency: 0.005 ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: User
        SSTable count: 4
        Space used (live): 120121309
        Space used (total): 120121309
        Memtable Columns Count: 1786
        Memtable Data Size: 50657
        Memtable Switch Count: 8
        Read Count: 641067
        Read Latency: 2.728 ms.
        Write Count: 15857
        Write Latency: 0.148 ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 45965
        Key cache hit rate: 0.957460499034818
        Row cache: disabled
        Compacted row minimum size: 271
        Compacted row maximum size: 506
        Compacted row mean size: 499

        Column Family: Standard1
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 8704
        Read Latency: 0.005 ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: Standard2
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 8704
        Read Latency: 0.006 ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: Folder1
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 8704
        Read Latency: 0.005 ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

----------------
root@cdb-006:/glass/sfw/cassandra# bin/nodetool -h localhost tpstats
Pool Name                    Active   Pending      Completed
FILEUTILS-DELETE-POOL             0         0             70
STREAM-STAGE                      0         0              0
RESPONSE-STAGE                    0         0       18298968
ROW-READ-STAGE                    1         0       10168985
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         0         0       24960681
GMFD                              0         0          93647
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0              0
ROW-MUTATION-STAGE                0         0        4280703
MESSAGE-STREAMING-POOL            0         0             12
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0            224
FLUSH-WRITER-POOL                 0         0            224
AE-SERVICE-STAGE                  0         0             22
HINTED-HANDOFF-POOL               1         5             73

Re: Periodically hiccups

Posted by Alex Li <al...@gmail.com>.

Here are some system load data I collected.

top - 00:43:43 up 1 day, 20:15,  2 users,  load average: 2.96, 2.74, 2.37
Tasks: 160 total,   1 running, 159 sleeping,   0 stopped,   0 zombie
Cpu0  : 10.9%us,  2.6%sy,  0.0%ni, 83.2%id,  3.2%wa,  0.0%hi,  0.0%si,  
0.0%st
Cpu1  :  9.6%us,  2.2%sy,  0.0%ni, 85.4%id,  2.8%wa,  0.0%hi,  0.0%si,  
0.0%st
Cpu2  : 37.1%us,  4.2%sy,  0.0%ni, 23.3%id, 33.8%wa,  1.2%hi,  0.3%si,  
0.0%st
Cpu3  : 18.5%us,  3.9%sy,  0.0%ni, 62.6%id,  9.5%wa,  4.1%hi,  1.4%si,  
0.0%st
Cpu4  : 11.1%us,  2.6%sy,  0.0%ni, 80.1%id,  3.2%wa,  2.9%hi,  0.2%si,  
0.0%st
Cpu5  : 10.1%us,  2.7%sy,  0.0%ni, 84.4%id,  2.7%wa,  0.0%hi,  0.0%si,  
0.0%st
Cpu6  : 20.1%us,  2.6%sy,  0.0%ni, 62.6%id, 14.7%wa,  0.0%hi,  0.0%si,  
0.0%st
Cpu7  : 18.9%us,  4.4%sy,  0.0%ni, 69.0%id,  7.7%wa,  0.0%hi,  0.1%si,  
0.0%st
Mem:   8188108k total,  8135168k used,    52940k free,    12968k buffers
Swap: 62500792k total,   277484k used, 62223308k free,  2381704k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  
COMMAND                                                                  
18000 root      20   0 15.6g 6.7g 1.5g S   64 85.4 552:54.13 
java                                                                     
    1 root      20   0 19316  672  412 S    0  0.0   0:03.36 init       

root@cdb-006:/glass/sfw/cassandra# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- 
----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy 
id wa
 5  1 277984  50180  12888 2382292   84   67   880   127   10    4 17  4 
69 10
 3  1 278000  51536  12856 2381316    4   14  2556    37 8660 10714  7  
4 82  8
 2  0 277968  52084  12768 2380156    7    2   891   618 6287 8865  6  3 
87  3
 3  0 277872  52020  12764 2379412   24    7  1070    22 5492 10111  8  
3 84  4
 0  1 277528  50764  12760 2379176    2   10  1530   696 8773 13661  8  
4 83  5



Alex Li wrote:
> Hello,
>
> We recently deployed a cluster of 5 Cassandra nodes into production, 
> and ran into big problems with periodically hiccups (individual node 
> goes down, high CPU, client connection timeout). It was terrible with 
> 0.5 (one hiccups every 5-10 minutes), today we upgraded to 0.6.1, it 
> happens less frequently now (likely once every 30 minutes or so). But 
> it is still quite frustrating.
>
> We used ReplicationFactor=3 for all column families. 5 nodes are 
> behind haproxy. Java client goes through haproxy. The most obvious 
> behavior is: as soon as one node goes down, the connections between 
> haproxy and Cassandra nodes just shoot up to 1000 (in normal case it 
> is stable at 40), and the connections don't go down for quite a while. 
> Meanwhile Java clients just get all kind of TimeoutException, then 
> kept on retrying. Eventually we have to restart haproxy, then things 
> go back to normal.
>
> Each node has 5GB max heap, powerful enough CPU (quad-core), software 
> RAID mirror. We are definitely NOT putting lots of load yet, mostly 
> 20-50 concurrent requests to Cassandra, but it is not holding up! 
> Please help, we are on the verge of giving up Cassandra after 5 days 
> of periodic "outage".
>
> Couple observations:
>
> - cfstats shows significant read latency on "system" keyspace, almost 
> 5s (see below)
>
> - RecentReadLatencyMicros and RecentWriteLatencyMicros are super high 
> for StorageProxy, as well as every column family in JMX: up to 
> 152676.92 and 6950 (they are in ms, right?). However, in cfstats, they 
> are quite small.
>
> - Every second we see 5-10 DigestMismatchException in the log:
>
> INFO [pool-1-thread-15857] 2010-04-22 00:37:37,887 StorageProxy.java 
> (line 499) DigestMismatchException: Mismatch for key 1068022523 
> (d41d8cd98f00b204e9800998ecf8427e vs 0dd4cdaeeb1a334ae133c6955e109629)
>
> Please advice. Thank you!
>
> storage-conf, cfstats, tpstats are listed below:
>
> root@cdb-006:/glass/sfw/cassandra# cat conf/storage-conf.xml
> <!--
> ~ Licensed to the Apache Software Foundation (ASF) under one
> ~ or more contributor license agreements.  See the NOTICE file
> ~ distributed with this work for additional information
> ~ regarding copyright ownership.  The ASF licenses this file
> ~ to you under the Apache License, Version 2.0 (the
> ~ "License"); you may not use this file except in compliance
> ~ with the License.  You may obtain a copy of the License at
> ~
> ~    http://www.apache.org/licenses/LICENSE-2.0
> ~
> ~ Unless required by applicable law or agreed to in writing,
> ~ software distributed under the License is distributed on an
> ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> ~ KIND, either express or implied.  See the License for the
> ~ specific language governing permissions and limitations
> ~ under the License.
> -->
> <Storage>
>    
> <!--======================================================================--> 
>
>    <!-- Basic 
> Configuration                                                  -->
>    
> <!--======================================================================--> 
>
>
>    <!--
>       ~ The name of this cluster.  This is mainly used to prevent 
> machines in
>       ~ one logical cluster from joining another.
>      -->
>    <ClusterName>Test Cluster</ClusterName>
>
>    <!--
>       ~ Turn on to make new [non-seed] nodes automatically migrate the 
> right data
>       ~ to themselves.  (If no InitialToken is specified, they will 
> pick one
>       ~ such that they will get half the range of the most-loaded node.)
>       ~ If a node starts up without bootstrapping, it will mark itself 
> bootstrapped
>       ~ so that you can't subsequently accidently bootstrap a node with
>       ~ data on it.  (You can reset this by wiping your data and 
> commitlog
>       ~ directories.)
>       ~
>       ~ Off by default so that new clusters and upgraders from 0.4 don't
>       ~ bootstrap immediately.  You should turn this on when you start 
> adding
>       ~ new nodes to a cluster that already has data on it.  (If you 
> are upgrading
>       ~ from 0.4, start your cluster with it off once before changing 
> it to true.
>       ~ Otherwise, no data will be lost but you will incur a lot of 
> unnecessary
>       ~ I/O before your cluster starts up.)
>      -->
>    <AutoBootstrap>false</AutoBootstrap>
>
>    <!--
>       ~ Keyspaces and ColumnFamilies:
>       ~ A ColumnFamily is the Cassandra concept closest to a relational
>       ~ table.  Keyspaces are separate groups of ColumnFamilies.  
> Except in
>       ~ very unusual circumstances you will have one Keyspace per 
> application.
>
>       ~ There is an implicit keyspace named 'system' for Cassandra 
> internals.
>      -->
>    <Keyspaces>
>        <Keyspace Name="Pandora">
>            <ColumnFamily CompareWith="BytesType" Name="Standard1"/>
>            <ColumnFamily CompareWith="UTF8Type" Name="Standard2"/>
>            <ColumnFamily CompareWith="TimeUUIDType" 
> Name="StandardByUUID1"/>
>            <ColumnFamily CompareWith="TimeUUIDType" Name="Folder1"/>
>            <ColumnFamily CompareWith="UTF8Type" Name="FolderInfo"/>
>            <ColumnFamily ColumnType="Super"
>                          CompareWith="UTF8Type"
>                          CompareSubcolumnsWith="UTF8Type"
>                          Name="User"
>                          Comment="A column family with supercolumns, 
> whose column and subcolumn names are UTF8 strings"/>
>            <ColumnFamily ColumnType="Super"
>                          CompareWith="UTF8Type"
>                          CompareSubcolumnsWith="UTF8Type"
>                          Name="Message"
>                          Comment="A column family with supercolumns, 
> whose column and subcolumn names are UTF8 strings"/>
>            <ColumnFamily ColumnType="Super"
>                          CompareWith="UTF8Type"
>                          CompareSubcolumnsWith="UTF8Type"
>                          Name="Folder"
>                          Comment="A column family with supercolumns, 
> whose column and subcolumn names are UTF8 strings"/>
>            <ColumnFamily ColumnType="Super"
>                          CompareWith="UTF8Type"
>                          CompareSubcolumnsWith="UTF8Type"
>                          Name="Attachment"
>                          Comment="A column family with supercolumns, 
> whose column and subcolumn names are UTF8 strings"/>
>
>            
> <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy> 
>
>            <ReplicationFactor>3</ReplicationFactor>
>            
> <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch> 
>
>        </Keyspace>
>
>        <Keyspace Name="Titan">
>            <ColumnFamily ColumnType="Super"
>                          CompareWith="UTF8Type"
>                          CompareSubcolumnsWith="UTF8Type"
>                          Name="Club"
>                          Comment="A column family with supercolumns, 
> whose column and subcolumn names are UTF8 strings"/>
>            <ColumnFamily ColumnType="Super"
>                          CompareWith="UTF8Type"
>                          CompareSubcolumnsWith="UTF8Type"
>                          Name="Payment"
>                          Comment="A column family with supercolumns, 
> whose column and subcolumn names are UTF8 strings"/>
>            <ColumnFamily ColumnType="Super"
>                          CompareWith="UTF8Type"
>                          CompareSubcolumnsWith="UTF8Type"
>                          Name="User"
>                          Comment="A column family with supercolumns, 
> whose column and subcolumn names are UTF8 strings"/>
>            <ColumnFamily ColumnType="Super"
>                          CompareWith="UTF8Type"
>                          CompareSubcolumnsWith="UTF8Type"
>                          Name="FbUser"
>                          Comment="A column family with supercolumns, 
> whose column and subcolumn names are UTF8 strings"/>
>            <ColumnFamily CompareWith="UTF8Type" Name="RandomUsers"/>
>
>            
> <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy> 
>
>            <ReplicationFactor>3</ReplicationFactor>
>            
> <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch> 
>
>        </Keyspace>
>          </Keyspaces>
>
>    <!--
>               ~ Authenticator: any IAuthenticator may be used, 
> including your own as long
>               ~ as it is on the classpath.  Out of the box, Cassandra 
> provides
>               ~ org.apache.cassandra.auth.AllowAllAuthenticator and,
>               ~ org.apache.cassandra.auth.SimpleAuthenticator
>               ~ (SimpleAuthenticator uses access.properties and 
> passwd.properties by
>               ~ default).
>               ~
>               ~ If you don't specify an authenticator, 
> AllowAllAuthenticator is used.
>              -->
>    
> <Authenticator>org.apache.cassandra.auth.AllowAllAuthenticator</Authenticator> 
>
>
>    <!--
>               ~ Partitioner: any IPartitioner may be used, including 
> your own as long
>               ~ as it is on the classpath.  Out of the box, Cassandra 
> provides
>               ~ org.apache.cassandra.dht.RandomPartitioner,
>               ~ org.apache.cassandra.dht.OrderPreservingPartitioner, and
>               ~ 
> org.apache.cassandra.dht.CollatingOrderPreservingPartitioner.
>               ~ (CollatingOPP colates according to EN,US rules, not 
> naive byte
>               ~ ordering.  Use this as an example if you need 
> locale-aware collation.)
>               ~ Range queries require using an order-preserving 
> partitioner.
>               ~
>               ~ Achtung!  Changing this parameter requires wiping your 
> data
>               ~ directories, since the partitioner can modify the 
> sstable on-disk
>               ~ format.
>              -->
>    <Partitioner>org.apache.cassandra.dht.RandomPartitioner</Partitioner>
>
>    <!--
>               ~ If you are using an order-preserving partitioner and 
> you know your key
>               ~ distribution, you can specify the token for this node 
> to use. (Keys
>               ~ are sent to the node with the "closest" token, so 
> distributing your
>               ~ tokens equally along the key distribution space will 
> spread keys
>               ~ evenly across your cluster.)  This setting is only 
> checked the first
>               ~ time a node is started.
>
>               ~ This can also be useful with RandomPartitioner to 
> force equal spacing
>               ~ of tokens around the hash space, especially for 
> clusters with a small
>               ~ number of nodes.
>              -->
>    <InitialToken></InitialToken>
>
>    <!--
>               ~ Directories: Specify where Cassandra should store 
> different data on
>               ~ disk.  Keep the data disks and the CommitLog disks 
> separate for best
>               ~ performance
>              -->
>    <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
>    <DataFileDirectories>
>        <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
>    </DataFileDirectories>
>
>
>    <!--
>               ~ Addresses of hosts that are deemed contact points. 
> Cassandra nodes
>               ~ use this list of hosts to find each other and learn 
> the topology of
>               ~ the ring. You must change this if you are running 
> multiple nodes!
>              -->
>    <Seeds>
>        <Seed>10.10.104.11</Seed>
>        <Seed>10.10.104.13</Seed>
>        <Seed>10.10.104.15</Seed>
>    </Seeds>
>
>
>    <!-- Miscellaneous -->
>
>    <!-- Time to wait for a reply from other nodes before failing the 
> command -->
>    <RpcTimeoutInMillis>10000</RpcTimeoutInMillis>
>    <!-- Size to allow commitlog to grow to before creating a new 
> segment -->
>    <CommitLogRotationThresholdInMB>128</CommitLogRotationThresholdInMB>
>
>
>    <!-- Local hosts and ports -->
>
>    <!--
>               ~ Address to bind to and tell other nodes to connect 
> to.  You _must_
>               ~ change this if you want multiple nodes to be able to 
> communicate!
>               ~
>               ~ Leaving it blank leaves it up to 
> InetAddress.getLocalHost(). This
>               ~ will always do the Right Thing *if* the node is 
> properly configured
>               ~ (hostname, name resolution, etc), and the Right Thing 
> is to use the
>               ~ address associated with the hostname (it might not be).
>              -->
>    <ListenAddress></ListenAddress>
>    <!-- internal communications port -->
>    <StoragePort>7000</StoragePort>
>
>    <!--
>               ~ The address to bind the Thrift RPC service to. Unlike 
> ListenAddress
>               ~ above, you *can* specify 0.0.0.0 here if you want 
> Thrift to listen on
>               ~ all interfaces.
>               ~
>               ~ Leaving this blank has the same effect it does for 
> ListenAddress,
>               ~ (i.e. it will be based on the configured hostname of 
> the node).
>              -->
>    <ThriftAddress>0.0.0.0</ThriftAddress>
>    <!-- Thrift RPC port (the port clients connect to). -->
>    <ThriftPort>9160</ThriftPort>
>    <!--
>               ~ Whether or not to use a framed transport for Thrift. 
> If this option
>               ~ is set to true then you must also use a framed 
> transport on the
>               ~ client-side, (framed and non-framed transports are not 
> compatible).
>              -->
>    <ThriftFramedTransport>false</ThriftFramedTransport>
>
>
>    
> <!--======================================================================--> 
>
>    <!-- Memory, Disk, and 
> Performance                                        -->
>    
> <!--======================================================================--> 
>
>
>    <!--
>               ~ Access mode.  mmapped i/o is substantially faster, but 
> only practical on
>               ~ a 64bit machine (which notably does not include EC2 
> "small" instances)
>               ~ or relatively small datasets.  "auto", the safe 
> choice, will enable
>               ~ mmapping on a 64bit JVM.  Other values are "mmap", 
> "mmap_index_only"
>               ~ (which may allow you to get part of the benefits of 
> mmap on a 32bit
>               ~ machine by mmapping only index files) and "standard".
>               ~ (The buffer size settings that follow only apply to 
> standard,
>               ~ non-mmapped i/o.)
>               -->
>    <DiskAccessMode>auto</DiskAccessMode>
>
>    <!--
>               ~ Size of compacted row above which to log a warning.  
> (If compacted
>               ~ rows do not fit in memory, Cassandra will crash.  This 
> is explained
>               ~ in 
> http://wiki.apache.org/cassandra/CassandraLimitations and is
>               ~ scheduled to be fixed in 0.7.)
>              -->
>    <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
>
>    <!--
>               ~ Buffer size to use when performing contiguous column 
> slices. Increase
>               ~ this to the size of the column slices you typically 
> perform.
>               ~ (Name-based queries are performed with a buffer size of
>               ~ ColumnIndexSizeInKB.)
>              -->
>    <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
>
>    <!--
>               ~ Buffer size to use when flushing memtables to disk. 
> (Only one
>               ~ memtable is ever flushed at a time.) Increase 
> (decrease) the index
>               ~ buffer size relative to the data buffer if you have 
> few (many)
>               ~ columns per key.  Bigger is only better _if_ your 
> memtables get large
>               ~ enough to use the space. (Check in your data directory 
> after your
>               ~ app has been running long enough.) -->
>    <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
>    <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
>
>    <!--
>               ~ Add column indexes to a row after its contents reach 
> this size.
>               ~ Increase if your column values are large, or if you 
> have a very large
>               ~ number of columns.  The competing causes are, 
> Cassandra has to
>               ~ deserialize this much of the row to read a single 
> column, so you want
>               ~ it to be small - at least if you do many partial-row 
> reads - but all
>               ~ the index data is read for each access, so you don't 
> want to generate
>               ~ that wastefully either.
>              -->
>    <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
>
>    <!--
>               ~ Flush memtable after this much data has been inserted, 
> including
>               ~ overwritten data.  There is one memtable per column 
> family, and
>               ~ this threshold is based solely on the amount of data 
> stored, not
>               ~ actual heap memory usage (there is some overhead in 
> indexing the
>               ~ columns).
>              -->
>    <MemtableThroughputInMB>64</MemtableThroughputInMB>
>    <!--
>               ~ Throughput setting for Binary Memtables.  Typically 
> these are
>               ~ used for bulk load so you want them to be larger.
>              -->
>    <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
>    <!--
>               ~ The maximum number of columns in millions to store in 
> memory per
>               ~ ColumnFamily before flushing to disk.  This is also a 
> per-memtable
>               ~ setting.  Use with MemtableThroughputInMB to tune 
> memory usage.
>              -->
>    <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
>    <!--
>               ~ The maximum time to leave a dirty memtable unflushed.
>               ~ (While any affected columnfamilies have unflushed data 
> from a
>               ~ commit log segment, that segment cannot be deleted.)
>               ~ This needs to be large enough that it won't cause a 
> flush storm
>               ~ of all your memtables flushing at once because none 
> has hit
>               ~ the size or count thresholds yet.  For production, a 
> larger
>               ~ value such as 1440 is recommended.
>              -->
>    <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
>
>    <!--
>               ~ Unlike most systems, in Cassandra writes are faster 
> than reads, so
>               ~ you can afford more of those in parallel.  A good rule 
> of thumb is 2
>               ~ concurrent reads per processor core.  Increase 
> ConcurrentWrites to
>               ~ the number of clients writing at once if you enable 
> CommitLogSync +
>               ~ CommitLogSyncDelay. -->
>    <ConcurrentReads>8</ConcurrentReads>
>    <ConcurrentWrites>32</ConcurrentWrites>
>
>    <!--
>               ~ CommitLogSync may be either "periodic" or "batch."  
> When in batch
>               ~ mode, Cassandra won't ack writes until the commit log 
> has been
>               ~ fsynced to disk.  It will wait up to 
> CommitLogSyncBatchWindowInMS
>               ~ milliseconds for other writes, before performing the 
> sync.
>
>               ~ This is less necessary in Cassandra than in 
> traditional databases
>               ~ since replication reduces the odds of losing data from 
> a failure
>               ~ after writing the log entry but before it actually 
> reaches the disk.
>               ~ So the other option is "periodic," where writes may be 
> acked immediately
>               ~ and the CommitLog is simply synced every 
> CommitLogSyncPeriodInMS
>               ~ milliseconds.
>              -->
>    <CommitLogSync>periodic</CommitLogSync>
>    <!--
>               ~ Interval at which to perform syncs of the CommitLog in 
> periodic mode.
>               ~ Usually the default of 10000ms is fine; increase it if 
> your i/o
>               ~ load is such that syncs are taking excessively long 
> times.
>              -->
>    <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
>    <!--
>               ~ Delay (in milliseconds) during which additional commit 
> log entries
>               ~ may be written before fsync in batch mode.  This will 
> increase
>               ~ latency slightly, but can vastly improve throughput 
> where there are
>               ~ many writers.  Set to zero to disable (each entry will 
> be synced
>               ~ individually).  Reasonable values range from a minimal 
> 0.1 to 10 or
>               ~ even more if throughput matters more than latency.
>              -->
>    <!-- <CommitLogSyncBatchWindowInMS>1</CommitLogSyncBatchWindowInMS> 
> -->
>
>    <!--
>               ~ Time to wait before garbage-collection deletion 
> markers.  Set this to
>               ~ a large enough value that you are confident that the 
> deletion marker
>               ~ will be propagated to all replicas by the time this 
> many seconds has
>               ~ elapsed, even in the face of hardware failures.  The 
> default value is
>               ~ ten days.
>              -->
>    <GCGraceSeconds>864000</GCGraceSeconds>
> </Storage>
> root@cdb-006:/glass/sfw/cassandra# bin/nodetool -h localhost cfstats
> Keyspace: system
>    Read Count: 878
>    Read Latency: 5752.042634396355 ms.
>    Write Count: 2260398
>    Write Latency: 0.014567047926957996 ms.
>    Pending Tasks: 0
>        Column Family: LocationInfo
>        SSTable count: 2
>        Space used (live): 3569
>        Space used (total): 3569
>        Memtable Columns Count: 0
>        Memtable Data Size: 0
>        Memtable Switch Count: 1
>        Read Count: 1
>        Read Latency: NaN ms.
>        Write Count: 6
>        Write Latency: NaN ms.
>        Pending Tasks: 0
>        Key cache capacity: 2
>        Key cache size: 1
>        Key cache hit rate: NaN
>        Row cache: disabled
>        Compacted row minimum size: 0
>        Compacted row maximum size: 0
>        Compacted row mean size: 0
>
>        Column Family: HintsColumnFamily
>        SSTable count: 2
>        Space used (live): 70272035
>        Space used (total): 70272035
>        Memtable Columns Count: 56264
>        Memtable Data Size: 486854
>        Memtable Switch Count: 21
>        Read Count: 877
>        Read Latency: 13614.412 ms.
>        Write Count: 2260392
>        Write Latency: 0.142 ms.
>        Pending Tasks: 0
>        Key cache capacity: 2
>        Key cache size: 2
>        Key cache hit rate: 0.25
>        Row cache: disabled
>        Compacted row minimum size: 78567
>        Compacted row maximum size: 39561901
>        Compacted row mean size: 27878603
>
> ----------------
> Keyspace: Titan
>    Read Count: 8948702
>    Read Latency: 7.949136100185256 ms.
>    Write Count: 3393490
>    Write Latency: 0.19255415398306758 ms.
>    Pending Tasks: 0
>        Column Family: FbUser
>        SSTable count: 6
>        Space used (live): 3675014807
>        Space used (total): 3675014807
>        Memtable Columns Count: 250055
>        Memtable Data Size: 9146339
>        Memtable Switch Count: 9
>        Read Count: 6591406
>        Read Latency: 8.361 ms.
>        Write Count: 343030
>        Write Latency: 0.078 ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 200000
>        Key cache hit rate: 0.6341912478864628
>        Row cache: disabled
>        Compacted row minimum size: 320
>        Compacted row maximum size: 586
>        Compacted row mean size: 479
>
>        Column Family: Payment
>        SSTable count: 3
>        Space used (live): 2473
>        Space used (total): 2473
>        Memtable Columns Count: 0
>        Memtable Data Size: 0
>        Memtable Switch Count: 0
>        Read Count: 27728
>        Read Latency: 0.059 ms.
>        Write Count: 0
>        Write Latency: NaN ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 0
>        Key cache hit rate: NaN
>        Row cache: disabled
>        Compacted row minimum size: 0
>        Compacted row maximum size: 0
>        Compacted row mean size: 0
>
>        Column Family: Club
>        SSTable count: 6
>        Space used (live): 363609256
>        Space used (total): 477968226
>        Memtable Columns Count: 124134
>        Memtable Data Size: 38144903
>        Memtable Switch Count: 140
>        Read Count: 63668
>        Read Latency: 29.187 ms.
>        Write Count: 1966098
>        Write Latency: 0.184 ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 63751
>        Key cache hit rate: 0.6543073524771221
>        Row cache: disabled
>        Compacted row minimum size: 274
>        Compacted row maximum size: 49784
>        Compacted row mean size: 7711
>
>        Column Family: RandomUsers
>        SSTable count: 1
>        Space used (live): 1329464
>        Space used (total): 1329464
>        Memtable Columns Count: 170433
>        Memtable Data Size: 5525529
>        Memtable Switch Count: 8
>        Read Count: 58236
>        Read Latency: 6.215 ms.
>        Write Count: 143339
>        Write Latency: 0.575 ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 21
>        Key cache hit rate: 1.0
>        Row cache: disabled
>        Compacted row minimum size: 1244196
>        Compacted row maximum size: 1327409
>        Compacted row mean size: 1287929
>
>        Column Family: User
>        SSTable count: 4
>        Space used (live): 583812728
>        Space used (total): 583812728
>        Memtable Columns Count: 248833
>        Memtable Data Size: 7909687
>        Memtable Switch Count: 12
>        Read Count: 2207667
>        Read Latency: 8.413 ms.
>        Write Count: 941029
>        Write Latency: 0.107 ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 94940
>        Key cache hit rate: 0.9769078238427732
>        Row cache: disabled
>        Compacted row minimum size: 259
>        Compacted row maximum size: 174340
>        Compacted row mean size: 3916
>
> ----------------
> Keyspace: Pandora
>    Read Count: 1475530
>    Read Latency: 5.249669284257182 ms.
>    Write Count: 856550
>    Write Latency: 0.16070848053236822 ms.
>    Pending Tasks: 0
>        Column Family: Folder
>        SSTable count: 7
>        Space used (live): 942926793
>        Space used (total): 942926793
>        Memtable Columns Count: 150119
>        Memtable Data Size: 814194
>        Memtable Switch Count: 21
>        Read Count: 604605
>        Read Latency: 7.539 ms.
>        Write Count: 553015
>        Write Latency: 0.173 ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 111884
>        Key cache hit rate: 0.9770064886992118
>        Row cache: disabled
>        Compacted row minimum size: 234
>        Compacted row maximum size: 1644541
>        Compacted row mean size: 1238
>
>        Column Family: Attachment
>        SSTable count: 5
>        Space used (live): 754692822
>        Space used (total): 754692822
>        Memtable Columns Count: 548
>        Memtable Data Size: 7110
>        Memtable Switch Count: 8
>        Read Count: 22708
>        Read Latency: 6.950 ms.
>        Write Count: 29835
>        Write Latency: 0.025 ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 8299
>        Key cache hit rate: 0.017857142857142856
>        Row cache: disabled
>        Compacted row minimum size: 269
>        Compacted row maximum size: 274
>        Compacted row mean size: 272
>
>        Column Family: Message
>        SSTable count: 7
>        Space used (live): 3689251440
>        Space used (total): 3689251440
>        Memtable Columns Count: 50399
>        Memtable Data Size: 1361773
>        Memtable Switch Count: 10
>        Read Count: 163630
>        Read Latency: 16.981 ms.
>        Write Count: 257843
>        Write Latency: 0.113 ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 52427
>        Key cache hit rate: 0.5041365046535677
>        Row cache: disabled
>        Compacted row minimum size: 272
>        Compacted row maximum size: 2754
>        Compacted row mean size: 752
>
>        Column Family: FolderInfo
>        SSTable count: 0
>        Space used (live): 0
>        Space used (total): 0
>        Memtable Columns Count: 0
>        Memtable Data Size: 0
>        Memtable Switch Count: 0
>        Read Count: 8704
>        Read Latency: 0.008 ms.
>        Write Count: 0
>        Write Latency: NaN ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 0
>        Key cache hit rate: NaN
>        Row cache: disabled
>        Compacted row minimum size: 0
>        Compacted row maximum size: 0
>        Compacted row mean size: 0
>
>        Column Family: StandardByUUID1
>        SSTable count: 0
>        Space used (live): 0
>        Space used (total): 0
>        Memtable Columns Count: 0
>        Memtable Data Size: 0
>        Memtable Switch Count: 0
>        Read Count: 8704
>        Read Latency: 0.005 ms.
>        Write Count: 0
>        Write Latency: NaN ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 0
>        Key cache hit rate: NaN
>        Row cache: disabled
>        Compacted row minimum size: 0
>        Compacted row maximum size: 0
>        Compacted row mean size: 0
>
>        Column Family: User
>        SSTable count: 4
>        Space used (live): 120121309
>        Space used (total): 120121309
>        Memtable Columns Count: 1786
>        Memtable Data Size: 50657
>        Memtable Switch Count: 8
>        Read Count: 641067
>        Read Latency: 2.728 ms.
>        Write Count: 15857
>        Write Latency: 0.148 ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 45965
>        Key cache hit rate: 0.957460499034818
>        Row cache: disabled
>        Compacted row minimum size: 271
>        Compacted row maximum size: 506
>        Compacted row mean size: 499
>
>        Column Family: Standard1
>        SSTable count: 0
>        Space used (live): 0
>        Space used (total): 0
>        Memtable Columns Count: 0
>        Memtable Data Size: 0
>        Memtable Switch Count: 0
>        Read Count: 8704
>        Read Latency: 0.005 ms.
>        Write Count: 0
>        Write Latency: NaN ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 0
>        Key cache hit rate: NaN
>        Row cache: disabled
>        Compacted row minimum size: 0
>        Compacted row maximum size: 0
>        Compacted row mean size: 0
>
>        Column Family: Standard2
>        SSTable count: 0
>        Space used (live): 0
>        Space used (total): 0
>        Memtable Columns Count: 0
>        Memtable Data Size: 0
>        Memtable Switch Count: 0
>        Read Count: 8704
>        Read Latency: 0.006 ms.
>        Write Count: 0
>        Write Latency: NaN ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 0
>        Key cache hit rate: NaN
>        Row cache: disabled
>        Compacted row minimum size: 0
>        Compacted row maximum size: 0
>        Compacted row mean size: 0
>
>        Column Family: Folder1
>        SSTable count: 0
>        Space used (live): 0
>        Space used (total): 0
>        Memtable Columns Count: 0
>        Memtable Data Size: 0
>        Memtable Switch Count: 0
>        Read Count: 8704
>        Read Latency: 0.005 ms.
>        Write Count: 0
>        Write Latency: NaN ms.
>        Pending Tasks: 0
>        Key cache capacity: 200000
>        Key cache size: 0
>        Key cache hit rate: NaN
>        Row cache: disabled
>        Compacted row minimum size: 0
>        Compacted row maximum size: 0
>        Compacted row mean size: 0
>
> ----------------
> root@cdb-006:/glass/sfw/cassandra# bin/nodetool -h localhost tpstats
> Pool Name                    Active   Pending      Completed
> FILEUTILS-DELETE-POOL             0         0             70
> STREAM-STAGE                      0         0              0
> RESPONSE-STAGE                    0         0       18298968
> ROW-READ-STAGE                    1         0       10168985
> LB-OPERATIONS                     0         0              0
> MESSAGE-DESERIALIZER-POOL         0         0       24960681
> GMFD                              0         0          93647
> LB-TARGET                         0         0              0
> CONSISTENCY-MANAGER               0         0              0
> ROW-MUTATION-STAGE                0         0        4280703
> MESSAGE-STREAMING-POOL            0         0             12
> LOAD-BALANCER-STAGE               0         0              0
> FLUSH-SORTER-POOL                 0         0              0
> MEMTABLE-POST-FLUSHER             0         0            224
> FLUSH-WRITER-POOL                 0         0            224
> AE-SERVICE-STAGE                  0         0             22
> HINTED-HANDOFF-POOL               1         5             73
>
>
> ------------------------------------------------------------------------

RE: Periodically hiccups

Posted by "Dr. Martin Grabmüller" <Ma...@eleven.de>.

Hello Alex,

unfortunately I can not help with your problem, just one hint: 

> - RecentReadLatencyMicros and RecentWriteLatencyMicros are super high 
> for StorageProxy, as well as every column family in JMX: up 
> to 43 s and 
> 9s (see screenshot). However, in cfstats, they are quite small.

Remember that the latency values in JMX are in microseconds, whereas
cfstats reports in ms.  This was changed between 0.5 and 0.6, IIRC.

Greetings,
  Martin