Posted to commits@cassandra.apache.org by "Donald Smith (JIRA)" <ji...@apache.org> on 2013/12/23 19:21:58 UTC

[jira] [Comment Edited] (CASSANDRA-5220) Repair improvements when using vnodes

    [ https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855774#comment-13855774 ] 

Donald Smith edited comment on CASSANDRA-5220 at 12/23/13 6:21 PM:
-------------------------------------------------------------------

 We ran "nodetool repair" on a 3 node cassandra cluster with production-quality hardware, using version 2.0.3. Each node had about 1TB of data. This is still testing.  After 5 days the repair job still hasn't finished. I can see it's still running.

Here's the process:
{noformat}
root     30835 30774  0 Dec17 pts/0    00:03:53 /usr/bin/java -cp /etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.3.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/thrift-server-0.3.2.jar -Xmx32m -Dlog4j.configuration=log4j-tools.properties -Dstorage-config=/etc/cassandra/conf org.apache.cassandra.tools.NodeCmd -p 7199 repair -pr as_reports
{noformat}
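To double-check that a repair this long is actually making progress rather than hanging, the two things worth watching (a sketch; the exact output varies a bit between versions) are validation compactions and streaming between the nodes:
{noformat}
# Validation compactions appear here while Merkle trees are being built:
nodetool compactionstats

# Streaming between replicas appears here while out-of-sync ranges are exchanged:
nodetool netstats
{noformat}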

The log output has just:
{noformat}
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M -Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
[2013-12-17 23:26:48,144] Starting repair command #1, repairing 256 ranges for keyspace as_reports
{noformat}
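The "256 ranges" matches the node's vnode count. Assuming the same /etc/cassandra/conf path shown in the command line above, it can be confirmed with:
{noformat}
grep num_tokens /etc/cassandra/conf/cassandra.yaml
# expected with default vnode settings: num_tokens: 256
{noformat}
With -pr, each of those 256 ranges gets its own repair session, run one after another, which is a large part of why the command takes so long.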

Here's the output of "nodetool tpstats":
{noformat}
cass3 /tmp> nodetool tpstats
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M -Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         1         0       38083403         0                 0
RequestResponseStage              0         0     1951200451         0                 0
MutationStage                     0         0     2853354069         0                 0
ReadRepairStage                   0         0        3794926         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        4880147         0                 0
AntiEntropyStage                  1         3              9         0                 0
MigrationStage                    0         0             30         0                 0
MemoryMeter                       0         0            115         0                 0
MemtablePostFlusher               0         0          75121         0                 0
FlushWriter                       0         0          49934         0                52
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              7         0                 0
commitlog_archiver                0         0              0         0                 0
AntiEntropySessions               1         1              1         0                 0
InternalResponseStage             0         0              9         0                 0
HintedHandoff                     0         0           1141         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                       884
MUTATION               1407711
_TRACE                       0
REQUEST_RESPONSE             0
{noformat}
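The AntiEntropySessions pool (1 active, 1 pending) is what shows the repair is still running, and the dropped MUTATION count suggests the node is under write pressure at the same time. A crude way to keep an eye on both while the repair runs (a sketch):
{noformat}
watch -n 60 "nodetool tpstats | egrep 'AntiEntropySessions|MUTATION'"
{noformat}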
The cluster receives some write traffic; we deliberately tested repair under load.
This is the busiest column family, as reported by "nodetool cfstats":
{noformat}
   Read Count: 38084316
        Read Latency: 9.409910464927346 ms.
        Write Count: 2850436738
        Write Latency: 0.8083138546641199 ms.
        Pending Tasks: 0
....
    Table: data_report_details
                SSTable count: 592
                Space used (live), bytes: 160644106183
                Space used (total), bytes: 160663248847
                SSTable Compression Ratio: 0.5296494510512617
                Number of keys (estimate): 51015040
                Memtable cell count: 311180
                Memtable data size, bytes: 46275953
                Memtable switch count: 6100
                Local read count: 6147
                Local read latency: 154.539 ms
                Local write count: 750865416
                Local write latency: 0.029 ms
                Pending tasks: 0
                Bloom filter false positives: 265
                Bloom filter false ratio: 0.06009
                Bloom filter space used, bytes: 64690104
                Compacted partition minimum bytes: 30
                Compacted partition maximum bytes: 10090808
                Compacted partition mean bytes: 5267
                Average live cells per slice (last five minutes): 1.0
                Average tombstones per slice (last five minutes): 0.0
{noformat}
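For reference, when a full per-node repair runs this long, one common workaround is sub-range repair, which bounds the work done per session. This is only a sketch; the token values are placeholders and would come from the node's own ranges:
{noformat}
# List the token ranges for the keyspace:
nodetool describering as_reports

# Repair one small token range at a time instead of all 256 at once:
nodetool repair -st <range_start_token> -et <range_end_token> as_reports
{noformat}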
We're going to restart the node. We rarely do deletes or updates (only when a report is re-uploaded), so we suspect we can get by without running repairs. Correct us if we're wrong about that.
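For context on whether skipping repair is safe: deleted data can only be resurrected if a tombstone is purged before every replica has seen it, and that window is the table's gc_grace_seconds (default 864000 seconds, i.e. 10 days). A quick way to check the current value, sketched with cqlsh:
{noformat}
cqlsh> DESCRIBE TABLE as_reports.data_report_details;
-- look for gc_grace_seconds in the output (default 864000)
{noformat}
If there are effectively no deletes, the main remaining reason to repair is reconciling missed writes, e.g. the dropped MUTATIONs shown above.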



> Repair improvements when using vnodes
> -------------------------------------
>
>                 Key: CASSANDRA-5220
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>             Fix For: 2.1
>
>
> Currently, when using vnodes, repair takes much longer to complete than without them.  This appears to be at least in part because repair uses a session per range and processes the ranges sequentially.  This generates a lot of log spam with vnodes, and while the sequential approach is gentler on hard-disk deployments, SSD-based deployments would often prefer that repair be as fast as possible.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)