Posted to commits@cassandra.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2015/12/17 02:34:46 UTC

[jira] [Commented] (CASSANDRA-10874) running stress with compaction strategy and replication factor fails on read after write

    [ https://issues.apache.org/jira/browse/CASSANDRA-10874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061292#comment-15061292 ] 

Paulo Motta commented on CASSANDRA-10874:
-----------------------------------------

afaik the default stress consistency level is ONE for both writes and reads. Since you're writing with 300 threads, it's expected that some mutations will be dropped due to overload, and some CL=ONE reads will then fail, because the dropped mutations have not yet been hinted (that only happens after 10 minutes). That's why the problem doesn't occur without replication or with read CL=QUORUM.
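As a quick sanity check, you can confirm the dropped writes on each node with tpstats (its Dropped section lists MUTATION messages dropped under load):
{code}
ccm node1 nodetool tpstats
ccm node2 nodetool tpstats
ccm node3 nodetool tpstats
{code}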

Are repairs completing, and do stress reads work after they finish? If so, I suspect those progress numbers might only be reporting/presentation errors.
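If you haven't already, something like the following would be worth trying (keyspace1 being the keyspace stress creates, per the output below):
{code}
ccm node1 nodetool repair keyspace1
ccm node1 stress read n=10M -rate threads=300
{code}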

> running stress with compaction strategy and replication factor fails on read after write
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10874
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10874
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Andrew Hust
>
> When running a read stress after a write stress with a compaction strategy and a replication factor matching the node count, the read phase fails with an exception.
> {code}
> Operation x0 on key(s) [38343433384b34364c30]: Data returned was not validated
> {code}
> Example run:
> {code}
> ccm create stress -v git:cassandra-3.0 -n 3 -s
> ccm node1 stress write n=10M -rate threads=300 -schema replication\(factor=3\) compaction\(strategy=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy\)
> ccm node1 nodetool flush
> ccm node1 nodetool compactionstats # check until quiet
> ccm node1 stress read n=10M -rate threads=300
> {code}
> - This fails both with and without vnodes, though it occasionally passes without vnodes.
> - Changing the read phase to CL=QUORUM makes it pass (see the command sketch below).
> - Removing the replication factor on the write makes it pass.
> - Happens on all compaction strategies.
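> For reference, the QUORUM variant of the read phase looks like this (a sketch using the standard cl= option of cassandra-stress):
> {code}
> ccm node1 stress read n=10M cl=QUORUM -rate threads=300
> {code}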
> So with that in mind I attempted to add a repair after the write phase.  This leads to one of two outcomes.
> 1: A repair whose reported completion exceeds 100% and usually stalls after a bit; I have seen it get to >400% progress:
> {code}
>                                       id   compaction type    keyspace       table     completed         total    unit   progress
>     2d5344c0-9dc8-11e5-9d5f-4fdec8d76c27        Validation   keyspace1   standard1   94722609949   44035292145   bytes    215.11%
> {code}
> 2: A repair with a greatly inflated completed/total value; it will crunch for a bit and then lock up:
> {code}
>                                      id   compaction type    keyspace       table   completed          total    unit   progress
>    8c4cf7f0-a34a-11e5-a321-777be88c58ae        Validation   keyspace1   standard1           0   874811100900   bytes      0.00%
> ❯ du -sh ~/.ccm/stress/node1/
> 2.4G  ~/.ccm/stress/node1/
> ❯ du -sh ~/.ccm/stress
> 7.1G  ~/.ccm/stress
> {code}
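> The inflated total can also be cross-checked against what the table actually holds (assuming nodetool cfstats is available on this branch; its Space used figures should be nowhere near 874 GB given the du output above):
> {code}
> ccm node1 nodetool cfstats keyspace1.standard1
> {code}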
> This has been reproduced on cassandra-3.0 and cassandra-2.1, both locally and using cstar_perf (links below).
> A big twist is that cassandra-2.2 passes the majority of the time: it completes successfully without the repair in 8 out of 10 runs.  This can be seen in the cstar_perf links below.
> cstar_perf runs:
> http://cstar.datastax.com/tests/id/c8fa27a4-a205-11e5-8fbc-0256e416528f
> http://cstar.datastax.com/tests/id/a254c572-a2ce-11e5-a8b9-0256e416528f
