You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Alan Boudreault (JIRA)" <ji...@apache.org> on 2015/05/11 17:38:00 UTC
[jira] [Commented] (CASSANDRA-8819) LOCAL_QUORUM writes returns
wrong message
[ https://issues.apache.org/jira/browse/CASSANDRA-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538067#comment-14538067 ]
Alan Boudreault commented on CASSANDRA-8819:
--------------------------------------------
Committed!
> LOCAL_QUORUM writes returns wrong message
> -----------------------------------------
>
> Key: CASSANDRA-8819
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8819
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: CentOS 6.6
> Reporter: Wei Zhu
> Assignee: Sylvain Lebresne
> Labels: qa-resolved
> Fix For: 2.0.13
>
> Attachments: 8819-2.0.patch
>
>
> We have two DC3, each with 7 nodes.
> Here is the keyspace setup:
> create keyspace test
> with placement_strategy = 'NetworkTopologyStrategy'
> and strategy_options = {DC2 : 3, DC1 : 3}
> and durable_writes = true;
> We brought down two nodes in DC2 for maintenance. We only write to DC1 using local_quroum (using datastax JavaClient)
> But we see this errors in the log:
> Cassandra timeout during write query at consistency LOCAL_QUORUM (4 replica were required but only 3 acknowledged the write
> why does it say 4 replica were required? and Why would it give error back to client since local_quorum should succeed.
> Here are the output from nodetool status
> Note: Ownership information does not include topology; for complete information, specify a keyspace
> Datacenter: DC2
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address Load Tokens Owns Host ID Rack
> UN 10.2.0.1 10.92 GB 256 7.9% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC206
> UN 10.2.0.2 6.17 GB 256 8.0% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC106
> UN 10.2.0.3 6.63 GB 256 7.3% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC107
> DL 10.2.0.4 1.54 GB 256 7.7% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC107
> UN 10.2.0.5 6.02 GB 256 6.6% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC106
> UJ 10.2.0.6 3.68 GB 256 ? XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC205
> UN 10.2.0.7 7.22 GB 256 7.7% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC205
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address Load Tokens Owns Host ID Rack
> UN 10.1.0.1 6.04 GB 256 8.6% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC10
> UN 10.1.0.2 7.55 GB 256 7.4% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC8
> UN 10.1.0.3 5.83 GB 256 7.0% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC9
> UN 10.1.0.4 7.34 GB 256 7.9% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC6
> UN 10.1.0.5 7.57 GB 256 8.0% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC7
> UN 10.1.0.6 5.31 GB 256 7.3% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC10
> UN 10.1.0.7 5.47 GB 256 8.6% XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC9
> I did a cql trace on the query and here is the trace, and it does say
> Write timeout; received 3 of 4 required replies | 17:27:52,831 | 10.1.0.1 | 2002873
> at the end. I guess that is where the client gets the error from. But the rows was inserted to Cassandra correctly. And I traced read with local_quorum and it behaves correctly and the reads don't go to DC2. The problem is only with writes on local_quorum.
> {code}
> Tracing session: 5a789fb0-b70d-11e4-8fca-99bff9c19890
> activity | timestamp | source | source_elapsed
> ---------------------------------------------------------------------------------------------------------------------------------------------+--------------+-------------+----------------
> execute_cql3_query | 17:27:50,828 | 10.1.0.1 | 0
> Parsing insert into test (user_id, created, event_data, event_id)values ( 123456789 , 9eab8950-b70c-11e4-8fca-99bff9c19891, 'test', '16'); | 17:27:50,828 | 10.1.0.1 | 39
> Preparing statement | 17:27:50,828 | 10.1.0.1 | 135
> Message received from /10.1.0.1 | 17:27:50,829 | 10.1.0.5 | 25
> Sending message to /10.1.0.5 | 17:27:50,829 | 10.1.0.1 | 421
> Executing single-partition query on users | 17:27:50,829 | 10.1.0.5 | 177
> Acquiring sstable references | 17:27:50,829 | 10.1.0.5 | 191
> Merging memtable tombstones | 17:27:50,830 | 10.1.0.5 | 208
> Message received from /10.1.0.5 | 17:27:50,830 | 10.1.0.1 | 1461
> Message received from /10.1.0.1 | 17:27:50,830 | 10.1.0.2 | 14
> Key cache hit for sstable 1 | 17:27:50,830 | 10.1.0.5 | 254
> Processing response from /10.1.0.5 | 17:27:50,830 | 10.1.0.1 | 1514
> Executing single-partition query on users | 17:27:50,830 | 10.1.0.2 | 78
> Seeking to partition beginning in data file | 17:27:50,830 | 10.1.0.5 | 264
> Sending message to /10.1.0.2 | 17:27:50,830 | 10.1.0.1 | 1517
> Acquiring sstable references | 17:27:50,830 | 10.1.0.2 | 85
> Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | 17:27:50,830 | 10.1.0.5 | 453
> Determining replicas for mutation | 17:27:50,830 | 10.1.0.1 | 1746
> Merging memtable tombstones | 17:27:50,830 | 10.1.0.2 | 97
> Merging data from memtables and 1 sstables | 17:27:50,830 | 10.1.0.5 | 476
> Message received from /10.1.0.2 | 17:27:50,830 | 10.1.0.1 | 2369
> Key cache hit for sstable 2 | 17:27:50,830 | 10.1.0.2 | 120
> Read 1 live and 0 tombstoned cells | 17:27:50,830 | 10.1.0.5 | 506
> Seeking to partition beginning in data file | 17:27:50,830 | 10.1.0.2 | 123
> Enqueuing response to /10.1.0.1 | 17:27:50,830 | 10.1.0.5 | 570
> Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | 17:27:50,830 | 10.1.0.2 | 288
> Sending message to /10.1.0.1 | 17:27:50,830 | 10.1.0.5 | 617
> Merging data from memtables and 1 sstables | 17:27:50,830 | 10.1.0.2 | 297
> Read 1 live and 0 tombstoned cells | 17:27:50,830 | 10.1.0.2 | 319
> Enqueuing response to /10.1.0.1 | 17:27:50,830 | 10.1.0.2 | 362
> Sending message to /10.1.0.1 | 17:27:50,830 | 10.1.0.2 | 420
> Message received from /10.1.0.1 | 17:27:50,831 | 10.1.0.4 | 17
> Sending message to /10.1.0.2 | 17:27:50,831 | 10.1.0.1 | 2435
> Message received from /10.1.0.1 | 17:27:50,831 | 10.1.0.2 | 8
> Acquiring switchLock read lock | 17:27:50,831 | 10.1.0.4 | 61
> Processing response from /10.1.0.2 | 17:27:50,831 | 10.1.0.1 | 2488
> Acquiring switchLock read lock | 17:27:50,831 | 10.1.0.2 | 44
> Appending to commitlog | 17:27:50,831 | 10.1.0.4 | 78
> Not hinting /10.2.0.4 which has been down 364809650ms | 17:27:50,831 | 10.1.0.1 | 2503
> Appending to commitlog | 17:27:50,831 | 10.1.0.2 | 62
> Adding to event memtable | 17:27:50,831 | 10.1.0.4 | 96
> Sending message to /10.1.0.4 | 17:27:50,831 | 10.1.0.1 | 2557
> Adding to event memtable | 17:27:50,831 | 10.1.0.2 | 80
> Enqueuing response to /10.1.0.1 | 17:27:50,831 | 10.1.0.4 | 177
> Acquiring switchLock read lock | 17:27:50,831 | 10.1.0.1 | 2669
> Enqueuing response to /10.1.0.1 | 17:27:50,831 | 10.1.0.2 | 160
> Sending message to /10.1.0.1 | 17:27:50,831 | 10.1.0.4 | 333
> Appending to commitlog | 17:27:50,831 | 10.1.0.1 | 2682
> Sending message to /10.1.0.1 | 17:27:50,831 | 10.1.0.2 | 266
> Adding to event memtable | 17:27:50,831 | 10.1.0.1 | 2720
> Sending message to /10.2.0.5 | 17:27:50,831 | 10.1.0.1 | 2758
> Message received from /10.1.0.2 | 17:27:50,831 | 10.1.0.1 | 2989
> Processing response from /10.1.0.2 | 17:27:50,831 | 10.1.0.1 | 3024
> Message received from /10.1.0.4 | 17:27:50,832 | 10.1.0.1 | 3764
> Processing response from /10.1.0.4 | 17:27:50,832 | 10.1.0.1 | 3805
> Message received from /10.1.0.1 | 17:27:50,841 | 10.2.0.5 | 24
> Enqueuing forwarded write to /10.2.0.7 | 17:27:50,841 | 10.2.0.5 | 255
> Enqueuing forwarded write to /10.2.0.3 | 17:27:50,841 | 10.2.0.5 | 283
> Enqueuing forwarded write to /10.2.0.6 | 17:27:50,841 | 10.2.0.5 | 307
> Acquiring switchLock read lock | 17:27:50,841 | 10.2.0.5 | 362
> Appending to commitlog | 17:27:50,841 | 10.2.0.5 | 380
> Sending message to /10.2.0.7 | 17:27:50,841 | 10.2.0.5 | 382
> Adding to event memtable | 17:27:50,841 | 10.2.0.5 | 411
> Sending message to /10.2.0.6 | 17:27:50,841 | 10.2.0.5 | 429
> Enqueuing response to /10.1.0.1 | 17:27:50,841 | 10.2.0.5 | 484
> Sending message to /10.2.0.3 | 17:27:50,841 | 10.2.0.5 | 561
> Sending message to /10.1.0.1 | 17:27:50,841 | 10.2.0.5 | 625
> Acquiring switchLock read lock | 17:27:50,842 | 10.2.0.6 | 1031
> Acquiring switchLock read lock | 17:27:50,842 | 10.2.0.7 | 178
> Appending to commitlog | 17:27:50,842 | 10.2.0.6 | 1066
> Appending to commitlog | 17:27:50,842 | 10.2.0.7 | 196
> Adding to event memtable | 17:27:50,842 | 10.2.0.6 | 1118
> Adding to event memtable | 17:27:50,842 | 10.2.0.7 | 231
> Enqueuing response to /10.1.0.1 | 17:27:50,842 | 10.2.0.6 | 1181
> Enqueuing response to /10.1.0.1 | 17:27:50,842 | 10.2.0.7 | 286
> Sending message to /10.1.0.1 | 17:27:50,842 | 10.2.0.7 | 421
> Acquiring switchLock read lock | 17:27:50,843 | 10.2.0.3 | 1216
> Sending message to /10.1.0.1 | 17:27:50,843 | 10.2.0.6 | 1382
> Appending to commitlog | 17:27:50,843 | 10.2.0.3 | 1239
> Adding to event memtable | 17:27:50,843 | 10.2.0.3 | 1313
> Enqueuing response to /10.1.0.1 | 17:27:50,843 | 10.2.0.3 | 1407
> Sending message to /10.1.0.1 | 17:27:50,843 | 10.2.0.3 | 1631
> Message received from /10.2.0.5 | 17:27:50,851 | 10.1.0.1 | 23333
> Processing response from /10.2.0.5 | 17:27:50,851 | 10.1.0.1 | 23380
> Message received from /10.2.0.7 | 17:27:50,852 | 10.1.0.1 | 23908
> Processing response from /10.2.0.7 | 17:27:50,852 | 10.1.0.1 | 23953
> Message received from /10.2.0.6 | 17:27:50,853 | 10.1.0.1 | 25143
> Processing response from /10.2.0.6 | 17:27:50,853 | 10.1.0.1 | 25178
> Message received from /10.2.0.3 | 17:27:50,854 | 10.1.0.1 | 25478
> Processing response from /10.2.0.3 | 17:27:50,854 | 10.1.0.1 | 25516
> Write timeout; received 3 of 4 required replies | 17:27:52,831 | 10.1.0.1 | 2002873
> Request complete | 17:27:52,833 | 10.1.0.1 | 2005989
> {code}
> cqlsh:XXXX> CONSISTENCY
> Current consistency level is LOCAL_QUORUM.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)