Posted to commits@cassandra.apache.org by "HUANG DUICAN (Jira)" <ji...@apache.org> on 2021/09/08 01:45:00 UTC

[jira] [Created] (CASSANDRA-16919) cassandra local_quorum query is inconsistent

HUANG DUICAN created CASSANDRA-16919:
----------------------------------------

             Summary: cassandra local_quorum query is inconsistent
                 Key: CASSANDRA-16919
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16919
             Project: Cassandra
          Issue Type: Bug
            Reporter: HUANG DUICAN


cassandra version: 2.0.15
Number of nodes: dc1: 80, dc2: 80
Problem:
Our replication strategy is as follows:
WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
We hit a consistency problem in Cassandra when querying at LOCAL_QUORUM. We read and write only in dc1: writes use LOCAL_QUORUM, and the subsequent queries also use LOCAL_QUORUM.
However, with the following statement:
select count(*) from table where partitionKey=?
repeated queries initially returned inconsistent results and only became consistent later.
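For context, this behaviour is surprising because of the quorum overlap guarantee. A minimal sketch of the arithmetic (a generic illustration, not Cassandra code; the RF value comes from the keyspace definition above):

```python
# LOCAL_QUORUM requires floor(RF/2) + 1 replicas in the local DC to
# acknowledge. With RF=3, both the write quorum and the read quorum
# are 2, and since W + R > RF, every read quorum must share at least
# one replica with every preceding write quorum, so a LOCAL_QUORUM
# read should always see a completed LOCAL_QUORUM write.

def local_quorum(rf: int) -> int:
    """Replicas that must respond for LOCAL_QUORUM at this RF."""
    return rf // 2 + 1

rf_dc1 = 3                    # 'dc1': 3 in the replication options
w = local_quorum(rf_dc1)      # write quorum: 2
r = local_quorum(rf_dc1)      # read quorum:  2
assert w + r > rf_dc1         # 2 + 2 > 3, so the quorums overlap
```

This overlap only holds if every coordinator agrees on which 3 nodes are the replicas for the key, which is the assumption that appears to have been violated here.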

For example, the first query returns 10000, the second 9998, and the third 9997, and the result may eventually settle at 10001 (perhaps read repair was triggered, which led to the final stabilization).
During this period we carried out a large-scale cluster expansion, and made sure cleanup was run on every machine. We also found that the results of getEndpoint <keyspace> <table> <key> were inconsistent across machines; in the end, the getEndpoint results covered 4 machines in dc1 for the key.

We then executed getSstable on those 4 machines: only 3 of them returned results, and the fourth returned nothing. At the same time, we hit a similar problem with another partitionKey that had been queried only once; because we record the total count for that partitionKey elsewhere, we can confirm the returned total was incorrect.

After we restarted each machine in dc1 one by one, the problem was resolved:
the total for the partitionKey matched our recorded value, and repeating the same query no longer changed the result.
I therefore suspect that gossip propagates node information too slowly, so that different coordinators select different (stale) replica sets for the same key, leading to inconsistent query results.
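The suspected failure mode, two coordinators holding different views of the ring and therefore computing different replica sets for the same key, can be sketched with a toy token ring (node names and tokens are made up for illustration; this is not Cassandra's actual replica placement code):

```python
import bisect

# Toy token ring: each node owns one token; a key's replicas are the
# RF nodes found walking clockwise from the key's token. Two nodes
# with different views of the ring (e.g. one missed a gossip update
# after the expansion) can return different replica sets for the
# same key, matching the differing getEndpoint output described above.

def replicas(ring, key_token, rf=3):
    """Return the rf nodes clockwise from key_token in this ring view."""
    tokens = sorted(ring)
    i = bisect.bisect_right(tokens, key_token)
    return [ring[tokens[(i + k) % len(tokens)]] for k in range(rf)]

old_view = {10: "n1", 20: "n2", 30: "n3", 40: "n4"}
new_view = dict(old_view)
new_view[25] = "n5"            # node added by the expansion

print(replicas(old_view, 22))  # ['n3', 'n4', 'n1']
print(replicas(new_view, 22))  # ['n5', 'n3', 'n4']
```

A coordinator still on old_view would contact a different set of nodes than one on new_view, so quorums from the two views need not overlap, which would explain the fluctuating counts until all machines were restarted with a fresh view.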



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
