Posted to commits@cassandra.apache.org by "Brandon Williams (Jira)" <ji...@apache.org> on 2021/09/09 11:40:00 UTC

[jira] [Updated] (CASSANDRA-16937) cassandra local_quorum query is inconsistent

     [ https://issues.apache.org/jira/browse/CASSANDRA-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-16937:
-----------------------------------------
    Resolution: Duplicate
        Status: Resolved  (was: Triage Needed)

>  cassandra local_quorum query is inconsistent
> ---------------------------------------------
>
>                 Key: CASSANDRA-16937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16937
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: HUANG DUICAN
>            Priority: Normal
>
> The version number given in this earlier issue was wrong: https://issues.apache.org/jira/browse/CASSANDRA-16919
> Corrected version: Cassandra 2.1.15
> Number of nodes: dc1: 80, dc2: 80
> Problem:
> Our replication strategy is as follows:
> WITH REPLICATION = {'class':'NetworkTopologyStrategy','dc1': 3,'dc2': 3};
> We ran into inconsistent results in Cassandra when querying with local_quorum. We read and write only in dc1.
> We write with local_quorum and then query with local_quorum.
> Nevertheless, we observe the following behavior with this statement:
> select count(*) from table where partitionKey=?
> The query results were inconsistent at first and only eventually became consistent.
> For example, the first query might return 10000, the second 9998, and the third 9997, and the count might finally settle at 10001 (perhaps read repair was triggered, which led to the final stabilization).
> During this period we carried out a large-scale expansion and made sure that cleanup was run on every machine. We also found that running nodetool getendpoints <keyspace> <table> <key> on different machines gave inconsistent results. Taken together, the getendpoints results listed 4 machines in dc1.
> We then ran nodetool getsstables on those 4 machines: only 3 of them returned results, and the fourth returned none. We hit a similar problem with another partitionKey, although that one was queried only once. Because we record the total count for each partitionKey elsewhere, we could confirm that the count returned for it was incorrect.
> After we restarted the machines in dc1 one by one, the problem was resolved.
> The count for the partitionKey now matches the value we recorded, and repeating the same query returns the same result every time.
> I therefore suspect that gossip propagates node information too slowly, which may lead to inconsistent results when nodes are selected for a query.
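
The suspicion above can be illustrated with a small sketch. It is a simplified model, not Cassandra's actual replica selection: it assumes a quorum of floor(RF/2) + 1 and checks whether every quorum-sized write set is guaranteed to intersect every quorum-sized read set. The node names n1..n4 and both function names are hypothetical, chosen only to mirror the report of 4 endpoints appearing in dc1 with a replication factor of 3.

```python
from itertools import combinations


def local_quorum(rf: int) -> int:
    """Quorum size for a replication factor: floor(rf / 2) + 1."""
    return rf // 2 + 1


def overlap_guaranteed(write_view, read_view, rf: int) -> bool:
    """Return True if every quorum-sized write set drawn from write_view
    shares at least one node with every quorum-sized read set drawn
    from read_view.

    write_view and read_view are the replica lists as seen by the
    coordinator handling the write and the read (assumed model).
    """
    q = local_quorum(rf)
    return all(
        set(w) & set(r)
        for w in combinations(write_view, q)
        for r in combinations(read_view, q)
    )


if __name__ == "__main__":
    rf = 3
    agreed = ["n1", "n2", "n3"]
    stale = ["n2", "n3", "n4"]  # hypothetical view that disagrees on one replica

    print("quorum size:", local_quorum(rf))
    print("same view  :", overlap_guaranteed(agreed, agreed, rf))
    print("mixed views:", overlap_guaranteed(agreed, stale, rf))
```

When both coordinators agree on the 3 replicas, any 2 written and any 2 read must share a node. When the views differ by one node, a write to n1 and n2 followed by a read from n3 and n4 shares none, so under this model a local_quorum read can miss a local_quorum write.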


