Posted to commits@cassandra.apache.org by "Jordan West (Jira)" <ji...@apache.org> on 2020/08/12 03:09:00 UTC

[jira] [Commented] (CASSANDRA-15833) Unresolvable false digest mismatch during upgrade due to CASSANDRA-10657

    [ https://issues.apache.org/jira/browse/CASSANDRA-15833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175955#comment-17175955 ] 

Jordan West commented on CASSANDRA-15833:
-----------------------------------------

Thanks [~jlewandowski]. I've combined parts of your patch and mine and pushed them [here for 3.11|https://github.com/jrwest/cassandra/tree/jwest/15833-3.11] and [here for trunk|https://github.com/jrwest/cassandra/tree/jwest/15833-trunk]. I've included your test fixes from CASSANDRA-15946 since it's not yet merged. The differences between this patch and the original patch are:
* This patch handles the case where the 3.0 node is the coordinator, which requires an additional change in {{ColumnFilter.Serializer#deserialize}}.
* There is no change to {{ColumnFilter#selection(CFMetadata, PartitionColumns)}} in 3.11. As far as I could tell, this method was only used in tests, and those tests broke once deserialization from 3.0 nodes was fixed.
* It also omits the {{ColumnFilter#selection(TableMetadata, RegularAndStaticColumns)}} change. That method does appear to be used by CAS, but it doesn't seem related to the failure here (although there may be a separate issue with CAS). Did you hit this specifically, or what motivated that change in your original patch?
* To fix {{Gossiper#haveAnyMajorVersion3Nodes}}, I modified its check to abort when it detects a race with updated gossip state. This fixes the issue where the method returns true when there are older nodes in the cluster. I did not modify the 3.11 counterpart, {{Gossiper#isAnyNode30}}, because the window in which it is wrong is very small and shouldn't matter in practice (testing shows that it settles before the node takes traffic).
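
The race handling in the last bullet can be sketched as follows, assuming one plausible reading of "abort": an endpoint whose release version has not yet propagated through gossip is treated as possibly old, and that answer is not cached, while a definitive "no 3.x nodes" answer is cached on the assumption that a fully upgraded cluster never returns to 3.x. The class {{VersionGate}}, its endpoint-to-major-version map and {{setVersion}} are hypothetical stand-ins, not Cassandra's {{Gossiper}} API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Cassandra's Gossiper: decides whether any
// major-version-3 node may still be present in the cluster.
public class VersionGate {
    // endpoint -> release major version, or null if gossip has not yet
    // delivered that endpoint's version. Single-threaded sketch only.
    private final Map<String, Integer> majorByEndpoint = new HashMap<>();

    // Becomes false only after every endpoint has been seen on 4.x;
    // assumes (hypothetically) the cluster never goes back to 3.x.
    private boolean upgradeInProgressPossible = true;

    public void setVersion(String endpoint, Integer major) {
        majorByEndpoint.put(endpoint, major);
    }

    public boolean haveAnyMajorVersion3Nodes() {
        if (!upgradeInProgressPossible)
            return false; // cached definitive answer
        for (Integer major : majorByEndpoint.values()) {
            if (major == null)
                return true; // raced with gossip: assume old, do not cache
            if (major < 4)
                return true; // a 3.x node is present
        }
        upgradeInProgressPossible = false; // every endpoint known and on 4.x
        return false;
    }

    public static void main(String[] args) {
        VersionGate gate = new VersionGate();
        gate.setVersion("10.0.0.1", 4);
        gate.setVersion("10.0.0.2", null); // version not gossiped yet
        System.out.println(gate.haveAnyMajorVersion3Nodes()); // true
        gate.setVersion("10.0.0.2", 4);
        System.out.println(gate.haveAnyMajorVersion3Nodes()); // false
    }
}
```

Caching only the negative answer keeps the fully upgraded case cheap, while an unknown version errs toward the legacy behaviour, which is the safe side during a mixed-version window.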

Test runs: [3.11|https://app.circleci.com/pipelines/github/jrwest/cassandra?branch=jwest%2F15833-3.11] and [trunk|https://app.circleci.com/pipelines/github/jrwest/cassandra?branch=jwest%2F15833-trunk]



> Unresolvable false digest mismatch during upgrade due to CASSANDRA-10657
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15833
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15833
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 3.11.x, 4.0-beta
>
>         Attachments: CASSANDRA-15833-3.11.patch, CASSANDRA-15833-4.0.patch
>
>
> CASSANDRA-10657 (introduced in Cassandra 3.4) changed how the ColumnFilter is interpreted. This results in a digest mismatch when a node that includes CASSANDRA-10657 queries an incomplete set of a table's columns at a consistency level that requires reaching instances running a pre-CASSANDRA-10657 version.
> The fix is to restore the previous behaviour until no instances running a pre-CASSANDRA-10657 version remain.
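
To make the failure mode concrete, here is a toy model (not Cassandra's actual digest or ColumnFilter code) that assumes, as a simplification, that a pre-CASSANDRA-10657 replica digests the values of every fetched column while a newer replica digests only the queried columns. The names {{DigestMismatchDemo}}, {{replicaDigest}} and the {{legacy}} flag are hypothetical; the flag stands in for "a pre-CASSANDRA-10657 instance may be involved".

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of the CASSANDRA-15833 symptom; not Cassandra's digest code.
public class DigestMismatchDemo {
    // Hash "column=value;" for each cell whose column is in `included`,
    // iterating the row in its insertion order.
    static byte[] digest(Map<String, String> row, Set<String> included) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java platform
        }
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (included.contains(cell.getKey())) {
                md.update((cell.getKey() + "=" + cell.getValue() + ";").getBytes(StandardCharsets.UTF_8));
            }
        }
        return md.digest();
    }

    // Assumed replica behaviour: legacy semantics digest every fetched
    // column's value; newer semantics digest only the queried columns.
    static byte[] replicaDigest(Map<String, String> row, Set<String> fetched,
                                Set<String> queried, boolean legacy) {
        return digest(row, legacy ? fetched : queried);
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("a", "1");
        row.put("b", "2");
        row.put("c", "3");
        Set<String> fetched = Set.of("a", "b", "c"); // all columns fetched
        Set<String> queried = Set.of("a");           // only column a queried

        byte[] oldReplica = replicaDigest(row, fetched, queried, true);
        byte[] newReplica = replicaDigest(row, fetched, queried, false);
        // Same data, different digests: a false mismatch.
        System.out.println(MessageDigest.isEqual(oldReplica, newReplica)); // false

        // The fix, modelled: while an old node may be involved, use legacy semantics.
        boolean legacyNodesPresent = true;
        byte[] newReplicaCompat = replicaDigest(row, fetched, queried, legacyNodesPresent);
        System.out.println(MessageDigest.isEqual(oldReplica, newReplicaCompat)); // true
    }
}
```

Because the two replicas hold identical data, no amount of repair can make the digests agree in this model; only aligning the column-selection semantics removes the mismatch.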



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
