Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2015/09/07 17:43:45 UTC

[jira] [Commented] (CASSANDRA-9761) Delay auth setup until peers are upgraded

    [ https://issues.apache.org/jira/browse/CASSANDRA-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733856#comment-14733856 ] 

Sylvain Lebresne commented on CASSANDRA-9761:
---------------------------------------------

For info, this is the reason for the failure of at least some upgrade dtests (typically [this|http://cassci.datastax.com/job/cassandra-2.2_dtest/lastCompletedBuild/testReport/upgrade_tests.paging_test/TestPagingData/test_paging_a_single_wide_row/]).  Basically, the tests issue a truncate as their first order of business, and the 2.1 node closes its connection to the other node because of this issue. Some of the truncation acknowledgments are then lost because they were in the queue of that connection, which ends in a truncate timeout.

And of course there is the fact that due to this the 2.1 node logs a warning with a stack trace, which might worry operators a bit even though nothing is wrong.


> Delay auth setup until peers are upgraded
> -----------------------------------------
>
>                 Key: CASSANDRA-9761
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9761
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sam Tunnicliffe
>             Fix For: 3.0.0 rc1, 2.2.2
>
>
> The built-in auth classes {{CassandraRoleManager}} and {{CassandraAuthorizer}} both attempt to do some setup and data conversion when a node is upgraded to version 2.2 or higher. At the moment, each node attempts the operations with the expectation that this will fail until enough of the cluster has been upgraded for it to succeed (i.e. enough nodes have the latest schema with the requisite new tables). These expected failures are largely harmless, but they are annoying because they cause the receiving node (the non-upgraded node) to close the connection with the upgraded node, which then has to be re-established. Although this is the normal behaviour on schema disagreement (see CASSANDRA-9136 for further discussion), it may be possible to avoid it in this specific circumstance. Given that we expect the operations to fail until enough nodes are upgraded, we could defer them until we're sure they can succeed by checking the messaging service version of peers.
> Right now these are a one-shot thing: each node makes only one attempt at the conversion (until it is restarted). Without investigating further, I don't know if we'd need to add in retries in case it takes a little time for each peer's MS version to be updated as they're upgraded. The setup & conversion operations are idempotent, so there shouldn't be a great issue if several nodes attempt them at the same time anyway.
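
For illustration only, the deferral idea in the description above could be sketched roughly as follows. This is not the committed patch and uses no Cassandra APIs: the class DeferredAuthSetup, its methods, and the version numbers passed in are all hypothetical, and treating "enough nodes" as "every known peer" is a simplifying assumption. The sketch polls peer messaging versions a bounded number of times and runs the (idempotent) setup once all peers report the required version.

```java
import java.util.Collection;
import java.util.function.Supplier;

// Hypothetical helper, not part of Cassandra: defers a setup task until peers are upgraded.
final class DeferredAuthSetup {

    // True only when every known peer reports at least requiredVersion.
    // Assumption: "enough nodes" is simplified here to "all known peers".
    static boolean allPeersUpgraded(Collection<Integer> peerVersions, int requiredVersion) {
        for (int version : peerVersions) {
            if (version < requiredVersion) {
                return false;
            }
        }
        return true;
    }

    // Polls the peer versions up to maxAttempts times, sleeping delayMillis between
    // polls, and runs setup once when all peers are upgraded.
    // Returns true if setup ran, false if attempts ran out or the thread was interrupted.
    static boolean runWhenReady(Supplier<Collection<Integer>> peerVersions,
                                int requiredVersion,
                                int maxAttempts,
                                long delayMillis,
                                Runnable setup) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (allPeersUpgraded(peerVersions.get(), requiredVersion)) {
                setup.run();
                return true;
            }
            if (attempt + 1 < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt for the caller and give up.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

The bounded retry loop reflects the open question in the description about peers' MS versions taking a little time to update. Because the setup and conversion are described as idempotent, several nodes passing the check at the same moment should not be a problem in this sketch either.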



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)