Posted to commits@cassandra.apache.org by "Stefan Miklosovic (Jira)" <ji...@apache.org> on 2020/05/08 08:35:00 UTC

[jira] [Comment Edited] (CASSANDRA-15158) Wait for schema agreement rather then in flight schema requests when bootstrapping

    [ https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102374#comment-17102374 ] 

Stefan Miklosovic edited comment on CASSANDRA-15158 at 5/8/20, 8:34 AM:
------------------------------------------------------------------------

Hi [~bdeggleston],

Regarding the design issues: I am not completely sure whether the problems you describe are introduced by this patch or whether they already exist. We can certainly focus on the points you raised, but it seems to me that the currently committed code is worse without this patch than with it, since these problems are already present either way.

Isn't the goal here to have all nodes on the same schema version? The very fact that multiple versions exist at all is suspicious to begin with, so we should not even try to join a node while they mismatch; then there is nothing to deal with in the first place.
{quote}It will only wait until it has _some_ schema to begin bootstrapping, not all
{quote}
Unless I am missing something, this is most likely not true. The node being bootstrapped will never advance past this step unless all nodes report the same schema version.
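
To make that concrete, here is a rough sketch of the loop I have in mind (the ClusterView interface and its method are made-up stand-ins for what Gossip exposes, not actual code from the patch):
{code:java}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.TimeUnit;

// Hypothetical view of what Gossip tells us about the cluster; not a real Cassandra API.
interface ClusterView
{
    /** endpoint -> schema version it currently advertises */
    Map<String, UUID> liveEndpointSchemaVersions();
}

public final class SchemaAgreementWaiter
{
    /**
     * Block until every live endpoint advertises one and the same schema version,
     * or give up after the supplied timeout. Returns true on agreement.
     */
    public static boolean waitForAgreement(ClusterView view, long timeout, TimeUnit unit) throws InterruptedException
    {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (System.nanoTime() < deadline)
        {
            Set<UUID> distinct = new HashSet<>(view.liveEndpointSchemaVersions().values());
            if (distinct.size() == 1)
                return true; // everyone agrees; safe to continue with bootstrap

            TimeUnit.SECONDS.sleep(1); // re-check the Gossip state periodically
        }
        return false; // caller should fail the bootstrap rather than proceed on a partial schema
    }
}
{code}
The point is that returning from such a wait actually proves agreement, rather than merely proving that some requests were sent.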
{quote}For instance, if a single node is reporting a schema version that no one else has, but the node is unreachable, what do we do?{quote}

We should fail the whole bootstrap and let the operator go and fix it. Besides, how can a node report its schema version while being unreachable in the first place?
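
Purely to illustrate the kind of handling I mean (the Liveness interface below is a made-up stand-in for the failure detector, not real code):
{code:java}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical liveness oracle standing in for the failure detector.
interface Liveness
{
    boolean isAlive(String endpoint);
}

public final class BootstrapSchemaCheck
{
    /**
     * If the advertised schema versions disagree and any endpoint is unreachable,
     * the disagreement cannot be resolved from here, so abort the bootstrap and
     * let the operator investigate instead of proceeding on a possibly stale schema.
     */
    public static void checkBeforeBootstrap(Map<String, UUID> advertisedVersions, Liveness liveness)
    {
        Set<UUID> allVersions = new HashSet<>(advertisedVersions.values());
        if (allVersions.size() <= 1)
            return; // everyone agrees (or we see nothing yet); nothing to resolve

        for (Map.Entry<String, UUID> e : advertisedVersions.entrySet())
            if (!liveness.isAlive(e.getKey()))
                throw new IllegalStateException("Endpoint " + e.getKey() + " advertises schema version " + e.getValue()
                                                + " but is unreachable; fix it before bootstrapping");
    }
}
{code}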

{quote}Next, I like how this limits the number of messages sent to a given endpoint, but we should also limit the number of messages we send out for a given schema version. If we have a large cluster, and all nodes are reporting the same version, we don't need to ask every node for it's schema.{quote}

I am sorry, I am not following you here, in particular the very last sentence. As I understand it, a schema pull message is sent _only_ when the schema version reported via the Gossiper differs from our own; only then do we send any message at all.
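
In other words, the decision I am describing boils down to something like this (a simplified sketch with made-up names, not the actual check in the patch):
{code:java}
import java.util.UUID;

public final class SchemaPullDecision
{
    /**
     * A schema pull request is only worth sending when the endpoint advertises a
     * schema version different from our own; identical versions mean there is
     * nothing to fetch, so no message goes out at all.
     */
    public static boolean shouldRequestSchema(UUID localVersion, UUID remoteAdvertisedVersion)
    {
        return remoteAdvertisedVersion != null && !remoteAdvertisedVersion.equals(localVersion);
    }
}
{code}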

When it comes to testing, I admit that adding the isRunningForcibly method feels like a hack, but I had a very hard time testing this code. It was basically the only reasonable approach I could find at the time; if you know of a better way, please tell me, otherwise I am not sure what would improve it and we could stick with this for the time being. The whole testing methodology is based on these callbacks and on inspecting their internal state, which results in methods that accept them as parameters so we can assert on that state. Without "injecting" them from the outside, I would not be able to do that.
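
Roughly, the injection pattern I mean looks like this (simplified, hypothetical types; not the actual test code):
{code:java}
import java.util.ArrayList;
import java.util.List;

// A test-visible callback whose recorded state the test can assert on afterwards.
class RecordingCallback
{
    final List<String> respondedEndpoints = new ArrayList<>();

    void onResponse(String endpoint)
    {
        respondedEndpoints.add(endpoint);
    }
}

class MigrationCoordinatorSketch
{
    private final List<RecordingCallback> callbacks;

    // Callbacks are handed in from the outside (e.g. from the test) instead of being
    // created internally, so the test can later inspect what they recorded.
    MigrationCoordinatorSketch(List<RecordingCallback> callbacks)
    {
        this.callbacks = callbacks;
    }

    void onSchemaResponse(String endpoint)
    {
        for (RecordingCallback cb : callbacks)
            cb.onResponse(endpoint);
    }
}
{code}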


> Wait for schema agreement rather then in flight schema requests when bootstrapping
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Cluster/Schema
>            Reporter: Vincent White
>            Assignee: Ben Bromhead
>            Priority: Normal
>
> Currently when a node is bootstrapping we use a set of latches (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of in-flight schema pull requests, and we don't proceed with bootstrapping/streaming until all the latches are released (or we time out waiting for each one). One issue with this is that if we have a large schema, or retrieval of the schema from the other nodes is unexpectedly slow, we have no explicit check in place to ensure we have actually received a schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the node to wait on each latch longer, there are cases where this doesn't help because the callbacks for the schema pull requests have expired off the messaging service's callback map (org.apache.cassandra.net.MessagingService#callbacks) after request_timeout_in_ms (default 10 seconds), before the other nodes were able to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the rest of the live nodes before proceeding with bootstrapping. It also adds a check to prevent the new node from flooding existing nodes with simultaneous schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters getting stuck for extended amounts of time as they wait `migration_task_wait_in_seconds` on each of the latches left orphaned by the timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  
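
For reference, a condensed sketch of the latch-based wait described above; names and structure are illustrative only, not the actual MigrationTask code:
{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Simplified model of the pre-patch behaviour: one latch per in-flight schema pull.
public final class InflightSchemaPulls
{
    private final Set<CountDownLatch> inflight = ConcurrentHashMap.newKeySet();

    CountDownLatch register()
    {
        CountDownLatch latch = new CountDownLatch(1);
        inflight.add(latch);
        return latch; // counted down by the response callback -- if that callback ever fires
    }

    /**
     * Wait on each latch, but only up to the configured timeout per latch. Returning
     * from here does not prove that a schema was actually received: if the callback
     * expired before the reply arrived, the latch is never counted down and we simply
     * waited out the timeout.
     */
    void awaitAll(long perLatchWaitSeconds) throws InterruptedException
    {
        for (CountDownLatch latch : inflight)
            latch.await(perLatchWaitSeconds, TimeUnit.SECONDS);
    }
}
{code}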


