You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Stefan Miklosovic (Jira)" <ji...@apache.org> on 2020/07/27 11:01:00 UTC

[jira] [Commented] (CASSANDRA-15158) Wait for schema agreement rather than in flight schema requests when bootstrapping

    [ https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165619#comment-17165619 ] 

Stefan Miklosovic commented on CASSANDRA-15158:
-----------------------------------------------

In general looks good to me in spite of getting lost a bit on the actual schema migration response:

 

 
{code:java}
Future<Void> response(Collection<Mutation> mutations)
{
    synchronized (info)
    {
        if (shouldApplySchemaFrom(endpoint, info))
            mergeSchemaFrom(endpoint, mutations);
        return pullComplete(endpoint, info, true);  /// why?
    }
}
{code}
 

I am not completely sure why are we pulling again here. I would rewrite the whole solution in a such way that this Callable just does one thing on a successful response (merging of a schema) and the actual "retry" would be handled from outside. The reader has to make quite a mental exercise to visualise that this callback might actually call another callback in it until some "version" is completed etc ... At least for me, it was quite tedious to track.

I understand the motivation behind that but callback should just do one task and thats it, it shouldnt be responsible for potentially scheduling another callback recursively until some conditions on some signals or what have you are met ... just my 2 cents.

There is also this:

 
{code:java}
synchronized Future<Void> reportEndpointVersion(InetAddress endpoint, UUID version)
{
    UUID current = endpointVersions.get(endpoint);
    if (current != null && current.equals(version))
        return FINISHED_FUTURE;

    VersionInfo info = versionInfo.computeIfAbsent(version, VersionInfo::new);
    if (isLocalVersion(version))
        info.markReceived();
    info.endpoints.add(endpoint);
    info.requestQueue.add(endpoint);
    endpointVersions.put(endpoint, version);

    removeEndpointFromVersion(endpoint, current); ///// why?
    return maybePullSchema(info);
}
{code}
 

TBH that is quite counterintuitive too, maybe renaming of that method would help.

 

The test has failed for me (repeatedly):

 

 
{code:java}
java.lang.AssertionError: java.lang.AssertionError: 
Expected :2
Actual   :0
<Click to see difference> at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.failNotEquals(Assert.java:689) at org.junit.Assert.assertEquals(Assert.java:127) at org.junit.Assert.assertEquals(Assert.java:514) at org.junit.Assert.assertEquals(Assert.java:498) at org.apache.cassandra.service.MigrationCoordinatorTest.testWeKeepSendingRequests(MigrationCoordinatorTest.java:278) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41) at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:220) at org.junit.runner.JUnitCore.run(JUnitCore.java:159) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33) at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
{code}
 

> Wait for schema agreement rather than in flight schema requests when bootstrapping
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Cluster/Schema
>            Reporter: Vincent White
>            Assignee: Ben Bromhead
>            Priority: Normal
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently when a node is bootstrapping we use a set of latches (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of in-flight schema pull requests, and we don't proceed with bootstrapping/stream until all the latches are released (or we timeout waiting for each one). One issue with this is that if we have a large schema, or the retrieval of the schema from the other nodes was unexpectedly slow then we have no explicit check in place to ensure we have actually received a schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the node to wait on each latche longer, there are cases where this doesn't help because the callbacks for the schema pull requests have expired off the messaging service's callback map (org.apache.cassandra.net.MessagingService#callbacks) after request_timeout_in_ms (default 10 seconds) before the other nodes were able to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the rest of the live nodes before proceeding with bootstrapping. It also adds a check to prevent the new node from flooding existing nodes with simultaneous schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters getting stuck for extended amounts of time as they wait `migration_task_wait_in_seconds` on each of the latches left orphaned by the timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org