Posted to commits@cassandra.apache.org by "Doug Rohrer (Jira)" <ji...@apache.org> on 2019/11/01 14:45:00 UTC

[jira] [Comment Edited] (CASSANDRA-15347) Add client testing capabilities to in-jvm tests

    [ https://issues.apache.org/jira/browse/CASSANDRA-15347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964853#comment-16964853 ] 

Doug Rohrer edited comment on CASSANDRA-15347 at 11/1/19 2:44 PM:
------------------------------------------------------------------

This set of PRs allows the in-jvm dtest framework to support native protocol clients, which allows for testing of the Java client and other use-cases where it makes sense to test from "outside" (Spark, for example).

 

Four PRs for different Cassandra versions:

2.2 [changes|https://github.com/apache/cassandra/pull/377] [Circle|https://circleci.com/workflow-run/19f5082f-eedc-4d8e-8d33-558848fddc77]
3.0 [changes|https://github.com/apache/cassandra/pull/376] [Circle|https://circleci.com/workflow-run/ddf5b452-2a51-4d3a-9cd4-d4b279e0f280]
3.11 [changes|https://github.com/apache/cassandra/pull/375] [Circle|https://circleci.com/workflow-run/59c4d1b6-c0c2-4179-b719-a8c041c849ff]
Trunk [changes|https://github.com/apache/cassandra/pull/374] [Circle|https://circleci.com/workflow-run/fea3a793-bf13-4652-8b88-d29e1b513254]

The changes are more extensive than just "Add Native Transport Support": once we started allowing connectivity via the native transport, I ran into several reliability issues with the tests. Some of these may already have been causing some level of instability, and some of the fixes also speed up test execution. These changes include:
 - Setting {{auto_bootstrap}} to false by default for in-jvm dtests. Because the cluster is empty, there was no reason to wait for instances to bootstrap before starting tests; doing so could slow down test execution and caused some test timing issues where requests could be made before the instance was fully ready. Tests that need {{auto_bootstrap}} can always set it explicitly.
 - It was possible, especially in {{trunk}}, for tests to fail to create the initial keyspace requested in {{DistributedTestBase.init}} because of a race between a hard-coded 60-second timeout in {{MigrationManager}} ({{MIGRATION_DELAY_IN_MS}}) and an identical hard-coded 60-second wait timeout in the {{SchemaChangeMonitor}}. This could occur if the instance where the schema change was submitted did not yet see one or more other instances in its live member list when first gossiping the schema change. Two changes were made to alleviate this issue:
 ** Extended the {{SchemaChangeMonitor}}'s wait timeout to 70 seconds to accommodate the {{MigrationManager}}'s 60-second delay.
 ** To avoid the root cause, and the potential 70-second delay if tests hit the race, added a new monitor, {{LiveMemberAgreementMonitor}}, which waits for all instances to agree that the live member count equals the expected number of running instances before moving on from {{Cluster.startup}}. This adds a very minor potential delay to cluster startup while the members all see each other, but completely avoids the possibility that the subsequent schema change will be delayed by up to 60 seconds.
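
To illustrate the idea behind waiting for live-member agreement, here is a minimal, self-contained sketch. It is not the {{LiveMemberAgreementMonitor}} from the PRs: the class name, method name, poll interval and the use of {{IntSupplier}} to stand in for "ask an instance how many live members it sees" are all assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Hypothetical helper, not the class from the patch: polls until every
// instance reports the expected live member count, or the timeout elapses.
final class LiveMemberAgreementSketch
{
    static boolean awaitAgreement(List<IntSupplier> liveMemberCounts,
                                  int expected,
                                  long timeout,
                                  TimeUnit unit)
    {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true)
        {
            boolean agreed = true;
            for (IntSupplier count : liveMemberCounts)
            {
                // Each supplier stands in for one instance's view of the ring.
                if (count.getAsInt() != expected)
                {
                    agreed = false;
                    break;
                }
            }
            if (agreed)
                return true;
            if (System.nanoTime() - deadline >= 0)
                return false; // gave up: at least one instance still disagrees
            try
            {
                Thread.sleep(10); // arbitrary short poll interval
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

Under these assumptions, a startup routine would call something like {{awaitAgreement(views, expectedInstances, 60, TimeUnit.SECONDS)}} before issuing the first schema change, so that the change is only gossiped once every instance already sees all of its peers.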

There are a few other minor changes/refactorings that were picked up from Alex's original patch for this change, which was never submitted to C*; he was kind enough to help me put this together and has done some early code review as well. A new test, {{NativeTransportTest}}, was added to cover the native transport functionality, and a new {{ResourceLeakTest}} makes sure we aren't introducing any cross-classloader references that would block collection of classes and exhaust Java's metaspace.
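
As a rough illustration of how a classloader leak can be detected in general, the sketch below uses a {{WeakReference}} and repeated GC requests. It is not the {{ResourceLeakTest}} from the PRs; the class and method names are hypothetical, and because {{System.gc()}} is only a hint to the JVM, a "not collected" result is suggestive rather than conclusive.

```java
import java.lang.ref.WeakReference;

// Hypothetical helper, not taken from the patch: reports whether a weakly
// referenced object (for example an instance classloader) has been collected.
final class ClassLoaderLeakSketch
{
    static boolean isCollected(WeakReference<?> ref, int attempts)
    {
        for (int i = 0; i < attempts && ref.get() != null; i++)
        {
            System.gc(); // only a request, hence the retry loop
            try
            {
                Thread.sleep(50); // give the collector a moment to run
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                break;
            }
        }
        // A cleared reference means nothing strongly reachable held the object.
        return ref.get() == null;
    }
}
```

A test built on this idea would wrap each instance's classloader in a {{WeakReference}}, shut the cluster down, drop its own references, and then check that the reference is eventually cleared; a loader that stays reachable points to a lingering cross-classloader reference.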


was (Author: drohrer):
This set of PRs allows the in-jvm dtest framework to support native protocol clients, which allows for testing of the Java client and other use-cases where it makes sense to test from "outside" (Spark, for example).

 

Four PRs for different Cassandra versions:

2.2 [changes|https://github.com/apache/cassandra/pull/377] [Circle|https://circleci.com/workflow-run/19f5082f-eedc-4d8e-8d33-558848fddc77]
3.0 [changes|https://github.com/apache/cassandra/pull/376] [Circle|https://circleci.com/workflow-run/ddf5b452-2a51-4d3a-9cd4-d4b279e0f280]
3.11 [changes|https://github.com/apache/cassandra/pull/375] [Circle|https://circleci.com/workflow-run/59c4d1b6-c0c2-4179-b719-a8c041c849ff]
Trunk [changes|https://github.com/apache/cassandra/pull/374] [Circle JDK8|https://circleci.com/workflow-run/cdef0fd3-38ec-4fb9-b506-739ba84580eb] [Circle JDK11|https://circleci.com/workflow-run/eada78ad-dc5b-42b0-962e-fa80865051a8]

The changes are more extensive than just "Add Native Transport Support": once we started allowing connectivity via the native transport, I ran into several reliability issues with the tests. Some of these may already have been causing some level of instability, and some of the fixes also speed up test execution. These changes include:
 - Setting {{auto_bootstrap}} to false by default for in-jvm dtests. Because the cluster is empty, there was no reason to wait for instances to bootstrap before starting tests; doing so could slow down test execution and caused some test timing issues where requests could be made before the instance was fully ready. Tests that need {{auto_bootstrap}} can always set it explicitly.
 - It was possible, especially in {{trunk}}, for tests to fail to create the initial keyspace requested in {{DistributedTestBase.init}} because of a race between a hard-coded 60-second timeout in {{MigrationManager}} ({{MIGRATION_DELAY_IN_MS}}) and an identical hard-coded 60-second wait timeout in the {{SchemaChangeMonitor}}. This could occur if the instance where the schema change was submitted did not yet see one or more other instances in its live member list when first gossiping the schema change. Two changes were made to alleviate this issue:
 ** Extended the {{SchemaChangeMonitor}}'s wait timeout to 70 seconds to accommodate the {{MigrationManager}}'s 60-second delay.
 ** To avoid the root cause, and the potential 70-second delay if tests hit the race, added a new monitor, {{LiveMemberAgreementMonitor}}, which waits for all instances to agree that the live member count equals the expected number of running instances before moving on from {{Cluster.startup}}. This adds a very minor potential delay to cluster startup while the members all see each other, but completely avoids the possibility that the subsequent schema change will be delayed by up to 60 seconds.

There are a few other minor changes/refactorings that were picked up from Alex's original patch for this change, which was never submitted to C*; he was kind enough to help me put this together and has done some early code review as well. A new test, {{NativeTransportTest}}, was added to cover the native transport functionality, and a new {{ResourceLeakTest}} makes sure we aren't introducing any cross-classloader references that would block collection of classes and exhaust Java's metaspace.

> Add client testing capabilities to in-jvm tests
> -----------------------------------------------
>
>                 Key: CASSANDRA-15347
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15347
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest
>            Reporter: Alex Petrov
>            Assignee: Doug Rohrer
>            Priority: Normal
>              Labels: patch-available, pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Allow testing native transport code path using in-jvm tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
