You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@ignite.apache.org by GitBox <gi...@apache.org> on 2020/12/01 22:47:20 UTC

[GitHub] [ignite] Vladsz83 opened a new pull request #8522: Research small timeouts

Vladsz83 opened a new pull request #8522:
URL: https://github.com/apache/ignite/pull/8522


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] Vladsz83 commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
Vladsz83 commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r570809855



##########
File path: modules/ducktests/tests/ignitetest/services/ignite.py
##########
@@ -90,3 +91,19 @@ def get_event_time(service, log_node, log_pattern, from_the_beginning=True, time
 
     return datetime.strptime(re.match("^\\[[^\\[]+\\]", stdout.read().decode("utf-8")).group(),
                              "[%Y-%m-%d %H:%M:%S,%f]")
+
+
+def exec_command(node, cmd):
+    """Executes the command passed on the given node and returns result as string."""
+    return str(node.account.ssh_client.exec_command(cmd)[1].read(), sys.getdefaultencoding())
+
+
+def node_id(node):
+    """
+    Returns node id from its log if started.
+    This is a remote call. Reuse its results if possible.
+    """
+    regexp = "^>>> Local node \\[ID=([^,]+),.+$"
+    cmd = "grep -E '%s' %s | sed -r 's/%s/\\1/'" % (regexp, node.log_file, regexp)
+
+    return exec_command(node, cmd).strip().lower()

Review comment:
       Because it is 'statuc' helpers to binded to any instanse. Is it ok?

##########
File path: modules/ducktests/tests/ignitetest/services/ignite.py
##########
@@ -90,3 +91,19 @@ def get_event_time(service, log_node, log_pattern, from_the_beginning=True, time
 
     return datetime.strptime(re.match("^\\[[^\\[]+\\]", stdout.read().decode("utf-8")).group(),
                              "[%Y-%m-%d %H:%M:%S,%f]")
+
+
+def exec_command(node, cmd):
+    """Executes the command passed on the given node and returns result as string."""
+    return str(node.account.ssh_client.exec_command(cmd)[1].read(), sys.getdefaultencoding())
+
+
+def node_id(node):
+    """
+    Returns node id from its log if started.
+    This is a remote call. Reuse its results if possible.
+    """
+    regexp = "^>>> Local node \\[ID=([^,]+),.+$"
+    cmd = "grep -E '%s' %s | sed -r 's/%s/\\1/'" % (regexp, node.log_file, regexp)
+
+    return exec_command(node, cmd).strip().lower()

Review comment:
       Because it is 'static' helpers to binded to any instanse. Is it ok?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] anton-vinogradov commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
anton-vinogradov commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r570873894



##########
File path: modules/ducktests/tests/ignitetest/services/ignite.py
##########
@@ -90,3 +91,19 @@ def get_event_time(service, log_node, log_pattern, from_the_beginning=True, time
 
     return datetime.strptime(re.match("^\\[[^\\[]+\\]", stdout.read().decode("utf-8")).group(),
                              "[%Y-%m-%d %H:%M:%S,%f]")
+
+
+def exec_command(node, cmd):
+    """Executes the command passed on the given node and returns result as string."""
+    return str(node.account.ssh_client.exec_command(cmd)[1].read(), sys.getdefaultencoding())
+
+
+def node_id(node):
+    """
+    Returns node id from its log if started.
+    This is a remote call. Reuse its results if possible.
+    """
+    regexp = "^>>> Local node \\[ID=([^,]+),.+$"
+    cmd = "grep -E '%s' %s | sed -r 's/%s/\\1/'" % (regexp, node.log_file, regexp)
+
+    return exec_command(node, cmd).strip().lower()

Review comment:
       Why do we want to reduce this helper's scope only to IgniteService, not to all IgniteAwares?
   Seems, this will be a good case to have the same feature at IgniteApplicationService, especially considering the relocation price is 0?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] Vladsz83 commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
Vladsz83 commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r549455369



##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -88,99 +93,100 @@ def __init__(self, test_context):
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type):
         """
         Test nodes failure scenario with TcpDiscoverySpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type):
         """
         Test 2 nodes sequential failure scenario with TcpDiscoverySpi.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing to fail nodes in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     def _perform_node_fail_scenario(self, test_config):
-        max_containers = len(self.test_context.cluster)
+        failure_detection_timeout = self._global_int(self.GLOBAL_DETECTION_TIMEOUT, self.DEFAULT_DETECTION_TIMEOUT)
 
         # One node is required to detect the failure.
-        assert max_containers >= 1 + test_config.nodes_to_kill + (
-            DiscoveryTest.ZOOKEEPER_NODES if test_config.with_zk else 0) + (
-                   0 if test_config.load_type == ClusterLoad.NONE else 1), "Few required containers: " + \
-                                                                           str(max_containers) + ". Check the params."
+        assert len(self.test_context.cluster) >= 1 + test_config.nodes_to_kill + (
+            self.ZOOKEEPER_NODES if test_config.with_zk else 0), \
+            f"Few required containers: {len(self.test_context.cluster)}. Check the params."
 
-        self.logger.info("Starting on " + str(max_containers) + " maximal containers.")
+        self.logger.info("Starting on " + str(len(self.test_context.cluster)) + " maximal containers.")
+        self.logger.info(f"{self.GLOBAL_DETECTION_TIMEOUT}: {failure_detection_timeout}")
 
         results = {}
 
         modules = ['zookeeper'] if test_config.with_zk else None
 
         if test_config.with_zk:
-            zk_quorum = start_zookeeper(self.test_context, DiscoveryTest.ZOOKEEPER_NODES, test_config)
+            zk_quorum = start_zookeeper(self.test_context, self.ZOOKEEPER_NODES, failure_detection_timeout)
 
             discovery_spi = from_zookeeper_cluster(zk_quorum)
         else:
             discovery_spi = TcpDiscoverySpi()
 
+            if LATEST_2_7 < test_config.version < V_2_9_1:

Review comment:
       Fixed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] anton-vinogradov commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
anton-vinogradov commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r542427313



##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -88,99 +93,100 @@ def __init__(self, test_context):
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type):
         """
         Test nodes failure scenario with TcpDiscoverySpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type):
         """
         Test 2 nodes sequential failure scenario with TcpDiscoverySpi.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing to fail nodes in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     def _perform_node_fail_scenario(self, test_config):
-        max_containers = len(self.test_context.cluster)
+        failure_detection_timeout = self._global_int(self.GLOBAL_DETECTION_TIMEOUT, self.DEFAULT_DETECTION_TIMEOUT)
 
         # One node is required to detect the failure.
-        assert max_containers >= 1 + test_config.nodes_to_kill + (
-            DiscoveryTest.ZOOKEEPER_NODES if test_config.with_zk else 0) + (
-                   0 if test_config.load_type == ClusterLoad.NONE else 1), "Few required containers: " + \
-                                                                           str(max_containers) + ". Check the params."
+        assert len(self.test_context.cluster) >= 1 + test_config.nodes_to_kill + (

Review comment:
       as for me, a better case is to reduce warning or perform refactoring/decomposition.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] Vladsz83 commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
Vladsz83 commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r549455549



##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -244,17 +250,18 @@ def _simulate_and_detect_failure(self, servers, failed_nodes, timeout):
 
     def _check_failed_number(self, failed_nodes, survived_node):
         """Ensures number of failed nodes is correct."""
-        cmd = "grep '%s' %s | wc -l" % (node_failed_event_pattern(), IgniteAwareService.STDOUT_STDERR_CAPTURE)
+        failed_cnt = int(exec_command(survived_node, "grep '%s' %s | wc -l" %
+                                      (node_failed_event_pattern(), IgniteAwareService.STDOUT_STDERR_CAPTURE)))
 
-        failed_cnt = int(str(survived_node.account.ssh_client.exec_command(cmd)[1].read(), sys.getdefaultencoding()))
+        # Cache survivor id, do not read each time.
+        surv_id = read_node_id(survived_node)

Review comment:
       Moved to the ignite utils




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] Vladsz83 commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
Vladsz83 commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r549458979



##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -88,99 +93,100 @@ def __init__(self, test_context):
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type):
         """
         Test nodes failure scenario with TcpDiscoverySpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type):
         """
         Test 2 nodes sequential failure scenario with TcpDiscoverySpi.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing to fail nodes in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     def _perform_node_fail_scenario(self, test_config):
-        max_containers = len(self.test_context.cluster)
+        failure_detection_timeout = self._global_int(self.GLOBAL_DETECTION_TIMEOUT, self.DEFAULT_DETECTION_TIMEOUT)
 
         # One node is required to detect the failure.
-        assert max_containers >= 1 + test_config.nodes_to_kill + (
-            DiscoveryTest.ZOOKEEPER_NODES if test_config.with_zk else 0) + (
-                   0 if test_config.load_type == ClusterLoad.NONE else 1), "Few required containers: " + \
-                                                                           str(max_containers) + ". Check the params."
+        assert len(self.test_context.cluster) >= 1 + test_config.nodes_to_kill + (

Review comment:
       Fixed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] anton-vinogradov merged pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
anton-vinogradov merged pull request #8522:
URL: https://github.com/apache/ignite/pull/8522


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] anton-vinogradov commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
anton-vinogradov commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r570802220



##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -213,36 +215,37 @@ def _perform_node_fail_scenario(self, test_config):
                       "targetNodes": tran_nodes,
                       "transactional": bool(tran_nodes)}
 
-            start_load_app(self.test_context, ignite_config=load_config, params=params, modules=modules)
+            start_load_app(self.test_context, load_config, params, modules)
 
+        # Minimal detection timeout is failure_detection_timeout * 2. Let's more to capture also 'bad' results.
         results.update(self._simulate_and_detect_failure(servers, failed_nodes,
-                                                         test_config.failure_detection_timeout * 4,
+                                                         0.004 * ignite_config.failure_detection_timeout,

Review comment:
       let's define what is 0.004 properly.

##########
File path: modules/ducktests/tests/ignitetest/services/ignite.py
##########
@@ -90,3 +91,19 @@ def get_event_time(service, log_node, log_pattern, from_the_beginning=True, time
 
     return datetime.strptime(re.match("^\\[[^\\[]+\\]", stdout.read().decode("utf-8")).group(),
                              "[%Y-%m-%d %H:%M:%S,%f]")
+
+
+def exec_command(node, cmd):
+    """Executes the command passed on the given node and returns result as string."""
+    return str(node.account.ssh_client.exec_command(cmd)[1].read(), sys.getdefaultencoding())
+
+
+def node_id(node):
+    """
+    Returns node id from its log if started.
+    This is a remote call. Reuse its results if possible.
+    """
+    regexp = "^>>> Local node \\[ID=([^,]+),.+$"
+    cmd = "grep -E '%s' %s | sed -r 's/%s/\\1/'" % (regexp, node.log_file, regexp)
+
+    return exec_command(node, cmd).strip().lower()

Review comment:
       Why located not at IgniteAwareService?

##########
File path: modules/ducktests/tests/ignitetest/utils/ignite_test.py
##########
@@ -55,3 +55,11 @@ def tearDown(self):
         self.logger.debug("All services killed.")
 
         super().tearDown()
+
+    def _global_param(self, param_name, default=None):
+        """Reads global parameter passed to the test suite."""
+        return self.test_context.globals.get(param_name, default)
+
+    def _global_int(self, param_name, default: int = None):

Review comment:
       1) any reason to start a name with "_"?
   2) "default=None" or "int = None", please format the code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] anton-vinogradov commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
anton-vinogradov commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r570802220



##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -213,36 +215,37 @@ def _perform_node_fail_scenario(self, test_config):
                       "targetNodes": tran_nodes,
                       "transactional": bool(tran_nodes)}
 
-            start_load_app(self.test_context, ignite_config=load_config, params=params, modules=modules)
+            start_load_app(self.test_context, load_config, params, modules)
 
+        # Minimal detection timeout is failure_detection_timeout * 2. Let's more to capture also 'bad' results.
         results.update(self._simulate_and_detect_failure(servers, failed_nodes,
-                                                         test_config.failure_detection_timeout * 4,
+                                                         0.004 * ignite_config.failure_detection_timeout,

Review comment:
       let's define what is 0.004 properly.

##########
File path: modules/ducktests/tests/ignitetest/services/ignite.py
##########
@@ -90,3 +91,19 @@ def get_event_time(service, log_node, log_pattern, from_the_beginning=True, time
 
     return datetime.strptime(re.match("^\\[[^\\[]+\\]", stdout.read().decode("utf-8")).group(),
                              "[%Y-%m-%d %H:%M:%S,%f]")
+
+
+def exec_command(node, cmd):
+    """Executes the command passed on the given node and returns result as string."""
+    return str(node.account.ssh_client.exec_command(cmd)[1].read(), sys.getdefaultencoding())
+
+
+def node_id(node):
+    """
+    Returns node id from its log if started.
+    This is a remote call. Reuse its results if possible.
+    """
+    regexp = "^>>> Local node \\[ID=([^,]+),.+$"
+    cmd = "grep -E '%s' %s | sed -r 's/%s/\\1/'" % (regexp, node.log_file, regexp)
+
+    return exec_command(node, cmd).strip().lower()

Review comment:
       Why located not at IgniteAwareService?

##########
File path: modules/ducktests/tests/ignitetest/utils/ignite_test.py
##########
@@ -55,3 +55,11 @@ def tearDown(self):
         self.logger.debug("All services killed.")
 
         super().tearDown()
+
+    def _global_param(self, param_name, default=None):
+        """Reads global parameter passed to the test suite."""
+        return self.test_context.globals.get(param_name, default)
+
+    def _global_int(self, param_name, default: int = None):

Review comment:
       1) any reason to start a name with "_"?
   2) "default=None" or "int = None", please format the code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] Vladsz83 commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
Vladsz83 commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r570821329



##########
File path: modules/ducktests/tests/ignitetest/utils/ignite_test.py
##########
@@ -55,3 +55,11 @@ def tearDown(self):
         self.logger.debug("All services killed.")
 
         super().tearDown()
+
+    def _global_param(self, param_name, default=None):
+        """Reads global parameter passed to the test suite."""
+        return self.test_context.globals.get(param_name, default)
+
+    def _global_int(self, param_name, default: int = None):

Review comment:
       Hm. The autoformat says current typing is correct where as supposed fixes are not:
   For 'default = None' - PEP 8: E251 unexpected staces...
   For 'int=None' - PEP 8: E252: Missing whitespace aroung parameters.
   
   Same from pylint.
   
   The reason for _global_param is that it's like 'private' method. Not intended to call it outside

##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -213,36 +215,37 @@ def _perform_node_fail_scenario(self, test_config):
                       "targetNodes": tran_nodes,
                       "transactional": bool(tran_nodes)}
 
-            start_load_app(self.test_context, ignite_config=load_config, params=params, modules=modules)
+            start_load_app(self.test_context, load_config, params, modules)
 
+        # Minimal detection timeout is failure_detection_timeout * 2. Let's more to capture also 'bad' results.
         results.update(self._simulate_and_detect_failure(servers, failed_nodes,
-                                                         test_config.failure_detection_timeout * 4,
+                                                         0.004 * ignite_config.failure_detection_timeout,

Review comment:
       Fixed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] Vladsz83 commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
Vladsz83 commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r570896919



##########
File path: modules/ducktests/tests/ignitetest/services/ignite.py
##########
@@ -90,3 +91,19 @@ def get_event_time(service, log_node, log_pattern, from_the_beginning=True, time
 
     return datetime.strptime(re.match("^\\[[^\\[]+\\]", stdout.read().decode("utf-8")).group(),
                              "[%Y-%m-%d %H:%M:%S,%f]")
+
+
+def exec_command(node, cmd):
+    """Executes the command passed on the given node and returns result as string."""
+    return str(node.account.ssh_client.exec_command(cmd)[1].read(), sys.getdefaultencoding())
+
+
+def node_id(node):
+    """
+    Returns node id from its log if started.
+    This is a remote call. Reuse its results if possible.
+    """
+    regexp = "^>>> Local node \\[ID=([^,]+),.+$"
+    cmd = "grep -E '%s' %s | sed -r 's/%s/\\1/'" % (regexp, node.log_file, regexp)
+
+    return exec_command(node, cmd).strip().lower()

Review comment:
       Fixed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] Vladsz83 commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
Vladsz83 commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r549455483



##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -88,99 +93,100 @@ def __init__(self, test_context):
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type):
         """
         Test nodes failure scenario with TcpDiscoverySpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type):
         """
         Test 2 nodes sequential failure scenario with TcpDiscoverySpi.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing to fail nodes in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     def _perform_node_fail_scenario(self, test_config):
-        max_containers = len(self.test_context.cluster)
+        failure_detection_timeout = self._global_int(self.GLOBAL_DETECTION_TIMEOUT, self.DEFAULT_DETECTION_TIMEOUT)
 
         # One node is required to detect the failure.
-        assert max_containers >= 1 + test_config.nodes_to_kill + (
-            DiscoveryTest.ZOOKEEPER_NODES if test_config.with_zk else 0) + (
-                   0 if test_config.load_type == ClusterLoad.NONE else 1), "Few required containers: " + \
-                                                                           str(max_containers) + ". Check the params."
+        assert len(self.test_context.cluster) >= 1 + test_config.nodes_to_kill + (
+            self.ZOOKEEPER_NODES if test_config.with_zk else 0), \
+            f"Few required containers: {len(self.test_context.cluster)}. Check the params."
 
-        self.logger.info("Starting on " + str(max_containers) + " maximal containers.")
+        self.logger.info("Starting on " + str(len(self.test_context.cluster)) + " maximal containers.")
+        self.logger.info(f"{self.GLOBAL_DETECTION_TIMEOUT}: {failure_detection_timeout}")
 
         results = {}
 
         modules = ['zookeeper'] if test_config.with_zk else None
 
         if test_config.with_zk:
-            zk_quorum = start_zookeeper(self.test_context, DiscoveryTest.ZOOKEEPER_NODES, test_config)
+            zk_quorum = start_zookeeper(self.test_context, self.ZOOKEEPER_NODES, failure_detection_timeout)
 
             discovery_spi = from_zookeeper_cluster(zk_quorum)
         else:
             discovery_spi = TcpDiscoverySpi()
 
+            if LATEST_2_7 < test_config.version < V_2_9_1:
+                discovery_spi.so_linger = 0
+
         ignite_config = IgniteConfiguration(
             version=test_config.version,
             discovery_spi=discovery_spi,
-            failure_detection_timeout=test_config.failure_detection_timeout,
+            failure_detection_timeout=failure_detection_timeout,
             caches=[CacheConfiguration(
                 name='test-cache',
                 backups=1,
                 atomicity_mode='TRANSACTIONAL' if test_config.load_type == ClusterLoad.TRANSACTIONAL else 'ATOMIC'
             )]
         )
 
+        jvm_opts_str = jvm_settings(gc_dump_path=os.path.join(IgniteService.PERSISTENT_ROOT, "ignite_gc.log"),

Review comment:
       Moved to the spec




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] anton-vinogradov commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
anton-vinogradov commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r542258141



##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -244,17 +250,18 @@ def _simulate_and_detect_failure(self, servers, failed_nodes, timeout):
 
     def _check_failed_number(self, failed_nodes, survived_node):
         """Ensures number of failed nodes is correct."""
-        cmd = "grep '%s' %s | wc -l" % (node_failed_event_pattern(), IgniteAwareService.STDOUT_STDERR_CAPTURE)
+        failed_cnt = int(exec_command(survived_node, "grep '%s' %s | wc -l" %
+                                      (node_failed_event_pattern(), IgniteAwareService.STDOUT_STDERR_CAPTURE)))
 
-        failed_cnt = int(str(survived_node.account.ssh_client.exec_command(cmd)[1].read(), sys.getdefaultencoding()))
+        # Cache survivor id, do not read each time.
+        surv_id = read_node_id(survived_node)

Review comment:
       could this be IgniteAppService method or property?

##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -88,99 +93,100 @@ def __init__(self, test_context):
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type):
         """
         Test nodes failure scenario with TcpDiscoverySpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type):
         """
         Test 2 nodes sequential failure scenario with TcpDiscoverySpi.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing to fail nodes in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     def _perform_node_fail_scenario(self, test_config):
-        max_containers = len(self.test_context.cluster)
+        failure_detection_timeout = self._global_int(self.GLOBAL_DETECTION_TIMEOUT, self.DEFAULT_DETECTION_TIMEOUT)
 
         # One node is required to detect the failure.
-        assert max_containers >= 1 + test_config.nodes_to_kill + (
-            DiscoveryTest.ZOOKEEPER_NODES if test_config.with_zk else 0) + (
-                   0 if test_config.load_type == ClusterLoad.NONE else 1), "Few required containers: " + \
-                                                                           str(max_containers) + ". Check the params."
+        assert len(self.test_context.cluster) >= 1 + test_config.nodes_to_kill + (
+            self.ZOOKEEPER_NODES if test_config.with_zk else 0), \
+            f"Few required containers: {len(self.test_context.cluster)}. Check the params."
 
-        self.logger.info("Starting on " + str(max_containers) + " maximal containers.")
+        self.logger.info("Starting on " + str(len(self.test_context.cluster)) + " maximal containers.")
+        self.logger.info(f"{self.GLOBAL_DETECTION_TIMEOUT}: {failure_detection_timeout}")
 
         results = {}
 
         modules = ['zookeeper'] if test_config.with_zk else None
 
         if test_config.with_zk:
-            zk_quorum = start_zookeeper(self.test_context, DiscoveryTest.ZOOKEEPER_NODES, test_config)
+            zk_quorum = start_zookeeper(self.test_context, self.ZOOKEEPER_NODES, failure_detection_timeout)
 
             discovery_spi = from_zookeeper_cluster(zk_quorum)
         else:
             discovery_spi = TcpDiscoverySpi()
 
+            if LATEST_2_7 < test_config.version < V_2_9_1:
+                discovery_spi.so_linger = 0
+
         ignite_config = IgniteConfiguration(
             version=test_config.version,
             discovery_spi=discovery_spi,
-            failure_detection_timeout=test_config.failure_detection_timeout,
+            failure_detection_timeout=failure_detection_timeout,
             caches=[CacheConfiguration(
                 name='test-cache',
                 backups=1,
                 atomicity_mode='TRANSACTIONAL' if test_config.load_type == ClusterLoad.TRANSACTIONAL else 'ATOMIC'
             )]
         )
 
+        jvm_opts_str = jvm_settings(gc_dump_path=os.path.join(IgniteService.PERSISTENT_ROOT, "ignite_gc.log"),

Review comment:
       could this be the default behavior for any IgniteAwareService?

##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -88,99 +93,100 @@ def __init__(self, test_context):
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type):
         """
         Test nodes failure scenario with TcpDiscoverySpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type):
         """
         Test 2 nodes sequential failure scenario with TcpDiscoverySpi.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing to fail nodes in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     def _perform_node_fail_scenario(self, test_config):
-        max_containers = len(self.test_context.cluster)
+        failure_detection_timeout = self._global_int(self.GLOBAL_DETECTION_TIMEOUT, self.DEFAULT_DETECTION_TIMEOUT)
 
         # One node is required to detect the failure.
-        assert max_containers >= 1 + test_config.nodes_to_kill + (
-            DiscoveryTest.ZOOKEEPER_NODES if test_config.with_zk else 0) + (
-                   0 if test_config.load_type == ClusterLoad.NONE else 1), "Few required containers: " + \
-                                                                           str(max_containers) + ". Check the params."
+        assert len(self.test_context.cluster) >= 1 + test_config.nodes_to_kill + (

Review comment:
       any reason to call len(self.test_context.cluster) so much time?

##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -88,99 +93,100 @@ def __init__(self, test_context):
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type):
         """
         Test nodes failure scenario with TcpDiscoverySpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type):
         """
         Test 2 nodes sequential failure scenario with TcpDiscoverySpi.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing to fail nodes in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     def _perform_node_fail_scenario(self, test_config):
-        max_containers = len(self.test_context.cluster)
+        failure_detection_timeout = self._global_int(self.GLOBAL_DETECTION_TIMEOUT, self.DEFAULT_DETECTION_TIMEOUT)
 
         # One node is required to detect the failure.
-        assert max_containers >= 1 + test_config.nodes_to_kill + (
-            DiscoveryTest.ZOOKEEPER_NODES if test_config.with_zk else 0) + (
-                   0 if test_config.load_type == ClusterLoad.NONE else 1), "Few required containers: " + \
-                                                                           str(max_containers) + ". Check the params."
+        assert len(self.test_context.cluster) >= 1 + test_config.nodes_to_kill + (
+            self.ZOOKEEPER_NODES if test_config.with_zk else 0), \
+            f"Few required containers: {len(self.test_context.cluster)}. Check the params."
 
-        self.logger.info("Starting on " + str(max_containers) + " maximal containers.")
+        self.logger.info("Starting on " + str(len(self.test_context.cluster)) + " maximal containers.")
+        self.logger.info(f"{self.GLOBAL_DETECTION_TIMEOUT}: {failure_detection_timeout}")
 
         results = {}
 
         modules = ['zookeeper'] if test_config.with_zk else None
 
         if test_config.with_zk:
-            zk_quorum = start_zookeeper(self.test_context, DiscoveryTest.ZOOKEEPER_NODES, test_config)
+            zk_quorum = start_zookeeper(self.test_context, self.ZOOKEEPER_NODES, failure_detection_timeout)
 
             discovery_spi = from_zookeeper_cluster(zk_quorum)
         else:
             discovery_spi = TcpDiscoverySpi()
 
+            if LATEST_2_7 < test_config.version < V_2_9_1:

Review comment:
       let's assert it 0 at 2.9.1




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] Vladsz83 commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
Vladsz83 commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r570809855



##########
File path: modules/ducktests/tests/ignitetest/services/ignite.py
##########
@@ -90,3 +91,19 @@ def get_event_time(service, log_node, log_pattern, from_the_beginning=True, time
 
     return datetime.strptime(re.match("^\\[[^\\[]+\\]", stdout.read().decode("utf-8")).group(),
                              "[%Y-%m-%d %H:%M:%S,%f]")
+
+
+def exec_command(node, cmd):
+    """Executes the command passed on the given node and returns result as string."""
+    return str(node.account.ssh_client.exec_command(cmd)[1].read(), sys.getdefaultencoding())
+
+
+def node_id(node):
+    """
+    Returns node id from its log if started.
+    This is a remote call. Reuse its results if possible.
+    """
+    regexp = "^>>> Local node \\[ID=([^,]+),.+$"
+    cmd = "grep -E '%s' %s | sed -r 's/%s/\\1/'" % (regexp, node.log_file, regexp)
+
+    return exec_command(node, cmd).strip().lower()

Review comment:
       Because they are 'static' helpers not binded to any instanse. Is it ok?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] Vladsz83 commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
Vladsz83 commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r570809855



##########
File path: modules/ducktests/tests/ignitetest/services/ignite.py
##########
@@ -90,3 +91,19 @@ def get_event_time(service, log_node, log_pattern, from_the_beginning=True, time
 
     return datetime.strptime(re.match("^\\[[^\\[]+\\]", stdout.read().decode("utf-8")).group(),
                              "[%Y-%m-%d %H:%M:%S,%f]")
+
+
+def exec_command(node, cmd):
+    """Executes the command passed on the given node and returns result as string."""
+    return str(node.account.ssh_client.exec_command(cmd)[1].read(), sys.getdefaultencoding())
+
+
+def node_id(node):
+    """
+    Returns node id from its log if started.
+    This is a remote call. Reuse its results if possible.
+    """
+    regexp = "^>>> Local node \\[ID=([^,]+),.+$"
+    cmd = "grep -E '%s' %s | sed -r 's/%s/\\1/'" % (regexp, node.log_file, regexp)
+
+    return exec_command(node, cmd).strip().lower()

Review comment:
       Because it is 'statuc' helpers to binded to any instanse. Is it ok?

##########
File path: modules/ducktests/tests/ignitetest/services/ignite.py
##########
@@ -90,3 +91,19 @@ def get_event_time(service, log_node, log_pattern, from_the_beginning=True, time
 
     return datetime.strptime(re.match("^\\[[^\\[]+\\]", stdout.read().decode("utf-8")).group(),
                              "[%Y-%m-%d %H:%M:%S,%f]")
+
+
+def exec_command(node, cmd):
+    """Executes the command passed on the given node and returns result as string."""
+    return str(node.account.ssh_client.exec_command(cmd)[1].read(), sys.getdefaultencoding())
+
+
+def node_id(node):
+    """
+    Returns node id from its log if started.
+    This is a remote call. Reuse its results if possible.
+    """
+    regexp = "^>>> Local node \\[ID=([^,]+),.+$"
+    cmd = "grep -E '%s' %s | sed -r 's/%s/\\1/'" % (regexp, node.log_file, regexp)
+
+    return exec_command(node, cmd).strip().lower()

Review comment:
       Because it is 'static' helpers to binded to any instanse. Is it ok?

##########
File path: modules/ducktests/tests/ignitetest/services/ignite.py
##########
@@ -90,3 +91,19 @@ def get_event_time(service, log_node, log_pattern, from_the_beginning=True, time
 
     return datetime.strptime(re.match("^\\[[^\\[]+\\]", stdout.read().decode("utf-8")).group(),
                              "[%Y-%m-%d %H:%M:%S,%f]")
+
+
+def exec_command(node, cmd):
+    """Executes the command passed on the given node and returns result as string."""
+    return str(node.account.ssh_client.exec_command(cmd)[1].read(), sys.getdefaultencoding())
+
+
+def node_id(node):
+    """
+    Returns node id from its log if started.
+    This is a remote call. Reuse its results if possible.
+    """
+    regexp = "^>>> Local node \\[ID=([^,]+),.+$"
+    cmd = "grep -E '%s' %s | sed -r 's/%s/\\1/'" % (regexp, node.log_file, regexp)
+
+    return exec_command(node, cmd).strip().lower()

Review comment:
       Because they are 'static' helpers not binded to any instanse. Is it ok?

##########
File path: modules/ducktests/tests/ignitetest/utils/ignite_test.py
##########
@@ -55,3 +55,11 @@ def tearDown(self):
         self.logger.debug("All services killed.")
 
         super().tearDown()
+
+    def _global_param(self, param_name, default=None):
+        """Reads global parameter passed to the test suite."""
+        return self.test_context.globals.get(param_name, default)
+
+    def _global_int(self, param_name, default: int = None):

Review comment:
       Hm. The autoformat says current typing is correct where as supposed fixes are not:
   For 'default = None' - PEP 8: E251 unexpected staces...
   For 'int=None' - PEP 8: E252: Missing whitespace aroung parameters.
   
   Same from pylint.
   
   The reason for _global_param is that it's like 'private' method. Not intended to call it outside

##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -213,36 +215,37 @@ def _perform_node_fail_scenario(self, test_config):
                       "targetNodes": tran_nodes,
                       "transactional": bool(tran_nodes)}
 
-            start_load_app(self.test_context, ignite_config=load_config, params=params, modules=modules)
+            start_load_app(self.test_context, load_config, params, modules)
 
+        # Minimal detection timeout is failure_detection_timeout * 2. Let's more to capture also 'bad' results.
         results.update(self._simulate_and_detect_failure(servers, failed_nodes,
-                                                         test_config.failure_detection_timeout * 4,
+                                                         0.004 * ignite_config.failure_detection_timeout,

Review comment:
       Fixed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] Vladsz83 commented on a change in pull request #8522: GNITE-13835 : Improve discovery ducktape test to research small timeouts and behavior on large cluster.

Posted by GitBox <gi...@apache.org>.
Vladsz83 commented on a change in pull request #8522:
URL: https://github.com/apache/ignite/pull/8522#discussion_r542308517



##########
File path: modules/ducktests/tests/ignitetest/tests/discovery_test.py
##########
@@ -88,99 +93,100 @@ def __init__(self, test_context):
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_tcp(self, ignite_version, nodes_to_kill, load_type):
         """
         Test nodes failure scenario with TcpDiscoverySpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=False)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_tcp(self, ignite_version, load_type):
         """
         Test 2 nodes sequential failure scenario with TcpDiscoverySpi.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
-                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          failure_detection_timeout=failure_detection_timeout)
+                                          load_type=ClusterLoad.construct_from(load_type), sequential_failure=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(nodes_to_kill=[1, 2], failure_detection_timeout=[FAILURE_TIMEOUT],
-            load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
-    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type, failure_detection_timeout):
+    @matrix(nodes_to_kill=[1, 2], load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_nodes_fail_not_sequential_zk(self, ignite_version, nodes_to_kill, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing nodes to fail in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=nodes_to_kill,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=False,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     @cluster(num_nodes=MAX_CONTAINERS)
     @version_if(lambda version: version != V_2_8_0)  # ignite-zookeeper package is broken in 2.8.0
     @ignite_versions(str(DEV_BRANCH), str(LATEST))
-    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL],
-            failure_detection_timeout=[FAILURE_TIMEOUT])
-    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type, failure_detection_timeout):
+    @matrix(load_type=[ClusterLoad.NONE, ClusterLoad.ATOMIC, ClusterLoad.TRANSACTIONAL])
+    def test_2_nodes_fail_sequential_zk(self, ignite_version, load_type):
         """
         Test node failure scenario with ZooKeeperSpi not allowing to fail nodes in a row.
         """
         test_config = DiscoveryTestConfig(version=IgniteVersion(ignite_version), nodes_to_kill=2,
                                           load_type=ClusterLoad.construct_from(load_type), sequential_failure=True,
-                                          with_zk=True, failure_detection_timeout=failure_detection_timeout)
+                                          with_zk=True)
 
         return self._perform_node_fail_scenario(test_config)
 
     def _perform_node_fail_scenario(self, test_config):
-        max_containers = len(self.test_context.cluster)
+        failure_detection_timeout = self._global_int(self.GLOBAL_DETECTION_TIMEOUT, self.DEFAULT_DETECTION_TIMEOUT)
 
         # One node is required to detect the failure.
-        assert max_containers >= 1 + test_config.nodes_to_kill + (
-            DiscoveryTest.ZOOKEEPER_NODES if test_config.with_zk else 0) + (
-                   0 if test_config.load_type == ClusterLoad.NONE else 1), "Few required containers: " + \
-                                                                           str(max_containers) + ". Check the params."
+        assert len(self.test_context.cluster) >= 1 + test_config.nodes_to_kill + (

Review comment:
       The reason is "Too many local variables".




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org