Posted to commits@cassandra.apache.org by "Jeff Jirsa (JIRA)" <ji...@apache.org> on 2017/03/12 22:15:04 UTC

[jira] [Commented] (CASSANDRA-13196) test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address

    [ https://issues.apache.org/jira/browse/CASSANDRA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906715#comment-15906715 ] 

Jeff Jirsa commented on CASSANDRA-13196:
----------------------------------------

There's a real risk (in large clusters, in clusters with large schemas, or when upgrading across versions where we run in a mixed-version state) that we can have a lot of migration tasks in flight, so many that we can actually kill nodes (see CASSANDRA-11748 for an example). Re-queueing more migration tasks when one times out is a good way to make the problem worse, not better. I'm very concerned with the approach [here|https://github.com/Gerrrr/cassandra/commit/463f3fecd9348ea0a4ce6eeeb30141527b8b10eb#diff-f484a759f797776d9cc5d8af92b29e5eR156], where we just blindly schedule another poll. A rough sketch of the two retry policies follows.
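For illustration only, here is a minimal Java sketch of the scheduling pattern in question. The names below (SchemaPullSketch, pullWithUnboundedRetry, tryPullSchema, MAX_ATTEMPTS) are hypothetical and are not Cassandra's actual MigrationManager/MigrationTask API; the point is just that re-scheduling on every timeout never lets tasks drain while a peer stays unresponsive, whereas a bounded retry gives up and surfaces the failure instead.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Cassandra's actual migration-task code.
public class SchemaPullSketch
{
    private static final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor();
    private static final int MAX_ATTEMPTS = 3; // hypothetical cap

    // Pattern of concern: every timeout blindly schedules another pull, so the
    // task never drains while the peer stays unresponsive; across a large
    // cluster this keeps many tasks permanently in flight.
    static void pullWithUnboundedRetry(String endpoint)
    {
        executor.schedule(() -> {
            if (!tryPullSchema(endpoint))
                pullWithUnboundedRetry(endpoint); // re-queue forever
        }, 1, TimeUnit.SECONDS);
    }

    // Bounded alternative: stop after a fixed number of attempts and report
    // the failure rather than quietly queueing more work.
    static void pullWithBoundedRetry(String endpoint, int attempt)
    {
        if (attempt >= MAX_ATTEMPTS)
        {
            System.err.println("giving up on schema pull from " + endpoint);
            return;
        }
        executor.schedule(() -> {
            if (!tryPullSchema(endpoint))
                pullWithBoundedRetry(endpoint, attempt + 1);
        }, 1, TimeUnit.SECONDS);
    }

    // Stand-in for the actual schema pull RPC; pretend the peer timed out.
    static boolean tryPullSchema(String endpoint)
    {
        return false;
    }
}
{code}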

Do we even know why this failed in the first place? Isn't the right fix understanding why all 3 migration tasks failed, rather than just creating more and more migration tasks?

> test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13196
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13196
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Michael Shuler
>            Assignee: Aleksandr Sorokoumov
>              Labels: dtest, test-failure
>         Attachments: node1_debug.log, node1_gc.log, node1.log, node2_debug.log, node2_gc.log, node2.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1487/testReport/snitch_test/TestGossipingPropertyFileSnitch/test_prefer_local_reconnect_on_listen_address
> {code}
> {novnode}
> Error Message
> Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does not exist"
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
>     'num_tokens': '32',
>     'phi_convict_threshold': 5,
>     'range_request_timeout_in_ms': 10000,
>     'read_request_timeout_in_ms': 10000,
>     'request_timeout_in_ms': 10000,
>     'truncate_request_timeout_in_ms': 10000,
>     'write_request_timeout_in_ms': 10000}
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy (via host '127.0.0.1'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host <Host: 127.0.0.1 dc1> discovered
> --------------------- >> end captured logging << ---------------------
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
>     testMethod()
>   File "/home/automaton/cassandra-dtest/snitch_test.py", line 87, in test_prefer_local_reconnect_on_listen_address
>     new_rows = list(session.execute("SELECT * FROM {}".format(stress_table)))
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 1998, in execute
>     return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 3784, in result
>     raise self._final_exception
> 'Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does not exist"\n-------------------- >> begin captured logging << --------------------\ndtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF\ndtest: DEBUG: Done setting configuration options:\n{   \'initial_token\': None,\n    \'num_tokens\': \'32\',\n    \'phi_convict_threshold\': 5,\n    \'range_request_timeout_in_ms\': 10000,\n    \'read_request_timeout_in_ms\': 10000,\n    \'request_timeout_in_ms\': 10000,\n    \'truncate_request_timeout_in_ms\': 10000,\n    \'write_request_timeout_in_ms\': 10000}\ncassandra.policies: INFO: Using datacenter \'dc1\' for DCAwareRoundRobinPolicy (via host \'127.0.0.1\'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes\ncassandra.cluster: INFO: New Cassandra host <Host: 127.0.0.1 dc1> discovered\n--------------------- >> end captured logging << ---------------------'
> {novnode}
> {code}


