Posted to commits@cassandra.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2015/11/10 16:48:11 UTC
[jira] [Commented] (CASSANDRA-10644) multiple repair dtest fails under Windows

    [ https://issues.apache.org/jira/browse/CASSANDRA-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998789#comment-14998789 ] 

Paulo Motta commented on CASSANDRA-10644:
-----------------------------------------

Basically there is a 1-second gap between when the repair stream connection handler is closed and when the actual outgoing socket is gracefully closed ({{messageQueue.poll(1, TimeUnit.SECONDS)}} on {{OutgoingMessageHandler}}). When node2 is abruptly stopped before that second has passed, the incoming socket on the other side (node3) is still closed gracefully on Linux, but not on Windows (see this StackOverflow [thread|http://stackoverflow.com/questions/22931811/differences-on-java-sockets-between-windows-and-linux-how-to-handle-them] for more details).
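To make the timing window concrete, here is a minimal Python model of the kind of sender loop described above. It is a hypothetical simplification, not Cassandra's actual {{OutgoingMessageHandler}}: the class name, the close flag and the drain step are assumptions. The point it shows is that {{close()}} only sets a flag, and the loop notices it only after its current 1-second poll times out, so the graceful socket close can lag the close request by up to about a second.

```python
import queue
import threading
import time

# Assumption: mirrors the 1-second timeout of
# messageQueue.poll(1, TimeUnit.SECONDS) mentioned above.
POLL_TIMEOUT_S = 1.0


class OutgoingHandlerModel:
    """Hypothetical, simplified model of a sender loop (not Cassandra code)."""

    def __init__(self):
        self.messages = queue.Queue()
        self._close_requested = threading.Event()
        self.socket_closed_at = None  # monotonic time of the graceful close
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def close(self):
        # Only sets a flag; the loop sees it once its current poll returns.
        self._close_requested.set()

    def join(self, timeout=None):
        self._thread.join(timeout)

    def _run(self):
        while not self._close_requested.is_set():
            try:
                msg = self.messages.get(timeout=POLL_TIMEOUT_S)
            except queue.Empty:
                continue  # poll timed out; re-check the close flag
            self._send(msg)
        # Flush whatever is still queued, then "close the socket" gracefully.
        while True:
            try:
                self._send(self.messages.get_nowait())
            except queue.Empty:
                break
        self.socket_closed_at = time.monotonic()

    def _send(self, msg):
        pass  # stand-in for writing msg to the socket


handler = OutgoingHandlerModel()
handler.start()
time.sleep(0.1)  # give the loop time to block inside get(timeout=...)
close_requested_at = time.monotonic()
handler.close()
handler.join()
gap = handler.socket_closed_at - close_requested_at
print(f"graceful close {gap:.2f}s after close() was requested")
```

The printed gap is typically just under a second in this model; killing node2 inside that window is what leaves node3 without a graceful close on Windows.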

Most of the time the test does not fail because node2 is stopped after this 1-second period, so a quick and dirty fix is to sleep for 2 seconds on Windows before abruptly stopping node2 after a repair session. Since this is a very specific and unlikely situation, I think it's enough to address it only in the dtest. WDYT [~yukim]?
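A rough sketch of what such a dtest-side workaround could look like, assuming a hypothetical helper {{stop_node_after_repair}} that receives the "kill node2" action as a callable (the real change is in the linked PR and may differ). On Windows it waits 2 seconds, longer than the 1-second poll window, before stopping the node; elsewhere it stops immediately. The sleep function is injectable only so the behavior can be checked without actually waiting.

```python
import sys
import time

# Assumption: 2 seconds, as suggested above; comfortably longer than the
# 1-second poll timeout on the streaming side.
WINDOWS_STREAM_CLOSE_GRACE_S = 2.0


def stop_node_after_repair(stop_abruptly, is_windows=None, sleep=time.sleep):
    """Hypothetical helper: on Windows, wait out the graceful-close window
    before abruptly stopping the node via the supplied callable."""
    if is_windows is None:
        is_windows = sys.platform.startswith("win")
    if is_windows:
        # Give the outgoing stream handler time to close its socket cleanly.
        sleep(WINDOWS_STREAM_CLOSE_GRACE_S)
    stop_abruptly()


# Usage with a fake node action and a recording sleep, so nothing really waits.
calls = []
stop_node_after_repair(
    lambda: calls.append("stop"),
    is_windows=True,
    sleep=lambda seconds: calls.append(("sleep", seconds)),
)
print(calls)
```

In a ccm-based dtest the callable would be whatever abruptly stops node2 (for example something like {{node2.stop(gently=False)}}, depending on the ccm version in use).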

Created a [dtest PR|https://github.com/riptano/cassandra-dtest/pull/654] with the quick and dirty fix.

> multiple repair dtest fails under Windows
> -----------------------------------------
>
>                 Key: CASSANDRA-10644
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10644
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Jim Witschey
>            Assignee: Paulo Motta
>             Fix For: 3.1, 2.2.x
>
>
> {{incremental_repair_test.py:TestIncRepair.multiple_repair_test}} flaps on CassCI Windows runs on C* 3.0:
> http://cassci.datastax.com/view/win32/job/cassandra-3.0_dtest_win32/100/testReport/junit/incremental_repair_test/TestIncRepair/multiple_repair_test/history/
> The error is {{An existing connection was forcibly closed by the remote host}}, and happens consistently in the failing runs:
> http://cassci.datastax.com/view/win32/job/cassandra-3.0_dtest_win32/100/testReport/junit/incremental_repair_test/TestIncRepair/multiple_repair_test/
> http://cassci.datastax.com/view/win32/job/cassandra-3.0_dtest_win32/72/testReport/junit/incremental_repair_test/TestIncRepair/multiple_repair_test/
> [~yukim] Can you have a look? I feel like you're more likely than anyone else to understand the streaming error. In particular: is this what happens when a node goes down? This could be an environment error, rather than a C* bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)