Posted to commits@cassandra.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2016/11/24 14:16:58 UTC

[jira] [Comment Edited] (CASSANDRA-12905) Retry acquire MV lock on failure instead of throwing WTE on streaming

    [ https://issues.apache.org/jira/browse/CASSANDRA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693376#comment-15693376 ] 

Paulo Motta edited comment on CASSANDRA-12905 at 11/24/16 2:16 PM:
-------------------------------------------------------------------

Hmm, it seems that when a node fails to grab the MV lock for a repair mutation, it throws a WriteTimeoutException, which fails the repair session, so this is indeed a bug. To fix it, we should retry acquiring the lock instead of failing the stream session for mutations originating from repair.
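
To illustrate the idea, here is a minimal sketch of "retry instead of fail" on a plain java.util.concurrent lock. It is not the Cassandra code path or an actual patch: the class MvLockRetrySketch, the method acquire, the fromRepairStream flag and LockTimeoutException are all hypothetical names made up for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

final class MvLockRetrySketch {
    /** Hypothetical stand-in for a write timeout; NOT Cassandra's WriteTimeoutException. */
    static final class LockTimeoutException extends RuntimeException {
        LockTimeoutException(String message) { super(message); }
    }

    /**
     * Tries to take the lock, waiting up to attemptMillis per attempt.
     * A regular write gives up after the first failed attempt; a mutation that
     * comes from repair streaming (assumed flag) keeps retrying until it succeeds.
     * Returns the number of attempts that were needed.
     */
    static int acquire(Lock lock, long attemptMillis, boolean fromRepairStream) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                if (lock.tryLock(attemptMillis, TimeUnit.MILLISECONDS)) {
                    return attempts;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new IllegalStateException("interrupted while waiting for lock", e);
            }
            if (!fromRepairStream) {
                throw new LockTimeoutException("lock not acquired within " + attemptMillis + " ms");
            }
            // Repair path: loop and try again instead of failing the stream session.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Lock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        // Another thread holds the lock for a while to simulate contention.
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                Thread.sleep(200);
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        held.await(); // make sure the lock is really taken before we start

        int attempts = acquire(lock, 20, true); // retries until the holder releases
        lock.unlock();
        holder.join();
        System.out.println("acquired after " + attempts + " attempt(s)");
    }
}
```

With fromRepairStream set to false, the same contended call would throw after the first 20 ms attempt, which corresponds to the behavior that currently fails the repair session. A real fix would probably also want an upper bound or a backoff on the retries; the sketch leaves that out.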

As a workaround, I suggest breaking up repair by table and disabling incremental repair for tables with MVs, since incremental repair is broken for them due to CASSANDRA-12888. This should reduce contention and decrease occurrences of WriteTimeoutException during repair streaming.


was (Author: pauloricardomg):
Hmm, it seems that when the node fails to grab the MV lock for a repair mutation it throws a WriteTimeoutException, failing the repair session so this is indeed a bug. In order to fix, we should retry acquiring the lock instead of throwing the repair session for mutations originating from repair.

As a workaround, I suggest you to break up repair by tables and disable incremental repair for tables with MVs since these are broken due to CASSANDRA-12888, what should reduce the contention and decrease ocurrences of WriteTimeOut during repair streaming.

> Retry acquire MV lock on failure instead of throwing WTE on streaming
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-12905
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12905
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: centos 6.7 x86_64
>            Reporter: Nir Zilka
>             Fix For: 3.9
>
>
> Hello,
> I performed two upgrades on the current cluster (currently 15 nodes, 1 DC, private VLAN).
> The first was to 2.2.5.1, and repair worked flawlessly.
> The second upgrade was to 3.0.9 (with upgradesstables), and repair also worked well.
> Then I upgraded to 3.9 two weeks ago, and the repair problems started.
> There are several types of errors in the system.log (on different nodes):
> - Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx
> - Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation timed out - received only 0 responses
> - Remote peer xxx.xxx.xxx.xxx failed stream session
> - Session completed with the following error
> org.apache.cassandra.streaming.StreamException: Stream failed
> ----
> I use the 3.9 default configuration with the cluster settings adjusted (3 seeds, GossipingPropertyFileSnitch).
> streaming_socket_timeout_in_ms is the default (86400000).
> I'm worried about consistency problems while I'm not performing repair.
> Any ideas?
> Thanks,
> Nir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)