Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2011/02/27 17:57:39 UTC

[jira] Commented: (CASSANDRA-2253) Gossiper Starvation

    [ https://issues.apache.org/jira/browse/CASSANDRA-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999998#comment-12999998 ] 

Jonathan Ellis commented on CASSANDRA-2253:
-------------------------------------------

bq. use a separate pool for periodic and non periodic tasks

That's reasonable; so might splitting the Gossiper off into its own executor.

> Gossiper Starvation
> -------------------
>
>                 Key: CASSANDRA-2253
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2253
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: linux, windows
>            Reporter: Mikael Sitruk
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The Gossiper's periodic task is starved when large sstable files need to be deleted.
> SSTableDeletingReference uses the same scheduledTasks pool (from StorageService) as the Gossiper and other periodic tasks, but the Gossiper task must run every second to keep the cluster status (node liveness) correct. When large sstable files (several GB) are deleted, the delete operation can take more than 30 seconds, which puts the whole cluster into a wrong state in which live nodes are marked as down.
> This leads to unnecessary additional load such as hinted handoff, incorrect cluster state, and increased latency.
> One possible solution is to use separate pools for periodic and non-periodic tasks.
> I've implemented this change and it resolves the problem.
> I can provide a patch.
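
The starvation described above, and the effect of splitting the pools, can be illustrated with a minimal sketch. This is not the patch mentioned in the issue and not Cassandra's code: the class, method, and variable names are hypothetical, the single worker thread per pool is an assumption, and the intervals are scaled down (a 50 ms "gossip" task and a 500 ms "delete") so that it runs quickly.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; names do not come from Cassandra.
class SeparatePoolsSketch {

    /**
     * Returns how many times a simulated gossip task had run by the time
     * a simulated slow deletion finished.
     */
    static int heartbeatsDuringDelete(boolean separatePools) {
        // Pool for periodic work; one worker thread is an assumption of this sketch.
        ScheduledExecutorService periodicTasks = Executors.newScheduledThreadPool(1);
        // Pool for one-shot work: either its own executor or the shared one.
        ScheduledExecutorService oneShotTasks =
                separatePools ? Executors.newScheduledThreadPool(1) : periodicTasks;
        AtomicInteger heartbeats = new AtomicInteger();
        try {
            // Stand-in for the gossip round (every 50 ms here instead of every second).
            periodicTasks.scheduleWithFixedDelay(
                    heartbeats::incrementAndGet, 0, 50, TimeUnit.MILLISECONDS);
            // Stand-in for deleting a large sstable: blocks its worker for 500 ms,
            // then reports how many gossip rounds have completed so far.
            Future<Integer> delete = oneShotTasks.submit(() -> {
                Thread.sleep(500);
                return heartbeats.get();
            });
            return delete.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            periodicTasks.shutdownNow();
            oneShotTasks.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool:    " + heartbeatsDuringDelete(false));
        System.out.println("separate pools: " + heartbeatsDuringDelete(true));
    }
}
```

With a shared single-threaded pool, the periodic task cannot run again until the blocking task returns, so the first count stays very low; with separate pools the periodic task keeps firing throughout. Giving the Gossiper its own executor, as suggested in the comment, would isolate it in the same way.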

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira