Posted to commits@cassandra.apache.org by "Yuki Morishita (JIRA)" <ji...@apache.org> on 2014/07/01 17:56:25 UTC
[jira] [Commented] (CASSANDRA-6455) Improve concurrency of repair process
[ https://issues.apache.org/jira/browse/CASSANDRA-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048988#comment-14048988 ]
Yuki Morishita commented on CASSANDRA-6455:
-------------------------------------------
Pushed latest version: https://github.com/yukim/cassandra/tree/6455-v3
bq. Seems the rebase lost CASSANDRA-3569 - we need to unregister from the FD once all validation messages have arrived.
Added.
bq. We should probably cap how big X we can have in -j X - really easy to OOM the nodes involved if you put a big X in.
Right. I considered what the cap should be and settled on 4, since we don't want to push the nodes too hard anyway. I also updated the command option description to make this clear.
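As a rough illustration of the cap (not the code in the branch), the check on the -j value could look like the sketch below. The class name RepairJobThreads, the constant MAX_JOB_THREADS and the method parse are hypothetical, and rejecting an oversized value rather than silently clamping it is an assumption; only the limit of 4 comes from the comment above.

```java
public final class RepairJobThreads {
    // Cap of 4 taken from the comment above; the constant name is hypothetical.
    public static final int MAX_JOB_THREADS = 4;

    private RepairJobThreads() {}

    /**
     * Parses the value given to -j and validates it against the cap.
     * Assumption: out-of-range values are rejected instead of clamped.
     */
    public static int parse(String value) {
        int requested = Integer.parseInt(value.trim());
        if (requested < 1) {
            throw new IllegalArgumentException("Job threads must be at least 1, got " + requested);
        }
        if (requested > MAX_JOB_THREADS) {
            throw new IllegalArgumentException(
                "Too many job threads: " + requested + " (max is " + MAX_JOB_THREADS + ")");
        }
        return requested;
    }
}
```

Failing fast makes the limit visible to the operator; clamping to 4 would be the other reasonable choice.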
bq. Should we make the taskExecutor in RepairSession static?
The reason I made taskExecutor local to the RepairSession instance is so that all submitted tasks can be cancelled when the session fails. I left this as is in the latest version.
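To show why an instance-scoped executor helps here, below is a minimal sketch using only java.util.concurrent. The class SessionTasks and its methods are hypothetical stand-ins, not the RepairSession code in the branch. Because each session owns its executor, failing one session can drop that session's queued tasks and interrupt its running ones without touching any other session; a static, shared executor could not be shut down this way.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a session that owns its own task executor.
public final class SessionTasks {
    // One executor per session instance, not static.
    private final ExecutorService taskExecutor;

    public SessionTasks(int threads) {
        this.taskExecutor = Executors.newFixedThreadPool(threads);
    }

    public Future<?> submit(Runnable task) {
        return taskExecutor.submit(task);
    }

    /**
     * Called when the session fails: interrupts running tasks and discards
     * queued ones. Returns how many queued tasks never started.
     */
    public int fail() {
        List<Runnable> neverStarted = taskExecutor.shutdownNow();
        return neverStarted.size();
    }
}
```

Note that shutdownNow() only interrupts running tasks, so the tasks themselves still have to respond to interruption.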
bq. Why do we add ourselves as a no-op StreamEventHandler in LocalSyncTask/StreamingRepairTask when creating the StreamPlan?
handleStreamEvent, like onSuccess/onFailure, is part of StreamEventHandler. We only need to handle success/failure, not other events such as progress, so handleStreamEvent is a no-op.
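The shape of this can be sketched with simplified stand-in types. The interface below is a hypothetical, cut-down imitation using String arguments, not Cassandra's actual StreamEventHandler, and SyncTask is an invented name. The point is only that a task which wants completion callbacks has to implement the whole interface, so the progress callback is left empty.

```java
public final class NoOpProgressExample {
    private NoOpProgressExample() {}

    // Hypothetical, simplified stand-in for the handler interface discussed above.
    public interface StreamEventHandler {
        void handleStreamEvent(String event);   // progress and other intermediate events
        void onSuccess(String finalState);      // stream plan completed
        void onFailure(Throwable cause);        // stream plan failed
    }

    // Hypothetical task that only cares about the final outcome.
    public static final class SyncTask implements StreamEventHandler {
        private String outcome = "pending";

        @Override
        public void handleStreamEvent(String event) {
            // Intentionally a no-op: intermediate events are not needed here.
        }

        @Override
        public void onSuccess(String finalState) {
            outcome = "synced";
        }

        @Override
        public void onFailure(Throwable cause) {
            outcome = "failed: " + cause.getMessage();
        }

        public String outcome() {
            return outcome;
        }
    }
}
```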
> Improve concurrency of repair process
> -------------------------------------
>
> Key: CASSANDRA-6455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6455
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Yuki Morishita
> Assignee: Yuki Morishita
> Priority: Minor
> Fix For: 3.0
>
> Attachments: 6455-3.0.txt, 6455.txt
>
>
> Currently, most of the repair tasks (taking snapshots, sending/receiving merkle trees, computing MT differences, etc.) are done on the single-threaded AntiEntropyStage.
> This causes problems like CASSANDRA-6415 and is likely to cause unnecessary waits.
> Also, repair is done one CF at a time. I think we can parallelize this (with concurrency configurable by the user based on the number of CFs and the load on the nodes) for faster processing.
--
This message was sent by Atlassian JIRA
(v6.2#6252)