Posted to commits@cassandra.apache.org by "xiangwang (Jira)" <ji...@apache.org> on 2019/12/02 10:08:00 UTC

[jira] [Updated] (CASSANDRA-15440) Run "nodetool repair -pr" concurrently

     [ https://issues.apache.org/jira/browse/CASSANDRA-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiangwang updated CASSANDRA-15440:
----------------------------------
    Description: 
Running "nodetool repair -pr" on each node one by one is too slow.
However, running the command on all nodes at the same time consumes more resources, because the token range overlap among the nodes triggers extra job threads on each node.
It would be faster if we could run "nodetool repair -pr" concurrently on multiple nodes whose token ranges do not intersect (overlap).

************
Say the RF is 3 and we have nodes A-Z. For now, without this feature, we have to do the following:

1. Get each primary node's token ranges from the logs of running "nodetool repair -pr" on each node:
RangeCollection_A primary node: nodeA
RangeCollection_B primary node: nodeB
RangeCollection_C primary node: nodeC
...

2. Get the output of running "./nodetool describering prod_keyspace >> nodetool_describering_prod_keyspace.log":
... 
TokenRange(start_token:-1589028858003231727, end_token:-1586606433049008069, endpoints:[10.81.74.134, 10.81.74.132, 10.81.74.133], rpc_endpoints:[10.81.74.134, 10.81.74.132, 10.81.74.133], endpoint_details:[EndpointDetails(host:10.81.74.134, datacenter:hk, rack:1c), EndpointDetails(host:10.81.74.132, datacenter:hk, rack:1a), EndpointDetails(host:10.81.74.133, datacenter:hk, rack:1b)])
...
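The TokenRange lines in the describering output above can be parsed mechanically instead of by eye. A minimal sketch, assuming the output format shown above (the regex and function name are illustrative, not part of nodetool):

```python
import re

# Matches the TokenRange(...) lines printed by "nodetool describering",
# capturing start_token, end_token, and the endpoints list.
TOKEN_RANGE_RE = re.compile(
    r"TokenRange\(start_token:(-?\d+), end_token:(-?\d+), endpoints:\[([^\]]*)\]"
)

def parse_describering(text):
    """Return a list of (start_token, end_token, [endpoint, ...]) tuples."""
    ranges = []
    for match in TOKEN_RANGE_RE.finditer(text):
        start, end, endpoints = match.groups()
        ranges.append((int(start), int(end),
                       [ep.strip() for ep in endpoints.split(",")]))
    return ranges

sample = ("TokenRange(start_token:-1589028858003231727, "
          "end_token:-1586606433049008069, "
          "endpoints:[10.81.74.134, 10.81.74.132, 10.81.74.133], ...)")
print(parse_describering(sample))
```

The parsed (range, endpoints) pairs are the input needed for the overlap calculation in step 3.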

3. Calculate the overlap of the token ranges among all the nodes.
For example,
RangeCollection_N is stored on nodeN(primary node), nodeO, nodeP
RangeCollection_O is stored on nodeO(primary node), nodeP, nodeQ
RangeCollection_P is stored on nodeP(primary node), nodeQ, nodeR
RangeCollection_Q is stored on nodeQ(primary node), nodeR, nodeS
RangeCollection_R is stored on nodeR(primary node), nodeS, nodeT
RangeCollection_S is stored on nodeS(primary node), nodeT, nodeU


4. Then, according to the intersections we figured out, find a schedule that ensures there is only one job thread running on each node at a time.
For example, the command can be run in the following 3 rounds:
1st round: nodeN and nodeQ
2nd round: nodeO and nodeR
3rd round: nodeP and nodeS
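The scheduling in steps 3-4 amounts to greedy graph coloring: two primaries conflict when their range collections share a replica, and primaries placed in the same round must have disjoint replica sets. A minimal sketch, with the RF=3 replica layout hard-coded from the example above (in practice it would come from the describering output):

```python
# Replica layout from the example: each primary mapped to the full set
# of nodes that hold its range collection (RF = 3).
replicas = {
    "nodeN": {"nodeN", "nodeO", "nodeP"},
    "nodeO": {"nodeO", "nodeP", "nodeQ"},
    "nodeP": {"nodeP", "nodeQ", "nodeR"},
    "nodeQ": {"nodeQ", "nodeR", "nodeS"},
    "nodeR": {"nodeR", "nodeS", "nodeT"},
    "nodeS": {"nodeS", "nodeT", "nodeU"},
}

def schedule_rounds(replicas):
    """Greedily pack primaries into rounds with pairwise-disjoint replicas."""
    rounds = []  # each entry: (set of primaries, set of busy nodes)
    for primary, nodes in replicas.items():
        for round_primaries, busy in rounds:
            if not (nodes & busy):   # no shared replica -> same round is safe
                round_primaries.add(primary)
                busy.update(nodes)
                break
        else:                        # conflicts with every round -> new round
            rounds.append(({primary}, set(nodes)))
    return [sorted(r) for r, _ in rounds]

for i, rnd in enumerate(schedule_rounds(replicas), 1):
    print(f"round {i}: {rnd}")
# -> round 1: ['nodeN', 'nodeQ']
#    round 2: ['nodeO', 'nodeR']
#    round 3: ['nodeP', 'nodeS']
```

This reproduces the 3 rounds listed above; each round's "nodetool repair -pr" runs touch disjoint node sets, so no node ever serves two repair jobs at once.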




  was:
Running "nodetool repair -pr" on each node one by one is too slow.
However, running the command on all nodes at the same time consumes more resources, because the token range overlap among the nodes triggers extra job threads on each node.
It would be faster if we could run "nodetool repair -pr" concurrently on multiple nodes whose token ranges do not intersect (overlap).

************
Say the RF is 3 and we have nodes A-Z. For now, without this feature, we have to do the following:

1. Get each primary node's token ranges from the logs of running "nodetool repair -pr" on each node:
primary nodeA for RangeCollection_A {rangeA_1, ..., rangeA_n}
primary nodeB for RangeCollection_B {rangeB_1, ..., rangeB_n}
primary nodeC for RangeCollection_C {rangeC_1, ..., rangeC_n}
primary nodeD for RangeCollection_D {rangeD_1, ..., rangeD_n}
primary nodeE for RangeCollection_E {rangeE_1, ..., rangeE_n}
primary nodeF for RangeCollection_F {rangeF_1, ..., rangeF_n}
primary nodeG for RangeCollection_G {rangeG_1, ..., rangeG_n}
...
primary nodeZ for RangeCollection_Z {rangeZ_1, ..., rangeZ_n}

2. Get the output of running "./nodetool describering prod_keyspace >> nodetool_describering_prod_keyspace.log":
... 
TokenRange(start_token:-1589028858003231727, end_token:-1586606433049008069, endpoints:[10.81.74.134, 10.81.74.132, 10.81.74.133], rpc_endpoints:[10.81.74.134, 10.81.74.132, 10.81.74.133], endpoint_details:[EndpointDetails(host:10.81.74.134, datacenter:hk, rack:1c), EndpointDetails(host:10.81.74.132, datacenter:hk, rack:1a), EndpointDetails(host:10.81.74.133, datacenter:hk, rack:1b)])
...

3. Calculate the overlap of the token ranges among all the nodes.
For example,
RangeCollection_N is stored on nodeN(primary node), nodeO, nodeP
RangeCollection_O is stored on nodeO(primary node), nodeP, nodeQ
RangeCollection_P is stored on nodeP(primary node), nodeQ, nodeR
RangeCollection_Q is stored on nodeQ(primary node), nodeR, nodeS
RangeCollection_R is stored on nodeR(primary node), nodeS, nodeT
RangeCollection_S is stored on nodeS(primary node), nodeT, nodeU


4. Then, according to the intersections we figured out, find a schedule that ensures there is only one job thread running on each node at a time.
For example, the command can be run in the following 3 rounds:
1st round: nodeN and nodeQ
2nd round: nodeO and nodeR
3rd round: nodeP and nodeS





> Run "nodetool repair -pr" concurrently
> --------------------------------------
>
>                 Key: CASSANDRA-15440
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15440
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tool/nodetool
>            Reporter: xiangwang
>            Priority: Normal
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
