Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/09/10 02:36:00 UTC

[jira] [Commented] (KUDU-1728) Parallelize tablet copy operations

    [ https://issues.apache.org/jira/browse/KUDU-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193311#comment-17193311 ] 

ASF subversion and git services commented on KUDU-1728:
-------------------------------------------------------

Commit 56ce1ad8bda24dbaed36c0f059e13a3d8e25b1d0 in kudu's branch refs/heads/master from ningw
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=56ce1ad ]

KUDU-1728 parallelize download blocks in tablet-copy-client

Parallelize the action of 'Download blocks' in tablet-copy-client.

Previously, blocks were downloaded from the tablet server sequentially,
so operations that depend on 'DownloadBlocks', such as
'recover from other tserver' and 'cluster rebalance', could be slow.
Downloading blocks with a single thread can also be slow even when
bandwidth isn't the bottleneck.

This commit introduces FLAGS_num_threads_blocks_download to control the
number of threads used to download blocks within a tablet-copy-client.
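As a rough illustration of the idea (not the actual Kudu code, which is C++),
fanning block downloads out over a fixed number of worker threads can be
sketched in Python as below. The names download_block and download_blocks and
the fake payloads are hypothetical; num_threads only plays the role that
FLAGS_num_threads_blocks_download plays in the patch.

```python
from concurrent.futures import ThreadPoolExecutor


def download_block(block_id):
    """Hypothetical stand-in for fetching one block from the remote tserver."""
    return f"data-for-{block_id}".encode()


def download_blocks(block_ids, num_threads):
    """Download every block using at most num_threads worker threads.

    num_threads is an illustrative parameter, analogous in spirit to
    FLAGS_num_threads_blocks_download. Returns a dict of block id -> bytes.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    block_ids = list(block_ids)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map yields results in input order, whichever thread finishes first.
        return dict(zip(block_ids, pool.map(download_block, block_ids)))
```

With num_threads=1 this degenerates to the sequential behavior described
above; larger values let several blocks be in flight at once.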

Here is a simple benchmark. Each result is averaged over 8 epochs and
reported in seconds, to 2 decimal places.

The experiment downloads all data from a remote tserver to a local tserver.

Settings:
Remote machine: 8 cores, 32g memory, tserver limited to 8g, tserver version 1.10
Local machine: 8 cores, 12g memory
Tserver has 3 x 500M tablets + 3 x 430M tablets + 3 x 420M tablets and
259 x 8M tablets.
All tablets were compacted well with diskrowset height 1.0.

Here I use two variables to control the experiment:
the number of tablet-copy-client threads, written as tc, and
the number of blocks-download threads within each tablet-copy-client thread
(FLAGS_num_threads_blocks_download), written as bd.

For example, the result 37.58 in column 'tc-2', row 'bd-4' means that
it takes 37.58s to download all data from the tserver with 2
tablet-copy-client threads, each using 4 threads to download blocks.

All results in column tc-1 reflect the behavior before this patch.

time     tc-1      tc-2      tc-4      tc-8

bd-1     76.91     48.50     34.27     32.39
bd-2     54.45     43.26     33.27     32.37
bd-4     48.36     37.58     33.31     32.30
bd-8     47.95     38.36     33.98     32.60
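
To make the table easier to compare, the timings can be turned into speedups
relative to the tc-1/bd-1 cell (one tablet-copy-client thread, one download
thread). The snippet below is only a convenience sketch in Python; the numbers
are copied from the table above, and the speedup helper is a hypothetical
name, not part of the patch.

```python
# Timings in seconds, copied from the benchmark table: times[bd][tc].
times = {
    "bd-1": {"tc-1": 76.91, "tc-2": 48.50, "tc-4": 34.27, "tc-8": 32.39},
    "bd-2": {"tc-1": 54.45, "tc-2": 43.26, "tc-4": 33.27, "tc-8": 32.37},
    "bd-4": {"tc-1": 48.36, "tc-2": 37.58, "tc-4": 33.31, "tc-8": 32.30},
    "bd-8": {"tc-1": 47.95, "tc-2": 38.36, "tc-4": 33.98, "tc-8": 32.60},
}

# Baseline: a single tablet-copy-client thread with a single download thread.
BASELINE = times["bd-1"]["tc-1"]


def speedup(bd, tc):
    """Return how many times faster the (bd, tc) cell is than the baseline."""
    return BASELINE / times[bd][tc]


if __name__ == "__main__":
    for bd, row in times.items():
        cells = "  ".join(f"{tc}: {speedup(bd, tc):.2f}x" for tc in row)
        print(f"{bd}  {cells}")
```

Read this way, most of the gain with a single tablet-copy-client thread comes
from going from 1 to 4 download threads, and the bd setting matters much less
once tc reaches 4 or 8.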

Change-Id: Id83abca7a38cf183d9c27d82bb8a022699079e0e
Reviewed-on: http://gerrit.cloudera.org:8080/16274
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Alexey Serbin <as...@cloudera.com>


> Parallelize tablet copy operations
> ----------------------------------
>
>                 Key: KUDU-1728
>                 URL: https://issues.apache.org/jira/browse/KUDU-1728
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus, tablet
>            Reporter: Mike Percy
>            Priority: Major
>              Labels: roadmap-candidate
>
> Parallelize tablet copy operations. Right now all data is copied serially. We may want to consider throttling on either side if we want to budget IO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)