Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2019/12/09 04:17:00 UTC

[jira] [Commented] (KUDU-3001) Multi-thread to load containers in a data directory

    [ https://issues.apache.org/jira/browse/KUDU-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991124#comment-16991124 ] 

ASF subversion and git services commented on KUDU-3001:
-------------------------------------------------------

Commit 6b6910870ce2c35bf8b9be9408f44a8cec6b580a in kudu's branch refs/heads/master from Yingchun Lai
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=6b69108 ]

KUDU-3001 Multi-thread to load containers in a data directory

When a data directory has many block containers, loading the container
files with a single thread is inefficient. We can speed this up by
loading them with multiple threads.
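
As a rough illustration of the idea only (not Kudu's actual C++ code), the
sketch below loads the containers of one data directory through a bounded
thread pool. The names load_container, load_data_dir, and max_threads are
hypothetical; max_threads is assumed to play the role of
'fs_max_thread_count_per_data_dir'.

```python
from concurrent.futures import ThreadPoolExecutor


def load_container(name):
    # Hypothetical stand-in for reading one container's files from disk.
    # Here it only returns the name and its length as fake "metadata".
    return (name, len(name))


def load_data_dir(container_names, max_threads):
    """Load every container of one data directory with up to max_threads workers."""
    if not container_names:
        return []
    # Never start more workers than there are containers, and at least one.
    workers = max(1, min(max_threads, len(container_names)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input names.
        return list(pool.map(load_container, container_names))


results = load_data_dir([f"container_{i}" for i in range(10)], max_threads=4)
print(len(results), "containers loaded")
```

With max_threads=1 the sketch degenerates to sequential loading, which
corresponds to the "1 thread" column of the benchmarks.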

We ran some simple benchmarks to verify this. We set
'log_container_max_size' to 1GB so that more containers are generated
during the benchmarks, and set
'startup_benchmark_data_dir_count_for_testing' to 8 to make sure the
existing concurrent loading across data directories is in effect. We
then varied 'fs_max_thread_count_per_data_dir' and
'startup_benchmark_block_count_for_testing', and timed 10 calls to
ReopenBlockManager(). Results, in milliseconds, are as follows:

disk type: SSD
                         |                          new version
Block count  old version | 1 thread | 2 threads | 4 threads | 8 threads | 16 threads | 32 threads
    100,000        2,375      2,382       2,342       2,372       2,343        2,353        2,393
  1,000,000       24,018     23,813      22,628      22,407      22,367       22,636       23,173
  2,000,000       50,163     51,120      39,726      37,589      37,671       37,501       37,710
  4,000,000      104,051    105,560      90,427      79,778      73,129       73,205       74,947
  8,000,000      214,347    216,210     199,456     159,143     157,190      158,798      157,056

disk type: spinning disk
                         |                          new version
Block count  old version | 1 thread | 2 threads | 4 threads | 8 threads | 16 threads | 32 threads
    100,000        3,207      3,347       3,345       3,279       3,237        3,263        3,221
  1,000,000       33,659     34,106      32,081      30,261      30,142       30,115       30,876
  2,000,000       68,097     74,939      56,976      51,407      50,957       56,299       58,456
  4,000,000      146,503    162,389     116,956     104,435      94,905      102,606      100,526
  8,000,000      331,201    349,609     267,259     247,069     243,064      247,810      247,472
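
To read the tables as speedups rather than raw timings, the short sketch
below divides the old-version time by the new-version time for the
8,000,000-block rows. The numbers are copied from the tables above; the
helper name speedup and the dictionary layout are only illustrative.

```python
# Timings in milliseconds for 8,000,000 blocks, copied from the tables above.
# "old" is the old version; integer keys are new-version thread counts.
timings_ms = {
    "SSD": {"old": 214_347, 1: 216_210, 2: 199_456, 4: 159_143,
            8: 157_190, 16: 158_798, 32: 157_056},
    "spinning disk": {"old": 331_201, 1: 349_609, 2: 267_259, 4: 247_069,
                      8: 243_064, 16: 247_810, 32: 247_472},
}


def speedup(disk, threads):
    """Return old-version time divided by new-version time (>1 means faster)."""
    row = timings_ms[disk]
    return row["old"] / row[threads]


for disk in timings_ms:
    for threads in (1, 2, 4, 8, 16, 32):
        print(f"{disk:>13} {threads:>2} threads: {speedup(disk, threads):.2f}x")
```

For these rows the ratio is about 1.36x at 8 threads on both disk types,
and it changes little beyond 4 to 8 threads.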

Change-Id: I0721ee4a5a6824db146ba0658e60eec25dd0c65c
Reviewed-on: http://gerrit.cloudera.org:8080/14743
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Adar Dembo <ad...@cloudera.com>


> Multi-thread to load containers in a data directory
> ---------------------------------------------------
>
>                 Key: KUDU-3001
>                 URL: https://issues.apache.org/jira/browse/KUDU-3001
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Yingchun Lai
>            Assignee: Yingchun Lai
>            Priority: Major
>             Fix For: 1.12.0
>
>
> As [~tlipcon] mentioned in https://issues.apache.org/jira/browse/KUDU-2014, we can improve tserver startup time by loading the containers in a data directory with multiple threads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)