Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/10/14 02:35:00 UTC

[jira] [Commented] (KUDU-3195) Make DMS flush policy more robust when maintenance threads are idle

    [ https://issues.apache.org/jira/browse/KUDU-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213532#comment-17213532 ] 

ASF subversion and git services commented on KUDU-3195:
-------------------------------------------------------

Commit 640a84ecff857c3d0447c690c68e2361eb3e9c3b in kudu's branch refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=640a84e ]

KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Currently each tablet will wait at least 2 minutes (controlled by
--flush_threshold_secs) between flushing DMSs, even if there are several
DMSs that are older than 2 minutes in a given tablet. This means that
for tablets with several dozen rowsets and updates across the entire
tablet, it could take hours to flush all the deltas.

Rather than waiting for 2 minutes since the last flush time before
considering time-based flushing, this patch tracks the creation time of
every DMS and flushes as long as there is a DMS that is older than 2
minutes in the tablet.

Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Reviewed-on: http://gerrit.cloudera.org:8080/16581
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
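
For illustration, here is a minimal, hypothetical C++ sketch of the policy change described in the commit message above (the types and function names below are made up for this sketch and do not correspond to Kudu's actual maintenance-manager API):

{noformat}
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-DMS bookkeeping; Kudu's real data structures differ.
struct DmsInfo {
  int64_t creation_time_secs;  // wall-clock second at which the DMS was created
};

// Old behavior: a tablet becomes eligible for a time-based DMS flush only
// after 'threshold_secs' have elapsed since the tablet's last DMS flush,
// regardless of how many older DMSs are still waiting.
bool EligibleOldPolicy(int64_t now_secs,
                       int64_t last_flush_time_secs,
                       int64_t threshold_secs) {
  return now_secs - last_flush_time_secs >= threshold_secs;
}

// New behavior: the tablet is eligible as soon as *any* of its DMSs has
// existed for longer than 'threshold_secs', so an otherwise idle
// maintenance thread can keep flushing old DMSs back to back.
bool EligibleNewPolicy(int64_t now_secs,
                       const std::vector<DmsInfo>& dms_list,
                       int64_t threshold_secs) {
  return std::any_of(dms_list.begin(), dms_list.end(),
                     [&](const DmsInfo& d) {
                       return now_secs - d.creation_time_secs >= threshold_secs;
                     });
}
{noformat}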


> Make DMS flush policy more robust when maintenance threads are idle
> -------------------------------------------------------------------
>
>                 Key: KUDU-3195
>                 URL: https://issues.apache.org/jira/browse/KUDU-3195
>             Project: Kudu
>          Issue Type: Improvement
>          Components: tserver
>    Affects Versions: 1.13.0
>            Reporter: Alexey Serbin
>            Priority: Major
>
> In one scenario I observed very long bootstrap times of tablet servers (somewhere between 45 and 60 minutes) even though the tablet servers had a relatively small amount of data under management (~80 GByte).  It turned out the time was spent on replaying WAL segments, with {{kudu cluster ksck}} reporting something like the following throughout the bootstrap:
> {noformat}
>   b0a20b117a1242ae9fc15620a6f7a524 (tserver-6.local.site:7050): not running
>     State:       BOOTSTRAPPING
>     Data state:  TABLET_DATA_READY
>     Last status: Bootstrap replaying log segment 21/37 (2.28M/7.85M this segment, stats: ops{read=27374 overwritten=0 applied=25016 ignored=657} inserts{seen=5949247 
> ignored=0} mutations{seen=0 ignored=0} orphaned_commits=7)
> {noformat}
> The workload I ran before shutting down the tablet servers consisted of many small UPSERT operations, but the cluster had been idle for a long time (a few hours or so) after the workload was terminated.  The workload was generated by:
> {noformat}
> kudu perf loadgen \
>   --table_name=$TABLE_NAME \
>   --num_rows_per_thread=800000000 \
>   --num_threads=4 \
>   --use_upsert \
>   --use_random_pk \
>   $MASTER_ADDR
> {noformat}
> The table that the UPSERT workload was running against had been pre-populated by the following:
> {noformat}
> kudu perf loadgen \
>   --table_num_replicas=3 \
>   --keep_auto_table \
>   --table_num_hash_partitions=5 \
>   --table_num_range_partitions=5 \
>   --num_rows_per_thread=800000000 \
>   --num_threads=4 \
>   $MASTER_ADDR
> {noformat}
> As it turned out, the tablet servers had accumulated a huge number of DMSs which required flushing/compaction, but after the memory pressure subsided, the flush policy was scheduling just one operation per tablet every 120 seconds (the interval is controlled by {{\-\-flush_threshold_secs}}).  In fact, the tablet servers could have flushed those rowsets non-stop, since the maintenance threads were otherwise completely idle and there was no active workload running against the cluster.  Those DMSs had been around for a long time (much more than 120 seconds) and were anchoring a lot of WAL segments, so the operations from those WAL segments had to be replayed once I restarted the tablet servers.
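> To put the numbers in perspective (a back-of-the-envelope estimate, not taken from the logs): with, say, 50 rowsets per tablet carrying un-flushed DMSs and one time-based flush every 120 seconds, draining a single tablet would take roughly 50 * 120 s = 100 minutes, even though the maintenance threads could have done all of that work right away.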
> It would be great to update the flushing/compaction policy to allow tablet servers to run {{FlushDeltaMemStoresOp}} as soon as a DMS becomes older than the interval specified by {{\-\-flush_threshold_secs}}, provided the maintenance threads are not otherwise busy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)