Posted to issues@kudu.apache.org by "shenxingwuying (Jira)" <ji...@apache.org> on 2022/11/21 12:05:00 UTC
[jira] [Assigned] (KUDU-3422) provide compact CLI tools for kudu administrators

     [ https://issues.apache.org/jira/browse/KUDU-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shenxingwuying reassigned KUDU-3422:
------------------------------------

    Assignee: shenxingwuying

> provide compact CLI tools for kudu administrators
> -------------------------------------------------
>
>                 Key: KUDU-3422
>                 URL: https://issues.apache.org/jira/browse/KUDU-3422
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major
>
> h1. Motivation
> In Kudu, compaction jobs can be a pain point in some scenarios, for example:
>  # MemRowSet (MRS) and DeltaMemStore (DMS) flushes are not timely enough. A patch for this: [https://gerrit.cloudera.org/c/17743/]
>  # Disk space amplification is severe and all rowsets need to be compacted, but no compaction job runs, even when no other maintenance job is running and the workload is very low.
>  # Some kinds of GC jobs should have been launched, but none runs, even when no other maintenance job is running and the workload is very low.
> We can solve each of these problems case by case. The reasons compaction jobs misbehave may be complex: bugs may exist, or the scheduling strategies may not be good enough and need improvement. A new optimization may not achieve the effect we expect. Moreover, getting an optimization into production requires upgrading Kudu. An upgrade has to take into account other aspects of the production environment and users' concerns, and the operation itself may run into another pain point: bootstrap is very slow.
> In short, this is a complex area. Every problem takes time to analyze. When such a problem happens in a production environment, administrators have to change some gflags and restart Kudu in the hope that some compaction jobs get scheduled. Restarting Kudu may take a long time, and restarting a cluster may cost availability.
> I want to provide a quick way to address these cases without a restart. It is a troubleshooting aid for the cases above, not a root-cause fix.
> So I approach these difficulties from another angle, with a solution that SREs can accept.
> h1. Solution
> We can deal with the problem in a flexible way: Kudu administrators can launch specific kinds of compaction jobs based on their own judgment.
> To support this idea, the Kudu CLI tool should add a command like this:
>  
> {{kudu compact <master_list> --tables=<tables> --tablet_ids=<tablet_ids> --servers=<host:port> --compact_type=<compact_rowsets,deleted_rowset_gc,...>}}
> kudu-tserver's network service should add an API; when it receives the command, it launches the corresponding compaction job. The job should run on the ThreadPool 'thread_pool_' in class 'MaintenanceManager'. The compaction job is triggered by administrators and should skip the best-score computation, so it is a method for abnormal cases.
> The compaction job should run on a separate thread, not the service thread, because it may be long-running.
> So we should also provide a way to check the job's status.
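
The submit-then-poll flow described above could be sketched roughly as follows. This is only an illustration in Python, not Kudu code: `ManualCompactionService`, `submit`, `status`, and `wait` are hypothetical names, the `ThreadPoolExecutor` merely stands in for the 'thread_pool_' of 'MaintenanceManager', and the status strings are assumptions rather than anything the issue specifies.

```python
import threading
import uuid
from concurrent.futures import Future, ThreadPoolExecutor, wait as wait_for
from typing import Callable, Dict, Optional


class ManualCompactionService:
    """Hypothetical stand-in for the proposed tserver API; not Kudu code."""

    def __init__(self, max_workers: int = 2) -> None:
        # Stand-in for MaintenanceManager's 'thread_pool_'.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}
        self._lock = threading.Lock()

    def submit(self, tablet_id: str, compact_type: str,
               work: Callable[[str, str], None]) -> str:
        """Queue the job on the pool and return a job id immediately,
        so the calling (service) thread is never blocked by the job."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = self._pool.submit(work, tablet_id, compact_type)
        return job_id

    def status(self, job_id: str) -> str:
        """Report the job state without blocking."""
        with self._lock:
            future = self._jobs.get(job_id)
        if future is None:
            return "UNKNOWN"
        if future.running():
            return "RUNNING"
        if not future.done():
            return "PENDING"
        if future.cancelled():
            return "CANCELLED"
        return "FAILED" if future.exception() is not None else "DONE"

    def wait(self, job_id: str, timeout: Optional[float] = None) -> str:
        """Block up to `timeout` seconds, then report the job state."""
        with self._lock:
            future = self._jobs.get(job_id)
        if future is not None:
            wait_for([future], timeout=timeout)
        return self.status(job_id)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    def fake_compaction(tablet_id: str, compact_type: str) -> None:
        # Placeholder for the real compaction work.
        pass

    service = ManualCompactionService()
    job = service.submit("tablet-1", "compact_rowsets", fake_compaction)
    print(job, service.wait(job, timeout=5))
    service.shutdown()
```

Returning a job id from `submit` keeps the request handler short-lived, and the separate `status` call is one possible shape for the status check the description asks for.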



--
This message was sent by Atlassian Jira
(v8.20.10#820010)