Posted to issues@kudu.apache.org by "shenxingwuying (Jira)" <ji...@apache.org> on 2022/05/05 03:51:00 UTC

[jira] [Comment Edited] (KUDU-3364) Add TimerThread to ThreadPool to support a category of problem

    [ https://issues.apache.org/jira/browse/KUDU-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532021#comment-17532021 ] 

shenxingwuying edited comment on KUDU-3364 at 5/5/22 3:50 AM:
--------------------------------------------------------------

[~aserbin] 

Put simply, I want another way to implement periodic rescheduling that also allows the task to be triggered manually and run immediately.

 

Here is an example to explain my intent: I added a rebalance interface for the kudu CLI.

[https://gerrit.cloudera.org/c/18402/12/src/kudu/master/master_service.cc#686]


> Add TimerThread to ThreadPool to support a category of problem
> --------------------------------------------------------------
>
>                 Key: KUDU-3364
>                 URL: https://issues.apache.org/jira/browse/KUDU-3364
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> h1. Scenarios
> In general, I am talking about a category of problems.
> Kudu has a number of periodic tasks and automatically triggered scheduling tasks.
> For example: automatic rebalancing of cluster data, some GC tasks, and compaction tasks.
> They are implemented with a kudu Thread, or perhaps std::thread or ThreadPool; the actual task is either scheduled periodically or triggered by an internal strategy.
> All of this is internal, so we cannot influence it from outside.
> In fact, we need a method under our control to trigger these kinds of actions.
> Some of these scenarios are significant.
> Below are examples:
>  
> h2. data rebalance
> There are two ways to rebalance:
> 1. enable auto rebalance
> 2. use the rebalance tool, as was done before 1.14.
> The two ways may conflict because their operations can race: the rebalance tool's logic is somewhat complex and runs in the tool, while auto rebalance runs in the master.
> In the future, auto rebalance in the master will become very stable and will be the main way to rebalance data. At the same time, administrators and operators need an external way to trigger a rebalance that behaves just like auto rebalance.
> But at present auto rebalance runs in a thread on a fixed time period.
> We could add an API to MasterService, but that API would be synchronous and very costly; we need an asynchronous method to trigger the rebalance.
> h2. auto compaction
> Another example is auto compaction.
> I have found that the compaction strategy is not always effective, so we may need a method controlled by admin users to trigger compaction.
> If we can trigger a RowSetInCompaction, we do not need to restart the kudu cluster.
> h1. My Solution
> Add a timer to ThreadPool. This timer is a worker thread that schedules tasks onto the specified thread at the specified time.
> We can restrict this so that only a SERIAL ThreadPoolToken can enable the TimerThread.
> The pseudocode below expresses my intention:
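> {code:java}
> // Pseudocode: a timer thread that submits delayed tasks to their tokens.
> class TimerThread {
>  public:
>   struct Task {
>     ThreadPoolToken* token;
>     std::function<void()> f;
>   };
>
>   // Queue 'task' so that it is submitted to its token after 'delay_ms'.
>   void Schedule(Task task, int delay_ms) {
>     tasks_.insert(...);
>   }
>
>   void RunLoop() {
>     while (...) {
>       SleepFor(100ms);
>       // Collect the tasks whose scheduled time has arrived.
>       tasks = FindTasks();
>       for (auto task : tasks) {
>         token = task.token;
>         token->Submit(task.f);
>         tasks_.erase...
>       }
>     }
>   }
>
>  private:
>   scoped_refptr<Thread> thread_;
>   // Pending tasks, ordered by the time at which they become due.
>   std::multimap<MonoTime, Task> tasks_;
> };
>
> class ThreadPool {
>   ...
>   TimerThread* timer_;
>   ...
> };
>
> class ThreadPoolToken {
>   void Scheduler();
> };{code}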
> This scheme is compatible with the existing ThreadPool: the timer is nullptr by default.
> For periodic tasks, we can use a control ThreadPool with a timer to refactor some code and make it clearer, avoiding the past problem of too many standalone threads.
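
The pseudocode above can be turned into a small standalone program to check the idea. The sketch below is an assumption-laden illustration, not Kudu code: Executor and InlineExecutor are hypothetical stand-ins for ThreadPoolToken, std::chrono::steady_clock replaces MonoTime, std::thread replaces kudu Thread, and the fixed 100ms polling sleep is swapped for a condition-variable wait until the earliest deadline.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for ThreadPoolToken: anything that accepts a closure.
class Executor {
 public:
  virtual ~Executor() = default;
  virtual void Submit(std::function<void()> f) = 0;
};

// Trivial executor for demonstration: runs the closure on the calling thread.
class InlineExecutor : public Executor {
 public:
  void Submit(std::function<void()> f) override { f(); }
};

class TimerThread {
 public:
  using Clock = std::chrono::steady_clock;

  TimerThread() : thread_([this] { RunLoop(); }) {}

  ~TimerThread() {
    {
      std::lock_guard<std::mutex> l(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }

  // Submit 'f' to 'executor' once 'delay' has elapsed. A zero delay acts as
  // an immediate, manual trigger.
  void Schedule(Executor* executor, std::function<void()> f,
                std::chrono::milliseconds delay) {
    {
      std::lock_guard<std::mutex> l(mu_);
      tasks_.emplace(Clock::now() + delay, Task{executor, std::move(f)});
    }
    cv_.notify_all();
  }

 private:
  struct Task {
    Executor* executor;
    std::function<void()> f;
  };

  void RunLoop() {
    std::unique_lock<std::mutex> l(mu_);
    while (!stop_) {
      if (tasks_.empty()) {
        cv_.wait(l, [this] { return stop_ || !tasks_.empty(); });
        continue;
      }
      const auto next = tasks_.begin()->first;
      if (Clock::now() < next) {
        // Sleep until the earliest deadline, a new Schedule(), or shutdown.
        cv_.wait_until(l, next);
        continue;
      }
      // Move out every task that is due, then submit without holding the lock.
      std::vector<Task> due;
      const auto end = tasks_.upper_bound(Clock::now());
      for (auto it = tasks_.begin(); it != end; ++it) {
        due.push_back(std::move(it->second));
      }
      tasks_.erase(tasks_.begin(), end);
      l.unlock();
      for (auto& t : due) t.executor->Submit(std::move(t.f));
      l.lock();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::multimap<Clock::time_point, Task> tasks_;  // Ordered by due time.
  bool stop_ = false;
  std::thread thread_;  // Declared last so it starts after the other members.
};
```

In this sketch a periodic task would reschedule itself by calling Schedule() again from inside its closure, and tasks still pending at destruction are dropped; a real implementation would have to decide both behaviors explicitly.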



--
This message was sent by Atlassian Jira
(v8.20.7#820007)