Posted to issues@kudu.apache.org by "shenxingwuying (Jira)" <ji...@apache.org> on 2022/04/24 08:32:00 UTC

[jira] [Updated] (KUDU-3364) ThreadPool Timer to execute some Periodic tasks and multi control send tasks

     [ https://issues.apache.org/jira/browse/KUDU-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shenxingwuying updated KUDU-3364:
---------------------------------
    Description: 
h1. Scenarios

This issue describes a general category of problem rather than a single defect.

Kudu has a number of periodic tasks and automatically triggered scheduling tasks.

Examples include automatic rebalancing of cluster data, some GC tasks, and compaction tasks.

They are implemented with a Kudu Thread, std::thread, or ThreadPool, and the actual work is either scheduled periodically or triggered by an internal strategy.

Because all of this is internal, we cannot influence it from outside.

We need a way, under our own control, to trigger these kinds of actions.

Some of these scenarios are significant.

Examples follow:

 
h2. data rebalance

There are two ways to rebalance:

1. Enable auto rebalance.
2. Use the rebalance tool, as was done before 1.14.

The two approaches may conflict when their operations race, because the rebalance tool's logic is somewhat complex and runs in the tool, while auto rebalance runs on the master.

In the future, auto rebalance on the master will become stable and will be the main way to rebalance data. At the same time, administrators need a way to trigger a rebalance externally that behaves just like auto rebalance.

Currently, however, auto rebalance runs in a thread on a fixed time period.
Although we could add an API to MasterService, that API would be synchronous and very costly. We need an asynchronous method to trigger the rebalance.
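
To illustrate what an asynchronous trigger could look like, here is a minimal standalone sketch. It is not Kudu code: TriggerableWorker, TriggerNow and RunsAfterOneTrigger are hypothetical names, and it uses only the C++ standard library. The background thread runs the work once per period, or earlier when TriggerNow() is called, and TriggerNow() itself returns immediately.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper (not a Kudu class): runs 'work' every 'period',
// or sooner when TriggerNow() is called.
class TriggerableWorker {
 public:
  TriggerableWorker(std::function<void()> work, std::chrono::milliseconds period)
      : work_(std::move(work)), period_(period), thread_([this] { RunLoop(); }) {}

  ~TriggerableWorker() {
    {
      std::lock_guard<std::mutex> l(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }

  // Asynchronous: only sets a flag and wakes the worker thread.
  void TriggerNow() {
    {
      std::lock_guard<std::mutex> l(mu_);
      triggered_ = true;
    }
    cv_.notify_all();
  }

 private:
  void RunLoop() {
    std::unique_lock<std::mutex> l(mu_);
    while (!stop_) {
      // Wakes up on timeout (periodic run), on a trigger, or on shutdown.
      cv_.wait_for(l, period_, [this] { return stop_ || triggered_; });
      if (stop_) break;
      triggered_ = false;
      l.unlock();
      work_();  // run the work without holding the lock
      l.lock();
    }
  }

  std::function<void()> work_;
  std::chrono::milliseconds period_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
  bool triggered_ = false;
  std::thread thread_;  // declared last so it starts after the other members
};

// Usage: with a one-hour period, only the explicit trigger causes a run.
int RunsAfterOneTrigger() {
  std::atomic<int> runs{0};
  std::promise<void> first_run;
  {
    TriggerableWorker worker(
        [&] { if (runs.fetch_add(1) == 0) first_run.set_value(); },
        std::chrono::hours(1));
    worker.TriggerNow();            // returns at once
    first_run.get_future().wait();  // wait until the background run happened
  }  // destructor stops and joins the thread
  return runs.load();
}
```

The caller never blocks on the work itself, which is the property a synchronous MasterService API would lack.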
h2. auto compaction

Another example is auto compaction.
I have found that the compaction strategy is not always effective, so we may need a method, controlled by admin users, to trigger compaction.

If we could run a RowSetInCompaction on demand, we would not need to restart the Kudu cluster.
h1. My Solution

Add a timer to ThreadPool. The timer is a worker thread that schedules tasks onto the specified thread according to time.

We can restrict this so that only a SERIAL ThreadPoolToken can enable the TimerThread.
The following pseudocode expresses my intention:
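{code:java}
// Pseudocode: elided parts are left as "...".
class TimerThread {
 public:
  struct Task {
    ThreadPoolToken* token;   // token the task is submitted to when it is due
    std::function<void()> f;  // the work to run
  };

  // Queue 'task' to run after 'delay_ms' milliseconds.
  void Schedule(Task task, int delay_ms) {
    tasks_.insert(...);
  }

  // Body of the timer thread: wake up periodically and submit due tasks.
  void RunLoop() {
    while (...) {
      SleepFor(100ms);
      auto due_tasks = FindTasks();
      for (auto task : due_tasks) {
        auto* token = task.token;
        token->Submit(task.f);
        tasks_.erase...
      }
    }
  }

 private:
  scoped_refptr<Thread> thread_;
  std::multimap<MonoTime, Task> tasks_;  // pending tasks keyed by due time
};

class ThreadPool {
  ...
  TimerThread* timer_;
  ...
};

class ThreadPoolToken {
  void Scheduler();
};{code}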
This scheme is compatible with the existing ThreadPool, because the timer is nullptr by default.
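
The bookkeeping behind the pseudocode can be sketched in a self-contained way. Everything here is an assumption made for illustration: FakeSerialToken stands in for a SERIAL ThreadPoolToken, TimerQueue and SubmitDue are hypothetical names, and the timer thread is left out, with the current time passed in explicitly so the behavior is deterministic. A real timer thread's loop would call SubmitDue() on each wake-up.

```cpp
#include <chrono>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a SERIAL ThreadPoolToken: it queues submitted
// closures and runs them in submission order when Drain() is called.
class FakeSerialToken {
 public:
  void Submit(std::function<void()> f) { queue_.push_back(std::move(f)); }
  void Drain() {
    std::vector<std::function<void()>> q;
    q.swap(queue_);  // run from a copy so tasks may Submit() again safely
    for (auto& f : q) f();
  }

 private:
  std::vector<std::function<void()>> queue_;
};

// The state a timer thread would manage, without the thread itself.
class TimerQueue {
 public:
  struct Task {
    FakeSerialToken* token;
    std::function<void()> f;
  };

  // Record 'task' as due at now + delay.
  void Schedule(Task task, Clock::time_point now, std::chrono::milliseconds delay) {
    std::lock_guard<std::mutex> l(mu_);
    tasks_.emplace(now + delay, std::move(task));
  }

  // Submit every task whose due time is <= now to its token and remove it
  // from the queue. Returns the number of tasks submitted.
  int SubmitDue(Clock::time_point now) {
    std::vector<Task> due;
    {
      std::lock_guard<std::mutex> l(mu_);
      auto end = tasks_.upper_bound(now);  // first task that is not yet due
      for (auto it = tasks_.begin(); it != end; ++it) {
        due.push_back(std::move(it->second));
      }
      tasks_.erase(tasks_.begin(), end);
    }
    for (auto& t : due) t.token->Submit(std::move(t.f));
    return static_cast<int>(due.size());
  }

 private:
  std::mutex mu_;
  std::multimap<Clock::time_point, Task> tasks_;  // keyed by due time
};

// Usage: two tasks with different delays, driven by an explicit clock.
std::string TimerQueueDemo() {
  FakeSerialToken token;
  TimerQueue timer;
  std::string log;
  const Clock::time_point t0 = Clock::now();
  timer.Schedule({&token, [&log] { log += "B"; }}, t0, std::chrono::milliseconds(200));
  timer.Schedule({&token, [&log] { log += "A"; }}, t0, std::chrono::milliseconds(100));
  timer.SubmitDue(t0 + std::chrono::milliseconds(150));  // only A is due
  token.Drain();
  timer.SubmitDue(t0 + std::chrono::milliseconds(250));  // now B is due
  token.Drain();
  return log;
}
```

Keying the multimap by due time keeps the tasks sorted, so finding the due ones is a single upper_bound() lookup rather than a scan of every pending task.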

For periodic tasks, we can use a control ThreadPool with a timer to refactor some of the code and make it clearer, which avoids the past problem of having too many single-purpose threads.


> ThreadPool Timer to execute some Periodic tasks and multi control send tasks
> ----------------------------------------------------------------------------
>
>                 Key: KUDU-3364
>                 URL: https://issues.apache.org/jira/browse/KUDU-3364
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h



--
This message was sent by Atlassian Jira
(v8.20.7#820007)